International Journal of Advanced Computer Science and Applications (IJACSA), Volume 14 Issue 10, 2023.
Abstract: Image captioning is an advanced NLP task with various practical applications. To meet the requirements of visual information understanding and textual information generation, image captioning models have widely adopted the encoder-decoder framework. In this framework, the encoder transforms an image into a vector representation, and the decoder acts as a text generator that yields the image caption. The decoder is therefore crucial to the entire image captioning model. However, comprehensive studies of how various aspects of the decoder affect image captioning are lacking. To advance the understanding of the impact of the text generation techniques employed by the decoder, we conduct an extensive empirical analysis of three types of language models, two types of decoding strategies and two types of training methods, based on four state-of-the-art image captioning models. Our experimental results demonstrate that the language model affects the performance of image captioning models, and that different language models may benefit different image captioning models. The results also reveal that, among the decoding and training strategies under investigation, beam search, the AOA mechanism and the reinforcement learning based training method generally improve the performance of image captioning models. Moreover, the results show that the combined use of these strategies always outperforms the use of a single strategy for image captioning.
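To illustrate one of the decoding strategies named in the abstract, the following is a minimal sketch of beam search, with greedy decoding as the special case of beam size 1. It is not the authors' implementation: the function names, the toy probability table and the `<eos>` end marker are hypothetical, and a real captioning decoder would condition the next-token distribution on the encoded image.

```python
import math
from typing import Callable, Dict, List, Tuple

Prefix = Tuple[str, ...]

# Hypothetical stand-in for a caption decoder: maps a token prefix to a
# next-token probability distribution. The numbers are invented.
def toy_next_token_probs(prefix: Prefix) -> Dict[str, float]:
    table = {
        (): {"a": 0.6, "the": 0.4},
        ("a",): {"dog": 0.55, "cat": 0.45},
        ("the",): {"dog": 0.9, "cat": 0.1},
    }
    # Any prefix not in the table ends the caption.
    return table.get(prefix, {"<eos>": 1.0})

def beam_search(
    next_probs: Callable[[Prefix], Dict[str, float]],
    beam_size: int,
    max_len: int,
) -> List[Tuple[Prefix, float]]:
    """Return up to beam_size (tokens, log-probability) pairs, best first."""
    beams: List[Tuple[Prefix, float]] = [((), 0.0)]
    for _ in range(max_len):
        candidates: List[Tuple[Prefix, float]] = []
        for tokens, score in beams:
            # Finished hypotheses are carried over unchanged.
            if tokens and tokens[-1] == "<eos>":
                candidates.append((tokens, score))
                continue
            # Extend each unfinished hypothesis by every possible next token.
            for tok, p in next_probs(tokens).items():
                if p > 0:
                    candidates.append((tokens + (tok,), score + math.log(p)))
        # Keep only the beam_size highest-scoring hypotheses.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]
        if all(t and t[-1] == "<eos>" for t, _ in beams):
            break
    return beams

greedy_tokens, greedy_score = beam_search(toy_next_token_probs, 1, 5)[0]
beam_tokens, beam_score = beam_search(toy_next_token_probs, 2, 5)[0]
print(greedy_tokens, round(math.exp(greedy_score), 2))
print(beam_tokens, round(math.exp(beam_score), 2))
```

In this toy table, greedy decoding commits to "a" at the first step and ends with a sequence of probability 0.33, while a beam of size 2 keeps "the" alive and finds the sequence with probability 0.36. This only shows why a wider beam can help; it says nothing about the size of the effect measured in the paper.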
Linna Ding, Mingyue Jiang, Liming Nie, Zuzhang Qing and Zuohua Ding, “The Impact of Text Generation Techniques on Neural Image Captioning: An Empirical Study,” International Journal of Advanced Computer Science and Applications (IJACSA), 14(10), 2023. http://dx.doi.org/10.14569/IJACSA.2023.01410114
@article{Ding2023,
title = {The Impact of Text Generation Techniques on Neural Image Captioning: An Empirical Study},
journal = {International Journal of Advanced Computer Science and Applications},
doi = {10.14569/IJACSA.2023.01410114},
url = {http://dx.doi.org/10.14569/IJACSA.2023.01410114},
year = {2023},
publisher = {The Science and Information Organization},
volume = {14},
number = {10},
author = {Linna Ding and Mingyue Jiang and Liming Nie and Zuzhang Qing and Zuohua Ding}
}
Copyright Statement: This is an open access article licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, even commercially, as long as the original work is properly cited.