International Journal of Advanced Computer Science and Applications (IJACSA), Volume 17, Issue 1, 2026.
Abstract: Image captioning is a key research problem at the intersection of computer vision and natural language processing, aiming to automatically generate coherent and semantically accurate textual descriptions of visual content. Due to its multimodal nature and practical relevance, it has attracted increasing attention in artificial intelligence research. This article presents a systematic review of image captioning approaches conducted according to the PRISMA methodology, ensuring a rigorous, transparent, and reproducible analysis of the literature. The study traces the evolution of image captioning methods, beginning with early machine learning–based techniques that rely on handcrafted visual features, object detection, and template-based or statistical language models. While these approaches established foundational concepts, they are constrained by limited scalability and semantic expressiveness; specific challenges include difficulty in capturing complex object relationships and an inability to generate diverse descriptions for the same image. The review then examines the transition toward deep learning–based models, which have become dominant due to their improved performance. Encoder–decoder architectures are analyzed, highlighting the use of convolutional neural networks for visual representation and recurrent neural networks for caption generation. Attention-based models are discussed for their ability to focus on salient image regions, followed by reinforcement learning–based methods that directly optimize evaluation metrics and semantic-driven architectures that enhance caption relevance. Finally, recent advances based on Transformer architectures and large-scale multimodal pretraining are reviewed, along with key application domains and open challenges for future research in image captioning.
Abdelkrim SAOUABE, Khalid TIZRA and Doha BANOUI. “Evolution of Image Captioning Models: A Systematic PRISMA Review”. International Journal of Advanced Computer Science and Applications (IJACSA) 17.1 (2026). http://dx.doi.org/10.14569/IJACSA.2026.0170184
@article{SAOUABE2026,
title = {Evolution of Image Captioning Models: A Systematic PRISMA Review},
journal = {International Journal of Advanced Computer Science and Applications},
doi = {10.14569/IJACSA.2026.0170184},
url = {http://dx.doi.org/10.14569/IJACSA.2026.0170184},
year = {2026},
publisher = {The Science and Information Organization},
volume = {17},
number = {1},
author = {Abdelkrim SAOUABE and Khalid TIZRA and Doha BANOUI}
}
Copyright Statement: This is an open access article licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, even commercially, as long as the original work is properly cited.