The Science and Information (SAI) Organization

DOI: 10.14569/IJACSA.2026.0170184

Evolution of Image Captioning Models: A Systematic PRISMA Review

Author 1: Abdelkrim SAOUABE
Author 2: Khalid TIZRA
Author 3: Doha BANOUI

International Journal of Advanced Computer Science and Applications (IJACSA), Volume 17 Issue 1, 2026.

Abstract: Image captioning is a key research problem at the intersection of computer vision and natural language processing, aiming to automatically generate coherent and semantically accurate textual descriptions of visual content. Due to its multimodal nature and practical relevance, it has attracted increasing attention in artificial intelligence research. This article presents a systematic review of image captioning approaches conducted according to the PRISMA methodology, ensuring a rigorous, transparent, and reproducible analysis of the literature. The study traces the evolution of image captioning methods, beginning with early machine learning–based techniques that rely on handcrafted visual features, object detection, and template-based or statistical language models. While these approaches established foundational concepts, they are constrained by limited scalability and semantic expressiveness; specific challenges include difficulty in capturing complex object relationships and an inability to generate diverse descriptions for the same image. The review then examines the transition toward deep learning–based models, which have become dominant due to their improved performance. Encoder–decoder architectures are analyzed, highlighting the use of convolutional neural networks for visual representation and recurrent neural networks for caption generation. Attention-based models are discussed for their ability to focus on salient image regions, followed by reinforcement learning–based methods that directly optimize evaluation metrics and semantic-driven architectures that enhance caption relevance. Finally, recent advances based on Transformer architectures and large-scale multimodal pretraining are reviewed, along with key application domains and open challenges for future research in image captioning.

Keywords: Image captioning; vision-language models; semantic-based models; transformer models; attention mechanism; pre-trained models; GPT-based models

Abdelkrim SAOUABE, Khalid TIZRA and Doha BANOUI. “Evolution of Image Captioning Models: A Systematic PRISMA Review”. International Journal of Advanced Computer Science and Applications (IJACSA) 17.1 (2026). http://dx.doi.org/10.14569/IJACSA.2026.0170184

@article{SAOUABE2026,
title = {Evolution of Image Captioning Models: A Systematic PRISMA Review},
journal = {International Journal of Advanced Computer Science and Applications},
doi = {10.14569/IJACSA.2026.0170184},
url = {http://dx.doi.org/10.14569/IJACSA.2026.0170184},
year = {2026},
publisher = {The Science and Information Organization},
volume = {17},
number = {1},
author = {Abdelkrim SAOUABE and Khalid TIZRA and Doha BANOUI}
}



Copyright Statement: This is an open access article licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, even commercially, as long as the original work is properly cited.

The Science and Information (SAI) Organization Limited is a company registered in England and Wales under Company Number 8933205.