DOI: 10.14569/IJACSA.2026.0170535

Fine-Grained Image Classification Using Vision Transformer Model

Author 1: Zunaira Saleem
Author 2: Uzma Jamil
Author 3: Saman Iftikhar

International Journal of Advanced Computer Science and Applications (IJACSA), Volume 17 Issue 5, 2026.


Abstract: Fine-grained image classification distinguishes visually similar subclasses within a broader category. The task remains challenging because inter-class variation is low and intra-class variation is high. Conventional methods based on convolutional neural networks (CNNs) often struggle to capture these subtle differences. Vision Transformers, which use self-attention to model global relationships within an image, have recently shown strong performance on image classification benchmarks. To improve classification of visually complex categories, this research presents a fine-grained image classification framework based on the Vision Transformer model. Experiments use the CIFAR-100 dataset, which contains 100 image classes. The images are up-sampled because the Vision Transformer requires higher-resolution inputs. Preprocessing, including normalization and data augmentation, is applied to improve training efficiency and generalization. The model is evaluated with accuracy, macro precision, macro recall, and macro F1 score, so that every class contributes equally to the reported results. The model reaches an overall classification accuracy of 89.68% with good macro-level scores, indicating that it captures subtle visual distinctions among similar categories. These results suggest that transformer-based architectures are an effective alternative to conventional techniques for fine-grained image classification and can improve robustness and accuracy on datasets with closely related classes.

Keywords: Data augmentation; fine-grained image classification; CIFAR-100; vision transformer model; deep learning
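
The macro-averaged metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code; the function name `macro_metrics` and the toy labels are assumptions made for illustration. Each class gets its own precision, recall, and F1, and the macro score is the unweighted mean over classes, so rare and common classes count equally.

```python
from collections import Counter


def macro_metrics(y_true, y_pred, num_classes):
    """Return (accuracy, macro precision, macro recall, macro F1).

    Hypothetical helper for illustration only. Classes are assumed to be
    integers in range(num_classes). A class with an empty denominator
    contributes 0.0 to the corresponding average.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # p was predicted but is wrong
            fn[t] += 1          # t was the true class but was missed

    precisions, recalls, f1s = [], [], []
    for c in range(num_classes):
        p_den = tp[c] + fp[c]
        r_den = tp[c] + fn[c]
        prec = tp[c] / p_den if p_den else 0.0
        rec = tp[c] / r_den if r_den else 0.0
        f1 = 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    accuracy = sum(tp.values()) / len(y_true)
    # Macro averaging: unweighted mean of the per-class scores.
    return (
        accuracy,
        sum(precisions) / num_classes,
        sum(recalls) / num_classes,
        sum(f1s) / num_classes,
    )


if __name__ == "__main__":
    # Toy three-class example (made-up labels, not results from the paper).
    y_true = [0, 0, 1, 1, 2, 2]
    y_pred = [0, 1, 1, 1, 2, 0]
    acc, mp, mr, mf1 = macro_metrics(y_true, y_pred, num_classes=3)
    print(f"accuracy={acc:.4f} precision={mp:.4f} recall={mr:.4f} f1={mf1:.4f}")
```

On a class-balanced test set, macro recall equals overall accuracy, as it does in the toy example; macro precision and macro F1 can still differ from accuracy when errors concentrate in a few classes.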

Zunaira Saleem, Uzma Jamil and Saman Iftikhar. “Fine-Grained Image Classification Using Vision Transformer Model”. International Journal of Advanced Computer Science and Applications (IJACSA) 17.5 (2026). http://dx.doi.org/10.14569/IJACSA.2026.0170535

@article{Saleem2026,
title = {Fine-Grained Image Classification Using Vision Transformer Model},
journal = {International Journal of Advanced Computer Science and Applications},
doi = {10.14569/IJACSA.2026.0170535},
url = {http://dx.doi.org/10.14569/IJACSA.2026.0170535},
year = {2026},
publisher = {The Science and Information Organization},
volume = {17},
number = {5},
author = {Zunaira Saleem and Uzma Jamil and Saman Iftikhar}
}



Copyright Statement: This is an open access article licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, even commercially, as long as the original work is properly cited.
