International Journal of Advanced Computer Science and Applications (IJACSA), Volume 17 Issue 5, 2026.
Abstract: Fine-Grained Image Classification distinguishes visually similar subclasses within a broader category, and remains challenging because inter-class variation is low while intra-class variation is high. Conventional Convolutional Neural Network-based methods often struggle to capture these subtle differences. Vision Transformers, which use self-attention to model global relationships within an image, have recently demonstrated robust performance in image classification evaluations. To improve classification performance on complex visual categories, this research presents a Fine-Grained Image Classification framework based on the Vision Transformer Model. Experiments use the CIFAR-100 dataset, which contains 100 image classes. Because the Vision Transformer requires higher-resolution inputs, the images are up-sampled. Preprocessing, including normalization and data augmentation, is applied to improve training efficiency and generalization. The model is trained and evaluated with standard performance metrics (accuracy, macro precision, macro recall, and macro F1 Score) to ensure a balanced evaluation across all classes. The model achieves an overall classification accuracy of 89.68% with good macro-level scores, showing that the Vision Transformer Model successfully captures subtle visual distinctions among similar categories. These results indicate that Transformer-based architectures are an effective, better-performing alternative to conventional techniques in Fine-Grained Image Classification, and that the Vision Transformer Model can increase classification robustness and accuracy on a dataset with very similar classes.
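The macro-averaged metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the function name `macro_metrics` and the toy labels are assumptions made for illustration only. Each class's precision, recall, and F1 are computed separately and then averaged with equal weight, which is what keeps the evaluation balanced across all classes.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Return (accuracy, macro precision, macro recall, macro F1).

    Hypothetical helper for illustration; labels can be any hashable values.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    classes = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # p was predicted but is wrong
            fn[t] += 1          # t was the true class but was missed

    precisions, recalls, f1s = [], [], []
    for c in classes:
        p_den = tp[c] + fp[c]
        r_den = tp[c] + fn[c]
        prec = tp[c] / p_den if p_den else 0.0
        rec = tp[c] / r_den if r_den else 0.0
        f1 = 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = len(classes)
    accuracy = sum(tp.values()) / len(y_true)
    # Macro averaging: unweighted mean over classes.
    return accuracy, sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Toy example with three classes (made-up labels, not CIFAR-100 results).
acc, macro_p, macro_r, macro_f1 = macro_metrics(
    [0, 0, 1, 1, 2, 2],
    [0, 1, 1, 1, 2, 0],
)
```

Because every class contributes equally to the average, a model that performs poorly on a few hard-to-separate classes is penalized even when overall accuracy is high.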
Zunaira Saleem, Uzma Jamil and Saman Iftikhar. “Fine-Grained Image Classification Using Vision Transformer Model”. International Journal of Advanced Computer Science and Applications (IJACSA) 17.5 (2026). http://dx.doi.org/10.14569/IJACSA.2026.0170535
@article{Saleem2026,
title = {Fine-Grained Image Classification Using Vision Transformer Model},
journal = {International Journal of Advanced Computer Science and Applications},
doi = {10.14569/IJACSA.2026.0170535},
url = {http://dx.doi.org/10.14569/IJACSA.2026.0170535},
year = {2026},
publisher = {The Science and Information Organization},
volume = {17},
number = {5},
author = {Zunaira Saleem and Uzma Jamil and Saman Iftikhar}
}
Copyright Statement: This is an open access article licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, even commercially, as long as the original work is properly cited.