International Journal of Advanced Computer Science and Applications (IJACSA), Volume 15 Issue 10, 2024.
Abstract: Vision Transformers (ViTs) have become increasingly popular across vision tasks, yet adapting them to applications with very limited computation resources remains challenging. To this end, we propose a novel multi-path hybrid architecture and develop a series of lightweight ViT models (MH-LViT) that balance performance and complexity well. Specifically, a triple-path architecture is exploited to facilitate feature representation learning: image features are divided and shuffled along the channel dimension following a feature-scale balancing strategy. The first path uses ViTs to extract global features, the second path introduces CNNs to focus on local feature extraction, and the third path completes the representation learning with a residual connection. Based on the developed lightweight models, a novel knowledge distillation framework, IntPNKD (Normalized Knowledge Distillation with Intermediate Layer Prediction Alignment), is proposed to enhance their representation ability; meanwhile, an additional Mixup regularization term is introduced to further improve their generalization ability. Experimental results on benchmark datasets show that, with the multi-path architecture, the developed lightweight models perform well using existing CNN and ViT components, and with the proposed enhancement training methods, the resultant models notably outperform their competitors. For example, on miniImageNet, our MH-LViT M3 improves top-1 accuracy by 4.43% and runs 4x faster on GPU compared with EdgeViT-S; on CIFAR10, our MH-LViT M1 improves top-1 accuracy by 1.24%, and the enhanced version MH-LViT M1* by 2.28%, compared with the recent model EfficientViT M1.
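The triple-path idea described in the abstract, dividing channels among a ViT path, a CNN path, and a residual path, then shuffling them, can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the split ratios, the `groups=3` shuffle, and the identity stand-ins for the learned ViT/CNN paths are all assumptions for exposition.

```python
import numpy as np

def channel_split(x, ratios=(0.5, 0.25, 0.25)):
    """Split feature channels into three groups following a hypothetical
    scale-balancing ratio (the ratios are an assumption, not from the paper)."""
    c = x.shape[1]
    sizes = [int(c * r) for r in ratios]
    sizes[-1] = c - sum(sizes[:-1])  # absorb any rounding remainder
    idx = np.cumsum(sizes)[:-1]
    return np.split(x, idx, axis=1)

def channel_shuffle(x, groups=3):
    """Interleave channels across groups (ShuffleNet-style) so information
    mixes between the three paths before the next block."""
    n, c, h, w = x.shape
    x = x.reshape(n, groups, c // groups, h, w)
    x = x.transpose(0, 2, 1, 3, 4)  # swap group and per-group channel axes
    return x.reshape(n, c, h, w)

def triple_path_block(x, vit_path, cnn_path):
    """One MH-LViT-style block sketch: a ViT path for global features, a CNN
    path for local features, and a residual (identity) third path, followed
    by a channel shuffle over the concatenated result."""
    x_global, x_local, x_res = channel_split(x)
    out = np.concatenate([vit_path(x_global), cnn_path(x_local), x_res], axis=1)
    return channel_shuffle(out, groups=3)

# Toy usage with identity stand-ins for the two learned paths.
x = np.arange(2 * 12 * 4 * 4, dtype=np.float32).reshape(2, 12, 4, 4)
y = triple_path_block(x, vit_path=lambda t: t, cnn_path=lambda t: t)
```

With identity paths the block is a pure channel permutation, which makes the split/shuffle bookkeeping easy to verify in isolation before plugging in real attention and convolution modules.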
Yating Li, Wenwu He, Shuli Xing and Hengliang Zhu, “MH-LViT: Multi-path Hybrid Lightweight ViT Models with Enhancement Training,” International Journal of Advanced Computer Science and Applications (IJACSA), 15(10), 2024. http://dx.doi.org/10.14569/IJACSA.2024.01510107
@article{Li2024,
  title     = {MH-LViT: Multi-path Hybrid Lightweight ViT Models with Enhancement Training},
  journal   = {International Journal of Advanced Computer Science and Applications},
  doi       = {10.14569/IJACSA.2024.01510107},
  url       = {http://dx.doi.org/10.14569/IJACSA.2024.01510107},
  year      = {2024},
  publisher = {The Science and Information Organization},
  volume    = {15},
  number    = {10},
  author    = {Yating Li and Wenwu He and Shuli Xing and Hengliang Zhu}
}
Copyright Statement: This is an open access article licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, even commercially, as long as the original work is properly cited.