DOI: 10.14569/IJACSA.2024.01510107

MH-LViT: Multi-path Hybrid Lightweight ViT Models with Enhancement Training

Author 1: Yating Li
Author 2: Wenwu He
Author 3: Shuli Xing
Author 4: Hengliang Zhu

International Journal of Advanced Computer Science and Applications (IJACSA), Volume 15, Issue 10, 2024.

  • Abstract and Keywords
  • How to Cite this Article
  • BibTeX Source

Abstract: Vision Transformers (ViTs) have become increasingly popular in various vision tasks. However, adapting them to applications where computation resources are very limited remains challenging. To this end, we propose a novel multi-path hybrid architecture and develop a series of lightweight ViT (MH-LViT) models that balance performance and complexity well. Specifically, a triple-path architecture is exploited to facilitate feature representation learning: image features are divided and shuffled along the channel dimension following a feature-scale balancing strategy. In the first path, ViTs are utilized to extract global features, while in the second path, CNNs are introduced to focus on local feature extraction. The third path completes the representation learning with a residual connection. Based on the developed lightweight models, a novel knowledge distillation framework, IntPNKD (Normalized Knowledge Distillation with Intermediate Layer Prediction Alignment), is proposed to enhance their representation ability, and an additional Mixup regularization term is introduced to further improve their generalization ability. Experimental results on benchmark datasets show that, with the multi-path architecture, the developed lightweight models perform well using existing CNN and ViT components, and with the proposed enhancement training methods, the resultant models notably outperform their competitors. For example, on miniImageNet, our MH-LViT M3 improves top-1 accuracy by 4.43% and runs 4x faster on GPU compared with EdgeViT-S; on CIFAR-10, our MH-LViT M1 improves top-1 accuracy by 1.24% and the enhanced version MH-LViT M1* by 2.28%, compared to the recent model EfficientViT M1.
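
To make the triple-path idea concrete, the following is a minimal PyTorch-style sketch of one such block, assuming an even 1:1:1 channel split, a single self-attention layer for the global path, a depthwise-separable convolution for the local path, and an identity (residual) third path, recombined by a channel shuffle. The layer choices, the split ratio, and the names TriplePathBlock and channel_shuffle are illustrative assumptions, not the authors' exact design.

import torch
import torch.nn as nn


def channel_shuffle(x: torch.Tensor, groups: int) -> torch.Tensor:
    # Interleave channels across groups (as popularized by ShuffleNet) so the
    # three paths exchange information when blocks are stacked.
    b, c, h, w = x.shape
    x = x.view(b, groups, c // groups, h, w)
    x = x.transpose(1, 2).contiguous()
    return x.view(b, c, h, w)


class TriplePathBlock(nn.Module):
    # Hypothetical triple-path block: ViT path + CNN path + residual path.
    def __init__(self, channels: int, num_heads: int = 2):
        super().__init__()
        assert channels % 3 == 0, "channels must split evenly into three paths"
        c = channels // 3
        # Path 1: lightweight self-attention over spatial tokens (global features).
        self.attn_norm = nn.LayerNorm(c)
        self.attn = nn.MultiheadAttention(c, num_heads, batch_first=True)
        # Path 2: depthwise + pointwise convolution (local features).
        self.conv = nn.Sequential(
            nn.Conv2d(c, c, kernel_size=3, padding=1, groups=c),
            nn.BatchNorm2d(c),
            nn.GELU(),
            nn.Conv2d(c, c, kernel_size=1),
        )
        # Path 3: identity (residual) -- no parameters.

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x1, x2, x3 = torch.chunk(x, 3, dim=1)                   # split channels into three paths
        b, c, h, w = x1.shape
        tokens = self.attn_norm(x1.flatten(2).transpose(1, 2))  # (B, HW, C) tokens
        attn_out, _ = self.attn(tokens, tokens, tokens)
        g = attn_out.transpose(1, 2).view(b, c, h, w)            # global-feature path
        loc = self.conv(x2)                                      # local-feature path
        out = torch.cat([g, loc, x3], dim=1)                     # x3 is the residual path
        return channel_shuffle(out, groups=3)                    # mix the three paths


if __name__ == "__main__":
    block = TriplePathBlock(channels=48)
    y = block(torch.randn(2, 48, 14, 14))
    print(y.shape)  # torch.Size([2, 48, 14, 14])

The channel shuffle at the end is what lets the three channel groups trade information from block to block; without it, each group would only ever see its own path. The enhancement-training side can be sketched in the same spirit: the snippet below shows one plausible reading of a "normalized" distillation loss (per-sample logit standardization before temperature softening) together with standard Mixup. IntPNKD's intermediate-layer prediction alignment and its exact normalization are not specified in the abstract and are not reproduced here.

import torch
import torch.nn.functional as F


def normalized_kd_loss(student_logits, teacher_logits, T: float = 4.0):
    # Standardize logits per sample before softening -- an assumed reading of
    # "normalized" knowledge distillation; the paper's definition may differ.
    s = (student_logits - student_logits.mean(dim=1, keepdim=True)) \
        / (student_logits.std(dim=1, keepdim=True) + 1e-6)
    t = (teacher_logits - teacher_logits.mean(dim=1, keepdim=True)) \
        / (teacher_logits.std(dim=1, keepdim=True) + 1e-6)
    return F.kl_div(F.log_softmax(s / T, dim=1),
                    F.softmax(t / T, dim=1),
                    reduction="batchmean") * T * T


def mixup(x, y, alpha: float = 0.2):
    # Standard Mixup: convex combination of inputs; return both label sets.
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    idx = torch.randperm(x.size(0))
    return lam * x + (1 - lam) * x[idx], y, y[idx], lam

# Example combination (weights are arbitrary placeholders):
#   loss = lam * F.cross_entropy(s_logits, y_a) + (1 - lam) * F.cross_entropy(s_logits, y_b) \
#          + 1.0 * normalized_kd_loss(s_logits, t_logits)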

Keywords: Multi-path hybrid; lightweight ViT; normalized knowledge distillation; Mixup regularization

Yating Li, Wenwu He, Shuli Xing and Hengliang Zhu, “MH-LViT: Multi-path Hybrid Lightweight ViT Models with Enhancement Training,” International Journal of Advanced Computer Science and Applications (IJACSA), 15(10), 2024. http://dx.doi.org/10.14569/IJACSA.2024.01510107

@article{Li2024,
  title     = {MH-LViT: Multi-path Hybrid Lightweight ViT Models with Enhancement Training},
  journal   = {International Journal of Advanced Computer Science and Applications},
  doi       = {10.14569/IJACSA.2024.01510107},
  url       = {http://dx.doi.org/10.14569/IJACSA.2024.01510107},
  year      = {2024},
  publisher = {The Science and Information Organization},
  volume    = {15},
  number    = {10},
  author    = {Yating Li and Wenwu He and Shuli Xing and Hengliang Zhu}
}



Copyright Statement: This is an open access article licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, even commercially, as long as the original work is properly cited.
