International Journal of Advanced Computer Science and Applications (IJACSA), Volume 16 Issue 7, 2025.
Abstract: This study proposes an optimized Transformer model for multimodal data fusion, designed to address the challenges of fusing data from different modalities such as text, image, and audio. By improving the data preprocessing methods, the model architecture, and the fusion strategy, the study significantly improves model performance on multimodal tasks. Experimental results show that the optimized model outperforms the baseline model and other comparison models on key metrics, including accuracy, recall, F1 score, and AUC, and that it is more stable. In particular, the study addresses data heterogeneity and computing resource consumption by introducing a weighted fusion strategy, a multi-head self-attention mechanism, and a lightweight design. The handling of missing modal data is also optimized to improve the robustness of the model. Despite these results, data heterogeneity, computational efficiency, and missing modal data remain open challenges. Future research can further optimize modality alignment methods and data preprocessing techniques to improve model performance in practical applications. This research offers a new direction for the application and development of multimodal data fusion technology.
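The abstract names a weighted fusion strategy and optimized handling of missing modal data but gives no formulas. As a rough illustration only, the sketch below shows one common way such a scheme can work: per-modality feature vectors are combined with softmax-normalized weights, and the weights are renormalized over whichever modalities are present. The function name `weighted_fusion`, the per-modality scores, and the use of `None` to mark a missing modality are all assumptions made for this example; they are not taken from the paper.

```python
import math
from typing import Dict, List, Optional


def weighted_fusion(
    features: Dict[str, Optional[List[float]]],
    scores: Dict[str, float],
) -> List[float]:
    """Fuse per-modality feature vectors with softmax-normalized weights.

    Hypothetical sketch: a modality whose value is None is treated as
    missing and skipped, and the weights are renormalized over the
    modalities that remain.
    """
    present = {name: vec for name, vec in features.items() if vec is not None}
    if not present:
        raise ValueError("at least one modality must be present")

    dims = {len(vec) for vec in present.values()}
    if len(dims) != 1:
        raise ValueError("all present modalities must share one dimension")
    dim = dims.pop()

    # Softmax over the scores of the present modalities only
    # (subtracting the maximum keeps the exponentials numerically stable).
    max_score = max(scores[name] for name in present)
    exp_scores = {name: math.exp(scores[name] - max_score) for name in present}
    total = sum(exp_scores.values())
    weights = {name: value / total for name, value in exp_scores.items()}

    # Weighted sum of the present feature vectors.
    fused = [0.0] * dim
    for name, vec in present.items():
        for i, x in enumerate(vec):
            fused[i] += weights[name] * x
    return fused


if __name__ == "__main__":
    example = weighted_fusion(
        {"text": [1.0, 2.0], "image": [3.0, 4.0], "audio": None},
        {"text": 0.0, "image": 0.0, "audio": 5.0},
    )
    print(example)
```

In this example the audio modality is missing, so its score is ignored and the text and image vectors are averaged with equal weight. In a trained model the scores would typically be learned parameters or attention outputs rather than fixed constants.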
Shanshan Yang and Jie Peng. “Transformer Model Optimization Method for Multi-Modal Data Fusion”. International Journal of Advanced Computer Science and Applications (IJACSA) 16.7 (2025). http://dx.doi.org/10.14569/IJACSA.2025.0160708
@article{Yang2025,
title = {Transformer Model Optimization Method for Multi-Modal Data Fusion},
journal = {International Journal of Advanced Computer Science and Applications},
doi = {10.14569/IJACSA.2025.0160708},
url = {http://dx.doi.org/10.14569/IJACSA.2025.0160708},
year = {2025},
publisher = {The Science and Information Organization},
volume = {16},
number = {7},
author = {Shanshan Yang and Jie Peng}
}
Copyright Statement: This is an open access article licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, even commercially, as long as the original work is properly cited.