International Journal of Advanced Computer Science and Applications (IJACSA), Volume 16 Issue 5, 2025.
Abstract: Speech Emotion Recognition (SER), a pivotal area in artificial intelligence, is dedicated to analyzing and interpreting emotional information in human speech. To address the challenges of capturing both local acoustic features and long-range dependencies in emotional speech, this study proposes a novel parallel neural network architecture that integrates Convolutional Neural Networks (CNNs) and Transformer encoders. To integrate the distinct feature representations captured by the two branches, a cross-attention mechanism is employed for feature-level fusion, enabling deep-level semantic interaction and enhancing the model’s emotion discrimination capacity. To improve model generalization and robustness, a systematic preprocessing pipeline is constructed, including signal normalization, data segmentation, additive white Gaussian noise (AWGN) augmentation with varying SNR levels, and Mel spectrogram feature extraction. A grid search strategy is adopted to optimize key hyperparameters such as learning rate, dropout rate, and batch size. Extensive experiments conducted on the RAVDESS dataset, consisting of eight emotional categories, demonstrate that our model achieves an overall accuracy of 80.00%, surpassing existing methods such as CNN-based (71.61%), multilingual CNN (77.60%), bimodal LSTM-attention (65.42%), and unsupervised feature learning (69.06%) models. Further analyses reveal its robustness across different gender groups and emotional intensities. Such outcomes highlight the architectural soundness of our model and underscore its potential to inform subsequent developments in affective speech processing.
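The AWGN augmentation step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `add_awgn` and `measured_snr_db`, the 440 Hz test tone, the 16 kHz sample rate, and the SNR values are all hypothetical choices made for illustration. The sketch only assumes the standard definition of SNR in decibels, from which the noise power for a target SNR is derived.

```python
import math
import random


def add_awgn(signal, snr_db, rng=None):
    """Return a copy of `signal` with white Gaussian noise added at `snr_db` dB.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if rng is None:
        rng = random.Random()
    if not signal:
        return []
    # Average power of the clean signal.
    signal_power = sum(x * x for x in signal) / len(signal)
    # SNR(dB) = 10 * log10(P_signal / P_noise), so
    # P_noise = P_signal / 10^(SNR/10).
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise_std = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, noise_std) for x in signal]


def measured_snr_db(clean, noisy):
    """Estimate the SNR in dB actually realised between `clean` and `noisy`."""
    signal_power = sum(c * c for c in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)


if __name__ == "__main__":
    rng = random.Random(0)
    # One second of a 440 Hz tone sampled at 16 kHz (assumed test signal).
    clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
    # Assumed SNR levels; the abstract does not state the values used.
    for target in (15, 30):
        noisy = add_awgn(clean, target, rng)
        print(target, round(measured_snr_db(clean, noisy), 2))
```

Drawing a different target SNR for each augmented copy is one plausible reading of "varying SNR levels"; the paper itself should be consulted for the exact scheme.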
Zhongliang Wei, Chang Ge, Chang Su, Ruofan Chen and Jing Sun, “A Deep Learning Model for Speech Emotion Recognition on RAVDESS Dataset,” International Journal of Advanced Computer Science and Applications (IJACSA), 16(5), 2025. http://dx.doi.org/10.14569/IJACSA.2025.0160531
@article{Wei2025,
title = {A Deep Learning Model for Speech Emotion Recognition on RAVDESS Dataset},
journal = {International Journal of Advanced Computer Science and Applications},
doi = {10.14569/IJACSA.2025.0160531},
url = {http://dx.doi.org/10.14569/IJACSA.2025.0160531},
year = {2025},
publisher = {The Science and Information Organization},
volume = {16},
number = {5},
author = {Zhongliang Wei and Chang Ge and Chang Su and Ruofan Chen and Jing Sun}
}
Copyright Statement: This is an open access article licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, even commercially, as long as the original work is properly cited.