International Journal of Advanced Computer Science and Applications (IJACSA), Volume 15 Issue 11, 2024.
Abstract: The rapid growth of digital communication has led to a surge in spam messages, particularly through Short Message Service (SMS). These unsolicited messages pose risks such as phishing and malware, necessitating robust detection mechanisms. This study focuses on a comparative analysis of machine learning models for SMS spam detection, with a particular emphasis on a proposed SVM-DistilBERT model enhanced by a voting classifier. Using the UCI SMS Spam dataset, the models are evaluated based on recall, accuracy, precision, and Receiver Operating Characteristic Area Under the Curve (ROC AUC) scores to assess their effectiveness in correctly identifying spam messages. By leveraging Optuna for hyperparameter optimization, the proposed model achieves superior performance, with an accuracy of 99.6%, surpassing traditional methods like SVM with TF-IDF Bi-gram and AdaBoost, which achieved 98.03%. The study also examines the effects of lemmatization and synonym data augmentation, with lemmatization shown to improve spam detection by reducing feature space redundancy and enhancing semantic understanding. To ensure transparency in decision-making, Local Interpretable Model-Agnostic Explanations (LIME) is applied. The results demonstrate that the optimized SVM-DistilBERT with the voting classifier offers a robust and effective solution for SMS spam filtering.
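The four evaluation measures named in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration, with label 1 standing for spam and 0 for ham. ROC AUC is computed here by the pairwise-ranking definition (the fraction of spam/ham pairs in which the spam message receives the higher score, with ties counted as one half).

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels where 1 = spam, 0 = ham."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + tn + fn)


def precision(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise-ranking ROC AUC: P(score of a spam item > score of a ham item)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical labels and classifier scores, not data from the study.
    y_true = [1, 0, 0, 1, 0, 1, 0, 0]
    scores = [0.9, 0.2, 0.6, 0.4, 0.1, 0.8, 0.3, 0.05]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold
    print("accuracy ", accuracy(y_true, y_pred))
    print("precision", precision(y_true, y_pred))
    print("recall   ", recall(y_true, y_pred))
    print("roc_auc  ", roc_auc(y_true, scores))
```

Recall is the measure most directly tied to "correctly identifying spam messages", since it counts how many true spam messages are caught. ROC AUC does not depend on the threshold, which makes it a useful complement to the three threshold-based measures.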
Sinar Nadhif Ilyasa and Alaa Omar Khadidos, “Optimized SMS Spam Detection Using SVM-DistilBERT and Voting Classifier: A Comparative Study on the Impact of Lemmatization,” International Journal of Advanced Computer Science and Applications (IJACSA), 15(11), 2024. http://dx.doi.org/10.14569/IJACSA.2024.01511129
@article{Ilyasa2024,
title = {Optimized SMS Spam Detection Using SVM-DistilBERT and Voting Classifier: A Comparative Study on the Impact of Lemmatization},
journal = {International Journal of Advanced Computer Science and Applications},
doi = {10.14569/IJACSA.2024.01511129},
url = {http://dx.doi.org/10.14569/IJACSA.2024.01511129},
year = {2024},
publisher = {The Science and Information Organization},
volume = {15},
number = {11},
author = {Sinar Nadhif Ilyasa and Alaa Omar Khadidos}
}
Copyright Statement: This is an open access article licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, even commercially, as long as the original work is properly cited.