The Science and Information (SAI) Organization

DOI: 10.14569/IJACSA.2024.01511129

Optimized SMS Spam Detection Using SVM-DistilBERT and Voting Classifier: A Comparative Study on the Impact of Lemmatization

Author 1: Sinar Nadhif Ilyasa
Author 2: Alaa Omar Khadidos

International Journal of Advanced Computer Science and Applications (IJACSA), Volume 15, Issue 11, 2024.


Abstract: The rapid growth of digital communication has led to a surge in spam messages, particularly through Short Message Service (SMS). These unsolicited messages pose risks such as phishing and malware, necessitating robust detection mechanisms. This study focuses on a comparative analysis of machine learning models for SMS spam detection, with a particular emphasis on a proposed SVM-DistilBERT model enhanced by a voting classifier. Using the UCI SMS Spam dataset, the models are evaluated based on recall, accuracy, precision, and Receiver Operating Characteristic Area Under the Curve (ROC AUC) scores to assess their effectiveness in correctly identifying spam messages. By leveraging Optuna for hyperparameter optimization, the proposed model achieves superior performance, with an accuracy of 99.6%, surpassing traditional methods like SVM with TF-IDF Bi-gram and AdaBoost, which achieved 98.03%. The study also examines the effects of lemmatization and synonym data augmentation, with lemmatization shown to improve spam detection by reducing feature space redundancy and enhancing semantic understanding. To ensure transparency in decision-making, Local Interpretable Model-Agnostic Explanations (LIME) is applied. The results demonstrate that the optimized SVM-DistilBERT with the voting classifier offers a robust and effective solution for SMS spam filtering.

Keywords: SMS spam detection; Support Vector Machine (SVM); DistilBERT; hyperparameter optimization; LIME
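
The abstract evaluates models by accuracy, precision, recall, and ROC AUC. As a minimal sketch of how those four metrics are computed for a binary spam classifier, the block below uses only the Python standard library. The function names, the 0.5 decision threshold, and the toy labels and scores are illustrative assumptions and are not taken from the paper; the ROC AUC is computed with the standard pairwise-ranking definition (the probability that a randomly chosen spam message scores higher than a randomly chosen ham message, with ties counted as one half).

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, and recall for binary labels (1 = spam, 0 = ham)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        # Precision: of the messages flagged as spam, how many were spam.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Recall: of the actual spam messages, how many were flagged.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via pairwise comparison of spam scores against ham scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one spam and one ham example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data (hypothetical): true labels and model spam scores.
    labels = [1, 0, 1, 1, 0, 0, 1, 0]
    spam_scores = [0.9, 0.2, 0.8, 0.4, 0.6, 0.1, 0.7, 0.3]
    # Assumed decision threshold of 0.5 for turning scores into labels.
    predictions = [1 if s >= 0.5 else 0 for s in spam_scores]
    print(classification_metrics(labels, predictions))
    print("roc_auc:", roc_auc(labels, spam_scores))
```

Unlike accuracy, precision, and recall, ROC AUC is computed from the raw scores rather than thresholded labels, so it does not depend on the choice of decision threshold.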

Sinar Nadhif Ilyasa and Alaa Omar Khadidos, “Optimized SMS Spam Detection Using SVM-DistilBERT and Voting Classifier: A Comparative Study on the Impact of Lemmatization,” International Journal of Advanced Computer Science and Applications (IJACSA), 15(11), 2024. http://dx.doi.org/10.14569/IJACSA.2024.01511129

@article{Ilyasa2024,
title = {Optimized SMS Spam Detection Using SVM-DistilBERT and Voting Classifier: A Comparative Study on the Impact of Lemmatization},
journal = {International Journal of Advanced Computer Science and Applications},
doi = {10.14569/IJACSA.2024.01511129},
url = {http://dx.doi.org/10.14569/IJACSA.2024.01511129},
year = {2024},
publisher = {The Science and Information Organization},
volume = {15},
number = {11},
author = {Sinar Nadhif Ilyasa and Alaa Omar Khadidos}
}



Copyright Statement: This is an open access article licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, even commercially, as long as the original work is properly cited.

© The Science and Information (SAI) Organization Limited. All rights reserved. Registered in England and Wales. Company Number 8933205. thesai.org