Detecting Spam in Twitter Microblogging Services: A Novel Machine Learning Approach based on Domain Popularity

Khalid Binsaeed; Gianluca Stringhini; Ahmed E. Youssef

doi:10.14569/IJACSA.2020.0111103

International Journal of Advanced Computer Science and Applications (IJACSA), Volume 11, Issue 11, 2020.

Abstract: Detecting malicious activity on the Internet remains a critical problem that must be addressed effectively in order to protect personal information, computing resources, and financial assets from threats such as credential theft, malware download and installation, and extortion. Social media platforms such as Twitter have given malicious users a new and promising platform for their activities, which range from sending a simple spam message to taking full control of a victim’s machine. Twitter has revealed that its spam detection algorithms are not very effective: most trending hashtags include unrelated spam and advertising tweets, which indicates a problem with the spam detection framework currently in use. This paper proposes a new approach for detecting spam in Twitter microblogging using Machine Learning (ML) techniques and domain popularity services. The proposed approach comprises two main stages. 1) Tweets are collected periodically and filtered by selecting those that appear more frequently than a chosen threshold within the specified period (i.e., common tweets). The common tweets are then inspected by checking each associated URL domain against Alexa’s list of the top one million globally viewed websites. If a tweet is common on Twitter but its URL domain does not appear among the top one million globally viewed websites, it is flagged as potential spam. 2) ML algorithms are run on the flagged tweets to extract features that help detect the cluster of spam and prevent it in real time. The performance of the proposed approach was evaluated using three of the most popular classification models (random forest, J48, and Naïve Bayes). For all classifiers, the results showed the effectiveness of the proposed method in terms of several performance metrics (e.g., precision, sensitivity, F1-score, accuracy) across different test scenarios.
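The first stage described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the definition of a "common" tweet as identical text repeated more than `threshold` times in a batch, and the small in-memory set standing in for the Alexa top one million list are all assumptions made for illustration. It also assumes URLs in the tweet text are already expanded rather than shortened.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Simple pattern for http(s) links in tweet text (an illustrative assumption).
URL_RE = re.compile(r"https?://\S+")


def domain_of(url):
    """Return the lowercased host of a URL, without a leading 'www.'."""
    try:
        host = (urlparse(url).hostname or "").lower()
    except ValueError:  # malformed URL
        return ""
    return host[4:] if host.startswith("www.") else host


def flag_potential_spam(tweets, popular_domains, threshold):
    """Flag common tweets that link to a domain outside the popularity list.

    tweets: tweet texts collected in one period (hypothetical input format).
    popular_domains: set of domains standing in for the top-sites list.
    threshold: a tweet is "common" if its text occurs more than this often.
    """
    counts = Counter(text.strip() for text in tweets)
    flagged = []
    for text, occurrences in counts.items():
        if occurrences <= threshold:
            continue  # not common enough in this period
        domains = {domain_of(url) for url in URL_RE.findall(text)}
        domains.discard("")
        # Flag only when the tweet has a URL and some domain is not popular.
        if domains and not domains.issubset(popular_domains):
            flagged.append(text)
    return flagged


if __name__ == "__main__":
    batch = (
        ["Win now http://win-prizes.example/claim"] * 3
        + ["News https://www.bbc.com/news"] * 3
        + ["Rare http://unknown.example/x"]
    )
    print(flag_potential_spam(batch, {"bbc.com"}, threshold=2))
```

In this sketch, tweets without URLs are never flagged, and the flagged texts would then be passed to the second, ML-based stage; both choices are assumptions rather than details stated in the abstract.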

Keywords: Spam detection; phishing detection; domain popularity; machine learning; Twitter

Khalid Binsaeed, Gianluca Stringhini and Ahmed E. Youssef, “Detecting Spam in Twitter Microblogging Services: A Novel Machine Learning Approach based on Domain Popularity,” International Journal of Advanced Computer Science and Applications (IJACSA), 11(11), 2020. http://dx.doi.org/10.14569/IJACSA.2020.0111103

@article{Binsaeed2020,
title = {Detecting Spam in Twitter Microblogging Services: A Novel Machine Learning Approach based on Domain Popularity},
journal = {International Journal of Advanced Computer Science and Applications},
doi = {10.14569/IJACSA.2020.0111103},
url = {http://dx.doi.org/10.14569/IJACSA.2020.0111103},
year = {2020},
publisher = {The Science and Information Organization},
volume = {11},
number = {11},
author = {Khalid Binsaeed and Gianluca Stringhini and Ahmed E. Youssef}
}

Copyright Statement: This is an open access article licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, even commercially, as long as the original work is properly cited.
