International Journal of Advanced Computer Science and Applications (IJACSA), Volume 12 Issue 10, 2021.
Abstract: Advances in technology have made social media the most popular means of propagating news. Many news items are published on social media platforms such as Facebook, Twitter, and Instagram, but they are not categorized into domains such as politics, education, finance, art, sports, and health. Text classification is therefore needed to sort news into domains, which reduces the huge amount of news a reader faces on social media, cuts the time and effort needed to recognize an item's category or domain, and presents the data in a way that improves search. Most existing datasets have not been pre-processed or filtered and are not organized according to classification standards, so they are not ready for use. This paper therefore uses Arabic Natural Language Processing (ANLP) phases to pre-process, normalize, and categorize news into the right domain. It proposes an Arabic Social Media Dataset (SMAD) for text classification on social media, built using ANLP steps. The SMAD dataset consists of 15,240 categorized Arabic news items from the Facebook social network. The experimental results show that the SMAD corpus gives an accuracy of about 98% in five domains (Art, Education, Health, Politics, and Sport). The SMAD dataset has been trained and tested and is ready for use.
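The abstract does not list the individual ANLP steps, so the following is only a minimal sketch of what a pre-processing and normalization pass for Arabic text might look like. The specific rules (stripping diacritics, removing tatweel, unifying alef variants, dropping non-Arabic characters) are common conventions assumed here for illustration, not the authors' documented pipeline, and the function names `normalize_arabic` and `tokenize` are hypothetical.

```python
import re

# Assumed normalization rules, common in Arabic NLP; not taken from the paper.
DIACRITICS = re.compile("[\u064B-\u0652]")           # fathatan through sukun
TATWEEL = "\u0640"                                   # elongation character
ALEF_VARIANTS = re.compile("[\u0622\u0623\u0625]")   # alef with madda / hamza above / hamza below
NON_ARABIC = re.compile("[^\u0621-\u064A\\s]")       # anything outside basic Arabic letters and whitespace


def normalize_arabic(text: str) -> str:
    """Return a normalized copy of `text` (hypothetical helper)."""
    text = DIACRITICS.sub("", text)             # drop short-vowel marks
    text = text.replace(TATWEEL, "")            # drop elongation
    text = ALEF_VARIANTS.sub("\u0627", text)    # map alef variants to bare alef
    text = NON_ARABIC.sub(" ", text)            # replace Latin letters, digits, punctuation
    return re.sub(r"\s+", " ", text).strip()    # collapse whitespace


def tokenize(text: str) -> list[str]:
    """Split normalized text on whitespace (hypothetical helper)."""
    return normalize_arabic(text).split()
```

The resulting tokens could then be passed to whichever classifier is used to assign a news item to a domain; the choice of classifier is not specified in the abstract.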
Amira M. Gaber, Mohamed Nour El-din and Hanan Moussa, “SMAD: Text Classification of Arabic Social Media Dataset for News Sources,” International Journal of Advanced Computer Science and Applications (IJACSA), 12(10), 2021. http://dx.doi.org/10.14569/IJACSA.2021.0121058
@article{Gaber2021,
title = {SMAD: Text Classification of Arabic Social Media Dataset for News Sources},
journal = {International Journal of Advanced Computer Science and Applications},
doi = {10.14569/IJACSA.2021.0121058},
url = {http://dx.doi.org/10.14569/IJACSA.2021.0121058},
year = {2021},
publisher = {The Science and Information Organization},
volume = {12},
number = {10},
author = {Amira M. Gaber and Mohamed Nour El-din and Hanan Moussa}
}
Copyright Statement: This is an open access article licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, even commercially, as long as the original work is properly cited.