Future of Information and Communication Conference (FICC) 2024
4-5 April 2024
Publication Links
IJACSA
Special Issues
Future of Information and Communication Conference (FICC)
Computing Conference
Intelligent Systems Conference (IntelliSys)
Future Technologies Conference (FTC)
International Journal of Advanced Computer Science and Applications(IJACSA), Volume 9 Issue 9, 2018.
Abstract: In the recent years, Arabic Natural Language Processing, including Text summarization, Text simplification, Text Categorization and other Natural Language-related disciplines, are attracting more researchers. Appropriate resources for Arabic Text Categorization are becoming a big necessity for the development of this research. The few existing corpora are not ready for use, they require preprocessing and filtering operations. In addition, most of them are not organized based on standard classification methods which makes unbalanced classes and thus reduced the classification accuracy. This paper proposes a New Arabic Dataset (NADA) for Text Categorization purpose. This corpus is composed of two existing corpora OSAC and DAA. The new corpus is preprocessed and filtered using the recent state of the art methods. It is also organized based on Dewey decimal classification scheme and Synthetic Minority Over-Sampling Technique. The experiment results show that NADA is an efficient dataset ready for use in Arabic Text Categorization.
Nada Alalyani and Souad Larabi Marie-Sainte, “NADA: New Arabic Dataset for Text Classification” International Journal of Advanced Computer Science and Applications(IJACSA), 9(9), 2018. http://dx.doi.org/10.14569/IJACSA.2018.090928
@article{Alalyani2018,
title = {NADA: New Arabic Dataset for Text Classification},
journal = {International Journal of Advanced Computer Science and Applications},
doi = {10.14569/IJACSA.2018.090928},
url = {http://dx.doi.org/10.14569/IJACSA.2018.090928},
year = {2018},
publisher = {The Science and Information Organization},
volume = {9},
number = {9},
author = {Nada Alalyani and Souad Larabi Marie-Sainte}
}
Copyright Statement: This is an open access article licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, even commercially as long as the original work is properly cited.