International Journal of Advanced Computer Science and Applications (IJACSA), Volume 14 Issue 10, 2023.
Abstract: Despite the abundance and diversity of research on Natural Language Processing (NLP), the set of techniques that allow computers to understand, generate, and manipulate human language, NLP remains insufficient, especially for processing Arabic texts and their widely used dialects. The proposed approach applies machine learning techniques, taking into account evaluation criteria such as training, to comments written in the Mauritanian dialect and published on social media, notably Facebook. It compares the results of three algorithms: Random Forest (RF), Naïve Bayes Multinomial (NBM), and Logistic Regression (LR). We then study the effect of these techniques when different stemmers are combined with other features, such as the tokenizers used to process the dataset. Major challenges exist: Arabic morphology differs completely from that of Latin-script languages, and no pre-existing dataset or dictionary was available to train the algorithms. Nevertheless, the experiments carried out in Weka show that the RF and NBM algorithms perform best with ArabicStemmerKhoja, achieving 96.37% and 71.40%, respectively, whereas LR performs best with NullStemmer, achieving 81.65%. With a light Arabic stemmer, all three techniques achieved results above 70%. This article contributes to machine-learning-based NLP and describes a study that can help determine the best classifier for Arabic.
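The abstract reports that the choice of stemmer changes classifier results. The sketch below illustrates one way this can happen, under stated assumptions: a minimal pure-Python multinomial Naive Bayes (standing in for NBM; it is not the authors' Weka setup) is trained once with an identity stemmer, analogous to Weka's NullStemmer, and once with a simplified light stemmer. The affix lists, the whitespace tokenizer, the toy comments (Modern Standard Arabic words, not Mauritanian-dialect data), and the `pos`/`neg` labels are all hypothetical. The light stemmer is not the Khoja root-based stemmer, and RF and LR are omitted.

```python
import math
from collections import Counter, defaultdict

# Hypothetical affix lists for a simplified light stemmer (illustration only).
PREFIXES = ["وال", "بال", "كال", "فال", "لل", "ال"]  # article forms, longest first
SUFFIXES = ["ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي"]


def null_stem(word: str) -> str:
    """Identity stemmer, analogous to Weka's NullStemmer."""
    return word


def light_stem(word: str) -> str:
    """Strip common Arabic affixes without reducing the word to its root."""
    # Drop a leading conjunction "و" (and) if at least 3 letters remain.
    if word.startswith("و") and len(word) - 1 >= 3:
        word = word[1:]
    # Drop at most one definite-article prefix if at least 2 letters remain.
    for p in PREFIXES:
        if word.startswith(p) and len(word) - len(p) >= 2:
            word = word[len(p):]
            break
    # One pass over the suffix list; strip each match if at least 2 letters remain.
    for s in SUFFIXES:
        if word.endswith(s) and len(word) - len(s) >= 2:
            word = word[: -len(s)]
    return word


def train_nb(docs, labels, stem):
    """Collect class priors and per-class token counts (multinomial Naive Bayes)."""
    priors = Counter(labels)
    counts = defaultdict(Counter)
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = [stem(t) for t in doc.split()]  # whitespace tokenizer
        counts[label].update(tokens)
        vocab.update(tokens)
    return priors, counts, vocab


def predict_nb(model, doc, stem):
    """Return the class with the highest Laplace-smoothed log-posterior."""
    priors, counts, vocab = model
    n_docs = sum(priors.values())
    best_label, best_score = None, -math.inf
    for label in sorted(priors):
        score = math.log(priors[label] / n_docs)
        denom = sum(counts[label].values()) + len(vocab)
        for token in (stem(t) for t in doc.split()):
            if token in vocab:  # tokens never seen in training are ignored
                score += math.log((counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy comments and labels (not data from the paper).
docs = ["الفيلم جميل", "الفيلم قبيح", "قبيح"]
labels = ["pos", "neg", "neg"]
test = "جميلة"  # feminine form of a word seen only as "جميل" in training

for name, stem in [("null", null_stem), ("light", light_stem)]:
    model = train_nb(docs, labels, stem)
    print(name, predict_nb(model, test, stem))
# Expected output: "null neg", then "light pos"
```

With the identity stemmer, the inflected test word never appears in the training vocabulary, so it is ignored and the prediction falls back to the majority-class prior. The light stemmer maps it to a form seen in a positive comment. Conflating inflected forms in this way shrinks the vocabulary and reduces sparsity, which is one plausible reason stemming affects results on small dialect datasets.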
Mohamed El Moustapha El Arby CHRIF, Cheikhane Seyed, Cheikhne Mohamed Mahmoud, EL BENANY Mohamed Mahmoud, Fatimetou Mint Mohamed-Saleck, Moustapha Mohamed Saleck, Omar EL BEQQALI and Mohamedade Farouk NANNE, “Investigate the Impact of Stemming on Mauritanian Dialect Classification using Machine Learning Techniques,” International Journal of Advanced Computer Science and Applications (IJACSA), 14(10), 2023. http://dx.doi.org/10.14569/IJACSA.2023.01410106
@article{CHRIF2023,
title = {Investigate the Impact of Stemming on Mauritanian Dialect Classification using Machine Learning Techniques},
journal = {International Journal of Advanced Computer Science and Applications},
doi = {10.14569/IJACSA.2023.01410106},
url = {http://dx.doi.org/10.14569/IJACSA.2023.01410106},
year = {2023},
publisher = {The Science and Information Organization},
volume = {14},
number = {10},
author = {Mohamed El Moustapha El Arby CHRIF and Cheikhane Seyed and Cheikhne Mohamed Mahmoud and EL BENANY Mohamed Mahmoud and Fatimetou Mint Mohamed-Saleck and Moustapha Mohamed Saleck and Omar EL BEQQALI and Mohamedade Farouk NANNE}
}
Copyright Statement: This is an open access article licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, even commercially, as long as the original work is properly cited.