International Journal of Advanced Computer Science and Applications (IJACSA), Volume 7, Issue 8, 2016.
Abstract: Text classification assigns predefined categories to text documents using supervised machine learning algorithms. It has various practical applications, such as spam detection, sentiment detection, and language identification. Based on this idea, we applied five well-known classification techniques to an Urdu-language corpus and assigned a class to each document using majority voting. The corpus contains 21,769 news documents of seven categories (Business, Entertainment, Culture, Health, Sports, and Weird). The algorithms could not work directly on the raw data, so we applied preprocessing techniques: tokenization, stop-word removal, and a rule-based stemmer. After preprocessing, 93,400 features were extracted from the data for use by the machine learning algorithms. Using majority voting, we achieved up to 94% precision and recall.
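The majority-voting step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `majority_vote` and `vote_all`, the tie-breaking rule (first label in classifier order wins), and the example predictions are all assumptions made for illustration. The abstract does not name the five classifiers, so they are represented here only by their predicted labels.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label chosen by the most classifiers for one document.

    Assumption: ties are broken in favour of the label that appears
    first in `predictions` (i.e. by classifier order).
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    counts = Counter(predictions)
    top = max(counts.values())
    # Walk the predictions in order so tie-breaking is deterministic.
    for label in predictions:
        if counts[label] == top:
            return label


def vote_all(per_classifier_predictions):
    """Combine predictions from several classifiers, document by document.

    `per_classifier_predictions` holds one list per classifier; each list
    contains that classifier's label for every document, in the same order.
    """
    return [
        majority_vote(list(doc_predictions))
        for doc_predictions in zip(*per_classifier_predictions)
    ]


# Hypothetical output of five classifiers on three documents.
example_predictions = [
    ["Sports", "Health", "Business"],
    ["Sports", "Health", "Culture"],
    ["Sports", "Weird", "Culture"],
    ["Health", "Health", "Business"],
    ["Sports", "Business", "Business"],
]
final_labels = vote_all(example_predictions)
```

With an odd number of classifiers, a tie can only occur when three or more labels compete, so the tie-breaking rule matters less often than it would with an even-sized ensemble.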
Muhammad Usman, Zunaira Shafique, Saba Ayub and Kamran Malik, “Urdu Text Classification using Majority Voting,” International Journal of Advanced Computer Science and Applications (IJACSA), 7(8), 2016. http://dx.doi.org/10.14569/IJACSA.2016.070836
@article{Usman2016,
title = {Urdu Text Classification using Majority Voting},
journal = {International Journal of Advanced Computer Science and Applications},
doi = {10.14569/IJACSA.2016.070836},
url = {http://dx.doi.org/10.14569/IJACSA.2016.070836},
year = {2016},
publisher = {The Science and Information Organization},
volume = {7},
number = {8},
author = {Muhammad Usman and Zunaira Shafique and Saba Ayub and Kamran Malik}
}
Copyright Statement: This is an open access article licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, even commercially, as long as the original work is properly cited.