International Journal of Advanced Computer Science and Applications (IJACSA), Volume 9, Issue 6, 2018.
Abstract: Word segmentation is considered a basic NLP task and plays a significant role in diverse NLP areas. The main areas that benefit from word segmentation include information retrieval (IR), part-of-speech (POS) tagging, named entity recognition (NER), and sentiment analysis. Urdu word segmentation is a challenging task. There are a number of reasons for this, but the space insertion problem and the space omission problem are the major ones. Word segmentation tools and resources developed for English and similar Western languages perform far better than those available for Urdu. Some languages, such as English, mark words clearly, with spaces or with capitalization of the first character of a word. Many other languages, however, have no proper delimitation between words, e.g. Thai, Lao, and Urdu. The objective of this research work is to present a machine learning based approach to Urdu word segmentation. We use conditional random fields (CRF) to accomplish this task. Other challenges in Urdu text are compound words and reduplicated words. In this paper, we try to overcome these challenges in Urdu text with a machine learning methodology.
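One common way to hand word segmentation to a CRF is to treat it as character-level sequence labeling, where each character is tagged as beginning a word or continuing one. The sketch below illustrates that framing only. The abstract does not state the paper's tagging scheme or feature set, so the B/I tags, the function names, and the context-window features are all assumptions made for illustration, and Latin placeholder strings stand in for Urdu text.

```python
from typing import Dict, List, Tuple


def words_to_tags(words: List[str]) -> Tuple[List[str], List[str]]:
    """Flatten gold-segmented words into characters with B/I boundary tags.

    "B" marks the first character of a word, "I" any later character.
    (Assumed scheme; the paper may use a different tag set.)
    """
    chars: List[str] = []
    tags: List[str] = []
    for word in words:
        for i, ch in enumerate(word):
            chars.append(ch)
            tags.append("B" if i == 0 else "I")
    return chars, tags


def tags_to_words(chars: List[str], tags: List[str]) -> List[str]:
    """Rebuild words from characters and (predicted) B/I tags."""
    words: List[str] = []
    for ch, tag in zip(chars, tags):
        if tag == "B" or not words:
            words.append(ch)      # start a new word
        else:
            words[-1] += ch       # extend the current word
    return words


def char_features(chars: List[str], i: int) -> Dict[str, str]:
    """Hypothetical context-window features for the character at position i."""
    return {
        "char": chars[i],
        "prev": chars[i - 1] if i > 0 else "<BOS>",
        "next": chars[i + 1] if i < len(chars) - 1 else "<EOS>",
    }


# Placeholder "words"; real training data would be segmented Urdu text.
chars, tags = words_to_tags(["ab", "cde"])
features = [char_features(chars, i) for i in range(len(chars))]
restored = tags_to_words(chars, tags)
```

The per-character feature dictionaries paired with the tag sequence form one training example for a linear-chain CRF. At prediction time the model would output a tag sequence for unsegmented text, and `tags_to_words` would turn it back into words. Under this framing, space omission corresponds to predicting a "B" where the input has no space.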
Sadiq Nawaz Khan, Khairullah Khan, Wahab Khan, Asfandyar Khan, Fazali Subhan, Aman Ullah Khan and Burhan Ullah, “Urdu Word Segmentation using Machine Learning Approaches,” International Journal of Advanced Computer Science and Applications (IJACSA), 9(6), 2018. http://dx.doi.org/10.14569/IJACSA.2018.090628
@article{Khan2018,
title = {Urdu Word Segmentation using Machine Learning Approaches},
journal = {International Journal of Advanced Computer Science and Applications},
doi = {10.14569/IJACSA.2018.090628},
url = {http://dx.doi.org/10.14569/IJACSA.2018.090628},
year = {2018},
publisher = {The Science and Information Organization},
volume = {9},
number = {6},
author = {Sadiq Nawaz Khan and Khairullah Khan and Wahab Khan and Asfandyar Khan and Fazali Subhan and Aman Ullah Khan and Burhan Ullah}
}
Copyright Statement: This is an open access article licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, even commercially, as long as the original work is properly cited.