Parts of Speech Tagging for Afaan Oromo

Getachew Mamo Wegari; Million Meshesha

doi:10.14569/SpecialIssue.2011.010301

International Journal of Advanced Computer Science and Applications (IJACSA), Special Issue on Artificial Intelligence, 2011.

Abstract: The main aim of this study is to develop a part-of-speech tagger for the Afaan Oromo language. After reviewing the literature on Afaan Oromo grammar and identifying the tagset and word categories, the study adopted the Hidden Markov Model (HMM) approach and implemented unigram and bigram models with the Viterbi algorithm. The unigram model is used to understand word ambiguity in the language, while the bigram model is used to undertake contextual analysis of words. For training and testing, a manually annotated sample corpus of 159 sentences (1621 words in total) is used. The corpus is collected from different public Afaan Oromo newspapers and bulletins to make the sample balanced. A database of lexical probabilities and transition probabilities is built from the annotated corpus; from these two sets of probabilities the tagger learns to tag sequences of words in sentences. The performance of the prototype Afaan Oromo tagger is tested using tenfold cross-validation. The results show that the unigram and bigram models achieve 87.58% and 91.97% accuracy, respectively.
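
The bigram HMM approach summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy English corpus, the tag names (`DT`, `NN`, `VB`), the start symbol `<s>`, the add-one smoothing, and the function names `train`, `log_prob`, and `viterbi` are all assumptions made for illustration. The sketch counts lexical (tag to word) and transition (previous tag to tag) events from an annotated corpus, then uses the Viterbi algorithm to pick the most probable tag sequence.

```python
import math
from collections import Counter, defaultdict

def train(tagged_sentences):
    """Count lexical (tag -> word) and transition (prev tag -> tag) events."""
    emit = defaultdict(Counter)
    trans = defaultdict(Counter)
    for sentence in tagged_sentences:
        prev = "<s>"  # hypothetical sentence-start symbol
        for word, tag in sentence:
            emit[tag][word.lower()] += 1
            trans[prev][tag] += 1
            prev = tag
    return emit, trans

def log_prob(counter, key, n_outcomes, alpha=1.0):
    """Add-alpha smoothed log probability (smoothing is an assumption here)."""
    total = sum(counter.values())
    return math.log((counter[key] + alpha) / (total + alpha * n_outcomes))

def viterbi(words, emit, trans):
    """Return the most probable tag sequence under a bigram HMM."""
    if not words:
        return []
    tags = sorted(emit)
    n_tags = len(tags)
    # Vocabulary size plus one slot reserved for unknown words.
    vocab_size = len({w for c in emit.values() for w in c}) + 1
    # best[i][t] = (best log score of a path ending in tag t at i, backpointer)
    best = [{}]
    for t in tags:
        score = (log_prob(trans["<s>"], t, n_tags)
                 + log_prob(emit[t], words[0].lower(), vocab_size))
        best[0][t] = (score, None)
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            e = log_prob(emit[t], words[i].lower(), vocab_size)
            column[t] = max(
                (best[i - 1][p][0] + log_prob(trans[p], t, n_tags) + e, p)
                for p in tags
            )
        best.append(column)
    # Follow backpointers from the best final tag.
    last = max(tags, key=lambda t: best[-1][t][0])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        last = best[i][last][1]
        path.append(last)
    return path[::-1]

# Toy annotated corpus (illustrative only, not from the paper).
corpus = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VB")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VB")],
    [("a", "DT"), ("dog", "NN"), ("sleeps", "VB")],
]
emit, trans = train(corpus)
print(viterbi(["the", "cat", "barks"], emit, trans))
```

Working in log space avoids numeric underflow on longer sentences, and the smoothing keeps unseen words or tag pairs from zeroing out an entire path. A unigram variant would drop the transition term and choose the most frequent tag for each word independently.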

Keywords: Natural language processing; parts of speech tagging; Hidden Markov Model; N-Gram; Afaan Oromo.

Getachew Mamo Wegari and Million Meshesha, “Parts of Speech Tagging for Afaan Oromo,” International Journal of Advanced Computer Science and Applications (IJACSA), Special Issue on Artificial Intelligence, 2011. http://dx.doi.org/10.14569/SpecialIssue.2011.010301

@article{Wegari2011,
title = {Parts of Speech Tagging for Afaan Oromo},
journal = {International Journal of Advanced Computer Science and Applications (IJACSA), Special Issue on Artificial Intelligence},
doi = {10.14569/SpecialIssue.2011.010301},
url = {http://dx.doi.org/10.14569/SpecialIssue.2011.010301},
year = {2011},
publisher = {The Science and Information Organization},
volume = {1},
number = {3},
author = {Getachew Mamo Wegari and Million Meshesha},
}

Copyright Statement: This is an open access article licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, even commercially, as long as the original work is properly cited.
