International Journal of Advanced Computer Science and Applications (IJACSA), Volume 9, Issue 11, 2018.
Abstract: This paper describes the creation of the new Bangor Arabic Annotated Corpus (BAAC), a Modern Standard Arabic (MSA) corpus of 50K words manually annotated with part-of-speech tags. To evaluate the quality of the corpus, the Kappa coefficient and a direct percent agreement for each tag were calculated; a Kappa value of 0.956 was obtained, with an average observed agreement of 94.25%. The corpus was used to evaluate the widely used MADAMIRA Arabic part-of-speech tagger and to further investigate compression models for text compressed using part-of-speech tags. A new annotation tool was also developed and used to annotate BAAC.
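The agreement figures above rest on two standard measures: observed agreement $p_o$ and a chance-corrected Kappa, $\kappa = (p_o - p_e)/(1 - p_e)$. The following is a minimal sketch, not the BAAC authors' code. It assumes two annotators (so Cohen's kappa; the abstract does not say which Kappa variant was used) whose tag sequences are already aligned token by token. The function names, the sample tags, and the per-tag agreement definition (share of tokens given a tag by either annotator on which both chose it) are hypothetical illustrations.

```python
from collections import Counter


def observed_agreement(a: list[str], b: list[str]) -> float:
    """p_o: fraction of tokens on which the two tag sequences agree."""
    if len(a) != len(b) or not a:
        raise ValueError("annotations must be non-empty and aligned token by token")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a: list[str], b: list[str]) -> float:
    """Cohen's kappa for two annotators: (p_o - p_e) / (1 - p_e)."""
    p_o = observed_agreement(a, b)
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    # p_e: chance agreement, summing the product of each annotator's
    # marginal tag proportions over every tag either annotator used.
    p_e = sum(count_a[t] * count_b[t] for t in count_a.keys() | count_b.keys()) / (n * n)
    if p_e == 1.0:
        # Degenerate case: both annotators used one identical tag throughout.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def per_tag_agreement(a: list[str], b: list[str]) -> dict[str, float]:
    """Hypothetical per-tag measure: among tokens tagged `tag` by either
    annotator, the fraction on which both annotators chose `tag`."""
    observed_agreement(a, b)  # reuse the alignment check
    result = {}
    for tag in sorted(set(a) | set(b)):
        either = [(x, y) for x, y in zip(a, b) if x == tag or y == tag]
        result[tag] = sum(x == y for x, y in either) / len(either)
    return result


if __name__ == "__main__":
    # Hypothetical tags from two annotators over the same six tokens.
    ann1 = ["NOUN", "VERB", "PREP", "NOUN", "ADJ", "NOUN"]
    ann2 = ["NOUN", "VERB", "PREP", "ADJ", "ADJ", "NOUN"]
    print(round(observed_agreement(ann1, ann2), 3))  # 0.833
    print(round(cohen_kappa(ann1, ann2), 3))         # 0.769
    print({t: round(v, 3) for t, v in per_tag_agreement(ann1, ann2).items()})
    # {'ADJ': 0.5, 'NOUN': 0.667, 'PREP': 1.0, 'VERB': 1.0}
```

The same `observed_agreement` call, given a gold-standard tag sequence and a tagger's output, yields token-level tagging accuracy. This is one plausible way to score a tagger such as MADAMIRA against the corpus, assuming both sides share the same tokenization. For Arabic, clitic segmentation usually makes that alignment a separate step.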
Ibrahim S. Alkhazi and William J. Teahan, “BAAC: Bangor Arabic Annotated Corpus,” International Journal of Advanced Computer Science and Applications (IJACSA), 9(11), 2018. http://dx.doi.org/10.14569/IJACSA.2018.091120
@article{Alkhazi2018,
title = {BAAC: Bangor Arabic Annotated Corpus},
journal = {International Journal of Advanced Computer Science and Applications},
doi = {10.14569/IJACSA.2018.091120},
url = {http://dx.doi.org/10.14569/IJACSA.2018.091120},
year = {2018},
publisher = {The Science and Information Organization},
volume = {9},
number = {11},
author = {Ibrahim S Alkhazi and William J. Teahan}
}
Copyright Statement: This is an open access article licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, even commercially, as long as the original work is properly cited.