International Journal of Advanced Computer Science and Applications (IJACSA), Volume 14 Issue 6, 2023.
Abstract: Offensive language identification is a critical task in today's digital era, enabling the development of effective content moderation systems. However, it poses particular challenges in low-resource languages, where little annotated data is available. This paper addresses offensive language identification in one such language, Kazakh. We propose an approach based on Bidirectional Long-Short-Term Memory (BiLSTM) networks, which have demonstrated strong performance in natural language processing tasks. Because the BiLSTM architecture processes the input text in both directions, it captures both contextual and long-term dependencies, enabling more accurate offensive language identification. Our approach further uses transfer learning to mitigate the scarcity of annotated data in the low-resource setting. Through extensive experiments on a Kazakh offensive language dataset, we demonstrate the effectiveness of the proposed approach, which achieves state-of-the-art results for offensive language identification in Kazakh. We also analyze how different model configurations and training strategies affect performance. The findings provide insights into offensive language identification techniques for low-resource languages and pave the way for more robust content moderation systems tailored to specific linguistic contexts.
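To make the bidirectional idea in the abstract concrete, the following is a minimal sketch of a BiLSTM forward pass for binary text classification, written in plain Python. It is not the authors' implementation: the weights are random and untrained, the transfer learning step is not shown, and every name and size here (`build_model`, `bilstm_offensive_probability`, `embed_size`, `hidden_size`, the token ids) is a hypothetical chosen for illustration. It only shows how one LSTM reads the sequence left to right, another reads it right to left, and their final states are concatenated before a sigmoid output.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_lstm_params(input_size, hidden_size, rng):
    """Random weights for one LSTM direction, one weight matrix per gate."""
    def matrix(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    # Gates: i = input, f = forget, g = candidate cell, o = output.
    return {gate: {"W": matrix(hidden_size, input_size + hidden_size),
                   "b": [0.0] * hidden_size}
            for gate in "ifgo"}


def lstm_step(params, x, h, c):
    """One LSTM time step on input vector x with previous state (h, c)."""
    z = x + h  # concatenate the input and the previous hidden state

    def affine(gate):
        p = params[gate]
        return [sum(w * v for w, v in zip(row, z)) + b
                for row, b in zip(p["W"], p["b"])]

    i = [sigmoid(a) for a in affine("i")]
    f = [sigmoid(a) for a in affine("f")]
    g = [math.tanh(a) for a in affine("g")]
    o = [sigmoid(a) for a in affine("o")]
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


def run_lstm(params, sequence, hidden_size):
    """Run an LSTM over a sequence of vectors and return the final hidden state."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x in sequence:
        h, c = lstm_step(params, x, h, c)
    return h


def build_model(vocab_size, embed_size=8, hidden_size=6, seed=0):
    """Hypothetical untrained model: embeddings, two LSTMs, one output unit."""
    rng = random.Random(seed)
    return {
        "hidden_size": hidden_size,
        "embedding": [[rng.uniform(-0.1, 0.1) for _ in range(embed_size)]
                      for _ in range(vocab_size)],
        "fwd": make_lstm_params(embed_size, hidden_size, rng),
        "bwd": make_lstm_params(embed_size, hidden_size, rng),
        "out_w": [rng.uniform(-0.1, 0.1) for _ in range(2 * hidden_size)],
        "out_b": 0.0,
    }


def bilstm_offensive_probability(token_ids, model):
    """Score a token-id sequence; the output is a probability in (0, 1)."""
    embedded = [model["embedding"][t] for t in token_ids]
    h_fwd = run_lstm(model["fwd"], embedded, model["hidden_size"])        # left to right
    h_bwd = run_lstm(model["bwd"], embedded[::-1], model["hidden_size"])  # right to left
    features = h_fwd + h_bwd  # concatenate the two directions
    logit = sum(w * v for w, v in zip(model["out_w"], features)) + model["out_b"]
    return sigmoid(logit)


if __name__ == "__main__":
    model = build_model(vocab_size=20)
    print(bilstm_offensive_probability([3, 7, 1, 12], model))
```

With untrained weights the printed score is meaningless; in practice the embeddings and gate weights would be learned from labeled data, typically with a deep learning framework rather than hand-written loops.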
Aigerim Toktarova, Aktore Abushakhma, Elvira Adylbekova, Ainur Manapova, Bolganay Kaldarova, Yerzhan Atayev, Bakhyt Kassenova and Ainash Aidarkhanova, “Offensive Language Identification in Low Resource Languages using Bidirectional Long-Short-Term Memory Network,” International Journal of Advanced Computer Science and Applications (IJACSA), 14(6), 2023. http://dx.doi.org/10.14569/IJACSA.2023.0140687
@article{Toktarova2023,
title = {Offensive Language Identification in Low Resource Languages using Bidirectional Long-Short-Term Memory Network},
journal = {International Journal of Advanced Computer Science and Applications},
doi = {10.14569/IJACSA.2023.0140687},
url = {http://dx.doi.org/10.14569/IJACSA.2023.0140687},
year = {2023},
publisher = {The Science and Information Organization},
volume = {14},
number = {6},
author = {Aigerim Toktarova and Aktore Abushakhma and Elvira Adylbekova and Ainur Manapova and Bolganay Kaldarova and Yerzhan Atayev and Bakhyt Kassenova and Ainash Aidarkhanova}
}
Copyright Statement: This is an open access article licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, even commercially, as long as the original work is properly cited.