International Journal of Advanced Computer Science and Applications (IJACSA), Volume 17 Issue 3, 2026.
Abstract: The widespread adoption of social media platforms has led to a rise in offensive content, particularly in native languages, where users express themselves more freely. Automated offensive language detection in low-resource languages such as Malayalam is hampered by severe class imbalance: non-offensive samples substantially outnumber offensive ones, which biases models and reduces detection accuracy for the underrepresented classes. This study addresses class imbalance in Malayalam offensive language identification through a comprehensive data augmentation framework. We propose a novel gradient-guided augmentation technique that mitigates minority class imbalance by identifying and synthesizing challenging instances of the underrepresented classes, thereby improving model robustness. We systematically evaluate several augmentation strategies, including back-translation, paraphrasing, and NLPAUG techniques, integrated with mBERT and MuRIL models. The gradient-guided approach improves recall by 0.09 over the baseline model's 0.74 while preserving overall model performance on imbalanced Malayalam offensive language datasets. The proposed methodology offers a promising solution to class imbalance in offensive content detection for low-resource languages. The results indicate that integrating augmentation with explainability improves classification performance and helps overcome certain limitations of previous methods.
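The abstract does not spell out how challenging minority instances are identified, so the following is only an illustrative sketch of one common way to do gradient-guided selection, not the authors' method. It assumes a toy logistic-regression classifier in place of mBERT or MuRIL, scores each minority-class sample by the norm of its loss gradient, and returns the highest-scoring samples as candidates for augmentation. The names `per_sample_grad_norm` and `select_hard_minority`, the feature vectors, and the weights are all hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def per_sample_grad_norm(weights, bias, features, label):
    """L2 norm of the log-loss gradient w.r.t. (weights, bias) for one sample."""
    logit = sum(w * x for w, x in zip(weights, features)) + bias
    residual = sigmoid(logit) - label  # derivative of the loss w.r.t. the logit
    # Gradient w.r.t. weights is residual * features; w.r.t. bias it is residual.
    return abs(residual) * math.sqrt(sum(x * x for x in features) + 1.0)


def select_hard_minority(weights, bias, samples, minority_label=1, k=2):
    """Indices of the k minority-class samples with the largest gradient norm."""
    scored = [
        (per_sample_grad_norm(weights, bias, feats, label), idx)
        for idx, (feats, label) in enumerate(samples)
        if label == minority_label
    ]
    scored.sort(reverse=True)
    return [idx for _, idx in scored[:k]]


# Hypothetical toy data: (feature vector, label), label 1 = offensive (minority).
samples = [
    ([2.0, 1.0], 1),    # minority, confidently classified
    ([-1.0, -2.0], 1),  # minority, misclassified
    ([0.1, 0.0], 1),    # minority, near the decision boundary
    ([-2.0, -1.0], 0),  # majority
    ([-1.5, -0.5], 0),  # majority
]
weights, bias = [1.0, 1.0], 0.0  # assumed, already-trained toy parameters

hard = select_hard_minority(weights, bias, samples, k=2)
print(hard)  # [1, 2]
```

In this sketch the misclassified and near-boundary minority samples are selected, and only those would then be passed to an augmenter such as back-translation or paraphrasing, so that new minority data concentrates where the model is weakest.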
Munawwar K V and Nandhini K. “Gradient-Guided Data Augmentation with mBERT and MuRIL for Malayalam Offensive Language Detection”. International Journal of Advanced Computer Science and Applications (IJACSA) 17.3 (2026). http://dx.doi.org/10.14569/IJACSA.2026.0170347
@article{V2026,
title = {Gradient-Guided Data Augmentation with mBERT and MuRIL for Malayalam Offensive Language Detection},
journal = {International Journal of Advanced Computer Science and Applications},
doi = {10.14569/IJACSA.2026.0170347},
url = {http://dx.doi.org/10.14569/IJACSA.2026.0170347},
year = {2026},
publisher = {The Science and Information Organization},
volume = {17},
number = {3},
author = {Munawwar K V and Nandhini K}
}
Copyright Statement: This is an open access article licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, even commercially, as long as the original work is properly cited.