The Science and Information (SAI) Organization
DOI: 10.14569/IJACSA.2026.0170347

Gradient-Guided Data Augmentation with mBERT and MuRIL for Malayalam Offensive Language Detection

Author 1: Munawwar K V
Author 2: Nandhini K

International Journal of Advanced Computer Science and Applications (IJACSA), Volume 17 Issue 3, 2026.


Abstract: The widespread adoption of social media platforms has increased the spread of offensive content, particularly in native languages, where users express themselves more freely. Automated offensive language detection in low-resource languages such as Malayalam is hampered by severe class imbalance: non-offensive samples substantially outnumber offensive ones, which biases models toward the majority class and reduces detection accuracy for underrepresented classes. This study addresses class imbalance in Malayalam offensive language identification through a data augmentation framework. We propose a gradient-guided augmentation technique that targets the minority class by identifying and synthesizing challenging instances of underrepresented classes to improve model robustness. We systematically evaluate several augmentation strategies, including back-translation, paraphrasing, and NLPAUG techniques, in combination with mBERT and MuRIL models. The gradient-guided approach improves recall by 0.09 over the baseline model's 0.74 while preserving overall model performance on imbalanced Malayalam offensive language datasets. The proposed methodology offers a promising way to address class imbalance in offensive content detection for low-resource languages. The results indicate that integrating augmentation with explainability improves classification performance and helps overcome certain limitations of previous methods.
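The abstract does not spell out how challenging minority-class instances are identified, so the following is only an illustrative sketch of one plausible reading: rank minority-class samples by the norm of the loss gradient with respect to their input features and pass the highest-ranked ones to an augmenter. Everything here is an assumption made for illustration, not the authors' method: a toy logistic model stands in for mBERT or MuRIL, two-dimensional feature vectors stand in for token embeddings, and the names `input_gradient_norm` and `select_hard_minority` are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def input_gradient_norm(weights, bias, features, label):
    """L2 norm of d(loss)/d(features) for binary cross-entropy on a logistic model.

    Assumption: a linear model replaces the transformer, so the gradient
    has the closed form (sigmoid(z) - label) * weights.
    """
    z = sum(w * x for w, x in zip(weights, features)) + bias
    error = sigmoid(z) - label  # dL/dz for binary cross-entropy
    return abs(error) * math.sqrt(sum(w * w for w in weights))


def select_hard_minority(samples, weights, bias, minority_label=1, top_k=2):
    """Return indices of the top_k minority samples with the largest gradient norm."""
    scored = [
        (input_gradient_norm(weights, bias, x, y), i)
        for i, (x, y) in enumerate(samples)
        if y == minority_label
    ]
    scored.sort(reverse=True)  # largest gradient norm first
    return [i for _, i in scored[:top_k]]


# Toy data: label 1 = offensive (minority class), label 0 = non-offensive.
samples = [
    ([2.0, 1.0], 1),    # offensive, confidently classified
    ([-1.5, 0.5], 1),   # offensive, scored as non-offensive by the model
    ([-2.0, -1.0], 0),
    ([0.1, -0.2], 1),   # offensive, on the decision boundary
    ([-1.0, -1.0], 0),
]
weights, bias = [1.0, 0.5], 0.0  # hypothetical trained parameters

hard = select_hard_minority(samples, weights, bias)
print(hard)  # indices of minority samples to send to an augmenter
```

In this toy setup the misclassified and borderline offensive samples are selected, while the confidently classified one is skipped. A real pipeline would presumably compute the gradients through the fine-tuned transformer and then apply an augmentation such as back-translation or paraphrasing to the selected texts.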

Keywords: Offensive comment detection; gradient-guided augmentation; NLPAUG; back-translation; paraphrasing with MultiIndicParaphraseGeneration

Munawwar K V and Nandhini K. “Gradient-Guided Data Augmentation with mBERT and MuRIL for Malayalam Offensive Language Detection”. International Journal of Advanced Computer Science and Applications (IJACSA) 17.3 (2026). http://dx.doi.org/10.14569/IJACSA.2026.0170347

@article{V2026,
title = {Gradient-Guided Data Augmentation with mBERT and MuRIL for Malayalam Offensive Language Detection},
journal = {International Journal of Advanced Computer Science and Applications},
doi = {10.14569/IJACSA.2026.0170347},
url = {http://dx.doi.org/10.14569/IJACSA.2026.0170347},
year = {2026},
publisher = {The Science and Information Organization},
volume = {17},
number = {3},
author = {Munawwar K V and Nandhini K}
}



Copyright Statement: This is an open access article licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, even commercially as long as the original work is properly cited.

The Science and Information (SAI) Organization Limited is a company registered in England and Wales under Company Number 8933205.