The Science and Information (SAI) Organization

DOI: 10.14569/IJACSA.2025.0161165

Unveiling Gender in Malay-English Short Text: A Comparative Study of ML, DL and Sequential Models with XAI Misclassification Analysis

Author 1: Norazlina Khamis
Author 2: Nur Shaheera Shastera Nulizairos
Author 3: Haslizatul Mohamed Hanum
Author 4: Amirah Ahmad
Author 5: Nor Hapiza Mohd Ariffin
Author 6: Ruhaila Maskat

International Journal of Advanced Computer Science and Applications (IJACSA), Volume 16, Issue 11, 2025.

Abstract: Gender identification from written text exploits writer-specific characteristics such as linguistic patterns and stylistic behaviors. Compared with English-focused studies, however, little research has examined gender identification in Malay-English (Manglish) text using Traditional Machine Learning (ML), Shallow Deep Learning (DL), or Deep Sequential techniques. This study addresses that gap by comparing all three families of approaches on a self-collected dataset of Manglish tweets from 50 anonymized Malaysian public figures. After preprocessing, features were extracted with Word2Vec embeddings and TF-IDF. Word2Vec embeddings gave superior performance for the Shallow DL and Deep Sequential models, with Bi-CNN achieving the best results among them: accuracy 0.722, precision 0.727, recall 0.722 and F1-score 0.720. TF-IDF vectorization performed poorly with every model except Logistic Regression, which scored a consistent 0.728 on all four evaluation metrics. To improve interpretability, the eXplainable Artificial Intelligence (XAI) tools SHAP and LIME were applied to analyze misclassifications. They identified key issues affecting prediction accuracy, such as frequent shortform usage and word misassignment. Incorporating these XAI insights through iterative refinements yielded a modest improvement, from 72.4% to 72.8%, demonstrating XAI's value for model optimization despite its limitations in capturing dataset biases and complex linguistic patterns.
This study contributes the first gender classification dataset for Malay short text and shows that Shallow DL and Deep Sequential models, enhanced by XAI-driven analysis, hold significant promise for mixed-language contexts. It also highlights the unique challenges that code-switched languages pose for NLP tasks, and suggests that future research explore large language models to improve classification performance in multilingual social media environments.
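The abstract reports identical accuracy and recall for Bi-CNN (0.722) and a single 0.728 for every Logistic Regression metric. That pattern is consistent with support-weighted averaging of per-class scores, because weighted recall, the sum over classes of (n_c / N) · (TP_c / n_c), always reduces to the total true positives over N, which is accuracy. The abstract does not say which averaging was used, so this is an assumption. The minimal sketch below computes the four metrics that way in plain Python; the helper name `weighted_metrics` and the toy labels are hypothetical and are not data from the study.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall and F1 (hypothetical helper)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    n = len(y_true)
    pairs = list(zip(y_true, y_pred))
    support = Counter(y_true)      # gold count per class
    predicted = Counter(y_pred)    # predicted count per class
    accuracy = sum(t == p for t, p in pairs) / n
    precision = recall = f1 = 0.0
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in pairs)
        p_c = tp / predicted[c] if predicted[c] else 0.0
        r_c = tp / support[c] if support[c] else 0.0
        f_c = 2 * p_c * r_c / (p_c + r_c) if p_c + r_c else 0.0
        w = support[c] / n         # weight = share of gold labels in class c
        precision += w * p_c
        recall += w * r_c          # sums to (total TP) / n, i.e. accuracy
        f1 += w * f_c
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels (F = female, M = male); not data from the study
m = weighted_metrics(["F", "F", "F", "M", "M"], ["F", "M", "M", "M", "M"])
print({k: round(v, 3) for k, v in m.items()})
# → {'accuracy': 0.6, 'precision': 0.8, 'recall': 0.6, 'f1': 0.567}
```

In the toy run, recall equals accuracy while weighted precision and F1 do not, which is why a model's precision and F1 (such as Bi-CNN's 0.727 and 0.720) can drift from its accuracy under this scheme.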

Keywords: Gender identification; Manglish; machine learning; shallow deep learning; deep sequential model
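The abstract names TF-IDF as one of the two feature extraction methods and reports that XAI analysis flagged frequent shortform usage as a source of error. The sketch below, written under stated assumptions, shows why: with TF-IDF, a shortform such as "yg" and its full form "yang" become unrelated features, and expanding shortforms before vectorizing merges them. The shortform lexicon, the tokenizer regex and the smoothed-IDF variant (the form scikit-learn's `TfidfVectorizer` uses by default) are all assumptions; this is one possible refinement, not the authors' documented pipeline.

```python
import math
import re
from collections import Counter

# HYPOTHETICAL shortform lexicon for illustration only; the paper does not
# publish its normalization rules.
SHORTFORMS = {"yg": "yang", "utk": "untuk", "dgn": "dengan", "x": "tak", "u": "you"}


def tokenize(text, normalize=True):
    """Lowercase, split on a simple (assumed) token pattern, optionally expand shortforms."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    if normalize:
        tokens = [SHORTFORMS.get(t, t) for t in tokens]
    return tokens


def tfidf(docs, normalize=True):
    """Return L2-normalized sparse TF-IDF vectors (dicts) and the IDF table."""
    tokenized = [tokenize(d, normalize) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many tweets each term appears at least once
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF, matching scikit-learn's TfidfVectorizer defaults
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1 for t, d in df.items()}
    vectors = []
    for toks in tokenized:
        vec = {t: c * idf[t] for t, c in Counter(toks).items()}  # raw count x IDF
        norm = math.sqrt(sum(v * v for v in vec.values()))
        vectors.append({t: v / norm for t, v in vec.items()} if norm else vec)
    return vectors, idf


def cosine(a, b):
    """For L2-normalized sparse vectors, cosine similarity is just the dot product."""
    return sum(v * b.get(t, 0.0) for t, v in a.items())


docs = ["Jom makan, yg ni sedap!", "Jom makan yang ni, sedap", "x sempat la utk datang"]
vecs, _ = tfidf(docs)
raw, _ = tfidf(docs, normalize=False)
print(round(cosine(vecs[0], vecs[1]), 3))  # → 1.0
print(round(cosine(raw[0], raw[1]), 3))    # → 0.698
```

Without normalization the first two toy tweets, which differ only in "yg" versus "yang", share only part of their TF-IDF mass; with it they map to the same vector.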

Norazlina Khamis, Nur Shaheera Shastera Nulizairos, Haslizatul Mohamed Hanum, Amirah Ahmad, Nor Hapiza Mohd Ariffin and Ruhaila Maskat. “Unveiling Gender in Malay-English Short Text: A Comparative Study of ML, DL and Sequential Models with XAI Misclassification Analysis”. International Journal of Advanced Computer Science and Applications (IJACSA) 16.11 (2025). http://dx.doi.org/10.14569/IJACSA.2025.0161165

@article{Khamis2025,
title = {Unveiling Gender in Malay-English Short Text: A Comparative Study of ML, DL and Sequential Models with XAI Misclassification Analysis},
journal = {International Journal of Advanced Computer Science and Applications},
doi = {10.14569/IJACSA.2025.0161165},
url = {http://dx.doi.org/10.14569/IJACSA.2025.0161165},
year = {2025},
publisher = {The Science and Information Organization},
volume = {16},
number = {11},
author = {Norazlina Khamis and Nur Shaheera Shastera Nulizairos and Haslizatul Mohamed Hanum and Amirah Ahmad and Nor Hapiza Mohd Ariffin and Ruhaila Maskat}
}



Copyright Statement: This is an open access article licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, even commercially, as long as the original work is properly cited.
