International Journal of Advanced Computer Science and Applications (IJACSA), Volume 16 Issue 11, 2025.
Abstract: Gender identification from written text relies on writer-specific characteristics such as linguistic patterns and stylistic behaviors. Research on gender identification in Malay-English (Manglish) text using Traditional Machine Learning (ML), Shallow Deep Learning (DL), and Deep Sequential techniques remains limited compared with English-focused studies. This study addresses that gap by comparing Traditional ML, Shallow DL, and Deep Sequential models on a self-collected dataset of Manglish tweets from 50 anonymized Malaysian public figures. After preprocessing, features were extracted with Word2Vec embeddings and TF-IDF. Word2Vec embeddings delivered superior performance across the Shallow DL and Deep Sequential models, with Bi-CNN achieving the best results: accuracy 0.722, precision 0.727, recall 0.722, and F1-score 0.720. TF-IDF vectorization yielded substandard performance except with Logistic Regression, which achieved 0.728 on all four metrics. To enhance model interpretability, the eXplainable Artificial Intelligence (XAI) tools SHAP and LIME were applied to analyze misclassifications; they identified frequent shortform usage and word misassignment as key issues affecting prediction accuracy. Incorporating these XAI insights through iterative refinements yielded a modest improvement from 72.4% to 72.8%, demonstrating XAI's value in model optimization despite its limitations in capturing dataset biases and complex linguistic patterns.
This study contributes the first gender classification dataset for Malay short text and demonstrates that Shallow DL and Deep Sequential models, enhanced by XAI-driven analysis, show significant promise for mixed-language contexts. It also highlights the unique challenges that code-switched languages pose for NLP tasks and suggests that future research explore large language models to advance classification performance in multilingual social media environments.
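As an illustration of the TF-IDF feature extraction mentioned in the abstract, the following is a minimal pure-Python sketch. It is not the authors' pipeline: the whitespace tokenisation, the smoothed IDF formula, the L2 normalisation, the function name `tfidf`, and the toy tweets are all assumptions made for this example.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return (vocabulary, vectors) with L2-normalised TF-IDF weights."""
    # Naive lowercase + whitespace tokenisation (an assumption for this sketch).
    tokenised = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenised for tok in toks})
    n_docs = len(tokenised)

    # Document frequency: number of documents that contain each term.
    df = Counter(tok for toks in tokenised for tok in set(toks))

    # Smoothed IDF (assumed variant), so that no term receives zero weight.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}

    vectors = []
    for toks in tokenised:
        tf = Counter(toks)
        raw = [tf[t] * idf[t] for t in vocab]
        # Scale each document vector to unit length.
        norm = math.sqrt(sum(w * w for w in raw)) or 1.0
        vectors.append([w / norm for w in raw])
    return vocab, vectors


# Hypothetical toy tweets, not taken from the study's dataset.
tweets = ["jom makan lunch", "lunch meeting later", "jom tengok movie"]
vocab, vectors = tfidf(tweets)
```

In this sketch, a term that appears in fewer tweets (such as "makan") receives a larger weight than one shared across tweets (such as "jom"). Vectors of this kind are what a classifier such as Logistic Regression would take as input.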
Norazlina Khamis, Nur Shaheera Shastera Nulizairos, Haslizatul Mohamed Hanum, Amirah Ahmad, Nor Hapiza Mohd Ariffin and Ruhaila Maskat. “Unveiling Gender in Malay-English Short Text: A Comparative Study of ML, DL and Sequential Models with XAI Misclassification Analysis”. International Journal of Advanced Computer Science and Applications (IJACSA) 16.11 (2025). http://dx.doi.org/10.14569/IJACSA.2025.0161165
@article{Khamis2025,
title = {Unveiling Gender in Malay-English Short Text: A Comparative Study of ML, DL and Sequential Models with XAI Misclassification Analysis},
journal = {International Journal of Advanced Computer Science and Applications},
doi = {10.14569/IJACSA.2025.0161165},
url = {http://dx.doi.org/10.14569/IJACSA.2025.0161165},
year = {2025},
publisher = {The Science and Information Organization},
volume = {16},
number = {11},
author = {Norazlina Khamis and Nur Shaheera Shastera Nulizairos and Haslizatul Mohamed Hanum and Amirah Ahmad and Nor Hapiza Mohd Ariffin and Ruhaila Maskat}
}
Copyright Statement: This is an open access article licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, even commercially, as long as the original work is properly cited.