International Journal of Advanced Computer Science and Applications (IJACSA), Volume 16 Issue 5, 2025.
Abstract: Manual annotation of large datasets is a time-consuming and resource-intensive process. Hiring annotators or outsourcing to specialized platforms can be costly, particularly for datasets requiring domain-specific expertise. Additionally, human annotation may introduce inconsistencies, especially when dealing with complex or ambiguous data, as interpretations can vary among annotators. Large Language Models (LLMs) offer a promising alternative by automating data annotation, potentially improving scalability and consistency. This study evaluates the performance of ChatGPT compared to human annotators in annotating an Islamophobia dataset. The dataset consists of fifty tweets from the X platform collected using the keywords Islam, Muslim, hijab, stopislam, jihadist, extremist, and terrorism. Human annotators, including experts in Islamic studies, linguistics, and clinical psychology, served as the benchmark for accuracy. Cohen's Kappa was used to measure agreement between the LLM and the human annotators. The results show substantial agreement between the LLM and language experts (0.653) and clinical psychologists (0.638), while agreement with Islamic studies experts was fair (0.353). Overall, the LLM demonstrated substantial agreement (0.632) with all human annotators. ChatGPT achieved an overall accuracy of 82%, a recall of 69.5%, an F1-score of 77.2%, and a precision of 88%, indicating strong effectiveness in identifying Islamophobia-related content. The findings suggest that LLMs can effectively detect Islamophobic content and serve as valuable tools for preliminary screenings or as complementary aids to human annotation. Through this analysis, the study seeks to understand the strengths and limitations of LLMs in handling nuanced and culturally sensitive data, contributing to the broader discussion on the integration of generative AI in annotation tasks. While LLMs show great potential in sentiment analysis, challenges remain in interpreting context-specific nuances.
This study underscores the role of generative AI in enhancing human annotation efforts while highlighting the need for continuous improvements to optimize performance.
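The evaluation described above rests on two kinds of measures: inter-annotator agreement (Cohen's Kappa between the LLM and each human annotator group) and classification metrics (accuracy, precision, recall, F1) against the human benchmark. A minimal sketch of how such measures might be computed with scikit-learn is shown below; the label arrays are hypothetical stand-ins, not the study's actual annotations.

```python
# Illustrative sketch: agreement and classification metrics for
# binary annotation labels (1 = Islamophobic, 0 = not).
# The label lists below are invented for demonstration only.
from sklearn.metrics import (
    cohen_kappa_score,
    accuracy_score,
    precision_score,
    recall_score,
    f1_score,
)

# Hypothetical benchmark labels from human annotators.
human_labels = [1, 1, 0, 0, 1, 0, 1, 0]
# Hypothetical labels produced by the LLM for the same items.
llm_labels   = [1, 1, 0, 0, 1, 0, 0, 0]

# Cohen's Kappa: chance-corrected agreement between two annotators.
kappa = cohen_kappa_score(human_labels, llm_labels)

# Standard classification metrics, treating the human labels
# as ground truth and the LLM labels as predictions.
accuracy = accuracy_score(human_labels, llm_labels)
precision = precision_score(human_labels, llm_labels)
recall = recall_score(human_labels, llm_labels)
f1 = f1_score(human_labels, llm_labels)

print(f"kappa={kappa:.3f} accuracy={accuracy:.3f} "
      f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

By the usual Landis and Koch interpretation, Kappa values of 0.61-0.80 indicate substantial agreement and 0.21-0.40 fair agreement, which is the scale the abstract's "substantial" (0.632) and "fair" (0.353) labels follow.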
Rafizah Daud, Nurlida Basir, Nur Fatin Nabila Mohd Rafei Heng, Meor Mohd Shahrulnizam Meor Sepli and Melinda Melinda, “Evaluating Large Language Model Versus Human Performance in Islamophobia Dataset Annotation,” International Journal of Advanced Computer Science and Applications (IJACSA), 16(5), 2025. http://dx.doi.org/10.14569/IJACSA.2025.0160512
@article{Daud2025,
title = {Evaluating Large Language Model Versus Human Performance in Islamophobia Dataset Annotation},
journal = {International Journal of Advanced Computer Science and Applications},
doi = {10.14569/IJACSA.2025.0160512},
url = {http://dx.doi.org/10.14569/IJACSA.2025.0160512},
year = {2025},
publisher = {The Science and Information Organization},
volume = {16},
number = {5},
author = {Rafizah Daud and Nurlida Basir and Nur Fatin Nabila Mohd Rafei Heng and Meor Mohd Shahrulnizam Meor Sepli and Melinda Melinda}
}
Copyright Statement: This is an open access article licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, even commercially, as long as the original work is properly cited.