International Journal of Advanced Computer Science and Applications (IJACSA), Volume 17 Issue 4, 2026.
Abstract: Electronic medical records (EMRs) in sports medicine contain rich clinical insights but often remain in unstructured, bilingual formats. While locally deployed large language models (LLMs) offer a privacy-preserving solution for data extraction, their performance on Thai-English clinical shorthand remains under-explored. This study evaluated five open-source LLMs for extracting structured clinical data from Thai sports medicine records and assessed the reliability of human-AI collaborative annotation. Mistral-7B, Qwen2.5-7B, Gemma2-9B, LLaMA3.1-8B, and Typhoon2-3.1 were deployed locally. We evaluated the extraction of four clinical fields against a ground truth of 444 records, using a standardized JSON schema to ensure data interoperability. Inter-annotator agreement (IAA) was measured using Cohen’s kappa on a 100-record sample. Mistral-7B achieved the highest F1-score (92.2%), followed by Qwen2.5-7B (91.9%). Typhoon2-3.1 underperformed (32.9%) due to bilingual format mismatches and difficulties in shorthand normalization. IAA was moderate for treatment (kappa = 0.43), whereas diagnosis showed near-zero agreement (kappa = -0.04) due to non-standardized institutional shorthand. Locally deployed LLMs can effectively transform unstructured bilingual EMRs into structured JSON, preserving data privacy and preparing the data for clinical analytics. However, the lack of standardized clinical coding in Thai EMRs remains a significant barrier. Future digital health initiatives should integrate LLMs with standardized terminologies such as ICD-11 to enhance data reliability.
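For readers interpreting the agreement figures, Cohen's kappa compares the observed agreement p_o between two annotators with the agreement p_e expected by chance: kappa = (p_o - p_e) / (1 - p_e). The following is a minimal sketch of that standard definition, not the authors' evaluation code. The function name `cohens_kappa` and the example labels are hypothetical and are not taken from the study's data.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Hypothetical helper for illustration only; the paper does not
    publish its implementation.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and equal in length")

    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each category, the product of the two
    # annotators' marginal label frequencies, summed over categories.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)

    if p_e == 1.0:
        # Both annotators used one identical label throughout;
        # kappa is undefined here, so report full agreement by convention.
        return 1.0

    return (p_o - p_e) / (1.0 - p_e)


# Made-up labels (not study data): the annotators agree on 3 of 4 items.
annotator_1 = ["rest", "rest", "physio", "physio"]
annotator_2 = ["rest", "rest", "physio", "rest"]
print(cohens_kappa(annotator_1, annotator_2))
```

A kappa near zero, like the value reported for diagnosis, means the annotators agreed about as often as chance alone would predict. A slightly negative value means they agreed marginally less often than chance.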
Somkiat Kosolsombat, Phatnattachat Chatsiraphon, Taratep Si-Aksorn and Chiabwoot Ratanavilisagul. “Evaluating Open-Source LLMs for Thai Clinical Information Extraction”. International Journal of Advanced Computer Science and Applications (IJACSA) 17.4 (2026). http://dx.doi.org/10.14569/IJACSA.2026.0170460
@article{Kosolsombat2026,
title = {Evaluating Open-Source LLMs for Thai Clinical Information Extraction},
journal = {International Journal of Advanced Computer Science and Applications},
doi = {10.14569/IJACSA.2026.0170460},
url = {http://dx.doi.org/10.14569/IJACSA.2026.0170460},
year = {2026},
publisher = {The Science and Information Organization},
volume = {17},
number = {4},
author = {Somkiat Kosolsombat and Phatnattachat Chatsiraphon and Taratep Si-Aksorn and Chiabwoot Ratanavilisagul}
}
Copyright Statement: This is an open access article licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, even commercially, as long as the original work is properly cited.