The Science and Information (SAI) Organization

DOI: 10.14569/IJACSA.2026.0170460

Evaluating Open-Source LLMs for Thai Clinical Information Extraction

Author 1: Somkiat Kosolsombat
Author 2: Phatnattachat Chatsiraphon
Author 3: Taratep Si-Aksorn
Author 4: Chiabwoot Ratanavilisagul

International Journal of Advanced Computer Science and Applications (IJACSA), Volume 17 Issue 4, 2026.


Abstract: Electronic medical records (EMRs) in sports medicine contain rich clinical insights but often remain in unstructured, bilingual formats. While locally deployed large language models (LLMs) offer a privacy-preserving solution for data extraction, their performance on Thai-English clinical shorthand remains under-explored. This study evaluated five open-source LLMs for extracting structured clinical data from Thai sports medicine records and assessed the reliability of human-AI collaborative annotation. Mistral-7B, Qwen2.5-7B, Gemma2-9B, LLaMA3.1-8B, and Typhoon2-3.1 were deployed locally. We evaluated the extraction of four clinical fields against a ground truth of 444 records. A standardized JSON schema was used to ensure data interoperability. Inter-annotator agreement (IAA) was measured using Cohen’s kappa on a 100-record sample. Mistral-7B achieved the highest F1-score (92.2%), followed by Qwen2.5-7B (91.9%). Typhoon2-3.1 underperformed (32.9%) due to bilingual format mismatches and difficulties in shorthand normalization. IAA for treatment was moderate (kappa=0.43), whereas diagnosis showed near-zero agreement (kappa=-0.04) due to non-standardized institutional shorthand. Locally deployed LLMs can effectively transform unstructured bilingual EMRs into structured JSON formats, ensuring data privacy and readiness for clinical analytics. However, the lack of standardized clinical coding in Thai EMRs remains a significant barrier. Future digital health initiatives should integrate LLMs with standardized terminologies like ICD-11 to enhance data reliability.
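
For readers unfamiliar with the two metrics reported in the abstract, the following is a minimal sketch of how Cohen's kappa and an F1-score are commonly computed. It is not the authors' evaluation code: the function names `cohen_kappa` and `f1_from_counts`, the example labels, and the counts are all hypothetical and chosen only for illustration. It also shows why kappa can fall to zero or below, as reported for the diagnosis field, when annotators agree no more often than chance would predict.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


def f1_from_counts(tp, fp, fn):
    """F1 as the harmonic mean of precision and recall, from raw counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical treatment labels from two annotators (not study data).
    ann_1 = ["rest", "rest", "physio", "physio"]
    ann_2 = ["rest", "physio", "rest", "physio"]
    print("kappa:", cohen_kappa(ann_1, ann_2))
    # Hypothetical extraction counts for one field (not study data).
    print("F1:", f1_from_counts(tp=8, fp=2, fn=2))
```

In this toy example the annotators agree on half of the items, which is exactly what their label frequencies predict by chance, so kappa is zero even though raw agreement is 50%.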

Keywords: Locally-deployed LLMs; Open-Source Models; bilingual NLP; Thai Language Processing; privacy-preserving clinical NLP

Somkiat Kosolsombat, Phatnattachat Chatsiraphon, Taratep Si-Aksorn and Chiabwoot Ratanavilisagul. “Evaluating Open-Source LLMs for Thai Clinical Information Extraction”. International Journal of Advanced Computer Science and Applications (IJACSA) 17.4 (2026). http://dx.doi.org/10.14569/IJACSA.2026.0170460

@article{Kosolsombat2026,
title = {Evaluating Open-Source LLMs for Thai Clinical Information Extraction},
journal = {International Journal of Advanced Computer Science and Applications},
doi = {10.14569/IJACSA.2026.0170460},
url = {http://dx.doi.org/10.14569/IJACSA.2026.0170460},
year = {2026},
publisher = {The Science and Information Organization},
volume = {17},
number = {4},
author = {Somkiat Kosolsombat and Phatnattachat Chatsiraphon and Taratep Si-Aksorn and Chiabwoot Ratanavilisagul}
}



Copyright Statement: This is an open access article licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, even commercially, as long as the original work is properly cited.

The Science and Information (SAI) Organization Limited is a company registered in England and Wales under Company Number 8933205.