DOI: 10.14569/IJACSA.2026.0170261

An Articulatory-Aware CNN-BiGRU-Attention Framework for Explainable Phoneme-Level Pronunciation Assessment in ESL Speech

Author 1: P. Bindhu
Author 2: Jasgurpreet Singh Chohan
Author 3: M. Durairaj
Author 4: Megha Sawangikar
Author 5: N. Neelima
Author 6: Elangovan Muniyandy
Author 7: G. Sanjiv Ra
Author 8: Loay F. Hussien

International Journal of Advanced Computer Science and Applications (IJACSA), Volume 17 Issue 2, 2026.


Abstract: Accurate phoneme-level pronunciation remains one of the most persistent difficulties for learners of English as a Second Language (ESL), because even slight pronunciation deviations can substantially reduce intelligibility and communicative effectiveness. Existing pronunciation assessment methods, most of which rely on automatic speech recognition (ASR), report results at the word or sentence level and return generic numerical scores with little linguistic meaning, which makes them poorly suited to evaluating accented speech and guiding correction. To overcome these shortcomings, this paper introduces an articulatory-aware phoneme recognition model that provides fine-grained, interpretable feedback for improving ESL pronunciation. The novelty of the work lies in combining a hybrid CNN-BiGRU-Attention architecture with an Articulatory Error Mapping Engine, which symbolically maps phoneme-level errors to articulatory deviations along four dimensions: place of articulation, manner, voicing, and vowel quality. In experiments on non-native English speech, the model achieved a phoneme recognition accuracy of 91.4%, well above commercial ASR-based systems (78.3%) and traditional HMM-GMM baselines (70.5%). The system was also sensitive to ESL pronunciation errors, detecting substitutions with 84% accuracy, deletions with 82%, and insertions with 79%, while articulatory mapping exceeded 87% accuracy in all categories. The framework was tested in Python with deep learning packages and speech processing toolkits, and it provides a scalable, explainable, learner-focused system that can support intelligent ESL pronunciation training with pedagogically meaningful phoneme-level feedback.

Keywords: Articulatory error analysis; attention mechanism; ESL pronunciation assessment; phoneme recognition model; speech processing framework
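The abstract describes two steps that a small example may make more concrete: classifying phoneme errors as substitutions, deletions, or insertions, and mapping each substitution to deviations in place, manner, voicing, or vowel quality. The following is a minimal sketch of that idea only, not the authors' Articulatory Error Mapping Engine. The `FEATURES` table, the ARPAbet-style symbols, and the function names `articulatory_deviations` and `detect_errors` are all assumptions made for illustration, and the alignment uses Python's standard `difflib.SequenceMatcher` as a stand-in for whatever alignment the paper actually uses.

```python
from difflib import SequenceMatcher

# Hypothetical, heavily reduced feature table (ARPAbet-style symbols).
# A real system would cover the full phoneme inventory.
FEATURES = {
    "P": {"place": "bilabial", "manner": "stop", "voicing": "voiceless"},
    "B": {"place": "bilabial", "manner": "stop", "voicing": "voiced"},
    "T": {"place": "alveolar", "manner": "stop", "voicing": "voiceless"},
    "D": {"place": "alveolar", "manner": "stop", "voicing": "voiced"},
    "S": {"place": "alveolar", "manner": "fricative", "voicing": "voiceless"},
    "Z": {"place": "alveolar", "manner": "fricative", "voicing": "voiced"},
    "TH": {"place": "dental", "manner": "fricative", "voicing": "voiceless"},
    "IY": {"vowel_quality": "high front tense"},
    "IH": {"vowel_quality": "high front lax"},
}


def articulatory_deviations(expected, produced):
    """Return {feature: (expected_value, produced_value)} for differing features."""
    a = FEATURES.get(expected, {})
    b = FEATURES.get(produced, {})
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


def detect_errors(target, recognized):
    """Align two phoneme sequences and list (type, expected, produced, deviations)."""
    errors = []
    matcher = SequenceMatcher(a=target, b=recognized, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        expected, produced = target[i1:i2], recognized[j1:j2]
        # Pair phonemes position by position; each pair is a substitution.
        for exp, prod in zip(expected, produced):
            errors.append(
                ("substitution", exp, prod, articulatory_deviations(exp, prod))
            )
        # Unpaired target phonemes were dropped by the speaker.
        for exp in expected[len(produced):]:
            errors.append(("deletion", exp, None, {}))
        # Unpaired recognized phonemes were added by the speaker.
        for prod in produced[len(expected):]:
            errors.append(("insertion", None, prod, {}))
    return errors


if __name__ == "__main__":
    # "think" pronounced with /t/ instead of /th/
    for error in detect_errors(["TH", "IH", "NG", "K"], ["T", "IH", "NG", "K"]):
        print(error)
```

In this sketch, a learner who says "tink" for "think" produces one substitution whose deviations are in place (dental to alveolar) and manner (fricative to stop), which is the kind of symbolic feedback the abstract contrasts with a bare numerical score. `SequenceMatcher` does not guarantee a minimum-edit-distance alignment, so a production system would more likely use a dedicated phoneme aligner.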

P. Bindhu, Jasgurpreet Singh Chohan, M. Durairaj, Megha Sawangikar, N. Neelima, Elangovan Muniyandy, G. Sanjiv Ra and Loay F. Hussien. “An Articulatory-Aware CNN-BiGRU-Attention Framework for Explainable Phoneme-Level Pronunciation Assessment in ESL Speech”. International Journal of Advanced Computer Science and Applications (IJACSA) 17.2 (2026). http://dx.doi.org/10.14569/IJACSA.2026.0170261

@article{Bindhu2026,
title = {An Articulatory-Aware CNN-BiGRU-Attention Framework for Explainable Phoneme-Level Pronunciation Assessment in ESL Speech},
journal = {International Journal of Advanced Computer Science and Applications},
doi = {10.14569/IJACSA.2026.0170261},
url = {http://dx.doi.org/10.14569/IJACSA.2026.0170261},
year = {2026},
publisher = {The Science and Information Organization},
volume = {17},
number = {2},
author = {P. Bindhu and Jasgurpreet Singh Chohan and M. Durairaj and Megha Sawangikar and N. Neelima and Elangovan Muniyandy and G. Sanjiv Ra and Loay F. Hussien}
}



Copyright Statement: This is an open access article licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, even commercially as long as the original work is properly cited.

The Science and Information (SAI) Organization Limited is a company registered in England and Wales under Company Number 8933205.