The Science and Information (SAI) Organization
DOI: 10.14569/IJACSA.2026.0170246

Self-Supervised and Explainable Transformer-Based Architectures for Robust End-to-End Speech and Language Understanding

Author 1: Mahfuzul Huda

International Journal of Advanced Computer Science and Applications (IJACSA), Volume 17 Issue 2, 2026.

Abstract: This study combines self-supervised learning with explainable transformer-based architectures to enable robust, end-to-end speech and language understanding, pretraining deep transformer models on unannotated speech and text corpora. It proposes a systematic transformer-based framework for speech and language understanding that integrates self-supervised learning with built-in explainability. The proposed model achieved a low word error rate, high accuracy, and interpretable outputs on multiple datasets. The framework also has limitations, which the work highlights: the deep transformer architecture is computationally expensive, and its interpretability assessment relies in part on approximate benchmarks of feature relevance rather than direct ground truth. Future work will extend the framework to multilingual and cross-domain datasets, develop more efficient transformer variants for real-time use, and add human-centered evaluation to verify that the model's explanations are interpreted correctly.

Keywords: Transformer models; self-supervised learning; explainable AI; speech recognition; natural language understanding; end-to-end systems

Mahfuzul Huda. “Self-Supervised and Explainable Transformer-Based Architectures for Robust End-to-End Speech and Language Understanding”. International Journal of Advanced Computer Science and Applications (IJACSA) 17.2 (2026). http://dx.doi.org/10.14569/IJACSA.2026.0170246

@article{Huda2026,
title = {Self-Supervised and Explainable Transformer-Based Architectures for Robust End-to-End Speech and Language Understanding},
journal = {International Journal of Advanced Computer Science and Applications},
doi = {10.14569/IJACSA.2026.0170246},
url = {http://dx.doi.org/10.14569/IJACSA.2026.0170246},
year = {2026},
publisher = {The Science and Information Organization},
volume = {17},
number = {2},
author = {Mahfuzul Huda}
}



Copyright Statement: This is an open access article licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, even commercially, as long as the original work is properly cited.

The Science and Information (SAI) Organization Limited is a company registered in England and Wales under Company Number 8933205.