Deep Speech Recognition System Based on AutoEncoder-GAN for Biometric Access Control

Oussama Mounnan; Otman Manad; Abdelkrim El Mouatasim; Larbi Boubchir; Boubaker Daachi

doi:10.14569/IJACSA.2023.01411132



International Journal of Advanced Computer Science and Applications(IJACSA), Volume 14 Issue 11, 2023.


Abstract: Speech recognition-based biometric access control systems are promising solutions that have resolved many issues related to security and convenience. As a biometric modality, speech recognition offers distinct advantages such as user-friendliness and non-intrusiveness. However, developing robust and accurate speaker identification and authentication systems poses challenges due to variations in speech patterns and environmental factors. Integrating deep learning techniques, especially AutoEncoder (AE) and Generative Adversarial Network (GAN) models, has shown promising results in addressing these challenges. This article presents a novel approach that combines these two deep learning models, AE and GAN, for speech recognition-based biometric access control. In the model architecture, the AutoEncoder takes the MFCC coefficients as input; the encoder maps them to the latent space, and the decoder reconstructs the data. Speech features extracted from the latent space are then fed to the GAN generator to generate additional speech data. The discriminator network has a dual role, serving as both a feature extractor and a classifier: as a feature extractor it extracts relevant features from generated samples, and as a classifier it distinguishes between generated samples and authentic samples that come from the AutoEncoder. This strategy outperforms DNN and LSTM models on the VoxCeleb 2, LibriSpeech, and Aishell-1 datasets. The models are trained to minimize the Mean Squared Error (MSE) for both the generator and the discriminator, with the aim of achieving highly realistic datasets and a robust, interpretable model. This approach addresses challenges in feature extraction, data augmentation, realistic biometric sample generation, handling of data variability, and generalization, thereby providing a comprehensive solution.

Keywords: Speaker identification; speech recognition; biometric access control; authentication; verification
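
The data flow described in the abstract can be illustrated with a minimal forward-pass sketch. It is not the authors' implementation: the layer sizes, the single-layer linear networks, the random stand-in for MFCC frames, the least-squares style targets (1 for authentic, 0 for generated), and all function names are assumptions made for illustration only. It shows only how the MFCC input, the latent space, the generator, and the dual-role discriminator connect, and where an MSE loss could be applied to each part.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; the abstract does not specify any of them.
N_FRAMES, N_MFCC, LATENT_DIM, FEAT_DIM = 32, 13, 4, 8


def mse(a, b):
    """Mean squared error between two arrays of the same shape."""
    return float(np.mean((a - b) ** 2))


def dense(in_dim, out_dim):
    """Small random weight matrix standing in for a trained layer."""
    return rng.normal(0.0, 0.1, size=(in_dim, out_dim))


# Random stand-in for real MFCC frames (assumption: one row per frame).
mfcc = rng.normal(size=(N_FRAMES, N_MFCC))

W_enc = dense(N_MFCC, LATENT_DIM)    # encoder: MFCC -> latent space
W_dec = dense(LATENT_DIM, N_MFCC)    # decoder: latent -> reconstructed MFCC
W_gen = dense(LATENT_DIM, N_MFCC)    # generator: latent -> additional samples
W_feat = dense(N_MFCC, FEAT_DIM)     # discriminator, feature-extractor part
w_cls = dense(FEAT_DIM, 1)           # discriminator, classifier part


def encode(x):
    return np.tanh(x @ W_enc)


def decode(z):
    return z @ W_dec


def generate(z):
    return z @ W_gen


def discriminate(x):
    """Return (features, score); score near 1 means 'authentic'."""
    features = np.tanh(x @ W_feat)
    score = 1.0 / (1.0 + np.exp(-(features @ w_cls)))
    return features, score


latent = encode(mfcc)
reconstructed = decode(latent)   # treated as the 'authentic' samples here
generated = generate(latent)     # additional samples from latent features

features_fake, score_fake = discriminate(generated)
_, score_real = discriminate(reconstructed)

# MSE objectives (assumed least-squares style targets).
ae_loss = mse(reconstructed, mfcc)
d_loss = mse(score_real, np.ones_like(score_real)) + mse(
    score_fake, np.zeros_like(score_fake)
)
g_loss = mse(score_fake, np.ones_like(score_fake))

print(f"AE loss={ae_loss:.4f}  D loss={d_loss:.4f}  G loss={g_loss:.4f}")
```

In a real system each weight matrix would be replaced by a trained deep network and the three losses would be minimized by gradient descent; the sketch evaluates them once on untrained weights to make the wiring explicit.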

Oussama Mounnan, Otman Manad, Abdelkrim El Mouatasim, Larbi Boubchir and Boubaker Daachi, “Deep Speech Recognition System Based on AutoEncoder-GAN for Biometric Access Control,” International Journal of Advanced Computer Science and Applications (IJACSA), 14(11), 2023. http://dx.doi.org/10.14569/IJACSA.2023.01411132

@article{Mounnan2023,
title = {Deep Speech Recognition System Based on AutoEncoder-GAN for Biometric Access Control},
journal = {International Journal of Advanced Computer Science and Applications},
doi = {10.14569/IJACSA.2023.01411132},
url = {http://dx.doi.org/10.14569/IJACSA.2023.01411132},
year = {2023},
publisher = {The Science and Information Organization},
volume = {14},
number = {11},
author = {Oussama Mounnan and Otman Manad and Abdelkrim El Mouatasim and Larbi Boubchir and Boubaker Daachi}
}

Copyright Statement: This is an open access article licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, even commercially, as long as the original work is properly cited.
