Speech Enhancement using Fully Convolutional UNET and Gated Convolutional Neural Network

Danish Baloch; Sidrah Abdullah; Asma Qaiser; Saad Ahmed; Faiza Nasim; Mehreen Kanwal

doi:10.14569/IJACSA.2023.0141184

International Journal of Advanced Computer Science and Applications (IJACSA), Volume 14 Issue 11, 2023.

Abstract: Speech enhancement aims to improve the quality and intelligibility of speech by reducing the background noise that degrades it. This paper presents a deep learning approach for suppressing background noise in a speaker's voice. Because noise is a complex nonlinear function, classical techniques such as Spectral Subtraction and the Wiener filter are poorly suited to removing non-stationary noise. The audio is processed as a raw waveform, giving an end-to-end speech enhancement approach. The proposed architecture is a 1-D fully convolutional encoder-decoder gated Convolutional Neural Network (CNN). The model takes a simulated noisy signal and generates its clean representation. It is optimized in both the time and spectral domains, using L1 loss to minimize the error in the time-domain signal and in the spectral magnitudes. The model is generative: although it is trained exclusively on English speech, it denoises English speakers and can also denoise Urdu speech. Experimental results show that, when trained on samples of the Valentini dataset, it can generate a clean representation of the speech directly from a noisy signal. Performance is evaluated with objective measures such as PESQ (Perceptual Evaluation of Speech Quality) and STOI (Short-Time Objective Intelligibility). The system can be applied to recorded videos and used as a preprocessor for voice assistants such as Alexa and Siri, so that clear, clean instructions are sent to the device.
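The combined time-domain and spectral L1 objective mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the frame length, the hop size, the rectangular (unwindowed) framing, and the `spectral_weight` factor are all assumptions chosen for illustration, and a naive DFT from the standard library stands in for whatever spectral transform the paper uses.

```python
import cmath
import math


def l1(a, b):
    """Mean absolute error between two equal-length, non-empty sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def magnitude_spectrum(frame):
    """Magnitudes of the non-redundant DFT bins of one real frame (naive O(n^2) DFT)."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def frames(signal, frame_len, hop):
    """Split a signal into overlapping frames, dropping any incomplete tail."""
    return [signal[i:i + frame_len] for i in range(0, len(signal) - frame_len + 1, hop)]


def time_spectral_l1(estimate, clean, frame_len=8, hop=4, spectral_weight=1.0):
    """L1 on the raw waveform plus L1 on framed magnitude spectra.

    frame_len, hop and spectral_weight are illustrative assumptions,
    not values taken from the paper.
    """
    time_term = l1(estimate, clean)
    est_mag = [m for f in frames(estimate, frame_len, hop) for m in magnitude_spectrum(f)]
    ref_mag = [m for f in frames(clean, frame_len, hop) for m in magnitude_spectrum(f)]
    spectral_term = l1(est_mag, ref_mag)
    return time_term + spectral_weight * spectral_term
```

For example, calling `time_spectral_l1(noisy, clean)` on a short sine wave and a copy with a constant offset returns a positive value, while passing the same signal twice returns zero. In a training setup the same two terms would be computed on tensors with an automatic-differentiation framework so that the loss can be backpropagated through the network.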

Keywords: Speech enhancement; speech denoising; deep neural network; raw waveform; fully convolutional neural network; gated linear unit

Danish Baloch, Sidrah Abdullah, Asma Qaiser, Saad Ahmed, Faiza Nasim and Mehreen Kanwal, “Speech Enhancement using Fully Convolutional UNET and Gated Convolutional Neural Network,” International Journal of Advanced Computer Science and Applications (IJACSA), 14(11), 2023. http://dx.doi.org/10.14569/IJACSA.2023.0141184

@article{Baloch2023,
title = {Speech Enhancement using Fully Convolutional UNET and Gated Convolutional Neural Network},
journal = {International Journal of Advanced Computer Science and Applications},
doi = {10.14569/IJACSA.2023.0141184},
url = {http://dx.doi.org/10.14569/IJACSA.2023.0141184},
year = {2023},
publisher = {The Science and Information Organization},
volume = {14},
number = {11},
author = {Danish Baloch and Sidrah Abdullah and Asma Qaiser and Saad Ahmed and Faiza Nasim and Mehreen Kanwal}
}

Copyright Statement: This is an open access article licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, even commercially as long as the original work is properly cited.
