Convolutional Transformer based Local and Global Feature Learning for Speech Enhancement

Chaitanya Jannu; Sunny Dayal Vanambathina

doi:10.14569/IJACSA.2023.0140181

International Journal of Advanced Computer Science and Applications (IJACSA), Volume 14 Issue 1, 2023.

Abstract: Speech enhancement (SE) is an important method for improving speech quality and intelligibility in noisy environments, where the received speech is severely distorted by noise. An efficient speech enhancement system relies on accurately modelling the long-term dependencies of noisy speech. Deep learning has benefited greatly from transformers, in which multi-head attention (MHA) models long-term dependencies efficiently by using sequence similarity. Transformers frequently outperform recurrent neural network (RNN) and convolutional neural network (CNN) models in many tasks while also allowing parallel processing. In this paper, we propose a two-stage convolutional transformer for speech enhancement in the time domain. The transformer captures global information and supports parallel computation, which helps reduce long-term noise. Unlike the two-stage transformer neural network (TSTNN), the proposed model uses different transformer structures for the intra- and inter-transformers in order to extract both the local and the global features of noisy speech. Moreover, a CNN module is added to the transformer so that short-term noise can be reduced more effectively, exploiting the ability of CNNs to extract local information. The experimental findings demonstrate that the proposed model outperforms existing models in terms of short-time objective intelligibility (STOI) and perceptual evaluation of speech quality (PESQ).
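The two-stage idea described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the chunked input layout, the random projection weights standing in for learned ones, the fixed smoothing kernel, the placement of the convolution in the intra stage, and the function names `multi_head_attention`, `local_conv`, and `two_stage_block` are all assumptions made for illustration, and layer normalization and feed-forward layers are omitted. It only shows how attention within chunks (local), a convolution along time (short-term), and attention across chunks (global) might be combined.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, num_heads, rng):
    """Self-attention along axis 1. x has shape (batch, seq, dim)."""
    b, t, d = x.shape
    assert d % num_heads == 0
    dh = d // num_heads
    # Assumption: random projections stand in for learned weights
    wq, wk, wv, wo = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(4))

    def split(h):
        # (b, t, d) -> (b, heads, t, dh)
        return h.reshape(b, t, num_heads, dh).transpose(0, 2, 1, 3)

    q, k, v = split(x @ wq), split(x @ wk), split(x @ wv)
    # Pairwise similarity between all positions, scaled by sqrt(head dim)
    scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(dh)
    weights = softmax(scores, axis=-1)            # (b, heads, t, t)
    out = (weights @ v).transpose(0, 2, 1, 3).reshape(b, t, d)
    return out @ wo, weights

def local_conv(x, kernel):
    """Depthwise 1-D filter along axis 1 with zero 'same' padding (odd kernel)."""
    k = len(kernel)
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (0, 0)))
    return sum(kernel[i] * xp[:, i:i + x.shape[1], :] for i in range(k))

def two_stage_block(x, num_heads=4, seed=0):
    """x has shape (num_chunks, chunk_len, dim) (hypothetical layout)."""
    rng = np.random.default_rng(seed)
    kernel = np.array([0.25, 0.5, 0.25])          # assumed fixed kernel
    # Intra stage: attention inside each chunk (local features)
    intra, _ = multi_head_attention(x, num_heads, rng)
    x = x + intra
    # Convolution along time for short-term structure (assumed placement)
    x = x + local_conv(x, kernel)
    # Inter stage: attention across chunks at each position (global features)
    xt = x.transpose(1, 0, 2)                     # (chunk_len, num_chunks, dim)
    inter, _ = multi_head_attention(xt, num_heads, rng)
    return x + inter.transpose(1, 0, 2)

features = np.random.default_rng(1).standard_normal((6, 8, 16))
enhanced = two_stage_block(features)
print(enhanced.shape)
```

Swapping the chunk and time axes between the two stages is what lets the same attention routine act locally in the first stage and globally in the second; the residual additions keep the output shape equal to the input shape.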

Keywords: Convolutional neural network; recurrent neural network; speech enhancement; multi-head attention; two-stage convolutional transformer; feed-forward network

Chaitanya Jannu and Sunny Dayal Vanambathina, “Convolutional Transformer based Local and Global Feature Learning for Speech Enhancement,” International Journal of Advanced Computer Science and Applications (IJACSA), 14(1), 2023. http://dx.doi.org/10.14569/IJACSA.2023.0140181

@article{Jannu2023,
title = {Convolutional Transformer based Local and Global Feature Learning for Speech Enhancement},
journal = {International Journal of Advanced Computer Science and Applications},
doi = {10.14569/IJACSA.2023.0140181},
url = {http://dx.doi.org/10.14569/IJACSA.2023.0140181},
year = {2023},
publisher = {The Science and Information Organization},
volume = {14},
number = {1},
author = {Chaitanya Jannu and Sunny Dayal Vanambathina}
}

Copyright Statement: This is an open access article licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, even commercially, as long as the original work is properly cited.
