New Speech Enhancement based on Discrete Orthonormal Stockwell Transform

The S-transform (ST) is an effective time-frequency representation that, like the wavelet transform (WT), gives simultaneous time and frequency distribution information. However, the ST redundantly doubles the dimension of the original data set, and the Discrete Orthonormal S-Transform (DOST) reduces this redundancy further. This paper therefore proposes a new method to remove additive background noise from noisy speech signals using the DOST, which supplies a multi-resolution analysis (MRA) spatial-frequency representation used in image processing and signal analysis. The performance of the proposed speech enhancement technique is evaluated objectively and subjectively against other methods under four background noises at different SNR levels.

Keywords: MRA; Stockwell Transform; DOST; DWT; speech enhancement


I. INTRODUCTION
The distortion of signals by noise is a ubiquitous problem. Background noise degrades the intelligibility and quality of speech signals, causing a sharp drop in the performance of speech applications such as sound recording, telecommunications and teleconferencing. These applications need to reduce the noise and recover the clean signal from the noisy one. Speech enhancement is therefore one of the most important techniques in speech signal processing: it suppresses noise and improves the quality and intelligibility of speech communication.
Over the last decades, noise suppression in speech signals has been a very active research area in speech processing.
The literature contains many speech enhancement methods, such as those based on the Discrete Fourier Transform (DFT), the Discrete Cosine Transform (DCT), the Karhunen-Loeve Transform (KLT), Wiener filtering [2,3], Spectral Subtraction [1] and the Wavelet Transform (WT) [5][6][7][8][22][23][24][25][26]. All of these methods have advantages and drawbacks. In particular, although Spectral Subtraction [1]-[12] provides a tradeoff between speech distortion and residual noise, it suffers from a musical noise artifact that is perceptually annoying. The Wiener estimator has a moderate computational load, but it offers no mechanism to control the tradeoff between speech distortion and residual noise. Moreover, a major problem of Wiener filter based methods [2]-[3] is that they require the clean speech statistics for their implementation. Among the methods using time-frequency analysis, one approach to reducing the different types of noise that corrupt clean speech is the Discrete Wavelet Transform (DWT) [5]-[9], which is a superior alternative to analyses based on the Short Time Fourier Transform (STFT).
Even though the wavelet transform (WT) has dominated signal denoising for years and has become a powerful tool widely used in image processing and signal analysis, it supplies only scale information. Applications using the wavelet transform may therefore be limited when absolutely-referenced frequency and phase information are required [10].
The Stockwell Transform (ST), proposed by R. G. Stockwell in 1996 [11], is a time-frequency analysis method. The ST improves the time-frequency resolution of the Short Time Fourier Transform (STFT) and can be regarded as an extension or special case of the wavelet transform (WT) in the multi-resolution analysis domain. It gives a more precise relationship between the frequency and time distributions of the signal. The Stockwell Transform [11] is thus a hybrid of the STFT and the WT: it provides a time-frequency representation of a signal with a frequency-dependent resolution and shows great promise in various applications. However, the ST redundantly doubles the dimension of the original data set. Because of this redundancy, the ST is computationally expensive and even infeasible on some large data sets. To improve its computational efficiency, R. G. Stockwell proposed the Discrete Orthonormal S-Transform (DOST) in 2007 [11], which reduces the redundancy of the S-transform further and makes it practical and much more convenient for real-life use. The DOST is based on a set of orthonormal basis functions that localize the Fourier spectrum of the signal. It samples the time-frequency representation given by the ST with zero information redundancy and retains the advantageous phase properties of the ST. Although the DOST is fairly young compared to other transforms, it has proved useful in fields such as image compression [10][13][14][15], image restoration [12] and image texture analysis [16]. It has also been successfully applied in signal analysis to channel instantaneous frequency analysis.
Therefore, in order to preserve the useful information in a speech signal and eliminate as much noise as possible, we propose in this paper a new method for speech enhancement using the Discrete Orthonormal Stockwell Transform (DOST). The main objective of the proposed method is to decrease the speech distortion, increase the intelligibility of degraded speech signals and reduce listener fatigue. The method is compared with the Discrete Wavelet Transform and Spectral Subtraction by means of objective and subjective criteria. The results obtained indicate a good performance of the proposed method and show its potential for speech enhancement. This paper is organized as follows: Section 2 gives a brief introduction to the Stockwell transform, the DOST and wavelet theory. Section 3 explains the methodology of the proposed speech enhancement technique using DOST and DWT. Section 4 presents the objective and subjective performance measures used for speech enhancement and discusses the results. Finally, Section 5 concludes the paper with a discussion and an outline of our future work.

A. Stockwell Transform
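The Stockwell Transform, proposed in 1996 [11,17,18,19,20], gives a full time-frequency decomposition of a signal. The Stockwell transform (ST) of h(t) is defined as the Fourier transform (FT) of the product between a Gaussian window function and h(t):

$$S(\tau, f) = \int_{-\infty}^{\infty} h(t)\,\frac{|f|}{\sqrt{2\pi}}\,e^{-\frac{(\tau - t)^2 f^2}{2}}\,e^{-i 2\pi f t}\,dt$$

where f is the frequency, and t and τ are time variables. The Stockwell transform decomposes a signal into frequency (f) and temporal (τ) components. The relation between S(τ, f) and the Fourier transform of h(t) is expressed as:

$$S(\tau, f) = \int_{-\infty}^{\infty} H(\alpha + f)\,e^{-\frac{2\pi^2 \alpha^2}{f^2}}\,e^{i 2\pi \alpha \tau}\,d\alpha, \quad f \neq 0$$

where H(f) is the Fourier transform of h(t). Therefore, we can recover the original signal by using this relationship between the FT and the S-transform:

$$h(t) = \int_{-\infty}^{\infty} \left[\int_{-\infty}^{\infty} S(\tau, f)\,d\tau\right] e^{i 2\pi f t}\,df$$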
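As an illustration, the definition of the ST as the Fourier transform of a Gaussian-windowed signal can be discretized directly. The following Python sketch is not the implementation used in this work: it assumes unit sampling, evaluates only the frequency indices n = 0, ..., N/2, and uses the signal mean for the zero-frequency row. The function name `s_transform` is hypothetical.

```python
import cmath
import math


def s_transform(signal):
    """Direct discretization of the S-transform (assumption: unit sampling).

    Returns one row per frequency index n = 0..N/2, each row holding the
    N complex values S[j, n] for the time positions j = 0..N-1.
    """
    n_samples = len(signal)
    # Zero-frequency row: by convention, the mean of the signal.
    rows = [[sum(signal) / n_samples + 0j] * n_samples]
    for n in range(1, n_samples // 2 + 1):
        # Gaussian window whose width shrinks as the frequency index grows.
        norm = n / (n_samples * math.sqrt(2 * math.pi))
        row = []
        for j in range(n_samples):
            acc = 0j
            for k, value in enumerate(signal):
                window = norm * math.exp(-((j - k) ** 2) * n ** 2 / (2 * n_samples ** 2))
                acc += value * window * cmath.exp(-2j * math.pi * n * k / n_samples)
            row.append(acc)
        rows.append(row)
    return rows
```

For a length-N input the sketch returns (N/2 + 1) x N complex values, which makes the redundancy of the ST visible: far more coefficients than input samples.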

B. Discrete Orthonormal S-Transform
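The DOST is introduced as an orthonormal version of the ST. It can be defined as an inner product between a time series h[k] and the basis function d[k]. We use ν to specify the center of each frequency band, β to represent the bandwidth and τ to represent the location in time. Using these parameters, the k-th basis vector is expressed as:

$$d[k]_{[\nu,\beta,\tau]} = \frac{1}{\sqrt{\beta}} \sum_{f=\nu-\beta/2}^{\nu+\beta/2-1} e^{-i 2\pi \frac{k}{N} f}\, e^{i 2\pi \frac{\tau}{\beta} f}\, e^{-i \pi \tau}$$

For a signal of length N, the discrete ST generates N^2 coefficients, while the DOST can represent the same signal with only N coefficients. The DOST is therefore a non-redundant version of the ST. In order to calculate the DOST much faster, we introduce the FFT into the DOST:

$$S_{[\nu,\beta,\tau]} = \left\langle d[k]_{[\nu,\beta,\tau]},\, h[k] \right\rangle = \frac{1}{\sqrt{\beta}} \sum_{k=0}^{N-1} \sum_{f=\nu-\beta/2}^{\nu+\beta/2-1} e^{-i 2\pi \frac{k}{N} f}\, e^{i 2\pi \frac{\tau}{\beta} f}\, e^{-i \pi \tau}\, h[k]$$

Using the FT we can write:

$$S_{[\nu,\beta,\tau]} = \frac{1}{\sqrt{\beta}} \sum_{f=\nu-\beta/2}^{\nu+\beta/2-1} e^{-i \pi \tau}\, e^{i 2\pi \frac{\tau}{\beta} f}\, H[f]$$

where H[f] = \sum_{k=0}^{N-1} h[k]\, e^{-i 2\pi k f / N} is the discrete Fourier transform of h[k].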
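The FFT based computation of the DOST can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the signal length is a power of two, the spectrum is split into dyadic bands (mirrored for the negative frequencies), and the unit-modulus per-coefficient phase factors are omitted, which leaves coefficient magnitudes, energy and invertibility unchanged. The names `fft`, `dyadic_bands`, `dost` and `idost` are hypothetical, and the small recursive FFT is included only to keep the example self-contained.

```python
import cmath
import math


def fft(values, inverse=False):
    """Unitary radix-2 FFT (length must be a power of two)."""
    sign = 1 if inverse else -1

    def rec(v):
        m = len(v)
        if m == 1:
            return list(v)
        even, odd = rec(v[0::2]), rec(v[1::2])
        out = [0j] * m
        for k in range(m // 2):
            t = cmath.exp(sign * 2j * cmath.pi * k / m) * odd[k]
            out[k] = even[k] + t
            out[k + m // 2] = even[k] - t
        return out

    scale = 1 / math.sqrt(len(values))
    return [scale * c for c in rec(list(values))]


def dyadic_bands(n):
    """Band widths in natural FFT order: DC, 1, 2, ..., n/4, Nyquist, n/4, ..., 2, 1."""
    if n < 4 or n & (n - 1):
        raise ValueError("n must be a power of two and at least 4")
    positive = [1 << p for p in range(n.bit_length() - 2)]
    return [1] + positive + [1] + positive[::-1]


def _bandwise(values, inverse):
    """Apply a small unitary FFT (or inverse FFT) inside every dyadic band."""
    out, start = [], 0
    for width in dyadic_bands(len(values)):
        out.extend(fft(values[start:start + width], inverse=inverse))
        start += width
    return out


def dost(signal):
    """N samples in, N complex coefficients out (no redundancy)."""
    return _bandwise(fft(signal), inverse=True)


def idost(coeffs):
    """Inverse of `dost`; the result is complex with a negligible imaginary part."""
    return fft(_bandwise(coeffs, inverse=False), inverse=True)
```

Because every stage is unitary, the sketch preserves the signal energy and is exactly invertible, which is the property that thresholding in the DOST domain relies on.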

C. Discrete Wavelet Transforms (DWT)
The Discrete Wavelet Transform (DWT) is a powerful signal and image processing tool that has been used successfully in many scientific fields, such as signal processing and image compression. The DWT provides sufficient information for both analysis and synthesis while significantly reducing the computation time. It analyzes the signal in different frequency bands with different resolutions and decomposes it into a coarse approximation and detail information. The general form of the L-level DWT is expressed in terms of the L sets of detail coefficients d_j(k) and the L-th level approximation coefficients a_L(k) [9], using the mother wavelet and the scaling function, with an approximation and a detail defined at each level j.

The noisy speech signal is modeled as

$$y(n) = s(n) + w(n)$$

where s(n) is the clean speech, w(n) the noise and y(n) the noisy speech signal. Speech denoising with the proposed Discrete Orthonormal S-transform proceeds as follows:

• The input speech signal is divided into stationary frames, and the transformation method (DOST or DWT) is applied to each frame in order to extract the coefficients.
• After the transformation of the speech frame, denoising truncates the obtained coefficients that fall below a given threshold value. To truncate the small-valued coefficients, we calculate an appropriate threshold τ that retains the original signal and restricts the noise, and compare all the coefficients in the DOST domain with this threshold. Here the threshold value τ is manually adjusted and chosen such that 0 < τ < CvalMax, where CvalMax is the maximum value of the DOST or DWT coefficients. Soft thresholding is carried out on the DOST or DWT coefficients before reconstructing the signal. In soft thresholding, the elements whose absolute values are lower than the threshold are first set to zero; the nonzero coefficients are then shrunk towards 0:

$$\hat{X} = \begin{cases} \operatorname{sign}(X)\,(|X| - \tau), & |X| \ge \tau \\ 0, & |X| < \tau \end{cases}$$

where X represents the DOST or DWT coefficients, \hat{X} the thresholded coefficients and τ the threshold value.
• Finally, the inverse DOST or inverse DWT is applied to obtain the denoised speech signal.
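The three steps (transform each frame, soft-threshold the coefficients, apply the inverse transform) can be sketched in Python as follows. This is an illustration under assumptions rather than the implementation evaluated in this paper: frames are non-overlapping, the last frame is zero-padded, τ is supplied by the caller, and a one-level orthonormal Haar transform stands in for the DOST or the five-level db10 DWT. All function names are hypothetical.

```python
import math


def soft_threshold(coeffs, tau):
    """Zero coefficients with magnitude <= tau and shrink the rest by tau.

    Works for real (DWT) and complex (DOST) coefficients: only the
    magnitude is reduced, the sign or phase is kept.
    """
    out = []
    for c in coeffs:
        mag = abs(c)
        out.append(c * (mag - tau) / mag if mag > tau else 0 * c)
    return out


def haar_forward(frame):
    """One-level orthonormal Haar DWT (stand-in transform, even frame length)."""
    s = math.sqrt(0.5)
    approx = [s * (frame[i] + frame[i + 1]) for i in range(0, len(frame), 2)]
    detail = [s * (frame[i] - frame[i + 1]) for i in range(0, len(frame), 2)]
    return approx + detail


def haar_inverse(coeffs):
    """Inverse of `haar_forward`."""
    half = len(coeffs) // 2
    s = math.sqrt(0.5)
    frame = []
    for a, d in zip(coeffs[:half], coeffs[half:]):
        frame += [s * (a + d), s * (a - d)]
    return frame


def denoise(noisy, tau, frame_len=256, forward=haar_forward, inverse=haar_inverse):
    """Frame the signal, threshold each frame's coefficients, and resynthesize."""
    padded = list(noisy) + [0.0] * (-len(noisy) % frame_len)
    out = []
    for start in range(0, len(padded), frame_len):
        coeffs = forward(padded[start:start + frame_len])
        out.extend(inverse(soft_threshold(coeffs, tau)))
    return out[:len(noisy)]
```

Passing a different `forward`/`inverse` pair swaps the transform without touching the thresholding step, which mirrors how the same procedure is applied to both DOST and DWT coefficients.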

IV. RESULTS AND EVALUATION
This section presents the experimental results of the proposed speech enhancement method at SNR levels from -5 to 15 dB. The speech signals, taken from the TIMIT Acoustic-Phonetic Continuous Speech Corpus [21], were used to evaluate the proposed algorithm. For this purpose, a clean speech signal sampled at 16 kHz and spoken by a female speaker is used. To illustrate the performance of the proposed enhancement technique, several tests were run under different noise conditions taken from the Noisex-92 database: white Gaussian noise, F16 cockpit noise, Volvo car noise and pink noise, at input Signal to Noise Ratio (SNR) values from -5 dB to 15 dB.
In order to evaluate the denoising performance of the DOST method and to compare it with DWT denoising and Spectral Subtraction, a number of objective tests used for evaluating speech enhancement techniques are presented in this study. The proposed method is then evaluated subjectively by means of informal listening tests in order to relate the objective metrics to the subjective sound quality.
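Corrupting a clean signal at a prescribed input SNR can be sketched as follows. The mixing procedure is not described in the text, so this Python sketch assumes the common approach of scaling the noise so that the ratio of average signal power to average noise power matches the target. The helper `mix_at_snr`, the synthetic signals and the 5 dB step are assumptions for illustration only.

```python
import math
import random


def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so that clean + noise has the requested input SNR in dB."""
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as the clean signal")
    noise = noise[:len(clean)]
    clean_power = sum(s * s for s in clean) / len(clean)
    noise_power = sum(w * w for w in noise) / len(noise)
    gain = math.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return [s + gain * w for s, w in zip(clean, noise)]


# Synthetic stand-ins for a TIMIT utterance and a Noisex-92 recording.
random.seed(0)
clean = [math.sin(2 * math.pi * 200 * n / 16000) for n in range(1600)]
white = [random.gauss(0.0, 1.0) for _ in range(1600)]

# Assumed grid of input SNRs covering the -5 dB to 15 dB range.
noisy_by_snr = {snr: mix_at_snr(clean, white, snr) for snr in (-5, 0, 5, 10, 15)}
```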

A. Objective evaluation
Objective measures [28] are based on mathematical comparison between the original and processed speech signals.
The signal-to-noise ratio (SNR) is one of the most extensively used measures. As the name suggests, it is computed as the ratio of the signal power to the noise power, in decibels:
-Signal-to-Noise Ratio (SNR)

$$\mathrm{SNR} = 10 \log_{10}\left[\frac{\sum_{n} s^2(n)}{\sum_{n} \left(s(n) - \hat{s}(n)\right)^2}\right] \quad (11)$$

where s and ŝ are respectively the clean and the enhanced speech signals.
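The output SNR measure can be sketched in Python as follows, assuming the clean and enhanced signals are time-aligned sequences of equal length. The helper name `snr_db` is hypothetical.

```python
import math


def snr_db(clean, enhanced):
    """Ratio of clean-signal energy to residual-error energy, in decibels."""
    signal_energy = sum(s * s for s in clean)
    error_energy = sum((s - e) ** 2 for s, e in zip(clean, enhanced))
    return 10 * math.log10(signal_energy / error_energy)
```

For example, an enhanced signal whose samples are all 10% smaller than the clean ones leaves an error energy of 1% of the signal energy, so the measure is 20 dB.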

-Peak Signal to Noise Ratio (PSNR)
$$\mathrm{PSNR} = 10 \log_{10}\left[\frac{N S^2}{\lVert s - \hat{s} \rVert^2}\right] \quad (12)$$

where N is the length of the reconstructed signal, S is the maximum absolute square value of the signal s and ‖s − ŝ‖² is the energy of the difference between the original and reconstructed signals.
-Normalized Root Mean Square Error (NRMSE)

$$\mathrm{NRMSE} = \sqrt{\frac{\sum_{n} \left(s(n) - s'(n)\right)^2}{\sum_{n} \left(s(n) - \mu_s(n)\right)^2}}$$

Here, s(n) is the speech signal, s'(n) is the reconstructed speech signal and μ_s(n) is the mean of the speech signal.

-Perceptual Evaluation of Speech Quality (PESQ)
PESQ [27] is an objective quality measure approved as ITU-T Recommendation P.862. It is an objective measurement tool designed to predict the results of a subjective Mean Opinion Score (MOS) test. In particular, PESQ was developed to model the subjective tests commonly used to assess voice quality as perceived by human listeners.
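The PSNR and NRMSE measures can be sketched in Python as follows. Two interpretations are assumed here and are not confirmed by the text: the PSNR peak term is taken as the square of the largest absolute sample of the clean signal, multiplied by the signal length N, and the NRMSE normalizes the error energy by the energy of the clean signal's deviation from its mean. The helper names are hypothetical.

```python
import math


def psnr_db(clean, reconstructed):
    """Peak signal-to-noise ratio in decibels (assumed peak: max |s(n)|)."""
    peak = max(abs(s) for s in clean)
    error_energy = sum((s - r) ** 2 for s, r in zip(clean, reconstructed))
    return 10 * math.log10(len(clean) * peak ** 2 / error_energy)


def nrmse(clean, reconstructed):
    """Root of the error energy normalized by the clean signal's variance energy."""
    mean = sum(clean) / len(clean)
    error_energy = sum((s - r) ** 2 for s, r in zip(clean, reconstructed))
    spread_energy = sum((s - mean) ** 2 for s in clean)
    return math.sqrt(error_energy / spread_energy)
```

A higher PSNR and a lower NRMSE both indicate a reconstruction closer to the clean signal; an NRMSE of 0 corresponds to perfect reconstruction.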

B. Results
Several experiments using the TIMIT database were carried out to evaluate the performance of the proposed method and to compare it with DWT based speech enhancement methods [22], [23], [25], [26] and with Spectral Subtraction [1]. For comparison purposes, the DWT algorithm given in [6], [7], [8] and [9] was implemented with the "db10" mother wavelet, five decomposition levels and soft thresholding [24]. This part of the paper reports the results of the SNR, PSNR, NRMSE and PESQ computations. These results are obtained by applying the proposed speech enhancement technique, the Discrete Wavelet Transform and Spectral Subtraction to a number of noisy speech signals, which are produced by corrupting the original signals with different types of noise (White, Pink, Volvo and F16) at different SNR values (-5 dB to 15 dB). The comparative study between the proposed DOST based system and DWT denoising indicates that the proposed system outperforms the DWT; the experimental results are shown in the tables and figures below. The output SNR is calculated for different input SNRs, and the results are shown in Table I and Fig. 2.

Fig. 2. Comparison of Output SNR for DOST, DWT and Spectral Subtraction
Table I and Fig. 2 show that the three denoising techniques improve the signal-to-noise ratio (SNR). The results also show that the DOST based denoising technique performs better than DWT denoising: the DOST improves the output SNR at input SNRs of -5 dB and 0 dB under the various noise conditions. Spectral Subtraction can outperform both transform based methods, depending on the noise type: for white noise the DOST denoising method is more efficient, whereas for Volvo noise Spectral Subtraction seems more suitable and reliable. From these figures, it can be seen that the DOST method can greatly improve the SNR with less distortion of the original speech signal. Under Gaussian noise, as the input SNR goes from -5 dB to 15 dB, the output Peak Signal to Noise Ratio (PSNR) values of the DOST based system exceed those of the DWT and Spectral Subtraction. This demonstrates a significant improvement in signal quality and the power of the DOST. However, in the case of Volvo noise, Spectral Subtraction has the best PSNR.

Fig. 9 presents the PESQ scores for the noisy signal and the enhanced signals when the speech is degraded by Pink noise. The PESQ scores of the proposed algorithm are higher than those of the other methods. Fig. 10 shows the PESQ scores for the noisy signal and the signals enhanced using DWT and DOST over all noise conditions (F16, Pink, White, Volvo) at 5 dB SNR. The proposed method achieves the highest PESQ scores, showing that the speech enhanced by our method has a better perceived quality. Hence, the high PESQ scores reflect the perceived quality of the enhanced speech.

C. Subjective evaluation
In order to evaluate the enhancement quality of the noisy speech, we used the Perceptual Evaluation of Speech Quality (PESQ) score, which is a mean opinion score showing high correlation with subjective listening tests. It ranges from 1 to 5; a higher PESQ score indicates a higher perceptual quality and lower speech distortion. We also conducted an informal listening test in which a group of 10 listeners (six women and four men) perceptually evaluated 3 enhanced speech signals from the NOIZEUS database with three background noises (White, F16 and Pink) at three SNR levels (0, 5 and 10 dB). The listeners used the MOS (Mean Opinion Score) method to rate the residual noise characteristics of the enhanced speech (1: Bad, 2: Poor, 3: Fair, 4: Good, 5: Excellent). Fig. 7 below shows the statistical results of the subjective evaluation for the 3 speech enhancement methods. The test results show a favorable improvement in auditory quality with our proposed speech enhancement method: our approach provides a speech signal containing less musical noise while preserving the speech quality.
The listeners found that the subjective sound quality of the proposed DOST denoising method has the highest correlation with the objective evaluation, in comparison with DWT and Spectral Subtraction, under various noise conditions and at different SNR levels.

V. CONCLUSION
In this paper, a new method for speech enhancement using the Discrete Orthonormal Stockwell Transform has been presented. The proposed technique is evaluated by comparing it with the speech enhancement technique based on DWT and the technique based on Spectral Subtraction. Both objective and subjective methods are used to evaluate the performance of the DOST in speech denoising. The objective evaluation relies on four criteria: SNR, PSNR, NRMSE and PESQ. In this evaluation, a speech signal from a female speaker in the TIMIT database is corrupted by different types of noise (Pink, White, F16 and Volvo) at input SNR levels ranging from -5 dB to 15 dB. Simulation results show that the proposed method provides higher output SNR, higher output PSNR, higher PESQ scores and lower NRMSE values than the DWT and Spectral Subtraction based denoising methods. Informal listening tests also support the efficiency of the proposed method, which yields better enhanced speech than the other methods.
In future work, we will integrate the proposed speech enhancement method with speech recognition systems in order to increase their recognition rate in noisy environments.
Where the two filter sequences are known as wavelet filters.

III. NEW PROPOSED SPEECH ENHANCEMENT METHOD
In this research work, the transform based speech enhancement algorithm is performed in three commonly used steps: applying the transformation (DOST or DWT), truncating the coefficients (thresholding) and applying the inverse transformation (IDOST or IDWT) to reconstruct the denoised signal (Figure 2).

Fig. 1. Block diagram of the suggested method

Fig. 3. Evaluating the proposed system based on the PSNR measure

Fig. 8. Performance comparison of PESQ scores for different methods in the presence of Noise

Fig. 10. Performance comparison of PESQ scores for different methods at 5 dB SNR in the presence of various noises

Fig. 8 shows the PESQ scores for the noisy signal and the enhanced signals when the speech is degraded by white noise. As the figure shows, the PESQ scores of the proposed algorithm are better than those given by the algorithms based on Spectral Subtraction and DWT.

Fig. 11. Subjective evaluation of different speech enhancement methods