Performance Evaluation of Loss Functions for Margin Based Robust Speech Recognition

Margin-based model estimation methods are applied in speech recognition to enhance the generalization capability of acoustic models by increasing the margin. An important aspect of margin-based parameter estimation for acoustic models is that the models are derived from the soft margin concept, with the hinge loss function used in SVMs serving as the loss function to attain enhanced speech recognition performance. In this study, the performance of three loss functions (logistic, savage, sigmoid) has been evaluated in the presence of white noise, pink noise, and brown noise, with and without SVM classifiers, to analyze the impact of noise on these loss functions in comparison with the hinge loss function used in SVMs for parameter estimation in margin-based acoustic models. Experimental results show that, with the hinge loss function, pink noise and white noise have significant effects on isolated digits (0-9) in both pre-conditioned and recorded data samples in comparison with brown noise. Moreover, the hinge loss function shows serious anomalies relative to the savage and sigmoid losses in terms of performance, and the sigmoid loss function provides exceptionally good results in terms of percentage error for all prescribed conditions.


I. INTRODUCTION
The prime goal of pattern recognition is to find the parameters of recognizers or classifiers that decrease the error rate, using the available training data samples. To build an effective pattern recognizer or classifier, machine learning offers two categories of learning algorithms: generative model learning and discriminative model learning. Maximum likelihood estimation (MLE) is considered a generative, or non-discriminative, learning approach, which focuses on modeling the data distribution instead of directly modeling class boundaries. In contrast, the discriminative learning approach learns the parameters of the joint probability model discriminatively so as to minimize the recognition/classification error [1]. The main idea behind discriminative training (DT) is to introduce a discriminative criterion into the training of hidden Markov models (HMMs). Several discriminative training methods have been proposed for ASR, such as maximum mutual information estimation (MMIE) [2,3,4], minimum classification error (MCE) [5,6,7], and minimum word/phone error (MWE/MPE) [8,9]. For HMM-based speech recognition, conventional discriminative training criteria directly minimize the empirical risk on the training data and do not focus on model generalization. In other words, discriminative training criteria aim to minimize the classification error on the training samples during model estimation, but they do not explicitly improve the generalization capability of the acoustic model on new, unseen test data [10]. Generalization capability is the ability to translate gains on the training set into gains on the test set. In past studies, discriminative training achieved this generalization ability by optimizing a smoothed empirical error rate on the training data [11]. Recently, many studies have incorporated margins (the distance between the decision boundary and well-classified data samples) into discriminative training methods [12,13,14,15,16,17] to further enhance generalization capability.

The generalization problem of learning classifiers has been studied in the field of machine learning [18,19], where statistical learning theory has been used for the last three decades to provide a framework for studying the inference problem, that is, making predictions, gaining knowledge, constructing models and making decisions from a set of data samples [20]. From the statistical learning [18] point of view, a test risk bound is defined as the sum of an empirical risk (i.e., the training set risk) and a generalization function. The generalization function is often used to measure the possible mismatch between training and testing environments. ASR researchers at York University proposed large margin estimation (LME) for speech recognition based on the principle of large margins. LME [10,12] and its variant, large relative margin estimation (LRME) [21], of HMMs have been proposed to enhance the separation margin. The crux of LME and LRME is that only correctly classified data samples take part in updating the models, although misclassified data samples are also important for classifier learning. To address this issue in LME and LRME, an extension of LRME [86] was proposed that considers all the training data samples, in particular moving misclassified samples toward the correct side of the decision boundary. Another margin-based approach, soft margin estimation (SME), was proposed by J. Li et al. [22] at Georgia Tech, based on the idea of the soft margin in support vector machines [23], to enhance the generalization capability of learning classifiers. SME performs well compared with MLE and conventional discriminative criteria, and it is consistently better than LME owing to its well-defined separation (misclassification) measure and its well-optimized objective function for generalization [24].

www.ijacsa.thesai.org

In contrast with LME, SME makes use of both misclassified and correctly classified data samples to update the models, and the performance of SME can be improved when the distributions of the testing and training data samples are quite similar [25]. Two considerable issues related to the hinge loss function in SME have been identified in [26]: 1) the hinge loss function performs well only when the noise in the training samples is insignificant, and 2) any misclassified training sample directly affects the time required for optimization and determines the label of the test sample. To overcome this limitation of SME, X. Xiao et al. [27] proposed a feature-domain method based on mean and variance normalization (MVN) [28], which showed that SME performs well with feature-domain methods and reduces the mismatch between training and testing data; they suggested combining the SME framework with other noise compensation methods, e.g., model adaptation methods, in future research. To address issues related to the hinge loss function, a geometric margin MCE criterion within the soft margin estimation framework, based on the sigmoid loss function, was presented to strengthen robustness by increasing the geometric margin of the acoustic model [29]. Loss functions such as the ramp loss and the 0-1 loss also showed noise tolerance comparable to that of the sigmoid loss function [30]. In this paper, demonstrative experiments have been performed to observe the behavior of three loss functions (logistic, savage, sigmoid) in the presence of white noise, pink noise, and brown noise, with and without SVM (soft margin) classifiers, in comparison with the hinge loss function, for pre-conditioned digits and recorded digits (0-9) taken from the environment.
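To make the contrast between LME and SME concrete, the minimal Python sketch below lists which training utterances each criterion would update on, given precomputed separation measures d_i (positive when an utterance is correctly classified). The LME support set follows the commonly used definition of correctly classified tokens lying within a preset distance gamma of the decision boundary, and the SME set contains every utterance whose separation falls below the soft margin rho; the function names and toy values are hypothetical illustrations, not the authors' code.

```python
from typing import List


def lme_support_set(d: List[float], gamma: float) -> List[int]:
    """LME-style support set (assumed definition): correctly classified
    utterances (d >= 0) that lie within gamma of the decision boundary."""
    return [i for i, di in enumerate(d) if 0.0 <= di <= gamma]


def sme_support_set(d: List[float], rho: float) -> List[int]:
    """SME-style set: every utterance whose separation is below the soft
    margin rho contributes a loss term, misclassified ones (d < 0) included."""
    return [i for i, di in enumerate(d) if di < rho]


# Hypothetical separation measures for five training utterances.
separations = [-0.5, 0.2, 0.8, 1.5, -2.0]
print(lme_support_set(separations, gamma=1.0))  # [1, 2]
print(sme_support_set(separations, rho=1.0))    # [0, 1, 2, 4]
```

The two misclassified utterances (indices 0 and 4) are ignored by the LME-style set but included by the SME-style set, which is the difference the paragraph above describes.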
The rest of the paper is organized as follows. Section II discusses loss functions for soft margin estimation (SME), including the sigmoid, savage and logistic loss functions. Section III defines the data collection and recording specifications. The experimental results and discussion for pre-conditioned and recorded digits are presented in Section IV. Finally, conclusions are drawn in Section V.

II. LOSS FUNCTION FOR SOFT MARGIN ESTIMATION (SME)
The objective of recognition and classification systems is to minimize the classification risk on testing data samples by developing a classifier $f$. The concept behind risk minimization is to measure the performance of an estimator by its risk. To select the best estimator function, we need a measure of inconsistency between the estimated classification $f(x)$ and the true classification $Y(x)$ of $x$. The performance of the classifier can be measured using a loss function $L(Y(x), f(x))$, and the risk of the estimator is the true or expected value of this loss:

$$R(f) = \int L\big(Y(x), f(x)\big)\, dP(x, Y)$$

where $P(x, Y)$ is the joint distribution of the data and their labels. We need to find the function $f$ that minimizes $R(f)$, but $P(x, Y)$ is unknown. Soft margin estimation (SME) [13,22] is a margin-based model estimator applied in speech recognition with the objectives of enhancing generalization capability by increasing the margin and of enhancing decision-feedback learning by improving the separation measures of the model in classifier design. In statistical learning theory, the test risk is bounded by the sum of two terms: a generalization function and the empirical risk (i.e., the risk on the training set) [18]. The generalization term is a monotonically increasing function of the model's VC dimension, which measures model complexity and is itself bounded by a decreasing function of the margin [18]. SME combines these two optimization targets in a single objective function:

$$L_{SME}(\Lambda) = \frac{\lambda}{\rho} + \frac{1}{N}\sum_{i=1}^{N} \ell(O_i, \Lambda)$$

where $\Lambda$ represents the set of HMM parameters, $\ell(O_i, \Lambda)$ is the loss function for utterance $O_i$, and $N$ is the total number of training utterances. The coefficient $\lambda$ balances empirical risk minimization against margin maximization, and $\rho$ is the soft margin. In the margin-based acoustic model derived from the soft margin concept, the hinge loss function used in SVMs is adopted as the loss function to attain enhanced speech recognition performance. However, the hinge loss function does not perform well in the presence of a significant amount of noise. This experimental setup therefore evaluates the performance of the hinge loss function in the presence of noise, in comparison with three other loss functions, with and without an SVM classifier, for pre-conditioned isolated digits and digits taken from a real environment. The hinge loss function used in SVM can be defined as [22]:

$$\ell(O_i, \Lambda) = \max\big(0,\ \rho - d(O_i, \Lambda)\big) \qquad (6)$$

where $d(O_i, \Lambda)$ represents the separation measure of utterance $O_i$ for soft margin estimation, and a positive-valued parameter relates to the smoothness of the loss function. Similarly, the savage loss [31], standard sigmoid loss and logistic loss functions can be written as (7), (8) and (9), respectively.
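The four losses compared in this paper, and the SME objective (lambda/rho plus the average per-utterance loss), can be sketched in Python as functions of the separation measure d and the soft margin rho. The hinge form follows the SME definition; the savage, sigmoid and logistic forms are commonly used textbook definitions applied to the shifted margin z = d - rho, and are assumptions that may not match the exact parameterization the authors used. The smoothness parameter `gamma` in the sigmoid loss is likewise assumed.

```python
import math


def _inv_one_plus_exp(t: float) -> float:
    """Return 1 / (1 + exp(t)) without overflow for large |t|."""
    if t >= 0:
        e = math.exp(-t)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(t))


def hinge_loss(d: float, rho: float) -> float:
    # SVM/SME hinge: zero once the separation d reaches the soft margin rho.
    return max(0.0, rho - d)


def sigmoid_loss(d: float, rho: float, gamma: float = 1.0) -> float:
    # Bounded, smooth surrogate; gamma (assumed) controls the slope.
    return _inv_one_plus_exp(gamma * (d - rho))


def savage_loss(d: float, rho: float) -> float:
    # Savage loss 1 / (1 + exp(2z))^2 with z = d - rho; bounded in (0, 1).
    return _inv_one_plus_exp(2.0 * (d - rho)) ** 2


def logistic_loss(d: float, rho: float) -> float:
    # log(1 + exp(-z)) with z = d - rho, computed stably for both signs of z.
    z = d - rho
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


def sme_objective(separations, rho, lam, loss=hinge_loss):
    """lam / rho + (1/N) * sum_i loss(d_i, rho): the SME trade-off between
    margin maximization and empirical risk."""
    n = len(separations)
    return lam / rho + sum(loss(d, rho) for d in separations) / n


# A badly misclassified (e.g. noise-corrupted) utterance with d = -10, rho = 1:
for fn in (hinge_loss, logistic_loss, sigmoid_loss, savage_loss):
    print(fn.__name__, round(fn(-10.0, 1.0), 4))
```

For the outlier in the usage lines, the hinge and logistic losses grow roughly linearly (about 11), while the sigmoid and savage losses saturate just below 1. This bounded behavior is the usual argument for why sigmoid-type losses tolerate noisy training samples better than the hinge loss.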

III. DATA COLLECTION AND RECORDING SPECIFICATIONS
The experimental methodology includes the collection of data comprising two individual sets: the TI-Digits (0-9) standard isolated digit corpus and digits recorded in a real environment. For the recorded isolated digit corpus, a standardized procedure based on ITU recommendations for speech corpus development was adopted. A standard recording environment with a signal-to-noise ratio (SNR) greater than or equal to 45 dB was used. The Microsoft Windows 7 built-in Sound Recorder was used to record 10 utterances of each isolated digit (0-9). The recording format is mono, 32-bit PCM, with a sampling rate of 8000 Hz, using a microphone with an impedance of 32 Ω, a maximum input power of 40 mW, a 30 mm drive unit, a 3.5 mm plug, and a frequency response of 20 Hz to 20 kHz. The microphone with this configuration was used to record digits 0 to 9 in a noise-free recording studio environment. Afterwards, white noise, brown noise and pink noise were mixed with both data sets using the Audacity software. The purpose of adding noise to the isolated TI-Digit and recorded digit samples is to study the behavior of each digit under the prescribed conditions, with and without white, brown and pink noise. A number of experiments were performed, evaluating the cepstrum coefficient values of each digit in clean and noisy conditions, and four graphs of each digit were analyzed separately for both the recorded and isolated TI-Digit databases.
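The noise-mixing step can be sketched in pure Python as below. This is an illustration under stated assumptions, not the Audacity procedure the authors used: brown noise is approximated by a mean-removed random walk (power falling roughly as 1/f^2), pink noise by the Voss-McCartney row-summing method (roughly 1/f), and the noise is scaled to a chosen SNR. The paper does not state the mixing SNR, so the 10 dB value and the synthetic 440 Hz tone standing in for a digit are hypothetical.

```python
import math
import random


def white_noise(n, rng):
    # Independent Gaussian samples: flat power spectrum.
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def brown_noise(n, rng):
    # Running sum of white noise (random walk): power falls off roughly as 1/f^2.
    out, acc = [], 0.0
    for w in white_noise(n, rng):
        acc += w
        out.append(acc)
    mean = sum(out) / n
    return [x - mean for x in out]  # remove the DC offset of the walk


def pink_noise(n, rng, rows=16):
    # Voss-McCartney approximation: row k is redrawn every 2**k samples,
    # so the sum of the rows has a roughly 1/f power spectrum.
    values = [rng.gauss(0.0, 1.0) for _ in range(rows)]
    out = []
    for i in range(n):
        for k in range(rows):
            if i % (1 << k) == 0:
                values[k] = rng.gauss(0.0, 1.0)
        out.append(sum(values))
    return out


def power(x):
    return sum(v * v for v in x) / len(x)


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that 10*log10(P_speech / P_noise) equals snr_db, then add."""
    noise = noise[:len(speech)]
    scale = math.sqrt(power(speech) / (power(noise) * 10 ** (snr_db / 10.0)))
    return [s + scale * v for s, v in zip(speech, noise)]


# Usage: one second of a synthetic tone at the 8000 Hz recording rate.
fs = 8000
rng = random.Random(0)
speech = [0.5 * math.sin(2 * math.pi * 440 * t / fs) for t in range(fs)]
noisy = {name: mix_at_snr(speech, gen(fs, rng), snr_db=10.0)
         for name, gen in [("white", white_noise), ("pink", pink_noise),
                           ("brown", brown_noise)]}
```

Scaling by measured signal and noise power keeps the three noise colors at the same SNR, so differences in recognition behavior can be attributed to the noise spectrum rather than its level.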

IV. EXPERIMENTAL RESULTS AND DISCUSSION
In this section, we present demonstrative experiments showing the performance of four loss functions (sigmoid, hinge, savage and logistic) in the presence of noise, with and without SVM (soft margin) classifiers, to observe their behavior with pre-conditioned and recorded data samples. The data samples used in the experiments consist of isolated digits taken from the TI-Digit corpus [33] and recorded digits taken from a real environment, evaluated with and without an SVM (soft margin) classifier in the presence of three types of noise (white, brown and pink) taken from the NOISEX-92 noise-in-speech database [32,34]. The experimental framework was divided into two phases, and each phase used two data sets: isolated TI-Digits (0-9) and samples of recorded digits taken from a real environment. The first stage shows the results for the pre-recorded isolated TI-Digits (0-9) without a classifier and with the addition of the three noises. The second stage contains the same results with the SVM classifier. In the next phase of the experiment, we studied the behavior of the different loss functions (hinge, sigmoid, savage and logistic) individually for the data samples of each set in the presence of white, brown and pink noise, with and without SVM (soft margin) classifiers. The loss function comparative analysis charts obtained from the experiments clearly show that the behavior of the loss functions changes significantly between clean and noisy conditions. Furthermore, we present a graphical analysis of the different loss functions with and without noise to observe the behavior of the hinge, sigmoid, savage and logistic loss functions for both the isolated TI-Digit and recorded digit sets. The SVM (soft margin) classifier took the loss function values as input for each data set separately and, based on the results obtained in the previous steps, classified the clean samples against the noisy samples. For the graphical representation of the experimental results in the following sections, digit "one" was selected, with and without SVM (soft margin) classifiers, in the presence of white, brown and pink noise, for both the isolated TI-Digit and recorded data samples. The results were generated from code implemented and run in MATLAB version 10.0 with a speech processing toolbox.
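The clean-versus-noisy classification step can be sketched as a linear soft-margin SVM that takes per-utterance loss values as features. The authors worked in MATLAB and do not state their solver; the pure-Python sketch below swaps in Pegasos-style stochastic subgradient descent on the primal soft-margin objective, standardizes the features, and learns the bias as an extra weight on a constant feature. The feature values, labels (clean = -1, noisy = +1), `lam` and `epochs` are hypothetical.

```python
import math
import random


def _standardize_params(X):
    dim = len(X[0])
    means = [sum(row[j] for row in X) / len(X) for j in range(dim)]
    stds = []
    for j in range(dim):
        var = sum((row[j] - means[j]) ** 2 for row in X) / len(X)
        stds.append(math.sqrt(var) or 1.0)  # guard against constant features
    return means, stds


def _prep(x, means, stds):
    # Standardize, then append a constant 1 so the bias is learned as a weight.
    return [(xj - m) / s for xj, m, s in zip(x, means, stds)] + [1.0]


def train_soft_margin_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style stochastic subgradient descent on
    lam/2 * ||w||^2 + (1/n) * sum_i max(0, 1 - y_i * <w, x_i>)."""
    means, stds = _standardize_params(X)
    data = [_prep(x, means, stds) for x in X]
    w = [0.0] * len(data[0])
    rng = random.Random(seed)
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularizer step
            if margin < 1.0:  # hinge term active: move w toward this sample
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w, means, stds


def predict(model, x):
    w, means, stds = model
    score = sum(wj * xj for wj, xj in zip(w, _prep(x, means, stds)))
    return 1 if score >= 0 else -1


# Hypothetical loss values: low for clean (-1) and high for noisy (+1) samples.
X = [[0.10], [0.15], [0.20], [0.25], [0.80], [0.85], [0.90], [0.95]]
y = [-1, -1, -1, -1, 1, 1, 1, 1]
model = train_soft_margin_svm(X, y)
print([predict(model, x) for x in X])
```

Pegasos is used here only because it fits in a few self-contained lines; a production setup would more likely call an established SVM library.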

A. Graphical Representation and Loss Functions Comparative Analysis of Isolated TI-Digit Without and With SVM Classifiers
To evaluate the performance of the data sets in clean and noisy conditions, the cepstrum of each digit in the data set, with and without noise, was computed to determine the peak values of the cepstrum coefficients. These cepstrum coefficients were used to distinguish clean digits from noisy data samples. Isolated TI-Digit "1" was selected for the graphical representation of the experimental results in this and later sections, to illustrate the behavior of the different loss functions under each condition. In the graphs, the blue line/curve plots the loss function value without noise and the red line/curve plots the loss function value with noise. The gap between the two lines/curves obtained from a given loss function indicates the error between that loss function's values with and without noise. A larger gap between the two lines/curves means a higher error, which reflects poor performance in the presence of noise.
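The two measurements described above can be sketched in pure Python. The real cepstrum is computed with its standard definition (inverse DFT of the log magnitude spectrum) using a naive O(N^2) DFT, and the cepstral peak is searched in a chosen quefrency range. The paper does not give a formula for the gap between the blue and red curves, so `curve_gap_percent` is a hypothetical measure (mean absolute gap relative to the mean clean loss value), and the echo test signal is purely illustrative.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (adequate for short frames)."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = [sum(x[k] * cmath.exp(sign * 2j * math.pi * f * k / n) for k in range(n))
           for f in range(n)]
    return [v / n for v in out] if inverse else out


def real_cepstrum(frame):
    """Real cepstrum: inverse DFT of the log magnitude spectrum."""
    log_mag = [math.log(abs(v) + 1e-12) for v in dft(frame)]  # epsilon avoids log(0)
    return [v.real for v in dft(log_mag, inverse=True)]


def cepstral_peak(ceps, lo, hi):
    """Quefrency index and value of the largest coefficient in [lo, hi)."""
    k = max(range(lo, hi), key=lambda i: ceps[i])
    return k, ceps[k]


def curve_gap_percent(clean_curve, noisy_curve):
    """Hypothetical error measure: mean |clean - noisy| gap expressed as a
    percentage of the mean |clean| loss value."""
    n = len(clean_curve)
    gap = sum(abs(a - b) for a, b in zip(clean_curve, noisy_curve)) / n
    ref = sum(abs(a) for a in clean_curve) / n
    return 100.0 * gap / ref


# A pulse plus an echo 10 samples later gives a cepstral peak at quefrency 10.
frame = [0.0] * 64
frame[0], frame[10] = 1.0, 0.5
print(cepstral_peak(real_cepstrum(frame), 2, 32))  # index 10, value about 0.25
```

A larger `curve_gap_percent` corresponds to a wider gap between the clean and noisy curves, i.e. a loss function whose values are more disturbed by the added noise.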
The red and blue lines/curves in the figures clearly show the behavior of the different loss functions under the three noises. The gap between the curves is smaller for the sigmoid and savage losses than for the hinge and logistic losses. The loss function comparative analysis for all isolated digits (0-9) in Tables 1-3 indicates that the savage and sigmoid loss functions perform well in comparison with the hinge and logistic loss functions, apart from some anomalies. Similarly, the performance of the loss functions was evaluated with isolated TI-Digits in the presence of white, brown and pink noise using SVM (soft margin) classifiers. The SVM classifiers were used to separate clean TI-Digits from noisy TI-Digits, as shown in Fig. 5 and Fig. 6 for digit 1 with white and brown noise, respectively.

In Table 1, the savage and sigmoid loss functions show substantial anomalies compared with the hinge loss function for digits 3, 4 and 5, whereas the hinge loss function performs noticeably better than the logistic loss function. Table 2 indicates that the hinge loss function performs better than the logistic loss function in the presence of brown noise, while it shows considerable anomalies compared with the savage and sigmoid loss functions for digits 1, 3, 4 and 6. In Table 3, the hinge loss function performs worse than the sigmoid loss function in the presence of pink noise, except for digits 2 and 6, whereas some anomalies are observed with the savage loss function in comparison with the hinge loss for digits 0, 3, 8 and 9.
Similarly, a loss function comparative analysis was performed among the sigmoid, hinge, savage and logistic losses in the presence of white, brown and pink noise with SVM (soft margin) classifiers for isolated TI-Digits. In Table 4 (white noise), the hinge loss is better than the logistic loss for all digits, but severe anomalies can be seen when its performance is compared with the other loss functions. Table 5 (brown noise) shows the same pattern: the hinge loss is better than the logistic loss for all digits but shows severe anomalies against the other loss functions. In Table 6, the logistic loss function performs worse than the hinge loss, but considerable anomalies are observed when the hinge loss function is compared with the savage loss function.

B. Graphical Representation of Loss Functions Comparative Analysis of Recorded Digit without and with SVM Classifiers
In this section, the performance of the loss functions is evaluated using recorded digit samples taken from the environment, to study the behavior of the hinge, sigmoid, savage and logistic losses in the presence of noise. Fig. 9 and Fig. 10 show the hinge and logistic loss plots, respectively, of recorded digit 1 in the presence of pink noise without SVM classifiers. Similarly, we evaluated the performance of the loss functions with recorded digits in the presence of white, brown and pink noise using SVM classifiers, which were used to separate clean recorded digit samples from noisy samples. Fig. 11 and Fig. 12 show the hinge and sigmoid loss plots of recorded digit 1 with brown and pink noise, respectively, with SVM classifiers. Table 7, Table 8 and Table 9 provide the comparative analysis of the hinge, sigmoid, savage and logistic loss functions for recorded digits without SVM (soft margin) classifiers. In Table 7, the savage and sigmoid loss functions perform better than the hinge loss function in the presence of white noise, except for digits 3, 4 and 5.
In Table 8, serious anomalies are observed for the hinge loss function compared with the savage and sigmoid losses for some digits with brown noise. Table 9 shows that the savage and sigmoid losses perform better than the hinge loss except for digits 1 and 7, whereas the logistic loss function performs worse than the hinge loss function.
Similarly, a loss function comparative analysis was performed among the hinge, sigmoid, savage and logistic losses in the presence of white, brown and pink noise with SVM (soft margin) classifiers for recorded digits taken from a real environment. Table 10, Table 11 and Table 12 provide the comparative analysis of the hinge, sigmoid, savage and logistic loss functions for recorded digits with SVM (soft margin) classifiers. In Table 10, the hinge loss performs worse than the savage and sigmoid loss functions, but some anomalies are observed for digits 0, 2, 3 and 8 in the presence of white noise. In Table 11, the hinge loss performs better than the logistic loss except for digit 2, whereas some anomalies are observed for digits 0, 3, 4 and 9 in the presence of brown noise when compared with the savage and sigmoid functions. Table 12 indicates that the logistic loss function performs worse than the hinge loss, whereas the savage and sigmoid functions perform better than the hinge loss function except for digits 1, 3 and 8. The following observations were drawn from the demonstrative experiments:

• Pink noise and white noise have significant effects on isolated digits (0-9) under both pre-conditioned and recorded conditions in comparison with brown noise. The hinge loss function performs worse than the sigmoid and savage loss functions but better than the logistic loss function in the presence of white and pink noise; however, some anomalies are observed in the presence of brown noise.
• In all four prescribed conditions of the demonstrative experiments, for both recorded digits taken from the environment and pre-conditioned TI-Digits with and without classifiers, the logistic loss function performs worse than the hinge loss function, whereas the hinge loss function shows serious anomalies compared with the savage and sigmoid loss functions in terms of performance.
• In comparison with the hinge and logistic loss functions, the sigmoid loss function provides exceptionally good results in terms of percentage error for all prescribed conditions in the experiments, whereas a few inconsistencies can be seen in the performance of the savage loss function in comparison with the hinge loss.

V. CONCLUSION
Motivated by the issues related to the hinge loss function used in SVMs for parameter estimation in margin-based acoustic models, this paper presented a comparative analysis of three loss functions (logistic, savage, sigmoid) against the hinge loss, observing the behavior of the loss functions in the presence of white noise, pink noise, and brown noise, with and without SVM (soft margin) classifiers, for pre-conditioned and recorded data samples. Demonstrative experiments were carried out on the NOISEX-92 (speech and noise-in-speech) databases, the TIDIGIT corpus, and recorded data samples (0-9) taken from a real environment. The experiments indicated that the hinge loss function performs worse than the savage and sigmoid loss functions but better than the logistic loss function in the presence of pink and white noise, as compared with brown noise, for all prescribed conditions. The sigmoid loss function shows remarkably better results than the hinge and other loss functions in terms of percentage error.

Fig. 7. TI-Digit 1 in the presence of brown noise with SVM classifier using hinge loss

Fig. 9. Recorded digit 1 in the presence of pink noise without SVM classifier using hinge loss

Fig. 11. Recorded digit 1 in the presence of brown noise with SVM classifier using hinge loss

TABLE I. LOSS FUNCTIONS COMPARATIVE ANALYSIS OF WHITE NOISE FOR ISOLATED TI-DIGIT WITHOUT USING CLASSIFIER

TABLE II. LOSS FUNCTIONS COMPARATIVE ANALYSIS OF BROWN NOISE FOR ISOLATED TI-DIGIT WITHOUT USING CLASSIFIER

TABLE III. LOSS FUNCTIONS COMPARATIVE ANALYSIS OF PINK NOISE FOR ISOLATED TI-DIGIT WITHOUT USING CLASSIFIER

TABLE IV. LOSS FUNCTIONS COMPARATIVE ANALYSIS OF WHITE NOISE FOR ISOLATED TI-DIGIT USING CLASSIFIER

TABLE V. LOSS FUNCTIONS COMPARATIVE ANALYSIS OF BROWN NOISE FOR ISOLATED TI-DIGIT USING CLASSIFIER

TABLE VI. LOSS FUNCTIONS COMPARATIVE ANALYSIS OF PINK NOISE FOR ISOLATED TI-DIGIT USING CLASSIFIER

TABLE VII. LOSS FUNCTIONS COMPARATIVE ANALYSIS OF WHITE NOISE FOR RECORDED DIGIT WITHOUT CLASSIFIER

TABLE VIII. LOSS FUNCTIONS COMPARATIVE ANALYSIS OF BROWN NOISE FOR RECORDED DIGIT WITHOUT CLASSIFIER

TABLE IX. LOSS FUNCTIONS COMPARATIVE ANALYSIS OF PINK NOISE FOR RECORDED DIGIT WITHOUT CLASSIFIER

TABLE X. LOSS FUNCTIONS COMPARATIVE ANALYSIS OF WHITE NOISE FOR RECORDED DIGIT WITH CLASSIFIER

TABLE XI. LOSS FUNCTIONS COMPARATIVE ANALYSIS OF BROWN NOISE FOR RECORDED DIGIT WITH CLASSIFIER

TABLE XII. LOSS FUNCTIONS COMPARATIVE ANALYSIS OF PINK NOISE FOR RECORDED DIGIT WITH CLASSIFIER