Deep Learning Bidirectional LSTM based Detection of Prolongation and Repetition in Stuttered Speech using Weighted MFCC

Stuttering is a neurodevelopmental disorder in which the normal flow of speech is disrupted. Traditionally, Speech-Language Pathologists assessed the extent of stuttering by counting speech disfluencies manually. Such assessments are subjective, inconsistent, time-consuming, and error-prone. The present study focuses on the objective assessment of speech disfluencies, namely prolongation and syllable, word, and phrase repetition. The proposed method is based on the Weighted Mel Frequency Cepstral Coefficient (WMFCC) feature extraction algorithm and a deep-learning Bidirectional Long-Short Term Memory (Bi-LSTM) neural network for the classification of stuttered events. The work uses the UCLASS stuttering dataset for analysis. The speech samples of the database are first preprocessed, manually segmented, and labeled by type of disfluency. The labeled speech samples are parameterized into WMFCC feature vectors. The extracted features are then input to the Bi-LSTM network for training and testing of the model. The effect of different hyper-parameters on classification results is examined. The test results show that the proposed method reaches a best accuracy of 96.67%, which is higher than that of the LSTM model. Accuracies of 97.33%, 98.67%, 97.5%, 97.19%, and 97.67% were achieved for the detection of fluent speech, prolongation, syllable repetition, word repetition, and phrase repetition, respectively.
Keywords: Speech; stuttering; deep learning; WMFCC; BiLSTM


I. INTRODUCTION
For communication between human beings, speech is the most habitual and widely used verbal means of expressing feelings, ideas, and thoughts. Not all human beings are blessed with normal speech. The effectiveness of speech in conveying information during communication depends on fluency. Fluency is defined by normal speech flow, which connects different phonemes to make a message [1]. Speech is fluent if the continuity among semantic units, the rhythm, the speed, and the effort applied to maintain the flow are normal. Any kind of disruption in fluency is known as dysfluency. Stuttering is a complex type of dysfluency. In stuttering, continuity and rhythm are disturbed by pauses and blocks, the rate is much slower, and the effort is higher than normal. Researchers have categorized the factors that lead to stuttering into three types, namely developmental, neurogenic, and psychogenic.
People who stutter (PWS) may have three sorts of disfluencies: repetition of a sound, syllable, word, or phrase; sound prolongation, in which a sound is sustained for a markedly longer period than is typical; and silent blocks at the start of an utterance or word, or in the middle of a word. Johnson [2] introduced this classification, and it has been used by clinicians and researchers ever since.
Even though many people may not consider stuttering a disability, it imposes a constraint on speech. People who stutter not only lose confidence but also develop a negative attitude towards their communication skills. Furthermore, stuttering harms their self-confidence, relationships with others, employment opportunities, and the opinions others hold of them [3]. Stuttering affects individuals of all ages, cultures, and races, irrespective of their intelligence and financial status. Many studies have stated that stuttering affects approximately 1% of the world population and is more common in males than in females [4]. This area is therefore an interdisciplinary field of study spanning domains such as speech pathology, psychology, speech physiology, acoustics, and signal analysis.
Stuttering is one of the serious issues in speech pathology. Speech-Language Pathologists (SLPs) diagnose the individual who stutters and measure fluency to gauge the response of the stutterer throughout the treatment process. Traditionally, SLPs assessed the extent of stuttering manually: they counted the stuttered events and divided that count by the total number of spoken words. Such assessments are subjective, inconsistent, time-consuming, and error-prone. Over the past two decades, SLPs have given great attention to objective techniques for assessing stuttered events, as discussed in our previous work [5].
Automatic evaluation of stuttered speech is therefore necessary to automate the counting and classification of stuttered events. The proposed work employs the Weighted Mel Frequency Cepstral Coefficients (WMFCC) feature extraction method and the deep-learning-based Bidirectional Long-Short Term Memory (Bi-LSTM) classification method for the automatic assessment of four forms of disfluency: prolongation and syllable, word, and phrase repetition. The efficacy of the Bi-LSTM model is assessed in comparison with the unidirectional LSTM model. In this paper, the University College London Archive of Stuttered Speech (UCLASS) database is used for analysis. The experimental analysis in this study reveals that the proposed method based on WMFCC and Bi-LSTM performs more efficiently than the other models considered.
The results show that the proposed model offers improved performance compared with other models. This study makes two significant contributions.
• Firstly, it uses WMFCC instead of traditional MFCC for feature extraction. WMFCC includes the dynamic information of the speech samples, which increases the detection accuracy of stuttered events and also reduces the computational overhead of the classification stage.
• Secondly, it employs Bi-LSTM rather than a traditional RNN or LSTM. Bi-LSTM addresses the vanishing gradient problem of the RNN and overcomes the unidirectional flow of information in the LSTM.
The paper is structured as follows. Section 2 reviews the work related to automatic detection of stuttering speech disorders. Section 3 elaborates on the framework of the proposed system. It also includes brief descriptions of the database used and of the feature extraction and classification techniques applied. Section 4 presents the experimental results and a comparative analysis of the classification model. Section 5 provides a conclusion.

II. RELATED WORKS
This section reviews work on recognition systems designed to detect or classify stuttering speech disorders. Previous research has presented various methods and algorithms for recognizing stuttering events from speech signals. Table I displays a comparative analysis of various feature extraction and classification methods based on the dataset used, the type of disfluency, and the accuracy. The previous works underline the importance of feature extraction and classification methods in the detection of stuttered events.
Traditional machine learning techniques are gradually being replaced by deep learning technology. Deep learning provides a more accurate representation of objects and can automatically obtain object features from a vast amount of data [26]. Such models are increasingly used to extend the capacity of computers to perform tasks that humans can do, including speech recognition. Deep structured learning models include the convolutional neural network (CNN) [27], the recurrent neural network (RNN) [28] [24], and long-short term memory (LSTM) [25]. The conventional machine learning techniques for recognition employed shallow structured architectures such as the hidden Markov model (HMM), Support Vector Machines (SVM), the Artificial Neural Network (ANN), and linear and non-linear dynamical systems [29]. These architectures are suited to simple or constrained problems, since their limited capabilities can cause difficulties in complicated, large-scale, real-world problems [30]. Such real-world problems involve human speech, language recognition, and visual scenes, and require a deeper, layered architecture to extract the complex information.
Tian Swee et al. [6] and Thiang and Wanto [9] trained a Hidden Markov Model (HMM) to classify speech samples as fluent or non-fluent. The HMM assumes that the likelihood of being in a state depends only on the prior state at (t-1), disregarding all other dependencies. It also requires a large number of parameters and a large amount of data for building and training the model [31]. In [8] and [14], Ravikumar et al. and Hariharan et al. discussed the classification of extracted features through Support Vector Machines (SVM). However, SVM deals only with fixed-size input, is not efficient for large databases, and has a computational cost directly proportional to the number of classes to be classified. Savin et al. [19] employed an ANN for classification. An ANN lacks a structured design methodology and is time-consuming to train for large networks [32].
The deep learning technique CNN performs very well on non-sequential data but fails to interpret temporal information. The RNN is good at modeling temporal data but suffers from short-term memory caused by the vanishing gradient problem [33][34] [35]. LSTM was created as a solution to short-term memory [36] and is capable of learning long-term dependencies [37]. Based on the above considerations, this paper applies Bi-LSTM for the classification of a vast amount of speech data [38]. The Bi-LSTM model processes the information in two directions and links them to obtain the output class of stuttering.
The proposed work employs the WMFCC feature extraction method and the deep-learning-based Bi-directional Long-Short Term Memory (Bi-LSTM) classification method for the automatic assessment of four forms of disfluency: prolongation and syllable, word, and phrase repetition. The process for detecting repetition and prolongation in stuttered speech is split into five stages: signal pre-processing; disfluent speech sample segmentation and labeling; splitting of the labeled samples into training, validation, and test sets; feature extraction; and classification through network training and testing (Fig. 1). The University College London Archive of Stuttered Speech (UCLASS) database is used for analysis [39]. The study evaluates the efficacy of the Bi-LSTM model based on the accuracy of the classification of stuttered events.

A. Signal Pre-Processing
A signal is pre-processed by removing the silence regions [40] [41]. There is no excitation of the vocal tract during a silence region, and hence no speech production. Thus, preprocessing not only reduces the amount of processing but also enhances the overall efficiency and accuracy of the proposed system. A combination of two widely known approaches, namely Short Time Energy (STE) and Zero Crossing Rate (ZCR) (Fig. 2), has been used in this work [42] [43]. It is a fast and straightforward approach and gives good results in classifying speech as voiced or unvoiced. The short-time energy is the energy associated with a short-term region of speech [41]. The total energy of a speech frame is determined by (1):
$$E_n = \sum_{m=-\infty}^{\infty} \left[ x(m)\, w(n-m) \right]^2 \tag{1}$$
where $x(m)$ is the speech signal, $w(n)$ represents the windowing function, and $n$ is the shift in number of samples. The energy of a voiced region is high in comparison with that of an unvoiced region. A silent region displays marginal energy content.
The Zero-Crossing Rate specifies the number of zero crossings in a given signal [41]. The zero-crossing rate of a stationary signal is calculated by (2):
$$Z_n = \frac{1}{2} \sum_{m=-\infty}^{\infty} \left| \operatorname{sgn}[x(m)] - \operatorname{sgn}[x(m-1)] \right| w(n-m) \tag{2}$$
where $\operatorname{sgn}[x(m)]$ is the signum function, described by (3):
$$\operatorname{sgn}[x(m)] = \begin{cases} 1, & x(m) \ge 0 \\ -1, & x(m) < 0 \end{cases} \tag{3}$$
The zero-crossing rate of unvoiced sounds is high compared with that of voiced sounds. The combination of these two features overcomes the difficulty of categorizing speech into voiced and unvoiced signals (Fig. 3).
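The STE and ZCR decision described above can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the Hamming window, the frame length, the two thresholds, and the function names are all assumptions chosen for illustration.

```python
import math

def frames(x, frame_len, hop):
    """Split a list of samples into frames of frame_len, advancing by hop."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]

def short_time_energy(frame):
    """Energy of one Hamming-windowed frame (the window choice is an assumption)."""
    n = len(frame)
    window = [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]
    return sum((s * w) ** 2 for s, w in zip(frame, window))

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    sgn = [1 if s >= 0 else -1 for s in frame]
    crossings = sum(abs(a - b) for a, b in zip(sgn[1:], sgn[:-1])) / 2
    return crossings / (len(frame) - 1)

def label_frame(frame, energy_floor=0.01, zcr_ceiling=0.25):
    """Label one frame; both thresholds are illustrative assumptions."""
    if short_time_energy(frame) < energy_floor:
        return "silence"      # marginal energy
    if zero_crossing_rate(frame) > zcr_ceiling:
        return "unvoiced"     # many sign changes
    return "voiced"           # high energy, few sign changes

def remove_silence(x, frame_len=240, hop=240):
    """Concatenate all non-silent frames (non-overlapping frames by default)."""
    kept = []
    for f in frames(x, frame_len, hop):
        if label_frame(f) != "silence":
            kept.extend(f)
    return kept
```

In practice the thresholds would be tuned on the recordings at hand, since suitable values depend on the recording level and sampling rate.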

B. Disfluent Speech Sample Segmentation and Labeling
The disfluent speech signals are obtained from the University College London Archive of Stuttered Speech (UCLASS) [39]. It has been released in two versions, version 1 and version 2, and consists of three types of recording: monologues, readings, and spontaneous conversations. Version 1 has 138 "monologue" recordings contributed by 81 speakers. The database used in this work comprises 20 speech samples for experimentation [44], from two female speakers and 18 male speakers aged 7 years 8 months to 17 years 9 months. The speech signals were selected to cover a wide range of stuttering rates and ages. Only the samples provided with a text script are included in the database.
This paper investigates only four forms of disfluency: prolongation and syllable, word, and phrase repetition. They are easily detectable in monosyllabic words. After preprocessing the selected speech samples, disfluent speech samples were marked and segmented manually by listening to the pre-processed signals. The segmented samples were labeled with five classes, namely Fluent, Prolongation, Syllable Repetition, Word Repetition, and Phrase Repetition (Fig. 4).
(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 11, No. 9, 2020, www.ijacsa.thesai.org

C. Labeled Samples Splitting
The segmented disfluent speech samples were divided into three sets for training, validation, and testing. The training set is the subset of labeled stuttered speech samples used to train the model. The validation set, which is smaller than the training set, is used to evaluate the performance of the model with different hyperparameter values. The test set is used to determine the final accuracy of the model and to compare the performance of different models. In this study, the datastore of disfluent speech samples is split into training, validation, and test sets in the ratio of 60%, 20%, and 20%, respectively.
The process of pre-processing, segmentation, labeling, and sample splitting is described through an algorithm in Table II.
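The 60/20/20 split can be sketched in a few lines of Python. This is an illustrative sketch only: the text does not say whether the split was random or stratified by class, so the plain seeded shuffle below, like the function name `split_samples`, is an assumption.

```python
import random

def split_samples(samples, train=0.6, val=0.2, seed=0):
    """Shuffle labeled samples and split them into train, validation, and test lists.

    The test share is whatever remains after the train and validation shares.
    """
    items = list(samples)
    random.Random(seed).shuffle(items)   # seeded so the split is reproducible
    n_train = round(train * len(items))
    n_val = round(val * len(items))
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])
```

With imbalanced disfluency classes, a stratified split that applies the same ratios within each class would keep the class proportions equal across the three sets.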

D. WMFCC Feature Extraction
The extraction of speech features is a dimension reduction technique employed to reduce data that is too large to be processed directly by an algorithm. The key objective of feature extraction is to decompose the speech signal into acoustically recognizable components and to obtain feature vectors that change little over a frame, so that processing stays efficient. In our previous work [45], a comparative analysis of extensions of the MFCC feature extraction technique [46], namely Delta MFCC, Delta-delta MFCC, and Weighted MFCC [47], was conducted. Its experimental results showed that WMFCC slightly outperforms Delta-delta MFCC and significantly outperforms Delta MFCC and MFCC for all settings of frame length, alpha value, and frame overlap percentage [45]. The proposed work applies the frequency-domain-based Weighted Mel Frequency Cepstral Coefficients. WMFCC is a fusion of MFCC and its delta and delta-delta derivatives. The resultant vector contains both static and dynamic information of the signal. Moreover, the feature vector is of size 14 and thus incurs less computational overhead in the classification stage. Table III describes the WMFCC feature extraction.
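The fusion of static and dynamic coefficients can be sketched as follows. The sketch assumes that a 14-dimensional MFCC matrix (one row per frame) has already been computed, and it assumes the commonly used weighted-sum form, in which the delta and delta-delta terms are added to the static coefficients with weights `p` and `q`. The weight values, the regression width, and the function names are illustrative assumptions, not values taken from this work.

```python
def delta(frames, width=2):
    """Regression-based delta over time; edge frames reuse the nearest frame."""
    T = len(frames)
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = []
    for t in range(T):
        row = []
        for k in range(len(frames[0])):
            num = sum(
                n * (frames[min(t + n, T - 1)][k] - frames[max(t - n, 0)][k])
                for n in range(1, width + 1)
            )
            row.append(num / denom)
        out.append(row)
    return out

def weighted_mfcc(mfcc, p=0.5, q=0.25):
    """Fuse MFCC with its delta and delta-delta into one vector per frame.

    p and q are assumed weights (hypothetical values); the output keeps
    the dimension of the input, e.g. 14 coefficients per frame.
    """
    d1 = delta(mfcc)
    d2 = delta(d1)
    return [
        [c + p * a + q * b for c, a, b in zip(row_c, row_a, row_b)]
        for row_c, row_a, row_b in zip(mfcc, d1, d2)
    ]
```

Because the three components are summed rather than concatenated, the classifier receives 14 values per frame instead of 42, which is consistent with the reduced overhead noted above.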

E. Bi-Directional Long-Short Term Memory
A deep learning Bi-LSTM is applied for the classification of stuttered speech samples. It is composed of LSTM cells (Fig. 6). The feature vectors discussed in the previous section are the input to the classifier. The model is trained and validated with 60% and 20% of the speech samples of the datastore, respectively. The remaining samples are used for testing the model.

1) Long-Short Term Memory:
LSTM is a specialized Recurrent Neural Network (RNN) architecture capable of learning long-term dependencies [48]. The RNN suffers from short-term memory caused by the vanishing gradient problem. To mitigate this problem, LSTM has a hidden layer known as the LSTM cell. LSTM cells are built with various gates and a cell state that regulate the flow of information. As in RNNs, at each time iteration $t$, the LSTM cell has the layer input $x_t$ and the layer output $h_t$. The cell also takes the cell input state $\tilde{C}_t$, the cell output state $C_t$, and the previous cell output state $C_{t-1}$. The LSTM architecture has three gates, namely the forget, input, and output gates, denoted as $f_t$, $i_t$, and $o_t$, respectively.
The cell state acts as the network memory, conveying valuable information across the entire sequence. The gates are specific neural networks that determine which information is permitted on the cell state. Throughout training, the gates learn which information is essential to retain or forget. The values of the gates and the input cell state are determined by (4) to (7):
$$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f) \tag{4}$$
$$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i) \tag{5}$$
$$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o) \tag{6}$$
$$\tilde{C}_t = \tanh(W_C x_t + U_C h_{t-1} + b_C) \tag{7}$$
where $W_f$, $W_i$, $W_o$, and $W_C$ are the weights connecting the hidden layer input to all the gates and the input cell state. $U_f$, $U_i$, $U_o$, and $U_C$ are the weight matrices mapping the previous layer output $h_{t-1}$ to all the gates and the input cell state. $b_f$, $b_i$, $b_o$, and $b_C$ are bias vectors. $\sigma$ and $\tanh$ are the sigmoid and tanh activation functions, respectively. The cell output state $C_t$ and the layer output $h_t$ at each time iteration $t$ are calculated as in (8) and (9):
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \tag{8}$$
$$h_t = o_t \odot \tanh(C_t) \tag{9}$$
The result of the LSTM layer is a vector of all the outputs, represented as $Y = [h_1, \ldots, h_T]$.
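A single LSTM time step, following the gate and cell-state updates described for (4) to (9), can be sketched in plain Python. This is an illustrative sketch, not the trained network: the parameter names (`W_f`, `U_f`, `b_f`, and so on), the random initialization, and the helper functions are assumptions, and the previous layer output is used as the recurrent input, as in the standard LSTM formulation.

```python
import math
import random

def _affine(W, x, U, h, b):
    """Compute W x + U h + b for list-of-lists matrices."""
    return [
        sum(w * xi for w, xi in zip(w_row, x))
        + sum(u * hi for u, hi in zip(u_row, h))
        + bi
        for w_row, u_row, bi in zip(W, U, b)
    ]

def _sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step: returns the layer output h_t and cell state C_t."""
    f_t = _sigmoid(_affine(p["W_f"], x_t, p["U_f"], h_prev, p["b_f"]))  # forget gate
    i_t = _sigmoid(_affine(p["W_i"], x_t, p["U_i"], h_prev, p["b_i"]))  # input gate
    o_t = _sigmoid(_affine(p["W_o"], x_t, p["U_o"], h_prev, p["b_o"]))  # output gate
    c_in = [math.tanh(a)
            for a in _affine(p["W_C"], x_t, p["U_C"], h_prev, p["b_C"])]  # cell input state
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_in)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t

def init_params(n_in, n_hidden, scale=0.1, seed=0):
    """Small random weights and zero biases (an assumed initialization)."""
    rng = random.Random(seed)
    p = {}
    for g in ("f", "i", "o", "C"):
        p["W_" + g] = [[rng.uniform(-scale, scale) for _ in range(n_in)]
                       for _ in range(n_hidden)]
        p["U_" + g] = [[rng.uniform(-scale, scale) for _ in range(n_hidden)]
                       for _ in range(n_hidden)]
        p["b_" + g] = [0.0] * n_hidden
    return p

def run_lstm(sequence, p, n_hidden):
    """Apply lstm_step over a sequence and collect every layer output."""
    h, c = [0.0] * n_hidden, [0.0] * n_hidden
    outputs = []
    for x_t in sequence:
        h, c = lstm_step(x_t, h, c, p)
        outputs.append(h)
    return outputs
```

For example, `run_lstm(features, init_params(14, 100), 100)` would process a sequence of 14-dimensional feature vectors with 100 hidden units, where `features` is a hypothetical list of per-frame vectors.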
2) Bidirectional LSTM: The Bi-LSTM originates from the bidirectional RNN [50]. It processes sequential data with two different hidden layers, in the forward and backward directions, and links them to the same output layer. In certain areas, such as speech recognition, bidirectional networks are considerably stronger than unidirectional ones [51]. Fig. 7 represents an unfolded Bi-LSTM layer structure containing a forward and a backward LSTM layer [52]. The output sequence of the forward layer, $\overrightarrow{h}$, is determined iteratively using the inputs in their original order, while the output sequence of the backward layer, $\overleftarrow{h}$, is determined using the reversed input. The forward and backward layer outputs are computed using the standard LSTM equations (4) to (9). The Bi-LSTM layer produces an output vector, $Y$, in which each element is defined by Equation (10):
$$y_t = g(\overrightarrow{h}_t, \overleftarrow{h}_t) \tag{10}$$
where $g$ denotes the function that combines the forward and backward outputs at time $t$.
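The forward and backward passes can be illustrated with a short Python sketch. To keep it brief, a scalar tanh recurrence stands in for the LSTM cell; this stand-in, together with the function names, is an assumption made for illustration. The point of the sketch is the time alignment: the backward outputs are reversed again so that each position pairs the forward and backward states for the same input frame.

```python
import math

def toy_step(x_t, h_prev):
    """Stand-in for an LSTM cell: a scalar tanh recurrence (an assumption for brevity)."""
    return math.tanh(0.5 * x_t + 0.5 * h_prev)

def run(xs, step, h0=0.0):
    """Run a recurrent step over xs and collect the output at every position."""
    h, outs = h0, []
    for x in xs:
        h = step(x, h)
        outs.append(h)
    return outs

def bidirectional(xs, step):
    """Pair forward and backward outputs so that index t refers to the same frame."""
    fwd = run(xs, step)
    bwd = run(xs[::-1], step)[::-1]   # process reversed input, then restore time order
    return list(zip(fwd, bwd))
```

At the first frame the forward state has seen only that frame, while the backward state has already seen the whole sequence; this is what gives every position access to both past and future context.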

F. Bi-LSTM Model Training and Testing
Although LSTM can capture information from long speech sequences, it takes only one direction into consideration. It assumes that only the previous frame affects the current frame and does not consider that the next frame is also related to the current state. In practice the relationship is two-way, so the next speech frame should also be considered. Bi-LSTM provides the solution to this problem (Fig. 8).
Bi-LSTM is capable of modeling the relationship between two speech frames. It also strengthens the two-way relationship between the current and next speech frames. Owing to its bi-directional time structure, Bi-LSTM captures more structural information and hence gives better classification accuracy than the one-way LSTM [53].
From Fig. 8, it can be seen that the speech feature vectors are obtained through the WMFCC feature extraction technique, and the feature sequences are then passed through the Bi-LSTM for training and testing. The Bi-LSTM links the output of the feature extraction module to the subsequent layers. Table IV describes the complete training and testing algorithm.

1) Sort data for padding:
During training, the training feature vectors are split into mini-batches. The sequences in each mini-batch are padded so that they all have the same length. However, a large amount of padding degrades network performance. To prevent excessive padding in the training process, the training data is sorted by sequence length.
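The effect of sorting before padding can be shown with a small Python sketch. It is illustrative only: the function names, the zero pad value, and padding at the end of each sequence are assumptions.

```python
def make_batches(sequences, labels, batch_size=8, pad_value=0.0):
    """Sort by length, then pad each mini-batch to its own longest sequence."""
    order = sorted(range(len(sequences)), key=lambda i: len(sequences[i]))
    dim = len(sequences[0][0])  # feature dimension, e.g. 14 for WMFCC
    batches = []
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        longest = len(sequences[idx[-1]])  # last index is the longest after sorting
        padded = [
            list(sequences[i])
            + [[pad_value] * dim for _ in range(longest - len(sequences[i]))]
            for i in idx
        ]
        batches.append((padded, [labels[i] for i in idx]))
    return batches

def padded_frames(batches, sequences):
    """Total number of padding frames added across all batches."""
    total = sum(len(seq) for padded, _ in batches for seq in padded)
    return total - sum(len(s) for s in sequences)
```

For four sequences of lengths 5, 1, 4, and 2 with a batch size of 2, sorting groups them as (1, 2) and (4, 5), which needs 2 padding frames; batching them in their original order as (5, 1) and (4, 2) would need 6.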
2) Define Bi-LSTM network: The Bi-LSTM network is a layered architecture, shown in Fig. 8. The first layer, the embedding layer, is also called the sequence input layer. It takes the sorted 14-dimensional WMFCC feature vectors as input. The second and third layers are the hidden forward and backward LSTM layers, which form the Bi-LSTM layer with 100 hidden units. Owing to these two layers, the current input is related to both the previous and the next elements of the sequence. The input sequence passes through the hidden layers in both directions. After processing by the hidden layers, the outputs are combined to obtain the final output of the Bi-LSTM layer. The output from both LSTM layers is computed by (11):
$$h_t = \alpha \overrightarrow{h}_t + \beta \overleftarrow{h}_t \tag{11}$$
where $\overrightarrow{h}_t$ and $\overleftarrow{h}_t$ represent the outputs of the forward and backward LSTM layers when they take the sequence from $x_1$ to $x_T$ and from $x_T$ to $x_1$ as input, respectively. $\alpha$ and $\beta$ are the factors that control the contribution of each direction of the Bi-LSTM. $h_t$ is the sum of the two unidirectional LSTM elements at time $t$.
The output of the Bi-LSTM layer is the input to the fully connected layer of size equal to the number of classes, i.e., five. This layer links each piece of input feature information with a piece of output information for classification by the next layers.
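The path from the two directional outputs to the five class scores can be traced with a short Python sketch. It is a shape walk-through rather than the trained model: the combination is taken to be a weighted sum of the forward and backward outputs, and the factor values of 0.5, the constant weights, and the function names are assumptions.

```python
def combine(h_fwd, h_bwd, alpha=0.5, beta=0.5):
    """Weighted sum of forward and backward outputs at one time step.

    alpha and beta are assumed values chosen for illustration.
    """
    return [alpha * f + beta * b for f, b in zip(h_fwd, h_bwd)]

def fully_connected(h, weights, bias):
    """One score per class; weights holds one row of length len(h) per class."""
    return [sum(w * v for w, v in zip(row, h)) + b for row, b in zip(weights, bias)]

NUM_CLASSES, HIDDEN_UNITS = 5, 100   # sizes stated in the text

# Placeholder activations and weights, used only to show the shapes involved.
h = combine([1.0] * HIDDEN_UNITS, [3.0] * HIDDEN_UNITS)
weights = [[0.01] * HIDDEN_UNITS for _ in range(NUM_CLASSES)]
scores = fully_connected(h, weights, [0.0] * NUM_CLASSES)
```

Here `h` has 100 elements and `scores` has five, one per class; the scores are what the softmax layer then turns into class probabilities.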
Finally, the softmax and classification layers categorize speech frames into the disfluency classes, such as prolongation, syllable repetition, word repetition, and phrase repetition. The softmax layer applies the softmax function as an activation function that converts a vector of real values into a vector with values between 0 and 1, so that they can be interpreted as probabilities. The probability of classifying an input $x$ into class $k$ in softmax regression [54] is defined by (12):
$$P(y = k \mid x; \theta) = \frac{\exp(\theta_k^{T} x)}{\sum_{j=1}^{K} \exp(\theta_j^{T} x)} \tag{12}$$
where $K$ represents the number of classes and $\theta$ are the model parameters.
In the classification layer, the model receives the values from the softmax function and assigns each input to one of the classes using the cross-entropy function (13):
$$L = -\sum_{i=1}^{N} \sum_{j=1}^{K} t_{ij} \ln y_{ij} \tag{13}$$
where $N$ represents the number of samples, $K$ is the number of classes, $t_{ij}$ indicates that the $i$th sample belongs to the $j$th class, and $y_{ij}$ represents the value obtained from the softmax function.
Table IV, steps 11 to 13:
11. Match the similarity between the test labels and predicted labels.
12. Evaluate the stuttered events classification accuracy of the model.
13. If classification accuracy is optimal then output classification accuracy else rebuild the model from step 3
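The softmax and cross-entropy computations referred to as (12) and (13) can be sketched in plain Python. This is a minimal sketch under stated assumptions: the function names are hypothetical, targets are given as class indices rather than one-hot vectors, and the maximum score is subtracted before exponentiation, a standard numerical safeguard that does not change the result.

```python
import math

CLASSES = ["Fluent", "Prolongation", "Syllable Repetition",
           "Word Repetition", "Phrase Repetition"]

def softmax(scores):
    """Convert real-valued class scores into probabilities that sum to 1."""
    m = max(scores)                        # subtract the max to avoid overflow
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs_batch, targets):
    """Summed cross-entropy; targets[i] is the index of the true class of sample i.

    With one-hot targets, only the log-probability of the true class
    contributes to the double sum over samples and classes.
    """
    return -sum(math.log(p[t]) for p, t in zip(probs_batch, targets))

def predict(scores):
    """Return the class name with the highest softmax probability."""
    probs = softmax(scores)
    return CLASSES[probs.index(max(probs))]
```

For five equal scores the softmax gives a probability of 0.2 for every class, and the loss for one such sample is ln 5, which is the value expected from an untrained five-class model.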

3) Initializing the hyper-parameters of the network:
Once the network is defined, its hyper-parameters are initialized. Model hyper-parameters are properties on which the entire training process depends [55]. They are divided into two categories: optimizer hyper-parameters and model-specific hyper-parameters. The optimizer hyper-parameters, such as the number of epochs, batch size, and learning rate, determine how the network is trained. In contrast, the model-specific hyper-parameters, such as the number of hidden units and hidden layers, determine the model structure. These parameters must be defined before training.
Hyper-parameters directly control the behavior of the training algorithm and thus have a significant effect on model performance [55]. Choosing appropriate parameters is therefore an integral part of optimizing the learned model. Selecting good hyperparameters involves a large number of experiments, which is a time-consuming and tedious task. Most researchers rely on their experience when selecting appropriate parameters for a deep neural network.
To determine appropriate hyper-parameters, the classification accuracy on the validation set is used for evaluation. This work applies a diagnostic approach, in which the performance of various hyperparameters is investigated on both the training and validation datasets. The analysis determines how a given configuration performs and how it should be adjusted to obtain better performance. The learning rate, batch size, number of epochs, and number of hidden units are the hyper-parameters considered in the analysis.

4) Training and testing of the datasets:
Once the Bi-LSTM model and its hyper-parameters are defined, the model is trained by using the training dataset. After the training process is over, the model is validated through the validation dataset. If the classification accuracy of the model is optimized, then its performance is tested; otherwise, the model hyper-parameters are reconfigured. The parameters such as learning rate, batch size, number of epochs, and number of hidden units are considered for reconfiguration. These parameters are tested for various ranges of values. The process of reconfiguration of hyper-parameters is repeated until the model is optimized, as represented in Fig. 9. The classification performance of the optimized model is compared with the traditional LSTM model using the testing dataset. After the testing process is over, the process of performance evaluation is carried out. If the results of the evaluation are optimal, then the process is stopped; otherwise, the complete model is redefined, and the complete training process is repeated until the model is optimized.

IV. EXPERIMENTS AND RESULTS
This section discusses the efficacy and performance of the proposed algorithm, based on WMFCC feature extraction and Bi-LSTM classification, for four forms of disfluency. The study evaluates the stuttered events recognition model using stuttered samples obtained from the UCLASS database. The dataset used in this work comprises 20 speech samples from UCLASS, from two female speakers and 18 male speakers aged 7 years 8 months to 17 years 9 months. The stuttered speech samples were manually identified and segmented from the selected speech samples. The segmented samples were labeled with five classes, namely Fluent, Prolongation, Syllable Repetition, Word Repetition, and Phrase Repetition. The speech samples were split into training, validation, and test datasets. First, the signals are pre-processed by removing the silent regions from the samples using the combination of STE and ZCR techniques. Then 14-dimensional acoustic features are extracted from the segmented samples using the WMFCC feature extraction algorithm. Finally, the extracted feature vectors are input to the deep learning Bi-LSTM model. The Bi-LSTM model is trained and optimized on the training and validation sets by reconfiguring the hyperparameters. The performance of the proposed model is compared with that of the traditional LSTM model using the test set.

A. Adjustments of Parameters
In training the model, hyperparameters such as the learning rate, batch size, number of epochs, and number of hidden units play a vital role in the performance of the learned model.
When training the Bi-LSTM network, these parameters are tuned, and their accuracy on the validation set is observed. The experiments were performed based on the hyperparameter configurations listed in Table V. In the first experiment, the best value of the initial learning rate was determined while fixing the mini-batch size at 16, the number of epochs at 100, and the number of hidden units at 100, which are their typical values. The learning rate was varied from $10^{-2}$ to $10^{-4}$, and the result is presented in Fig. 10. It can be seen that an initial learning rate of $10^{-2}$ gave the best classification accuracy, 86.67%, for the available stuttered data.
In the second experiment, the effect of batch-size values of 4, 8, 16, and 32 was determined by fixing the initial learning rate at the best value obtained in the previous experiment and keeping the other two parameters at their typical values. The average classification accuracy versus batch size is shown in Fig. 11. The experiment showed that the model produced the highest classification accuracy, 96.67%, for a mini-batch size of 8.
The effect of the number of epochs was analyzed in the third experiment by fixing the learning rate and batch size at their best values and the number of hidden units at its typical value. The experiment considered epoch counts of 5, 10, 30, 50, and 100. The results are presented in Fig. 12. It can be seen that 50 epochs gave the best recognition accuracy for speech disfluencies, 96.67%.
Finally, the last experiment was carried out to determine the effect of the number of hidden units, using the best parameters obtained from the previous three experiments. The number of hidden units was varied from 50 to 200, and the result is presented in Fig. 13. It can be seen that 100 hidden units gave the best classification accuracy, 96.67%. From the experiments, it was determined that the optimal values for the learning rate, batch size, number of epochs, and number of hidden units were $10^{-2}$, 8, 50, and 100, respectively.
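The four experiments follow a one-parameter-at-a-time search, which can be sketched as below. The sketch is illustrative: `tune_one_at_a_time` is a hypothetical helper, the intermediate grid values for the learning rate and hidden units are assumptions (the text gives only their ranges), and `toy_validation_accuracy` is a stand-in scoring function that merely peaks at the configuration reported above. In a real run it would be replaced by a function that trains the network and returns its validation accuracy.

```python
def tune_one_at_a_time(evaluate, defaults, grid, order):
    """Tune each hyper-parameter in turn, keeping earlier winners fixed."""
    best = dict(defaults)
    history = []
    for name in order:
        scores = {}
        for value in grid[name]:
            trial = dict(best)
            trial[name] = value
            scores[value] = evaluate(trial)       # validation accuracy for this trial
        best[name] = max(scores, key=scores.get)  # keep the best value found
        history.append((name, best[name], scores[best[name]]))
    return best, history

GRID = {
    "learning_rate": [1e-2, 1e-3, 1e-4],   # middle value is an assumption
    "batch_size": [4, 8, 16, 32],
    "epochs": [5, 10, 30, 50, 100],
    "hidden_units": [50, 100, 150, 200],   # intermediate values are assumptions
}
DEFAULTS = {"learning_rate": 1e-2, "batch_size": 16, "epochs": 100, "hidden_units": 100}
ORDER = ["learning_rate", "batch_size", "epochs", "hidden_units"]

def toy_validation_accuracy(cfg):
    """Stand-in for train-and-validate; NOT measured data."""
    target = {"learning_rate": 1e-2, "batch_size": 8, "epochs": 50, "hidden_units": 100}
    return sum(cfg[k] == v for k, v in target.items()) / len(target)

best, history = tune_one_at_a_time(toy_validation_accuracy, DEFAULTS, GRID, ORDER)
```

This search evaluates 16 configurations instead of the 240 in the full grid, at the cost of missing interactions between parameters, for example between learning rate and batch size.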

B. Analysis of Experimental Results
The classification efficiency of the proposed WMFCC and Bi-LSTM based model is verified through comparison experiments between the proposed model and the unidirectional LSTM. During the experiment, the dimension of the WMFCC feature vector was 14, the frame length was 30 ms with an overlap of 75%, the pre-emphasis factor alpha was 0.98, a single Bi-LSTM layer with 100 hidden units was used, the optimizer was Adam, the number of epochs was 50, the batch size was 8, and the learning rate was set to $10^{-2}$.
The accuracy and loss curves of the Bi-LSTM and LSTM models are shown in Fig. 14 and Fig. 15. From Fig. 14, it can be observed that the Bi-LSTM model converges more slowly but reaches higher accuracy than the LSTM model. From Fig. 15, it can be seen that the Bi-LSTM model reduces the loss to a lower stable value than the LSTM model. Thus, it is concluded that the proposed model achieves better convergence.
The complete illustration of the validity of the proposed model can be performed by using the evaluation indicators of relevant experiments such as precision, recall, specificity, and F measure according to the confusion matrix, on test datasets.
The comparison of the LSTM model and the proposed Bi-LSTM model is shown in Table VI. The results show that the WMFCC and Bi-LSTM based model proposed in this work provides the best performance, with an average overall classification accuracy of 96.67%.
Table VII displays the accuracy, sensitivity and specificity of various disfluency classes. In terms of detecting stuttered events, prolongation detection, and phrase detection displayed the highest sensitivity of 97.5%. Classification of word repetition samples gave the best specificity of 99.37%. The prolongation detection achieved the highest accuracy of 98.67%.
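Per-class accuracy, sensitivity, and specificity can be derived from a multi-class confusion matrix in a one-versus-rest manner, as the following Python sketch shows. The function name and the row/column convention (rows are true classes, columns are predicted classes) are assumptions, and any matrix passed to it here is purely illustrative rather than data from this study.

```python
def per_class_metrics(cm):
    """Accuracy, sensitivity, and specificity for each class of a confusion matrix.

    cm[i][j] counts samples whose true class is i and predicted class is j.
    Each class is treated in turn as the positive class against all others.
    """
    total = sum(sum(row) for row in cm)
    metrics = []
    for k in range(len(cm)):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                  # true class k, predicted otherwise
        fp = sum(row[k] for row in cm) - tp   # predicted k, true class otherwise
        tn = total - tp - fn - fp
        metrics.append({
            "accuracy": (tp + tn) / total,
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
        })
    return metrics
```

The sketch assumes that every class occurs at least once among the true labels and that no class accounts for all of them; otherwise the sensitivity or specificity denominator would be zero.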
From the analysis of the above results, it is concluded that the proposed model performs better than the other models, which demonstrates the value of long-term and bidirectional dependence on information for stuttered speech analysis. Further, WMFCC feature extraction includes the dynamic information of the speech samples, which increases the detection accuracy of stuttered events and also reduces the computational overhead of the classification stage.
The results of this study (Table VII) are comparable with those of the previous works in Table I. However, a direct comparison cannot be made because of differences in language, classifier, and the type, size, and categorical distribution of the stuttered speech database, as well as in the way the database was segmented to gather stuttered speech samples.

V. CONCLUSION
The present research proposed an automated and efficient method based on the WMFCC feature extraction algorithm and a deep-learning Bi-LSTM network for the automatic assessment of stuttered speech. Disfluencies such as prolongation and syllable, word, and phrase repetition are accurately detectable using this method. The speech samples are parameterized into 14-dimensional WMFCC feature vectors. By using WMFCC, the model extracts static as well as dynamic acoustic features, which enhances the detection accuracy of stuttered events and also reduces the computational overhead of the classification stage. The feature vectors are modeled by the Bi-LSTM in both the forward and backward directions, so the model is capable of learning long-term dependencies and takes full account of disfluency patterns in speech frames. Experiments show that reconfiguring the hyperparameters during training yields an optimal configuration of parameters and leads to a highly accurate model. The optimally configured model proposed in this study is compared with the unidirectional LSTM model. In future studies, other feature extraction and classification techniques may be applied to improve the detection of speech disfluencies.