Variability of Acoustic Features of Hypernasality and Its Assessment

Hypernasality (HP) is observed across voiced phonemes uttered by cleft-palate (CP) speakers with a defective velopharyngeal (VP) opening. HP assessment using signal processing techniques is challenging because acoustic features vary across conditions such as speaker, speaking style, speaking rate, and severity of HP. Most studies of HP assessment are based on isolated sustained vowels recorded under laboratory conditions. We measure the variability of acoustic features and detect HP using the vowels /i/, /a/ and /u/ in continuous read speech of CP speakers with gradually increasing severity of HP. Linear predictive coding (LPC) is used for acoustic feature extraction. In the first part of our study, we observe the variation in acoustic parameters within and across vowel categories with gradually increasing HP. The inter-speaker variability in spectral features among CP subjects is 0.96 for vowel /i/, 1.13 for /a/ and 2.05 for /u/. These measurements suggest that the high back vowel /u/ is the most affected and has the highest variability, while the high front vowel /i/ is the least affected and has the lowest variability with HP. In the second part, the ratio of the vowel space area (VSA) of hypernasal speech to that of normal speech is calculated and used as a measure for HP detection. We observe that the VSA spanned by CP subjects is at most 0.65 times the isolated Bangla nasal VSA and at most 0.43 times the read-speech English oral VSA.
Keywords—Speech analysis; Acoustic feature; Hypernasality; Cleft palate; Velopharyngeal opening; Vowel space area; Read speech


I. INTRODUCTION
Speech is the acoustic end product of thoughts that originate in the brain. Disordered speech, in which speech quality is reduced, may hamper normal communication. Physical or neurological impairment can reduce speech quality, which is a challenge in professional and social activities [1]. A specific example of a vocal tract dysfunction that reduces speech quality is a defective VP mechanism [2], which can be caused by physical defects such as CP [3]. CP is an incomplete formation of the soft or hard palate, which separates the nasal and oral cavities; it generates speech disorders such as HP and is the second most frequent congenital malformation worldwide [4]. HP is the most common pathology suffered by patients with CP. The research community is becoming increasingly interested in developing techniques for its detection and evaluation [5][6][7][8].
Voice pathologies are usually diagnosed by invasive techniques using different instruments, which may cause discomfort to patients. Such techniques are also not recommended by physicians because they can produce psychological stress in patients. One non-invasive technique for assessing voice pathology is based on acoustic analysis of the voice. Acoustic features contain information about voice source and vocal tract behavior. Any abnormality in speech arising from a physical defect may therefore be assessed using speech features of the vocal tract or the voice source. Fig. 1 shows how the vocal tract transfer function varies for the vowel /i/ between a normal oral production and productions with various degrees of HP. Consequently, it is convenient to apply signal processing techniques to determine the effectiveness of speech features. Identifying the severity of HP with speech processing techniques can contribute to the diagnosis of CP speakers and help physicians decide what support (surgery or speech therapy) should be provided. Most studies of HP assessment using acoustic features concentrate on sustained isolated vowels. This study explores the variation in acoustic parameters and HP assessment in read speech of CP speakers with gradually increasing VP opening. The rest of the paper is organized as follows. Section II describes previous work. Section III describes the speech materials. Section IV discusses the method used and the results and observations obtained from the experiments. Section V analyzes the results. Section VI concludes the paper.

II. ACOUSTIC ANALYSIS OF SPEECH
Since 1970, researchers have studied abnormal changes in the acoustic features of voices. In 1971, Fujimura et al. [9] made a detailed analysis of the variation in voice tone. Fant [10] found that, for a nasalized vowel, oral-nasal coupling introduces an additional pole-zero pair into the oral vowel. Many researchers have performed HP detection by analyzing disordered speech, synthesized hypernasal speech, and nasalized vowels of normal speech. Signal processing based techniques for HP assessment work by finding the deviation of the spectrum of hypernasal vowels from that of non-nasalized vowels.
Hawkins and Stevens [5] and Glass and Zue [7] showed that the main features of nasalization are changes in the low-frequency region of the speech spectrum, where there is a very low-frequency peak with wide bandwidth along with a pole-zero pair due to the acoustic coupling. Chen [6] and Hawkins [5] showed that nasalization gives rise to changes in the high-frequency region of the spectrum in addition to introducing a new pole-zero pair in the first formant region. However, these changes are not as consistent across speakers and vowels as those in the low-frequency region. Cairns et al. [11] used the sensitivity of the Teager energy operator to multicomponent signals to detect HP. Rah et al. [12] used the presence of zeros in the spectrum as a cue for the detection of HP. From the literature, it is observed that the acoustic cues of hypernasal speech are additional formants, antiformants, and formant bandwidth broadening. Perceptual studies in which nasal formants were added to the spectrum of vowel sounds [13][14] showed that a formant at 250 Hz plays an important role in the nasalization of vowels. The group delay function has been used successfully for hypernasality detection from acoustic features of speech in CP speakers [14][15].
VSA refers to the two-dimensional area bounded by lines connecting the first and second formant frequency coordinates (F1/F2) of vowels [16]. VSA has been used for various purposes, such as studying vowel identity, speaker characteristics, speech development, speech disorders and vowel distinctiveness, and assessing intelligibility as it relates to vowel production [17][18][19]. VSA is computed by measuring F1/F2 values over several utterances of each of the three point vowels /a, i, u/ and plotting the vowel triangle. The mean F1/F2 value for each corner vowel is then used to compute the area of the triangle formed by the corner vowels. Because the frequencies of the first and second formants are related to the size and shape of the cavities created by mouth opening (F1) and tongue position (F2), the VSA reflects the dynamics of the articulators. In general, studies have shown that clearer, more intelligible speech has a larger VSA than less intelligible speech. This is because greater articulatory excursions result in more distinct acoustic vowel targets. Thus, VSA as a measure of vowel distinctiveness has been quite successful in the study of speaking styles and languages. As abnormal vowel formant change (centralization) is a common feature of speech production deficiency, VSA estimation is used for characterizing speech motor control, including speech development and speech disorders. In a study with a large database, VSA was assessed automatically and was reported to give better results than the traditional method of VSA measurement [20]. Another study found that psychological distress and depression reduce the VSA [21]. Hypernasal vowels near a plosive in the speech of CP children have been analyzed and an objective measure proposed; the mean falling and rising slopes of the amplitude were found to be smaller in the nasalized vowel than in the oral vowel [22].
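The triangle-area computation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `vowel_space_area`, the dictionary layout, and the formant values (in Hz) are hypothetical, and the area is obtained with the standard shoelace formula for a triangle.

```python
import numpy as np

def vowel_space_area(tokens):
    """Area of the /i/-/a/-/u/ triangle in the F1-F2 plane.

    tokens maps each corner vowel ('i', 'a', 'u') to a list of
    (F1, F2) measurements; each corner is the mean of its tokens.
    """
    corners = [np.mean(np.asarray(tokens[v], dtype=float), axis=0)
               for v in ("i", "a", "u")]
    (x1, y1), (x2, y2), (x3, y3) = corners
    # Shoelace formula for the area of a triangle
    return 0.5 * abs(x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2))

# Hypothetical formant measurements in Hz (illustration only, not from the study)
tokens = {
    "i": [(290.0, 2310.0), (310.0, 2290.0)],
    "a": [(690.0, 1210.0), (710.0, 1190.0)],
    "u": [(340.0, 810.0), (360.0, 790.0)],
}
area = vowel_space_area(tokens)  # in Hz^2 for Hz inputs
```

The unit of the result follows the unit of the inputs, so areas computed from Hz values and from Bark values are not directly comparable.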
Most studies on the acoustic analysis of HP are based on isolated sustained vowels. Little has been reported on how speech parameters vary with the defective VP opening that causes hypernasal speech in continuous read sentences, which exhibit greater complexity with respect to speech intelligibility; this formed the motivation for this study. The study investigates how useful speech parameters extracted from read speech, rather than from isolated sustained vowels, are in reflecting the impact of the underlying movement disorder in terms of VSA for the assessment of HP. This report presents a study of the effect of VP opening variability on speech features and of HP assessment using VSA in read speech.
Nasality is similar to hypernasality in production, and this similarity is reflected in the acoustic features. Most studies use the nasalization of vowels near a nasal consonant to make a comparison with HP. In this study, Bangla nasal vowels are used to compare HP with nasality. Bangla is a language in which nasality is phonemic: all seven vowels have a nasal counterpart. Thus, to compare nasality and HP, the VSAs of Bangla oral and nasal vowels are taken into account. Previous work explored the vowel space of Bangla oral-nasal vowel pairs [23]. Acoustic categorization of Bangla oral-nasal vowel pairs was carried out, and it was shown that the VSA of the nasal vowels shrinks within the oral VSA.

III. SPEECH MATERIALS
HP is usually observed in vowels and voiced oral consonants. As vowels can be sustained for a relatively longer duration than consonants, only vowels are considered in the current study. Among the vowels, only /i/, /a/ and /u/ are considered, representing the three vowel categories: front, mid, and back. This section describes how the speech samples were acquired.
For the purpose of this study, a database was recorded from three male native speakers of Bangladeshi Bangla without CP, aged around 25-27. For the acoustic analysis and detection of HP, speech data were collected from 7 male speakers with CP and 4 normal non-CP speakers. Three types of data are used: 1) English vowels (/a/, /i/ and /u/) obtained from the read speech of eight speakers, ranging from normal non-CP speakers (EO) to CP speakers with gradually increasing severity of HP; 2) isolated Bangla oral (BO) vowels (/i/, /a/ and /u/) and Bangla nasal (BN) vowels (/i/, /a/ and /u/) obtained from three non-CP speakers. The best one is selected for the work.
In the experimental part, each isolated vowel was recorded three times at a normal speaking rate in a quiet room on a DAT tape at a sampling rate of 48 kHz with 16-bit resolution. The sample data of the best of the three speakers are used for the study. Speech data for the three English vowels /i/, /a/ and /u/ of normal speakers and of CP speakers with gradually increasing severity of HP were obtained from read speech data of the American Cleft Palate Craniofacial Association. A stable portion was cut from each selected vowel for the purpose of our work. The digitized speech sounds were then downsampled to 22050 Hz and normalized for analysis. Vowels uttered by non-CP speakers are used as the reference.

IV. ACOUSTIC ANALYSIS OF NORMAL AND HYPERNASAL SPEECH AND RESULTS
This section discusses the preprocessing and the method chosen to extract the acoustic features from the speech signal.

A. Preprocessing of the Speech Signal
A speech signal is non-stationary in nature, but for the purpose of analysis it can be assumed to be stationary over short windowed segments called frames. The speech signal is analyzed frame-wise at a frame rate of 50-100 frames/s, and each frame spans a speech segment of 20-30 ms. A new frame is obtained by shifting the Hamming window function by 10 ms. After normalization and windowing, the speech samples are ready for analysis.
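The framing step can be sketched as below, assuming a 20 ms Hamming window shifted by 10 ms at the 22050 Hz sampling rate used in this study. The function name `frame_signal`, the peak normalization, and the synthetic test tone are illustrative assumptions, not the authors' code.

```python
import numpy as np

def frame_signal(x, fs, frame_ms=20.0, hop_ms=10.0):
    """Split x into overlapping Hamming-windowed frames (one frame per row)."""
    frame_len = int(fs * frame_ms / 1000.0)   # samples per frame
    hop_len = int(fs * hop_ms / 1000.0)       # samples between frame starts
    if len(x) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(x) - frame_len) // hop_len
    window = np.hamming(frame_len)
    frames = np.empty((n_frames, frame_len))
    for i in range(n_frames):
        start = i * hop_len
        frames[i] = x[start:start + frame_len] * window
    return frames

fs = 22050
t = np.arange(fs) / fs                  # one second of a synthetic test tone
x = np.sin(2.0 * np.pi * 220.0 * t)
x = x / np.max(np.abs(x))               # peak normalization (an assumed choice)
frames = frame_signal(x, fs)
```

A 10 ms shift gives roughly 100 frames/s, the upper end of the frame-rate range quoted above. Trailing samples that do not fill a whole frame are dropped in this sketch.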

B. LPC Analysis Technique
LPC analysis decomposes a digitized speech signal into a source, characterized by its fundamental frequency (F0) and its amplitude (i.e., the loudness of the source), and a vocal tract, represented by an all-pole filter that is modeled by a set of coefficients whose number is known as the LPC order. The vocal tract system is excited by an impulse train for voiced speech or by a random noise sequence for unvoiced speech. Thus, the parameters of this model are the voiced/unvoiced classification, the pitch period for voiced speech, the gain parameter G, and the coefficients {a_k} of the digital filter. Eq. 1 expresses the transfer function of the filter model in the z-domain, where V(z) is the vocal tract transfer function, G is the gain of the filter, and {a_k} is the set of autoregression coefficients called linear prediction coefficients. The upper limit of summation, p, is the order of the all-pole filter.
$$V(z) = \frac{G}{1 - \sum_{k=1}^{p} a_k z^{-k}} \qquad (1)$$
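One common way to estimate G and {a_k} is the autocorrelation method solved with the Levinson-Durbin recursion. The paper does not state which estimation method was used, so the following is only a sketch under that assumption, using the sign convention V(z) = G / (1 - sum_k a_k z^-k). The function names and the synthetic second-order test signal are hypothetical.

```python
import numpy as np

def lpc_autocorr(frame, order):
    """Return (a, G) with a[k-1] = a_k, via autocorrelation + Levinson-Durbin."""
    n = len(frame)
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order)
    err = r[0]                                   # zeroth-order prediction error
    for i in range(order):
        # Reflection coefficient for model order i + 1
        k = (r[i + 1] - np.dot(a[:i], r[i:0:-1])) / err
        a_prev = a[:i].copy()
        a[:i] = a_prev - k * a_prev[::-1]
        a[i] = k
        err *= (1.0 - k * k)
    return a, np.sqrt(err)

def lpc_spectrum_db(a, gain, fs, n_fft=1024):
    """Magnitude of V(z) on the unit circle, in dB, on an rfft frequency grid."""
    denom = np.fft.rfft(np.concatenate(([1.0], -a)), n_fft)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    return freqs, 20.0 * np.log10(gain / np.abs(denom))

# Synthetic all-pole signal with known coefficients (illustration only)
rng = np.random.default_rng(0)
true_a = np.array([1.3, -0.8])
e = rng.standard_normal(20000)
x = np.zeros_like(e)
for n in range(len(e)):
    x[n] = e[n]
    if n >= 1:
        x[n] += true_a[0] * x[n - 1]
    if n >= 2:
        x[n] += true_a[1] * x[n - 2]

a_hat, gain = lpc_autocorr(x, order=2)
freqs, spec_db = lpc_spectrum_db(a_hat, gain, fs=22050)
```

In practice the function would be applied to each windowed frame rather than to a whole signal. The gain here is derived from the unnormalized autocorrelation, so it scales with frame energy.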

C. Acoustic Analysis
This paper examines the variability of the characteristics of CP speakers using LPC-based acoustic features, with HP assessed by VSA in read speech. Formant analysis is carried out on the selected speech data. The utterances made by 2 normal subjects, as explained in Section III, are analyzed, and a reference level is established for each selected vowel phoneme. If the prediction order is not chosen properly, the LP-based formant extraction technique may produce ambiguous results for the detection of HP: if the order is too low, the LP spectrum fails to resolve two closely spaced formants, and if the order is too high, it may introduce many spurious peaks in the resulting spectrum. Speech samples are analyzed by the LPC method using an LPC order of 28. Fig. 2 shows the block diagram of the LPC analysis procedure for obtaining the speech parameters. The selected speech samples are windowed using a Hamming window of 20 ms at 10 ms intervals. Acoustic parameters (vocal tract parameters and voice source parameters) are calculated for the three types of selected speech data (non-CP Bangla oral-nasal vowels, non-CP English oral vowels, and CP English hypernasal vowels). Acoustic features are extracted from a stable portion of the segmented vowels of read speech, and part of the data is tabulated in Table I.
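Formant candidates can be read from the LPC model by finding the roots of the prediction polynomial. The paper does not say whether roots or spectral peaks were used, so the sketch below is one possible approach, not the authors' procedure. The function names, the 90 Hz lower limit, and the 400 Hz bandwidth limit are hypothetical choices, and the coefficients are assumed to follow V(z) = G / (1 - sum_k a_k z^-k).

```python
import numpy as np

def formants_from_lpc(a, fs, max_bandwidth_hz=400.0):
    """Estimate resonance frequencies and bandwidths (Hz) from a_1..a_p."""
    poly = np.concatenate(([1.0], -np.asarray(a, dtype=float)))
    roots = np.roots(poly)
    roots = roots[np.imag(roots) > 0]            # one root per conjugate pair
    freqs = np.angle(roots) * fs / (2.0 * np.pi)
    bandwidths = -np.log(np.abs(roots)) * fs / np.pi
    # Discard very low or very broad resonances (thresholds are assumptions)
    keep = (freqs > 90.0) & (bandwidths < max_bandwidth_hz)
    order = np.argsort(freqs[keep])
    return freqs[keep][order], bandwidths[keep][order]

def resonator_poly(freq_hz, bw_hz, fs):
    """Second-order denominator section for one resonance."""
    r = np.exp(-np.pi * bw_hz / fs)
    theta = 2.0 * np.pi * freq_hz / fs
    return np.array([1.0, -2.0 * r * np.cos(theta), r * r])

# Build a synthetic all-pole model with two known resonances (illustration only)
fs = 22050
poly = np.convolve(resonator_poly(300.0, 60.0, fs),
                   resonator_poly(2300.0, 100.0, fs))
a = -poly[1:]                                    # convert to predictor form
est_freqs, est_bws = formants_from_lpc(a, fs)
```

With a high order such as 28, many of the returned roots are spurious or correspond to nasal resonances, which is why the bandwidth filter and a manual check against the LPC spectrum are usually needed.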

V. VARIABILITY OF ACOUSTIC FEATURES WITH INCREASING VP OPENING AND ASSESSMENT OF HP
In order to study the variation of acoustic features with HP and to assess HP in CP speakers, scatter plots and VSAs (isolated Bangla oral, read English, isolated Bangla nasal, and read CP English) are plotted for the three types of speakers and languages. Fig. 4 shows the block diagram of the working procedure. As discussed in Section II, VSA is the acoustic space area that contains information relevant to the diagnosis and treatment of defects of the speech organs and their function. The simplest relationships between vocal tract configurations and formant frequencies occur for vowels, which is exploited in this study.

A. Variability of Acoustic Features
The vocal tract transfer function for various degrees of HP is plotted in Fig. 5. As the severity of hypernasality increases, the change is reflected in the spectrum and is visible in comparison with the oral vowel and nasal vowel spectra. New spectral peaks are visible near F1 and F2, noticeably around 200 Hz, 500 Hz, 1 kHz and 1.5 kHz depending on the vowel. Fig. 6 plots the variation of F1, FN1 (first nasal formant in hypernasal /i/) and FN2 (second nasal formant in hypernasal /i/) against HP. It is observed that F1 tends to increase in vowel /i/. This indicates that the highness and frontness of this vowel reduce as the VP opening increases and more air flows into the nasal tract. FN1 < FN (the nasal formant of the BN vowel /i/, located around 700 Hz), and FN1 has a decreasing tendency as HP increases. As FN1 < FN, FN1 should increase with increasing HP. FN2 > 1100 Hz > FN. As FN2 > FN, FN2 should increase with increasing HP. FN lies between FN1 and FN2.
F1 and F2 values in Hz for each vowel and all speakers are converted to the Bark scale, which provides a more appropriate frame of reference. Each vowel is represented by its F1 and F2 values, displayed on scatter plots as shown in Fig. 7 for hypernasal speech. The scatter plot of F1 and F2 reflects the inter-speaker variation within and across vowels, making it useful for differentiating and identifying vowels. Fig. 7 plots the individual and mean formant values (F1 and F2, in Bark) in vowel space for the vowels /i a u/ measured for all speakers in the selected speech data described in Section III. Each point represents the mean of three formant measurements per speaker. In the utterance context considered, inter-speaker variability is measured as the standard deviation about the mean. The variability of acoustic features among CP speakers differs depending on the vowel. The inter-speaker variability among CP subjects for /i/ is 0.96, with mean (4.9, 13.82). In pronouncing a normal /i/, the articulators are characterized by semi-openness, and /i/ has the highest front position among the vowels. /i/ and /u/ have the lowest F1 among the vowels, as observed from Fig. 6. Among the vowels, /u/ has the highest back position. During the production of /u/, the articulators are characterized by lip rounding, closeness and backness. Vowel /u/ shows the highest variability among speakers in the speech data, reflecting differences in articulatory openness for some speakers. The standard deviation of /u/ is 2.05, with mean (4.58, 9.64). The inter-speaker variability of CP speakers for the high front vowel /i/ is smaller than that for the open vowel /a/, which is 1.13 with mean (5.65, 10.46), and that for the high back vowel /u/, which is 2.05.
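The Bark conversion and the variability measure can be sketched as follows. Two points are assumptions, since the paper specifies neither: the Hz-to-Bark mapping uses Traunmueller's approximation, and the two-dimensional spread is summarized as the root-mean-square distance of the speaker points from their centroid. The function names and formant values are hypothetical.

```python
import numpy as np

def hz_to_bark(f_hz):
    """Traunmueller's approximation of the Bark scale (an assumed choice)."""
    f = np.asarray(f_hz, dtype=float)
    return 26.81 * f / (1960.0 + f) - 0.53

def inter_speaker_variability(f1_hz, f2_hz):
    """Centroid (F1, F2 in Bark) and RMS distance of speaker points from it."""
    pts = np.column_stack((hz_to_bark(f1_hz), hz_to_bark(f2_hz)))
    centroid = pts.mean(axis=0)
    # Scalar spread: RMS Euclidean distance to the centroid (an assumption)
    spread = np.sqrt(np.mean(np.sum((pts - centroid) ** 2, axis=1)))
    return centroid, spread

# Hypothetical per-speaker mean formants in Hz for one vowel (not study data)
f1 = [420.0, 480.0, 510.0, 450.0]
f2 = [2100.0, 2250.0, 1950.0, 2300.0]
centroid, spread = inter_speaker_variability(f1, f2)
```

Other summaries, such as separate standard deviations for F1 and F2, would give different numbers, so the values reported above should not be expected to match this particular definition.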

B. Variability of VSA and HP Assessment
Fig. 8(a) shows the average VSA of all speakers, spanned by the mean values over the three repetitions of each of the three vowels. The differences between the isolated oral VSA, the isolated nasal VSA, and the average read-speech VSA of CP speakers are investigated. Four types of VSA are obtained. The VSA of isolated oral vowels, marked by the blue triangle, has the largest area. The isolated nasal vowel VSA, marked by the red triangle, has the second largest area. The read-speech VSA of non-CP speakers, marked by the green triangle, has the third largest area. The smallest VSA, marked by the magenta triangle, is obtained for the average read speech of CP speakers. The results show that VSA(isolated oral) > VSA(isolated nasal) > VSA(read oral) > VSA(HP). Fig. 8(b) plots VSA against degree of HP for all speakers, as well as the average VSA of the CP speakers in this study. This graph shows that VSA changes as the degree of HP increases. Isolated oral vowels have the highest VSA and read speech has the second highest; the isolated nasal vowel space is fronted, shrinks, and has a VSA smaller than that of isolated oral vowels, in agreement with the previous study [20]. As the VP opening of CP speakers increases gradually, VSA changes, but the change is not gradual. This may be partially due to the particular vocal tract characteristics of each individual. The vowel space ratio (VSA_ind/VSA_ref) of an individual's VSA (VSA_ind) to the reference VSA (VSA_ref) is calculated to characterize how large the vowel space of an individual CP speaker is relative to the reference (BN and EO) VSA. Fig. 9 shows the vowel space ratio obtained across various VP opening conditions with BN and EO as the reference. The highest vowel space ratio is 0.65 with the BN vowels as reference and 0.43 with the EO vowels as reference. These values may be used as thresholds for determining HP. Therefore, the VSA of CP subjects is at most 0.65 times the isolated BN VSA and at most 0.43 times the read-speech EO VSA. The significant reduction in VSA reflects the effect of HP in CP speakers across various conditions of severity, showing the centralization tendency of the articulators while pronouncing the vowels, which leads to reduced speech clarity. Therefore, as observed in this study, the VSA of connected read speech of CP speakers is suitable for detecting HP across various conditions of severity.
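The ratio-based decision can be sketched as below. The thresholds 0.65 (BN reference) and 0.43 (EO reference) are the highest ratios reported above. Treating a ratio at or below the threshold as an indication of HP is an interpretation of the text rather than a rule stated by the authors, and the function names and area values are hypothetical.

```python
def vsa_ratio(vsa_ind, vsa_ref):
    """Ratio of an individual's VSA to the reference VSA (same units for both)."""
    if vsa_ref <= 0:
        raise ValueError("reference VSA must be positive")
    return vsa_ind / vsa_ref

def flag_hypernasal(ratio, threshold):
    """Flag HP when the ratio does not exceed the threshold (assumed rule)."""
    return ratio <= threshold

BN_THRESHOLD = 0.65   # highest ratio reported with the Bangla nasal reference
EO_THRESHOLD = 0.43   # highest ratio reported with the English oral reference

# Hypothetical areas in Hz^2 (illustration only, not study data)
vsa_ref = 272500.0
vsa_ind = 150000.0
ratio = vsa_ratio(vsa_ind, vsa_ref)
flagged = flag_hypernasal(ratio, BN_THRESHOLD)
```

The threshold must match the reference used for the ratio: the same ratio that falls below the BN threshold can lie above the EO threshold.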

VI. CONCLUSION
The study brings together isolated Bangla oral-nasal vowels and read English vowels of normal and CP speakers. This makes it possible to study the variation of extracted vocal tract features and to compare them in terms of VSA for determining the status of the speech disorder of CP patients. The main objective of this work is to study the variation of acoustic features of CP speakers due to various VP openings and to assess HP. For this purpose, an estimate of the VSA is computed for each of the 9 speakers using the speech data of that speaker. The evolution of the VSA over the 7 degrees of hypernasal articulation is analyzed. The vowel triangle consists of the three vowels /a/, /i/ and /u/ represented in the space of the first two formant frequencies, F1 and F2. The first main conclusion is that the inter-speaker variability of HP among CP speakers, measured by the mean and standard deviation of the selected vowels, is greatest for /u/. The second main conclusion is the significant reduction of the vocalic space as speech becomes less articulated. As the articulatory boundaries are less marked, the resulting acoustic targets are less separated in the VSA.
The reduced vowel space may partially explain the reduced intelligibility of hypernasal speech. This study can be further extended to make a comparison with sustained isolated hypernasal vowel data of CP speakers. Read speech of CP speakers is articulated differently from isolated vowel speech, and reading proficiency might be a factor, for which further investigation is required.

Fig. 1. Vocal tract transfer function for oral /i/ and hypernasal /i/ for normal and CP speakers

Fig. 3 shows the GUI used for acoustic feature extraction. The LPC spectrum is studied for formant analysis. The speech data are analyzed to check for disorders due to various VP openings. The acoustic analysis (e.g., vocal tract parameters, voice source parameters, vowel triangle area) is conducted on the speech data.

Fig. 3. GUI for obtaining the LPC spectrum from the speech waveform for /i/ of CP Speaker 8

Fig. 4. Block diagram of the working procedure for spectral variability and VSA ratio calculation

TABLE I. VOCAL TRACT PARAMETERS (MEAN VALUES OF THREE OBSERVATIONS) FOR VOWEL /I/ FOR NON-CP SPEAKERS