Quantitative Analysis of Healthy and Pathological Vocal Fold Vibrations using an Optical Flow based Waveform

The objective assessment of vocal fold vibrations is important in diagnosing several vocal diseases. Because the vibrations are fast, high-speed videoendoscopy is commonly used to record the vocal fold movements. Two steps are usually carried out to automatically quantify laryngeal parameters and assess the vibrations. The first step maps the spatial-temporal information contained in the video recordings into a representation that facilitates the analysis of the vibrations. Numerous techniques are reported in the literature, but the majority of them require the segmentation of all the images of the video, which is a complex task. The second step quantifies laryngeal parameters in order to assess the vibrations. To this end, most existing approaches require additional processing of the representation in order to deduce those parameters. Furthermore, for some reported representations, assessing the symmetry and the periodicity of the vocal fold dynamics requires parameters that are specific to the representation under consideration, which makes it difficult to compare the existing techniques. To alleviate these problems, the present study investigates the use of a recently proposed representation, the optical flow based waveform, to objectively quantify the laryngeal parameters. This waveform is retained because it does not require the segmentation of all the images of the video. Furthermore, the present work shows that the automatic quantification of the vibrations using this waveform can be carried out without any additional processing. Moreover, common laryngeal parameters are exploited; hence, no representation-specific parameters need to be defined for the automatic assessment of the vibrations. Experiments conducted on healthy and pathological phonation show the accuracy of the waveform. In addition, it is more sensitive to pathological phonation than the state-of-the-art techniques.

Keywords: Quantification; vocal fold vibrations; optical flow based waveform; pathology


I. INTRODUCTION
Diseases that affect our ability to speak can have disastrous impacts on our lives, as the voice remains our most important means of communication. It is therefore important to diagnose and treat vocal disorders as soon as signs such as hoarseness or difficulty speaking appear. To produce a sound, the right and left vocal folds (VF), two membranes located in the larynx, open and close periodically, and their vibrations are symmetric. To diagnose their functional behavior, these high-frequency vibrations are examined through the analysis of high-speed videoendoscopy (HSV) recordings. HSV captures the VF movements at a rate that can exceed 4000 images/second. However, this high rate makes it hard to assess the vibrations visually. For this reason, in a first step, the spatial-temporal information contained in the HSV recordings is commonly mapped into representations suited to human visualization and analysis. Then, in a second step, the VF vibrations are assessed either visually, based on the shape of the considered representation, or automatically, through the quantification of some vibratory parameters. Researchers have invested much effort in proposing objective techniques for the assessment of the VF vibrations. However, these techniques suffer from one or more of the following limitations:
1) To be generated, the representation used to reflect the VF vibrations requires glottal segmentation in all the images of an HSV recording. This is a hard and complex task, as the segmentation of the VF requires user intervention and the number of images in a video is large given the high sampling rate. Furthermore, the segmentation may fail when the VF are completely closed. Examples of such representations are the glottal area waveform (GAW) [1], digital kymography (DKG) [2], phonovibrography (PVG) [3], glottovibrography (GVG) [4] and others [5], [6].
2) The objective assessment of the vibrations through a representation mostly requires further processing of that representation in order to derive the laryngeal parameters. For instance, the DKG must be segmented in order to derive parameters such as vibration amplitudes and periodicity indices [7]. The same holds for the PVG [3], [8] and the GVG [4].
3) The automatic assessment of the vibrations is sometimes based on parameters that are specific to the representation under consideration. Researchers agree on the features to be assessed, namely the periodicity and the symmetry of the vibrations, but different parameters aiming to quantify these features can be found in the literature. Consequently, it becomes difficult to compare the representations in terms of efficiency and accuracy of the vibration assessment. For instance, parameters specific to PVG are proposed in [3], [8]. A comparative study conducted in [9] investigates the efficiency of parameters derived from different methods such as GAW, DKG and laryngotopography [10].
To alleviate the limitations mentioned above, the present work investigates the use of a recently proposed optical flow based waveform (OFW) [11] to quantify the vibratory characteristics. This waveform is retained for three main reasons. First, generating the OFW requires the segmentation of only one image per vibratory cycle, which remarkably decreases the number of images to be segmented. Furthermore, the segmented image is the one of maximal glottal opening in each vibratory cycle; hence, highly accurate results can be obtained. Second, the OFW can be directly exploited for the automatic assessment of the vibrations without any further processing. Third, the vibratory characteristics can be easily and accurately quantified from the OFW using the most commonly used parameters.
The most important and common parameters in the literature are [7], [12], [13]: the fundamental frequency, the left-right phase symmetry, the left-right amplitude symmetry, the time periodicity and the amplitude periodicity. The accuracy of the objective analysis of the OFW is evaluated against the measures obtained from the analysis of electroglottographic (EGG) signals [14] and DKG. Experiments show that, using the same parameters, the OFW is more sensitive to pathological phonation than DKG.
The remainder of this paper is organized as follows. Section II describes the quantification technique and the data used for the evaluation of the obtained measures. The results are presented in Section III and discussed in Section IV. Finally, some conclusions are drawn in Section V.

A. Materials
Video recordings of healthy and pathological VF vibrations are analyzed in order to evaluate the accuracy of the quantification measures. HSV recordings of healthy phonation are provided by E. Bianco and G. Degottex-IRCAM [15], [16] and contain about 48 videos along with the corresponding EGG and audio signals. Videos of disordered VF with different pathologies are publicly available online.

B. Brief Overview of the OFW Technique
The rOFW (resp. the lOFW) computed at a level L between the posterior and the anterior commissure is the trajectory, during phonation, of a point located on the right VF (resp. left VF) at the glottal level L. Typical values of L are 25%, 50% and 75%. Fig. 1 illustrates the OFWs of healthy vocal folds at the three glottal levels. The amplitudes of the lOFW are displayed in the negative part for clarity.
To generate the waveforms, the OFW technique proceeds as follows. After sampling the HSV recording into $K$ images $\{I^{(k)}\}_{k=1,\dots,K}$, the region that includes the glottis and the VF is first detected according to the technique described in [4]. All subsequent processing is carried out on this region of interest in order to reduce the computations. The second step aims to partition the set of images into vibratory cycles; in each cycle $c$, a reference image $I_{ref}^{c}$ is segmented, namely the image of maximal glottal opening. The optical flow (OF) is then estimated in the backward and forward directions from the reference image within the same cycle, in order to better reflect the vibratory behavior of the VF. The displacements of the points of interest along the cycle are cumulated with respect to their positions in $I_{ref}^{c}$ and constitute the OFW. Further details are given in [11], in which the visual interpretation of the OFW, for both healthy and pathological phonation, showed its accuracy in assessing important features of the vibrations such as the periodicity and the symmetry between the two vocal fold dynamics. Despite the efficiency of the visual analysis of the waveforms, an objective and automatic assessment is necessary [17]. The present study investigates the objective and automatic analysis of the OFW by quantifying vibratory parameters, as described in the following subsection.
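The cumulation of displacements backward and forward from the reference image can be illustrated with a minimal sketch. It is not the implementation of [11]: the function name, the reduction of each displacement to a single signed number (for example its lateral component) and the sample values are assumptions made for illustration, and the optical flow estimation itself is taken as given.

```python
from itertools import accumulate

def cumulate_from_reference(backward_steps, forward_steps):
    """Build one cycle of a point trajectory relative to the reference frame.

    backward_steps[i]: displacement estimated from frame (ref - i) to
                       frame (ref - i - 1), i.e. going back in time.
    forward_steps[i]:  displacement estimated from frame (ref + i) to
                       frame (ref + i + 1), i.e. going forward in time.
    Returns the positions in temporal order; the reference frame is at 0.
    """
    # Running sums give the position of each frame relative to the reference.
    before = list(accumulate(backward_steps))  # frames ref-1, ref-2, ...
    after = list(accumulate(forward_steps))    # frames ref+1, ref+2, ...
    # Reverse the backward part so the output follows the temporal order.
    return before[::-1] + [0] + after

# Hypothetical per-frame displacements (in pixels) around a reference frame
# taken at maximal opening, so the point moves inward on both sides.
trajectory = cumulate_from_reference([-1, -2], [-1, -2, -1])
print(trajectory)  # [-3, -1, 0, -1, -3, -4]
```

Concatenating such per-cycle trajectories over all cycles would give a waveform of the kind shown in Fig. 1, under the sign convention assumed here.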

C. Objective Assessment of the VF Vibrations based on OFW
The diversity of the reported representations aiming to facilitate the visual analysis of the VF vibrations (such as the glottal area waveform [1], the phonovibrography [3], the glottovibrography [4] and the DKG [2]) has led to numerous quantification measures. Although most of these measures are closely tied to their respective representations, their common objective is to assess important features such as the fundamental frequency, the amplitude/phase symmetry and the periodicity of the vibrations. In addition to these features, we propose to evaluate the amplitude similarity between the two VF vibrations at each instant.
a) Amplitude Similarity: It is important to know how similar the amplitudes of the two waveforms are over time. For this reason, a one-way ANOVA test is carried out in order to evaluate the similarity between the right and the left vibration amplitudes at all instants.
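As an illustration of the one-way ANOVA step, the sketch below computes the F statistic for two groups of amplitude samples using only the Python standard library. How the samples are grouped (here, the right and left amplitudes within one cycle) is an assumption, as are the function name and the sample values. The P-value is not computed here; it would be obtained from the F distribution with the returned degrees of freedom, for instance with `scipy.stats.f_oneway` if SciPy is available.

```python
from statistics import fmean

def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = fmean([x for g in groups for x in g])
    # Variability of the group means around the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Variability of the samples around their own group mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    if ss_within == 0:
        raise ValueError("no within-group variability: F is undefined")
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical right and left amplitude samples for one cycle.
right_amplitudes = [1.0, 2.0, 3.0]
left_amplitudes = [2.0, 3.0, 4.0]
f_stat, df_b, df_w = one_way_anova_f(right_amplitudes, left_amplitudes)
print(f_stat, df_b, df_w)  # 1.5 1 4
```

A small F (and hence a large P-value) indicates that the two amplitude distributions are not distinguishable at the chosen significance level.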
b) Fundamental frequency: The fundamental frequency $F_{0R}$ (resp. $F_{0L}$) of the trajectory rOFW (resp. lOFW) of the right (resp. left) vocal fold is estimated using the nonlinear curve-fit model of [18], in which $s_0$ is the direct component related to the average value of rOFW over time, and $a_1$, $b_1$, $a_2$ and $b_2$ are coefficients. Hence, the vibration period $T_R$ is estimated and the fundamental frequency related to the trajectory of the right vocal fold is deduced as
$$F_{0R} = \frac{1}{T_R}.$$
Analogously, the vibration period $T_L$ related to the trajectory of the left vocal fold is estimated and its fundamental frequency $F_{0L}$ is computed.
c) Symmetry: A large left-right asymmetry can cause voice problems, especially when frequency differences between the right and the left vocal folds are significant. This behavior appears when the patient suffers from a unilateral laryngeal paralysis. The symmetry of the vibrations can be viewed in several ways: amplitude, phase and frequency differences [19]. They can be assessed using the following quantification measures [7].
• Left-right amplitude symmetry index (ASI): This indicates the degree of similarity between the amplitudes of the two vocal folds when they are at their maximum value (which corresponds to the maximum closing in the waveform) within a given cycle. It is defined as the difference between the maximum right displacement $a_R$ and the maximum left displacement $a_L$, divided by their sum, as shown in Fig. 2:
$$ASI = \frac{a_R - a_L}{a_R + a_L}.$$
A value of ASI that approaches 0 indicates a perfect symmetry in amplitude between the two vocal folds at the selected level of the glottis.
• Left-right phase symmetry index (PSI): When the two vocal folds reach their maximal opening at the same time, their vibrations can be described as phase-symmetric. The PSI is defined as the difference between the instants $t_R$ and $t_L$ at which the right and the left vocal folds respectively reach their maximal opening, divided by the mean vibration period $\bar{T}$ [7]:
$$PSI = \frac{t_R - t_L}{\bar{T}}.$$
A value of PSI that approaches 0 indicates a perfect symmetry in phase between the vocal folds.
d) Periodicity: The periodicity can be defined as the repetition of the same spatial-temporal vibratory behavior over many cycles. It is evaluated by assessing the time and amplitude periodicity.
• Time periodicity index (TPI): This is the ratio between the shorter and the longer cycle duration in two successive cycles [7]. We analyze the time periodicity in the right and the left vocal fold waveforms and determine the respective time periodicity indices $TPI_R$ and $TPI_L$ as follows:
$$TPI_R = \frac{\min(T^1_R, T^2_R)}{\max(T^1_R, T^2_R)}, \qquad TPI_L = \frac{\min(T^1_L, T^2_L)}{\max(T^1_L, T^2_L)},$$
where $T^1_R$ and $T^2_R$ (resp. $T^1_L$ and $T^2_L$) are the durations of two successive cycles in the right (resp. left) vocal fold waveform. The values range between 0 and 1. A vibration is perfectly periodic when the corresponding time periodicity index approaches 1.
• Amplitude periodicity index (API): This is the ratio between the smaller and the larger amplitude in two consecutive cycles. It is defined by:
$$API = \frac{\min(A_1, A_2)}{\max(A_1, A_2)},$$
where $A_1$ and $A_2$ are the sums of the right and the left amplitudes when the vocal folds reach their maximal closing, calculated in two successive cycles as illustrated in Fig. 2.
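The frequency, symmetry and periodicity measures described above can be sketched in a few lines, assuming that the quantities read from the waveform (maximum displacements, instants of maximal opening, cycle durations) are already available. The function names and the numerical values are illustrative assumptions; times are expressed in frames except for the period passed to the frequency function, which is in seconds.

```python
def fundamental_frequency(period_s):
    """Fundamental frequency (Hz) as the inverse of the vibration period (s)."""
    return 1.0 / period_s

def amplitude_symmetry_index(a_right, a_left):
    """ASI: difference of the maximum displacements divided by their sum."""
    return (a_right - a_left) / (a_right + a_left)

def phase_symmetry_index(t_right, t_left, mean_period):
    """PSI: lag between the instants of maximal opening over the mean period."""
    return (t_right - t_left) / mean_period

def periodicity_index(first, second):
    """TPI or API: smaller over larger value for two successive cycles."""
    return min(first, second) / max(first, second)

# Hypothetical measurements read from one pair of successive cycles.
print(amplitude_symmetry_index(11, 9))   # 0.1
print(phase_symmetry_index(40, 42, 20))  # -0.1
print(periodicity_index(20, 16))         # TPI from cycle durations: 0.8
print(periodicity_index(18, 20))         # API from summed amplitudes: 0.9
print(fundamental_frequency(0.005))      # about 200 Hz
```

The same `periodicity_index` function serves for both TPI and API because the two indices differ only in the quantity compared across cycles.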

III. RESULTS
In order to evaluate the accuracy of the objective assessment based on the OFW, both healthy and pathological phonation are explored.

A. Healthy vocal folds
In healthy phonation, the waveforms of the left and right vocal folds should ideally be as superimposed as possible, ensuring phase and amplitude symmetry. In addition, the same pattern should be observed over many cycles. Quantitatively, the P-values resulting from the ANOVA test should be greater than 0.05, the estimates of the fundamental frequencies of the left and right vibrations should be close to each other, the symmetry indices (ASI and PSI) should approach 0 and the periodicity indices (TPI and API) should approach 1. The quantitative measures obtained from the OFW are compared to those obtained from the analysis of the EGG signals and the DKG. The analysis of the EGG signal is performed according to the technique described in [14] and using the MOQ software. Tests are conducted on four different types of phonation using several laryngeal mechanisms on healthy vocal folds. The aim is to evaluate the accuracy of the objective measures computed through the analysis of the OFW. The P-values computed through the ANOVA test confirm the similarity of the vibration amplitudes of the right and the left vocal folds for all the tested mechanisms, for 97% of the cycles at the glottal level L = 25%, 94% of the cycles at L = 50% and 93% of the cycles at L = 75%. As shown in Tables I, II, III and IV, the fundamental frequency estimated with the proposed approach is the same for both vocal folds, and across the three glottal levels, in all the sequences related to healthy phonation. Moreover, the estimates are close to those obtained with DKG and EGG for all the sequences. To evaluate the periodicity of the vibrations, the TPI and API are used and their values are found to be above 0.8 for all the sequences. Note that the final decision about how healthy the vocal folds are is left to the clinician, based on the provided quantitative measures and their experience. The vibratory parameters measured using DKG on the one hand, and using the proposed approach on the other hand, have the same values in 70% of the tested situations, including the different glottal levels. In the remaining 30%, the difference between the values ranges between 0.02 and 0.14. The symmetry indices are 0 in the majority of cases and small multiples of 0.001 or 0.01 in some tests.
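The quantitative expectations for healthy phonation listed above can be turned into a simple screening sketch that flags the measures departing from them, leaving the decision to the clinician. Only the 0.05 significance level comes from the text; the 0.8 periodicity floor mirrors the values observed on the healthy sequences, and the frequency and symmetry tolerances are hypothetical placeholders, as are the function and parameter names.

```python
def flag_measures(p_value, f0_right, f0_left, asi, psi, tpi, api,
                  alpha=0.05, f0_tol_hz=1.0, sym_tol=0.1, per_min=0.8):
    """Return the names of the measures that depart from healthy expectations.

    alpha is the significance level quoted in the text; f0_tol_hz, sym_tol
    and per_min are illustrative assumptions, not validated thresholds.
    """
    flags = []
    if p_value <= alpha:
        flags.append("amplitude similarity")
    if abs(f0_right - f0_left) > f0_tol_hz:
        flags.append("fundamental frequency")
    if abs(asi) > sym_tol:
        flags.append("amplitude symmetry")
    if abs(psi) > sym_tol:
        flags.append("phase symmetry")
    if tpi < per_min:
        flags.append("time periodicity")
    if api < per_min:
        flags.append("amplitude periodicity")
    return flags

# Hypothetical healthy-looking measures: nothing is flagged.
print(flag_measures(0.40, 200.0, 200.0, 0.0, 0.01, 0.95, 0.9))  # []
```

Such a list is only a summary of which measures look atypical; it does not replace the clinical interpretation of the values.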

B. Pathological Vocal Folds
HSV recordings of different vocal fold disorders are used to evaluate the efficiency and the reliability of the OFW in assisting clinicians with the diagnosis. The first video shows a right true vocal cord paralysis whose cause is unknown (idiopathic). The video is sampled into 940 images corresponding to 9 cycles according to the cycle detection approach proposed in [11]. The waveforms of the right and the left VF are generated and the ANOVA test is carried out. The majority of the P-values are below the significance level, confirming the vibration asymmetries between the two VF. In addition, the estimated fundamental frequencies of the left and right waveforms differ and take suspicious values, especially at 25% of the glottis, as shown in Table V. These values suggest an anomaly, specifically in the right cord, where the frequency is 0.02 Hz at 25% and 9 Hz at the two other glottal levels, compared to 21 Hz at the left vocal cord for the three levels. Even though the fundamental frequency estimated by DKG also has a suspicious value compared to the frequency range of healthy VF, the OFW is found to be more sensitive to pathological phonation. Another suspicious value is that of the TPI, which is 0.5, whereas it is expected to approach 1 in healthy phonation.
The patient in the second video suffers from a cancer of the larynx involving the left vocal fold. The majority of the P-values are below the significance level. The values of the TPI and the API in Table VI suggest an aperiodicity of the VF vibrations. Also, the estimates of [...]
The third video shows a left true vocal cord paralysis resulting from injury to the vagus nerve during carotid endarterectomy surgery. The P-values of the ANOVA test are below the significance level for more than half of the cycles, which confirms the pathological aspect of the vibrations. In addition, the fundamental frequency estimates of the right and the left waveforms are close to each other, as shown in Table VII, but their values suggest an abnormality in the vibrations. The values taken by the periodicity parameters (TPI and API) also point to disordered VF.

IV. DISCUSSION
Numerous techniques proposed in the literature map the spatial-temporal information contained in an HSV recording into a representation suitable for the visual and the automatic assessment of the vibrations. Most of the state-of-the-art representations require additional processing, such as applying a segmentation technique, in order to quantify laryngeal parameters. In contrast, these parameters can be measured directly from the OFW. Moreover, the present study shows that the OFW makes it possible to quantify the most commonly used laryngeal parameters, namely the fundamental frequency of the vibrations, the amplitude/phase symmetry parameters and the time/amplitude periodicity, without the need for intermediate metrics specific to the OFW.
Healthy and pathological vibrations are explored in this study.
Regarding healthy phonation, different mechanisms used by different persons are considered. The quantification accuracy is evaluated against the measures obtained through the analysis of the corresponding EGG signals (for the evaluation of the fundamental frequency estimate) and the DKG. For all the tested sequences, the fundamental frequency estimated through the OFW is close to those estimated using the EGG signals and DKG. Also, the symmetry and periodicity measures obtained by analyzing the OFW are close to the measures obtained through DKG, for the three glottal levels.
Regarding the assessment of pathological vibrations, tests conducted on sequences of different disorders show the reliability of the OFW in revealing a vibratory anomaly. Furthermore, the OFW is shown to be more sensitive to pathological phonation than DKG.

V. CONCLUSION
The objective assessment of the vocal fold vibrations is the main focus of the present study. To this end, the optical flow based waveform is retained. Its generation does not require the segmentation of all the images of the HSV recording. More importantly, the present paper shows that quantifying the vocal fold vibrations with the OFW does not require additional processing, contrary to the majority of the state-of-the-art techniques. Furthermore, it is possible to quantify the commonly used laryngeal parameters. The results show the reliability of the OFW in quantifying the VF vibrations. Moreover, it is more sensitive to pathological phonation than DKG.

Fig. 1. Optical flow based waveforms at three levels of the glottal area. rOFW in red; -lOFW (in blue) is displayed instead of lOFW for clarity.

Fig. 2. The waveform parameters. rOFW in red, -lOFW in blue. $a_R$ and $a_L$: maximum displacements within one cycle of the right and left VF. $t_R$ and $t_L$: instants when the right and the left VF reach their maximal opening. $T^1_R$, $T^2_R$, $T^1_L$ and $T^2_L$: durations of two successive cycles of the right and the left vibrations.