Speech Impairments in Intellectual Disability: An Acoustic Study

Speech is the primary means of human communication. Speech production starts early in life and matures as children grow. People with intellectual or learning disabilities have deficits in speech production and face difficulties in communication. They need tailor-made therapies and training for rehabilitation so that they can lead their lives independently. To provide such training, it is important to know the exact nature of the speech impairment through acoustic analysis. This study calculated spectro-temporal features, relevant to brain structures and encoded at short and long timescales, in the speech of 82 subjects: 32 typically developing children, 20 adults and 30 participants with intellectual disabilities (severity ranging from mild to moderate). The results revealed that the short timescales, which encode information such as formant transitions, were significantly different between the typically developing and intellectually disabled groups, whereas the long timescales were similar across groups. The short timescales also differed significantly within the typically developing group but not within the intellectually disabled group. The findings suggest that the features encoded at short timescales, together with the short/long ratio, play a significant role in classifying the groups. It is shown that classifier models with good accuracy can be constructed using the acoustic features under investigation, indicating that these features are relevant in differentiating normal and disordered speech. Such classification models can help in the early diagnosis of intellectual or learning disabilities.

Keywords—speech development; spectro-temporal; intellectual disabilities; timescales; learning disability; classification model


INTRODUCTION
Humans use speech, the vocalized form of communication, to express their thoughts, emotions, necessities, etc. In speech research, psycholinguistics, auditory cognitive neuroscience and psychoacoustics have focused on the cues essential for the production and perception of speech sounds, along with the processing done by the brain. Language acquisition [1] requires perceiving sound, producing sound and relating the two. Speech perception starts at a very early stage, while the child is still in the mother's womb, but speech production starts with cooing and babbling in infants, and children take several years to become fluent speakers. Speech is the primary means used by children at the age of 2-3 years to express their thoughts, necessities and feelings and to create and maintain social relationships. As they grow, children master this skill in service of higher-level complex cognitive tasks, and they take years to learn these complex patterns [10]. Communication is rated as abnormal in children with learning and intellectual disabilities [11][12][13]. Investigating the nature of speech impairment in these children can aid in developing tailor-made therapies for their rehabilitation. Moreover, spotting a common pattern of speech-feature deficits in such disorders can help in classifying the group and aid early-stage diagnosis. In this study, we examined acoustic differences between a normal group and a group of people with mild to moderate intellectual disability to determine a pattern of speech impairments in this disability.
Intellectual disability is a condition involving deficits in basic intelligence and in the social and practical skills needed to execute day-to-day necessities. According to the American Association on Intellectual and Developmental Disabilities (AAIDD) [14], intellectual disability is characterized by limitations in both intellectual functioning (mental capacity: Intelligence Quotient (IQ) <= 70, approximately) and adaptive behaviour (conceptual, social and practical skills), originating before the age of 18 years. In DSM-5 [15], the severity of intellectual disability is measured by comparing functional ability with age-matched norms. It involves impairment of general mental abilities in three domains: the conceptual domain (language, reading, writing, reasoning, math, memory and knowledge); the social domain, which includes a lack of interpersonal communication skills, friendship, social judgment and empathy; and the practical domain, which includes deficits in personal care, responsibilities, money management, job duties and the organization of school tasks. To enhance the quality of life of these people, many rehabilitation methods and techniques specific to these skills have evolved [16,17]. The lack of one crucial adaptive skill, communication, is the most salient hurdle in the rehabilitation of the intellectually disabled population.
Children with mental retardation (including mildly and moderately impaired populations) lag in the phonological development [18][19][20] of their speech. These children also exhibit many articulatory deficits and delays in expressive language [21,22], and show significant limitations in grammar and syntax development [23,24] compared to age-matched controls. Several neuro-anatomical and neuro-imaging studies [25] have tried to correlate the disorder with impairments in different parts of the brain. All of these studies look at the phonological and linguistic aspects of the speech of normally developing children, normal adults and subjects with intellectual disabilities. There is a shortfall in providing a quantitative basis for finding a common pattern of speech impairment in subjects with intellectual disabilities. In speech perception, the auditory mechanism acts as a frequency analyzer. Numerous studies [2][3][4][5][6] analyse speech production development using spectral acoustic measures. One study [5] estimated pitch and intensity in the speech of children and adults to differentiate normal and intellectually disabled populations. However, a different line of research has shown that the temporal structure of speech plays a crucial role [7][8][9] in speech perception and production. These studies [26,27] have proposed that the amplitude envelope of speech between 20 msec and 500 msec carries information representing phonetic segment duration, formant transitions, place of articulation, stress and syllabicity. The spectro-temporal features [28][29][30][31][32][33] at these timescales correlate highly with speech intelligibility and the ability to understand or comprehend. Studies have reported impairments at short timescales [34][35][36] and long timescales [37,38,40] in the speech of children with neurodevelopmental disorders.
Most of these studies focused on impairments in the spectro-temporal features in the speech of subjects with intellectual disabilities, but none has developed a classification model for these groups based on features encoded at multiple timescales. The present study partially fills this gap by extracting spectro-temporal features at two timescales, short (25-50 msec) and long (100-500 msec), and then building a classification model based on these features.
The study was completed in two phases. In the first phase, the statistics of short and long timescales were calculated for all speech samples in each group. In the second phase, different classification models were built to classify the normal and intellectually disabled groups based on the statistics obtained in the first phase. All the models showed good accuracy, which demonstrates the discriminative power of the features investigated in the present study and indicates that these features can be used to design diagnostic and therapeutic tools for children with mild to moderate intellectual disabilities.

A. Subjects
The speech database included two groups, TD (typically developing) and ID (intellectually disabled), with ages between 5 and 20 years. The TD group was further divided into three subgroups: TD1, TD2 and TD3. A detailed description is provided in Table I.

B. Experimental settings and procedure
Approximately 3 minutes of speech was recorded per participant at a sampling rate of 22.5 kHz with 16-bit PCM (Pulse Code Modulation), using the head-fitted microphone of a Sony recorder (ICD-UX533F). The recording procedure consisted of a picture-naming task and a reading task. The picture-naming task included pictures of common animals, birds, vegetables, fruits and objects; the participant had to speak the name of the picture presented to them. The reading task comprised reading a phonetically rich article from a book. Children (ages 4-5 years) from the TD1 group and all participants of the ID group performed picture naming only: the children in TD1 were too young to read the book, and participants from the ID group could not read the book but could perform picture naming since they were familiar with the pictures. Participants of the TD2 and TD3 groups performed both tasks. The tasks were explained to all participants before their recordings were taken. The recordings were checked, and poor-quality samples were excluded from the experiment.

A. Spectro-temporal feature extraction
Speech is a complex signal that fluctuates rhythmically in time and timbrally in frequency. As mentioned earlier (Section I: Introduction), speech signals can be effectively analysed by investigating spectro-temporal features encoded at different timescales. In the present study, we extracted spectro-temporal features encoded at two timescales (short: 25-50 msec; long: 100-500 msec) and developed a classification model to differentiate the speech of normal children from that of children with mild to moderate intellectual disabilities. The block diagram and workflow of the proposed system are shown in Fig. 1. Each speech sample was pre-processed to remove background noise. A spectrogram was then generated for each speech sample using a 512-point FFT (Fast Fourier Transform) and a 22 msec time window. The resultant spectrogram image was pre-processed by applying filtering and thresholding to remove noise, yielding a binary image in which the spectro-temporal features are separated from their surroundings. These features were extracted using an 8-connected component algorithm [39]. Speech signals carry acoustic and linguistic information at multiple timescales [26, 37, 40]. For each speech sample, the spectro-temporal features were classified into two timescale ranges: short timescales (25-50 msec), which carry information such as stress, intonation, voicing and formant transitions, and long timescales (100-500 msec), which include features representing tempo, syllabicity and rhythm. As discussed earlier, the two timescales together define the prosodic information in the sample, hence it is meaningful to calculate the ratio of short to long timescales and consider it as one of the important attributes.
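The extraction pipeline described above can be illustrated with a minimal Python sketch. This is not the authors' implementation; it assumes SciPy's spectrogram and connected-component labelling, a simple peak-relative threshold (the `thresh_db` parameter is a hypothetical choice), and measures each component's duration to sort it into the short or long timescale range.

```python
import numpy as np
from scipy.signal import spectrogram
from scipy.ndimage import label

def timescale_counts(signal, fs=22050, thresh_db=-40.0):
    """Count spectro-temporal components in the short (25-50 ms)
    and long (100-500 ms) timescale ranges of a speech signal."""
    # 512-point FFT with a ~22 ms analysis window, as in the pipeline above
    nperseg = int(0.022 * fs)
    f, t, Sxx = spectrogram(signal, fs=fs, nperseg=nperseg, nfft=512)
    Sdb = 10 * np.log10(Sxx + 1e-12)
    # Threshold relative to the spectrogram peak -> binary image
    binary = Sdb > (Sdb.max() + thresh_db)
    # 8-connected component labelling (3x3 structure of ones = 8-connectivity)
    labels, n = label(binary, structure=np.ones((3, 3)))
    dt_ms = (t[1] - t[0]) * 1000.0  # frame hop in milliseconds
    short, long_ = 0, 0
    for k in range(1, n + 1):
        # Duration of component k = number of time frames it spans
        cols = np.flatnonzero((labels == k).any(axis=0))
        dur_ms = (cols.max() - cols.min() + 1) * dt_ms
        if 25 <= dur_ms <= 50:
            short += 1
        elif 100 <= dur_ms <= 500:
            long_ += 1
    return short, long_
```

The returned pair of counts, plus their ratio, forms the per-sample feature vector used later for classification.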
The statistics of short timescales, long timescales and their ratios were taken as the prime input attributes of the feature vector used for classification. To analyse the developmental pattern of these spectro-temporal features with age and to compare these patterns with the intellectually disabled population, different classification algorithms were constructed.

B. Classification techniques
The spectro-temporal features measured during phase I (see Fig. 1) served as input to the classifiers. The counts of spectro-temporal features at short and long timescales and their ratios were calculated for each speech sample in both groups. The resulting feature vector is shown in Table II; its attributes are the number of features in the short-timescale range, the number of features in the long-timescale range, their ratio and the age of the subject. These input attributes were used to train five types of classifiers to develop a predictive model for distinguishing the speech of normal and intellectually disabled subjects: k-Nearest Neighbour (k-NN), Naive Bayes, SVM (Support Vector Machine), decision tree and neural network. The statistical performance of learning on the dataset was estimated by applying 10-fold cross-validation; in each run the model was first trained and then evaluated, with 70% of the cases used for training and the remaining 30% for testing. k-NN is an instance-based learner that considers the k closest training examples in the feature space and, when k is 1, assigns an unknown example to the class of its single nearest neighbour. For the decision tree, gain ratio was used as the criterion for predicting the output attribute from the input attributes. Naive Bayes is a conditional probabilistic model that assumes the contributions of the attributes to the class label are independent. LibSVM was used with C-SVC and an RBF kernel for two-class classification; it maps the samples nonlinearly into a higher-dimensional space. The neural network was a feed-forward network trained by the back-propagation algorithm: an adaptive system that adjusts its weights, via forward propagation and weight updates, based on the information passed through it during the supervised learning phase.
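The five-classifier comparison can be sketched with scikit-learn as follows. This is an illustrative setup on synthetic feature vectors (short count, long count, ratio, age), not the study's data or exact configuration; note also that scikit-learn's decision tree offers the entropy (information gain) criterion rather than the gain ratio mentioned above, so that part is an approximation.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier

# Hypothetical synthetic feature vectors; the real study uses counts
# extracted from the speech spectrograms (see Table II).
rng = np.random.default_rng(0)
n = 100
short = np.r_[rng.normal(80, 10, n), rng.normal(55, 10, n)]
long_ = np.r_[rng.normal(30, 5, n), rng.normal(28, 5, n)]
age = rng.uniform(4, 20, 2 * n)
X = np.c_[short, long_, short / long_, age]
y = np.r_[np.zeros(n), np.ones(n)]  # 0 = TD, 1 = ID

models = {
    "k-NN": KNeighborsClassifier(n_neighbors=1),
    "Naive Bayes": GaussianNB(),
    "SVM (RBF, C-SVC)": SVC(kernel="rbf", C=1.0),
    "Decision tree": DecisionTreeClassifier(criterion="entropy"),
    "Neural network": MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000),
}
for name, model in models.items():
    # 10-fold cross-validated accuracy, as in the study
    acc = cross_val_score(model, X, y, cv=10).mean()
    print(f"{name}: {acc:.2%}")
```

With the real feature vectors substituted for the synthetic ones, the loop reproduces the kind of per-classifier accuracy comparison reported in Table IV.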
The objective of the current study is to classify the speech of the typically developing and intellectually disabled populations; the results are discussed in this section. Fig. 2 shows the number of spectro-temporal features at short and long timescales for two representative children (ages 4-5 years) from the TD1 and ID groups and two representative adults (male) from the TD3 and ID groups. Inspection of Fig. 2 reveals that short-timescale features are more numerous than long-timescale features. Table III presents the statistics of short and long timescales in the TD1, TD2, TD3 and ID groups. Significance testing was performed to examine these statistics. An unpaired t-test on the features encoded at short timescales between age-matched peers in the TD and ID groups showed a significant difference (p=0.005521), whereas the long-timescale measure (p=0.120905) showed no significant group effect. To check whether the short and long timescales were affected by the gender of the participant, unpaired t-tests between male and female subjects were performed for TD1, TD2 and TD3, and no significant difference was found between genders in these groups. These findings suggest that the short and long timescales in speech are not influenced by the speaker's gender. A one-way ANOVA across children aged 4-5 years, children aged 7-8 years and adults for the short timescales was performed in the intellectually disabled group, and no significant difference was found (F=0.472523, df=2, p=0.629). In the typically developing (TD) group, however, the short-timescale measures were significantly different (F=10.04, df=2, p=0.000677). A similar pattern was present for the ratio (short/long) of timescales. Hence, it can be concluded that the short timescales change with age in typically developing children, but this growth does not occur in intellectually disabled subjects. The changes in long timescales in the typically developing group were only borderline significant (F=3.43, df=2, p=0.048).
On the other hand, the long timescales develop relatively well in subjects with intellectual disabilities (F=9.176132, df=2, p=0.001368).
From these findings, we can conclude that features belonging to the long timescales (100-500 msec) develop earlier than those belonging to the short timescales (25-50 msec), whereas in the intellectually disabled population the spectro-temporal features at short timescales had not matured even in adults. Since it is important to understand the relationship between these timescales, the ratio (short/long timescales) was calculated and considered as one of the attributes. This differentiation between groups can be quantified further by classifying subjects into two classes, typically developing (TD) and intellectually disabled (ID). The feature vector for classification consisted of short timescales, long timescales, their ratio and age. Table II shows the feature vector used for classification with k-NN, Naive Bayes, SVM, decision tree and neural network. The statistical distribution of the attributes is shown in Table III.
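The significance tests reported above (unpaired t-test between groups, one-way ANOVA across age subgroups) can be sketched with SciPy. The numbers below are hypothetical placeholder samples standing in for the per-group counts summarised in Table III.

```python
import numpy as np
from scipy.stats import ttest_ind, f_oneway

rng = np.random.default_rng(1)
# Hypothetical short-timescale counts per group (placeholders for Table III)
td = rng.normal(80, 10, 30)
id_ = rng.normal(60, 10, 30)

# Unpaired (independent-samples) t-test, TD vs ID
t_stat, p_val = ttest_ind(td, id_)
print(f"t = {t_stat:.2f}, p = {p_val:.4g}")

# One-way ANOVA across the three TD age subgroups
td1 = rng.normal(60, 8, 15)
td2 = rng.normal(72, 8, 15)
td3 = rng.normal(82, 8, 15)
F_stat, p_anova = f_oneway(td1, td2, td3)
print(f"F = {F_stat:.2f}, df = 2, p = {p_anova:.4g}")
```

A small p-value in the t-test corresponds to the between-group difference reported for short timescales, while the ANOVA p-value corresponds to the within-TD developmental effect.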

A. Classification
The accuracy, class recall and class precision for the classification of typically developing (TD) and intellectually disabled (ID) subjects using the different classification techniques are listed in Table IV (a), (b), (c), (d) and (e). Good accuracy was obtained by all five classification techniques. The neural network and decision tree approaches showed better accuracy than the other three techniques. The neural network showed the highest accuracy of 95.28%, as shown in Table IV (e), whereas the k-NN model was the least accurate on this dataset among the five classification techniques. The decision tree approach gave better results than Naive Bayes, k-NN and SVM, with an accuracy of 91.39%. Fig. 3 provides the Receiver Operating Characteristic (ROC) curves for visualizing the performance of the different classification techniques. An optimistic ROC curve was calculated, which considers correctly classified examples before misclassified ones. From Fig. 3, it is clear that the decision tree approach has a larger area under the curve (AUC) than the neural network and the other three classification techniques. The next section provides the conclusion and future scope.
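An ROC curve and its AUC, as used in Fig. 3, can be computed from a held-out test set as follows. This is a generic scikit-learn sketch on synthetic two-class data, assuming the classifier exposes class probabilities to score the positive (ID) class; it is not the study's exact evaluation code.

```python
import numpy as np
from sklearn.metrics import roc_curve, auc
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical two-class data standing in for the TD/ID feature vectors
rng = np.random.default_rng(2)
X = np.r_[rng.normal(0, 1, (80, 4)), rng.normal(1, 1, (80, 4))]
y = np.r_[np.zeros(80), np.ones(80)]
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)

clf = DecisionTreeClassifier(max_depth=3).fit(Xtr, ytr)
scores = clf.predict_proba(Xte)[:, 1]       # score for the positive (ID) class
fpr, tpr, _ = roc_curve(yte, scores)        # false/true positive rates
roc_auc = auc(fpr, tpr)                     # area under the ROC curve
print(f"AUC = {roc_auc:.3f}")
```

Plotting `fpr` against `tpr` for each classifier gives curves comparable to those in Fig. 3, with larger AUC indicating better ranking of ID over TD samples.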

V. CONCLUSION AND FUTURE SCOPE
In the present study, the statistical properties of spectro-temporal features encoded at two different timescales were examined in the speech of normal and intellectually disabled groups. The spectro-temporal features encoded at short timescales developed well in typically developing children, but this development was inadequate in the speech of age-matched children with intellectual disabilities. The spectro-temporal features encoded at short and long timescales were tested on 52 normal participants and 30 participants with mild to moderate intellectual disabilities. The results revealed that the features at short timescales and the short/long ratio were significantly different between the intellectually disabled group and age-matched controls. These features were used to classify the intellectually disabled and normal populations by developing different classification models, with accuracies in the range of 90% and above achieved by the decision tree and neural network models. The system can be used as an early assessment aid for speech disorders and intellectual disabilities. In the future, we wish to apply this system to specific disorders such as autism, ADHD and SLI. The study can be made more robust by further classifying the intellectually disabled group into mild and moderate levels.