Abstract
The goal of our study is to reveal the verbal and non-verbal information contained in the speech of children with autism spectrum disorders (ASD). The participants were 30 children with ASD aged 5–14 years and 160 typically developing (TD) peers. The children with ASD were divided into two groups: children with developmental regression (ASD-1) and children with a developmental risk diagnosed at birth (ASD-2). Adult listeners (n = 220) were less successful with ASD children than with TD children at recognizing the meaning of words, at judging whether the meaning and intonation contour of a repeated word matched the sample, and at identifying the child's age and gender. The perceptual data are confirmed by acoustic features. We found significant differences in pitch values, vowel formant frequencies and formant energy between the ASD groups and between ASD and TD children, in both spontaneous speech and repeated words. Pitch values of stressed vowels were significantly higher in spontaneous speech than in repeated words for ASD-1 children, ASD-2 children, and TD children aged 7–12 years. Pitch values in the spontaneous speech of ASD-1 children were higher than in ASD-2 children. A coarticulation effect was shown for the repeated words of both ASD and TD children. Age-related changes in the acoustic features of ASD children's speech indicated that they were mastering clear articulation.
Keywords
- Acoustic features
- Children
- Typically developing
- Autism spectrum disorders
- Repetition speech
- Spontaneous speech
- Speech perception
1 Introduction
The study of the speech of children with autism spectrum disorders (ASD) faces two main problems. On the one hand, the communication impairments of children with ASD make it difficult to obtain speech material [1]; on the other hand, the acoustic features widely used to analyze the speech of typically developing (TD) children [2] do not fully reflect the specific characteristics of ASD speech. ASD is associated with atypical prosody, ranging from monotonous and machine-like to variable and exaggerated [3], and with an abnormal speech spectrum [4]. Some studies have revealed differences between ASD and TD children in average spectra [4] and in formant frequencies and their energy [5]. We assume that some of the reported features of ASD speech are associated with the recording situation, the language environment, and teaching methods. In our pilot study [5], we found clearer articulation and lower pitch values in repeated words than in words from the spontaneous speech of ASD children. We therefore use the repetition task as a model for studying the speech of children with ASD. The purpose of the present study is to reveal the verbal and non-verbal information in the speech features of children with ASD.
2 Method
2.1 Data Collection
The participants were 30 children with ASD (F84 according to ICD-10), with a chronological age of 5–14 years and a mental age of 4–7 years, and 160 TD peers (control group). For this study, the children with ASD were divided into groups according to their developmental history: children who showed developmental regression at the age of 1.5–3.0 years (first group, ASD-1) and children with a developmental risk diagnosed at birth (second group, ASD-2). For the children of the second group, ASD is a symptom of neurological diseases associated with brain damage. At the time of group assignment, the ASD groups did not differ significantly in Childhood Autism Rating Scale (CARS) [6] scores or in psychophysiological test results. Recordings were made at home, in the laboratory, in kindergarten, and at school. The recording situations were play with a standard set of toys, dialogues with the parents (for children with ASD) and with the experimenter, and repetition of words after the experimenter. Recordings were made with a Marantz PMD222 recorder and a Sennheiser e835S external microphone.
2.2 Data Analysis
Two types of speech analysis were performed: perceptual (by listeners) and spectrographic.
The aim of the perceptual study was to assess how well listeners (adult native speakers of Russian) recognize, from speech samples, whether the word repeated by the child corresponds to the sample in meaning and intonation contour, as well as the child's age and gender. The test sequences included words from spontaneous speech (n = 4 tests: ASD tests with 21 samples and TD tests with 30 samples) and repeated words (n = 14 tests of "adult sample – child response" pairs, 35 samples). The repetition tests contained words with the stressed vowels /a/, /i/, and /u/. Two types of test sequences were used: the first type (tests 1 and 3) included words with a minimal coarticulation effect on the stressed vowel, and the second type (tests 2 and 4) included words with a maximal coarticulation effect. The test sequences were presented to listeners (n = 220; age 18–46 years, mean 23.7 ± 6.8 years) for perceptual analysis. The listeners' everyday experience of interacting with children was not a significant factor, so all data are presented together.
In a special experiment, a group of adults (programming students) listened to two tests (ASD and TD, both with a minimal coarticulation effect). For this control group of listeners (n = 12; age 21–44 years, 24.4 ± 12.6 years), phonemic hearing, hearing thresholds, and the lateral asymmetry profile were determined. This information was needed to identify which individual characteristics of an adult listener (gender, age, experience with children, hearing, phonemic hearing, and ear dominance) have the greatest impact on the recognition of the speech of children with ASD.
Spectrographic analysis was carried out in the Cool Edit sound editor (Syntrillium Software Corp., USA). We analyzed and compared pitch values, maximum and minimum pitch, pitch range, formant frequencies (F1, F2, F3), energy, and duration for whole vowels and for the stationary part of vowels. These parameters were compared using the Mann-Whitney U test for /a/, /i/, and /u/ after the following consonants: /k/ and /d/ for /a/, /b/ and /g/ for /u/, and /t’/ for /i/. Formant triangles with apexes at the vowels /a/, /u/, and /i/ were plotted in F1–F2 coordinates, and their areas were calculated [7] and compared. The amplitudes (energy) of the pitch and of the first three formants of vowels were determined from the dynamic spectrogram. The formant amplitudes were normalized to the pitch amplitude as E0/En, where E0 is the amplitude of the pitch and En is the amplitude of Fn (n = 1, 2, 3) [5]. The correspondence of intonation contours in "adult sample – child response" pairs was analyzed in Praat v.6.20.
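As a rough illustration of the two derived measures described above, the sketch below computes the area of a vowel formant triangle from the (F1, F2) coordinates of /a/, /i/, and /u/, and the normalized amplitude ratios E0/En. These are assumptions on our part: the exact area procedure of [7] is not given here, so the standard shoelace formula for a triangle from its vertex coordinates is used; amplitudes are assumed to be linear (not dB) values; and the function names and all numbers are hypothetical, not data from this study.

```python
def triangle_area(a, i, u):
    """Area (Hz^2) of the vowel triangle whose vertices are the (F1, F2)
    points of /a/, /i/ and /u/, computed with the shoelace formula."""
    (x1, y1), (x2, y2), (x3, y3) = a, i, u
    return abs(x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2)) / 2.0


def amplitude_ratios(e0, formant_amplitudes):
    """Normalized amplitudes E0/En for n = 1, 2, 3, where e0 is the pitch
    amplitude and formant_amplitudes = [E1, E2, E3].
    Assumes linear amplitudes; with dB values a difference would be used."""
    return [e0 / en for en in formant_amplitudes]


# Hypothetical mean (F1, F2) values in Hz, for illustration only.
vowel_a = (1000.0, 1800.0)
vowel_i = (400.0, 3000.0)
vowel_u = (450.0, 1100.0)
print(triangle_area(vowel_a, vowel_i, vowel_u))  # 540000.0
print(amplitude_ratios(2.0, [1.0, 4.0, 0.5]))    # [2.0, 0.5, 4.0]
```

Because the absolute value is taken, the area does not depend on the order in which the three vowels are listed.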
All procedures were approved by the Health and Human Research Ethics Committee (HHS, IRB 00003875, St. Petersburg State University), and written informed consent was obtained from the parents of each child participant.
3 Results
3.1 Perceptual Data
Word meaning and intonation contour: The comparison showed that the majority of listeners (0.75–1.0 of listeners) recognized the meaning of 67% of the words of TD children aged 5–7 years, 73% of the words of TD children aged 7–12 years, 43% of the words of ASD-1 children, and 40% of the words of ASD-2 children in the test sequences containing words from spontaneous speech. For these TD children, multiple regression analysis showed that gender was associated with F0 values (Beta = 0.2163) and E2/E0 (Beta = −0.6819) (F(6,734) = 19.333, p < 0.000, R² = 0.1359), and age was associated with F1 values (Beta = −0.2122), F0 values (Beta = −0.1132), and E2/E0 (Beta = 0.1394) (F(6,737) = 95.256, p < 0.000, R² = 0.4368).
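The criterion "recognized by the majority of listeners (0.75–1.0)" can be made concrete with a short sketch: a word counts as recognized when at least 75% of the listeners identified its meaning, and each reported percentage is the share of such words in a test. The data layout (one list of booleans per word), the function name, and the responses are hypothetical and serve only to illustrate the calculation.

```python
def share_recognized(responses, threshold=0.75):
    """Fraction of words whose meaning was recognized by at least `threshold`
    of the listeners. `responses` holds one list per word, with one boolean
    per listener (True = meaning recognized)."""
    recognized = [sum(word) / len(word) >= threshold for word in responses]
    return sum(recognized) / len(recognized)


# Hypothetical responses of 4 listeners to 4 words.
responses = [
    [True, True, True, True],      # 1.00 of listeners
    [True, True, True, False],     # 0.75
    [True, False, False, False],   # 0.25
    [False, False, False, False],  # 0.00
]
print(share_recognized(responses))  # 0.5
```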
Judging whether the intonation contour of the child's repeated word corresponded to the sample was more difficult for the listeners than judging the correspondence of word meaning. For TD children, listeners recognized word meaning better than intonation contour in all tests (Fig. 1A, B).
The coarticulation context (tests 1, 3 vs. tests 2, 4) did not influence the recognition of word meaning or intonation contour. The exception was test 2, in which the correspondence of word meaning was recognized less often for ASD children (61.8%) than in test 1 (72.2%), test 3 (72.7%), and test 4 (72.2%).
Gender and age: The second task for the listeners in the repetition tests was to recognize the child's age and gender. In the TD tests, the speech of boys and girls was presented equally. Adults correctly identified the gender of TD children; the exception was the test of TD children aged 7–12 years, in which 11% of the speech samples of girls were attributed to boys. In the ASD tests, there were more speech samples from boys than from girls, yet listeners attributed more samples to girls (Fig. 2A).
Adults determined the age of TD children almost correctly, whereas they estimated the age of ASD children as lower than their real age (Fig. 2B). These data are confirmed by the results of the control perceptual experiment (Table 1), in which the listeners were presented with two tests, each containing repeated words of TD and ASD children aged 5–12 years.
Multiple regression analysis showed that child age was associated with average pitch values (F(4,272) = 4.077, p < 0.000; Beta = −0.5712, R² = 0.043). The predictors of child gender (F(5,271) = 11.2, p < 0.0000) were pitch values (Beta = 0.3081, R² = 0.1712, p < 0.000) and the values of F1 (Beta = −0.2087, p < 0.0004), F2 (Beta = −0.2573, p < 0.0011), and F3 (Beta = 0.1920, p < 0.02).
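The Beta, R², and F values reported above come from multiple regression analysis. The sketch below shows, under our own assumptions, how standardized coefficients (Beta) arise: all variables are z-scored, ordinary least squares is solved through the normal equations (no intercept is needed after standardization), and R² and the overall statistic F(k, n − k − 1) are derived from the fit. It uses only the Python standard library; the function names and toy data are hypothetical and are not the software or data used in the study.

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardize a list of numbers to mean 0 and (population) SD 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def standardized_regression(predictors, y):
    """Return (betas, r_squared) for y regressed on the predictor columns,
    with all variables z-scored so the coefficients are standardized Betas."""
    zx = [zscore(col) for col in predictors]
    zy = zscore(y)
    k = len(zx)
    xtx = [[sum(p * q for p, q in zip(zx[i], zx[j])) for j in range(k)] for i in range(k)]
    xty = [sum(p * q for p, q in zip(zx[i], zy)) for i in range(k)]
    betas = solve(xtx, xty)
    fitted = [sum(betas[i] * zx[i][t] for i in range(k)) for t in range(len(zy))]
    ss_res = sum((obs - fit) ** 2 for obs, fit in zip(zy, fitted))
    ss_tot = sum(obs ** 2 for obs in zy)  # zy has mean 0
    return betas, 1.0 - ss_res / ss_tot


def f_statistic(r_squared, k, n):
    """Overall F(k, n - k - 1) of a regression with k predictors and n observations."""
    return (r_squared / k) / ((1.0 - r_squared) / (n - k - 1))


# Hypothetical toy data: two predictors and one outcome.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8]
betas, r2 = standardized_regression([x1, x2], y)
print(betas, r2)
# Consistency check with values reported for TD children: R² = 0.4368, k = 6, df2 = 737.
print(round(f_statistic(0.4368, 6, 744), 1))  # 95.3
```

The last line shows that the reported R² = 0.4368 with six predictors and 737 residual degrees of freedom corresponds to F ≈ 95.3, in line with the reported F(6,737) = 95.256.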
The results of the special experiment showed that the listeners' hearing thresholds (left ear) were associated with the recognition of the correspondence of ASD children's word meaning to the sample (F(4,7) = 2.3752, p < 0.1499; Beta = 0.732, R² = 0.5758; multiple regression analysis). Spearman correlations (p < 0.05) were found between the adults' phonemic hearing (repetition of syllable triplets) and both the recognition of the correspondence of the intonation contour of ASD children's words to the sample (r = 0.673) and the determination of child age (r = 0.632). The listeners' experience of interacting with children was correlated (r = 0.657, p < 0.05) with the determination of TD children's age. The results of this perceptual experiment agree with the data obtained from the other listeners.
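The Spearman coefficients above are Pearson correlations computed on ranks. A minimal sketch, assuming the usual convention of average ranks for tied values; the listener scores and function names are hypothetical, not the study data. In practice a library routine such as SciPy's `scipy.stats.spearmanr` would normally be used.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson r of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores of 6 listeners: phonemic-hearing test vs. share of
# correctly judged intonation contours.
phonemic = [12, 15, 9, 18, 15, 11]
intonation = [0.55, 0.70, 0.40, 0.80, 0.65, 0.60]
print(round(spearman(phonemic, intonation), 3))
```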
A larger proportion of the control listeners (75–100%) judged the words of TD children to correspond to the sample in meaning (95.8%) and intonation contour (70.7%) than the words of ASD children (75% and 62.6%, respectively). The predictor of the listeners' recognition of the correspondence of ASD children's word meaning to the sample was the F1 value of the stressed vowels /a/, /i/, /u/ in the words (F(1,33) = 9.1548, p < 0.004; Beta = −0.4660, R² = 0.21717).
Fifteen words of ASD children from different tests that had been recognized by all listeners (probability 1.0) were included in a new test sequence. This test was presented to 20 adults aged 22–81 years (group 1: n = 6, 22–28 years; group 2: n = 7, 37–64 years; group 3: n = 7, 71–81 years). Listeners in the second age group recognized the meaning of the children's words best (0.75–1.0), compared with the third group (F(1,12) = 10.348, p < 0.007; Beta = −0.6804, R² = 0.4182; multiple regression analysis). The listeners' gender and experience of interacting with children did not influence the recognition of the meaning of ASD children's words.
3.2 Acoustic Features of TD vs. ASD Spontaneous vs. Repetition Child Speech
Pitch values of stressed vowels were significantly higher in spontaneous speech than in repeated words for ASD-1 children (p < 0.001, Mann-Whitney test), ASD-2 children (p < 0.01), and TD children aged 7–12 years (p < 0.05). Pitch values in the spontaneous speech of ASD-1 children were higher (p < 0.001) than in ASD-2 children (Fig. 3A).
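The comparisons above use the Mann-Whitney U test. As a hedged sketch (not the authors' software), the two-sided test can be written with average ranks and the large-sample normal approximation with a tie correction and no continuity correction; for small samples an exact test, such as the one offered by SciPy's `scipy.stats.mannwhitneyu`, would be preferable. The function names and pitch values are hypothetical and serve only as illustration.

```python
from collections import Counter
from math import erfc, sqrt


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test using the normal approximation with a
    tie correction (no continuity correction). Returns (U, p)."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    n = n1 + n2
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    sigma = sqrt(n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1))))
    z = (u - n1 * n2 / 2) / sigma
    return u, erfc(abs(z) / sqrt(2))  # two-sided p = 2 * (1 - Phi(|z|))


# Hypothetical mean pitch (Hz) of stressed vowels for one child group.
spontaneous = [310, 335, 298, 352, 341, 327, 360]
repeated = [285, 292, 301, 279, 310, 288, 295]
u, p = mann_whitney_u(spontaneous, repeated)
print(u, round(p, 4))
```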
Differences (p < 0.01) in the pitch values of stressed vowels in spontaneous speech were found between TD boys and girls aged 7–12 years. Across all ages, pitch in spontaneous speech was high in ASD-1 and ASD-2 children (Fig. 3B). The first two formant frequencies (the acoustic cues for vowel recognition) are less correlated with the individual characteristics of the child (gender and age) than their energy (Table 2).
This finding allows data from different types of child speech to be compared without accounting for individual age and gender. The formant triangles of vowels in spontaneous speech differ from those in repeated words (Fig. 4A, B), and the formant triangles for repeated words had a larger area than those for spontaneous speech (Fig. 4E). Shifts in the values of the first two formants, displacing the formant triangles into the higher-frequency region, were seen for the vowels of ASD-1 children relative to ASD-2 children in spontaneous speech (Fig. 4A), and for ASD-2 children relative to ASD-1 children in repeated words (Fig. 4B). Vowels from words with a maximal coarticulation effect occupy a larger area on the F1–F2 plot (Fig. 4D), and their formant triangles have a larger area (Fig. 4F) than those of vowels with a minimal coarticulation effect (Fig. 4C, F).
We found specific features in the dynamic spectrum of stressed vowels in the words of ASD children. The energy of the third formant was higher (p < 0.001) in ASD children than in TD children, and this difference was more pronounced in spontaneous speech than in repeated words.
3.3 Acoustic Features of the Repetition Words: Longitudinal Data
To test the assumption that articulation is clearer in repeated words and improves with learning, we compared the acoustic characteristics of stressed vowels in words repeated by five children on two occasions, one year apart. Significant differences were found in vowel pitch values and in the values of the three formant frequencies for the same words repeated by each child at the two sessions (Table 3).
There is no coarticulation effect for words with the stressed vowel /i/. Significant effects were found for the stressed vowel /a/ on F0 values (p < 0.002) and for the stressed vowel /u/ on F0 values (p < 0.01) and F3 values (p < 0.006). At the second session, all children accurately repeated a larger number of words in meaning (p < 0.01) and intonation (p < 0.01). The energy of the third formant of vowels from repeated words with both minimal and maximal coarticulation effects differed between the first and second sessions. Figure 5 presents data for child 4, who was severely autistic according to the CARS score and showed the greatest developmental progress (Fig. 5A), and for child 3, who was mildly autistic and showed the least developmental progress (Fig. 5B).
4 Discussion
Adults recognized the gender, age, and word meaning, as well as the correspondence of the meaning and intonation contour of the child's repeated word to the sample, less reliably for ASD children than for TD children. The perceptual data are confirmed by acoustic features. We found significant differences in pitch values, vowel formant frequencies and energy between the ASD groups and between ASD and TD children, in both spontaneous speech and repeated words.
Adults estimated the age of children with ASD as lower than their actual age, which corresponds to the higher pitch values of ASD children; the association between child age and pitch values has been shown in previous studies [7, 8]. Data on specific non-developmental phonetic and phonological errors in 5–13-year-olds with ASD [9] are consistent with our finding that the word meaning of ASD children is recognized worse than that of TD children. Listeners recognized the meaning of TD children's words better than their intonation contour. We found that ASD children are able to repeat an intonation contour. This was surprising, because for the emotional speech of our participants with ASD, the correlation between emotional state and intonation contour that is typical of TD children [10] was not found [11]. Our data agree with a study of the imitation of prosodic patterns by ASD and TD children that used a more complex task, the PEPS-C program [12].
Higher pitch values have been described in some studies of the spontaneous speech of ASD children [3]. In our work, high pitch values were found in both spontaneous speech and repeated words. The repetition task leads to lower pitch values and clearer articulation, as reflected in the spectral characteristics of ASD children's speech. The repetition task is relevant for ASD children because it builds on a developmental feature of their speech, echolalia [1]. Correct repetition requires mastering the motor program for articulation and using verbal memory, which makes this task suitable for speech training. According to [13], echolalia can be used for communication in speech-language intervention. The repetition task is also one way to obtain speech material from children with ASD for our future studies, including speech corpora for automatic recognition. The main contribution of this work is the identification of spectral features, the coarticulation effect, and longitudinal changes in the speech of ASD children.
5 Conclusions
The perceptual experiment showed that listeners recognized the age and gender of ASD and TD children, the meaning of words, and the correspondence of the meaning and intonation contour of the child's repeated word to the sample, with lower recognition for ASD children. We found significant differences in pitch values, vowel formant frequencies and energy between the ASD groups and between ASD and TD children, in both spontaneous speech and repeated words. Pitch values of stressed vowels were significantly higher in spontaneous speech than in repeated words for ASD and TD children, and higher in the spontaneous speech of ASD-1 than of ASD-2 children. A coarticulation effect was shown for the repeated words of ASD and TD children. Age-related changes in the acoustic features of ASD children's speech indicated that they were mastering clear articulation.
References
1. Kanner, L.: Autistic disturbances of affective contact. Nerv. Child 2, 217–250 (1943)
2. Vorperian, H., Kent, R.D.: Vowel acoustic space development: a synthesis of acoustic and anatomic data. J. Speech Lang. Hear. Res. 50(6), 1510–1545 (2007). doi:10.1044/1092-4388(2007/104)
3. Fusaroli, R., Lambrechts, A., Bang, D., Bowler, D.M., Gaigg, S.B.: Is voice a marker for Autism spectrum disorder? A systematic review and meta-analysis. Autism Res. 10(3), 384–407 (2017). doi:10.1002/aur.1678
4. Bonneh, Y.S., Levanov, Y., Dean-Pardo, O., Lossos, L., Adini, Y.: Abnormal speech spectrum and increased pitch variability in young autistic children. Front. Hum. Neurosci. 4(237), 1–7 (2011). doi:10.3389/fnhum.2010.00237
5. Lyakso, E., Frolova, O., Grigorev, A.: A comparison of acoustic features of speech of typically developing children and children with autism spectrum disorders. In: Ronzhin, A., Potapova, R., Németh, G. (eds.) SPECOM 2016. LNCS, vol. 9811, pp. 43–50. Springer, Cham (2016). doi:10.1007/978-3-319-43958-7_4
6. Schopler, E., Reichler, R.J., DeVellis, R.F., Daly, K.: Toward objective classification of childhood autism: childhood autism rating scale (CARS). J. Autism Dev. Disord. 10(1), 91–103 (1980)
7. Lyakso, E.E., Grigor’ev, A.S.: Dynamics of the duration and frequency characteristics of vowels during the first seven years of life in children. Neurosci. Behav. Physiol. 45(5), 558–567 (2015). doi:10.1007/s11055-015-0110-z
8. Lee, S., Potamianos, A., Narayanan, S.: Acoustics of children’s speech: developmental changes of temporal and spectral parameters. J. Acoust. Soc. Am. 105(3), 1445–1468 (1999)
9. Cleland, J., Gibbon, F.E., Peppé, S.J., O’Hare, A., Rutherford, M.: Phonetic and phonological errors in children with high functioning autism and Asperger syndrome. Int. J. Speech Lang. Pathol. 12(1), 69–76 (2010)
10. Lyakso, E., Frolova, O., Dmitrieva, E., Grigorev, A., Kaya, H., Salah, A.A., Karpov, A.: EmoChildRu: emotional child Russian speech corpus. In: Ronzhin, A., Potapova, R., Fakotakis, N. (eds.) SPECOM 2015. LNCS, vol. 9319, pp. 144–152. Springer, Cham (2015). doi:10.1007/978-3-319-23132-7_18
11. Lyakso, E., Frolova, O., Grigorev, A., Sokolova, V., Yarotsaja, K.: Reflection of the emotional state in verbal and nonverbal behavioral of normally developing children and children with autism spectrum disorders. In: Proceedings of the 17th European Conference on Developmental Psychology, 2015, pp. 93–98. Medimond Publishing Company (2016). S908C0327
12. Diehl, J.J., Paul, R.: Acoustic differences in the imitation of prosodic patterns in children with autism spectrum disorders. Res. Autism Spectr. Disord. 6(1), 123–134 (2012). doi:10.1016/j.rasd.2011.03.012
13. Saad, A.G., Goldfeld, M.: Echolalia in the language development of autistic individuals: a bibliographical review. Pro. Fono. 21(3), 255–260 (2009). doi:10.1590/S0104-56872009000300013
Acknowledgements
This study was financially supported by the Russian Foundation for Basic Research (projects 15-06-07852a and 16-06-00024a) and the Russian Foundation for Humanitarian Studies (project 17-06-0053a).
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Lyakso, E., Frolova, O., Grigorev, A. (2017). Perception and Acoustic Features of Speech of Children with Autism Spectrum Disorders. In: Karpov, A., Potapova, R., Mporas, I. (eds) Speech and Computer. SPECOM 2017. Lecture Notes in Computer Science, vol. 10458. Springer, Cham. https://doi.org/10.1007/978-3-319-66429-3_60
DOI: https://doi.org/10.1007/978-3-319-66429-3_60
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-66428-6
Online ISBN: 978-3-319-66429-3
eBook Packages: Computer Science, Computer Science (R0)