1 Introduction

The study of speech in children with autism spectrum disorders (ASD) faces two main problems. On the one hand, impaired communication in children with ASD makes it difficult to obtain speech material [1]; on the other, the acoustic features widely used for speech analysis of typically developing (TD) children [2] do not fully reflect the specificity of ASD child speech. ASD is associated with differences in prosody production, ranging from monotonous, machine-like speech to variable, exaggerated speech [3], and with an abnormal speech spectrum [4]. Some works have revealed differences between ASD and TD children in average spectra [4] and in formant frequencies and their energy [5]. We assume that some of the noted features of ASD child speech are associated with the recording situation, the language environment, and teaching methods. In our pilot study [5] we found clearer articulation and lower pitch values in repeated words than in words from spontaneous speech in ASD children. At present we use the repetition task as a model for research on the speech of ASD children. The purpose of our study is to reveal verbal and non-verbal information in the speech features of ASD children.

2 Method

2.1 Data Collection

The participants were 30 children with ASD (F84 according to ICD-10), with a biological age of 5–14 years and a mental age of 4–7 years, and 160 TD coevals (control). For this study the ASD participants were divided into two groups according to developmental features: presence of developmental reversals at the age of 1.5–3.0 years (first group, ASD-1) and developmental risk diagnosed at birth (second group, ASD-2). For these children, the ASD is a symptom of neurological diseases associated with brain damage. The ASD groups did not differ significantly in Childhood Autism Rating Scale (CARS) [6] scores or in psychophysiological tests at the stage when the children were divided into groups. Recordings were made at home, in the laboratory, in kindergarten, and at school. The recording situations were play with a standard set of toys, dialogues with parents (for ASD children) and with the experimenter, and word repetition after the experimenter. The recordings were made with a “Marantz PMD222” recorder and a “Sennheiser e835S” external microphone.

2.2 Data Analysis

Two types of experimental methods of speech analysis were performed: perceptual (by listeners) and spectrographic.

The aim of the perceptual study was to examine how listeners (adult native speakers of Russian) recognize, on the basis of speech samples, the correspondence of the word repeated by the child to the sample in meaning and intonation contour, as well as the child’s age and gender. The test sequences included words from spontaneous speech (n = 4 tests, with 21 samples in the ASD tests and 30 samples in the TD tests) and repetition words (n = 14 tests of “adult sample – child response” pairs, with 35 samples). Repetition tests contained words with the stressed vowels /a/, /i/, and /u/. We used two types of test sequences: the first type (tests 1, 3) included words with minimum coarticulation effect for the stressed vowels; the second type (tests 2, 4) included words with maximum coarticulation effect. The test sequences were presented to listeners (n = 220, aged 18–46 years, 23.7 ± 6.8 y) for perceptual analysis. The factor of the adult’s experience of interaction with children (at the household level) was not significant, so all data are presented together.

A special experiment involved a group of adults (programming students) listening to two tests (ASD and TD, with minimum coarticulation effect). For this control group of listeners (n = 12, age 24.4 ± 12.6 y, range 21–44 y), phonemic hearing, hearing thresholds, and the lateral asymmetry profile were determined. This information was needed to determine which individual characteristics of an adult (gender, age, experience with children, hearing, phonemic hearing, leading hemisphere by ear) have the greatest impact on the recognition of the speech of children with ASD.

Spectrographic analysis of speech was carried out in the Cool Edit sound editor (Syntrillium Software Corp., USA). We analyzed and compared pitch values, maximum and minimum pitch values, pitch range, formant frequencies (F1, F2, F3), energy, and duration for vowels and for the stationary part of vowels. The same parameters were compared using the Mann-Whitney U test for /a/, /i/, and /u/ after the following consonants: /k/ and /d/ for /a/, /b/ and /g/ for /u/, and /t’/ for /i/. Formant triangles with apexes corresponding to the vowels /a/, /u/, and /i/ were plotted in F1, F2 coordinates, and their areas were calculated [7] and compared. The amplitudes (energy) of the pitch and of the first three formants of vowels were determined from the dynamic spectrogram. The formant amplitudes normalized to the amplitude of the pitch (En/E0, where E0 is the amplitude of the pitch and En is the amplitude of Fn, n = 1, 2, 3) were calculated [5]. The correspondence of intonation contours in “adult sample – child response” pairs was analyzed in the Praat program v.6.20.
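As an illustration of the two derived measures described above, the following minimal sketch computes the area of a vowel formant triangle in F1, F2 coordinates and the formant amplitudes normalized to the pitch amplitude. It assumes the area is obtained with the standard shoelace formula and that the normalized amplitude is the ratio En/E0, as on the vertical axis of Fig. 5; the exact procedure of [7] and [5] may differ. The function names and all numeric values are hypothetical and are not data from the study.

```python
def triangle_area(a, i, u):
    """Area of the vowel triangle with apexes /a/, /i/, /u/.

    Each apex is an (F1, F2) pair in Hz; the shoelace formula is an
    assumption, since the source does not spell out the method.
    """
    (x1, y1), (x2, y2), (x3, y3) = a, i, u
    return abs(x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2)) / 2.0


def normalized_amplitudes(e0, formant_amplitudes):
    """Return En/E0 for each formant amplitude En, given the pitch amplitude E0."""
    if e0 == 0:
        raise ValueError("pitch amplitude E0 must be non-zero")
    return [en / e0 for en in formant_amplitudes]


# Hypothetical illustrative values, not measurements from the paper.
vowel_a = (900.0, 1500.0)   # (F1, F2) of /a/
vowel_i = (400.0, 2800.0)   # (F1, F2) of /i/
vowel_u = (450.0, 1000.0)   # (F1, F2) of /u/

area = triangle_area(vowel_a, vowel_i, vowel_u)
ratios = normalized_amplitudes(60.0, [45.0, 30.0, 15.0])  # E1/E0, E2/E0, E3/E0
```

Because the area is computed directly from Hz values, it is expressed in Hz squared; any rescaling to the "conventional units" used in Fig. 4 would be an additional step not shown here.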

All procedures were approved by the Health and Human Research Ethics Committee (HHS, IRB 00003875, St. Petersburg State University), and written informed consent was obtained from the parents of each child participant.

3 Results

3.1 Perceptual Data

Word’s meaning and intonation contour: In the test sequences containing words from spontaneous speech, the majority of listeners (recognition probability 0.75–1.0) recognized the meaning of 67% of the words of TD children aged 5–7 years, 73% of the words of TD children aged 7–12 years, 43% of the words of ASD-1 children, and 40% of the words of ASD-2 children. For the TD children, multiple regression analysis showed that gender was associated, F(6,734) = 19.333, p < 0.000, R2 = 0.1359, with F0 values (Beta = 0.2163) and E2/E0 (Beta = −0.6819); age was associated, F(6,737) = 95.256, p < 0.000, R2 = 0.4368, with F1 values (Beta = −0.2122), F0 values (Beta = −0.1132), and E2/E0 (Beta = 0.1394).
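The recognition percentages above can be read as a two-step computation: a recognition probability per word across listeners, then the share of words whose probability falls in the 0.75–1.0 range. The sketch below illustrates that reading under stated assumptions: responses are coded as booleans per listener, and the thresholds are applied inclusively. The function names and the toy response matrix are hypothetical.

```python
def recognition_rates(responses):
    """Per-word recognition probability.

    responses[w][l] is True if listener l recognized word w.
    """
    return [sum(word) / len(word) for word in responses]


def share_recognized_by_majority(responses, low=0.75, high=1.0):
    """Percentage of words whose recognition probability lies in [low, high]."""
    rates = recognition_rates(responses)
    hits = sum(1 for rate in rates if low <= rate <= high)
    return 100.0 * hits / len(rates)


# Hypothetical toy data: 4 words judged by 4 listeners.
responses = [
    [True, True, True, True],     # probability 1.00
    [True, True, True, False],    # probability 0.75
    [True, False, False, True],   # probability 0.50
    [False, False, True, False],  # probability 0.25
]

share = share_recognized_by_majority(responses)
```

With this toy matrix, two of the four words reach the 0.75–1.0 range, so the share is 50%.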

Determining whether the intonation contour of the child’s repeated word corresponded to the sample was more difficult for the listeners than determining the meaning of the word. In all tests, listeners recognized the meaning of TD children’s words better than the intonation contour (Fig. 1A, B).

Fig. 1. Correspondence of the word repeated by the child to the sample in meaning (A) and intonation contour (B)

The coarticulation context (tests 1, 3 vs. tests 2, 4) did not influence the recognition of the word’s meaning or intonation contour. The exception was test 2, in which the correspondence of word meaning for ASD children was recognized less often (61.8%) than in test 1 (72.2%), test 3 (72.7%), and test 4 (72.2%).

Gender and age: The second task for listeners in the repetition tests was to recognize the child’s age and gender. In the TD tests the speech of boys and girls was represented equally. Adults identified the gender of the TD children correctly. The exception was the test of TD children aged 7–12 years, in which 11% of the speech samples belonging to girls were attributed to boys. In the tests of ASD children the number of speech samples from boys was greater than that from girls, but listeners attributed a greater number of speech samples to girls (Fig. 2A).

Fig. 2. A: Percentages of boys’ and girls’ speech samples in the test sequences and of samples perceived by listeners as male and female. B: Age of TD and ASD children as recognized by listeners. Horizontal axis: child’s real age; vertical axis: age of the child indicated by the listeners

The age of TD children was determined by adults almost correctly, whereas listeners judged ASD children to be younger than their real age (Fig. 2B). These data are confirmed by the results of the control perceptual experiment (Table 1). Two tests were presented to the listeners; each test contained repetition speech of TD and ASD children aged 5–12 years.

Table 1. Boys’ and girls’ speech samples in the control test sequences and samples perceived by listeners as male and female, percentages

Multiple regression analysis showed that child age was associated with average pitch values, F(4,272) = 4.077, p < 0.000 (Beta = −0.5712, R2 = 0.043). The predictors of child gender, F(5,271) = 11.2, p < 0.0000, were pitch values (Beta = 0.3081, R2 = 0.1712, p < 0.000) and the values of F1 (Beta = −0.2087, p < 0.0004), F2 (Beta = −0.2573, p < 0.0011), and F3 (Beta = 0.1920, p < 0.02).

The special experiment showed that listeners’ hearing thresholds (left ear) influenced the recognition of the correspondence of ASD children’s word meaning to the sample, F(4,7) = 2.3752, p < 0.1499 (Beta = 0.732, R2 = 0.5758; multiple regression analysis). Correlations (Spearman, p < 0.5) were revealed between the adult’s phonemic hearing (repetition of syllable triples) and both the recognition of the correspondence of the intonation contour of ASD children’s words to the sample (r = 0.673, p < 0.5) and the determination of the child’s age (r = 0.632, p < 0.5). The listeners’ experience of interaction with children influenced (r = 0.657, p < 0.5) the determination of TD children’s age. The results of this perceptual experiment correspond to the data obtained from the other listeners.

The majority of control listeners (75–100%) judged a larger share of TD children’s words as corresponding to the sample in meaning (95.8%) and intonation contour (70.7%) than of ASD children’s words (75% and 62.6%, respectively). The predictor of listeners’ recognition of the correspondence of ASD children’s word meaning to the sample, F(1,33) = 9.1548, p < 0.004 (Beta = −0.4660, R2 = 0.21717), was the value of F1 for the stressed vowels /a/, /i/, /u/ in the words.

Fifteen words of ASD children from different tests that had been recognized by all listeners (probability 1.0) were included in a new test sequence. This test was listened to by 20 adults aged 22 to 81 years (group 1: n = 6, 22–28 years; group 2: n = 7, 37–64 years; group 3: n = 7, 71–81 years). The second age group recognized the meaning of the children’s words (range 0.75–1.0) better than the third, F(1,12) = 10.348, p < 0.007 (Beta = −0.6804, R2 = 0.4182; multiple regression analysis). Adult gender and experience of interaction with children did not influence the recognition of the meaning of ASD children’s words.

3.2 Acoustic Features of TD vs. ASD Spontaneous vs. Repetition Child Speech

Pitch values of stressed vowels were significantly higher in spontaneous speech than in repetition words for ASD-1 children (p < 0.001, Mann-Whitney test), ASD-2 children (p < 0.01), and TD children aged 7–12 years (p < 0.05). Pitch values in the spontaneous speech of ASD-1 children were higher (p < 0.001) than in that of ASD-2 children (Fig. 3A).
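For readers unfamiliar with the test behind these comparisons, the following is a minimal sketch of a two-sided Mann-Whitney U test on two sets of pitch values. It is not the authors' procedure: the statistical software used in the study is not stated, and this sketch assumes a normal approximation for the p-value without tie or continuity correction, which is only a rough guide for small samples. The function names and the pitch values are hypothetical.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2.0 + 1.0
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U, two-sided p) using a normal approximation (assumption)."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2.0
    u = min(u1, n1 * n2 - u1)
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return u, p


# Hypothetical pitch values in Hz, not data from the study.
spontaneous = [310.0, 325.0, 340.0, 355.0, 360.0]
repetition = [270.0, 280.0, 290.0, 300.0, 305.0]
u_stat, p_value = mann_whitney_u(spontaneous, repetition)
```

In this toy example every spontaneous value exceeds every repetition value, so U is 0 and the approximate p-value falls below 0.05.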

Fig. 3. Pitch values (median) of stressed vowels from words for TD children aged 5–7 years, TD children aged 7–12 years, ASD-1 and ASD-2 children (A); pitch values of vowels from words for TD children (white), ASD-1 (black), and ASD-2 (gray) in spontaneous speech (B). *: p < 0.05; ++, **: p < 0.01; ***: p < 0.001 (Mann-Whitney test). Differences between: *** ASD-1 sp. vs. ASD-1 rep.; ** ASD-2 sp. vs. ASD-2 rep.; ** ASD-1 rep. vs. ASD-2 sp.; * ASD-2 rep. vs. ASD-1 rep.; * TD 7–12 sp. vs. TD rep.; +++ ASD-1 sp. vs. TD (sp. & rep.); ++ ASD-2 sp., rep. vs. TD (sp. & rep.); + ASD-2 rep. vs. TD (sp. & rep.)

Differences (p < 0.01) in the pitch values of stressed vowels in spontaneous speech were revealed between TD boys and girls aged 7–12 years. Across the ages of all children, the pitch in spontaneous speech is high in ASD-1 and ASD-2 children (Fig. 3B). The first two formant frequencies (acoustic keys for vowel recognition) are less correlated with the individual characteristics of the child (gender and age) than their energy (Table 2).

Table 2. Correlation between acoustic features of child speech and child gender and age, multiple regression analysis

This finding allows data from different types of child speech to be compared without regard to individual age and gender. The formant triangles of vowels for spontaneous speech differ from those for the repetition words (Fig. 4A, B). The area of the formant triangles was larger for the repetition words than for spontaneous speech (Fig. 4E). Shifts in the values of the first two formants of vowels, displacing the formant triangles into the higher-frequency area, were seen for the vowels of ASD-1 children vs. ASD-2 peers in spontaneous speech (Fig. 4A) and of ASD-2 vs. ASD-1 children in the repetition words (Fig. 4B). The vowels from words with maximum coarticulation effect occupy a larger area on the two-coordinate plot (Fig. 4D), and their formant triangles have a larger area (Fig. 4F) than those of the vowels with minimum coarticulation effect (Fig. 4C, F).

Fig. 4. Vowel formant triangles with apexes /a/, /u/, /i/ for spontaneous words (A), repetition words from tests (B), repetition words with minimum coarticulation effect for vowels (C), and repetition words with maximum coarticulation effect for vowels (D). Areas of vowel formant triangles for repetition and spontaneous speech (E) and for repetition words with minimum and maximum coarticulation effect for vowels (F), in conventional units. Horizontal axis values are F1, Hz; vertical axis values are F2, Hz (A, B, C, D)

We found specific features of the dynamic spectrum of the stressed vowels from the words of ASD children. The energy of the third formant was higher (p < 0.001) in ASD children than in TD children. This characteristic was more pronounced in spontaneous speech than in repetition words.

3.3 Acoustic Features of the Repetition Words: Longitudinal Data

To test the assumption of clearer articulation in repetition words and its positive dynamics with learning, we compared the acoustic characteristics of stressed vowels in words repeated by five children twice, at an interval of one year. Significant differences in vowel pitch values and in the values of the three formant frequencies were shown between the two repetitions of the same word by each child (Table 3).

Table 3. Differences between corresponding features of vowels from the same word repeated by the child twice at an interval of one year, Mann-Whitney U test

No coarticulation effect was found for words with the stressed vowel /i/. A significant effect was revealed for the stressed vowel /a/ on the values of F0 (p < 0.002), and for words with the stressed vowel /u/ on the values of F0 (p < 0.01) and F3 (p < 0.006). All children accurately repeated a larger number of words in meaning (p < 0.01) and intonation (p < 0.01) at the second repetition. The energy of the third formant for vowels from repetition words differs between the first and second testing, for both minimum and maximum coarticulation effect. Fig. 5 presents data for child 4, severely autistic according to the CARS score, with maximal developmental progress (Fig. 5A) and for child 3, mildly autistic, with minimal developmental progress (Fig. 5B).

Fig. 5. Distribution of the first three formant amplitudes normalized to the amplitude of the pitch for the vowel /a/ in repetition words for two children (A, B). Vertical axis: En/E0 (normalized amplitude); horizontal axis: F0 and formants (F1, F2, F3). Min and max: coarticulation effect; 1: first testing, 2: second testing. **: p < 0.01, ***: p < 0.001, Mann-Whitney U test

4 Discussion

Adults were shown to be able to recognize the gender and age of children, the meaning of their words, and the correspondence of the child’s repeated word to the sample in meaning and intonation contour, with lower recognition for ASD children. The perception data are confirmed by acoustic features. We found significant differences in pitch values, vowel formant frequencies, and energy between the ASD groups and between ASD and TD children in spontaneous speech and repetition words.

Adults judged children with ASD to be younger than their actual age, which correlated with the higher pitch values of ASD children. The association between child age and pitch values has been shown in studies [7, 8]. Data on specific non-developmental phonetic and phonological errors of 5–13-year-olds with ASD [9] support our results on the worse recognition of word meaning for ASD children vs. TD children. The meaning of the words of TD children was recognized by listeners better than the intonation contour. We revealed the ability of ASD children to repeat the intonation contour. This was surprising, because for the emotional speech of our participants with ASD the correlation between emotional state and intonation contour that is specific to TD children [10] was not revealed [11]. Our data are supported by a study of the imitation of prosodic patterns by ASD and TD children using a more complex task in the PEPS-C program [12].

Higher pitch values have been described in some studies for the spontaneous speech of ASD children [3]. In our work high pitch values were shown for both spontaneous speech and repetition words. The repetition task leads to a decrease in pitch values and to clearer articulation, which is consistent with the spectral characteristics of ASD children’s speech. The repetition task is relevant for ASD children; it is based on a developmental specificity of their speech, echolalia [1]. Correct repetition requires mastering the motor program for articulation and using verbal memory, which allows this task to be used for speech training. According to [13], echolalia can be used for communication in speech-language intervention. The repetition task is one of the ways to obtain speech material from children with ASD for our future studies, including speech corpora for automatic recognition. The contribution of our work is the revealing of spectral features, the coarticulation effect, and longitudinal data for the speech of ASD children.

5 Conclusions

The perceptual experiment revealed that adults recognize the age and gender of ASD and TD children, the meaning of their words, and the correspondence of the child’s repeated word to the sample in meaning and intonation contour, with lower recognition for ASD children. We found significant differences in pitch values, vowel formant frequencies, and energy between the ASD groups and between ASD and TD children in spontaneous speech and repetition words. Pitch values of stressed vowels were significantly higher in spontaneous speech than in repetition words for ASD and TD children, and in the spontaneous speech of ASD-1 vs. ASD-2 children. The coarticulation effect was shown for ASD and TD repetition words. The age dynamics of the acoustic features of ASD children indicate the mastering of clear articulation.