
1 Introduction

Data on the acoustic features of infants’ vocalizations [1, 2] and on the speech of children during the first years of life [3, 4] have been obtained for different languages. The acoustic characteristics of the speech of preschoolers [5,6,7], junior schoolchildren, and teenagers [8,9,10,11,12] are less studied. The focus of research has shifted to investigating child speech disorders [13,14,15].

The works of the Child Speech Research Group of Saint Petersburg State University describe the acoustic features of vocalizations and speech of Russian typically developing (TD) infants [16, 17] compared with orphans and children with neurological disorders [18, 19], and the dynamics of the acoustic features of vowels in vocalizations and words of TD children [20] and twins [21] from birth to the age of 7 years. A comparative study of speech formation in TD children and children with autism spectrum disorders (ASD) has been carried out [22, 23]. However, in these studies the speech of TD children was recorded in model situations: the repetition of words [22, 23] and speech in different emotional states [24]. Information on the acoustic characteristics of calm (neutral) speech of Russian-speaking TD children aged 5–16 years is absent. Data on the acoustic characteristics of child speech are necessary in clinical practice to clarify diagnoses and for automatic speech recognition systems [25].

The aim of the study is to determine how the temporal and spectral characteristics of vowels in the words of Russian children aged 5–16 years depend on the child’s gender and age.

2 Method

2.1 Data Collection

A total of 240 children aged 5–16 years (10 boys and 10 girls at each age) participated in the study. According to pediatricians’ conclusions, all children were developing normally and had no diagnosed hearing or speech disorders. All children were born and lived in St. Petersburg with parents who were also born in St. Petersburg or had lived there for more than 10 years. Russian was the first language (L1) of all the children. At school, the children were taught English as a second language (L2).

Audio recordings of child speech, with parallel video recordings of the child’s behavior, were made during a dialogue with the experimenter. A standard set of experimenter’s questions addressed to the child was used. The experimenter began the dialogue by asking the child to say his or her name and age. Then the experimenter asked the following questions in sequence:

  • Do you like to go to school/kindergarten?

  • What do you like in school/kindergarten (classes or play with friends)?

  • What are your favorite tasks? Why?

  • Do you have any hobbies?

  • What are your favorite movies, cartoons, books, games (computer/desktop/mobile)?

  • Do you have brothers or sisters?

  • Do you have pets?

  • Have you visited the zoo, the circus, or a museum?

This set of questions elicited responses containing similar and identical words. For example, every child used the following words in their responses: /like (nrAvitsya) – do not like (ne nrAvitsya)/, /know (znAyu) – do not know (ne znAyu)/, /Russian (rUsskiy)/, /bored (skUchno)/, /plays (Igry)/, /tiger (tIgr)/.

The dialogues lasted 5–10 min. The recordings were made with a “Marantz PMD660” recorder with a “SENNHEISER e835S” external microphone and a “SONY HDR-CX560E” video camera. Speech files were stored in Windows PCM format, 44100 Hz, 16 bits per sample.

All procedures were approved by the Health and Human Research Ethics Committee, and signed informed consent was obtained from the parents of each child participant.

2.2 Data Analysis

Spectrographic analysis of speech was carried out in the “Cool Edit Pro” sound editor (Syntrillium Software Corp., USA). We analyzed the duration of stressed and unstressed vowels and of their stationary parts, as well as the pitch values and formant frequencies (F1, F2) of the stationary parts of vowels.

The acoustic features were chosen to reflect the basic physiological processes in the vocal tract during articulation. Temporal features (vowel duration) are associated with the formation of speech breathing; pitch values indicate the frequency of vocal fold oscillation; and the values of the first two formants reflect the processes occurring in the oral cavity and serve as acoustic cues for vowel identification.
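The link between pitch and the frequency of vocal fold oscillation can be illustrated with a minimal autocorrelation sketch. This is not the procedure used in the study, where measurements were taken in a sound editor. The function name, the search range of 75–600 Hz, and the synthetic 250 Hz test frame are all assumptions made for illustration.

```python
import math

def estimate_pitch_hz(samples, sample_rate, f_min=75.0, f_max=600.0):
    """Estimate the fundamental frequency of one voiced frame by autocorrelation.

    The lag with the largest autocorrelation inside the allowed range is
    taken as the period of oscillation (hypothetical helper, for illustration).
    """
    lag_min = int(sample_rate / f_max)
    lag_max = int(sample_rate / f_min)
    n = len(samples)
    best_lag, best_score = None, float("-inf")
    for lag in range(lag_min, min(lag_max, n - 1) + 1):
        score = sum(samples[i] * samples[i + lag] for i in range(n - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

# Synthetic frame: a pure 250 Hz tone at the 44100 Hz rate used for the recordings.
sample_rate = 44100
true_f0 = 250.0
frame = [math.sin(2 * math.pi * true_f0 * i / sample_rate) for i in range(2048)]

f0 = estimate_pitch_hz(frame, sample_rate)
print(f"estimated pitch: {f0:.1f} Hz")
```

Real vowels are not pure tones, so a practical estimator would also need windowing, a voicing decision, and protection against octave errors.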

Formant triangles with apexes corresponding to the vowels /a/, /u/, and /i/ were plotted in F1, F2 coordinates, and their areas were compared. Vowel formant triangle areas [20] and the vowel articulation index (VAI) [26] were calculated.
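Both measures can be sketched in a few lines. The area is obtained with the standard shoelace formula for a triangle in the (F1, F2) plane. The VAI is computed with the commonly used definition VAI = (F2/i/ + F1/a/) / (F1/i/ + F1/u/ + F2/u/ + F2/a/), which is assumed here to be the one given in [26]. The function names and the formant values are hypothetical and are not taken from the study.

```python
def triangle_area(a, i, u):
    """Area of the /a/-/i/-/u/ triangle in the (F1, F2) plane, in Hz^2 (shoelace formula)."""
    (f1a, f2a), (f1i, f2i), (f1u, f2u) = a, i, u
    return abs(f1a * (f2i - f2u) + f1i * (f2u - f2a) + f1u * (f2a - f2i)) / 2.0

def vowel_articulation_index(a, i, u):
    """VAI under the assumed definition: (F2i + F1a) / (F1i + F1u + F2u + F2a)."""
    (f1a, f2a), (f1i, f2i), (f1u, f2u) = a, i, u
    return (f2i + f1a) / (f1i + f1u + f2u + f2a)

# Hypothetical (F1, F2) values in Hz, chosen only to illustrate the arithmetic.
vowel_a = (1000.0, 1600.0)
vowel_i = (400.0, 3000.0)
vowel_u = (450.0, 1000.0)

area = triangle_area(vowel_a, vowel_i, vowel_u)
vai = vowel_articulation_index(vowel_a, vowel_i, vowel_u)
print(f"triangle area: {area:.0f} Hz^2, VAI: {vai:.3f}")
```

Under this definition, a larger VAI corresponds to corner vowels that lie further apart, that is, to less centralized articulation.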

3 Results and Discussion

3.1 Duration of Vowels from the Words of 5–16-Year-Old Children

We found age-related dynamics in the duration of stressed and unstressed vowels in children’s words. The duration of stressed vowels in girls’ words increases significantly from the age of 5 years to 7 years (Mann–Whitney test), decreases at the age of 9–11 years (p < 0.01, Kruskal–Wallis test), and stabilizes at the age of 13–16 years (p < 0.001) (Fig. 1A).

Fig. 1. The duration of stressed and unstressed vowels from the words of girls (A) and boys (B) aged 5–16 years. Square marker: stressed vowels; triangle marker: unstressed vowels; round marker: stationary part of stressed vowels; X marker: stationary part of unstressed vowels. * p < 0.05; ** p < 0.01; *** p < 0.001, Kruskal–Wallis test. Horizontal axis: age, years; vertical axis: duration, ms.

The duration of stressed vowels in boys’ words is significantly longer than that of unstressed vowels at all ages except 7 years (p < 0.001 at the ages of 5, 8–9, 11, and 13–16 years; p < 0.01 at the ages of 6, 10, and 12 years; Mann–Whitney test). The duration of stressed vowels in boys’ words decreases by the age of 13 years (p < 0.05) (Fig. 1B). The duration of unstressed vowels decreases by the age of 13–16 years in girls’ words (p < 0.05) and by the age of 13 years in boys’ words (p < 0.05).
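A comparison of this kind can be sketched with a hand-written Mann–Whitney U test. This is a minimal version that uses the normal approximation with tie correction and no continuity correction, so it will not reproduce the output of a statistics package exactly. The function name and the duration values are hypothetical and are not taken from the study.

```python
import math

def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected)."""
    # Pool both samples, remembering which group each value came from.
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    values = [v for v, _ in pooled]
    n = len(values)
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and values[j] == values[i]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = avg_rank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    u = min(u_x, n1 * n2 - u_x)
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value
    return u, p

# Hypothetical vowel durations in ms for one age group.
stressed = [182, 175, 190, 168, 201, 177, 185, 193]
unstressed = [121, 135, 118, 142, 127, 110, 131, 125]

u_stat, p_value = mann_whitney_u(stressed, unstressed)
print(f"U = {u_stat:.1f}, p = {p_value:.4f}")
```

Because every hypothetical stressed duration exceeds every unstressed one, U takes its minimum value of 0 here. For groups this small, an exact test would normally be preferred over the normal approximation.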

Girls’ age correlates with the duration of stressed vowels, F(1, 798) = 83.608, p < 0.001 (Beta = −0.308; R² = 0.095), and of the stationary part of stressed vowels, F(1, 798) = 315.61, p < 0.001 (Beta = −0.532; R² = 0.283); and with the duration of unstressed vowels, F(1, 1361) = 63.295, p < 0.001 (Beta = −0.211; R² = 0.044), and of the stationary part of unstressed vowels, F(1, 1361) = 488.50, p < 0.001 (Beta = −0.514; R² = 0.264) (regression analysis).

Boys’ age correlates with the duration of stressed vowels, F(1, 760) = 119.26, p < 0.001 (Beta = −0.368; R² = 0.136), and of the stationary part of stressed vowels, F(1, 760) = 435.45, p < 0.001 (Beta = −0.604; R² = 0.364).
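The reported regression statistics can be checked for internal consistency. For a regression with a single predictor, the standardized coefficient Beta equals the correlation coefficient, so R² = Beta² and F = R² · df / (1 − R²), where df is the residual degrees of freedom. The sketch below applies this to the first girls’ model, taking its degrees of freedom to be 1 and 798, which is an assumption about how the reported F statistic should be read. The function name is hypothetical.

```python
def simple_regression_summary(beta, df_residual):
    """R^2 and F implied by a standardized slope in a one-predictor regression."""
    r_squared = beta ** 2
    f_stat = r_squared * df_residual / (1.0 - r_squared)
    return r_squared, f_stat

# Girls, duration of stressed vowels: Beta = -0.308, residual df assumed to be 798.
r2, f_value = simple_regression_summary(-0.308, 798)
print(f"R^2 = {r2:.3f}, F = {f_value:.2f}")
```

The result agrees with the reported R² = 0.095 and F = 83.608 to within the rounding of Beta. An R² of about 0.1 means that age alone accounts for roughly a tenth of the variance in stressed vowel duration.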

The stabilization of stressed and unstressed vowel duration by the age of 13 years may indicate that speech breathing is formed by this age.

According to the literature, speech breathing in children at the age of 7 years differs from that of adults [8]. These differences disappear by the age of 10 years, but some features of speech breathing (respiratory lung volume, sound pressure level) continue to develop in adolescence [8]. Speech breathing features depend on the age of the informant but not on the informant’s gender [27].

In line with these findings, the stabilization of vowel duration by the age of 13 years in our data may mark the end of speech breathing development.

Thus, we revealed the main trends in vowel duration. Identifying specific relationships between sex, age, and the nonlinear variation in vowel duration will be the subject of further work.

3.2 Pitch Values of Vowels from the Words of 5–16-Year-Old Children

The dynamics of the pitch values of stressed and unstressed vowels with the child’s age were traced. Pitch values of stressed vowels in girls’ words are maximal at the age of 5 years, decrease until 9 years, remain stable at the age of 9–13 years, and decrease again by the age of 14–16 years (Fig. 2A, Tables 1, 2).

Fig. 2. Pitch values of vowels from the words of 5–16-year-old children. A: girls; B: boys. * p < 0.05; ** p < 0.01; *** p < 0.001, Mann–Whitney test. Horizontal axis: age, years; vertical axis: pitch values, Hz.

Table 1. Pitch values of stressed vowels, Hz.
Table 2. Pitch values of unstressed vowels, Hz.

Figure 2B shows the same data for boys. The maximal pitch values of stressed vowels are found in the words of 5-year-old boys; pitch then decreases at the ages of 6–8 years, 9–11 years, and 12 years. The minimal pitch values are found at the age of 16 years. Tables 1 and 2 give the detailed pitch values of vowels in children’s words by age and gender.

Pitch values of stressed vowels from girls’ words are significantly higher than the corresponding values from boys’ words at the age of 6 years (p < 0.001; Mann–Whitney test), 7–8 years (p < 0.05), 13 years (p < 0.01), and 16 years (p < 0.001). Pitch values of unstressed vowels from girls’ words are significantly higher than the corresponding values from boys’ words at the age of 6 years (p < 0.01), 10 years (p < 0.05), 13–14 years (p < 0.001), and 16 years (p < 0.001). Pitch values of unstressed vowels from boys’ words are significantly higher than the corresponding values from girls’ words at the ages of 5 years and 11 years (p < 0.05).

The child’s gender is associated with the pitch values of stressed vowels, F(6, 1797) = 5.155, p < 0.0001 (Wilks’ Lambda = 0.965; p < 0.0001), and of unstressed vowels, F(6, 2669), p < 0.0001 (Wilks’ Lambda = 0.974; p < 0.0001) (discriminant analysis).

Girls’ age correlates with the pitch values of stressed vowels, F(1, 892) = 458.99, p < 0.001 (Beta = −0.583; R² = 0.34), and of unstressed vowels, F(1, 1361) = 643.32, p < 0.001 (Beta = −0.566; R² = 0.321) (regression analysis). Boys’ age correlates with the pitch values of stressed vowels, F(1, 808) = 174.49, p < 0.001 (Beta = −0.421; R² = 0.178).

The decrease in pitch values with the child’s age corresponds to data for other languages [10, 12] and reflects the general patterns of voice formation in ontogenesis [28,29,30,31,32]. In our study, sharp changes in pitch values were found in boys at the ages of 6, 9, 12, and 16 years, whereas the decrease in pitch values with age was more linear in girls. Age-related anatomical changes in the vocal tract, in particular changes in its length, could explain these data. According to MRI data, differences in vocal tract length between boys and girls after the age of 12 years lead to differences in pitch values between girls and boys [29]. Based on fMRI data, the authors of [32] concluded that boys and girls have different age dynamics of vocal tract length, which cause different dynamics of pitch. Two age periods with sharp decreases in pitch values, 6–8 years and 12–15 years, were found in boys, whereas pitch changes in girls were more linear, without sudden changes [32].

3.3 Formant Characteristics of Vowels from the Words of 5–16-Year-Old Children

Figure 3 presents the formant triangles with apexes corresponding to the values of the first and second formants of the stressed vowels /a/, /i/, /u/ from the words of girls (A) and boys (B). The first formant values of the stressed vowel /a/ from girls’ words are maximal at the age of 5 years and decrease by the age of 12 years (p < 0.05; Kruskal–Wallis test). The first formant values of the stressed vowels /a/, /u/, /i/ from boys’ words are maximal at the age of 5 years and decrease with the child’s age. The second formant values of stressed vowels from girls’ and boys’ words tend to decrease with the child’s age.

Fig. 3. Formant triangles of stressed vowels with apexes /a/, /u/, /i/ from the words of girls (A) and boys (B). Horizontal axis: F1, Hz; vertical axis: F2, Hz. Bold lines indicate the data for 5- and 16-year-old children (the boundaries of the analyzed age range).

The formant triangles of stressed vowels shift toward the low-frequency region of the two-formant plane along the first formant axis, up to the age of 14 years for girls and up to the age of 16 years for boys (Fig. 3).

The VAI of stressed vowels from girls’ and boys’ words changes non-linearly with age. The end of the preschool period is characterized by the maximum VAI values (at the age of 6 years for boys and 7 years for girls), which can be explained by children’s preparation for schooling and the need for active verbal communication in the learning process. At the age of 8 years, VAI values are high and similar for boys and girls. The further decrease in VAI may be due to increasing speech fluency and the completion of the mastering of articulation skills.

The VAI of stressed vowels from girls’ words is 0.75 (conventional units) at the age of 5 years and 0.82 at the age of 16 years. The VAI of stressed vowels from boys’ words is 0.61 at the age of 5 years and 0.87 at the age of 16 years (Fig. 4).

Fig. 4. Articulation index of stressed vowels. Black columns: data for girls; grey columns: data for boys. Horizontal axis: age of child, years; vertical axis: vowel articulation index values, conventional units.

The formant triangle areas of stressed vowels from girls’ words are larger than those from boys’ words except at the ages of 10, 11, and 12 years (Fig. 5).

Fig. 5. Areas of vowel formant triangles. Black columns: data for girls; grey columns: data for boys. Horizontal axis: age of child, years; vertical axis: areas of vowel formant triangles, conventional units.

The revealed age-related differences in VAI values and formant triangle areas between boys and girls are indirectly supported by data for the Chinese language showing a significant decrease in the values of the first two formants in the speech of children aged 3–18 years [33]. The authors attribute the differences in vowel formant frequencies between boys and girls to differences in pharynx volume [33].

3.4 Phonetic Data

The phonetic analysis revealed that children aged 8–16 years use the normative phonemes of the Russian language in words. Some differences in pronunciation are related to the individual features of the child and do not affect the lexical meaning of the word. Unformed phonemes /l/, /ʃ/, /ʃ’/, /tS’/, /r/, /Z/, /s/, /s’/ were found in 5–7-year-old children. The phonetic analysis revealed sound replacements that do not affect the lexical meaning of the word: changes and replacements of the phonemes /r’/ and /r/, replacement of /ʃ/ with /s/ and of /tS’/ with /t’/, and replacement of consonant clusters with a single phoneme.

4 Conclusions

The age dynamics of the duration, pitch values, and spectral features of stressed and unstressed vowels from the words of Russian-speaking boys and girls aged 5–16 years were described. The age dynamics of the duration of stressed and unstressed vowels and of their stationary parts were shown. The duration of stressed and unstressed vowels from children’s words decreases significantly by the age of 13 years in children of both genders. A significant decrease in the pitch values of vowels with the child’s age was shown. The difference in pitch values between girls and boys was determined. These data reflect general patterns of voice formation in ontogenesis, taking into account the gender of the informant. The clarity of articulation of Russian-speaking children aged 5–16 years was described by the vowel articulation index.

The obtained data on the acoustic features of the speech of TD children can be used as a normative basis in artificial intelligence systems for teaching children, for creating alternative communication systems for children with atypical development, for automatic recognition of child speech [34], and for teaching children clear pronunciation using acoustic feedback.