1 Introduction

The speech of non-native speakers may exhibit pronunciation characteristics that result from imperfect learning of the L2 sound system, either through the transfer of phonological rules from the first language to the second or through strategies similar to those used in primary language acquisition [1]. In speech recognition tasks, distinguishing native from non-native speakers remains a major challenge. Improving the accuracy of speech recognizers requires a large amount of speech data in both the training and testing phases. The features typically used for this purpose include MFCCs, formants, etc. [2,3,4,5,6]. Formants, whether defined as acoustic resonances of the vocal tract or as local maxima in the speech spectrum, are characterized by their frequency and their spectral width. They have been widely studied in speech processing as well as in speech perception, cross-language comparison of sound production, second language (L2) acquisition, speech pathology, recognition tasks, etc. [4, 8,9,10,11,12,13,14

As with many other languages, formants in Arabic have been the subject of several studies. At the auditory level, we can cite the works of [15,16,17], conducted either on Modern Standard Arabic (MSA) or on Arabic dialects. Abou Haidar investigated the MSA vowel system using a set of monosyllabic words to show the cross-dialectal structural stability of vowels at the perception level [15]. The comparative study was carried out on the vowels of eight informants from different linguistic backgrounds (Qatar, Lebanon, Saudi Arabia, Tunisia, Syria, Sudan, United Arab Emirates, and Jordan). Alghamdi based his work on six isolated Arabic syllables uttered by 15 native informants representing three Arabic dialects (Egyptian, Saudi and Sudanese) [16]. Newman carried out an experimental study on MSA vowels in connected speech; the speech variety studied was Holy Quran recitation [17].

The MSA phonetic system has six vowels: three short vowels (/a/, /u/ and /i/) and three long vowels (/a:/, /u:/ and /i:/). English, by contrast, has fifteen vowel sounds. This investigation examines only the short vowels extracted from the speech material. The present study examines variation in vowel quality (the first, second and third formants, hereafter F1, F2 and F3) in L1 and L2 Arabic. The objective is to characterize formant variation in the vowels of Modern Standard Arabic (MSA) as produced by native vs. non-native speakers. To this end, we examined the foreign accent of speakers in connected sentences produced by Arabic and American participants.

We then compared our results with those reported in some studies of Arabic dialects.

The paper is organized as follows. Section 2 presents the speech material and the participants used in the study. Section 3 describes formant extraction. Section 4 reports the experiments and findings. Section 5 gives concluding remarks based on the analysis.

2 Materials

In this section, we outline the materials used in the experimental setting and give details about the texts read, the speakers, the recordings and the technical conditions. Modern Standard Arabic (MSA) has six vowels (three short and three long) and two diphthongs. In our study, we focused only on formant frequencies, as one part of the acoustic features, computed from the short vowels (/a/, /u/ and /i/). Because vowel formant frequencies can be affected by phoneme co-articulation, the recordings used were continuous-speech sentences.

A total of 29 speakers participated in the experiment (15 native and 14 non-native). Each uttered five sentences taken from script 1 of the West Point corpus [18]. The first language of the non-native speakers was English. The technical recording conditions were as follows: normal speech rate and a sampling frequency of 22.05 kHz. A total of 145 recordings (29 speakers × 5 sentences) were used in the analysis. Table 1 shows the number and gender of speakers in the sample.

Table 1. Distribution of native and non-native speakers per gender

Formants are distinctive frequency features that correspond to resonance frequencies. To compute the speech formants, we manually annotated and segmented all of the speech material, i.e. the 145 recordings of the dataset, into segmental units (vowels and consonants) using the Praat software [19]. We then extracted all vowels, which constituted our speech signal material.

3 Feature Extraction

All speech signals of the vowel dataset were first passed through a pre-emphasis filter, which enhances the high-frequency components of the spectrum. This is done by applying the following formula:

$$x^{\prime}_{n} = x_{n} - a\,x_{n - 1}$$
(1)

where \(a\) is the pre-emphasis coefficient, which should be in the range \(0 \le a < 1\).
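As an illustration of Eq. (1), the following is a minimal sketch of a pre-emphasis filter in Python with NumPy. The function name `pre_emphasize` and the default value `a=0.97` are assumptions made for the example (the coefficient actually used is not stated here), and the first sample is simply passed through unchanged because it has no predecessor.

```python
import numpy as np

def pre_emphasize(x, a=0.97):
    """Apply x'[n] = x[n] - a * x[n-1] to a 1-D signal.

    `a` is the pre-emphasis coefficient; 0.97 is only an assumed default.
    The first sample has no predecessor and is copied unchanged.
    """
    if not 0.0 <= a < 1.0:
        raise ValueError("pre-emphasis coefficient must satisfy 0 <= a < 1")
    x = np.asarray(x, dtype=float)
    y = x.copy()
    # Subtract the scaled previous sample from every sample but the first.
    y[1:] -= a * x[:-1]
    return y
```

With `a = 0` the signal is returned unchanged; values closer to 1 attenuate low frequencies more strongly relative to high frequencies.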

A Hamming window function was applied to each frame of the signal to reduce boundary effects. The impulse response of the Hamming window is defined as follows:

$$w(n) = 0.54 - 0.46\cos\left( \frac{2\pi n}{N - 1} \right)$$
(2)

where \(0 \le n \le N - 1\) and \(N\) is the frame length in samples.

The frame length was about 30 ms, short enough for the speech signal to be treated as stationary within each frame.
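The framing and windowing steps can be sketched as follows. This is only an illustrative implementation: the 30 ms frame length and the 22.05 kHz sampling rate come from the text, whereas the 10 ms hop size and the function names `hamming` and `frame_signal` are assumptions introduced for the example.

```python
import numpy as np

def hamming(N):
    """Hamming window of Eq. (2): w(n) = 0.54 - 0.46 cos(2*pi*n / (N - 1))."""
    if N < 2:
        raise ValueError("window length must be at least 2 samples")
    n = np.arange(N)
    return 0.54 - 0.46 * np.cos(2.0 * np.pi * n / (N - 1))

def frame_signal(x, fs=22050, frame_ms=30.0, hop_ms=10.0):
    """Split x into overlapping frames and apply a Hamming window to each.

    hop_ms is an assumed value; the text only specifies the frame length.
    Returns an array of shape (n_frames, N); trailing samples that do not
    fill a whole frame are dropped.
    """
    x = np.asarray(x, dtype=float)
    N = int(round(fs * frame_ms / 1000.0))    # samples per frame
    hop = int(round(fs * hop_ms / 1000.0))    # samples between frame starts
    if len(x) < N:
        return np.empty((0, N))
    n_frames = 1 + (len(x) - N) // hop
    # Index matrix: row i selects samples i*hop ... i*hop + N - 1.
    idx = np.arange(N)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx] * hamming(N)
```

Short vowel segments may yield only a few frames at this frame length, so in practice the hop size would be chosen with the vowel durations in mind.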

Most formant tracking algorithms are based on Linear Predictive Coding (LPC). LPC coefficients are extracted from the pre-processed speech. Each speech sample is estimated as a linearly weighted sum of the past \(p\) samples, with the weights obtained by the method of least squares:

$$\hat{x}\left( n \right) = - \sum_{k = 1}^{p} a\left( k \right)x\left( {n - k} \right)$$
(3)

where \(x\left( n \right)\) and \(\hat{x}\left( n \right)\) are the speech samples and their estimates, \(\mathbf{a} = [a(1), a(2), \dots, a(p)]^{T}\) is the vector of LPC parameters, and \(p\) is the linear predictive (LP) filter order.
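To make Eq. (3) concrete, the sketch below fits the LPC coefficients by ordinary least squares and then reads candidate formant frequencies off the roots of the prediction polynomial. This is a common textbook approach and not necessarily the procedure used by the formant tracker in this study. The function names and the thresholds `min_hz` and `max_bw_hz` are assumptions chosen for illustration.

```python
import numpy as np

def lpc_least_squares(frame, p):
    """Least-squares fit of a(1..p) in x_hat(n) = -sum_k a(k) x(n-k)."""
    x = np.asarray(frame, dtype=float)
    N = len(x)
    if N <= 2 * p:
        raise ValueError("frame too short for the requested LPC order")
    # Row i corresponds to n = p + i; column k-1 holds x(n - k).
    X = np.column_stack([x[p - k:N - k] for k in range(1, p + 1)])
    # Solve X a = -x(n) in the least-squares sense.
    a, *_ = np.linalg.lstsq(X, -x[p:], rcond=None)
    return a

def formants_from_lpc(a, fs, n_formants=3, min_hz=90.0, max_bw_hz=400.0):
    """Candidate formants from the roots of A(z) = 1 + sum_k a(k) z^-k.

    min_hz and max_bw_hz are assumed thresholds used to discard roots
    that are unlikely to correspond to vocal-tract resonances.
    """
    roots = np.roots(np.concatenate(([1.0], a)))
    roots = roots[np.imag(roots) > 0]                  # one root per conjugate pair
    freqs = np.angle(roots) * fs / (2.0 * np.pi)       # root angle -> frequency in Hz
    bws = -np.log(np.abs(roots)) * fs / np.pi          # root radius -> bandwidth in Hz
    keep = (freqs > min_hz) & (bws < max_bw_hz)
    return np.sort(freqs[keep])[:n_formants]
```

For a windowed, pre-emphasized vowel frame, `formants_from_lpc(lpc_least_squares(frame, p), fs)` would return up to three candidate values for F1, F2 and F3; the order `p` has to be chosen by the user, since it is not specified in the text.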

4 Results and Discussions

Formants are the resonant frequencies of the vocal tract and appear as clear peaks in the speech spectrum. They show a large concentration of energy in voiced phonemes such as vowels. Most often, the first two formants, F1 and F2, are sufficient to identify a vowel. Nevertheless, in this study we computed three formant values (F1, F2 and F3) for each vowel to show the variation in pronunciation between native and non-native speakers.

The first investigation assesses the mean values of all formants for both corpus categories, i.e. the short vowels produced by the native and non-native groups of MSA speakers. The calculation was carried out regardless of speaker gender (male/female). Table 2 gives the mean and standard deviation of F1, F2 and F3 measured for each short vowel /a/, /u/ and /i/.

Table 2. Means and standard deviation of F1, F2 and F3 (Hertz) of native (L1) and non-native (L2) speakers
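The per-group statistics summarized in Table 2 could be computed along the following lines. This is a hedged sketch: the record layout `(group, vowel, F1, F2, F3)`, the function name `summarize_formants`, and the use of the sample standard deviation (`ddof=1`) are all assumptions, since the text does not state which estimator was used.

```python
from collections import defaultdict

import numpy as np

def summarize_formants(measurements):
    """Mean and standard deviation of F1-F3 per (group, vowel).

    measurements: iterable of (group, vowel, f1, f2, f3) tuples, one per
    vowel token, with formants in Hz. Returns a dict mapping
    (group, vowel) -> (mean, std), each an array of three values.
    """
    buckets = defaultdict(list)
    for group, vowel, f1, f2, f3 in measurements:
        buckets[(group, vowel)].append((f1, f2, f3))
    summary = {}
    for key, rows in buckets.items():
        arr = np.array(rows, dtype=float)
        # Sample standard deviation (assumed); undefined for a single token,
        # so zeros are reported in that case.
        std = arr.std(axis=0, ddof=1) if len(arr) > 1 else np.zeros(3)
        summary[key] = (arr.mean(axis=0), std)
    return summary
```

Pooling male and female tokens in the same bucket, as done here, matches the gender-independent calculation described above.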

Figure 1 displays the formant measurements computed from L1 and L2 MSA for each short vowel /a/, /u/ and /i/. Comparing the L1 vowels with their L2 counterparts, we observe that the averages of F1, F2 and F3 for /a/ produced by the non-native speakers were lower than their counterparts in native speech. In contrast, for /u/ uttered by L2 participants, the values of F1, F2 and F3 were higher than for /u/ produced by Arabic speakers. For /i/, only a slight variation was noticed in the second formant between the two groups. F1 of the native speakers was lower than that of the non-native speakers, whereas for F3 the native speakers showed the higher value.

Fig. 1. Comparison of formant averages for both native and non-native MSA short vowels

Figure 2 plots the first two formants computed for both groups. The chart shows that the short vowels of non-native speakers are positioned slightly more centrally than their native counterparts; this tendency is more pronounced for /u/ and /i/. As can be seen, the vowel triangle plotted for native speakers is larger than the triangle obtained for the second group. The diagram also shows that the latter lies almost entirely inside the triangle drawn for native subjects. Statistical analysis indicates a significant effect of L2 pronunciation on both F1 and F2 for the vowel /a/. Likewise, the results for /u/ and /i/ show a significant effect on F1 for both vowels.

Fig. 2. Vowel diagram of native vs. non-native speakers
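The two geometric observations made about the vowel diagram (a smaller non-native triangle, lying almost inside the native one) can be quantified with a few lines of code. The sketch below is an illustration only: the helper names `triangle_area` and `inside_triangle` are hypothetical, and each vowel is assumed to be represented by its mean (F2, F1) pair in Hz.

```python
def triangle_area(tri):
    """Area of a vowel triangle given three (F2, F1) points (shoelace formula)."""
    (x1, y1), (x2, y2), (x3, y3) = tri
    return 0.5 * abs(x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2))

def _cross(o, a, b):
    """z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def inside_triangle(pt, tri):
    """True if pt lies inside or on the boundary of the triangle tri."""
    a, b, c = tri
    d = (_cross(a, b, pt), _cross(b, c, pt), _cross(c, a, pt))
    # The point is inside when it is on the same side of all three edges.
    return not (min(d) < 0 and max(d) > 0)
```

Comparing `triangle_area` for the two groups gives a single number for the reduction of the vowel space, and calling `inside_triangle` on each non-native vowel against the native triangle checks the containment visible in the diagram.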

A comparative analysis was also carried out between the outcomes of this study and some previous works [14,15,16]. Specifically, we compared our results with those obtained from research conducted on Arabic dialects (7 and 1 dialects respectively) by [14] and [15] and from another study carried out on Modern Standard Arabic (MSA) [16]. The Arabic dialect accents are Qatari, Lebanese, Saudi, Tunisian, Syrian, Sudanese, Jordanian and Egyptian. The authors based their experimental studies on different kinds of corpora (syllables, words, and connected speech).

Figure 3 depicts the location of the MSA native and non-native vowels among the different dialect pronunciations. As can be seen from Fig. 3, all /u/ vowels of the Arabic dialects are positioned outside the MSA vowel diagram. However, almost all dialect values of /a/ and /i/ lie within the MSA triangle. The chart also shows that /u/ of non-native speakers is more central than the values of all the dialects. As for /a/ and /i/ of the same group, the plot shows that they lie almost at the periphery of the graph.

Fig. 3. Comparison of MSA native and non-native vowels with dialect vowels

5 Conclusion

The study deals with the formant analysis of Modern Standard Arabic (MSA) vowels produced by 15 native (L1) and 14 non-native (L2) speakers. The first three formants, F1, F2 and F3, were computed for all short vowels (/a/, /u/ and /i/) of the speech material. Comparing the L1 vowels with their L2 counterparts, the results show that the formants of /a/ produced by the non-native speakers were lower than their counterparts in native speech. In contrast, for /u/ uttered by L2 participants, the values of F1, F2 and F3 were higher than for /u/ produced by Arabic speakers. For the vowel /i/, only a slight variation was noticed in the second formant between the two groups. The vowel diagram plotted from the F1 and F2 values of both native and non-native speakers shows that the short vowels of L2 speakers are positioned slightly more centrally than their native counterparts. This tendency is more pronounced for /u/ and /i/. The non-native triangle lies almost entirely inside the triangle drawn for native subjects.

A comparative analysis of formant values was also carried out between native and non-native MSA and the results of studies on some eight Arabic dialects. The findings show that /u/ of non-native speakers is more central than all other values computed for the Arabic dialects. As for /a/ and /i/ uttered by L2 participants, the outcomes show that they lie at the periphery of the vowel diagram.