1 Introduction

Remote working and learning scenarios are not only necessary under certain social or global circumstances but are also currently becoming more and more accepted. Online elements offer different substantial benefits, but they must be consistently evaluated and improved in order to maintain their status within our economic and educational system. The current chapter aims at presenting an exercise that can be used to optimize learners’ pronunciation of English in an online-based environment by combining theoretical aspects of linguistics with practical ideas for the virtual classroom. Praat, a freely available speech analysis program, and the acoustic and auditory characteristics of spoken language represent crucial aspects in this chapter. On the practical side, in turn, certain key notions that have been considered to represent essential ingredients of modern language learning and teaching are used, such as a well-balanced mixture of instruction, individual, and partner work. Native speakers of German at a rather low level in English can take advantage of the work presented here, but the exercise might also be useful for learners with another linguistic background, possibly with modifications. Also, the general spirit of the exercise might help develop tasks concentrating on different pronunciation issues, such as vocalic, other consonantal, or prosodic difficulties, which could be approached with Praat, too.

The chapter is structured as follows. Section 2 introduces the necessary theoretical background of our work. That is, first, the focus lies on one well-known difficulty German learners face when using the foreign language English. Second, the Praat software is introduced and it is described how this tool has been used in the foreign language classroom so far. It is argued that more work on this topic is needed, and the objective is to present a learning situation in which Praat is used more easily and systematically. In contrast to most previous work, a detailed plan is given in this chapter, which can be directly used and integrated into the virtual foreign language classroom. Third, further features that are well-established in the didactic community are discussed, which are supposed to contribute to the success during the learning period. The specific exercise is described in Sect. 3. It primarily aims at helping German native speakers who are learners of English at a low level to improve their realization of the interdental fricative /θ/, a sound that is used in English but not in German, and to distinguish this sound from the common /s/, which both languages use. Praat represents a key component of this virtual exercise. The exercise is further discussed in Sect. 4, before the chapter is concluded in Sect. 5.

2 Theoretical Background

2.1 The Problem: The Production of /θ/ in German Learners of English

Learning a foreign language can imply that individuals are exposed to linguistic structures or phenomena in the target language that their native language does not use. Cross-linguistic variation can be detected on multiple levels such as the syntactic, morphological, lexical, phonological, or phonetic ones. When speaking a foreign language, phonological and phonetic peculiarities take a major place in the learning process. Although a profound discussion of theoretical models on the acquisition of non-native speech sounds is beyond the scope of this chapter, it is well known that several of these models assume that one’s native language plays a decisive role when learning a foreign language. The theoretical points of departure are the Perceptual Assimilation Model (PAM, Best & Tyler, 2007) and the Speech Learning Model (SLM, Flege, 1995), which incorporate an idea relevant to the current project, namely the possibility that two sounds of a foreign language are mapped onto a single category in a speaker’s native language. Specifically, the focus is on German learners of English who produce the English phonemes /θ/, which is not part of the German sound inventory, and /s/, which is part of the German sound inventory, in the same way. Put differently, the two sounds, /θ/ and /s/, from the foreign language are mapped onto the single native category /s/.

Speaking is one of the key competences when learning a foreign language and accurate pronunciation can facilitate communication (see, e.g., Cook, 2013; Johnson, 2008). Of all the potentially difficult issues for German learners of English, one from the fricative inventory has been selected and an exercise to approach this problem has been developed. It is common knowledge that English, but not German, uses interdental fricatives (see, e.g., Carr, 2013; Grantham O’Brien & Fagan, 2016; Roach, 2009), and this often creates difficulties for German learners of English. As stated in Hickey (2020), German speakers typically replace the English interdental fricative [θ] with the alveolar fricative [s], which belongs to both the German and English sound inventory. Although the articulatory gestures, such as the tongue movements, involved in the production of interdental fricatives are quite marked (see, e.g., Ladefoged & Maddieson, 1996) and can be explicitly taught to learners of English, it is argued that reaching accuracy in the realization of interdental fricatives can be supported by the use of a combination of auditory, visually presented acoustic information, and numerical values of acoustic parameters. Differences in intensity and the energy distribution across frequencies are relied on to distinguish between the two types of fricatives [s] and [θ] (see also, e.g., Ladefoged, 2003; Machač & Skarnitzl, 2009; Varden, 2006; Zsiga, 2013), and to see whether the pronunciation of the English [θ] is adequate or still requires improvement. That is, for instance, the [θ] is less intense than the [s] and does not show a large amount of energy in the higher frequencies, which the [s] does. The two sounds will be contrasted in more detail in Sect. 3 when the exercise is described.

2.2 Praat and Its Role in Pronunciation Learning and Teaching

Our exercise presented later connects to a large body of research on pronunciation learning and teaching in general and with the help of technology in particular (see among many others, e.g., Low, 2015; Munro & Derwing, 2019; Pennington & Rogerson-Revell, 2019; Reed & Levis, 2015). The central program in the specific exercise presented in Sect. 3 is Praat (Boersma & Weenink, 2021), which is the standard tool in the phonetic and phonological sciences. It can be freely downloaded from the internet within a minute. Apart from the quick and free access, Praat is attractive not only for researchers of theoretical, empirical, and applied linguistics but also for teachers and learners of languages for multiple reasons. These advantages include its immense range of possible functions and methods (see, e.g., Boersma, 2013), its user-friendly interface, and the availability of many online tutorials and sources that can assist one while working with Praat (see, e.g., Conrad, 2019; Mayer, 2017; Styler, 2021; van Lieshout, 2017; Wood, 2020). The functions relevant to the present chapter will be introduced step by step in Sect. 3. In particular, three categories of functions will be used, that is, the auditory/listening function, the visual representation of acoustic properties, and the mathematical calculation of acoustic parameters. Praat can be used offline on a computer; however, the activities in focus later are based on the assumption that learners and the tutor work together from home in a virtual space. Therefore, the second tool necessary for our project is the online communication platform Zoom (Zoom Video Communications, Inc., 2022).

The idea to bring Praat to the foreign language classroom is not a new one. Researchers have used the software to consider different phonetic and phonological aspects in a foreign language, such as consonants (see, e.g., Beňuš, 2021; Olson, 2014; Wilson, 2009), vowels (see, e.g., Brett, 2004; Schweinberger, 2020; Wulandari et al., 2016), and prosody (see, e.g., Aramipoor & Gorjian, 2018; Gorjian et al., 2013; Li, 2019) (for further reading, see also, e.g., Demirezen, 2017; Jolayemi & Oyinloye, 2019; Osatananda & Thinchang, 2021). [Footnote 1] Despite previous work, it is claimed that the exercise presented in Sect. 3 adds an important piece and combines several aspects in an unprecedented way. That is, first, a systematic, detailed, and step-by-step plan is provided, which can be directly used by learners and teachers. This is a clear benefit in comparison to most previous work, where individual and interesting pieces have not been put in an easy-to-use program. Second, an online exercise is described, not a normal classroom activity, and the advantage of a virtual environment is emphasized. Third, our exercise has been designed against the background of the three positive didactic concepts mentioned in the next subsection (2.3).

2.3 Further Methodological Considerations

In order to achieve the ultimate goal of our idea, namely an improved pronunciation of English interdental fricatives, a set of didactic concepts is considered. First, a type of content-based language learning is offered by integrating theoretical linguistic knowledge, such as the physical properties of speech sounds, into the foreign language classroom. The topic, from physics, is discussed in the foreign language English as a topic on its own and simultaneously serves to improve one of the key competences in language learning, namely speaking and its accuracy. Content-based language learning, that is, applying and using a foreign language in the context of specific subjects such as history, physics, or biology, represents a valuable option in modern foreign language learning and teaching and has been widely discussed in the literature (see, e.g., Dalton-Puffer et al., 2010; De Zarobe et al., 2011; Juan-Garau & Salazar-Noguera, 2015). Second, a kind of computer-based language learning is implemented in that our target group uses software to monitor and improve their pronunciation and to exchange their experience and gained knowledge via an online communication platform. Nowadays, technology plays a crucial role in the domain of language learning and teaching (see, e.g., Andujar, 2020; Buendgens-Kosten & Elsner, 2018; Thomas et al., 2013). Third, a well-balanced combination of tutor instruction, explorative-individual learning, and sequences of cooperative work is proposed and it is assumed that all of these distinct components contribute a positive part to the language learning process (see also, e.g., Archer & Hughes, 2011; Butzmann, 1998; Hollingsworth & Ybarra, 2013; McCafferty et al., 2006).

3 The Exercise

The specific exercise suggested here, which is based on the two tools Zoom and Praat, aims at improving the English pronunciation of German learners of English with a rather low competence in the target/foreign language. In particular, it is concerned with a well-known inaccuracy in the speech of German learners of English, namely the realization of the interdental fricative. Approaching this problem, our objective is to avoid potential communication issues and to contribute to a more native-like pronunciation of our learners. One example of a possible communication issue is the unintended production of homophones, that is, the merging of two distinct words into one. So, mouth and mouse are both produced as [maʊs], thick and sick as [sɪk], or path and pass as [pɑːs]. The learners and the tutor meet online to conduct the exercise in the way outlined below. Our target group represents learners of English whose native language is German, who still have a low competence in English (A2 to B1), who have difficulties in accurately pronouncing the English interdental fricative /θ/, and who have never used Praat before. Further, they should be 15 years or older, should have had physics at school for a couple of years, and should be familiar with basic acoustics, although the central aspects in the context of the present exercise are revised and discussed together. An example of a possible target group is a group of learners who were taught English at school, who have not needed it in their jobs for many years, and who intend to work on their English competencies later in life. More precisely, the exercise might be used in an adult education center in Germany. These centers offer voluntary, fee-based evening classes in various foreign languages and contribute, in addition to regular schools, universities, etc., to the promotion of multilingualism, an important aspect of German society.
Online instruction was an exception in this educational area before the pandemic, and the exercise presented here might inspire others to develop new ideas. Note that the exercise is outlined step by step and with all the details in mind that inexperienced Praat users need to understand the idea and the tool. To maximize the success rate of the exercise described in this chapter, the group size should be kept small, ideally no more than 10 students. The primary language of the virtual classroom is English, but if students do not understand certain parts due to their rather low level in English, the German language can be additionally used to help.

3.1 General Introduction to Praat

Before Praat can be effectively used to optimize one’s pronunciation of English, a thorough introduction to the software itself and its general functions relevant to the learning program is in order (see also, e.g., references given in Sect. 2.2). During this phase, the tutor or teacher guides the learners step by step through Praat, introducing general issues. For this, Zoom is used, which enables the tutor to speak to the students and share her or his screen to demonstrate functions in an easy-to-follow fashion. Of course, during this phase of explicit instruction, participants can intervene at any time, raise questions, and the tutor can repeat specific aspects upon request. Issues that are explained during the instruction phase are specified in the following paragraphs (see Boersma & Weenink, 2021).

Step 1

First of all, Praat has to be downloaded from the website https://www.fon.hum.uva.nl/praat/. The tutor explains that users are not charged any fees, that they select their operating system (e.g., Windows) as well as the appropriate edition (32 or 64 bit; this piece of information is found in the system information section of one’s computer, and the tutor must be familiar with this in order to help, if need be), and that they can download and install the program within a minute. The tutor is available and can help if technical problems occur here or at any other moment during the exercise. [Footnote 2]

Step 2

Praat can now be started and two windows open. One, called Praat Picture, is not needed for the present exercise. One exclusively works with the other window, called Praat Objects (see Fig. 1), which is shown to the participants of the course via the screen sharing function. It is stated at this step that Praat Objects represents the space where all sound files that one records or uploads from the computer appear. Furthermore, it is explained which tabs from Praat Objects are relevant to the current exercise and what they are used for; more details will follow at later steps. Learners will make use of the tabs New, Open, and Save. New leads one to an interface to record spoken language, which can be analyzed in Praat afterward. Open is used if one intends to upload sound files already available, for instance, files recorded with Praat at earlier stages. By clicking on Save, in turn, users select where and in which format the recorded materials are stored.

Fig. 1
A screenshot of the opening window of Praat. The title bar has a document icon labeled Praat Objects to the left, and 3 buttons to the right for minimize, maximize, and close. The menu bar has Praat, New, Open, and Save to the left and Help to the right. Buttons at the bottom are labeled Rename, Copy, Inspect, Info, and Remove.

The very beginning: Praat objects

Step 3

Assuming that our target group has never worked with Praat or a comparable recording platform, the tutor points the learners to the tab New and the subsequent specification Record mono Sound (see Fig. 2). This leads us to the SoundRecorder, in which one records the English words, phrases, or sentences that are supposed to be analyzed at later stages (see Fig. 3). It needs to be mentioned that the channels specification Mono and a predefined sampling frequency of 44,100 Hz are adequate (without further details on why). Users choose the name of the sound file and enter it in the field in the bottom right corner in Fig. 3.

Fig. 2
A screenshot of starting a new recording in Praat. Click on the New button on the menu bar. A drop-down box appears which reads record mono sound, record stereo sound, sound, matrix, tables, tiers, create text grid, create corpus, strings, articulatory synthesis, create permutation, polynomial, multidimensional scaling, acoustic synthesis, constraint grammars, symmetric neural networks, and feedforward neural networks.

Starting a new recording

Fig. 3
A screenshot of the sound recorder. The title bar reads soundrecorder. The menu bar has file, query, and meter. The page to the left has channels mono and stereo, in the center meter, and to the right sample frequency from 8000 to 192000 hertz. Below it has record, stop and play to the left and name to the right. The next line has close, save to list, and save to list and close.

Recording a new sound

Step 4

It is now possible to record sound with Praat. To record a sound file, users must be in a quiet environment. At this step, one clearly sees one major benefit if the current exercise is completed online. Of course, some parts of it could also be conducted in a classroom, but for the present step, the online scenario brings an important advantage. That is, it is much more likely to find a silent place at home in comparison to a classroom or school. Even if family members or others are at a person’s home, too, it should be manageable to guarantee that the recordings are realized in a quiet environment. This would be more difficult in a classroom or school building, where several students are present at the same time and cannot record simultaneously in the same place. Since recording is necessary several times during the exercise, the virtual space offers a unique opportunity to improve one’s competencies in English in a timely and efficient way. Everyone can record on her or his own and can then easily join the entire group again.

The tutor illustrates the recording process by emphasizing the following issues. To record, press the Record button, produce the respective word, phrase, or sentence, press the Stop button, and press Save to list. During the recording, the meter should remain green, which is the case if speakers realize speech at a normal volume (see Fig. 4). Figure 5, in turn, shows an example where a speaker screamed and the meter reached yellow and even red areas; this needs to be avoided in the exercise.

Fig. 4
A screenshot of the soundrecorder while recording. The quarter of the box below the meter is filled with green color.

Meter if the volume is good

Fig. 5
A screenshot of the soundrecorder while recording. The color in the meter box is green below, yellow in the middle, and red at the top.

Meter if the volume is too high

Having clicked on Save to list, the sound, simply called Sound here, appears in Praat Objects (see Fig. 6). Saving the sound file to one’s computer is an essential step, since Praat does not automatically do that. Clicking on Save, then Save as WAV file, and choosing a folder to store the materials does the job (see Fig. 7).

Fig. 6
A screenshot of Praat objects. The page below the menu bar reads objects to the left with one file sound sound. To the right, it reads vocal toolkit with a dropdown menu copy, process, sound help, view and edit, play, draw, query, modify, annotate, analyse periodicity, analyse spectrum, to intensity, manipulate, convert, filter, and combine.

Recorded sound (not saved)

Fig. 7
A screenshot of saving recorded sound. In the Praat objects file, when the save button has been used the dropdown of 20 save as categories and append to the existing sound file appears.

Saving a recorded sound
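Before a saved recording is analyzed, it can be useful to confirm that the file has the expected properties. The following is a minimal sketch in Python, which is not part of Praat or of the exercise itself. It reads a WAV file with the standard-library wave module and reports the number of channels, the sampling frequency, the duration, and the peak level as a fraction of full scale. It assumes that the file contains 16-bit PCM samples; the function name describe_wav, the returned keys, and the file name in the comment are hypothetical choices made for illustration.

```python
import struct
import wave


def describe_wav(path):
    """Summarize a WAV file (assumption: 16-bit PCM samples)."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        sample_rate = wav.getframerate()
        sample_width = wav.getsampwidth()
        frame_count = wav.getnframes()
        raw = wav.readframes(frame_count)
    if sample_width != 2:
        raise ValueError("this sketch only handles 16-bit PCM files")
    # Unpack little-endian signed 16-bit integers (all channels interleaved).
    samples = struct.unpack("<%dh" % (len(raw) // 2), raw)
    peak = max((abs(s) for s in samples), default=0)
    return {
        "channels": channels,            # 1 corresponds to Mono
        "sample_rate": sample_rate,      # e.g., 44100 Hz
        "duration_s": frame_count / sample_rate,
        "peak_fraction": peak / 32768,   # 1.0 means full scale
    }


# Hypothetical usage with a file saved from Praat:
# info = describe_wav("Sound.wav")
```

A peak fraction close to 1.0 suggests that the recording was too loud, which is loosely comparable to the meter reaching the red area in Fig. 5; the exact level at which Praat's meter changes color is not assumed here. A sampling frequency of 44,100 Hz means that frequencies up to half that value, 22,050 Hz, can be represented.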

Step 5

Next, the tutor shows how one can listen to and visually inspect a sound file by explaining the following aspects. Selecting View & Edit on the right side of Praat Objects, one sees the visual representation of the recorded sound. The two decisive visualizations given in Fig. 8 are the waveform on the top and the spectrogram below, specifically for the word pin, which was randomly selected as an example (see, e.g., Boersma, 2013). The waveform plots air pressure (y axis) against time (x axis). That is, one observes the variation of air pressure, the result of varying articulatory effort, as the word is realized step by step (see, e.g., Ebert & Ebert, 2010; Hoffmann, 2010; Reetz, 2003). The spectrogram, in turn, visualizes frequency (y axis) and time (x axis); in addition, one sees lighter and darker shading, indicating less and more intense frequency regions, respectively (see, e.g., Ladefoged, 2003). Crucially, one can not only examine the entire word using the waveform and spectrogram, but one can also zoom in to a specific part of the word. After listening to the file for orientation, the learner can mark the area of interest with the cursor (see Fig. 9) and zoom in using the sel (selection) function in the bottom left corner. The selected part is enlarged in Fig. 10. Note that one can listen to a part by clicking on the field where the duration is specified. For instance, if one clicks on “0.500137” in Fig. 8, which is the total duration of the sound file (about 500 milliseconds), one hears the entire file. If one clicks on “0.050922” in Fig. 9, one hears only the selected part of the file, shaded in red. At this stage, it is important that learners are exposed to the two visualization types and are taught their basic idea. Reading and interpreting these figures requires experience and specific exercises. Therefore, we go into more detail in the exercise steps presented below. After this general introduction to Praat by the tutor, one is now ready to proceed to the specific exercise.
The Praat functions needed for the exercise and the role of the tutor and the learners are outlined in detail below.

Fig. 8
A graph of air pressure and frequency versus time for a word recording. A waveform is formed for pressure versus time and a spectrogram for frequency.

The word pin visualized in Praat

Fig. 9
A screenshot of a graph of air pressure and frequency versus time. The waveform of air pressure at 0.050922 seconds is highlighted.

Figure 8 again, with the red shading representing the selected portion

Fig. 10
The screenshot of the selected part of the waveform in a magnified view is highlighted.

Enlarged version of the selected portion from Fig. 9
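For tutors who wish to explain where a spectrogram comes from, the basic idea can be sketched in a few lines of Python: the waveform is cut into short overlapping frames, and for each frame the strength of each frequency component is computed. The sketch below uses a plain discrete Fourier transform with a Hann window; this is a simplification for illustration and is not claimed to reproduce Praat's own spectrogram settings. The function names, the frame length, and the hop size are assumptions.

```python
import cmath
import math


def frame_spectrum(frame):
    """Magnitudes of frequency bins 0..N/2 for one frame (needs len(frame) >= 2)."""
    n = len(frame)
    # A Hann window tapers the frame edges to reduce spectral leakage.
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    magnitudes = []
    for k in range(n // 2 + 1):
        total = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(windowed))
        magnitudes.append(abs(total))
    return magnitudes


def spectrogram(samples, frame_len, hop):
    """One spectrum per frame: rows are time steps, columns are frequency bins."""
    return [frame_spectrum(samples[start:start + frame_len])
            for start in range(0, len(samples) - frame_len + 1, hop)]


# Synthetic example (assumed values): a 1000 Hz tone sampled at 8000 Hz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * i / sample_rate) for i in range(400)]
columns = spectrogram(tone, frame_len=64, hop=32)
strongest_bin = columns[0].index(max(columns[0]))
# Bin k corresponds to k * sample_rate / frame_len Hz.
strongest_hz = strongest_bin * sample_rate / 64
```

Each row of the result corresponds to one vertical slice of the spectrogram in Fig. 8: large magnitudes would be drawn dark, small magnitudes light.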

3.2 The Specific Exercise

The suggested exercise has the objective to optimize the pronunciation of a specific group of English consonants, namely voiceless interdental fricatives. It is well known that English interdental fricatives, such as the word-initial sound in think, which are not part of the German phoneme inventory, often represent a source of inaccuracy in German learners’ English. Typical mispronunciations include replacing the interdental with an alveolar fricative, producing homophones for think and sink (see, e.g., Hickey, 2020). Relying on Praat, the auditory judgment, the visual representation, and mathematical calculation of the acoustic properties of fricatives, the current exercise helps learners notice potential inexactness in pronunciation or, in the positive case, reassures them that the production is already adequate.

Step 6

The tutor instructs the students to read out and record the passage below. In order to ensure an unconscious and unfocused expression of the target fricatives, learners are requested to read and record a short text passage – and not just single words – containing tokens of the interdental and alveolar fricative. Doing so, learners do not immediately realize the purpose of the exercise and one can collect real and undistorted data. The precise formulation and text passage are given in (1) and are sent out by the tutor via email. We are specifically interested in the words Miss versus Smith and sink versus think. The words contain either the voiceless alveolar ([s]) or the voiceless interdental fricative ([θ]), once in syllable-initial and once in syllable-final position. Note that the words Miss and Smith on the one hand and sink and think on the other hand are embedded in comparable positions and structures in order to keep the environment, which might affect the articulation of speech sounds, as constant as possible. Relying on the information described in Sect. 3.1, learners are capable of recording and saving a sound file in and with Praat. The passage is read three times to ensure that, in the case of potential slips of the tongue or hesitations during the reading process, learners have at least one file to work with for each test case.

  (1)

    Read out the following text passage at a comfortable pace and record and save this using Praat. Read this passage three times and save each version in a separate file. If you have a good external microphone, use this; if not, the microphone of your computer is fine as well. [Footnote 3] You have 5 minutes to do so.

    Before Miss Miller left the house, Mister Smith had called her. He told her that the boat would sink soon and that they would have to think about a new one.

Step 7

Once the recording has been saved, learners receive a sound file via email from the tutor containing the passage read by a native speaker of English, which contains the aspects one is interested in and which serves as a comparison. They are asked to work on the following task (see 2).

  (2)

    Now, consider both your own recording and the recording from the English native speaker in Praat. Listen to the files and use the View & Edit function. Please focus on the words Miss, Smith, sink, and think; you can ignore the other parts. With respect to the native speaker’s sound file, do you notice similarities and differences between the final sounds in Miss and Smith in the waveform and spectrogram? What about the initial sounds in sink and think, are there any comparable or distinct patterns that you see? Can you observe the same patterns in your own recordings or does your production look quite different? Work on your own first (15 minutes), before discussing your findings with a partner online (15 minutes).

The exercise asks learners to explore the phenomenon on the basis of the sound files and to detect the acoustic characteristics of the segments in focus. Crucially, sound files from a native speaker of English are provided to give the learners an idea of what it is supposed to look like. Note a general aspect here. It is clear that even among native speakers of English a lot of variation in pronunciation exists. This variation can be due to several factors, one being the variety of English (e.g., Canadian, Scottish, Australian) someone speaks. Therefore, before a native speaker reads the text passage, the tutor has to ensure that the person adequately produces the distinction between the [s] and the [θ]. The distinction is actually realized in most native varieties of English, with a few exceptions (see, e.g., Hickey, 2008). Let us assume for the sake of the argument that the realization of a learner’s [θ] is inadequate, specifically, that the learner produces an [s] at the beginning of think, not the [θ]. In this situation, there is a clear contrast between the native speaker’s and the learner’s files: the recorded target words (e.g., sink versus think) should look dissimilar for the native speaker but similar for the learner. While Fig. 11 below shows the waveform and spectrogram of the English word sink in Praat, Fig. 12 represents the word think. The two words are correctly produced, and the visualizations mirror these accurate realizations. [Footnote 4]

Fig. 11
A screenshot of the waveform and spectrogram of the word sink. The waveforms are tightly packed, and the spectrogram is also darkly shaded before 0.285464 seconds. The frequency of the word sink is high in the first half.

Visualization of the acoustic properties of sink

Fig. 12
The screenshot of the waveform and spectrogram of the word think. The waveform has a high amplitude, and the spectrogram is darkly shaded after 0.237502 seconds.

Visualization of the acoustic properties of think

Step 8

Obviously, learners vary with respect to how many details and how much information they find themselves during the phases of individual and partner work. It is assumed here that the students have difficulties in interpreting the waveform and spectrogram and in comparing their own sound files to the file with the native speaker data. Therefore, a thorough and profound follow-up discussion is of utmost significance. The tutor guides the learners through the phenomenon in a step-by-step manner, focusing on the following aspects.

The [s] serves as a kind of baseline, since it is a common sound not only in English but also in German, and learners are therefore expected to produce it accurately. Hence, typical patterns of this fricative are pointed to in the waveform and spectrogram. The articulatory and acoustic characteristics of fricatives are described in detail in the literature (see, e.g., Ladefoged, 2003; Ladefoged & Maddieson, 1996; Machač & Skarnitzl, 2009; Reetz & Jongman, 2009; Zsiga, 2013) and can be used as a theoretical foundation by the tutor. First, fricatives are realized with a slight constriction at some place in the oral cavity, which creates turbulences when air goes through this narrow passage. The turbulences, or fricative noise, are clearly mirrored in the waveform (see the black oscillations crossing the x axis in the portion shaded in red in the waveform of Fig. 11). This represents the first feature of fricatives visible in Praat. The fricative noise of the [s] in sink is shown in Fig. 11. Second, one can recognize the [s] in the spectrogram, where it often features a specific structure. Most of its energy is located in the frequencies higher than 6000 Hz, for example, between 8000 and 9000 Hz (Ladefoged, 2003; see also Machač & Skarnitzl, 2009). Clearly, darker regions indicate the increased energy and this is visible, roughly, in the area between the two red arrows in Fig. 11.
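The statement that most of the energy of the [s] lies above 6000 Hz can also be expressed as a single number, namely the share of the signal's energy found above that frequency. The following Python sketch computes this share by evaluating the power of every frequency bin with the Goertzel algorithm, which is used here as a convenient stand-in and is not the procedure Praat applies internally. The function names, the default split at 6000 Hz, and the synthetic test signals are assumptions made for illustration; the sketch is meant for short selections such as a single fricative.

```python
import math


def goertzel_power(samples, k):
    """Squared magnitude of DFT bin k, computed with the Goertzel recurrence."""
    n = len(samples)
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s1 = s2 = 0.0
    for x in samples:
        s0 = x + coeff * s1 - s2
        s2 = s1
        s1 = s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def high_frequency_share(samples, sample_rate, split_hz=6000.0):
    """Fraction of the (non-DC) energy located at or above split_hz."""
    n = len(samples)
    low = high = 0.0
    for k in range(1, n // 2 + 1):
        power = goertzel_power(samples, k)
        if k * sample_rate / n >= split_hz:
            high += power
        else:
            low += power
    total = low + high
    return high / total if total > 0 else 0.0


# Synthetic stand-ins (not real speech): a signal dominated by 8000 Hz
# and a signal containing only 2000 Hz, both sampled at 44,100 Hz.
sample_rate = 44100
hissy = [math.sin(2 * math.pi * 8000 * i / sample_rate)
         + 0.1 * math.sin(2 * math.pi * 2000 * i / sample_rate)
         for i in range(441)]
dull = [math.sin(2 * math.pi * 2000 * i / sample_rate) for i in range(441)]
share_hissy = high_frequency_share(hissy, sample_rate)
share_dull = high_frequency_share(dull, sample_rate)
```

On real recordings, one would expect a clearly larger share for an accurately produced [s] than for a [θ]; which values count as adequate is not specified in this chapter and would have to be established from the native speaker's file.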

Step 9

Now, having considered some general aspects of fricatives and the [s] in particular, the next decisive question is how the [s] can be acoustically differentiated from the [θ]. Here, again, the tutor needs to help, relying on the following issues this time. Generally speaking, while the [s] is a so-called sibilant, producing high-pitched and loud noise as a result of the airflow hitting the teeth, the [θ] is not a sibilant and is characterized by a noise that is lower-pitched and less intense (Zsiga, 2013, see also, e.g., Beňuš, 2021; Yavas, 2016). This difference is visible in Figs. 11 and 12 in two ways. First, the amplitude in the waveform, that is, the positive and negative excursions of air pressure on the y axis (Ebert & Ebert, 2010), is more extreme for the [s] than for the [θ]. Second, it is possible to use an intensity analysis (see, e.g., Styler, 2021). The yellow line, once in Fig. 11 and once in Fig. 12, reflects the intensity of the speech sounds and mirrors the aforementioned difference between the [s] and the [θ], that is, the line is higher for [s] than for [θ], which one expects due to the louder noise of the former. To “transfer” the yellow line into a more objective and detailed intensity analysis, one can make use of Praat’s intensity calculations. To do so, one selects the two fricatives with the cursor (see the red shading in Figs. 11 and 12). Note that the segmentation of speech can be a complex task, for which one needs solid criteria to state when one segment ends and the next begins. Separating fricatives and vowels, as in our cases here, is usually one of the easier segmentation scenarios (see, e.g., Ladefoged, 2003; Machač & Skarnitzl, 2009; Turk et al., 2006). To mark the end of the fricative, and therefore the beginning of the vowel, one can rely on the clearly distinct pattern of the waveform. For one, vowels show higher amplitudes in the waveform than fricatives, as can be seen in Figs. 11 and 12.
Further, in contrast to the fricative noise described above, vowels are characterized by a (relatively) regular repetition of waves. To recognize this, select the portion shaded in red in Fig. 13 and then click on “sel” to enlarge this part, with the enlarged version given in Fig. 14.

Fig. 13
The screenshot of waveform and spectrogram boundary detection of word sink and vowel. The boundary at 0.047572 is highlighted.

Detecting the boundary between the fricative and the vowel: Part I

Fig. 14
The screenshot of waveform and spectrogram boundary detection of the word think and vowel. The boundary at 0.285483 is highlighted.

Detecting the boundary between the fricative and the vowel: Part II
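The contrast between irregular fricative noise and the regular repetition of waves in a vowel can also be expressed as a number. The sketch below is an illustration under stated assumptions, not a description of Praat's internals: it uses a normalized autocorrelation, a 16000 Hz sampling rate, a 200 Hz tone as a stand-in for a vowel, and seeded random noise as a stand-in for a fricative. The function name `periodicity` is hypothetical.

```python
import math
import random

def periodicity(samples, min_lag, max_lag):
    """Return the highest normalized autocorrelation between min_lag and max_lag."""
    energy = sum(s * s for s in samples)
    if energy == 0:
        return 0.0
    best = 0.0
    for lag in range(min_lag, max_lag + 1):
        # Compare the signal with a copy of itself shifted by `lag` samples.
        corr = sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))
        best = max(best, corr / energy)
    return best

SAMPLE_RATE = 16000  # assumed sampling rate in Hz
N = 800              # 50 ms of signal

# Synthetic stand-ins: a periodic tone for the vowel, random noise for the fricative.
vowel_like = [math.sin(2 * math.pi * 200 * i / SAMPLE_RATE) for i in range(N)]
rng = random.Random(0)
fricative_like = [rng.gauss(0.0, 1.0) for _ in range(N)]

# Lags of 40 to 160 samples correspond to repetition rates of 400 Hz down to 100 Hz.
print(periodicity(vowel_like, 40, 160))
print(periodicity(fricative_like, 40, 160))
```

A value near 1 indicates that the waveform repeats itself regularly, as in a vowel, whereas a value near 0 indicates noise; the boundary between fricative and vowel lies where the measure starts to rise.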

On the basis of Fig. 13, which gives a very rough marking of the boundary between the fricative and the vowel, and the enlarged view in Fig. 14, one can set the boundary at the position of the cursor in Fig. 14, that is, at the red vertical dotted line, which marks the beginning of the more regular pattern and the higher amplitudes in the waveform. The boundary is set at a zero crossing. Note that there are also indications about where to place the boundary in the spectrogram if one zooms out again (see Fig. 15). One clearly sees that the patterns in the spectrogram to the left and the right of the cursor position (red vertical dotted line) are distinct. On the left, in the fricative, one sees a greater amount of energy in the higher frequencies as explained earlier, which becomes fainter and fainter towards the boundary. On the right, in the vowel, one sees the black/dark horizontal stripes further down in the spectrogram, which start at the place of our cursor. These are referred to as “formants” in the phonetic literature and are a typical spectral characteristic of vowels (see, e.g., Reetz & Jongman, 2009).

Fig. 15
A screenshot of the waveform and spectrograph, the intensity between the fricative and the vowel at 0.285483.

Detecting the boundary between the fricative and the vowel: Part III
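Placing the boundary at a zero crossing, as described above, amounts to moving a roughly chosen position to the nearest point where the waveform changes sign. The following sketch illustrates this idea on a short list of made-up sample values; the function name `nearest_zero_crossing` is hypothetical, and the code is not meant to reproduce how Praat itself moves the cursor.

```python
def nearest_zero_crossing(samples, index):
    """Return the position i closest to `index` at which the signal touches
    or crosses zero between sample i and sample i + 1, or None if there is none."""
    best = None
    for i in range(len(samples) - 1):
        # A product that is zero or negative means the sign changes here.
        if samples[i] * samples[i + 1] <= 0:
            if best is None or abs(i - index) < abs(best - index):
                best = i
    return best

# Made-up sample values: the sign changes after position 1 and after position 3.
wave = [0.5, 0.2, -0.1, -0.4, 0.3, 0.6]

rough_boundary = 4  # a roughly chosen boundary position
print(nearest_zero_crossing(wave, rough_boundary))
```

Dividing the returned position by the sampling rate gives the boundary time in seconds, which corresponds to the kind of time value displayed at the cursor in Praat.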

Having marked the boundary between the fricative and the vowel, one selects the fricative, clicks on Intensity (see top menu in Fig. 15), and then on Get intensity. Doing this once for each word yields the average intensity of the two speech sounds, which is higher for [s] (54 dB, see Fig. 16) than for [θ] (39 dB, see Fig. 17).

Fig. 16
A screenshot of the result: the average intensity for [s] in sink is 54 decibels.

Mean intensity of the [s] in sink

Fig. 17
A screenshot of the result: the average intensity for [θ] in think is 39 decibels.

Mean intensity of the [θ] in think
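To give learners a feeling for what these decibel values mean, the calculation behind a mean intensity can be sketched in a few lines. This is a simplified illustration: it assumes that sample values are interpreted as sound pressure in pascals with the usual reference of 2e-5 Pa, and it omits the windowing and smoothing that Praat applies, so its results will not match Praat's output exactly. The function names are hypothetical. Only the values 54 dB and 39 dB are taken from Figs. 16 and 17.

```python
import math

REFERENCE_PRESSURE = 2e-5  # Pa, the usual reference for sound pressure level in air

def mean_intensity_db(samples):
    """Mean intensity in dB, averaging the squared sample values (energy)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(mean_square / REFERENCE_PRESSURE ** 2)

def amplitude_ratio(db_difference):
    """Convert a difference in dB into a ratio of waveform amplitudes."""
    return 10 ** (db_difference / 20)

# Made-up signals: the second has one tenth of the amplitude of the first.
loud = [0.02, -0.02] * 100
quiet = [0.002, -0.002] * 100
print(mean_intensity_db(loud) - mean_intensity_db(quiet))  # difference in dB

# Values reported in Figs. 16 and 17: 54 dB for [s] and 39 dB for the interdental fricative.
print(amplitude_ratio(54 - 39))
```

The difference of 15 dB between the two fricatives thus corresponds to waveform excursions that are more than five times larger for [s], which fits the visual impression of the waveforms in Figs. 11 and 12.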

The details described give learners a first objective and acoustic feedback about whether the realization of the interdental fricative is accurate. Of course, the pictures vary from person to person and from situation to situation to some extent, but the overall patterns should go in the direction just outlined.

Apart from the waveform and the intensity line, the spectrogram contains information on how the two fricatives [s] and [θ] differ. As can be seen in Fig. 12, the pattern of the [θ] in the spectrogram is clearly distinct from the pattern of the [s] in Fig. 11: the [θ] lacks the marked structure of the [s]. The [θ] is relatively faint and does not show the nuanced color distinction from one frequency area to the next, as expressed in the change from brighter to darker regions in the spectrogram of [s] (see also, e.g., Varden, 2006; Footnote 5). So, if, for instance, an individual speaker’s realization of the word think, and in particular of the interdental fricative [θ], looks more like the articulation of sink in terms of patterns in the waveform, intensity line, and spectrogram, the speaker’s pronunciation needs improvement.

Further Steps (If Necessary)

Once the theoretical points have been discussed via Zoom, all those learners who still need to optimize their [θ] pronunciation will get additional practice phases, first on their own and then together with a partner, before consulting the tutor again. Relying on the productions of the learners, that is, the sound files along with the waveform and spectrogram, the tutor’s task is to specify potential inaccuracies in the students’ speech, to explain the articulatory gestures of the tongue needed to articulate an interdental fricative, and to contrast these with the known realization of an alveolar fricative. In a class of 10 students, this is done pair by pair in virtual break-out rooms (5 pairs of students). The separate parts of the online exchange (individual work, partner work, work with a tutor) can be repeated several times and are crucial for ensuring an appropriate balance between instruction- and practice-oriented learning sessions. For this, the sound files represent the basis, both for the tutor to derive feedback from her or his auditory and visual impressions and for the learner. Note again that the virtual space offers great benefits over physical classrooms and schools. Learners can record themselves as often as necessary in a silent environment at home, without being disturbed by others in the same room and without having to look for a quiet place outside the classroom at school.

4 Discussion

Improving learners’ pronunciation represents one of the targets in the foreign language classroom, and a wide variety of tasks, exercises, and materials has been developed for this purpose (see, e.g., Low, 2015; Pennington & Rogerson-Revell, 2019). Among these, one finds some work on the use of the program Praat, a linguistic tool that can assist students during the learning process. In the present chapter, the focus has been on the potential mispronunciation of the interdental fricative /θ/ by German learners of English who are still at a rather low level. Relying on the steps outlined in Sect. 3, learners can, on their own, together with a partner, or together with their tutor, take advantage of Praat’s auditory and acoustic functions to monitor their own speech, to compare it with the speech of an English native speaker, and to detect and subsequently correct inaccuracies. The approach described here combines different positive aspects and fits the requirements of modern foreign language learning and teaching well.

A major strength of our proposal is its direct applicability. All steps of the exercise are carefully outlined, including a thorough introduction to Praat and the relevant functions. This represents a clear advantage of the present work in comparison to many previous contributions, which describe Praat’s role in the foreign language classroom but do not provide enough detail to ensure that learners who have never worked with the tool before can easily use it (e.g., Wilson, 2009). The implementation of clearly defined steps makes the learners’ work easier and helps them comprehend and improve pronunciation.

Moreover, and compatible with the scope of the present book, the online nature of the activity given in Sect. 3 enables learners to work on their pronunciation more flexibly than in a real classroom and in accordance with their individual needs. One ingredient of the exercise is, if need be, to repeat and record the production of the target segment (/θ/) in order to make progress. Since the simultaneous recording of different students represents a challenge in a real classroom, the virtual environment turns out to be a big plus for this exercise. Students will not disturb each other, and learners who need more practice can do additional trials and remain in exchange with the tutor.

Apart from these two positive characteristics of the exercise, there are at least three others. First, learners benefit, on the one hand, from the tutor’s expertise during the discussion phases but can also, on the other hand, work autonomously and in cooperation with a peer during other parts of the exercise. Second, participants not only improve their own pronunciation but also acquire (new) knowledge about the physics of speech, compatible with the idea of content-based language learning. Third, in our digital age, using technology in a specific area, such as foreign language learning, can represent an up-to-date and efficient way to make progress. For instance, Praat offers visualizations that illustrate aspects of speech from a different, namely visual, perspective, which can help learners understand the accurate articulation of foreign language speech.

Different avenues for future research arise, and two of them should be pointed out. For one, similar exercises could be developed for other phonetic and phonological aspects, such as the production of (difficult) vocalic, prosodic, or other consonantal issues. Such activities could target, again, German learners of English, but also learners of another foreign language and/or with another native language. One example is the production of the English vowel /æ/, which is absent from the German sound inventory. A second route in future research might be to study systematically the effects such exercises have by collecting feedback from learners or evaluating learner speech before and after the completion of the respective task.

5 Conclusion

The current chapter has shown in detail and in a step-by-step manner how the phonetic program Praat can be used in the virtual foreign language classroom to analyze and improve one specific aspect of English pronunciation. We hope that learners and teachers can directly benefit from this chapter and, by the same token, that other researchers take our work as inspiration to develop similar activities for other linguistic aspects and languages.