1 Introduction

Phrase-final lengthening, i.e. segmental lengthening at the ends of prosodic units, is a complex phenomenon influenced by a number of factors, such as pitch movement type (sentence type) [8], segmental context [2], pausation [12], and boundary depth [1]. The last of these is of particular interest, since the data found in the literature are scarce and to some extent contradictory.

If we distinguish between two types of prosodic boundaries, those signalling the end of the utterance (utterance-final) and those signalling the end of the intonational phrase but not the end of the utterance (IP-final), we might suppose that the deeper the boundary, the greater the degree of phrase-final lengthening; in other words, phrase-final lengthening as a prosodic boundary marker is expected to have more effect at larger prosodic units (in our case, utterances). However, published experimental data either point to the opposite [12] or show that different speakers mark boundary depth differently [1]. Our recent pilot study for Russian [6], where the duration of the word “Ludmila” was measured in different positions within the phrase, suggested that utterance-final lengthening might be weaker than intonational-phrase-final lengthening. Since those data were obtained for only one word, answering this question requires a more accurate analysis based on a large speech corpus.

Thus, the main question asked in the present paper is as follows: How does boundary depth influence the degree of phrase-final lengthening?

Additionally, the current analysis will be able to provide information on how the lengthening is distributed between the segments of the utterance-final word. In our previous study [4] we showed that at the end of a non-utterance-final intonational phrase the lengthening is observed primarily on stressed vowels, while post-stressed vowels show little lengthening in open syllables and no lengthening at all in closed syllables. Therefore, a question arises whether this holds true for utterance-final words.

2 Material

For the present study the Corpus of Professionally Read Speech (CORPRES) [10] was chosen. The corpus contains texts of different speaking styles recorded from 4 male and 4 female speakers—approx. 30 h of manually segmented speech with phonetic and prosodic annotation. Prosodic information includes the type of the intonation contour, pause type, and prominence.

From this corpus 4 of the 8 speakers were taken, 2 male (A, M) and 2 female (C, K), since they recorded more material than others (approx. 5 h for each speaker).

3 Method

3.1 Obtaining Data from the Corpus

In order to get all the necessary information about the segments (vowels) in question, a Python script was written to process the annotation files of the corpus. For each segment we obtained information about its duration and context.

It is also worth noting here that for the purposes of this study only the voiced parts of vowels were included in the analysis.

3.2 Factors Included in the Analysis

Based on our previous data for Russian as well as papers for other languages we assume that the following factors influence the degree of phrase-final lengthening: segmental context, pitch movement type and prominence, boundary depth, pausation, stress pattern, length of the prosodic unit, speech tempo.

Thus, contextual parameters for each analysed segment of the corpus included the length of prosodic units (clitic group and intonational phrase) where the segment occurs and the position of each prosodic unit within the higher one; the CV-pattern of the clitic group where the segment occurs (e.g. “cVccv” for the Russian word meaning “mask” [figure a]; “c” stands for a consonant, uppercase “V” for a stressed vowel, lowercase “v” for an unstressed vowel); the presence of a pause after the intonational phrase; the type of intonation contour and its location. (The CV-pattern of each clitic group was calculated based on its acoustic transcription. This means, in particular, that in cases where the phoneme /j/ was vocalized, it was treated as a vowel.)
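The CV-pattern notation described above can be illustrated with a short Python sketch. The phone representation (a tuple of symbol, vowel flag and stress flag) is a hypothetical one chosen for illustration; the actual annotation format of the corpus is not described here. The vowel flag is assumed to come from the acoustic transcription, so a vocalized /j/ would already be marked as a vowel.

```python
from typing import Iterable, Tuple

# Hypothetical phone record: (symbol, is_vowel, is_stressed).
Phone = Tuple[str, bool, bool]


def cv_pattern(phones: Iterable[Phone]) -> str:
    """Return the CV-pattern of a clitic group.

    "c" = consonant, "V" = stressed vowel, "v" = unstressed vowel.
    """
    out = []
    for _symbol, is_vowel, is_stressed in phones:
        if not is_vowel:
            out.append("c")
        elif is_stressed:
            out.append("V")
        else:
            out.append("v")
    return "".join(out)


# Illustrative transcription of a five-phone word with stress on the first vowel.
mask = [
    ("m", False, False),
    ("a", True, True),
    ("s", False, False),
    ("k", False, False),
    ("a", True, False),
]
print(cv_pattern(mask))  # cVccv
```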

Each of the factors listed above was either varied, held constant or normalized. Besides boundary depth, which is the object of the present study, we decided to consider stress pattern as the second parameter, since we were interested in how the lengthening is distributed among the segments of the final word. In order to obtain statistically reliable results, we decided to process only words ending in -cV, -cVc, -cVcv, -cVcvc, and -cVccv, which are the most frequent CV-patterns occurring word-finally in Russian.
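As a hedged illustration, assigning a clitic group to one of the five word-final patterns could look as follows in Python. The function name and the case-sensitive suffix matching are assumptions made for this sketch, not a description of the script used in the study.

```python
from typing import Optional

# The five word-final CV-patterns named in the text.
FINAL_PATTERNS = ("cVcvc", "cVccv", "cVcv", "cVc", "cV")


def final_pattern(cv: str) -> Optional[str]:
    """Return the word-final pattern that the CV string ends in, or None.

    Matching is case-sensitive, so "V" (stressed) and "v" (unstressed)
    are kept apart and at most one of the five endings can match.
    """
    for ending in FINAL_PATTERNS:
        if cv.endswith(ending):
            return ending
    return None


print(final_pattern("cvcVccv"))  # cVccv
print(final_pattern("cVcvcv"))   # None (stress on the antepenultimate syllable)
```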

Based on the frequency data, we have selected only intonational phrases containing from 2 to 6 clitic groups and at least 5 syllables. We also limited our choice to intonational phrases with no internal pauses (footnote 1), no prominent words, and no nuclear stress on the IP-final or utterance-final clitic group (e.g. the intonational phrase [figure b] (“kicked him not three times [pause] [, but ... times]”) with nuclear stress on [figure c] (“three”)).
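The selection criteria for intonational phrases can be summarized in a small Python sketch. The `IntonationalPhrase` record and its field names are hypothetical; they only mirror the criteria as stated in the text.

```python
from dataclasses import dataclass


@dataclass
class IntonationalPhrase:
    # Hypothetical summary of one intonational phrase (field names are illustrative).
    n_clitic_groups: int
    n_syllables: int
    has_internal_pause: bool
    has_prominent_word: bool
    nucleus_on_final_group: bool  # nuclear stress on the final clitic group


def is_selected(ip: IntonationalPhrase) -> bool:
    """Apply the selection criteria as stated in the text."""
    return (
        2 <= ip.n_clitic_groups <= 6
        and ip.n_syllables >= 5
        and not ip.has_internal_pause
        and not ip.has_prominent_word
        and not ip.nucleus_on_final_group
    )


example = IntonationalPhrase(4, 7, False, False, False)
print(is_selected(example))  # True
```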

To eliminate pausation as a factor influencing the degree of lengthening, we observed only those intonational phrases and utterances which were followed by a physical pause.

3.3 Duration Normalization

In order to be able to compare duration values for different types of segments (e.g., closed and open vowels, which differ in inherent duration) it is reasonable to calculate normalized duration values. Here the formula given in [12, formula (4)] was used, which allowed us to compensate for the average duration of the segment, its standard deviation, and tempo:

$$\begin{aligned} \tilde{d}(i)=\frac{d(i)-\alpha \mu _{p}}{\alpha \sigma _{p}} \end{aligned}$$

where \(\tilde{d}(i)\) is the normalized duration of segment i, d(i) is its absolute duration, \(\alpha \) is the tempo coefficient, and \(\mu _{p}\) and \(\sigma _{p}\) are the mean and standard deviation of the duration of the corresponding phone p.
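A minimal Python sketch of this normalization is given below. The function and parameter names are illustrative, and the millisecond values in the usage example are invented for demonstration; they are not corpus data.

```python
def normalized_duration(d: float, alpha: float, mu_p: float, sigma_p: float) -> float:
    """Normalized duration: (d - alpha * mu_p) / (alpha * sigma_p).

    d       -- absolute duration of the segment
    alpha   -- tempo coefficient
    mu_p    -- mean duration of the corresponding phone
    sigma_p -- standard deviation of the duration of that phone
    """
    if alpha <= 0 or sigma_p <= 0:
        raise ValueError("alpha and sigma_p must be positive")
    return (d - alpha * mu_p) / (alpha * sigma_p)


# Invented values: a 125 ms vowel, phone mean 80 ms, SD 20 ms, tempo coefficient 1.25.
print(normalized_duration(125.0, 1.25, 80.0, 20.0))  # 1.0
```

In this example the tempo-scaled mean is 100 ms and the tempo-scaled standard deviation is 25 ms, so the segment lies one scaled standard deviation above its expected duration.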

The tempo coefficient (\(\alpha \)) was calculated using the formula provided in [12, formula (6)]:

$$\begin{aligned} \alpha =\frac{1}{N}\sum _{i=1}^{N}\frac{d_i}{\mu _{p_i}} \end{aligned}$$

where \(d_i\) is the duration of segment i, and \(\mu _{p_i}\) is the mean duration of the corresponding phone.
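The tempo coefficient can be sketched in Python in the same spirit. The stretch of speech over which the N segments are taken is not specified here, so the function simply accepts whatever sequence of segments is passed in; the names and numbers are illustrative.

```python
from typing import Sequence


def tempo_coefficient(durations: Sequence[float], phone_means: Sequence[float]) -> float:
    """Mean ratio of each segment's duration to the mean duration of its phone."""
    if len(durations) != len(phone_means) or len(durations) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    ratios = [d / mu for d, mu in zip(durations, phone_means)]
    return sum(ratios) / len(ratios)


# Invented values (ms): the ratios are 1.5, 1.25, 1.0 and 1.25.
print(tempo_coefficient([120.0, 100.0, 90.0, 50.0], [80.0, 80.0, 90.0, 40.0]))  # 1.25
```

A value above 1 means that the segments are on average longer than the phone means, i.e. the tempo is slower than average; a value below 1 indicates faster speech.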

3.4 Statistical Analysis

To estimate the influence of different factors on segment duration, statistical analysis was carried out using R. For normally distributed data, ANOVA and pairwise t-tests were used, with Welch’s correction for unequal variance if necessary; for non-normally distributed data, the Kruskal-Wallis test was used instead.
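To make Welch’s correction concrete, the following Python sketch computes the Welch t statistic and the Welch-Satterthwaite degrees of freedom for two samples. It is an illustration only: the analysis in this study was carried out in R, and the sketch stops short of the p-value, which requires the t distribution and is not available in the Python standard library.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return (t, df) for Welch's unequal-variance t-test on two samples."""
    if len(a) < 2 or len(b) < 2:
        raise ValueError("each sample needs at least two observations")
    va = variance(a) / len(a)  # squared standard error of the mean of a
    vb = variance(b) / len(b)  # squared standard error of the mean of b
    if va + vb == 0:
        raise ValueError("both samples have zero variance")
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Invented normalized durations for two positions.
t, df = welch_t([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
print(round(t, 3), round(df, 3))
```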

4 Results

Our analysis included a comparison of clitic groups occurring in three positions:

  • position 1: IP-medial, the clitic group not followed by a pause;

  • position 2: IP-final, but not utterance-final, with a following pause;

  • position 3: utterance-final, with a following pause.
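The three positions can be expressed as a small classification function. This is a sketch with hypothetical boolean inputs; clitic groups that fit none of the three definitions (for instance, IP-final without a following pause) are returned as `None`, i.e. left out of the comparison.

```python
from typing import Optional


def position(ip_final: bool, utterance_final: bool, followed_by_pause: bool) -> Optional[int]:
    """Map a clitic group to position 1, 2 or 3 as defined in the list above."""
    if not ip_final and not followed_by_pause:
        return 1  # IP-medial, no following pause
    if ip_final and followed_by_pause:
        return 3 if utterance_final else 2
    return None  # not covered by the three positions


print(position(False, False, False))  # 1
print(position(True, False, True))    # 2
print(position(True, True, True))     # 3
```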

The mean normalized duration values for stressed and post-stressed vowels are provided in Tables 1 and 2, respectively. The values are grouped according to the CV-pattern of the end of the clitic group where the vowel occurs.

Table 1. Mean normalized duration values for stressed vowels in clitic groups ending in different CV-patterns, for 3 positions: (1) IP-medial, (2) IP-final, (3) utterance-final (4 speakers). Sample sizes are given in brackets.
Table 2. Mean normalized duration values for post-stressed vowels in clitic groups ending in different CV-patterns, for 3 positions: (1) IP-medial, (2) IP-final, (3) utterance-final (4 speakers). Sample sizes are given in brackets.

4.1 Boundary Depth

As shown in Table 1, there is a clear tendency that stressed vowel duration values are higher in IP-final position (position 2) compared with utterance-final position (position 3). This difference is statistically significant (p<0.05)

  • for 3 out of 4 speakers for clitic groups ending in -cV;

  • for 1 out of 4 speakers for clitic groups ending in -cVc;

  • for 1 out of 4 speakers for clitic groups ending in -cVcv;

  • for 0 out of 4 speakers for clitic groups ending in -cVcvc;

  • for 1 out of 4 speakers for clitic groups ending in -cVccv.

Therefore, this tendency is rather weak in all cases but the first, where the stressed vowel occurs in absolute-final position (ultimate open syllable).

The same general tendency is observed for the post-stressed vowels (see Table 2). The difference between these two positions is statistically significant (p<0.05):

  • for 2 out of 4 speakers for clitic groups ending in -cVcv;

  • for 1 out of 4 speakers for clitic groups ending in -cVcvc;

  • for 3 out of 4 speakers for clitic groups ending in -cVccv.

Similarly to stressed vowels, for post-stressed vowels this effect is more pronounced for ultimate open syllables.

We might conclude that in Russian both stressed and post-stressed vowels show more phrase-final lengthening at the ends of non-utterance-final intonational phrases than at the ends of utterances, but this tendency is rather weak for vowels in non-ultimate or closed syllables.

On the other hand, we might be observing at least two different strategies for signalling boundary depth. Some speakers tend to mark non-final intonational phrases using durational characteristics, while others fail to reveal any durational differences; the latter group might be using other markers such as pitch, pausation or amplitude.

4.2 Utterance-Final Lengthening

In order to explore the nature of utterance-final lengthening (as opposed to IP-final lengthening), a comparison of duration values for IP-medial position and utterance-final position was performed. As shown in Table 1, there is a clear tendency that stressed vowel duration values are higher in utterance-final position (position 3) compared with IP-medial position (position 1). This difference is statistically significant (p<0.05)

  • for 3 out of 4 speakers for clitic groups ending in -cV;

  • for 4 out of 4 speakers for clitic groups ending in -cVc;

  • for 4 out of 4 speakers for clitic groups ending in -cVcv;

  • for 4 out of 4 speakers for clitic groups ending in -cVcvc;

  • for 4 out of 4 speakers for clitic groups ending in -cVccv.

Therefore, for stressed vowels this tendency is very strong in all cases.

However, the same general tendency does not seem to be observed for the post-stressed vowels (see Table 2). The difference between the two positions is statistically significant (p<0.05):

  • for 1 out of 4 speakers for clitic groups ending in -cVcv;

  • for 0 out of 4 speakers for clitic groups ending in -cVcvc;

  • for 1 out of 4 speakers for clitic groups ending in -cVccv,

which demonstrates that post-stressed vowels are not lengthened much in utterance-final position.

This leads to the conclusion that the main carriers of utterance-final lengthening are stressed vowels and not post-stressed vowels. Similar results were obtained in our previous studies on phrase-final lengthening [4] for intonational phrases occurring non-utterance-finally.

5 Discussion

For Russian, our data provide evidence that the deeper the boundary, the weaker the lengthening effect. In other words, the speaker marks the end of a non-utterance-final intonational phrase more strongly than the end of an utterance-final intonational phrase, thus showing that the utterance is not finished and the listener is expected to wait for its ending. However, this relation might be speaker-specific, since not all the speakers show a statistically significant difference between these two boundary types.

The present study also enables us to draw conclusions on the locus of utterance-final lengthening. It is widely accepted that the phrase-final lengthening effect is most prominent in the final rhyme of the word (see, for example, [1, 3, 12]). Some evidence that the stressed vowel is also involved, but to a lesser degree, can be found in [11] for English.

It is also known that phrase-final lengthening, although considered a universal phenomenon, can in some cases be language-specific. Well-known examples are Japanese [9] and Northern Finnish [7], languages with phonemic vowel length, where the lengthening effect is limited so that the contrast between short and long vowels can be preserved. Russian, where the main correlate of lexical stress is vowel duration, might also differ from other European languages in terms of the locus of phrase-final lengthening.

Our data seem to support this hypothesis, showing that stressed vowels in penultimate syllables play a greater role in phrase-final lengthening than post-stressed vowels. However, it is worth noting here that the final rhyme may consist not only of a vowel but also of the following consonants. Since the present study is focused on vowel duration, it does not provide a full answer to the question of how the final rhyme is involved in this process. Our previous studies [5], though, have shown that absolute-final consonants do play a significant role in phrase-final lengthening.