1 Introduction

India is a linguistically diverse country in which the various states and provinces each follow their own spoken languages. Indian grammarians recognize three grades of accent. “Udatta” means raised or elevated and indicates the highest pitch. “Anudatta” means unelevated or unaccented and refers to a low-pitched syllable. “Svarita”, the third accent, is a mixture of low and high pitch within a syllable. When Grierson compared this Vedic nomenclature of accents with the tones of the Punjabi language, only “Udatta” was found to be comparable to the Punjabi high tone; in contrast to “Svarita”, the Punjabi low tone falls on the first syllable but rises on the rest [1]. This mismatch is the basic motivation for the present study, because many aspects of Punjabi remain unexplored owing to its being an under-resourced language. The variation in these tones results from the position of the five tonemes ( ) of the Punjabi language [2]. In addition to these tonemes, one of the phonemes also exhibits tonal characteristics when placed in the final position, and shows a level tone whenever it occurs in the initial position. The differences in pitch values attracted further interest once it was observed that the dialects, when analyzed, exhibit different tonal characteristics. The later part of the paper analyzes the pitch variation in dialectal words that arises from regional variation within the same state. The importance of this study lies in the fact that correct identification of lexical items is crucial when designing any speech processing system.

2 Literature Review

According to Ethnologue's 2005 estimate, there are 88 million native speakers of Punjabi, which makes it the 10th most widely spoken language in the world, and according to the 2001 Census of India there are 29,102,477 Punjabi speakers in India [3]. Even so, research on the language has not progressed far. The features of the speech signal that relate to tone have mostly been studied for Mandarin. A tone detection methodology for the Mizo language was designed in 2015 using quantitative analysis of acoustic features [4]. There, the tone was detected from slope and height, which was possible owing to the availability of a large database. Z-score normalization of the signal was used to eliminate the effects of gender, and the pitch variance results were then compared to decide whether a tone should be marked as high, low, falling, or rising. Singh Panday and Aggarwal’s (2015) study of Punjabi tonemes [3] covered the five tonemes of the Punjabi language and their high, low, and mid tones, and also sheds light on the IPA representation of high-tone and low-tone words. An experimental study of the tonal characteristics of the laryngeal phoneme of Punjabi examined words of the Malwai dialect containing phonemes that carry tonal effects; the words were recorded from native Punjabi speakers and analyzed using Praat and Matlab [2]. The /h/ phoneme was analyzed using the fundamental frequency (F0) contour. The study showed that /h/ can reflect tonal occurrences at the syllable level, whereas no such effect is observed when /h/ occurs in the initial position [2]. Vowel phonemes in Punjabi have also been analyzed, but a twofold interface persists on the acoustic features of vowels in two different languages [1].
That paper highlights that non-native languages and changing circumstances have a significant impact on the original Punjabi language, which is an essential consideration when designing any ASR system. Furthermore, the work on detecting Mizo tones [4] included extensive technical study of the tonal lexicon of the Mizo language. Another paper, on lexical stress in Punjabi and its representation in PLS, linked its findings to PLS design and presented a new study of the relation between suprasegmental phonemes such as tone, nasalization, and stress at the syllable level [5]. The study made it evident that non-tonal disyllabic words can also carry stress on the second syllable, which can be illustrated through the IPA, which contains the encoded PLS data.

3 Speech Corpus Structure

Undivided Punjab included the Malwa, Doaba, Majhi, and Puadh regions of Punjab along with the Himachali belt (Kullui, Mandiali, Kangri, and Chambiali). The corpus was designed to cover the dialectal varieties of these regions and was enriched with their linguistic differences. Sample sentences from the corpus are shown in Fig. 1.

Fig. 1 Speech corpus including different dialects of Punjabi language

The input signal was recorded at 44 kHz using the Sound Forge software in a studio environment. The pitch of the recorded dataset was estimated with the Praat software and analyzed on the basis of tonal and dialectal variations. The speech corpus includes certain words whose pitch and lexical form vary from region to region; some of these words are listed in Table 1.

Table 1 Lexicon variations across the regions of undivided Punjab

The motive behind this corpus design was to include all possible dialectal variations of the Punjabi language, taking into account its dialect-dependent tonal variations. Section 4 describes how the speech signal is modeled and argues that tones must be studied when designing a speech system, especially for an under-resourced language like Punjabi.

4 Speech Signal Modeling

An input signal is studied for the tonal variations caused by dialectal differences and by the position of its vowel. During modeling, the recorded signal was studied on the basis of three tones (low, high, and mid). Figure 2 shows the canonical pitch contours for the Punjabi language. The high tone is a rising–falling tone ( ), the low tone is a falling tone ( ), and the mid tone has an intermediate pitch between the high and low tones [6].
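The contour shapes described above can be made concrete with a rough sketch. The function below is illustrative only and is not the procedure used in this study: the name `contour_shape`, the `margin` threshold, and the "level" label used here for a contour without a clear rise or fall are all assumptions introduced for the example.

```python
def contour_shape(f0, margin=0.5):
    """Label a pitch contour by its coarse shape.

    f0     : voiced pitch values in time order (Hz or z-scores)
    margin : hypothetical minimum rise or fall, in the same unit as f0
    """
    if len(f0) < 3:
        raise ValueError("need at least three pitch samples")

    start, end = f0[0], f0[-1]
    peak = max(f0)

    rises = peak - start >= margin  # contour climbs from its onset to the peak
    falls = peak - end >= margin    # contour drops from the peak to its offset

    if rises and falls:
        return "rising-falling"
    if falls:
        return "falling"
    if rises:
        return "rising"
    return "level"


# Synthetic contours, not measurements from the corpus.
high_like = contour_shape([0.0, 1.0, 2.0, 1.0, 0.0])
low_like = contour_shape([2.0, 1.5, 1.0, 0.5, 0.0])
```

A suitable value for `margin` depends on the unit of the contour and would have to be tuned on annotated data.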

Fig. 2 Canonical pitch contours of Punjabi

Further analysis of the signal is done as per the block diagram shown in Fig. 3.

Fig. 3 Block diagram of speech modeling

The input signal covers the dialectal variations of the Punjabi language. Its fundamental frequency (F0) is then Z-score normalized to make it robust to gender effects. The tonal word is identified from the annotations made in the Praat software, and the pitch and intensity contours of the input signal are analyzed.

Z-Score Normalization

Pitch variation due to gender difference has to be overcome when processing the speech signal, so the Z-score of the pitch contour is taken to normalize the data to a common scale that makes it gender independent. The Z-score of a sample expresses how many standard deviations the sample lies above or below the mean of its data set. It can be calculated using the equation given in [4],

$$ p_{z} (x) = \frac{p(x) - \mu }{\sigma } $$

where \( p(x) \) is the pitch value of sample \( x \), \( \mu \) is the mean, and \( \sigma \) is the standard deviation of the pitch contour.
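As a minimal sketch of the equation above, the following Python function normalizes a list of pitch values. The function name `z_score_normalize`, the use of the population standard deviation, and the two synthetic contours are assumptions made for illustration; they are not taken from the recorded corpus.

```python
from statistics import mean, pstdev


def z_score_normalize(pitch):
    """Return (p - mu) / sigma for every value in a pitch contour."""
    mu = mean(pitch)
    sigma = pstdev(pitch)  # population SD; an assumption, the sample SD is also common
    if sigma == 0:
        raise ValueError("a constant contour cannot be z-score normalized")
    return [(p - mu) / sigma for p in pitch]


# Synthetic contours: the second is the first scaled to a higher register.
male = [110.0, 120.0, 130.0, 120.0, 110.0]
female = [220.0, 240.0, 260.0, 240.0, 220.0]

male_z = z_score_normalize(male)
female_z = z_score_normalize(female)
```

Because the two synthetic contours differ only in register and scale, their normalized versions coincide, which is the property that makes the normalized contour comparable across speakers.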

Figure 4 shows the effect of Z-score normalization on the input signals recorded by a male and a female speaker of the same dialectal region.

Fig. 4 a, b Recorded voice of a male and a female speaker of the same region; c, d the respective Z-score normalized waveforms

Role of F0 Frequency

A substantial amount of data exists on the voice fundamental frequency (F0) in the speech of speakers who differ in age and sex [3]. F0 plays a very important role in differentiating male and female speakers. Published data show that its range of variation, often expressed as two standard deviations (SD) of the F0 distribution, is approximately the same for men and women when expressed in semitones [7]. Male speakers have a low F0 and female speakers a high F0; therefore, raw F0 values alone cannot represent the underlying tonal features of a language. Table 2 lists the fundamental frequency (F0) values for the various dialects.
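The semitone comparison mentioned above can be illustrated with the standard conversion of a frequency ratio to semitones, 12 log2(f / f_ref). The sketch below is only an illustration: the function names, the 100 Hz default reference, and the example ranges are assumptions and are not values from Table 2.

```python
import math


def hz_to_semitones(f0_hz, ref_hz=100.0):
    """Convert a frequency in Hz to semitones relative to ref_hz."""
    if f0_hz <= 0 or ref_hz <= 0:
        raise ValueError("frequencies must be positive")
    return 12.0 * math.log2(f0_hz / ref_hz)


def range_in_semitones(low_hz, high_hz):
    """Width of the interval [low_hz, high_hz] in semitones."""
    return hz_to_semitones(high_hz, ref_hz=low_hz)


# Hypothetical ranges: the second is twice as wide in Hz as the first.
male_range = range_in_semitones(100.0, 200.0)
female_range = range_in_semitones(200.0, 400.0)
```

Both hypothetical ranges span one octave, so they have the same width in semitones even though they differ by a factor of two in Hz.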

Table 2 F0 values of the recorded signals

Comparative Lexicon Tonal Analysis (Pitch Contours)

As stated before, both the dialectal differences and the position of the toneme determine the variation in the pitch and intensity contours of dialectal and tonal words.

The effect of toneme position on the tone of the signal has already been illustrated in Fig. 2 above. The results of the comparative analysis among the dialectal variations are shown in Fig. 5, which gives a few examples of the pitch contours for dialectal words in the recorded dataset.

Fig. 5 a, b Pitch contour variations for dialectal words

5 Results and Experimental Analysis

The pitch boundaries based on the fundamental frequency of the signal show the dialectal and tonal variations of a word across regions. The values of the fundamental frequency (F0) have already been given in Table 2. Table 3 presents the change in pitch and the corresponding intensity variation for the dialectal varieties of the input signal. The experiment shows that the mean pitch ranges from 130.16 to 250.62 for the Punjabi dialects of the present regions of Punjab and from 124.75 to 248.01 for the Himachali belt of undivided Punjab. Although the two ranges are quite similar, Table 3 shows that the difference in pitch variation is smallest for the Majhi dialect and for the Himachali-belt dialects Chambiali and Mandiali. These regions show higher pitch values, indicating that high tones occur more often. The Kullui, Malwai, and Doabi regions resemble one another in having lower mean pitch values, indicating more low tones. The intensity, however, remains almost constant and changes negligibly across the Punjabi dialects of undivided Punjab. Nevertheless, of all the regions, Chambiali shows intensity and mean pitch values closest to those of the modern Punjabi dialectal regions.

Table 3 Pitch and intensity variations across different regions of undivided Punjab

6 Conclusions

The paper describes the dialectal variations reflected in the tone of the speech signal. The pitch, intensity, and fundamental frequency variations of the signal were studied. The pitch boundaries based on the fundamental frequency of the signal show the dialectal and tonal variations of a word across regions. The determined values are important because including the tonal information of words when designing an ASR system can considerably increase the efficiency of that system.