1 Introduction

Infant cry analysis is a multidisciplinary area to which paediatricians, linguists, psychologists, neuroscientists and engineers have contributed. Although most contributions in this field come from paediatrics, researchers from other domains are now taking an interest in infant cry research. Infant cry analysis is necessary to understand the needs of infants and to identify pathological infants at an early stage, so that they can be treated in the initial stages of disease and protected from possible temporary or permanent disorders. In newborns, cry characteristics can be indicative: for example, a kitten-like cry indicates that the infant may be suffering from a genetic disorder. Similarly, a hoarse cry is an indicator of cramped muscles. Another important aspect of infant cry analysis is the identification of infants who are at risk of sudden infant death syndrome (SIDS) (Corwin et al. 1995). SIDS is the condition in which an infant dies suddenly and the cause of death remains unidentified even after autopsy. Hence, research in infant cry analysis may prove helpful in developing applications or devices that monitor the activities of infants and help parents.

Work in infant cry analysis has mostly addressed the analysis and classification of cry types. Cries have been divided into types such as hunger, pain, pleasure and birth cries. Estimation of the fundamental frequency (F0) of the infant cry signal is proposed in Petroni et al. (1994). Another method used for F0 estimation is the average magnitude difference function (AMDF) combined with simple inverse filter tracking (Manfredi Claudia et al. 2006). Research has also been done on discriminating normal infants' cries from pathological infants' cries (Chittora and Patil 2015a). Most of this work distinguishes normal infants' cries from the cries of deaf infants or infants with a hearing disorder. The spectral feature set of Mel frequency cepstral coefficients (MFCC) has been used as a state-of-the-art feature set for this classification task with various classifiers (Garcia and Garcia 2003; Reyes Galaviz Orion Fausto et al. 2008). Another feature set used for classifying normal and deaf babies is short-time Fourier transform (STFT) features with time delay neural networks (TDNN), general regression neural networks (GRNN) and multi-layer perceptron (MLP) neural networks (Hariharan et al. 2012). Three-class classification of normal, deaf and asphyxiated infants has been performed using features such as MFCC and wavelet-based features (Saraswathy et al. 2012). Classification of normal and asphyxiated infants' cries has also been attempted using MFCC features in Ali et al. (2012). Classification of asthma and HIE infant cries is reported in Chittora and Patil (2014, 2015b). In Lederman Dror (2002), normal infants are classified against infants with cleft palate, preterm infants and sick infants (cri-du-chat and Down's syndrome) using MFCC, linear prediction coefficients (LPC), linear prediction cepstral coefficients (LPCC) and fundamental frequency (F0) features with hidden Markov models (HMM). The MFCC feature set has also been used to analyze the cries of infants suffering from hypothyroidism (Zabidi et al. 2009). In the last few decades, attempts have been made to classify and analyze different infant cry types. The cry types defined by several researchers are hunger, pain, pleasure, discomfort, fear, anger and birth cries. Classification of fear, anger and pain cries using MFCC features is reported in Petroni et al. (1995, 2009). Hunger vs. no-hunger and pain vs. no-pain cries have been classified using the MFCC feature set with a support vector machine (SVM) classifier and NN ensembles (Barajas-Montiel and Reyes-Garcia 2005; Singh et al. 2013). Analysis of pain and manipulation cries (cries during cloth changing) has been performed using pitch (F0) and the formant frequencies F1, F2 and F3 (Baeck and Souza 2001).

Some researchers have worked on the analysis of the first cry of infants. Most of the work in this direction has been done by medical practitioners and researchers. In Nicollas et al. (2012), the authors used the larynges of two dead newborns to generate sounds by applying air pressure. Their findings show that, in the first cry, the larynx behaves like an excised organ, free of neurological control. The role of the larynx in the first cry is not to vibrate by itself, but to generate aerodynamic perturbations that give rise to supraglottic vibrations. Complex interactions are responsible for the nonlinear phenomena found in the first cry signal. Neurological control and regulation are absent in the first cry. In another study, researchers used newborns' cries to find out the effect of prenatal exposure to cocaine (Bauer et al. 1994). In this paper, the distinction between the first cry and other cry types is reported using different features, and the effectiveness of these features in the study of infant development is presented. In our earlier paper, we showed the importance of the unvoicing percentage of the infant cry for the study of infant cry pathologies and development (Chittora and Patil 2015c). In this paper, other features are used along with this feature and are found useful in newborn infant cry analysis.

The rest of the paper is organized as follows: estimation of the fundamental frequency (F0) using the modified autocorrelation method is presented in Sect. 2. Feature extraction and experimental results are explained in Sects. 3 and 4, respectively. Finally, the paper is summarized in Sect. 5 along with future directions.

2 Fundamental frequency (F0) estimation using modified autocorrelation method

The autocorrelation method is widely used for pitch estimation in speech-related applications (Rabiner 1977). In this method, the speech signal is divided into short frames because speech is a non-stationary signal. For a short frame of speech of 20–30 ms (comprising 2–3 pitch periods, T0), the autocorrelation is computed after pre-processing, which includes passing the signal through a lowpass filter. The periodicity of a periodic signal is also observed in its autocorrelation function. Because the autocorrelation function is symmetric, the distance between its two highest peaks is calculated; this distance equals the pitch period (T0) of the signal. The autocorrelation method of F0 estimation does not work well for infant cry analysis because, for noisy infant cry signals, spurious peaks are sometimes present, which are mistaken for pitch peaks and yield erroneously high frequency values.

In this paper, the fundamental frequency (F0) contour is estimated using the modified autocorrelation method. In the pre-processing stage, the infant cry signal is passed through a 4th-order Butterworth lowpass filter with a cutoff frequency of 1 kHz in order to remove the high-frequency harmonics present in the signal. The filtered signal is then segmented into short frames of 30 ms duration with an overlap of 50 %. The modified autocorrelation method is applied to each cry signal frame, the peaks corresponding to the pitch values are identified and the pitch is estimated. In the modified autocorrelation method of pitch extraction, the signal is clipped at a reference level C_L. The clipping level C_L is chosen as 25 % of the maximum peak sample value. The resulting signal is given by:

$$y(n) = \operatorname{clc}\left[ x(n) \right] = \begin{cases} x(n) - C_{\text{L}}, & x(n) \ge C_{\text{L}} \\ 0, & \left| x(n) \right| < C_{\text{L}} \\ x(n) + C_{\text{L}}, & x(n) \le -C_{\text{L}}. \end{cases}$$
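The center-clipping rule can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function name `center_clip` is hypothetical, C_L is assumed to be 25 % of the largest absolute sample value in the frame, and the negative branch is assumed to apply for x(n) ≤ −C_L (the usual symmetric form of the clipper).

```python
def center_clip(frame, ratio=0.25):
    """Center-clip one frame at C_L = ratio * max(|x(n)|).

    Assumption: the clipping level is derived per frame from the largest
    absolute sample value; the text only says "25 % of the maximum peak".
    """
    if not frame:
        return []
    c_l = ratio * max(abs(v) for v in frame)
    clipped = []
    for v in frame:
        if v >= c_l:
            clipped.append(v - c_l)   # positive excursion above +C_L
        elif v <= -c_l:
            clipped.append(v + c_l)   # negative excursion below -C_L
        else:
            clipped.append(0.0)       # |x(n)| < C_L is set to zero
    return clipped
```

Samples whose magnitude stays below C_L are zeroed, so low-level noise between the main excursions of the waveform does not contribute to the autocorrelation computed next.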

For the clipped signal y(n), the autocorrelation function is found using the formula:

$$R'(m) = \sum\limits_{n = 0}^{N - 1 - m} {y(n) \cdot y(n + m), \quad 0 \le m \le M_{0} } ,$$

where N is the length of the sequence, M_0 is the number of autocorrelation points to be computed and m is the lag (delay). Clipping the signal suppresses the effect of additive noise, and hence the modified method performs better than the plain autocorrelation method of pitch estimation. The peaks of the autocorrelation function of the clipped signal are then identified. The difference between the peak locations gives the estimate of the pitch period, and hence the F0, of the signal. Examples of the modified autocorrelation method for pitch extraction applied to voiced and unvoiced segments of the cry signal are shown in Fig. 1. We can observe that, for unvoiced segments, the autocorrelation function has very few peaks; thus, segments with fewer than 6 peaks are taken as unvoiced and their pitch is set to zero.
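The frame-level procedure described above (center clipping, autocorrelation R'(m), peak picking and the six-peak voicing rule) can be sketched as follows. This is a hedged illustration under stated assumptions, not the authors' code: the function names are hypothetical, peaks are counted on the one-sided autocorrelation for lags m ≥ 1, a peak is taken to be a positive local maximum, and the pitch period is taken as the lag of the strongest such peak.

```python
import math


def center_clip(frame, ratio=0.25):
    """Center-clip at C_L = ratio * max(|x(n)|) (assumed per-frame level)."""
    c_l = ratio * max((abs(v) for v in frame), default=0.0)
    return [v - c_l if v >= c_l else (v + c_l if v <= -c_l else 0.0)
            for v in frame]


def autocorr(y, max_lag):
    """R'(m) = sum_{n=0}^{N-1-m} y(n) * y(n+m) for 0 <= m <= max_lag."""
    n = len(y)
    return [sum(y[i] * y[i + m] for i in range(n - m))
            for m in range(max_lag + 1)]


def estimate_f0(frame, fs, clip_ratio=0.25, min_peaks=6):
    """Return F0 in Hz for one frame, or 0.0 if the frame is unvoiced."""
    y = center_clip(frame, clip_ratio)
    r = autocorr(y, len(y) - 1)
    # Assumption: a "peak" is a positive local maximum at a lag m >= 1.
    peaks = [m for m in range(1, len(r) - 1)
             if r[m] > 0 and r[m] > r[m - 1] and r[m] >= r[m + 1]]
    if len(peaks) < min_peaks:
        return 0.0                      # fewer than 6 peaks: unvoiced
    best_lag = max(peaks, key=lambda m: r[m])
    return fs / best_lag                # pitch period in samples -> Hz


if __name__ == "__main__":
    fs = 12000
    tone = [math.sin(2 * math.pi * 400 * i / fs) for i in range(360)]
    print(estimate_f0(tone, fs))        # synthetic 400 Hz tone, 30 ms frame
```

For a synthetic 400 Hz tone sampled at 12 kHz, the strongest non-zero-lag peak falls at a lag of 30 samples, so the sketch returns 12000/30 = 400 Hz. Note that, under the one-sided peak-count assumption, a 30 ms frame needs a pitch period of roughly 5 ms or less to be declared voiced.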

Fig. 1 Modified autocorrelation algorithm for F0 extraction for Panel I: voiced segments and Panel II: unvoiced frame. In all the subfigures: (a) time-domain signal, (b) center-clipped signal of (a) and (c) autocorrelation function of (b)

In Fig. 1, the modified autocorrelation method is illustrated for voiced and unvoiced segments. In the method as originally proposed, the clipping level was suggested as 64 % of the maximum peak amplitude. For infant cry signals, intensive computer simulation showed that such a high clipping threshold removes most of the peaks of the signal and therefore does not work for pitch (F0) estimation. By an iterative method, we set the clipping threshold to 25 %, which was found to give the best results for F0 estimation. To compare the performance of this F0 extraction with the standard autocorrelation method, the spectrogram is used. For infants, a reference glottal flow waveform for comparing F0 estimation methods is not available, because the glottal flow waveform cannot be acquired from infants by non-invasive methods. Thus, to compare the performance of the two F0 estimation algorithms, we used the spectrogram: if the estimated harmonics match the harmonics present in the spectrogram, we consider the algorithm to be better. This decision was made after observing the match between the estimated harmonics and the spectrogram for many infant cry samples, in order for the decision to be statistically significant. From Fig. 2, it can be observed that the modified autocorrelation-based method of F0 extraction works better than the state-of-the-art method, i.e., the autocorrelation method of F0 extraction.

Fig. 2 Comparison of pitch (F0) extraction methods: (a) autocorrelation method and (b) modified autocorrelation method
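The framing step of the pre-processing stage in this section (30 ms frames with 50 % overlap) can be sketched as below. The function name `frame_signal` is hypothetical, the trailing partial frame is assumed to be dropped, and the 4th-order Butterworth lowpass filter is deliberately left out because it would normally come from a signal-processing library rather than be written by hand.

```python
def frame_signal(x, fs, frame_ms=30.0, overlap=0.5):
    """Split x into overlapping frames.

    With fs = 12000 Hz, frame_ms = 30 and overlap = 0.5 this gives
    360-sample frames that start every 180 samples.
    Assumption: samples after the last full frame are discarded.
    """
    frame_len = int(round(fs * frame_ms / 1000.0))
    hop = max(1, int(round(frame_len * (1.0 - overlap))))
    return [x[start:start + frame_len]
            for start in range(0, len(x) - frame_len + 1, hop)]
```

Applying an F0 estimator to each returned frame, in order, yields the F0 contour on which the features of the next section are computed.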

3 Feature extraction

Database: In this study, infant cry data was collected from three hospitals of Visakhapatnam, India. Data was recorded with a handheld Cenix recorder (Model: VR-P2340) with an external microphone, at a sampling frequency of 12 kHz and with 12-bit PCM quantization (Buddha and Patil 2007). The pain cries of normal infants were recorded during vaccination; birth cries were recorded in the nursing home; hunger cries were recorded when the infant cried because of hunger (the time elapsed since the last feed was used as an indicator for identifying a hunger cry); and cries while passing urine were recorded when the infant passed urine in the routine course or while bathing. Sometimes more than one cry was recorded from the same infant. The duration of the cries varies from 30 to 50 s. The corpus statistics are given in Table 1. From this corpus, cry types are separated by reason of crying and age, as shown in Table 2. Most of the infants considered in this study are below 1 month of age.

Table 1 Corpus statistics for infant cry analysis
Table 2 Distribution of cry samples of newborn infant’s cries

It is known that our ears are sensitive to two parameters, namely, loudness and pitch (F0). Loudness is associated with the amplitude of the signal; it is a perceptual feature that has recently been found to be associated with the strength of excitation (SoE) (Seshadri and Yegnanarayana 2009). Pitch is likewise a perceptual feature and is associated with the F0 of the signal. Hence, to capture these two parameters, energy- and F0-related features are estimated and used to analyze the different cry signals. For each cry sample, the F0 contour is calculated using the modified autocorrelation method and the following features are estimated:

  1. Minimum of F0 contour

  2. Maximum of F0 contour

  3. Mean of F0 contour

  4. Median of F0 contour

  5. Normalized energy of the signal (E)

  6. Normalized energy in 0–2 kHz (E1)

  7. Normalized energy in 2–4 kHz (E2)

  8. Normalized energy in 4–6 kHz (E3)

  9. Unvoicing percentage in the total cry (UV ratio)
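Features 1–4 and 9 in the list above can be sketched from an F0 contour as follows. This is an illustrative sketch only: the function name and dictionary keys are hypothetical, unvoiced frames are assumed to be marked by a pitch value of zero, and the four F0 statistics are assumed to be taken over voiced frames only (the text does not state whether zero-pitch frames are included).

```python
from statistics import mean, median


def f0_contour_features(f0_contour):
    """Summarise an F0 contour given as one value in Hz per frame.

    Frames with F0 == 0 are treated as unvoiced. The UV ratio is returned
    as a percentage of all frames.
    """
    total = len(f0_contour)
    voiced = [f for f in f0_contour if f > 0]
    uv_ratio = 100.0 * (total - len(voiced)) / total if total else 0.0
    if not voiced:
        return {"min_f0": 0.0, "max_f0": 0.0, "mean_f0": 0.0,
                "median_f0": 0.0, "uv_ratio": uv_ratio}
    return {"min_f0": min(voiced), "max_f0": max(voiced),
            "mean_f0": mean(voiced), "median_f0": median(voiced),
            "uv_ratio": uv_ratio}
```

For a hypothetical six-frame contour `[0, 400, 420, 0, 440, 460]`, two of the six frames are unvoiced, so the UV ratio is about 33.3 %, and the mean and median F0 over the voiced frames are both 430 Hz.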

The normalized energy of the signal is defined as the energy of signal divided by the length of the signal, i.e.,

$$E = \frac{1}{n}|X(\omega )|^{2},$$

where E is the normalized energy, n is the number of cry segments and X(ω) is the short-time Fourier transform (STFT) of the signal. The normalized energy is also calculated for three sub-bands, namely, (1) 0–2 kHz, (2) 2–4 kHz and (3) 4–6 kHz (because the data is recorded at a 12 kHz sampling frequency, the maximum available bandwidth is 6 kHz). Unvoiced regions are identified as the segments in which the number of peaks in the autocorrelation function is fewer than 6, which gives zero pitch frequency. The number of frames with zero pitch value divided by the total number of frames in the cry is taken as the unvoicing ratio of the cry signal.
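One possible reading of the sub-band energies E1, E2 and E3 is sketched below for a single frame. Several choices here are assumptions rather than details from the text: the function name is hypothetical, the energy is taken from the one-sided squared-magnitude spectrum, it is normalized by the frame length, and a naive DFT is used only to keep the example self-contained (an FFT routine would normally be used).

```python
import cmath
import math


def band_energies(frame, fs, bands=((0, 2000), (2000, 4000), (4000, 6000))):
    """Return the length-normalized spectral energy of a frame in each band."""
    n = len(frame)
    energies = [0.0] * len(bands)
    for k in range(n // 2 + 1):                 # one-sided spectrum
        freq = k * fs / n                       # bin centre frequency in Hz
        x_k = sum(v * cmath.exp(-2j * math.pi * k * i / n)
                  for i, v in enumerate(frame))
        power = abs(x_k) ** 2 / n               # |X(w)|^2 / frame length
        for b, (lo, hi) in enumerate(bands):
            in_band = lo <= freq < hi
            at_nyquist = (freq == hi) and (hi == fs / 2)
            if in_band or at_nyquist:
                energies[b] += power
    return energies
```

Under these assumptions the total energy E of the frame is the sum of the three band values, and a tone at 1 kHz contributes only to E1, while a tone at 3 kHz contributes only to E2.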

The different cry types defined in Table 2 are analyzed using these features, and analysis of variance (ANOVA) is used to find the significance of these features for the various infant cry types. The analysis and the results are given in the next section.

4 Experimental results

The cry features are analyzed with respect to the reason for crying of an infant for the following cases:

  1. Full term birth cry vs. premature newborn’s cry

  2. Full term birth cry vs. newborn’s pain cry

  3. Full term birth cry vs. newborn’s hunger cry

  4. Newborn’s pain cry vs. newborn’s hunger cry

  5. Newborn’s pain cry vs. newborn’s cry due to wet diaper

  6. Newborn’s pain cry vs. newborn’s cry during passing the urine

  7. Newborn’s cry due to wet diaper vs. newborn’s cry during passing the urine

  8. Newborn’s hunger cry vs. newborn’s cry due to wet diaper

  9. Newborn’s hunger cry vs. newborn’s cry during passing the urine

  10. Newborn’s birth cry vs. newborn’s other reasons of crying (hunger/wet diaper/passing urine/pain).

The mean values of the above features along with the standard deviation are given in Table 3.

Table 3 Mean values of the features for different infant cry types

For simplicity, the analysis is carried out separately for the F0-based features and the remaining features.

4.1 Analysis using fundamental frequency (F0)-based features

From Table 3 and Fig. 3, it can be observed that the minimum F0, maximum F0 and median F0 are almost the same in all the cases. Thus, these features cannot be used to characterize or discriminate a particular cry type. However, the mean F0 shows differences for some cry types: the newborn's birth cry has a mean F0 of 436.22 Hz, whereas the normal newborn's cry has a mean F0 of 411.15 Hz. Differences between the hunger cries and pain cries of newborns are also observed. In hunger cries the mean F0 is 425.19 ± 55 Hz, whereas for urination cries it is 387.48 ± 72 Hz. Apart from the two cases mentioned above, no significant differences are found in the F0 features with respect to the reason for crying. In the birth cries as well, these features do not change with gestational age (GA): they are almost the same for normal full-term and premature babies. In newborn cries, mean F0 lies in the range of 400–600 Hz (Michelson and Michelson 1999). Thus, the results obtained here are in agreement with previous studies.

Fig. 3 Boxplot for the F0 features: (a) mean F0 and (b) median F0

The ANOVA of the parameters derived from the F0 contour suggests similar results. The ANOVA results for all the features are given in Table 4. Here, we have used a 95 % confidence level, which means that features giving a p value of less than 0.05 are considered significant for the analysis of those particular cry types.
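For reference, the test statistic behind a one-way ANOVA between two or more cry types can be sketched as below. This is a generic textbook formulation, not the authors' analysis script: the function name is hypothetical, and the sketch stops at the F statistic and its degrees of freedom, since the p value additionally requires the F-distribution tail probability from a statistics library.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of sample groups.

    Each group holds the values of one feature (for example mean F0)
    for one cry type.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of the samples around their own group mean
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

As a toy check with made-up numbers, the groups `[1, 2, 3]` and `[2, 3, 4]` give F = 1.5 with 1 and 4 degrees of freedom. A feature is reported as significant for a pair of cry types when the p value corresponding to its F statistic is below 0.05.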

Table 4 ANOVA analysis of the newborn infant’s cry

4.2 Analysis using normalized energy-based features

The various cry types are also analyzed using the normalized energy-based features. The mean values and standard deviations of these features are given in Table 3. From Table 3, bar plots are drawn for the energy features to illustrate their importance in the cry of an infant.

From Fig. 4, it can be observed that the normalized energy of the pain and wet diaper cries is higher than that of the other cry types. The energy is lowest in premature infants' cries, and the energy of full-term birth cries is higher than that of premature infants' cries. Comparing the distribution of the energy of the cry signals in the three frequency bands, as shown in Figs. 5, 6 and 7, we can observe that the pain cries and wet diaper cries have the highest energy in all the sub-bands. Moreover, most of the energy lies in the 2–4 kHz sub-band in all cries. In premature infants, the distribution of energy is higher in the lower frequency bands compared to normal full-term infants' birth cries (as shown in Fig. 7), where the distribution of energy is higher in the mid-frequency band (2–4 kHz) (as shown in Fig. 8). In hunger and urination cries, more of the energy lies in the lower frequency band (0–2 kHz), whereas in pain and wet diaper cries the energy in the 2–4 kHz band is higher. In the high frequency band (4–6 kHz), the energy is very low for infants' cries, except for pain and wet diaper cries, as shown in Fig. 7.

Fig. 4 Bar plot of mean values of normalized energy for different cry types. Y-axis represents the normalized energy of the signal

Fig. 5 Bar plot of mean values of E1 for different types of cries. Y-axis represents the normalized value of feature E1

Fig. 6 Bar plot of mean values of E2 for different types of cries. Y-axis represents the normalized value of feature E2

Fig. 7 Bar plot of mean values of E3 for different types of cries. Y-axis represents the normalized value of feature E3

Fig. 8 Boxplots of normalized (a) E, (b) E1, (c) E2 and (d) E3 for birth and pain cries

Results of the ANOVA are shown in Table 4. It can be observed that normal infants' birth cries are distinct from premature infants' cries. Because the cries of normal full-term infants have higher energy, they can be distinguished from the cries of premature infants, which have low energy. The reason for crying can also be identified from the energy feature. Hunger cries are found to be distinct from pain cries, and wet diaper cries are found to be different from cries during the passing of urine. For birth cries and premature infants' cries, the energy difference is very high, which allows the cries to be identified by auditory analysis as well. The differences between the two cry patterns lie in the mid-frequency band: in the 2–4 kHz band, the energy of the birth cry is higher than that of the premature infant's cry, whereas in the other bands the distribution of energy is the same for both cries. Birth cries of normal full-term infants and pain cries are both characterized by high signal energy, as shown in Fig. 8a. ANOVA in the three frequency bands shows that these two cry types can be distinguished by the distribution of energy in the low and high frequency bands. The energy in the low and high frequency bands is higher in pain cries than in birth cries, as shown in Fig. 8b, d.

Analysis of hunger, pain, wet diaper and urination cries shows that the distribution of energy is similar in hunger and urination cries, as well as in pain and wet diaper cries. These two groups of cries are distinct from each other on the basis of total normalized energy as well as the energy in the respective bands. However, it is difficult to characterize differences between hunger and urination cries using energy-based features. The same holds for pain and wet diaper cries, where the energy in all the bands is almost the same irrespective of the reason for crying. Normal full-term birth cries differ from cries due to other reasons such as hunger, pain, wet diaper and urination (referred to here as normal cries) on the basis of E1 and E2. In birth cries, E2 is higher than in cries due to other reasons. In normal cries, however, E1 is higher than in the birth cries of full-term healthy infants, as shown in Fig. 9.

Fig. 9 Boxplots of normalized (a) E, (b) E1, (c) E2 and (d) E3 for birth and normal cries

4.3 Analysis using unvoicing ratio of the cry

From Fig. 10 and Table 3, we can observe that birth cries are characterized by a very high unvoicing ratio. This higher unvoicing ratio makes birth cries distinct from cries due to hunger, pain, wet diaper and urination. The feature is also found to be useful for identifying the reason for crying in cases where the energy-based features do not work. Cries with similar energy levels can be classified according to the proportion of unvoicing present in the cry. Pain and wet diaper cries, which have similar energy in all the frequency bands, can be distinguished using the UV ratio: in pain cries, the UV ratio is higher than in wet diaper cries. Similarly, hunger cries are found to have more unvoicing than wet diaper cries and can therefore be distinguished from them.

Fig. 10 Boxplot for the UV ratio in the cries

5 Summary and discussions

In this study, newborn infants' cries are analyzed for various reasons of crying in a newborn, such as hunger, pain, wet diaper and passing urine. The features used for the analysis are the F0-based features, the energy-based features and the unvoicing ratio of the cry segments. Some important results from the above analysis are as follows:

  1. Birth cry can be characterized by high energy and a high unvoicing ratio. The reason is that the birth cry is the newborn's response to external stimulation as soon as he or she comes into the external world from the mother's warm womb. At birth, the central nervous system (CNS) has poor regulation over the working of the vocal folds. At the birth cry, the lungs open up for the first time and the infant breathes air instead of sac fluid (Lester Barry 1985).

  2. Most of the energy in the birth cry is located in the 2–4 kHz frequency band. In contrast, normal infant cries (i.e., normal, hunger and urination cries) have their maximum energy in the 0–2 kHz band. Pain cries share with birth cries the characteristic of having higher E2 than E1.

  3. Compared to other infant cry types, pain cries and wet diaper cries have higher energy in the 4–6 kHz frequency range. Higher energy in the higher frequency range calls for the attention of the caretaker and signals that quick action is required. In other words, higher frequency content in the cry reflects the urgency of attention and the discomfort of the infant.

  4. The characteristics of the hunger cry and the cry during passing urine are found to be similar on all the parameters. Similarly, pain cries and wet diaper cries have similar characteristics.

  5. The hunger cry and the cry during passing urine can be distinguished from each other using the mean F0 parameter. The remaining parameters are the same for them.

  6. The unvoicing ratio is an indicator of the maturity of the infant's vocal production system. In the birth cry, high unvoicing indicates that vocal fold movement is very irregular, which results in poor voicing quality of the cry. With the production of the birth cry, the infant's neural system integrates, and within a few days the cries become rhythmic.

  7. Wet diaper cries can be distinguished from pain cries on the basis of the unvoicing ratio, which is found to be higher in pain cries than in wet diaper cries.

  8. Mean F0 in newborn birth cries is higher than in normal infants' cries. There are no significant differences between the birth cries of full-term newborns and those of premature infants. This indicates that until the infant achieves a minimum gestational age (GA), the vocal folds do not vibrate to produce voiced cry sounds.

  9. F0-related features are not useful in identifying the reason for crying in newborns, although F0 is a useful parameter for understanding the reason for crying in infants more than 1 month of age.

In future, the authors would like to develop a classification of infant cries using these features. In addition, we would like to direct our efforts towards finding differences between male and female infant cries.