1 Introduction

Phonation initiation is a highly complex phenomenon with laryngeal maneuvers that position and stiffen the vocal folds (VFs), leading to self-sustained oscillations driven by the lung pressure. The transient oscillatory dynamics of the VFs from the rest prephonatory position to sustained vibrations are referred to herein as phonation onset (Mergell et al. 1998; Lebacq and DeJonckere 2019) [Footnote 1]. As a fundamental aspect of voiced speech, phonation onset has been studied for decades (Lisker and Abramson 1967; Mohr 1971; Ohde 1984; Titze 1988; Löfqvist et al. 1989; Mergell et al. 1998; Hanson 2009; Zhang 2011; Sváček and Horáček 2018; DeJonckere and Lebacq 2020; Azar and Chhetri 2022). In the past ten years or so, the fundamental frequency patterns during phonation onset have received renewed attention as they have been found to differ between healthy and pathological voices, enabling development of a practical and useful classification tool based upon relative fundamental frequency (Goberman and Blomgren 2008; Stepp et al. 2010a, 2011; Roy et al. 2016; Heller Murray et al. 2017). Fundamental frequency characteristics during transient periods of phonation, including phonation onset, correlate with kinematic vocal fold stiffness, a measure of laryngeal stiffness (Stepp et al. 2010b), and such a correlation can be a useful clinical indicator of laryngeal tension (McKenna et al. 2016; Park et al. 2021).

Phonation initiation exhibits a variety of fundamental frequency patterns depending on phonetic context and vocal health, as shown schematically in Fig. 1. In the case of isolated and initial vowels, fundamental frequency typically exhibits a gradual increase until a sustained phonation frequency is attained (Mohr 1971; Smith and Robb 2013). When the vowel is preceded by a voiceless consonant, as in /pa/, an initial spike in fundamental frequency followed by a gradual decay is observed (Ohde 1984; Löfqvist et al. 1989), where the onset period has been found to be dependent on the language and the acoustic nature of the voiceless consonant (unaspirated vs. aspirated) (Francis et al. 2006). With such gestures, speakers with healthy voices exhibit higher initial (relative) fundamental frequency values compared to speakers with vocal hyperfunction [Footnote 2] (Stepp et al. 2010a). When the vowel is preceded by a voiced consonant, as in /ba/, there is less agreement in the literature regarding the temporal evolution of fundamental frequency, with some studies observing a gradual increase (Mohr 1971; Hombert et al. 1979), but others finding inconsistent patterns between and within speakers (Ohde 1984). Moreover, it has been found that fundamental frequency patterns in the case of vowels preceded by voiced consonants are context-dependent (Hanson 2009; Kirby and Ladd 2016). Regardless, empirical evidence suggests that onset frequency in vowels preceded by voiceless consonants is higher than that in vowels preceded by voiced consonants (Ohde 1984).

Fig. 1

(Color online) Schematic representation of onset fundamental frequency \(f_\mathrm {onset}\), normalized by reference fundamental frequency \(f_\mathrm {ref}\) [e.g., fundamental frequency of the tenth onset cycle (Stepp et al. 2010a)] as a function of normalized time \(t^{*}\) [e.g., number of onset cycles (Stepp et al. 2010a)] for different phonetic contexts and vocal health. The depicted trends are consistent with the patterns observed experimentally in the literature (Stepp et al. 2010a; Ohde 1984; Löfqvist et al. 1989; Mohr 1971; Hombert et al. 1979; Stepp et al. 2011; Heller Murray et al. 2017)

Several underlying factors have been hypothesized to drive the observed fundamental frequency patterns during phonation initiation, including laryngeal muscle tension, aerodynamics, and vocal fold contact [Footnote 3]. Smith and Robb (2013) empirically investigated onset fundamental frequency patterns of vowels preceded by fricatives and stop consonants, in addition to isolated vowels. They speculated that the rise of onset fundamental frequency in the case of isolated vowels is due to a rise in VF tension. They further suggested that laryngeal muscle tension is a predominant factor in the case of vowels preceded by voiceless consonants. Löfqvist et al. (1989) investigated cricothyroid muscle activation during phonation onset using electromyography and found a correlation between increased cricothyroid muscle activation and the higher fundamental frequency observed during the onset of vowels preceded by voiceless consonants. Löfqvist et al. (1995) estimated the glottal flow characteristics from oral flow measurements and found that peak glottal flow is higher in vowels preceded by voiceless consonants than in vowels preceded by voiced consonants, indicating a correlation with the observed higher initial fundamental frequency. Moreover, Löfqvist et al. (1995) found that the glottal flow characteristics differ between aspirated voiceless consonants and their unaspirated counterparts.

In addition to clinical studies, there have been several theoretical and numerical investigations attempting to elucidate the underlying mechanisms of fundamental frequency during transient phonation, including phonation onset. Ishizaka and Flanagan (1972) employed a two-mass numerical vocal fold model to explore the mechanics of voiced speech and noted the potential role of vocal fold contact in altering fundamental frequency during phonation onset. Titze (1988) studied theoretically the onset conditions of VF oscillations using a single-mass model, showing that aerodynamics change the equivalent oscillator stiffness, and consequently, the fundamental frequency of the VF system during phonation onset. Zhang (2009) extended this analysis using a continuum two-layered model, noting that under certain conditions, slight changes in VF geometry or stiffness can cause sudden changes in onset fundamental frequency. Serry et al. (2021), in a study of phonation offset using a simple impact oscillator model, demonstrated that increased collision duration results in higher fundamental frequency, which is expected to play a similar role during phonation onset.

As illustrated by Fig. 1, different fundamental frequency characteristics can arise through manipulation of the phonetic context, suggesting potentially complex interrelations between the contributing factors. It can be quite challenging to isolate and control individual factors, such as aerodynamics and laryngeal muscle activation, during phonation onset in studies with human participants. As such, in this paper we aim to investigate, by means of theoretical and numerical analyses, some of the underlying mechanisms leading to the disparate fundamental frequency behaviors depicted in Fig. 1.

In particular, we investigate the dynamic nature of fundamental frequency during phonation onset by extending the impact oscillator model introduced in Serry et al. (2021). This dynamic nature is due, in part, to increasing collision levels of the vocal folds during phonation onset. The theoretical analysis is then verified using the physiologically relevant three-mass body-cover model (Story and Titze 1995). Results from the aforementioned theoretical analysis are capable of predicting the fundamental frequency rise pattern displayed in Fig. 1, which implies collision as a potential underlying mechanism. Subsequently, we explore numerically how laryngeal muscle activation and its temporal variation can underlie the fundamental frequency drop patterns observed during voicing of vowels preceded by voiceless consonants (see Fig. 1). Finally, we explore some of the laryngeal mechanisms that can potentially underlie the differences between healthy and hyperfunctional voices.

The paper is organized as follows: in Sect. 2, we introduce the employed phonation models; the role of VF collision is discussed in Sect. 3; the influence of the cricothyroid and thyroarytenoid muscles during phonation onset is explored in Sect. 4; and Sect. 5 concludes the manuscript.

2 Phonation models

In this section, we introduce the phonation models used in our analyses. The first is a hybrid phonation model that integrates the impact oscillator model introduced by Serry et al. (2021) and a linearized version of the Titze (1988) single-mass model, which is used to explore the role of collision during phonation onset. The second is a body-cover reduced-order model (Story and Titze 1995) used to explore the role of muscle activation and corroborate findings from the hybrid model with a more physiologically relevant VF description.

Similar to Mergell et al. (1998), it will be assumed, unless otherwise stated, that the neutral prephonatory gap between the VFs is fixed during onset. In the case of isolated vowels, this assumption is supported by empirical data showing that VF oscillations are initiated from a fixed prephonatory neutral position (Shiba and Chhetri 2016). In the case of vowels preceded by fricatives, onset has been found to start slightly before the final prephonatory position is reached (McKenna et al. 2016; Patel et al. 2017). Moreover, for simplicity we neglect temporal variations in aerodynamic and acoustic parameters, such as the acoustic impedance at the mouth. Variations in such parameters are believed to play roles in altering fundamental frequency during transient periods of phonation (Hombert et al. 1979); however, their significance in comparison with VF contact and laryngeal muscle tension is the subject of some debate. Smith and Robb (2013), for instance, found that onset fundamental frequency patterns of vowels preceded by fricatives and stop consonants are very similar despite their differing aerodynamic characteristics, implying a minor role of aerodynamics. In contrast, the empirical relative fundamental frequency data presented in Lien et al. (2014) and Park et al. (2021) suggest that these factors may be prevalent. Herein, we focus on collision and muscle activation, leaving a comprehensive exploration of aerodynamics and acoustics for future work.

2.1 Hybrid phonation model

The hybrid model is shown schematically in Fig. 2. It enables analysis of fluid–structure interactions during phonation onset, where the glottal flow is modeled using a linearized Bernoulli flow model, while incorporating the role of VF collision. The governing equations are

$$\begin{aligned} M\ddot{\xi }-\mathcal {B}_{1}\dot{\xi }+K\xi&=0, \quad \xi (t)\ge -\delta , \end{aligned}$$
(1a)
$$\begin{aligned} M\ddot{\xi }+\mathcal {B}_{2}\dot{\xi }+\mathcal {K}\xi&=-k_{\mathrm {col}}\delta , \quad \xi (t)< -\delta , \end{aligned}$$
(1b)

where \(\xi (t)\) is the VF mass displacement from its neutral position, M is its mass, K is the tissue stiffness, \(k_{\mathrm {col}}\) is the collision stiffness, \(\delta \ge 0\) is the neutral gap, and \(\mathcal {K}=K+k_{\mathrm {col}}\). The damping terms are given by \(\mathcal {B}_{1}=2\tau {P_{\rm L}}/(k_{t}\delta )-B\) and \(\mathcal {B}_{2}=B+b_{\mathrm {col}}\), where B is the structural viscous damping coefficient, \(b_{\mathrm {col}}\) is an additional damping coefficient incorporated during collision, \(P_{\rm L}\) is the subglottal lung pressure, \(\tau\) is a time delay term associated with the propagation of the mucosal wave on the medial surface of the VFs, and \(k_{t}\) is a pressure recovery term (Titze 1988). The mass, stiffness, and damping coefficients are given per unit area. The neutral gap, \(\delta\), serves as a proxy for the degree of VF adduction, such that \(\delta =0\) corresponds to complete VF closure. This model neglects acoustic effects and assumes negligible supraglottal pressure; hence, \(P_{\rm L}\) corresponds to the transglottal pressure. It is assumed that the dynamics of the hybrid phonation model are oscillatory in both the collision and non-collision regimes, that is,

$$\begin{aligned} \omega _{1}^2&:=\frac{K}{M}-\frac{\mathcal {B}_{1}^2}{4M^2}>0, \end{aligned}$$
(2a)
$$\begin{aligned} \omega _{2}^2&:=\frac{\mathcal {K}}{M}-\frac{\mathcal {B}_{2}^2}{4M^2}>0, \end{aligned}$$
(2b)

where \(\omega _{1}\), \(\omega _{2}\ge 0\) denote the angular frequencies in the non-collision and collision regimes, respectively.

Fig. 2

Schematic diagram of the hybrid phonation model

The impact oscillator model of Serry et al. (2021), referred to herein as the S21 model, can be recovered from Eq. (1) by omitting the viscous forces (i.e., by setting \(\mathcal {B}_1=\mathcal {B}_2=0\)), resulting in

$$\begin{aligned} M\ddot{\xi }+K\xi&=0, \quad \xi (t)\ge -\delta , \end{aligned}$$
(3a)
$$\begin{aligned} M\ddot{\xi }+\mathcal {K}\xi&=-k_{\mathrm {col}}\delta , \quad \xi (t)< -\delta . \end{aligned}$$
(3b)

This model isolates the effects of collision and primitive parameters (e.g., mass, stiffness, and neutral gap) on fundamental frequency, providing an abstract, yet useful, insight into the role of VF contact during real phonation scenarios.
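As an illustrative check of Eq. (3), the sketch below integrates the S21 model numerically and estimates the oscillation frequency from successive downward zero crossings of \(\xi\). It is a minimal example under stated assumptions and not the implementation used in Serry et al. (2021): the function name `simulate_s21`, the velocity Verlet integrator, the time step, and the nondimensional parameter values (M = K = 1, \(k_{\mathrm {col}}=3\), \(\delta =0.1\)) are all chosen here purely for illustration.

```python
import math


def simulate_s21(x0, delta, K=1.0, M=1.0, k_col=3.0, dt=1e-3, t_end=40.0):
    """Integrate Eq. (3) from rest at displacement x0 > 0 and return the
    mean oscillation frequency (hypothetical helper, nondimensional units)."""

    def force(x):
        # Eq. (3a): tissue spring only, while the mass is above the contact plane.
        if x >= -delta:
            return -K * x
        # Eq. (3b): tissue plus collision spring, written as a restoring force.
        return -(K + k_col) * x - k_col * delta

    x, v, t = x0, 0.0, 0.0
    acc = force(x) / M
    crossings = []
    while t < t_end:
        # Velocity Verlet step (an assumed integrator, chosen for simplicity).
        x_new = x + v * dt + 0.5 * acc * dt * dt
        acc_new = force(x_new) / M
        v_new = v + 0.5 * (acc + acc_new) * dt
        # Record downward zero crossings using linear interpolation.
        if x > 0.0 >= x_new:
            crossings.append(t + dt * x / (x - x_new))
        x, v, acc, t = x_new, v_new, acc_new, t + dt

    periods = [t1 - t0 for t0, t1 in zip(crossings, crossings[1:])]
    return len(periods) / sum(periods)


# Small amplitude (no collision) versus large amplitude (collision every cycle).
f_free = simulate_s21(x0=0.05, delta=0.1)
f_col = simulate_s21(x0=1.0, delta=0.1)
print(f_free, math.sqrt(1.0) / (2.0 * math.pi), f_col)
```

With these assumed values, the small-amplitude oscillation never reaches \(\xi =-\delta\) and recovers \(\sqrt{K/M}/(2\pi )\), whereas the large-amplitude oscillation collides in every cycle and exhibits a higher frequency.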

The linearized version of the Titze (1988) model can be recovered from the hybrid phonation model by assuming collision-free oscillations, that is, \(\xi (t)> -\delta\), yielding [Footnote 4]

$$\begin{aligned} M\ddot{\xi }-\mathcal {B}_{1}\dot{\xi }+K\xi =0. \end{aligned}$$
(4)

Equation (4) provides useful insights into the fluid–structure interaction between the VFs and the glottal flow during phonation onset and, in particular, the role of aerodynamics (in the form of negative damping that results from linearizing the Bernoulli flow model) in initiating VF oscillations. The onset conditions predicted from Eq. (4) [see Eq. (6)] agree reasonably with experimental measurements from a physical model of the VF mucosa; in particular, phonation threshold pressure [Footnote 5] is positively correlated with the neutral gap \(\delta\) (for sufficiently large \(\delta\)) and the VF viscous damping coefficient B (Titze et al. 1995).
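The dependence of the threshold pressure on \(\delta\) and B can be made concrete by rearranging the negative-damping condition \(\mathcal {B}_{1}>0\) for \(P_{\rm L}\), which gives \(P_{\rm L}>Bk_{t}\delta /(2\tau )\). The short sketch below evaluates this bound; the function name is hypothetical, and the numerical values are the representative ones quoted in Sect. 3.2 rather than fitted data.

```python
def onset_threshold_pressure(B, k_t, delta, tau):
    """Lowest lung pressure (Pa) for which B_1 = 2*tau*P_L/(k_t*delta) - B
    becomes positive in the linearized single-mass model (hypothetical helper)."""
    return B * k_t * delta / (2.0 * tau)


# Representative values quoted in Sect. 3.2 (SI units).
p_th = onset_threshold_pressure(B=2380.0, k_t=1.1, delta=1e-3, tau=1.5e-3)
print(f"threshold pressure: {p_th:.1f} Pa")  # roughly 873 Pa

# Within this linearized model, the bound scales linearly with gap and damping.
print(onset_threshold_pressure(2380.0, 1.1, 2e-3, 1.5e-3) / p_th)
print(onset_threshold_pressure(4760.0, 1.1, 1e-3, 1.5e-3) / p_th)
```

In this simplified expression the threshold grows in proportion to both \(\delta\) and B, which is the positive correlation noted above.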

2.2 Body-cover model

The reduced-order three-mass body-cover model (BCM) (Story and Titze 1995) is employed to verify and extend the findings from the simpler, more analytically tractable, hybrid phonation model [Footnote 6]. This model, which embeds the essential physiological components of the VFs, consists of two cover masses and a body mass, all connected via springs and dampers to model the VFs' viscoelastic tissues (see Fig. 13 in Appendix 1). The model assumes the motion of the VFs to be symmetric about the medial plane; hence, only one of the folds is needed in the model construction. Collision of the opposing folds is modeled by activating additional nonlinear spring forces applied to the cover masses, where the spring forces are proportional to the degree of overlap of the cover masses with the medial (collision) plane, see Equations (6a) and (6b) in Story and Titze (1995) [Footnote 7]. The model implements the muscle activation rules of Titze and Story (2002) to control the primitive model variables via three dimensionless muscle activation parameters, \(a_{\mathrm {CT}}\), \(a_{\mathrm {TA}}\), and \(a_{\mathrm {LCA}}\), which account for the relative activation of the cricothyroid (CT), thyroarytenoid (TA), and lateral/posterior cricoarytenoid (LCA/PCA) muscles, respectively. The neutral glottal gap in the BCM is modulated through activation of the LCA muscle, where the neutral glottal half width \(x_{0}\) is given by \(x_{0}=0.25L_{0}(1-2a_\mathrm {LCA})\), where \(L_{0}\) is the resting VF length (Titze and Story 2002). As seen from this relation, LCA activation is negatively correlated with the VF neutral gap: increasing the activation of the LCA muscle adducts the VFs.
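The adduction rule can be evaluated directly. The snippet below is only a sketch of the stated relation \(x_{0}=0.25L_{0}(1-2a_\mathrm {LCA})\); the function name is hypothetical, and the resting length \(L_{0}=1.6\) cm is an assumed illustrative value that is not specified in this section.

```python
def neutral_half_width(a_lca, resting_length):
    """Neutral glottal half width x_0 = 0.25 * L_0 * (1 - 2 * a_LCA),
    returned in the same length unit as resting_length."""
    return 0.25 * resting_length * (1.0 - 2.0 * a_lca)


L0_CM = 1.6  # assumed resting VF length in cm (illustrative only)

for a_lca in (0.0, 0.25, 0.4, 0.5):
    x0 = neutral_half_width(a_lca, L0_CM)
    print(f"a_LCA = {a_lca:.2f} -> x_0 = {x0:.3f} cm")
```

Under this rule, \(a_\mathrm {LCA}=0.5\) gives \(x_{0}=0\) (fully adducted VFs), and smaller activation levels give a proportionally wider neutral gap.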

Air flow through the glottis is modeled using a quasi-steady Bernoulli flow formulation with a quasi-steady viscous correction for losses in the glottis (Pelorson et al. 1994; Lucero 1996; Lucero and Schoentgen 2015). The quasi-steady viscous model has shown good agreement with experimental observations of flow through a larynx model (Van den Berg et al. 1957). We note that our glottal flow model is similar, but not identical, to those presented in Pelorson et al. (1994) and Lucero and Schoentgen (2015), as it incorporates flow separation and its formulation is suitable for modeling acoustic effects due to the subglottal and supraglottal tracts. Viscous corrections are employed to account for non-negligible losses that occur during the initial stages of phonation, when the flow speeds are relatively low, and during periods when the glottis is nearly closed (Fulcher et al. 2013). See Appendix 1 for further details on the employed flow model.

Acoustics are modeled using the wave reflection analog (WRA) method (Kelly and Lochbaum 1962; Liljencrants 1985; Story 2005). Similar to Galindo et al. (2014) and Zañartu et al. (2014), a subglottal tract area function is adapted from respiratory system measurements of human cadavers (Weibel et al. 1963), covering only the trachea and bronchi. A supraglottal tract is also included, which is configured to simulate the /i/ vowel (Takemoto et al. 2006). The BCM dynamics are driven by the lung pressure, \(P_{\rm l}\), input to the inferior end of the subglottal tract. To mitigate numerical instabilities in the WRA implementation, \(P_{\rm l}\) is ramped up from zero to the desired value during phonation onset according to the relation \(P_{\rm l}(t)=P_{{\rm l},0}(1-\mathrm {e}^{-t/\sigma })\), where \(P_{{\rm l},0}\) is the steady-state lung pressure and \(\sigma =0.2\) ms. The settling time for the ramp is less than 1 ms. The system dynamics are solved using an explicit version of Newmark’s method (Newmark 1959; Galindo et al. 2014) with a sampling frequency of \(140\,\mathrm {kHz}\). Initial conditions in all BCM simulations are identical, with zero velocity for all masses and unstretched model springs. As in Serry et al. (2021), we consider the time-series of the glottal area, \(A_{\rm g}\), in our frequency analysis, where frequency is determined from the time duration between sequential signal peaks.
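Two of the procedural details above lend themselves to a short sketch: the exponential lung pressure ramp and the estimation of frequency from the time between sequential peaks of the glottal area signal. The code below is an illustrative approximation and not the authors' implementation; the function names, the simple three-point peak detector, and the synthetic 125 Hz test signal are assumptions introduced here.

```python
import math


def lung_pressure(t, p_steady, sigma=0.2e-3):
    """Ramp P_l(t) = P_l0 * (1 - exp(-t / sigma)), with t and sigma in seconds."""
    return p_steady * (1.0 - math.exp(-t / sigma))


def peak_to_peak_frequencies(signal, fs):
    """Cycle-by-cycle frequency (Hz) from the spacing of local maxima.
    A plain three-point test is used; real signals may need smoothing first."""
    peaks = [i for i in range(1, len(signal) - 1)
             if signal[i - 1] < signal[i] >= signal[i + 1]]
    return [fs / (j - i) for i, j in zip(peaks, peaks[1:])]


FS = 140_000  # sampling frequency in Hz, as stated in the text

# The ramp is within about 1% of its steady-state value after 1 ms.
print(lung_pressure(1e-3, 800.0) / 800.0)

# Synthetic stand-in for a glottal area waveform (assumed 125 Hz sinusoid).
area = [0.5 * (1.0 + math.sin(2.0 * math.pi * 125.0 * n / FS))
        for n in range(FS // 10)]
print(peak_to_peak_frequencies(area, FS)[:3])
```

For simulated onset signals, the returned list gives one frequency value per cycle, which is the form in which the onset fundamental frequency trends are reported.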

3 Relationship between collision and fundamental frequency

Serry et al. (2021) demonstrated a direct correlation between VF collision and fundamental frequency, wherein transitioning from a VF oscillation regime with collision to one without collision during phonation offset results in a decrease in fundamental frequency due to the net reduction in “system stiffness”. We posit that this mechanism is also a contributing factor underlying the temporal variation in fundamental frequency during phonation onset.

3.1 Insights from the S21 model

In this section, we summarize key findings from Serry et al. (2021), which investigated phonation offset, and expand the analysis therein to explore phonation onset. We note that despite the symmetries between phonation offset (decaying VF oscillations) and phonation onset (rising VF oscillations), there exist some notable differences between the two phenomena, including phonation threshold pressure values (Titze et al. 1995), and aerodynamic characteristics (depending on the phonetic context) (Löfqvist et al. 1995). Herein, we aim to utilize the symmetries between the two phenomena to elucidate the role of VF collision in altering fundamental frequency during phonation onset.

From Eq. (3a) (the S21 model without collision), we note that the natural frequency of the oscillator is \(f_{0}= \sqrt{{K}/{M}}/(2\pi )\). For convenience, we define the normalized neutral gap \(\tilde{\delta }= \sqrt{{K}/(2E)}{\delta }\), where E is the energy of the VF system per unit area, which can loosely be considered as the energy originally imparted to the system via aerodynamics [Footnote 8]. The system energy is constant in time owing to the lack of viscous losses, as can be seen from Eq. (3). We further define the stiffness ratio \(\tilde{k}=K/\mathcal {K}\). The fundamental frequency of the model is (Serry et al. 2021)

$$\begin{aligned} f={\left\{ \begin{array}{ll} \frac{2f_{0}}{\frac{2}{\pi }\sqrt{\tilde{k}}\arctan \left( \sqrt{(\frac{1}{\tilde{\delta }^2}-1)/{\tilde{k}}}\right) +\frac{2}{\pi }\arcsin \left( \tilde{\delta }\right) +1},&{} \tilde{\delta }\le 1,\\ f_{0},&{} \tilde{\delta }>1, \end{array}\right. } \end{aligned}$$
(5)

for which the behavior depends on whether or not the system has sufficient energy (vibration amplitude) to cause collision. Note that when \(\tilde{\delta }>1\) (no collision), frequency is independent of the oscillator energy. Utilizing a quasi-steady assumption with E as a parameter, we explore the effect of varying the system energy on fundamental frequency. The role of energy, E, becomes evident when collision occurs (\(\tilde{\delta }\le 1\)), wherein fundamental frequency increases as E increases, with an asymptotic value \(2f_{0}/(\sqrt{\tilde{k}}+1)\), corresponding to oscillations at zero neutral gap (\(\delta =0\)) (see Fig. 3). The asymptotic behavior suggests that the collision-based mechanism is inefficient at changing frequency at high energy levels, as large energy increases result in modest gains in fundamental frequency.
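The behavior described above can be reproduced by evaluating Eq. (5) directly. The following sketch does so for a sweep of energy levels; the function name and the nondimensional parameter values (M = K = 1, \(k_{\mathrm {col}}=3\), \(\delta =0.1\)) are illustrative assumptions and are not taken from the source.

```python
import math


def s21_frequency(K, M, E, delta, k_col):
    """Fundamental frequency of the S21 model according to Eq. (5)."""
    f0 = math.sqrt(K / M) / (2.0 * math.pi)
    k_ratio = K / (K + k_col)                   # stiffness ratio k~ = K / (K + k_col)
    delta_n = math.sqrt(K / (2.0 * E)) * delta  # normalized neutral gap
    if delta_n > 1.0:
        return f0                               # no collision
    if delta_n == 0.0:
        return 2.0 * f0 / (math.sqrt(k_ratio) + 1.0)
    inner = max(0.0, 1.0 / delta_n ** 2 - 1.0) / k_ratio
    denom = ((2.0 / math.pi) * math.sqrt(k_ratio) * math.atan(math.sqrt(inner))
             + (2.0 / math.pi) * math.asin(delta_n) + 1.0)
    return 2.0 * f0 / denom


K, M, K_COL, DELTA = 1.0, 1.0, 3.0, 0.1         # assumed illustrative values
f0 = math.sqrt(K / M) / (2.0 * math.pi)
f_limit = 2.0 * f0 / (math.sqrt(K / (K + K_COL)) + 1.0)

# Collision starts at E = K * delta**2 / 2 = 0.005 for these values.
for E in (0.004, 0.006, 0.05, 0.5, 5.0, 500.0):
    print(f"E = {E:8.3f}  f/f0 = {s21_frequency(K, M, E, DELTA, K_COL) / f0:.4f}")
print(f"asymptote f/f0 = {f_limit / f0:.4f}")
```

For these assumed values the frequency stays at \(f_{0}\) below the collision threshold and then rises toward \(2f_{0}/(\sqrt{\tilde{k}}+1)\), with most of the gain occurring at energies only moderately above the threshold.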

In addition to system energy, collision is also modulated via VF adduction, which is embedded in the S21 model through the neutral gap, \(\delta\). Equation (5) shows that decreasing \(\delta\) has a similar effect to increasing E (both lead to decreasing \(\tilde{\delta }\)). That is, for fixed system energy and stiffness, fundamental frequency can be increased purely through adduction. Similar to the energy rise, adduction only impacts fundamental frequency of the model when collision is present (\(\tilde{\delta }\le 1\)). We note that the effect of increasing energy is mediated by the neutral gap, with a more muted response as the gap decreases. That is, the effectiveness of a rise in system energy at increasing frequency during phonation onset is dependent on the adduction level of the VFs. In reality, the exact relation is naturally expected to be complex due, in part, to the geometry of the glottis and the high degrees of freedom of the VFs.

Finally, the S21 model provides useful insights into the role of VF stiffness during phonation onset. From Eq. (3), we observe that K affects the dynamics of the VF system in both the collision and non-collision regimes, wherein increasing stiffness increases the (instantaneous) fundamental frequency. This indicates that changing VF stiffness during phonation onset through intrinsic muscle activation alters fundamental frequency even in the initial stage of onset in some phonetic contexts when VF oscillations are collision-free. Thus, we expect potentially competing factors of adduction, aerodynamic energy transfer, and laryngeal tension to influence fundamental frequency during phonation onset.

Fig. 3

Fundamental frequency of the S21 model as a function of the system energy. The dashed vertical line indicates the energy level at which collision initiates

3.2 Analysis using the hybrid phonation model

Analysis of the S21 model in the previous section relies on the quasi-steady assumption, wherein the dynamics of fluid–structure interaction during phonation onset and viscous friction losses are neglected. In this section, we consider an analytical treatment of the onset problem using the hybrid phonation model (see Sect. 2.1) and elucidate the dynamic nature of the collision-based mechanism. We consider the evolution of VF oscillations during onset while incorporating VF contact, which has typically been omitted in previous theoretical analyses of phonation onset (Titze 1988; Zhang 2009; Lucero and Koenig 2007).

Pre-collision, the hybrid model is equivalent to the linearized Titze (1988) model in Eq. (4), which predicts VF oscillations with exponential growth when [see, for example, Titze (1988)]

$$\begin{aligned} 2\tau \frac{P_{\rm L}}{k_{t} \delta }-B>0. \end{aligned}$$
(6)

However, realistic energy dynamics during phonation onset are complex and oscillatory due to several factors, including nonlinear fluid–structure interaction effects, VF collision, acoustics, and viscous losses. The primary energy transfer mechanism to the VF system is the temporal asymmetry of the average intraglottal pressure, where, loosely speaking, positive energy transfer from the glottal flow and energy dissipation to the flow take place when the VF configuration is convergent and divergent, respectively, with the total energy transferred from the flow being larger than that dissipated to the flow in order to sustain oscillations (Thomson et al. 2005). The hybrid model (Eq. 1) allows exploration of the general trends of the complex oscillatory energy dynamics beyond the initial onset of oscillations by incorporating simplified VF contact and aerodynamic effects.

We examine the energy evolution by considering the discrete system energy at the same phase in a sequence of oscillation cycles. Let \(\tau _{i},~i=0,1,2,\cdots\) be the time instances such that \(\xi (\tau _{i})=-\delta\) and \(\dot{\xi }(\tau _{i})< 0\), which correspond to the beginning of each collision. Let \(\mathcal {V}_{i}=|\dot{\xi }(\tau _{i})|\) be the oscillator velocity magnitude at time instance \(\tau _{i}\). The energy immediately prior to each collision (kinetic energy plus potential energy) is then \(E(\tau _{i})=M\mathcal {V}_{i}^2/2+K\delta ^2/2\). The velocity sequence \(\{\mathcal {V}_{i}\}\) can be obtained approximately using the recurrence relation

$$\begin{aligned} \mathcal {V}_{i+1}=\mathcal {A}\mathcal {V}_{i}+\mathcal {W},~ i=0,1,2,\cdots , \end{aligned}$$
(7)

where the initial velocity \(\mathcal {V}_{0}>0\) is given. The parameter \(\mathcal {A}\) (a scaling term) is modulated by the energy losses and gains in the collision and collision-free regimes, respectively, and the parameter \(\mathcal {W}\) (a drift term) is regulated by the neutral gap \(\delta\). Derivation of Eq. (7) and the exact definitions of \(\mathcal {A}\) and \(\mathcal {W}\) are provided in Appendix 2.

The dynamics of the recurrence relation given in Eq. (7) exhibit various behaviors depending on the numerical values of \(\mathcal {A}\) and \(\mathcal {W}\) (e.g., linear growth, exponential growth, and exponential decay). Herein, we are interested in cases where VF oscillations are bounded, thus corresponding to realistic phonation onset scenarios. On average, the aerodynamic energy transfer is larger than viscous losses after phonation initiation, which induces VF oscillations of growing amplitude. The (average) difference between aerodynamic energy transfer to the VF system and viscous dissipation gradually decays over time until the difference becomes zero, which corresponds to steady-state VF oscillations of constant amplitude (that is, sustained phonation).

The case of bounded energy growth can be determined from Eq. (7) when \(\mathcal {W}\ge 0\) and \(0<\mathcal {A}<1\), which is fulfilled when

$$\begin{aligned} 0<2\tau \frac{P_{\rm L}}{k_{t}\delta }-B< (B+b_{\mathrm {col}})\sqrt{\tilde{k}}. \end{aligned}$$
(8)

This corresponds to the onset condition given in Eq. (6) under the additional constraint that the subglottal pressure is such that the damping ratio in the non-collision regime, \(\mathcal {B}_{1}/(M\omega _{1})\), is smaller than the damping ratio in the collision regime, \(\mathcal {B}_{2}/(M\omega _{2})\), to ensure VF oscillations of finite amplitude. As an example, if we set \(k_{t}=1.1\), \(\delta =10^{-3}\,\mathrm {m}\), \(\tau =1.5\times 10^{-3}\,\mathrm {s}\), and \(B=2380\,\mathrm {Pa \cdot s/m}\) [similar to values used in Titze (1988) and Lucero (1996)] and additionally assume \(b_\mathrm {col}=4B\) and \(k_\mathrm {col}=3K\) [similar to assumptions in Steinecke and Herzel (1995)], Eq. (8) predicts that \(P_{\rm L}\) should be within the approximate range [875, 2180] Pa, in order to have VF oscillations of bounded amplitude [Footnote 9]. In this case (\(\mathcal {W}\ge 0\) and \(0<\mathcal {A}<1\)), Eq. (7) can be rewritten as

$$\begin{aligned} \mathcal {V}_{i}=\frac{\mathcal {W}}{1-\mathcal {A}}-\mathcal {A}^{i}\left( \frac{\mathcal {W}}{1-\mathcal {A}}-\mathcal {V}_{0}\right) ,~i=1,2,\cdots \end{aligned}$$
(9)

If

$$\begin{aligned} \mathcal {V}_{0}< \frac{\mathcal {W}}{1-\mathcal {A}}, \end{aligned}$$
(10)

that is, the kinetic energy of the VFs is initially low, then the sequence \(\{\mathcal {V}_{i}\}\) is monotonically increasing with an asymptotic upper bound \(\mathcal {V}_{\infty }=\mathcal {W}/(1-\mathcal {A})\) (see Fig. 4). This shows that during phonation onset, the energy of the VF system increases gradually, on average, and approaches an asymptotic value at which aerodynamic energy transfer equals viscous losses, which is associated with sustained phonation.
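The agreement between the recurrence in Eq. (7) and the closed form in Eq. (9) can be checked numerically. The sketch below does this for one hypothetical parameter set; the values of \(\mathcal {A}\), \(\mathcal {W}\), and \(\mathcal {V}_{0}\) are chosen only to satisfy \(\mathcal {W}\ge 0\), \(0<\mathcal {A}<1\), and Eq. (10), and are not computed from the definitions in Appendix 2.

```python
def velocity_sequence(A, W, v0, n):
    """Iterate V_{i+1} = A * V_i + W (Eq. 7) and return [V_0, ..., V_n]."""
    seq = [v0]
    for _ in range(n):
        seq.append(A * seq[-1] + W)
    return seq


def velocity_closed_form(A, W, v0, i):
    """Closed-form solution of the recurrence (Eq. 9), valid for A != 1."""
    v_inf = W / (1.0 - A)
    return v_inf - A ** i * (v_inf - v0)


def precollision_energy(M, K, delta, v):
    """Energy just before collision: E = M * V**2 / 2 + K * delta**2 / 2."""
    return 0.5 * M * v * v + 0.5 * K * delta * delta


A, W = 0.8, 0.2            # hypothetical scaling and drift terms
V_INF = W / (1.0 - A)      # asymptotic upper bound
V0 = 0.5 * V_INF           # low initial velocity, so Eq. (10) holds

for i, v in enumerate(velocity_sequence(A, W, V0, 10)):
    e = precollision_energy(1.0, 1.0, 0.1, v)
    print(i, round(v, 6), round(velocity_closed_form(A, W, V0, i), 6), round(e, 6))
```

Each iterate lies closer to \(\mathcal {V}_{\infty }\) than the previous one by the factor \(\mathcal {A}\), so the pre-collision energy rises monotonically and saturates, as described above.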

Fig. 4

(Color online) Illustrative example of the velocity (circles) and frequency (squares) sequences given in Eqs. (7) and (11), respectively, normalized with respect to the asymptotic upper bounds \(\mathcal {V}_{\infty }\) and \(\mathcal {F}_{\infty }\), respectively, with fixed-in-time parameter values \(M=\mathcal {B}_{1}=K=1\), \(\mathcal {B}_{2}=2,~\mathcal {K}=3,~ \delta =0.1\) and \(\mathcal {V}_{0}=0.5\mathcal {W}/(1-\mathcal {A})\). Note that, for the given parameter values, the conditions in Eqs. (8), (10), and (12) are fulfilled, which explains the monotonically increasing and bounded behaviors

The frequency of the \(i\)th cycle, \(\mathcal {F}_{i}\), is approximately given by

$$\begin{aligned} \mathcal {F}_{i}=\left[ \frac{1}{\mathcal {F}_{\infty }}+\frac{\alpha _{1}}{ \mathcal {V}_{i-1}+\beta }-\frac{\alpha _{2}}{\mathcal {V}_{i-1}}\right] ^{-1},~i=1,2,\cdots \end{aligned}$$
(11)

where \(\mathcal {F}_{\infty }= \left( ({1}/{\omega }_{1})+({1}/{\omega _{2}})\right) ^{-1}/{\pi }\) (see Appendix 3 for the derivation and the definitions of parameters \(\alpha _{1},~\alpha _{2},\) and \(\beta\)). Note that if the sequence \(\{\mathcal {V}_{i}\}\) is monotonically increasing, which is attained if the conditions in Eqs. (8) and (10) are satisfied, and if the initial velocity additionally satisfies the constraint [Footnote 10]

$$\begin{aligned} \mathcal {V}_{0}\ge \beta \frac{\alpha _{2}+\sqrt{\alpha _{1}\alpha _{2}}}{\alpha _{1}-\alpha _{2}}, \end{aligned}$$
(12)

then the sequence \(\{\mathcal {F}_{i}\}\) is guaranteed to be monotonically increasing (see Fig. 4) with an asymptotic upper bound \(\mathcal {F}_{\infty }\), which corresponds to the fundamental frequency in the case of zero neutral gap. In other words, the fundamental frequency exhibits a bounded increase during phonation onset due to the average increase in the kinetic energy of the VF system, which agrees in essence with the quasi-steady analysis in Sect. 3.1. This rising trend is also consistent with empirical studies of the onset of initial and isolated vowels (see Fig. 1). The implications of matching with empirical data are discussed in Sect. 3.4.
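The monotonic rise of the frequency sequence can be illustrated with a small numerical sketch of Eqs. (11) and (12). Here \(\omega _{1}\) and \(\omega _{2}\) follow from Eq. (2) with the parameter values listed in the caption of Fig. 4, whereas \(\alpha _{1}\), \(\alpha _{2}\), \(\beta\), \(\mathcal {A}\), and \(\mathcal {W}\) are hypothetical placeholders: their actual definitions are given in the appendices and are not reproduced here.

```python
import math


def asymptotic_frequency(omega1, omega2):
    """F_inf = 1 / (pi * (1/omega1 + 1/omega2))."""
    return 1.0 / (math.pi * (1.0 / omega1 + 1.0 / omega2))


def frequency_sequence(velocities, f_inf, alpha1, alpha2, beta):
    """Eq. (11): entry k of the result is F_{k+1}, computed from V_k."""
    return [1.0 / (1.0 / f_inf + alpha1 / (v + beta) - alpha2 / v)
            for v in velocities]


def min_initial_velocity(alpha1, alpha2, beta):
    """Right-hand side of Eq. (12), assuming alpha1 > alpha2 > 0."""
    return beta * (alpha2 + math.sqrt(alpha1 * alpha2)) / (alpha1 - alpha2)


# Parameter values from the caption of Fig. 4, inserted into Eq. (2).
M, B1, K, B2, K_CAL = 1.0, 1.0, 1.0, 2.0, 3.0
omega1 = math.sqrt(K / M - B1 ** 2 / (4.0 * M ** 2))
omega2 = math.sqrt(K_CAL / M - B2 ** 2 / (4.0 * M ** 2))
f_inf = asymptotic_frequency(omega1, omega2)

# Hypothetical placeholders (not derived from the appendices).
ALPHA1, ALPHA2, BETA = 0.5, 0.1, 0.2
A, W, V0 = 0.8, 0.2, 0.5

velocities = [V0]
for _ in range(15):
    velocities.append(A * velocities[-1] + W)    # Eq. (7)

print("Eq. (12) satisfied:", V0 >= min_initial_velocity(ALPHA1, ALPHA2, BETA))
for f in frequency_sequence(velocities, f_inf, ALPHA1, ALPHA2, BETA)[:5]:
    print(round(f / f_inf, 4))
```

With these placeholder values the normalized frequency increases from cycle to cycle and remains below one, mirroring the bounded rise of the frequency sequence.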

3.3 Numerical simulations with the body-cover model

To ground the analysis from the simplified models in a more physiologically relevant framework, we turn to simulations using the BCM.

First, we consider onset simulations with fixed steady-state lung pressure, \(P_{{\rm l},0}=800\) Pa, and muscle activation values corresponding to low/normal CT and TA activation levels and fully adducted VFs, where \(a_\mathrm {CT}=0.2,~a_\mathrm {TA}=0.2\), and \(a_\mathrm {LCA}=0.5\). Figure 5 displays the fundamental frequency, maximum collision force (among the cover masses), and glottal area profiles of two exemplar cases during onset. In one case, the collision springs are activated, resulting in nonzero collision forces when VF contact occurs (\(\mathrm {col}=1\)), and in the other case, the collision springs are deactivated through the entire simulation period, resulting in zero collision forces (\(\mathrm {col}=0\)).

Fig. 5

Two exemplar simulations of phonation onset using the BCM with fixed subglottal pressure, \(P_{{\rm l},0}=800\,\mathrm {Pa}\), and muscle activation values \(a_\mathrm {CT}=0.2\), \(a_\mathrm {TA}=0.2\), and \(a_\mathrm {LCA}=0.5\). Solid lines indicate the simulation with collision springs activated when contact occurs (\(\mathrm {col}=1\)), whereas dashed lines indicate the simulation with the collision springs deactivated, even when collision occurs (\(\mathrm {col}=0\)). a (Color online) Fundamental frequency (left), and maximum collision force (right) time-series. b Glottal area time-series

Figure 5(a: left axis) shows that there is a gradual increase in fundamental frequency for both cases, though the increase is greater for the \(\mathrm {col}=1\) case. Figure 5(a: right axis) illustrates the increase in collision forces during onset for the \(\mathrm {col}=1\) case, which is attributed to the increased energy of the VF system. It can be inferred from Fig. 5a that the more rapid rise in fundamental frequency in the \(\mathrm {col}=1\) case is correlated with the rise in collision forces, which is in agreement with our theoretical analysis in Sect. 3.2 (Footnote 11). The rise in fundamental frequency in the case of deactivated collision springs highlights the complex nature of the process, wherein nonlinear stiffness and aerodynamic contributions can also influence fundamental frequency. Moreover, the relatively larger increase in fundamental frequency in the case of activated collision springs indicates that collision plays a significant role in increasing frequency during onset when all other controlling factors (e.g., muscle activation) are fixed. Figure 5b shows that the amplitude of the glottal area waveform also increases during onset owing to aerodynamic energy transfer; the larger oscillation amplitude in the \(\mathrm {col}=1\) case can be attributed to the (repulsive) contact forces during the contact periods.

To further highlight the influence of VF contact, we now consider an onset simulation with the same steady-state lung pressure, and CT and TA muscle activation values. However, we vary the LCA activation level from \(a_{\rm LCA}=0.4\) to \(a_{\rm LCA}=0.5\) over a period of \(50\,\mathrm {ms}\), which corresponds to the VFs being initially abducted then proceeding to the fully adducted state. This scenario simulates the glottal state during the onset of vowels preceded by voiceless consonants [see, for example, Diaz-Cadiz et al. (2019)]. Figure 6 displays the fundamental frequency, maximum collision force, and glottal area time-series for the simulation (LCA activation and glottal area time-series are shown in the inset). The figure shows that VF oscillations exhibit contact starting from \(t\approx 30\) ms. Prior to the initial contact, the oscillations exhibit variations in fundamental frequency, potentially due to nonlinear and aerodynamic effects as stated in the discussion of Fig. 5. Furthermore, starting from the instant of initial VF contact, the oscillations exhibit a significant rise in fundamental frequency, which also correlates with the rise in collision forces, in agreement with the theoretical analysis in Sects. 3.1 and 3.2.
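The time-varying activation described above can be sketched as a simple ramp. This is a minimal illustration only: the text specifies the end values (0.4 and 0.5) and the \(50\,\mathrm {ms}\) duration, whereas the linear shape, the start time `t0`, and the helper name `activation_ramp` are assumptions introduced here.

```python
import numpy as np

def activation_ramp(t, a_start, a_end, t_ramp, t0=0.0):
    """Piecewise-linear activation profile (hypothetical helper).

    Returns a_start for t <= t0, a_end for t >= t0 + t_ramp, and a linear
    interpolation in between. The linear shape and t0 are assumptions of
    this sketch; only the end values and the duration come from the text.
    """
    t = np.asarray(t, dtype=float)
    s = np.clip((t - t0) / t_ramp, 0.0, 1.0)  # ramp progress, clipped to [0, 1]
    return a_start + (a_end - a_start) * s

# 100 ms of simulated time (in seconds) and the LCA ramp from 0.4 to 0.5 over 50 ms
t = np.linspace(0.0, 0.1, 1001)
a_lca = activation_ramp(t, 0.4, 0.5, 0.05)
```

Calling the same helper with `a_start > a_end` yields a monotonically decaying profile over the same duration.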

Fig. 6

(Color online) Frequency and maximum collision force versus time for an onset simulation with time-varying neutral glottal gap, where \(a_\mathrm {LCA}\) varies from 0.4 to 0.5 over a \(50\,\mathrm {ms}\) span (see the inset), \(P_{{\rm l},0}=800\,\mathrm {Pa}\), and \(a_\mathrm {CT}=a_\mathrm {TA}=0.2\). The inset of the figure also depicts the time-series of the glottal area waveform, \(A_{\rm g}\), showing its variation with changing LCA muscle activation

3.4 Comments on relations to empirical observations

The increasing frequency resulting from progressively greater degrees of collision during phonation onset predicted by the models in this study aligns with empirical observations for initial and isolated vowels (Smith and Robb 2013; Mohr 1971) (see Fig. 1). This also agrees with some reported observations for vowels preceded by voiced consonants (Hombert et al. 1979). Whereas variations in laryngeal muscle tension and/or aerodynamics are often proposed to be the underlying factors governing the rise in frequency for these conditions (Smith and Robb 2013), our study shows that these influences need not be present to generate the observed behavior, since the natural system dynamics tend to increase fundamental frequency during phonation onset. We emphasize that this does not mean these other factors are not playing a role during onset, only that they are not necessary to produce the observed frequency patterns.

As has been observed clinically (see Fig. 15), this gradually increasing effect of VF contact during phonation onset is also present in other phonetic contexts, including the onset of vowels preceded by voiceless consonants, implying the relevance of VF contact in various contexts. However, as shown in Fig. 1, fundamental frequency tends to decrease during onset for a vowel preceded by a voiceless consonant, indicating that the collision-based rise in frequency is overshadowed by other factors. In Sect. 4, we show that laryngeal muscle activation can induce the observed decreasing trends of fundamental frequency during the onset of vowels preceded by voiceless consonants.

4 Muscle tension and frequency regulation

In this section, we explore the influence of intrinsic laryngeal muscle tension and, in particular, the role of CT and TA muscles during phonation onset using the BCM. Intrinsic laryngeal muscles and their roles in phonation have been extensively investigated in several clinical (Chhetri and Neubauer 2015; Chhetri et al. 2012, 2014; Choi et al. 1995, 1993) and numerical (Geng et al. 2021; Alzamendi et al. 2020; Movahhedi et al. 2021; Yin and Zhang 2013, 2014) studies, where it has been found that the CT and TA muscles are essential in regulating fundamental frequency. Increasing activation of the CT muscle has been found to increase phonation fundamental frequency (Löfqvist et al. 1989; Chhetri et al. 2014). On the other hand, the role of the TA muscle in modulating fundamental frequency is more complex as its activation can either increase or decrease fundamental frequency, with some conflicting results in the literature [see Movahhedi et al. (2021)]. Activation of the LCA and interarytenoid muscles has been found to be positively correlated with fundamental frequency (Choi et al. 1995), whereas PCA activation exhibits negative correlation with fundamental frequency (Choi et al. 1993).

To the best of our knowledge, there are few studies that have substantially investigated the temporal variations of laryngeal muscle activation during phonation onset and how these variations may underlie empirical observations of fundamental frequency [e.g., Löfqvist et al. (1989)]. In this study, we attempt to explore these temporal variations in order to elucidate some of the underlying mechanisms of phonation onset.

In real phonation scenarios, the intrinsic laryngeal muscles do not act in isolation and their effect on fundamental frequency depends on several factors, including the relative geometry and contraction levels of agonist/antagonist muscles (Alzamendi et al. 2022). To simplify our analysis in this section, we aim to isolate the effects of the CT and TA muscles and assume that the tension variation in other laryngeal muscles is negligible. In all simulations presented below, we set \(P_{{\rm l},0}=800\) Pa and \(a_\mathrm {LCA}=0.5\), which corresponds to fully adducted vocal folds.

4.1 Cricothyroid muscle

The CT muscle plays a crucial role in regulating fundamental frequency by elongating and tensioning the VFs (Titze and Story 2002; Sonesson 1982; Löfqvist et al. 1989; Atkinson 1978; Chhetri et al. 2012). Electromyography has shown that activation of the CT muscle is higher in phonetic contexts wherein vowels are preceded by voiceless consonants in comparison with vowels preceded by voiced consonants, which correlates with the empirically observed higher onset fundamental frequency in such conditions (Löfqvist et al. 1989). It has been speculated that higher VF tension, which correlates with higher activation of the CT muscle, is required to mitigate VF vibrations during the production of voiceless consonants and that the higher tension carries over into the adjacent vowel (Hombert et al. 1979). Here, we aim to investigate this hypothesis numerically by varying CT muscle activation while keeping the activation levels of the TA and LCA muscles fixed.

We begin with a quasi-steady analysis wherein CT activation is fixed in time. Figure 7 presents sustained phonation fundamental frequency as a function of CT activation for different TA activation levels. In all cases, sustained phonation fundamental frequency is positively correlated with the CT muscle activation level (for fixed TA activation level), in agreement with previous numerical and clinical studies (Alzamendi et al. 2020; Titze and Story 2002; Chhetri et al. 2014). There are slight fluctuations observed in the fundamental frequency curves for large \(a_\mathrm {CT}\) values, which can be attributed, in part, to the nonlinearity of the BCM. Assuming variations in the activation levels of other laryngeal muscles to be small, this suggests that, in contexts where vowels are preceded by voiceless consonants, CT muscle activation level may be decreasing in order to achieve the empirically observed decaying fundamental frequency patterns as seen in, for example, Stepp et al. (2010a) (Footnote 12).

Fig. 7

(Color online) Sustained phonation fundamental frequency as a function of CT muscle activation for different TA activation levels

Figure 8 presents instances of the glottal area time-series during phonation onset for monotonically decaying CT activation, with initial value \(a_{\mathrm {CT},i}\) and final value \(a_{\mathrm {CT},f}\). Specifically, initial activation level \(a_{\mathrm {CT},i}\) is varied between 0.2 and 0.6 across simulations and final activation value is set to be 0.2. The transition between the two \(a_{\mathrm {CT}}\) levels occurs over a duration of \(50\,\mathrm {ms}\) (see insets for the activation level temporal evolution), which is of the same order of magnitude as observed experimentally [see, for example, the electromyographic signals depicted in Figures 1-3 in Löfqvist et al. (1989)]. The figure shows that the amplitude of vibration grows the most rapidly when there is no change in CT activation (\(a_{\mathrm {CT},i} = 0.2\)), with the rate of amplitude growth decreasing with increasing \(a_{\mathrm {CT},i}\). Higher CT activation levels result in stiffer folds, and thus, higher frequency and generally lower amplitude, which relax as the activation level decreases in time. This is in agreement with the claim in Hombert et al. (1979) that the increased VF tension in phonetic contexts with vowels preceded by voiceless consonants is required to inhibit VF oscillations during the production of the voiceless consonant.

Fig. 8

(Color online) Time-series of glottal area for varying initial CT activation levels using the BCM at \(a_{\mathrm {CT},f}=0.2\) and \(a_{\mathrm {TA}}=0.2\). Similar trends hold for \(a_{\mathrm {TA}}=0.4\) (not shown). The time-series of CT activation are shown in the inset

Figure 9 presents the temporal evolution of normalized fundamental frequency for the glottal area time-series shown in Fig. 8, as well as for analogous cases with \(a_\mathrm {TA}=0.4\). Decreasing CT activation levels during onset, with sufficiently large initial values, generally results in a decaying fundamental frequency profile, which matches empirical observations of vowels preceded by voiceless consonants, see Fig. 1. Figure 9 displays non-monotonicity of the fundamental frequency profile in some cases, such as for \(a_\mathrm {TA}=0.4\), \(a_{\mathrm {CT},i}=0.4\), wherein the frequency of the second cycle is larger than that of the first cycle despite the monotonic decay of CT activation levels. This has also been observed in empirical studies of onset fundamental frequency [see, for example, Löfqvist et al. (1989), Lien et al. (2015), and Fig. 15 in Appendix 4]. This non-monotonic behavior can be attributed, in part, to the collision-based mechanism, which is dominant in some cases, such as when \(a_{\mathrm {CT},i}=0.2\) (CT activation is constant-in-time). Collision onset causes fundamental frequency to increase, which opposes the effect of decreasing VF stiffness associated with the reduction in CT activation level. This demonstrates the complexity of phonation onset, where competing mechanisms are at play.

Fig. 9

(Color online) Time-series of fundamental frequency, normalized with respect to sustained phonation fundamental frequency \(f_{\rm ss}\), for varying initial CT activation levels using the BCM for a \(a_\mathrm {TA} = 0.20\), and b \(a_\mathrm {TA} = 0.40\). The time-series of CT activation are shown in the insets

Similar fundamental frequency trends are observed when fixing initial CT activation and varying the final value, as shown in Fig. 10. This figure, in combination with Fig. 9, shows that when the drop in CT activation level is sufficiently large, the magnitude of the reduction in CT activation is correlated with the drop in (relative) fundamental frequency. This important finding will be used to explain some empirical observations in Sect. 4.3.

Fig. 10

(Color online) Time-series of fundamental frequency, normalized with respect to sustained phonation fundamental frequency \(f_{\rm ss}\), with \(a_\mathrm {TA}=0.2\), \(a_{\mathrm {CT},i}=0.6\), and different final CT activation levels. The time-series of CT activation are shown in the inset

4.2 Thyroarytenoid muscle

Activation of the TA muscle contributes to adducting and shortening the VFs (Titze and Hunter 2007; Chhetri et al. 2012). TA activation during phonation initiation has been found to begin when the VFs start adducting and carries over into sustained phonation (Poletto et al. 2004). Studies with human subjects have shown that the relation between TA activation and sustained phonation fundamental frequency is proportional when fundamental frequency values are low, whereas the relation is inverse at high frequency levels (Titze et al. 1989). However, in vitro studies involving excised canine larynges (Chhetri et al. 2014) exhibit some deviations from human studies (Titze et al. 1989).

Figure 11 illustrates the relation between sustained phonation fundamental frequency and TA muscle activation for various CT activation levels. This figure shows that the relationship between steady-state fundamental frequency and TA activation level is relatively complex, with the influence of increasing TA depending on CT activation, in agreement with empirical data (Titze et al. 1989). For example, when \(a_\mathrm {CT}=0.2\), corresponding to relatively low fundamental frequency, the relation is proportional when \(a_{\mathrm {TA}}\in [0.1,0.3]\), but it is inverse when \(a_\mathrm {CT}=0.6\). The steady phonation results suggest that in scenarios where CT activation follows a decaying profile and TA activation is fixed, higher TA activation values can result in smaller differences between initial and final fundamental frequencies (see, for example, the decrease in spread between the curves in Fig. 11 as \(a_{\mathrm {TA}}\) increases from 0.1 to 0.3).
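The last inference can be written out as a short worked step. This is only a sketch under the quasi-steady assumption; the notation \(f_{\rm ss}(a_\mathrm {CT},a_\mathrm {TA})\) for the sustained-phonation map of Fig. 11 and the symbol \(\Delta f\) are introduced here for illustration, and the map is assumed to be differentiable in \(a_\mathrm {TA}\). Defining the quasi-steady difference between the initial and final fundamental frequencies of a decaying CT profile at fixed TA activation,

\[
\Delta f(a_\mathrm {TA}) = f_{\rm ss}\left( a_{\mathrm {CT},i},a_\mathrm {TA}\right) - f_{\rm ss}\left( a_{\mathrm {CT},f},a_\mathrm {TA}\right) ,
\]

its sensitivity to TA activation is

\[
\frac{\partial \Delta f}{\partial a_\mathrm {TA}} = \left. \frac{\partial f_{\rm ss}}{\partial a_\mathrm {TA}}\right| _{a_\mathrm {CT}=a_{\mathrm {CT},i}} - \left. \frac{\partial f_{\rm ss}}{\partial a_\mathrm {TA}}\right| _{a_\mathrm {CT}=a_{\mathrm {CT},f}} .
\]

For \(a_{\mathrm {CT},i}=0.6\) (inverse relation, first term negative) and \(a_{\mathrm {CT},f}=0.2\) with \(a_\mathrm {TA}\in [0.1,0.3]\) (proportional relation, second term positive), the right-hand side is negative, so \(\Delta f\) shrinks as \(a_\mathrm {TA}\) increases, to the extent that the onset dynamics follow the steady-state map.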

Fig. 11

(Color online) Sustained phonation fundamental frequency as a function of TA muscle activation level for different CT activation levels

To test this, we perform onset simulations with decaying CT profiles similar to those shown in the inset of Fig. 10 at various fixed TA activation values. TA activation levels are selected such that states wherein CT and TA activation have agonistic and antagonistic influences on fundamental frequency in the steady-state analysis are both represented (i.e., the relation between fundamental frequency and \(a_\mathrm {TA}\) is inverse at \(a_{\mathrm {CT},i}\) and the relation between fundamental frequency and \(a_\mathrm {TA}\) is proportional at \(a_{\mathrm {CT},f}\), according to our quasi-steady analysis). We record the fundamental frequency of the first cycle, \(f_\mathrm {c,1}\) (the inverse of the time difference between the first two detected peaks of the glottal area waveform time-series) and normalize it with respect to the steady-state fundamental frequency during sustained phonation, \(f_{\rm ss}\). Figure 12 displays a contour plot of \(f_\mathrm {c,1}/f_{\rm ss}\) as a function of \(a_{\mathrm {CT},f}\) and \(a_\mathrm {TA}\). The figure shows that increasing \(a_\mathrm {TA}\) from 0.1 to 0.3 results in the initial normalized fundamental frequency decreasing for decaying CT profiles with \(a_{\mathrm {CT},i}=0.6\) and \(a_{\mathrm {CT},f}\in [0.2,0.4]\), in agreement with our quasi-steady analysis.
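The definition of \(f_\mathrm {c,1}\) given above (inverse of the time between the first two detected peaks of the glottal area waveform) can be illustrated with a minimal sketch. The three-point peak test, the helper names, and the synthetic waveform are assumptions of this example and are not the peak detector or the BCM output used in the study; noisy waveforms would likely need smoothing or a more robust detector.

```python
import numpy as np

def peak_times(t, x):
    """Times of local maxima of x, found with a simple three-point test."""
    t = np.asarray(t, dtype=float)
    x = np.asarray(x, dtype=float)
    # sample i is a peak if it exceeds its left neighbour and is not below its right one
    idx = np.where((x[1:-1] > x[:-2]) & (x[1:-1] >= x[2:]))[0] + 1
    return t[idx]

def cycle_frequencies(t, x):
    """Cycle-to-cycle frequency: inverse of the spacing between successive peaks."""
    return 1.0 / np.diff(peak_times(t, x))

def first_cycle_ratio(t, x, f_ss):
    """f_c1 / f_ss, with f_c1 taken from the first two detected peaks."""
    return cycle_frequencies(t, x)[0] / f_ss

# Synthetic stand-in for a glottal area waveform (arbitrary units): constant
# amplitude, instantaneous frequency decaying towards f_ss = 150 Hz.
fs = 50_000.0                                   # sampling rate in Hz (assumed)
t = np.arange(0.0, 0.1, 1.0 / fs)
f_ss = 150.0
f_inst = f_ss * (1.0 + 0.2 * np.exp(-t / 0.02))
phase = 2.0 * np.pi * np.cumsum(f_inst) / fs    # numerically integrated phase
area = 1.0 + np.sin(phase)

ratio = first_cycle_ratio(t, area, f_ss)        # exceeds 1 for this decaying chirp
```

The time resolution of this estimate is limited by the sampling interval, since each peak is located only to the nearest sample.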

Fig. 12

(Color online) Contour plot of initial fundamental frequency normalized by steady-state frequency, \(f_\mathrm {c,1}/f_{\rm ss}\), as a function of \(a_{\mathrm {CT},f}\) and \(a_\mathrm {TA}\), where \(a_{\mathrm {CT},i}=0.6\)

4.3 Comments on relations to empirical observations

As seen in Fig. 1, empirical data from human studies indicate that fundamental frequency decays during the onset of vowels preceded by voiceless consonants, with higher onset relative fundamental frequency values in the case of adult speakers with healthy voices in comparison with adult speakers with hyperfunctional voices (Stepp et al. 2010a) (Footnote 13), where speakers with hyperfunctional voices often exhibit excessive and/or imbalanced muscular forces (Hillman et al. 1989, 2020). Figures 9 and 10 show that when the magnitude of the drop in CT activation during onset is sufficiently high there is a decrease in normalized fundamental frequency that is correlated with the magnitude of the reduction in CT activation. This suggests that CT activation levels may be a factor underlying the differences between healthy speakers and speakers with vocal hyperfunction, wherein healthy speakers potentially produce relatively larger variations in the CT activation levels during phonation onset. Interestingly, the non-monotonic behavior (initial rise followed by a fall) in fundamental frequency present in some cases in Fig. 9 is also observed in empirical studies, see for example Löfqvist et al. (1989), Lien et al. (2015), and Fig. 15 in Appendix 4. Moreover, Fig. 12 indicates that higher TA activation levels result in lower normalized fundamental frequency values during phonation onset, at least initially. Thus, TA activation level is another potential factor that may underlie clinical observations, where our results suggest that speakers with vocal hyperfunction may produce higher TA activation levels, which also may explain the empirically observed lower initial relative fundamental frequency values (Stepp et al. 2010a).

5 Conclusion

In this paper, we aimed to uncover some of the potential underlying mechanisms driving the observed differences in onset fundamental frequency patterns in different phonetic contexts, see Fig. 1, where we resorted to theoretical and numerical analyses of single- and multi-mass models, respectively (see Sect. 2). We found that the increasing degree of VF collision during onset, associated with the rise in vibration amplitude, and/or the decrease in the neutral glottal gap, naturally gives rise to an increase in system fundamental frequency (see Sect. 3). Such an increase in fundamental frequency is experimentally observed during isolated/initial vowels, and in some instances of a vowel preceded by a voiced consonant (see Sect. 3.4). In these cases, laryngeal muscle tension may still play a role, but it is not a prerequisite for the observed behavior.

On the other hand, our analysis suggested that muscle activation is necessary to produce the observed decay in fundamental frequency evident in vowels preceded by voiceless consonants, since the system dynamics with all control factors fixed produce the opposite trend (see Fig. 9a). In particular, our analysis indicated that reduction in fundamental frequency is due, in part, to a concomitant decrease in cricothyroid muscle activation during onset (see Sect. 4.1). Interestingly, the competing mechanisms of muscle activation and collision can lead to a frequency pattern that initially rises then drops, which has been observed in experimental studies of onset fundamental frequency (see Sect. 4.1). The magnitude of the reduction in cricothyroid muscle activation was found to be a potential factor underlying the differences in relative fundamental frequency between healthy and hyperfunctional voices during the onset of vowels preceded by voiceless consonants (see Sect. 4.3). Furthermore, our investigation suggested that increased thyroarytenoid muscle activation mitigates the drop in relative fundamental frequency caused by a decrease in cricothyroid muscle activation, which may also contribute to the experimentally observed differences between hyperfunctional and normal phonation (see Sects. 4.2, 4.3).

The current study utilized the body-cover model integrated with muscle activation rules and a quasi-steady viscous flow model and implemented simplifying assumptions regarding the role of acoustics and aerodynamics during phonation onset. In future work, we aim to adopt more physiologically accurate phonation models [e.g., articulating triangular body-cover model incorporating all five intrinsic muscles (Alzamendi et al. 2022)] and potentially a more complex flow model in order to gain refined insights into the impact of aerodynamics, acoustics, and laryngeal muscle activation on fundamental frequency during phonation onset. We further aim to more robustly examine vocal hyperfunction in the context of relative fundamental frequency through our modeling framework.