1.1 Core Messages

  • During phonation the respiratory cycle changes.

  • The vocal fold is composed of the thyroarytenoid muscle, its fibrous tissue cover, and the facing mucosa.

  • Cyclic repetition of closing and opening movements of the vocal cords results in vibration.

  • Voice production depends on neuromotor coordination of all muscles involved in phonation.

Voice production corresponds to the physiological and physical processes by which vibration of the vocal fold is transformed into speech. The primary driving force for vocal fold vibration and voice production depends on conversion of aerodynamic energy to acoustical energy when the vocal folds are closed in the midline. The sound produced by vocal fold vibration is immediately modified and filtered in the cavities located between the vocal folds and the lips (buccopharyngeal resonator). Because a number of factors can affect phonation, voice production is a highly variable process, not only from person to person but also within the same person [24, 30].

1.2 Breath Stream

The diaphragmatic muscle, innervated by the phrenic nerve, is the most important inspiratory muscle. Contraction increases airway capacity, allowing a larger volume of air to be inhaled. During its relaxation, the air is exhaled from the lungs, which can go back to their initial volume owing to their elastic properties. During quiet breathing, inhalation is shorter than exhalation. Under some conditions (e.g., during phonation), accessory respiratory muscles may also be used. The accessory inspiratory muscles are the external intercostal muscles, scalene muscle, and sternocleidomastoid muscle. The expiratory muscles are the internal intercostal muscles and abdominal muscles, including the three oblique muscles and the rectus abdominis and latissimus dorsi muscles [25, 27, 32].

During phonation, the respiratory cycle changes with shortening of inhalation and lengthening of exhalation. After the vocal folds close, blocking the airflow and increasing subglottal air pressure, the speaker strives to maintain a constantly higher-than-normal expiratory pressure in the lungs and trachea (see reference to phonation threshold pressure, below). After taking a deep breath during the prephonatory phase, the forces of elastic recoil are called into play. The diaphragm is not relaxed until the recoil forces diminish. The second phase corresponds to involvement of the internal intercostal muscles, which tend to decrease the size of the thorax and thus increase air pressure. The third phase corresponds to activation of the abdominal muscles, which constitute the most important active component. When singing, expiratory pressure, in the best case scenario, is controlled by contraction of the oblique abdominal muscles rather than by contraction of the rectus abdominis muscles. Back muscles can also be used to stiffen the thorax.

1.3 Laryngeal Vibrator

The larynx sits on top of the trachea. The thyroid and cricoid cartilages, which are part of the larynx, provide reinforcement and prevent collapse of the airway. The other components of the larynx are mobile and form a closing mechanism that protects the trachea during deglutition. They include the arytenoid cartilage, epiglottis, and endolaryngeal muscles. For more information about the osteocartilaginous elements and the intrinsic and extrinsic muscles of the larynx, the reader is referred to classic anatomical descriptions [8, 33].

The vocal fold is a “multilayered” structure that exists only at the level of the anterior two-thirds of the fold (known as the “ligamentary” portion of the fold as opposed to the posterior cartilaginous portion, which corresponds to the vocal process). The vocal fold is composed of the thyroarytenoid muscle, its fibrous tissue cover, and the facing mucosa [7, 17, 18, 20] (Fig. 1.1).

Fig. 1.1. Frontal section showing the multilayered structure of the vocal fold. (From Hirano [18], with permission)

The vocal fold features are specifically designed for vibration. The vibrating free edge is covered with squamous epithelium, which is more resistant to the mechanical constraints produced by vibration and contact than the pseudostratified respiratory mucosa that lines the rest of the larynx. In addition, the epithelium is covered with a mucus layer whose outer layer has a mucin film to prevent dehydration of the underlying serous layer, cilia, and cells. The free edge of the vocal fold is glabrous (i.e., totally devoid of glands that might hinder mucosal wave formation), and most blood vessels as well as elastin and collagen fibers run parallel to the free edge of the vocal fold. The basement membrane is attached to the underlying lamina propria by interlacing fibers whose density appears to depend on genetic factors. Thus, genetics could predispose patients to develop certain lesions, such as nodules. The lamina propria has traditionally been divided into three layers according to the histological composition regarding elastin and collagen fibers (i.e., the superficial layer that corresponds to Reinke’s space in the classic description and the middle and deep layers that correspond to the vocal ligament). Interstitial proteins regulate vocal fold viscosity, which is an essential physical factor in vibration. Proteins also contribute to absorption of mechanical shocks caused by vibration. Hyaluronic acid is especially important for both viscosity regulation and shock absorption. The distribution of fibrous and interstitial proteins probably depends on the mechanical stress to which the vocal folds are subjected and may be genetically determined.

Two of the most important cells of the lamina propria are fibroblasts and myofibroblasts. Fibroblasts play a key role in maintaining the integrity of the lamina propria. They allow replacement of proteins. Myofibroblasts are present only after trauma or damage requiring regeneration or repair of the extracellular matrix. This suggests that vocal folds are competent in repairing microscopic trauma within 36-48 hours. It has been reported that vocal rest is useful to give myofibroblasts time to act.

1.4 Vocal Fold Vibration

All current theories and models of vocal fold vibration are based to some extent on the myoelastic-aerodynamic theory formulated by van den Berg. When the vocal folds are closed with appropriate tension on either side of the midline of the glottis (prephonatory attack position), airflow from the trachea is blocked and subglottic pressure increases. Vibration begins when subglottic pressure below the vocal folds exceeds fold resistance (phonation threshold pressure) and some air is released into the supraglottic region. As soon as the vocal folds separate, allowing some air to rush out, subglottic pressure decreases and the folds close again as a result of elastic recoil and the Bernoulli effect. Cyclic repetition of these closing and opening movements results in vibration [2, 6, 13, 16, 20, 23] (Fig. 1.2).

Fig. 1.2. Frontal section of the vocal folds showing resolution of the elastic conflict between subglottic air (opening force) and muscle and the elastic fold (closing force)

The mechanism underlying vocal fold vibration is comparable to that of a violin string. When the bow drags the string off the equilibrium point, countervailing elastic forces gradually build until they exceed the force of adhesion to the bow. At this point, the string “unhooks” and is free to oscillate (vibrate). When sufficient energy has been dissipated, the string adheres again to the bow. This is known as the stick-slip friction model involving alternation between a stick phase in which the string is dragged by the bow (“driving force”) and a slip phase in which the string is free to oscillate at a frequency determined by the mass of the string and the amount of tension applied. In the larynx, airflow over the free edge of the vocal fold serves as the driving force (instead of a bow) [6, 13, 15] (Fig. 1.3).
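The dependence of the slip-phase frequency on mass and tension can be illustrated with the textbook relation for an ideal string fixed at both ends, f = (1/2L)·sqrt(T/μ). The sketch below is only an illustration of that relation; the function name and the numerical values (roughly those of a violin A string) are assumptions and do not come from the chapter.

```python
import math


def string_fundamental_hz(length_m, tension_n, linear_density_kg_per_m):
    """Fundamental frequency of an ideal string fixed at both ends.

    Uses f = sqrt(T / mu) / (2 L), where T is the tension in newtons,
    mu the mass per unit length in kg/m, and L the vibrating length in m.
    """
    if length_m <= 0 or tension_n <= 0 or linear_density_kg_per_m <= 0:
        raise ValueError("length, tension and linear density must be positive")
    return math.sqrt(tension_n / linear_density_kg_per_m) / (2.0 * length_m)


# Illustrative values only (assumed, not taken from the chapter).
f_base = string_fundamental_hz(0.325, 50.0, 0.0006)
f_tighter = string_fundamental_hz(0.325, 200.0, 0.0006)  # four times the tension
f_heavier = string_fundamental_hz(0.325, 50.0, 0.0024)   # four times the mass per length

print(f"baseline: {f_base:.1f} Hz")
print(f"4x tension: {f_tighter:.1f} Hz")
print(f"4x mass per length: {f_heavier:.1f} Hz")
```

Quadrupling the tension doubles the frequency, whereas quadrupling the mass per unit length halves it, which is the sense in which mass and tension set the frequency of free oscillation.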

Fig. 1.3. Stick-slip friction model of fold vibration. (a, b) Stick phase. (c, d) Slip phase

According to Titze, phonation threshold pressure (i.e., the minimum air pressure required to sustain vocal fold oscillation) is the “missing link” in understanding vocal fold physiology [31]. Phonation threshold pressure depends on several parameters:

  • Stiffness of the vibrating portion of the vocal fold

  • Viscosity of the vocal fold

  • Thickness of the free edge of the vocal fold

  • Width of the glottal opening prior to phonation

  • Transglottic pressure gradient

Under normal conditions the phonation threshold pressure is between 2 and 4 hPa and the subglottic pressure is around 7 hPa; however, higher pressures may be necessary when a louder voice is required. It has been shown that a rise in pitch is associated with greater vocal fold tension, which leads to a higher phonation threshold pressure. In disease states involving vocal fold lesions, mucosal stiffness leads to an increase in phonation threshold pressure. In the case of unilateral laryngeal paralysis, the prephonatory glottic gap is too wide and the speaker must compensate by increasing the subglottic pressure. Increased phonation threshold pressure is a fairly accurate indicator of voice strain in disease states.
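Pressures in this field are also commonly reported in centimetres of water or in kilopascals, so it can help to convert the values quoted above. The following minimal sketch uses the conventional factor 1 cmH2O = 98.0665 Pa; the function names are illustrative assumptions.

```python
PA_PER_HPA = 100.0
PA_PER_CMH2O = 98.0665  # conventional centimetre of water, in pascals


def hpa_to_cmh2o(pressure_hpa):
    """Convert a pressure from hectopascals to centimetres of water."""
    return pressure_hpa * PA_PER_HPA / PA_PER_CMH2O


def hpa_to_kpa(pressure_hpa):
    """Convert a pressure from hectopascals to kilopascals."""
    return pressure_hpa * PA_PER_HPA / 1000.0


# Values quoted in the text: threshold 2 to 4 hPa, subglottic pressure about 7 hPa.
for label, p_hpa in [("threshold (low)", 2.0), ("threshold (high)", 4.0), ("subglottic", 7.0)]:
    print(f"{label}: {p_hpa} hPa = {hpa_to_cmh2o(p_hpa):.2f} cmH2O = {hpa_to_kpa(p_hpa):.2f} kPa")
```

Because 1 hPa is within 2% of 1 cmH2O, the figures quoted in hPa can be read as approximately the same number of centimetres of water.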

There are numerous ways to decrease phonation threshold pressure. One is to decrease the viscosity of the tissues, which can be achieved by improving hydration. Another way to decrease phonation threshold pressure is to decrease the mucosal wave velocity. This can be achieved by lowering surface tension (low-pitched voice) or by hydrating the surface mucus. The prephonatory glottic gap can be narrowed by tightening the muscles slightly. The goal of laryngoplasty in patients with laryngeal paralysis is to decrease the width of the prephonatory glottic opening. It can also be useful to increase the thickness of the vocal fold (e.g., by speaking in a lower-pitched voice or in some cases by changing the register: chest vs. head).
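The qualitative dependencies listed above can be summarized in a simple proportionality: threshold pressure rises with tissue damping (viscosity), mucosal wave velocity, and prephonatory glottal width, and falls as fold thickness increases. The sketch below is patterned on the small-amplitude relation usually attributed to Titze, but the exact constant factors are not asserted here; the function name, the coefficient `k`, and all input values are hypothetical and chosen only to show the direction of each effect.

```python
def relative_threshold_pressure(damping, wave_velocity, glottal_half_width, thickness, k=1.0):
    """Relative phonation threshold pressure in arbitrary units.

    Assumed proportionality: P_th ~ k * damping * wave_velocity * width / thickness.
    The result is meaningful only as a ratio between two conditions.
    """
    if thickness <= 0:
        raise ValueError("thickness must be positive")
    return k * damping * wave_velocity * glottal_half_width / thickness


# Hypothetical baseline condition (arbitrary units, not measured data).
baseline = relative_threshold_pressure(1.0, 1.0, 1.0, 1.0)

# Better hydration: lower tissue damping (viscosity).
hydrated = relative_threshold_pressure(0.8, 1.0, 1.0, 1.0)

# Medialization (e.g., laryngoplasty): narrower prephonatory gap.
medialized = relative_threshold_pressure(1.0, 1.0, 0.5, 1.0)

# Thicker free edge (e.g., lower pitch or chest register).
thicker = relative_threshold_pressure(1.0, 1.0, 1.0, 2.0)

for name, value in [("hydrated", hydrated), ("medialized", medialized), ("thicker", thicker)]:
    print(f"{name}: {value / baseline:.2f} x baseline")
```

Each intervention named in the text lowers the ratio below 1, and the effects multiply when they are combined.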

During a cycle of vibration, the two vocal folds do not behave identically. This difference can be heard, but the coupling and adduction of the vocal folds has the effect of synchronizing the vibrating masses. This process is effective so long as differences between the two vocal folds stay within a certain range. Beyond the effective range, however, various abnormalities can appear, including biphonation, which corresponds to synchronization every other cycle. Another, more complex phenomenon is reciprocal modulation of the folds characterized by the presence of subharmonics and bifurcations (i.e., sudden state changes). This problem is frequently observed in patients with unilateral laryngeal paralysis, which is often associated with sudden voice shifts (bitonal voice) [4, 15, 22].

1.5 Pitch Control

The pitch of the human voice is related to the fundamental frequency (F0) of vocal fold vibration. As shown in Table 1.1, pitch depends on the length of the vocal folds and the sex, age, and weight of the person. Vocal fold thickness has also been shown to affect pitch, which increases with thickness in both men and women [3, 9, 10, 14, 29].

Table 1.1. Voice pitch as a function of age and sex [1]

Pitch control depends on adjusting the F0 of vibration. This adjustment can involve regulation of mass or tension, which can be done actively by contracting the intralaryngeal muscles or passively by contraction of the perilaryngeal muscles. Basically, pitch control involves the combined actions of two muscles: the cricothyroid (CT) muscle, which acts on vocal ligament tension, and the thyroarytenoid (TA) muscle, which acts on the muscle mass of the fold. This adjustment mechanism can be viewed as bipolar, according to the “body-cover” theory described by Hirano and Titze. If the CT muscle is contracted and the TA muscle is relaxed, the total length of the vocal fold increases; moreover, the overall stiffness of all layers increases, so the F0 increases. Conversely, if the TA muscle is contracted and the CT muscle is relaxed, the stiffness of the muscle mass and the F0 increase. Accordingly, these two muscles, which have different sources of innervation (the superior laryngeal nerve [SLN] for the CT and the recurrent laryngeal nerve [RLN] for the TA), can be seen as exercising differential control over the F0 (Fig. 1.4).

Fig. 1.4. Differential control of the fundamental frequency (F0) by contracting the cricothyroid (CT) and thyroarytenoid (TA) muscles. (a) Action of the muscles. (b) Map of muscle activation. (From Titze [32], with permission)

Another mechanism that can be used for pitch control involves increasing the cover tension by decreasing the depth of the vibrating tissue. This strategy is used to produce higher pitches, such as a falsetto sound. Decreasing the effective depth can lead to changes in F0 in the same way as changes in fold tension or length. The depth of the vibrating tissue can be regulated using the TA muscle. Although no accurate quantitative data are currently available, it can be speculated that the vocal fold ligament absorbs most of the elongation at high frequency so the remaining mucosa stays fairly loose. A taut vocal ligament with a “free,” loose tissue edge appears to represent the optimal condition for high-pitched phonation. For lower and middle-range frequencies, the muscle portion of the body can be used for elongation and tensioning. The vocal ligament can remain loose, providing a greater depth of vibration in the modal register [5, 26, 28].

The complex action of the TA on the F0 is related to differences in tension and biomechanical properties of the tissue layers. According to Hirano, contraction of the TA muscle should be associated with an increase in body tension and a decrease in cover tension. If the vibrating tissue is composed only of mucosa, contraction of the TA muscle leads to lowering of the F0. Conversely, if the vibrating tissue is composed mainly of muscle, contraction of the TA muscle leads to a rise in the F0. Correlation between the F0 and TA activity becomes more and more positive as the depth of vibration increases.

Each layer of the vocal fold has distinct biomechanical properties, depending on what is known as the length-tension ratio (i.e., tension induced in the material by changes in length, known as the stress-strain curve). In this regard, it has been shown that collagen fibers are more resistant to elongation than elastin fibers. Variations in the concentration of collagen and elastin in each layer of the lamina propria explain differences in behavior during elongation. A stress-strain curve can be obtained for the whole vocal fold. Total fold strain (tensile strain) corresponds to a combination of various active and passive actions that occur during tensioning.

1.6 Intensity Control

Current research indicates that the optimal glottic configuration for phonation is achieved when the vocal folds are in virtual contact and the vocal muscles are fairly relaxed. A slight gap should be established between posterior ends of the two vocal folds by balancing the activity of the adductor (interarytenoids and lateral cricoarytenoids) and abductor muscles. Under these conditions a quasi-sinusoidal signal with low harmonic content can produce a “pure voice” [11, 26, 28].

Intensity is controlled by combined regulation of subglottic pressure and glottic configuration. Higher intensity is achieved by simultaneously increasing vocal fold adduction and subglottic pressure. Because increased vocal fold adduction leads to longer contact time between the vocal folds, higher intensity is accompanied by a shortened open phase of the vocal fold cycle [1]. This raises the issue of the optimal adduction configuration. If the vocal folds do not touch, the voice is weak and of poor quality. Conversely, excessive contact leads to vocal straining, resulting in a tight, pressed voice quality. The ideal configuration appears to occur when the vocal folds are almost in contact before phonation (decreased prephonatory glottic width). In this configuration, the vocal folds are almost completely free and can express the full range of vibration modes. The signal produced is practically sinusoidal. This mode of functioning corresponds to what some singing teachers refer to as a “free-floating voice.” According to this analogy, glottic resistance is adjusted to ensure the best possible yield from the conversion of aerodynamic to acoustical energy with minimal effect on vocal fold vibration. To increase the intensity, the glottis operates on a more “open-shut” than “wave” basis. However, glottal efficiency decreases, and a large amount of energy is dissipated at the vocal fold level in the form of friction, which can cause local inflammation and even fold lesions. These lesions, called dysfunctional lesions, are preferentially in the zone where contact of the vocal folds is the strongest (i.e., the middle third).

Increased tension in the voice apparatus can lead to “voice straining” on the part of the speaker. In the English-language literature, voice straining is often referred to as vocal misuse or abuse. In fact, the straining concept goes beyond vocal fold function and applies to all physiological components involved in communication. When striving to attract attention, the speaker increases muscle tension to produce a stronger, more effective (“projected”) voice. This behavior, characterized by stiffening of the body, has been shown to result in increased muscle activity throughout the body. The breathing pattern changes in association with voice straining. Inhalation is deeper to increase subglottic pressure (prephonatory attack phase). Some subjects have trouble relaxing muscles sufficiently to inhale deeply and may need to use their accessory inspiratory muscles (“thoracic breathing” in place of the normal “abdominal breathing”). Stiffening is also observed in all posturing muscles including not only those of the neck and larynx but also those of the calves and back. Increased muscle activity in relation to increased vocal intensity requires more energy. If subjects do not or cannot rest sufficiently to offset the excess energy expenditure, they may develop complications such as dysfunctional laryngopathy (vocal overuse). Because voice straining affects all these components, rehabilitation should not be limited to changing the glottic configuration. Management must include a wide range of aspects, including general muscle tension, stress level, posture, and prephonatory respiration.

A number of factors promote synchronization of the vocal folds [12]. The first is symmetry of shape and tension of the vocal folds in the normal resting state. In this regard, unilateral laryngeal paralysis represents the worst possible condition. It should be noted that acceptable fold vibration can be obtained if contact is reestablished between the vocal folds, as can be observed during speech therapy or laryngeal manipulation. Another factor promoting synchronization is the Bernoulli effect, which applies equally to the two vocal folds and so tends to have the same effects as a function of glottic configuration. The most important synchronizing factor is the tissue mass-combining effect of direct contact between the vocal folds. The quality of contact is highly dependent on the vocal fold cover mucosa, with viscosity playing a major role. In vitro experiments on excised larynx models in our laboratory showed that the frequency of the vibration directly correlated with the viscosity of the artificial lubricant applied. The higher the viscosity of the lubricant used, the more the vibration frequency decreased when the vocal fold “closure” time increased. It has also been shown that more viscous mucosa increases the phonation threshold. Conversely, the greater the asymmetry and freedom of the folds, the greater the need for “forced” synchronization. It can thus be understood that the mechanism underlying “voice straining,” used to increase loudness, is similar to the mechanism used to compensate for abnormalities in laryngeal vibration.

1.7 Vowel Production in the Vocal Tract

Human speech production depends on sound transformation in the vocal tract. According to the source-filter theory, the source sound is a pulsed airstream from the glottis containing numerous frequencies. Filtration consists of selecting certain frequencies for transmission through the mouth. The vocal tract acts as a resonator by suppressing the transfer of some frequencies. The concept of resonance in a tube is based on interference between waves submitted to multiple reflections. Like a wind instrument used to make music, the human vocal tract resonates at various source frequencies depending on the anatomical features that determine production of speech sounds (phonemes) [21, 30].
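The source-filter idea can be sketched numerically: a harmonic-rich source is multiplied, harmonic by harmonic, by the gain of the vocal tract resonances. The example below is a minimal illustration under stated assumptions, not a model from the chapter: the source is assumed to fall off as 1/n² (about 12 dB per octave), each resonance is a generic second-order resonator, and the formant centre frequencies and bandwidths are hypothetical.

```python
import math


def source_amplitude(n):
    """Assumed glottal source: amplitude of harmonic n falls off as 1/n^2."""
    return 1.0 / (n * n)


def resonance_gain(f_hz, centre_hz, bandwidth_hz):
    """Magnitude response of a second-order resonator (gain 1 at 0 Hz)."""
    fc2 = centre_hz ** 2
    return fc2 / math.sqrt((fc2 - f_hz ** 2) ** 2 + (bandwidth_hz * f_hz) ** 2)


def radiated_spectrum(f0_hz, formants, n_harmonics=30):
    """Return (frequency, amplitude) pairs for the filtered harmonics."""
    spectrum = []
    for n in range(1, n_harmonics + 1):
        f = n * f0_hz
        gain = 1.0
        for centre_hz, bandwidth_hz in formants:
            gain *= resonance_gain(f, centre_hz, bandwidth_hz)
        spectrum.append((f, source_amplitude(n) * gain))
    return spectrum


def local_peaks(spectrum):
    """Frequencies of harmonics that are stronger than both neighbours."""
    peaks = []
    for i in range(1, len(spectrum) - 1):
        if spectrum[i][1] > spectrum[i - 1][1] and spectrum[i][1] > spectrum[i + 1][1]:
            peaks.append(spectrum[i][0])
    return peaks


# Hypothetical neutral vowel: F0 = 100 Hz, resonances near 500 Hz and 1500 Hz.
spectrum = radiated_spectrum(100.0, [(500.0, 80.0), (1500.0, 100.0)])
peaks = local_peaks(spectrum)
print("harmonics emphasized by the filter (Hz):", peaks)
```

The source alone decreases steadily with harmonic number; the local peaks in the output appear only because the filter boosts harmonics lying near its resonances.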

The resonance frequencies (formants) of the vocal tract are commonly numbered consecutively upward from the lowest frequency (F1-F5). Low-pitched formants correspond to the pharynx and high-pitched formants to the oral cavity. Vowels are characterized mainly by the two lowest-pitched formants, F1 and F2. Thus, for vowel perception, the filtering process simplifies the code presented to the listener. In traditional phonetics, vowels are classified in regard to how the tongue is positioned in the mouth from top to bottom and from front to back. The tongue is placed at the bottom and back for the French /a/ vowel sound, at top and front for the French /i/ vowel sound, and top and back for the French /u/ vowel sound. These three vowels determine the vowel triangle on an F1-F2 orthonormal representation.

To produce the French /eu/ vowel sound, the vocal tract is almost tubular with an almost constant cross section due to the neutral position of the tongue. The frequencies are about 500 Hz for F1 and 1500 Hz for F2. On the same diagram we can see the position of F1 and F2 with the tongue in different positions. For the French /a/ vowel sound, the vocal tract can be modeled as a narrow tube for the pharynx and a larger tube for the oral cavity.

Formants can be modified by articulatory movements. In general, the frequencies of all formants decrease evenly as the length of the tube increases. The length of the vocal tract can be changed by lowering the larynx or by projecting or retracting the lips. Because these movements cause frequency sliding without changing the interval between formants, there is no change in vowel identification. The sound of vowels can be modified by rounding the lips to reduce the mouth opening. Horn players sometimes cover the ends of their instruments to achieve this effect. The acoustical effect of obturation is the same as that of lengthening the tube (i.e., frequency sliding to a lower register). By combining adjustment of the height of the larynx with the position and shape of the lips, it is possible to enhance or muffle the pitch of the voice. Singers use several techniques to adjust pitch. Some sopranos can lower pitch by dropping the jaw. Using this technique, F1 can be brought into contact with F0, and acoustical power can be increased. Contraction of the mouth lowers F1 and increases F2 to produce vowels with a wider spectrum (e.g., for the French /i/ vowel sound). Conversely, contracting the pharynx increases F1 and decreases F2 to produce a more compact vowel (e.g., for the French /a/ vowel sound).
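The effect of tube length on formants can be checked with the standard idealization of the neutral vocal tract as a uniform tube closed at the glottis and open at the lips, whose resonances are F_k = (2k − 1)·c/(4L). This is a simplified sketch: the tube lengths (17.5 cm and 19.5 cm) and the speed of sound (350 m/s) are assumed values, and the function name is illustrative.

```python
def uniform_tube_formants(length_m, n_formants=3, speed_of_sound=350.0):
    """Resonances (Hz) of a uniform tube closed at one end and open at the other.

    Quarter-wavelength model: F_k = (2k - 1) * c / (4 L), for k = 1, 2, ...
    """
    if length_m <= 0:
        raise ValueError("tube length must be positive")
    return [(2 * k - 1) * speed_of_sound / (4.0 * length_m)
            for k in range(1, n_formants + 1)]


# Assumed neutral tract length of 17.5 cm.
neutral = uniform_tube_formants(0.175)

# Assumed lengthening by 2 cm (lowered larynx and/or protruded lips).
lengthened = uniform_tube_formants(0.195)

print("neutral tube:   ", [round(f) for f in neutral])
print("lengthened tube:", [round(f) for f in lengthened])
print("F2/F1 ratio, neutral vs lengthened:",
      round(neutral[1] / neutral[0], 3), round(lengthened[1] / lengthened[0], 3))
```

With these assumptions the neutral tube gives resonances close to the 500 Hz and 1500 Hz quoted for the neutral vowel, and lengthening the tube lowers every formant by the same factor, so the ratio between formants is unchanged.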

Singers frequently talk about voice placement and claim that some vowels have exact locations in the vocal tract. The sensation that some vowels have precise locations could be related to the localization of pressure maxima of the standing waves in the vocal tract. Thus, there would be places where the sensations must be maximum (e.g., high pressure at the palate level for the French /i/ vowel sound, high pressure in the velar region for the French /u/ vowel sound, and high pressure in the pharyngeal region for the French /a/ vowel sound). It is likely that some singers are able to use this sensation to customize vowel production. Other singing techniques based on sensations such as singing in a mask, using the jaw as a resonator, or directing the note at the level of the palate just behind the upper incisors may also be related to pressure maxima at precise locations in the vocal tract.

1.8 Nervous System Control

Voice production depends on neuromotor coordination of all muscles involved in phonation, ranging from posture and respiratory muscles to the muscles of the larynx, pharynx, and buccolabial articulatory apparatus [34].

1.8.1 Sensory Innervation of the Larynx

Sensory innervation of the larynx is provided mainly by the SLN, which receives fibers from the laryngeal vestibule and the laryngeal margin. These fibers merge with the vagus nerve at the level of the inferior vagus ganglion. Innervation of the vocal fold and subglottic region is also provided by fibers that merge with the RLN. Contact-sensitive mucosal receptors (mechanoreceptors) induce the cough reflex when stimulated. They are mainly located at the vestibular level. In addition, intrinsic and extrinsic muscles present several types of articular and intramuscular mechanoreceptors (corpuscular, neuromuscular bundles, spiral) that supply nerve centers with proprioceptive information concerning vocal fold tension and elongation. The fibers enter the medulla oblongata with the vagus nerve and run in the direction of the nucleus of the solitary tract.

1.8.2 Nerve Centers

The brain areas responsible for motor control of the pharynx and larynx are located in the lower part of the ascending frontal convolution (or precentral gyrus) of both hemispheres. When all or parts of these areas are stimulated, an overall laryngeal response is observed with vocalization, inhibition of the posterior cricoarytenoid muscle, and bilateral activation of one or several adductor muscles. Cerebral damage in this area leads to unilateral paralysis.

There are many connections in the brain, particularly with language-related centers (e.g., the gyrus supramarginalis). The associative pathways between pharyngolaryngeal motor regions and cortical and subcortical auditory zones are especially noteworthy.

1.8.3 Reflex Control

Articulatory adjustment during phonation takes place during the prephonatory period and during sound production. Prephonatory adjustment is independent of audiophonatory control. This explains how singers produce sounds at a predetermined pitch and intensity. Prephonatory regulation in the cortex depends on input supplied by laryngeal mechanoreceptors concerning tension and position of the various muscles and articulations. During phonation, this input allows the adjustments necessary to maintain the glottic configuration to be made instantaneously. It is likely that other reflex arcs involving the abdomen, thorax, neck, and tongue, among others, provide the feedback needed for continuous adjustment of the larynx during phonation.

1.8.4 Audiophonatory Control

Auditory feedback is a necessary component of voice control. This is demonstrated by the disordered, unmodulated voice produced by people with congenital deafness. Audiophonatory control probably depends on voluntary commands produced by corticobulbar pathways in response to acoustic input arriving in the auditory cortex as well as a range of acousticolaryngeal reflexes. However, these control mechanisms act in synergy with proprioceptive control, allowing prephonatory tuning.

In people who become deaf, proprioceptive input explains why the voice remains almost normal during the first months after the onset of deafness.