
Speech aerodynamics during alaryngeal speech differ substantially from those during laryngeal speech because the lower airway is separated from the upper airway. This chapter considers three factors that have the potential to affect alaryngeal speech aerodynamics. The first is the alteration of lower respiratory tract function after laryngectomy. The second set of factors stems from the alaryngeal voice source: the primary methods of alaryngeal voice and speech used after total laryngectomy, namely, esophageal (ES), tracheoesophageal (TE), and artificial larynx (AL) speech, have very distinct aerodynamic characteristics. Third, and finally, the aerodynamic changes associated with consonant production across the primary alaryngeal speech methods are reviewed.

Function of the Lower Airway in People with a Laryngectomy

Histological and Physiological Changes After Laryngectomy

A wide range of changes in the lower respiratory tract can be anticipated after total laryngectomy, at both histological and physiological levels. For many decades it has been known that separating the lower from the upper airway during total laryngectomy results in histological changes within the trachea that are indicative of chronic inflammation of the epithelium (Griffith & Friedberg, 1964; Rosso, Prgomet, Marjanović, Pušeljić, & Kraljik, 2015). Work by Hilgers and colleagues over many years has delineated the changes in the tracheal and lung environments induced by this disconnection between the upper and the lower airway (see Zuur, Muller, de Jongh, van Zandwijk, and Hilgers (2006) for a review; also see Bohnenkamp, Chap. 7, and Lewis, Chap. 8). Briefly, the epithelial irritation stems primarily from reduced warming and humidification of inspired air when breathing through an open tracheostoma. Reduced filtering of particles from the inspired air can also contribute to this tissue inflammation. Irritation and drying of the epithelium in turn result in increased mucus production (Rosso et al., 2015) and a diminished number and functioning of cilia in the tracheobronchial tree (Roessler, Grossenbacher, & Walt, 1988). If these changes are not managed with a heat and moisture exchanger (HME), an increase in bacterial infections and bronchial obstruction that worsens over time may be expected (Todisco, Maurizi, Paludetti, Dottorini, & Merante, 1984; van den Boer, van Harten, Hilgers, van den Brekel, & Retèl, 2014). These respiratory tract changes underlie a wide range of respiratory complaints that individuals with a laryngectomy self-report, such as excess phlegm and involuntary coughing (Hilgers, Ackerstaff, Aaronson, Schouwenburg, & Van Zandwijk, 1990).

Standard pulmonary measures obtained via spirometry have been used to quantify the physiological functioning of the respiratory system after total laryngectomy. Ackerstaff, Hilgers, Balm, and Van Zandwijk (1995) reported pulmonary data for 58 individuals after total laryngectomy (median 2.9 years postsurgery). Total lung capacity, maximum vital capacity, forced expiratory volume, peak expiratory flow, and maximum expiratory flow at 50% of vital capacity were reduced relative to predicted values. This finding of reduced lung function is consistent with other studies of standard lung function after total laryngectomy (Harris & Jonson, 1974; Todisco et al., 1984).

In a subsequent study, Ackerstaff, Hilgers, Meeuwis, Knegt, and Weenink (1999) also found that vital capacity and forced expiratory volumes were reduced when measured 9 days after laryngectomy and again 6 months postoperatively. The reduction in pulmonary function during this early period was not as great as the reduction Ackerstaff et al. (1995) reported when data were collected much later after surgery (median of 2.9 years postsurgery). Ackerstaff et al. (1999) therefore posited that respiratory changes after total laryngectomy may worsen as time from surgery increases. Finally, the upper airway, in particular the nose and nasopharynx, provides beneficial airway resistance that helps maintain high arterial oxygen saturation as well as total lung volume (McRae, Young, Hamilton, & Jones, 1996). Several studies have reported that using an HME increases tissue oxygen saturation levels (Ackerstaff et al., 2003; Jones et al., 2003; McRae et al., 1996). Overall, the literature supports the conclusion that there are substantial changes in the respiratory system after total laryngectomy.

Although the surgery itself may directly cause pulmonary changes after total laryngectomy, lung function may also be degraded before the procedure. Smoking is a well-documented primary risk factor for laryngeal cancer (Sadri, McMahon, & Parker, 2006; Wynder, Bross, & Day, 1956; Wynder & Stellman, 1977). Approximately 80–85% of people who require total laryngectomy are either former or current smokers (Achim et al., 2017; Goepfert et al., 2017). The percentage of people who continue to smoke after total laryngectomy has ranged widely across studies, from approximately 7% (Achim et al., 2017) to 30% (Goepfert et al., 2017). This wide range may reflect how long after surgery patients were queried. In a large German cohort study, Eichler et al. (2016) reported that approximately 22% of people continued to smoke immediately after surgery, dropping to 7.5% at 3 months and eventually to 3.8% at 3 years after surgery.

Regardless of whether a person was a former smoker or remains a smoker after total laryngectomy, the risk of respiratory disease is elevated. Chronic obstructive pulmonary disease (COPD) is known to be strongly associated with smoking (Centers for Disease Control and Prevention, 2017). Further, COPD has been identified as a common condition among those who have had a total laryngectomy, occurring in about 80% of people who undergo the procedure (Hess, Schwenk, Frank, & Loddenkemper, 1999; Togawa, Konno, & Hoshino, 1980).

Even without respiratory disease, lung function is known to decline from approximately the fourth decade of life, with a steeper decline in the seventh decade (Zeleznik, 2003). Overall, the changes in pulmonary function observed after total laryngectomy likely reflect the combined effects of smoking before surgery, any lung disease associated with smoking, advancing age, and the direct effects of surgically separating the lower and upper airways.

Impact of Lower Airway Changes on Alaryngeal Speech Aerodynamics

The impact that altered pulmonary function after total laryngectomy can have on an individual’s quality of life is well documented (Dassonville et al., 2011; Hilgers, Aaronson, Ackerstaff, Schouwenburg, & van Zandwijk, 1991; Parrilla et al., 2015). The issue considered here is whether altered pulmonary health and baseline pulmonary function directly or indirectly affect the aerodynamics of alaryngeal voice and speech production. Before the available literature on how lower airway function might directly affect the aerodynamics of ES, TE, and AL speech is discussed, a few indirect effects of poor pulmonary health are presented in the next section.

Indirect Impact of Pulmonary Status on Alaryngeal Speech

The indirect impacts that pulmonary disease might have on alaryngeal voice and speech concern the rehabilitation process broadly rather than alaryngeal speech aerodynamics directly. That is, the comorbidities of COPD, specifically fatigue, depression, and cognitive impairment, are of particular concern given that COPD is common in the total laryngectomy population (Hess et al., 1999). Fatigue has repeatedly been identified as a common complaint in people with COPD (Kentson et al., 2016; Stridsman, Mullerova, Skar, & Lindberg, 2013). In fact, fatigue has been described as the main extrapulmonary symptom of the disease (Antoniu & Ungureanu, 2015). In addition to possible fatigue associated with COPD, it is estimated that 40–90% of individuals with cancer who have been treated with chemoradiation experience cancer-related fatigue (CRF; Prue, Rankin, Allen, Gracey, & Cramp, 2006). CRF is a complex of symptoms distinct from the fatigue that someone without cancer experiences: it usually lasts longer, does not improve with rest, results in significant distress, and is unpredictable relative to activity level (Gerber, 2017; Jereczek-Fossa et al., 2007; Medysky, Temesi, Culos-Reed, & Millet, 2017).

If the fatigue from COPD, with or without CRF, is substantial enough, rehabilitation could be negatively affected because a person is less able or willing to attend sessions or complete scheduled therapeutic activities. Thus, adherence to rehabilitation recommendations and demands may be influenced by the altered respiratory state (see Bohnenkamp, Chap. 7, and Lewis, Chap. 8). Indirectly, then, the ability to learn and use any of the alaryngeal communication methods could be reduced by fatigue from COPD, CRF, or both. For example, a person who is only intermittently able to keep scheduled treatment sessions with their therapist, or who cannot practice the new alaryngeal communication mode at home, may show inconsistent, slow, or no progress in acquiring functional alaryngeal voice and speech. A range of other impacts is possible depending on the severity of the fatigue. In some cases, a person may not have the energy to perform daily care of the stoma or the TE prosthesis, or may lack the strength to maintain the arm, shoulder, and head positions needed to practice with an artificial larynx.

Depression occurs among those who have had a total laryngectomy at a higher rate than in the general population (Batioğlu-Karaaltin, Binbay, Yiǧit, & Dönmez, 2017; Perry, Casey, & Cotton, 2015). Several factors might cause depression in this patient population, including COPD (Lou et al., 2012; Ng, Niti, Fones, Yap, & Tan, 2009; Ng et al., 2007). Other factors associated with depression after laryngectomy include altered feelings regarding sex and sexuality (Batioğlu-Karaaltin et al., 2017), changes in physical appearance (Danker et al., 2010), and shifting family dynamics (Offerman, Pruyn, de Boer, Busschbach, & Baatenburg de Jong, 2015). Depression, regardless of cause, is known to reduce compliance with medical treatment regimens (DiMatteo, Lepper, & Croghan, 2000). A person who is depressed may have less motivation to attend treatment sessions, less energy to practice alaryngeal communication skills, and less desire to interact with others, thereby hindering the acquisition and refinement of any alaryngeal communication method.

Aspects of cognitive function are known to decline as a person ages, whether or not they have COPD. For example, age-related declines have been reported for attentional control, working memory, and cognitive processing speed (Edelstein, Pergolizzi, & Alici, 2016). A diagnosis of cancer appears to be associated with further risk of cognitive decline. Dubruille et al. (2015) reported that 46% of adults ≥65 years old who were diagnosed with cancer but had not started treatment demonstrated cognitive declines. Specific to people with head and neck cancer, Bond, Dietrich, and Murphy (2012) and Bond et al. (2016) reported neurocognitive impairment in 38–47% of patients prior to the start of cancer treatment. Furthermore, others have reported cognitive declines following chemoradiation treatments in people with head and neck cancer (e.g., Gan et al., 2011; Hsiao et al., 2010; Yuen et al., 2008). COPD is a further risk factor for cognitive decline to consider in a person with a laryngectomy. Individuals with COPD are now recognized as having a higher incidence of cognitive decline than their age-matched peers, regardless of other medical diagnoses (Yohannes, Chen, Moga, Leroi, & Connolly, 2017). Roncero et al. (2016) reported that 39% of 940 adults with COPD had cognitive impairment documented on standardized testing. A meta-analysis by Zhang et al. (2016) concluded that people with COPD have a higher risk of cognitive decline than those without COPD. Overall, these declines may not be the most debilitating aspect of a person’s cancer treatment, but a clinician should be vigilant for potential impacts on the therapeutic process. For example, diminished working memory and speed of information processing might require that instructions be given at a slower pace and that information be provided in several formats (e.g., verbal, written, pictorial).
Additional reminders may be needed to help the person complete practice at home, and assistance from others in the household might be needed to remember daily tasks such as charging a backup AL battery and replacing an HME filter. Of particular importance to the communication rehabilitation process are findings from Bond et al. (2012), who identified specific deficits in verbal learning and verbal memory in 99 head and neck cancer patients prior to treatment. In their follow-up study after the patients had undergone chemoradiation therapy (Bond et al., 2016), they reported that 13% had further declines in the language domains of verbal fluency and verb retrieval. It is not known how severe those deficits were, but a treating speech-language pathologist (SLP) should be mindful that communication deficits could extend beyond speech and voice. While language intervention may not take precedence over reestablishing alaryngeal voice and speech, language deficits could surface in communication exchanges, and reduced verbal memory could require adjustments to how instruction is delivered.

Direct Impacts of Pulmonary Function on Alaryngeal Speech

Esophageal Speech

Esophageal speech production does not use pulmonary air to initiate voicing. As such, there is little reason to expect that poor pulmonary health or function will directly alter ES voice and speech aerodynamics. However, Ackerstaff et al. (2003) provide some data suggesting that improved pulmonary function through the use of an HME can positively influence dimensions of voice across ES, TE, and AL speech. Specifically, they reported improvements in loudness, intelligibility, and fluency in 59 patients with a laryngectomy who wore an HME regularly during the study. Broad generalization to ES speech is tempered, though, by the small proportion of the study population that used this method of speech (3 of 59 participants) and by the lack of outcomes reported separately for each alaryngeal communication mode. Nevertheless, as a general finding, the Ackerstaff et al. (2003) results indicated improvements in various dimensions of voice for a heterogeneous group of alaryngeal speakers, several of whom used a voice source that is not directly dependent on lung function, i.e., the 3 ES and 12 AL participants.

DiCarlo, Amster, and Herer (1955) investigated speech breathing during ES speech using kinematic measures of chest wall movement. Compared with the laryngeal speaking participants, the ES participants showed reduced amplitude of rib cage and abdominal movement during speech, as well as shorter utterances. There was evidence that the ES participants judged to have better speaking skills coordinated inspiration through the stoma with their attempts to insufflate the esophagus, whereas those less adept at ES speech showed poorer coordination between these two events. It is difficult to draw definitive conclusions from this relationship between ES speech skill and coordination because details about the specific method of esophageal insufflation were not provided. A speculative interpretation is that those with more coordinated inspiration and insufflation were using the “inhalation method” to get air into the esophagus. This method relies on respiratory movements to decrease air pressure in the esophagus (see Doyle & Finchem, Chap. 10). This would suggest that the inhalation method is associated with better ES speech skill and would be consistent with the following statement from Gardner (1971) regarding this insufflation method: “The speaker feels this as a sensation of sucking in air, as we all do with breathing. He naturally will believe that the inhalation method is the most natural and the easiest way of moving air into the esophagus” (p. 43). However, evidence from Diedrich and Youngstrom (1966) indicated that superior ES speech skill is not dependent on use of the inhalation method.

Additional information about respiratory activity during ES speech is limited. Stepp, Heaton, and Hillman (2008) provide the only other investigation of direct relevance. They examined changes in speech breathing patterns over several months and years in ES, TE, and AL speakers. More specifically, they measured the percentage of speaking time that occurred during the inspiratory portion of the breathing cycle. Larger percentages suggest a dissociation between talking and breathing relative to people without a laryngectomy, for whom talking occurs almost exclusively on exhalation. One ES participant was included in Stepp et al. (2008), and that person’s data were collapsed with data from the larger TE group for statistical purposes. However, Stepp et al. did include a figure showing the ES speaker’s data recorded at 5, 11, and 15 months postlaryngectomy. At 5 months postsurgery, less than 5% of this person’s total speaking time occurred during inspiration. This increased to about 25% when the patient was seen at 10 months postsurgery and then decreased to 17% when last evaluated at the 15-month mark. Caution is warranted in overinterpreting the results from one person. However, if ES speakers show a pattern of increasing dissociation between speaking and breathing in the initial months after surgery, it will be important for researchers and clinicians to determine whether this dissociation affects ES speech proficiency. DiCarlo et al. (1955) raise the possibility that retaining coordination between talking and breathing may be important for good ES speech, but the empirical literature is silent on the matter.
At this point in time, there is insufficient evidence to advocate direct intervention to alter the relationship between breathing and ES talking, unless the SLP can systematically observe and document that such intervention improves communication for a given individual. When teaching the “inhalation method,” however, it does make logical sense to ensure respiratory-talking coordination, because this insufflation method relies on inspiratory movement to help draw air into the esophagus.
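The outcome measure used by Stepp et al. (2008), the percentage of total speaking time that falls within the inspiratory phase, can be illustrated with a short calculation. The sketch below is not their analysis procedure; it assumes, hypothetically, that speech segments and inspiratory phases have already been marked as (start, end) time intervals in seconds, for example from synchronized audio and respiratory kinematic recordings. The function names and the example intervals are illustrative only.

```python
from typing import List, Tuple

# A time interval in seconds: (start, end), with start < end.
Interval = Tuple[float, float]


def overlap(a: Interval, b: Interval) -> float:
    """Duration (s) shared by two intervals; 0.0 if they do not overlap."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def percent_speech_during_inspiration(
    speech: List[Interval], inspiration: List[Interval]
) -> float:
    """Percentage of total speaking time that falls inside inspiratory phases.

    Assumes (hypothetically) that intervals within each list do not overlap
    one another, so pairwise overlaps can simply be summed.
    """
    total_speech = sum(end - start for start, end in speech)
    if total_speech == 0:
        return 0.0  # no speech annotated, so no speaking time to apportion
    speech_in_insp = sum(overlap(s, i) for s in speech for i in inspiration)
    return 100.0 * speech_in_insp / total_speech


# Hypothetical annotations for one short recording.
speech = [(0.0, 4.0), (5.0, 9.0)]        # two utterances, 8 s of speech total
inspiration = [(3.0, 4.5), (8.5, 10.0)]  # two inspiratory phases
print(percent_speech_during_inspiration(speech, inspiration))  # prints 18.75
```

In this example, 1.5 s of the 8 s of speech overlaps inspiration. Summing pairwise overlaps is adequate only because the intervals within each list are assumed not to overlap each other; overlapping annotations would need to be merged first.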

Tracheoesophageal Speech

Pulmonary air provides the power supply for TE voice production. It is therefore logical to consider whether poor pulmonary health affects this method of alaryngeal voice production. One source of evidence that pulmonary status influences TE speech comes from outcome studies of HMEs, which are designed to improve pulmonary function. In Ackerstaff et al. (2003), for example, 75% of the participant pool were TE speakers. Not only did pulmonary symptoms improve after several months of HME use, but voice-related parameters also improved. Ackerstaff et al. noted that improvements in loudness, fluency, and intelligibility were most apparent when data from the TE participants were evaluated separately from those of the 3 ES and 12 AL speakers who were also part of the study. One direct inference from these data is that if pulmonary status improves in TE speakers, voice parameters are likely to improve as well. The authors attributed the improved TE voice function following several months of HME use to several factors: reduced mucus production, which could diminish a “bubbly” sounding voice; reduced mucus leading to less frequent obstruction of the prosthesis; increased humidity in the air diverted through the prosthesis, resulting in less drying of the esophageal mucosa; and better distribution of digital stoma occlusion pressure across the peristomal region, thereby placing less pressure on the voice prosthesis, pharynx, and tracheostoma.

The Ackerstaff et al. (2003) results are consistent with those reported by Dassonville et al. (2011), in which 25 TE speakers self-reported improvements in ease of TE voice production, intensity, and fluency after wearing an HME over a 3-month timeframe. Pulmonary function in terms of coughing, dyspnea, and forced expectoration also improved. The authors implied that improved baseline pulmonary function was the likely basis for the self-rated improvements in TE voice parameters.

Ward et al. (2007) reported respiratory kinematic data for TE speakers compared with participants who had not had a total laryngectomy. They found that the TE participants initiated speech at a higher percentage of vital capacity and terminated speech at a lower percentage of vital capacity than did laryngeal speakers. Bohnenkamp, Forrest, Klaben, and Stager (2011) reported rib cage and abdominal movements in TE speakers during spontaneous speech and reading that were comparable to those reported by Ward et al. (2007). Additionally, Bohnenkamp et al. demonstrated an increase in their TE speakers’ resting expiratory level (REL), which resulted in their continuing to speak into their functional residual capacity. Adults with respiratory compromise are known to consistently terminate speech below their REL (Lee, Loudon, Jacobson, & Stuebing, 1993). Increased lung volume at speech initiation, stopping speech below REL, and producing shorter utterances than laryngeal speakers suggest that TE speakers may expend increased respiratory effort to speak (Bohnenkamp, Forrest, Klaben, & Stager, 2012).

Finally, the Stepp et al. (2008) study cited above included two TE speakers tracked over several months and two others seen for a single evaluation of respiratory kinematics. The two tracked over several months demonstrated an increase in the percentage of their total speaking time that occurred during inhalation. For one TE speaker, about 5% of speaking time occurred during inhalation at 1.5 months postsurgery; at 33 months postsurgery, the percentage had increased to 11%. The second TE speaker was tracked from 4 to 12 months and showed an increase from approximately 15% to 33% in the share of speaking time that occurred during inhalation. An increase in the percentage of total speaking time occurring during inhalation is interpreted as increased dissociation between speaking and breathing. Overall, these studies suggest that the TE voice may improve as pulmonary function is improved by HME use. The lung volume levels at which TE speech is initiated and terminated also change, with speech extending below the REL. Finally, for TE speakers there may be a dissociation in the temporal relationship between breathing and speaking that increases with time postsurgery. That is, the percentage of total talking time that occurs during inspiration may increase the further the TE speaker is from the time of surgery.

Artificial Larynx Speech

Electrolaryngeal (EL) speech is not dependent on pulmonary air to produce voice. Therefore, there is little reason to expect that poor pulmonary function will affect EL speech. However, a few reports relevant to this topic exist in the literature. Two studies support the conclusion that EL speakers are likely to talk during the inspiratory portion of the respiratory cycle. Stepp et al. (2008) included nine EL speakers, three of whom were tracked over several months and six who were seen once, at least 12 months postsurgery. For those tracked over time, the amount of EL speaking time that occurred during inhalation increased by 27%, on average, from the assessment at 2–4 months after surgery to the assessment at 8–12 months after surgery. By comparison, the corresponding increase for TE/ES speakers in the same study was 12% (Stepp et al., 2008). For the six EL participants evaluated at a single time point, 33% of EL talking occurred during inhalation, compared to 19% for the TE/ES participants.

Bohnenkamp, Stowell, Hesse, and Wright (2010) recorded chest and abdominal movements along with speech in six EL speakers and obtained findings similar to those of Stepp et al. (2008). The EL speakers started to talk with the EL before peak inspiration occurred (i.e., during inspiration) for 61% of spontaneous utterances and 58% of utterances read from the Rainbow Passage. The findings from Stepp et al. and Bohnenkamp et al. support the conclusion that the relationship between EL speaking and the respiratory cycle is altered for a substantial portion of the time an EL speaker spends talking. Bohnenkamp et al. (2010) speculated that this “decoupling” of respiration and speech production, in which EL talking often occurs during inspiration, results from the fact that the respiratory system is not integral to EL speech production. However, the findings from Stepp et al. (2008) and Bohnenkamp et al. (2010) differ from those of Liu, Wan, Wang, and Niu (2004), who noted only 1 of 12 EL participants who spoke on inspiration during sentence and poem reading. Four others in the Liu et al. study were observed to hold their breath, and the remaining participants spoke during the expiratory phase of respiration. Of note, breath holding occurred in those who had used the EL the longest, and these individuals also received better ratings of EL voice acceptability. The authors implied that those who expired air while speaking would, or perhaps should, gravitate toward the breath-hold pattern over time. Overall, these three studies offer varied data, but one consistent message is that as time from surgery increases, EL talking becomes increasingly decoupled from the respiratory cycle. The studies differ in the direction of this dissociation: two of them (Bohnenkamp et al., 2010; Stepp et al., 2008) indicate an increased percentage of EL talk time during inspiration, whereas Liu et al. (2004) reported that EL talking occurred during breath holding in speakers who were further out from surgery. From a practical standpoint, breath holding during EL talking imposes a physiological limit on how long the person can talk before needing to stop for breath. Although Liu et al. (2004) appear to encourage breath holding as a positive goal for EL speakers, further investigation is warranted to determine its benefits and drawbacks. Anecdotal evidence suggests that breath holding is not necessary for functional or excellent EL use.

Beyond the temporal issues discussed above, Bohnenkamp et al. (2010) also reported various measures of lung volume and respiratory kinematics during EL communication. The lung volumes used by EL participants during speaking tasks were comparable to those reported for adults without a laryngectomy, that is, approximately 60% of vital capacity at speech onset and 40% at termination (Hixon, 1973; Hixon, Mead, & Goldman, 1976). The REL was elevated in EL speakers, and they consistently continued to speak into their functional residual capacity. Taken together, the lung volume data of Bohnenkamp et al. (2010) indicate that the respiratory system of the EL speaker is taxed more than that of speakers without a laryngectomy.
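The normative onset and termination values can be turned into an approximate volume excursion per breath group. This is a simple illustration only: the symbol ΔV is introduced here, and the 4000 mL vital capacity (VC) is a hypothetical round number, not a value reported by Bohnenkamp et al. (2010) or by Hixon and colleagues.

```latex
\begin{aligned}
\Delta V_{\text{breath group}} &\approx (0.60 - 0.40)\,\mathrm{VC} = 0.20\,\mathrm{VC} \\
\text{e.g., } \mathrm{VC} = 4000\ \text{mL} &\;\Rightarrow\; \Delta V_{\text{breath group}} \approx 0.20 \times 4000\ \text{mL} = 800\ \text{mL}
\end{aligned}
```

Against this reference, the EL data described above were comparable in onset and termination volumes; what differed was the elevated REL, so that speech extended into the functional residual capacity.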

Studies to date indicate that, over time, EL speakers are increasingly likely to talk during the inspiratory portion of the respiratory cycle or during breath holding, suggesting a dissociation from the usual pattern of speaking on exhalation. Additionally, the results from Bohnenkamp et al. (2010) indicate that a person speaking with an EL may be stressing the respiratory system by talking further into the functional residual capacity.

In contrast to the electronic artificial larynx, the pneumatic artificial larynx requires a pulmonary air supply to create voice. It is, therefore, reasonable to speculate that altered pulmonary function after total laryngectomy might affect this form of alaryngeal speech. However, there are no available descriptions of how reduced or altered pulmonary function affects the use of a pneumatic artificial larynx.

Alaryngeal Voice Source Aerodynamics

The aerodynamics of alaryngeal voice production are altered by two major anatomical changes. The first is removal of the normal voice source, namely, the larynx and vocal folds. Alaryngeal voicing requires replacement of this vibratory source, and, depending on the method, the replacement voice source will affect the aerodynamics of sound production. The second anatomical change is the diversion of pulmonary air out of a stoma at the base of the neck in the midline. Pulmonary air cannot be used to initiate and sustain alaryngeal voice source vibration unless the airstream can be routed toward and through the replacement voice source. Because the vibratory source, the air supply, or both can differ across alaryngeal options, the aerodynamics of each method are described separately. As a basis for comparison to studies of alaryngeal speakers, Table 13.1 presents representative data on voice source aerodynamics for laryngeal speakers.

Table 13.1 Representative normative values for aerodynamic parameters involving the voice source in laryngeal speakers

Esophageal Voice

Esophageal voice is produced using the pharyngoesophageal segment (PES) as the vibratory source and air within the esophagus as the driving force that initiates and sustains vibration (see Doyle & Finchem, Chap. 10, for details about this process). Briefly, air from the upper vocal tract (mouth, nose, throat) is compressed or drawn into the esophagus and then returned in a controlled fashion to set the PES into vibration. One clear aerodynamic difference between esophageal and laryngeal voice production is the total volume of air potentially available to power vibration of the voice source. The esophagus can hold approximately 80 cc of air (Diedrich, 1968; Van den Berg & Moolenaar-Bijl, 1959), substantially less than the ~3000–5000 cc available in the lungs of adult men and women (Zemlin, 1997). Moreover, the amount of air actually injected or drawn into the esophagus per insufflation attempt is substantially less than the esophagus can hold. Stetson (1937) reported that about 3–5 cc of air was injected with each insufflation attempt during esophageal speech, while Snidecor and Isshiki (1965) reported values ranging from 5 to 16 cc. This reduced volume of air, both available and actually used for esophageal phonation, can affect parameters of ES speech production such as loudness, phrase length, syllables produced per esophageal insufflation, and pause time.
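A rough worked comparison, using only the volumes quoted above, puts the difference in scale in perspective. The symbols are introduced here for illustration; the pulmonary figure is the ~3000–5000 cc cited from Zemlin (1997), and the per-insufflation range spans the Stetson (1937) and Snidecor and Isshiki (1965) values.

```latex
\begin{aligned}
\frac{V_{\text{esoph}}}{V_{\text{lungs}}} &= \frac{80\ \text{cc}}{3000\text{ to }5000\ \text{cc}} \approx 0.016\text{ to }0.027 \\[4pt]
\frac{V_{\text{insuff}}}{V_{\text{esoph}}} &= \frac{3\text{ to }16\ \text{cc}}{80\ \text{cc}} \approx 0.04\text{ to }0.20
\end{aligned}
```

In other words, the esophageal reservoir holds roughly 1.6–2.7% of the pulmonary air volume, and a single insufflation uses only about 4–20% of even that reservoir.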

The use of the PES in ES speech also contributes to the aerodynamic changes reported for this form of alaryngeal communication. In laryngeal voice, air pressure beneath the vocal folds, referred to as subglottal air pressure, must be generated from the lungs to a magnitude that is sufficient to initiate and then sustain vocal fold vibration. The parallel to subglottal air pressure in ES speech is esophageal air pressure, i.e., sub-PES pressure, sometimes called subneoglottal air pressure. A summary of aerodynamic data related to the esophageal voice source is provided in Table 13.2.

Table 13.2 Values for aerodynamic parameters involving the voice source in esophageal speakers

Based on existing data, sub-PES pressure in esophageal voice is higher than subglottal pressure in laryngeal voice in two of the three available studies. Both Damsté (1958) and Ng (2011) reported pressure values ranging from approximately 10 to 70 cmH2O, compared to 5–8 cmH2O in studies of laryngeal speakers. The elevated sub-PES pressure is attributed to the greater mass and resistance of the PES relative to the true vocal folds. Values from Schutte and Nieboer (2002), however, are much more consistent with laryngeal voice data. It is not clear whether this discrepancy with Damsté (1958) and Ng (2011) reflects sampling, methodological, or instrumentation differences. Schutte and Nieboer (2002) used a transnasally inserted pressure sensor that passed through the PES, and it is possible that the tube prevented complete PES closure; if so, the measured pressures may have been reduced. Taken together, the data suggest that sub-PES air pressure in ES speakers is generally elevated relative to subglottal air pressure in normal speakers.

In addition to the importance of air pressure to ES voice, the rate of airflow through the vibratory source during esophageal phonation is markedly reduced compared to laryngeal voice (see Tables 13.1 and 13.2 for comparative values). Laryngeal voice is generated with about 100–200 mL/s of airflow, while mean flow values for esophageal voice have ranged from 27 to 82 mL/s across studies (Isshiki & Snidecor, 1965; Motta, Galli, & Di Rienzo, 2001; Ng, 2011; Schutte & Nieboer, 2002). The presumed causes of this reduced trans-PES airflow are the increased mass and resistance of the PES combined with the limited volume of air available for esophageal phonation. Because voice source resistance is typically estimated as the ratio of driving pressure to airflow, elevated sub-PES pressure combined with limited trans-PES airflow yields the elevated PES resistance reported by Ng (2011). Voice source resistance values in that study were approximately half an order of magnitude (roughly threefold) higher than the values reported for laryngeal voice.
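Because resistance is a derived quantity, a short calculation shows how elevated pressure and reduced flow compound. The sketch below is a minimal illustration, assuming resistance is computed as driving pressure divided by airflow; the pressure and flow values are illustrative picks from within the ranges quoted above, not data from Ng (2011), and the helper name `source_resistance` is hypothetical.

```python
def source_resistance(pressure_cmh2o: float, flow_ml_s: float) -> float:
    """Voice source resistance in cmH2O/(L/s), computed as driving pressure / airflow.

    Hypothetical helper; assumes a simple pressure-to-flow ratio.
    """
    if flow_ml_s <= 0:
        raise ValueError("airflow must be positive")
    return pressure_cmh2o / (flow_ml_s / 1000.0)  # convert mL/s to L/s


# Illustrative values chosen from within the ranges quoted in the text
laryngeal = source_resistance(pressure_cmh2o=6.0, flow_ml_s=150.0)   # within 5-8 cmH2O, 100-200 mL/s
esophageal = source_resistance(pressure_cmh2o=10.0, flow_ml_s=80.0)  # within 10-70 cmH2O, 27-82 mL/s
ratio = esophageal / laryngeal
half_order = 10 ** 0.5  # "half an order of magnitude" is a factor of about 3.16

print(f"laryngeal:  {laryngeal:.0f} cmH2O/(L/s)")
print(f"esophageal: {esophageal:.0f} cmH2O/(L/s)")
print(f"ratio: {ratio:.2f} (half an order of magnitude = {half_order:.2f})")
```

Even at the low end of the reported ES pressure range, the ratio lands near a factor of three; pairing higher pressures with lower flows from the same ranges would push it considerably higher.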

Overall, the volume of air available for esophageal voice production is limited for each insufflation of the esophagus. However, individuals who are proficient at ES speech can consistently and rapidly reload the esophagus with small volumes of air to produce increasingly fluent speech. Additionally, the PES offers higher resistance to airflow than the vocal folds do, resulting in high sub-PES air pressure. As a result, learning and using ES often centers on producing voice with limited effort and tension, on the assumption that reduced effort makes it easier both to get air into the esophagus and to return that air to start and sustain PES vibration (Snidecor, 1969). Interestingly, Ng (2011) included only participants who had been selected for their “superior” ES speech skill. High air pressure below the PES, restricted airflow through the PES, and high voice source resistance were characteristic of these superior speakers. This suggests that lower pressures and resistance, and increased airflow, are not prerequisites for good ES.
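The constraint behind this frequent reloading can be put in rough numbers using only values quoted earlier in this section. The sketch below assumes, for illustration, that all injected air passes through the PES at a constant mean flow with no other losses, so the resulting durations are upper-bound estimates rather than measurements; the helper `phonation_seconds` is hypothetical.

```python
ESOPHAGEAL_CAPACITY_CC = 80.0      # approximate esophageal capacity
LUNG_VOLUME_CC = (3000.0, 5000.0)  # lung volumes of adult men and women, as quoted
INSUFFLATION_CC = (3.0, 16.0)      # per-insufflation range across Stetson (1937) and Snidecor and Isshiki (1965)
ES_MEAN_FLOW_ML_S = (27.0, 82.0)   # range of mean esophageal airflow across studies


def phonation_seconds(volume_ml: float, flow_ml_s: float) -> float:
    """Longest time one air charge could sustain voicing at a constant flow (1 cc = 1 mL)."""
    return volume_ml / flow_ml_s


for lung in LUNG_VOLUME_CC:
    share = 100.0 * ESOPHAGEAL_CAPACITY_CC / lung
    print(f"esophageal capacity is {share:.1f}% of a {lung:.0f} cc lung volume")

# Pair the smallest charge with the fastest flow, and the largest charge with the slowest flow
shortest = phonation_seconds(min(INSUFFLATION_CC), max(ES_MEAN_FLOW_ML_S))
longest = phonation_seconds(max(INSUFFLATION_CC), min(ES_MEAN_FLOW_ML_S))
print(f"one insufflation could sustain roughly {shortest:.2f} to {longest:.2f} s of voicing")
```

Under these assumptions a single charge supports well under a second of voicing, which is consistent with the emphasis on rapid, repeated reloading in proficient ES speakers.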

Tracheoesophageal Voice

The PES serves as the voice source in TE speech, as it does for ES. However, the lungs serve as the air supply for TE speech (see Graville, Palmer, & Bolognone, Chap. 11). Briefly, air from the trachea is diverted through a one-way valved prosthesis placed in a fistula in the common wall between the trachea and esophagus. When the tracheostoma is sealed, pulmonary air is diverted into the esophagus; when air pressure is sufficient to overcome the resistance of the PES, vibration is initiated. Because the lungs serve as the air supply for TE speech, this alaryngeal mode does not operate under the same degree of air volume restriction present in ES speech. However, channeling pulmonary air into the esophagus through a small-diameter prosthesis introduces some airflow resistance: the cross-sectional area and length of the TE prosthesis, as well as the hinged valve within it, all resist airflow. As a result, airflows, pressures, and resistances may be altered in TE speech.

Table 13.3 summarizes the available literature on the aerodynamics of PES voicing during TE speech. Although studies have varied in the speech samples used, most of the data on sub-PES pressure in TE voice indicate mean values between 13 and 44 cmH2O. These pressures below the PES are higher than those below the vocal folds in laryngeal speech. Two studies allow a comparison of TE to ES speech. Schutte and Nieboer (2002) assessed 18 participants who used TE speech, 5 of whom also used ES; they also included eight other participants who used only ES speech. The TE participants, excluding those who used both TE and ES speech, used significantly higher sub-PES pressure than the ES participants when phonating sustained vowels and CV syllable trains. In the within-speaker comparison of the five laryngectomees who could use both TE and ES speech, two exhibited significantly higher pressure below the PES when using TE speech, while the other three showed no statistically significant difference between the two modes of alaryngeal voice. In contrast, Ng (2011) reported significantly higher sub-PES pressures for ES than for TE speech. At present, there is no clear evidence that one method requires higher voicing pressures than the other. Both TE and ES speech use higher sub-PES pressures than the subglottal pressures of laryngeal voicing. Additionally, as Schutte and Nieboer (2002) showed, TE speakers differ individually in the pressures below the PES that are needed for voicing.

Table 13.3 Values for aerodynamic parameters involving the voice source in tracheoesophageal speakers

With the exception of one study (Kotby, Hegazi, Kamal, Gamal El Dien, & Nassar, 2009), group mean values for average trans-PES flow rates in TE speech fall generally within the range of mean values for laryngeal speakers (see Table 13.1). Comparable flow rates in TE and laryngeal voice have most often been attributed to the use of the pulmonary airstream in both methods of voice production. When TE and ES voice aerodynamics are compared in Tables 13.2 and 13.3, the general trend is that trans-PES airflow rates in TE speakers are about twice those reported for individuals using ES speech. Again, two studies directly compared TE and ES participants using the same stimuli, procedures, and instrumentation: both Ng (2011) and Schutte and Nieboer (2002) reported significantly higher trans-PES flow for the TE group. Recall that Schutte and Nieboer (2002) also had five participants for whom they could make within-speaker comparisons across the two alaryngeal voicing methods; four of these five had significantly higher flows when using TE voice. Based on these data, a broad conclusion is that TE voice is characterized by higher trans-PES airflow than ES voice, although a given individual may not show this difference.

Resistance to airflow in TE voicing can occur at two levels: the PES and the TE voice prosthesis. Several studies evaluating the in vitro aerodynamic characteristics of various TE prostheses have been published (Belforte, Carello, Miani, & Staffieri, 1998; Chung, Patel, Ter Keurs, Van Lith Bijl, & Mahieu, 1998; Heaton & Parker, 1994; Hilgers, Cornelissen, & Balm, 1993; Miani et al., 1998; Smith, 1986; Weinberg & Moon, 1982, 1984, 1986). These are reviewed here only in summary fashion. Together, these studies have established that the aerodynamic characteristics of a TE prosthesis vary with a number of parameters, such as prosthesis diameter, prosthesis length, position of the valve along the length of the prosthesis, type of valve, and the flow rate used for testing. TE voice prostheses can be selected that have been specifically designed to have greater or lesser resistance to valve opening, depending on the needs of a particular patient (see Graville, Palmer & Bolognone, Chap. 11 and Knott, Chap. 12). What is clear is that the prosthesis itself offers higher resistance to airflow than does the normal open glottis. It is also important to note that the resistance of a given prosthesis to airflow is likely to change over its lifetime when used in vivo. In vitro studies have generally concluded that biofilm development increases the prosthesis’ resistance to airflow (Chung et al., 1998; Heaton & Parker, 1994; Heaton, Sanderson, Dunsmore, & Parker, 1996; Zijlstra, Mahieu, van Lith-Bijl, & Schutte, 1991). In contrast, Schwandt, Tjong-Ayong, van Weissenbruch, van der Mei, and Albers (2006) evaluated prosthesis performance in vivo, comparing new prostheses with dysfunctional ones affected by biofilm, and reported that biofilm development on the prosthesis reduced airflow resistance. This might occur because biofilm alters the structural properties of the valve or changes the valve’s opening and closing movements.
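The idea that resistance arises at two levels can be sketched as a lumped model in which the prosthesis and the PES act as resistances in series, so the tracheal pressure needed for a given flow depends on their sum. This is a minimal illustration, assuming linear (flow-independent) resistances; the in vitro studies cited above show that real prosthesis resistance varies with flow rate, and the class name `TEAirway` and all resistance values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TEAirway:
    """Hypothetical lumped model: prosthesis and PES resistances in series."""

    r_prosthesis: float  # cmH2O/(L/s), hypothetical value
    r_pes: float         # cmH2O/(L/s), hypothetical value

    def total_resistance(self) -> float:
        # Resistances in series add
        return self.r_prosthesis + self.r_pes

    def tracheal_pressure(self, flow_ml_s: float) -> float:
        """Pressure (cmH2O) needed to drive the given flow through both resistances."""
        return (flow_ml_s / 1000.0) * self.total_resistance()

    def prosthesis_share(self) -> float:
        """Fraction of the total pressure drop that occurs across the prosthesis."""
        return self.r_prosthesis / self.total_resistance()


fresh = TEAirway(r_prosthesis=40.0, r_pes=160.0)
# If biofilm raised prosthesis resistance (the in vitro finding), more pressure is needed for the same flow
colonized = TEAirway(r_prosthesis=80.0, r_pes=160.0)

for label, airway in (("fresh", fresh), ("colonized", colonized)):
    print(f"{label}: {airway.tracheal_pressure(150.0):.1f} cmH2O at 150 mL/s, "
          f"prosthesis share {airway.prosthesis_share():.2f}")
```

The in vivo finding of Schwandt et al. (2006) would correspond, in this model, to a lower prosthesis resistance and therefore a lower required tracheal pressure at the same flow.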

In summary, investigations of TE voice source aerodynamics indicate that air pressure below the PES is greater than the subglottal pressure associated with laryngeal voice. The data are not clear, however, about whether pressures below the PES should be expected to be higher in TE than in ES speech. A number of variables are likely to influence the pressures in these two speaker groups, including speaker proficiency, the speech stimuli used, and the presence of PE tissue hypertonicity, among other factors. Average trans-PES airflow in TE speech is typically greater than that documented in ES speech and similar to transglottal airflow in laryngeal speech. Finally, resistance to airflow in TE voice production is increased compared to laryngeal speech, an increase that is likely related to elevated resistance associated with both the PES and the structural properties of the TE puncture voice prosthesis.

Artificial Larynx Voice

Electrolarynx (EL) voice is generated by exciting the static air within the upper vocal tract, either by transmitting vibration through the tissues of the neck or face or via a small-diameter tube placed within the oral cavity (see Nagle, Chap. 9). The EL is battery powered, and sound is generated by a small piston striking a plastic plate. As such, air pressures and airflows are not part of EL voice production as they are for ES and TE speech. However, the pneumatic artificial larynx (also referred to as the Tokyo device) operates on principles that parallel laryngeal and PES voice production: an air pressure differential must be established to drive airflow across a voice source capable of vibrating. The pneumatic artificial larynx uses a reed or flexible diaphragm (natural, plastic, metal, or rubber) as the voice source. This diaphragm is housed within a chamber external to the body. A tube running from the stoma to the chamber allows pulmonary air to serve as the driving force that sets the diaphragm into vibration, and a second tube runs from the chamber to the oral cavity to deliver the voice signal into the vocal tract. Over the last 10–15 years, there has been a resurgence of interest in pneumatic artificial larynges, as evidenced by increasing study of this alaryngeal method (Liao, 2016; Ng & Chu, 2009; Ng, Liu, Zhao, & Lam, 2009; Xu, Chen, Lu, & Qiao, 2009). However, these studies have focused almost exclusively on auditory-perceptual and acoustic parameters, to the exclusion of speech aerodynamics.

An emerging possibility related to the traditional pneumatic artificial larynx is a TE puncture voice prosthesis with a built-in sound-producing element. In this approach, a membrane that can be set into vibration is housed within the prosthesis; as such, the device might be described as a pneumatic artificial larynx. Early versions of the approach have been described by van der Torn, de Vries, Festen, Verdonck-de Leeuw, and Mahieu (2001) and van der Torn et al. (2006). Second-generation versions are described by Tack, Verkerke, van der Houwen, Mahieu, and Schutte (2006) and Tack, Rakhorst, van der Houwen, Mahieu, and Verkerke (2007). These second-generation devices have a double membrane lying within the body of a TE prosthesis, which is set into vibration when pulmonary air flows through the prosthesis. The device has been described as potentially beneficial for females who have had a laryngectomy, because the membrane-based voice-generating prosthesis can attain higher fundamental frequencies than are possible with PES vibration. A higher fundamental frequency may be more appropriate, acceptable, and desired for the female alaryngeal speaker. Additionally, the device could provide a pulmonary-driven sound source for laryngectomees whose hypotonic PES is not capable of vibrating. Tack et al. (2008) reported aerodynamic data for this kind of voice prosthesis for 17 females who had undergone total laryngectomy; all but 3 had a hypotonic or atonic PES, with correspondingly poor PES vibration and voice.

The voice-producing element was inserted into the lumen of a Groningen ultra-low resistance prosthesis, and tracheal pressure was measured (Tack et al., 2008). Tracheal pressure provides the driving pressure that sets the voice-producing element within the prosthesis into vibration. The tracheal pressures in these 17 females averaged 32 cmH2O during soft phonation attempts and 58 cmH2O during loud phonation attempts. These pressure values were comparable to the tracheal pressures measured in the same participants when wearing the TE valve without the voice-producing element. Airflow values were markedly lower with the voice-producing element inserted in the prosthesis, averaging 43 mL/s during soft phonation vs. 154 mL/s with the standard TE prosthesis, and 78 mL/s during loud phonation vs. 314 mL/s with the standard TE prosthesis. The fundamental frequency produced with the voice-generating prosthesis averaged 234 and 313 Hz for soft and loud phonation, respectively. These fundamental frequencies were notably higher than those obtained with the standard TE prosthesis, which were 66 and 87 Hz for loud and soft phonation. Overall, these frequency-based data appear promising for such a device as an improved postlaryngectomy voice source option, particularly for females.
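The Tack et al. (2008) group means quoted above also allow a rough estimate of how much the voice-producing element raised overall resistance. The sketch below divides tracheal pressure by airflow; because this pressure drives air through everything between the trachea and the atmosphere, the result is an effective resistance of the whole pathway, not of the element alone. Since the text reports tracheal pressures that were comparable with and without the element, the ratio of flows approximates the ratio of resistances. The function name `effective_resistance` is hypothetical.

```python
# Group means from Tack et al. (2008), as summarized in the text
TRACHEAL_PRESSURE = {"soft": 32.0, "loud": 58.0}  # cmH2O, voice-producing element inserted
FLOW_WITH_ELEMENT = {"soft": 43.0, "loud": 78.0}  # mL/s
FLOW_STANDARD = {"soft": 154.0, "loud": 314.0}    # mL/s, standard TE prosthesis


def effective_resistance(pressure_cmh2o: float, flow_ml_s: float) -> float:
    """Pressure/flow ratio in cmH2O/(L/s) for the whole pathway downstream of the trachea."""
    return pressure_cmh2o / (flow_ml_s / 1000.0)


for effort in ("soft", "loud"):
    r_element = effective_resistance(TRACHEAL_PRESSURE[effort], FLOW_WITH_ELEMENT[effort])
    # With comparable tracheal pressures, the flow ratio approximates the resistance ratio
    flow_ratio = FLOW_STANDARD[effort] / FLOW_WITH_ELEMENT[effort]
    print(f"{effort}: about {r_element:.0f} cmH2O/(L/s) with the element; "
          f"flow is {flow_ratio:.1f}x higher with the standard prosthesis")
```

Both effort levels give an effective resistance of roughly 744 cmH2O/(L/s) with the element, and under the stated assumption the flow ratios suggest a resistance roughly 3.6 to 4 times that of the standard prosthesis.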

Articulatory Aerodynamics in Alaryngeal Speech

The aerodynamics of articulatory events in alaryngeal speech could be altered by two factors: (1) the separation of the lower from the upper airway, which limits the air available for creating plosive bursts and frication, and (2) alterations in how the articulators are used after the larynx is removed. Empirical data on articulatory aerodynamics after total laryngectomy are limited for all of the alaryngeal methods of communication. As a general rule, and regardless of alaryngeal method, speakers are instructed to articulate more carefully to maximize intelligibility (Salmon, 1999; Searl & Reeves, 2014; van As & Fuller, 2014), while taking care not to make speech look or sound less natural. Such clinical instruction could reasonably be expected to alter the aerodynamics of articulation with any of the three alaryngeal communication methods. The sections below summarize the available literature on articulatory aerodynamics for ES, TE, and EL speech.

Esophageal Speech

Three of the four studies presented in Table 13.4 have reported elevated oral pressure values during consonant production in esophageal speech compared to expectations for normal speakers. The lone exception is Connor, Hamlet, and Joyce (1985), who reported oral pressure values for /t/ and /d/ that were generally within the 3–7 cmH2O range found in laryngeal speech. Two of the three studies that included both voiced and voiceless consonants found that individuals using ES speech produced lower oral air pressures on the voiced cognates, as in laryngeal speech (Connor et al., 1985; Gorham, Morris, Brown, & Huntley, 1996). Swisher (1980) was the exception, reporting comparable values for voiced and voiceless consonants. Overall, and despite some discrepancies, the preponderance of the data indicates that oral air pressure during pressure consonant production is elevated in ES compared to laryngeal speech. Additionally, as in normal speakers, an oral air pressure difference tends to be maintained between voiced and voiceless cognates.

Table 13.4 Articulatory aerodynamics by alaryngeal speech mode

There is some indication that oral air pressures may differ with ES speech proficiency. Motta et al. (2001) divided their ES participants into a group judged perceptually to be “good” and another judged as “mediocre.” The good speakers generated significantly lower oral pressure (mean = 40.5 cmH2O, SD = 5 cmH2O) than the mediocre group (mean = 57 cmH2O, SD = 16 cmH2O). It should be noted, however, that oral pressures for both groups were still substantially greater than those during laryngeal speech. Connor et al. (1985) compared oral pressure for /t/ and /d/ for one ES speaker with low intelligibility and another with high intelligibility. The high-intelligibility speaker demonstrated significantly lower oral air pressure. In contrast to Motta et al. (2001), however, both the low- and high-intelligibility participants in Connor et al. (1985) had mean pressure values well within the range expected for laryngeal speech.
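To make “substantially greater” concrete, the Motta et al. (2001) group means can be expressed as multiples of the 3–7 cmH2O laryngeal range cited earlier in this section. This is simple arithmetic on group means that ignores the reported standard deviations and is not a statistical comparison; the helper name `multiples_of_range` is hypothetical.

```python
LARYNGEAL_ORAL_PRESSURE = (3.0, 7.0)  # cmH2O, laryngeal range cited in the text


def multiples_of_range(mean_cmh2o: float, reference=LARYNGEAL_ORAL_PRESSURE) -> tuple:
    """Return (mean / upper bound, mean / lower bound) of the laryngeal reference range."""
    low, high = reference
    return mean_cmh2o / high, mean_cmh2o / low


# Group means reported by Motta et al. (2001)
for group, mean in (("good", 40.5), ("mediocre", 57.0)):
    vs_high, vs_low = multiples_of_range(mean)
    print(f"{group} ES speakers: {vs_high:.1f} to {vs_low:.1f} times the laryngeal range")
```

Even the “good” group mean sits nearly six times above the top of the laryngeal range.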

Additional aerodynamic studies of articulation in ES speech are not readily available in the literature. However, auditory-perceptual studies (Duguay, 1999) and cinefluoroscopic studies (Deidrich & Youngstrom, 1966; Struben & van Gelder, 1958) suggest that ES speakers are more likely to produce nasals with the velopharyngeal port closed, which would result in little or no nasal airflow on nasals. Deidrich and Youngstrom’s (1966) cinefluoroscopic data further revealed that participants judged perceptually to be “good ES” speakers had more complete palatal closure during a Valsalva maneuver than participants judged to be poor ES speakers. The authors suggested that poor palatal closure impedes acquisition of higher level ES speech abilities. In a review of clinical cases and the literature available at the time, Berlin (1964) reiterated that palatal weakness is associated with poor ES speech. However, aerodynamic data related to velopharyngeal function in ES have not been reported.

Tracheoesophageal Speech

Several studies have reported high oral air pressure on the bilabial stop /p/ produced by individuals using TE speech (Motta et al., 2001; Ng, 2011; Saito, Kinishi, & Amatsu, 2000; Searl, 2002, 2007; Searl & Evitts, 2004). In most of these studies, pressures ranged from approximately 15 to 40 cmH2O, values 3–8 times greater than expected in normal laryngeal speakers. Searl (2002) used an oral tube running in the buccogingival sulcus and around the last molar to measure oral air pressure on consonants other than /p/, namely /t, d, s, z, ʃ, ʒ/. The measured pressures for the voiced consonants ranged from 13 to 17 cmH2O, while pressures on the voiceless counterparts ranged from 6 to 8 cmH2O. Overall, oral pressures for this extended set of consonants were higher than in laryngeal speakers, although measures for voiceless consonants were much closer to normal laryngeal values.
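The “3–8 times” figure above implies a laryngeal reference pressure for /p/ that the text does not state explicitly. The minimal back-calculation below recovers that implied reference from the two ends of the range; it should be read only as a consistency check on the arithmetic, not as a normative value.

```python
TE_P_RANGE = (15.0, 40.0)  # cmH2O, oral pressure on /p/ in most TE studies
MULTIPLIERS = (3.0, 8.0)   # "3-8 times greater" than in laryngeal speakers

# Reference pressure implied by each end of the range
implied = [p / m for p, m in zip(TE_P_RANGE, MULTIPLIERS)]
print(f"implied laryngeal /p/ pressure: {implied[0]:.1f} and {implied[1]:.1f} cmH2O")
```

Both ends imply the same reference of about 5 cmH2O, which falls inside the 3–7 cmH2O laryngeal range quoted for ES consonants, so the multiplier is internally consistent.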

Oral pressures also have been recorded for the nasal phoneme /m/ produced by TE speakers in two studies. Searl (2007) found that pressures on /m/ were elevated to approximately 6 cmH2O, comparable to the pressures on the oral phoneme /b/ from the same speakers. In laryngeal speakers, oral pressure on /m/ is expected to be quite low (1 cmH2O or less) because the velopharyngeal port is open. These data were interpreted as indicating that the TE speakers in the study may have maintained a greater degree of velopharyngeal closure, resulting in elevated pressure on the nasal phoneme. However, in an earlier study, Searl and Evitts (2004) reported a group mean pressure of 1 cmH2O on /m/ in individuals using TE speech, equivalent to laryngeal speech and much lower than the pressures recorded with the same instrumentation from different TE speakers in Searl (2007). TE speakers may therefore vary in how they produce nasal phonemes.

The Searl and Evitts (2004) study is the sole report of nasal airflow in TE speech, with data acquired for the consonants /m/ and /p/. Nasal flow values for /m/ were at or above those reported for individuals without a laryngectomy, suggesting velopharyngeal opening by the TE speakers. Nasal airflow was essentially absent on the oral phoneme /p/, paralleling laryngeal speech. Thus, the lone study of nasal airflow during consonant production in TE speech, limited to /m/ and /p/, suggests that nasal airflow may not be substantially altered in TE speech.

Artificial Larynx Speech

There are no reports of articulatory aerodynamics for individuals using an artificial larynx in the peer-reviewed literature. Various textbooks refer to the need for individuals to compress air intraorally, usually in connection with instruction to use exaggerated or precise articulation. The presumed goal of this instruction is to generate a strong burst of air or frication noise. Clinical descriptions from those working with individuals using ALs also often mention this type of clinical focus (Doyle, 1994; Duguay, 1983). Quantitative measurements of how articulatory aerodynamics are altered, however, remain lacking.

Conclusions

Aerodynamic characteristics of ES, TE, and EL speech are strongly affected by the total laryngectomy procedure, which separates the lower from the upper airway and removes the normal voice source. Pulmonary function after total laryngectomy is expected to be altered because of changes induced by the surgery and by the body’s reaction when inspired air is no longer warmed, humidified, or filtered to the extent that it was presurgically. Additionally, baseline pulmonary function before surgery is likely to be reduced if the person was a long-time smoker. These pulmonary changes can have both direct and indirect effects on the aerodynamics of alaryngeal speech. The aerodynamics of the alaryngeal voice source vary across ES, TE, and EL methods. For ES and TE speech, elevated voice source driving pressures are expected, and resistance to airflow through the voice source is increased. Airflow is markedly reduced in ES but may be only minimally reduced in TE speech. Although EL voice does not depend on airflow, pneumatic artificial larynx voice does. Unfortunately, little is known about the aerodynamics of the pneumatic artificial larynges that have been on the market for many decades. Emerging work is examining a pneumatically driven voice source that can be inserted into a TE voice prosthesis. In terms of articulatory aerodynamics, elevated oral air pressure is commonly reported for ES and TE speakers; no data are available for EL speech. Furthermore, individuals using ES or TE speech tend to retain an oral air pressure difference between voiced and voiceless cognates. Aerodynamic data on other aspects of articulation in alaryngeal speech are very limited.
Continued collection of data on articulatory changes in ES, TE, and EL speech is important because that information can help researchers and clinicians understand which articulatory parameters change, how they differ from presurgical articulation, and how much variability to expect in an articulatory parameter within and across alaryngeal speakers. Additionally, more detailed information on articulatory changes is important for developing effective therapeutic approaches and for establishing reasonable treatment goals for alaryngeal speakers.