The human voice (those mysterious vibrations that come out of our mouths and enter our own and other people’s ears when we speak, sing, hum, cry, cough, or clear our throats) is commonly understood as a sound that represents the person who produces it. What appears on a physical level merely as oscillating air molecules is thereby interpreted as providing the listener with intimate information about the individual from whose mouth the voice emerges. Take as an example the everyday situation in which someone we do not know calls us and an image of the caller’s characteristics instantaneously appears before us, as if this image were propelled from the depth of the sounds reaching our ears. According to the popular science book The Human Voice: How This Extraordinary Instrument Reveals Essential Clues About Who We Are, which draws on the results of scientific research, the voice is indeed capable of betraying even those of the speaker’s personal characteristics that are normally kept from view:

[T]he moment we open our mouths and start to speak … our voice is doing something terrifyingly intimate—leaking information about our biological, psychological, and social status. Through it, our size, height, weight, physique, sex, age and occupation, often even sexual orientation, can be detected. The voice is a stethoscope, and transmits information not only about anatomical abnormalities but even illnesses. [1]

How are we to understand these instances of vocal self-expression, in which the voice appears to communicate the details of a person’s uniqueness “without even the mediation of articulate speech” [2]? Is it valid to conceptualize the voice as a “stethoscope,” that is, as an examination device that is capable of listening in to and transmitting to others “what we are, what we believe and how we feel” [3]? Or are there indications that the relationship between our voice and the biological, psychological, and social aspects of our subjectivity is more complex than we spontaneously assume, so that a detailed investigation of this relationship is necessary?

In this chapter, I will take a closer look at one aspect of those personal attributes we tend to perceive as if they were contained in and conveyed by the voice and critically examine the assumption that “voice has a sex” [4] or that it reflects a person’s gender [5]. Common sense easily persuades us that when it comes to something like our perception of male-female differences in voice pitch, we must be dealing with a natural phenomenon, something “caused” by basic anatomy and physiology. This understanding of the relationship between notions of voice, sex, and gender [Footnote 1] is often taken for granted both in everyday life and in the scientific literature, and it forms the basis for medical diagnosis and treatment of people who experience problems with the communication of gender.

In the following, I will subject this conventional perspective to close scrutiny by raising a series of questions: How do voices become gendered? Does the voice have a sex in the sense of a biologically determined attribute? Do speakers and listeners invest voices with gendered meanings in interaction by using their voice organs in particular ways and interpreting certain aspects of the sound they hear as either “male” or “female”? Or is the gendering of voices a transient outcome of meaning-making practices that are regulated by historically and culturally variable stories about bodies, sex, gender, and communication that are beyond individual control? I will conclude by suggesting that an appreciation of different theories about the relationship between voice and gender provides us with an opportunity to become aware of an unheard-of diversity of human bodies, identities, and voices and prompts us to reconsider how we habitually explain what we regard as the successful or unsuccessful communication of gender.

8.1 The Commonsense View/Medical Perspective

The commonsense view of sex differences of voices as “caused” by differences in physiology translates into medical practice. Many medical voice specialists, who may be laryngologists, phoniatricians, ENT surgeons or speech-language pathologists, and nonclinical scientists subscribe to a theoretical perspective according to which “the physical distinction between men and women dictates … that a speaker’s gender can be easily determined on the basis of voice” [6]. How is this claim of the biological determination of the voice’s gender explained in the scientific voice literature?

According to the medical voice literature, on which I will focus my examination, the voice originates from the “voice organ” in the speaker’s throat. The voice organ consists of the larynx or voice box, which houses the vocal folds, and of the vocal tract, which comprises the space above the larynx between the vocal folds and lips and includes the throat, mouth, and nasal cavities. Processes of sexual determination are seen as causing the “sexing” of the human body and of the voice organ as one of its components. Sexual determination is understood here as a complex double-tracked development, in which the presence of XX or XY sex chromosomes leads to the formation of female or male internal sex organs that are capable of producing female or male sex hormones, which are responsible for the shaping of female or male voice organs and voices during puberty. As Abitbol and Abitbol put it: “In the girl, estrogen and progesterone secretion will lead to a woman’s voice. In the boy, testosterone will yield a man’s voice” [7]. Importantly, the notions of a “woman’s” and a “man’s” voice are here understood to reflect the consonance of a biologically female (or male) body and a female (or male) gender identity (a person’s sense of being a man or a woman). According to the (mainstream) medical profession, gender identity is regarded as one of the results of the sexual differentiation of the brain and is therefore seen as following from sex. In other words, the link between XX chromosomes and female gender identity and between XY chromosomes and male gender identity is understood to be biologically determined [8].

The claim that the voice has a sex (and gender identity) is based on an understanding according to which the sexual characteristics (or size) of the voice organ determine the gender (or pitch) of the voice as an acoustical event: The bigger “male” voice organ is seen as naturally inclined to produce a lower-pitched “male” voice, whereas the smaller “female” voice organ produces a higher-pitched “female” voice. As Coleman explains:

Perceptions of a speaker’s vocal “pitch,” and subsequently the maleness or femaleness of his voice, … result from the combining of the information conveyed by both the speaker’s F0 [=fundamental frequency of vocal fold vibration] and resonances of his vocal tract. [9]

According to this perspective, it is the length and mass of the vocal folds and the dimensions of the vocal tract that are regarded as mainly responsible for the gendering of the voice as sound: As an effect of higher testosterone levels male vocal folds are longer, thicker, and heavier; provide more resistance to being blown apart; vibrate more slowly with a bigger amplitude; and produce lower-pitched sounds than the shorter, thinner, and lighter female vocal folds (see, for instance, [10]). Additionally, male adolescents experience a greater increase in vocal tract volume than females during puberty, which leads to a lowering of vocal tract resonance frequencies, also contributing to the perception of a lower pitch than in females (see, for instance, [11]).

Following the concept of the naturally sexed voice, the sexual characteristics of the voice organ, which are regarded as biologically determined, ensure that the voice is already gendered as it passes through the voice organ and before it emerges from the speaker’s mouth. As a consequence, both speakers and listeners are positioned as uninvolved in the gendering of the voice. For irrespective of the speaker’s vocal behavior and irrespective of the outcome of listeners’ perception and interpretation of what they hear, the fixed anatomical dimensions of larynx and vocal tract are taken to have already determined the voice’s gender.

8.2 Challenging the “Natural Binary”

The data used in the medical voice literature to provide evidence about the sex-specific anatomical dimensions of human voice organs are, as a rule, derived from cadaver studies, computed tomography, or acoustic reflection studies, in which people are asked to remain motionless so that accurate measurements can be taken. In the living human being, however, voice organs are flexible apparatuses that are mostly made up of pliable cartilages, muscles, connective tissues, and mucous membranes and only to a smaller extent of rigid bones. During voice production, we move these structures in order to produce particular speech sounds, pitch levels, and voice qualities. This is to say that, irrespective of whether the voice produces sound in the form of speaking, singing, or other utterances, it is necessarily a production that is shaped not so much by the fixed anatomical dimensions of the larynx and vocal tract as by the way the speaker or singer moves and shapes his/her voice organ.

When we gesture with our larynx, vocal folds, and vocal tract, we change both the dimensions of our voice organ and the characteristics of the sound waves emanating from our mouth. For instance, we can employ two antagonistic muscles in the larynx to modify the length and mass of the vocal folds, which leads to a change in the fundamental frequency of vocal fold vibration, which, in turn, is perceived by listeners as a change in voice pitch [Footnote 2]. As Baken and Orlikoff emphasize, “[s]peech is not usually monotonous: the normal speaker uses a range of fundamental frequencies to indicate word and sentence stress, statement form and affective content” [12].

The voice’s variability (as an organ and external object of audition) is even more obvious in singing. While average singing ranges for adults have been shown to span 2 to 3 octaves (see, for instance, [12, 13]), some singers are capable of extending their vocal range to up to 8 octaves and of reaching pitches far beyond what is considered “normal” for their voice type [Footnote 3]. These observations suggest that the limitations the anatomical dimensions of the larynx and vocal folds impose on the fundamental frequency range of the human voice can be regarded as negligible. If we consider additionally that the normal singing ranges for men and women overlap considerably [Footnote 4] and that the average speaking fundamental frequency of 160 Hz, which is used in clinical practice as a “gender-dividing line” [Footnote 5], lies well within both of those frequency ranges, the following becomes apparent: The difference in average speaking fundamental frequency, which is regarded in the medical literature as “[t]he most accepted difference between male and female voices” [14], seems to be the result of vocal behavior rather than of biological constraint. Additional evidence that this difference might be the effect of learned behaviors rather than of “biophysical inevitabilities” [15] comes from research showing that the average speaking fundamental frequency values for men and women and the extent of the observed gender difference vary between language groups and cultural contexts. Simpson reports, for instance, on the results of a study about a dialect of Chinese that found an average fundamental frequency of 170 Hz for male speakers and of 187 Hz for female speakers (a difference of 1.7 semitones [ST], with both averages notably above the “gender-dividing line”) and on studies that found male and female averages of 127 and 186 Hz, respectively, for English speakers (difference: 6.6 ST) and averages of 118 Hz for men and 207 Hz for women who spoke French (difference: 9.7 ST) [15].
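The semitone differences quoted from Simpson [15] follow from the ratio of the two average frequencies, since one octave (a doubling of frequency) comprises 12 semitones. The following minimal sketch recomputes them from the Hz values given above; the function and variable names are my own illustrative choices and do not stem from the cited studies.

```python
import math


def semitone_difference(lower_hz: float, higher_hz: float) -> float:
    """Interval between two frequencies in semitones (12 semitones per octave)."""
    return 12 * math.log2(higher_hz / lower_hz)


# Average speaking fundamental frequencies (male Hz, female Hz) as quoted in the text.
reported_averages = {
    "Chinese dialect": (170, 187),
    "English": (127, 186),
    "French": (118, 207),
}

# The 160 Hz "gender-dividing line" mentioned in the text.
GENDER_DIVIDING_LINE_HZ = 160

for language, (male_hz, female_hz) in reported_averages.items():
    difference_st = semitone_difference(male_hz, female_hz)
    both_above_line = min(male_hz, female_hz) > GENDER_DIVIDING_LINE_HZ
    print(f"{language}: {difference_st:.1f} ST, "
          f"both averages above the dividing line: {both_above_line}")
```

Run on these figures, the sketch reproduces the reported differences to one decimal place and confirms that only in the Chinese dialect sample do both group averages lie above the 160 Hz line.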

Just as we can actively adapt the length and mass of our vocal folds, we can also modify the dimensions of our vocal tract by moving our larynx up and down in our throat and stretching our lips widely or protruding them. These articulatory activities, which lead to a change in the resonance frequencies of the voice, have been observed, for instance, in preadolescent children who “learn elements of vocal tract gesturing in order to produce gender-typical voices within a short time of beginning to enunciate” [16]. Based on studies showing no differences in the anatomical dimensions of the voice organs of preadolescent girls and boys, we are thus led to assume that children can conform in their voice production to gender models they choose to imitate independent of anatomical possibilities or restrictions. As Delph-Janiurek remarks, these observations taken together point to “the lack of uniform, universal differences between the voices of women and men … [and] suggest that voices themselves are stylized and performed to a far greater degree than is commonly assumed” [16].
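How a change in vocal tract length shifts resonance frequencies can be illustrated with a deliberately simplified model that is not taken from the studies cited here: a uniform tube, closed at the vocal folds and open at the lips, whose resonances lie at odd multiples of c/4L. The speed of sound and the three tube lengths below are hypothetical values chosen only for illustration, and real vocal tracts are neither uniform nor rigid.

```python
# Assumed speed of sound in warm, humid air (illustrative value).
SPEED_OF_SOUND_M_PER_S = 350.0


def tube_resonances(length_m: float, count: int = 3) -> list[float]:
    """Resonances of a uniform tube closed at one end: F_n = (2n - 1) * c / (4 * L)."""
    return [
        (2 * n - 1) * SPEED_OF_SOUND_M_PER_S / (4 * length_m)
        for n in range(1, count + 1)
    ]


# Hypothetical lengths for one and the same speaker using different gestures.
gestures = {
    "larynx raised, lips spread": 0.16,
    "neutral posture": 0.17,
    "larynx lowered, lips protruded": 0.18,
}

for gesture, length_m in gestures.items():
    resonances_hz = [round(f) for f in tube_resonances(length_m)]
    print(f"{gesture} ({length_m:.2f} m): {resonances_hz} Hz")
```

Under these assumptions, lengthening the tube lowers every resonance and shortening it raises them, which is the direction of change the articulatory gestures described above would be expected to produce.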

Following the view that a voice’s gender is the result of a behavior or a doing rather than a person’s biological characteristic, not only the speaker or singer but also the listener is seen as actively involved in the gendering of the human voice. For if we take a closer look at the processes involved in listening, it becomes apparent that the voice is subject to continuous metamorphosis once it comes out of a person’s mouth. Rather than being equipped with a stable existence that could be measured and compared to normative values, the voice appears as a chameleon-like creature. What emerges from our mouths as a clutter of traveling sound waves is at first transformed from an acoustical to an auditory event when it produces a sensation in the listener’s ear. These auditory sensations are then put in order with the help of processes of perception: Irregular vibrations are discerned from regular vibrations, high-pitched sounds are distinguished from low-pitched sounds, and skilled listeners may differentiate various types of voice quality and speech melody. In further steps, we attach meanings to the perceptual categories we have created and might call pleasant sensations “sound”; unpleasant sensations “noise”; high-pitched, melodious, and gentle sounds “female”; and low-pitched, monotonous, and forceful sounds “male” [Footnote 6] or follow idiosyncratic interpretation processes.

Several accounts in the research literature indicate that listeners’ classifications of voices as female or male are not necessarily predictable from or in agreement with a speaker’s sex or gender identity. Hall reports, for instance, on a biologically male phone-sex operator who successfully poses as a female before his customers by imitating several aspects and versions of cultural stereotypes of vocal femininity [17]. Studies of listeners’ reactions to voice samples of male-to-female transsexual speakers (see, for instance, [18]), who are defined as presenting with a “male” voice organ and a “female” gender identity, and of female-to-male transsexual speakers (see, for instance, [19]), who are defined as presenting with a “female” voice organ and a “male” gender identity, provide further evidence that neither the anatomical dimensions of a person’s voice organ nor their gender identity determines whether a voice will be classified as female or male [Footnote 7]. Rather, these and other reports indicate that listeners may even have diverging understandings of what constitutes vocal masculinity or femininity and therefore don’t necessarily agree on how they attribute gender to voices (e.g., [20]).

What this suggests is that once the voice has left the confines of the voice organ, its meanings are no longer controllable by the speaker’s anatomy, identity, behavior, or intentions but are reconstructed by sensation, perception, and interpretation processes taking place in the listener, who may draw on conventional or unconventional understandings of gender. The voice’s gender is thus seen as “constituted in interaction” [21] between speaker and listener and appears to be a social accomplishment rather than a natural given. The results of these social doings may prove unproblematic (when speaker and listener agree in their gender attribution) or may call for strategies to repair the misunderstandings that occur when speaker and listener diverge in their constructions of the voice’s gender.

8.3 How Voices Become “Appropriately” and “Inappropriately” Gendered

Some theorists go even further in their challenge to the concept of the naturally sexed voice and argue that how the sex of our bodies is classified at birth, how we position our identities along gender lines, how we gesture with our voice organs, and how we interpret the sounds we hear are neither governed by biological forces nor under the conscious control of the individuals involved in a conversation but instead formed by stories (or “discourses”) about bodies, sex, gender, identity, and communication that are circulated among human beings (see, for instance, [16, 22]). In this view, voices become gendered as a result of meaning-making practices, which are seen as shaped by norms and expectations that are prevalent in the historical and cultural context in which interaction partners find themselves.

Such an understanding of voice and gender as “discursive products” [16] draws attention to the tendency in both common opinion and academic discourses to construct notions of sex, gender, and voice as if they each occurred in two and only two mutually exclusive versions, male or female. This is the case despite research findings showing that deviations from this model not only have been found in other time periods and cultural settings (see, for instance, [23]) but also appear as regular entries in contemporary international medical classification lists [24]. Under the heading “congenital malformations, deformations, and chromosomal abnormalities” one can find, for instance, that sex chromosomes in humans don’t come only in two but in several versions (such as X0, XXX, XXY, XYY sex chromosomes), that the adrenal glands of a person with XX chromosomes may produce higher amounts of testosterone than what is normal for a female, that the body of a person with XY chromosomes might not be receptive to the testosterone it produces, and that babies may be born with testes and a vagina or with a vagina and no uterus and ovaries. Moreover, the section entitled “gender identity disorders,” listed under “mental and behavioral disorders,” indicates that there are children, adolescents, and adults who don’t feel comfortable in the confines of the sex category that has been attributed to them at birth and who present with a gender identity that doesn’t follow from their biological sex (see, for instance, Bockting [25] for a list of varied self-descriptions of gender identity taken from a national survey of the US transgender population) [Footnote 8].

Accordingly, the possibilities of communication behavior, and the way human beings gesture with their variously shaped bodies in order to perform their diverse gender identities, are not restricted to the two patterns that are commonly taken for granted. While this diversity is excluded from consideration when the “normal” human voice is discussed, it is partly reflected in clinical terms invented to refer to voices that transgress the normative ranges of the biological male or female [Footnote 9]: People whose vocal folds deviate, due to “sexual hormone imbalances” [26], from the size that is considered “normal for their sex” are diagnosed with “androglottia” or “gynecoglottia” [27]. Adolescents who persist in producing a high-pitched voice despite the presence of a “normal male voice organ” are diagnosed with “puberphonia” [28], and in cases where, despite “unambiguous genotypical and phenotypical sex determination there is evidence of a mental sense of belonging to the other sex” [29], people are diagnosed with “gender dysphonia” [30], a voice disorder in which the voice’s sexual characteristics are at odds with the speaker’s gender identity.

The prevalent preconception of sex, gender, and voice as binary oppositions thus produces notions of “appropriately” gendered voices “that cohere with hegemonic, normative prescriptions of gender” [16] and of “inappropriately” gendered voices that deviate from the ideal of the unambiguously male or female voice. If we take a look at textbooks for voice clinicians, we can imagine that the tendency for people to fashion their voice production according to contemporary and local ideals of femininity or masculinity might be compelled by the threat of pathologization that looms as soon as deviations from these norms are detected: As a rule, a “disorder of sexual development” or “intersex condition” is attributed to “women with manly larynx … and men with womanly formed vocal chords and womanly voice production” [27]; a “problem of sexual identification” [31] is seen as causing puberphonia in male adolescents; and people who don’t identify with the sex that has been attributed to them at birth are diagnosed with “transsexualism” or “gender identity disorder,” which is regarded as an incurable mental health condition [32]. Another example demonstrating that societies ascribe great importance to communication practices that conform to gender ideals is the vocal coaching of male politicians, which aims at eliminating “effeminate” speech habits and encouraging unambiguously “masculine” forms of vocal self-presentation. As Delph-Janiurek (1999) reports, George Bush is a famous example of a politician who was asked to undergo training for voice masculinization.

According to Hirschauer [33], not only speakers but also listeners have an interest in contributing their share to the unproblematic communication of gender in interaction, for the correct detection and attribution of a speaker’s gender is considered an everyday competence of conversation partners, and addressing a speaker with the wrong title or pronoun is regarded as embarrassing [Footnote 10]. However, if we take a look at listening practices, it becomes apparent that we cannot hear “maleness” or “femaleness” when we listen to someone speak or sing but are merely capable of discerning different pitch levels, sound qualities, and speech melodies. It is only due to the commonsense expectation that human beings fall naturally into two mutually exclusive categories and that the voices of the members of one group sound unmistakably different from the voices of the members of the other group that most people habitually conflate the auditory perception of voice characteristics and attributions of a female or male body and identity and create unequivocal categories of “female” and “male” voices.

While scientific findings show no significant differences between the acoustical properties of the utterances of newborn girls and boys [4], voices are perceived as male or female right from the start of our lives, when people hear even a baby’s cries as sexed sounds and ask themselves “what does she need?” or “is he hungry again?” This normative arrangement of bodies, identities, and voices into two groups is repeated over and over again, for instance, by talking to or about children, adolescents, and adults with words that have sex-specific meanings (for instance, “boy,” “girl,” “he,” “she,” “Sir,” “Madam”) and by ticking one of the two gender boxes that are provided on official forms and documents, which are used to gather personal data. These classification practices reenact the medical sex attribution at birth and contribute to an ongoing affirmation of the expectation that gender identity always follows from sex and that both sex and gender occur in only two versions.

According to the discursive perspective, the gendering of voices is theorized as being the result of a habit of performance and interpretation that is suggested and reinforced by a cultural order that acknowledges only biological males and females as “normal” human beings. While this cultural order is produced and maintained by the communication practices of individuals—speakers and listeners alike—it is a system of rules that exceeds an individual lifetime and that is implemented in so many different forms and by so many different people that its effects are regarded as beyond individual control.

8.4 Conclusion

Both the theory of doing gender and the perspective that emphasizes the effects of discourses on the production of gender make important contributions to a reconsideration of how the relationship between voice and gender is traditionally conceptualized. These theories, along with empirical findings showing variation in both biology and performance, suggest a move away from the concept of the naturally sexed voice organ and voice toward an understanding of the communication of gender as being performed through complex meaning-making practices to which individual speakers and listeners contribute but which they cannot control. Revising the concept of the naturally sexed voice is not a mere intellectual bauble for academics; its value extends to the everyday experience of any human being who engages in communication and social interactions.

If, for instance, we took seriously the suggestion that we should understand voices as “auditory combinations of the physiological and the discursive” [16], we would no longer think that an individual speaker’s physical characteristics or behavior patterns or an individual listener’s perception skills can be held responsible for situations in which the communication of gender fails and a conversation partner is addressed as a member of a gender grouping to which he/she feels no belonging. Rather, we would think of such situations, in which the speaker’s and the listener’s contributions to the production of gender in interaction diverge, as the standard outcome of the complex and variable processes that take place when we talk to each other and try to make sense of who we are. If we further acknowledged that deviations from the model of the naturally sexed body and mind can and do occur at all levels of sexual determination and that the structures and processes that make up the various notions of the gendered voice don’t appear only in two kinds and don’t necessarily follow the models of the ideal female and male, we would make room for an unheard-of diversity of human bodies, identities, and behavior that demonstrates the multiform ways in which gender can be embodied and emphasizes the various meanings the notion of gender can assume.

I conclude by suggesting that instead of striving for speaking and listening practices that are oriented toward an alignment with the ideals of unambiguous maleness or femaleness, we would be better served by considering the following ideas raised in the foregoing discussion: A continued repetition of the myth of the naturally sexed voice leads not only to the consolidation of a narrow concept of sex and gender but also to a restriction of who we can be and become in human encounters. By learning to think, speak, and write differently about the relationship between voice and gender, however, we contribute to a change of meaning-making practices, which will facilitate the gradual replacement of the distinction between “appropriately” and “inappropriately” gendered voices with an understanding that the notion of “normality” is an ideal that cannot be embodied or secured by anybody.