
When I was in my 20s, a series of lucky breaks resulted in my becoming a hired guitar-slinger in the Chicago music scene. I played in an opening band for blues legend B.B. King, and then found myself playing with other great blues musicians—Koko Taylor, Buddy Guy, and Bo Diddley, to name a few. For a time, Bo Diddley hired me whenever he played in Chicago.

Before my first gig with Bo, I spent a full week of intense preparation, learning and rehearsing his songs. I had hoped to rehearse with him before the gig, or at least discuss the songs I had practiced. I was young. But he only arrived at the venue minutes before the performance, and I met him as he walked on stage in front of a screaming audience. He scarcely noticed me and the other band members, and just plugged into his amp, launching into a loud, rhythmic riff on his trademark rectangular guitar. He never bothered to tell me what song we were playing, what chord changes were coming, what key we were in, or anything. But, as every blues and jazz musician knows, that is how it goes. Bo and the other greats I played with often worked this way. It was a hair-raising on-the-job education. These musicians never told me what was coming next, partly because they did not know themselves. They were masters of the art of improvisation.

To play music in the oral tradition—the oldest system of music learning—I had to fumble to find the chord we were playing. That usually told me the key. Sometimes I could assume a certain chord progression and scale, but not always. Then I had to watch the bandleader like a hawk for subtle cues—this tilt of the guitar means I solo, that slight bend of the knees means bring the dynamic down, this sudden jerk of the upper body means break, or stop. All forms of folk music performance are like this (e.g., Celtic, Polka, Cueca, Cajun, Klezmer). My friends who play classical Indian music and classical Arabic music also learn the melodies, rhythms, and emotional aspects by this apprentice method. The term “oral tradition” is shorthand for aural/oral/visual/kinesthetic tradition. We emulate and simulate masters, until very slowly we become masterful music practitioners ourselves. This is the ancient tradition of musical learning, and it is a sophisticated form of embodied cognition.

Of course, specializing in music is a relatively modern phenomenon, common in larger hierarchical societies. If a subset of a cooperative community can dedicate large parts of their day to practicing and performing music, then it might reveal something about the prosperity of the group. When Plato is building his utopia in The Republic, he argues that artists arise (with warriors) only after luxury and leisure have been introduced into the previously austere community. Contrary to this notion of music as the product of leisure, there is plenty of empirical evidence that music thrives in economically challenged communities—indeed some of the best music is part of the therapeutic response to hardship. Moreover, small-scale societies may not have a “professionalization” of music per se, but everyone hums, sings, dances, drums, and so on. Some songs are memorized age-old badges of community membership, but most spontaneous music is improvised.

Improvising, in music, is the act of composing and performing simultaneously. It is easy to experiment and play (especially vocally), and difficult to master (especially instrumentally). But it is also universal, and despite the powerful human impulse to plan and program, improvisation is integral to nearly every aspect of our lives. Improvising is a style of thinking generally. It investigates and helps us come to know the world not by theory but by a method of simulation—observing, listening, acting. I would argue, in fact, that it is the most fundamental form of human cognition, one that must have evolved long before deductive and inductive logic, when the first humans began developing the skills needed for their survival in an untamed environment.

In this essay, I want to explore the cognitive roots of music generally, from multiple perspectives, such as performance, composition, and audience listening. I want to do this through the explanatory lenses of development and evolution (ontogenetic and phylogenetic approaches). I will try to give a broad review of the available approaches to music, but this essay is not a review of the literature per se. I have a somewhat didactic agenda, because I think music making and music consuming represent primordial forms of enactive cognition that are still alive and well within the modern mind.

Music, especially when considered as “beautifully useless,” has posed certain challenges to evolutionary explanation. Traditional aesthetics did not imagine music along the lines of utility, but focused on its intrinsic rather than extrinsic value. Arthur Schopenhauer, for example, argued that music and art generally were to be exalted because they were among the few human projects that did not serve the will, or human craving (appetitive urges). To our contemporary ears this may sound overly romantic or idealist, but there is something true here too. Ultimately, the challenges of reconciling music and evolution are being met, and a rich melodious explanation is slowly emerging. It seems increasingly clear, for example, that our species probably sang before we spoke (Dunbar, 2002; Mithen, 2007; Schulkin & Raglan, 2014). But the origins, functions, and even the semantics of music do not submit to simple adaptationist just-so stories. Even Steven Pinker (1997) seemed flummoxed by the adaptive value of music and suggested that it may just be a happy accident or byproduct (“auditory cheesecake”) of more general cognitive adaptations (like language).

Darwin (1872) speculated that music originates in the vocalizations of male Homo sapiens trying to attract and woo females. Just as songbirds (Passeriformes) advertise fitness and initiate reproductive opportunities, human males, according to Darwin, adjusted their vocalizations to appeal to females, who would in turn choose the musical man over his competitors (sexual selection). From this function of sexual selection, all musical variations eventually emerged, including the songs of joy, sadness, anger, and the complex forms of virtuosity.

Herbert Spencer (1890) respected Darwin’s acumen generally, but thought him overly reductionistic on the topic of music. How, Spencer wonders, could romantic cooing generate the full range of musical expression? And bird song itself cannot be reduced to sexual selection either, since it seems equally implicated in territory defense. Spencer argues instead that music originates in the spontaneous overflow of energy that we feel under certain peak psychological experiences. I will call this the “blurt theory” of musical genesis. When we are overwhelmed with sexual attraction, or grief, or rage, we have a “tendency to superfluous expenditure in various forms of action—unusual vivacity of every kind, including vocal vivacity” (Spencer, 1890, p. 4). Music starts as a purposeless byproduct of extreme emotion. But as we evolve, we give higher cognitive flavor to the essential emotional elements. Mozart, Beethoven, and others become musical geniuses, according to Spencer, when they achieve decorative sophistication (through harmony and counterpoint, for example) but retain the primordial emotional meanings. In the end, citing the deep immersion of indigenous musicians who do not experience the disinterested detachment of sheet music notation (which divides the literate musician’s attention), Spencer claims that music is the language of the emotions. More recent work in affective neuroscience, which I discuss below, appears to confirm some of Spencer’s general views (Asma & Gabriel, 2019; Damasio, 2000; Panksepp, 1998). But first, let us consider the dim evidence from deep time.

1 Archaeology

Obviously, music—especially vocal music—does not fossilize, and it does not leave physical traces. Ancient images of musical instruments and performances help us understand the social context and even the playing techniques of instruments (Both, 2009), but unlike stone tool technology, which leaves durable remains, music that predates pictorial representation and written forms is very hard to reconstruct. The oldest written music notation may be the ancient Mesopotamian “Hurrian Hymn,” a cuneiform tablature for lyre dating from around 1400 BCE. Chinese drums made of animal skins date back as far as 5000 BCE. And Upper Paleolithic bone flutes are around 40,000 years old. Vocal music and drumming on natural objects could be much older, possibly even pre-sapiens, or even pre-Homo. There is no reason to think that music emerged suddenly in the Upper Paleolithic along with flutes and pictorial representation.

Small-group, family-level Homo sapiens of the prehistoric age satisfied the necessities of life with simple technology based on hunting, gathering, and plowless agriculture. Their social and economic structure was likewise relatively simple and depended on the immediate task at hand rather than on status hierarchy. Yet we find evidence of aesthetic behavior far back in prehistory. More than 160 kya (thousand years ago), grindstones and pigments appear in the material culture of Homo sapiens. Around 70 kya we find remnants of ornamental beads, presumably for social purposes. In the Aurignacian cultural period (35–25 kya) of the Upper Paleolithic (40–11 kya) we find the famous instances of cave images (McBrearty & Brooks, 2000). Approximately 300 sites of Paleolithic parietal art of this period reveal representation in painting, engraving, sculpture, and jewelry, as well as fragments of bone, antler, and ivory with patterned markings, and notably, flutes made of vulture bones (Clottes, 1996, 2016; Vialou, 1996). While prehistoric art served many purposes, including mythic and aesthetic functions, the relation between art and experience in the evolution of social technologies seems to indicate the engendering and memorializing of spiritual emotions (Asma & Gabriel, 2019).

In southern Germany, at Hohle Fels cave, flute fragments were discovered that date to 42,000–43,000 years ago. There is debate as to whether Neanderthals had flute technology, or whether it is a strictly Homo sapiens instrument. A cave in Slovenia contained a “flute” made from a cave bear femur, dubbed the Divje Babe flute. It was thought to have been played by the Neanderthals who lived in that region, but in 2015 some archaeologists argued that the “diatonic scale” holes were merely accidents of hyenas chewing on the bones, punching serial hole patterns that we misinterpreted (Diedrich, 2015).

Ambiguity is very common in trying to assess prehistoric music technology. In England, three decorated cylinders made of local chalk, called the Folkton Drums, were found buried in a child’s grave from around 3000 BCE. Archaeologists cannot even agree if these are musical drums, or just drum-shaped decorative objects (Longworth, 1999).

An alternative approach to understanding prehistoric music, namely the cross-cultural comparative study of modern tribal, indigenous, or small-scale social groups, is fraught with challenges and arguable assumptions. How, we might ask, are modern Sioux tribes, African pygmy groups, Australian Aboriginals, Yupik tribes, and so on, making and using music? The subsistence lifestyles of such modern indigenous peoples resemble in some respects the hunter–gatherer conditions of the Upper Paleolithic period and the early agricultural conditions of the Holocene, but there is no guarantee that current practices recapitulate ancient ones. Still, good work has been done in this area (Killin, 2018; Morley, 2013), and some of the functional generalizations below rely in part on this comparative research. At the very least, such research gives us reliable insight into the possible uses of music.

The ephemeral nature of oral-tradition music is especially frustrating for historians and evolutionists because storytelling songs in particular act as the “cloud storage” for the wisdom and the adventures of a culture (Gioia, 2019). If you wanted to know about a people or even a specific person within a culture, you consulted the singer—who was also a kind of shaman. In the Old English epic poem Beowulf, for example, we learn that one of the greatest honors that can grace a hero (and one of his principal motivations) is to have songs sung about him—songs that will go on after he is long dead. The tradition of singing your tribe’s cultural achievements can also be seen, for example, in the African griot. The griot is a West African poet, singer, and musician who acts as a repository of local knowledge, an entertainer, and an adviser to power. The griot’s function was reconfigured in American music history in the form of the Blues and Gospel singers of the Jim Crow era.

Songs do not just contain information and historical legends; they also transmit values. They explicitly or implicitly teach norms and social mores having to do with sexuality, filial duty, war, and so on. From time immemorial, for example, songs have been about sex (Carpentier, 2014). Fertility festivals and seasons would have been driven in large part by songs, and even a recent study of Billboard top-ten chart songs revealed that 92% of the songs refer repeatedly to sex (reproductive phrases) (Gioia, 2019). When I toured with bluesman Buddy Guy, he used to shout encouragement to the band, “Let’s make it so funky they can smell it!” When asked to explain what funk music is, James Brown once said that it’s music that smells like sex.

I will not spend much time discussing song lyrics, even though they are obviously of paramount importance. This is because lyrics are a subset of storytelling behavior, and excellent scholarship has been done and is being done on the evolution of storytelling (Carroll, 2011; Gottschall, 2013). Instead I will tend to emphasize the uniquely sonic, melodic, and harmonic aspects of music, as they relate to experiences and functions.

2 Adaptation

If music is not auditory cheesecake, then it has been selected for because it gave some advantage to music producers and consumers. Our picture of the mechanisms of evolution has broadened since the decline of strict Neo-Darwinism in the twentieth century. Most importantly, many evolutionary researchers (Henrich, 2017; Richerson & Boyd, 2005) have argued persuasively for gene-culture coevolution. Cultural inheritance is profoundly important for human survival, and cultural traits can be selected and sustained over thousands of generations without necessarily being genetically significant. Behavioral changes (e.g., fire-starting, hunting technique) and even anatomical changes (e.g., training offspring for large leg-muscle mass, cultural dietary traditions) can be taught and passed down culturally to offspring, giving them significant survival advantages, even when there is no dedicated genetic circuitry for those specific traits.

We are certainly pre-adapted for music, as I will discuss below, because we have the cognitive and biological architecture for making it, but music is also a clear case of cultural inheritance. It can be advantageous for the individual organism as well as the group, so the levels of selection vary. The means by which music replicates also vary. It spreads horizontally across contemporaries in a community, as when a prehistoric campfire song or a pop-music radio hit catches fire among many people sharing the same real-time lifespans. But it also has vertical transmission, since elders teach youngsters songs, styles, and techniques, carrying on musical traditions for centuries. Rather than there being “genes for music,” it is probably the case that music learning is downstream from our cognitive capacities for social learning (Heyes, 2018).

So, what are the selectable uses and benefits of music? Music is a form of psychological catharsis and emotional management, a form of communication, a form of recreation, a form of social bonding, and a form of spiritual cultivation and communion.

There is increasing evidence that musical training improves our powers of attention (Medina & Barraza, 2019). The more musical training you have, the more you are able to shut out distracting irrelevant information while performing a demanding task. Musical training at a young age has also been shown to counteract the declines in hearing speech that are common as people grow older (Bidelman & Alain, 2015). The musical activities of youth constitute a kind of brain training that contributes to greater neural plasticity, and this remains advantageous much later in life. Children who play music also show greater abilities of emotional control, and diminished anxiety (Hudziak et al., 2014). People who can play a musical instrument also show greater connectivity between the hearing and motor control areas of their brains. Additionally, instruments that require two hands, like piano or guitar, create greater motor autonomy between the two hands generally (Palomar-García, Zatorre, Ventura-Campos, Bueichekú, & Ávila, 2017). Some studies have shown that people with musical training are better at hearing and identifying plaintive cries buried within noisy environments (e.g., a baby crying), suggesting adaptive social uses for musical sensitivity. Music has also been shown to provide powerful analgesic effects, reducing the need for post-operative painkillers (Bernatzky, Presch, Anderson, & Panksepp, 2011). There are many more empirical studies showing that music helps humans hear better, move better, think better, and feel better.

A recent cross-cultural study (Mehr et al., 2019) performed a big-data analysis of two massive data sets (the Natural History of Song Ethnography and Discography, NHS Ethnography and NHS Discography) and found that music is not only universally present but also has clear ties to specific behaviors, like religious rituals. Moreover, robust evidence reveals that all cultures employ music in healing behaviors, dance, and infant care.

Music is also a powerful way to bond independent individuals into a common collective. Music makes people into a “team” or “tribe.” Think about losing yourself in the undulating audience of a rock concert, feeling like you are one with everyone. Or consider the strange sense of allegiance one feels at the beginning of a sporting event, when everyone rises and sings the national anthem together. In part, music evolved to glue us together in social cooperative groups, because we are a highly dependent species and need each other for survival (Schulkin & Raglan, 2014). Such bonding starts in the intimate musicality between mother and baby (motherese or baby-talk), but then it is broadened by coordinated social rituals, creating opportunities for shared intentionality across larger groups.

Musical taste is both a way to distinguish ourselves from others, asserting our individuality, and a way of melting into a crowd and being a part of something bigger (e.g., fan cultures). These functions seem even more pronounced in our large-scale urban societies, where individuality and group membership are always being tested, challenged, and reasserted. The counter-culture movement of the 1960s and the early Rock and Roll era before it were self-conscious music cultures that employed music to signal rebellion, autonomy, group membership, and so on. After 9/11, music was part of many healing processes (e.g., John Adams composed a choral work entitled “On the Transmigration of Souls”), but mainstream music also grew more patriotic (e.g., Country-pop music), and countercultures and subcultures expressed their dissatisfaction with dominant narratives through music (e.g., a resurgence of punk music).

These are contemporary imaginative cultures that recreate age-old social uses of music. Long before higher cognition turns our emotions into principled philosophies, it is the work of simpler social and cultural “institutions” to shape and sculpt our feelings into adaptive resources. Anthropologist Polly Wiessner (2014) studies how informal institutions, like speech patterns, behavioral traditions, and rituals, shape emotion and cognition. Songs, for example, are important mechanisms for cultivating and directing adaptive emotions. The Enga people of New Guinea, for instance, solve ecological and political challenges in part by musical group manipulation. When a group of friends splits into two hostile factions, as sometimes happens during competition for resources, the newly opposing groups will rile up violence by singing songs that demonize their new enemies: songs that describe the opponent families as practicing incest, or describe the opponent women as having thorns in their vaginas, and other dehumanizing narratives. But after several months of warfare and a few casualties, the enemies grow weary and begin to sing peacemaking songs and songs of consolation. The new songs down-tune the anger and shift the emotional state to one of reconciliation. This leads to expressions of care, and then meals are shared between the groups. The songs and the meals pacify the rage and foster prosocial emotions and behaviors (University of California Television (UCTV), 2014).

Music also marks territory. Sometimes we are using music to say, “we are the people who sing these songs, not those other people.” Hagen and Bryant (2003) proposed that the evolution of human music and dance was rooted in coordinated auditory and visual territorial advertisements, like the sonic signals produced by other mammalian carnivores. Hominid proto-music, in essence, might have been functionally analogous to the howling of wolves (Hagen & Hammerstein, 2009).

Ecological psychology and social psychology are helpful ways of reframing the adaptive aspects of music and art generally. Music is an embodied and enactive form of knowing—coming to understand the environment (physical and social) as well as the self. Given the universal demands of social life (e.g., procreation, affiliation, dominance, cooperation), we can consider music and dance (which are almost inseparable) as ways of problem solving. I want to suggest that music and dance are ways of adapting and even thinking with your body (Asma, 2017; Tversky, 2019).

Like other animals, as Darwin suggested, we use dance and song to demonstrate our fitness to potential mates. Ritualized body movement and song are both a show of health in real time and a symbolic promise of health for future genetic investment (offspring) and nurturing. Music composition as a signal of potential fitness has been empirically tested on contemporary women, and attraction to complexity of composition changes across the menstrual cycle, in step with fertility (Charlton, 2014).

Moving the body and voice in sequential patterns can also communicate rich environmental information to members of your group. Even bees perform a waggle dance that informs other bees where to fly in order to find nectar, pollen, water, and other resources. The body can create a map, and even be a map, for other hive or tribe members that need to navigate space. And musical communication can indicate where allies and enemies are located, creating an adaptive sonic map before language.

In addition to these uses of music and dance, synchronized body movement and singing are especially pleasurable. As in other forms of social grooming, the body produces internal opioids (like endorphins) during ritualized rhythmic song and dance. In fact, music and dancing form a tour de force of neurochemical pleasures: serotonin, epinephrine, endorphins, and dopamine. Pain is blocked and euphoria is increased in the musical experience. Unsurprisingly, a team of researchers recently found that group dancing raises pain thresholds (a common proxy for endorphin release) and contributes to social bonding (Tarr, Launay, Cohen, & Dunbar, 2015).

Researchers Hagen and Bryant (2003) suggest that synchronized dance or ritualized body movement would have sent a very strong signal to competing groups: do not mess with us, because we are a unified and formidable group. Coordinated dance and song is a strong form of coalition signaling. Imagine two competing tribes facing each other down over some resource or territory. If one of them jumps, stamps, sings, and generally grooves together like a giant single organism, it signals to the other group that these guys are going to stick together in a fight. A small but coordinated group of warriors can do much more damage than a large but loose coalition. And the performance signal is high fidelity, because it cannot be faked. Groups can pretend or fake strength with bravado and unorganized shouts, but synchronized dance and vocalization actually demonstrate strength directly. Ritualized performance shows, rather than merely says, that the group is highly cooperative and has significant history together. Contemporary Samburu and Maasai of Kenya, along with many other tribal peoples around the world, still perform a synchronized warrior dance that is remarkably coordinated and intimidating for any potential enemy to witness. Indeed, modern militaries around the world still use forms of synchronization or entrainment as intimidation.

3 Structures

The component parts of music, or the structural elements, are worth considering when we think about the evolution of music. We can reverse engineer or anatomize any song, and find the distinct markings of earlier songs, traditions, and elements. Muddy Waters said: “The blues had a baby, and they named it rock and roll.” Inside most pop music we find the structures of blues—12-bar chord progressions, pentatonic scales, call-and-response, and so on.
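For readers who do not play, the 12-bar form can be spelled out concretely. The following is a minimal Python sketch, not drawn from the sources cited here, assuming the most common layout of the 12-bar blues (one chord per bar: I I I I, IV IV I I, V IV I I) and taking the minor pentatonic as the typical blues scale; the key of A and all of the names (`TWELVE_BAR_DEGREES`, `twelve_bar_blues`, `minor_pentatonic`) are illustrative choices. Real performances vary the pattern, for instance with a “quick change” to the IV chord in bar 2.

```python
# Sketch only: one common layout of the 12-bar blues, written as scale degrees.
# Bar by bar: I I I I | IV IV I I | V IV I I  (variants exist, e.g. the "quick change")
TWELVE_BAR_DEGREES = ["I", "I", "I", "I",
                      "IV", "IV", "I", "I",
                      "V", "IV", "I", "I"]

# Semitone distance of each chord root above the tonic (I = 0, IV = 5, V = 7).
DEGREE_OFFSETS = {"I": 0, "IV": 5, "V": 7}

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def chord_for(degree: str, key: str) -> str:
    """Return the root note name of a chord degree in the given key."""
    root = NOTE_NAMES.index(key)
    return NOTE_NAMES[(root + DEGREE_OFFSETS[degree]) % 12]


def twelve_bar_blues(key: str) -> list[str]:
    """Spell out the 12 bars as chord roots in the given key."""
    return [chord_for(degree, key) for degree in TWELVE_BAR_DEGREES]


def minor_pentatonic(key: str) -> list[str]:
    """Minor pentatonic scale: root, m3, P4, P5, m7 (0, 3, 5, 7, 10 semitones up)."""
    root = NOTE_NAMES.index(key)
    return [NOTE_NAMES[(root + step) % 12] for step in (0, 3, 5, 7, 10)]


if __name__ == "__main__":
    bars = twelve_bar_blues("A")
    for i in range(0, 12, 4):          # print four bars per line
        print(" | ".join(bars[i:i + 4]))
    print(minor_pentatonic("A"))
```

Run as a script, it prints the A progression as three lines of four bars (A | A | A | A, then D | D | A | A, then E | D | A | A), followed by the A minor pentatonic scale, ['A', 'C', 'D', 'E', 'G'].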

Of course, genres have clear structures: jazz relies heavily on the AABA structure of the American songbook, classical rondos alternate a recurring theme with contrasting episodes (e.g., ABACA), and songs can be dissected into common structural elements like verse, chorus, and bridge. But what are the cognitive structures underlying music as an evolved human activity?

Pitch-bending, for example, is a physiological and cognitive aspect of human singing that is hard to find among other animals. Some scholars think it is a crucial structural aspect of human music because it can be used for social coordination and emotional unification in a group of singers and listeners (Brown, 2000).

Alternatively, consider the phenomenon of rhythmic synchronization. Ethnomusicologist Ingrid Monson, interviewing jazz player Cecil McBee, describes the aesthetics of rhythm as a wave, and “you understand that particular pulse, where emphasis is placed on two and four … The moment you pick up the instrument and put it into motion you’re supposed to feel that, and then the other things kind of ride the wave” (Monson, 1996, p. 27). An emphasis on the off-beats has a well-known ability to inspire listeners into rhythmic participation. “An entire room of people clapping on 2 and 4 in a gospel service, for example, has the power to motivate all but the most resistant to clap along” (Monson, 1996, p. 27). This ability to sync to a beat and subdivide time inside a beat is called entrainment. Without entrainment, animals cannot synchronize tightly. Dancing is out of the question, as is most synchronized singing.

Rhythm is infectious for humans. If one person bangs a drum rhythmically, another person can move their feet, bob their head, or bow their trunk in time to the beat. Next, another person or group of people can start to synchronize the same motions to the same beat. Before you know it, you have a whole tribe in a groove together. This seems relatively simple to us because it is so universal in humans and requires little or no training. But other animals, even our closest cousins, fail miserably to get a decent groove going (Cook, Large, Hattori, Merchant, & Patel, 2014). There are rare exceptions—like Snowball the dancing cockatoo, and some chimps—but most animals seem incapable of synchronizing their own bodies to a beat, and coordinating multiple bodies to a pulse is nigh impossible (Honing, 2019).

Recursion is another deep structural component of music. Language is recursive in the sense that we can keep embedding one clause inside another and adding prepositional phrase onto prepositional phrase, and there is no upper boundary (syntactically) on this ability. One of the major questions in the evolution and philosophy of language is how humans arrived at our capacity for grammatical recursion and embedding. It is hard to see how social learning dispositions alone could give humans this syntactical ability, which is why Noam Chomsky postulated a hypothetical “language acquisition device” in the brain. My own view is that recursion is evidenced in other human cognitive abilities, like motor task grammar for complex behavioral sequences (Asma, 2017), and also in music.

Music has syntactic properties similar to those of language. Did these properties derive from language, or did they precede it? “Blue Moon” by Rodgers and Hart, or “Yesterday” by Lennon and McCartney, have AABA structure, which is the classic American song form: 8 bars of a chord pattern (A1), followed by a slightly modified repeat of that pattern for 8 bars (A2), then 8 bars of a contrasting bridge pattern (B), and finally a return to the original 8 bars (A3). Music is built on such repetition, and it is reasonable to suppose that even stone-age flute music had “parts” in repeated sequence, and parts embedded within other melodic passages (like dropping a “B” sequence or “clause” in between the recursive A patterning). As Steven Mithen (2007) puts it, “recursion is one of the most critical features of music” (p. 17).
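The AABA description can be read as a tiny grammar in which a section may itself contain sections. The Python sketch below is a hypothetical illustration rather than a model taken from the sources cited: the `Section` type and the `expand`, `total_bars`, and `aaba` functions are names invented here, and every leaf section is assumed to be 8 bars long, as in the classic form. Because `expand` calls itself on each part, the same function handles a plain 32-bar AABA chorus and a form in which a whole AABA “clause” is embedded where the bridge would be.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    """A labeled span of music: either a leaf with a bar count, or a container of sub-sections."""
    label: str
    bars: int = 0                                          # length of a leaf section, in bars
    parts: list["Section"] = field(default_factory=list)  # nested sections, if any


def expand(section: Section) -> list[tuple[str, int]]:
    """Recursively flatten a form into (label, bars) leaves, in playing order."""
    if not section.parts:
        return [(section.label, section.bars)]
    leaves: list[tuple[str, int]] = []
    for part in section.parts:
        leaves.extend(expand(part))    # recursion: a part may itself be a whole form
    return leaves


def total_bars(section: Section) -> int:
    """Sum the bar counts of all leaf sections."""
    return sum(bars for _, bars in expand(section))


def aaba(prefix: str = "", bars: int = 8) -> Section:
    """Classic song form A1 A2 B A3, each section `bars` long (8 assumed by default)."""
    return Section(prefix + "AABA", parts=[
        Section(prefix + "A1", bars),
        Section(prefix + "A2", bars),
        Section(prefix + "B", bars),
        Section(prefix + "A3", bars),
    ])


if __name__ == "__main__":
    chorus = aaba()
    print([label for label, _ in expand(chorus)])  # ['A1', 'A2', 'B', 'A3']
    print(total_bars(chorus))                      # 32

    # Embedding: a whole (shorter) AABA "clause" dropped in where the bridge would be.
    nested = Section("nested", parts=[
        Section("A1", 8), Section("A2", 8),
        aaba(prefix="bridge-", bars=2),
        Section("A3", 8),
    ])
    print(total_bars(nested))                      # 8 + 8 + 4*2 + 8 = 32
```

The nested example is deliberately artificial; the point is only that a form-building rule which may contain itself has no built-in limit on depth, which is the sense of recursion at issue here.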

Recursion, as a power of the human mind, probably owes as much to music and dance as to language and math. Dance is a foundational example of simulation and sequencing. The dancing body is another example of pre-linguistic “grammar” because it has infinite recursion and “step” embedding. A good dancer can subsume many subroutines inside a larger frame of movement repetitions. The basic Tango is a 5-step pattern (slow, slow, quick, quick, slow), using 8 musical counts. I cannot see cavemen doing the Tango, but they may have been doing something equally complicated. There may be something special and uniquely human about recursion and embedding, but we would be wrong to think this is only a feature of language.
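The arithmetic behind “5 steps, 8 counts” can be checked directly, assuming the common social-dance convention (not stated in the text) that a slow step takes two counts and a quick step one. This minimal Python sketch also illustrates the “subroutine” idea in its simplest form: the basic pattern is a reusable unit that a larger phrase repeats. The names `STEP_COUNTS`, `BASIC`, `counts`, and `phrase` are illustrative only.

```python
# Assumption: slow = 2 counts, quick = 1 count (a common convention for the social tango basic).
STEP_COUNTS = {"slow": 2, "quick": 1}

# The basic pattern from the text: slow, slow, quick, quick, slow.
BASIC = ["slow", "slow", "quick", "quick", "slow"]


def counts(pattern: list[str]) -> int:
    """Total musical counts taken by a sequence of steps."""
    return sum(STEP_COUNTS[step] for step in pattern)


def phrase(subroutine: list[str], repeats: int) -> list[str]:
    """Reuse a step pattern as a 'subroutine' repeated inside a larger phrase."""
    return subroutine * repeats


if __name__ == "__main__":
    print(len(BASIC), counts(BASIC))      # 5 8  (2 + 2 + 1 + 1 + 2)
    longer = phrase(BASIC, 4)
    print(len(longer), counts(longer))    # 20 32
```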

I agree with archaeologist Steven Mithen that a crucial feature of Homo sapiens’ mind is “cognitive fluidity” (Mithen, 2007). This fluidity breaks down the dedicated brain circuitry that ties one action-response to one stimulus. Our minds become less machine-like because we can entertain counterfactual images and enlist alternative responses. Most evolutionary psychologists and philosophers have assumed that the cause of this cognitive fluidity was the development of language (in the late Pleistocene), because language provides an obvious syntactical/grammatical system for manipulating representations. This system seemed to be the perfect girder network for expanding the inner headspace of flexible cognition. But more recently, Mithen has argued that another system, namely music, coevolved in parallel with language and gave pre-sapiens similar ways of projecting possible futures. I would push this insight one step further, suggesting that, more than just music per se, a suite of embodied creative abilities—dance, image, music, gesture, etc.—built up an inner space and behavioral space of options that freed Homo from the more deterministic patterns of other hominids. These creative improvisation skills emerged from earlier mammalian habits that manage resource exploitation and social cohesion, and they were emotionally (affectively) driven (e.g., grooming and play fighting). Play, for example, would be selected for because it allows mammals to take threats (and dominance) off-line and rehearse for them in safe environments. And such proto-imagination play is done largely through the body, without much cognitive motivation or even understanding. Music has the ability to represent things and feelings when the original stimuli are no longer present. Sad and mournful songs, for example, can put an otherwise happy audience, sitting around a campfire or in a concert hall, into a virtual reality of grief and loss (Taylor & Friedman, 2015). And like any good communication system, refined over eons, music can do this reliably.

While we are considering the "virtual reality" aspect of music, in which representations are taken offline or decoupled from perception, we need to acknowledge that memory is a crucial deep structure beneath music. A mother sings directly to a child, or a cantor chants a hymn, in real time, but music also plays in our heads later. Music heard in real time can be taken offline, so to speak, and replay occasionally or continuously (sometimes irritatingly so) inside our minds. Humming a melody in my head requires sophisticated memory. Eventually, a more complete explanation of the origin of music will need to account for its relationship to semantic, episodic, and procedural memory (Groussard et al., 2010). Along with navigational mapping abilities and flint-knapping sequences, social pressure on musical facility probably helped improve procedural memory, as our ancestors strove to remember and reproduce melodies and beats.

4 Neural Systems

Any discussion of the underlying structures of musical cognition needs to consider neuroscience and brain-imaging data. Mirror neurons, for example, may be the cognitive architecture of imitation, connecting our sensory representations of another agent's action to a motor representation of the same action. When I see a hand grasping, the sight matches an inner motor sense or feeling of my own hand grasping; these are "matching vertical associations" (Heyes, 2018). Observational learning requires converting visual or auditory patterns into bodily patterns (action and affect), and mirror neurons act as the requisite converters. When I hear certain sounds (e.g., lullabies), I feel soothing experiences (e.g., a mother's touch and a flood of oxytocin), and an adaptive association is forged that can be drawn upon for emotional regulation ever after. Mother–infant interaction, with its strong physiological, emotional, and even sonic synchronizing, may have helped create proto-music as early as H. heidelbergensis (Dissanayake, 2000, 2015).

Recent fMRI data support the view that music and speech are processed in different parts of the brain (Angulo-Perkins & Concha, 2019). Music-sensitive areas were found in the anterior and posterior superior temporal gyrus (planum polare and planum temporale), the right supplementary premotor areas, and the inferior frontal gyrus. Speech-sensitive areas, by contrast, were found in the left pars opercularis and the anterior portion of the middle temporal gyrus. It is not entirely clear what this means in evolutionary terms, but it lends some support to the theory of pre-linguistic or paralinguistic musical communication among pre-sapiens hominins.

Since the development of EEG technology in the 1920s, we have seen evidence that the brain has a Default Mode Network (DMN). This is the brain state we slip into once we stop attending to specific things or tasks in the external world. It consists of medial, or midline, brain regions, such as the medial prefrontal cortex (mPFC), the posterior cingulate cortex (PCC), and the hippocampus and amygdala (both in the medial temporal lobe). This system is active during wakeful rest: mind-wandering, daydreaming, introspection, and other undirected or low-attention states of mind. As a default system, it characterizes our goal-irrelevant frame of mind. It contrasts strongly with the Task Positive Network (TPN), which consists of more peripheral brain regions: the lateral prefrontal cortex (lPFC), the anterior cingulate cortex (ACC), the insula, and the somatosensory cortex. The TPN underwrites our focused attention and goal-directed activities, everything from concentrating on a chess game or analyzing a mechanical problem to baiting a fishhook or solving a math problem.

Some new fMRI evidence suggests that the DMN becomes more active during musical composition, which would make it an important system in creativity. During composition, the DMN is in strong communication with the anterior cingulate cortex (ACC). The ACC (usually more active in the task positive network) acts as an interface between the more rational, deliberative functions of the frontal brain and the emotive aspects of the limbic brain. The strong communication between these two areas during creative composition leads some researchers to speculate that the ACC might be giving the otherwise goal-less DMN some measure of focused intentionality. This may be the brain communication underlying the artist's active manipulation of daydream imagery and impressions, as opposed to passive mind-wandering. Drawing on a rich flow of potential images, sounds, impressions, and memories (dominant in the DMN), the creator harnesses them into organized narratives or compositions by attending to some rather than others, discerning implications, framing context, and embedding subsections (activities more dominant in the TPN). On this view, imaginative activity is a toggle between the decentered associational mind (i.e., L-state and DMN) and goal-directed intentionality (i.e., F-state and TPN). Even this complicated story is still too simple to capture what is happening in the brain.

Fascinating recent research on the improvising brain reveals some of the neural architecture underlying this toggle. Cognitive ethnomusicologist Aaron Berkowitz (2010) found that musicians improvising in an fMRI scanner engage the anterior cingulate cortex (ACC) to a significant extent. The ACC is one of the filters, or switching stations, where competing cognitive options and affective values are weighed and one is chosen. But Berkowitz worked primarily with classical musicians who branched out only slightly from established compositions in improvised cadenzas. A complementary study by Charles Limb and Allen Braun (2008) opened the experiment to different kinds of improvisers, notably jazz musicians and freestyle hip-hop rappers. Limb and colleagues (e.g., Donnay, Rankin, Lopez-Gonzalez, Jiradejvong, & Limb, 2014) have continued these experiments and begun a new wave of research into spontaneous creativity. The studies reveal that the lateral prefrontal cortex (lPFC) deactivates during improvisation, while activity in the medial prefrontal cortex increases. What does this mean?

It suggests that improvisation succeeds when we shut off our higher-order consciousness, particularly self-monitoring awareness, and let the default mode network (DMN) do its thing. The lPFC is one of the brain's "brakes" or censors, and the adept improviser is able to disengage the brake, so to speak, allowing usually filtered associations and behaviors to flow more freely. This state of decreased control is sometimes called "transient hypofrontality," meaning temporarily reduced activity in the frontal lobes. Removing the brake is the last step in the improviser's series of preparatory behaviors and habits that help her access the note patterns, rhymes, or free associations that characterize spontaneous creativity.

5 Emotion

Why do we like sad songs? Why do we become aggressive when listening to certain Eminem or Black Sabbath records? Why do we feel amorous when certain Marvin Gaye or Beyoncé songs play? Are there chords, melodies, and beats that similarly trigger universal human emotional systems?

Playing a tritone, an interval of three whole steps such as F and B or A and Eb, is dissonant and disturbing for most people, while the Ionian mode (the major scale) is universally heard as happy (Azib, 2017). The Beatles' song She Said She Said and the Rolling Stones' song Satisfaction are primarily in Mixolydian mode (i.e., successive intervals of whole, whole, half, whole, whole, half, whole). The Miles Davis piece So What and the Lady Gaga song Telephone are largely in Dorian mode (i.e., successive intervals of whole, half, whole, whole, whole, half, whole). These modes have distinctly different emotional qualities, but it is unclear whether they are heard universally with similar feeling-states.
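To make the interval patterns above concrete, here is a minimal Python sketch that builds a mode's notes from its whole-step/half-step pattern and checks whether two notes form a tritone (six semitones apart). It is illustrative only and rests on my own assumptions: the names `mode_notes`, `is_tritone`, and the pitch tables are hypothetical, output notes are spelled with sharps only, and the roots C, G, and D are arbitrary choices rather than analyses of the songs named. It models interval structure, not emotional quality.

```python
# Illustrative sketch: build modes from step patterns and test for tritones.
# All names here (W, H, PITCH_CLASS, mode_notes, is_tritone) are hypothetical.

W, H = 2, 1  # a whole step is 2 semitones, a half step is 1

# Step patterns (root up to the octave) as listed in the text.
IONIAN = [W, W, H, W, W, W, H]  # the major scale
MIXOLYDIAN = [W, W, H, W, W, H, W]
DORIAN = [W, H, W, W, W, H, W]

# Pitch classes 0-11; enharmonic spellings (e.g. Eb and D#) share a class.
PITCH_CLASS = {
    "C": 0, "C#": 1, "Db": 1, "D": 2, "D#": 3, "Eb": 3, "E": 4,
    "F": 5, "F#": 6, "Gb": 6, "G": 7, "G#": 8, "Ab": 8, "A": 9,
    "A#": 10, "Bb": 10, "B": 11,
}
# Output spelling: one name per pitch class, sharps only, for simplicity.
SPELLING = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def mode_notes(root, pattern):
    """Return the seven notes of the mode on `root` with the given step pattern."""
    if len(pattern) != 7 or sum(pattern) != 12:
        raise ValueError("expected seven steps spanning one octave (12 semitones)")
    pc = PITCH_CLASS[root]
    notes = [SPELLING[pc]]
    for step in pattern[:-1]:  # the last step only returns to the octave
        pc = (pc + step) % 12
        notes.append(SPELLING[pc])
    return notes


def is_tritone(a, b):
    """True if notes a and b are six semitones (three whole steps) apart."""
    # 6 is its own complement mod 12, so the direction does not matter.
    return (PITCH_CLASS[b] - PITCH_CLASS[a]) % 12 == 6


if __name__ == "__main__":
    print(mode_notes("C", IONIAN))      # ['C', 'D', 'E', 'F', 'G', 'A', 'B']
    print(mode_notes("G", MIXOLYDIAN))  # ['G', 'A', 'B', 'C', 'D', 'E', 'F']
    print(mode_notes("D", DORIAN))      # ['D', 'E', 'F', 'G', 'A', 'B', 'C']
    print(is_tritone("F", "B"), is_tritone("A", "Eb"))  # True True
```

With these roots, the three modes use the same seven natural notes. What changes is where the two half steps fall relative to the root: between scale degrees 3-4 and 7-8 in Ionian, 3-4 and 6-7 in Mixolydian, and 2-3 and 6-7 in Dorian.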

No discussion about the meaning of music would be complete without special consideration of the emotional aspects of musical communication. The relationship between music and emotion is obvious, phenomenologically speaking, but strangely opaque from the scientific perspective (Koelsch, 2015). This is largely because debate still rages around the question: What are emotions? Recent work by constructionists like Lisa Feldman Barrett (2017) has galvanized a cadre of skeptics who think there are no biological emotional systems that we inherit from our ancestors or share with our mammal cousins. My own view harmonizes with the opposing tradition, which argues that we have inherited a small number of primary emotions (affective systems) from our phylogenetic history. We share these affective systems with other mammals, and our neocortical conceptual sophistication transforms them into additional, uniquely human emotions.

Following Panksepp, I contend that all mammals share seven foundational affective systems: FEAR, LUST, CARE, PLAY, RAGE, SEEKING, and PANIC/GRIEF. Each of these has specific neural electrochemical pathways, with accompanying feeling states and behavior patterns. Human beings are not just an assembly of mental modules or even emotional circuits. The affective systems are hierarchically structured in three layers of interpenetrating brain activity: primary, secondary, and tertiary functions. Mindbrain processing is stacked like a layer cake. At the very bottom, or "core," are the instinctual drives, like fight or flight, and the raw motivations of intentional seeking. This primary-process layer is housed largely in subcortical areas of the brain.

Primary-process emotions are (1) sensory affects (sensorially triggered pleasant or unpleasant feelings), (2) homeostatic affects (hunger, thirst, etc., tracked via brain-body interoceptors), and (3) emotional affects (emotion action tendencies) (Panksepp, 2011). We share these primordial affective systems with all other vertebrates. This layer heavily influences the layer above it, secondary-process emotion, which is more developed in mammals.

Secondary processing includes social emotions, like GRIEF, PLAY, and CARE. It is distinguished from the primary level because it can be sculpted by learning and conditioning. It is the layer of soft-wiring (part native instinct and part learned association), as compared to the hard-wiring of primary-level emotion. Panksepp describes secondary-process mind in terms of (1) classical conditioning, (2) operant conditioning, and (3) emotional habits. Emotions in the primary and secondary layers are largely unconscious, and even when we are regulating them, we have no clear introspective access to their functioning (Winkielman & Berridge, 2004).

Last comes the top layer of the mindbrain: tertiary-process emotion. This is the layer of mind that most philosophers and psychologists tend to focus on exclusively. Here the emotions are still connected to the primary and secondary processes, but they are intertwined with the cognitive powers of the neocortex. Ruminations and thoughts, underwritten by language, symbols, executive control, and future planning, constitute the tertiary level, though they are energized by lower-level emotion. These ruminations and thoughts also serve as top-down regulators and directors of emotion. At this third level, we arrive at uniquely human emotions, like the elaborate and ephemeral feelings so beautifully articulated by introspective artistic savants such as Dostoyevsky, Lennon and McCartney, and Gershwin. Tertiary affects and neocortical awareness function as (1) cognitive executive functions, (2) emotional ruminations and regulations (generally located in the medial frontal neocortex), and (3) free will, or reflective intention to act (frontal cortical executive functions).

The biological and psychological sciences have historically isolated one layer of mind to the exclusion of others, and have thereby presented partial and sometimes conflicting pictures of mind and behavior. Many computationally oriented cognitive scientists focus on tertiary-level processing, while behaviorists focus on secondary-level processing. Rami Gabriel and I (2019) have argued that the seeming hostility between the emotional constructionists and the emotional naturalists is likewise more a matter of selective attention to the top, the middle, or the bottom of the emotional layer cake.

Some music-based emotion studies seem to confirm this view of the emotions, shedding light on the question of musical universalism. A fascinating 2015 study exposed 40 Canadians and 40 Congolese Pygmies to musical pieces (roughly equal numbers of Western and Pygmy songs) (Egermann, Fernando, Chuen, & McAdams, 2015). Participants' emotional responses were measured continuously, including subjective feeling reports, facial expressions, and physiological arousal. Physiological responses to the music were recorded with a respiration belt, a blood volume pulse monitor, an electrodermal monitor, and facial muscle detectors. Generally speaking, when Canadians and Pygmies evaluated the same pieces, their subjective reports about the meaning of the music, and even about their emotional feelings, were quite diverse. This suggests, to my mind, considerable flexibility and even idiosyncrasy at the cultural/cognitive level (secondary and tertiary emotions). The physiological measures of arousal, however, were very consistent across cultures. Faster tempos aroused both cultural groups in very similar ways, and certain pitches elicited similar physiological responses, suggesting that acoustical characteristics of music such as tempo, pitch, or timbre activate pre-cultural affective systems.

Another study (Fritz et al., 2009) showed that Mafa listeners in northern Cameroon, hearing Western music for the first time, successfully identified three basic emotions (happy, sad, and fearful) in the respective songs. The investigators interpreted this to mean that the expression of basic emotions in Western music can be recognized universally.

The means by which emotional systems may be triggered universally are unclear. Some sonic ingredients (e.g., tempo, volume) may act universally on perceptual and motor systems in ways that activate subcortical responses, much like universal startle effects. But emotional contagion is also clearly implicated, and such contagion is probably underwritten by the same mimetic simulation systems that spark empathy in social scenarios.

The view from the stage, so to speak, reveals musicians actively engaged in embodied cognition, but the view from the audience (and the dance floor, and the mosh pit, etc.) also reveals socially unifying practices that work through altered states of consciousness. Often these are spiritual experiences. In high school I followed the Grateful Dead around for a summer. With the other 20,000 audience members, I would drop some psychotropic substance in the afternoon and be high as a kite by the time Jerry Garcia started noodling guitar riffs in the evening. The Woodstock music festival of 1969 formed the template for this kind of collective, consciousness-altering concert tradition, but Woodstock was really just a safe, bourgeois version of the ancient Dionysian and Bacchanalian revels of Greek and Roman culture. Music has always played a part in cultural traditions of ecstasy. Ecstatic festivals use repetitive rhythms and melodies to break down individual egos and merge them into social super-organisms. Electronic dance music mosh pits are not far from a Sufi whirling dervish ritual, as individual egos are blown away in search of collective transcendence (Redfield & Thouin-Savard, 2017; Vroegh, 2019). The transcendence, however, is not a Cartesian separation from the body but a communal identification with other bodies.

6 Aesthetics

None of what I have discussed here is designed to reduce or eliminate the important work of traditional aesthetics. Music is useful, and adaptive, and therapeutic, but it is also (and will always be) "beautiful," and "sublime," and "inspiring," and even "sacred." Figuring out why Mozart is beautiful and Stravinsky seems sublime, or why music makes life meaningful, will not be captured entirely in the net of scientific explanation. We also need humanistic and philosophical reflection on music that explores its rich connotations and even its personal idiosyncrasies. Reflecting on emotional resonance and music, my colleague Madeline Cole tells a relatable story about such idiosyncratic connections:

Something as cheesy as Africa by Toto, or Copacabana by Barry Manilow, and even the musical Joseph and the Amazing Technicolor Dreamcoat have a powerful emotional effect on me, personally. I listened to classic rock in high school because it was the only radio station in my car that worked. Therefore, when I listen to it now, I feel emotions linked to the feelings I had as a teenager driving my own car for the first time. Something like Joseph, though very bizarre, is something my grandmother constantly had on. It’s debatable what I took away from it as a religious/moral lesson, but the songs still make me feel a level of comfort, like anyone would with their grandmother. Even songs that are, on a technical level, bad can invoke these positive feelings in us because of the conditioned associations.

Instrumental music in particular is a unique realm of non-representational or pre-representational meaning. The difficulty of expressing its meaning in propositional discourse is a necessary result of its origins in pre-linguistic embodiment. The meaning in music, I submit, is deeper and older than language.

As an evolutionary naturalist, I am not inclined to think that a Bach mass, an Arabic Salah, a Rastafarian song, or a gospel hymn is literally representing or presenting God per se. But such music is indeed the stuff of life, however you relate it to metaphysics. In his Twilight of the Idols, Nietzsche famously said: "Without music, life would be a mistake." Even an evolutionist like myself cannot disagree.

Thanks to my Research Assistant Madeline Cole for many productive discussions and research sources regarding music.