1 Introduction

Philosophy of music has focused on the relationship between music and emotion as a principal issue since its beginnings in Ancient Greece (Cochrane et al. 2013). In recent years, this inquiry has been joined by psychologists and cognitive scientists, who have enriched the field with an impressive array of cross-disciplinary research and theory. This work has affirmed that the sphere of emotions is present in all fundamental aspects of musical experience (Juslin and Sloboda 2001; Zentner et al. 2008). But while the intimate connection between music and emotion is now widely accepted, the precise nature and meaning of this relationship remain a subject of controversy. As a result, discussions of musical emotions have adopted many forms, assumptions, and arguments (Thompson and Quinto 2011). Despite this diversity, however, the literature has been dominated by two main points of view, which attempt to understand musical emotions in terms of what have been referred to, respectively, as the internal and external locus problems (Schubert 2013).

The internal perspective investigates the ‘how’ of musical emotion. That is, it aims at providing an answer to the question of how music induces or causes emotions in listeners (Cochrane 2010a, b; Juslin and Sloboda 2010). The external perspective is mostly concerned with answering ‘where’ questions - e.g. do emotions belong to the music, the performer, the score, or the listener (Davies 2010; Juslin and Timmers 2010)? Put simply, the external ‘where’ problem is mainly associated with emotion perceived as ‘expressed by’, ‘possessed by’, ‘attributed to’ or being ‘located in’ the music itself (i.e. the score and/or performance); while the internal ‘how’ problem generally seeks to understand the causal sequences whereby musical stimuli act on body and brain mechanisms and thus generate emotions in listeners (Scherer and Coutinho 2013; Schubert 2007). These orientations, however, are not always mutually exclusive and sometimes inform each other in various ways to produce more refined approaches.

This juxtaposition of ‘external’ and ‘internal’ points of view has resulted in several influential frameworks (Fabian et al. 2014; Schubert 2013); and has provided important insights across a range of musically-relevant domains such as music therapy (Baker et al. 2007) and music performance (Scherer and Zentner 2001). There are reasons, however, to question whether relying on the external/internal dichotomy represents the best way to shed new light on music and emotions. Indeed, many of its underlying assumptions are increasingly challenged by new research that looks beyond such inner-outer schemas to explore emotion as an embodied phenomenon (Maiese 2011). Along these lines, researchers have increasingly drawn on the so-called ‘enactive’ approach to cognition (Varela et al. 1991) to investigate musical experience in more holistic ways. From this perspective, musical-emotional phenomena cannot be reduced to pre-given outer and inner structures, nor are they best understood in terms of sequential causal chains of events. Rather, the enactive approach understands both emotion and cognition to originate in the embodied activity that simultaneously emerges from and motivates the dynamic interactions between an organism and its environment (see Colombetti 2014; Reybrouck 2005, 2015).

In this paper we explore these concerns in more detail in an attempt to frame an alternative enactive approach to musical emotions. While various interpretations of embodiment and enactivism have been put forward, our perspective is more in line with the classical ‘autopoietic’ or ‘biological’ proposal that originates in the work of Varela et al. (1991), and that has been developed by Thompson (2007) and Colombetti (2014) among others. This said, our goal here is not to contrast this framework with similar accounts such as sensorimotor enactivism (O’Regan and Noë 2001a, b) or radical enactivism (Hutto and Myin 2013). Rather, we adopt conceptual tools and models (e.g. dynamic systems theory) that are shared among these perspectives in an attempt to develop the common orientation of these points of view in the context of musical emotion.

The paper is structured as follows. We begin by reviewing a number of influential theories on musical emotions, which are then critically assessed in terms of their problematic commitment to the above-mentioned inner-outer schemas. Here we consider how the pervasive (and often tacit) influence of the standard information-processing model of cognition supports such dualistic perspectives and downplays the importance of personal agency, embodied interactivity, and creative engagement that musical experiences involve (Krueger 2009; Reybrouck 2010; Schiavio 2014). Following this, we offer theoretical grounding for an enactive approach to musical emotions. To do this we develop a range of cross-disciplinary support, most notably drawing on developmental perspectives and related research in affective science and dynamic systems theory (Colombetti 2014). To conclude we consider in more detail how such insights might impact our understanding of musical emotions, offering possibilities for future research and practice.

2 Theoretical and historical background: an overview

Despite the advent of a very influential theory of emotions in the late 19th century - put forward concurrently by James (1890) and Lange (1887; see also Lang 1994 for discussion) - the study of emotions occupied only a secondary role in the subsequent history of psychology, regaining its importance only in the last few decades (see Damasio 1994; Plamper 2015). This may partially explain why the topic of ‘musical emotions’ has been confined traditionally to philosophical and musicological rather than to psychological discussions. However, since the publication of a seminal book by Juslin and Sloboda (2001) on music and emotion, the scientific interest in this domain has expanded greatly, resulting in a rapidly growing body of contributions that demonstrate the current significance of this challenging field (e.g. Clarke et al. 2010; Dibben 2004; Juslin and Västfjäll 2008). In spite of historical and methodological differences, however, philosophical and psychological perspectives on musical emotions coincide in a number of general assumptions. One of them is the distinction between two broad categories of investigation: the expression and recognition of musical emotions on the one hand, and the induction and elicitation of emotions on the other (Cochrane et al. 2013; Sloboda and Juslin 2001). As stated above, the former refers to the ‘where’ problem (exploring musical emotions as externally located), the latter to the ‘how’ problem (exploring musical emotions as internally located in the listeners).

2.1 The external locus problem: philosophical and psychological claims

It has often been stated that musical experience appears to involve an ‘emotional message’ that is somehow communicated through the musical sounds, even when the music does not include any lyrics (e.g. Juslin and Laukka 2003). However, this assumption entails a kind of paradox. Music, indeed, is not a sentient being, which makes it difficult to imagine how it could feel and express emotions at all. It can be asked, therefore, to whom these emotions belong. Who is the owner of the emotional message? Musicologists, philosophers, and music psychologists have attempted to answer these questions in various ways, but one common assumption remains: whatever the music expresses, it is to be found ‘outside’ of the listener. This general orientation is in line with a number of traditional beliefs central to Western musicology, where the answer to the ‘where’ (or ‘whose’) question has been taken for granted: musically expressed emotions belong to the composer, who has skilfully imbued his or her private feelings into musical materials so that a competent performer can reproduce them and an educated listener can decipher them (Bohlman 1999; Cook 2001). And indeed, because the strong empirical orientation of music psychology may often overshadow theoretical issues (Eerola and Vuoskoski 2013), such assumptions are often taken for granted. As a result, methodology and outcomes in music psychology are often framed according to the tacit belief that musically expressed emotions necessarily belong to the composers or musicians who compose and/or perform the music (Martin 1995).Footnote 1

This assumption has been challenged by a number of philosophical musicologists who point out that neither composers nor performers need to enact in themselves the corresponding emotional state to produce emotionally-expressive music (e.g. Budd 1989; Davies 1994). As a result, three alternative solutions to this ‘ownership problem’ have been proposed: (i) emotions are perceived in music because we have an illusion of a virtual persona to whom they belong, i.e. they are owned by the music, but not necessarily by the composer (e.g. Cone 1974; Levinson 1996); (ii) the perception of emotions in music is a case of misattribution, since the emotions we hear are aroused in ourselves, but are ascribed to the music; in other words, musically-expressed emotions should be attributed to the listener (e.g. Matravers 1998; Nussbaum 2007); and (iii) in order to experience music as expressing emotions there is no need to find a subject that owns them (Davies 1994, 1997; Kivy 1999); the mere fact that musical sounds sometimes resemble human behaviours that are emotionally expressive (e.g. vocal utterances, bodily movements, gestures) should suffice to perceive musical expressions of emotion in the music. In brief, the question of the ‘ownership’ of musically expressed emotions is seen in terms of some combination of the (inner) psychological disposition of the listener in reaction to the (external) structure of the music, leading finally to the experience of perceiving emotions in the music.

2.2 The internal locus problem: routes and mechanisms

In addition to the claim that we may perceive emotions as communicated by or as ‘in’ the music itself, there is also the issue of how music allows us to be ‘moved’ emotionally - i.e. how ‘internal’ emotional states are caused and experienced as a consequence of attending to musical sounds. In line with this, a long-standing assumption in the psychology of emotions suggests that we should distinguish between two possible ‘routes’ that lead to the elicitation of emotions (e.g. Chaiken and Trope 1999). The first route involves the appraisal of the significance of a stimulus for the realisation of our goals. It is grounded in ‘appraisal theories’ (e.g. Lazarus 1982; Solomon 1976; Scherer 2005) and is generally thought to proceed according to rule-based forms of processing. The second route involves associative processing that does not explicitly involve appraisal. Among other things, this entails the reactivation of past emotional states because of their resemblance to aspects of the present situation - including bodily conditions and facial gestures (e.g. Strack et al. 1988; Niedenthal 2007). As we discuss next, both routes have been developed in a variety of ways in order to explain how music may be understood to cause emotions in listeners.

Adherents of the appraisal route have attempted to explain how certain aspects of musical stimuli might be appraised as goal-relevant despite the common assumption that music may have no immediate (or evolutionary) biological relevance for the realisation of our goals in the context of survival and well-being (Juslin et al. 2010; Scherer and Coutinho 2013). This approach is found, for example, in Meyer’s (1956) and Huron’s (2006) expectation theories. Here the claim is that music affords the building of perceptual wholes (gestalts), which may evoke expectations (goals) about how the music will sound next.Footnote 2 By contrasting these expectations with the way the music actually unfolds, it is suggested that different emotional states are elicited, such as anticipation, tension, surprise, relief, disappointment, and so on. Another proposed mechanism within the appraisal route involves a more ‘primitive’ type of appraisal. Sudden, loud, dissonant, or fast events in the musical stimuli, for example, are thought to trigger innate sensorimotor connections that function like reflexes (e.g. Panksepp and Bernatzky 2002), which act on several subcortical areas of the brain that process appraisals of danger or urgency.Footnote 3 These preconscious appraisals are then experienced as feelings of surprise, increased arousal or unpleasantness (Juslin and Västfjäll 2008; Khalfa and Peretz 2004). Here, the chronometric perspective on aesthetic experience, as described by Brattico and colleagues (2013), is also important to consider. This approach explores the temporal order of how the various stages of perception and appraisal interact.
For example, primordial engagements with music may be understood to have a place in the early stages of the aesthetic experience, with more explicitly cognitive evaluations occurring later.Footnote 4 In line with this, the chronometric perspective may help us better understand how aspects that range from rapid reflex-like responses and bodily-affective changes, to slower and more explicit evaluations, interact with each other and with the situational and individual characteristics of a given (musical) event in time.

The second route for the induction of emotional states stimulated by music bypasses the need for appraisal. It involves the involuntary activation of past emotion-laden memories through associative processing mechanisms - music that has been previously associated with an emotional experience reinstates that original emotional state without the need for any conscious awareness of the link between both stimuli. This can be seen in cases of evaluative conditioning, where positive or negative responses to a given piece of music are generated because in the past the listener experienced the music as occurring simultaneously with events that were valued as being positive or negative. Similar responses may also occur when listeners have complete awareness of such associations - as in the case of episodic memories, where pieces of music evoke specific emotional life events (Juslin and Västfjäll 2008). Other non-appraisal approaches involve the principle of activation spreading. Here emotions are understood to be organized as networks of nodes (in the brain) connected by associative pathways so that the activation of one of these nodes also triggers the remainder of the network (Innes-Ker and Niedenthal 2002). In the case of music, this principle is thought to underlie the mechanisms of rhythmic entrainment and emotional contagion. The former describes a process whereby the listener’s movements and physiological rhythms synchronise with the periodicity of the music, which in turn increases arousal and/or induces feelings of pleasure (Labbé and Grandjean 2014). The latter describes how listeners unconsciously mirror the emotional expression of the music, and how this mimicry leads to the induction of the same emotion (Scherer and Zentner 2001).

It should also be mentioned that two of the most important (and complex) psychological theories include aspects of both routes. Juslin’s (2013a; Juslin and Västfjäll 2008) approach integrates a range of factors including Brain stem reflexes, Rhythmic entrainment, Evaluative conditioning, emotional Contagion, Visual imagery, Episodic memory, Musical expectancy and Aesthetic judgement (BRECVEMA for short). And likewise, Scherer’s CPM-based approach (Component Process Model; see Scherer 2004; Scherer and Zentner 2001) develops a wide range of interacting features. These involve formal, performance, listener, and contextual factors, which are discussed in terms of five possible mechanisms - appraisal, memory, entrainment, emotional contagion and empathy - that permit the “production of emotion in listeners” (Scherer and Coutinho 2013: 139). Additionally, both theories involve an evaluation of the aesthetic value of the music, which may lead to the induction of so-called ‘aesthetic emotions’ such as wonder, transcendence, nostalgia, tension, or awe (Zentner et al. 2008).

3 Critical assessment of existing theories

In this section we provide a critical assessment of the above-mentioned theories and claims. Our main points of contention are threefold: these approaches (i) often rely on a dualistic and mechanistic inner-outer approach to human cognition; (ii) tend to ignore developmental concerns; and (iii) mostly play down the emotional relevance of music for human socialisation and well-being - i.e. the primordial forms of interactive, adaptive, and embodied meaning and world making that musicality affords (Krueger 2013; Schiavio and Cummins 2015). This, we argue, results in reductive views unable to capture the complexity of what emotion and musical experience entail.

3.1 Inner-outer dichotomies

Despite their differences, an overriding assumption of the above-mentioned theories is that musical emotions are caused by external structural antecedents (intrinsic to the music itself), which act on specific internal psychological predispositions of the listener (‘mechanisms’, the ‘affect programs’, or ‘emotional coding’). To put it another way, the musical world ‘out there’ is understood to contain information that corresponds with the inner domain (the processing mechanisms) of the music user,Footnote 5 allowing him or her to develop an internal model of the world by means of a relevant (set of) representation(s). Music, in this view, is understood to cause emotions by acting as an external stimulus that provokes a particular response.

In external locus theories, this sets up a kind of discontinuity between the music and the listener, assuming that emotional ‘content’ is always reducible, in some way or another, to an external category - to something distinctly ‘other’ than the listener - that correlates with (hypothesised) innate emotional coding that allows listeners to pick up the emotional messages ‘in the music’. Inner locus theories also rely on these inner-outer dichotomies. However, they are more focused on what goes on ‘in the head’, which means that they are more specific about the neural mechanisms involved. For example, approaches that seek to explain how music sets up goal-relevant ‘appraisals’ through the thwarting and satisfaction of anticipation, or through the activation of memory associations, all tend to posit, with varying degrees of complexity, a kind of linear causal schema for emotional responses, whereby ‘external’ information gives rise to ‘internal’ representations through information processing. Huron’s (2006) model, for instance, describes the mental mechanisms involved with the statistical induction of environmental regularities through algorithmic processing. These basic mechanisms may be triggered at various levels and in different ways through learned (cultural) processes to form different types of representational outputs and associated expectations. Listeners’ expectations, accordingly, are ‘weighted sums’ drawn from many representations. Non-appraisal based approaches also make distinctions between a pre-given outer world of musical structures and the pre-given inner domain of innate processing mechanisms that respond to and process musical data. Thus, to varying degrees, both approaches assume an information-processing conception of cognition, where emotional responses are understood as outputs of computational processes that take place ‘in the head’.

This general orientation resonates strongly with the so-called orthodox computational or cognitivist approach to mind (Dennett 1978). From this perspective, we have no ‘direct’ cognitive connection to the world; we can only access it via a process of representational recovery. This involves sequential chains of events that start with the raw data (input) provided by the environment, which are then converted into representations that are manipulated (computed) in a hierarchical way in order to create ever more complex representations. These lead, finally, to behavioural responses (outputs) that correspond with situations in the world ‘out there’. Importantly, all information is understood to be represented ‘in the head’, giving rise to a discontinuity between inner and outer (Varela et al. 1991). Thus, generally speaking, musical emotions are assumed to involve responses to environmental stimuli; little attention is given to the agency of the listener or the role of the body, which is often reduced to a physical entity that does not participate directly in the constitution of lived experience (Husserl 1989; Merleau-Ponty 2002).Footnote 6

Here it should also be noted that although Basic Emotion Theory (BET) properFootnote 7 has not played a major role in musical emotion studies (see Juslin 2013b), many music psychologists and philosophers use (as we mentioned above) rather loose ad-hoc ‘basic’ emotion categories, assuming that real emotions actually come in such categorical forms (e.g. see the discussion of ‘garden variety’ emotions in Kivy 1989). There is, however, an ongoing debate about whether or not musical emotions are best described in terms of discrete and supposedly pan-cultural basic emotions (e.g. happiness, sadness, anger, and fear), and if so, how these may relate to more complex emotional experiences.Footnote 8 In brief, the theories discussed so far all make various assumptions about the independence of pre-given inner and outer domains, the mechanistic and disembodied nature of cognition, and the categorical or discrete nature of what emotions should entail.

At first glance Scherer’s Component Process Model (CPM) may seem to offer an exception to this last concern. However, while the CPM is indeed critical of theories - such as Juslin’s (2013a; Juslin and Västfjäll 2008) - that endorse the idea of basic emotions, it nevertheless imposes its own hypothesised affective categories, three of which are claimed to be ‘properly emotional’ and thus relevant to music. These include the utilitarian, the epistemic and the aesthetic categories, respectively. Moreover, we may recall here that one of the principal motivations behind many of the above-mentioned appraisal theories is to explain how music can cause emotions when it is assumed that musical experiences are not explicitly goal-based because they lack the immediate personal relevance (i.e. for survival and well-being) required for most forms of emotional response to occur (Juslin et al. 2010; Scherer and Coutinho 2013). In line with this, the CPM approach focuses on what it refers to as the ‘aesthetic’ and ‘epistemic’ categories that are thought to correspond more closely to this supposed lack of personal relevance. Indeed, such forms of emotional response are presumed by CPM to partially account for the ambiguity found in studies that attempt to correlate the psycho-physical responses associated with musical emotions with those of (non-musical) everyday emotions. However, as we discuss below, recent research and theory strongly suggest that the assumption that musical experience is not relevant for our personal well-being may be based on a narrow conception of music - one that largely ignores the crucial role of musicality in ontogenesis and socialisation.
Despite its complexity, CPM explicitly takes a Kantian aesthetic stance towards musical emotions that conceives of musical experience as a kind of abstract, decontextualised and disembodied perceptual process that, like many of the other perspectives considered, is very much in line with the ‘cognitivist’ model of mind and the detached Western academic approach to music listening and analysis.

While the approaches discussed thus far all offer useful insights, we suggest that the inner-outer schema they assume - and the disembodied notion of cognition this entails - may be problematic. The main issue that emerges here is that these approaches have difficulty addressing the actual experience of music, which arguably involves more than response processes, internal processing or detached aesthetic appraisals. Put simply, these theories tend to suspend the actual living experience of music in order to explain it; and, in the process, reduce it to various categories and loci. To be clear, we are not claiming that such approaches should be abandoned. Rather, our suggestion is that by critically contrasting (and supplementing) their methods and insights with perspectives that attempt to offer more holistic accounts we may gain richer accounts of what human musicality entails.Footnote 9 For example, we have seen how many of the approaches discussed above assume that music is not essential for human survival and well-being. As we consider next, this is increasingly challenged by a growing body of evidence that reveals the central role musicality plays for human development and socialisation. Further on we will explore how these and other concerns may be better addressed through an embodied and enactive approach to musical emotion.

3.2 Embodied interactivity and developmental concerns

The assumption that musical experiences are not explicitly goal-based, and thus not personally relevant, has been questioned by research that stresses the deep significance of musical activity for human well-being. This research embraces an extended conception of what music and musicality entail, exploring the ways they span biological, social, and cultural modes of being. Indeed, this highlights the primordial necessity of musicality for embodied, pre-linguistic and emotional-empathic forms of understanding, communication and social cognition, beginning with the earliest interactions between infants and primary caregivers (Cross 1999, 2001; Krueger 2013; van der Schyff 2013b). This may be understood in terms of what Trevarthen (2002) refers to as the primary intersubjectivity necessary for developing social bonds. Similarly, musicality is increasingly understood to play a major role in the process of participatory sense-making (De Jaegher and Di Paolo 2007), which describes the way autonomous living systems co-enact meaningful relationships through embodied-affective means. This can be seen, for example, in the way caregivers and infants realise a shared world of meaning through embodied-emotional interactions. Here meaning is not pre-given but rather unfolds in a circular and co-operative fashion, where both parties actively participate in developing a repertoire of (musical) gestures and utterances that are intimately linked to strengthening the relationship (Fantasia et al. 2014; Johnson 2007). In line with such insights, a number of neuroscientists have become increasingly cautious of explaining emotions in purely mechanistic and inductive terms (Ramachandran 2011). And indeed, Koelsch (2013) has argued that music is in fact explicitly personally relevant as it helps to fulfil basic social needs related to survival and well-being.Footnote 10

In brief, one of the key problems that motivates many appraisal-based theories (i.e. music’s putative lack of personal or goal-relevance) loses its significance when the focus shifts towards exploring the role of musicality and emotion in interactive developmental contexts. From this perspective the body plays a central role (both explicitly and covertly) in shaping the way we experience music (Leman 2007; Reybrouck 2006). Indeed, the insights offered by the developmental-relational perspective go beyond inner-outer frameworks, and weaken assumptions of fixed pre-given ‘affect programs’ in-the-skull. They describe our musical-emotional lives not as depersonalised input–output responses, but rather in terms of processes of embodied interactivity - as ongoing histories of organism-environment coupling that afford the enactment of meaningful worlds. With this in mind, an embodied, relational and developmental approach to human musicality may offer the starting point for an alternative perspective - one that considers music, mind, body, and emotions not as distinct categories, but rather as interpenetrative and co-arising aspects of being that emerge and develop through active involvement with the physical and social world (Clarke and Clarke 2011; Matyja and Schiavio 2013; van der Schyff 2015).

4 Towards an enactive alternative

In what remains, we attempt to develop this holistic and embodied perspective through the lens of the enactive approach to cognition. Put simply, this approach to mind is not based in mechanistic metaphors or dualistic loci, but rather in the fundamental life processes through which living systems arise and flourish. As we will discuss, this perspective may offer an innovative way to explore musical emotion and cognition in the context of the embodied dynamic self-making processes common to all autonomous living creatures. In doing so, it may help us better understand musicality as a primordial and universal human sense-making capacity, while simultaneously embracing the great range of experiences and activities it entails.

4.1 Fundamental enactive principles: sense-making, autonomy, and autopoiesis

Enactivism is a cross-disciplinary perspective on human cognition that integrates insights from fields such as phenomenology and philosophy of mind, cognitive (neuro)science, theoretical biology, and developmental and social psychology (Stewart et al. 2010; Thompson 2007; Varela et al. 1991). Most centrally, it explores the deep continuity between mind and life, considering cognitive processes as originating in embodied perceptually guided action. In other words, rather than understanding cognition only in terms of skull-bound structures (representations, neural activations, computations), the enactive approach sees it as an activity constituted by circular interactions occurring between an organism and its environment. These interactions modify and are motivated by the internal norms of the organism’s adaptivity, and emerge from the nervous system, which establishes a sensorimotor coupling with the world (Maiese 2011). Through these continuous sensorimotor loops (defined by real-time action-perception cycles), the organism (including the music user) enacts or brings forth his or her own domain of meaning (Colombetti and Thompson 2008; Thompson 2005), with no actual separation existing between the cognitive states of the organism, its physiology, and the environment in which it is embedded. Cognition, from this viewpoint, originates in a continuous interplay between an organism and its environment as an evolving dynamic system (Hurley 1998). This may be understood in terms of three main interrelated concepts: sense-making, autonomy and autopoiesis.

The first concept, sense-making, describes an organism’s adaptive capacity to develop a repertoire of meaningful relationships with the world to achieve a viable existence (Thompson and Stapleton 2009). In order to survive, develop and maintain its own identity, an organism is required to make sense of its world according to its metabolic needs and its degree of complexity. For example, while a simple single-celled organism, in its relation with the environment, would be mainly concerned with values such as ‘nutrition’, a complex organism (e.g. a music user) may bring forth a much vaster array of meanings to flourish in the richer socio-cultural world he or she inhabits. Sense-making, then, concerns the organism as a whole, from its neural, thermoregulatory, metabolic, and social requirements to the types of relevant sensorimotor skills it develops to establish a concerned point of view that generates meaningful experience (Di Paolo 2005, 2009).

The second concept of autonomy concerns the intrinsic demands of a living system’s own organisation - its physiology and metabolic needs - which in turn shapes, and is shaped by, its environment (Dumas et al. 2014). In this view, a living creature is autonomous because, although constrained by its niche, it is not completely determined by it (Thompson and Stapleton 2009). Autonomy and sense-making are therefore deeply related: a creature’s sense-making has its roots in the circular ways of acting and sensing required to preserve itself under precarious conditions (Varela 1979); this process of perceptually guided action generates its autonomous identity.

The third and over-arching concept, autopoiesis, refers to the way living organisms may be understood as ‘self-producing’ entities that bring forth, and continually strive to maintain, a viable and thus meaning-laden life-world via the interactive processes described above. This may be contrasted with non-living ‘cognitive’ systems such as computing devices, which are not self-making and are thus dependent on external entities (i.e. humans) who bring them into existence and imbue their operations with meaning. Living cognitive systems, rather, are autonomous, autopoietic and therefore intrinsically meaningful (Varela et al. 1991).

Taking these three concepts together, the organism may be understood as continually striving to maintain a healthy relationship with its environment - one that permits the continuation of its bounded metabolic processes. This describes the origin of ‘mind’ in the embodied-affective processes through which a given organism continually reaches out to, makes sense of, and thus enacts a viable world according to its metabolic needs. In other words, as organisms shape their world into a place of salience they must affirm their own autonomous identity. They do this by constantly compensating for real-time environmental perturbations that impact their metabolic state and adaptive relationship with the environment. Accordingly, in light of the complex and changing demands of the environment, such self-regulation (stabilisation) must be realised via ‘circular’, non-linear, processes, rather than in a ‘causal’ or linear way. Such dynamical coupling, in this sense, may describe not only the recurrent patterns of action and perception that dynamically link the living system with its environment (Von Uexkull 1934, see also Barrett 2011), but also the web of relational interdependencies that are displayed by the inner biological properties of the system itself. In this way, the dynamics of the organism-environment relationship cannot be understood as having a starting or ending point. Rather, each component depends on the other in a network of constant interactions - i.e. an ongoing ‘history of structural coupling’ between organism and its environment. Importantly, the sense-making activities that support such dynamic processes are always relevant to the life-world of the organism and are thus emotionally motivated - from the ‘primordial affectivity’ of simple organisms to the more complex individual and socio-cultural self-organization of humans (Colombetti 2014). 
From this standpoint - and given the developmental concerns discussed above - each music user may be understood as a sense-maker who actively ‘brings forth’ an autonomous identity when engaging with music.

Put simply, we suggest that it is the relational and affectively-emotionally motivated dynamics of embodied ‘sense-making’ that most fundamentally characterize musical experience, and that such musical sense-making occurs in ways that are relevant to the life-world of the musical ‘organism’ as constituted through its unique developmental history. And indeed, because the actions of living beings cannot be performed or described in a fully detached or unemotional way (Sinigaglia and Sparaci 2010), musical emotions may be understood to emerge from the complex and recurrent patterns of interaction that unfold between music users and their environment. With regard to this point, it may be helpful to consider the (explicit and covert) sensorimotor trajectories of active engagement that originate in the adaptive and bodily activities required to seek out and make sense of the world: a number of empirical studies have shown how music listening may enhance motor facilitation (e.g. D’Ausilio 2007, 2009; Novembre et al. 2014; see Schiavio et al. 2015 for a review), allowing a music user, within the limitations of his or her motor repertoire, to re-enact the same motor actions required to perform the musical stimulus. With this in mind, it may be suggested that music users participate emotionally in the perception of music through motor engagement. Thus, preparing for action, resonating with music, and making sense of the musical world in personal, meaningful ways may help us describe musical emotionality without necessarily recruiting computations, or reducing such experiences to structures ‘in the head’.

For improvisers, composers, listeners and interacting performers, musical experience emerges through dynamic affective-motivational processes, which play out in unique ways depending on how musical environments interact with the developmental histories of the participants involved. Modes of engaging with music differ not only with regard to the single individual (e.g. two listeners may display diverse emotional experiences, despite having the same background, expertise, etc.) but also in terms of the specific sensorimotor interactions adopted to engage with the musical material. For example, while a performer and a teacher may have very similar embodied engagements with their instruments, they may adopt different sense-making modalities to enact their domain of meaning (either serving a desired educational purpose, or emphasising a critically expressive passage in a concert). Such phenomenologically rich contexts may reveal interesting features of this approach. Indeed, musicians explore and play with the dynamic and interactive processes of sense-making in diverse ways, sometimes adjusting their performance and expressions to produce consensus between performers or shared embodied states between interacting listeners (e.g. dancers). At other times they initiate radical shifts that demand new emotional-bodily-cognitive relationships and a heightened adaptability to the sonic environment (e.g. free improvisation). And while the measurable physiological effects of the emotions involved in such diverse settings may cover a relatively limited range of parameters, the actual experience of such emotions may take on a wide range of characteristics and meanings given the situatedness of the music user. That is, while musical emotional episodes may bear striking physiological similarities to one another, they may also involve important phenomenological differences that reflect the contingencies of existence and adaptation.

To summarize, from the enactive perspective we defend, musical emotions may be best understood not in categorical terms, but rather as episodes of experience associated with the ongoing process of maintaining adaptive, self-sustaining, dynamical stability. Therefore, we suggest that while the traditional concerns of expectation, appraisal, and the relationship between form and expressivity remain important elements to consider, our perspective allows us to cast things in a broader light - one that highlights the fundamentally embodied, relational, transformative and unique agentic status of the musical organism. As such, it requires new approaches for analysis. With this in mind, we now turn to explore dynamic systems theory (DST) as a possible way to make sense of such complexity.

4.2 Making sense of complexity: dynamic systems theory

The enactive notions of autopoiesis and autonomy resonate with the broader phenomenon of ‘self-organisation’ found in complex dynamic systems in general, including non-biological varieties. Exploring such phenomena is the domain of dynamic systems theory (DST), a branch of mathematics that studies how complex systems - from weather and climate patterns to insect colonies (Strogatz 1994, 2001) - maintain structural unity and generate recurrent patterns of behaviour through networks of mutually influencing processes (Beer 1995; Thelen and Smith 1994). Put very simply, DST attempts to describe how complex systems change over time. This is expressed mathematically in terms of differential equations, which means that the characteristics exhibited by such systems are not necessarily considered as discrete events or fixed properties, but rather in terms of continuous temporal trajectories (Chemero 2009; Kelso 1995). The latter have tendencies to converge and to deflect, resulting in the development of various relationships and patterns that characterise the state of the system. The term phase portrait has been used to refer to the set of all possible trajectories of a given system. It is represented as a topological space that shows areas of convergence (attractors), areas where the system’s state will evolve towards a particular attractor (attractor basins), and areas of deflection (repellors).
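To make the vocabulary of attractors, basins and repellors more concrete, the following is a minimal sketch in Python. It assumes a deliberately simple one-dimensional toy system, dx/dt = x - x^3, which is not drawn from the literature cited here; the function names (flow, classify, integrate) and the step size are hypothetical choices made only for illustration.

```python
def flow(x):
    """Rate of change dx/dt = x - x**3 for the assumed toy system."""
    return x - x**3


def flow_slope(x):
    """Derivative of the flow with respect to x, used to judge stability."""
    return 1 - 3 * x**2


def classify(fixed_point):
    """A negative slope means nearby states converge (attractor);
    a positive slope means nearby states are deflected (repellor)."""
    return "attractor" if flow_slope(fixed_point) < 0 else "repellor"


def integrate(x0, dt=0.01, steps=5000):
    """Follow one trajectory from x0 using simple Euler steps."""
    x = x0
    for _ in range(steps):
        x += dt * flow(x)
    return x


# The three points where the flow vanishes, and their roles in the portrait.
FIXED_POINTS = [-1.0, 0.0, 1.0]
portrait = {p: classify(p) for p in FIXED_POINTS}

# Trajectories started on either side of the repellor at 0 end up in
# different attractors: each side of 0 is one attractor basin.
final_states = {x0: round(integrate(x0), 3) for x0 in (-2.0, -0.1, 0.1, 2.0)}

if __name__ == "__main__":
    print(portrait)
    print(final_states)
```

In this sketch the ‘phase portrait’ is simply the line of possible states together with its three fixed points; richer systems of the kind discussed in this paper would involve many interacting variables rather than one.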

Over time, perturbations to the system can lead to phase transitions - qualitative shifts in the total state of the system that are described by a new topology. The perturbations that bring about such transitions result from changes in the constraints that influence the state of the system and can be refined to describe the temporal characteristics of self-organising systems in terms of circular forms of causality, referred to as first and second order constraints (Thompson 2007). A classic example is how changes in heat added to an oil-filled pan perturb the local interactions of the oil molecules (first order constraints), which, in turn, affect the global behaviour of the oil in its totality (second order constraints). Such macro-level patterns, which are observable as changes in the amplitude of convection rolls of the oil, then impose further reciprocal constraints on the movement of the molecules (Haken 1977). The term emergence is used here to refer to distinct properties or patterns of behaviour that emerge (often recurrently) from the temporal interactions of such complex systems (Friston 2009).
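The idea that a change in a constraint can reorganise the whole topology of a system may be illustrated with a small Python sketch. It assumes a standard textbook toy model, dx/dt = r*x - x^3, in which the control parameter r plays a role loosely analogous to the heat applied to the pan; this is an assumption made for illustration and not a model of convection itself. The function name fixed_points is hypothetical.

```python
import math


def fixed_points(r):
    """Fixed points of the assumed toy system dx/dt = r*x - x**3,
    each paired with its stability for the control parameter r."""
    points = [0.0]
    if r > 0:
        root = math.sqrt(r)
        points = [-root, 0.0, root]

    labelled = []
    for p in points:
        slope = r - 3 * p**2  # derivative of the flow at the fixed point
        if slope < 0:
            kind = "attractor"
        elif slope > 0:
            kind = "repellor"
        else:
            kind = "marginal"  # exactly at the transition (r == 0)
        labelled.append((p, kind))
    return labelled


if __name__ == "__main__":
    # Below the critical value there is a single attractor; above it the
    # former attractor becomes a repellor flanked by two new attractors.
    for r in (-1.0, 1.0):
        print(r, fixed_points(r))
```

The qualitative shift from one attractor to two as r crosses zero is a simple instance of what is described above as a phase transition: the same equation yields a different phase portrait once the constraint changes.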

There is of course much more to DST than the brief gloss provided above. However, for the purposes of this paper it suffices to note that this theory offers a mathematically coherent way of describing how self-organising systems develop, stabilise and transform according to the reciprocal influences of local and global factors. Along these lines, it should also be noted that recent work in cognitive and affective science based on DST has weakened the standard assumption that cognition and emotion proceed through fixed programs and brain mechanisms that function according to a decontextualised, linear representational input–output schema (e.g. Kiverstein and Miller 2015). Because of the inclusion of temporality in DST, the circular interaction of local and global factors, and the complex interactions of the multiple trajectories involved (attractors and repellors), these models are necessarily non-reductive. As such, they are well suited to explain emotion in terms of the circular constraints, entrainments, and emergent patterns and properties that arise as the dynamic brain-body-world system continually enacts itself through adaptive interactions. Not surprisingly, recent DST approaches to emotion adopt developmental points of view, understanding emotional episodes not simply as outputs of neural programs, but as emergent properties of ongoing embodied dynamics. The latter include synergistic muscular linkages, neural self-organisation (Freeman 1999, 2000), metabolic processes (Thompson 2007), and environmental factors (Granic 2000; van Gelder and Port 1995). In brief, from this perspective, emotions are not understood as fixed phenomena, but rather as developing over time and in context, highlighting the plasticity of the organism-environment relationship.

Along these lines, DST also describes how the trajectories of two or more dynamic systems may interact with each other, resulting in richer networks of mutually influencing constraints, which may result in the development of still larger systems (shared phase portraits, attractors, and repellors). A (relatively) simple example is how wall mounted pendulums mutually constrain one another, resulting in synchronisation or ‘entrainment’ over time (see Clark 2001). A number of researchers have explored such phenomena in the context of emotional interactivity between individuals, and especially in developmental contexts, revealing that emotions do not simply inhere in the individual but develop relationally (Laible and Thompson 2000; Fogel et al. 1992). This implies, for example, that emotions may be understood as ‘socially extended’ phenomena (see Krueger 2014a, b, c, for musical applications). Thus, given the contingent relationship between environment and individuals, similar dynamic patterns may emerge that can be understood as affording ‘recognisable’ or recurrent states of being - i.e. viable ways of interacting and bringing forth a world (Menin and Schiavio 2012). For humans and other social animals, such states emerge in infancy and develop through histories of valenced embodied experiences - both in a positive and negative sense - resulting in ‘basins of attraction’ that are shared with, and influenced by, the activity of all those that are involved (Sheets-Johnstone 2010, 2012). In this way, emotional interactions may be understood as both plastic and patterned-recurrent (Colombetti 2014). This resonates with the social and developmental significance of musicality discussed above. It also implies that what is often considered as pre-given or discrete emotional categories may not be so clear-cut after all. The states of being we refer to with specific emotional signifiers may be far more complex, contextual, and idiosyncratic than is suggested by our language. 
For example, what we categorise as ‘fear’ in a given instance may in fact involve a complex range of relational entailments that make this or that fear unique to its context and the person experiencing it. Thus emotions may be considered as dynamically emergent phenomena, which may bear likeness to previous states of being and to episodes experienced by others who share similar metabolic needs and physiologies.
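The entrainment of mutually constraining pendulums mentioned above can be approximated, under strong simplifying assumptions, by two coupled phase oscillators. The sketch below is a Kuramoto-style abstraction in Python rather than a physical model of pendulums; the natural frequencies, coupling values and the function name frequency_gap are hypothetical and chosen only for illustration.

```python
import math


def frequency_gap(coupling, freq_a=1.0, freq_b=1.2, dt=0.01, steps=20000):
    """Simulate two phase oscillators that pull on each other and return
    the difference between their instantaneous frequencies at the end.

    A gap near zero means the two oscillators have entrained."""
    phase_a, phase_b = 0.0, 2.0
    rate_a, rate_b = freq_a, freq_b
    for _ in range(steps):
        # Each oscillator runs at its own natural frequency, adjusted by
        # a term that depends on the phase of the other (mutual constraint).
        rate_a = freq_a + 0.5 * coupling * math.sin(phase_b - phase_a)
        rate_b = freq_b + 0.5 * coupling * math.sin(phase_a - phase_b)
        phase_a += dt * rate_a
        phase_b += dt * rate_b
    return abs(rate_b - rate_a)


if __name__ == "__main__":
    for k in (0.0, 0.1, 1.0):
        print(f"coupling={k}: frequency gap={frequency_gap(k):.4f}")
```

With no coupling the oscillators keep their own tempi; with weak coupling they influence each other without locking; with sufficiently strong coupling they settle into a shared rhythm. In this toy setting, the shared rhythm is a property of the coupled pair rather than of either oscillator alone, which is the sense in which the interacting systems described above may be said to form a larger system.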

With this in mind, we suggest that DST may provide useful tools for making predictions and developing models of musical emotions without recruiting categories such as ‘inner’ and ‘outer’ and without relying on linear causal models. Indeed, by emphasising the mutuality between music users and musical environments, the dynamic-enactive approach may offer new possibilities for empirical research and for developing richer theoretical frameworks. For example, empirical research might focus more on the real-time dynamics of interaction among complex systems (e.g. musical environments involving multiple interacting participants) to better understand how manipulations of certain musical parameters may perturb the stability of such a system, and how such perturbations correlate with shifts in the individual and shared ‘emotional’ states of the participants involved. These data could be situated within the developmental histories and phenomenological accounts of the participants to develop answers to a number of questions. For example: can the emergence of emotional states be predicted by the musical expertise of music users? How do the characteristics of emotional states change as the history of structural coupling between the music user and the musical environment evolves? Does familiarity among music users play a role in this context? How do certain types of perturbation affect the autonomy of each sense-maker, and the self-sustaining properties of the coupled system as a whole? How do participants adapt and interact creatively to maintain the musical system?

5 Conclusion: enacting musical emotion

We have argued that emotions are not simply responses to an environment, but active engagements involving a wide range of dynamically interacting trajectories. As such, they are central to the ongoing process of embodied sense-making that characterises autonomous and self-organising living systems in their continuous striving to bring forth and maintain a viable life-world. With this in mind, it should be noted that, while emotions might be described as more or less episodic emergent events (Lewis 2000), other related but longer-lasting psycho-physical phenomena such as moods (Scherer 2005) may be included in the broader primordial sphere of affectivity. This is to say that, while specific emotional events may come and go, there is a very strong sense in which life is always fundamentally ‘emotional’ in a primordial context. Indeed, because each organism must enact its world of meaning in order to preserve its autonomous identity, the complex dynamics of living self-organisation necessarily involve a valenced existence not shared by non-living self-organising systems. Thus, if cognition is sense-making (as many enactivists have argued - Varela et al. 1991; Thompson 2007, etc.), and sense-making entails the embodied and affective coupling with the environment that enables self-regulation, then cognition and affectivity cannot be separated from each other.

From the enactive/DST perspective, emotional experiences are not solely the result of a combination of discrete or fixed categories related to genetically determined cognitive mechanisms and affect programs; nor can they be reduced to pre-given external features. Rather, the enactive/DST approach embraces the centrality of affectivity for understanding the adaptive and creative nature of living creatures as active, autonomous sense-makers. Again, this resonates strongly with the developmental and social meanings of musicality considered above, where music users enact their world of meaning by actively participating in musical behaviours in a variety of ways that are relevant to their well-being. And indeed, this may also include metabolic, automatic, processes that are not conscious. As such, the musical mind and its emotional components may best be understood as continuous with the same circular dynamics of autonomy and sense-making that ultimately define the autopoietic nature of life itself: music users develop different ways of interacting meaningfully with the physical, social and cultural worlds they inhabit. Multiple examples can be given, such as listening, performing, learning, educating, worshipping, imagining, interacting with children and caregivers, enacting social and cultural environments. Such forms of structural coupling between organism and environment may be understood as adaptive (and empathic) sensorimotor engagements shaped by the dynamic history and degree of acquired musical skills of the individual music users (Overy and Molnar-Szakacs 2009; Schiavio 2012).

The point we would like to stress is that musical actions (including listening) are always motivated (goal-directed) and hence are also essentially emotive-affective. In other words, the roots of musicality, in a broad sense, may be found in the dynamic interplay between an organism and its environment, with an emotionally motivated cognitive system participating actively in the enactment of its own domain of (musical) meaning. Musicality, from this perspective, may be understood as a primordial way human (and perhaps other) organisms reach out to the world in order to survive and flourish. This claim is supported by research and clinical work in music therapy (see Schiavio and Altenmüller 2015; van der Schyff 2013b). With this in mind, music cognition may be understood as fundamentally affectively embodied as it relies on the bodily power of action in context, rather than being an abstract computational process implemented by a decontextualised, ‘naked’ brain (Barrett 2011; Barrett et al. 2010). This strongly suggests that the whole sphere of ‘affectivity’ and embodied behaviour - including valenced action, moods and emotion - must be taken into account when we consider musical sense-making and cognition in general. Put simply, our view is that musical sense-making is an emergent property of the agent-music relationship and - as such - it is co-created. The agent is never the sole decider of musical meaning because the agent itself is always fundamentally embedded in a world (or, in our case, a musical environment) that presents affordative structures ready to be (en)acted upon and within.

The enactive approach considers musicality beginning at the fundamental levels of embodied sense-making, primordial affectivity, and selfhood; at the origins of our existence as complex bio-cultural beings. As such, it may shed light on the often-ambiguous results produced by research that attempts to make psycho-physical correlations between ‘musical’ and ‘non-musical’ emotional ‘responses’ (e.g. Krumhansl 1997; Lundqvist et al. 2008). Indeed, while research has shown that (when given the appropriate categorical prompting) listeners may consistently attribute specific emotions to a given passage of music, it has proven much more difficult to demonstrate that music actually produces such emotions in listeners. In brief, such observations have led some to suggest that musical emotional experiences may be emotionally ‘cue impoverished’; that they are merely representative of, diminished versions of, or somehow different from, other types of ‘proper’ emotions (see the discussion of CPM above; Sloboda 2000). As we have seen, however, the issue may not be the impoverished state of ‘musical emotions’, but rather that our current categorical and inner-outer conceptions of what both ‘emotion’ and ‘music’ entail lack the descriptive and explanatory richness required.

The enactive approach to music emotions and cognition may also shed new light on the early sense-making abilities of the music user: if human engagement with music arises from the dialogue between the music user in action and the dynamics of the musical environment, rather than being considered as an invariant that is already given, the complex mutuality between active experience, emotion, and skill acquisition can be studied from early infancy as basic aspects of human musicality (Phillips-Silver and Trainor 2005). This insight is particularly significant when considering how traditional approaches to infants’ musicality typically focus on activities such as the recognition of pitch (Clarkson and Clifton 1985), harmony (Trainor and Trehub 1994), rhythm (Trehub and Thorpe 1989) or timbre (Costa-Giomi 2013), which are often considered as discrete, unemotional, and disembedded phenomena. Moreover, because our perspective challenges common and often reifying assumptions about the pre-given and categorical nature of emotions (such as those associated with Basic Emotions Theory), it suggests that we may have a good deal of perceptual autonomy with regard to how we develop affective-emotional interactions with music, and how such engagements may develop in the context of music as a history of embodied experiences. This insight has a number of implications for musicological research (e.g. Leech-Wilkinson 2013) as well as for music education (Bowman 2004; Elliott and Silverman 2014; van der Schyff 2015). The enactive approach also calls into question existing philosophical and research methods by demanding a more nuanced and phenomenologically sound approach that integrates the subjective and the objective, thus moving towards an entre-deux between scientific methods and direct experiences.

Other examples of existing music scholarship inspired by such a framework can be found in the work of Joel Krueger (2009, 2011, 2015a, b). While his research is mostly concerned with music listening, it embraces a number of issues closely related to the current proposal - integrating insights from phenomenology, philosophy of music education, and affective and cognitive science. With regard to musical emotions, he defends an externalist view (2014c), which considers how environmental resources may become coupled with one’s mental processes, giving rise to “otherwise-inaccessible forms of cognition and behavior” (2015b, p. 92). In particular, as we offload into the environment certain cognitive processes to free up internal resources and generate real-time engagements with new problem-solving possibilities, music, as he argues, may play an analogous role in terms of emotional regulation. As such, his work resonates strongly with our focus on the bodily power of action and the importance of the environment in driving cognitive processes and emotionality. Similar ideas have been put forward by three authors of the present contribution. Schiavio, for example, investigates (both empirically and theoretically) the enactive roots of human musicality starting from early infancy (Schiavio and Gerson 2015; Gerson et al. 2015). His research is situated at the crossroads of neurophenomenology, psychology, and embodied approaches to cognition (Schiavio 2012, 2014), exploring how the insights emerging from such interdisciplinary work may impact musical learning, therapy and performance (Schiavio and Altenmüller 2015; Schiavio and Cummins 2015; Schiavio and Høffding 2015). Similarly, research by Reybrouck puts together semiotics and theoretical biology in order to inspire a richer understanding of what human musicality entails, with particular focus on the notion of musical sense-making (2001).
More recently, his work explores the fields of music education and neurology through the lenses of embodied cognition (Gil et al. 2015; Reybrouck 2014; Reybrouck and Brattico 2015). Theoretical approaches to embodied and enactive cognition have also been developed by van der Schyff, whose work includes the relationship between enactivism, critical ontology and the praxial philosophy of music education (van der Schyff 2015; van der Schyff et al. in press). He has also examined the enactive approach to biological evolution in the context of human musicality (2013c).

Much more could be said about the relevance of the enactive perspective for the wide range of activities and experiences we refer to with the word music. This said, we hope that the basic groundwork developed here will continue to be explored in various ways so that new and richer perspectives will continue to emerge. While a definitive model of musical emotions may not be forthcoming in the foreseeable future, the enactive perspective may help us rethink taken-for-granted assumptions about what music and emotion entail, and move towards more holistic perspectives that embrace music as a primordial aspect of what it means to be human. It will be very exciting to see how the growing interest in enactivism across a range of fields (e.g. neuroscience, social psychology, linguistics, biology, education) may impact our future understanding of music, emotion, and the embodied mind.