
Introduction

A growing number of researchers claim that our traditional views about what cognitive processes are and where they take place must be revised. According to these researchers, the cognitive processes that make up our minds can reach beyond the traditionally conceived boundaries of individual organisms to include, as proper parts, aspects of the organism’s physical, technological, and socio-cultural environment (Kiverstein et al., 2013, p. 1). This idea is known as the Extended Mind Thesis (henceforth EMT; Clark, 1997, 2008; Clark & Chalmers, 1998).

Proponents of EMT (Farina, 2020; Kiverstein & Farina, 2011, 2012; Menary, 2010; Rowlands, 2010; Sutton et al., 2010; Wheeler, 2010) typically hold that quite familiar human mental states (such as believing that so-and-so) can be realized, in part, by structures and processes located outside the human head. EMT thus depicts the mind (or, more precisely, the physical machinery that realizes some of our cognitive processes and mental states) as extending, under humanly attainable conditions, beyond the bounds of skin and skull (Kiverstein et al., 2013).

The intellectual roots of EMT are rich and varied. One could argue that at least seven different fields or domains of research have influenced the development of this theory. In this introductory section, we briefly review these domains and sketch our plan for the chapter.

Work on Distributed Cognition

Distributed cognition (see also Hutchins, 2005) is the paradigm that describes cognition as a distributed phenomenon, that is, as occurring across objects, individuals, artifacts, and tools in an active environment. Distributed cognition takes place, for instance, when two or more individuals engage in reciprocal interactions in order to solve a difficult cognitive task. To date, researchers working on distributed cognition have devoted much of their attention to analyzing and describing the processes and properties of systems of actors interacting with each other and/or with an array of technological artifacts, as in airplane cockpits (Hutchins & Klausen, 1996), ship navigation (Hutchins, 1995), air traffic control (Halverson, 1995), and software teams (Ciancarini et al., 2021a, 2021b; Flor & Hutchins, 1991). Among the most thoroughly studied distributed cognitive systems, however, are transactive memory systems, in which two or more individuals collaboratively encode, store, and retrieve information and knowledge (Wegner, 1987).

In highlighting the many ways in which brain, body, and world may work together in subtle, often complex, technologically mediated, socially distributed, and materially engaged partnerships, research on distributed cognition can be said to have inspired the development of EMT.

Work on Cybernetics and Connectionism

EMT was also inspired, however, by early work in cybernetics (Ashby, 1956; Wiener, 1948). Cybernetics attempted to explain cognition in terms of circular causal chains, in which an action produced by a system generates a change in its environment, which in turn triggers further changes in the system itself. This work was crucial in inspiring research on EMT because it stressed the importance, for explaining cognitive processes, of feedback loops that run through the body and into the world.

Connectionism is another crucial source of inspiration for researchers working on EMT. In denying that the vehicles of cognition are purely symbols encapsulated in the brain, and in treating them instead as patterns of activation distributed across the nodes of a network (Rumelhart & McClelland, 1986), connectionism (Clark & Karmiloff-Smith, 1993) foregrounded important insights that characterize EMT, such as the idea that the mechanisms underlying cognitive processes need not all be located in the cranium.

Work on Situated Robotics

Situated robotics is the study of robots embedded in complex, often dynamically changing environments. It is based on the idea that intelligence is for doing things, and it thus focuses on building robots that display complex intelligent behaviors while relying on little internal modeling of the world (Brooks, 1991; Pfeifer et al., 2006). The idea is that robots are built with a repertoire of simple behaviors which, when coupled and combined via sensorimotor links in a (largely) bottom-up fashion, can produce sophisticated actions and, eventually, cognitive processes (Clark, 1997). Among the most successful research programs in “situated” robotics developed to date are those led by Jun Tani at the Okinawa Institute of Science and Technology and by Dario Floreano in Lausanne. However, the most fascinating examples of situated robotics probably come from research on humanoid robots (e.g., Atlas; see Farina, 2020). These robots can perform complex actions (such as walking across rocky terrain, navigating hazardous surfaces, and climbing stairs) that extend into the realm of the cognitive and involve, for instance, the development of a sense of proprioception (body configuration), forms of attention, coordination, balance, and/or planning. Situated robotics influenced the formulation of some of the core principles of the extended mind thesis, such as the idea that cognition depends heavily upon the effects of a special kind of hybridization “in which human brains enter into an increasingly potent cascade of genuinely symbiotic relationships with knowledge-rich artifacts and technologies” (Clark, 2001, p. 1).

Work on Active Vision

Research on animate vision endorsed a view in which a perceiver uses her body and various other structures in the environment to offload perceptual processing onto the world. Gibson (1979) showed how visual perception results from a dynamic coupling between a perceiver, who manipulates information-bearing structures, and her environment. O’Regan and Noë (2001) further developed this understanding of visual experience by focusing on the active exploration of one’s environment, an exploration that displays sensitivity to the sensorimotor contingencies holding in the organism’s environment. Ballard et al.’s (1999) animate vision paradigm within computational psychology experimentally demonstrated the deeply embodied and active nature of visual perception. Thus, in explaining the intentional and phenomenal characteristics of perceptual experience by appeal to sensorimotor dependencies (patterns of contingencies that hold between the movements perceivers make and what they are able to perceive), work on animate vision (Churchland et al., 1994) provided important theoretical insights for many researchers working on EMT.

Work on Phenomenology

Equally important for the development of EMT was early work in phenomenology. A crucial source of inspiration was Merleau-Ponty (2005), who proposed an embodied account of perception that challenged the Cartesian dichotomy between mind and body. Another was Husserl (2005): it has recently been argued that Husserl’s transcendental idealism is tightly linked to vehicle externalism (Zahavi, 2008), the thesis, at the core of EMT, that our mental states constitutively depend on items in the external world. Heidegger’s (1927) existential phenomenology, and in particular his concept of Dasein (“being there”), was another influential source of inspiration for researchers working on the Extended Mind. The concept of Dasein expresses the idea that our minds should not be regarded as detached and passive, but rather as immersed in the world in which we live. In other words, according to Heidegger, we should study the world surrounding us as a system of relations in which we are practically involved, and we should understand the mind not as a mere function of brain activity but as emerging through interactive feedback loops between brain, body, and world (again, a central principle underlying EMT).

Work on Material Culture

Work on cognitive ecologies and material culture also galvanized research on EMT (Knappett, 2011; Malafouris, 2004). Research on so-called cognitive artifacts, in the sense of “exograms” (Donald, 1991), “meshworks of things” (Ingold, 2010), “tools for thought” (Dennett, 1996), or “epistemic actions” (Kirsh & Maglio, 1994), showed that things can have a cognitive life of their own, thus spurring proponents of EMT to develop the idea that cognition exists primarily as an enactive relation between and among people and things. The central notion underlying this line of research is material engagement: a synergistic process, involving the material forms that people make or build, through which human cognitive and social life is mediated and often constituted. Culturally specific technologies, which can be said to have formed part of human cognitive architectures since at least the Upper Paleolithic (Donald, 1991), provide an example of material engagement.

Work on Cultural-Historical Psychology

Early work in cultural and developmental psychology conducted by Soviet researchers (most notably Luria, 1994; Vygotsky, 1987) was also critical for the development of some of the core ideas underlying EMT. This work showed how higher-level psychological processes can emerge through the child’s participation in socio-cultural activities and historical processes. The main idea of this influential line of research was that psychological abilities are produced rather than simply inherited, and that the structure of these abilities mirrors the external structure of the actions in which one engages, often in collaborative interaction with others. In brief, this is the idea that the mind is socially developed. This idea profoundly inspired research on EMT, especially recent research on social cognition, which we review in the last section of this chapter.

This brief overview of the different strands of research that prompted and inspired the formulation and development of EMT not only demonstrates EMT’s richness as a theoretical hypothesis but also grounds it firmly in research conducted in the empirical sciences (e.g., neuroscience, psychology, cybernetics, biology, and robotics).

In the remainder of this chapter, we look at the empirical support for EMT and clarify the extent to which researchers in philosophy, psychology, and neuroscience already implicitly assume, or even actively operate with, extended cognition ideas in their everyday work and practice. In particular, in Section “The Extended Mind Thesis: Two Strands of Research”, we introduce the principles and tenets underlying EMT and critically discuss two strands of research on EMT (parity-based and complementarity-based), in order to provide a fuller understanding of the many facets of this theory. We subsequently review work in “traditional fields” (such as memory, vision and action, and language and gesture) that shows how the principles and tenets of this theory have inspired research in the cognitive sciences, thus attesting to the power of EMT as a research paradigm. We then review two major recent applications of EMT (in the fields of social cognition and music perception) that further show how productive these ideas can be for cognitive scientists and psychologists alike. We conclude this review chapter by arguing that EMT is a solid and mature research program with strong practical applications in the empirical sciences.

The Extended Mind Thesis: Two Strands of Research

EMT asserts that mental states and cognitive functions may sometimes supervene on organized systems of processes and mechanisms that criss-cross the boundaries of brain, body, and world. The crucial idea underlying EMT is therefore that some of our cognitive processes can, and actually do, extend outside our heads. According to EMT, cognition does not take place exclusively inside the biological boundary of the individual; on the contrary, it can arise in the dynamical interplay between neural structures, body, and world. EMT claims that these pervasive, intimate, action-orienting, and behavior-guiding interactions result in external features that actively participate in an organism’s mental activity and become functionally integrated into its cognitive superstructure. In other words, structures and processes located outside the human head can become, under certain conditions, parts of the machinery that instantiates cognition. These conditions, known as the glue and trust conditions, were formulated to prevent the mind from spreading rampantly into the world, and were offered as requirements that an external resource must meet to count as part of the mind. According to the glue and trust conditions, an external resource counts as part of our mind only if it is (1) reliably available and typically invoked, (2) easily accessible when required, and (3) more or less automatically endorsed. As Sutton put it: “external systems and other cognitive artifacts are not always simply commodities, for the use and profit of the active mind: rather, in certain circumstances, along with the brain and body which interacts with them, they are the mind” (Sutton, 2010, p. 190).

EMT is often depicted as flowing naturally from functionalist views concerning the “multiple realizability” of cognitive processes (Wheeler, 2010), and indeed one strand of argument for EMT invokes the so-called “parity principle” (PP henceforth). This argument is exemplified by the famous case of Otto and Inga, introduced by Clark and Chalmers (1998), which we discuss next.

Inga, a healthy subject, hears about a new exhibition at the Museum of Modern Art (MoMA) in New York and decides that she wants to see it. Upon hearing this information, Inga uses her biological memory to form or retrieve the belief that MoMA is on 53rd Street, and makes her way downtown. At the same time, Otto hears about the very same exhibition. He also likes the exhibition’s theme and decides to visit it. Unfortunately, Otto suffers from Alzheimer’s disease, which prevents him from reliably using his biological memory to form or retrieve the belief that MoMA is on 53rd Street. However, as a compensatory strategy, Otto has learned to rely upon a notebook, in which he writes down everything he can no longer reliably remember. Otto always keeps his notebook ready to hand, so that when he needs it, he can smoothly retrieve the crucial information from it. In this case, Otto uses the notebook to retrieve the location of MoMA and then sets off.

Clark and Chalmers (1998) ask their readers to compare the cases of Otto and Inga and to consider whether both Otto and Inga should be attributed a standing belief about the location of New York’s MoMA. Clark and Chalmers (1998) hold that “the information contained in Otto’s notebook plays the same causal role in guiding his actions as Inga’s biological memory does in the guidance of her actions” (Kiverstein & Farina, 2011, p. 37). For this reason, they count Otto’s notebook as part of the causal machinery that instantiates his standing beliefs. We should not treat Otto’s case differently from Inga’s, they argue, just because the states that drive Otto’s behavior are partly offloaded onto the environment and therefore located outside Otto’s physical boundary. This is the thought that stands behind PP, which runs as follows:

If, as we confront some task, a part of the world functions as a process which, were it done in the head, we would have no hesitation in recognizing as part of the cognitive process, then that part of the world is (so we claim) part of the cognitive process (Clark & Chalmers, 1998, p. 2).

PP thus invites us to assess whether a state counts as a belief, in part, on the basis of the causal role it performs. For Clark and Chalmers (1998), it does not really matter where the occupant of this causal role is housed: it can be located within the confines of the biological body, or it can span brain, body, and world. For them, what makes something a belief is a matter of the causal relations that this occupant bears to inputs, outputs, and other mental states. In other words, Clark and Chalmers (1998) do not think that the physical details of a state that stands in these causal relations matter when it comes to deciding whether that state counts as a belief.

Alongside this functionalist approach to EMT, a second, equally important strand of research has recently been developed. This second strand emphasizes considerations of “complementarity.” Complementarity approaches (Sutton, 2010) investigate the many ways in which diverse components (neural and extra-neural) of a cognitive system intermingle and function together, forging a novel unit of cognitive analysis and producing complex cognitive behaviors that the user’s brain could not achieve on its own (Farina, 2019). On this understanding of EMT, outer states or processes need not replicate the functions and roles of internal biological ones; rather, different components of a cognitive system can coalesce and reciprocally intermingle in the production of flexible cognitive behavior.

Complementarity themes can in fact already be found in Clark’s seminal work. In “Being There” (1997), Clark highlights the transformative power of artworks, pieces of technology, media, social networks, and institutions for human cognitive behavior, while illustrating how frequently we rely, in rich and interactive ways, on the capacities of specific non-biological features. These extra-cranial features are, Clark argues, quite often “alien but complementary to the brain’s style of storage and computation. The brain need not waste its time replicating such capacities. Rather, it must learn to interface with them in ways that maximally exploit their peculiar virtues” (Clark, 1997, p. 220). Thus, rather than merely causally aiding the production of cognitive behaviors, these complementary, non-biological (extra-neural) factors become “equal (though different) partners in coordinated, coupled larger cognitive systems” (Sutton, 2010, p. 524).

Elsewhere, Clark (1997, p. 99; 2003) further reinforced the idea that external factors can affect our biology in multiple and significant ways, creating augmented systems whose cognitive power goes well beyond that of the naked brain alone. Complementarity is thus clearly a key theme, indeed a trademark, of Clark’s own work. Here, however, it is fair to credit others more fully. While complementarity themes are characteristic of Clark’s seminal works, and can also be found in earlier treatments (such as Rowlands, 1999) that hark back to and build on Wilson (1994) and Haugeland (1998), the idea of singling out complementarity as a distinct route to EMT, one that differs from parity, should be ascribed to Sutton (2002, 2006, 2010), whose treatment subsequently formed the basis for Menary’s (2007) defense of extended cognition in terms of cognitive integration. Complementarity has also recently become a central theme in Rowlands (2010). For a more technical overview of these issues, see Farina (2020).

Having briefly discussed the two strands of research that characterize EMT, and thereby clarified the many facets of this theory, we now turn to a brief review of research in the empirical sciences that is consistent with these ideas. The goal of this review is to show that EMT is a solid and mature theory of cognition, not just a philosophical mantra devoid of empirical content.

Memory

As Sutton (2006) noted, an alliance has been forged between memory research in cognitive psychology and the independent set of ideas in theoretical cognitive science labeled the “extended mind” hypothesis. This alliance has produced an original understanding of the process of reminiscing, which is nowadays often described as a constructive and creative inferential process rather than a merely reproductive one. In recent years, as a result of this alliance, a number of memory theorists (Sutton, 2006; Tribble, 2005) have begun treating both the vehicles of representation in memory and the processes underlying remembering itself as spreading across brain, body, and world. More precisely, extended theorists of memory have argued that stable storage of information over time is, in many cases, possible only against the backdrop of a social context and achievable only through the integration of biological and external materials (such as symbolic, technological, and cultural artifacts). As Sutton and Windhorst put it: “mnemonic stability is often supported by heterogenous external resources as well as, and in complementary interaction with, neural resources” (Sutton & Windhorst, 2009, p. 229).

EMT has therefore been a crucial source of inspiration for many researchers in this field. Rowlands (1999), for instance, drawing on the conceptual palette afforded by EMT, argued that at least some memory processes must be understood as the result of a series of interactions between a remembering organism and its environment. On this basis, he claimed that working memory must be described as “hybrid” in character (similar claims have been made by Wilson, 2005). However, EMT was in turn profoundly shaped by the everyday practice of memory theorists, who provided powerful new insights for its development. In what follows, we briefly summarize research that highlights these two points.

Let us start with the first point: how the principles and tenets underlying EMT have inspired the work of philosophers, anthropologists, and psychologists in memory studies, especially with respect to social interactions (Nelson & Fivush, 2004), augmented memories (Donald, 1991), and collaborative recall (Fivush & Nelson, 2006; Sutton et al., 2010; Tollefsen, 2006; Tribble, 2005).

Nelson and Fivush (2004), two of the major contemporary proponents of the socio-cultural developmental theory of memory, described the emergence of autobiographical memories (episodic and semantic recollections from an individual’s life) as the product of a socio-cultural cognitive system “wherein different components are being opened to experiences over time, wherein experiences vary over time and context, and wherein individual histories determine how social and cognitive sources are combined in varying ways” (p. 487). The idea is thus that autobiographical memory is a cultural activity specific both to the individual and to the society by which it is shaped. As Fivush (2011) puts it:

The ability to create an autobiography, a personal history of self that is continuous in time, with specific events experienced at particular points and linked both to each other and the present, is a complex human skill that relies on multiple component developmental skills, including the development of subjective consciousness, the developing ability to link past self to present self, and the developing ability to construct a personal time (Fivush, 2011, p. 561).

Autobiographical memory, according to Fivush (2011), must therefore be understood as a complex ability with a long developmental history that encompasses both phylogenetic (diachronic) and ontogenetic (synchronic) contributions. It is precisely this link with the ontogenetic level that offers a strong connection with EMT, since EMT is largely a theory of cognition that studies the synchronic interrelations between an individual’s brain and body and their surrounding environment, focusing on the ongoing interactive dance between brain and world through which, by forms of “continuous reciprocal causation,” adaptive action results (Clark, 1997, pp. 163–166).

Donald (1991) also studied the changes to human memory that resulted from the spread of external symbolic representations (exograms), in order to explain how the storage capacity of humans’ biological memory systems was enhanced over the course of human cultural evolution. According to Donald (1991), exograms allowed humans to augment their working memory capacity by manipulating complex representations that shaped the biophysical and biochemical functioning of their internal biological memory storage system (what Donald calls “engrams”). This created distributed hybrid networks of memories that dramatically transformed our cognitive profile, allowing us to retain crucial information over time more safely than our fragile biological memories could. The link between this research and research on EMT is, again, quite evident: it rests on the idea of enhancing cognitive processes (memory, in this case) through socially engaged and distributed partnerships with rich socio-cultural environments. Yet the subfield of memory studies on which research on extended cognition has probably exerted the greatest influence is the study of collaborative recall.

Tribble (2005) investigated how actors in early modern theater were able to meet the challenge of performing an astounding number of plays every week by exploiting the social and physical scaffolding of their environments, that is, by relying on the interplay of biological recall, specially engineered spaces, and clever socio-cultural practices. In a fascinating study, Tollefsen (2006) further explored the space of interaction between memory, social ontology, and extended cognition. In particular, Tollefsen (2006) focused on socially distributed transactive memory and used experimental work on collective recall to explain the formation of intentional and epistemic properties in groups of individuals, and thus the emergence of group (collective) minds (Kiverstein et al., 2013). Fivush (2007), for her part, tested the claim that reminiscing is an intrinsically social activity, looking specifically at how mothers reminisce with their children. Fivush (2007) found that highly elaborative mothers, who engage in long conversations about the past with their children, usually have children who tell more emotionally expressive narratives, display a better understanding of the self, and produce more elaborated memories of their past than children whose mothers are less elaborative.

All of the studies reviewed in this section show how principles and tenets underlying research on EMT (such as the idea of synchronic interactions among individuals, and that of memory capacity augmented through proactive collaborations or partnerships with external props) inspired, and to some extent even guided and directed, the practice of many memory theorists. However, we should also acknowledge that research on EMT was, in turn, deeply influenced by work carried out in memory studies. Sutton et al. (2010), for instance, offered a detailed, multidimensional account of collaborative recall in older couples, which showed how collaboration may facilitate or hinder memory in dyads of married individuals. This account was used to articulate and further develop the second, complementarity-based strand of EMT discussed in Section “The Extended Mind Thesis: Two Strands of Research”, a strand that supports an extended and socially distributed view of memory and cognition.

Having reviewed work in the field of memory studies that clearly showcases the heuristic value of EMT, we next turn to research in another important domain, vision and action, where this heuristic value is also quite evident.

Vision and Action

A substantial body of research on so-called animate vision (Ballard et al., 1997; Churchland et al., 1994) has demonstrated the embodied and active nature of visual perception. Work on animate vision rejected what Churchland et al. (1994) dubbed the paradigm of pure vision, “the idea that vision is largely a means of creating a world model rich enough to let us throw the world away” (Clark, 1999, p. 345), and instead gave action a starring role. Vision, Churchland and colleagues write, “has its evolutionary rationale rooted in improved motor control” (Churchland et al., 1994, p. 25). Research on animate vision thus endorsed a view in which a perceiver uses her body and various other structures in the environment to offload perceptual processing onto the world (Farina, 2020, p. 81).

As an illustration of this point, consider Dana Ballard’s work. Ballard et al. (1997) performed a series of experiments showing how saccades (rapid eye movements) can be used to access task-relevant information in the world. This work demonstrated that, in order to avoid maintaining and updating costly, enduring, and detailed internal models of our visual surroundings, we normally sample the environment in ways suited to the particular needs of the moment (Farina, 2020, p. 81). Rowlands (2006) offered a fascinating account of normative actions (which he calls deeds) that is independent of any connection to prior intentional states, and thus attuned to EMT, and argued that the saccadic deeds involved in certain acquired, skilled movements, such as cricket batting, have a function that is deeply rooted in the individual and collective history of the skillful practice in question. Hurley (2001) also defended an extended account of visual perception in which real-time, action-oriented physical interactions with the surrounding environment are necessary for furnishing the content of visual experience. Likewise, Noë (2004) famously argued that visual perception is not something that happens to us, or in us; rather, it is something that we do. In other words, perceptual experiences ultimately depend on capacities for action and thought, and vision is a kind of skillful activity involving the body as a whole.

More recently, this approach to vision has been further developed by Lauwereyns (2012), in sharp critical reference to Marr and Poggio’s (1979) computational account of vision. His “intensive approach to vision is a combination of classic computational theories of perception (à la Marr, 1982) that say that vision is essentially a top-down process, and less conservative accounts (à la Noë, 2004) that emphasize the pervasive sensorimotor nature of perceptual experience and the role that (bottom-up) sensorimotor engagements play in visual processes” (Farina, 2013a, p. 1036).

Summing up, we can thus say that nowadays vision is widely understood (Parr & Friston, 2017) to be an activity in which a perceiver actively uses various structures in the environment to offload cognitive processing onto the world and to determine the content of her experience. Vision is thus fundamentally tied to motor representations (Bruineberg et al., 2016; Zimmermann & Lappe, 2016).

Research on sensory substitution devices (SSDs) offers a particularly clear example of how some of the basic principles underlying work on EMT are expressed in the field of vision and action. The term “sensory substitution” refers to the use of one sensory modality to supply environmental information normally gathered by another sense (Farina, 2019). SSDs thus provide, through one sensory modality (the substituting modality), access to features of the world that are generally experienced through another sensory modality (the substituted modality). There are two main classes of SSDs: visual-to-auditory systems (such as the vOICe), which translate visual images taken by a video camera into auditory soundscapes, and visual-to-tactile systems (such as the BrainPort), which translate visual inputs into tactile (e.g., vibrotactile or electrotactile) stimulation. Much of the philosophical debate surrounding SSDs has centered on two questions: which sensory modality the use of these devices engages, and whether the coupling between visually impaired users and these devices can augment the users’ cognitive abilities (Macpherson, 2018; Matthen, 2015).

Auvray and Myin (2009) argued that SSD use triggers in proficient users the acquisition of a new sensory modality, which is complementary to those standardly recruited. Kiverstein et al. (2015) echoed this understanding and pointed out that persistent SSD usage delivers a new mode of perception that is not reducible to that of any single existing sense or any combinations of existing senses. Instead, under certain circumstances, a given cortical brain region—e.g., sensory cortex—can exhibit functional plasticity and support novel perceptual processing irrespective of the nature of the sensory inputs that are being sent to it. Farina (2013b) and Auvray and Farina (2017) agreed that the coupling between the user and the device may trigger an entirely new sensory modality but reflected on whether this new modality may count as a form of artificially induced synesthesia. Finally, Kiverstein and Farina (2012) studied the partnership between the user and the SSD, and argued that it enables cognitive transformations that gradually allow the device to become fully transparent and be part of the machinery that instantiates cognition, leading the user to experience new phenomenal occurrences.

Complementarity approaches (Clark, 1997; Sutton, 2010), as we have seen, investigate the many ways in which diverse components (neural and extra-neural) of a cognitive system intermingle and function together, forging a novel unit of cognitive analysis and giving rise to complex cognitive behaviors that the user's brain could not produce on its own. Research on SSDs provides important empirical applications of this set of ideas. By giving experienced users a new phenomenal way of accessing the world, SSDs become a paradigmatic example of cognitive augmentation and help create a new space of biotechnological synthesis between the user, the device, and the world in which it is used, a space in which incorporation and transparency take place.

Having briefly explored the points of connection between research on EMT and work in the empirical sciences in the field of vision and action, we can now turn to investigate how EMT can be used to shed some light on the relationship between language and gesture.

Language and Gesture

Language and gesture are two fields in which research on EMT has proven to be a good source of inspiration for many researchers and in which its underlying principles have been consistently and successfully applied. We discuss them separately in this section, starting with language.

Work on the extended mind (e.g., Clark & Thornton, 1997) describes language as a kind of meta-tool enabling a variety of cognitive extensions. Clark (2006), for instance, argued that language (understood as a public tool) not only gives us opportunities for communicating our thoughts but can also be used to enhance our cognitive skills (an understanding also found in Dennett, 1984). This understanding of language has been quite influential both within philosophy and in the empirical sciences. For instance, Boysen et al. (1996) showed how chimpanzees can be trained to use symbol tokens to decompose high-level cognitive tasks into simple pattern-matching tasks, which they could then easily solve. Deacon (1997) developed an influential theory of language according to which our capacity for abstract thought, logical reasoning, and compositionally structured, counterfactual thinking is invariably dependent on public language, the tool that ultimately allowed our minds to evolve (Kiverstein et al., 2013).

In a similar vein, Clark (1996) claimed that language is a joint activity, involving individual and social processes, that emerges when listeners and speakers perform their actions “in ensembles,” as a single cognitive unit of analysis. Language can thus be considered (in accordance with the principles underlying EMT) as a heterogeneous set of physical, cognitive, and socio-cultural activities unfolding in real time across multiple timescales, which can be viewed as a sort of augmenting property (Spurrett & Cowley, 2004) that emerges when we begin coordinating our life worlds with each other, behaviorally and cognitively (Cowley & Spurrett, 2003).

Language, however, often relies on gestures. Gestures can convey meanings, often more effectively than language can (Jamalian et al., 2013). In addition, gestures can help an audience understand a message by showing how ideas are transformed and how one thought relates to the next (Chu & Kita, 2011). Gestures can also help speakers formulate their thoughts and can enhance comprehension and learning (Chu & Kita, 2011; Jamalian et al., 2013; Kang et al., 2012).

Clark (2008) discussed the role of gestures in cognition extensively. In particular, he defended the idea that gestures, at least on some occasions, may count as an example of a non-neural activity that becomes part of the machinery that instantiates intra-personal thinking and higher-level reasoning. Other researchers, among them cognitive scientists, anthropologists, and psychologists, have endorsed the claim that gestures play a similarly constitutive role in inter-personal cognitive engagement: on this view, gestures perform a crucially important expressive role in human communication and themselves become constitutive parts of the processes underlying human thought (Alač & Hutchins, 2004; Hutchins, 1995; Latour, 1984; McNeill, 2005; Radman, 2013). For instance, in a fascinating paper inspired by Clark’s seminal work (1997), Alač and Hutchins (2004) discussed an example of how gestures can become vehicles of mental states in intersubjective interactions. The example involves “an expert scientist interacting with a novice with the aim of teaching her how to grasp the meaning of complex FMRI scans. Alač and Hutchins (2004) show that the gestures performed by the expert are instrumental, constitutive components in the process of teaching the novice scientist how to rightly understand and grasp the meaning of certain structures in brain scans” (Ciancarini et al., 2021a, p. 71945) (see Gangopadhyay, 2011, for a detailed discussion of this experiment).

This is thus a case in which the gestures performed by the more experienced group member enabled the less experienced one to acquire a fundamental cognitive skill. It therefore clearly shows how gestures (an example of a non-neural activity) can profoundly shape the learning of a cognitive skill (here, learning to see certain patterns in fMRI images). Gestures can thus assist the learning process by scaffolding understanding toward more advanced stages of cognitive competence (Krueger, 2013, p. 42). It has also been shown that students who mirror teachers’ gestures learn more quickly and more effectively than those who do not (Cook & Goldin-Meadow, 2006). This is because gestures are often used to guide someone’s attention and to represent both actions and ideas (Becvar et al., 2008). Another example of how gestures can become constitutive of certain cognitive abilities is offered by Cappuccio et al., who investigated how pointing (a type of gesture) becomes instrumental, and thus constitutive, for representing geometrical information concerning one’s own gaze direction. Finally, Goldin-Meadow and Singer (2003) studied how gestural movements not only help us express our thoughts but also play an essential role in teaching concepts and ideas.

The work on language and gesture reviewed in this section shows how the principles of EMT (such as the idea of language as an augmenting property that arises in real-time interactions, across multiple timescales, from a heterogeneous set of physical, cognitive, and socio-cultural activities, or the understanding of gestures as enhancing comprehension and learning) have inspired research in these two domains in important ways, further attesting to EMT’s positive heuristic value.

Having drawn some important connections between EMT and three major domains of research (memory, vision and action, and language and gesture) in which the principles underlying EMT have been applied, we next review two more recent applications of EMT (in the fields of social cognition and music perception) that further attest to how influential these ideas can be for philosophers, social psychologists, and cognitive scientists. The goal of the following discussion, besides showing that EMT is a solid research program, is to broaden the debate beyond the traditional fields and thus to demonstrate how these “theoretical” considerations can ramify into the wider world.

Two Recent Developments

Social Cognition

A number of researchers have attempted to apply some of the principles underlying research on EMT to the domain of social cognition (Fuchs & De Jaegher, 2009; Gallagher, 2013; Gallagher & Crisafi, 2009), developing a new line of research (the so-called Socially Extended Mind), which claims that the mind is extended not only into the physical environment but also into enactive engagements with the social milieu. The idea underlying this line of research is therefore that social cognition is not fully reducible to the workings of individual cognitive mechanisms, and that socially interactive or distributed processes can often complement and even replace them.

The first paper to make this point clearly was written by Gallagher and Crisafi (2009), who argued that institutions (such as legal systems, educational systems, or cultural organizations like museums) are often the expression of minds that are externalized and extended into the world. According to the authors, these institutions are “mental institutions” because they help us accomplish sophisticated cognitive tasks that we would not be able to accomplish as individuals. The basic idea here is therefore that the cognitive processes characterizing these institutions do not necessarily happen in individual brains; rather, they occur in socially structured interactions that are made possible by the very existence of these “mental institutions.” Building on this idea, Gallagher (2013) further claimed that social-institutional practices allow those who are capable of acting through them to extend their minds socially, in terms of both vehicles and contents.

The question of whether social mental extension is possible, based on considerations about the status of groups of people acting together, was also central to Theiner et al. (2010), who claimed “that groups have organization-dependent cognitive capacities that go beyond the simple aggregation of the cognitive capacities of individuals” (p. 378). Ludwig (2017) critically examined the collective minds hypothesis just described (the possibility that an institutional system of cognitive processes could be considered a cognitive agent) and challenged the idea of a socially extended mind. He argued that we ought not to treat corporations or institutions as cognitive agents, because socially situated minds do not necessarily give rise to collective minds: on his view, the mechanisms and processes that characterize socially salient forms of mental activity are traits of individuals, not of groups. More recently, and in disagreement with Ludwig (2017), Gallotti and Huebner (2017) framed this work on extended minds within the broader conceptual palette afforded by research on socially distributed cognition. In particular, the authors defended a view that emphasizes the ineliminable social dimension of individual minds. Finally, Lyre (2018) studied the mechanisms of shared intentionality in human beings and argued “that they can too be considered as coupling mechanisms of cognitive extension into the social domain” (p. 831).

The field of extended social cognition is relatively young (about 10 years old) and still very fluid. Yet the brief review in this section shows that ideas underlying this research (such as the idea of mental institutions defended by Gallagher, 2013) have been inspired by tenets and principles of EMT. In addition, this work on socially extended minds stresses the instrumental role of non-neural factors (in this case, mental institutions) in forging novel cognitive and social abilities. As such, this work is strongly attuned to the complementarity version of EMT described earlier.

Having looked at how ideas underlying EMT are inspiring research in the field of social cognition, we next turn to analyze another field where EMT has exerted extraordinary influence: the field of music cognition.

Music Cognition

To date, three major accounts of music cognition, inspired by EMT, have been developed. These are (a) Cochrane’s “expression and extended cognition” (2008); (b) Krueger’s (2014, 2015) “musically extended emotional mind”; and (c) Kersten’s “extended computational view of music cognition” (2015, 2017).

Some of the ideas characterizing these accounts, and underlying EMT more generally, can in fact already be found in Clarke’s seminal work (2005), which proposed an ecological approach to music perception. In particular, Clarke investigated the relationship between music, motion, and subjectivity, and stressed the importance of understanding music as a relationship between active perceivers and rich, highly scaffoldable environments.

However, the first fully fledged account of extended music perception was proposed by Cochrane (2008), who argued (i) that music is a cognitively extended process and (ii) that musicians often use music to enhance the formation of their emotional states (Cochrane, 2008, p. 335). The idea underlying Cochrane’s proposal is that the artistic medium (the particular musical instrument) used by the musician allows her to manipulate the music, thus extending her cognitive repertoire and enabling the creation and production of new musical pieces. On Cochrane’s view, this allows the musician to reflect on and elaborate her emotional states, so that eventually the musical experience replaces the body as the central focus of the emotional content.

Building on Cochrane’s seminal account, Krueger (2014) argued that music is a powerful environmental resource for extending our emotions. On the basis of a series of neuroimaging studies showing that our emotional responses to music activate brain structures involved in generating, detecting, maintaining, and regulating emotions (Koelsch, 2014; Overy & Molnar-Szakacs, 2009), Krueger claimed that music perception and emotional experience are closely coupled processes (Krueger, 2014). He writes: “[M]usical affordances provide resources and feedback that loop back onto us and, in doing so, enhance the functional complexity of various motor, attentional, and regulative capacities responsible for generating and sustaining emotional experiences. It is thus sensible to speak of the musically extended (emotional) mind” (Krueger, 2014, p. 4). The idea is therefore that music, through its auditory affordances, can intensify our social experiences and, in regulating our emotions, can also alter our behavior.

This research, in emphasizing the role of affordances and the complementary contributions of the environment to music perception and experience, was clearly inspired by the complementarity-based approach to EMT that we discussed earlier on. However, not all research in the field of music cognition can be said to draw from complementarity considerations. Some researchers have stressed the profound link between the functionalist (computationalist-based) view of EMT and music perception. In particular, an account of music perception based on the functionalist strand of research was recently proposed by Kersten (2015, 2017).

In his view, music is an extended phenomenon, but one in which the enacted relation with the emotional states of musicians is not so crucial. What is crucial on Kersten’s account is instead the computational, information-processing nature of music perception. The idea is thus that music perception can be understood in light of the wide computationalist framework developed by Wilson (1994), according to which at least some of the units of a computational system reside outside the individual. Wilson (1994) specifies certain conditions that a wide computational cognitive system must fulfill. We do not have the space to review these conditions here, but suffice it to say that Kersten (2015, 2017) argues that these conditions are fully satisfied in the case of music perception and, on these grounds, goes on to substantiate his wide computationalist approach.

In this section, we reviewed recent work on socially extended cognition and extended music perception and showed how this work was inspired by different strands of research underlying EMT. Next, we draw a short conclusion, summarizing what we have achieved so far, and point to some limitations affecting the theory.

Conclusion

In this chapter, we looked at how EMT inspired several lines of research in philosophy and the cognitive sciences. We reviewed empirical support for EMT and clarified the extent to which researchers in the empirical sciences have sometimes used principles and tenets underlying EMT to further their research. We also looked at how this has sometimes influenced the development of EMT itself.

Of course, EMT has important limitations, and we do not believe that it can be meaningfully used to explain all sorts of phenomena in cognitive psychology. There are indeed some domains (such as concepts and prototypes, priming effects, word frequency, sentence processing, and possibly face perception) in which EMT will probably not be particularly helpful to psychologists. Yet, after reviewing all of these studies, the modest conclusion that this chapter wants to draw (and that should by now be evident to the reader) is that EMT is a solid and mature research program with strong practical applications in the empirical sciences, whose importance stretches far beyond the philosophical arena into the wider society.