Introduction

In the last decades, the theory of “embodied language” has been widely discussed in neuroscience (Barsalou 1999, 2008; Buccino et al. 2016; Fischer and Zwaan 2008; Gallese 2003; Gallese and Lakoff 2005; Glenberg 1997; Glenberg and Robertson 1999; Jirak et al. 2010; Lakoff 1987; Pulvermueller 1999, 2002; Zwaan 2004; Zwaan and Taylor 2006). The theory claims that the same neural structures involved in making sensory, motor, and even emotional experiences are also involved in understanding linguistic material related to those experiences. This approach contrasts with the “classical view”, claiming that language is essentially a-modal and mastered by specifically dedicated neural structures (e.g.: Fodor 1975; Pylyshyn 1984; Mahon and Caramazza 2005, 2008; Chatterjee 2010).

One can say that embodied language increasingly configures itself as an emerging scientific paradigm (Kuhn 1962) for studying language from the neuroscientific standpoint. Since it progressively and successfully addresses specific issues about human language and its brain correlates, embodiment may be considered as a “progressive research program” (Lakatos 1970). Moreover, it shows a significant potential for future achievements: it is “prospectively fruitful” (Colagè 2014; see also Auletta et al. 2011, pp. 27–50). As with any emerging paradigmatic approach, however, embodied language has to face several issues that are hard to explain within its theoretical framework.

The embodied approach to language has achieved significant empirical results, especially as far as words expressing concrete contents are concerned. For example, during the processing of verbs expressing concrete actions, there is a clear involvement of different sectors of the motor system where the effectors involved in the actual execution of those actions are motorically represented (Hauk 2004; Buccino et al. 2005; Tettamanti et al. 2005; Aziz-Zadeh et al. 2006). Moreover, behavioral results (Buccino et al. 2005; Boulenger et al. 2006; Sato et al. 2008; Dalla Volta et al. 2009) show that the motor responses are modulated by processing verbs involving the responding effector. Similar results were obtained during the processing of nouns (Marino et al. 2013, 2014; Gough et al. 2012, 2013; Tucker and Ellis 2004; for review on object processing, see Martin 2007), even when expressed in a second language (Buccino et al. 2017).

By reviewing the empirical evidence supporting the embodied perspective for concrete language, a recent paper (Buccino et al. 2016) also suggested that the meaning of a linguistic expression derives from actual, real-life experiences, and is, therefore, grounded in the neural substrates underpinning those experiences. From this viewpoint, a major challenge for embodiment is to explain how abstract language is coded in the brain, as it is hard to see how the meaning of abstract linguistic items should rely on modality-specific brain systems and ultimately derive from the speakers’ concrete experiences. This difficulty is reflected by the relative scarcity of empirical findings concerning the brain correlates of abstract language processing and by the variety of theoretical stances about this issue (for recent reviews, see Binder et al. 2009; Wang et al. 2010; Kemmerer 2015).

The main aim of the present review is to discuss abstract language within the embodied approach. First, we will inquire into key conceptual issues, trying to clarify the very notion of abstractness in the light of Western philosophical thought, with a special focus on the British empiricist tradition. We will argue that words usually considered abstract in the scientific literature owe their “abstractness” to the high complexity of the experiential clusters to which they refer, rather than to a supposed detachment from experience. Second, we will review relevant theoretical and psychological approaches and empirical findings concerning the embodied approach to abstract language. Finally, we will bridge the previous sections by spelling out the consequences of this understanding of abstractness. In reviewing the current literature on abstract language, we will keep the same perspective taken in a previous review (Buccino et al. 2016) devoted to concrete language. We will argue that experience is at the basis of abstract words as well as of concrete ones, the difference lying in the complexity of the related experiences and the underlying neural substrates. We will propose that the neural substrates engaged in processing abstract language are not distinct from those engaged by concrete language; rather, these very same substrates intervene in processing abstract words in a combined manner. Such a combined interaction grounds the complexity of the meaning of abstract words. We will define this complexity in terms of an increase in (1) the number of biological effectors (hand, foot, and mouth) recruited; (2) the number of systems (sensory, motor, and emotional) involved; and (3) the number and variety of associated contexts and situations.

Abstractness in philosophy

In present-day dictionaries (e.g., the Oxford Dictionary), the term “abstract” is usually defined as something related to thoughts and ideas that do not have concrete or physical existence, or as something coming from pure reasoning and unbound from actual experiences and events. Given this understanding of abstractness, abstract linguistic expressions would seem, almost in principle, impossible to deal with in terms of embodied language. However, the Western philosophical tradition offers an understanding of abstractness that turns out to be insightful from the embodied language perspective.

According to the Aristotelian standpoint, ideas or essences are not separated entities, but exist in their being embedded in material and concrete objects (see Book V of Aristotle’s Metaphysics). Thus, human beings can grasp concepts as they can “abstract” (extrapolate) them from physical instantiations (see, e.g., Aristotle, De Anima, 429 b 11). Note that abstraction, in this context, does not refer to categories of things or to notions that are only present in the mind and do not have a counterpart in concrete reality; rather, abstraction points at a general process through which any concept is formed. The Aristotelian viewpoint underwent developments that are interesting for what follows in the next sections, especially as far as the British empiricist tradition is concerned. For the aim of the present review, we will focus on John Locke’s thought.

At the beginning of Book II of An Essay Concerning Human Understanding (1690), Locke asks whence human beings can gain their ideas, i.e., all the “materials of reason and knowledge” constituting their thoughts. The answer is as simple as it is thorough: from experience. More specifically, he states that the only ground of ideas is sensations (i.e., the affection that external reality exerts on our senses) and reflections (i.e., the mind’s consideration of its own operations). In Locke’s view, ideas can be divided into two broad classes, i.e., simple ideas and complex ones. Simple ideas correspond to elementary aspects of external reality as grasped by our senses, such as the coldness or hardness of ice, the whiteness of a lily, or the sweetness of sugar. Complex ideas (e.g., beauty, gratitude, man, army, and universe) are built up from many different simple ideas already acquired from experience. Therefore, in Locke’s view, complex ideas have the same origin as the simple ones, i.e., experience (see Locke 1690, Book II, Ch. 12, n. 8). Complex ideas are such because of the complexity of the experiences to which they refer. It is because of this complexity that these ideas are apparently further from experience than concrete ones.

Interestingly, Book III of Locke’s Essay (1690) is about words, understood as sounds that are signs of ideas. In this respect, he distinguishes proper names (e.g., Aristotle, Napoleon, Mount Everest, Piccadilly Circus, etc.) from general terms. Note that the names of both simple and complex ideas (as defined above) belong to the class of general terms. Thus, both concrete and abstract words are included in the category of general terms. Consequently, the meaning of general terms is always grounded in the experience they point at; the only difference is that the experience is simpler in the case of nouns of simple ideas and more complex for nouns of complex ideas.

Locke’s philosophy might seem naïve in comparison with the complications of post-linguistic-turn philosophy of language, as the linguistic turn tended to link tightly the issue of language with those of formal logic and truth (e.g., Frege 1892; Russell 1905; see also Colagè 2013). However, philosophers like Willard Van Orman Quine (1953), Wilfrid Sellars (1950, 1956), and Richard Rorty (1979) essentially addressed Locke’s problems about ideas and words. Rorty says that: “The picture of ancient and medieval philosophy as concerned with things, the philosophy of the seventeenth through the nineteenth centuries with ideas, and the enlightened contemporary philosophical scene with words has considerable plausibility” (Rorty 1979, p. 263). The point that we would like to stress is that Locke’s thought (as representative of the empiricist tradition) allows one to identify experience as the ground for humans’ dealing with things, ideas, and words. This seems particularly helpful in addressing the issue of linguistic meaning (especially of the so-called “abstract” language) from the viewpoint of the neuroscience of language in the embodied perspective. To this aim, Locke’s insights may be summarized as follows:

  1. Words express ideas. As ideas are rooted in actual experiences, words come to name simple or complex experiences.

  2. With the sole exception of proper names, all words express general ideas and not ideas of single particular things. This means that, for example, both words like “cup” and words like “virtue” are abstract words. Consequently, the dictionary definitions of the term “abstract” summarized at the beginning of this section are not entirely coherent with such a conception of abstractness.

  3. When considering different kinds of ideas, therefore, the key distinction is not between abstract and concrete ones, but between more or less complex ones. Words like “cup” are different from words like “virtue” because of the different degree of complexity of the underlying experiences. Recall that for Locke ideas always come from experiences, either directly or indirectly.

Theoretical frameworks in psychology and neuroscience

In this section, we will present the main theoretical frameworks elaborated in the last 50 years to address the issue of abstract language.

Starting in the 1970s, psychologist Allan Paivio proposed the so-called “dual-coding” theory, according to which cognitive processes involve the activity of two distinct systems: a verbal system (operating in the language domain) and a non-verbal, “imagery” system dedicated to real objects and events (Paivio 1971, 1986, 1991). According to Paivio, these two systems are built upon distinct internal representation units: the “logogens” for the verbal system, and the “imagens” for the non-verbal one. Logogens and imagens are modality-specific, and activate when an individual recognizes, manipulates, or simply thinks of words or objects, respectively. This implies that logogens represent specific sensory or motor properties relative to verbal labels, whereas imagens represent object properties from different sensory modalities. For instance, an imagen would represent the shape and color of an apple in the visual format, and another imagen would represent the scent of the apple in the olfactory format. On the other hand, a logogen for the word “apple” would represent the sound of the word, and another logogen would represent the set of motor commands to utter the word. Given the difficulty in imagining the content of abstract words, the dual-coding theory claims that abstract words are represented only through logogens, whereas concrete words would activate both logogens and imagens, thus having a dual codification. The dual codification of concrete words, which also involves the non-verbal system, should be at the basis of the well-known “concreteness effect”, according to which concrete words have an advantage over abstract ones in terms of both recall and recognition (James 1975; Whaley 1978; Rubin 1980).

Another proposal, labeled the “context-availability” approach, instead emphasizes that abstract and concrete words prompt different degrees of accessibility to their meanings as stored in semantic memory (Schwanenflugel 1991). Semantic information would be coded in a single a-modal format (i.e., a format independent of sensory and motor systems). Meaning retrieval is easier for a highly contextualized word because of strong connections formed between the phonological and/or orthographical characteristics of the word and (one of) its meanings. When presented in isolation, abstract words are difficult to understand, because they usually have multiple meanings; they could be more easily understood if or when the context provides elements to disambiguate their meanings (Schwanenflugel et al. 1988; Schwanenflugel and Shoben 1983; Schwanenflugel and Stowe 1989). Concrete words are more easily understood as they are more steadily and univocally linked to a physical referent (concreteness effect). For the context-availability theory, the concreteness effect depends on the greater ease with which the meaning of concrete words is retrieved from semantic memory.

Glenberg and Robertson (1999) put forward the so-called “indexical hypothesis” of language acquisition, which attributes a key role to the brain motor system for processing both concrete and abstract action verbs. According to this hypothesis, for example, children learn the meaning of a verb expressing the transfer of an object, like “to give”, as they associate an action schema with the verb. Such an action schema is specified by a set of parameters like the direction of movement (e.g., from a giver to a receiver) and a transfer mode (e.g., a specific hand-prehension suitable for object transfer). Action schemas are essential for motor control and are coded in the pre-motor cortex. During language acquisition, an action schema is repeatedly associated, for example, with usages of the verb “to give”, so that the action schema itself would also come to ground the meaning of the verb. By modifying the parameters of the transfer mode, the same action schema can be applied to other verbs like “to deliver” or “to throw”. Eventually, the action schema might become the ground for linguistic expressions of abstract transfer modes (Glenberg and Kaschak 2002), as in the sentence “Lisa tells the tale to the child”. In this case, one could consider the action of telling (i.e., a mode of verbal communication) as a specific value of the transfer-mode parameter: Lisa (the agent) transfers (or “gives”) the tale (the “object”) to the child (the receiver) by uttering a sequence of words (the transfer mode).

Other developments stressed that abstract language, like concrete language, can be ultimately rooted in sensorimotor experiences (Barsalou 1999; Barsalou and Wiemer-Hastings 2005; Wilson-Mendenhall et al. 2011, 2013a, b; Kiefer and Pulvermueller 2012; Pulvermüller 2013). Following these developments, abstract contents are seen as usually integrated in concrete situations, so that the repeated exposure to such situations may give rise to the meaning of abstract terms. Take, for instance, the assertion “the cup is on the table”. To check whether this assertion is true, an individual typically forms the image of a cup on a table (i.e., re-enacts previous experiences of cups and tables) and compares this image with the situation at hand. If the image corresponds to the real, concrete situation, then the truth of the assertion is inferred. Now, the concepts of true and truth may emerge from repeated experiences of this kind, so that the meaning of the abstract word “true” (or “truth”) is still rooted in actual experiences (though at a more complex level than for concrete words, like “cup”). There may be contexts, like formal logic or high-level mathematics, in which it is not straightforward to ascertain the embodiment of some notions, e.g., that of differential equation (see, e.g., Arbib et al. 2014). This represents a challenge for embodiment that deserves specific treatment and falls outside the scope of the present review.

A further model (Vigliocco et al. 2009; Kousta et al. 2011) has recently proposed that the meaning of abstract language can be directly grounded in emotional experiences rather than in sensory and motor ones. Emotional experiences stem from the perception of specific internal states of the organism (e.g., an acceleration in heart rate, a sudden sweat, etc.) in response to certain environmental stimuli (like seeing a wild beast or receiving a harsh scolding from one’s boss) which induce an emotional state (like fear or shame). Consequently, according to this model, while the meaning of concrete language is primarily rooted in neural substrates underpinning sensorimotor experiences of the external world, the meaning of abstract language is grounded in neural substrates involved in processing internal, emotional states.

Another recent proposal, known as the “Words as Tools” (WAT) approach (Borghi and Cimatti 2009, 2012; Borghi et al. 2017), shares with the general embodied approach the idea that all words are grounded in the neural substrates for actions and experiences, but also suggests that the embodiment of abstract words differs in part from that of concrete ones. The WAT approach stresses that concrete and abstract terms are learnt in different contexts and at different ages. The meaning of a concrete word usually derives from the direct interaction with the word’s referent (an object or an event); moreover, many concrete notions are formed in individuals well before they acquire any linguistic competence. The situation is different for abstract notions, like those of God or virtue. The meaning of the word “God” is not primarily grounded in the direct experience that somebody can have of God; rather, according to WAT, the word “God” is used following conventions established by the community and the social context in which individuals live, act, and speak, and ultimately relies on verbal information. The social origin of the meaning of such abstract words depends on the fact that their use is controlled by collectively shared rules. Such socially specified rules lead individuals to select a set of bodily states (as well as of internal and external experiences) that come to define the meaning of a certain abstract word. Therefore, the social use of an abstract word affects the formation of its meaning in any individual exposed to a particular social context: in this sense, abstract words are social tools. Consequently, the WAT approach stresses that the meaning of concrete words mainly relies on perception and action, whereas the meaning of abstract words, being mainly based on social sharing, primarily resorts to a dedicated language system (Borghi et al. 2011, 2017).

It is worth recalling that, also in the field of linguistics, some authors (Lakoff and Johnson 1980) stressed that several abstract contents are linked to more concrete ones, and expressed through metaphors. For example, commonly used abstract concepts like love, time, and conflict may be expressed through concrete nouns like “journey”, “money”, and “war”. Talking about a romantic relationship, one can use expressions like: “We have come a long way”, “this love story leads nowhere”, etc. The authors claim that our conceptual system is for the most part metaphorically structured. For example, we conceptualize what is not physically tangible in terms of what is physically tangible (such as our space notions as drawn from our physical interaction with the environment). Thus, Lakoff and Johnson (1980) argue that abstract contents are understood by linking them to concrete contents directly based on our sensory experience. Similarly, in the field of cognitive linguistics, it has been proposed that the conceptual metaphors at the basis of idiomatic or even poetical expressions are grounded in recurring bodily experiences (Gibbs 1992; Gibbs et al. 2004). The initial formulation of this kind of approach did not provide any hypothesis about the neural substrates mediating words’ meanings. Later developments (Lakoff 1987; Gibbs and Steen 1999) proposed that the representation of an abstract content can be rooted, at least partly, in the same neural substrates mediating our experiences of the physical world.

As a whole, reviewing the theoretical frameworks proposed to explain how the brain may code abstract words reveals that, even among the supporters of embodiment, there is no single view comparable to the one held for concrete words. Actually, in no case are specific neural substrates and mechanisms considered the only elements necessary and sufficient to process abstract words. Besides and beyond modal aspects, additional a-modal aspects and mechanisms are invoked to fully grasp abstract contents. As underlined in some recent reviews on this topic (Dove 2016; Borghi et al. 2017), hybrid models that take into account modal and a-modal aspects are generally considered as better means to explain the processing of abstract language. In this respect, Dove (2016) has suggested that abstract concepts pose at least three distinct problems for embodiment: the problem of generalization (i.e., the capability of building super-ordinate concepts encompassing several subordinate ones), the problem of flexibility (i.e., the fact that a number of factors, such as physical environments, situations, body states, and current tasks, affect the way concepts are realized), and the problem of disembodiment (i.e., the idea that at least some cases of abstract concepts seem strongly divorced from experiential factors).

Abstract language in the brain

In this section, we will review the available experimental data, collected through neurophysiological, behavioral, and brain-imaging techniques, concerning the neural substrates of abstract language. We will also underline how these findings support one or the other theoretical framework reviewed above.

As we have seen, the dual-coding theory hypothesizes the existence of a verbal cognitive system, common to both abstract and concrete language, and an image-based cognitive system specific to concrete language only. According to Paivio (1986), the verbal system would be located in the left, language-specific hemisphere, whereas the image-based one would be spread in both hemispheres. On the contrary, the context-availability theory hypothesizes a single semantic system that concrete words would activate in a stronger way than the abstract ones because of their association to a richer context.

Clinical studies on split-brain patients or patients with other cortical lesions (Coslett and Monsul 1994; Coslett and Saffran 1989; Coltheart et al. 1980; Zaidel 1978), as well as electrophysiological studies (Nittono et al. 2002; Holcomb et al. 1999; Kounios and Holcomb 1994), seem to suggest that the right hemisphere is much more involved in processing concrete words. The left hemisphere, instead, is similarly involved in both concrete and abstract terms. However, functional Magnetic Resonance Imaging (fMRI) studies failed to show a specific role of the right hemisphere in processing concrete words (Noppeney and Price 2004; Fiebach and Friederici 2003; Grossman et al. 2002; Friederici et al. 2000; Kiehl et al. 1999; Perani et al. 1999). Thus, the debate about the degree of specialization of the two hemispheres for either abstract or concrete language remains open.

An fMRI study (Binder et al. 2005) compared brain areas activated during a behavioral task in which participants had to indicate, pushing a button with a hand, whether letter strings represented real words or pseudo-words. Though balanced in terms of length, number of syllables, and frequency, half of the real words were concrete and half were abstract. The results confirmed that the answers were quicker and more accurate for concrete words than for abstract words or pseudo-words (concreteness effect). Moreover, as compared to pseudo-words, concrete terms activated several cortical areas in both hemispheres, whereas abstract terms activated areas only in the left hemisphere. Direct comparison between concrete and abstract words revealed that the latter activated left frontal areas. These data are in line with the predictions of the dual-coding theory: the cortical areas activated almost identically by abstract and concrete language are the left inferior and middle temporal gyri, and these areas could be the substrate of the verbal cognitive system in Paivio’s hypothesis. Note that the prevalent activation of areas in the left hemisphere during the processing of abstract language also fits with the WAT approach that assumes a specific role of language areas in coding abstract contents. However, it must be stressed that the left frontal activations evoked by abstract language coincide with those evoked by pseudo-words, and are likely the substrates for phonological processing, short-term memory, and lexical retrieval rather than for semantic processing (Fiez et al. 1999; Warburton et al. 1996; Paulesu et al. 1993; Démonet et al. 1992).

While the above studies aimed at assessing subtle differences between left and right hemispheres in processing abstract and concrete language, a number of more recent studies focused on the role of sensorimotor systems in processing abstract language with the aim of supporting one or the other theoretical framework reviewed in the previous section. Glenberg and Kaschak (2002) tested the indexical hypothesis, according to which understanding sentences expressing either abstract or concrete actions implies a process of re-enactment of those actions. The idea behind these studies is that, if our cortical motor system simulates the actions expressed by a sentence, then this simulation should affect the execution of a movement of the same body part involved in the simulation. Participants had to indicate whether the presented sentences were meaningful or not. The sentences could describe transfer of concrete objects (e.g., “Andy gave you a pizza”, “you gave a pizza to Andy”) or abstract objects (e.g., “Lisa told you a story”, “you told a story to Lisa”). Half of the sentences expressed a transfer towards the participant’s body, and the other half a transfer away from his/her body. Participants gave their responses by means of a device with three aligned buttons, so that the first button was close to the participants and the third one farther away: in this way, participants could respond by executing a movement of the hand/arm either in the same direction as the transfer expressed by the sentence, or in the opposite direction. Participants’ responses were faster when the direction of the movement required to give the response was the same as that of the transfer expressed in the sentence. Interestingly, this facilitation effect occurred even when sentences expressed transfer of an abstract object. These data are compatible with the indexical hypothesis. Supporting this conclusion, a Transcranial Magnetic Stimulation (TMS) study (Glenberg et al. 2008) revealed a stronger involvement of the motor areas representing the arm and the hand when participants had to process sentences expressing transfer of both concrete and abstract objects, rather than sentences expressing no transfer at all.
An fMRI study (Boulenger et al. 2009) compared the cortical areas activated during the visual presentation of concrete literal sentences (e.g., “John grasps the object”) versus abstract idiomatic sentences (e.g., “John grasps the idea”), in which the only difference was in the word following the verb. Half of the concrete and of the abstract sentences contained hand-action verbs, whereas the other halves contained verbs expressing inferior-limb actions. Words in the sentences were presented sequentially, one at a time. Cortical activations were assessed within two time-windows: an early one starting from the presentation of the word expressing the grammatical item (e.g., “object” or “idea” in the previous examples), and a late one three seconds after sentence presentation. The results showed that, in comparison to a control condition in which strings of non-linguistic symbols were presented, the presentation of both literal and idiomatic sentences induced the activation of left frontal and temporal perisylvian areas usually considered as core language areas. In addition, activations were present also in left motor and pre-motor areas. Direct comparison of activations induced by idiomatic and literal sentences did not show differential activations, except in the inferior frontal gyrus, which was more active for idiomatic sentences at both time-windows. Comparing activations induced by idiomatic sentences containing a hand-action verb with those containing a foot-action verb revealed significantly stronger motor activations only at the late time-window. Specifically, verbs expressing hand actions activated ventral (hand) motor areas, and verbs expressing foot actions activated dorsal (foot) motor areas.
These results suggest the suitability of the embodied approach even for abstract language. The fact that somatotopical activations are attested only at the late time-window has been interpreted by the authors as reflecting semantic processing at the sentence level rather than at the level of single words. A possible alternative interpretation could be related to the concreteness effect: the late somatotopical activation might reflect the greater computation required by abstract terms and idiomatic sentences over concrete terms and literal sentences for retrieving appropriate grounding experiences. A TMS work by Scorolli and colleagues (2012) assessed the neural mechanisms of processing sentences containing four different combinations of abstract and concrete nouns and verbs (concrete verb and noun, concrete verb and abstract noun, abstract verb and concrete noun, and abstract verb and noun). Such an experimental paradigm allows one to study abstract and concrete expressions along a continuum rather than as two sharply separated categories. Participants had to establish whether the sentences were meaningful or not, and the response was to be given by pushing a pedal. The words in the sentences were presented on a screen one at a time; the motor-evoked potentials (MEPs) of a hand muscle induced by TMS applied on the hand motor cortex were measured. The TMS impulse could be delivered during the presentation of either the verb or the noun. In the latter case, the MEP amplitude is affected not only by the noun but also by the previously presented verb, thus giving insights into the integration of the meaning of the two words. Response latencies were measured as well. The results showed that the hand motor system is recruited by both concrete and abstract verbs.
In particular, when the verb–noun integration is possible (meaningful sentence and TMS impulse delivered at noun presentation) the recruitment of the motor system is higher for abstract verbs, whereas, when the verb–noun integration is not possible (meaningless sentence, and TMS impulse delivered at noun presentation), the recruitment is higher for concrete verbs. Such a result represents a key support for extending the embodied approach to abstract language. In addition, the recruitment seems to be higher when the noun is abstract. The comparison of the MEP amplitude induced by the impulse delivered at the time of verb presentation with that induced by the impulse delivered at the time of noun presentation revealed that abstract verbs are associated with a late recruitment of the hand motor cortex, whereas concrete verbs recruit the hand motor cortex earlier on. The analysis of the response latencies showed that the task was performed more promptly when the TMS impulse was delivered at verb presentation and the sentences contained a concrete verb. The authors interpreted the results as supporting the WAT approach for two reasons. First, they suggest embodied processing also for abstract words; second, abstract words recruit the motor cortex later than concrete words. Considering that, according to the WAT approach, abstract words should be acquired through explicit verbal explanation by other speakers, the delay in the recruitment of the hand motor cortex could be explained by hypothesizing that abstract words first recruit the mouth motor representation, and only subsequently the hand motor representation (because of its contiguity with the former). Another possible interpretation could be that the delay in the recruitment of the hand motor cortex for abstract words depends on the need to frame abstract linguistic material within a background of other lexical material before the hand motor cortex can be activated by appropriate stimuli.

The reviewed studies strongly support an involvement of the sensorimotor systems in processing abstract language, thus suggesting that abstract and concrete items are not completely distinct in the brain. Rather, they appear as a continuum that most likely leads to a recruitment of the sensorimotor systems to different degrees. In keeping with this notion, two recent fMRI studies (Desai et al. 2011, 2013) assessed fMRI activations during the processing of literal, idiomatic, metaphoric, and abstract language. They found increasing sensorimotor activation from abstract to idiomatic to metaphoric to literal sentences. The authors conclude that the sensorimotor system is, indeed, involved also in processing abstract language (including metaphoric and idiomatic sentences), but additional areas are necessary to process its meaning depending on how conventional the message is and/or on its level of abstractness. There is evidence, indeed, suggesting that contextual information favoring a non-literal interpretation of action verbs reduces the fMRI activation of the motor cortex (Schuil et al. 2013).

One of the additional neural structures that seems to play a specific role in processing abstract language is the ventro-lateral prefrontal cortex (VLPFC) (Binder et al. 2009). A study by Hoffman and colleagues (2010) has explicitly assessed the role of VLPFC in the comprehension of abstract words both in normal subjects and in patients with brain lesions. Participants underwent a comprehension test in which they had to judge whether two words were synonymous or not: a probe word was presented together with three other words, only one of which was semantically related to the probe. The probe word could or could not be presented after a sentence setting a specific context. The results showed that patients with lesions in the VLPFC had more difficulty in understanding abstract than concrete terms, but their performance improved significantly when the sentence contextualized the probe word. In a second experiment of the same study, the authors inhibited VLPFC by means of repetitive Transcranial Magnetic Stimulation (rTMS) in healthy participants. Consistent with the findings in patients, rTMS applied to VLPFC hindered the understanding of abstract words, especially in the absence of the sentence specifying the context. This study suggests that the VLPFC is crucial in processing abstract words without the availability of a context as it helps to select one among the many possible meanings that abstract words usually convey. Indeed, VLPFC involvement is inversely proportional to the amount of available contextual information. These results seem to be compatible with the context-availability theory. A recent study using dense array electroencephalography (EEG) compared spatial and temporal dynamics of the EEG signal during a task where the participants had to decide whether a verb presented on a screen was abstract or concrete (Dalla Volta et al. 2014).
The results showed that processing concrete verbs activates a number of parietal and frontal areas thought to be at the basis of sensorimotor transformations involved in planning and observing concrete actions executed with a specific biological effector. Abstract verbs, instead, activate the posterior inferior frontal and the dorsal prefrontal cortices. For methodological reasons, the study by Dalla Volta and colleagues (2014) did not allow the identification of areas involved in processing both abstract and concrete verbs.

Some studies, rather than focusing on the recruitment of sensorimotor systems during the processing of abstract language, took into account differences in the recognition of abstract versus concrete words. A behavioral study (Kousta et al. 2011) used words as homogeneous as possible in terms of several psycholinguistic parameters (like familiarity, availability of the context, and modality of acquisition), except, obviously, the level of concreteness/abstractness and imageability. When participants assessed such stimuli in a lexical task, the results showed that, contrary to the work by Scorolli et al. (2012) reported above, abstract words were processed more promptly than concrete ones, reversing the concreteness effect. In a second experiment, the same authors (Kousta et al. 2011) demonstrated that the reversal of the concreteness effect was due to differences in the emotional charge of the employed stimuli. Indeed, the advantage in processing abstract words is lost when the emotional charge of the concrete and abstract verbal stimuli is also similar. These results suggest that the emotional content plays a crucial role in the processing of abstract language (for additional behavioral results in this line, see also Newcombe et al. 2012). In keeping with the former behavioral results, a recent fMRI study (Vigliocco et al. 2014) used both abstract and concrete words as stimuli, where several psycholinguistic parameters were kept homogeneous with the exception of the level of emotional charge and the degree of alert generated by the words (e.g., the different alert generated by “earthquake” versus “grassland”). The comparison between abstract and concrete words unveiled that the abstract ones selectively activated the anterior portion of the cingulate gyrus in both hemispheres, a region usually considered involved in processing emotions (Etkin et al. 2006).
The results from Vigliocco and colleagues (2014) showed a correlation between the degree of activation of the anterior cingulate and the level of emotional charge of the stimuli (i.e., the induced pleasant or unpleasant feelings). The authors concluded that neural substrates engaged in processing emotions are also crucially involved in processing the meaning of abstract nouns. Based on these findings, abstract language seems, therefore, to be endowed with a marked emotional charge that appears as its distinctive feature as compared to concrete language. Despite this, it is worth stressing in this context that other studies (Wilson-Mendenhall et al. 2011, 2013a, b) have shown that emotions themselves are grounded and situated. In this view, and based on the results of these studies, emotions are not coded in specific neural substrates in the brain, and there is no correspondence between a specific emotion and a specific brain circuit in a one-to-one fashion. Rather, feeling a specific emotion or processing a specific emotion when felt by another individual (or, possibly, when described verbally) implies the reactivation of different systems that subserve action, perception, interoception, core affect, and so on. This means that multiple systems are engaged during the experience and perception of emotions. In keeping with this view, Binder et al. (2016) suggest that not only actual external experiences but also internal experiences contribute to ground words expressing emotions, and possibly abstract words more generally.

In sum, studies aimed at assessing the neural substrates devoted to processing abstract language have shown a recruitment of sensorimotor circuits also involved in processing concrete words; there seems to be a dominant role of the left hemisphere in coding abstract words as compared to concrete ones. The recruitment of additional areas (like, for example, the VLPFC) and the neural substrates engaged in processing emotions may be a distinctive mark between abstract and concrete language.

In keeping with this view, some recent fMRI studies (Fernandino et al. 2015, 2016a, b) investigated whether an encoding model based on five attributes (sound, color, shape, manipulation, and visual motion) could predict brain activations related to single words. The results showed that processing a noun led to the activation of sensory areas coding for the same sensory features preliminarily attributed to that noun. For example, the word “tomato” is associated with color and shape features, and presenting this word in the fMRI led to the activation of sensory areas coding for shape and color. The encoding model was predictive for concrete words, but not for abstract items, thus weakening the role of the sensorimotor systems in processing abstract items. Moreover, based on the presence of areas conjointly activated by both abstract and concrete words, these authors propose the existence of a network of cortical hubs that eventually allow one to attribute meaning to words. This “general semantic network”, as the authors call it, codes for multimodal information derived from basic, lower sensorimotor processes, possibly functioning as a convergence–divergence zone for distributed concept representation. As suggested by Binder et al. (2009, p. 2774), the “human semantic system” corresponds to a large network of parietal, temporal, and prefrontal heteromodal (also called supramodal or a-modal) association areas. It seems to us that this view fits with a weaker understanding of embodiment (Mahon and Caramazza 2008), according to which the recruitment of sensorimotor systems may represent a way to color conceptual processing, enrich it, and provide it with a relational context. We have comprehensively presented our view on the semantics of concrete language in an earlier paper (Buccino et al. 2016): the recruitment of sensory–motor areas is both necessary and sufficient to attribute meaning to concrete words.
It is worth recalling that increasing evidence of a causal role of the recruitment of sensorimotor systems in language processing also comes from lesion studies. It has been shown that lesions in brain regions within the sensorimotor systems lead to impaired lexical and conceptual knowledge of action (Kemmerer et al. 2012). A very recent study (Desai et al. 2015) tested manual and semantic abilities in chronic stroke patients. The authors found that the degree of impairment for action word processing correlated with the impairment in manual performance. In keeping with this, another recent work showed that reversible inactivation of hand pre-motor cortex (obtained by rTMS) in healthy individuals may hinder the comprehension of sentences expressing hand actions (Tremblay et al. 2012). Moreover, several studies (Bak et al. 2001, 2006; Cotelli et al. 2007; Fernandino et al. 2013; Cardona et al. 2013; Buccino et al. 2018) have shown that lesions affecting the motor system, even at subcortical level, may lead to impairment of action word processing. This, in turn, supports the notion of a close and causal relationship between sensorimotor systems and semantic processing. If semantic processing were not strictly grounded and were due to a general semantic network, then patients with lesions affecting the sensorimotor systems would not show impairment in this cognitive function. The idea that the meanings of abstract words are embodied or grounded in complex experiential clusters does imply that the human brain (in contrast with brains of other species which do not possess language and high-level abstraction) has structures capable of holding together and combining the varied experiences grounding the meaning of abstract words. However, this does not imply that the meaning of abstract words is coded in such structures without the key and causal support (the “grounding”) of “low-level” sensory, motor, and emotional areas.

Embodying abstract language: towards an operational definition of abstractness

In the light of what we have reviewed so far, in this section, we propose an operative definition of abstract language that could also offer a starting point for further experimental inquiries. In a nutshell, we suggest that abstract language is still grounded in experience and linked to the neural substrates subserving those experiences; in this respect, then, we propose that there is a continuum moving from concrete to abstract words (also in line with Locke’s thought recalled above). As compared to concrete words, however, abstract ones reach higher and higher degrees of complexity (see also Barsalou and Wiemer-Hastings 2005, p. 136, for a proposal in this direction). From this viewpoint, therefore, abstract words are not farther from experience than concrete ones; rather, they express experiences of increasing complexity. We also propose that this increase in complexity manifests itself in three main aspects that we regard as hallmarks of abstractness:

  a. Abstract language is effector-unspecific.

  b. It is multi-systemic.

  c. And it is dynamic.

Note that our proposal, as compared to the current literature, intends to be fully embodied in the sense that it posits that to process and reach the meaning of abstract words one can rely on the same modal (i.e., sensory, motor, and emotional) areas necessary to process concrete words. As can be seen from the above section, even within the theoretical framework of embodiment, the current literature proposes a multiple representation view (see Borghi et al. 2017 for review) according to which processing the meaning of abstract words requires the involvement of both modal areas and a-modal components. The main tenet of our proposal is that a-modal components are not necessary to comprehend abstract words.

By saying that abstract language is effector-unspecific, we refer to the fact that abstract words are not grounded on the motor representation of a specific effector, but express motor experiences that are normally done with different effectors (see Fig. 1). For example, while the noun “cup” recruits hand motor representations, the noun “freedom” recruits motor representations that are not only related to the hand, but also to the foot and/or the mouth. You can activate your mouth representation in reference to, e.g., freedom of speech, your foot representation in reference to, e.g., walking through the fields in summertime, or your hand representations when you express, e.g., your freedom of voting. The same holds true also for verbs like “to grasp” versus “to praise”. It is worth stressing that, even within the category of concrete verbs or nouns, the recruitment of the motor system may vary in a gradient-like fashion, depending on different factors. A study found a different modulation of the motor system depending on how precisely the actual action is described by the verb (Marino et al. 2012): compare, e.g., “to rake” or “to sign” with “to waste” or “to book”. A very recent paper (Agosta et al. 2016) found a different recruitment of a motor representation during the observation of biological versus non-biological motions (see also Perani et al. 2001). The recruitment of an effector-related motor representation (like the hand) also occurs during the processing of emblems, which, indeed, convey a conventional meaning using hand movements (Andric et al. 2013) or even during the processing of meaningless actions (Lui et al. 2008). Finally, processing adjectives expressing positive (e.g., “soft”) or negative (e.g., “thorny”) features induces a specific recruitment of hand representation even in the absence of a noun (Gough et al. 2013).

Fig. 1

The meaning of abstract words as effector-unspecific. The meaning of abstract words is usually related to more than one single biological effector. Consequently, whereas concrete words like “cup” (left) are mainly embodied in the hand motor representation (circle), abstract words like “freedom” (right) are embodied in the motor representation of more than one effector, e.g., hand (circle), mouth (triangle), and foot (square). In other terms, the number of effectors involved in grounding the meaning of words is one of the dimensions along which the complexity of the grounding experiences usually increases when moving from concrete to abstract words

By the term multi-systemic, we refer to the fact that abstract words may recruit not only one single system (like the motor or the visual ones), but more than one at the same time (see Fig. 2). For example, there is no doubt that the meaning of a verb like “to walk” is grounded in the motor system (see Hauk 2004; Buccino et al. 2005; Tettamanti et al. 2005), specifically in the foot motor representation. However, it is worth stressing that, even within the domain of concrete verbs, more than one system may come into play. Indeed, even scholars who focus their research on the motor grounding of language acknowledge that “much of what we talk about is linked to our (in great part visual) perception of the social and physical environment” (Arbib 2016, p. 4, italics added). A verb like “to fly” may rely on motor and sensory systems. There is evidence that, during action observation, the recruitment of the motor system occurs only when the action is part of the observer’s motor repertoire: the observation of dog barking, for which human individuals do not have a motor representation, activates the visual system (Buccino et al. 2004a). Similarly, subjects’ high competence in specific motor domains induces a stronger recruitment of the motor system (e.g., Calvo-Merino et al. 2005). In the same line of reasoning, if one considers an abstract verb such as “to like”, the meaning might stem from the recruitment of different systems at the same time: one can like to walk (involving the motor system), or an ice-cream (involving taste), a piece of music (sound), a painting (vision), etc. Note also that abstract words often imply a stronger involvement of the emotional system. One could argue that, to some extent, also in the case of concrete words, several systems may intervene: you mainly grasp an apple, but you can also smell or taste it. 
However, concrete words usually ground their meaning mainly on a single system: the word “cinnamon” activated the olfactory system (Barrós-Loscertales et al. 2012) and the word “salt” activated the gustatory one (González et al. 2006): these activations are enough to fix the meaning of those words. In the case of abstract words, on the contrary, our point is that the recruitment of several systems is necessary for attributing meaning.

Fig. 2

The meaning of abstract words as multi-systemic. The meaning of abstract words involves more than one system (sensory, motor, or emotional). Consequently, whereas the meaning of concrete words (like the verbs “to walk” or “to eat”) may be embodied in one or a few brain systems, the meaning of abstract words (e.g., “to like”) is embodied in several systems. In other terms, the number of systems involved in grounding the meaning of words is the second dimension along which the complexity of the grounding experiences may increase when moving from concrete to abstract words. 1: motor cortex; 2: gustatory cortex; 3: olfactory cortex; 4: auditory cortex; 5: somatosensory cortex; 6: visual cortex; 7: emotion-related brain regions

Note that this perspective is in line with that suggested by some authors for emotion coding (Wilson-Mendenhall et al. 2011, 2013a, b). What we are suggesting here is that the recruitment of several systems at the same time is valid not only to process emotions or emotion-related language, but is a necessary pre-requisite to process all abstract words.

By saying that the meaning of abstract words is “dynamic”, we refer to the fact that it may have different nuances depending on the age, education, or lifestyle of the speakers and on the historical period or the context in which they act. In other words, the meaning of abstract words may change more significantly in time than that of concrete words. Note that this is true in the history of societies as such, as well as in the personal history at the ontogenetic level (see Fig. 3). The meaning of a concrete word like “cup” is acquired quite early during development, in parallel with the motor experience. Cup-related knowledge does not change significantly with age, and the word “cup” fundamentally means an object for drinking all life long and in all societies. The word “freedom” is a quite different case. Indeed, for a child, “freedom” may express the situation when the parents allow him/her to watch cartoons on TV. For a teenager, it may express the possibility to go out with friends after dinner or to choose his/her clothes, hobbies, and sports. For an adult, it may additionally refer to an array of situations concerning job, family, politics, etc. On the other hand, there is no doubt that abstract concepts also evolve in the history of cultures and societies. The notions of beauty, culture, and freedom are good examples: their meaning shifted repeatedly during the history of humanity.

Fig. 3

The meaning of abstract words as dynamic. The meaning of abstract words usually changes, sometimes even significantly, during the life span of individuals, as life experiences enrich and accumulate. Consequently, (1) the meaning of concrete words, like “cup” (top), is embodied virtually in the same way all along the individual life span (e.g., in the motor cortex [circle]), whereas (2) the meaning of abstract words, like “freedom” (bottom), comes to be progressively attached to more and more experiences, often emotionally charged, and, therefore, embodied in more and more scattered brain systems. Note the increase in activation (symbolized as the increase in the shaded surfaces) of the represented brain regions, and especially the emotion-related ones. 1: motor cortex; 2: gustatory cortex; 3: olfactory cortex; 4: auditory cortex; 5: somatosensory cortex; 6: visual cortex; 7: emotion-related brain regions

Since the general claim we put forward is that processing a word implies the recruitment of those same areas underpinning related sensorial, motor, and emotional experiences, it turns out that when processing abstract words, sensorial, motor, and emotional areas intervene in a different manner and at different levels depending on the context, and on the speaker’s age, education, and so forth. In this respect, abstract words are less “fixed” than concrete words and their processing is grounded in a dynamic and combinatorial recruitment of sensory, motor, and emotional areas in the brain. This, indeed, raises the question of how the brain may recruit modal areas depending on the context, education, age, and so on. One may argue that additional areas in the brain, for example the VLPFC, or areas belonging to the so-called general semantic network (such as the angular gyrus, the posterior cingulate, and the medial prefrontal cortex), or specific language areas (such as Broca’s region), may have the role of re-enacting in a dynamic and integrated manner the activity of the different (motor or modal) systems, and not that of contributing contents to the semantics of abstract words. Note that a similar role has been proposed for some prefrontal areas during the imitation of novel actions that are not already part of the observer’s motor repertoire (Buccino et al. 2004b; Vogt et al. 2007). It has been suggested that, when learning a novel motor task, prefrontal areas are involved in selecting specific motor representations and in recombining them to fit the new model. Discriminating between these two alternatives still requires empirical investigation and confirmation.

This view, taking into account the level of complexity of the experiential clusters associated with abstract words, may provide a framework for unifying several aspects of the psychological and theoretical approaches and the empirical evidence discussed above. The concreteness effect, as implied by both Paivio’s dual-coding theory and Schwanenflugel’s context-availability theory, may be explained by the greater simplicity of the experiences grounding some words (those considered “concrete”) than others (those considered “abstract”). This consideration may also explain the findings suggesting that specification of a context facilitates the comprehension of abstract language: the specification of the context may enable the retrieval of a subset of appropriate experiences within the complex experiential domains at the basis of an abstract word. The findings suggesting that abstract language has a greater emotional charge may also be related to the greater complexity of the experiences grounding abstract words, and to the fact that such experiential clusters are enriched with age. This might explain the observation, stressed by the WAT theory (Borghi and Cimatti 2009; Borghi et al. 2011), that one difference between abstract and concrete words is the age of acquisition. This evidence can be justified by the higher complexity of the experiences giving meaning to abstract words. Moreover, as we have mentioned in the case of “freedom”, the meaning of such words is linked not only to “neutral” sensory and motor experiences, but also to personal happenings, situations, choices, etc. often colored with an emotional component, as emphasized by one of the reviewed approaches (Vigliocco et al. 2014; Citron et al. 2012). This may explain the involvement of emotion-related brain structures in processing abstract language.
Finally, taking our proposal as the basis for further empirical investigations may help to answer some open questions concerning abstract language, like those pointed out by Dove (2016) and mentioned above in this paper. In particular, regarding the problem of disembodiment, our proposal entails that abstract words are not divorced from experience and predicts that they are grounded in several modal areas and systems, though in a complex and articulated manner. Regarding the problem of abstract language flexibility, our proposal entails that an abstract word may prevalently recruit some areas and systems rather than others depending on the context, so that flexibility does not necessarily imply a-modal, abstract-specific brain areas. As for generalization (which, as in Aristotle’s and Locke’s perspective, we rather consider a general cognitive process not restricted to abstract concepts), our proposal entails that abstract words may have a higher level of generalization because of the increasing number of contexts and situations (and, therefore, of personal experiences) underpinning their meaning.

Conclusion

This work addressed the thorny issue of abstract language within the embodied perspective. The philosophical, psychological, and neuroscientific arguments, theories, and evidence reviewed here suggest that words and linguistic expressions should not be partitioned into a sharp dichotomy of abstract and concrete items. Rather, linguistic material can be ranked along a continuum according to the complexity of the experiential clusters associated with each linguistic item. Therefore, our main conclusion can be summarized by saying that the meaning of those linguistic items usually considered as abstract in the recent literature should not be understood in terms of farness from experience, but, more precisely, in terms of complexity of the associated experiential clusters. This conclusion is in line with an emerging consensus, both in neuroscience and in linguistics, on the idea that linguistic meaning, quite generally, can actually be grounded in experience (an idea that, as we have seen, finds strong roots in part of the philosophical tradition) and in the related neural circuits (see Buccino et al. 2016 for arguments in this line). The specific suggestion of this work is that so-called “abstract” language is no exception to this general stance. The point is that the complexity of the experiential clusters associated with linguistic expressions usually considered abstract implies the coordinated and integrated re-enactment of the different (motor or modal) systems also involved in processing concrete words.

A further element of interest of our present proposal is the identification of three specific dimensions along which the complexity of the experiential cluster associated with linguistic items may increase: the number of involved effectors, the number of involved systems and modalities, and the richness of dynamically associated experiences. This may encourage the implementation of further experimental paradigms aimed at specifically manipulating one or more of these dimensions so as to gain additional empirical evidence about the embodiment of abstract language.