3.1 Introduction

So far, different theories of the representation of abstract concepts and words have been proposed. Below, we will illustrate the main approaches to abstract concepts and words, highlighting their strengths and weaknesses and underlining the similarities and differences between these theories and our proposal. We will start with embodied theories.

We will first outline theories according to which abstract and concrete concepts do not differ, as both are grounded in action, emotion, and force dynamics. Then, we will illustrate theories which, while maintaining an embodied stance, argue that abstract and concrete concepts are grounded in different aspects of experience, for example situations versus perceptual properties, or direct experience versus metaphors.

Finally, we will illustrate recent approaches arguing for the necessity of a double representation, some of which stem from the classical theory of Paivio. We will also outline recent theories that, even if they do not refer directly to abstract concepts, are relevant for our understanding of them.

3.2 Grounding in Action of both Concrete and Abstract Concepts

Within embodied and grounded (EG) theories, we can distinguish two main approaches to abstract concepts. According to the classical EG view, abstract concepts are not special: They are embodied and grounded exactly like concrete concepts. Other EG views recognize that abstract and concrete concepts differ in content, even if both are embodied and grounded.

In this section, we will illustrate the approach according to which abstract and concrete concepts do not differ. Various lines of research provide evidence suggesting that the representations of the two kinds of concepts at least partially overlap (for a thorough review, see Pecher et al. 2011). According to the proponents of this view, their representations partially overlap because both concrete and abstract concepts are grounded in action (see the ACE and the approach–avoidance effect) and in force dynamics.

A first line of research underlines the role of action in the representation of abstract concepts. This view is based on evidence of the so-called “action–sentence compatibility effect” (ACE for short), obtained with both concrete and abstract transfer sentences, using both behavioral and TMS methods (Glenberg and Kaschak 2002; Glenberg et al. 2008a, b). The ACE consists in faster responses when the action implied by a sentence (e.g., “open” vs. “close a drawer”) matches the action performed to produce a response (pressing a button moving toward vs. away from the body). Glenberg and Kaschak (2002) found an ACE with both concrete (e.g., “Andy delivered the pizza to you/You delivered the pizza to Andy”) and abstract transfer sentences (e.g., “Liz told you the story/You told Liz the story”). However, as Glenberg et al. (2008a, b) argue, while for concrete sentences the evoked simulation can explain the ACE interaction found with manual responses, it is more difficult to account for it with abstract sentences: For example, “You tell Liz the story” would imply the activation of the neural substrate for mouth, not for hand, movements. According to Glenberg et al. (2008a, b), we would develop an action schema that underlies different transfer verbs (handle, give, etc.), even if the specific parameters, such as those related to the kind of grip, might differ. The same schema would be generalized to abstract transfer sentences. In support of this view, the authors used a sentence sensibility evaluation task, applying single-pulse transcranial magnetic stimulation to hand muscles. They found a greater motor system modulation (larger MEPs) when participants read both concrete and abstract transfer sentences (e.g., concrete transfer sentences: “Marco gives you the papers. You give the papers to Marco”; abstract transfer sentences: “Anna delegates the responsibilities to you. You delegate the responsibilities to Anna.”), compared to sentences that did not refer to transfer (e.g., “You read the papers with Marco. You discuss the responsibilities with Anna.”). The MEP modulation was similar for concrete and abstract transfer sentences, both when the pulse was delivered at the presentation of the verb and at the end of the sentence. This confirms that the simulation of concrete transfer underlies comprehension of both concrete and abstract sentences. Further studies have extended the ACE, demonstrating that the abstract quantifiers “more” and “less” activate an upward versus a downward movement (Guan et al. 2013). Similarly, recent evidence in our laboratory has shown that calculation processes such as summing or subtracting are associated with different bodily movements (see Fig. 3.1).

Fig. 3.1

Study by Lugli et al. (2013). The interaction between kind of operation and movement showing an advantage of the congruent condition (addition and ascending movement, subtraction and descending movement) over the incongruent one (addition and descending movement, subtraction and ascending movement) when participants used the elevator but not the stairs. The congruency effect was present only when participants performed the task during the execution of the movement (online condition), not when they performed it after having executed it (offline condition). The result supports the idea that numbers and calculation processes are grounded in the sensorimotor system

Participants had to make additions or subtractions while performing an ascending or a descending movement, using an elevator or taking the stairs. We found a congruency effect between the direction of the bodily movement and the kind of calculation: Addition was associated with an ascending movement with the elevator, subtraction with a descending movement (Lugli et al. 2013). The effect was not present with the stairs, likely because the movement associated with the elevator is more clearly vertical and faster. The absence of a congruency effect when the calculations were made after the movement (offline condition), and when the movement was not directly experienced but simply imagined, indicates that the effect is motor and not only perceptual. The result suggests not only that larger quantities are associated with an upward movement and smaller quantities with a downward movement, but also that different calculation processes are associated with movements differing in direction. Overall, these findings contribute to revealing the embodied nature of spatial-numerical associations (Fischer and Brugger 2011).

A further line of research, closely related to the one on ACEs, has shown that (abstract) emotional terms, such as “love” or “anger,” evoke approach and avoidance movements, thus engaging both the emotional and the motor system. In a seminal paper, Chen and Bargh (1999) found the so-called approach–avoidance effect: Responses were faster when people had to pull a lever toward their body in response to positive words (e.g., “cake”) and to push a lever away from their body in response to negative words (e.g., “spider”); responses were slower when participants were required to pull the lever toward their body with negative words and to push it away with positive words. Van Dantzig et al. (2008) extended this result, showing that approach/avoidance movements are encoded in terms of their outcome, not of the specific movement: In contrast to negative words, positive ones evoke the tendency to reduce the distance between the stimulus and the self (see also Freina et al. 2009; Förster and Strack 1996) (see Fig. 3.2).

Fig. 3.2

Study by Freina et al. (2009). Participants had to classify a word as positive or negative. They had to respond by pressing a near or far key on the keyboard, performing a movement either toward or away from their body. When they responded holding a tennis ball in their hand, they were faster when pressing the near key for positive objects and the far key for negative objects. When they responded with the empty, open hand, the results were the opposite: Response times were faster when pressing the “far” button for positive words and the “near” button for negative words, as if they “simulated” reaching for something “good” and avoiding something “bad.” Overall, the results show that emotion terms are grounded in the sensorimotor system and reveal a complex interplay between emotional words, movement, and hand posture

Lugli et al. (2012) and Gianelli et al. (2013) recently introduced and manipulated the addressee of the action using sentences such as “The object is attractive/ugly. Bring it towards you/Give it to another person/Give it to a friend/to an enemy.” They found that the simulated social context influenced the kinematics of the movement and the coding of stimulus valence. One problem of the evidence on the approach–avoidance effect is that it concerns emotional terms: As discussed in Chap. 1, emotional terms represent a subset of concepts with idiosyncratic characteristics, and depending on the adopted approach, they can be considered abstract or not.

Finally, a further line of research relevant to the issues discussed here was inspired by cognitive linguistic studies on force dynamics (Talmy 1988). According to Talmy, physical and social events are conceptualized as oppositions between conflicting forces, for example between an agonist and an antagonist force. This is true also for linguistically described events. In Talmy’s view, the representation of both concrete and abstract events relies on the same force mechanisms; the only difference is that in the latter case, the agonist tends more toward rest or performs less “physical” actions (for a thorough review, see Pecher et al. 2011). In a submitted paper (reported in Pecher et al. 2011), Madden and Pecher (2010) reported evidence favoring this view: Sentence sensibility judgments were faster when the sentences were primed by two shapes that interacted according to the same force dynamics pattern than in mismatch cases. This reveals, in support of Talmy’s view, that the availability of force dynamics information facilitates processing; importantly, as predicted by Talmy, results were the same for concrete and abstract sentences. Even though the theory is fascinating and the evidence compelling, it might be difficult to extend it to cases in which single concepts instead of whole sentences are considered.

Overall, the three research areas we illustrated have some commonality. Researchers from these three areas, supported by rich and compelling evidence, underline the similarities rather than the differences between concrete and abstract concepts, showing that both are grounded in action. The three areas have some common limitations, though: In all cases, the evidence provided is confined to specific domains, namely to transfer sentences for ACE evidence, to the emotional domain for approach–avoidance effects, and to events that can be conceptualized in terms of force dynamics for work inspired by Talmy’s view. It is hard to foresee how far this evidence can be extended to other domains. Furthermore, the possibility remains open that, although concrete and abstract words do not differ in the dimensions considered, they differ along other dimensions.

In line with the perspective presented in this section, the WAT view proposes that both concrete and abstract concepts are grounded in perception and action, and thus their representations overlap in some important respects. However, we believe that this is not the whole story and that it would be important to find ways to operationalize the distinctions, and not only the commonalities, between concrete and abstract concepts.

3.3 Differences in Content Between Concrete and Abstract Concepts

Much of the debate on abstract and concrete concepts focuses on their differences—in format, in grounding, and in the constituent semantic attributes. In this and in the next section, we will discuss proposals and evidence emphasizing the differences rather than the similarities between concrete and abstract concepts.

Some proponents of EG theories, though claiming that abstract concepts are grounded just as concrete concepts are, admit that the two kinds of concepts can be represented differently. The difference does not pertain to their format (e.g., amodal vs. grounded); rather, it depends on their content, that is, on the different kinds of properties evoked by abstract compared to concrete concepts. We will discuss in Sect. 3.3.1 the proposal by Barsalou and Wiemer-Hastings (2005), according to which abstract concepts activate more situations and introspective properties compared to concrete ones, and in Sect. 3.3.2, the proposal by Vigliocco and collaborators (Kousta et al. 2011; Vigliocco et al. 2013a, b), according to which they activate more emotional aspects compared to concrete concepts. The two theories have in common the embodied stance, as well as the fact that they ascribe the difference between concrete and abstract concepts to their differences in content. In addition, unlike other proposals, they do not provide only a negative definition of abstract concepts (e.g., as concepts less imageable and less sensory-based than concrete ones) (see Vigliocco et al. 2013a, b, for a similar critique of Paivio’s view, and Paivio 2013), but they propose that different semantic features characterize the two kinds of concepts: Concrete concepts evoke more perceptual properties, abstract ones more situations and introspective properties (Barsalou and Wiemer-Hastings 2005), and more emotions (e.g., Kousta et al. 2011).

3.3.1 Situations and Introspective Properties

Barsalou and Wiemer-Hastings (2005) demonstrated that abstract concepts focus on situations and on introspective properties. They presented participants with three abstract (“freedom,” “truth,” and “invention”), three concrete (“bird,” “sofa,” and “car”), and three intermediate concepts (“cook,” “farm,” “carpet”), and asked them to produce characteristics of each concept. Each abstract and intermediate concept took three forms, which were collapsed for the analyses: for example, “a freedom,” “to free,” and “freely.” Concepts were presented in isolation or preceded by a short passage illustrating a situation: For example, for the concept “truth,” the situation described a boy who told his mother that he had not broken a vase, and his mother believing him. Results showed that both concrete and abstract concepts were grounded in situations. However, compared to concrete concepts, which activate the physical aspects of situations, abstract concepts focus “on the social, event, and introspective aspects of situations (e.g., people, communication, beliefs, and complex relations)” (p. 152). In a further feature generation task with a larger sample of concepts, Wiemer-Hastings and Xu (2005) found that concrete concepts elicited more item properties, while abstract concepts evoked more introspective properties. They also found that, while both concrete and abstract concepts were grounded in situations, they relied on different aspects of situations, since abstract concepts focused more on the social aspects of situations than concrete concepts did. Further evidence favoring the idea that abstract concepts activate situations has been collected by King (2013). Participants were presented with short scenarios, and later they were required to perform a lexical decision task on a target abstract word. Results showed that, even if no word associated with the target was present in the scenario, the context had a different impact on different kinds of abstract concepts: The scenario facilitated processing of relational abstract concepts (e.g., “ignore,” which describes an act, an actor, and a patient being ignored, but no internal feeling), while it did not influence activation of mental states (e.g., “depressed,” which does not express relations but refers to a feeling).

The relevance Barsalou and Wiemer-Hastings ascribe to the relation between abstract concepts and situations is partially in line with the original idea on which the contextual availability theory (CAT; Schwanenflugel et al. 1988) is based. According to CAT, concrete concepts are strongly related to a small number of contexts, while abstract concepts are weakly related to a high number of contexts. This disparity would account for the processing advantage of concrete over abstract concepts and is supported by evidence showing that the concreteness effect disappears when contexts are provided.

However, Connell and Lynott (2012) recently found that concepts with high perceptual strength, typically assumed to be concrete, evoke a broader variety of contexts than concepts with low perceptual strength. Context diversity enhances processing rather than slowing it down.

This finding gives us some clues for interpreting results showing that abstract concepts elicit more situations, such as those obtained by Barsalou and Wiemer-Hastings (2005) with a feature generation task, by Roversi et al. (2013) with a feature generation task, and by Caramelli et al. (in preparation) with a definition task with the concepts “risk,” “danger,” and “prevention.” In fact, it is possible that the number of situations produced for abstract words was higher than that produced for concrete words because in the latter case more perceptual, partonomic, and taxonomic properties were produced. If this is true, it would obviously pose some problems for the theory that abstract concepts are grounded in background situations.

However, an alternative is possible, and it relies on the role played by what Connell and Lynott (2012) call situational complexity. As they argue, perceptual strength and situational complexity are not mutually exclusive. This position is in line with ours. Indeed, we have argued that one of the characteristics of abstract concepts is that they are embedded in complex contexts and relations (see Borghi et al. 2013). Therefore, it is possible that concrete concepts are highly related to a wide variety of simple contexts, while abstract concepts are related to more complex contexts and embedded in a complex network of relations. This characteristic of abstract concepts would be able to account for the two experimental results found, that is, the fact that abstract concepts are less characterized by perceptual strength than concrete ones (Connell and Lynott 2012), and the fact that they evoke a higher number of situations (Barsalou and Wiemer-Hastings 2005; Caramelli et al. in preparation).

In sum, according to a promising view, one distinctive characteristic of abstract concepts is that they are more anchored to situations than concrete ones. However, the role of situations in characterizing abstract concepts should be better clarified, since so far little evidence has been provided in favor of this view, as argued by Pecher et al. (2011).

3.3.2 Emotions

A recent proposal highlights the importance of emotions in characterizing abstract concepts. We decided to insert it among the models which highlight that abstract concepts differ in content from concrete ones. However, this theory could be qualified as a hybrid model as well, since it argues that abstract concepts are characterized more by emotions and linguistic information, and concrete concepts more by sensorimotor information.

The proposal, outlined by Kousta et al. (2011; see also Kousta et al. 2009) and by Vigliocco et al. (2013a, b), rests on experimental and neural evidence. In the experimental studies, a large sample of concrete and abstract words was used, in which many lexical dimensions were controlled, including familiarity, context availability, and imageability. Furthermore, norms on mode of acquisition (Della Rosa et al. 2010) were used. The inclusion of context availability and imageability among the norms is crucial, since two classical and influential theories of abstract concepts, the context availability theory (CAT) and the dual coding theory (DCT), rest on them, as clarified in Chap. 1. Results obtained with a lexical decision task (which requires distinguishing words from non-words) showed that, when context availability and imageability were controlled, the usual processing advantage of concrete over abstract words (concreteness effect) was not present. Surprisingly, an opposite abstractness effect was found: Abstract concepts were processed faster than concrete ones. A regression analysis extended this result to lexical decision response times for a wide sample of words (n = 2,330). More crucially, the authors found, in an experiment conducted on 430 words, that the best predictor of the advantage of abstract over concrete words was emotional valence, that is, whether the words had a positive, a negative, or no emotional connotation. Controlling for valence, the advantage of abstract words disappeared. This result was recently complemented by neural evidence showing stronger activation for abstract words in the rostral anterior cingulate cortex (rACC), an area which plays a regulatory role in the processing of emotional stimuli (Vigliocco et al. 2013a, b). This result will be further discussed in Chap. 5.
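To make the logic of this kind of analysis more tangible, the following is a minimal sketch in Python showing how one would test whether a concreteness effect on lexical decision latencies survives statistical control of context availability, imageability, and emotional valence. It is not the authors’ analysis or dataset; the data are simulated placeholders and all variable names, numbers, and effect sizes are hypothetical.

```python
# Illustrative sketch only (not Kousta et al.'s analysis): hierarchical regression
# testing whether concreteness still predicts lexical decision RTs once context
# availability, imageability, and emotional valence are controlled.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "concreteness": rng.uniform(1, 7, n),
    "context_availability": rng.uniform(1, 7, n),
    "imageability": rng.uniform(1, 7, n),
    "valence": rng.uniform(-3, 3, n),  # signed valence; its absolute value indexes emotionality
})
# Simulated RTs in ms: purely illustrative noise plus an arbitrary emotionality benefit
df["rt"] = 650 - 10 * np.abs(df["valence"]) + rng.normal(0, 40, n)

# Step 1: concreteness together with the covariates central to CAT and DCT
m1 = smf.ols("rt ~ concreteness + context_availability + imageability", data=df).fit()
# Step 2: add emotionality; if an abstractness advantage is carried by valence,
# the concreteness coefficient should shrink toward zero in this model
m2 = smf.ols("rt ~ concreteness + context_availability + imageability"
             " + np.abs(valence)", data=df).fit()
print(m1.params["concreteness"], m2.params["concreteness"])
```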

The authors conclude that, because context availability and imageability were kept constant, neither CAT nor DCT can account for their results. In addition, given that mode of acquisition was kept constant, they argue that differences in the activation of linguistic information do not exhaust the difference between concrete and abstract concepts, but that emotions play a major role in abstract concept representation as well. Notice, however, that this is not the whole story: When the effect of valence was removed, the advantage of abstract words was still maintained in accuracy. This could be due to the role played by linguistic information for abstract concepts.

On the basis of these data, they develop an embodied theory according to which abstract and concrete concepts differ in terms of the distribution of the experiential information which characterizes them. While concrete concepts are grounded primarily in sensorimotor information, abstract ones evoke mostly linguistic information and emotions.

A novel and interesting part of this theory is that it proposes a developmental trajectory. The authors build on evidence showing that emotional development precedes language development and indicating that words referring to emotions are acquired rather early, at around 20 months of age. They consider concreteness, valence, and age of acquisition for a large sample of 2,120 words and demonstrate that abstract words with emotional content are acquired earlier than neutral abstract words. On this basis, they argue that “these data are indicative of the possibility that emotion may provide a bootstrapping mechanism for the acquisition of abstract words” (Kousta et al. 2011, p. 26).

Overall, this view is supported by compelling evidence, both behavioral and neural, and it also proposes an interesting developmental course. Its main shortcoming resides in the fact that emotional words can be considered a very special kind of abstract word, as discussed in Chap. 1 and as shown also by the data of Kousta et al. (2011) on conceptual acquisition. However, this is only a partial limitation. Indeed, the proponents of this view could argue that the theory does not concern only emotional words, but claims that valence characterizes all abstract concepts more generally (see Chap. 5 for further discussion).

The WAT proposal and this theory emphasizing the role of emotions, even if clearly different, have much in common. Both proposals highlight the fact that both concrete and abstract concepts are grounded in sensorimotor experience, but at the same time both highlight the specificity of concrete and abstract concepts. Furthermore, both proposals underline that multiple aspects might underlie conceptual representation. But this is not the end of the story. While many theories have called attention to the primary role of linguistic information for abstract words, no theory to our knowledge has put emphasis on the importance of the social and emotional aspects linked to word acquisition. WAT attempts to emphasize the role of language by considering not only the semantic and syntactic aspects but also the pragmatic aspects (for recent evidence showing how semantics and pragmatics are strictly interwoven, see Egorova et al. 2013; Prinz 2013) and by ascribing relevance also to the social context in which language is acquired. In this social context, it is likely that emotions play a major part. Therefore, the finding that emotional valence characterizes abstract more than concrete concepts is fully in line with the WAT proposal, according to which the social context of acquisition is more important for abstract than for concrete words.

3.4 Metaphors

The most influential embodied view of the representation of abstract concepts is the conceptual metaphor view (for a thorough overview, see Pecher et al. 2011; for a recent special issue, see Fusaroli and Morgagni 2013). This view was initially proposed in cognitive linguistics (Lakoff and Johnson 1980; Lakoff 1987; Gibbs 1994, 2005) and then extended to psychology and cognitive neuroscience. Metaphors are pervasive in our language: For example, the metaphor “argument is war” would underlie expressions such as “He attacked every weak point in my argument,” and the metaphor “time is money” would underlie expressions such as “You are wasting my time” (Lakoff and Johnson 1980). The basic tenet of this view is that concrete concepts are used as metaphors (the “vehicle”) in order to represent abstract concepts (the “topic”). This metaphorical process allows humans to comprehend one kind of experience on the basis of another, embodied experience, which provides its structure and grounding. Different metaphors can structure a single concept and capture different aspects of it: For example, the meaning of the abstract concept “love” (the “topic”) would be structured differently by different “vehicles”: Love is a journey, it is madness, it is a magnetic field, etc. Spatialization is an important part of this process: For example, the embodied concept of “up” would structure many domains, such as that of hierarchy and power (power is up), that of happiness (happy is up), and others. Along the same lines, Fauconnier and Turner (1998) recognize the centrality of metaphorical projection in structuring our thought. According to the conceptual blending theory (Coulson 2000; Fauconnier and Turner 1998), attributes and structure from a source mental space are selected and imported into a blended space, where they can be combined with further background knowledge (see also Coulson and van Petten 2002).

The conceptual metaphor theory is supported by a variety of experimental evidence. Meier and Robinson (2004) demonstrated with linguistic stimuli that evaluations of positive words were faster when the words were in the up rather than the down position, while the opposite was true for negative words. Meier et al. (2007) demonstrated that the up–down image schema affected memory for the abstract concepts related to God and Evil: People tended to encode and remember God-like images better when they were in a high position and Devil-like images better when they were in a low position. Giessner and Schubert (2007) demonstrated that the representation of power is also structured by an up–down image schema: A longer vertical line increased the judged power of managers compared to a shorter one, and the more powerful a manager was presented as being, the higher participants tended to locate his/her box in an organization chart. This up–down organization seems to structure the representation of the concept of power overall, and it does not intervene only in the response selection phase. Thus, the up–down image schema represents the background structure of a variety of abstract concepts, such as those of affect (positive up, negative down), divinity, and power.

In a similar vein, Boot and Pecher (2010) showed that the abstract notion of “similarity” relies on the concrete concept of “closeness.” Participants determined whether two squares, located at different spatial distances, were similar in color or not. Performance was better with similar colors when the squares were closer and with dissimilar colors when they were farther from each other.

Boot and Pecher (2011) demonstrated that the abstract concept of “category” is grounded in the concept of “container.” Importantly, in many of these studies, the authors decided to avoid using linguistic stimuli in order to demonstrate that conceptual mapping effects pertain to concepts and not only to word meanings, since they exist beyond language.

Probably the domain that has been most thoroughly investigated in support of the conceptual metaphor theory concerns the relationship between space and time (for a recent review, see Bonato et al. 2012; see Chap. 6 for discussion). The underlying idea is that the abstract concept of time would be structured thanks to the concrete notion of space. Researchers started by considering that the relationship between space and time is asymmetrical: For example, we often rely on space when talking about time (e.g., we say “a long holiday”). Boroditsky and Ramscar (2002) showed with ingenious experiments that people in an ego-moving perspective (for example, people at the beginning of a train journey, people who had just flown in, or people who were at the beginning of a lunch line) tended to respond to an ambiguous time question by producing an ego-moving response. When required to process the sentence “Next Wednesday’s meeting has been moved forward two days,” they interpreted forward as later (i.e., Friday). In contrast, people in a time-moving perspective (for example, people who were at the end of the trip, or of the line) tended to interpret forward as earlier (i.e., Monday). This result indicates that time and space are strictly interwoven and suggests that thinking about time is grounded in embodied experience. Casasanto and Boroditsky (2008) showed that, when providing judgments about time, people are not able to ignore spatial information, while the opposite is not true. For example, participants were required to estimate line length and duration: Even when required to estimate duration, they were unable to ignore spatial information about line length, but the interference did not work the other way round. Results confirmed an asymmetrical dependence of time upon space: Distance affected duration estimates more than duration affected distance estimates. The fact that the task did not involve linguistic stimuli or responses led the authors to argue that the metaphorical relation between time and space extends beyond language.

Flusberg et al. (2010) designed a connectionist model showing that the way we think about the abstract concept of time is grounded in our online representations of space. The model accounts for a number of results collected in Boroditsky’s laboratory. Flusberg et al. show that this grounding is not due to the fact that space and time are typically experienced together, but to the structural similarity between time and space: The neural network progressively learned to map the directionality of time (from early to late) onto the directionality of space (from west to east).

The examples we provided are only a few, but the conceptual metaphor theory, and specifically the idea that the notion of time is understood by referring to the concept of space, is supported by a great deal of evidence. However, it has also been the subject of much criticism. In our opinion, one of the most effective critiques of the view according to which thinking about time is grounded in the more concrete experience of space is advanced by Kranjec and Chatterjee (2010) in a recent paper. The authors outline two problems. The first is theoretical: According to them, the notion of spatial schema is a theoretical construct, representing a mediation between perception and language; thus, it is not necessarily “embodied” in a strong sense. The second problem is empirical: They point out that the evidence in favor of the conceptual metaphor theory is mainly linguistic and behavioral, and that neural evidence on the relationships between space and time is lacking. In analyzing the literature, the authors argue that the idea that time is grounded in spatial representation has often led researchers to neglect the importance of time. Time is the most frequently used noun in English; in addition, temporal language appears early in development: The frequency and early acquisition of temporal notions suggest that spatial grounding is not necessary for time representation. Moreover, in the brain, different mechanisms and areas, both subcortical and cortical, represent different kinds of temporal information. Thus, according to Kranjec and Chatterjee (2010), it might be unnecessary to assume that time is grounded in space, given that dedicated neural circuits for time do exist. However, it is also possible that time is grounded both in spatial abstractions and directly in timing areas: As recognized by the authors, the two hypotheses are not necessarily mutually exclusive.

The theory of conceptual mapping is probably the most influential embodied theory of abstract concepts, and there is compelling evidence in its favor, as we have seen thanks to the reported examples. However, various problems remain open (see Pecher et al. 2011).

Recently, a study challenged the postulate of the conceptual metaphor theory (CMT) of a unidirectional influence from sensorimotor experience to metaphors, and not vice versa (Slepian and Ambady 2014). Participants learned new metaphors concerning weight and time; later, they were asked to provide weight estimates of old and new books. If exposed to the metaphor that the past is heavy, they tended to perceive old books as heavier; if they had been exposed to the metaphor that the present is heavy, they perceived a book seemingly from the present as heavier. These results indicate that novel metaphors can influence sensorimotor processes as well, thus pointing to a bidirectional influence between metaphors and sensorimotor states. This evidence requires, at a minimum, that the conceptual metaphor theory be extended to account for the bidirectional relationship between metaphors and sensorimotor processing.

Further problems of the CMT have been raised by different authors. One debated issue concerns whether metaphorical grounding is necessary in order to understand abstract concepts, or whether access to metaphors might simply occur in certain cases. Evidence indicates that, at least in some subdomains, metaphors are automatically activated. However, this does not imply that without metaphors the comprehension process would be impaired (for a similar objection against embodied cognition evidence more generally, see Mahon and Caramazza 2008). One way to address this claim (even if not completely) may consist in demonstrating that the activation of image schemas anticipates full comprehension of abstract concepts. Another, more convincing way to address this claim is based on research on patients who have lost semantic knowledge of concrete concepts: Are they still able to understand abstract concepts? The absence of evidence in both directions has led some researchers (e.g., Murphy 1996; Pecher et al. 2011) to reject the strong version proposed by Lakoff and Johnson, according to which the representation of abstract concepts is structured thanks to the concrete concepts on which they are metaphorically grounded, and to endorse instead a weak version: Both concepts would have a structured representation, and the representation of the concrete concept (e.g., space) would influence that of the abstract one (e.g., time).

A second problem of the conceptual metaphor view is that a consistent part of the evidence obtained relies on linguistic stimuli. However, there might be important differences between linguistic metaphors and the underlying, more basic representation of the relationships between items. For example, using linguistic stimuli, Casasanto (2008) found that pairs of abstract concepts were judged more similar when the stimuli were closer together, in line with the idea that “similarity is closeness,” while when perceptual judgments were provided, closer stimuli were judged to be less similar. To cope with this problem, many recent studies have shown that the conceptual metaphor theory pertains to conceptual, and not merely linguistic, relations.

One further limitation of this view is that neural evidence is still lacking. As argued by Kranjec and Chatterjee with regard to the relationship between space and time, it is unclear why the neural regions dedicated to time processing would not be activated during comprehension of time concepts.

Another problem of the conceptual metaphor view (see Dove 2009, for such a critique) is that its developmental trajectory is not plausible (Murphy 2006). Indeed, children start to use metaphors rather late (Winner et al. 1976). A further problem, highlighted by Barsalou and Wiemer-Hastings (2005), is that it can provide only a partial account of abstractness. Metaphors might take part in the representation of abstract concepts, but this is not necessarily the case. More crucially, the meaning of abstract concepts is not fully exhausted by metaphors: Metaphors might render some of their aspects more salient, but directly experiencing the referents of abstract concepts is crucial for their meaning.

Finally, we see one limitation that the conceptual metaphor view shares with the action-based theory of abstract concepts, and that does not appear to be easily solvable with further evidence: its generalizability. In fact, it is hard to imagine how far this evidence can be extended beyond specific domains (for a similar critique, see Dove 2009; Goldman and De Vignemont 2009). How could we ground “philosophy,” or “truth,” in metaphors?

3.5 Multiple Representation View

In the previous sections, we have reviewed embodied theories according to which abstract and concrete concepts do not differ, as both are grounded in action, and theories that, while maintaining an embodied stance, highlight possible differences in how concrete and abstract words are grounded and represented.

In this section, we will distinguish our proposal from similar views. We will focus on proposals that share with WAT the idea that knowledge is represented by multiple systems, based on sensorimotor and on linguistic experience (see also Andrews et al. 2009, 2013).

3.5.1 Representational Pluralism: Dove

Dove’s (2009) view departs from an embodied stance, as it qualifies as only partially embodied (Dove 2011 entitles his Frontiers paper “On the need for embodied and dis-embodied cognition”), and it rests heavily on Paivio’s (1971, 1986) view. Dove (2009) argues that concepts are couched in two different types of representations, modal and amodal, that is, perceptual and non-perceptual. He recognizes that much evidence favoring an embodied approach has been collected, but this evidence is confined to highly imageable and concrete concepts. He argues that abstract concepts might be grounded in metaphors (for example, the concept of “respect” can involve a vertical metaphor), but this does not exhaust their conceptual content. According to him, the weakness of embodied approaches with respect to abstract concepts is not confined to the collected evidence, but extends to the theories, which are built to explain concrete concepts and are much more compelling for them (see his critiques of the proposal by Jesse Prinz). Having highlighted the weakness of embodied theoretical accounts of abstract concepts, Dove argues that some abstract concepts imply amodal representations. Specifically, he relies heavily on studies in cognitive neuroscience and neuropsychology which propose a novel Paivian dual code account, that is, which explain imageability effects on the basis of multiple semantic codes. Dove (2011) goes one step further, as he claims: “The core thesis of this paper is that concepts are couched in two types of simulation-based representations: those associated with non-linguistic experience of the world and those associated with experience of language.”

As recognized by Dove, his view partially overlaps with the WAT proposal: We argue that concrete and abstract concepts are grounded in both sensorimotor and linguistic experience, but that the acquisition of concrete concepts depends more on direct sensorimotor experience, while the acquisition of abstract concepts is more likely to depend on linguistic experience. Dove explains that his proposal differs from WAT in that, in his view, the acquisition of language creates an amodal, dis-embodied system, since “natural language on my view is not merely another source of information about the world but is also another way of thinking about the world…. language is an internalized amodal symbol system that is built on an embodied substrate. As such, it extends our cognitive reach and helps us overcome the problem of abstraction” (Dove 2011, p. 8). In a more recent paper, the view outlined by Dove (2013) becomes closer to ours, as he speaks of language as an “embodied representation system” that interacts with other embodied systems, and he underlines that the abilities acquired thanks to language allow us to use it not only as a medium of communication but also as a medium of thought.

We agree with Dove that evidence on abstract concepts is still not sufficient and, as we detailed above, we share his concern that not only the evidence but also some embodied theoretical proposals on abstract concepts fall short: Even if interesting, they are too limited in scope and therefore unable to provide a comprehensive account of abstractness. The proposal advanced by Dove and the WAT proposal share many aspects, and the common elements increase if we consider the latest version of Dove’s proposal (Dove 2013). Some of the evidence we collected is clearly in favor of a multiple representation approach (e.g., Scorolli et al. 2011, 2012) (see Fig. 3.3), and WAT can be considered a multimodal approach.

Fig. 3.3

Study by Scorolli et al. (2011). German and Italian participants had to indicate, by pressing different keys on the keyboard, whether noun–verb combinations made sense or not. Combinations were composed of abstract/concrete verbs and abstract/concrete nouns. The interaction reported on the graph shows that compatible combinations (abstract–abstract and concrete–concrete) are processed faster than mixed combinations. The result is in line with multiple representation views, according to which abstract and concrete words are processed in parallel systems (linguistic and sensorimotor), so that the costs of processing within the same system are lowest. The result was replicated and extended in a TMS study by Scorolli et al. (2012). The same stimuli and paradigm were also used in the fMRI study by Sakreida et al. (2013), reported in Chap. 5

Still, the constructive part of Dove’s proposal departs from the WAT view, for at least two reasons.

The first difference between Dove’s approach and WAT is his defense of amodal symbols. We fully agree with him that internalized language can be used to improve thought processes. But we do not share Dove’s view that this language we use to think would be amodal. According to the EG perspective, language is grounded in the perception, action, and emotion systems; thus, it is not amodal. We see profound differences between two ideas. The first, initially proposed by Vygotsky, later adopted and developed by some philosophers defending an extended mind view (e.g., Clark 2008), and shared by us (e.g., Borghi and Cimatti 2010; Borghi et al. 2013), is that internalized language can be used as an instrument for thought which augments our computational abilities (see also Mirolli and Parisi 2009). The second is the idea of an amodal, arbitrary language of thought: The latter does not correspond to the real language we use, but would be the product of a transduction process which so far has not been empirically demonstrated (for compelling critiques of this amodal view, promoted for example by Fodor 1975, see Barsalou 1999). Notice, however, that in a recent paper Dove (2013) seems to slightly change, or to better clarify, his position: He no longer speaks of disembodiment but argues that his view is compatible with a weakly embodied approach. Again, however, he stresses the role played by language in terms with which we only partially agree: “If the underlying cognitive system is not inherently symbolic, then the acquisition of a natural language may provide a means of extending our computational power by giving us access to a new type of representational format.” We agree with Dove that language modifies our categories. However, we tend to avoid ascribing certain characteristics, such as productivity and combinatorial capacity, only to language and not to other, more basic cognitive processes as well. In line with theories of reuse (Anderson 2010; Gallese 2008; Parkinson and Wheatley 2013), we believe that some basic structures and mechanisms, such as those of the motor system, are reused at a higher level by language: In this very sense, we can speak of language grounding (Borghi 2012 investigates this issue in more detail). This does not mean that language use does not introduce modifications and changes in previously formed and more ancient structures, such as those of the motor system. However, some characteristics of productivity and combinatorial capacity are possessed by the motor system as well, as studies on the “motor vocabulary” reveal (Fogassi et al. 2005; Rizzolatti et al. 1988; Gentilucci and Rizzolatti 1988). For example, Gentilucci and Rizzolatti (1988) and Rizzolatti et al. (1988) introduced the metaphor of a motor vocabulary to refer to the neurons of area F5 in the monkey’s brain: The “words” of this vocabulary are neurons connected to different motor acts, which can be hierarchically organized: Some refer to the goal of an action (e.g., grasping, holding), some to the phases in which the action can be segmented (hand aperture phase), and some to the posture with which an action can be executed (e.g., precision grip). These studies suggest that, similarly to words, motor acts can be combined in novel ways (even if obviously they are not arbitrarily linked to their referents, as words are). As clarified by Barsalou (1999), the productivity and combinatorial properties of symbols do not characterize only amodal symbols but grounded symbols as well.
In sum, we are with Dove when he highlights the potential of language, but we do not think that language provides a novel representational format.

The second difference between Dove’s view and WAT concerns his defense of an approach which in many respects relies on Paivio’s (1986) DCT. Dove’s view departs from DCT in that he proposes that perceptual symbols, rather than mental images, are the basic units of both verbal and non-verbal representations. However, much of the evidence he reports in support of his proposal, both behavioral and neuroscientific, relies on the important role of imageability; thus, it has the shortcoming we discussed in the introduction: Imageability cannot be conflated with concreteness, and imageability ratings cannot capture perceptual strength since they are profoundly biased toward vision.

3.5.2 Grounding and Sign Tracking: Jesse Prinz

Prinz (2002, 2012) is a philosopher who proposes a theory of abstract concepts that has a lot in common with the WAT view. We will outline it, then discuss some criticisms Dove advanced against it, and finally identify similarities and differences between this theory and the WAT view.

Prinz (2005) provocatively claims that the explanation of abstract concepts is a challenge for traditional disembodied theories, rather than for embodied ones: “If concepts were amodal, we wouldn’t face the question of how we can depict democracy, but we would face an equally challenging question. How can an arbitrary amodal symbol inside the head represent democracy? How can it represent anything at all?” (Prinz 2005, p. 12). Understanding words requires a tracking strategy: Since they are arbitrary symbols, to get their meaning they need to be anchored to something non-verbal. Abstract categories are typically correlated with features that can be perceived and that can work as signs for the category. The “sign tracking” strategy consists in “representing such categories by detecting contingently correlated perceivable features” (Prinz 2002, p. 169).

According to Prinz, a first way to comprehend abstract concepts is to ground them in concrete scenarios. For example, “justice” can be grasped by referring to simple situations: Inequality can be simulated by referring to a scenario in which one person gets two cookies and another gets three. Even if grounding, or sign tracking, holds for many abstract concepts, Prinz recognizes that it might not suffice for all kinds of concepts. He therefore identifies several further ways by which perceptual symbols can explain the representation of abstract concepts: metaphorical projection, mental operations and emotional connotations, and labeling. Beyond sign tracking, we will consider only the strategies referring to internal perceptual states and to labeling; we have discussed the metaphorical projection strategy elsewhere, since it is part of structured theories. According to Prinz (2002, 2005, 2012), abstract concepts evoke internal perceptual states (see also Barsalou and Wiemer-Hastings 2005, according to whom introspective properties, beyond situations, would characterize abstract concepts), particularly emotions. For example, the notion of “meaningful activity” is understood through introspection of motivations and emotions. It is possible, however, that logical abstract concepts such as “truth” or “identity” are not understood through emotions. One further strategy proposed by Prinz (2002) is the labeling one: For example, the concept of “democracy” would be comprehended in terms of a network of associated terms. We will discuss the advantages of this strategy when we outline distributional approaches. The novelty is that these associated terms are real words, not amodal symbols. Prinz (2005) states, however, that this strategy is not sufficient per se. We do need to ground labels in order to understand them. This is exactly in line with our view: Abstract concepts need to be grounded in the sensorimotor system; at the same time, they activate associated words more than concrete concepts do.

Dove (2009) criticizes different points of the proposal by Jesse Prinz. We will report his critiques, clarifying whether and how they depart from our view. First of all, Dove (2009) criticizes the view according to which abstract concepts are represented through the simulation of people performing the actions typically associated with a given concept. He uses as an example the concept of “democracy.” Dove argues that, given that the representation cannot be really fine-grained due to cognitive load problems, it necessarily has to take into account the more typical actions associated with democracy. This would leave it unclear to what extent a representation based on perceptual symbols would have advantages over an amodal one, since a proper tracking strategy cannot be defined. But, as we discussed in Chap. 1, all concepts, even subordinate ones, always imply some degree of abstraction: The category of Siamese cats abstracts from single instances of Siamese cats such as Peg, Fufi, etc. Thus, the capability of tracking differs only in degree between concrete and abstract concepts. In addition, it is unclear to us why and where Dove places a border between concrete and abstract terms. His argument, that given the conceptual complexity of certain notions only a subset of elements would be considered, holds also for concrete terms, which can be rather complex and whose representation may vary considerably depending on expertise: Consider, for example, complex artifacts such as robots and computers, or consider how complex “concrete” living beings are.

The strategy based on internal states and emotions is also criticized by Dove, who uses the notion of “democracy” as an example in his counterargument to Prinz: “Genuine acts of voting are not distinguished from false ones by the emotion experienced by the voters at the time of voting.” Here, we agree with Dove; however, in his recent book “Beyond Human Nature,” Jesse Prinz does not use “democracy” to give an example of the simulation of events using introspection or motivation; rather, he clarifies that a notion such as “democracy” can be understood by referring to a variety of procedures we typically experience which involve counting votes, as, for example, when we are with our family and have to decide where to eat dinner tonight.

As to the labeling strategy, according to Dove (2009), it has the problem of identifying precisely which associations pertain to the conceptual content and which do not. In addition, he argues that a labeling strategy cannot explain polysemy and synonymy. As argued elsewhere, however (Borghi and Cimatti 2012), anchoring words to the way we use them can represent a solution to the problem. Indeed, polysemous words and synonyms are related to different, but similar, experiences, both linguistic and sensorimotor.

In sum, we think that the view proposed by Prinz (2002, 2012) has a lot of potential, and in many respects it converges with the WAT proposal. The view that abstract concepts are grounded in concrete scenarios is shared by most embodied theories, including WAT. The emotional strategy is interesting as well. According to the WAT proposal, abstract concepts imply a social kind of acquisition; the social aspects, even if present, are typically less prominent in the representation of concrete concepts and in their acquisition. These social aspects might well include some emotional counterparts, given the strong associations between sociality and emotions (see the section on emotions). One further point is relevant in this proposal. According to Prinz, a concept such as democracy would be grasped both through mental imagery and through verbal skills, used to track definitions provided by authoritative members of our community. Comprehending abstract concepts implies the simple capacity “to match mental images with reality and sentences with testimony” (Prinz 2012). This view, according to which we rely on testimony to get the conceptual gist, converges with the idea, proposed by WAT and discussed in Chaps. 2 and 4, of the importance of language and in particular of explanations for abstract concepts.

WAT diverges from Prinz’s proposal in some aspects that can be considered minor. First, Prinz highlights the role of mental imagery and of perception for conceptual representation, while WAT underlines more the importance of the motor system and of action. However, as argued elsewhere (Borghi 2005; Borghi and Caruana in press), there is no real dichotomy between embodied and grounded theories that emphasize the role of perception and those that emphasize action. Second, WAT extends Prinz’s view by proposing that words are tools, that is, not mere vehicles of pre-existing experiences but also actions/experiences in their own right. Finally, WAT underlines the peculiar role played by acquisition in determining the representation of both concrete and abstract concepts.

3.5.3 Hybrid Models: Distributional and Embodied Approaches

Recent literature shows a flourishing of hybrid models. Embodied and distributional approaches are often divided by disciplinary boundaries, as recently explained by Andrews et al. (2013). While embodied approaches are widespread mostly in cognitive science and neuroscience, distributional and statistical accounts are more popular in computer science and modeling. In distributional views, meaning derives from the relationship between a word and its associated words, not between a word and its referent: As nicely summarized by Firth (1957), “You shall know a word by the company it keeps” (p. 11). According to one of the earliest and most powerful models, latent semantic analysis (LSA; Landauer and Dumais 1997), word meaning derives from the statistical co-occurrence of words in large text corpora. This distributional information can account for many empirical findings, most notably semantic priming. Statistical learning theories are interesting also due to their anti-nativist flavor, since they ascribe a major role to linguistic experience. They are interesting for us also because proponents of statistical learning do not defend Fodor’s idea of amodal mental words as the product of a transduction from sensorimotor to quasi-linguistic features. Rather, meaning is captured by the associations and relations between real words. Andrews et al. (2013) show in a comprehensive review how attempts to reconcile the embodied and distributional approaches are starting to emerge in philosophy, psychology, computer science, and cognitive neuroscience. Our view is completely in line with this reconciliatory proposal. However, we believe that reconciling the two approaches is probably not sufficient to fully explain meaning, and particularly the meaning of abstract words. We will detail the reasons why we think this is not the whole story while analyzing in detail one influential hybrid approach, the symbol interdependency hypothesis proposed by Max Louwerse and collaborators.
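Before turning to the symbol interdependency hypothesis, the toy sketch below gives a sense of the mechanism LSA relies on: a word-by-document count matrix is built from a tiny corpus, reduced with a truncated singular value decomposition, and semantic relatedness is read off as the cosine between the resulting word vectors. The corpus, the number of dimensions, and the absence of term weighting are simplifying assumptions made purely for illustration; this is not Landauer and Dumais’ implementation, which uses large corpora, weighting schemes, and a few hundred dimensions.

```python
# Toy illustration of the LSA idea: word meaning as a by-product of
# word-document co-occurrence statistics (heavily simplified).
import numpy as np

docs = [
    "the bird built a nest in the tree",
    "the robin is a bird with a red breast",
    "justice requires a fair trial in court",
    "the court delivered a verdict after the trial",
]
vocab = sorted({w for d in docs for w in d.split()})
index = {w: i for i, w in enumerate(vocab)}

# Word-by-document count matrix
X = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d.split():
        X[index[w], j] += 1

# Truncated SVD: keep k latent dimensions
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
word_vecs = U[:, :k] * s[:k]  # each row is a word's latent vector

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Words occurring in similar documents end up with similar vectors
print(cosine(word_vecs[index["bird"]], word_vecs[index["nest"]]))
print(cosine(word_vecs[index["bird"]], word_vecs[index["trial"]]))
```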

3.5.3.1 Symbol Interdependency Hypothesis: Louwerse

Louwerse and Jeuniaux (2008) (see also Louwerse 2011) recently proposed the symbol interdependency hypothesis (SIH), according to which language comprehension is both embodied and symbolic. Notice that the theory they propose concerns conceptual representation overall and does not focus on abstract concepts and words. However, a view on abstract concepts can be derived from this general theory. The core of their proposal lies in the argument that symbols “can, but do not always have to, be grounded”. Specifically, they propose that, since language captures and keeps track of the embodied relations that occur in the world, it can provide a “shortcut to the embodied relations in the world”. Indeed, symbols—for example words—are interdependent and interconnected with other symbols, but also with objects, that is, their referents. This characteristic guarantees that not all symbols need to be grounded directly: Some symbols are grounded, while others are grounded through the mediation of other symbols. The combination of an embodied and a symbolic approach would allow our conceptual system to be more efficient and to store and retrieve information in a more economical way. According to EG views, it would not be economical to transduce perceptual and action information into amodal symbols (Barsalou 1999). In the same vein, according to the SIH, it would not be convenient to transduce words into modality-specific states. In support of their theory, L&J review existing evidence. They argue that, for tasks implying deep semantic processing, the evidence favoring an embodied account is rather uncontroversial. The story is different, however, for tasks that involve superficial semantic processing, such as semantic decision tasks, because no transduction into a code other than the linguistic one, that is, no direct grounding, is necessary. Besides evidence favoring an embodied approach, according to them, there is evidence that symbols can derive meaning on the basis of their relationships with other amodal symbols. In particular, models such as LSA (Landauer and Dumais 1997) and the Hyperspace Analogue to Language (HAL, Burgess and Lund 1997) determine the semantic relatedness among different text units (words, texts, etc.) by analyzing the frequency of their co-occurrences and the similarity of the contexts in which they co-occur. According to these models, the meaning of a word such as “bird” would be the product of statistical computations over the associations between “bird” and other concepts such as “nest, beak, fly, and robin.” These models produce outputs that correlate with behavioral results, in particular with semantic priming results, and have been used to explain figurative language (for an overview, see Louwerse 2011).
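As a complement to the LSA sketch above, the following toy example illustrates the simpler, HAL-style computation mentioned here: a word’s meaning is approximated by its vector of raw co-occurrence counts, and relatedness is the cosine between such vectors. Again, the miniature corpus and window size are our own illustrative assumptions, not the actual HAL implementation.

```python
# A HAL-style complement to the LSA sketch above: relatedness is read directly
# off raw co-occurrence vectors, with no dimensionality reduction.
# The miniature corpus and window size are illustrative assumptions only.
import numpy as np

corpus = ("the bird sat in the nest . the bird opened its beak . "
          "the robin is a bird . a bird can fly . "
          "she spread the tablecloth on the table").split(" . ")

window = 3
tokens = [s.split() for s in corpus]
vocab = sorted({w for s in tokens for w in s})
idx = {w: i for i, w in enumerate(vocab)}
M = np.zeros((len(vocab), len(vocab)))
for s in tokens:
    for i, w in enumerate(s):
        for j in range(max(0, i - window), min(len(s), i + window + 1)):
            if j != i:
                M[idx[w], idx[s[j]]] += 1

def relatedness(w1, w2):
    """Cosine between the raw context-count vectors of two words."""
    u, v = M[idx[w1]], M[idx[w2]]
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

# Even on this toy corpus, "bird" shares more of its contexts with "robin"
# than with "tablecloth", so the model scores the first pair as more related.
print(round(relatedness("bird", "robin"), 2))       # higher
print(round(relatedness("bird", "tablecloth"), 2))  # lower
```

This is the kind of purely statistical association that, according to these models, can stand in for word meaning without any direct grounding.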

Evidence in favor of the SIH was recently collected by Louwerse and collaborators. For example, Louwerse and Jeuniaux (2010) presented participants with words arranged either according to an iconic relation (e.g., the word “attic” displayed above the word “basement” on the computer screen) or a reverse-iconic relation (the word “basement” presented above the word “attic”) (Zwaan and Yaxley 2003). They found that linguistic factors, such as word order frequency, better predicted results obtained in semantic judgments for words, while embodiment factors, such as iconicity ratings, better predicted results obtained in iconicity judgments for pictures. Louwerse and Connell (2011) compared the embodied/modal and the statistical approach in predicting the perceptual modalities of words. They found that the modal approach was more precise, since it provided finer distinctions between the auditory, gustatory, haptic, olfactory, and visual modalities, while the statistical approach was not able to differentiate between the olfactory and gustatory modalities. Louwerse and Connell performed an experiment in which they replicated the task designed by Pecher et al. (2003), in which participants had to verify whether a property is true or false of a given item. Two consecutive properties could be of the same modality (e.g., both visual) or of different modalities, thus determining a shift, for example from the auditory to the tactile modality. L&C operationalized the embodied shift as a shift between the five perceptual modalities (visual, haptic, auditory, olfactory, and gustatory) and the statistical shift as a shift between three linguistic modalities (visual–haptic, auditory, olfactory–gustatory). They found that the statistical approach better predicted faster responses and the embodied approach slower responses. The authors interpret their results as support for the SIH and for the LASS theory (Barsalou et al. 2008), claiming that less precise linguistic processes occur earlier than more detailed simulation processes. Connell and Lynott (2013) recently demonstrated that people tend to use a linguistic shortcut to decide whether a combination of concepts will be coherent not only in shallow tasks, such as sensibility judgments, but also in tasks involving deep conceptual processing, such as feature generation. Further recent evidence has been found in support of the distributional approach in a variety of domains: For example, Hutchinson and Louwerse (in press) found that the statistical approach is able to account for the SNARC effect, that is, for the finding that left-hand responses in parity judgments are faster with smaller than with larger numbers, while the pattern is reversed for right-hand responses.

The evidence illustrated above shows that the statistical account is rather powerful in predicting results. However, it cannot predict results at the same level of detail as the embodied simulation account. In our opinion, this demonstrates that the grounding of concepts is important for full comprehension. As to the results on timing, we will discuss them when we deal with the LASS theory, for which they are directly relevant. One further limitation, which is crucial in the context of this book, is that the evidence collected so far concerns only concrete concepts and not abstract ones.

In evaluating the theory as a whole, we think that the part in which it critiques the embodied view has some limitations, but the proposal has many points of interest and many similarities with our view. First, we do not think it is true that most evidence on language grounding has been obtained with deep processing tasks. A paradigmatic example is given by the work of Pulvermüller and other authors, who demonstrated that the motor system is activated very quickly and with tasks that imply shallow semantic processing, such as lexical decision tasks. In addition, even studies performed with tasks that require deep semantic processing typically focused on dimensions that were not relevant to the task. For example, Stanfield and Zwaan (2001) found faster responses when sentences like “The ranger saw the eagle in the sky” were followed by a picture displaying a bird with outstretched wings rather than a bird in its nest: Even if the task implied deep semantic processing, the perceptual dimension of shape was automatically evoked during this task. Second, Louwerse and Jeuniaux (see also Louwerse 2008) report that some spatial relations, such as the upper–lower part structure, which are present in the world are also present in linguistic structure. We do not believe that the very fact that certain kinds of relations, such as spatial ones, are encoded in language goes against embodied theories. On the contrary, this finding might help demonstrate that the argument for a complete arbitrariness of language is not entirely viable and that the structure of language reflects the structure of our experience. We appreciate the fact that Louwerse and Jeuniaux ascribe relevance to real language and to the associations formed between words (see Landauer and Dumais 1997, for a similar position: Knowledge is not represented by amodal symbols, but by the statistical distribution of real words). However, at the same time, we believe that meaning cannot be fully explained by word associations and that some form of symbol grounding (e.g., an embodied simulation) is needed for comprehension (Harnad 1990; Cangelosi and Harnad 2000; Pezzulo and Castelfranchi 2007). In our view, both concrete and abstract words are grounded. Thus, both the concrete word “tablecloth” and the abstract word “freedom” would activate a network of associated words, but also a variety of experiences (e.g., flying, running on the grass). But the role of words would be more relevant for the latter, given that the objects, situations, and experiences evoked by a label such as “freedom” are more diverse than those kept together by the word “tablecloth.”

In addition, the L&J proposal does not consider some important characteristics of language, such as its social aspects and its power to modify cognition. The social aspects characterizing language, that is, the fact that it is acquired through a form of social embodiment, are only partially explained by the fact that the relations between words reflect socially determined rules: Language is learned in an embodied context in which different organisms interact and resonate. Furthermore, L&J do not say anything about the power words have to modify cognition. One example of such power is given by the role language plays in categorization: Verbal labels can limit the boundaries of previously acquired categories, render them more uniform, and lead speakers to converge on a subset of common features (see for example Puglisi et al. 2008). Finally, L&J do not address the role public words might play as external devices that guide our thought processes. In order to capture the complexity of language use, all these aspects should be considered; therefore, we think that the L&J proposal should at a minimum be extended to take them into account.

3.5.3.2 Language and Situated Simulation (LASS) Theory: Barsalou et al.

One important view with which our proposal shares many similarities is the language and situated simulation (LASS) theory advanced by Barsalou and collaborators (Barsalou et al. 2008; Simmons et al. 2008). Barsalou et al. (2008) propose that multiple systems underlie knowledge in the brain. LASS focuses on two of these forms of knowledge: linguistic forms (not amodal symbols!) and situated simulations. These two systems underlie concept representation and processing and are in continuous and dynamic interaction; they are not encapsulated or modular. The LASS theory outlines how the dynamics of the process would unfold and delineates the time course of activation of the two systems. Here is how the LASS proponents describe this interactive process. In linguistic tasks, the activation of linguistic forms would peak earlier, even if both the linguistic and the simulation systems are active. This would happen in line with the encoding specificity principle (Tulving and Thomson 1973), according to which memory is most effective when information available at encoding is also present at retrieval. After word recognition, further linguistic forms would be activated, at first through the simpler mechanism underlying conceptual processing, that is, word associations. These associations allow situating the cue word within a linguistic context, and this makes it possible to perform a variety of tasks. An important assumption of the LASS theory is that, to accomplish some tasks, such as lexical decision, which requires distinguishing words from non-words, only shallow processing is needed and no access to deep conceptual knowledge is required. In this respect, the LASS theory is indebted to Glaser’s (1992) lexical hypothesis, according to which the advantage of pictures over words is due to the fact that images access the conceptual system directly, while words do not. Once recognized, however, the word starts activating simulations as well, and it allows access to conceptual meaning in order to prepare for situated action. In this case, the word works as a “pointer.” According to the authors, even if the activation of simulations can be rather fast, occur automatically, and last for many seconds, it does not dominate the initial stages of word processing. This idea is difficult to reconcile with Pulvermüller et al.’s (2005) results, showing that the meanings of words such as “kick,” “lick,” and “pick” activate the motor cortex in a somatotopic way about 150 ms after word presentation. To reconcile their view with Pulvermüller’s, the authors propose that “simulations are likely to be activated simultaneously while the executive system is producing responses from the linguistic system” (Barsalou et al. 2008, p. 4). Notice that the authors use the term “linguistic system” to refer to the system that processes linguistic form, not linguistic meaning. This linguistic system, which evolved later than the simulation system, basically works as a control system for manipulating simulations; thus, it does not imply access to deep conceptual information.

Note that the LASS theory has many similarities with Paivio’s dual coding view. Compared to the dual coding view, however, LASS ascribes more relevance to the simulation system for the representation of abstract concepts, while Paivio’s view attributes more relevance to the linguistic system.

An fMRI experiment by Simmons et al. (2008) represents one of the most important pieces of evidence in favor of the LASS theory. Participants were visually presented with concepts and performed a silent property generation task. In a further scanning session, they performed two localizer tasks: They were asked to perform a word association task with some concepts and to imagine a situation containing the concept for other concepts. Word association mostly activated neural areas typically involved in linguistic tasks, that is, left hemisphere linguistic areas, particularly Broca’s area, while situation imagery mostly activated bilateral posterior areas that are typically active during mental imagery. In line with the predictions of the theory, in the property generation task the linguistic areas were active earlier, while the imagery-related areas were involved in the second phase of property generation. This evidence is very interesting; however, given the poor temporal resolution of fMRI, it should be complemented with more detailed analyses of the time course of the emergence of different kinds of properties. In addition, we think it has a further problem: It basically testifies that during conceptual tasks which employ linguistic stimuli as cues, the linguistic system is the first to be activated, while the imagery system is activated later. This does not imply at all that meaning is not accessed from very early phases. Indeed, word generation tasks, which involve activation of Broca’s area, imply accessing linguistic meaning. Furthermore, notice that Broca’s area is considered the human homolog of area F5 in the monkey brain, where mirror neurons are found (see Rizzolatti and Craighero 2004, for a review). Activation of this area could also imply an activation of the motor system to prepare for situated action. In line with this hypothesis, many studies report Broca’s area activation, for example, during motor imagery (Binkofski et al. 2000) or during processing of action concepts regardless of the modality (vision or language) (Baumgaertner et al. 2007).

Notice that the LASS theory concerns word and conceptual processing and representation overall; it is not focused on abstract concepts. However, Barsalou et al. (2008) report an fMRI study conducted on abstract concepts (see also Wilson-Mendenhall et al. 2014). Participants were presented with an abstract word (e.g., “convince”) and asked to verify whether it applied to a picture (e.g., a politician speaking) presented afterward. The results showed that the brain areas related to the content of the word were active, while there was no difference in the activation of linguistic areas for concrete and abstract words. The authors conclude that “the representation of abstract concepts can differentially recruit the language and simulation systems. When task conditions allow, as in previous experiments, participants rely only on the language system, because it is adequate for task performance (e.g., in lexical decision and synonym tasks). When task conditions require deeper conceptual processing, participants rely on the simulation system, because it provides the necessary information for performing the task (e.g., verifying that an abstract concept applies to a picture)…. different mixtures of the language and simulation systems support the processing of abstract concepts under different task conditions” (Barsalou et al. 2008, p. 267). This evidence captures one aspect which is important for the WAT view as well: The fact that not only concrete but also abstract words are grounded. In addition, it highlights the flexibility of the human conceptual system, emphasizing its task dependency. However, we think that something is missing here: Processing a word in order to verify its relation to an image differs from processing a word in the context of other words, and this control condition was not present. Finally, it should be clearer which abstract concepts the authors selected, since for many abstract concepts it is not easy to form mental images without any cue (e.g., “truth”).

Overall, the WAT view shares with the LASS theory the idea that multiple systems represent knowledge. The main difference between our proposal and LASS lies in a different evaluation of the role played by language. The authors claim that “language plays central roles in cognition and conceptualization. Nevertheless, experience plays a role that is at least as central” (Barsalou et al. 2008, p. 276). In disagreement with this, we believe that language and experience cannot be contrasted. In our view, perception–action and linguistic experiences do not have a different status. Words do not carry meaning only when they work as pointers. Instead, language carries meaning, which can be transmitted either by pointing to referents in the world or by referring to other words through a network of associates. One of the advantages of distributional models is that they have highlighted this very fact, allowing us to depart from the view according to which words are only pointers to their referents. The production of word associations is, in our view, a way to access meaning. We agree that the role and weight of the linguistic and simulation systems might vary depending on the task. Without denying the profound dynamicity of our conceptual system, however, we believe that their respective weight also differs depending on the kind of concept. As clarified more in depth elsewhere, the linguistic system is more relevant for abstract concepts, because the linguistic context was more crucial for their acquisition.

3.6 Conclusions: Many Theories, One Unifying Theory?

The aim of this chapter was to overview the most prominent recent theories seeking to account for the representation of abstract concepts. As is apparent from the discussion, the different theories capture many important aspects of abstract concepts representation. However, whether a unifying theory explaining all abstract concepts is possible is still an open issue, and we believe it is an important goal for future research.

In this overview, we have focused on embodied and on hybrid theories and tried to show their similarities with and differences from the WAT proposal. While discussing them, we have tried to demonstrate that all concepts, not only concrete ones, are grounded. We agree with Prinz (2002, p. 148) when he argues: “…the failure to see how certain properties can be perceptually represented is almost always a failure of the imagination.” There are indeed compelling demonstrations that abstract concepts activate the motor system, similarly to concrete ones. However, this is not the whole story. We think that the grounding of abstract concepts differs from that of concrete ones, as many authors recognize. Abstract concepts activate more situations, more linguistic information, and more emotions compared to concrete concepts, which evoke more sensorimotor information. In line with distributional models and similarly to hybrid models, the WAT proposal stresses the role of language for abstract concepts representation. However, it does not equate the role of language with the information derived from word associations in a distributed network. Certainly, linguistic experience includes such associations, but it goes beyond them. WAT understands language in a broader sense, as a social experience which involves our body and which triggers our emotions.