Introduction

What role does language play in our cognitive lives? Is it a rich source of inner grounding or simply a vehicle for communicating our thoughts? Our purpose in this essay is to examine the varied ways in which grounded simulations of linguistic experience might help us acquire, represent, and use new concepts. The idea under consideration is that the multimodal networks associated with the dynamic production and processing of language provide an effective means of acquiring information that goes beyond our immediate experience. These networks can play a role in most concepts but are especially important in abstract concepts. We identify some of the ways that embodied language influences the acquisition and retrieval of abstract knowledge by acting as a source of inner grounding and means of social action. We defend the thesis that words, as physical symbols that we manipulate in an embodied fashion, enable us to leverage cognitive resources that would not be available to us otherwise: they enhance our perception, they allow us to sharpen and refine our representation of categories, particularly of those that are not directly tied to our immediate experience; they provide a means of coordinating context- and task-specific content; and they facilitate metacognitive processes that evaluate other cognitive operations. The capacity to act in a grounded way with language by means of sensorimotor re-presentations supports these cognitive and metacognitive functions, which are central to our capacity for abstract thought.

In what follows, we consider two theories—the Words As social Tools, or WAT, theory (Borghi & Binkofski, 2014; Borghi et al., 2017, 2019a) and the Language is an Embodied Neuroenhancement and Scaffold, or LENS, theory (Dove, 2019)—which propose that embodied language plays an active and significant role in our thinking and in our interaction with the physical and social environment. Both theories are committed to an embodied approach to cognition in which our concepts are grounded in experiential systems, and both claim that the grounded language system makes a number of significant contributions to our capacity to conceptualize our world and our social environment. Although these theories differ in their details and focus, they share the view that language is a significant embodied resource that transforms our cognitive niche (Clark, 2006).

What are abstract concepts?

Compared to concrete concepts like “bottle”, abstract concepts like “freedom” typically refer more to events, mental states, and situations and less to clearly bounded, manipulable objects or entities. Abstract concepts are often more flexible with respect to their semantic content, since they generally refer to relations between elements. A good example is the concept “cause”, which involves an actor, an action, an object/patient, and so on (Pulvermüller, 2018). Compared with concrete concepts, abstract ones are more variable within and across individuals (Borghi & Binkofski, 2014) and tend to be more contextually flexible (Falandays & Spivey, 2019). This greater contextual flexibility may also make abstract concepts more variable across languages (Borghi, 2019; Kemmerer, 2019).

Approaching the distinction between concrete and abstract concepts as a dichotomy would, however, be misleading, since many abstract concepts have concrete components and vice versa: for example, the concept of “money” can refer both to the material and physical properties of money and to more abstract elements, from social exchange to deontic positions (Barsalou et al., 2018; Borghi et al., 2017; Tummolini & Castelfranchi, 2006). Furthermore, these components might play a different role depending on the context. We therefore think that abstractness/concreteness ratings or imageability ratings are not sufficient to identify abstract concepts (see Connell & Lynott, 2012 for supporting evidence).

A proper characterization of abstract concepts requires placing them in a multidimensional space. Specifically, compared to (more) concrete concepts, (more) abstract concepts are generally acquired later (late age of acquisition, AoA) and more through language and social interaction (e.g., when other people explain the meaning to us) than perception (linguistic modality of acquisition, MoA). We depend on other people more to understand their meanings (social metacognition; Borghi et al., 2018b). Concrete concepts are more imageable (imageability; Paivio, 1990), they involve more bodily interactions with the external world (Body Object Interaction, BOI, Tillotson et al., 2008), and they activate more contexts (Schwanenflugel et al., 1992; Villani et al., 2019; Crutch et al., 2013; Troche et al., 2017).

Finally, one of the most interesting developments in the recent literature on the topic has been the acknowledgement that a variety of abstract concepts exist (e.g., Borghi et al., 2018a). Until a few years ago, the psychological and neuropsychological literature on concepts focused mainly on distinctions between kinds of concrete concepts, such as that between natural objects and artifacts (Warrington & Shallice, 1984; Forde & Humphreys, 2005), and more recently between these two categories and those of natural and manufactured food (Rumiati & Foroni, 2016). In contrast, abstract concepts were treated as a unitary whole. Recently, however, a number of studies have shown how rich the organization of abstract concepts is, from emotions to philosophical and religious concepts, from theory of mind to numbers and time/space concepts, to social concepts (Catricalà et al., 2014; Desai et al., 2018; Fingerhut & Prinz, 2018; Fischer & Shaki, 2018; Ghio et al., 2013; Harpaintner et al., 2018; Mellem et al., 2016; Roversi, Borghi, & Tummolini, 2013; Villani et al., 2019), and that these kinds might differ in the properties/dimensions they evoke (linguistic, interoceptive, exteroceptive, emotional, social, etc.). In a rating task, Villani et al. (2019) demonstrated that different kinds of abstract concepts exist, with different degrees of embodiment and of grounding in sensorimotor and interoceptive experiences. Villani et al. (2020, under review) also demonstrated with a behavioral interference paradigm that interoception is more crucial for abstract concepts (in particular for emotional concepts; see Connell et al., 2018) than it is for concrete concepts, and that manual action is more crucial for concrete concepts and for abstract concepts with more concrete physical, spatio-temporal, and quantitative content.

A multimodal and multilevel conception of grounding

Before we get to the specifics of the WAT and LENS theories, we need to do a bit of housekeeping. Embodied theories come in different strengths (Meteyard et al., 2012). Strongly embodied theories posit semantic representations that are fully constituted by experiential simulations within primary affective and sensorimotor areas, and weakly embodied theories leave room for higher-level modal, crossmodal, or even heteromodal representations and often acknowledge that a degree of abstraction takes place within and between modalities (Simmons & Barsalou, 2003; Vigliocco et al., 2004). In general, researchers are moving in the direction of weak embodiment (Barsalou, 2016; Pulvermüller, 2013).

Both the WAT and LENS theories are committed to a multimodal view of semantic memory that relies on widely distributed conceptual representations. They hold that concepts rely on a hierarchy of neural circuits that extend from modality-specific areas up to multimodal areas located within association cortices (Binder, 2016; Fernandino et al., 2016; Garagnani & Pulvermüller, 2016; Simmons & Barsalou, 2003). Heteromodal convergence zones (Meyer & Damasio, 2009) or network hubs (van den Heuvel & Sporns, 2013) make important contributions to our concepts. This hierarchical structure provides an explanation of how we are able to generalize or abstract away from experience.

There are currently two major approaches to generalization that seek to explain it in terms of cognitive architecture. The first views generalization in terms of so-called deep learning (Goodfellow, Bengio, & Courville, 2016). Buckner (2018) identifies a form of hierarchical processing in deep convolutional neural networks that he refers to as “transformational abstraction” in which sensory-based representations of category exemplars are iteratively converted into new formats that are more tolerant of variation and noise. The second views abstraction as a design feature of a hierarchical predictive coding view of neural functioning (Friston, 2003). This view emphasizes the importance of the interaction between top-down predictions and current sensory input in the explanation of action and perception (Clark, 2015). Both of these approaches are compatible with the sort of multimodal and multilevel embodiment favored by both WAT and LENS theories.
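The interaction between top-down prediction and sensory input that the second approach emphasizes can be illustrated with a deliberately small sketch. It is offered only as an illustration under stated assumptions, not as the model of Friston (2003) or Clark (2015): the function name `settle`, the single latent cause, the linear generative weight, and the step size are all hypothetical choices made for brevity. The latent estimate is nudged until the pull of the bottom-up sensory error balances the pull of the higher-level expectation.

```python
def settle(observation, prior, weight=2.0, step=0.1, n_iter=200):
    """Toy single-level predictive coding loop (illustrative assumptions only).

    The latent estimate is adjusted by gradient descent on the summed squared
    sensory prediction error and squared deviation from the prior expectation.
    """
    latent = prior
    for _ in range(n_iter):
        prediction = weight * latent              # top-down prediction of the input
        sensory_error = observation - prediction  # bottom-up prediction error
        prior_error = latent - prior              # departure from the higher-level expectation
        # Move the estimate so that both error terms shrink.
        latent += step * (weight * sensory_error - prior_error)
    return latent


# The settled estimate lies between what the prior expects and what the input implies.
estimate = settle(observation=4.0, prior=0.0)
```

With these hypothetical settings, the estimate converges to a compromise value, (weight × observation + prior) / (1 + weight²), rather than to the value that the sensory input alone would dictate.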

All of this leads to an important caveat: although language makes significant contributions to our capacity for abstract concepts, it is not the sole source of generalization or abstraction (Borghi & Binkofski, 2014; Dove, 2016). Even concrete concepts rely on a capacity to abstract from category exemplars. More positively, both WAT and LENS propose that the grounded language system serves as a rich source of inner grounding and social action within the context of an inclusive version of embodiment.

The WAT and LENS theories

Several recent theories suggest that language contributes to our grounded conceptual system. Some examples are Embodied Conceptual Combination or ECCo theory (Lynott & Connell, 2010), language and situated simulation or LASS theory (Barsalou, Santos, Simmons, & Wilson, 2008), Language and Associations in thinking or LASSO theory (Tillas, 2015), and Symbol Interdependency or SI theory (Louwerse & Jeuniaux, 2010; Louwerse, 2011, 2018). In addition, conceptual metaphor theory or CMT assigns an important role to language by proposing that knowledge is structured by metaphorical mappings from sensory experience and that culturally specific knowledge can be reflected in metaphors that differ across languages (Boroditsky, 2009; Casasanto & Boroditsky, 2008; Lakoff & Johnson, 2003; Winter et al., 2015). We can differentiate these theories in part by the importance that they assign to the language system. The LASS theory, for example, treats language as little more than a cognitive shortcut for embodied or grounded conceptual processing (see also Connell, 2019), while the SI theory holds that linguistic information plays a dominant role.

One of the defining features of the WAT and LENS theories is their commitment to viewing language itself as a richly embodied phenomenon. This commitment enables them to both acknowledge the power of nonlinguistic grounded cognition, which involves the re-engagement of action, emotion, and perception representations, and identify a number of significant contributions of language to our concepts in general and abstract concepts in particular. Although the WAT and LENS theories are broadly complementary to each other, they focus on different aspects of the contributions of language to cognition and are guided by different research interests.

Words as social tools

The words as social tools (WAT) theory (Borghi & Cimatti, 2009; Borghi & Binkofski, 2014; Borghi et al., 2019a) proposes that words can be considered both as social tools, useful for modifying the social and physical environment, and as inner tools, useful for supporting and refining our perception, categorization, and thought processes (Borghi, 2020, under review; Lupyan & Winter, 2018). With respect to abstract concepts, this proposal has four main tenets, which we briefly summarize here (Borghi et al., 2019a):

  1. The acquisition modality of concrete and abstract words differs: since abstract words collect heterogeneous members and are not characterized by a unitary, well bounded referent, they need more linguistic and social support to be acquired.

  2. The neural representation of abstract concepts includes sensorimotor networks (embodiment), but involves interoceptive, linguistic and social networks to a larger extent than that of concrete ones.

  3. Since linguistic experience as a whole is crucial for abstract concepts acquisition and representation, the mouth motor system is actively involved during their acquisition and processing, to a larger extent than what happens with concrete concepts.

  4. Because linguistic experience is pivotal for abstract words, they are more influenced by differences across cultures and spoken languages than concrete ones.

As suggested above ("What are abstract concepts?"), these predictions are made with the general assumption that the concrete/abstract distinction is not a dichotomy, that it will be important to investigate and to study differences within kinds of (concrete and) abstract concepts, and that new, more ecological methods should be adopted to capture real conceptual use (see “Inner speech and metacognition” below).

Our elaboration of the WAT theory will only appeal to evidence directly collected by Borghi, Barca, Tummolini and colleagues, and throughout this subsection the first-person plural pronoun will refer to them inclusively. The findings outlined below support the specific tenets provided above. The overall behavioral and neuroscientific evidence consistent with WAT has been discussed in a longer paper (Borghi et al., 2019a).

Embodiment and language in acquisition and processing (tenets 1, 2)

A number of our studies suggest an important role of sensorimotor, linguistic, and interoceptive systems in abstract concept processing (tenets 1, 2) and show that the importance of these dimensions varies depending on the sub-kind of concept considered (e.g., emotional vs. philosophical). fMRI evidence with simple abstract and concrete sentences indicated that both sensorimotor and linguistic neural networks are activated during abstract sentence processing (Sakreida et al., 2013). Evidence from a study on Italian Sign Language (LIS) showed that different abstract concepts are represented using different levels of embodiment, and that for some abstract concepts (e.g., “linguistics”, “truth”), LIS complemented iconic gestures with linguistic information derived from signed/spoken/written Italian or from other sign languages (Borghi et al., 2014). In a rating study with 425 abstract concepts, Villani et al. (2019) demonstrated that the more abstract a concept is, the later it is acquired and the greater the importance of the linguistic modality in its acquisition. Using an interference paradigm and a difficulty rating task, Villani et al. (2020, under review) recently found that abstract concepts, particularly emotional ones, are judged as more difficult while participants perform a concurrent interoceptive task, a result that appears to reveal the importance of inner grounding for abstract concepts (see also Connell et al., 2018). The same study included a condition in which participants were required to perform a manual action (squeezing a ball) during conceptual processing. This interfering condition increased the perceived difficulty of both concrete concepts and more “concrete” abstract concepts, i.e., “physical, spatio-temporal and quantitative” abstract concepts (e.g., acceleration, number, result).

Embodiment, language and social dimension (tenets 1, 2)

Villani et al. (2019) found that participants rated abstract concepts higher with respect to the importance of “social metacognition” (how much they needed others to understand the word meaning). Fini, Era, Darold, Candidi, & Borghi (2020) asked participants to guess the abstract or concrete word represented by a picture, with suggestions provided by two confederates. Participants then performed a motor interaction task (grasping a bottle) with an Avatar and were told that the movements of the Avatar were controlled either by the confederate who had given them suggestions on the abstract concepts or by the one who had given them suggestions on the concrete concepts; they were also told that a further guessing session would follow. Participants asked for more hints to guess abstract concepts than to guess concrete concepts. More importantly, as predicted, their movements were more synchronous with the Avatar controlled by the person who had given them suggestions on abstract concepts. This finding is consistent with our hypothesis that we need more help from others and tend to be more collaborative with abstract concepts.

Evidence on the activation of the mouth motor system (tenet 3)

We found a number of facilitation effects associated with the activity of the mouth motor system. The processing of abstract words was facilitated with mouth compared to hand responses (microphone or device among the teeth vs. keyboard) in tasks mimicking conceptual acquisition in adults with novel concrete and abstract concepts and words (Borghi et al., 2011; Granito et al., 2015). In a definition matching task, the facilitation in response times of hand responses was limited to concrete words and not extended to abstract words (Borghi & Zarcone, 2016). A facilitation of mouth responses with abstract words was found to be present in a recognition task but not in a lexical decision task (Mazzuca et al., 2018). We have also collected TMS evidence with sentences composed of concrete/abstract nouns and verbs showing early activation of hand-related areas during processing of concrete verbs, and delayed activation of the same areas during processing of abstract verbs; the result was interpreted as a cascade effect owing to a previous mouth motor system activation (Scorolli et al., 2012).

Interference effects may also occur. A behavioral study on concrete/abstract categorization (Zannino et al., 2020, submitted) revealed that abstract concepts are more impaired by a concurrent articulatory suppression than by a concurrent ball squeezing task, suggesting that inner speech plays an important role during their processing.

The mouth motor system appears to influence conceptual development: in two cross-sectional studies we found that the extended use of a device that interferes with mouth/facial movements, the pacifier, affects the acquisition of abstract concepts and has a long-term influence on their processing later in life (at least up to 8 years of age). In a definition task, 6-year-old typically developing children who had used the pacifier for a longer period (e.g., beyond 36 months of age) were as accurate as their classmates who had used the pacifier less or not at all (Barca et al., 2017). Verbal responses were also qualitatively coded based on the conceptual relations produced in defining the concepts. We distinguished ‘concrete strategies’, such as those referring to the perceptual properties of the concept like its shape and colours (e.g., ‘walnut: something that has a hard shell, that you have to break to eat it’), from more ‘abstract strategies’, such as those referring to social norms (e.g., ‘helmet: you have to put it to ride a bicycle’). Children who had used a pacifier beyond 36 months of age made more use of concrete strategies such as exemplification and functional relations, and less use of abstract strategies. In contrast, those who had not used a pacifier used more abstract strategies such as free associations, or referred more frequently to social norms or social-interactive situations.

This interference effect is surprising, since none of the tested children still used the pacifier or used it during the task, yet it also occurs with older children performing a different task. In a categorization task, 8-year-old children who had used the pacifier for a longer time were slower to respond to abstract than to emotional and concrete concepts (Barca et al., 2020). Pacifier use may affect motor aspects of speech (interfering with the building and consolidation of fine-tuned motor programs) and auditory representations of speech (as the child receives an unstable trace of his/her own speech; for an account of the pacifier effect within a neurocomputational model of speech see Barca, 2019). Using the device in the daytime during social interaction may interfere with both speech articulators and online social feedback, two important factors in linguistic, conceptual, and socio-emotional development (Pezzulo, Barca, & D’Ausilio, 2014; Rychlowska & Vanderwert, 2020). It may have a greater impact on the acquisition and processing of abstract words because they are acquired later in life and rely more heavily on linguistically conveyed information during social interaction.

Overall, current findings suggest that the use of a device that limits mouth/facial movements during infancy and beyond might have a selective and long-term influence on the acquisition and processing of abstract concepts for which the linguistic and social input is more crucial.

Cross-linguistic studies, studies on different languages (tenet 4)

Behavioral cross-linguistic evidence with Italian and German participants showed that responses were faster with congruent combinations (abstract verb + abstract noun, concrete verb + concrete noun) than with mixed combinations and that, within mixed combinations, responses were faster when the first word was concrete, independent of the language and of the grammatical class of the word. Interestingly, however, there was an effect of language, linked to the different word order in German and Italian (Scorolli et al., 2011). Behavioral evidence on Chinese two-character words has revealed that Chinese participants are sensitive to the concreteness of the component characters (D’Aversa et al., 2020, under review), while cross-linguistic evidence with Italian and Iranian participants has shown that the interaction between concrete and abstract sentences and action differs depending on the culture/language considered (Ghandhari et al., 2020). Further studies with a free-listing task focused on the concept of gender have shown that its conceptual representation varies according to gender-related experience and language (Mazzuca et al., 2020; Mazzuca et al., 2020, submitted). Overall, these studies indicate not only the importance of the abstract/concrete distinction but also the need to avoid taking it for granted and to study how it manifests differently across cultures and languages.

Limitations of WAT and future directions

The WAT proposal clearly has some limitations. Some data on conceptual development and language processing are hard to account for on this theory. For example, data on conceptual acquisition showed that there was no difference in lexical decision and definition accuracy between children with developmental language disorders (DLD) and other children (Ponari et al., 2018). These data challenge the primacy of language for abstract concept acquisition (tenet 1). Other results challenge the role played by the mouth effector during language processing (tenet 3). Studies on first graders revealed that prolonged use of a pacifier did not have an effect on definition accuracy (Barca et al., 2017). However, when the content of the definitions and the conceptual relations produced were examined, definitions of abstract, concrete, and emotional concepts were less distinct in children who had used the pacifier beyond 3 years of age. Studies with adults revealed that there was no facilitation of mouth responses in a lexical decision task (Mazzuca et al., 2018). However, such a facilitation was found in a subsequent recognition task. Articulatory suppression seems to influence a categorization task (Zannino et al., 2020, under review) but not a task in which participants have to rate word difficulty (Villani et al., 2020, under review). It is unclear whether this means that inner speech is not recruited with abstract concepts in certain tasks, or whether in some cases/tasks inner speech is not specified at the articulatory level (Oppenheim & Dell, 2010).

Beyond the data that are difficult to account for, some issues are currently underspecified. One example concerns the role of syntax and its relationship with semantics in influencing abstract concept acquisition and use (see Desai, 2019, and the reply by Borghi et al., 2019b); this role needs to be clarified and specified. Another example pertains to the mechanisms involved in the recruitment of language and of the mouth motor system. In particular, the relationship between inner speech and mouth motor system activation, the relationship between inner speech, metacognition, and social metacognition, and the relationship between inner speech and mind wandering all require further investigation. A further example concerns the levels of embodiment, and in particular the role played by interoception, especially for abstract concepts that are not strictly emotional. Finally, the differences between languages also deserve more investigation (Kemmerer, 2019).

Further data should be collected to better understand at which level language (overt language and inner speech) influences abstract concept acquisition and use, and in which tasks it flexibly intervenes. Specifically, further studies should be conducted on development and concept acquisition. Furthermore, more evidence should be collected to support or disconfirm the tenets of WAT, in particular as regards the kinds of embodiment (e.g., the role of interoception), the role of inner speech, metacognition, and the differences across languages, possibly with new and more ecological methods (see "Language and the flexibility of grounded cognition").

An important step that needs to be taken concerns the change of experimental paradigms. So far, the majority of studies have been conducted using classical behavioral tasks; more studies should investigate the relationship between language and abstract concepts using novel methods that capture online interaction and use of abstract concepts (see "Language and the flexibility of grounded cognition").

Language is an embodied neuroenhancement and scaffold

A seemingly paradoxical tension emerges from neuropsychological research. On the one hand, there are remarkable cases of preserved cognitive and, specifically, conceptual abilities in the face of significant language impairment (Lecours & Joanette, 1980; Schaller, 2012). On the other hand, language impairments are often comorbid with other cognitive impairments (Noppeney & Wallesch, 2000). The LENS theory defuses this tension by offering a robust account of thinking with words within the context of grounded cognition. Its core thesis is that acquiring a natural language transforms our conceptual ecosystem as an important component of our flexible, multimodal, and multilevel conceptual system (Dove, 2019). It predicts that the grounded language system actively contributes to our concepts in at least four ways:

  1. The language system should be more engaged in abstract concept processing than it is in concrete concept processing.

  2. The presence of a label should have a number of effects on how we conceptualize and process a category. As grounded representations, words should actively modify and enrich our concepts in ways that are particularly useful for abstract concepts.

  3. Some conceptual content should be encoded in the associations of words with other words.

  4. Simulations of conversations should also play an important role. Because of this, knowledge about word use and discourse pragmatics should contribute to the task- and context-specific realization of concepts.

In sum, the LENS theory offers a view of how the grounded language system might augment and enhance our concepts, particularly abstract ones. By its lights, the language system is a grounded symbol technology that transforms our cognitive landscape broadly in the way that other grounded symbol technologies such as mathematical notation transform our cognitive abilities locally (Dove, 2009, 2011, 2014, 2018).

The role of the language system

A robust and diverse body of evidence supports the generalization that the language system is more active during the processing of abstract concepts than it is during the processing of other concepts. Several neuroimaging studies find that abstract words elicit greater activation than concrete words in superior regions of the left temporal lobe (Binder et al., 2005; Giesbrecht, Gamblin, & Swaab, 2004; Mellet, Noppeney & Price, 2004; Sabsevitz et al., 2005) and inferior regions of the left prefrontal cortex (Binder et al., 2005; Fiebach & Friederici, 2004; Giesbrecht, Gamblin, & Swaab, 2004; Goldberg, Perfetti, & Schneider, 2006; Noppeney & Price, 2004; Sabsevitz et al., 2005). Meta-analyses reveal these areas to be the ones that are most likely to show increased activation with abstract concepts (Binder et al., 2009; Wang et al., 2010). In keeping with this, abstract nouns have been found to elicit greater activation than concrete nouns in the left superior temporal and left inferior frontal cortex (Sabsevitz et al., 2005). Accuracy on a lexical decision task was found to decrease with abstract concepts when repetitive transcranial magnetic stimulation (rTMS) was applied over the left inferior frontal gyrus and the left superior temporal gyrus (Papagno et al., 2009).

Neuropsychological case studies provide further support for the involvement of these areas. A greater impairment for the processing of abstract words compared to concrete words has been found to be associated with left hemisphere damage, including in patients who present with aphasia (Goodglass, Hyde, & Blumstein, 1969), deep dyslexia (Coltheart, Patterson, & Marshall, 1980; Franklin, Howard, & Patterson, 1995; Shallice & Warrington, 1975), and deep dysphasia (Katz & Goodglass, 1990; Martin & Saffran, 1992). Reverse concreteness effects have also been found in patients with herpes simplex encephalitis (Sirigu, Duhamel, & Poncet, 1991; Warrington & Shallice, 1984) and patients with semantic dementia, a neurodegenerative disease that primarily affects the anterior and inferior portions of both temporal lobes (Bonner et al., 2009; Reilly & Peelle, 2008; Yi, Moore, & Grossman, 2007; for evidence that this pattern is not a typical feature of semantic dementia see Hoffman & Lambon Ralph, 2011). Patients who had undergone a selective unilateral anterior temporal resection (in either the right or left hemisphere) exhibited a reverse concreteness effect when their performance was compared to that of healthy controls and of a group of patients with a more general semantic impairment (Loiselle et al., 2012).

Taken together, this evidence suggests that the brain areas that most reliably exhibit greater activation with abstract concepts are portions of the left anterior temporal lobe (ATL) and the left inferior frontal gyrus (IFG). Subregions of these areas have been linked to the language system: the left superior ATL has been linked to high-level speech perception and sentence comprehension (Hickok & Poeppel, 2004, 2007; Humphries et al., 2006; Vandenberghe, Nobre, & Price, 2002) and the left IFG (which includes Broca’s area) has been linked to several types of language processing, including auditory-verbal short-term memory and the retrieval and selection of semantic knowledge (Badre & Wagner, 2007; Jefferies & Lambon Ralph, 2006; Thompson-Schill, 2003). Admittedly, this is all somewhat preliminary: not only are the neuroimaging results variable (Binder, 2007; Mkrtychian et al., 2019), but an argument can be made that a more fine-grained accounting of the neural circuits involved in processing concrete and abstract concepts is needed (Montefinese, 2019).

Words alone

The LENS theory predicts that the dynamic presence of words (as grounded re-presentations of sensorimotor experience) should modulate how we conceptualize objects and events in ways that are particularly helpful with abstract concepts. In keeping with this prediction, evidence suggests that labels may preferentially activate the diagnostic features of categories (Boutonnet & Lupyan, 2015). Verbal cues (such as the spoken word dog) appear to activate more general representations than non-verbal cues (such as the sound of a dog barking; Edmiston & Lupyan, 2015). If the active presence of embodied representations of words helps us process general features, then we would expect people with aphasia to struggle on certain categorization tasks in comparison to age- and education-matched neurologically intact controls. Support for this expectation comes from the finding of a selective impairment on categorization tasks involving low-dimensional categories, in which the objects share few features (such as “things that are green”; Lupyan & Mirman, 2013).

How might this influence be realized neurologically? One possibility is that words might serve as a means of stabilizing and organizing distributed conceptual representations. Pulvermüller (2013, 2018), for instance, proposes that the presence of linguistic representations enables the formation of Action Perception Circuits (APCs). Learning a language, on this account, leads to the formation of these distributed circuits by means of both Hebbian and anti-Hebbian learning mechanisms.

Word-to-word associations

Knowledge of word-to-word associations is important for the acquisition of syntactic competence as well as the capacity to produce and comprehend speech. The LENS theory hypothesizes that associative and structural links between grounded representations of words play an important role in our concepts. Some initial support for this is provided by the success of models of distributed semantics that treat concepts in terms of knowledge of statistical patterns derived from spoken and written language (Blei, Ng, & Jordan, 2003; Landauer, Foltz, & Laham, 1998; Lund & Burgess, 1996). This admittedly indirect support is strengthened somewhat by the apparent superiority of hybrid models, which combine non-linguistic experiential knowledge and language-based distributional knowledge, over non-hybrid models that limit themselves to one type or the other (Andrews, Frank, & Vigliocco, 2014; Louwerse & Jeuniaux, 2010; Riordan & Jones, 2010).

A particularly striking real-world case is the acquisition of color concepts by congenitally blind people (Lupyan et al., 2020). Such individuals have been shown to have a remarkable understanding of color space and the color of objects (Dimitriva-Radojichikj, 2015; Lenci et al., 2013; Shepard & Cooper, 1992). Researchers have shown that it is possible to recover a significant amount of information about color from the distributional structure of color language (Kim, Elli, & Bedny, 2019; Lewis, Zettersten, & Lupyan, 2019). A recent study finds that a region of the left dorsal anterior temporal lobe supports the knowledge of object colors in congenitally or early blind participants and sighted controls (Wang et al., 2020).
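The distributional idea discussed above can be illustrated with a minimal sketch. It is not the method of any of the cited studies: the six-sentence toy corpus, the window size, and the function names (cooccurrence_vectors, cosine) are assumptions made purely for illustration. Each word is represented by counts of the words that occur near it, and two words count as similar when they occur in similar linguistic contexts.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Map each word to counts of the words found within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, target in enumerate(tokens):
            start = max(0, i - window)
            stop = min(len(tokens), i + window + 1)
            for j in range(start, stop):
                if j != i:
                    vectors[target][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical toy corpus, invented for this illustration only.
toy_corpus = [
    "the red apple fell",
    "the crimson apple fell",
    "the red rose bloomed",
    "the crimson rose bloomed",
    "the green leaf fell",
    "the green grass grew",
]

vectors = cooccurrence_vectors(toy_corpus)
red_crimson = cosine(vectors["red"], vectors["crimson"])
red_green = cosine(vectors["red"], vectors["green"])
print(red_crimson > red_green)
```

In this toy corpus, "red" and "crimson" occur with the same neighbors and so come out as more similar to each other than "red" is to "green", even though nothing in the input encodes visual experience. Published models work with far larger corpora and more sophisticated weighting and dimensionality reduction.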

Although most research to date has focused on distributional properties that ignore the structural relationships between words (even word order), the LENS theory predicts that knowledge of syntactic relationships should also play a role in the acquisition and use of abstract concepts (see also Kemmerer, 2019). Developmental evidence connecting the emergence of theory of mind abilities to the mastery of certain syntactic constructions (Astington & Jenkins, 1999; de Villiers, 2007) provides some support for this additional hypothesis (Dove, 2019).

Words and conversations

The LENS theory predicts that pragmatic and discourse-related knowledge indigenous to the language system should contribute to our understanding of concepts in general and abstract concepts in particular. In contrast to standard theories of concepts that posit fixed conceptual cores (Machery, 2015), LENS adopts a contextualist perspective. Given that abstract concepts go beyond our direct experience and are thus especially flexible, language may provide an important source of invariance (Yee, 2019). Conversations may provide the means by which we are able to dynamically coordinate the content of abstract concepts.

The LENS theory proposes that knowledge pertaining to word use shapes the context- and task-specific realization of our concepts. It proposes that conversations provide a means of metacognitively examining and refining our conceptual knowledge. Rehearsing and imagining conversations with others in addition to self-directed inner speech may help us fine-tune and adjust our concepts (Clark, 2006; Kompa, 2019). In keeping with this, evidence suggests that folk-psychological narratives structure and influence the development of theory of mind (Berio, 2020).

We discuss both of these influences of language, the facilitation of context- and task-sensitivity and the enhancement of metacognition, below in "Language and the flexibility of grounded cognition" and "Inner speech and metacognition", respectively. These influences warrant special emphasis, because they reflect the ways in which the WAT and LENS theories offer richer visions of the role of language in cognition than other theories.

Limitations of LENS and future directions

The LENS theory provides an overarching account of the role of the grounded language system in cognition. A weakness of such an account is a lack of granularity. For example, in contrast to the WAT theory, the LENS theory does not make specific predictions concerning the role of the mouth motor system during conceptual processing. It predicts that distributed sensorimotor representations should be at play but does not make detailed predictions about specific modalities. This leaves room for future investigation into the relative contributions of particular action and perception systems.

An important function of the LENS theory is to unify and extend the findings from more focused research. Ultimately, its success will be measured in terms of the robustness of its generalizations concerning the importance of grounded linguistic representations as anchors for distributed semantic circuits, elements of associative links, and components of simulated conversations. For a specific example of how this integration might work, the LENS theory brings together independent and pre-existing bodies of research indicating the importance of labels, word-to-word associations, and conversations to the development of theory of mind (Dove, 2019). This example highlights an important feature of the LENS theory: its commitment to a dynamic view of the impact of language on cognitive development. Future work should seek to examine the degree to which these factors are important to the acquisition of abstract concepts more generally.

The LENS theory fails to account for some of the same data highlighted above in our discussion of the WAT theory. Ultimately, it would be falsified by either of two theoretical possibilities. First, abstract concepts could be embodied in such a way that the grounded language system does not play a central role. The LENS theory is clearly incompatible with accounts that explain abstract concepts primarily in terms of nonlinguistic sensorimotor simulations. Fortunately, this possibility fits poorly with studies that implicate the language system in at least some abstract concepts (e.g., Harpaintner, Trumpp, & Kiefer, 2018; Desai, Reilly, & van Dam, 2018). Second, the language system could be important to abstract concepts but only as a source of amodal cognition (Kompa & Mueller, 2020). In contrast to this possibility, the LENS theory predicts that future research will highlight the importance of grounded linguistic simulations to abstract concepts.

Language and the flexibility of grounded cognition

Abstract concepts are more variable than concrete concepts with respect to both how they are characterized by individuals and how they apply to situations. People converge with each other more in defining and explaining the meaning of concrete words than of abstract words, since the latter refer to varieties of heterogeneous and idiosyncratic experiences (e.g., “freedom”); at the same time, even if all concepts are continuously updated as a function of the current context, abstract concepts vary more in the way that specific individuals represent them. In other words, our concept of “bottle” changes less over time than our concept of “justice”. The contextual flexibility of abstract concepts is not a new problem. Indeed, it underlies Aristotle’s well-known charge that Plato is too quick to assume that a universal goodness is shared by all good things (Shields, 2016). Aristotle famously points out that if one looks at different examples of goodness, one finds evidence that the term is multivocal (Aristotle, 1995).

Researchers have developed different measures of contextual flexibility. For example, semantic diversity is a measure of the degree to which a word is used in different contexts (Hoffman, Lambon Ralph & Rogers, 2013). Abstract words tend to rate higher on this measure than concrete words. For example, the word spinach tends to occur in contexts relating to cooking and eating and thus receives a relatively low semantic diversity rating, while the word life can occur in a number of different contexts and thus receives a higher rating (Hoffman, 2016). Words that have high semantic diversity irrespective of imageability take longer to process in semantic relatedness tasks (Hoffman & Woollams, 2015). Abstract concepts also exhibit less situational systematicity than concrete concepts—that is, they are less constrained with respect to the situations that they involve or invoke (Davis, Altmann, & Yee, 2020). Both of these measures fit well with theories of concepts that view them as schemas capturing information about how objects and events interact within real-world situations (Barsalou, Dutriaux & Scheepers, 2018; Gilboa & Marlatte, 2017).
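To make the notion of semantic diversity more concrete, the following minimal sketch computes a simplified stand-in for it. It is not the published procedure: it assumes raw bag-of-words context vectors, a six-item toy corpus, and a hypothetical function name (semantic_diversity). The score is the negative natural logarithm of the mean pairwise cosine similarity between the contexts in which a word occurs, so that words appearing in dissimilar contexts receive higher values.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_diversity(word, contexts, floor=1e-6):
    """Simplified diversity score: -ln(mean pairwise similarity of a word's contexts).

    Each context is reduced to a bag of words with the target word removed.
    `floor` is an assumption of this sketch that keeps the logarithm defined
    when the contexts share no words at all.
    """
    vectors = []
    for context in contexts:
        tokens = context.lower().split()
        if word in tokens:
            vectors.append(Counter(t for t in tokens if t != word))
    if len(vectors) < 2:
        raise ValueError("need at least two contexts containing the word")
    similarities = [cosine(a, b) for a, b in combinations(vectors, 2)]
    mean_similarity = sum(similarities) / len(similarities)
    return -math.log(max(mean_similarity, floor))


# Hypothetical toy contexts, invented for this illustration only.
toy_contexts = [
    "cook the spinach with garlic and eat it warm",
    "eat the spinach with garlic and rice",
    "cook rice and spinach with garlic",
    "life in the city is fast",
    "the biology of cell life",
    "a prison sentence for life",
]

spinach_score = semantic_diversity("spinach", toy_contexts)
life_score = semantic_diversity("life", toy_contexts)
print(life_score > spinach_score)
```

On these toy contexts the word life receives a higher score than the word spinach, mirroring the contrast described above. Real estimates are derived from large corpora and reduced-dimension context representations rather than raw word counts.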

Even though abstract concepts are widely recognized as being highly variable across contexts, most of the methods used to investigate how they are represented fail to take this into account (Barsalou, 1993). The majority of studies thus far have used comprehension tasks with single words. Researchers often use ratings on individual dimensions, such as imageability, interoception, perceptual strength, and Body Object Interaction, to characterize concepts and obtain normative data (e.g., Connell et al., 2018; Lynott et al., 2019; Tillotson et al., 2008; Della Rosa et al., 2010; Villani et al., 2019, under review). Among the frequently used tasks are written or auditory lexical decision tasks (e.g., Lund, Sidhu, & Pexman, 2019; Ponari et al., 2018), categorization tasks (e.g., Barca et al., 2020), recognition tasks (e.g., Mazzuca et al., 2018; Paivio, 1990), and property verification tasks (e.g., Pecher et al., 2003; Borghi et al., 2011). Even though some studies address how different tasks affect conceptual processing, the very fact that concepts are studied in isolation prevents researchers from capturing their relative flexibility and contextual variability. Studies that make use of simple sentences instead of single words, e.g., by asking participants to evaluate the sensibility of different kinds of sentences (Glenberg et al., 2008; Pecher & Boot, 2011), may reveal to some extent the variability of abstract concepts across minimal linguistic contexts. Again, however, we believe that these methods are not sufficiently tailored to capture how flexible abstract concepts are.

A frequently used method to investigate concepts is the feature listing task. Compared to the tasks mentioned above, this task is less constrained and lets participants more freely produce properties associated with the target concepts; in this case, participants are required to produce word associations, to generate features (e.g., Barsalou & Wiemer-Hastings, 2005; Harpaintner et al., 2018; Roversi et al., 2013) or to provide definitions (e.g., Barca et al., 2017; Ponari et al., 2018). These tasks have the advantage of better reflecting the dynamics of the current situation than the other methods, which consider words in isolation or embedded within very simple sentences. However, feature listing tasks also have some limitations: for example, people tend to list fewer features for abstract than for concrete concepts, and the short sentences they produce might not be able to capture the complexity of the underlying conceptual representation (Zdrazilova et al., 2018).

As recently argued by Barsalou (2020), we need to investigate concepts in the context of situated action. We are convinced that researchers need to go even further and profit from new methods developed in psychology and neuroscience that might be employed to investigate the acquisition and use of concepts with interactive paradigms. The recent understanding that multiple dimensions, including language and sociocultural practices, play a crucial role in concepts makes a shift of paradigms essential.

A recent example of a promising task has been provided by Zdrazilova, Sidhu and Pexman (2018), who analyzed words and gestures used during the taboo task, in which participants had to communicate the meaning of an abstract word without using the word themselves. Gestures can provide important clues about the underlying conceptual system (Hostetter & Alibali, 2008). The authors found that when using abstract instead of concrete words, participants not only referred more frequently to people and used more introspective features, but also used more metaphorical and beat gestures. Conversely, when participants used concrete words, they referred more frequently to objects and entities and used more iconic gestures.

So far, to our knowledge, few researchers have investigated abstractness by means of interactive paradigms. However, some recent developments in the literature offer fruitful suggestions for developing novel methods. For example, there is a broad and well-established line of research investigating joint action from an embodied perspective, in which a variety of interactive tasks are adopted (e.g., Knoblich, Butterfill & Sebanz, 2011; Galantucci & Sebanz, 2009; Pezzulo et al., 2017). Furthermore, interesting approaches have been developed that consider dialogue as a form of joint action (e.g., Pickering & Garrod, 2013). Other recent studies investigate the use of abstraction (not of abstractness) in interactive tasks such as Lego constructions or problem-solving situations (e.g., Bjørndahl et al., 2014; Tylen et al., 2018). This literature, which mainly adopts behavioral and kinematic methods, can provide useful hints about how to change and update our methods of investigating concepts. At the same time, naturalistic methods are increasingly spreading in neuroscience, and proponents of second-person neuroscience emphasize the importance of studying neural and cognitive processes in the context of social interaction (Schilbach, 2015; Redcay & Schilbach, 2019). For example, studies adopting these approaches directly investigate emotions in interactive contexts (Nummenmaa et al., 2012) both in neurotypical populations and in other populations, e.g., individuals with an autism spectrum condition (Stevanovic et al., 2019). Because abstract concepts are more variable across contexts compared to concrete concepts, and because they are more strongly influenced by the social dimension, it is high time to take inspiration from other domains and start employing novel, interactive methods to investigate their use (Falandays & Spivey, 2019).

Some of the evidence from cognitive neuroscience can be reevaluated in light of the full-bodied contextualism embraced by both WAT and LENS. As mentioned above, one of the more reliable findings from brain imaging studies is that the LIFG is more active during the processing of abstract concepts than it is during the processing of concrete concepts (e.g., Binder et al., 2005; Fiebach & Friederici, 2004; Noppeney & Price, 2004; Papagno et al., 2009). There have been two main theoretical accounts of the contribution of the LIFG (Della Rosa, Catricala, Canini, Vigliocco, & Cappa, 2018). On the first, the LIFG engages a network of circuits associated with language processing (Barca et al., 2011; Goldberg et al., 2007; Wang et al., 2010). On the second, the LIFG handles the semantic control functions that enable the selection of appropriate aspects of meaning in a specific context (Fiebach & Friederici, 2004; Hoffman, Binney, & Lambon Ralph, 2015; Noppeney & Price, 2004). Manipulating both imageability and context availability, a recent study found evidence that the LIFG was involved in two separate networks: one associated with low imageability located primarily in the left hemisphere and the other associated with low context availability located primarily in the right hemisphere (Della Rosa et al., 2018). If this evidence holds up, we have reason to think that the language system contributes both to the representation of abstract concepts and to their contextual flexibility.

Inner speech and metacognition

In this section, we argue that inner speech may play a number of metacognitive roles in abstract concept processing. Classical research on inner speech has been linked to two influential traditions, one started by Vygotsky (1986), according to which inner speech would be a condensed inner conversation endowed with a regulatory function, and one linked to working memory research (Baddeley, 2010). The legacy of Vygotsky is certainly more relevant for us, but we think that both views can contribute to explaining how inner speech might be employed in the context of abstract concepts.

Within the framework of the WAT theory, inner speech is important as a way to explain the activation of the mouth motor system. Such activation has been consistently found in a variety of studies. Mouth activation was found in behavioral tasks with adults and children when comparing abstract with concrete concepts (e.g., Barca et al., 2017, 2019, 2020; Borghi et al., 2011; Borghi & Zarcone, 2016; Granito et al., 2015; Mazzuca et al., 2018; see Sect. 3), but also in ratings and in fMRI studies of abstract mental state concepts when compared with other kinds of concrete and abstract concepts (Dreyer & Pulvermüller, 2018; Ghio, Vaghi & Tettamanti, 2013). Mouth activation specifically does not play an important role in the LENS theory, but it does fit with this theory’s commitment to the idea that simulations of language experience serve as a means of inner grounding.

We view inner speech as a form of real speech, engaging the motor system in a way that is similar to overt conversation and with sensorial dimensions that can potentially involve not only audition but also tactile and visual sensory modalities (Loevenbruck et al., 2018; for a different view, according to which not only phono-articulatory but also auditory imagery is necessary, see Langland-Hassan, 2018). The auditory dimensions might be differentially involved depending on the stage of development. For example, neurodevelopmental models suggest that auditory feedback is crucial during the early stages of language acquisition but is less prominent later. Crucial for us is the fact that the phono-articulatory and sensory component is linked to the semantic one (Bermúdez, 2018; Carruthers, 2018): in our view, inner speech might be condensed and synthetic, but it certainly carries semantic content.

For one, inner speech can be employed as a way to re-enact the modality of acquisition of words: that is, as emphasized by Vygotsky (see also Morin, 2018), we internalize social exchanges, including conversations. Inner speech could thus provide a means (though not necessarily the only one) to simulate the social context of conceptual acquisition by internally rehearsing the experienced social exchanges. This is consistent with the activation of the left IFG during abstract word processing: a meta-analysis by Morin & Hamper (2012) of brain imaging studies involving self-referential thinking found the highest activation rate of inner speech (77%) in studies that involve retrieval of autobiographical information. In this case the use of inner speech may involve a form of working memory that engages the phono-articulatory loop.

Inner speech may also represent a form of second-order cognition (Clark, 1998) or metacognition, a form of thought aimed at reflecting on our own thought processes. It can, therefore, be seen as a monitoring mechanism endowed with a predictive role (Pickering & Garrod, 2013; Carruthers, 2018; Loevenbruck et al., 2018; Swiney, 2018). This mechanism has been considered especially active and helpful during early word acquisition, as revealed by work on 18-month-olds showing that they internally name objects displayed in visual images (e.g., Mani & Plunkett, 2010). We hypothesize that it might be more extensively used during abstract than during concrete concept acquisition, both because abstract words can rarely be learned through ostension (e.g., pointing to a referent) and because of their lower frequency. More importantly, this monitoring mechanism might be crucial during abstract word processing and word production. In our view, the meanings of abstract concepts generate more uncertainty than those of concrete ones, and a stronger and longer monitoring process is often needed. Basically, we are less certain of the meanings of abstract words, less confident in how well we understand them, and less sure of how proficient we are in using them. We have recently proposed (Borghi, Fini & Tummolini, 2020, under review) that this awareness of the limits of our knowledge (Shea, 2018) might have two possible outcomes: it might lead us to continue searching for the meaning or prepare us to ask others for information (social metacognition, Borghi et al., 2018b, 2019a). Both activities could occur through inner speech. Notice that we do not necessarily assume that these processes occur sequentially, i.e., monitoring might coexist with meaning search and with social metacognition, with oscillations back and forth and possibly even a feedback loop.

In this case inner speech would be more strictly related to the preparation of real speech, although this speech might remain implicit and never become overt. The motoric component would play a prominent role, but it could be accompanied by the sensorial auditory component. Importantly, as underlined elsewhere, we think that these mechanisms can coexist. In keeping with this, Alderson-Day et al. (2016) have distinguished between a monologic and a dialogic form of inner speech, the latter of which engages a broader and more bilateral network of neural areas going beyond the left frontotemporal linguistic regions.

Collecting experimental evidence is of paramount importance to disentangle these mechanisms and better identify their roles. In particular, future studies should clarify whether, and in which processing phases, inner speech is involved. Our hypothesis is that it facilitates word acquisition but also plays an equally important role during word processing and production. Furthermore, this evidence should help us to understand which components of inner speech are activated: we predict that if it is mainly involved as a form of working memory and is used to search for meaning, then phono-articulatory aspects would be important; in the case of social metacognition, instead, motor preparation would dominate. Notice that in both cases inner speech would have an important predictive and preparatory role, as interpretations of inner speech in terms of predictive coding clearly highlight (Swiney, 2018).

Another potentially important issue concerns the relationship between inner speech, mind wandering, and abstract concepts. Bastian et al. (2017) recently investigated the relationship between inner speech and mind wandering. They demonstrated that the likelihood of spontaneous mind wandering was reduced during articulatory suppression, which notably suppresses inner speech, while the presentation of verbal material did not influence the likelihood of mind wandering. They employed an ecological method, using a smartphone application, in which participants received questions about their mind-wandering episodes at random times throughout the day. Inner speech vividness was positively correlated with the participants' awareness of their mind wandering, while such awareness was not predicted by the visual or auditory vividness of their thoughts. Hence, inner speech and mind wandering appear to be related. In future studies we intend to explore the relationship between mind wandering and abstract concepts. Abstract concepts are less bound to a specific concrete referent and evoke different contexts; in a similar fashion, mind wandering tends to be unrelated to the current task (Christoff et al., 2016; Ciaramelli & Treves, 2019). Hence, we hypothesize that abstract concepts could elicit more mind wandering than concrete concepts.

In this section, we have identified several ways in which inner speech may underwrite metacognition and the flexible use of abstract concepts, and we have pointed to novel methods for investigating such dynamic and interactive phenomena. These functions and novel methods fit well with the general claim that language itself is a form of grounded cognition and the specific proposals at the heart of the WAT and LENS theories concerning the importance of the simulation of speech in abstract concept acquisition and use. They do not fit well with more minimal accounts of the contribution of language to our concepts that focus on language as a cognitive shortcut or a source of distributional information (e.g., Barsalou et al., 2008; Connell, 2019; Louwerse & Jeuniaux, 2010; Lynott & Connell, 2010; Tillas, 2015).

Conclusion

The idea that the language system might contribute to grounded cognition is shared by several recent theories. Few of them, though, explicitly focus on the acquisition, representation, and use of abstract concepts in a sustained manner. In contrast, both the WAT and LENS theories offer full-throated accounts of how language serves as a source of inner grounding and social action for abstract concepts.

According to the WAT theory, language is not simply a cognitive shortcut dependent on superficial processing. Words are not merely placeholders but are, instead, tools for action within our social niche. Specifically, they are physical tools that change and refine our perception of the external environment; they are inner tools that modify and enhance our thoughts; and they are social tools that allow us to interact with others (Borghi, 2020, under review). They are acquired and used in ways that dynamically interact with the social context. Since instances of abstract concepts are typically more heterogeneous and diverse than those of concrete concepts, WAT proposes that linguistic and social input is particularly crucial for their acquisition, and that this experience influences their representation and use. Furthermore, during language use we might be more uncertain about the meaning of abstract words than about that of concrete words; this might push us to actively seek the help of others (social metacognition, Borghi et al., 2018b), and this recognition of others as dispensers of knowledge (Fini & Borghi, 2019) might foster and promote social bonds (Borghi & Tummolini, in press). While WAT has mainly stressed the importance of words as social tools, it also emphasizes the importance of language as an inner tool that enhances our cognitive abilities through the use of inner speech.

According to the LENS theory, language supports a form of thinking that would not otherwise be available, because linguistic competence amounts to the ability to manipulate a physically instantiated combinatorial symbol system. Importantly, the symbols themselves require distributed multimodal simulations of physical events and are fundamentally tied to our embodied experience. Having access to these symbols enhances cognition and enables us to capture and communicate about concepts that go beyond our direct experience. Abstract concepts often depend on the presence of words as labels, the statistical and structural relationships between words, and the pragmatic capacity to generate and respond to conversations.

Examining these theories together sharpens our understanding of each, highlighting their possible strengths and weaknesses, but it also accomplishes something more: it demonstrates the promise of a research program focused on the role of language in abstract concepts within the context of a flexible, multimodal, and multilevel conception of embodied cognition. In addition to reviewing the tenets of each theory and the evidence supporting them, we have identified some revealing preliminary research and discussed the need to adopt new experimental paradigms.