4.1 Introduction

In the previous chapters, we proposed that the difference in representation between concrete and abstract words might be due to their different acquisition modality. The aim of this chapter is to explore the rich literature on word learning and word acquisition and to illustrate evidence on how these processes occur that might be relevant for the WAT proposal. First, we will outline approaches that emphasize the importance of social aspects in word learning, and then we will turn to embodied approaches highlighting the role of perception and action in word acquisition. We will also show that some hybrid approaches have been proposed, in which both social-linguistic and perception–action elements are taken into account. According to them, multiple and different cues might lead to word learning, and perception and action cues might carry more weight in the early phases of word learning, while social and linguistic cues might be more prominent once children master some social abilities and possess a sizable vocabulary. We will then describe the literature on modality of acquisition, according to which different kinds of words, concrete and abstract, might be learned through different strategies. Finally, we will report some acquisition studies with adults conducted in our laboratory. In this review, we do not claim to be exhaustive in providing an overall framework of studies on children’s word acquisition. We simply intend to outline approaches and illustrate evidence that are in line with, and provide support for, the WAT proposal.

4.2 Social Aspects in Word Learning

Research on conceptual development has been widely influenced by the image of the child as a lonely learner, advancing and testing hypotheses on his/her own, as shown by Gelman (2009) in her critical review. In contrast with this view of children as lonely hypothesis testers, every form of learning, even if apparently self-directed, relies on a social and cultural milieu, which is often taken for granted and therefore not investigated. We will briefly overview some research lines where the importance of social input for children’s word learning has been highlighted. First, we will illustrate studies that emphasize the social aspects involved in language learning, such as the approaches of Vygotsky, of cultural psychology, and of comparative studies, for example, by Tomasello and collaborators. Then, we will turn to studies showing that children, and adults as well, use different modalities to acquire words, one based more on perception, one more linguistic, and others combining the two. Finally, we will describe studies on testimony that demonstrate the kind of linguistic contribution adults make to children’s learning.

4.2.1 Cultural Psychology and Vygotsky

An obvious exception to the individualistic way of conceiving children’s learning is given by cultural psychology, an area of psychology that considers mind and culture as strictly interwoven and that focuses on the impact of culture on human thought, questioning universalist assumptions about the human mind (e.g., Bruner 1990; Nisbett 2003; Nisbett et al. 2001; Shweder 1991). In this area, a growing number of demonstrations show that experimental results obtained in Western industrialized societies cannot be generalized and considered as universal (e.g., Henrich et al. 2010; Medin and Atran 2004; see also Prinz 2012); rather, culture has a marked impact on cognition.

This view relies heavily on the thought of the well-known Russian psychologist Vygotsky (1978, 1981, 1986, 1987). The distinction proposed by Vygotsky between scientific and spontaneous concepts is relevant and fully in line with the WAT proposal, since it partly mirrors the distinction between concrete and abstract concepts we proposed. Let us briefly analyze Vygotsky’s thought on this topic (see Karpov and Bransford 2005). Compared to spontaneous, everyday concepts, scientific concepts are more general, have a systematic organization, and are related primarily to other concepts rather than to the object or event they refer to. “The interdependence between spontaneous and scientific concepts stems from the special relations existing between the scientific concept and the object. In the scientific concepts that the child acquires in school, the relation to an object is mediated from the start by some other concept. Thus, the very notion of scientific concept implies a certain position in relation to other concepts, i.e., a place within a system of concepts.” (Vygotsky 1986, p. 172).

Crucially for us, the acquisition of scientific concepts differs from that of everyday spontaneous concepts. Spontaneous concepts are learned through the guidance of adults but during children’s everyday activities, such as play and interaction with others; for this reason, they are not systematic and for a while they remain unconscious. The acquisition of both kinds of concepts is thus guided by a social input, but while spontaneous concepts are learned by participating in the activities in which they are typically used, learning of scientific concepts typically occurs in a specialized setting and requires systematic forms of instruction. “The development of the scientific… concept, a phenomenon that occurs as part of the educational process, constitutes a unique form of systematic cooperation between the teacher and the child. The maturation of the child’s higher mental functions occurs in this cooperative process, that is, it occurs through the adult’s assistance and participation. In the domain of interest to us, this is expressed in the growth of the relativeness of causal thinking and in the development of a certain degree of voluntary control in scientific thinking. This element of voluntary control is a product of the instructional process itself.” (DSC, pp. 168, 169, original emphases).

The systematic forms of instruction required to learn scientific concepts imply the use of verbal definitions. Learning of scientific concepts differs from learning of everyday concepts also because of this explicit linguistic mediation.

The child becomes conscious of his spontaneous concepts relatively late; the ability to define them in words, to operate with them at will, appears long after he has acquired the concepts. He has the concept (i.e., knows the object to which the concept refers), but is not conscious of his own act of thought. The development of a scientific concept, on the other hand, usually begins with its verbal definition and its use in non-spontaneous operations—with working on the concept itself. It starts its life in the child’s mind at the level that his spontaneous concepts reach only later. (Vygotsky, p. 192).

4.2.2 Studies on Testimony

Studies on “testimony” are highly relevant for the WAT theory, as they qualify the role adults play in shaping children’s knowledge. Two different views on testimony are present in the literature. According to the first, children trust only testimony that extends the empirical data they can collect on their own, without contradicting their own observations. However, recent evidence has revealed that children rely on testimony not only in domains such as psychology, cosmology, and biology, where testimony can extend their observations, but in domains such as theology as well, for example, when they receive information about God and the afterlife. Indeed, in some cases, explanations are the only possible source of information for acquiring some concepts, such as scientific concepts, supernatural concepts (e.g., “God”), concepts for which there is no correspondence between the perceptual features and the scientific classifications (e.g., “bats,” “whales”), and social concepts (e.g., “race,” “ethnicity,” etc.). Let us leave aside concepts for which there is no correspondence between the scientific classification and the perceptual basis. Apart from those, which might have a concrete referent, all other concepts are subsets of what we consider the general category of abstract concepts.

Harris and Koenig (2006) review evidence showing how children, relying on others’ testimony, acquire key concepts that they cannot observe first-hand and verify empirically. There is evidence that they acquire through testimony knowledge about the relationship between the brain and mental processes, about the spherical shape of the earth (Vosniadou 1994), and about the life cycle and the relationship between death and the possession of vital internal organs (Jaakkola and Slaughter 2002). More crucially, children form abstract notions in domains where they do not have direct experience. For example, they form the abstract notion of God, progressively acknowledging that he has extraordinary cognitive, biological, and creative powers; importantly, their view is not autonomously developed but is heavily influenced by the beliefs of the community of which they are part, since the differences between fundamentalist and non-fundamentalist communities are marked. This suggests that children do not invent these powers, but rather rely on the knowledge of their community members. The same is true for the notion of afterlife. Even if children rely on the testimony of others, they are not passive receptors of the information they gather. In all cases, children actively rework the information they receive and integrate it with their previous knowledge. The same may be true for ideology, even if we are not aware of literature specifically investigating the influences of ideology on children’s thought.

One interesting example of this active search for information is the questions children ask adults, for example, the persistent “why” questions between 3 and 6 years. While these questions often concern anomalies children register during their interaction with the world (e.g., “Why doesn’t butter stay on top of hot toast?” “How is it that when we put our hand into the water, we don’t make a hole in it?”), in some cases they pertain to spiritual domains, i.e., domains of which children cannot have direct experience. Sometimes children pose a persistent series of questions, engaging in what have been called real “passages of intellectual search” (Tizard and Hughes 1984). Crucial to us is the fact that when they register an anomaly, they turn to adults to ask for clarifications and explanations that resolve it. Importantly, they seem to trust adults as reliable sources of information to understand hidden and mechanistic causes of events, in a variety of domains.

The credibility of adults is carefully evaluated. Sabbagh and Baldwin (2001) demonstrated that 3- and 4-year-olds are sensitive to explicit cues given by informants: Children prefer to learn new words from people who declare knowledge rather than ignorance, and from people who declare certainty rather than from people who are uncertain and use locutions such as “mmm, maybe.” These studies reveal that during word learning, children do not pay attention only to the relationship between a word and its referent, but also to the characteristics of the speaker who teaches a word: No word learning occurs when the speaker reveals uncertainty, even if there is no referential ambiguity.

Other studies reveal that children are also sensitive to less explicit cues and are able to take into account the accuracy of others. Trust in adults is influenced by familiarity with them but can be revoked: in a study with children aged 3–5 years, Corriveau and Harris (2009) demonstrated that children preferred to trust familiar over unfamiliar teachers who named novel objects and pantomimed their function. However, they were also able to monitor the accuracy of their informants. While younger children’s trust in their familiar informants was not affected by inaccurate information, 4- and particularly 5-year-old children took accuracy into account: If familiar teachers were not accurate, they tended to revoke their trust in them. Further studies reveal that children’s learning is modulated by trust in the people from whom they are learning, in a variety of domains and not only during language learning. Three- and four-year-olds keep in mind specific information on other people’s accuracy and spontaneously use it during learning: For example, when they have to choose between conflicting information in learning to use a tool, they rely on the cues given by the informant who has proved to be more accurate in the past (Birch et al. 2008).

Overall, the literature on testimony is important for the WAT view: It shows that children are prone to receiving information and clarifications from adults and that they carefully monitor the accuracy of the information they receive, and it reveals that the ability to learn language, and to learn overall, is strictly interwoven with children’s social abilities. At the same time, the literature on testimony shows that the information provided by others is particularly precious in domains where perceptual inputs are insufficient or lacking, such as the domains that pertain to abstract concepts and words.

4.2.3 Comparative Studies on Apes and Children

The role of social imitation and of socio-pragmatic aspects in human word learning has been stressed by comparative studies that highlighted the role social learning plays for our species. One of the main contributions of this literature can be summarized with the following words by Tomasello and Akhtar (2000): “Although learning object labels may appear to involve straightforward mapping of word to referent…, it also requires the social-cognitive ability to tune into speakers’ referential intention” (p. 130).

According to the socio-pragmatic theory of language acquisition (e.g., Tomasello 1992), language acquisition is an intrinsically social phenomenon. An important building block of the capability to acquire language is the culturally inherited ability to imitate others; our species is endowed with the capability to imitate, while other social species only emulate actions. Learning language is possible only when entering into a joint-attention-based activity with adult language speakers (Bruner 1983; Tomasello 1992).

Importantly, word learning is not limited to the association between an auditory input and the referent. Rather, the role of intersubjective aspects is crucial for language acquisition. In order to learn language, children engage in joint activities with adults, and they benefit from a variety of cues, not only linguistic ones. For example, a very important social cue to determine reference is gaze direction. Children rely on the gaze direction of adults when the adults pronounce new words in the presence of multiple objects; they rely on others’ gaze even if the adult is looking into a bucket and the target object is invisible to the children (Baldwin 1993). Already at 24 months, children rely on adults’ gaze direction rather than on objects’ perceptual salience (Tomasello and Akhtar 2000).

Overall, research has shown that linguistic cues are relevant but not the only information source at the basis of language learning. According to a more classic approach, children rely on constraints to learn new words (Markman 1989; Nelson 1988). For example, they assume that two different words apply to different objects and have difficulty applying different words to the same referent (e.g., “animal,” “dog,” “Fufi”) (mutual exclusivity constraint); in addition, in the absence of counter-indications, they assume that a novel word refers to a whole object (whole object constraint). The approach based on constraints is founded on the assumption that it might be difficult to determine reference only on the basis of pragmatic cues, since they might be ambiguous.

In contrast, a number of studies favouring the socio-pragmatic view have emphasized the richness of the pragmatic and social context (Tomasello and Akhtar 1995). Children take into account the whole pragmatic and discourse context and are sensitive to a variety of social cues: They register the activities of adults, what they are doing and why; they are able to detect adults’ intention to act and to predict the outcome of their actions; and they are able to determine what is new for them. More generally, they benefit from a variety of information that goes well beyond the linguistic input. Gergely et al. (1995) have demonstrated with beautifully designed experiments that one-year-old infants are able to interpret and predict others’ goal-directed actions and that they possess a theory of rational action. Tomasello et al. (2005) have shown that human specificity cannot be limited to the capability to understand others’ intentions. Other species are also able to understand the intentions of others, but they do not engage in social and cultural activities with others. Six-month-olds perceive others and follow their gaze, being able to predict the outcome of familiar actions; at 9 months, babies are able to understand that people have goals and persist in order to accomplish them; at 14 months, they are able to understand intentional actions. On this basis, they can start forms of imitative learning, in which they make an action plan to pursue a given goal. In spite of the importance of intention understanding, according to Tomasello et al. (2005) what is specifically and uniquely human is the so-called “shared intentionality,” the ability to create joint actions and to be involved in collaborative activities with others. Apes have a sophisticated capability of understanding the intentions of others, but they lack the motivation and ability to exchange emotions and experiences and to engage in common activities as humans do.
Instead, children from 9 to 12 months engage in a series of triadic behaviors, involving the child, an adult, and an object. A variety of joint attentional skills emerge, such as following the gaze direction of the other, pointing to an object, holding it up to show it to someone, and imitating the gestures of others with the object. Language learning is grounded in these abilities (Tomasello 2000). In this perspective, language is a prominent outer sign of this intrinsically social character of human cognition: “What is language if not a set of coordination devices for directing the attention of others?” (Tomasello et al. 2005, p. 16). Importantly for us, according to Tomasello (2000) the basic linguistic units are not single words but utterances, and the first utterances children produce are concrete ones, i.e., instantiations of item-based schemas, while abstraction emerges later from generalizing across a variety of different schemas.

Overall, the studies that compare cognition and language learning in apes and children are highly relevant for the WAT proposal. Indeed, they stress the importance of the social aspects of human learning, and they ground language learning in social abilities, in particular in joint attention and shared intentionality capabilities. Furthermore, they testify that children learn within a context, heavily relying on adults’ collaboration, that learning concerns utterances rather than words, and that the acquisition process goes from concrete instantiations to more abstract constructions.

4.3 Embodiment and Statistics in Word Learning

A different way out of the problem posed by constraints is provided by the empiricist approach to word learning. As previously outlined, a more traditional approach is based on the idea that children use constraints to guide their word learning process. In fact, given the richness of the perceptual input, a problem of referential ambiguity is present: A heard word can refer to single objects, to their parts, or to different aspects of a scene. On this traditional view, children need constraints to solve this referential ambiguity: for example, they assume that each object is referred to by a single word (Markman 1989). Recent work by Smith and collaborators (Yu and Smith 2007; Smith and Yu 2008) shows that 12- to 14-month-old children and adults solve the referential problem not by relying on constraints, but rather by computing statistics across trials. For example, the word “ball” is experienced across different scenes, in which not only a ball but also other items are present. Children have the impressive ability to keep track of the different word occurrences in order to solve the problem of the referent of the word “ball,” and later they become able to attach the label to the proper referent. A higher number of words and referents provides clearer evidence and thus leads to better learning: Yu and Smith (2007) demonstrated that adults learned more word-referent pairs with sets containing 18 words and referents than with sets with only nine words and referents. The role of cross-situational learning, according to which multiple meanings are encoded across different situations using statistical procedures, is not uncontroversial and is challenged by evidence indicating that learners instead use a one-trial fast mapping procedure, hypothesizing a single meaning and maintaining this hypothesis across different trials (see Medina et al. 2011). Despite this, experiments and models that show how statistical learning can account for word-to-referent mappings are growing in number.
According to this empiricist account, words are learned by mapping them onto their referents through an associative process. For example, Smith (2005) spoke of a dumb attentional mechanism that characterizes word learning, leading to the association of words with perceptually salient inputs. A particularly salient perceptual property is shape, probably because it is not only a visual but also an action-based property (Smith 2005). Shape is indeed a truly embodied object property. An impressive number of experiments, starting from the seminal paper by Landau et al. (1988), have shown that children in Western societies extend nouns on the basis of similarity in shape. When taught a new name (e.g., when they are told “This is a dax”), children extend it to objects similar in shape rather than in color, texture, or other perceptual properties. The shape bias becomes rather stable at around 2 years. Importantly, however, infants are not passive recipients of information, and perceptual inputs are not passively experienced. Infants actively move in the environment, search for objects, and focus their attention on some of them. Intriguing new data obtained by tracking infants’ gaze with head-mounted cameras indicate that infants learn new names by focusing attention on an object within a scene. In this way, they maximize the role played by co-occurrence statistics. In a clever study, 15-month-olds were presented with two objects and two names on each trial. The looking pattern of infants who were able to learn the new names suggests that statistical learning is important for word learning, but that further mechanisms are necessary to maximize the information it gives. One important mechanism is embodied selective attention: Statistical learning succeeds if infants are able to focus on the named object, “cleaning” the perceptual input when it is too complex.
Gaze direction, head movements, and hand movements (for example, holding the object) contribute to reducing the ambiguity of the input. Yu and Smith (2012) used head-mounted cameras for infants and parents and found that infants actively move to select an object, bringing the eyes and the head closer to it and holding it in the hands. In this way, they focus on it, reducing the role of potential distractors. Parents typically, but not always, provided the names in optimal sensory moments, when the object was under the infant’s attentional focus; when they chose the optimal sensory moment, word learning occurred. Data of this kind provide a clear indication of how infants’ movements and actions, together with social stimuli, contribute to directing attention to the object to be named and create the optimal conditions for statistical learning to occur. Importantly, recent evidence (e.g., Wojcik and Saffran 2013) reveals that during word learning, children do not only learn mappings between a word and a referent, but also encode information on the relations between objects, for example, the similarities among word referents. Two-year-olds were taught four novel words referring to four novel objects, grouped in two pairs of visually similar objects. Then, they listened to the repetition of word pairs: Results showed that they listened longer to word pairs referring to similar than to dissimilar objects.
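
The idea of computing statistics across trials, discussed in this section, can be illustrated with a toy sketch. The Python example below is not the model used by Yu and Smith (2007); it is a minimal co-occurrence counter written under simplifying assumptions: each trial pairs the words heard with the objects in view, without indicating which word names which object, and the learner simply selects, for each word, the object it has co-occurred with most often. The function name `cross_situational_learn` and the toy trials are hypothetical and serve only as illustration.

```python
from collections import defaultdict


def cross_situational_learn(trials):
    """Map each word to the object it co-occurred with most often.

    `trials` is a list of (words, objects) pairs. Within a trial there is
    no information about which word refers to which object.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for words, objects in trials:
        for word in words:
            for obj in objects:
                # Every word-object pairing in the scene is a candidate.
                counts[word][obj] += 1
    # Choose, for each word, the most frequent co-occurring object.
    return {word: max(objs, key=objs.get) for word, objs in counts.items()}


# Hypothetical toy data: lowercase strings are words, uppercase are objects.
trials = [
    (["ball", "dog"], ["BALL", "DOG"]),
    (["ball", "cup"], ["CUP", "BALL"]),
    (["dog", "cup"], ["DOG", "CUP"]),
]
lexicon = cross_situational_learn(trials)
print(lexicon)
```

In this toy setting, every single trial is ambiguous, yet the correct mapping becomes recoverable once counts are pooled across trials, which is the core intuition behind cross-situational learning.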

Further literature is in keeping with the idea that the kind of input given has a strong impact on language acquisition and later on language mastery, as revealed, for example, in reading and comprehension abilities. Here, we do not intend to enter into the nature-nurture debate, discussing whether language acquisition is innately pre-specified or whether learning plays a major role, since this is outside the aims of this book. Notice, however, that stressing the role played by the kind of input is in line with the general idea that language is learned rather than innately pre-specified, as studies focusing on statistical language learning in infants and children are beginning to demonstrate (e.g., Saffran et al. 1996; Romberg and Saffran 2010; Gomez and Gerken 1999, 2000). Research on statistical learning has shown that infants develop remarkable abilities in parsing language into word-like constituents based on combinations of syllables, in encoding word order information, and in abstracting over linguistic categories such as determiners, adjectives, verbs, and nouns. Importantly, it has been shown that experience with statistical cues that mark categorical distinctions provides the basis for learning word meanings (e.g., Lany and Saffran 2010). In addition, recent results have pointed out that the ability to attend to and keep track of statistical regularities in matching words and referents is at the basis of the word learning capability (Smith and Yu 2008; Yu and Smith 2007). Overall, this line of research questions the centrality of semantics for word acquisition, revealing that the development of semantic and syntactic competence might be more intertwined than previously thought.

Overall, studies that show that word learning occurs through associations between words and referents, that investigate the mechanisms of statistical learning, and that reveal how embodied selective attention helps infants focus on single objects and learn their names are highly relevant to the WAT proposal. Indeed, they stress the fact that word learning is an embodied and grounded process.

4.4 Hybrid Approaches of Word Learning

As we have seen, both social and perception–action aspects contribute to word learning. Some recent views propose that both perceptual and social aspects count, but that they might have a different weight at different ages. Evidence supports this view, suggesting that very young children are more sensitive to perceptual aspects and become progressively more responsive to social cues when these are contrasted with perceptual salience. Pruden et al. (2006) investigated how 10-month-old babies learn novel words, trying to disentangle the perceptual and the social dimensions. Babies were shown interesting objects, i.e., brightly colored objects either producing sounds or with moving parts, and boring objects, i.e., objects that were gray and uniform in color and that neither produced sounds nor had moving parts. An experimenter taught them a new label for the objects (e.g., “MODI”). In the coincidental condition, the name referred to the perceptually interesting object, in the conflict condition to the boring one. In the new label test trial, they were taught a new name for the object (e.g., “look at the GLORP, not at the MODI”), the hypothesis being that if they had already learned a name for the object, they should look away from it, due to the mutual exclusivity constraint. In the final recovery task, they were told to look at the “MODI” again. Visual fixation times were analyzed, and the pattern of results was straightforward: In the recovery test, the babies attached the label to the perceptually salient object, not to the socially interesting one. The objection that no word learning occurred, and that children simply looked at the most interesting object, was ruled out by the results of the new label test: As predicted, children looked away from the object to find a novel object to name.
At 12 months, the pattern is already different, since children seem to be sensitive to social cues: They learn a new label for a novel object when the social and the perceptual cues are aligned; otherwise, no evidence of word learning is present. Thus, children are responsive to social cues, but they do not yet recruit them for word learning when these conflict with perceptual cues. At 19 and 24 months, the social cues dominate: new words were learned for socially interesting objects, independently of whether they were perceptually interesting or boring. On the basis of this kind of data, the hybrid emergentist coalition model (ECM) has been proposed (Hollich et al. 2000; Hirsh-Pasek et al. 2000, 2004; Golinkoff and Hirsh-Pasek 2006). According to ECM, multiple cues are at the basis of word acquisition, and different processes characterize the early and the later stages of word learning: “As they break through the language barrier, children are guided (though not completely) by associationist laws. As they mature into veteran word learners, they are guided (though not completely) by socio-pragmatic strategies.” (Hirsh-Pasek et al. 2004). To test the model, a variety of experiments were run on three different samples: 12- to 13-month-olds who are starting to learn words; 19- to 20-month-olds who may or may not have yet experienced a vocabulary spurt; and 24- to 25-month-olds who typically master a large production vocabulary. The authors tested reference and found that children initially rely on perceptual similarity and then become progressively more sensitive to social cues; furthermore, they tested extendibility, i.e., they investigated whether children adopt a “narrow to large” principle, starting with a proper noun hypothesis, initially adequate for a given object, and analyzed the perceptual and social cues on the basis of which children extended the label to other category members.
The overall pattern of results suggests that children use multiple cues for word learning and that, depending on their maturity level, they are able to appreciate their variety: Initially, they rely on perceptual cues; later, they are able to appreciate the role played by social ones. This developmental trend would also help clarify why early word learning is rather slow (1–2 words per week) compared to the faster word learning occurring after 19 months of age. Support for this view also comes from further studies. For example, Weizman and Snow (2001) investigated mother–child conversations in five settings (e.g., play, mealtime, and book readings). They found that 99% of maternal lexical input consisted of the 3,000 most frequent words, and that early exposure to sophisticated linguistic input, i.e., to words beyond the 3,000 most common in English, had a marked influence on children’s later vocabulary performance, more than the quantity of lexical input overall. Woodward et al. (1994), studying word learning in 13- and 18-month-old children, speculated, “Perhaps prenaming explosion children have highly effective nonlinguistic associative mechanisms that allow them to map sound patterns onto the environmental entities that are presented with them, whereas postnaming explosion children learn words through more advanced linguistic mechanisms” (p. 564).

Hybrid models, according to which multiple cues, perceptual and social, coexist to promote word learning, are highly relevant to the WAT proposal. However, in the WAT theory, we do not focus on age differences, but on differences in kinds of words. Even more in line with the WAT view are proposals according to which perceptual and linguistic information contribute differently to the learning of different kinds of words, as the evidence on modality of acquisition that we will review in Chap. 6 will help to clarify. These proposals start from the consideration that words refer to the world in multiple ways and that not all are equally easy to learn. Gentner (2006) proposes that children first learn words whose referents can be easily individuated, such as proper nouns of animate entities and concrete nouns. Other words, such as verbs and abstract words, are learned later. In the same vein, Gleitman et al. (2005) distinguish between hard words, more abstract and therefore more difficult to acquire, and easy words. They start from the observation that early learning seems to be predicted by the “concreteness” of words, rather than by a specific word class: for example, children learn the verb “kiss” before the noun “idea” and even before the noun “kiss,” while they acquire the verbs “think” and “know” later than the verbs “hit” and “go.” At the same time, they claim that the very concept of “concreteness” is vague and needs to be sharpened. Gleitman et al. (2005) give an overview of a series of studies they performed, reported in Gillette et al. (1999) and Snedeker and Gleitman (2004), aimed at studying the learning of easy and hard words. In these studies, they used the human simulation paradigm (HSP): They had adults observe short video-clips of mother–child interactions recorded in natural situations. In some experiments, video-clips were silenced, and when the mother pronounced a “mystery word,” participants heard a beep. The same word was presented in six video-clips in sequence.
Later, participants had to guess what the “mystery word” was. Their performance differed consistently depending on the word and reflected the order of acquisition of words in children. Concrete nouns referring to whole objects/entities (“elephant”) were easier to identify than abstract nouns (“idea”), and concrete verbs (“throw”) were identified faster than abstract ones (“know”). In further experiments, participants had to perform the same task, but they received different sources of information as a cue: They were either given visual information, i.e., the video-clips, or linguistic content information, consisting of further names occurring in the mother’s utterances, or linguistic syntactic information, in which the frames in which the word occurred were presented, but where the content words and the mystery words were substituted by nonsense words (e.g., “Why don’t ver GORP telfa?”). The visual cues were most useful for nouns and concrete terms, while for abstract terms, syntactic information was critical: for example, visual cues were useful to identify “go” but not “know,” while the opposite was true for syntactic cues. On the basis of evidence of this kind, Gleitman et al. (2005) propose that the first words, which typically are basic level concrete nouns, are rather easily acquired through a word-to-world mapping mechanism. In contrast, there exist words which are more difficult to acquire, the “hard words,” which are typically more abstract. Once provided with a substantial amount of easy words, children, but also adults during word acquisition, can proceed to the process of structure-to-world mapping. In order to acquire abstract words, a consistent amount of sophisticated linguistic knowledge is necessary. 
Syntactic information contributes to this acquisition process, so that syntax and semantics should be considered as deeply intertwined: Word meaning can be learned probabilistically, and hard words can be learned with the help of other linguistic information. Basically, word learning occurs probabilistically, and it benefits from different sources, linguistic and extralinguistic. What changes with time is the capability to master linguistic information. Indeed, children hardly master syntax during the acquisition of the first 100 words, which are mostly concrete. Only later, when the number of learned words triples, do they demonstrate the capability to use syntax. In this perspective, the later acquisition of abstract words would not be due to the necessity of a conceptual change, but to informational reasons: Many other words should be acquired first, as they can provide the background for the acquisition of hard words.

4.5 Age of Acquisition and Modality of Acquisition

An important support to the WAT view comes from the literature on modality of acquisition (MOA) (Wauters et al. 2003). MOA is a new construct referring to the kind of input children receive while acquiring the meaning of a word or of a sign. The child can acquire a word meaning perceptually, linguistically, or benefiting from both modalities. The most frequent acquisition modality is perceptual: for example, the child experiences different red objects and entities and, on different occasions, he/she consistently hears the word “red”. The same is true for other words having a concrete referent, such as bottle, doll, and house. The situation differs for a word like “century,” and for the majority of the words we called “abstract”: The child cannot directly experience their referent, thus he/she has to rely on someone else’s explanations or definitions, be they spoken, written, fingerspelled, or signed, or he/she has to infer their meaning from spoken, written, fingerspelled, or signed information. In many cases, both perceptual and linguistic information contribute to word acquisition. MOA does not depend only on the type of concept but also on the type of context: for example, words like “tundra” or “snow” can be acquired perceptually by children living in some areas but have to be acquired linguistically by others. Similarly, children whose parents are carpenters will learn perceptually the meaning of carpentry tool words that other children will learn only through linguistic mediation. Unlike concreteness and imageability, thus, MOA is context-dependent. MOA can be determined by asking adults to rate words for acquisition modality on a 5-point scale. MOA is correlated with imageability (0.64), concreteness (0.47), and age of acquisition (0.59), but it does not overlap with them. It can help explain why abstract words are learned later than concrete ones. 
An analysis of elementary school textbooks revealed a progressive increase in linguistically provided information over perceptual information. Finally, MOA is influenced by whether complete perceptual and linguistic information can be accessed: On this basis, it explains differences in comprehension by children who cannot see or hear. In some experiments, hearing and deaf children between 7 and 15 years of age were required to read sentences in which a target word was present and to answer a question after reading each sentence by pushing a yes or no button; both reading times and comprehension measures were taken (Wauters et al. 2008). Reading times were faster and comprehension scores were higher for perceptually acquired than for linguistically acquired words for both groups; the difference between the two modalities decreased with age only for hearing children. In addition, MOA proved to be an important factor in explaining the poorer reading comprehension of deaf compared to hearing children. Texts with a high proportion of words rated as being learned linguistically are difficult for deaf children.

Overall, studies on MOA are highly relevant to the WAT proposal, for a number of reasons. First, they draw attention to the role played by the input given to children and to the different acquisition modality of words. Second, they point out that words rated as linguistically acquired are more difficult to read and to comprehend compared to perceptually acquired words. Third, they show an increase with age of linguistically acquired words in educational textbooks, probably due to the emphasis put first on decoding and only later on acquiring information. Fourth, they give some hints as to the process of word acquisition in the course of development: Interesting parallels can be drawn between the later acquisition of words through language and the later acquisition of abstract words, which do not possess a concrete perceivable referent.

4.6 Acquisition of Novel Words in Adults: An Embodied Approach

We will briefly illustrate two studies on novel categories and word acquisition in adults, performed in our lab, the results of which support the WAT view. Both studies were aimed at testing the following general hypotheses:

  (a) Linguistic (and social) information should play a major role for the representation of abstract categories and word meanings, while sensorimotor information is more crucial for the representation of concrete ones.

  (b) Perception and action information should be crucial for both, but the higher differences within the exemplars of abstract categories and the complex relations that characterize them, compared to the greater compactness of concrete categories, should render the linguistic input more relevant for the former.

  (c) The different acquisition modality of concrete and abstract categories should have an influence not only on the way we represent them, but also on how we respond to them with our body: Abstract categories and words should activate more mouth responses, concrete categories and words more hand responses. Notice, however, that this would not be true for concrete words the meaning of which explicitly refers to the hand or mouth, such as “finger,” “tongue,” etc. (Bergen et al. 2003; Bergen 2012).

We will review only the aspects of the studies that are relevant to the present discussion.

Borghi et al. (2011) presented adult participants with novel objects to explore. Concrete objects were bidimensional novel objects with bright colors; abstract concepts were composed of two or more elements of uniform color which moved and interacted in novel ways. Participants had some time to study them: They were allowed to “manipulate” concrete objects, moving them with the mouse on the screen, while they could simply observe the abstract concepts (see Fig. 4.1).

Fig. 4.1

Study by Borghi et al. (2011). Some examples of the stimuli we used. Exemplars of concrete categories consisted of bidimensional novel objects with bright colors; participants could manipulate them with the mouse and were required to verify whether they fitted into a donut-shaped figure. Exemplars of abstract categories were composed of two or more elements of a bluish uniform color which moved and interacted in novel ways; participants could simply observe their interactions.

Later, we verified whether participants were able to form a category, asking them whether two objects belonged to the same category or not. Results showed that it was more difficult to form abstract than concrete categories, in line with what happens in real life. Later participants were taught the novel category name—they read on the screen “This is a fusapo/a calona, etc.”. In half of the cases, they were also given a written explanation of the category meaning, as for example, “clash between two elements, which later separate.” In a further test, they had to answer by pressing a key on a keyboard whether the object they saw had a given name. Again, abstract words showed a disadvantage compared to concrete words (see Fig. 4.2).

Fig. 4.2

Study by Borghi et al. (2011). An example of two of the tasks we used. The first task, a categorical recognition one, was aimed at verifying the ease of forming categories independently of knowing their name. Participants were required to press two different keys on the keyboard to decide whether two objects belonged to the same category or not. In the second task, aimed at verifying the effect of labels on categorization, participants were instructed to press a different key on the keyboard depending on whether the label corresponded to the object they saw or not.

When given the names of concrete and abstract categories and asked to produce their properties, participants produced the properties typically found in property generation tasks for concrete and abstract concepts, respectively, i.e., more perceptual properties for the former and more abstract and general statements for the latter. This test gave us the guarantee that the novel concrete and abstract categories we used corresponded in content and structure to real-life ones. The most crucial test, however, consisted of a property verification task. Participants had to respond by pressing a key on the keyboard with the hand or by saying “yes” into the microphone if the property was typically true of the category. Response time results showed that abstract words were responded to faster with the microphone, particularly if not only the label but the explanation as well had been given, while responses to concrete words were faster with the keyboard (see Fig. 4.3).

Fig. 4.3

Study by Borghi et al. (2011). The interaction obtained in the property verification task, showing that responses to abstract words were faster with the microphone and that responses to concrete words were faster with the keyboard.

A control experiment showed that the advantage of the microphone over the keyboard with abstract words was not present if they were not grounded, i.e., if the information provided by perceptual input and that provided by the linguistic input (label plus explanation) were dissociated. Even if no data on acquisition were collected, the results of this study suggest that concrete and abstract words might be acquired relying on different cues, perceptual and linguistic. Linguistic cues, i.e., both labels and notably explanations, were more relevant for abstract words. These results thus support the idea that different acquisition modalities are present and have a different weight in the representation of concrete and abstract words. One important aspect of the study is that it reveals that these different acquisition modalities have a direct bodily consequence, as they activate different parts of our body and of our motor system: Abstract words lead to the activation of the mouth, concrete words to the activation of the hand.

In a recent study, Granito et al. (in preparation) used a similar paradigm, but introduced some variations. In order to train subjects with more ecological materials, instead of using shapes presented on the computer screen, participants were given objects made of Lego bricks. Exemplars of concrete categories consisted of single objects composed of different parts, while exemplars of abstract categories were composed of the same objects arranged in complex and different relations (see Fig. 4.4).

Fig. 4.4

Study by Granito et al. (in preparation). Some examples of the stimuli we used, made of Lego bricks. Exemplars of concrete categories consisted of single objects composed of different parts, while exemplars of abstract categories were composed of the same objects arranged in complex and different relations.

Thus, while in the previous study concrete words were defined as referring to single, perceptually varied manipulable objects and abstract words as referring to complex objects moving in different ways, here the focus of the distinction between concrete and abstract concepts lies in their different degree of complexity and in the fact that the latter can be defined as referring to relations rather than to single objects. Both concrete and abstract words’ referents are perceivable/manipulable, but the exemplars of concrete categories were similar from a sensorimotor point of view, whereas those of abstract categories were not. Each relation category was defined exclusively by the spatial relation existing among the component objects, e.g., “the two objects have two contact points,” and the spatial relation remained constant across the category exemplars. As a result, members of concrete categories were similar from a sensorimotor point of view, while members of abstract categories greatly differed. Participants were first given time to explore the exemplars and were left free to manipulate them. Then, they had to sort them into different groups. As predicted, their sorting criteria corresponded more to the experimenters’ criteria for concrete than for abstract categories. The sorting criteria used for abstract categories allowed us to divide participants into two further groups: Some of them used a perceptual strategy, thus their sortings differed greatly from the categories defined by the experimenters, while participants of a second group used spatial criteria more similar to the ones that had been defined. Then, half of the participants were taught new labels to designate the exemplars and were provided with explanations of the category meaning. Unlike in the study by Borghi et al. (2011), in order to enhance the role played by the social input, participants did not read the written category labels and explanations of the category content; rather, labels and explanations were given directly by the experimenter. Because labels and explanations were given at different moments, participants could be divided into early and late language learners. In a further test, they were submitted to a categorical recognition task, i.e., they were required to decide whether two objects belonged to the same category, responding “yes” either with the keyboard or with the microphone (see Fig. 4.5).

Fig. 4.5

Study by Granito et al. (in preparation). Two of the tasks we used: On the left is represented the categorical recognition task, in which participants had to decide whether two objects belonged to the same category, either pressing a “yes” button on the keyboard or pronouncing “yes” into the microphone. On the right is represented the word-to-picture task: participants saw the image of an object/relation followed by a label and had to press a key on the keyboard or to say “yes” into the microphone if the label designated the relation (abstract) or the object (concrete).

The performance with concrete categories was better than that with abstract categories. As predicted, the performance of participants who had received linguistic training was better than the performance of participants who had not. However, this was true for relations, not for objects. This reveals that the acquisition of abstract concepts benefits more from linguistic input than that of concrete categories. Using language had an effect on the body: Participants who did not undergo linguistic training responded faster with the hand than with the mouth, while hand and mouth responses did not differ for participants who were linguistically trained. In addition, linguistically trained participants were faster than non-linguistically trained participants with mouth responses. Once all participants had been taught labels and explanations, they were submitted to a word-to-picture task: They were presented with the image of a relation followed by a label and had to press a key on the keyboard or to say “yes” into the microphone if the label designated the relation or the object. Among early language learners, participants who had initially employed a perceptual rather than a spatial strategy and responded with the mouth scored better with abstract than with concrete categories, while no difference was present for participants who had used a spatial strategy from the start. This indicates that participants who benefit more from the linguistic help are those whose initial categories are quite distant from those defined by the experimenters, and it suggests that the role of linguistic input is modulated not only by the kind of category/word to acquire, but also by the strategy subjects use. 
This is confirmed by the analyses on participants adopting a perceptual strategy: with concrete categories they performed similarly across all conditions, while with abstract categories early language learners scored better with mouth than with hand responses; the opposite was true for late language learners. Finally, late language learners had a worse performance with hand than with mouth responses, while no difference between mouth and hand responses was present for early language learners.

Overall, the two studies provide support to the WAT, in various ways. The first study demonstrates that the different acquisition modality (manipulation vs. observation) has an impact on concrete and abstract categories representation. This impact has a bodily counterpart, as responses with the mouth are faster with abstract than with concrete categories. This effect is particularly marked when not only labels, but explanations of the category meaning are provided as well. This study demonstrates that the linguistic input has a more crucial role for abstract than for concrete categories, and it shows that not only labels, but explanations of the category meaning play a major role. At the same time, our results diverge from the results predicted by theories according to which abstract categories are defined only by linguistic information (e.g., Paivio 1986, 2013). For abstract categories, linguistic information is crucial, but embodied sensorimotor information is crucial as well. Indeed, when explanations are dissociated from perceptual and motor characteristics, the advantage of the mouth over the hand with abstract categories is not present.

The second study complements the first as it provides not only a linguistic input but a social input to word learning as well: labels and explanations are orally given by the experimenter, not by the computer. In addition, to train participants, we used real objects instead of images presented on the computer screen. In this study, we did not manipulate the acquisition modality, but we intended to explore how early or late learning of labels and explanations contributes to the representation of the word meaning, and whether this has an impact on responses performed with different body parts. Overall, results revealed that abstract categories benefited more than concrete ones from early language learning; accordingly, mouth responses were facilitated. In addition, the results of this study suggest that participants might use different categorization strategies. The advantage of linguistic information is particularly marked for those who initially tend to be more perceptually bound and less sensitive to the relations between elements of a category. Crucially, then, even if the acquisition modality was not manipulated by us, different acquisition modalities emerged: With abstract categories, participants relied more on linguistic input because they needed it; this was particularly true for participants who adopted a perceptual rather than a spatial strategy.

Furthermore, notice that the two studies are complementary also because the way in which concrete and abstract categories are operationalized differs. In the first, concrete categories are defined as categories endowed with a single concrete, manipulable, and perceptually rich referent, while abstract categories are defined as not manipulable, given by more than one referent having complex mutual interactions. In the second, both concrete and abstract categories have manipulable referents or parts, but concrete categories have a single referent composed of many parts, the Lego bricks (e.g., a cup is composed of the container, the handle, etc.), while abstract categories are composed of many objects having different relations (e.g., injustice can be spatially conceptualized as given by one larger element above a smaller one).

4.7 Conclusion: A Possible Acquisition Trajectory

In the developmental literature, an empiricist account such as that promoted by Linda Smith and collaborators (Smith 1999, 2000) has typically been contrasted with the socio-pragmatic account. As briefly illustrated, the socio-pragmatic view underlines the importance for word learning of comprehending others’ intentionality and of sharing perspectives with others. As we have seen, hybrid approaches propose that the perceptual associative and the social aspects might have a different weight at different ages.

We are not developmental psychologists, thus we do not aim to enter the debate, also because the main points of the debate are outside the aims of the present book. Since abstract words are acquired rather late, the issue of which mechanisms drive early word learning is out of the focus of the WAT view. While the discussion of constraints is of marginal interest for us, for other aspects, we believe that the empiricist, the socio-pragmatic, and the hybrid approaches provide insightful hints that can help provide a basis for the WAT proposal.

In fact, we argue that a similar associative mechanism might work for the acquisition of both concrete and abstract words. Perceptual salience, embodied attention, and bodily actions would contribute to the success of learning concrete words through associations, possibly aided by parents (Yu and Smith 2012): Children move in their environment, and through selective attention, they get closer and focus on interesting objects, encoding also the similarities and the relationships between them (Wojcik and Saffran 2013). However, for acquiring abstract words, an associative mechanism based on word-to-referent mapping might be more time consuming and difficult to apply: given the sparse variety of referents abstract words have, children might need a lot of evidence to form a category such as, for example, “freedom”. This does not mean that an embodied mechanism of referent searching and of selective attention is not present. However, due to the higher difficulty in selecting the perceptual referent, both the social (Tomasello and Akhtar 2000) and the linguistic inputs might be particularly relevant for abstract word learning (Gleitman et al. 2005; Gentner 2006; Wauters et al. 2003). As shown in the acquisition studies by Borghi et al. (2011) and Granito et al. (in preparation), the linguistic input becomes particularly precious since the referents of abstract words are more sparse than those of concrete ones. In addition, words might be associated not only to perceptually salient referents but also to other words, to facilitate learning. This is possible also because, as shown by the literature on age of acquisition (Barca et al. 2002), abstract words are acquired rather late compared to concrete ones, when the vocabulary burst has already occurred (Gleitman et al. 2005). Thus, an associative learning mechanism is present both in associating words to referents and in associating words to other words. 
But this is not the whole story: in keeping with an embodied perspective, learning these associations has an impact not only on representation but on the body as well, since different effectors, the mouth and the hand, are activated (Borghi et al. 2011; Granito et al. in preparation).