1 Introduction

In this article, we extend the theory of Construction Morphology to the analysis of sign language structure. Our goal is to demonstrate that there is much to be gained from the construction-theoretic analysis of sign languages: Construction Grammarians will find that sign languages provide additional support for the position that human languages vary in fundamental ways, even as they exhibit cross-linguistic functional similarities. Sign linguists will find that a Construction Grammar framework can resolve several longstanding puzzles concerning the morphological transparency of conventional signs. However, to date, there have been few points of contact between the literatures on Construction Grammar and sign language analysis.Footnote 1 For this reason, we begin with a brief introduction to Construction Grammar, for sign language linguists, and to the structure of signs, for Construction Grammarians.

The family of theories that we refer to here as “Construction Grammar” began in analyses of fixed expressions with idiomatic as well as productive properties. Accounting for phrasal idioms in English like the “let alone” construction (Fillmore et al. 1988) and the “what’s X doing Y” construction (Kay and Fillmore 1999), for example, led to increased recognition of the fact that many linguistic constructions exhibit fixed and variable structural elements at once. In the “what’s X doing Y” construction, the order and identity of the words what’s and doing are analyzed as fixed components, and the remaining elements are variable slots to be filled with words and phrases like a fly and in my soup, or that and there, in the course of actual language use. This tendency for constructions to contain both fixed and variable elements models a speaker’s ability to use language in conventional as well as productive ways.

Importantly, in Construction Grammar, phrasal patterns themselves are typically associated with semantic or pragmatic functions that cannot be attributed to the identity and arrangement of their internal constituents alone. In the case of the sentences What’s a fly doing in my soup? and What’s a poodle-haired rock musician doing writing a book about crinolines?, Footnote 2 the phrasal pattern “what’s X doing Y” accounts for the expression of surprise, as well as the inference that the request is for an explanation of how the state of affairs came to be, not a literal description of the activity being done. These additional aspects of meaning cannot be derived from the individual words in the sentence, but rather are associated with the phrasal template directly. Construction Grammarians thus consider constructions to be theoretical primitives that are crucial for explaining observed patterns of language use.
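The combination of fixed words, open slots, and a function attached to the pattern as a whole can be pictured with a small data structure. The following Python sketch is only an illustration and is not part of any Construction Grammar formalism: the class `Construction`, its fields, and the slot names `X` and `Y` are assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Construction:
    """Toy pairing of a partially fixed form with a function (hypothetical)."""

    template: str  # fixed words plus {X}-style variable slots
    function: str  # meaning associated with the pattern as a whole

    def instantiate(self, **fillers: str) -> str:
        # Fill the variable slots; the fixed words and their order stay put.
        return self.template.format(**fillers)


# Illustrative encoding of the "what's X doing Y" construction.
WXDY = Construction(
    template="What's {X} doing {Y}?",
    function="surprise; request for an explanation of the state of affairs",
)

fly = WXDY.instantiate(X="a fly", Y="in my soup")
that = WXDY.instantiate(X="that", Y="there")
```

Both instantiations share the fixed words and the function stored in `WXDY.function`, which is attached to the template and is not computed from the fillers.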

Since the 1980s, several varieties of Construction Grammar have been developed (see Hoffmann and Trousdale (eds.) 2013 for an overview). Though these approaches differ in important ways, they share some fundamental assumptions that set them apart from other approaches to linguistic analysis. Following Goldberg (2013: 15–16), we can articulate some of these assumptions as follows:

Assumptions of a Constructionist Theory

  1. Constructions: Phrases and lexical items alike are analyzed as constructions, which are learned pairings of form and function (including meaning).

  2. Network organization: Constructions are related in a single, structured network, and are not sequestered in either “the lexicon” or “the grammar”.

  3. A Usage-based theory: Languages are learned through generalization over events the speaker has experienced as the speaker experiences them, and abstract generalizations are emergent from knowledge about particular items.

Thinking about how a construction-theoretic approach might be extended to the domain of morphology in particular, Booij (2010, 2013) has demonstrated that words exhibit the same sort of schematic structure that phrasal idioms do. Like idioms, words are memorized wholes with analyzable internal structure. Accordingly, in the theory of Construction Morphology, individual word tokens are analyzed as concrete instantiations of abstract word-formation schemas, such that the English words acceptable, believable, and doable license the generalization of a constructional schema “Xable”, for example. This schema is a lexical pattern linking a network of learned, conventional English words, and also provides a recipe for the formation of new, previously un-encountered words. Like the phrasal pattern “what’s X doing Y”, the lexical pattern “Xable” is associated with a meaning, something along the lines of “can be VERB-ed”, and it contains both fixed and variable slots that account for its idiomatic and productive properties. Construction Morphology thus fleshes out the lexical side of the construction-theoretic argument that the difference between a phrasal construction and a lexical construction is a matter of degree, rather than kind.
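As a rough illustration of how a schema can both link stored words and serve as a recipe for new ones, consider the following Python sketch. It is a toy model under stated assumptions: the function names are hypothetical, the spelling rule (drop a final "e") only crudely approximates English orthography, and the new word "tweetable" is an invented example rather than one discussed in the literature cited here.

```python
# Conventional words assumed to be stored as wholes (taken from the text above).
STORED = {"acceptable", "believable", "doable"}


def matches_schema(word: str) -> bool:
    """True if the word fits the form side of the toy "Xable" schema."""
    return word.endswith("able") and len(word) > len("able")


def coin(verb: str) -> str:
    """Use the schema as a recipe for a new word (crude spelling rule)."""
    stem = verb[:-1] if verb.endswith("e") else verb
    return stem + "able"


def gloss(verb: str) -> str:
    """Rough paraphrase of the schema's meaning, "can be VERB-ed"."""
    return f"can be {verb}-ed"


# The stored words all instantiate the schema ...
all_stored_match = all(matches_schema(word) for word in STORED)

# ... and the same schema licenses a word that is not (yet) stored.
new_word = coin("tweet")
```

In this sketch the stored words and the schema coexist: listing a word in `STORED` does not prevent it from also matching the schema.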

Turning now to introduce the field of sign language linguistics, it is important to note that the analysis of sign language structure has traditionally followed a different set of assumptions than those outlined above. Sign language linguistics began in earnest with William Stokoe’s seminal demonstration that signs exhibit contrastive internal structure (Stokoe 1960). While even among linguists, signs had previously and widely been assumed to be concrete, holistic, non-linguistic gestures, Stokoe’s structural analysis showed that the signs of sign languages are more like the words of spoken languages than they are like the manual gestures produced by non-signers: like spoken words, signs can be broken down into a finite, listable, and language-specific inventory of formational elements that can, in principle, be recombined to yield a staggering number of possible sign forms.Footnote 3

Following Stokoe’s example, it remains typical in sign language linguistics to use pairs of minimally different signs to argue for the linguistic status of contrastive and “meaningless” sub-lexical formatives (see Klima and Bellugi 1979: 42 for a widely-cited demonstration of this point, and Fenlon et al. 2018 for an overview of sign language phonology from this perspective). As a result, though the labels and distinctions differ across studies, sign linguists can minimally describe sign forms through reference to the shape of each hand and the way the hands move in the articulation of the sign.Footnote 4 As an example, consider the ASL (American Sign Language) sign glossed as nice, shown in Fig. 1.Footnote 5 This sign is formed with the fingers on each hand extended and held together to make a wide, flat shape, and during the articulation of the sign, the palm of the right hand slides across the open palm and fingers of the left hand, in a single movement.

Fig. 1 The ASL sign glossed as nice is formed with two “flat” hands contacting one another in front of the body

The handshape and movement of the sign nice can be analyzed as discrete phonological elements in ASL: the “flat” handshape used to form the sign nice is drawn from the inventory of conventional ASL handshapes, and the straight path movement can also be found in other ASL signs. The same “flat” handshape is also used, for example, in the formation of the ASL sign please. However, this sign’s movement pattern is different, as it is formed with the flat hand tracing a small circle on the chest (Fig. 2a). The circular movement pattern in please is yet another recurring element found in other signs: the ASL sign glossed as sorry differs from please only in that the hand forms a closed fist, rather than a flat palm (Fig. 2b).

Fig. 2 A minimal pair in ASL: (a) the ASL sign glossed as please is formed with one “flat” hand tracing a small circle on the signer’s chest, and (b) the ASL sign glossed as sorry is formed with one “fist” hand tracing a small circle on the signer’s chest

The signs please and sorry can therefore be considered a minimal pair in ASL: they are formed identically, except that they require different handshapes. Though please and sorry are also related in function, as both signs are conventional indicators of politeness, it is not possible to compositionally derive the meaning of either please or sorry from the combination of the handshape or movement pattern involved. Accordingly, the difference between the signs please and sorry is most commonly characterized as an arbitrary phonological difference in ASL.
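The logic of a minimal pair can be made explicit by treating each sign as a bundle of formational parameters and searching for pairs that differ in exactly one of them. The Python sketch below is a simplified illustration: the parameter names and value labels are informal assumptions based on the prose descriptions above, not a standard phonological transcription of ASL.

```python
from itertools import combinations

# Simplified, hypothetical feature descriptions based on the prose above.
SIGNS = {
    "nice": {"handshape": "flat", "movement": "straight path",
             "location": "non-dominant palm", "hands": 2},
    "please": {"handshape": "flat", "movement": "small circle",
               "location": "chest", "hands": 1},
    "sorry": {"handshape": "fist", "movement": "small circle",
              "location": "chest", "hands": 1},
}


def differing_parameters(a: dict, b: dict) -> list:
    """Names of the formational parameters on which two signs differ."""
    return [param for param in a if a[param] != b[param]]


def minimal_pairs(signs: dict) -> list:
    """Pairs of glosses whose forms differ in exactly one parameter."""
    pairs = []
    for (gloss1, form1), (gloss2, form2) in combinations(signs.items(), 2):
        diff = differing_parameters(form1, form2)
        if len(diff) == 1:
            pairs.append((gloss1, gloss2, diff[0]))
    return pairs


pairs = minimal_pairs(SIGNS)
```

With these three toy entries, only please and sorry qualify, and the single differing parameter is handshape; nice differs from both in more than one parameter.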

We have just seen that ASL signs can be formed with one hand or two, and it is possible to describe the phonological structure of a sign in terms of recurring formational elements such as the shape of the hand(s) or how the hand(s) move during the articulation of the sign. These phonological elements can be isolated through comparison of minimally different, semantically unrelated pairs of signs. We contend that this traditional approach to sign language analysis has led the field of sign language linguistics to retain several unquestioned assumptions about the nature of linguistic structure (e.g., Fernald and Napoli 2000: 42 after Liddell and Johnson 1986: 496). These assumptions can be articulated as follows:

Assumptions of a Structuralist Theory

  1. Building blocks: Linguistic expressions are built from smaller, discrete units, either phonemes (meaningless building blocks) or morphemes (meaningful building blocks).

  2. Lexicon and grammar: Linguistic knowledge is divided into two types: the lexicon as a list of minimally meaningful forms, and the grammar as the set of rules that create well-formed complex utterances.

  3. A Derivational theory: Languages are learned through abstraction of rules that combine morphemes to derive semantically compositional expressions.

In the remainder of this article, we present a usage-based implementation of Construction Morphology for the analysis of sign language structure. In advocating for a construction-theoretic approach, we take as our point of departure the “rule/list fallacy” (Booij 2010; Bybee 2006; Langacker 2008). The rule/list fallacy is the belief that grammatical rules and lexical entries constitute mutually exclusive kinds of linguistic knowledge, following from the assumption that the meaning of any complex linguistic expression can be computed as a function of the meanings of its parts. As a consequence of this assumption, any linguistic expression must be treated either as complex and made of smaller meaningful building blocks, or, alternatively, as a simple, minimally meaningful building block, itself. In Sect. 2, we describe how the rule/list fallacy has contributed to two long-standing categorization problems in the field of sign language linguistics, which we refer to as the Core vs. Classifier problem and the Language vs. Gesture problem, and which we address in Sects. 3 and 4, respectively.

2 The Rule/List Fallacy

As we have just described, the field of sign language linguistics has been established on a broadly structuralist foundation. Through exhaustive decomposition of conventional signs into meaningless-seeming parts, sign language linguists have identified formative building blocks that can be considered analogous to the phonemes of spoken languages. In early linguistic analyses of sign language structure, this conceptual move was essential to the argument that sign languages are indeed full-fledged human languages and not collections of pantomimic gestures.

In this section, we demonstrate that the principle of exhaustive semantic decomposition, or analyzability, also continues to occupy a central and explanatory role in sign language linguistics. In this “post-Bloomfieldian” tradition (Blevins et al. 2016), sign linguists treat analyzability as a metric that can divide linguistic expressions into two categories: linguistic expressions such as words and phrases are analyzable when they exhibit compositional and thus meaningful internal structure, and are unanalyzed when they are holistic, minimally meaningful forms exhibiting only meaningless internal structure. Analyzability is thus used as a criterion to distinguish minimally meaningful lexical entries from derived complex expressions: a linguistic expression that is not predictable from the meanings of its parts is considered to be lexically listed as a morpheme, while expressions that are analyzable in terms of their parts are considered to have been derived from the concatenation of morphemes by rule.

Under a post-Bloomfieldian conceptualization of language structure, lexicon and grammar are defined in opposition to one another, such that complex expressions are considered to be derived by the grammar, while minimally meaningful expressions are instead retrieved from a list of learned items in the lexicon (e.g., Pinker and Prince 1988, 1994; Pinker 1999). Accordingly, simple linguistic symbols like run, which are considered to have arbitrary meanings, are analyzed as different in kind from complex constructs like running, whose forms and meanings are considered to be predictable according to general derivational rules. This view leads naturally and intuitively to the conclusion that, with the appropriate grammatical rules identified, it would be inefficient and inelegant to also commit complex words to lexical storage, whether as part of a speaker’s linguistic knowledge or in an actual printed dictionary.Footnote 6

However, it is an error to elevate this axiom of descriptive economy to the level of a foundational assumption about the mental nature of human language (see Hockett 1967: 219 for a similar observation). As formulated by Langacker (1987), the rule/list fallacy is the assumption that these two kinds of linguistic knowledge, rules and lists, are mutually exclusive, to begin with.Footnote 7 Though complex expressions may sometimes exhibit regular, fully transparent structure, these structural considerations in no way preclude complex expressions from becoming entrenched, or stored and activated holistically in the minds of speakers (Bybee 2001, 2010; Langacker 1987, 2008). The alternative to the rule/list fallacy is for linguistic theories to recognize that knowledge about specific items and knowledge about sets of related items “can perfectly well coexist in the cognitive representation of linguistic phenomena” (Langacker 1987: 42). Accordingly, in a usage-based theory, a speaker’s mental representation of their language is affected by their myriad experiences using their language, and not determined by structural principles alone.

This conceptual shift to a usage-based approach has profound implications for analyses that appeal to a notion of lexicalization. Because sign linguists have traditionally assumed an a priori division between lexicon and grammar, the term lexicalization has been used to describe the process through which an expression with rule-governed, analyzable structure has become lexically listed as an unanalyzed whole (see Battison 1978: 342; Berent and Goldin-Meadow 2015; Cormier et al. 2013; Klima and Bellugi 1979: 80; Liddell and Johnson 1986 for examples of this view). In this sense, lexicalization refers to the process of “having become a part of the lexicon”, with holistic rather than compositional semantics (see e.g., Himmelmann 2004; Lepic 2015).Footnote 8

To take one example, stemming from the very practical concern of determining what ought to be listed in a sign language dictionary, Johnston and Schembri (1999: 115) define a lexeme as “a linguistic unit with a ‘given’ rather than a ‘generated’ meaning”, such that lexemes are listed in the lexicon, rather than derived by the grammar. Here the criterion of unanalyzability determines whether a sign has been lexicalized: lexemes are lexicalized signs that are holistically paired with meanings that go beyond the sum of their parts or are otherwise not amenable to compositional analysis (see Johnston and Schembri 1999: 127–129 for examples).

As another example, from a theoretical perspective, Aronoff et al. (2003: 74) are also explicit in their use of analyzability as a metric for lexicalization: they consider signs to be unanalyzed lexical entries, and characterize the lexicalized sign write in Israeli Sign Language as an “unanalyzed sign” that is “listed in the mental lexicon”. Though Aronoff and colleagues also demonstrate that signers readily “reanalyze” the structure of “unanalyzed” signs in the course of normal signing, they assume a distinction between listed signs with meaningless phonological structure and reanalyzed signs with morphological structure, treating them as resulting from distinct modes of linguistic knowledge.

In setting aside the rule/list fallacy, we recast “grammatical rules” and “lexical lists” as inherently inseparable forms of linguistic knowledge. Because construction-theoretic accounts do not use the criterion of analyzability to determine whether a construction has been committed to linguistic knowledge, they allow for conventional, actually occurring words, whether simple or complex, to be registered as part of linguistic knowledge. In a usage-based theory of Construction Grammar, a speaker’s individual linguistic experience determines the extent to which the expression is represented (“entrenched”) as a unit in their linguistic knowledge. This degree of entrenchment is instead determined by facts of language use, particularly frequency of occurrence (see Brooks et al. 1999 on children, Bybee and Scheibman 1999; Bybee 2001 for more general description). As such, the “lexicon” and “grammar” are not considered to be distinct components of linguistic knowledge, and are not adopted as theoretical primitives. Instead, constructions vary in fixedness and specificity and are related to one another in a single, highly structured network (sometimes referred to as the “constructicon”). Similarly, signs with wholly specified forms and meanings co-exist with more variable morphological schemas, and thus exhibit graded rather than discrete internal structure (see also Hay and Baayen 2005 for similar arguments in spoken language morphology).

3 The Core vs. Classifier Problem

3.1 Signs Exhibit Ambiguous Sublexical Structure

In this section, we address the Core vs. Classifier problem, the first categorization issue following from the assumptions of a structuralist approach to sign language morphology. Here we suggest that assuming an a priori division between unanalyzed and analyzable forms precludes an intuitive analysis of transparent morphological structure in conventional ASL signs.

As an initial example, consider the ASL sign pictured in Fig. 3, glossed as meet. This sign is standardly formed with the hands held upright, with both index fingers extended, and with the hands moving to contact one another in front of the signer’s body in a single, coordinated movement. Like the English word meet, this sign has a conventional, agreed-upon meaning, “to come together or become acquainted”, in ASL.

Fig. 3 The ASL sign glossed as meet is conventionally formed with two “1” hands moving to contact one another in front of the body

Following the discussion of sign structure in Sect. 1, we can describe the phonological structure of the sign meet as involving two “1” handshapes, which are part of the inventory of conventional handshapes in ASL, and a straight “path” movement, also found in other ASL signs, here executed by each hand simultaneously. However, the sign meet is also instructive because its phonological structure co-varies with its meaning in numerous ways. For example, unlike the English word meet, the form of the ASL sign meet implies that exactly two human participants are involved, carrying out a reciprocal action with a defined endpoint. This is because it is signed with two hands, and the shape and movement of each hand profiles the upright shape and forward movement of a human body in motion (see Lepic et al. 2016). The structure of the ASL sign meet can therefore be considered both motivated by and reflective of its meaning: it is a morphologically complex sign.

The form of the sign meet can also be altered to describe an encounter between a couple and an individual, by changing the shape of the dominant hand to form a “2” handshape, with index and middle finger extended, while keeping only the index finger on the non-dominant hand extended. Or the movement pattern can be altered to spatially align one hand with the signer, and the other with the addressee, as is conventional in the common greeting “nice to meet you”.Footnote 9 Or the sign meet can be altered to form a morphologically-related sign that we gloss here as missed-each-other, by moving the hands past one another, instead of bringing them together, as in Example 1, with the relevant signs pictured in Fig. 4:

Fig. 4 Two morphologically-related signs from Example 1, (a) meet and (b) missed-each-other

(1) two friend should meet index school, but oh-i-see, missed-each-other

    two friends were supposed to meet at school, but they missed each other

As suggested in Sect. 2, previous treatments of sign-internal structure have discussed the difference between signs like meet and missed-each-other in terms of mutual exclusivity between unanalyzed “core” lexical signs and morphologically complex “classifier constructions” (e.g., Brentari and Padden 2001, but see also Brennan 1990 for a different view). Core lexical signs like meet have standard citation forms and meanings that are considered to be idiosyncratic or are otherwise unpredictable from their sub-lexical structure. As conventional pairings of meaning and form, these signs can be expected to be found in an ASL dictionary, for example. Classifier constructions like missed-each-other, in contrast, exhibit more variability and transparency; they have non-standardized forms and are necessarily interpreted in context, and so are not expected to be found in the dictionary.

In sign language linguistics, classifier constructions are so named because they use an inventory of handshapes to classify referents according to semantic criteria (Supalla 1982, 1986). In ASL, the “1” handshape, with only the index finger extended, is a semantic classifier for upright (human) figures, as in meet. Similarly, the “3” handshape, with thumb, index, and middle finger extended, is a semantic classifier for vehicles, and the “A” handshape, with only the thumb extended, is a semantic classifier for upright objects like statues and buildings, more generally. In any classifier construction, the movement and location of the hands depicts the movement and/or location of the referent entities. For example, Klima and Bellugi (1979: 14) and Supalla (1986: 205) provide illustrations of how these handshapes can be used productively to depict people, vehicles, and objects either meandering along, winding up a hill, or arranged in a row, by altering the way the hand moves in the articulation of the classifier construction. Other types of classifiers include handling classifiers, which categorize referents according to how they are held and manipulated, as well as other classifiers that categorize and describe objects according to their size and shape.

Crucially, classifier constructions all have in common that they are interpreted in context, and seem to straightforwardly derive their meanings from the meanings of their internal parts, namely the shape and movement of the hand(s): “Classifier construction” has become, in a way, a general label for a class of morphologically transparent and highly productive uses of the body and space in discussions of sign language structure (but see Schembri 2003 and Cormier et al. 2012 for critical discussion regarding the name for this class of phenomena).
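The apparently compositional character of classifier constructions can be illustrated with a small lookup-and-combine sketch in Python. It models only the simple view described above, in which the meaning of the whole follows from handshape plus movement; the dictionary keys, the informal movement labels, and the function name `interpret` are assumptions made for illustration and do not correspond to any established notation.

```python
from itertools import product

# Handshape-to-referent pairings as described in the text above.
HANDSHAPE_CLASS = {
    "1": "upright (human) figure",
    "3": "vehicle",
    "A": "upright object",
}

# Movement and arrangement patterns mentioned in the text; labels are informal.
MOVEMENT_MEANING = {
    "meander": "meandering along",
    "wind-uphill": "winding up a hill",
    "row": "arranged in a row",
}


def interpret(handshape: str, movement: str) -> str:
    """Compose a rough paraphrase from handshape and movement meanings."""
    return f"{HANDSHAPE_CLASS[handshape]} {MOVEMENT_MEANING[movement]}"


# Every handshape combines with every movement pattern in this toy model.
paraphrases = {
    (handshape, movement): interpret(handshape, movement)
    for handshape, movement in product(HANDSHAPE_CLASS, MOVEMENT_MEANING)
}
```

Three handshapes and three movement patterns yield nine distinct paraphrases here, which gives a sense of why these forms are described as highly productive.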

Relevant for our construction-theoretic analysis here is the point that many core lexical signs exhibit synchronic connections to productive classifier construction signs. One candidate description of this relationship is that the “lexical” sign meet is homophonous with a “classifier construction” meaning “two upright beings approach each other face to face” (Eccarius and Brentari 2007: 1170; recall also Figs. 3 and 4a). Similarly, the conventional ASL sign fall can be regarded as homophonous with a transparent classifier construction meaning “a two-legged entity inverts and moves downward” (cf. Supalla 1986: 183), and the ASL sign write can be considered an “unanalyzed” sign that is related to a productive classifier construction meaning “a thin instrument is moved across a flat surface” (cf. Aronoff et al. 2003: 74).

As Fig. 5 suggests, these sign forms are each ambiguous between a more idiomatic interpretation as a core lexical sign and a more analytic interpretation as a classifier construction (see also Johnston and Ferrara 2012 for a similar observation). The theoretical puzzle that these sign forms present is how to best account for their dual nature as holistic lexical signs and as complex signs exhibiting analyzable morphological structure. We name this categorization dilemma the Core vs. Classifier problem.

Fig. 5 (a) The ASL sign glossed as fall is formed with a “2” hand moving downward, and (b) the ASL sign glossed as write is formed with a dominant “precision grip” hand moving across a “flat” non-dominant hand

We wish to emphasize here that it has never been a question whether many ASL signs are amenable to idiomatic- and analytic-seeming interpretations; on the contrary, this ambiguity has been noted in several previous studies. Assuming that core lexical signs and classifier construction signs are mutually exclusive categories, these previous analyses have been primarily concerned with the nature and directionality of the relationship between unanalyzed lexical signs and analyzable classifier constructions. The cross-linguistic tendency for classifier signs to become increasingly idiomatic with repeated use has been described, for example, as “freezing” (Supalla 1986: 183), “lexicalization” (Aronoff et al. 2003), and even “local lexicalization” (Johnston and Schembri 1999: 123). Similarly, the tendency for core lexical signs to be used in a way that suggests that they nevertheless exhibit transparent morphological structure has been described as “mimetic elaboration” (Klima and Bellugi 1979: 13), “backformation” (Sandler and Lillo-Martin 2006: 94), and “de-lexicalization” (Cormier et al. 2012: 388).

However, we contend that this fluid ambiguity between idiomatic lexical signs and transparent classifier signs is only remarkable if we assume a categorical division between core lexical signs and classifier construction signs, to begin with. In the remainder of this section, we propose an alternative analysis of sign structure, following the assumptions of a construction-theoretic approach. Rather than stored, unanalyzed forms, frequently-occurring signs are considered to be fixed pairings of meaning and form that become increasingly entrenched in linguistic knowledge as a result of a language user’s individual experience with language. Though “fixed” in form, these entrenched signs nevertheless retain gradient aspects of analyzable structure. We demonstrate that a usage-based theory of Construction Morphology accounts for signers’ “reanalysis” of lexical signs as productive classifier constructions in signed discourse; this creative re-use of learned patterns demonstrates that lexical signs are holistic gestalts with analyzable internal structure.

3.2 Signs Are Gestalts Exhibiting Analyzable Structure

In Construction Morphology, morphological schemas are patterns that serve two functions: First, they summarize the conventional pairings of form and meaning that speakers (are expected to) have extracted from their experiences using their language over the course of their lives. Second, morphological schemas model any language user’s capacity to extend the patterns of their language to create or interpret new complex expressions, productively. By registering actually-occurring complex linguistic expressions as part of linguistic knowledge, along with constructional schemas that generalize across conventional expressions, Construction Morphology avoids the rule/list fallacy described in Sect. 2. Instead, the relationship between a morphological schema and its specific instantiations is one of default inheritance (Goldberg 2013; Booij 2017), with morphological schemas organized in a network according to the aspects of meaning and form that are fixed or variable across their particular instantiations. As a result, a construction-theoretic approach captures the fact that language users’ utterances are often at once quite innovative and highly formulaic: some constructions may be fully specified and ready to use “off the shelf”, and others specify some aspects of content while also leaving schematic slots open for new content. Though constructions contain both specific and schematic aspects of form and meaning, “specificity” and “schematicity” are gradient rather than categorical notions, and so aspects of form and meaning exist on a cline from more specific to more schematic.
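Default inheritance can be sketched as property lookup that falls back from a specific construction to the more schematic construction it instantiates, with locally specified properties taking precedence. The Python sketch below is illustrative only. The form descriptions are taken from the discussion of meet in Sect. 3.1, but the class `Node`, the property labels, and the decision to treat the citation form of meet as the source of defaults are assumptions made for this example, not the analysis proposed in this article.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    """A construction in a toy inheritance network (hypothetical)."""

    name: str
    own: Dict[str, str] = field(default_factory=dict)  # locally specified properties
    parent: Optional["Node"] = None  # more schematic construction, if any

    def lookup(self, prop: str) -> str:
        # Local specifications override inherited defaults.
        node: Optional["Node"] = self
        while node is not None:
            if prop in node.own:
                return node.own[prop]
            node = node.parent
        raise KeyError(prop)


meet = Node("meet", {
    "dominant_handshape": "1",
    "nondominant_handshape": "1",
    "movement": "hands move to contact one another",
})

# Variants specify only what differs; everything else is inherited.
couple_meets_individual = Node(
    "couple-meets-individual", {"dominant_handshape": "2"}, parent=meet
)
missed_each_other = Node(
    "missed-each-other", {"movement": "hands move past one another"}, parent=meet
)
```

Here `couple_meets_individual.lookup("movement")` is resolved by the parent node, while its dominant handshape is resolved locally, which is the sense in which a default can be overridden.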

Applying these assumptions to the analysis of sign language morphology, we can first think of concrete utterances or tokens of “the same sign” as instantiations of an abstract constructional representation of that sign type. For example, the sign deaf, which occurs frequently in ASL discourse, has been previously described as occurring in three phonetic variants: one in which the index finger moves from ear to chin, one in which the index finger moves from chin to ear, and one in which the index finger contacts the cheek only once (Fig. 6; see Liddell and Johnson 1986 and Bayley et al. 2000 for examples and discussion).

Fig. 6 In all variants, the ASL sign glossed as deaf is formed with one “1” hand contacting the side of the face between the mouth and ear

Though the use of each of these variant forms is conditioned in part by social and structural factors, including preservation of the preceding phonological place of articulation, the recognition that these different forms are variants of the same sign reflects the fact that signers categorize them as instantiations of the same morphological construction. Grouping distinct usage events as instances of the same element, whether word or phrase, provides evidence for that element’s status as a conventional pairing of form and meaning that has been registered as part of the user’s linguistic knowledge.Footnote 10

It has similarly been shown that repeated fingerspelled words undergo slight phonetic reduction as they recur within a single stretch of ASL discourse (Brentari 1998): signers’ recognition of these distinct usage events as tokens of the same word despite their phonetic reduction suggests that they too are categorized as instantiations of the same morphological construction, that of a particular borrowed English word in ASL discourse. For any newly-borrowed fingerspelled word, the morphological construction may be only weakly or temporarily registered to memory, by virtue of its use being more or less limited to a single usage event as an ad hoc borrowed English word. However, certain fingerspelled words that occur frequently across a variety of contexts, such as the signs glossed as #off, #back, and #ok, have been shown to have undergone considerable phonological restructuring and semantic specialization (cf. Battison 1978, who uses #-notation to indicate highly nativized fingerspelled signs in ASL). These changes can be seen as resulting from (and contributing to) these particular fingerspelled words’ becoming entrenched as bona fide ASL signs, with corresponding constructional representations, by virtue of their frequent use.

Language users may also gradually stop categorizing frequently-used variants of “the same sign” as instances of the same sign. For example, when signs undergo the gradual process of grammaticalization, they become conventionally associated with a particular grammatical function, and typically also exhibit specialized phonetic reduction (Bybee 2010). This divergence in usage patterns can result in the gradual formation of a new set of exemplars and a corresponding constructional schema that may overlap only partially with the original schema. This process has been documented with respect to the ASL signs finish (Janzen 1995), self (Wilkinson 2013), and happen (Anible and Occhino-Kehoe 2014): these studies all identify phonetic variants of the “same” sign, but demonstrate that the relevant phonetic variants are associated with diverging grammatical functions. This provides evidence that phonetic variants of the same sign may gradually become associated with distinct functions, phonetic realizations, and syntactic distributions, these changes both resulting from and feeding into the formation of increasingly divergent constructional schemas.

A potential outcome of this gradual divergence is that signers may ultimately stop seeing sign tokens as instances of the same sign altogether. Though the historical records necessary for analyzing sign language etymology in depth are scarce, a single historical etymon has likely yielded the synchronically distinct signs please and enjoy in ASL (Shaw and Delaporte 2014: 87; recall Fig. 2a): these signs are both formed with a flat palm tracing a small circle on the chest, differing only in that the sign enjoy is a two-handed sign, formed with the non-dominant hand mirroring the movement of the dominant hand at a slightly lower location on the abdomen, while the sign please is formed with only one hand. These signs have also diverged in function, as their English glosses suggest, with the one-handed form please functioning as a marker of politeness, and two-handed enjoy acting as a full psych verb, in ASL.

While individual sign constructions like deaf, #off, or please can be thought of as highly specified morphological schemas, representing quite fixed pairings of form and meaning, sign schemas can in turn be analyzed as instantiations of more abstract morphological schemas, which exhibit only partially-fixed structure. Such morphological schemas are referred to descriptively as sign families in the literature on sign language morphology (after Frishberg and Gough 2000): sign families are groups of signs with recurring aspects of form and meaning shared among them.

As a concrete example of a sign family, many ASL signs for “women and female family members” are conventionally articulated at the signer’s chin, including girl, mother, grandmother, woman, sister, and daughter. These signs can be analyzed as instantiations of a morphological schema in which the phonological space near the signer’s chin is associated with the meaning “female (family member)”, represented schematically in Fig. 7. In this representation, the re-use of the chin location among signs referring to “female (family members)” licenses the abstraction of a morphological schema as a pairing of meaning and form.Footnote 11 The fixed sign constructions (represented here with glosses) provide the basis for abstraction of a more general pattern, without requiring that the fixed sign constructions necessarily exhibit semantically compositional internal structure.Footnote 12

In ASL, other sign families are organized around shared use of the chin location as well: these include families of signs such as eat, drink, and taste, all relating to “eating”; signs such as talk, shout, and answer, all relating to “communication”; and not, nothing, and deny, all inherently “negative” in some respect. Figure 8 represents this extended network of sign families that are all formed at the chin; here we see four clusters of signs, each sharing some element of form and meaning. In these families of signs, the chin location is a fixed constant across a group of semantically-related signs. However, each family associates the same identifiable formal element with a different aspect of meaning. This view of morphological schemas as emergent generalizations over actually-occurring signs suggests that the chin location is not independently listed as the phonological realization of a minimal unit of meaning, but rather comes to be associated with particular aspects of meaning as a result of its systematic reuse across a number of conventional signs in ASL.

Fig. 8. Four families of ASL signs sharing a place of articulation at the signer’s chin
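As a rough illustration of this point, the four chin-located families can be sketched as a grouping over listed signs. The glosses and family meanings below come from the discussion above; the flat table of (gloss, meaning) pairs and the function name are assumptions made purely for illustration, and are not the notation used in the figures.

```python
from collections import defaultdict

# Glosses and family meanings follow the text; the table layout is an
# illustrative assumption, not the authors' formalism.
CHIN_SIGNS = [
    ("girl", "female (family member)"),
    ("mother", "female (family member)"),
    ("grandmother", "female (family member)"),
    ("woman", "female (family member)"),
    ("sister", "female (family member)"),
    ("daughter", "female (family member)"),
    ("eat", "eating"),
    ("drink", "eating"),
    ("taste", "eating"),
    ("talk", "communication"),
    ("shout", "communication"),
    ("answer", "communication"),
    ("not", "negative"),
    ("nothing", "negative"),
    ("deny", "negative"),
]


def families_at_location(signs):
    """Group glosses that share one location by the meaning they have in common."""
    families = defaultdict(list)
    for gloss, meaning in signs:
        families[meaning].append(gloss)
    return dict(families)


chin_families = families_at_location(CHIN_SIGNS)
for meaning, glosses in chin_families.items():
    print(f"chin + {meaning!r}: {', '.join(glosses)}")
```

In this sketch the chin location has no meaning of its own: each meaning is recovered only from the set of signs that share it, which is the sense in which the schemas are emergent generalizations rather than listed morphemes.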

Thus far, we have been describing how a Construction Morphology analysis can account for relatively fixed, listable sign constructions (which we might otherwise refer to as “core lexical signs”) and their corresponding morphological schemas (which we might otherwise refer to as “sign families”). This usage-based, construction-theoretic analysis of sign structure can also be extended to classifier signs to resolve the Core vs. Classifier problem. Under this analysis, “classifier constructions” make productive re-use of morphological schemas that have been extracted across multiple sign tokens.

One such morphological schema is the “movable object” construction. In ASL, several signs are articulated with two “A” hands, a closed fist with the thumb extended, moving relative to one another in signing space. Many of these signs also describe the relative movements and locations of paired referent entities. Frishberg and Gough (2000: 112), for example, list several signs that participate in this sign family, including ahead, behind, challenge, chase, date, fall-behind, far, follow, game, pass, and together; other conventional signs in this family include avoid, compete, superior, and which (see also Supalla and Clark 2014). These examples can all be considered conventional sign constructions in ASL, as they are all specified as fixed, conventional pairings of meaning and form. However, these sign constructions also exhibit analyzable internal structure, which provides the motivation for grouping them together in a sign family in the first place.

Like the examples fall and write discussed above, each of these sign constructions can be said to correspond to both a holistic/idiomatic and a compositional/analytic meaning. However, from a construction-theoretic perspective, idiomaticity is recognized as a gradient rather than categorical status: some signs, such as challenge, date, game, and which, seem to exhibit quite unpredictable meanings, while for other signs, like ahead, behind, pass, and together, even the learned conventional meaning remains quite transparent. This is illustrated with the signs challenge, which derives metaphorically from two paired entities contacting “head on”, and ahead, which straightforwardly places one hand ahead of the other to represent a spatial configuration between two entities, in Fig. 9.

Fig. 9. (a) The ASL sign glossed as challenge is formed with two “A” hands moving to contact one another in front of the body, and (b) the ASL sign glossed as ahead is formed with a dominant “A” hand moving in front of a non-dominant “A” hand

A morphological schema that describes the association of meaning and form across this large family of signs should specify (i) that they are all formed with two “A” hands, and (ii) that they all describe a (spatial) relationship between two entities. However, the exact nature of the relationship between the entities is to be left schematic, as is the movement used in the articulation of the sign construction. Accordingly, this morphological schema can be represented as in Fig. 10 (after Wilcox and Occhino 2016: 5 and Bybee 2001: 23).Footnote 13 Here we use the particular signs far, chase, and follow to represent their entire family: across signs in the family of “movable object” signs, the paired A-hands are fixed as part of the construction, and the movement patterns vary across signs as the relationship between the paired referents changes.

Fig. 10. Two morphological schemas contributing to the “movable object” construction in ASL: across related signs like far, chase, and follow, handshape is fixed, but movement is variable

In this schematic representation, associations of meaning and form across three signs, far, chase, and follow, are extracted to create a morphological schema in which paired A-handshapes are fixed aspects of form that represent “paired entities”. Similarly, movement is analyzed as a less fixed, more variable schematic slot: phonological movement patterns profile relative movements and spatial relations between entities, but the particular movement patterns are not specified as part of the general “movable object” construction. Note, however, that we neither expect nor reject the possibility of compositional-seeming sign-internal structure here. Though the paired handshapes are fixed as part of a schema, and seem to recombine straightforwardly with different movement patterns, the movement patterns themselves are so variable as to seem unlistable, and can be modified in quite fine-grained ways. This is not surprising if we consider signs to be holistic gestalts that also exhibit analyzable internal structure.
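The idea of a schema with one fixed element and one open slot can be made concrete with a small sketch. The class names, the string labels for handshapes and movements, and the use of `None` for an open slot are all hypothetical conveniences. The form descriptions for challenge, ahead, and please paraphrase those given earlier in the text. Treating a slot as simply open or closed is a deliberate simplification of what the analysis describes as gradient.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Sign:
    gloss: str
    handshape: str
    movement: str


@dataclass(frozen=True)
class Schema:
    meaning: str
    handshape: Optional[str] = None  # None marks an open (variable) slot
    movement: Optional[str] = None   # None marks an open (variable) slot

    def licenses(self, sign: Sign) -> bool:
        """A sign instantiates the schema if it matches every fixed element."""
        pairs = ((self.handshape, sign.handshape), (self.movement, sign.movement))
        return all(fixed is None or fixed == actual for fixed, actual in pairs)


# Handshape is fixed; movement is left schematic, as in the "movable object" schema.
MOVABLE_OBJECT = Schema(meaning="relation between paired entities",
                        handshape="paired A-hands")

challenge = Sign("challenge", "paired A-hands", "hands move to contact one another")
ahead = Sign("ahead", "paired A-hands", "dominant hand moves in front of non-dominant hand")
please = Sign("please", "flat palm", "small circle on the chest")

for sign in (challenge, ahead, please):
    print(sign.gloss, MOVABLE_OBJECT.licenses(sign))
```

Because the movement slot is open, any new movement pattern paired with two A-hands would also be licensed, which is one way of picturing how the same schema covers both conventional signs and productive extensions.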

Beyond describing configurations of form and meaning that are shared within a family of sign constructions, morphological schemas also model a signer’s ability to create complex expressions productively. Accordingly, a morphological schema like the one in Fig. 10 makes two related predictions. First, conventional (lexical) sign constructions that instantiate a morphological schema are expected to retain analyzable internal structure, even as they begin to gradually take on more idiomatic meanings. This internal structure provides the basis for linking signs together in a family, in the first place. Second, signers are expected to productively modulate their articulation of a schematic sign construction according to the aspects of meaning to be conveyed. Thus, productive extensions of the “movable object” construction will use varied phonological movement patterns to describe the relative location and movement of two referents.

To illustrate how these predictions are borne out in actual signing, consider the ASL sentence in (2), which has been extracted from an online video posting from an ASL news show. This sentence was uttered in a discussion of the United States’ Democratic party’s primary polling results from September 2015. This sentence contains three instantiations of the “movable object” construction from Fig. 10, highlighted in bold: the sign glossed fall-back is articulated twice, followed by a sign glossed catch-up. These signs are pictured in Fig. 11.

Fig. 11. Two related signs from Example 2, (a) fall-back and (b) catch-up (Images taken from http://youtu.be/9qeHwcYbCXs?t=2m40s)

(2)

silly big change why? two-months past, index b-e-r-n-i-e fall-back c-l-i-n-t-o-n

b-y 21 points fall-back, now catch-up and lead

…and that’s a huge change, because two months ago, Bernie was trailing

Clinton by 21 points, but now he’s caught up and is leading

In these sign tokens, the signer’s hands represent the relative metaphorical spatial positioning of two discourse referents, “Bernie’s ranking in the polls” and “Hillary’s ranking in the polls”, respectively. The analyzable internal structure in this pair of signs is also relevant for the structure of the sentence they participate in: as is schematized in Fig. 12, after the second fall-back token, the signer keeps his non-dominant hand (his left hand, L) in the same location in signing space until the dominant hand (his right hand, R) articulates the subsequent sign catch-up.

Fig. 12. The configuration and position of the left hand is maintained across the three-sign sequence (a) fall-back, (b) now, and (c) catch-up (Images taken from http://youtu.be/9qeHwcYbCXs?t=2m40s)

In this sentence, the continued presence of the non-dominant hand in the signing space after the second fall-back token continually profiles “Hillary’s ranking in the polls”. The subsequent movement of the dominant hand forward to contact the non-dominant hand matches a semantic construal of “Bernie’s ranking in the polls” catching up to “Hillary’s ranking in the polls”.

The question of whether the signs fall-back or catch-up should be analyzed as core lexical signs or as classifier construction signs is entirely beside the point here: both signs can be analyzed as concrete instantiations of the “movable object” schema represented in Fig. 10, gradiently altering the movement of the hands according to the spatial relationship to be described. Moreover, the degree of entrenchment or conventionality of either sign is expected to depend on its frequency of use, both at the level of the sign token and at the level of the constructional type, rather than on the degree to which it is analyzable (after Bybee 2010; Hay and Baayen 2005).

In this section, we have demonstrated that a usage-based, Construction Morphology approach to sign language analysis presents a straightforward solution to the Core vs. Classifier problem, the assumption that all signs must belong to one of two mutually exclusive categories, based on their analyzable internal structure. A construction-theoretic analysis instead treats entrenched, highly fixed “lexical” signs and more schematic and productive “classifier” signs alike as learned pairings of form and function (or meaning). Rather than assigning individual sign tokens to distinct domains of linguistic knowledge, all sign constructions can be considered primarily meaningful wholes that also exhibit gradient internal structure. Constructions exist on a continuum from highly entrenched to highly productive sign tokens, with many signs falling somewhere in the middle as wholes exhibiting some analyzable morphological structure.

4 The Language vs. Gesture Problem

Now we turn to discuss the Language vs. Gesture problem, the second categorization dilemma following from the assumption that language is formally discrete and semantically compositional in nature. As we have described in Sect. 1, sign languages were once considered to be non-linguistic systems akin to pantomime. Because they were working against widespread misconceptions even within the field of linguistics, many early studies of sign language structure were devoted to debunking the idea that sign languages are “mere gesture”. This was accomplished by showing that, like spoken languages, sign languages exhibit morphophonological and morphosyntactic structure that can be described using symbolic structural rules.

However, several empirical and theoretical questions remain concerning the relationship between signed utterances and the visible actions that hearing people naturally produce while speaking. In large part, these questions arise as a consequence of cognitive scientists also rejecting the assumption that gestures are idiosyncratic wholes lacking conventional or analyzable internal structure (see the work of Abner et al. 2015; Calbris 1990; Kendon 2008; Núñez and Sweetser 2006; Singleton et al. 1993). On the contrary, like signs, gestures may exhibit analyzable structure or become entrenched and conventional for individuals and communities of language users. Several studies have demonstrated that there is a close functional connection between spoken language and co-speech gesture: these studies have revealed that spoken language is embodied (Barsalou 2008; Glenberg and Kaschak 2002; Marghetis and Bergen 2015), multimodal (Andrén 2014; Cienki 2013; Kok and Cienki 2014; Vigliocco et al. 2014), and dynamic (Elman 1995; Langacker 2000).

For example, work in simulation semantics and embodied cognition has shown that linguistic utterances are not discrete or isolated from the real world. Instead, utterances are integrated with real-world cues as humans simulate and update their understanding of what is being said to make contextual inferences in real-time (Barsalou 1999, 2008; Bergen 2007; Zwaan and Madden 2005). Similarly, recent work on multimodal spoken language has shown that references to analogical, real-world structure are rampant in discourse. One reflex of this is that many speech acts are infelicitous without an accompanying co-speech gesture that makes reference to real space, as in the example “Then the car went [tracing the trajectory of the car in the air]” (Kok 2016: 164). Zima (2014: 24) also found that 85% of tokens of the “all the way from X PREP Y” construction occurred with a gesture that “filled in” the relevant spatial information, revealing a tight link between the spoken and gestural modes. This mounting evidence from various disciplines suggests that both language and gesture vary in degree of conventionality and innovation, compositionality and idiosyncrasy, discreteness and holism, schematicity and specificity, abstraction and concreteness.

In accordance with this changing perspective on co-speech gesture, in sign language linguistics the discussion of the relationship between sign and gesture has now shifted toward determining the extent to which the gestures of hearing non-signers and the signs of sign languages might serve similar functions or exhibit similar kinds of structure (Cormier et al. 2012, 2013; Emmorey 1999; Goldin-Meadow and Brentari 2017; Johnston and Schembri 1999; Liddell 2003; Padden et al. 2013; Sandler 2009; Schembri et al. 2005). One consequence of this line of questioning in sign language linguistics has been the hypothesis that, just as spoken language is multimodal, and transmitted through an integrated linguistic (spoken) and gestural (manual) channel, sign language might similarly be viewed as an integration or fusion of linguistic and gestural material. It is here that we encounter the Language vs. Gesture problem, which derives from the structuralist assumption that language and gesture are inherently different in some respect, and are combined in the course of multimodal communication (see also Wilcox and Occhino 2016 for arguments against this view).

In assuming an a priori, categorical division between language and gesture, the sign linguist takes on the burden of determining which aspects of sign language use can be considered linguistic and which are gestural (e.g., Emmorey 1999; Goldin-Meadow and Brentari 2017; Goldin-Meadow et al. 2012; Liddell and Metzger 1998; Sandler 2009). In lieu of the obvious articulatory difference between the voice and the hands, previous analyses have sought to define gesture by positing a categorical distinction between elements that are listable, analyzable, and conventional, on the one hand, and those that are holistic, context-dependent, and defy rule-based generalizations, on the other. In sign language linguistics, then, “gesture” has recently been repurposed as a general label for any kind of graded structure, especially aspects of signing that index or analogically represent some real-world property such as space or movement. Gestural analyses have been extended to include several types of signs, namely pronouns, classifier constructions, and directional verbs, which all make productive reference to what could be construed as real-world locations or spaces. Such signs involve gradient forms which are neither derivable by rule, nor listable in the lexicon (Cormier et al. 2013; Lillo-Martin and Meier 2011).

However, as we have already demonstrated in Sect. 3, in a usage-based theory of Construction Morphology, recognition of gradient structure need not pose any problem for sign language analysis: all linguistic constructions exhibit gradient structure, and highly schematic constructions are emergent generalizations extracted by language users through their experiences with language. Under a usage-based approach, gradient structure is not gesture: it is grammar. Morphological schemas of the type represented in Fig. 10 describe the schematic internal structure of conventional and entrenched linguistic constructions, thereby avoiding the Language vs. Gesture problem altogether.

Moreover, morphological schemas can also straightforwardly account for the level of innovation and variability that is observed in everyday language use, whether spoken or signed. Our proposal, then, is that the tools of Construction Morphology can similarly be extended to the analysis of multimodal spoken language which, like sign language, occupies a continuum between more fixed and more gradient aspects of structure.

Accordingly, in the remainder of this section, we illustrate how a Construction Morphology analysis can be extended to the analysis of multimodal spoken language, without appealing to the pre-specified categories of “language” or “gesture”. In doing so, we will eschew the traditional labels of speech and co-speech gesture, and instead characterize the multimodal construction as involving vocal articulations together with non-vocal articulations, including manual actions, eye-gaze, and head positioning. The purpose of this exercise is to demonstrate that multimodal spoken language constructions are similar in many ways to multichannel sign language constructions, in the sense that multiple articulatory actions simultaneously co-construct meaning. This analysis demonstrates that both vocal articulations and co-vocal manual movements exhibit recurring aspects of structure which, taken together with the rest of the multimodal utterance, create a composite meaning that exceeds the sum of its parts. In other words, multiple articulators, whether vocal or manual, or fixed or schematic, are used to construct meaning in context.

Here we analyze a two-usage-event sequence involving related multimodal constructions from a televised celebrity interview from January 2016. In this interview, the speaker retells his experience scuba diving with sharks on his honeymoon. The first multimodal construction we analyze is the speaker’s description of the movement of sharks swimming in a circle underwater, shown in Fig. 13.

Fig. 13. Manual action co-occurring with vocal information: “They swim very close, these guys swim very close.” (Images taken from https://youtu.be/2mPsb3V-Y1g?t=1m16s)

In this multimodal usage event, the speaker explains in the vocal channel, “They swim very close, these guys swim very close.” During this spoken utterance, his eye gaze is primarily directed at the studio audience, holding their attention during the recounting of the behavior of the sharks. At the same time, the movement of his right hand provides relevant information about the scene. Extending an index finger on his right hand, the speaker articulates a counter-clockwise cyclic movement in front of his body, beginning with his arm extended away from his torso, and moving first toward his right shoulder, then back around to the original starting position. He executes three full circles in succession during the vocal utterance. The movement of the speaker’s hand contributes information not provided by the vocal channel, namely that the sharks move in a circle underwater.

In the second multimodal utterance, after a short aside, the speaker repurposes the elements of the previous multimodal utterance, maintaining the same handshape and the same starting location away from the body, but changing the direction of the cyclic movement. While saying “And all of a sudden, one broke off from the circle and swam behind us”, he traces a larger arc to his right, ultimately ending over and behind his right shoulder. The speaker then shifts his body to look over his left shoulder and to point to an imagined spot in the distance with his thumb, as in Fig. 14.

Fig. 14. Manual action, eye-gaze, and bodily movement co-occurring with vocal information: “(a) And all of a sudden, one broke off from the circle (b) and swam behind us.” (Images taken from https://youtu.be/2mPsb3V-Y1g?t=1m25s)

As the manual arc unfolds, the speaker tracks the movement of his extended index finger with his eyes. Looking beyond his finger, rather than looking directly at it, he conveys the distance of the shark, highlighting the fact that it was farther than a literal arm’s length away. As he traces a clockwise arc behind his right shoulder, the speaker’s gaze briefly returns forward, and he glances at the host, before turning his body and head to gaze over his left shoulder, and to convey that the shark had completely circled behind him, which he emphasizes by pointing with his thumb.

In this second multimodal utterance, the speaker’s manual action and his eye-gaze continue to convey information about the scene established in the first multimodal utterance. His continued use of certain aspects of the previous multimodal construction, such as the extended index finger on his right hand, lends a sense of continuity to the scene as events of the story unfold. The more varied aspects of the manual action, such as the change in motion and the change in overall trajectory, convey that, rather than circling with the rest of the sharks described one utterance prior, a particular shark from that same cohort broke away from the circling pattern, arcing out from the group and circling behind the speaker.

In this two-utterance sequence, we have identified a number of articulatory movements that together co-create a multimodal signal. The vocal channel, the movement of the right hand, and the speaker’s eye-gaze all come together to create a full, informative message. A Construction Morphology analysis allows us to capture the fact that these two usage-events make use of recurrent structural elements at varying levels of schematicity. In particular, across both usage events, the speaker’s handshape is fixed in a single configuration, while there is some degree of variation in the location, movement, and eye-gaze. These shared and varying aspects of structure alike are represented in the constructional schemas in Fig. 15.

Fig. 15. Constructional analysis of the first (a) and second (b) multi-modal utterances. These constructional schemas exhibit similar handshapes with differing movements and eye-gaze patterns. RHS right hand shape, MOV movement, LOC location, GAZE eye-gaze

Parallel to the analysis of the signs fall-back and catch-up as instantiations of the “movable object” construction in ASL in Sect. 3, here, far from simply sharing an overlap in formal features, these two multimodal utterances re-use the same formal features for similar discourse functions, in a systematic fashion. Across the two vocal utterances “They swim very close, these guys swim very close,” and “And all of a sudden, one broke off from the circle and swam behind us,” the speaker tracks “the sharks” as a discourse referent with the extended index finger on his right hand, and systematically manipulates the movement of his right hand, along with his eye-gaze, to convey information about the movement of “the sharks”.

Though the division between the vocal channel and the manual channel is quite salient to English speakers, and we are accustomed to thinking of vocal information as “language” and manual action as “gesture”, here we have demonstrated that it is possible to provide a complete analysis of multimodal communication without assuming these labels as analytic primitives. The alternative, under a Construction Morphology approach, is to start with constructions as pairings of form and function, and to analyze the patterns that emerge from systematic reuse of form-meaning pairings as they unfold in language use. Configurations of form and meaning that recur across constructions are likely to be categorized by the language user as participating in the same construction, leading to the extraction of emergent morphological schemas with repeated use. These schemas both describe the structure of observed multimodal constructions and explain how future usage events can put constructional schemas to productive, innovative use. Aspects of structure that are consistent across usage events are likely to be extracted as a more fixed aspect of the construction, while aspects of structure that vary across usage events are likely to be extracted as more variable in the construction.
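The extraction step described here can be sketched in a few lines, under strong simplifying assumptions. The channel labels (RHS, MOV, LOC, GAZE) follow the abbreviations used in the figure captions, and the feature values paraphrase the prose descriptions of the two shark utterances. The dictionary encoding, the function name, and the all-or-nothing split between fixed and variable elements are illustrative assumptions only; the analysis itself treats this distinction as gradient and sensitive to frequency of use.

```python
# Two usage events, described with the channel labels from the figure captions.
# Values paraphrase the prose; the encoding itself is a hypothetical convenience.
EVENT_1 = {
    "RHS": "extended index finger",
    "MOV": "counter-clockwise circles",
    "LOC": "in front of the body",
    "GAZE": "toward the studio audience",
}
EVENT_2 = {
    "RHS": "extended index finger",
    "MOV": "larger arc to the right",
    "LOC": "over and behind the right shoulder",
    "GAZE": "tracking the index finger",
}


def extract_schema(events):
    """Keep a channel's value if all events agree on it; otherwise leave it open (None)."""
    schema = {}
    for channel in events[0]:
        values = {event[channel] for event in events}
        schema[channel] = next(iter(values)) if len(values) == 1 else None
    return schema


shark_schema = extract_schema([EVENT_1, EVENT_2])
print(shark_schema)
```

Applied to these two events, the sketch keeps the handshape as the fixed element and leaves movement, location, and eye-gaze open, which mirrors the description of the two utterances given above.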

For the sake of completeness, in Fig. 16 we provide a more in-depth characterization of the usage event in Figs. 14 and 15b as a multimodal construction. Here we provide descriptions of the forms and functions of many of the component constructions that together form the larger multimodal composite construction.Footnote 14

Fig. 16. Multimodal form-function analysis of the bodily actions that accompany the vocal utterance “And all of a sudden, one broke off from the circle and swam behind us”. RHS right hand shape, MOV movement, LOC location, HEAD head positioning, GAZE eye-gaze

The representation in Fig. 16 is compressed and does not illustrate the dynamic, temporal relationship between each of these component articulatory channels. However, it shows that each component construction consists of both a formal and a functional side, forming a symbolically complex unit unto itself. Rather than together determining the meaning of the composite multimodal utterance, these sub-constructions are only meaningful when taken together, in the context of the other component structures, as parts of a larger whole. In other words, the individual channels in Fig. 16 both exhibit aspects of internal structure and derive their meaning from the structured gestalt they appear in: they are structured wholes that themselves are the structure for larger structured wholes. Consistent with Goldberg’s (2006: 18) characterization of linguistic organization, the multimodal usage event can be straightforwardly analyzed in Construction Morphology as consisting of “constructions all the way down”.

5 Conclusion

In this article, we have demonstrated that the theory of Construction Morphology can resolve two long-standing categorization problems in the field of sign language linguistics. These categorization problems arise as a consequence of the assumptions that linguists are accustomed to making in the course of analyzing sign language structure: when we assume that language is inherently compositionally structured, and complex utterances are built up procedurally from independently meaningful parts, we are led to the conclusion that any linguistic expression must either be compositionally structured, or a minimal building block itself. We have named this dilemma the Core vs. Classifier problem. When we assume that language and gesture constitute distinct categories, such that linguistic patterns are discrete and rule-governed, while gesture is holistic and idiosyncratic, we are led to the conclusion that any gradient aspects of signing must be considered gestural and non-linguistic, by definition. We have named this dilemma the Language vs. Gesture problem.

As a construction-theoretic approach to word-internal structure, Construction Morphology instead assumes that morphological schemas are abstractions of patterns over memorized complex words, exhibiting fixed as well as variable aspects of structure. Under this view, the fact that conventional signs may exhibit transparent, analyzable structure, and that everyday signing may involve both highly conventional and highly innovative utterances, is neither unexpected nor surprising. In a construction-based theory, recurring structurally complex expressions are expected to be associated with holistic meanings and functions, just as they are expected to participate in larger families of related constructions, and just as they are expected to exhibit analyzable internal structure. Our linguistic knowledge consists of a structured network of parts and wholes, in accordance with our daily experiences using our language(s).

This construction-theoretic perspective applies to the signs of sign languages, whose constructional representations may range from almost entirely formally fixed, in the case of well-entrenched, conventional signs, to almost entirely schematic, in the case of one-off sign tokens that occur naturally in everyday signing. This perspective also applies to multimodal communication more generally, which can be viewed as involving structured wholes which themselves display analyzable structure. Though there has been little contact between the fields of Construction Grammar and sign language linguistics, to date, we are confident that sign language linguists and Construction Grammarians alike will benefit from continued discussions of cross-linguistic variation, multimodal language use, and morphological transparency, moving forward.