3.1 Introduction

It is generally accepted that the most basic use of language is in conversation or dialogue. Everyone who speaks can converse, whereas the ability to give a speech or even the ability to listen to one is difficult to acquire. Yet dialogue has never taken priority in the language sciences. Theoretical linguists analyze isolated sentences of the kind found in monologue. Until recently, this was also true for computational linguists. In turn, psycholinguists concentrate exclusively either on processes of language production or on processes of language comprehension without considering the relationship between the two.

By contrast, this chapter deals with dialogue processes and attempts to explain why interactive language use is so easy compared to speaking or listening on your own. It is not just that dialogue is basic. We argue that it may also tell us something about language learning and language change. Language is acquired through conversation. Furthermore, even adult conversationalists adapt their language to that of their partner. Hence, during dialogue language learning takes place all the time. In this respect there is a kind of continuity between childhood language acquisition and adult conversation. Continuous speaker adaptation may also help explain why languages are always historically changing. In this way language processes occurring in short-lived interactions may tell us something about language over a larger time-scale.

3.2 The Challenge of Conversation

Conversation involves an extremely complicated set of processes in which participants have to interweave their activities with precise timing, and yet it is a skill that all speakers seem very good at (Garrod and Pickering 2004). To understand how remarkable this is, consider this transcript of a dinner-party conversation (Tannen 1984), with brackets indicating overlapping speech and numbers indicating noticeable pauses in seconds:

  1. A: I shook hands with Rubenstein once? [and his hand
  2. B: [Yeah we did together
  3. A: That’s right. we were together. wasn’t it incredible?
  4. B: (laughing) oh it was like a cushion.
  5. C: What’s this?
  6. A: [I (0.5) we shook] hands with Rubenstein.
  7. B: [Rubenstein’s hands].
  8. D: and he had –?
  9. A: his hands –
  10. D: short stubby hands?
  11. A: they were like (0.5) [jelly. they were like – (1.0)
  12. B: [a famous concert pianist
  13. A: they were like (0.5) putty. (0.5)
  14. D: [really?
  15. A: [just completely soft and [limp
  16. B: [mush
  17. A: just mush. it was as though there was [no bone
  18. B: [and warm.
  19. D: and short stubby fingers?
  20. A: short stubby fingers but just (0.5) totally covered with
  21. B: fat.
  22. A: fat

This conversation differs greatly from formal prose (such as the rest of this chapter). In particular, the speakers regularly produce elliptical and fragmentary utterances that would make little sense on their own (e.g., 7, 10, 12, 16, 17, 20). It is jointly constructed by all four speakers, and involves a great deal of interruption, overlapping speech, and disfluency. However, the participants appear to be satisfied with the conversation. They seem to understand what everyone says, as do non-participants such as ourselves. How can this be?

The more we think about conversations such as this, the more remarkable they appear. The interlocutors cannot be sure what contributions their partners are going to make, so they cannot securely plan far in advance. They have to construct their utterances so that they are appropriate for their listeners at that particular point, and therefore must pay constant attention to any feedback (e.g., whether a particular term is understood). For example, B’s interruption at (2) causes A to abandon (1) and produce the appropriate response (3) on the fly. They have to decide whether to contribute to a conversation and if so precisely when they should do so, and they may have to decide who to address. In addition, they have to constantly switch between speaking and listening, even though task-switching is often difficult.

So why is dialogue so easy? We believe that the key to this question is found in its repetitiveness. Notice how the participants reuse each other’s words and expressions. For instance, consider the various repetitions of hands and Rubenstein in the conversation above (6, 7, 9, 10). Our central argument is that such repetitiveness is mirrored in the participants’ minds, so that they are replicating each other’s mental states and not merely their form of words. This is the core of what we shall call interactive alignment.

3.3 Interactive Alignment During Conversation

One argument for why conversation is so easy is that conversational partners tend to become aligned at different levels of linguistic representation and therefore find it easier to perform this joint activity than the individual activities of speaking or listening (Garrod and Pickering 2009). Pickering and Garrod (2004) explain the process of alignment in more detail in terms of their interactive-alignment account. According to this account, conversation is successful to the extent that participants come to understand the relevant aspects of what they are talking about in the same way as each other. More specifically, they construct mental representations or models of the situation under discussion, and successful conversation occurs when these models become aligned. Such alignment largely occurs as a result of the tendency for conversational partners to repeat each other’s choices at many different linguistic levels, such as words and grammar (e.g., Branigan et al. 2000; Brennan and Clark 1996; Garrod and Anderson 1987). This is a form of imitation. Essentially, conversational partners prime each other to speak about things in the same way, and people who speak about things in the same way are more likely to think about them in the same way as well (Box 1). In this way the language processing system makes a virtue out of what appears to be a vice, by coupling together speaking and listening processes.

The interactive alignment account has three implications for language processing in dialogue. First, it implies that there is parity of representations used in speaking and listening. Second, it depends on the idea that alignment processes operating at different levels (words, structure, meaning) interact in such a way that increased alignment at one level leads to increased alignment at other levels. Finally, it assumes that these alignment processes are based on imitation which is largely automatic.

Box 1: Evidence for linguistic imitation at many levels

Evidence for imitation is found in many language experiments. Interlocutors become aligned at many different linguistic levels simultaneously, almost invariably without any explicit negotiation. At the level of the situation model, interlocutors align on spatial reference frames: if one speaker refers to objects egocentrically (e.g., ‘on the left’ to mean on the speaker’s left), then the other speaker tends to use an egocentric perspective as well (Watson et al. 2004). More generally, they align on a characterization of the domain, for instance using coordinate systems (e.g., A4, D3) or figural descriptions (e.g., T-shape, right indicator) to refer to positions in a maze (Garrod and Anderson 1987; Garrod and Doherty 1994, see Box 4). They also repeat each other’s referring expressions, even when they are unnecessarily specific (Brennan and Clark 1996). Imitation also occurs for grammar, with speakers repeating the syntactic structure used by their interlocutors for cards describing events (Branigan et al. 2000, see Box 2 for details) or objects (Cleland and Pickering 2003), and repeating syntax or closed-class lexical items in question-answering (Levelt and Kelter 1982). They even repeat syntax between languages, for example when one interlocutor speaks English and the other speaks Spanish (Hartsuiker et al. 2004). Finally, there is evidence for alignment of sound representations (Pardo 2006), and of accent and speech rate (Giles et al. 1992).

3.3.1 Parity of Representations

A critical aspect of the alignment model is what we term parity of representations—the same representations are constructed during speaking and listening. In other words, language involves use of a common code for representing your own actions (your speech) and your partner’s actions (his or her speech). This explains why linguistic repetition occurred in experiments such as Branigan et al. (2000), who had participants take turns to describe and match picture cards, and found that they tended to use the form of utterance just used by their partner (Box 2). For example, they tended to use a “prepositional object” form such as the pirate giving the book to the swimmer following another prepositional object sentence but a “double object” form such as the pirate giving the swimmer the book following another double object sentence (though both sentences have essentially the same meaning). In such cases, the same grammatical representation is activated during speaking and listening.

Though the common-coding assumption may appear to follow from the reasonable claim that language users do not call upon different knowledge of language when speaking and listening, it is important to realize that traditional psycholinguistic theories of production and comprehension have largely developed in isolation from each other (see Fodor et al. 1974; Harley 2007). For example, theories of lexical representation during production (e.g., Levelt et al. 1999) are not used in theories of word recognition. Historically, this separation of the study of comprehension and production goes back to the idea that language can be thought of as a code. On this view, communication involves two distinct processes: encoding a message (language production) or decoding a signal to reveal the message (language comprehension). If one accepts such an account then it makes sense to study the production (encoding) process and the comprehension (decoding) process as distinct activities. However, this approach is not appropriate for understanding communication in dialogue (Garrod and Pickering 2007). During dialogue, production and comprehension processes become inextricably linked. Speakers need to interpret feedback from their addressees while speaking and addressees need to prepare appropriate responses (e.g., spoken feedback or subsequent responses to queries) while listening to the speaker. The most straightforward way of accounting for this interplay between production and comprehension processes is to assume close parity of linguistic representations underlying production and comprehension processes.

3.3.2 Percolation Between Levels of Alignment

Another important aspect of the interactive alignment account is that alignment at one level affects alignment at other levels. For example, alignment of syntactic structure is enhanced by repetition of words, with participants being even more likely to say The cowboy handing the banana to the burglar after hearing The chef handing the jug to the swimmer than after The chef giving the jug to the swimmer (Branigan et al. 2000). Thus, alignment at one level (in this case, lexical alignment) enhances alignment at another level (in this case, grammatical alignment). Similarly, people are more likely to use an unusual form like the sheep that’s red (rather than the red sheep) after they have just heard the goat that’s red than after they heard the door that’s red (Cleland and Pickering 2003). This is because the meaning of sheep is related to the meaning of goat but not door. So alignment at the semantic level increases syntactic alignment. Furthermore, alignment of words leads to alignment of situation models: people who describe things the same way tend to think about them in the same way too (Markman and Makin 1998). This means that alignment of low-level structure can eventually affect alignment at the crucial level of speakers’ situation models, the hallmark of successful communication.

Box 2: Confederate dialogue experiment to test for syntactic priming (Branigan et al. 2000)

A naïve participant and a confederate sat on opposite sides of a table with a divider between them. They took turns to describe cards to each other and to select the appropriate card from an array. For example, the confederate described a card as The chef giving the jug to the swimmer. After the participant selected the matching card, she tended to describe her next card as The cowboy handing the banana to the burglar. But if the confederate had described the card as The chef giving the swimmer the jug, the participant tended to say The cowboy handing the burglar the banana. Such repetition of syntactic form occurred on about 4 trials out of 6 when the confederate and the participant used different verbs. But when they both described cards with the same verb (e.g., handing), repetition occurred on about 5 trials out of 6 (Fig. 3.1).

Fig. 3.1 Schematic representation of the experimental set-up for Branigan et al. (2000) confederate scripted syntactic priming experiment

Fig. 3.2 Automatic channels of alignment for participants A and B during a conversation
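The repetition rates reported in Box 2 can be set side by side to show the size of the effect of verb overlap. The following is a minimal arithmetic sketch that uses only the approximate figures given in the box (about 4 in 6 and about 5 in 6). The function name lexical_boost is a label introduced here for illustration and is not taken from the original study.

```python
from fractions import Fraction

# Approximate repetition rates as reported in Box 2 (Branigan et al. 2000).
different_verb = Fraction(4, 6)  # confederate and participant used different verbs
same_verb = Fraction(5, 6)       # confederate and participant used the same verb


def lexical_boost(same: Fraction, different: Fraction) -> Fraction:
    """Extra syntactic repetition observed when the verb is also repeated.

    Hypothetical helper name, introduced only for this illustration.
    """
    return same - different


boost = lexical_boost(same_verb, different_verb)
print(f"different verb: {float(different_verb):.2f}")
print(f"same verb:      {float(same_verb):.2f}")
print(f"difference:     {float(boost):.2f}")
```

On these figures, repeating the verb is associated with syntactic repetition on roughly one additional trial in six, which is the pattern described in Sect. 3.3.2 as lexical alignment enhancing grammatical alignment.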

3.3.3 Automatic Channels of Alignment

An important property of interactive alignment is that it is automatic in the sense that speakers are not aware of the process and that it does not appear effortful. Such automatic imitation or mimicry occurs in social situations more generally. Thus, Dijksterhuis and Bargh (2001) argued that many social behaviours are automatically triggered by perception of action in others (Box 3). We propose that the automatic alignment channels linking different levels of linguistic representation operate in essentially the same fashion (see Fig. 3.2). In other words, conversationalists do not need to decide to interpret the different levels of linguistic representation carried by alignment channels for them to influence alignment (Garrod and Pickering 2006). This is because the alignment channels reflect priming rather than interpretation. In addition there are aspects of automatic non-linguistic imitation that can facilitate alignment at linguistic levels (Garrod and Pickering 2009). For example, when speakers and listeners align their gaze to look at the same thing this can facilitate alignment of interpretation (Richardson and Dale 2005; Richardson et al. 2007).

Box 3: Automatic perception-action links during social interactions

Automatic perception–action links are well documented in the neurophysiological literature (e.g., motor imitation arising from the firing of mirror neurons in monkey premotor cortex; Rizzolatti and Craighero 2004) and in the psychological literature (Hommel et al. 2001). There is evidence for automatic links in controlling facial expressions, movements and gestures, and speech. For example, when observing another person experiencing a painful injury and wincing, observers imitate the wince in their own expression (Bavelas 1986). Similarly, participants will mimic postures such as foot shaking and nose rubbing carried out by a person with whom they are conversing (Chartrand and Bargh 1999) and when they repeat another’s speech they adopt the other’s tone of voice as well (Neumann and Strack 2000). Finally, conversational partners align their posture (Shockley et al. 2003).

3.4 Alignment and Routinization

The interactive alignment account gives a basic mechanism for alignment of understanding during dialogue. But it may also have a bearing on both the acquisition of language and the process of language change. To understand this, we need to consider the interactive alignment process in more detail. In particular, we need to consider how it works on two time scales. First, there is alignment based on short-term co-activation of representations at various linguistic levels. This comes about through priming, whereby there is a boost in activation of relevant representations (e.g., for words or for syntactic structures) following exposure to their corresponding forms. Second, there is longer term alignment arising from the repeated co-activation of different representations. This longer term process we refer to as routinization.

As we have already noted, real conversation is extremely repetitive, and the comparison with carefully crafted monologue (as in texts) is very striking indeed (Tannen 1989). Pickering and Garrod (2004) argued that expressions that are repeated become routines for the purposes of the dialogue. By routine we mean an expression that is “fixed” to a relatively large extent. Extreme examples include repetitive conversational patterns such as How do you do? and Thank you very much. Many examples are idioms, such as kick the bucket (where all the words are fixed) or keep (lose) one’s cool (where some words are fixed but others can vary). However, many common expressions such as I love you have literal interpretations.

Groups of people may develop particular types of routine, perhaps in order to aid their fluency. Kuiper (1996) described the fixed language used by auctioneers and sportscasters. For example, radio horseracing commentators produce highly repetitive and stylized speech which is quite remarkably fluent. He argued that the commentators achieve this by storing routines, which can consist of entirely fixed expressions (e.g., they are coming round the bend) or expressions with an empty slot that has to be filled (e.g., X is in the lead), in long-term memory, and then accessing these routines, as a whole, when needed. Processing load is thereby greatly reduced in comparison to non-routine production. Of course, this reduction in load is only possible because particular routines are stored; and these routines are stored because the commentators repeatedly produce the same small set of expressions in their career.

Most discussion of routines refers to the long-term development of fixed expressions that come to behave like words (e.g., Aijmer 1996; Kuiper 1996; Nunberg et al. 1994; Bybee 2006). But routines may also be established for the purposes of a particular interchange. If one speaker starts to use an expression and gives it a particular meaning, the other will most likely follow suit. In other words, routines are set up ‘on the fly’ during conversation. We propose that the use of routines contributes greatly to the fluency of conversation. For example, Pickering and Garrod (2004) give the example the previous administration, which can take on a specific meaning (referring to a particular political body) as part of a conversation, and where other interpretations of the individual words (e.g., administration meaning work) or of the expression as a whole (e.g., referring to a different political body) are not considered. The establishment of this form of words and meaning as a routine has the effect that interlocutors access it without seriously considering alternatives. In production, they do not make a difficult choice between using the word administration or its near-synonym government; and in comprehension, they do not consider (non-routinized) interpretations of the words (e.g., of administration). After the conversation is over, however, the interlocutors may ‘drop’ this routine and return to their ‘standard’ use of the words.

Conversational routines can be elicited experimentally. Consider the brief transcript of an interaction (shown in Box 4) in which A and B are trying to establish their respective positions in a maze. In particular, the expression right indicator takes on a specific meaning (referring to a particular configuration within mazes). Once the players have fixed on this expression and interpretation, they do not describe the configuration in alternative ways. Although we can be less certain of what happens during comprehension, the responses to references to right indicator strongly suggest that they also understand the expression in its special sense.

Pickering and Garrod (2005) drew a distinction between short-term interactive alignment and routinization. Interactive alignment involves the priming of particular levels of representation and the links between those levels. Producing or comprehending any utterance leads to the activation of those representations, but their activation gradually decays. However, when interactive alignment leads to sufficiently strong activation of the links between the levels, routinization occurs. Routinization involves the setting down of new memory traces associated with a particular expression, so that the expression becomes lexicalised. A formal approach compatible with this is found in Jackendoff (2002), who argues that lexical entries consist of linked components concerned with meaning, sound structure (phonology), and syntax. For example, the word indicator would consist of a sound representation (in phonemes) linked to a syntactic representation (Noun) linked to a conceptual representation (POINTING DEVICE). This scheme can be extended to account for complex lexicalisations such as right indicator or kick the bucket.
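The two time scales described above can be illustrated with a toy model: each use of an expression boosts the activation of the links between its components, that activation decays between uses, and a long-term trace is laid down once activation is strong enough. This is only a sketch under stated assumptions. The class LexicalEntry, the function step, and the values of BOOST, DECAY and THRESHOLD are all invented for illustration; the chapter does not specify any quantitative model.

```python
from dataclasses import dataclass


@dataclass
class LexicalEntry:
    """Linked components of an expression, loosely after Jackendoff (2002)."""
    phonology: str
    syntax: str
    meaning: str
    link_activation: float = 0.0  # strength of the links between components
    routinized: bool = False      # True once a long-term trace has been set down


# Illustrative parameters only; none of these values come from the chapter.
BOOST = 1.0      # activation added each time the expression is used
DECAY = 0.8      # proportion of activation retained from one turn to the next
THRESHOLD = 3.0  # activation at which the expression becomes a routine


def step(entry: LexicalEntry, used: bool) -> LexicalEntry:
    """Advance one conversational turn: decay, then prime if the expression is used."""
    entry.link_activation *= DECAY
    if used:
        entry.link_activation += BOOST
    if entry.link_activation >= THRESHOLD:
        entry.routinized = True  # routinization persists even if activation later decays
    return entry


# An expression used on every turn versus one used only on alternate turns.
dense = LexicalEntry("right indicator", "NP", "particular maze configuration")
for _ in range(5):
    step(dense, used=True)

sparse = LexicalEntry("right indicator", "NP", "particular maze configuration")
for turn in range(20):
    step(sparse, used=(turn % 2 == 0))

print("dense use routinized: ", dense.routinized)
print("sparse use routinized:", sparse.routinized)
```

With these assumed parameters, sustained repetition pushes activation over the threshold, whereas occasional use is offset by decay and never does. The particular numbers carry no empirical weight; they serve only to make the distinction between short-term priming and routinization concrete.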

Pickering and Garrod (2005) argued that routines are not simply recovered from long-term memory as complete chunks (in contrast to Kuiper 1996, for example). They enumerated various reasons to suspect that producing routines involves some compositional processes. First, it can straightforwardly explain how people produce semi-productive routines with a variable element, as in take X to task, where X can be any noun phrase referring to a person or people. Second, the structure of non-idiomatic sentences can be primed by idiomatic sentences in production (Konopka and Bock 2009). Third, it is consistent with the production of idiom blends like That’s the way the cookie bounces (Cutting and Bock 1997). There is also evidence for syntactic processing of routines in comprehension. For example, syntactically appropriate continuations to phrases are responded to faster than syntactically inappropriate ones when the phrase is likely to be the beginning of an idiom (e.g., kick the) (Peterson et al. 2001). We now consider the implications of routinization for language acquisition and language change.

Box 4: History of a conversational routine

Below is an extract from a maze-game dialogue taken from Garrod and Anderson (1987), and which relates to the figure below. When B says It’s like a right indicator (11), the expression right indicator is not a routine, but is composed of two expressions whose interpretations are relatively standard, and whose meaning involves normal processes of meaning composition. So, B accesses the standard meanings of the words right and indicator and creates a phrase with the standard meaning. Importantly, however, B does not simply use right indicator to refer to any object that can be referred to as a right indicator, but instead uses it to refer to a particular type of object that occurs within this maze. A accepts this description with yes (12), presumably meaning that he has understood B’s utterance correctly. He interprets B’s utterance at this stage using the normal processes of meaning corresponding to the compositional processes that B has used in production. The expression right indicator now keeps recurring, and is used to refer to positions in the maze. Whereas initially it was used as part of a simile [it’s like a right indicator in (11)], subsequently it is used referentially [that right indicator you’ve got in (13)].

  8. A: You know the extreme right, there’s one box.
  9. B: Yeah right, the extreme right it’s sticking out like a sore thumb.
  10. A: That’s where I am.
  11. B: It’s like a right indicator.
  12. A: Yes, and where are you?
  13. B: Well I’m er: that right indicator you’ve got.
  14. A: Yes.
  15. B: The right indicator above that.
  16. A: Yes.
  17. B: Now if you go along there. You know where the right indicator above yours is?
  18. A: Yes.
  19. B: If you go along to the left: I’m in that box which is like: one, two boxes down O.K.?

[Figure a: the maze to which the extract above relates]

How does such routinization occur? Pickering and Garrod (2005) proposed that the activation of right and indicator plus the specific meaning that right indicator has acquired leads to the activation of the phonological representation and syntactic representation together with the activation of the specific meaning (“right-hand-protrusion-on-maze”). Therefore the links among the phonology, syntax and semantics are activated (as specified in the interactive alignment model). That increases the likelihood that the interlocutors are going to subsequently use right indicator with that specific meaning. But in addition to this basic interactive-alignment process, the activation of the links “suggests” the positing of a new long-term association, essentially that right indicator can have the meaning “right-hand-protrusion-on-maze”. We propose that when activation is strong enough, a new lexical entry (similar to a word) is constructed and stored in memory as a routine.

3.4.1 Routinization and Language Learning

So far we have focused on the establishment of temporary routines for the purpose of a particular interchange. This appears to be an important and almost entirely neglected aspect of language use. But routines need not be ‘dropped’ once the conversation is over. When a routine is retained, the new lexical entry remains in the speaker’s lexicon.

In fact, experimental evidence suggests that routines do extend beyond the particular interchange. Garrod and Doherty (1994) had people play the maze game (see Box 4) with different partners. When all members of a group played with each other (e.g., A with B, C with D, then A with C, B with D, then A with D, B with C), they converged on description schemes (consisting of both fixed and semi-productive routines) to a much greater extent than when participants played with members of a different group on each interchange (e.g., A with B, C with D, A with C, A with E, B with F). In other words, interlocutors who formed a ‘network’ converged to a much greater extent than those who did not (and indeed converged more than those who played repeatedly with the same partner). This shows that they converged on description schemes that lasted beyond one interchange, and hence that the routinization of the schemes persisted. (Interestingly, this same convergence can be demonstrated for non-linguistic graphical communication among groups of communicators, see Box 5).

Box 5: Group convergence during graphical communication

Garrod et al. (2007) developed a non-linguistic communication task to study the emergence of novel graphical signs. The task was a laboratory version of the popular parlour game ‘Pictionary’. Participants would take turns to draw pictures of concepts drawn from a list in such a way that their partner could identify the concept from the same list. The process was then repeated over a number of blocks (within each block participants communicated 12 items from a list of 16). In the original version of the task Garrod et al. (2007) found that with repetition the drawings for the same item became increasingly simple and abstract and the two participants would end up depicting a given concept in the same way as each other (see bottom right panel in Fig. 3.3). Fay et al. (2010) developed a community version of this experiment similar to Garrod and Doherty’s (1994) community maze game study. Groups of 8 players carried out the ‘Pictionary’ task in successive pairs involving 7 rounds of play. Each round consisted of 6 blocks of trials with a new partner drawn from the same group. In this way, by the end of the experiment each member of the group had interacted graphically with each other member. The top panel of Fig. 3.3 shows the drawings from one group of players for one item (Brad Pitt). On the top left of the figure are drawings taken from the beginning of the first round for each of the original pairs and on the top right are drawings taken from the beginning of the final round. Whereas the original drawings are complex and varied, the final drawings are simple and homogeneous. This suggests that interactive communication with members of a closed community leads to the evolution of a common representation whether it be a linguistic or a non-linguistic one.

Fig. 3.3 Drawings of ‘Brad Pitt’ elicited by the ‘Pictionary’ task (Fay et al. 2010, see Box 5). The top left panel shows drawings from community pairs in the first round (1 and 2, 3 and 4, etc. before the community has been established), the top right panel shows drawings from the same individuals in the final round. The bottom panels show drawings from matched isolate pairs in the first and final rounds of the task
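The community designs described above (Garrod and Doherty 1994; Fay et al. 2010) require that every member of a group eventually interacts with every other member, for instance 8 players over 7 rounds. One way to generate such a schedule is the standard round-robin "circle method", sketched below. This is an illustration only: the function name round_robin and the player labels are assumptions, and the original studies are not reported here as having used this particular procedure or round order.

```python
from typing import List, Sequence, Tuple


def round_robin(players: Sequence[str]) -> List[List[Tuple[str, str]]]:
    """Return rounds of pairings in which every player meets every other exactly once.

    Uses the circle method and assumes an even number of players.
    """
    if len(players) % 2 != 0:
        raise ValueError("expected an even number of players")
    order = list(players)
    half = len(order) // 2
    rounds = []
    for _ in range(len(order) - 1):
        # Pair the first with the last, the second with the second-to-last, and so on.
        rounds.append([(order[i], order[-1 - i]) for i in range(half)])
        # Keep the first player fixed and rotate everyone else by one position.
        order = [order[0], order[-1]] + order[1:-1]
    return rounds


# Eight players, as in the community version of the 'Pictionary' task.
schedule = round_robin(list("ABCDEFGH"))
for number, pairs in enumerate(schedule, start=1):
    print(f"round {number}: " + ", ".join(f"{a} with {b}" for a, b in pairs))
```

With 8 players the schedule has 7 rounds of 4 pairs, covering all 28 possible pairings, which matches the property that by the end of the experiment each member had interacted with each other member.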

Garrod and Doherty (1994) showed that interlocutors who did not come from the same community failed to converge. In terms of our current proposal, this occurred because of a clash between routinization and priming: one participant’s routinized lexical entries may not match with the priming that occurs as a result of the other participant using a different lexical entry. In other words, if A has routinized a particular expression with partner B (e.g., right indicator) and now encounters partner C from a different community, then A’s routines will not correspond to C’s routines (e.g., C might have routinized T on its side for the same maze configuration). As a consequence, after encountering a number of different partners from different communities, each interlocutor’s tendency to use different routines will get in the way of the short-term priming process.

This suggests that the establishment of routines can be equated with the processes that take place during language learning. In particular, the process by which children set down representations for novel words and expressions may be akin to routinization. However, we need to explain why routinization might lead to large-scale vocabulary acquisition, when it clearly extends adults’ store of expressions to a much more limited extent.

Of course, children encounter new words much more often than adults. But in addition, young children are much more set up to accept novel pairings between form and meaning (and grammar, though we ignore this here) than adults. In other words, the links between the components of linguistic representations are particularly strong. This can be seen in the strong tendency children have to avoid synonyms (e.g., Clark 1993). For example, if a young child refers to particular footwear as boots she will tend not to accept the term shoes to refer to the same objects. This is compatible with a particularly strong link being set up between the word and a particular meaning. Garrod and Clark (1993) found that children (aged 7–8 years) playing the maze game (Box 4) would converge on referring expressions and description schemes to refer to maze positions to at least as great an extent as adults. But they were much less happy than adults to abandon those referring schemes when it became clear that they were leading to misunderstanding. Garrod and Clark interpreted this result as showing that the natural tendency for the children is to converge (as predicted by interactive alignment) and it is only as they mature that they are able to inhibit this tendency when required to do so.

Such commitment to particular form-meaning pairings is efficient both for processing and acquisition. For processing, it means that the space of alternatives that the child has to consider is rapidly reduced. But it has the difficulty that it reduces the ability of the child to express a wider range of concepts (assuming that synonyms can have slight differences in meaning, or can have differences imposed for particular interchanges) and to comprehend the full range of meanings that a speaker expresses. These problems do not of course matter so much if the speaker (the “parent”) is aware of the child’s limitations, and (for instance) employs a limited vocabulary. For acquisition, if novel lexical items follow from the fixation of form-meaning pairings, then children will establish new routines more easily than adults. If a child hears right indicator being used to refer to a bit sticking out from a maze, then she will establish the link between right indicator and its meaning in such a way that she will be unable to accept another term to refer to the same thing. We have argued that this occurs in adults too, but the assumption is that adults can abandon such conventions more straightforwardly than children. This means that adults’ conversation is more flexible than children’s, but that the establishment of novel items is more straightforward for children.

3.4.2 Routinization and Language Change

Moving to a larger time-scale, languages undergo historical change. Expressions come into the language, drop out of it, and may change as a consequence of usage (Labov 1994; see also Croft, this volume). Can interactive alignment and routinization tell us anything about this process?

A key issue in the study of language change is explaining how changes in the language can spread within and across generations of speakers. Kirby (1999) refers to this as the problem of linkage. In biological evolution, linkage occurs through the inheritance of genes from one generation to the next. The traditional linguistic analogy is to explain linkage through the passing down of a language from one generation to the next during its acquisition. It is then assumed that language change is determined by constraints (which Kirby (1999) calls the linguistic bottleneck) that apply to the language learning mechanism (see also Kirby, this volume). However, interactive alignment and routinization offer an alternative linkage mechanism associated with language use. In the same way that experimental communities of speakers establish their own routines over the course of repeated interactions, so real communities of speakers can establish and maintain routines as well. Hence, one kind of language variation is found in what Clark (1996) calls communal lexicons: particular sets of expressions associated with different communities. For example, skiers talk of piste, physicists talk of quarks, and statisticians have a special interpretation of significance and normal distribution. This kind of variation would be expected if each community establishes its own routines.

As we have argued, routines can be considered lexicalisations, bits of language stored and accessed directly from memory. One important topic in the study of language change is the emergence and maintenance of simple and complex lexicalisations. Take, for example, the process of grammaticalization, in which lexical elements increasingly take on grammatical functions. A good example of this is the evolution in English of the complex future auxiliary going to from the simple lexical verb of motion going, which may even become reduced to the simple gonna (Hopper and Traugott 1993). This historical process follows a similar pattern to that of routinization in dialogue. Initially, an expression takes on a contextually determined interpretation (in this case with reference to a future action presumably involving motion). This expression-meaning mapping then becomes fixed and eventually generalizes to other analogous future actions that do not involve motion. As soon as it becomes fixed in this way, it becomes a routine which can be reduced like any other lexicalisation with repeated usage (e.g., becoming the simple lexical item gonna). The important distinction between this account of language change and the more traditional acquisition-based account is that the evolutionary process arises from usage rather than from constraints on learning, because the linkage is through interactive alignment and routinization. For a more detailed discussion of how frequency of usage relates to processes of grammaticalization, we refer the reader to Bybee (2006).

Another evolutionary phenomenon in English concerns the steady loss of irregular verb forms. Here the problem is somewhat different from that of the going to auxiliary. Over the years, irregular past tense verbs such as mown have been replaced by their regular counterparts in English (mowed). Interestingly, this regularization process is sensitive to the frequency of use of the verb, with recent research suggesting that verbs regularize at a rate that is inversely proportional to the square root of their usage frequency (Lieberman et al. 2007). How can this be explained? If we consider irregular expressions as lexicalised routines, this may help to explain the circumstances in which they are lost. On our account, speakers use routines because they can be accessed directly from memory, thereby bypassing the complex decisions of non-routine language production. However, this is only beneficial if the routine is readily accessible (see Wonnacott, this volume, for discussion of dual-route models of production). In other words, if accessing the routine (e.g., mown) takes longer than formulating the full form (e.g., \(\text{MOW} + \text{-ED}\)), or if speakers fail to access it at all on occasion, then it will fall out of use to be replaced by the non-routine regular form. Again, Bybee (2006) gives a detailed account of how the process of regularization can be explained in terms of the probability of retrieving stored representations.
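
The square-root relationship reported by Lieberman et al. (2007) can be made concrete with a short worked relation. This is an illustrative sketch only: the symbols \(r\) (the rate at which a verb regularizes) and \(f\) (its usage frequency) are assumed notation introduced here, not notation taken from the cited study.

\[
r(f) \propto \frac{1}{\sqrt{f}}, \qquad \frac{r(f/100)}{r(f)} = \sqrt{100} = 10
\]

Read this way, a verb used 100 times less often than another would be expected to regularize about 10 times faster. This is consistent with the idea that rarely used irregular forms, being the hardest to retrieve from memory, are the first to be lost.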

3.5 Summary and Conclusions

We began the chapter with the observation that taking part in a conversation is more straightforward than speaking or listening in isolation, despite the apparent complexity of the process. We went on to explain this paradox in relation to an account of dialogue processing called interactive alignment. Interactive alignment arises from automatic priming processes that link production with comprehension and vice versa. The essential notion is that people prime each other to use similar expressions at many linguistic levels simultaneously. This kind of alignment of speaking leads in turn to alignment at the level of deeper representations including the situation model adopted by the conversational partners. Because such alignment of situation models is the hallmark of successful communication, the interactive alignment process, operating during dialogue, greatly facilitates communication.

Interactive alignment also enables conversational partners to adapt to each other. Such adaptation happens both at a local level, with speakers and listeners adopting each other’s grammar and meaning in consecutive utterances, and over longer time-scales. The longer-term alignment occurs through a process of routinization, with speakers and listeners creating routines or partially frozen expressions. We argued that this longer-term alignment may be a central mechanism both for the acquisition of language and for processes of historical language change.