1 Introduction

Inference has been an important research topic in both theoretical linguistics and statistics. Yet their traditional treatments differ considerably. Dominant theories in linguistics (pragmatics) propose machinery for how meanings are conveyed beyond truth-conditional semantics (e.g., Grice's work on conversational implicature [2]). In statistics, by contrast, inference is the process of estimating the values of the parameters of the relevant model. Collaboration between the two fields has apparently not been as successful as one might expect.

An example of the Japanese addressee-honorific marker (AHM) -mas is given in (1). The main job of -mas is to convey the speaker's respect for the addressee.

(1)  Watasi-wa  anata-o  sonkei   si-mas-en  yo.
     I-top      you-acc  respect  do-ah-neg  sfp
     (i)  I do not respect you;
     (ii) the speaker respects the addressee.

Simple as it may sound, this suffix poses several challenges to extant linguistic theories. We concentrate on two of them.

First, from the morphosyntactic perspective, -mas appears in an unexpected position. Discourse-oriented elements generally appear at the edge of a sentence, a generalization borne out by Korean and Thai AHMs. As (1) shows, however, -mas does not, which contradicts this generalization.

Second, its meaning needs careful treatment. The main message of (1) is “I do not respect you”, yet -mas apparently conveys the opposite. The sentence is nonetheless acceptable. This suggests that the meaning of -mas lies in a layer distinct from the descriptive meaning. But if so, what semantic representation (denotation) is appropriate for it?


In what follows, we see how the study answers these questions. The first problem is explained as a consequence of agreement. The second involves the multidimensionality of meaning. Neither agreement nor multidimensionality is new, but this study uses them in an extended fashion. In particular, it proposes a pragmatic model that incorporates insights from Bayesian statistics.

2 Morphosyntax

Regarding the (morpho)syntactic treatment of -mas, Miyagawa's (2012, 2017) work has been the most influential. Cross-linguistically, discourse-oriented expressions appear in the sentence periphery (the performative hypothesis). Respecting this generalization, he proposes the representation in (2). SAP and saP are speech act phrases, in which the speaker and the addressee are represented in the syntax, an assumption commonly adopted in the recent literature. He then identifies -mas with the C(omplementizer) head, which is assumed to move to sa.

(2) [Figure a: Miyagawa's structure with the speech act phrases SAP and saP; not reproduced]

By relating -mas to a head in the sentence periphery, he maintains the performative hypothesis. However, his analysis fails to capture the morpheme order. -mas is followed by (i.e., is structurally lower than) the negation marker. The negation marker is itself assumed to be lower than the Tense Phrase (TP), let alone the Complementizer Phrase (CP).

To avoid this undesired consequence, the dissertation advocates an agreement analysis. The structure in (2) is adopted, but unlike in Miyagawa's analysis, an honorific feature is placed on the addressee. This feature enters into an agreement relation with an honorific feature located inside TP. The phonological exponent -mas is analyzed as the realization of the latter (lower) feature, whereas the feature interpreted at Logical Form is the highest one. In this way, the basic generalization of the performative hypothesis is maintained.

[Figure b: the proposed agreement analysis; not reproduced]

This analysis yields further results. First, previous studies have analyzed subject- and object-honorific markers as instances of agreement [7]. The present analysis is thus a natural extension of extant theories of honorifics: all honorifics involve agreement, but they differ in their target. Second, as an instance of agreement, -mas is predicted to be phase-sensitive, i.e., it cannot be licensed when deeply embedded. As (4) shows, this prediction is borne out.


(4)
  a.  [kare-ga  ik-u-koto]-o       sitteiru.
       he-nom   go-prs-C-acc       know
  b. *[kare-ga  iki-mas-u-koto]-o  sitteiru.
       he-nom   go-ah-prs-C-acc    know
      ‘I know that he goes.’

Having addressed the major problem in morphosyntax, let us turn to the semantico-pragmatic issue: how the feature on the addressee is interpreted, and how it affects our inference.

3 Semantics and Pragmatics

In the semantics literature, honorificity has been treated as an instance of conventional implicature (CI). The most influential account of honorificity is the interval approach [4, 5, 10]. Its gist is summarized in (5).


(5)
  a. The honorific meaning lies in the expressive dimension (multidimensional semantics).
  b. The Origo and the Target must be identified.
  c. A sentence encodes an interval representing the honorificity of the given sentence.
  d. A context stores an interval representing the honorificity of the given context.
  e. The sentence contributes to updating the context's honorific state.

First, as (1) shows, honorific respect is orthogonal to the at-issue meaning (the main message). To explain this, multiple strata of meaning have been proposed (multidimensional semantics). Honorificity is located in the expressive dimension, distinct from the main content of the sentence.

Second, the respect-bearer (the Origo) and its target are identified. The performative hypothesis discussed in Sect. 2 is a syntactic attempt to relate honorificity to the speaker and the addressee. It naturally leads to the property in (5)b.

Third, the honorificity of a sentence is treated as emotional intensity and modeled as an interval. The denotation in (6) is an example of this. It acts as the identity function on the at-issue meaning, but it adds information about the honorific relation in the expressive dimension. Here, sp and addr are the speaker and the addressee, and I is the interval representing the honorific range (e.g., [0.75, 0.9] for a high honorific range).

(6) \(\llbracket \textsc {addressee}_{\textsc {hon}} \rrbracket = \lambda p.\ p\ \bullet <sp, I, addr>\)
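To make the two-dimensional shape of (6) concrete, here is a minimal Python sketch. It is not the formalism of the cited works. It assumes propositions are plain strings and models the • operator as a pair of an at-issue value and an expressive tuple. The names Expressive, TwoDimMeaning, and addressee_hon are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

Interval = Tuple[float, float]


@dataclass(frozen=True)
class Expressive:
    """Expressive-tier content <sp, I, addr>."""
    origo: str          # respect-bearer (here, the speaker)
    interval: Interval  # honorific range I
    target: str         # target of respect (here, the addressee)


@dataclass(frozen=True)
class TwoDimMeaning:
    """A pair standing in for 'p • <sp, I, addr>'."""
    at_issue: str
    expressive: Expressive


def addressee_hon(sp: str, interval: Interval, addr: str) -> Callable[[str], TwoDimMeaning]:
    """Hypothetical rendering of (6): identity on p, plus an expressive tuple."""
    def apply(p: str) -> TwoDimMeaning:
        return TwoDimMeaning(at_issue=p, expressive=Expressive(sp, interval, addr))
    return apply


# Example (1): the at-issue content is untouched; respect lives on the expressive tier.
meaning = addressee_hon("sp", (0.75, 0.9), "addr")("I do not respect you")
print(meaning.at_issue)             # I do not respect you
print(meaning.expressive.interval)  # (0.75, 0.9)
```

Keeping the two tiers in separate fields mirrors why (1) is acceptable: nothing in the at-issue value conflicts with the expressive tuple.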

Fourth, the context also stores an honorific interval. In the tradition of dynamic semantics/pragmatics, a context c is modeled as a structured set of subcomponents, one of which is this honorific interval. Let us call it p, so that \(c=<\cdots , p>\).

Finally, the context interval is updated on the basis of the honorific interval contributed by the sentence. Potts [10] and McCready [4, 5] differ in how the context interval is updated, but both assume that the update is local. Schematically, this is expressed by the formula in (7). The i-th context honorific state \(p_i\) is determined only by the previous state \(p_{i-1}\) and the honorific contribution \(h_i\) of the i-th utterance, and by nothing else. As shown in (8), the i-th context discards the previous state \(p_{i-1}\) and replaces it with the new state \(p_i\). Here, cg, qs, and tdl stand for the common ground, the question set, and the to-do list, respectively.

(7) \(p_i=p_{i-1}+h_i\)

(8)
  a. \(c_{i-1} = <cg_{i-1}, qs_{i-1}, tdl_{i-1}, \ldots , p_{i-1}>\)
     \(\downarrow \) update triggered by the i-th utterance
  b. \(c_i = <cg_i, qs_i, tdl_i, \ldots , p_i>\)
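The locality expressed by (7) and (8) can be sketched in Python under two simplifications:

- cg, qs, and tdl are carried over unchanged, although in (8) they may also change.
- The operation written "+" in (7), which Potts and McCready define differently, is replaced by a hypothetical placeholder, combine, that averages the interval endpoints.

The point of the sketch is only the signature: the new state is computed from \(p_{i-1}\) and \(h_i\) alone.

```python
from typing import NamedTuple, Tuple

Interval = Tuple[float, float]


class Context(NamedTuple):
    cg: frozenset   # common ground (left unchanged in this sketch)
    qs: tuple       # question set (left unchanged in this sketch)
    tdl: tuple      # to-do list (left unchanged in this sketch)
    p: Interval     # honorific state


def combine(p_prev: Interval, h: Interval) -> Interval:
    """Hypothetical placeholder for '+' in (7): average the endpoints.

    This is not the update rule of Potts or McCready; any function of
    (p_prev, h) alone would illustrate the same locality.
    """
    return ((p_prev[0] + h[0]) / 2, (p_prev[1] + h[1]) / 2)


def update(c: Context, h: Interval) -> Context:
    """c_{i-1} -> c_i: the old p is discarded and replaced by the new one."""
    return c._replace(p=combine(c.p, h))


c0 = Context(cg=frozenset(), qs=(), tdl=(), p=(0.5, 0.75))
c1 = update(c0, (0.75, 1.0))
print(c1.p)  # (0.625, 0.875)
```

Because update receives only the current context and h, no earlier state can influence the result. This is the property questioned in Sect. 3.1.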

Figure 1 graphically represents the main idea of the interval approach: every time an utterance is produced, a context change takes place.

Fig. 1. Interval-based approaches

Fig. 2. Cumulative effect

3.1 Problems of Previous Studies

Although it models the performative aspect of AHM use, the interval approach fails to capture two important properties of the inference system.

The first is the cumulative effect. The formula in (7) guarantees the locality of the update: \(p_{i-2}\), which is used to determine \(p_{i-1}\), is no longer used to determine \(p_{i}\). On the one hand, clearing memory seems a good property for a model of human inference, since we do not remember all past states. On the other hand, readers may reasonably wonder whether real-world inference is affected by states prior to \(p_{i-1}\), not entirely, but to some extent.

To articulate this doubt, let us compare the situations in Figs. 1 and 2. The intervals at \(p_i\) are identical. The interval approach therefore predicts that, given the same \(h_{i+1}\), both situations yield the same interval for \(p_{i+1}\). However, Fig. 2 has a history of low-range intervals preceding \(p_i\). Because of it, we want \(p_{i+1}\) in Fig. 2 to lie (at least slightly) lower than the one in Fig. 1. To reflect this difference, \(p_{i+1}\) must somehow be relativized to the accumulated history of the conversation. The problem could be circumvented by remembering all the states up to \(p_i\). The drawback is that the inference system would then face a memory load that grows with i, which is counterintuitive.

The second is the learnability issue. In the interval approach, an interval is assumed not only for the contextual information (p) but also for the semantics of an honorific expression (h). As a result, the denotation of an honorific must specify a precise interval, such as [0.5, 0.75]. But nothing justifies choosing this interval over another, for example [0.5, 0.749]. Such values appear to be proposed only for expository purposes. Without an external criterion, we cannot truly identify the semantics of honorificity; that is, the interval is not learnable. Furthermore, -mas is either present or absent, forming a binary system. Thus “we will need a theory of the relation between the simple grammatically encoded oppositions and the complex social world [9].”

3.2 From an Interval to a Set of Summary Parameters

We wish to treat the honorific update as a local change, specifying only the relation between \(p_i\) and \(p_{i+1}\). Yet we also want the update to be somewhat sensitive to the entire past conversation. To resolve this dilemma, the dissertation replaces (5)c-d with the following:

(9)
  a. A sentence encodes 1 or 0 for the honorificity of the given sentence.
  b. A context stores a set of summary parameters, reflecting the speaker's use of AHMs in the past conversation.

Rather than tracking the politeness range by repeatedly estimating an interval, we track the values of summary parameters. For example, let \(\alpha _i\) be the number of AHMs used prior to the i-th utterance, and let \(\beta _i\) be the number of utterances without an AHM. A simple model is \(p_i = (\alpha _i, \beta _i)\). The context update to \(p_{i+1}\) is then either \((\alpha _i + 1, \beta _i)\) (when the i-th utterance contains -mas) or \((\alpha _i, \beta _i + 1)\) (otherwise). Under this model, the denotation \(\llbracket \textsc {addressee}_{\textsc {hon}} \rrbracket \) no longer involves an unlearnable interval. Its expressive component is either \(<sp, 1, addr>\) or \(<sp, 0, addr>\). If we see 1, we update \(\alpha \); if we see 0, we update \(\beta \).
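The counting update can be sketched in a few lines of Python. SummaryState and update are hypothetical names. h is the binary honorific value of an utterance: 1 if it contains -mas, 0 otherwise.

```python
from typing import NamedTuple


class SummaryState(NamedTuple):
    alpha: int  # utterances with an AHM (-mas) so far
    beta: int   # utterances without an AHM so far


def update(p: SummaryState, h: int) -> SummaryState:
    """Local update: the next state depends only on p and the binary value h."""
    if h not in (0, 1):
        raise ValueError("h must be 1 (AHM present) or 0 (AHM absent)")
    return SummaryState(p.alpha + h, p.beta + (1 - h))


p = SummaryState(0, 0)
for h in [1, 0, 1, 1]:  # a hypothetical four-utterance conversation
    p = update(p, h)
print(p)  # SummaryState(alpha=3, beta=1)
```

Unlike the interval update, the input here is a single bit. The state, however, keeps a running tally of the conversation.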

How does this new model overcome the problems? First, regarding the cumulative effect, \(p_{i+1}\) is determined only by \(p_i\) and \(h_i\). Yet the magnitudes of \(\alpha \) and \(\beta \) summarize the past history (how many utterances did and did not contain an AHM). The previous states therefore contribute cumulatively and indirectly to determining \(p_{i+1}\). Second, regarding learnability, the proposed denotation involves no gradience; it is uniquely identified as 1 or 0.

Fig. 3. Board game pragmatics

3.3 From Summary Parameters to Statistical Learning

The idea of summary parameters arose from the search for an appropriate model in pragmatics, but it can also be interpreted beyond linguistics. Let \(\pi \) be the probability that the speaker uses -mas. The honorific value (1 or 0) of each utterance can then be modeled as a sample from \(Bernoulli (\pi )\). If a Beta distribution represents the audience's uncertainty about the speaker's \(\pi \), the summary parameters can be interpreted as the parameters of that distribution, \(Beta(\alpha , \beta )\). The Beta distribution is conjugate to the Bernoulli. Observing 1 therefore turns \(Beta(\alpha , \beta )\) into \(Beta(\alpha + 1, \beta )\), and observing 0 turns it into \(Beta(\alpha , \beta + 1)\), which is exactly the counting update of Sect. 3.2. On this view, the discourse update from \(p_i\) to \(p_{i+1}\) is a transition from the prior distribution \(Beta(\alpha _i, \beta _i)\) to the posterior distribution \(Beta(\alpha _{i+1}, \beta _{i+1})\). We can thus synthesize the pragmatic model with Bayesian statistics, two fields of research that have otherwise been disconnected.
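A short Python sketch shows the conjugate Beta-Bernoulli update and how it addresses the cumulative effect. It rests on one assumption not in the text. Beta(0, 0) is improper, so the counts start from a uniform prior Beta(1, 1) rather than from (0, 0). The helper names beta_posterior and posterior_mean are hypothetical.

```python
from fractions import Fraction
from typing import Iterable, Tuple


def beta_posterior(history: Iterable[int], a0: int = 1, b0: int = 1) -> Tuple[int, int]:
    """Conjugate Beta-Bernoulli updating.

    Starting from the prior Beta(a0, b0), each observation h in {0, 1}
    maps Beta(a, b) to Beta(a + h, b + 1 - h). Each step reads only the
    current (a, b) and h, like the summary-parameter update.
    """
    a, b = a0, b0
    for h in history:
        a, b = a + h, b + (1 - h)
    return a, b


def posterior_mean(a: int, b: int) -> Fraction:
    """Expected value of pi under Beta(a, b)."""
    return Fraction(a, a + b)


recent = [1, 1, 1, 1, 1]     # identical recent utterances, all with -mas
polite = [1] * 20 + recent   # long history of -mas
casual = [0] * 20 + recent   # long history without -mas (cf. the low-range history in Fig. 2)

print(posterior_mean(*beta_posterior(polite)))  # 26/27
print(posterior_mean(*beta_posterior(casual)))  # 2/9
```

The two conversations end identically, yet the expected \(\pi \) is lower after the casual history. This is the asymmetry between Figs. 1 and 2 that the interval approach could not express, obtained without storing the full history.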

Geometrically, the update of summary parameters can be seen as movement in a space. In particular, from the standpoint of information geometry, it is nothing more than a process of manifold learning [1]. As shown in Fig. 3 (left), every \(p_i\) corresponds to a unique position in a two-dimensional space. Each utterance moves us to an adjacent cell: from (0, 0) we can move to (1, 0), and from there to (1, 1). Figure 3 (right) shows the positions we may occupy after 20 steps from the origin A. With 20 non-AHMs we are at B, and with 20 AHMs we are at C. The distance from the origin (A) reflects the length of the past conversation. The board game metaphor is useful because it makes the model easy to visualize. For details, I refer the reader to the original dissertation.
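The board game can be simulated directly in Python. The sketch assumes the coordinates are \((\alpha , \beta )\) as in Sect. 3.2. Twenty non-AHMs thus lead to (0, 20), read here as B, and twenty AHMs lead to (20, 0), read here as C. It also reads "distance from the origin" as the taxicab distance \(\alpha + \beta \). The function names moves and reachable are hypothetical.

```python
from typing import List, Set, Tuple

Cell = Tuple[int, int]  # (alpha, beta): counts of AHM and non-AHM utterances


def moves(cell: Cell) -> List[Cell]:
    """The two adjacent cells reachable with one utterance."""
    a, b = cell
    return [(a + 1, b), (a, b + 1)]  # utterance with -mas / without -mas


def reachable(n: int, start: Cell = (0, 0)) -> Set[Cell]:
    """All cells reachable from `start` after exactly n utterances."""
    frontier = {start}
    for _ in range(n):
        frontier = {nxt for cell in frontier for nxt in moves(cell)}
    return frontier


cells = reachable(20)
print(len(cells))                          # 21
print((0, 20) in cells, (20, 0) in cells)  # True True
print({a + b for a, b in cells})           # {20}
```

After n steps the reachable cells form the antidiagonal \(\alpha + \beta = n\). Every position therefore encodes the conversation length exactly, while its place along the antidiagonal encodes the balance of AHM and non-AHM utterances.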

4 Theoretical Implications

Not all inferences are driven by linguistic cues. Many, however, arise from verbal communication, so studies of how inference relates to linguistic expressions are of great importance. Statistics has seen tremendous development, which is welcome in itself. But mathematical models alone do not tell us how inference is based on verbal cues. Linguists, for their part, have not attempted to incorporate statistical algorithms into their theories. It has therefore remained unclear how a linguistic element contributes to the statistical reasoning process, despite remarkable developments in engineering. By improving the discourse model proposed in the tradition of dynamic pragmatics, this dissertation situates statistical learning within pragmatic inference. It thereby opens a door to interaction between theoretical linguistics and statistics.

Needless to say, this study leaves considerable room for improvement. One necessary improvement is to provide a richer structure by linking \(\pi \) to a set of social and pragmatic factors (e.g., via probit/logistic regression). Honorific use is subject to psychological distance, social hierarchy, and formality [4, 5]. Building and comparing different statistical models on an appropriate data set would therefore be of theoretical importance and would fit easily into the tradition of NLP studies. Another future direction is to extend the synthesis beyond the dimension of honorificity. In classic dynamic pragmatics, context sets and other kinds of discourse information have been modeled in terms of possible worlds. A successful extension of the model is expected to yield a system that mimics human inference, as deep learning does, but in a non-black-box fashion. Such a system would substantially improve our understanding of the human cognitive system.
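As an illustration of the logistic option mentioned above, \(\pi \) could be tied to social and pragmatic factors as in the following minimal Python sketch. The factor names, their numeric coding, and the weights are all hypothetical and unfitted. With real data, the weights would have to be estimated, e.g., by maximum likelihood, and a probit link would replace the sigmoid with the standard normal CDF.

```python
import math
from typing import Dict


def pi_logistic(x: Dict[str, float], w: Dict[str, float], bias: float) -> float:
    """P(the speaker uses -mas) = sigmoid(bias + sum_k w_k * x_k)."""
    z = bias + sum(w[k] * x[k] for k in w)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical, unfitted weights for the factors named in the text.
weights = {"psychological_distance": 1.2, "social_hierarchy": 0.8, "formality": 1.5}

casual = {"psychological_distance": 0.0, "social_hierarchy": 0.0, "formality": 0.0}
formal = {"psychological_distance": 1.0, "social_hierarchy": 1.0, "formality": 1.0}

print(pi_logistic(casual, weights, bias=0.0))        # 0.5
print(pi_logistic(formal, weights, bias=0.0) > 0.5)  # True
```

The resulting \(\pi \) could then serve as the Bernoulli parameter of Sect. 3.3, conditioned on the social situation rather than treated as a single unknown.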