
1 Introduction

In this paper, we present a further development of the analytic tableau system for natural logic introduced by Muskens in [12]. The main goal of [12] was to initiate a novel formal method of modeling reasoning over linguistic expressions, namely, to model reasoning in a signed analytic tableau system that is fed with Lambda Logical Forms (LLFs) of linguistic expressions. There are three straightforward advantages of this approach:

  (i) since syntactic trees of LLFs roughly describe the semantic composition of linguistic expressions, LLFs resemble surface forms (which is characteristic of natural logic); hence, obtaining LLFs is easier than translating linguistic expressions into some logical formula, where problems of the expressiveness of the logic and of proper translation come into play;

  (ii) the approach captures an inventory of inference rules (where each rule is syntactically or semantically motivated and is applicable to particular linguistic phrases) in a modular way;

  (iii) the model-searching nature of the tableau method and the freedom of choice in rule application strategies seem to enable us to capture the quick inferences that humans draw over linguistic expressions.

The rest of the paper is organized as follows. First, we start with the syntax of LLFs and show how terms of syntactic and semantic types can be combined; then we briefly discuss a method of obtaining LLFs from surface forms, as we aim to develop a wide-coverage natural tableau system (i.e. a tableau prover for natural logic). The combination of automatically generated LLFs and an implemented natural tableau prover makes it easy to extract a relevant set of inference rules from the data used in textual entailment challenges. Finally, we present the performance of the prover on several training data sets. The paper concludes with a discussion of further research plans.

Throughout the paper we assume basic familiarity with the tableau method.

2 Lambda Logical Forms

The analytic tableau system of [12] uses LLFs as logical forms of linguistic expressions. They are simply typed \(\lambda \)-terms with semantic types built upon the atomic types \(\{e, s, t\}\). For example, in [12] the LLF of no bird moved is (1), which is a term of type st.Footnote 1 As we aim to develop a wide-coverage tableau for natural logic, using only terms of semantic types does not seem to offer an efficient and elegant solution. Several reasons for this are given below.

$$\begin{aligned}&(\mathtt{no}_{(est)(est)st} \; \mathtt{bird}_{est}) \; \mathtt{moved}_{est} \end{aligned}$$
(1)
$$\begin{aligned}&(\mathbf{no}_{\mathtt{n},(\mathtt{np},\mathtt{s}),\mathtt{s}} \; \mathbf{bird}_{\mathtt{n}}) \; \mathbf{moved}_{\mathtt{np},\mathtt{s}} \end{aligned}$$
(2)

First, using only terms of semantic types would violate advantage (i) of the approach. This becomes clear when one tries to account for event semantics properly in LLFs, as it requires introducing an event entity and the (existential) closure operators of [3, 14], which do not always have a counterpart on the surface level.

Second, semantic types provide little syntactic information about terms. For instance, bird \(_{est}\) and moved \(_{est}\) are both of type est in [12], hence there is no straightforward way to find out their syntactic categories. Furthermore, a term \(M_{(est)est} H_{est}\) can stand for adjective and noun, adverb and intransitive verb, or even noun and complement constructions. The lack of syntactic information about a term makes it impossible to select the correct tableau rule to apply to the term, i.e. it is difficult to meet property (ii). For example, for \(A_{est} B_e\), it would be unclear whether to use a rule for an intransitive verb that introduces an event entity and a thematic relation between the event constant and \(B_e\); or, for \(M_{(est)est} H_{est}\), whether to use a rule for adjective and noun or for noun and complement constructions.Footnote 2

Finally, a sentence generated from an open branch of a tableau proof can explain why an entailment fails, but we lose this option if we keep only semantic types, since it is not clear how to generate a grammatical sentence using information about semantic types alone.Footnote 3

In order to overcome the lack of syntactic information and keep LLFs similar to surface forms, we incorporate syntactic types and semantic types in the same type system. Let \({\mathcal {A} = \{e,t,\mathtt{s},\mathtt{np},\mathtt{n},\mathtt{pp}\}}\) be the set of atomic types, where \(\{e,t\}\) and \(\{\mathtt{s},\mathtt{np},\mathtt{n},\mathtt{pp}\}\) are the sets of semantic and syntactic atomic types, respectively. The choice of these particular syntactic types is motivated by the syntactic categories of Combinatory Categorial Grammar (CCG) [13]. In contrast to the typing in [12], we drop the semantic type s for states for simplicity reasons. Let \(\mathcal {I_A}\) be the set of all types, where complex types are constructed from atomic types in the usual way, e.g., \({(\mathtt{np},\mathtt{np},\mathtt{s})}\) is a type for a transitive verb. A type is called semantic or syntactic if it is constructed purely from semantic or syntactic atomic types, respectively; there are also types that are neither semantic nor syntactic, e.g., \({ee\mathtt{s}}\). After extending the type system with syntactic types, in addition to (1), (2) also becomes a well-typed term. For better readability, hereafter we use a boldface style for lexical constant terms with syntactic types.

The interaction between syntactic and semantic types is expressed by a subtyping relation (\(\sqsubseteq \)) that is a partial order, and for any \( \alpha _1,\alpha _2,\beta _1,\beta _2 \in \mathcal {I_A}\):

  (a) \( e \sqsubseteq \mathtt{np},~~ \mathtt{s}\sqsubseteq t,~~ \mathtt{n}\sqsubseteq et,~~ \mathtt{pp}\sqsubseteq et\);

  (b) \((\alpha _1,\alpha _2) \sqsubseteq (\beta _1,\beta _2)\)  iff  \(\beta _1 \sqsubseteq \alpha _1\) and \(\alpha _2 \sqsubseteq \beta _2\).

The introduction of subtyping requires a small change in the typing rules, namely: if \(\alpha \sqsubseteq \beta \) and A is of type \(\alpha \), then A is of type \(\beta \) too. From this new clause it follows that a term \(A_\alpha B_\beta \) is of type \(\gamma \) if \(\alpha \sqsubseteq (\beta ,\gamma )\). Therefore, a term can have several types, which are partially ordered with respect to \(\sqsubseteq \), with least and greatest types. For example, a term love \(_{\mathtt{np},\mathtt{np},\mathtt{s}}\) is also of type eet (and of five other types too, where (np,np,s) and eet are the least and greatest types, respectively). Note that every atomic syntactic type is a subtype of some semantic type; the only exception is \(e \sqsubseteq \mathtt{np}\), where the direction is reversed. The latter relation, besides allowing relations like \((\mathtt{np},\mathtt{s}) \sqsubseteq et\), also makes sense if we observe that any entity can be expressed as a noun phrase (even event entities, e.g., singing \(_{\text{ NP }}\) is difficult).
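To make these definitions concrete, the following is a minimal Python sketch of the subtyping check (clauses (a) and (b)) and of the extended application typing; the tuple encoding of types and all helper names are ours and purely illustrative, not the prover's implementation.

```python
# A minimal sketch of the subtyping relation over types encoded as strings
# ('e', 't', 's', 'np', 'n', 'pp') and nested tuples; illustrative only.

def curry(t):
    """(np, np, s) abbreviates (np, (np, s)); expand such n-ary tuples."""
    if isinstance(t, str):
        return t
    if len(t) == 2:
        return (curry(t[0]), curry(t[1]))
    return (curry(t[0]), curry(t[1:]))

def subtype(a, b):
    """Return True if a ⊑ b."""
    a, b = curry(a), curry(b)
    if a == b:
        return True
    if (a, b) in {("e", "np"), ("s", "t")}:      # clause (a), atomic cases
        return True
    if a in ("n", "pp"):                         # n ⊑ et and pp ⊑ et, plus transitivity
        return subtype(("e", "t"), b)
    if isinstance(a, tuple) and isinstance(b, tuple):
        # clause (b): contravariant in the argument, covariant in the result
        return subtype(b[0], a[0]) and subtype(a[1], b[1])
    return False

# love_(np,np,s) is also of type eet:
assert subtype(("np", "np", "s"), ("e", "e", "t"))
# love mary (of type (np,s)) can be applied to c_e, yielding type t,
# since the new typing clause only requires (np,s) ⊑ (e,t):
assert subtype(("np", "s"), ("e", "t"))
```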

Now with the help of this multiple typing it is straightforward to apply the term \(\mathbf{love}_{\mathtt{np},\mathtt{np},\mathtt{s}}\,\mathbf{mary}_{\mathtt{np}}\) to the constant \(c_e\), and there is no need to introduce new terms \(\mathtt{love}_{eet}\) and \(\mathtt{mary}_{e}\) just because \(\mathtt{love}_{eet}\,\mathtt{mary}_{e}\) is applicable to \(c_e\). For the same reason it is not necessary to introduce \(\mathtt{man}_{et}\) for application to \(c_e\), as \(\mathbf{man}_\mathtt{n}c_e\) is already a well-formed term. From these examples it is obvious that some syntactic terms (i.e. terms of syntactic type) can be used as semantic terms, which minimizes the number of terms in a tableau. Nevertheless, it is sometimes unavoidable to introduce a new term, since its syntactic counterpart cannot provide sufficiently fine-grained semantics: if \(\mathbf{red}_{\mathtt{n},\mathtt{n}} \mathbf{car}_\mathtt{n}c_e\) is evaluated as true, then one has to introduce the term \(\mathtt{red}_{et}\) in order to assert the redness of \(c_e\) by the term \(\mathtt{red}_{et} c_e\), as \(\mathbf{red}_{\mathtt{n},\mathtt{n}} c_e\) is not typable. Finally, note that terms of type \(\mathtt{s}\) can be evaluated either as true or false since they are also of type t.

Incorporating terms of syntactic and semantic types in one system can be seen as putting together two inference engines: one essentially operating on syntactically rich structures, and the other on semantic properties of lexical entities. Yet another view, from Abstract Categorial Grammars [7] or Lambda Grammars [11], would be to combine abstract and semantic levels, where terms of syntactic and semantic types are seen as terms of the abstract and semantic levels respectively, and the subtyping relation as a sort of simulation of the morphism between abstract and semantic types.Footnote 4

3 Obtaining LLFs from CCG Trees

Automated generation of LLFs from unrestricted sentences is an important part of the development of the wide-coverage natural tableau prover. Combined with the implemented tableau prover, it facilitates exploring textual entailment data sets for extracting relevant tableau rules and allows us to evaluate the theory against these data sets.

We employ the C&C tools [4] as an initial step for obtaining LLFs. The C&C tools offer a pipeline of NLP systems, including a POS-tagger, a chunker, a named entity recognizer, a supertagger, and a parser. The tools parse sentences in the CCG framework with the help of a statistical parser. Altogether the tools are very efficient, which makes them suitable for wide-coverage applications [1]. In the current implementation we use the statistical parser trained on the rebanked version of CCGbank [8].

Fig. 1. A parse tree of several delegates got the results published by the C&C parser

Getting a semantically adequate LLF from a CCG parse tree (see Fig. 1) requires much more effort than simply translating CCG trees into syntactic trees of typed lambda terms. There are two main reasons for this complication: (a) the trade-off the parser makes while analyzing linguistic expressions in order to tackle unrestricted texts, and (b) wrong analyses introduced by the various C&C tools that accumulate in the final parse trees.

For instance, the parser uses combinatory rules that are not found in the CCG framework. One such rule is a lexical rule that simply changes a CCG category, for example, the category N into NP (see the lx[npn] combinatory rule in Fig. 1) or the category \(S\backslash NP\) into \(N\backslash N\). The pipeline of the tools can also introduce wrong analyses at any stage, starting from the POS-tagger (e.g., assigning a wrong POS-tag) and ending with the CCG parser (e.g., choosing a wrong combinatory rule). In order to overcome (at least partially) these problems, we use a pipeline consisting of several filters and transformation procedures. The general structure of the pipeline is the following:

  • Transforming a CCG tree into a CCG term: the procedure converts CCG categories into types by removing directionality from the CCG categories (e.g., \(S\backslash NP / NP\) \(\leadsto \) \((\mathtt{np},\mathtt{np},\mathtt{s})\)) and reorders tree nodes correspondingly (a small sketch of this conversion is given after (6) below).

  • Normalizing the CCG term: since the obtained CCG term can be considered a typed \(\lambda \)-term, it is possible to reduce it to \(\beta \eta \)-normal form.Footnote 5

  • Identifying proper names: if both the function and the argument terms are recognized as proper names by the C&C pipeline, then the terms are concatenated; for instance, \(\mathbf{Leonardo}_{\mathtt{n},\mathtt{n}} (\mathbf{da}_{\mathtt{n},\mathtt{n}} \mathbf{Vinci}_{\mathtt{n}})\) is replaced by the constant term \(\mathbf{Leonardo\_da\_Vinci}_\mathtt{n}\) if all three terms are tagged as proper names.

  • Identifying multiword expressions (MWEs): the CCG parser analyzes all phrases, including MWEs like a lot of, take part in, at least, etc., in a purely compositional way. To avoid these meaningless analyses, we replace them with constant terms (e.g., \(\mathbf{a\_lot\_of}\) and \(\mathbf{take\_part\_in}\)).

  • Correcting syntactic analyses: this procedure is the most complex and extensive one, as it corrects a CCG term by inserting, deleting or replacing terms. For example, type shifts like \(\mathtt{n}\!\leadsto \!\mathtt{np}\) are fixed by inserting corresponding determiners (e.g., \((\mathbf{oil}_{\mathtt{n}})_{\mathtt{np}}\) \(\leadsto \) \(\mathbf{a}_{\mathtt{n},\mathtt{np}} \mathbf{oil}_{\mathtt{n}}\)) or by typing terms with adequate types (e.g., \((\mathbf{Leonardo\_da\_Vinci}_\mathtt{n})_\mathtt{np}\) \(\leadsto \) \(\mathbf{Leonardo\_da\_Vinci}_\mathtt{np}\) and \((\mathbf{several}_{\mathtt{n},\mathtt{n}} \mathbf{delegate}_\mathtt{n})_\mathtt{np}\) \(\leadsto \) \(\mathbf{several}_{\mathtt{n},\mathtt{np}} \mathbf{delegate}_\mathtt{n}\)). More extensive corrections, such as fixing a syntactically wrong analysis of a relative clause as in (3), are also performed by this procedure.

  • Type raising of quantifiers: this final procedure takes a more or less fixed CCG term and returns terms where quantified noun phrases of type \(\mathtt{np}\) have their types raised to \(((\mathtt{np},\mathtt{s}),\mathtt{s})\). As a result, several LLFs are returned due to scope ambiguity among quantifiers. The procedure makes sure that generalized quantifiers are applied to the clause they occur in if they do not take scope over other quantifiers. For example, from the CCG term (4) only (5) is obtained and (6) is suppressed.

$$\begin{aligned}&\mathbf{old}_{\mathtt{n},\mathtt{n}} (\mathbf{who}_{(\mathtt{np},\mathtt{s}),\mathtt{n},\mathtt{n}} \mathbf{cry}_{\mathtt{np},\mathtt{s}} \mathbf{man}_\mathtt{n}) \leadsto \mathbf{who}_{(\mathtt{np},\mathtt{s}),\mathtt{n},\mathtt{n}} \mathbf{cry}_{\mathtt{np},\mathtt{s}} (\mathbf{old}_{\mathtt{n},\mathtt{n}} \mathbf{man}_\mathtt{n}) \end{aligned}$$
(3)
$$\begin{aligned}&\mathbf{and}_{\mathtt{s},\mathtt{s},\mathtt{s}} \big ( \mathbf{sleep}_{\mathtt{np},\mathtt{s}} \mathbf{john}_{\mathtt{np}}\big ) \big (\mathbf{snore}_{\mathtt{np},\mathtt{s}} (\mathbf{no}_{\mathtt{n},\mathtt{np}} \mathbf{man}_{\mathtt{n}})\big ) \end{aligned}$$
(4)
$$\begin{aligned}&\mathbf{and}_{\mathtt{s},\mathtt{s},\mathtt{s}} ( \mathbf{sleep}_{\mathtt{np},\mathtt{s}} \mathbf{john}_{\mathtt{np}}) (\mathbf{no}_{\mathtt{n},(\mathtt{np},\mathtt{s}),\mathtt{s}} \mathbf{man}_{\mathtt{n}} \mathbf{snore}_{\mathtt{np},\mathtt{s}}) \end{aligned}$$
(5)
$$\begin{aligned}&\mathbf{no}_{\mathtt{n},(\mathtt{np},\mathtt{s}),\mathtt{s}} \mathbf{man}_{\mathtt{n}} \big (\lambda x.\,\mathbf{and}_{\mathtt{s},\mathtt{s},\mathtt{s}} ( \mathbf{sleep}_{\mathtt{np},\mathtt{s}} \mathbf{john}_{\mathtt{np}}) (\mathbf{snore}_{\mathtt{np},\mathtt{s}} x_{\mathtt{np}})\big ) \end{aligned}$$
(6)
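Referring back to the first step of the pipeline, the following toy sketch illustrates how directionality is dropped when CCG categories are converted into types; the nested-triple encoding of categories and the name cat2type are our own assumptions for illustration, not the actual converter.

```python
# A toy sketch of dropping directionality from CCG categories, assuming categories
# are given as nested (result, slash, argument) triples.

BASIC = {"S": "s", "NP": "np", "N": "n", "PP": "pp"}

def cat2type(cat):
    if isinstance(cat, str):                       # atomic category
        return BASIC[cat]
    result, _slash, argument = cat                 # the slash direction is discarded
    res_t, arg_t = cat2type(result), cat2type(argument)
    rest = res_t if isinstance(res_t, tuple) else (res_t,)
    return (arg_t,) + rest                         # the argument type comes first

# (S\NP)/NP  ⇝  (np, np, s), the type of a transitive verb
assert cat2type((("S", "\\", "NP"), "/", "NP")) == ("np", "np", "s")
```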

The pipeline described above takes a single CCG tree generated by the C&C tools and returns a list of LLFs. For illustration purposes, the CCG term (7), which is obtained from the CCG tree of Fig. 1, and two LLFs, (8) and (9), generated from (7) are given below; here, \(\mathtt{vp}\) abbreviates \((\mathtt{np},\mathtt{s})\) and the term \(\mathbf{s}_{\mathtt{n},\mathtt{vp},\mathtt{s}}\) stands for the plural morpheme.

$$\begin{aligned}&\mathbf{got}_{\mathtt{np},\mathtt{vp}} \Big ( \mathbf{s}_{\mathtt{n},\mathtt{np}} \big (\mathbf{who}_{\mathtt{vp},\mathtt{n},\mathtt{n}} (\mathbf{be}_{\mathtt{vp},\mathtt{vp}} \mathbf{publish}_\mathtt{vp}) \mathbf{result}_\mathtt{n}\big ) \Big ) \Big ( \mathbf{several}_{\mathtt{n},\mathtt{np}} \mathbf{delegate}_{\mathtt{n}} \Big )&\end{aligned}$$
(7)
$$\begin{aligned}&\mathbf{several}_{\mathtt{n},\mathtt{vp},\mathtt{s}} \mathbf{delegate}_{\mathtt{n}} \Big (\lambda x.\, \mathbf{s}_{\mathtt{n},\mathtt{vp},\mathtt{s}} \big (\mathbf{who}_{\mathtt{vp},\mathtt{n},\mathtt{n}} (\mathbf{be}_{\mathtt{vp},\mathtt{vp}} \mathbf{publish}_{\mathtt{vp}}) \mathbf{result}_\mathtt{n}\big ) \big (\lambda y.\, \mathbf{got}_{\mathtt{np},\mathtt{vp}} \, y_\mathtt{np}\, x_\mathtt{np}\big )\Big )&\end{aligned}$$
(8)
$$\begin{aligned}&\mathbf{s}_{\mathtt{n},\mathtt{vp},\mathtt{s}} \big (\mathbf{who}_{\mathtt{vp},\mathtt{n},\mathtt{n}} (\mathbf{be}_{\mathtt{vp},\mathtt{vp}} \mathbf{publish}_{\mathtt{vp}}) \mathbf{result}_\mathtt{n}\big ) \big ( \lambda x.\, \mathbf{several}_{\mathtt{n},\mathtt{vp},\mathtt{s}} \mathbf{delegate}_{\mathtt{n}} (\mathbf{got}_{\mathtt{np},\mathtt{vp}} \, x_\mathtt{np})\big ) \end{aligned}$$
(9)

4 An Inventory of Natural Tableau Rules

The first collection of tableau rules for the natural tableau was offered in [12], where a wide range of rules is presented, including Boolean rules, rules for algebraic properties (e.g., monotonicity), rules for determiners, etc. Despite this range, the rules are insufficient for tackling problems found in textual entailment data sets. For instance, problems that concentrate only on quantifiers or Boolean operators are rare in the data sets. Syntactically motivated rules, such as rules for passive and modifier-head constructions, structures with the copula, etc., are fruitful when dealing with wide-coverage sentences, and this is also confirmed by the problems found in entailment data sets. It would have been a quite difficult and time-consuming task to collect these syntactically motivated tableau rules without the help of an implemented natural tableau prover. For this reason, the first thing we did was to implement a natural tableau prover, which could prove several toy entailment problems using a small inventory of rules mostly borrowed from [12].Footnote 6 With the help of the prover, it was then easier to explore the data sets manually and to introduce new rules into the prover that help it to further expand tableaux and find proofs.

While collecting tableau rules we used half of the FraCaS test suite [5] and part of the SICK trial data [10] as a development set.Footnote 7 The reason behind opting for these data sets is that they do not contain long sentences; hence there is a higher chance that a CCG tree returned by the C&C tools contains fewer wrong analyses, and it is more likely that correct LLFs are obtained from the tree. Moreover, the FraCaS test suite is considered to contain difficult entailment problems for textual entailment systems, since its problems require more complex semantic reasoning than simple paraphrasing or relation extraction. We expect that interesting rules can be discovered from this set.

Hereafter, we will use several notational conventions while presenting the collected tableau rules. Uppercase letters \(A,B,C,\ldots \) and lowercase letters \(a,b,c,\ldots \) stand for meta variables over LLFs and constant LLFs, respectively. A variable letter with an arrow above it stands for a sequence of LLFs corresponding to the register of the variable (e.g., \(\vec{C}\) is a sequence of LLFs). Let \({{[\,]}}\) denote the empty sequence. We assume that \({e\!n\!p}\) is a variable type that can be either \(\mathtt{np}\) or e and that \(\mathtt{vp}\) abbreviates \((\mathtt{np},\mathtt{s})\). Let \((-,\alpha ) \in \mathcal {I_A}\) for any \(\alpha \in \mathcal {A}\), where the final (i.e. rightmost) atomic type of \((-,\alpha )\) is \(\alpha \); for instance, \((-,\mathtt{s})\) can be \(\mathtt{s}\), \((\mathtt{np},\mathtt{s})\), \((\mathtt{vp},\mathtt{vp})\), etc. While writing terms we may omit their types if they are irrelevant for the discussion, but often the omitted types can be inferred from the context the term occurs in. Tableau rules are followed by the names they currently have in the natural tableau prover. The same rule name with different subscripts means that these rules are implemented in the prover by a single rule with this name. For instance, both mod_n_tr \(_1\) and mod_n_tr \(_2\) are implemented by a single rule mod_n_tr in the prover. Finally, we slightly change the format of the nodes of [12]; namely, we place the argument list and the sign on the right side of an LLF – instead of \(\mathbb {T}\,ci:\mathtt{man}\) we write \(\mathtt{man} : ci : \mathbb {T}\). We think that the latter order is more natural.

4.1 Rules from [12]

Most rules of [12] are implemented in the prover. Some of them were changed to more efficient versions. For example, the two rules deriving from the format are modified and implemented in the prover as pull_arg, push_arg \(_1\), and push_arg \(_2\). These versions of the rules have narrower application ranges. Hereafter, we assume that \(\mathbb {X}\) can match both the \( \mathbb {T}\) and \(\mathbb {F}\) signs.


The Boolean rules and the rules for monotonic operators and determiners (namely, some, every, and no) are also implemented in the prover. These rules are arguably crucial for almost any entailment problem.

4.2 Rules for Modifiers

One of the most frequently used sets of rules is the set of rules for modifiers. These rules prompted us to slightly change the format of tableau nodes by adding an extra memory slot on the left side of an LLF:

$$memorySet : LLF : argumentList : truthSign$$

The idea behind the memory set is to save modifiers that are not directly attached to the head of a phrase. Once an LLF is reduced to the bare head without modifiers, the memory set is discharged and its elements are applied to the head. For example, if we want to entail beautiful car from beautiful red car, then there should be a way of obtaining (11) from (10) in a tableau. It is obvious how to produce (12) from (10) in the tableau setting, but this is not the case for producing (11) from (10), especially when the head has several modifiers.

$$\begin{aligned} \mathbf{beautiful}_{\mathtt{n},\mathtt{n}} (\mathbf{red}_{\mathtt{n},\mathtt{n}} \mathbf{car}_\mathtt{n}) : c_e : \mathbb {T}\end{aligned}$$
(10)
$$\begin{aligned} \mathbf{beautiful}_{\mathtt{n},\mathtt{n}} \mathbf{car}_\mathtt{n}: c_e : \mathbb {T}\end{aligned}$$
(11)
$$\begin{aligned} \mathbf{red}_{\mathtt{n},\mathtt{n}} \mathbf{car}_\mathtt{n}: c_e : \mathbb {T} \end{aligned}$$
(12)

With the help of a memory set, \(\mathbf{beautiful}_{\mathtt{n},\mathtt{n}}\) can be saved and retrieved when the bare head is found. Subsective adjectives are saved in memory by the rule mod_n_tr \(_1\), while retrieval is done by the rule mods_noun \(_1\). In Fig. 2a, the closed tableau employs these rules in combination with int_mod_tr and proves that (10) entails (11).Footnote 8
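As a rough illustration of this bookkeeping, the following sketch shows the four-slot node format and a push/discharge pair over the memory set; the encoding and helper names are our own and do not reproduce the exact mod_n_tr and mods_noun rule schemas.

```python
# A sketch of the node format with a memory slot, using a toy encoding of LLFs
# as nested applications; illustrative only, not the prover's implementation.
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    memory: tuple   # saved modifiers, e.g. ("beautiful",)
    llf: object     # an LLF, e.g. ("beautiful", ("red", "car"))
    args: tuple     # argument list, e.g. ("c",)
    sign: bool      # True for the sign T, False for F

def push_modifier(node):
    """Strip the outermost modifier from the LLF and save it in the memory set."""
    mod, rest = node.llf
    return Node(node.memory + (mod,), rest, node.args, node.sign)

def discharge(node, mod):
    """Re-apply one saved modifier to the (bare) head."""
    remaining = tuple(m for m in node.memory if m != mod)
    return Node(remaining, (mod, node.llf), node.args, node.sign)

# beautiful (red car) : c : T  --push twice-->  car : c : T with memory {beautiful, red}
n = Node((), ("beautiful", ("red", "car")), ("c",), True)
n = push_modifier(push_modifier(n))
# discharging 'beautiful' onto the bare head yields node (11): beautiful car : c : T
print(discharge(n, "beautiful").llf)   # ('beautiful', 'car')
```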


Hereafter, if a rule does not employ the memory sets of its antecedent nodes, we simply ignore the slots by omitting them from the nodes. The same applies to conclusion nodes that contain an empty memory set. In rule int_mod_tr, the memory of the premise node is copied to one of the conclusion nodes, while rule int_mod_fl attaches empty memories to the conclusion nodes; hence they are omitted. This convention of omitting memory sets is compatible with the rules found in [12].


The other rules that save or discharge a modifier are mod_push, mod_pull and mods_noun \(_2\). They do this job for any LLF of type \((\mathtt{vp},\mathtt{vp})\). For instance, using these rules (in conjunction with other rules) it is possible to prove that (13) entails (14); moreover, the tableau in Fig. 2b employs the mod_push and mod_pull rules and demonstrates how to capture an entailment about events with the help of a memory set without introducing an event entity.

Fig. 2. Tableaux that use rules for pulling and pushing modifiers into memory: (a) beautiful red car \(\Rightarrow \) beautiful car; (b) john ran slowly today \(\Rightarrow \) john ran today

Yet other rules for modifiers are pp_mod_n, n_pp_mod and aux_verb. If a modifier of a noun is headed by a preposition, as in the premise of (15), then the rule pp_mod_n can treat the modifier as an intersective one and hence capture entailment (15). When a prepositional phrase is a complement of a noun, rule n_pp_mod treats the complement as an intersective property and attaches the memory to the noun head. This rule, together with mod_n_tr \(_1\) and mods_noun \(_1\), allows the entailment in (16).Footnote 9

$$\begin{aligned} \mathbf{in}_{\mathtt{np},\mathtt{vp},\mathtt{vp}} \mathbf{paris}_\mathtt{np}\big (\lambda x.\, \mathbf{a}_{\mathtt{n},\mathtt{vp},\mathtt{s}} \mathbf{tourist}_\mathtt{n}(\lambda y.\, \mathbf{is}_{\mathtt{np},\mathtt{vp}} \, y_\mathtt{np}\, x_\mathtt{np}) \big ) \mathbf{john}_\mathtt{np}\end{aligned}$$
(13)
$$\begin{aligned} \mathbf{in}_{\mathtt{np},\mathtt{n},\mathtt{n}} \mathbf{paris}_\mathtt{np}\mathbf{tourist}_\mathtt{n}\mathtt{john}_e \end{aligned}$$
(14)
$$\begin{aligned} \mathbf{in}_{\mathtt{np},\mathtt{n},\mathtt{n}} \mathbf{paris}_\mathtt{np}\mathbf{tourist}_\mathtt{n}\mathtt{john}_e ~~ \Rightarrow ~~ \mathtt{in}_{eet} \mathtt{paris}_e \mathtt{john}_e \end{aligned}$$
(15)
$$\begin{aligned} \mathbf{nobel}_{\mathtt{n},\mathtt{n}} \big ( \mathbf{prize}_{\mathtt{pp},\mathtt{n}} (\mathbf{for}_{\mathtt{np},\mathtt{pp}} C_\mathtt{np}) \big ) ~~ \Rightarrow ~~ \mathbf{nobel}_{\mathtt{n},\mathtt{n}} \mathbf{prize}_{\mathtt{n}} \end{aligned}$$
(16)

Problems in the data sets rarely contain entailments involving tense; hence aux_verb is a rule that ignores auxiliary verbs and the infinitive particle to. Figure 4 shows how aux_verb applies to node 4 and yields node 5. The rule also incidentally accounts for predicative adjectives, since they are analyzed as \(\mathbf{be}_{\mathtt{vp},\mathtt{vp}} P_{\mathtt{vp}}\), and when aux_verb is applied to a copula-adjective construction, it discards the copula. The rule can be modified in the future to account for tense and aspect.


4.3 Rules for the Copula be

The copula be is often considered a semantically vacuous word and, at the same time, it is sometimes a source of the equality relation in logical forms. Taking into account how equality complicates tableau systems (e.g., first-order logic tableaux with equality) and makes them inefficient, we want to get rid of be in LLFs whenever possible. The first rule that ignores the copula was already introduced in the previous subsection.


The second rule that removes the copula is be_pp. It treats a prepositional phrase following the copula as a predicate and, for example, allows us to capture the entailment in (17). Note that the rule is applicable with both truth signs, and the constants c and d are of type e or \(\mathtt{np}\).

$$\begin{aligned} \mathbf{be}_{\mathtt{pp},\mathtt{np},\mathtt{s}} (\mathbf{in}_{\mathtt{np},\mathtt{pp}} \, \mathbf{paris}_\mathtt{np}) \, \mathbf{john}_\mathtt{np}~~\Rightarrow ~~ \mathbf{in}_{\mathtt{np},\mathtt{pp}} \, \mathbf{paris}_\mathtt{np}\, \mathtt{john}_e \end{aligned}$$
(17)

The other two rules, a_subj_be and be_a_obj, apply to NP-be-NP constructions and introduce LLFs with a simpler structure. If we recall that quantifier terms like \(\mathbf{a}_{\mathtt{n},\mathtt{vp},\mathtt{s}}\) and \(\mathbf{s}_{\mathtt{n},\mathtt{vp},\mathtt{s}}\) are inserted in a CCG term as described in Sect. 3, then it is clear that there are many quantifiers that can introduce a fresh constant; more fresh constants usually mean a larger tableau and a greater choice in rule application strategies, which as a result decreases the chances of finding a proof. Therefore, these two rules prevent tableaux from growing, as they avoid the introduction of fresh constants. In Fig. 3, the tableau uses the be_a_obj rule as its first rule application. This rule is also used for entailing (14) from (13).


4.4 Rules for the Definite Determiner the

We have already presented several new rules in the previous subsections that apply to certain constructions with the copula and the determiner the. Here we give two more rules that are applicable to a wider range of LLFs containing the.

Since the definite determiner presupposes a unique referent inside a context, rule the_c requires two nodes to be on a tableau branch: a node with the definite description and a node with the head noun of this definite description. If such nodes are found, the constant becomes the referent of the definite description, and the verb phrase is applied to it. The rule thus avoids introducing a fresh constant. The same idea is behind the rule the, but it introduces a fresh constant when no referent is found on the branch. The rule is similar to the one for the existential quantifier \(\mathtt{some}\) of [12], except that the is applicable to false nodes as well, due to the presupposition attached to the semantics of the.
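A rough sketch of this referent search, with our own toy branch representation and helper names, might look as follows.

```python
# A toy sketch of the referent search for definite descriptions: reuse an entity
# that the branch already asserts to satisfy the head noun (cf. rule the_c); only
# if none is found, introduce a fresh constant (cf. rule the). Encoding is ours.

def definite_referent(branch, noun, fresh_constant):
    """branch: list of (llf, args, sign) triples; noun: head noun of the description."""
    for llf, args, sign in branch:
        if sign and llf == noun and len(args) == 1:
            return args[0]                 # an existing referent: no fresh constant
    return fresh_constant()                # no referent on the branch: fresh constant

branch = [("man", ("c",), True), ("dance", ("c",), True)]
print(definite_referent(branch, "man", lambda: "d"))   # 'c'
```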


4.5 Rules for Passives

Textual entailment problems often contain passive paraphrases; therefore, from the practical point of view it is important to have rules for passives too. The two rules for passives correspond to the two types the CCG parser can assign to a by-phrase: either \({\mathtt{pp}}\), when it is a complement of the VP, or \((\mathtt{vp},\mathtt{vp})\), when it is a VP modifier. Since these rules model paraphrasing, they are applicable to nodes with both signs. In Fig. 4, nodes 6 and 7 are obtained by applying vp_pass \(_2\) to node 5.


4.6 Closure Rules

In general, closure rules identify or introduce an inconsistency in a tableau branch, and they are sometimes considered closure conditions on tableau branches. Besides the revised version of the closure rule \(\bot \) \(\le \) found in [12], we add three new closure rules to the inventory of rules.


Rule \(\bot \) there treats the predicate corresponding to there is as a universal one. The rule can be seen in action in Fig. 3 (where the be_a_obj rule is also used). The rules \(\bot \) do_vp \(_1\) and \(\bot \) do_vp \(_2\) model light verb constructions. See Fig. 4, where the tableau is closed by applying \(\bot \) do_vp \(_1\) to nodes 6, 8 and 2.

Fig. 3. A tableau for the man who dances is John \(\Rightarrow \) there is a person who moves. The tableau employs the be_a_obj rule to introduce 3 from 1. The first two branches are closed by \(\bot \) \(\le \), taking into account that \(\mathbf{man} \le \mathbf{person}\) and \(\mathbf{dance} \le \mathbf{move}\). The last branch is closed by applying \(\bot \) there to 12.

Fig. 4. A tableau proof for a beautiful dance was done by Mary \(\Rightarrow \) Mary danced

The tableau rules presented in this section are those found necessary for proving certain entailment problems in the development set. For the complete picture, some rules are missing their counterparts for the false sign. This is justified by two reasons: those missing rules are computationally inefficient, and they do not contribute to the prover's accuracy on the development set. Although some rules are too specific and several seem too general (and in a few cases unsound), at the moment our main goal is to list rules that are fast and, at the same time, useful for textual entailments. The final analysis and organization of the inventory of rules will be carried out later, when most of these rules have been collected. It is worth mentioning that the current tableau prover employs more computationally efficient versions of the rules of [12] and, in addition, uses admissible rules (unnecessary from the completeness viewpoint) since they significantly decrease the size of tableau proofs.

5 Evaluation

In order to demonstrate the productivity of the current inventory of tableau rules, we present the performance of the prover on the development set. As already mentioned in Sect. 4, we employ part of the SICK trial data (100 problems) and half of the FraCaS data (173 problems) as the development set. In these data sets, problems have one of three answers: entailment, contradiction, or neutral. Many entailment problems contain sentences that are long but have significant overlap in terms of constituents with other sentences. To prevent the prover from analyzing the common chunks (which is often unnecessary for finding a proof), we combine the prover with an optional simple aligner that aligns LLFs before the proof procedure. The prover also considers only a single LLF (i.e. semantic reading) for each sentence in a problem. Entailment relations between lexical words are modeled by the hyponymy and hypernymy relations of WordNet 3.0 [6]: \(term_1 \le term_2\) holds if there is a sense of \(term_1\) that is a hyponym of some sense of \(term_2\).
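This lexical relation can be approximated, for instance, with NLTK's WordNet interface; the sketch below is only an illustration, not the prover's actual lexical knowledge module.

```python
# A sketch of the lexical relation term1 ≤ term2: it holds if some sense of term1
# is a (transitive) hyponym of some sense of term2. Illustration only.
from nltk.corpus import wordnet as wn

def leq(term1, term2, pos=wn.NOUN):
    senses2 = set(wn.synsets(term2, pos=pos))
    for s1 in wn.synsets(term1, pos=pos):
        # s1 together with all of its (transitive) hypernyms
        ancestors = set(s1.closure(lambda s: s.hypernyms())) | {s1}
        if ancestors & senses2:
            return True
    return False

print(leq("man", "person"))            # expected True: man is a hyponym of person
print(leq("dance", "move", wn.VERB))   # expected True for the relevant senses
```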

We evaluate the prover against the FraCaS development set (173 problems) and the whole SICK trial data (500 problems). The confusion matrix of the performance on the FraCaS development set is given in the white columns of Table 1. As shown, the prover was able to prove 31 entailment and 2 contradiction problems. Of all 173 problems, 18 are categorized as defective since their LLFs were not obtained properly – either the parser could not parse a sentence in a problem, or it parsed the sentences but they were not comparable as their CCG categories were different. If we ignore the obvious parser errors by excluding the defective problems, the prover gets a precision of .97 and a recall of .32. For this evaluation the prover is limited in the number of rule applications; if it is not possible to close a tableau after 200 rule applications, then the tableau is considered open. For instance, the prover reached the rule application limit and was forcibly terminated for 12 problems (see the numbers in parentheses in the 'No proof' column). Figure 5 shows a closed tableau found by the prover for a FraCaS problem with multiple premises. The first three entries in the tableau are the LLFs of the sentences as obtained by the LLF generator.
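Schematically, the bounded proof search can be pictured as follows; the callable interface is hypothetical and only illustrates how a tableau is declared open once the limit of rule applications is reached.

```python
# A schematic sketch of bounded proof search: the tableau is expanded rule by rule
# and declared open if it cannot be closed within 200 rule applications.

def prove(branch_closed, pick_rule, apply_rule, limit=200):
    for _ in range(limit):
        if branch_closed():
            return "proof"                # all branches closed
        rule = pick_rule()
        if rule is None:
            return "no proof"             # saturated tableau, nothing left to apply
        apply_rule(rule)
    return "no proof (limit reached)"     # forcibly terminated, cf. Table 1

# a dummy tableau that closes after three rule applications
state = {"steps": 0}
print(prove(lambda: state["steps"] >= 3,
            lambda: "some_rule",
            lambda r: state.update(steps=state["steps"] + 1)))   # 'proof'
```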

Table 1. The confusion matrix of the prover's performance on the FraCaS dev-set and on the SICK trial data (with a gray background). Numbers in parentheses are the cases in which the prover was forcibly terminated (after 200 rule applications).
Fig. 5. A tableau proof for the FraCaS-103 problem: all APCOM managers have company cars. Jones is an APCOM manager \(\Rightarrow \) Jones has a company car. (Note that 4 was obtained from 2 by be_a_obj; 5 and 6 from 1 and 4 by the efficient version of the rule for every of [12]; 7 from 6; 8 and 9 from 3 and 7 by the efficient version of the monotone rule of [12]; 10 and 11 from 8 and 9 in the same way as the previous nodes; and 12 from 10 and 11 by \(\bot \) \(\le \) and the fact that \(\mathbf{a} \le \mathbf{s}\).)

The results over the FraCaS data set seem promising, taking into account that the set contains sentences with linguistic phenomena (such as anaphora, ellipsis, comparatives, attitudes, etc.) that were not modeled by the tableau rules.Footnote 10

The evaluation on the SICK trial data is given in the gray columns of Table 1. Despite our having explored only a fifth of the SICK trial data, the prover showed decent results on this set. The evaluation again shows an extremely high precision of .98 and an improved recall of .42 compared to the FraCaS data. The alignment preprocessing drastically decreases the complexity of proof search for the SICK problems, since there is usually a significant overlap between a premise and a conclusion. The tableau proof in Fig. 6 demonstrates this fact: treating shared complex LLFs as constants results in closing the tableau in three rule applications.
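The idea behind the alignment can be sketched roughly as follows (our own toy LLF encoding, not the actual aligner): complex subterms shared verbatim by the premise and the conclusion are detected so that the prover can treat them as opaque constants instead of unfolding them.

```python
# A rough sketch of detecting shared chunks between premise and conclusion LLFs.

def subterms(t):
    yield t
    if isinstance(t, tuple):
        for child in t:
            yield from subterms(child)

def shared_chunks(premise, conclusion):
    prem = {t for t in subterms(premise) if isinstance(t, tuple)}
    conc = {t for t in subterms(conclusion) if isinstance(t, tuple)}
    return prem & conc

p = ("two", "people", ("and", ("stand", "in_the_ocean"), ("watch", "the_sunset")))
c = ("nobody", ("and", ("stand", "in_the_ocean"), ("watch", "the_sunset")))
print(shared_chunks(p, c))   # the shared VP chunk can be treated as one constant
```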

Fig. 6. A tableau proof for the SICK-6146 problem: two people are standing in the ocean and watching the sunset \(\bot \) nobody is standing in the ocean and watching the sunset. The tableau starts with the \(\mathbb {T}\) sign assigned to the initial LLFs for proving the contradiction. The proof introduces 5 from 2 and 4 using the efficient version of the rule for no of [12] and, in this way, avoids branching of the tableau.

Table 2. The false positive problems

The problems that were classified as neutral but proved by the prover are also of interest (see Table 2). The first problem was proved due to the prover's poor treatment of cardinals – it does not distinguish between them. The second problem was identified as an entailment since cry may also have the meaning of shout. The last one was proved because the prover used LLFs in which no hat and a backwards hat had the widest scopes.

6 Future Work

Our future plan is to continue enriching the inventory of tableau rules. In particular, the SICK training data has not yet been explored entirely, and we expect to collect several (mainly syntax-driven) rules that are necessary for unfolding certain LLFs. We also aim to further explore the FraCaS data and find ways to accommodate, in the natural tableau setting, the semantic phenomena involved in plurals, comparatives, anaphora, temporal adverbials, events and attitude verbs.

At this stage it is too early to compare the current performance of the prover to that of other entailment systems. The reason is that entailment systems usually do not depend on a single classifier but on a combination of several shallow (e.g., word overlap) or deep (e.g., using semantically relevant rules) classifiers. In the future, we plan to make the prover more robust by combining it with a shallow classifier or by adding default rules (not necessarily sound) that are relevant for a development data set, and then compare it with other entailment systems. We also intend to employ other parsers to improve the LLF generator and knowledge bases to enrich the lexical knowledge of the prover.