1 Introduction

Compositional question answering is an important problem in natural language understanding. Previous work involves semantic parsing, which maps a natural language (NL) sentence into its meaning representation (MR). Semantic parsers based on syntax first derive syntactic trees from the NL sentences; the syntactic trees are then converted into the corresponding MRs [1–3]. Semantic parsers based on machine translation technologies use synchronous grammars, which match NL string patterns and construct MRs synchronously [4–7]. Semantic parsers using dependency-based compositional semantics (DCS) derive all the possible MRs from NL sentences, then use a probabilistic model to find the answers [8, 9]. In recent years, semantic parsers using knowledge bases [10–13] such as Freebase [14], or using grounded information [15, 16], have been developed to handle domain-independent, large-scale corpora.

Overall, two kinds of information are used to improve the generalization performance of a semantic parser. One is syntactic information: words or phrases are generalized into syntactic non-terminals, which capture phrases unseen in the training corpus. The other is semantic information, such as knowledge bases, which captures relations unseen in the training corpus. However, traditional methods using syntactic information tend to overgeneralize NL words, and methods using semantic information do not capture the hierarchical relations that would further improve the generalization of semantic parsers.

A novel approach to semantic parsing is proposed in this paper. A concept base is introduced that contains hierarchical relations between concepts, and a new form of MR is proposed to match the concept base: it shares the same structure as the concept base. Compared to the widely used Montague semantics based on lambda calculus, the proposed MR combines easily with a concept base. Compared to DCS, which constrains concept relations to several kinds, the proposed MR is free to contain all relations.

To integrate the concept base into the semantic parser, a set of constructions is introduced that maps concept sequences to their MRs. Constructions bear some resemblance to the rules in synchronous grammars: both capture syntactic information in a specific way, which avoids overgeneralizing NL words. But constructions are based on the concept base, so they can also capture the semantic information between concepts.

Experiments on GeoQuery [17], a benchmark dataset, show that the proposed system outperforms all existing systems in both accuracy and generalization performance.

2 Concept Base

The basic elements in the concept base are concepts. A string with its first letter capitalized denotes the corresponding concept when no ambiguity exists. Formally, the concept base is defined as \(K=<E,A,R,E_i,A_i,R_i,R_h>\), in which:

E represents entities; examples include City, State, Person, etc. \(E_i\) represents the instances of entities. The extension of an entity is the set of instances of that entity. For example, Austin is an instance of City, so it is an element in the extension of City.

A represents attributes; examples include Height, Length, Area, etc. \(A_i\) represents the instances of attributes. The extension of an attribute is the set of instances of that attribute.

R represents relations. A relation relates a set of concepts, which are the relative elements of that relation. \(R_i\) represents the instances of relations. The extension of a relation is the set of instances of that relation. For example, the relation Loc(City, State) means a City is located in a State, while Loc(Austin, Texas) represents an instance of Loc(City, State), in which City and State are instantiated as Austin and Texas.

\(R_h\) represents hierarchical relations, which are themselves a kind of relation. If a concept \(C_1\) is the hypernym of another concept \(C_2\), then a hierarchical relation exists between \(C_1\) and \(C_2\). Examples include the relation between City and Capital, and the relation between Attribute and Area.

The hypernym of a concept C is denoted as \(C_h^C\), and the extension of C is denoted as Ext(C). Note that words in NL are also regarded as concepts; they are instances of the entity Word.
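To make the definition concrete, the following Python sketch shows one possible in-memory encoding of the concept base; all class and field names are illustrative assumptions, not part of the formal definition.

from dataclasses import dataclass, field

@dataclass(frozen=True)
class Concept:
    name: str  # e.g. "City", "Austin", "Loc"

@dataclass
class ConceptBase:
    # R_h: maps each concept to its hypernym C_h^C
    hypernym: dict = field(default_factory=dict)
    # Ext(C): maps each concept to the set of its instances
    extension: dict = field(default_factory=dict)

    def ext(self, c):
        return self.extension.get(c, set())

kb = ConceptBase()
city, capital, austin = Concept("City"), Concept("Capital"), Concept("Austin")
kb.hypernym[capital] = city    # City is the hypernym of Capital
kb.extension[city] = {austin}  # Austin is in the extension of City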

Since no appropriate hierarchical concept base currently exists for GeoQuery, a manually annotated one is used to conduct the experiments. In principle, the hierarchical concept base is domain-independent and can be used in other systems.

3 Meaning Representation

3.1 Semantic Tree

The MRs of sentences also consist of concepts. A basic assumption is made that all MRs are trees. A semantic tree is represented as \(t=<Root\ (C_1)\ (C_2)\ \dots \ (C_m)>\), where Root is the root of the tree and \(C_1,C_2,\dots ,C_m\) are its child trees. Figure 1 shows some examples. Each concept in the tree is a hyponym of some concept in the concept base, because the two have different extensions: in Fig. 1(a), the extension of \(State'\) is the states bordering Texas, while its hypernym, State, has all known states in its extension. This specific form of MR shares the same structure as the concept base, which allows convenient computation of the semantic tree.

Fig. 1. Some examples of semantic trees in our system
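As a rough illustration, such a tree can be encoded as a simple recursive structure. The sketch below reuses the Concept class from the previous listing; the example tree is a plausible reading of Fig. 1(a), not the paper's exact annotation.

from dataclasses import dataclass, field
from typing import List

@dataclass
class SemTree:
    root: Concept  # root concept of this (sub)tree
    children: List["SemTree"] = field(default_factory=list)

# <State' (Borders (Texas))>: the states bordering Texas
tree = SemTree(Concept("State'"),
               [SemTree(Concept("Borders"), [SemTree(Concept("Texas"))])])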

3.2 Computation

The computation of a semantic tree is defined as the procedure of finding the extension of the focus concept in the semantic tree. Normally, the focus concept is the root of the tree. If an interrogative exists in the tree, such as When, Where, What, or How, then the focus concept is the concept connected to that interrogative. There are two typical cases:

(1) The focus concept is a relation R, and all the concepts connected to it are its relative elements, denoted as \(C_1,C_2,\dots ,C_m\). The instances of the focus concept should be included in the extension of its hypernym \(C_h^R\). Denote an instance in the extension of \(C_h^R\) as \(R_i\), with relative elements \(C_{1i},C_{2i},\dots ,C_{mi}\), which are all instances. If for every \(j\in \{1,2,\dots ,m\}\), \(C_{ji}\) is an instance of \(C_j\), then \(R_i\) is an instance of R. Formally:

$$\begin{aligned} \begin{aligned} Ext(R)=\{&R_i(C_{1i},C_{2i},\dots ,C_{mi})\in Ext(C_h^R)| \\&C_{1i}\in Ext(C_1)\wedge C_{2i}\in Ext(C_2 )\wedge \dots \wedge C_{mi}\in Ext(C_m )\} \end{aligned} \end{aligned}$$
(1)

This is called the “match” method. Since not all relation instances are known in advance, several further computing methods are needed in GeoQuery (a code sketch of some of these methods is given after their definitions below):

Count. This computing method is used when counting the number of instances of a concept. Formally:

$$\begin{aligned} Ext(R)=\{R_i(C_{1i},C_2)\in Ext(C_h^R)| C_{1i}\in Ext(C_1) \wedge C_{1i}=|Ext(C_2)|\} \end{aligned}$$
(2)

Quantification. Examples of Quantification include Total, Average, etc. For example, the computing method of Average is:

$$\begin{aligned} \begin{aligned} Ext(R)=\{&R_i(C_{1i},C_2)\in Ext(C_h^R)|C_{1i}\in Ext(C_1) \wedge \\&C_{1i}=\frac{1}{|Ext(C_2)|} \sum _{j=1}^{|Ext(C_2)|} C_{2j},C_{2j}\in Ext(C_2)\} \end{aligned} \end{aligned}$$
(3)

Comparative and Superlative. This computing method is used when comparing entities \(C_1\) and \(C_2\) on a specific attribute \(A_3\). Formally:

$$\begin{aligned} \begin{aligned} Ext(R)=\{&R_i(C_{1i},C_2,A_3)|C_{1i}\in Ext(C_1),\forall C_{2i}\in Ext(C_2),\\&R_i(C_{1i},C_{2i},A_3)\in Ext(C_h^R)\} \end{aligned} \end{aligned}$$
(4)

Negation. If the relation is negative, this computing method is used. Formally:

$$\begin{aligned} \begin{aligned}&Ext(R)=\{R_i(C_{1i},C_{2i},\dots ,C_{mi})|C_{1i}\in Ext(C_1)\wedge C_{2i}\in Ext(C_2)\wedge \\&\dots \wedge C_{mi} \in Ext(C_m) \wedge R_i(C_{1i},C_{2i},\dots ,C_{mi})\not \in Ext(C_h^R)\} \end{aligned} \end{aligned}$$
(5)
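To make the computing methods concrete, here is a minimal Python sketch of “match” (Eq. 1), Count (Eq. 2), and Negation (Eq. 5); relation instances are modeled as tuples of instances, and all function names are assumptions.

from itertools import product

def match(ext_hyper, ext_args):
    # Eq. (1): keep the instances of the hypernym relation whose relative
    # elements all fall into the extensions of C_1, ..., C_m.
    return {ri for ri in ext_hyper
            if all(cji in ext_cj for cji, ext_cj in zip(ri, ext_args))}

def count(ext_hyper, ext_c1, ext_c2):
    # Eq. (2): the first relative element is an instance of C_1
    # and equals |Ext(C_2)|.
    return {ri for ri in ext_hyper
            if ri[0] in ext_c1 and ri[0] == len(ext_c2)}

def negation(ext_hyper, ext_args):
    # Eq. (5): tuples built from Ext(C_1) x ... x Ext(C_m) that are
    # NOT instances of the hypernym relation.
    return {ri for ri in product(*ext_args) if ri not in ext_hyper}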

(2) The concepts connected to the focus concept are relations, and each relation takes the focus concept as one of its relative elements. Denote the tree as \(t=<C\ (R^1)\ (R^2)\ \dots \ (R^m)>\). The extension of C is first set to the extension of its hypernym \(C_h^C\). Then the extensions of those relations are computed using the computing methods introduced above. For each instance \(C_i\) in the extension of C, if \(C_i\) is a relative element of some instance of \(R^1\), as well as of \(R^2,\dots ,R^m\), then \(C_i\) is an instance of C. Usually, a computing order exists: absolute relations, such as those computed using “match”, are computed before relative relations, such as those computed using “comparative and superlative”. Formally:

$$\begin{aligned} \begin{aligned} S=&\bigcap _{j=1}^m\{x\in Ext(C_h^C)|\exists C_{1i}^j, C_{2i}^j,\dots C_{mi}^j, R_i^j(C_{1i}^j, C_{2i}^j,\dots ,x,\dots ,C_{mi}^j)\\ {}&\in Ext(R^j), R^j\ is\ an\ absolute\ relation \} \end{aligned} \end{aligned}$$
(6)
$$\begin{aligned} \begin{aligned} Ext(C)=&\bigcap _{j=1}^m \{x\in S|\exists C_{1i}^j, C_{2i}^j,\dots C_{mi}^j, R_i^j(C_{1i}^j, C_{2i}^j,\dots ,x,\dots ,C_{mi}^j)\\ {}&\in Ext(R^j), R^j\ is\ a\ relative\ relation\} \end{aligned} \end{aligned}$$
(7)

For semantic trees containing both cases, the result is the intersection of the results obtained in the two cases. Using the computing methods mentioned above, a semantic tree can be computed recursively; a sketch of this recursion is given below.
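A minimal sketch of the recursive computation, under the simplifying assumption that each non-leaf node is either a relation over concept children (case 1, via the match sketch above) or a concept over relation children (case 2, with the computing order and the case-(2) bookkeeping simplified); is_relation is an assumed helper over the concept base.

def compute(tree, kb):
    # Recursively compute the extension of the tree's focus concept (sketch).
    if not tree.children:
        return kb.ext(tree.root)  # leaf: look up the extension directly
    child_exts = [compute(child, kb) for child in tree.children]
    hyper_ext = kb.ext(kb.hypernym[tree.root])
    if is_relation(tree.root):    # case (1): Eq. (1)
        return match(hyper_ext, child_exts)
    # case (2): keep instances of C_h^C that occur as a relative element
    # of some instance of every child relation (Eqs. (6)-(7), simplified)
    result = hyper_ext
    for rel_ext in child_exts:
        result = {x for x in result if any(x in ri for ri in rel_ext)}
    return result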

4 Construction

Construction Representation. A construction encodes the correspondence between a concept sequence and its MR. Note that words in NL are regarded as concepts, so constructions can also denote the correspondence between words and their MRs. A basic assumption for constructions is that, except for the constructions of words, the MR of a construction should have one and only one relation concept, and all the other concepts in the MR are the relative elements of that relation. Formally, \(cons=<P;F;T;H;R>\). Here P represents the concept sequence. F represents morphological and semantic features (MSFs), such as number for entities, participle for relations, and affirmative or negative for relations; they restrict concept usage to the appropriate syntactic and semantic context, and they are universal and domain-independent. T is the corresponding semantic tree, which consists of the concepts in P, and H is the root of the semantic tree. R represents the only relation in T.
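For concreteness, a construction \(cons=<P;F;T;H;R>\) might be stored as the following record, reusing the Concept and SemTree sketches above; the field names mirror the formal definition, but the encoding itself is an assumption.

from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Construction:
    P: Tuple[Concept, ...]  # concept sequence
    F: Tuple[str, ...]      # MSF sequence (number, participle, polarity, ...)
    T: SemTree              # semantic tree built from the concepts in P
    H: Concept              # head: the root of T
    R: Optional[Concept]    # the single relation in T (None, i.e. epsilon, for words)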

Construction Annotation. The sentences in the training corpus are manually annotated into constructions. For a sentence S, the procedure is as follows:

(1) For every word W in S, the corresponding MSF is extracted as \(F_w\). Assume that W corresponds to only one MR in this context, denoted as C. The construction is annotated as \(<W;F_w;<C>;C;\epsilon >\).

(2) For every phrase \(P_{nl}=<W_1,W_2,\dots ,W_m>\) in S, first map every word in the phrase to its corresponding MR, \(P=<C_1,C_2,\dots ,C_m>\), with the extracted MSF sequence as F. Then, for every concept in P, find its highest-level hypernym in the hierarchical concept base. This forms a new concept sequence, denoted as \(P_h\). Annotate the corresponding semantic tree T with head H for \(P_h\). Note that there should be only one relation R in T; otherwise the phrase should be split into smaller phrases. Thus the construction is \(cons=<P_h;F;T;H;R>\).
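As a hypothetical illustration of step (2) — the phrase, concepts, and features below are invented for exposition and are not drawn from the annotated corpus:

# Annotating the phrase "cities in Texas":
#   words -> concepts:  (City, In, Texas)
#   highest hypernyms:  In -> Loc, Texas -> State, giving P_h = (City, Loc, State)
#   T contains exactly one relation (Loc), so no further splitting is needed
cons = Construction(
    P=(Concept("City"), Concept("Loc"), Concept("State")),
    F=("plural", "affirmative", "singular"),
    T=SemTree(Concept("City"),
              [SemTree(Concept("Loc"), [SemTree(Concept("State"))])]),
    H=Concept("City"),
    R=Concept("Loc"),
)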

Probabilistic Construction. In general, a word or a phrase may correspond to multiple MRs. These ambiguities are the main motivation for extending constructions into probabilistic constructions. Given a word or a phrase, the probability of an MR derived from it is p(T,H,R|P,F), which can be estimated from the corpus:

$$\begin{aligned} P(T,H,R|P,F)=\frac{count(<P;F;T;H;R>)}{\sum _{(T,H,R)}count(<P;F;T;H;R>)} \end{aligned}$$
(8)
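Eq. (8) is a relative-frequency estimate over the annotated constructions. A sketch, assuming each construction is rendered as a hashable (P, F, T, H, R) tuple; the "capital" ambiguity below is a hypothetical example.

from collections import Counter

def estimate_probabilities(constructions):
    # constructions: list of (P, F, T, H, R) tuples, with T a hashable
    # (e.g. string) rendering of the semantic tree.
    joint = Counter(constructions)
    marginal = Counter((P, F) for (P, F, T, H, R) in constructions)
    return {c: joint[c] / marginal[(c[0], c[1])] for c in joint}

# The word "capital" seen twice as the entity Capital, once as Money.
probs = estimate_probabilities([
    (("capital",), ("singular",), "<Capital>", "Capital", None),
    (("capital",), ("singular",), "<Capital>", "Capital", None),
    (("capital",), ("singular",), "<Money>", "Money", None),
])
# probs[(("capital",), ("singular",), "<Capital>", "Capital", None)] == 2/3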

5 Semantic Parsing

A construction can be split into two parts: a production with H as its left-hand side (LHS) and P as its right-hand side (RHS), and a semantic tree T. Denote a subsequence of the input concept sequence as \(P_s=(C_{s1},C_{s2},\dots ,C_{sm})\), whose MSF sequence is \(F_s\). For a construction \(cons=<C_{c1},C_{c2},\dots ,C_{cm};F_c; T;H;R>\), and for every \(j\in \{1,2,\dots ,m\}\), check the following propositions: (1) \(C_{sj}\) is \(C_{cj}\); (2) \(C_{sj}\) is a hyponym of \(C_{cj}\); (3) \(C_{sj}\) is an instance of \(C_{cj}\). If one of them is true, then replace \(C_{cj}\) in T with \(C_{sj}\). This forms a new semantic tree \(T_s\), which is part of the meaning representation of the input concept sequence. Denote the root of \(T_s\) as \(H_s\); \(H_s\) can be further combined with other concepts in the input concept sequence. The parsing task bears some resemblance to probabilistic context-free grammar (PCFG) parsing, and Earley's context-free parsing algorithm [18] is used. Unlike Earley's algorithm, which operates on words and syntactic non-terminals, here the parser operates on concepts. The probability of a semantic tree is obtained by multiplying the probabilities of all the constructions used in that semantic tree. Finally, the most probable tree is selected from the semantic tree set \(T_{set}\):

$$\begin{aligned} \begin{aligned} T_{best}&=argmax_{T\in T_{set}} P(T,H,R|S,F) \\&=argmax_{T \in T_{set}}\prod _{T_{j}\in T}P(T_{j},H_{j},R_{j}|P_{j},F_{j}) \end{aligned} \end{aligned}$$
(9)
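The per-position matching step described above might look like the following sketch; is_hyponym and is_instance are assumed helpers over the concept base, and the substitution map stands in for building \(T_s\) explicitly.

def matches(c_s, c_c, kb):
    # The three propositions: (1) identity, (2) hyponymy, (3) instantiation.
    return c_s == c_c or is_hyponym(c_s, c_c, kb) or is_instance(c_s, c_c, kb)

def apply_construction(subseq, cons, kb):
    # Try to apply a construction to a concept subsequence P_s; on success,
    # return the substitution replacing each C_cj in T by C_sj, from which
    # the instantiated semantic tree T_s is built.
    if len(subseq) != len(cons.P):
        return None
    if not all(matches(c_s, c_c, kb) for c_s, c_c in zip(subseq, cons.P)):
        return None
    return dict(zip(cons.P, subseq))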

6 Experiments and Results

Experiments are conducted on GeoQuery. The dataset contains about 800 facts asserting relational information about U.S. geography, and 880 questions annotated with their corresponding MRs. The average sentence length is 7.48 words. The proposed system is compared to: (1) WASP [4], which is based on machine translation techniques; (2) \(\lambda \)-WASP [5], an extension of WASP for handling MRs based on \(\lambda \)-calculus; (3) SYNSEM [1], which combines syntactic and semantic information, for which the result based on gold-standard syntactic parses is chosen; (4) L2013 [9], which uses DCS as MRs; (5) W2014 [3], which performs best among current CCG-based parsers.

In the first experiment, the accuracy of the proposed system is tested. The corpus is split into 600 questions for training and 280 questions for testing. Table 1 shows the results. A few observations can be made: (1) the new system outperforms all existing systems; (2) although the new system needs more annotation, it still performs better than SYNSEM, which uses gold-standard syntactic parses.

Table 1. The Accuracy on GeoQuery

The second experiment concerns generalization performance. Standard 10-fold cross-validation is used. Figure 2 shows the learning curves of the different systems. It can be observed that the new system outperforms the other systems by wide margins, matching their best final accuracy with only 50% of the training examples. This can greatly alleviate the annotation burden.

Fig. 2. Learning curves for various parsing algorithms on the GeoQuery corpus

7 Conclusion

Hierarchical information can greatly improve the performance of semantic parsers. The proposed approach includes a concept base, which encodes the hierarchical relations between concepts, and a set of constructions, which encode the correspondences between concept sequences and their MRs. By using Earley's algorithm in the semantic parser, the accuracy and generalization performance on the standard semantic parsing dataset, GeoQuery, are clearly improved.

However, the constructions in the system were manually annotated. Learning these constructions automatically is left for future work; this could alleviate the annotation burden and also reduce errors in the annotated corpus.