1 Introduction and Motivation

One of the tasks of the Competition on Legal Information Extraction and Entailment [1] consists of determining whether a given statement is entailed by the given legal article(s). This is similar to the Recognition of Textual Entailment (RTE) challenge [3]. Bos and Markert [7] observed that, for RTE, classification based on shallow features alone performs better than theorem proving. Androutsopoulos and Malakasiotis [3] state that most approaches to RTE do not focus on converting natural language to a formal representation. However, we believe that approaches using statistical or machine learning methods on shallow features offer little explanation of why a certain sentence is or is not entailed, and hence provide little insight into the cause of entailment. In this respect, we consider approaches based on logical reasoning to be more promising.

There have been several works that propose logical representations and logics for representing and reasoning with legal information [11–14, 21]. These reasoning rules and frameworks assume that the information given in natural language can somehow be understood and represented in the required form. However, current methods to convert legal text to formal representations [4, 8, 17, 18] either focus on extracting specific facts or do not generalize to a wide variety of representations. There is currently no consensus on a single representation for legal information. Therefore, a system that can translate natural language to a wide variety of formal languages, depending on the application, is desirable. In this paper we show how our NL2KR system can be used to translate simple legal sentences in English to various formal representations, which will facilitate reasoning with various frameworks.

2 Related Work

Some approaches to translating text into formal representations focus on extracting specific facts from the text. For example, Lagos et al. [17] present a semi-automatic method to extract specific information such as events, characters and roles from legal text using the Xerox Incremental Parser (XIP) [2]. XIP performs preprocessing, named entity extraction, chunking and dependency extraction, and the combination of dependencies to create new ones. Bajwa et al. [4] propose an approach to automatically translate specifications of business rules in English to the Semantic Business Vocabulary and Rules (SBVR) representation. Their method is essentially a rule-based information-extraction approach that identifies SBVR elements in text. The goal of other approaches, such as the work by McCarty [18], is to obtain a semantic interpretation of the complete sentence. That approach uses the output of a state-of-the-art statistical parser to obtain a semantic representation called Quasi-Logical Form (QLF), a rich knowledge representation structure that is considered an intermediate step towards a fully logical form.

There have been similar efforts in other languages. Nakamura et al. [20] present a rule-based approach to convert Japanese legal text into a logical representation in Davidsonian style. By manually analyzing around 500 sentences, they ascertain the structure of legal sentences and identify cue phrases that indicate this structure. They also define transformation rules for some special occurrences of nouns and verbs. In subsequent work [16], they propose a method to resolve references that point to other articles or itemized lists, replacing them with the relevant content.

As mentioned in the previous section, different legal reasoning frameworks expect input in different logical representations. Even though the Legal Knowledge Interchange Format (LKIF) [15] was an attempt to standardize the representation of legal knowledge in the semantic web, no single representation is currently considered the de facto standard for legal text. Therefore, we need a system that can translate natural language to a particular representation depending on the application.

3 The NL2KR Framework

NL2KR is a framework for developing translation systems that translate natural language to a wide variety of formal representations. It is easily adaptable to new domains according to the training data supplied. It is based on the algorithms presented in Baral et al. [5]. The workflow of the NL2KR system consists of two phases: (1) learning and (2) translation, as shown in Fig. 1. In the learning phase, the system takes training data, an initial dictionary, and optional syntax overrides as inputs. The training data consists of natural language sentences along with their formal representations in the desired target language. The initial dictionary (or lexicon) contains the meanings of some words and is supplied manually. Using these inputs, NL2KR tries to learn the meanings of as many words as possible. The output of the learning phase is thus an updated dictionary that includes the meanings of all newly learned words. The translation phase uses the dictionary created by the learning phase to translate previously unseen sentences.

Fig. 1. The NL2KR system showing learning (left) and translation (right)

Table 1. Example: CCG categories and \(\lambda \) expressions for “John loves Mary”
John : NP : john
loves : (S\NP)/NP : #y.#x.loves(x,y)
Mary : NP : mary

At the core of NL2KR are two very elegant algorithms, Inverse Lambda and Generalization, which are used to find meanings of unknown words in terms of lambda (\(\lambda \)) expressions. NL2KR is inspired (Footnote 1) by Montague’s approach [19]. Every word has a \(\lambda \) expression as its meaning. The meaning of a sentence is successively built from the combination of the \(\lambda \) expressions of its words according to the rules of combination in \(\lambda \) calculus [9]. The order in which the words are combined is given by the parse tree of the sentence according to a given Combinatory Categorial Grammar (CCG) [22]. As an example illustrating this approach, consider the sentence “John loves Mary” shown in Table 1. The CCG category of “loves” is (S\NP)/NP, which means that this word takes an NP (noun phrase) argument from the right and another from the left to form a complete sentence (S). From the CCG parse, we observe that “loves” and “Mary” combine first, and their combination then combines with “John” to form a complete parse. The \(\lambda \) expression corresponding to “loves” is \(\#y.\#x.loves(x,y)\) (Footnote 2), which means that this word takes two inputs, \(\#x\) and \(\#y\), as arguments, and its application to the arguments results in a \(\lambda \) expression of the form loves(x,y).

The close correspondence between CCG syntax and \(\lambda \) calculus semantics is very helpful in applying this method. In the first step, the \(\lambda \) expression for “loves” is applied to that of “Mary”, with the former as the function and the latter as the argument, in accordance with their CCG categories. This application, denoted as \(\#y.\#x.loves(x,y)@mary\), results in \(\#x.loves(x,mary)\). Proceeding this way, the meaning of the sentence is generated in terms of \(\lambda \) expressions. This is a very elegant way to model semantics and has been widely used [5, 6, 10, 23]. The problem, however, is that for longer sentences, the \(\lambda \) expressions become too complex for even humans to figure out. NL2KR addresses this problem by employing the Inverse Lambda and Generalization algorithms to automatically formulate the \(\lambda \) expressions of words from words whose semantics are known.
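To make the combination step concrete, the following toy Python sketch (our illustration, not NL2KR code) encodes each \(\lambda \) expression as a closure that builds a predicate string, and performs the two applications for “John loves Mary”:

# Toy illustration (not NL2KR code): build the meaning of "John loves Mary"
# by two lambda applications, in the order given by the CCG parse.

# #y.#x.loves(x,y) -- "loves" takes its right NP argument (y) first
loves = lambda y: lambda x: "loves({},{})".format(x, y)
mary = "mary"    # meaning of "Mary"
john = "john"    # meaning of "John"

# Step 1: "loves" @ "Mary" corresponds to #x.loves(x,mary)
loves_mary = loves(mary)
# Step 2: applying the result to "John" yields the sentence meaning
print(loves_mary(john))    # prints: loves(john,mary)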

Learning Algorithms: The two algorithms used to learn the \(\lambda \) semantics of new words are the Inverse Lambda and Generalization algorithms. When the \(\lambda \) expressions of a phrase and of one of its sub-parts (children in the CCG parse tree) are known, we can use this knowledge to find the \(\lambda \) expression of the unknown sub-part. The Inverse Lambda operation computes a \(\lambda \) expression F such that \(H = F@G\) or \(H = G@F\), given H and G; these are called the Inverse-L and Inverse-R algorithms, respectively. For example, if we know the meaning of the sentence “John loves Mary” (Table 1) as loves(john, mary) and the meaning of “John” as john, we can find the meaning of “loves Mary” using Inverse Lambda, as \(\# x.loves(x,mary)\). Going further, if we know the meaning of “Mary” as mary, we can find the meaning of “loves” using Inverse Lambda.
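As an illustration only, a naive version of the Inverse-L idea for the constant-argument case can be sketched in Python as follows; inverse_l is a hypothetical helper of ours, and the real algorithms of Baral et al. [5] handle typed, higher-order cases far beyond this:

# Naive sketch of Inverse-L, constant-argument case only: given H and G
# with H = F@G, recover F by abstracting G out of H.
def inverse_l(h, g, var="x"):
    """Return F (as a string) such that F @ g beta-reduces to h."""
    assert g in h, "this naive version requires G to occur in H"
    return "#{}.{}".format(var, h.replace(g, var))

# H = meaning of "John loves Mary", G = meaning of "John"
print(inverse_l("loves(john,mary)", "john"))    # prints: #x.loves(x,mary)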

The Generalization algorithm is used to learn meanings of unknown words from syntactically similar words with known meanings. It is used when the Inverse Lambda algorithms alone are not enough to learn new meanings, or when we need to learn meanings of words that are not even present in the training data set. For example, we can generalize the meaning of the word “likes”, with CCG category \((S \backslash NP) / NP\), from the meaning of “loves”, which we already know from the previous example. The meaning of “likes” thus generated will be \(\#y.\#x.likes(x,y)\). We will illustrate learning in later sections with the help of examples.
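A minimal sketch of this idea (ours, not the system's actual implementation) is to reuse the \(\lambda \) template of a known word with the same CCG category, swapping in the new word's identifier:

# Sketch of Generalization: a new word borrows the lambda template of a
# known word with the same CCG category, mirroring the "likes"/"loves"
# example above.
known = {("loves", r"(S\NP)/NP"): "#y.#x.loves(x,y)"}

def generalize(new_word, category):
    for (word, cat), meaning in known.items():
        if cat == category:
            return meaning.replace(word, new_word)
    return None    # no syntactically similar known word

print(generalize("likes", r"(S\NP)/NP"))    # prints: #y.#x.likes(x,y)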

For every sentence in the training set, we first use the CCG parser to obtain all possible parse trees. Using the initial dictionary supplied by the user, the system assigns all known meanings to the words (at the leaves) of each parse tree. Moving bottom-up, it combines as many words with each other as possible (in the order dictated by the parse tree) by performing \(\lambda \) applications. The meaning of each complete sentence is known from the training corpus, so the system also traverses top-down from this known translation while simultaneously moving bottom-up, filling in missing word and phrase meanings. Meanings of unknown words and phrases are obtained using Inverse Lambda and Generalization, as applicable, until nothing new can be learned.

Dealing with Ambiguity: To deal with ambiguity of words, a parameter learning method [23] is used to estimate a weight for each word-meaning pair such that the joint probability of the training sentences getting translated to their given formal representation is maximized. However, this method might not work in all cases and more complex approaches, possibly involving word sense identification from context, might have to be used. Completely addressing this problem is a part of future work.

Translation Approach: Given a sentence, we consider all possible parse trees, with the meaning of every word either learned by the system or obtained from the Generalization algorithm. We then use Probabilistic CCG (PCCG) [23] to find the most probable tree, according to the weights assigned to each word.
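As a rough illustration of this weight-based selection (the words, meanings and weight values below are ours for illustration; NL2KR estimates the weights from training data as in [23]):

# Toy log-linear scoring over (word, meaning) pairs: each derivation is
# scored by summing the weights of the word meanings it uses, and the
# highest-scoring derivation wins.
weights = {
    ("bank", "#x.riverbank(x)"): 0.2,
    ("bank", "#x.institution(x)"): 1.5,
}

def score(derivation):
    return sum(weights.get(pair, 0.0) for pair in derivation)

derivations = [
    [("bank", "#x.riverbank(x)")],
    [("bank", "#x.institution(x)")],
]
print(max(derivations, key=score))    # picks the higher-weight meaning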

Availability: NL2KR is freely available for Windows, Linux and Mac OS X systems at http://nl2kr.engineering.asu.edu. It is configurable for different domains and can be adapted to work with a large number of formal representations. A tutorial is also provided.

4 Translating to Formal Legal Representations

NL2KR can be used to translate sentences into various logical representations, either directly or via an intermediate language (Footnote 3). It can be customized to different domains through the initial dictionary and training data provided, and the quality of these inputs affects NL2KR’s performance. A language class is a good analogy for NL2KR: the effectiveness of learning depends on the vocabulary imparted to the students beforehand (the initial lexicon) and the sentences chosen to teach the language (the training data). In our experiments, we observed that learning simpler sentences before complex ones aided learning. Several logics have been proposed in the literature for representing legal information [11–14, 21], from which we have selected a few. In this section, we illustrate how to create a good initial lexicon and demonstrate how to use the system to learn new word meanings, with respect to these examples. We start with simple examples and progress to more complicated ones.

4.1 Translating to First Order Logic Representations

In this section, we demonstrate translating a sentence from the Competition on Legal Information Extraction and Entailment [1] corpus to a first order logic representation.

Sentence: Possessory rights may be acquired by an agent.

Translation: \(rights(X)\wedge type(X,possessory) \wedge agent(Y) > acquirable(X, Y, may)\)

Here \(>\) denotes implication. The form of an action, e.g., “acquirable”, is action(X, Y, Z): X (possessory rights) is being acquired by Y (agent), and the type of the action is Z. In the given example, acquiring is a possibility, not an obligation, which is why we use may as its type.

Once we provide this training data and the other required inputs to the NL2KR learning interface (Fig. 2), we can start the learning process. We describe how to create the inputs for learning in the next subsection. The initial dictionary contains a list of words and their meanings in terms of \(\lambda \) expressions. Even if we do not know the meanings of some words, the system can figure them out on its own using the Inverse Lambda or Generalization algorithms. Figure 2 shows that the system learns the meaning of “rights” automatically using Inverse Lambda.
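To see how this learned entry fits the target, consider one plausible combination step (our reconstruction, treating the meaning of “possessory” as the constant argument): \((\#x3.\#x1.right(x1) \wedge type(x1,x3))@possessory\) reduces to \(\#x1.right(x1) \wedge type(x1,possessory)\), which corresponds (up to naming) to the \(rights(X)\wedge type(X,possessory)\) portion of the translation.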

Fig. 2. NL2KR automatically learning the meaning of “rights” using the Inverse Lambda algorithm: feature = rights : [N] : #x3.#x1.right(x1) \(\wedge \) type(x1,x3)

4.2 Translating Sentences with Temporal Information

Consider the following sentence and translation, which shows an example of temporal ordering.

Sentence: After the invoice is received the customer is obliged to pay.

Translation: \(implies(receipt(invoice,T1)\wedge (T2>T1), obl(pay(customer, T2)))\)

Here implies(x, y) denotes \(x\rightarrow y\). The predicate obl denotes that the action is an obligation (usually marked by words such as obliged to, shall, must, etc.) in contrast to a possibility (usually marked by words such as may). T1 and T2 are the instants of time at which the two events occur.

Words that do not contribute significantly to the meaning of the sentence can be assigned the trivial meaning \(\#x.x\) in the dictionary. It is a \(\lambda \) expression that does not affect the meaning of other \(\lambda \) expressions. We can assign it to words such as “is”, “to” and “the”, since these do not carry much meaning in this example. Next, we can enter the meanings that are evident from the target representation. Since “invoice” occurs as itself, we can give it the simple meaning invoice (similarly for “customer”). From the representation, we observe that “received” is a function called receipt with two arguments, hence we can give it the meaning \(\#x.\#t.receipt(x,t)\) (similarly for “pay”). “Obliged” is a more complicated function because it takes another function (pay) as its argument and therefore uses @y@t to carry forward the variables in pay to the next higher level of the tree, where we obtain the real arguments (customer and T2). Once all these meanings (Table 2) have been supplied, the system automatically finds the meaning of the word “after” using the Inverse Lambda algorithm (Fig. 3). This is remarkable because the meaning of “after” looks complicated, and it would be tedious for users to supply such meanings manually in the initial lexicon; this demonstrates one of the advantages of using NL2KR. The learned meaning also makes intuitive sense: the \(\lambda \) expression #x12.#x11.implies(x12 @ T1 \(\wedge \) T2 > T1, x11 @ T2) means that “after” is a \(\lambda \) function taking two inputs, x11 and x12, where the first input event (x12) occurs at time T1, the second input event (x11) occurs at time T2 with T2 > T1, and x12 implies (or leads to) x11. Hence, we were able to learn a significantly complicated meaning automatically by providing relatively simple \(\lambda \) expressions in the initial dictionary.
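As a check (our reconstruction of the intermediate steps), applying this meaning first to \(\#t.receipt(invoice,t)\), the meaning of “the invoice is received”, and then to \(\#t.obl(pay(customer,t))\), the meaning of “the customer is obliged to pay”, yields \(implies(receipt(invoice,T1) \wedge T2>T1, obl(pay(customer,T2)))\), which is exactly the target translation.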

Table 2. \(\lambda \) expressions and CCG categories in the initial dictionary for the sentence “After the invoice is received the customer is obliged to pay.”
Fig. 3. NL2KR automatically learning the meaning of “after” using the Inverse Lambda algorithm: feature = after : [(S/S)/S] : #x12.#x11.implies(x12 @ T1 \(\wedge \) T2 > T1, x11 @ T2)

4.3 Translating to Temporal Deontic Action Laws

Giordano et al. [14] have defined a Temporal Deontic Action Language for defining temporal deontic action theories, introducing a temporal deontic extension of Answer Set Programming (ASP) combined with Deontic Dynamic Linear Time Temporal Logic (DDLTL). This language is used to express domain description laws (e.g., action laws, precondition laws and causal laws), which describe the preconditions and effects of actions, as well as obligations (e.g., achievement obligations, maintenance obligations and contrary-to-duty obligations). We take examples of several domain description laws from that paper and demonstrate how to translate them automatically from natural language to the deontic action language using NL2KR.

Since NL2KR does not support some special symbols used in the Temporal Deontic Action Language, we first use NL2KR to convert the natural language sentences to an intermediate representation that is directly convertible to the Temporal Deontic Action Language. Then, using the one-to-one correspondence between the intermediate language and the action language, we obtain the desired representation. In the intermediate representation shown below, we have defined the predicate creates(x, y), which means x creates y. We have also changed the representation of until to have two parameters a and b denoting “a until b” (as defined by Giordano et al. [14]).

Action Law:

Sentence: The action \(accept\_price\) creates an obligation to pay.

Translation: [\(accept\_price\)]O(\(\top U <pay>\top \))

Intermediate: creates(action(accept_price),O(until(a(T),b(pay,T))))
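The mapping from the intermediate representation to the action language can then be applied mechanically. The following Python sketch is our assumption of one possible rewriter (not part of NL2KR); it covers only the two law patterns used in this section:

# Hypothetical post-processing step: rewrite the intermediate predicates
# into Temporal Deontic Action Language syntax by pattern substitution.
import re

def to_action_language(intermediate):
    s = intermediate
    # creates(action(A), Phi) -> [A]Phi ; cancels(action(A), Phi) -> [A]¬Phi
    s = re.sub(r"creates\(action\((\w+)\),(.+)\)$", r"[\1]\2", s)
    s = re.sub(r"cancels\(action\((\w+)\),(.+)\)$", r"[\1]¬\2", s)
    # until(a(T), b(P,T)) -> ⊤ U <P>⊤
    s = re.sub(r"until\(a\(T\),b\((\w+),T\)\)", r"⊤ U <\1>⊤", s)
    return s

print(to_action_language("creates(action(accept_price),O(until(a(T),b(pay,T))))"))
# prints: [accept_price]O(⊤ U <pay>⊤)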

The NL2KR learning component can be used to make the system learn words and meanings from this sentence. The iterative learning process is depicted in the screenshot in Fig. 4. We start by giving the meanings of simple words first. We give the trivial meaning \(\#x.x\) to “the”, “an” and “to”, because they do not significantly affect the meaning of the sentence. Next, we guess the meanings of words from the target representation. Since “action” is a function that accepts a single argument, we give it the meaning \(\#x.action(x)\). Likewise, “accept_price”, which occurs as itself, is given the meaning \(accept\_price\). Obligation, O, is also a function, but it contains more structure, which can be obtained from the target representation. We interpret an obligation to also have an implicit notion of time by embedding “until” in its meaning. However, the verb “pay” should be replaceable, because there can be other sentences such as “The action \(accept\_price\) creates an obligation to ship”; therefore, we leave it as a variable input. The word “pay” could have been given simply the meaning pay, but we assign it \(\#x.x@pay\). This is because the node “an obligation” expects “to pay” to be an argument to it, but their CCG categories dictate otherwise. In cases of such inconsistency, we give the function (according to the CCG categories) a meaning prefixed with \(\#x.x@\), so that its role is flipped to that of an argument (Footnote 4). The \(\lambda \) expressions and CCG categories of the constituent words are shown in Table 3. After giving these meanings, we find that the meaning of “creates” is obtained automatically by the system using the Inverse Lambda algorithm (Fig. 4).
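To make the role flip concrete, assume (our reconstruction; the actual entry appears in Table 3) that “an obligation” has the meaning \(\#v.O(until(a(T),b(v,T)))\). Applying \(\#x.x@pay\) to it gives \((\#v.O(until(a(T),b(v,T))))@pay = O(until(a(T),b(pay,T)))\), so the obligation meaning ends up applied to pay even though, per the CCG categories, “to pay” was the function.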

Fig. 4. Screenshot of the learning process in NL2KR for the sentence “The action \(accept\_price\) creates an obligation to pay.” The meaning of “creates” is obtained automatically by the system using the Inverse Lambda algorithm

Table 3. \(\lambda \) expressions and CCG categories for the words in the action law “The action \(accept\_price\) creates an obligation to pay.”

Once the learning process is complete, we can use the Translation component of NL2KR to translate a new sentence. In this case, we use NL2KR to translate the following action law.

Action Law:

Sentence: The action \(cancel\_payment\) cancels the obligation to pay.

Translation: [\(cancel\_payment\)]\(\lnot \) O(\(\top U <pay>\top \))

Intermediate: cancels(action(cancel_payment),O(until(a(T),b(pay,T))))

The screenshot of the translation process is shown in Fig. 5. We observe that the system translated the sentence successfully, generating the meanings of the unknown words (“cancel_payment” and “cancels”) using Generalization (Fig. 6) over the words learned from the first action law.

Fig. 5. Screenshot of the translation process in NL2KR for the sentence “The action \(cancel\_payment\) cancels the obligation to pay.”

4.4 Translating to Temporal Object Logic in REALM

Regulations Expressed as Logical Models (REALM) [13] is a system that models regulatory rules in temporal object logic. The concepts and relationships occurring in a rule are mapped to predefined types and relationships in a Unified Modeling Language (UML) model. Using some examples from that paper, we show how NL2KR can be used to translate rules specified in natural language to this temporal object logic representation.

Fig. 6. Generating the meanings of the unknown words (cancel_payment and cancels) using Generalization during the translation process in NL2KR for the sentence “The action \(cancel\_payment\) cancels the obligation to pay.”

Table 4. Initial Lexicon containing \(\lambda \) expressions and CCG categories for both REALM examples

As in the previous section, we have created an intermediate representation that can be directly converted to the desired temporal object logic representation. This is needed because certain symbols are unavailable in NL2KR’s vocabulary. We also assume that coreference in the sentences has been resolved; for example, in the following sentence, the second occurrence of “bank” replaces the pronoun “it”.

Sentence: Whenever a bank opens an account bank must verify customers identity within two_days

Translation: \(\square t_{open}(DoOn_{F}(bank,open,a) \rightarrow \lozenge t_{verify}(DoInput_{F}(bank,verify,a.customer.record)\wedge t_{verify}-t_{open} \le 2_{[day]}))\)

Intermediate: \(implies(g(do(bank, open, a, T1)), f(do(bank,verify,a\_customer\_record,T2) \wedge equals(difference(T2,T1),two\_days)))\)

Similar to the previous examples, we use NL2KR to learn unknown words from these sentences. We do not give the meaning of “verify” for the second sentence (which is different from its meaning in the first sentence) but the system is able to figure it out on its own. Moreover, it also generalizes the correct meaning of three_days using the meaning of two_days from the previous sentence.

Sentence: Whenever a bank can not verify an identity bank has to close the account within three_days

Translation: \(\square t_{verify}(\lnot DoInput_{F}(bank,verify,a.customer.record) \rightarrow \lozenge t_{close}(DoOn_{F}(bank,close,a)\wedge t_{close}-t_{verify} \le 3_{[day]}))\)

Intermediate: \(implies(g(do(bank,verify,a\_customer\_record,T1)\wedge isfalse), f(do(bank,close,a,T2) \wedge equals(difference(T2,T1),three\_days)))\)

We observe that the initial dictionary for this case (Table 4) looks more complicated than the one in Sect. 4.3. This is because, in this target language, the functions that would be intuitive according to their natural language meanings (e.g., “opens”, “verify”) are not functions but arguments of an artificially created function, do. The language of REALM was designed for purposes other than translation, which is why this situation arises. Our motivation here is to explain to the reader why some languages are easy for NL2KR to translate, while others are more difficult.

5 Conclusion and Future Work

Although legal text is written in natural language, one needs to perform some kind of formal reasoning over it to draw conclusions. The first step is to translate legal text to an appropriate logical language. At present there is no consensus on a single logical language to represent legal text, so one cannot develop a translation system targeted at a single language. Thus, a platform that can translate legal text to the desired logical language, depending on the application, is needed. We have developed such a system, called NL2KR. In this paper, we showed how NL2KR can translate sentences from legal texts in English to various formal representations defined in the literature, thereby bridging the gap from language to logical representation and enabling the use of various logical frameworks over the information contained in such texts.

So far we have experimented with a few small sentences picked from the literature on logical representation of legal texts. However, we need to expand this approach to capture the nuances of legal texts used in real laws and statutes. Further enhancements are needed to equip NL2KR to deal with longer and more complicated sentences. One approach would be to break a sentence into smaller parts and deal with each part separately; such a parser, called L-Parser, is available at http://bioai8core.fulton.asu.edu/lparser. We also plan to combine statistical and logical methods in the future. In particular, we are considering using a combination of distributional semantics and hand-curated linguistic knowledge to characterize content words (especially nouns, verbs and adjectives), and logical characterizations for grammatical words (prepositions, articles, quantifiers, negation, etc.).