1 Introduction

Temporal information is a very important dimension of documents. Being able to extract it would enable higher-level functionalities, such as event-based summarization or search, pattern detection in cases, and timeline generation. These would facilitate the understanding of legal documents, which are usually difficult for lay users to comprehend, and would enhance other NLP tasks over legal documents. Nevertheless, little research on temporal information has been done in the legal domain.

TimeLex [1] is a suite of tools that aims to cover this gap in the domain, providing approaches to several parts of the temporal information extraction task. In this paper we briefly present the different contributions we have created in order to process this kind of text from the temporal perspective.

The remainder of this paper is structured as follows. Section 2 presents related work. Section 3 introduces lawORdate, a preprocessing tool that deals with legal references in order to facilitate the later temporal tagging task. Section 4 presents Añotador, a temporal tagger designed to find and normalize temporal expressions in legal texts. Section 5 shows a first approach to event extraction, introducing the tool WhenTheFact, which generates a timeline of events from the information extracted from a document from the European Court of Human Rights. Finally, Section 6 presents the conclusions and details the next steps in this research, targeting the semantic representation of the temporal annotations for further applications.

2 Related Work

Most work on temporal information in the legal domain has focused on normative texts. This is the case of CronoLex [2], which aims to help lawyers by representing legal norms in Spanish and storing information about their life cycle, among other things. Also in this direction, Akoma Ntoso [3] allows several types of legal text to be represented in a standard way, including temporal information in the metadata.

Regarding the processing of temporal expressions and events, Schilder [4] analyzed the different types of legal documents with regard to temporal information and divided them into statutes or regulations (where temporal information usually takes the form of constraints), transactional documents (documents for legal transactions, such as contracts) and case law. That paper studies the first two types in depth, but it considered the narrative structure of case law similar to that of news, so case law received no dedicated attention.

Again for normative texts, Isemann et al. [5] used Named Entity Recognition and temporal processing to handle the temporal dimension of regulations. That work also described common problems that temporal taggers encounter in legal texts (not only normative ones). Among them are legal references whose pattern resembles a date and therefore tends to mislead temporal taggers (e.g. “Directive 2012/33/EC”), and the distinction between generic and episodic events: the former refer to abstract events, general truths, rules, expectations or laws, whereas episodic events are those that actually happened. Finally, work on transactional documents [6, 7] and on reasoning in legal evidence [8] can also be found in the literature.

3 LawORdate

lawORdate is a tool that cleans text documents of legal references that have the form of a date. It addresses an important problem when processing legal documents from the temporal perspective, since common legal references in Spanish tend to include dates or patterns that can be misleading to temporal taggers. For instance, in the following excerpt:

".. creado via el Real Decreto 2093/2008 (1), de 19 de diciembre (2). Ha sido actualizado por ultima vez el 13 de agosto de 2017 (3)."

(In English: ".. created via Royal Decree 2093/2008, of 19 December. It was last updated on 13 August 2017.")

Most temporal taggers would find the three numbered expressions in this excerpt. Nevertheless, expression (1) is not a date (despite following a date-like pattern), and expression (2) is a date but does not belong to the narrative of the text (it is part of a legal reference), so neither should be tagged. Therefore, the only expression that should be tagged is expression (3).

LawORdate is currently available both as a webapp [9] and as a GitHub repository [10], and finds and replaces misleading legal references in the texts, storing the original references. Once the temporal tagging is done, the references are restored in the text. Figure 1 shows the pipeline of use of lawORdate.

Fig. 1. Pipeline of use of lawORdate.

In the pipeline in Fig. 1, a text with legal references is first sent to the service, which finds all the misleading legal references that could affect the precision of a temporal tagger and replaces them with innocuous expressions, storing the original references so that they can be restored later. The output of this first step is then passed to a temporal tagger (the demo offers HeidelTime, but any other can be used). The output of the tagger (in TimeML) is then sent back to lawORdate, which restores the original legal references. We therefore obtain the original text, tagged without interference from any legal references in it.
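
The mask-then-restore idea described above can be illustrated with a minimal Python sketch. This is not lawORdate's actual implementation: the regular expression, the placeholder format and the function names `mask_references` and `restore_references` are assumptions made for illustration, and the pattern only covers references shaped like the example in this section.

```python
import re

# Hypothetical pattern: a Spanish legal reference such as
# "Real Decreto 2093/2008, de 19 de diciembre". Real references are far
# more varied; this only covers the shape of the example in the text.
LEGAL_REF = re.compile(
    r"(?:Real Decreto|Ley|Directiva)\s+\d+/\d{4}"
    r"(?:,\s+de\s+\d{1,2}\s+de\s+[a-z]+)?"
)


def mask_references(text):
    """Replace each legal reference with a neutral placeholder.

    Returns the masked text and the list of original references.
    """
    stored = []

    def _swap(match):
        stored.append(match.group(0))
        # Assumed placeholder format; it contains no date-like pattern.
        return f"LEGALREF{len(stored) - 1}X"

    return LEGAL_REF.sub(_swap, text), stored


def restore_references(tagged_text, stored):
    """Put the original references back after temporal tagging."""
    for index, reference in enumerate(stored):
        tagged_text = tagged_text.replace(f"LEGALREF{index}X", reference)
    return tagged_text
```

In such a setup, the masked text would be sent to whichever temporal tagger is in use, and `restore_references` would be called on the tagger's output, so that only dates belonging to the narrative end up tagged.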

4 Añotador

Añotador [11] is a temporal tagger for Spanish and English that finds temporal expressions in texts and is specifically targeted at the legal domain. It detects the types of temporal expression included in the TimeML standard, namely dates, times, sets (that is, expressions that repeat over time, such as “every Thursday” or “twice a week”) and durations, as well as some additional temporal expressions developed for the legal domain, such as specific expressions (e.g., “business days”) and the type interval. Añotador outperforms the available state-of-the-art temporal taggers for Spanish [12].

It receives as input the text to annotate and, optionally, a reference date, called the anchor date (if none is provided, the current date is used). With this information, the system both finds and normalizes temporal expressions, that is, expresses them as a standard value, usually computed with regard to a reference date. For instance, given the sentence “I went to the park yesterday” with “2019-09-20” as reference date, the normalized value of “yesterday” is “2019-09-19”. Nevertheless, not every temporal expression is normalized with regard to this initial anchor date. Once the temporal expressions in the text have been identified using hand-crafted rules specifically developed for the Spanish language, we apply a normalization algorithm that takes previous dates in the text into account (see Fig. 2).
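
To make the role of the anchor date concrete, the following Python sketch normalizes a few relative expressions. It is only an illustration, not Añotador's algorithm: the toy lexicon, the function name `normalize` and, in particular, the strategy of letting each explicit date become the reference for the expressions that follow it are assumptions, since the exact rules are not detailed here.

```python
import re
from datetime import date, timedelta

# Hypothetical toy lexicon of relative expressions (offsets in days).
RELATIVE_OFFSETS = {"yesterday": -1, "today": 0, "tomorrow": 1}

# Matches explicit ISO dates or any of the relative expressions above.
EXPRESSION = re.compile(
    r"\d{4}-\d{2}-\d{2}|\b(?:" + "|".join(RELATIVE_OFFSETS) + r")\b"
)


def normalize(text, anchor=None):
    """Return (expression, ISO value) pairs found in the text.

    Assumption: each explicit date becomes the reference for the
    relative expressions that follow it; before any explicit date,
    the anchor date (or the current date) is used.
    """
    reference = anchor or date.today()
    results = []
    for match in EXPRESSION.finditer(text):
        expression = match.group(0)
        if expression in RELATIVE_OFFSETS:
            value = reference + timedelta(days=RELATIVE_OFFSETS[expression])
        else:
            value = date.fromisoformat(expression)
            reference = value  # later expressions resolve against this date
        results.append((expression, value.isoformat()))
    return results
```

With this sketch, `normalize("I went to the park yesterday", date(2019, 9, 20))` returns `[("yesterday", "2019-09-19")]`, matching the example above, whereas a relative expression that follows an explicit date in the text is resolved against that date instead of the anchor.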

Fig. 2. Pipeline of Añotador. The user introduces the text to annotate and optionally a reference date.

Figure 2 shows how the text is first preprocessed using CoreNLP [13] and some IxaPipes [14] models. Then, different rules are applied at different stages to detect the temporal expressions in the text. Once these have been found, a normalization algorithm is applied to determine their values. Finally, the system returns the tagged text.
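
As an illustration of the final step, the sketch below wraps already detected and normalized expressions in TimeML-style TIMEX3 tags. It assumes the annotations arrive as character offsets with a type and a value; the function name `to_timex3` and this input format are hypothetical and do not describe Añotador's internal representation.

```python
from xml.sax.saxutils import escape, quoteattr


def to_timex3(text, annotations):
    """Wrap annotated spans in TIMEX3 tags.

    annotations: (start, end, type, value) tuples with non-overlapping
    character offsets into text; type is DATE, TIME, DURATION or SET.
    """
    pieces = []
    cursor = 0
    for number, (start, end, kind, value) in enumerate(sorted(annotations), 1):
        # Copy the untagged text that precedes this expression.
        pieces.append(escape(text[cursor:start]))
        pieces.append(
            f"<TIMEX3 tid={quoteattr('t' + str(number))} "
            f"type={quoteattr(kind)} value={quoteattr(value)}>"
            f"{escape(text[start:end])}</TIMEX3>"
        )
        cursor = end
    # Copy whatever remains after the last expression.
    pieces.append(escape(text[cursor:]))
    return "".join(pieces)
```

For the sentence “I went to the park yesterday”, annotating the span of “yesterday” as a DATE with value “2019-09-19” produces the sentence with that word enclosed in a `TIMEX3` element carrying `tid`, `type` and `value` attributes.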

Añotador has been tested against different state-of-the-art temporal taggers, both for legal English and for Spanish. Updated results of these evaluations can be found on its website [11].

5 WhenTheFact: Dealing with Events

After identifying temporal expressions using Añotador and lawORdate, the next logical step is to detect events. Our current work focuses on detecting legal events in judgments, covering not just the mention of the event (as most temporal taggers do) but also all the surrounding information available, such as the parties involved, when and where it happened, or the jurisdiction involved. This was already done for a different type of legal document in previous work, which detected events related to the lifecycle of a contract [15]. In that case a rule-based approach was successful, given the limited number of events targeted; for legal judgments, however, the number of relevant events demands a more flexible approach.

To this end, we are pursuing different lines of research in parallel, in order to detect the different types of events found in a judgment (e.g., the facts under judgment, which change from case to case, and the legal events, such as applications or decisions, which are court-related and tend to occur in all cases). To test the different approaches, a corpus of legal documents annotated with events (the first publicly available of its kind, as far as the authors know) has been built [16] in collaboration with experts from other institutions, based on previous related work [17, 18].

Fig. 3. Different tools available for temporal processing of legal texts.

In Fig. 3, the text is first preprocessed by lawORdate; the temporal expressions can then be found more accurately by Añotador (once this is done, lawORdate returns the original legal references to the text). Finally, WhenTheFact detects the relevant events in the text (the current online implementation already includes Añotador in order to perform the full processing).
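
The last stage, turning detected events into a timeline, can be sketched in Python as follows. The `Event` record, its field names and the `build_timeline` function are hypothetical and only illustrate the kind of information mentioned above (what happened, when, who was involved and where); they are not WhenTheFact's data model.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Event:
    """Hypothetical event record; field names are illustrative only."""
    what: str
    when: Optional[date] = None   # normalized date, if one was found
    who: Optional[str] = None     # party involved
    where: Optional[str] = None   # place or jurisdiction


def build_timeline(events):
    """Order dated events chronologically; undated events go last."""
    dated = sorted((e for e in events if e.when), key=lambda e: e.when)
    undated = [e for e in events if e.when is None]
    return dated + undated
```

Keeping undated events at the end, rather than discarding them, is a design choice of this sketch: events without a normalized date remain available for later inspection.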

6 Conclusions

In this paper we presented the different tools created for processing temporal information from legal texts. The first service introduced, lawORdate, “cleans” the document of misleading legal references; then, Añotador tags and normalizes the temporal expressions in the text. WhenTheFact detects events and builds a timeline from them. The suite therefore covers a full processing from the temporal perspective.

WhenTheFact, however, is still a work in progress with room for improvement: it can only extract events from very specific types of texts whose structure is already known. Additionally, WhenTheFact builds a timeline, but we consider that further applications, such as event-based summarization or pattern recognition, would be useful for the legal domain. To facilitate these potential applications, representing temporal information in a standard, NLP-focused manner would be extremely helpful. For this reason, the next steps include defining this representation by gathering different ontologies and schemas that are already available. Additionally, WhenTheFact is currently being expanded to cover more languages and types of documents. All advances in these directions will be reflected on the TimeLex website [1].