From Laws and Decrees to a Legal Dictionary

Kourtin, Ismahane; Mbarki, Samir; Mouloudi, Abdelaaziz

doi:10.1007/978-3-030-92861-2_16


Part of the book series: Communications in Computer and Information Science (CCIS, volume 1520)

Included in the following conference series:

International Conference on Automatic Processing of Natural-Language Electronic Texts with NooJ


Abstract

The constantly growing mass of information in the legal field has created a pressing need to organize and structure the content of the available documents, and thus to transform them into an intelligent guide capable of providing complete and immediate answers to queries in natural language. A Question Answering System (QAS), an application of Natural Language Processing (NLP), responds to this need by offering mechanisms that provide adequate and precise answers to questions expressed in natural language. The general context of our work is the construction of an ontology-based legal question-answering system that allows users to ask for the desired information in natural language without having to browse through documents.

In this article, we focus on the construction of a legal dictionary from the texts of laws and decrees for the NooJ natural language processing platform. The legal dictionary that we propose to build brings together the terminological material that will serve as a linguistic resource for the automatic processing of users’ questions in natural language, in particular during the information extraction step, which is necessary for formulating SPARQL queries equivalent to users’ questions.


1 Introduction

Question-answering systems (QASs) offer different mechanisms to provide adequate and precise answers to questions expressed in natural language. This type of system allows a user to ask a question in natural language and receive a precise answer to the request, instead of a set of documents deemed relevant as in the case of search engines.

The first process in a QAS is to extract information from users’ questions expressed in natural language. One of the crucial steps in extracting information from texts is the recognition of named entities. The term “named entity” appeared during MUC-6, the sixth Message Understanding Conference [1]. Named entities are entities that have a determined designator (e.g. “EDF”, “Jules Verne”). They include proper names and expressions such as species names (e.g. “Bengal tiger”), diseases, or chemicals. This definition has also been extended to temporal expressions such as dates and times, and to numeric values (e.g. 2.3 g/l).

By legal entities, we mean named entities specific to the legal field, such as acts and facts. Detecting such entities requires resources that describe the domain vocabulary and/or a training corpus from which the characteristics common to these entities can be learned.

Our goal in this article is to build a legal dictionary that will be used for the automatic analysis of the users’ questions expressed in natural language in order to extract the information that is needed to formulate SPARQL queries equivalent to users’ questions.

The rest of this article is organized as follows. Sect. 2 presents related work on extracting terms from texts. Sect. 3 presents the legal field and its complexity. Sect. 4 describes the methodology used for the construction of the legal dictionary. Sect. 5 reports the results of an experiment in recognizing legal entities with our legal dictionary, and Sect. 6 concludes.

2 Extracting Terms from Texts

A term is an expression with a unique meaning for a particular domain [2]. In the legal field, for example, the words “tax service” form a term because together they have a unique meaning in this field.

Term extraction consists of identifying potential terms in a specific text or a set of texts (corpus) as well as the relevant information related to the use of these terms or to the concepts to which they refer (definition, context, etc.).

Extracting terms is an important step in building a dictionary from a corpus. Terms are words or expressions that have a precise meaning in a given context and that represent the linguistic supports of concepts. The problem of building resources is at the heart of terminological activity. Although the notion of “term”, which relies on that of concept and is often based on a particular act of reference, does not seem to lend itself to computer processing, a number of tools that aim to extract the terms of a corpus have emerged [3].

The definition of a term given above imposes strong constraints on the form and the functioning of terminological units. These constraints constitute the operating principles of the terminology extraction tools that have been developed in recent years. The objective of these tools is to automatically provide a more or less structured lexicon of the domain.

We can distinguish three types of approaches to automatic term extraction: (i) linguistic approaches, which use lists of named entities and manually written recognition patterns [4, 5]; (ii) statistical approaches, which are based on learning techniques applied to annotated texts [6, 7]; and (iii) hybrid approaches, which combine the first two [8, 9]. Table 1 gives a brief description of each approach.

Table 1. Approaches for the automatic term extraction


3 The Legal Field

The legal field is complex because of its terminology. Legal terms can be:

Terms with only a legal meaning;
Terms with at least one legal meaning and at least one non-legal meaning;
Terms designated by their synonyms in different texts;
Terms appearing in different morphological forms;
Non-synonymous terms with the same legal meaning.

In addition, there are different lexical forms that legal terms can take. Table 2 gives some examples of legal terms with their lexical form.

Table 2. Examples of legal terms with their lexical form


These examples show how diverse and open-ended the lexical forms of legal terms are. We find terms of the form “Noun”, “Noun-Adjective”, “Noun-Preposition-Noun”, etc. This lexical diversity makes it infeasible to extract legal terms automatically with lexical grammars alone.

To our knowledge, no resource describing legal terms has been developed for the legal field. We therefore decided to build a NooJ legal dictionary that describes legal terms and their categorization, to be used for the automatic analysis of users’ questions expressed in natural language with the NooJ natural language processing platform [15]. NooJ makes it possible to build, test and manage wide-coverage formal descriptions of natural languages in the form of electronic dictionaries and grammars.

4 The Legal Dictionary

In NooJ, the description of natural languages is formalized as electronic dictionaries and grammars represented by organized sets of graphs. NooJ dictionaries are used to represent, describe and recognize simple and compound words. Dictionaries are .nod files compiled from editable .dic source files.

Our goal is to build an electronic dictionary of legal terms for NooJ. A term is simple if it contains one word and compound if it contains more than one. A compound word is built from simple words. Silberztein [16] defines a compound noun as a consecutive sequence of at least two simple forms and blocks of separators. A simple form is a consecutive nonempty sequence of characters of the alphabet appearing between two separators. A simple word is a simple form that constitutes a dictionary entry.
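The definitions above can be illustrated with a minimal Python sketch. It is not NooJ's tokenizer: it assumes that a "character of the alphabet" is any Unicode letter and that everything else (spaces, apostrophes, hyphens, punctuation) is a separator. The function names `simple_forms` and `is_compound` are hypothetical.

```python
import re
import unicodedata

# Assumption: any Unicode letter belongs to the alphabet; all other
# characters (spaces, apostrophes, punctuation) act as separators.
SIMPLE_FORM = re.compile(r"[^\W\d_]+")


def simple_forms(text):
    """Return the simple forms of `text`, in order of appearance."""
    # NFC normalization keeps accented letters as single characters.
    return SIMPLE_FORM.findall(unicodedata.normalize("NFC", text))


def is_compound(term):
    """Treat a term as compound when it has at least two simple forms."""
    return len(simple_forms(term)) >= 2


print(simple_forms("acte d’acquisition définitif"))
print(is_compound("acte d’acquisition définitif"))
print(is_compound("société"))
```

Under these assumptions, the elided article “d” counts as a simple form of its own, because the apostrophe is treated as a separator.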

The legal dictionary that we propose to build from laws and decrees brings together the terminological material necessary for the automatic processing of legal texts, in particular during the stage that transforms users’ natural language questions into SPARQL queries in our question-answering system. We adopted a six-step methodological framework for the construction of the legal dictionary (see Fig. 1).

4.1 The Constitution of the Legal Corpus

In this step we have built up a legal corpus from laws and decrees. We focused our study initially on the general tax code of Morocco. The general tax code has 3 books (see Fig. 2).

The first book deals with the tax and recovery rules, and has 9 titles and 209 articles. Book 2 deals with the tax procedures and has 3 titles and 39 articles. Book 3 deals with other duties and taxes and has 5 titles and 40 articles.

We started with the first title of the first book of the general tax code, on “corporation tax” (see Fig. 3).

4.2 Extracting the Legal Entities

In this step we have manually analyzed the corpus and extracted the legal entities. We identified 679 legal entities.

4.3 Lemmatization of Legal Entities

We then lemmatized the extracted legal entities by reducing words bearing inflection marks (plural, conjugated form of a verb…) to their reference forms (lemma or canonical form).

For example, the legal entity “Personnes imposables” (Taxable persons) becomes “Personne imposable” (Taxable person).

4.4 Inflectional and Derivational Morphology

In this step, we established the inflected and derived forms of the legal entities using NooJ grammars. An extract is given in Fig. 4.

For example, the inflectional model “ACHAT” is defined by “ACHAT = <E>/m+s | <PW>s/m+p;” and means that a legal term that uses this inflectional model has two forms:

The term as it is: masculine singular
The term with an “s” at the end of the first word: masculine plural.
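The effect of this paradigm can be emulated in a few lines of Python. This is only a sketch of the behaviour described above (the lemma unchanged, or an “s” added at the end of the first word); it does not use or reproduce NooJ's inflection engine, and the function name `inflect_achat` is hypothetical.

```python
def inflect_achat(lemma):
    """Emulate the two alternatives of the ACHAT paradigm as described in the text.

    <E>/m+s    -> the lemma unchanged, tagged masculine singular
    <PW>s/m+p  -> "s" appended to the first word, tagged masculine plural
    """
    # Split off the first word; `sep` and `rest` are empty for one-word terms.
    first, sep, rest = lemma.partition(" ")
    return [
        (lemma, "m+s"),
        (first + "s" + sep + rest, "m+p"),
    ]


for form, features in inflect_achat("acte d’acquisition définitif"):
    print(form, features)
```

For a one-word lemma such as “achat”, the same function yields “achat” and “achats”.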

4.5 Conceptualization

After establishing the list of legal entities, we grouped these entities into semantic classes by defining a list of concepts. We established 42 concepts. Table 3 gives some of these legal concepts with their descriptions and example terms.

Table 3. Examples of legal concepts


4.6 The Construction of the Legal Dictionary

Finally, we structured the legal terms by building an electronic dictionary of legal terms. The electronic dictionary was developed with NooJ [17,18,19] and has 679 entries. An extract is given in Fig. 5.

For example, for the dictionary entry “acte d’acquisition définitif”:

acte d’acquisition définitif,NC+TJ+ACTE+FLX=ACHAT

acte d’acquisition définitif: the legal entity
NC+TJ: the categories, compound noun (NC) and legal term (TJ)
+ACTE: the semantic class “ACTE”
+FLX=ACHAT: the inflectional model “ACHAT”

The inflectional model “ACHAT” is defined by “ACHAT = <E>/m+s | <PW>s/m+p;” which means that the legal term has two inflected forms:

acte d’acquisition définitif: masculine singular
actes d’acquisition définitif: masculine plural.
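To make the structure of such an entry concrete, the following Python sketch splits one entry line into its lemma, category, features and inflectional model. It is an illustration only, not NooJ's dictionary compiler: the function name `parse_entry` and the returned dictionary layout are hypothetical, and lemmas that themselves contain a comma are not handled.

```python
def parse_entry(line):
    """Split 'lemma,CAT+FEAT+...+FLX=MODEL' into its parts."""
    # Tolerate surrounding whitespace and a sentence-final period.
    lemma, _, info = line.strip().rstrip(".").partition(",")
    category, *features = [part.strip() for part in info.split("+")]
    entry = {"lemma": lemma.strip(), "category": category,
             "features": [], "flx": None}
    for feature in features:
        name, eq, value = feature.partition("=")
        if eq and name.strip() == "FLX":
            # FLX names the inflectional paradigm, e.g. ACHAT.
            entry["flx"] = value.strip()
        else:
            entry["features"].append(feature)
    return entry


entry = parse_entry("acte d’acquisition définitif,NC+TJ+ACTE+FLX=ACHAT")
print(entry["lemma"])
print(entry["category"], entry["features"], entry["flx"])
```

Keeping the inflectional model separate from the other features makes it straightforward to pair each entry with a paradigm when generating its inflected forms.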

5 Experimentation

The NooJ legal dictionary that we have developed can annotate and recognize legal entities in natural language text. In particular, it makes it possible to analyze natural language questions automatically and to recognize the legal terms they contain, using the NooJ platform.

Figure 6 shows the result of annotating the French question “Quelles sont les sociétés qui sont passibles de l'impôt sur les sociétés?” (Which companies are liable to corporation tax?) with the NooJ legal dictionary that we built. The annotation shows that the term “société” (company) was identified as a noun and legal term, masculine plural, of semantic class “COMPANY”, and that the term “passibles de l’impôt sur les sociétés” (liable to corporation tax) was identified as a noun and legal term, masculine plural, of semantic class “STATE”.
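The idea of dictionary-based annotation can be sketched in Python with a greedy longest-match lookup. This is a simplification under stated assumptions and not NooJ's annotation engine: the lexicon below is a hypothetical hand-written list of inflected forms covering only the two terms discussed above, with the semantic classes reported in the text, and the greedy strategy keeps a single analysis per span.

```python
import re
import unicodedata


def normalize(text):
    """Lower-case, unify apostrophes and apply NFC normalization."""
    return unicodedata.normalize("NFC", text).lower().replace("’", "'")


# Hypothetical lexicon: inflected form -> (lemma, semantic class).
RAW_LEXICON = {
    "société": ("société", "COMPANY"),
    "sociétés": ("société", "COMPANY"),
    "passible de l'impôt sur les sociétés":
        ("passible de l'impôt sur les sociétés", "STATE"),
    "passibles de l'impôt sur les sociétés":
        ("passible de l'impôt sur les sociétés", "STATE"),
}
LEXICON = {normalize(form): tags for form, tags in RAW_LEXICON.items()}


def annotate(question, lexicon=LEXICON, max_len=8):
    """Return (matched form, (lemma, class)) pairs, longest match first."""
    tokens = re.findall(r"[^\s?.,;:!]+", normalize(question))
    annotations, i = [], 0
    while i < len(tokens):
        # Try the longest candidate starting at token i, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in lexicon:
                annotations.append((candidate, lexicon[candidate]))
                i += n
                break
        else:
            i += 1  # no dictionary entry starts at this token
    return annotations


question = ("Quelles sont les sociétés qui sont passibles "
            "de l'impôt sur les sociétés?")
for form, (lemma, sem_class) in annotate(question):
    print(form, "->", lemma, sem_class)
```

Because the lookup is greedy, the occurrence of “sociétés” inside the longer term is not annotated separately in this sketch.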

6 Conclusion

In this work we have developed an electronic NooJ dictionary that annotates and recognizes legal terms in natural language texts. We adopted a six-step methodological framework for the construction of the legal dictionary: (1) we constituted a legal corpus of laws and decrees, focusing on the first title of the first book of the general tax code, on “corporation tax”; (2) we manually analyzed the corpus and extracted 679 legal entities; (3) we lemmatized the extracted legal entities by reducing words bearing inflection marks (plural, conjugated form of a verb…) to their reference forms; (4) we built grammars describing the inflectional and derivational morphology of the legal entities; (5) we grouped the legal entities into semantic classes by establishing 42 concepts; (6) we structured the legal entities by building a NooJ electronic legal dictionary capable of annotating and identifying legal terms in natural language texts.

As future work, we will integrate the legal dictionary into our question-answering system and use it in the automatic processing of users’ natural language questions, whose objective is to extract the information necessary for formulating SPARQL queries equivalent to those questions.

References

1. Grishman, R., Sundheim, B.: Message understanding conference 6 - a brief history. In: Proceedings of COLING, Copenhagen, Denmark, pp. 466–471 (1996)
2. Azé, J., Heitz, T.: Cours sur la Fouille de textes et Apprentissage (2004). http://www.lri.fr/~aze/enseignements.php
3. Piwowarski, B.: Techniques d'apprentissage pour le traitement d'informations structurées: Application à la recherche d'information, Doctoral thesis, University of Paris 6 (2003)
4. Poibeau, T.: Le repérage des entités nommées, un enjeu pour les systèmes de veille. In: Terminologies Nouvelles (actes du colloque Terminologie et Intelligence Artificielle, TIA’99, Nantes), no. 19, pp. 43–51 (1999)
5. Elkateb-Gara, F.: Extraction d’entités nommées pour la recherche d’informations précises. In: 4e Congrès ISKO France, Grenoble (2003)
6. McCallum, A., Li, W.: Early results for named entity recognition with conditional random fields, features induction and web-enhanced lexicons. In: Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL (2003)
7. Raymond, C., Wei, W.: Named entity recognition using hybrid machine learning approach. In: IEEE ICCI, pp. 578–583 (2006)
8. Kosseim, L., Poibeau, T.: Extraction de noms propres à partir de textes variés: problématique et enjeux. In: TALN 2001, pp. 365–371 (2001)
9. Fourour, N.: Nemesis: un système de reconnaissance incrémentielle des entités nommées pour le français. In: TALN 2002, pp. 255–264 (2002)
10. Malaisé, V.: Méthodologie linguistique et terminologique pour la structuration d’ontologies différentielles à partir de corpus textuels, Doctoral thesis, University of Paris 7 – Denis Diderot, France (2005)
11. Drouin, P.: Acquisition automatique des termes: l’utilisation des pivots lexicaux spécialisés, Doctoral thesis, University of Montreal (2002)
12. Lebart, L., Salem, A.: Analyse statistique des données textuelles. Dunod, Bordas, Paris (1988)
13. Velardi, P., Missikoff, M., Fabriani, P.: Using text processing techniques to automatically enrich a domain ontology. In: Proceedings of ACM-FOIS (2001)
14. L’Homme, M.-C.: Nouvelles technologies et recherche terminologique. Techniques d’extraction des données terminologiques et leur impact sur le travail du terminographe. In: L’impact des nouvelles technologies sur la gestion terminologique, Toronto (2001)
15. Silberztein, M.: NooJ manual (2006)
16. Silberztein, M.: Le dictionnaire électronique des mots composés. In: Langue Française, no. 87 (September 1990)
17. Aoughlis, F.: A computer science electronic dictionary for NOOJ. In: Kedad, Z., Lammari, N., Métais, E., Meziane, F., Rezgui, Y. (eds.) NLDB 2007. LNCS, vol. 4592, pp. 341–351. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-73351-5_30
18. Aoughlis, F.: Construction d’un dictionnaire électronique de terminologie informatique et analyse automatique de textes par grammaires locales. Thesis, Université Mouloud Mammeri, Tizi Ouzou (2010)
19. Hildebert, J.: Dictionnaire des technologies de l’informatique, vol. 2, Français/Anglais. La maison du dictionnaire (Paris), Hippocrene Books Inc., New York (1998)


Author information

Authors and Affiliations

ELLIADD Laboratory, Bourgogne-Franche-Comté University, Besançon, France
Ismahane Kourtin
MISC Laboratory, Faculty of Science, Ibn Tofail University, Kenitra, Morocco
Ismahane Kourtin, Samir Mbarki & Abdelaaziz Mouloudi


Editor information

Editors and Affiliations

Université de Franche-Comté, Besançon, France
Magali Bigey
Université de Franche-Comté, Besançon, France
Annabel Richeton
Université de Franche-Comté, Besançon, France
Max Silberztein
Université de Franche-Comté, Besançon, France
Izabella Thomas


Kourtin, I., Mbarki, S., Mouloudi, A. (2021). From Laws and Decrees to a Legal Dictionary. In: Bigey, M., Richeton, A., Silberztein, M., Thomas, I. (eds) Formalizing Natural Languages: Applications to Natural Language Processing and Digital Humanities. NooJ 2021. Communications in Computer and Information Science, vol 1520. Springer, Cham. https://doi.org/10.1007/978-3-030-92861-2_16


DOI: https://doi.org/10.1007/978-3-030-92861-2_16
Published: 03 January 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-92860-5
Online ISBN: 978-3-030-92861-2
eBook Packages: Computer Science, Computer Science (R0)
