1 Introduction

According to the Strategy of scientific and technological development of the Russian Federation, among the highest priorities are “transition towards leading digital and intelligent production technologies, robotic systems, new materials and construction methods, creation of systems for big data processing, machine learning and artificial intelligence”, “capability for the Russian society to respond to major challenges involving interaction of human and nature, human and technologies, social institutions on the current stage of the global development, particularly through application of methods from the humanities” [1, paragraph 20].

The project activities department of the Russian Government has prepared proposals on the digitalization of law-making and law enforcement to update the somewhat outdated legislation, which should help improve the business and social environments as well as quality-of-life indexes. The issues of automated law-making based on artificial intelligence (AI) are being actively discussed, since this direction of development was set out in the “Digital Economy” program approved by the government in July 2017. An action plan for legal regulation in the digitalized economy is also being devised, which includes the development of a machine-readable language for law-making and the application of AI to the analysis of legal acts.

Today’s legal reality, shaped by society’s development needs, requires holistic legal regulation of social relations and, thus, consistent and unified law-making. Besides, it is necessary to follow all the principles and rules of legal technique, which are binding for all law-makers: stability of legislation, advisability, timeliness, systematic analysis, completeness and concreteness of regulation, and unambiguous interpretation of legal acts by all subjects of legal relations [2, p. 7], including the judicial authorities. At the same time, as noted by some researchers, a persisting and important problem in legal regulation is a certain skewness in the legislation and the lack of a systematic vision for its development [3]. Ignorance of and non-compliance with legal technique inevitably cause errors in law-making that subsequently lead to imbalance in legislation. The harmony of legislation must be ensured by rigorous application of the technique in the adoption of every document, yet imperfections of law-making are widely noted by both law theorists [4] and practitioners from various branches of the law [5]. Following many other researchers, lawyers and even politicians, we argue that legal expert systems could significantly advance the digitalization of law-making and law enforcement.

International legal practice has known software expert systems for solving individual tasks since the 1970s: more than 25 research projects on AI application in law have been carried out in the USA, Germany and Great Britain. JUDITH (1975), developed by Heidelberg and Darmstadt Universities, was one of the first legal expert systems and allowed lawyers to obtain expert opinions on civil cases. The knowledge base of the system contained prerequisites and executive files indicating the relationships between the prerequisites; JUDITH could also be used for studying legal reasoning [6]. Another example is the Shyster system for consulting in case-based law [6]. There is ongoing work on the creation [7] and integration [8] of legal ontologies and the so-called Legal Knowledge Based Systems that use them, e.g. for classification of cases based on natural language processing [9] or for structuring the selection of legal documents [10]. The “legal tech” industry in the USA and Europe is currently booming, and “robotized” legal services are often cheaper and more efficient than the ones provided by humans. The exponential growth in popularity of new services is already reshaping the industry, and Deloitte forecasts that in Great Britain alone computer algorithms will replace 114 thousand lawyers (39% of their total number) in the next 20 years.

However, since semantic analysis technologies are language-dependent and legislation differs greatly between countries [11], the results and products from one country generally cannot be directly employed in another, while the approaches and frameworks [12] can only be partially useful. In Russia, the most widely used legal systems are Consultant Plus and Garant, which are essentially legal assistance systems capable of finding relevant texts by keywords, thus allowing selection of legal materials. Expert legal systems employed in forensic science and investigations are also reasonably popular, but their functionality is understandably limited. Most expert systems used in Russia belong to the management or technological domains, so current practice suggests that the country is still quite far from introducing computational law [13].

The effectiveness of legal acts is the result of their enforcement; it testifies to the ability of legal norms to resolve the corresponding social and legal issues, taking into account the resources spent on the enforcement. The effectiveness is evaluated during legal monitoring, which also needs to produce recommendations on how to implement the norms and make legal decisions more efficiently. One recognized way to increase this organizational quality is the implementation and use of legal ontology-based intelligent systems. Developing such a system generally involves applying special juridical methods, constructing an ontology for automated processing and mining of natural-language (or, rather, legal-language) texts, choosing and applying vectorization models, semantic analysis of the texts with classical and heuristic algorithms, as well as general AI and machine learning methods such as artificial neural networks, regression analysis, etc.

In our current work we focus on the development of legal ontologies specific to the tasks of legal text mining and indexing. The remainder of the paper is structured as follows. In Sect. 2 we describe methods and tools in knowledge engineering that are relevant to the problem and briefly justify the use of OWL as the knowledge representation format. In Sect. 3 we analyze housing legislation with respect to the relations between lexical and judicial terms and then describe the implementation of a demo OWL ontology module in the popular Protégé editor. In the Conclusions we summarize our contribution and outline directions for further work.

2 Methods and Tools

A formalized approach should be able to identify and overcome flaws in law-making and law enforcement, since legislation is the basis for law enforcement, which in turn is the operationalization of the legal acts in force. One possible way to increase the effectiveness of legal regulation is the monitoring of law enforcement [14], which is generally assessed positively by researchers [15, 16]. Legal monitoring is the systematic activity of the responsible governmental bodies, the research community, and social institutions and organizations in evaluating, analyzing, generalizing and forecasting the status of legislation and enforcement practice. The monitoring of law enforcement, particularly of legal proceedings, makes it possible to assess the effectiveness of various legal acts and selected legal norms, to forecast the future state of social relations (both those regulated by the laws and those outside the legal system), and to make proposals on the systematic improvement of laws and regulation, i.e. on correcting the legal norms and the law enforcement practice [17].

We presume that the main direction for solving the above problems should be the analysis of pending legal acts and of those in force in the following aspects:

Identification of Legal Acts Regulating the Respective Social Relations for Introducing Coordinated Modifications.

The high intensity of law-making at various levels often causes the modifications to be fragmented and inconsistent, as the novelties lack a unified logic. Some of the errors made by the law-makers, as well as the impossibility of incorporating the new law into the legislation, are only discovered during law enforcement. Pre-emptive measures are currently lacking, but it is highly desirable to identify all the related legal acts as early as the development stage of the new law, in order to make coordinated changes in all of them.

Identification of Gaps in the Legislation.

Gaps in legislation are unavoidable, but a high number of them inevitably implies a low level of legal regulation of social relations. The causes of the gaps include both objective factors, as legislation always lags behind real life, which brings about new forms of social relations, and subjective ones, related to the low quality of law-making. The legal literature contains a considerable amount of research on individual gaps and the ways of overcoming them [18, 19], but there is a lack of integral solutions that would allow avoiding the gaps at the development stage of a legal act.

Identification of Collisions in the Legislation.

Legal science knows the concept of legal collisions and their types [20], but, just as for the gaps, no pre-emptive technologies exist that could effectively detect the collisions. The developers of laws identify them “manually”, generally relying on legal reference systems.

Unification of the Conceptual Apparatus Employed in the Legal Acts [21].

Ambiguity in lexical terms is always a serious challenge in a domain as complex as the legal system. The growing specialization of knowledge increases the number of scientific and technological terms, special expressions, etc. To attain uniformity of legal terms, it is necessary to use the same terms consistently throughout the normative texts [22]. Meanwhile, attempts to unify the legal terms, such as the one undertaken by the Russian Legal Academy of the Ministry of Justice with regard to the laws developed by the federal executive bodies [23, p. 56], have not so far gained wide application and have not resolved the problem.

Anti-corruption Inspection of the Legal Acts Drafts.

According to the current legislation, the anti-corruption inspection is carried out at the federal, regional and municipal levels. The legal foundation for the inspection consists of laws and by-laws [24, 25], and the legal literature contains a considerable number of works dedicated to the legal aspects of such inspection [26, 27]. However, its AI-based automation has drawn researchers’ attention only in the last few years, and there is a clear lack of effective solutions in this field.

2.1 Technologies for Knowledge Representation and Reasoning

Semantic analysis, which is the prime technology for extracting meaning from texts, generally involves indexing, i.e. describing the text with a set of special terms extracted from the text or taken from a constrained (controlled) dictionary. Information retrieval systems perform indexing, formulation of the query (the user’s information need specified in a language understood by the system), and comparison of the query with the available (indexed) information. The most widely used technologies for creating indexes are [28]:

  1. Bag of words: a set of unrelated terms (sometimes also called tags) that describe a certain object or information resource. Bag-of-words indexing/classification is currently widely applied to multidimensional objects such as audio, video or images, in data stores and recommender systems.

  2. Taxonomy: a hierarchy of categories formed by the terms describing a domain. A taxonomy generally has high clarity and is easy to comprehend because only one semantic relation, “parent – child”, is used in such a representation. However, this choice of relation imposes certain limitations.

  3. Thesaurus: a collection of terms and word combinations grouped into units named concepts. They are organized either hierarchically or with semantic (associative) relations. The chosen relations form a pre-defined and fixed set which generally includes such relations as “parent – child”, “part – whole”, “cause – effect”, or linguistic relationships.

  4. Ontology: a formal description of a domain (field of knowledge) in terms of concepts (classes), their attributes, the relations between them, application axioms, and constraints. Ontologies are thus more flexible than thesauri, since an ontology can include any kind of semantic relation, and at the same time they permit a more detailed domain specification thanks to attributes, constraints, axioms, etc.
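The difference between the four index types listed above can be made concrete with a small sketch. The Python data below is purely illustrative: the terms are borrowed from the housing example discussed later in the paper, while the variable names, the data layout and the helper function are our assumptions and not part of any cited system.

```python
# Illustrative toy data only; the layout is an assumption, not a standard format.

# 1. Bag of words: an unordered set of unrelated tags.
bag_of_words = {"premises", "house", "housing", "living premises"}

# 2. Taxonomy: a single "parent - child" relation, stored as child -> parent.
taxonomy = {
    "living premises": "premises",
    "non-living premises": "premises",
}

# 3. Thesaurus: concepts linked by a fixed, pre-defined set of relation types.
#    Each entry is (relation, first term, second term), read in the order
#    given by the relation name (e.g. part first, whole second).
THESAURUS_RELATIONS = {"parent-child", "part-whole", "cause-effect"}
thesaurus = [
    ("parent-child", "premises", "living premises"),
    ("part-whole", "living premises", "housing stock"),
]

# 4. Ontology: arbitrary relations plus attributes and constraints.
ontology = {
    "classes": {
        "premises": {"attributes": ["real estate object", "seclusion"]},
        "living premises": {
            "subclass_of": "premises",
            "attributes": ["fitness for permanent living"],
        },
        "non-living premises": {"subclass_of": "premises"},
    },
    "relations": [("housing", "lexical_synonym_of", "living premises")],
    "constraints": [("living premises", "disjoint_with", "non-living premises")],
}


def ancestors(term, child_to_parent):
    """Walk the single parent-child relation from a term up to the root."""
    chain = []
    while term in child_to_parent:
        term = child_to_parent[term]
        chain.append(term)
    return chain
```

Only the last structure has room for attributes, free-form relations and constraints, which is the flexibility the comparison above attributes to ontologies.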

Since the 1990s, ontologies have been applied in Information Science for knowledge representation, management and integration; they are the key element of the Semantic Web concept. Currently, the following types of ontologies are distinguished by purpose: upper ontologies, domain-specific ontologies, and task-specific ontologies. Upper ontologies aim to describe universal knowledge or codify the use of language (e.g., an ontology specification language); perhaps one of the most prominent examples is CYC, a common-sense knowledge ontology. The scope of a domain-specific ontology is a certain domain of knowledge (e.g. LKIF for legislation [29]), while task-specific ontologies are even more concrete and are generally built for a particular application. Another important advantage of ontologies, which have been in wide use since the 2000s, when several ontology libraries were created, is the relative ease of integration. That is, a domain ontology can be developed on the basis of the concepts available in an existing upper ontology, even though finding a relevant ontology for integration or reuse [8] may involve additional research work.

So, the generally recognized benefits from using ontologies include:

  • joint usage of common information structure by people and software agents;

  • ability to re-use domain knowledge specified in ontologies;

  • specification of explicit assumptions in the domain;

  • possibility to formally analyze the domain knowledge.

All the above aspects are relevant for legal analysis, studies of juridical processes, legal acts development and decision-making. In the current work we will focus on ontology integration and analysis of the legal terms application, including improper usage, duplication, etc.

2.2 LKIF Core Ontology

In the quest for specifying the conceptualization in the legal domain, existing ontologies do provide some basis for reuse at the upper levels. However, the details differ at the domain level because of differences between national legislations and legal systems, such as Anglo-Saxon vs. continental law. Task-specific ontologies, such as the ones for legal text mining and indexing, generally have to be created anew for different languages and nations.

Let us illustrate this using the Legal Knowledge Interchange Format (LKIF) ontology [29], which was created for the “translation of existing legal knowledge bases to other representation formats” and “has a firm grounding in commonsense”. The authors seek to introduce an architecture for developing legal knowledge systems and to facilitate the exchange of knowledge between the existing systems. The overall structure of LKIF Core is presented in Fig. 1.

Fig. 1. Structure of LKIF (Core modules) [29].

Some modules need virtually no changes across nations and legal systems (example in Fig. 2). Other parts of the ontology under integration have to be closely examined for necessary changes and adaptations to national legislation (example in Fig. 3). An important technical issue in ontology integration is the knowledge representation format. During the last decade, OWL (Web Ontology Language), recommended by the W3C [30], has become the de facto standard, even though certain modifications of it do exist.

Fig. 2. Actions, agents and organizations in LKIF [29].

Fig. 3. Qualifications and norms in LKIF [29].

2.3 OWL Ontologies and Protégé Editor

The general requirements towards ontology languages include [31]:

  1. a well-defined syntax,

  2. a well-defined semantics,

  3. efficient reasoning support,

  4. sufficient expressive power,

  5. convenience of expression.

To address these requirements, some of which can actually be incompatible, the W3C’s Web Ontology Working Group devised the Web Ontology Language (OWL), whose major family members include OWL Full, OWL DL, and OWL Lite. OWL has higher expressivity than many other knowledge representation formats, and the relations between classes in an OWL ontology can be formally modeled on the basis of description logics. Basically, it adds semantics to a data schema and allows specification of many details about classes (concepts) and their properties. Another important advantage of OWL with respect to ontology integration is that it can be serialized in many syntaxes and notations (RDF/XML, Turtle, etc.) and even stored in more traditional relational databases [32].
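To illustrate what serialization in several notations means in practice, the following minimal sketch writes the same three statements first in Turtle and then in N-Triples, another W3C line-based RDF syntax. The `ex:` namespace and the class names are hypothetical placeholders, and the hand-written serializers are a simplification for illustration; in real work a dedicated RDF library or an ontology editor would produce these files.

```python
# Minimal sketch: one set of triples, two textual serializations.
# The "ex" namespace and class names are hypothetical placeholders.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

PREFIXES = {
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "ex": "http://example.org/housing#",
}

# "a" is the Turtle shorthand for rdf:type.
TRIPLES = [
    ("ex:Premises", "a", "owl:Class"),
    ("ex:LivingPremises", "a", "owl:Class"),
    ("ex:LivingPremises", "rdfs:subClassOf", "ex:Premises"),
]


def to_turtle(prefixes, triples):
    """Serialize with prefix declarations and compact prefixed names."""
    lines = [f"@prefix {name}: <{iri}> ." for name, iri in prefixes.items()]
    lines.append("")
    lines += [f"{s} {p} {o} ." for s, p, o in triples]
    return "\n".join(lines)


def expand(term, prefixes):
    """Turn a prefixed name (or the 'a' shorthand) into a full IRI reference."""
    if term == "a":
        return f"<{RDF_TYPE}>"
    prefix, local = term.split(":", 1)
    return f"<{prefixes[prefix]}{local}>"


def to_ntriples(prefixes, triples):
    """Serialize with one fully expanded statement per line."""
    return "\n".join(
        " ".join(expand(t, prefixes) for t in triple) + " ." for triple in triples
    )


turtle_text = to_turtle(PREFIXES, TRIPLES)
ntriples_text = to_ntriples(PREFIXES, TRIPLES)
```

Both texts carry exactly the same statements; only the notation differs, which is what makes exchange between tools and ontology integration comparatively easy.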

A popular tool for creating and managing OWL ontologies is the free and open-source Protégé-OWL editor developed by Stanford University’s Center for Biomedical Informatics Research (https://protege.stanford.edu/). It provides a graphical user interface and a framework for adding extra components (e.g. for graphical representation of the ontology), supports names in different languages and encodings, allows export of OWL ontologies to several formats, and supports modular ontologies.

In the next section we first analyze a branch of law with respect to the relations between lexical and judicial terms and then describe the formal conceptualization as a demo OWL ontology in the Protégé editor.

3 Implementation

Development of a domain ontology involves the following principal steps:

  1. Defining the purpose and scope of the ontology. Legal ontologies are the basis for legal intelligent systems that perform the tasks we outlined previously. In the current paper, we develop an example ontology for a domain of limited scope: the housing legislation.

  2. Considering existing ontologies. As mentioned before, existing upper-level legal ontologies can and should be used in the undertaking. However, the particulars of national legislation generally make it impossible to re-use ontologies that conceptualize the legislation of other countries in different languages.

  3. Listing the relevant terms (concepts) and structuring them. In our work, the list of relevant terms and the relations between them is provided by domain experts (practicing lawyers), who extracted them from existing legal documents. The structuring is mainly performed with a top-down approach.

  4. Specifying properties, allowed values (facets) and individuals. Again, in our work these are specified on the basis of information obtained from domain experts and the legal documents in force. Based on the goals of the ontology, we also introduce some semantic relations between the concepts.

3.1 The Housing Legislation

As an implementation example, in our current work we focus on housing legislation with respect to the following most urgent issues:

  1. Identification of legal acts that contain the conceptual apparatus under analysis and can be subject to the modifications.

  2. Identification of various types of collisions in the legislation.

  3. Unification of the conceptual apparatus used in the legal acts.

The research was carried out on the basic concepts of housing legislation in order to assess the uniformity of their application in the normative legal acts of the constitutional, civil, housing, criminal and other branches of legislation, as well as the possibility of identifying the distinguishing features of the concepts under study. Together with the domain experts, we built an ontology “module” with the terms that relate to the problems identified and classified in the legislation, such as “premises” (помещение), “house” (жилище), “living premises” (жилое помещение) and “housing” (жильё). “Living premises” is the foundational concept in housing legislation and is a kind of “premises”. At the same time, Russian legislation in various domains also uses the terms “house” and “housing”, which do not always carry a definite meaning. They are often used hierarchically and ambiguously, which contradicts the requirements of legal technique (resulting in collisions) and decreases both the efficiency and the effectiveness of law-making and law enforcement. The term “living premises” is used in a very large number of legal acts: in 700 federal laws alone. However, some of these acts also contain, in a rather chaotic manner, the terms “house” and “housing”, which, unlike “living premises”, are never explained in legislation or doctrine. Sometimes they are used as synonyms, but sometimes they contradict each other and the term “living premises” and alter the meaning of a legal regulation. According to the rules of logic, “living premises” must be a subset of “premises”, but even this is not always respected. To make matters worse, the existing legal search systems (Consultant Plus and Garant) do not distinguish between the terms, and a search for “house” returns all the legal acts containing terms with the same root.
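The effect of root-based matching described above can be reproduced with a toy sketch. The three text fragments are invented for illustration (they are not quotations from real acts), and the two search functions are our simplified assumptions about how a root-based and a concept-based lookup differ; they do not describe the internals of Consultant Plus or Garant.

```python
# Invented fragments, one per term: "жилище" (house),
# "жилое помещение" (living premises), "жильём" (a case form of housing).
ACTS = {
    "act_A": "право на жилище",
    "act_B": "жилое помещение предназначено для проживания",
    "act_C": "обеспечение граждан жильём",
}


def search_by_root(root, acts):
    """Return acts containing any word that starts with the given root."""
    return sorted(
        act_id
        for act_id, text in acts.items()
        if any(word.startswith(root) for word in text.lower().split())
    )


def search_by_term(term, acts):
    """Return only the acts that contain the exact term."""
    return sorted(act_id for act_id, text in acts.items() if term in text.lower())


root_hits = search_by_root("жил", ACTS)
term_hits = search_by_term("жилое помещение", ACTS)
```

In this toy setting the root query matches all three fragments, whereas the exact-term query matches only the one that actually uses “living premises”. A term-aware index built on an ontology is meant to give the second kind of precision without losing inflected forms of the same term.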

Such an approach is not just methodologically incorrect and a complication for the application of legal acts, but also outright dangerous. According to the rules of logic, each concept has its own meaning that is expressed through a certain term, i.e. a word or expression that denotes the concept unambiguously in all scientific fields. The reasoning behind the existing search engines is understandable: the search is performed on the basis of associations (Fig. 1), and the above terms share both the same root and the same association with a place to live. In legal search (which in this case offers virtually no means of refining the query), these additional results merely create a need for extra information processing, a mere inconvenience; but in the development of legal acts they may lead (and, as the analysis of legislation and law enforcement suggests, actually do lead) to major errors. Consequently, accurate implementation of the rules prescribed by legal technique is essential in such a situation.

Currently, the concept best detailed in Russian legislation and doctrine is “living premises”, derived from the doctrinal concept “premises”. The following are the attributes of the “premises”:

  1. Being a real estate object;

  2. Seclusion – i.e. boundedness by a 3D perimeter, existence of a separate entrance.

Being a kind of premises, “living premises” inherit these attributes and also have their own qualifying attribute: fitness for permanent living. An additional qualifying attribute is the division into the living area and the auxiliary area (of support facilities). In the absence of the attributes that qualify a premise as a living premise, it is considered non-living.
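The qualification rule just described can be written down as a small decision procedure. The sketch below is a simplification under stated assumptions: the record type and field names are hypothetical, the additional attribute (division into living and auxiliary areas) is omitted for brevity, and the attribute values assigned to the sample objects are illustrative rather than legal findings.

```python
from dataclasses import dataclass


@dataclass
class CandidateObject:
    """Hypothetical record describing an object to be qualified."""
    name: str
    is_real_estate: bool
    is_secluded: bool  # bounded 3D perimeter with a separate entrance
    fit_for_permanent_living: bool = False


def qualify(obj):
    """Apply the attributes of 'premises', then the qualifying attribute."""
    if not (obj.is_real_estate and obj.is_secluded):
        return "not a premise"
    if obj.fit_for_permanent_living:
        return "living premises"
    return "non-living premises"


# Illustrative attribute values; they are assumptions, not legal conclusions.
apartment = CandidateObject("apartment", True, True, True)
office = CandidateObject("advocate's office", True, True, False)
trailer = CandidateObject("trailer", False, True, False)
```

Under these assumed values, `qualify` places the apartment in “living premises”, the office in “non-living premises”, and the trailer outside “premises” altogether, mirroring the three groups of objects discussed in the analysis.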

The analysis of Russian legislation leads to the conclusion that, by using the term “house”, the law-maker may mean a living premise, a non-living premise, or an object that does not possess the attributes of a premise at all. The same trend can be seen in international legal acts, including the practice of the European Court of Human Rights [33]. Examples of objects identified as a “house” are an advocate’s office or a trailer [34]. The problem would not be particularly severe if each concept were qualified using the attributes stated above. Thus, if the law-maker intended to specify the object for the needs of criminal law, as one whose intrusion is punishable, then it was unacceptable to borrow inter-branch terms with accepted doctrinal and legal attributes without considering their relations. With the accepted classification, “house” should have been defined as “living premises, as well as other kind of building or premise used for temporary living, and a house where a citizen lives”. Such a wording would avoid the artificial mix-up of the elements of the conceptual apparatus.

Another term with the same root is “housing”, which is currently understood in the following five meanings:

  1. A specific living premise;

  2. A set of living premises (housing stock);

  3. Objects from a group of premises;

  4. Objects lacking the attributes of premises (sometimes very exotic ones, such as a boat cabin or a train conductor’s compartment);

  5. Residential area.

If we analyze the relationships between them, we may note that the first four meanings duplicate other existing legal concepts. Therefore, it is inappropriate to use the following pairs:

  • “housing” – “living premises”: since the attributes of the latter are well defined, the terms are synonyms;

  • “housing” – “housing stock”: since the concept and the types of housing stock are defined in art. 19 of the Housing Code of the Russian Federation [35], these are also synonyms;

  • “housing” – “non-living premises”: there is no associative relation and there are no common attributes; the premise is not suited for living, although a citizen may reside there temporarily, e.g. when staying in a hotel;

  • “housing” – “objects lacking any attributes of a premise”: again, there is no associative relation and there are no common attributes.

Consequently, with respect to formal legal language and the conceptual identity rule, the only acceptable meaning is the fifth one: “housing” is the residential area.

3.2 The Housing Legislation Ontology Module

To operationalize the results of the above analysis of the housing legislation, we implemented the Housing Legislation legal ontology module in the Protégé-OWL editor (version 5.2). The classes shown in Fig. 4 are the concepts (terms) covered by the analysis, initially with hierarchical relations only between “premises” and “living premises” and between “premises” and “non-living premises”. The “Disjoint With” relation (available by default in Protégé-OWL) was specified between the “living premises” and “non-living premises” classes. The corresponding individuals (instances of the classes) mentioned in the analysis are presented in Fig. 5.
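The meaning of the class hierarchy and the disjointness axiom can be illustrated outside the editor with a short sketch. This is not the module built in Protégé and not an OWL reasoner: the class identifiers are hypothetical, and the check below covers only the two relations named above.

```python
# Hypothetical identifiers mirroring the classes named in the text.
SUBCLASS_OF = {
    "LivingPremises": "Premises",
    "NonLivingPremises": "Premises",
}
DISJOINT = [frozenset({"LivingPremises", "NonLivingPremises"})]


def superclasses(cls):
    """Collect all ancestors of a class along the subclass relation."""
    result = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        result.append(cls)
    return result


def violates_disjointness(asserted_types):
    """True if an individual would belong to two classes declared disjoint."""
    inferred = set()
    for cls in asserted_types:
        inferred.add(cls)
        inferred.update(superclasses(cls))
    return any(pair <= inferred for pair in DISJOINT)
```

For example, an individual typed only as `LivingPremises` is also inferred to be `Premises` and passes the check, whereas an individual asserted to be both `LivingPremises` and `NonLivingPremises` is flagged. A description-logic reasoner attached to the ontology editor would report the same kind of inconsistency.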

Fig. 4. Classes in the Housing Legislation ontology module.

Fig. 5. Individuals in the Housing Legislation ontology module.

In Fig. 6 we show selected properties created in the ontology, corresponding to housing objects. Of particular interest are the lexical synonymy and the legal equivalence properties: the former reflects the actual use of two terms in legal documents, while the latter denotes the proper state of affairs, corresponding to good legal technique as explained in the analysis.

Fig. 6. Properties in the Housing Legislation ontology module.

So, all five pairs of terms (“Housing” – “Living premises”; “Housing” – “Housing stock”; “Housing” – “Non-living premises”; “Housing” – “Non-premise object”; “Housing” – “Residential area”) have lexical synonymy, but only the pair “Housing” – “Residential area” has legal equivalence.
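The contrast between the two properties lends itself to a simple automated check: any pair that is lexically synonymous in practice but not legally equivalent marks a usage that good legal technique would reject. The sketch below encodes the five pairs stated above; the set-based representation and the function name are our assumptions for illustration, not the way the properties are stored in the OWL module.

```python
def pair(a, b):
    """Unordered pair of terms, so that (a, b) and (b, a) coincide."""
    return frozenset({a, b})


# Pairs taken from the analysis above.
LEXICAL_SYNONYMY = {
    pair("Housing", "Living premises"),
    pair("Housing", "Housing stock"),
    pair("Housing", "Non-living premises"),
    pair("Housing", "Non-premise object"),
    pair("Housing", "Residential area"),
}
LEGAL_EQUIVALENCE = {pair("Housing", "Residential area")}


def improper_synonyms():
    """Pairs used as synonyms in texts without being legally equivalent."""
    flagged = LEXICAL_SYNONYMY - LEGAL_EQUIVALENCE
    return sorted(tuple(sorted(p)) for p in flagged)


flagged_pairs = improper_synonyms()
```

With these inputs the check flags four of the five pairs and leaves “Housing” – “Residential area” untouched, which matches the conclusion of the analysis.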

The overall structure of the Housing Legislation ontology module with the hierarchical relationships is shown in Fig. 7 (auto-composed with the OntoGraf tool).

Fig. 7. The structure of classes and individuals in the Housing Legislation ontology module.

4 Conclusions

In our paper we illustrated one of the many contradictory aspects of the complex legal conceptual apparatus and the problems arising from the low quality of the regulatory material, caused in particular by modifying it without considering the relations between the concepts, their scope and content. Lexical and terminological ambiguity is definitely a serious disadvantage of such a sophisticated system as legislation. For juridical terms to be unified, each of them must be used consistently, denoting a certain concept in the legal text. However, when a new legal act is drafted today, the large amounts of related data are generally processed “manually”, using only the intelligence and time available to the developers. Accordingly, there are risks caused by their inability to process large datasets, inattention, fatigue, immense time costs, etc. Automating selected processes in these activities could free significant human intellectual resources for more efficient use and ultimately enhance the effectiveness of law-making and law enforcement in Russia.

In the current work we devised an OWL ontology module for such a popular branch of law as housing legislation, which can be integrated with a higher-level legal ontology such as LKIF Core, which we considered as an example. The ontology implements the results of the analysis of the housing legislation with respect to lexical and legally binding terms, which was carried out by expert lawyers. Further implementation of the approach may decrease the number of errors, over-complexities and ambiguities in legal texts, allow automated search for relevant documents, and help categorize complicated legal relations. This should save practitioners from spending too much time on routine tasks, simplify decision-making in law enforcement, reduce subjectivity, and ultimately contribute to creating uniform and consistent legislation.

As A. Ivanov noted, “The law has certain inherent properties that won’t let fully entrust its making and enforcement to AI, i.e. the machines. The necessary discrepancy of legal norms with the formal logics rules – is one of the obvious reasons preventing its digitalization. Too many clauses and exceptions will have to be made when transforming the polysemantic terms into computers’ language. Thus, the law itself would have to be transformed first, so that its terms have the same meaning in all the regulations. A titanic task!” [36]. Obviously, the human mind will not be fully replaced by artificial intelligence, but the task could and should be resolved to a reasonable extent by joining the efforts of researchers from various fields.

In the current paper we provide only an illustrative example justifying the general need for and approaches to the development of legal ontologies. Our further research will involve the implementation of larger-scope legal ontologies, covering a whole group of social relations and, subsequently, institutions and sub-fields of the law.