1 Semantic Finlex Linked Open Data Service

Finnish legislation and case law have been published as web documents since 1997 in the Finlex Data Bank (Footnote 1). Although the Finlex service is widely used by the public, it does not provide machine-readable legal information as open data on top of which the ministry or third parties could build services and analyses. The first version of Semantic Finlex, based on Linked Data, was published in 2014 [4]. The data included 2413 consolidated statutes, 11904 judgments of the Supreme Court, and 1490 judgments of the Supreme Administrative Court. In addition, some 30000 terms used in 26 different thesauri were harvested for a draft of a consolidated vocabulary. During the work, shortcomings of the initial RDF data model became evident, as did the need to adopt the then-emerging standards for EU-level interoperability: the European Legislation Identifier (ELI) [3] and the European Case Law Identifier (ECLI) [2]. The dataset also contained only one version (2012) of the statutory law and was not updated as new legislation and case law were published in Finlex. These issues were resolved in the new version of Semantic Finlex [10], which currently hosts a dataset of approximately 28 million triples. The data was enriched by automatically annotating named entities (judges mentioned in the court decisions) and references to legal texts (such as EU law transposed by the statutes and statutory citations appearing in court cases), and by linking it to vocabularies and data sources such as DBpedia using different named entity linking tools [10, 13].
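
To make the ECLI identifier structure concrete, the following minimal sketch splits an identifier into its colon-separated parts (country, court, year, ordinal number). The regular expression is a simplified reading of the ECLI format, and the sample identifier and the names `Ecli` and `parse_ecli` are illustrative assumptions, not part of Semantic Finlex.

```python
import re
from typing import NamedTuple


class Ecli(NamedTuple):
    """Components of an ECLI: ECLI:<country>:<court>:<year>:<ordinal>."""
    country: str
    court: str
    year: int
    ordinal: str


# Simplified pattern (an assumption, not the normative grammar):
# two-letter country code, short court code starting with a letter,
# four-digit year, and an alphanumeric ordinal that may contain dots.
_ECLI_RE = re.compile(
    r"^ECLI:(?P<country>[A-Z]{2}):(?P<court>[A-Z][A-Z0-9]{0,6}):"
    r"(?P<year>\d{4}):(?P<ordinal>[A-Z0-9.]{1,25})$",
    re.IGNORECASE,
)


def parse_ecli(text: str) -> Ecli:
    """Parse an ECLI string or raise ValueError if it does not match."""
    match = _ECLI_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a well-formed ECLI: {text!r}")
    return Ecli(
        country=match["country"].upper(),
        court=match["court"].upper(),
        year=int(match["year"]),
        ordinal=match["ordinal"].upper(),
    )


if __name__ == "__main__":
    # Illustrative identifier, chosen only to show the structure.
    print(parse_ecli("ECLI:FI:KKO:2016:12"))
```

Keeping the components separate in this way makes it straightforward to group or filter decisions by court or year before any RDF processing takes place.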

The Semantic Finlex service adopts the 5-star Linked Data model (Footnote 2), extended with two more stars, as suggested in the Linked Data Finland model and platform [7]. The 6th star is obtained by providing the dataset schemas and documenting them. Semantic Finlex schemas can be downloaded from the service, and the data models are documented under the data.finlex.fi domain. The 7th star is achieved by validating the data against the documented schemas to prevent errors in the published data. Semantic Finlex works toward the 7th star by applying several error-detection steps within the data conversion process. The service is powered by the Linked Data Finland (Footnote 3) publishing platform, which hosts a variety of datasets and provides tools and services that facilitate publishing and re-using Linked Data. All URIs are dereferenceable and support content negotiation by using HTTP 303 redirects. In accordance with the ELI specification, RDF is embedded in the HTML presentations of the legislative documents as RDFa (Footnote 4) markup. In addition to the converted RDF data, the original XML files are also provided. To make the data easier to use for programmers without knowledge of SPARQL or RDF, a simplified REST API is also provided. The underlying triplestore is Apache Jena Fuseki (Footnote 5), run as a Docker container, which allows efficient provisioning of resources (CPU, memory) and scaling.
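
The content negotiation with HTTP 303 redirects described above can be illustrated with a small, self-contained simulation of the server-side decision. This is a sketch under stated assumptions, not the actual platform code: the media-type-to-suffix table, the resource URI, and the function names `parse_accept` and `negotiate` are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from media type to the suffix of the document URL
# that the resource URI is redirected to.
REPRESENTATIONS: Dict[str, str] = {
    "text/html": ".html",
    "text/turtle": ".ttl",
    "application/rdf+xml": ".rdf",
}
DEFAULT_MEDIA_TYPE = "text/html"


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Return (media type, q-value) pairs, most preferred first."""
    prefs: List[Tuple[str, float]] = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # q defaults to 1.0 when not given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media_type, q))
    # sorted() is stable, so equal q-values keep their header order.
    return sorted(prefs, key=lambda p: p[1], reverse=True)


def negotiate(resource_uri: str, accept_header: str) -> Tuple[int, str]:
    """Return (status, Location): 303 plus a document URL, or 406."""
    prefs = parse_accept(accept_header) or [("*/*", 1.0)]
    for media_type, q in prefs:
        if q <= 0:
            continue
        if media_type == "*/*":
            media_type = DEFAULT_MEDIA_TYPE
        if media_type in REPRESENTATIONS:
            return 303, resource_uri + REPRESENTATIONS[media_type]
    return 406, ""


if __name__ == "__main__":
    uri = "http://data.finlex.fi/example-resource"  # hypothetical URI
    print(negotiate(uri, "text/turtle, text/html;q=0.5"))
```

The point of the 303 status is that the resource URI identifies the legal document as an abstract thing, while the redirect target is one concrete representation of it, chosen from the client's `Accept` header.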

2 LawSampo Semantic Portal

To demonstrate the use of Semantic Finlex in applications, the semantic portal LawSampo is being developed. LawSampo is a new member of the Sampo (Footnote 6) series of semantic portals, based on the “Sampo model” [6], in which the data is enriched through a shared ontology and Linked Data infrastructure, multiple application perspectives are provided on a single SPARQL endpoint, and faceted search and browsing are integrated with data-analytic tooling. The faceted search and tooling are implemented using the Sampo-UI framework (Footnote 7) [8]. The Sampo portals (Footnote 8) have had millions of end users on the Web, suggesting that the model is a promising basis for creating useful semantic portals.
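
The way faceted search combines with data analysis can be sketched on a toy in-memory dataset. In the portal itself the filtering is performed against the SPARQL endpoint, so the records, field names, and functions below (`apply_facets`, `facet_counts`) are purely illustrative assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set

Record = Dict[str, object]

# Hypothetical toy records standing in for statutes in the knowledge graph.
STATUTES: List[Record] = [
    {"id": "s1", "statute_type": "act", "year": 2010},
    {"id": "s2", "statute_type": "decree", "year": 2010},
    {"id": "s3", "statute_type": "act", "year": 2012},
    {"id": "s4", "statute_type": "act", "year": 2012},
]


def apply_facets(records: Iterable[Record],
                 selections: Dict[str, Set[object]]) -> List[Record]:
    """Keep records whose value is selected in every active facet."""
    return [
        r for r in records
        if all(r.get(facet) in allowed for facet, allowed in selections.items())
    ]


def facet_counts(records: Iterable[Record], facet: str) -> Counter:
    """Count hits per facet value, e.g. as input for a histogram."""
    return Counter(r[facet] for r in records)


if __name__ == "__main__":
    acts = apply_facets(STATUTES, {"statute_type": {"act"}})
    # Hit counts per year for the filtered result set.
    print(sorted(facet_counts(acts, "year").items()))
```

The same counting step serves two purposes: it yields the hit counts shown next to facet values, and it provides the data for simple visualizations of the filtered result set.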

The landing page of the LawSampo portal offers different application perspectives:

1. Statutes. Clicking on Statutes opens a faceted search interface [14] for searching and browsing statutes. The facets on the left include document type (with seven subtypes), statute type, year, and related EU regulation. After narrowing the results down to a set of documents (or a particular document) of interest, the user has two options. First, the user can select a document from the result list to open its “homepage”, which shows not only the document but also linked contextual information related to it, such as the EU regulations it refers to, linked to EU CELLAR (Footnote 9), or other Semantic Finlex documents that refer to it. For example, court decisions in which the statute has been applied can be shown. Second, the filtered documents can be analyzed. For example, a histogram of the dates of the filtered documents can be created.

2. Case Law. In the Case Law perspective, a similar faceted search interface opens for searching and browsing court decisions. Here the facets include court, judge, and keywords characterizing the subject matter of the judgment.

3. Case Law Search. The third perspective is an application in which a court judgment, or more generally any document or text, can be used to find other similar judgments. For example, a person who has received a judgment from a court can use this application to find out what kind of similar judgments have been made before. Several methods for finding similar cases were tested when implementing this application, including TF-IDF, Latent Dirichlet Allocation (LDA), Word2Vec, and Doc2Vec [11, 12].

4. Life Events. In addition, a fourth perspective is being implemented in which legal materials can be searched for based on the end user’s life situation (e.g., divorce).
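
Of the similarity methods mentioned for the Case Law Search perspective, TF-IDF is the simplest to illustrate. The following sketch ranks documents by cosine similarity of smoothed TF-IDF vectors using only the standard library. It is an assumption-laden illustration rather than the portal's implementation: the tokenizer, the smoothing formula, the toy texts, and the function names are all chosen here for clarity.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase and split on non-word characters (no stemming)."""
    return re.findall(r"\w+", text.lower())


def tfidf_vectors(docs: List[str]) -> Tuple[List[Dict[str, float]], Dict[str, float]]:
    """Return one sparse TF-IDF vector per document plus the idf table."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed idf, so terms occurring in every document keep a small weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [
        {t: tf * idf[t] for t, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]
    return vectors, idf


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(query: str, docs: List[str], top_k: int = 3) -> List[Tuple[int, float]]:
    """Rank documents by similarity to the query; returns (index, score)."""
    vectors, idf = tfidf_vectors(docs)
    # Query terms unseen in the collection carry no weight.
    q_vec = {t: tf * idf[t] for t, tf in Counter(tokenize(query)).items() if t in idf}
    scored = [(i, cosine(q_vec, v)) for i, v in enumerate(vectors)]
    return sorted(scored, key=lambda p: p[1], reverse=True)[:top_k]


if __name__ == "__main__":
    # Hypothetical one-line stand-ins for court decisions.
    cases = [
        "the court dismissed the appeal concerning divorce and custody",
        "tax assessment decision on corporate income",
        "custody of children after divorce",
    ]
    for index, score in most_similar("divorce custody dispute", cases):
        print(index, round(score, 3))
```

Because the query can be any text, the same function covers both use cases described above: submitting a full judgment or a short free-text description of a situation.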

3 Related Work and Contributions

Our work on legal Linked Data services was influenced by the MetaLex Document Server (Footnote 10) [5], which publishes Dutch legislation using the CEN MetaLex XML and ontology standards. Other MetaLex ontology-based implementations include legislation.gov.uk (Footnote 11) and Nomothesia (Footnote 12), which also implements ELI-compliant identifiers. Various ELI implementations and prototypes have also been deployed in existing national legal information portals, e.g., in Luxembourg (Footnote 13), France (Footnote 14), and Norway (Footnote 15). Many countries already produce ECLI-compliant case law documents to be indexed by the ECLI search engine (Footnote 16). A prominent example of publishing EU law and publications as linked data is the CELLAR system. Previous related work in the U.S. includes, e.g., the Legal Linked Data project, which aims at enhanced access to product regulatory information [1].

LawSampo aims to widen the focus of these related works by providing both legislation and case law to end users through intelligent user interfaces, such as semantic faceted search and document similarity-based search. The documents are automatically enriched with contextual linked data, and the end user is also provided with ready-to-use data-analytic tooling for analyzing the documents and their relations. In the future, we plan to expand the related enriching datasets to include, e.g., related parliamentary documents and discussions (Footnote 17), in the spirit of [15]. To be able to publish more legal documents in a cost-efficient way, we are also working on semi-automatic pseudonymization of court judgments [9] and automatic annotation of legal documents [13].