
1 Problem and Motivation

The definitions and principles of open data and open government data were presented in our previous work [1]. This paper focuses on how technology can be used to cope with the divergent regulation of an important subject: open government data (OGD).

In general, data is the fuel of Enterprise Information Systems. According to the report [1], the EU economy could grow by 1.9% of GDP by 2020 as a result of reusing big and open data. In an ideal world the idea of Linked Open Data [2] could be realized easily, but the law and the regulation of data make this idea hard to accomplish in real life. Governments, municipalities and other public bodies release Public Sector Information (PSI) under different legal and technical conditions, which are unstable and create artificial barriers to benefiting from the re-use of information. Open government data probably yields the most value when it is merged, connected, combined, mixed, enriched or otherwise analyzed. However, legal problems prevent this from being done smoothly and keep the expected economic benefits out of reach.

Open data licenses (and other forms of regulation, such as legal notices and terms of use) are not unified. As a result, every developer has to analyze the open data licenses in depth before connecting different datasets in a mashup model. The Survey of the Licensing of Open Government Data [3] revealed a critical situation concerning the regulation (licensing) regime: the datasets on national open government data portals are covered by widely varying numbers of licensing regimes, ranging from 33 (Spain) and 16 (Germany, Italy) down to 1–2 (Austria, EC, Moldova, Portugal, UK).

Different licensing terms mean, first of all, that it is not clear whether the datasets can be merged or used for commercial purposes, whether any limitations apply to the protection of the mashup work, and whether different adapter's licenses can be used. The Survey [3] also found that OGD portals contain datasets that indicate the wrong licensing regime, or indicate no licensing regime at all (it is not clear whether the link to the regulation is missing or no regulation applies), or do not reproduce the rules that come from national PSI law. This creates a risk that a government (the owner of the OGD) could start legal proceedings against developers who use OGD for violating national PSI rules, even when the government itself has stated the licensing regime incorrectly.

So how can the developers of Enterprise Information Systems that use OGD avoid investing in legal analysis of OGD regulation and reduce the risk of misinterpreting national law in a global environment? One possible solution is to force governments to withdraw all regulation of OGD; the alternative is a tool that performs the legal analysis of OGD automatically, or at least semi-automatically.

We believe that it is possible to create such a tool. We approached the legal problems arising in the EU Member States as follows: (1) we identified the general problems in the EU PSI domain (the object of regulation differs across national laws, and the PSI Directive and the Revised PSI Directive are not fully implemented); (2) we determined which specific legal requirements national PSI law applies to open government datasets; and (3) we modeled those requirements in an ontology, aiming to create a useful tool for understanding the complexity of OGD regulation at EU level.

This paper is organized as follows: (1) introduction to the problem and motivation; (2) analysis of the implementation of the Revised PSI Directive; (3) analysis of the national PSI law of EU Member States; (4) ontology for the legal requirements of OGD; (5) conclusions and future work.

2 Open Government Data: Legal Problems Coming from EU in Re-Use of PSI Domain

In the European Union, the philosophy of re-using public information and the main legal requirements applied to Open Government Data come from the PSI Directive. If the concept of the PSI Directive [2] (including the Revised PSI Directive [3]) worked as planned, legal problems concerning the re-use of open datasets would not exist. Unfortunately, the reality is different. The EU Commission still has a lot of work to do to change the prevailing opinion that information held by a public institution is the property of the state and “no one can touch it”.

Our investigation found that the development of the PSI concept supported by the EU Commission can be divided into three periods:

  1. The period before the PSI Directive was adopted;

  2. The period of implementation of the PSI Directive (~2003/2005–2013/2015);

  3. The period of revision of the PSI Directive in 2013 and its implementation.

Before the PSI Directive was adopted, the concept of PSI developed in a decentralized way in EU member and pre-member countries. Every country had its own independent concept, which created a “Tower of Babel” effect. In 2003 the PSI Directive was published; it was to be implemented by 2005. The PSI Directive sets a minimum harmonisation of national rules and practices regarding the PSI concept and its re-use. Implementation of the PSI Directive in the Community was not sufficiently successful, and the directive was revised after 10 years. The Revised PSI Directive gives the EU Commission tools to control the implementation of the PSI Directive, and a united concept of PSI in the EU may emerge in the coming years if the EU Commission uses those tools effectively.

2.1 Implementation of Revised PSI Directive

The survey examined the national PSI laws of the Member States published on the portal of the European Commission [4].

Some explanatory notes on Table 1: (1) in Spain, different charges may apply to commercial re-use, although the Revised PSI Directive does not allow such an option; (2) in Latvia, re-use is allowed only for private individuals; (3) in Denmark, charging principles are not applied; (4) in Hungary, the different terms for exclusive arrangements apply from 1 January 2016 instead of 17 July 2013, and Hungary exempts libraries, museums, archives and university libraries from the duty to provide information for re-use, among other differences; (5) Finland has not implemented the PSI Directive because it had already implemented its own unique concept: PSI belongs to the public domain.

Table 1. Implementation of revised PSI directive

2.2 Analysis of National PSI Law

Having found that the implementation of the Revised PSI Directive was not successful, we continued with an analysis of national PSI law to obtain a clear view of the legal framework and to discover the differences that follow from the regulation of OGD.

We posed two questions to start the legal analysis of national PSI laws in the EU Member States: (1) Is the object of investigation, public sector information, understood in the same way as it is defined in the EU PSI Directive and, if not, how does it differ? (2) What legal requirements are applied to OGD licensing?

Analysis of PSI Term Used in Legal Domain of EU Member Countries.

Analysis of the legal domain in the EU and its member countries indicates that the main problem is that the term “Public sector information” is understood differently across EU member countries, while EU legislation is trying to bring these different concepts together into one united concept of PSI.

Viewed more broadly, a PSI concept can be not only decentralized or united, but also direct or expanded. A direct concept is the one that follows directly from the term “Public sector information” and includes the different forms of information managed by the public sector. An expanded concept supplements the direct concept with extra rules, exceptions and tasks.

A good example of a direct PSI definition is the one published by the Organisation for Economic Co-operation and Development (OECD): public sector information is “information, including information products and services, generated, created, collected, processed, preserved, maintained, disseminated, or funded by or for the Government or public institution” [5]. The OECD definition is clear and describes PSI essentially as all the information that a public institution holds.

The EU PSI Directive represents the expanded form of the PSI concept and differs somewhat from the OECD concept, because it was developed from “the right to get access to public information”. In short, it can be described as information that is held by a public institution, is accessible to the public and can be re-used by the public. Over 10 years this concept shifted from “can be re-usable” (PSI Directive, 2003) to “must be re-usable” (Revised PSI Directive, 2013).

The term “information” has acquired an expansive meaning and is now commonly used as a synonym for data, records, documents and so on. Erik Borglund and Tove Engvall investigated how the open data discourse is communicated in legal texts and found that there is no single term; the principal words are record, information, document and data [6].

It is no surprise that these terminology problems also reach the European Union, and especially the legislation of its Member States, where the definition of public sector information (PSI) is understood differently.

In Directive 2003/98/EC (PSI Directive) PSI is understood as a “document”. When the directive was revised by Directive 2013/37/EU (Revised PSI Directive), the definition was not changed, but the concept was expanded. Implementation of the PSI Directive and the Revised PSI Directive in the EU Member States is still in progress, so the PSI definition is not yet harmonized across national laws.

The definition of the document is provided by Directive Article 2 Para 1 Sec 3: “‘Document’ means: (a) any content whatever its medium (written on paper or stored in electronic form or as a sound, visual or audiovisual recording); (b) any part of such content.” [2] So basically, public sector information is understood as a document or part of a document, whatever its form or content. In the preamble of the Directive the term “document” is used as a synonym for information and also includes data.

In legal interpretation, the term “document” is more closely tied to the legal responsibility of the institution or information holder than terms such as “information” or “data”. Moreover, the concept of “access to documents” derives from the “right to get information from public sector”, which was understood as a right to obtain specific documents.

Secondly, after 10 years the PSI Directive was revised with the intention of further harmonizing the PSI definition across Member States. The legislators of Directive 2013/37/EU (Revised PSI Directive) noted: “since the first set of rules on re-use of public sector information was adopted in 2003, the amount of data in the world, including public data, has increased exponentially and new types of data are being generated and collected (recital 5).” [3] “At the same time, Member States have now established re-use policies under Directive 2003/98/EC and some of them have been adopting ambitious open data approaches to make re-use of accessible public data easier for citizens and companies beyond the minimum level set by that Directive. To prevent different rules in different Member States acting as a barrier to the cross-border offer of products and services, and to enable comparable public data sets to be re-usable for pan-European applications based on them, a minimum harmonization is required to determine what public data are available for re-use in the internal information market, consistent with the relevant access regime. (recital 6)” [3]. On the one hand, the legislators expressed their good will to harmonize “public data” (which affects the internal European information market) in the preamble of the Revised PSI Directive; on the other hand, no important changes were made to the definition in Article 2 of the PSI Directive, and only the concept of PSI was updated.

Thirdly, the PSI Directive 2003/98/EC has been implemented in all EU member countries and the EEA countries (Iceland, Liechtenstein and Norway). The problem is that “EU Member States have implemented the PSI Directive in different ways. 13 Member States have adopted specific PSI re-use measures: Belgium, Cyprus, Germany, Greece, Hungary, Ireland, Italy, Luxembourg, Malta, Romania, Spain, Sweden, United Kingdom. 3 Member States have used the combination of new measures specifically addressing re-use and legislation predating the Directive: Austria, Denmark and Slovenia. 9 Member States have adapted their legislative framework for access to documents to include re-use of PSI: Bulgaria, Croatia, Czech Republic, Estonia, Finland, France, Latvia, Lithuania, Netherlands, Poland, Portugal, Slovak Republic” [4].

Deeper investigation of the national laws of EU Member States reveals differences in the PSI definition: countries define PSI as a “document”, as “information”, as “data” or in other terms.

The countries can be classified into (1) those which use the same definition of PSI as the PSI Directive (Austria (including the lands of Vienna, Vorarlberg, Lower Austria, Tyrol, Styria, Salzburg and Upper Austria), Cyprus, Slovak Republic (from 2012), Greece (from 2006 till 2014), Luxembourg and Spain) and (2) those which have adopted a specific definition (all others).

The definitions can also be classified into 4 groups: the document group (the definition of PSI is strongly related to a document), the information group (PSI is understood as some kind of information), the data group (PSI is understood as data, a record, a file, etc.) and the other group (PSI is understood as a representation of content, knowledge, matters and similar).

The document group can be divided into smaller parts: (1) Document (Austria (including the lands of Vienna, Vorarlberg, Lower Austria, Tyrol, Styria, Salzburg and Upper Austria), Cyprus, Slovak Republic (from 2012), Greece (from 2006 till 2014), Luxembourg and Spain use the same definition as the PSI Directive); (2) Documented information (Estonia defines PSI as information which is recorded and documented, which means that information that is not documented falls outside the scope of PSI; Latvia defines it as “documented information – information whose entry into circulation can be identified”); (3) Administrative documents (France and Portugal define PSI as “administrative documents”); (4) Documents, information and data (Greece (from 2014) implements the Revised PSI Directive and provides an updated conception of PSI: the documents, information and data which are made available online as a dataset or via programming interfaces in an open machine-readable format which complies with open standards); (5) Documents, record and data (Ireland defines PSI as a document, meaning all or part of any form of document, record or data); (6) Document and any content (Romania defines PSI as a document, meaning any content or part of such content).

The information group can be divided into: (1) Information and metadata (the Czech Republic defines PSI as “publicly disclosed information”, which also includes metadata, named “accompanying information”); (2) Any information (Bulgaria defines PSI as any information collected or created by a public sector body); (3) Public information (PSI is defined as public information in The Netherlands and Poland (all information about public matters constitutes public information); the Slovak Republic (till 2012) used a very narrow definition of PSI, limited to information about public money, state or municipal property and concluded agreements); (4) Information in the form of a document, case, register, record and other documentary material (Slovenia defines PSI as information originating from the field of work of the body and occurring in the form of a document, a case, a dossier, a register, a record or other documentary material drawn up by the body, by the body in cooperation with another body, or acquired from other persons); (5) Information means content (UK 2015–2015 defines PSI as information, meaning any content or part of such content).

The data group can be divided into these parts: (1) Data (Croatia defines PSI as any data owned by a public authority, which means that ownership of the rights to the data is important; Hungary 2005–2015 defines it as data of public interest and data made public on grounds of public interest); (2) Data collections (Denmark (from 2005) granted access not only to documents but also to data collections. An exception was made for information produced for the commercial activities of a public sector body, or for which third parties hold a non-material right. “Data collection” means registers or other systematic lists for which use is made of electronic data processing); (3) Files (Denmark (till 1985) granted access to files only if (a) they were the substance of the authority’s final decision on the outcome of a case; (b) the documents contain only information that the authority had a duty to record; (c) the documents are self-contained instruments drawn up by an authority to provide proof or clarity concerning the actual facts of a case, or (d) the documents contain general guidelines for the consideration of certain types of cases); (4) Any record (Germany defines PSI as any record stored in any way).

The other group consists of these parts: (1) Presentation and message (Finland defines PSI as “written or visual presentation, and also as a message”); (2) Presentation of acts, facts and information (Italy defines PSI as a document, meaning the presentation of acts, facts and information); (3) Any representation of content (the land of Vorarlberg (Austria) till 2015 defines PSI as any representation of content, or part of it, for which the public-sector body may decide whether to allow re-use); (4) Representation of acts, facts or information - and any compilation (Malta till 2015 defines PSI as a document, meaning any representation of acts, facts or information - and any compilation of such acts, facts or information); (5) Knowledge (Lithuania defines it as follows: “document shall mean any information; information shall mean knowledge available to a State or local authority institution or body”); (6) Known factual statements on matters (the lands of Carinthia and Burgenland (Austria) define PSI as factual statements on matters which are known to the body at the time of the request for information); (7) Matter or recording and compilation of information (Sweden defines PSI as a document, meaning any written or pictorial matter or recording which may be read, listened to, or otherwise comprehended only using technical aids. It also includes a compilation of information taken from material recorded for automatic data processing).
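The four-group classification above lends itself to a simple lookup structure. The following is a minimal Python sketch, not part of the survey itself: the names `PSIGroup`, `NATIONAL_PSI_TERMS` and `countries_in_group` are hypothetical, and only a handful of the countries discussed above are included for illustration.

```python
from enum import Enum
from typing import Dict, List, Tuple


class PSIGroup(Enum):
    """The four groups of national PSI definitions described in the text."""
    DOCUMENT = "document"
    INFORMATION = "information"
    DATA = "data"
    OTHER = "other"


# Illustrative subset only: country -> (term used in national law, group).
# The entries are taken from the classification above; the full survey
# covers more countries and time periods than are listed here.
NATIONAL_PSI_TERMS: Dict[str, Tuple[str, PSIGroup]] = {
    "Estonia": ("documented information", PSIGroup.DOCUMENT),
    "France": ("administrative documents", PSIGroup.DOCUMENT),
    "Bulgaria": ("any information", PSIGroup.INFORMATION),
    "Croatia": ("data", PSIGroup.DATA),
    "Germany": ("any record", PSIGroup.DATA),
    "Finland": ("presentation and message", PSIGroup.OTHER),
    "Lithuania": ("knowledge", PSIGroup.OTHER),
}


def countries_in_group(group: PSIGroup) -> List[str]:
    """Return the listed countries whose PSI term falls into the given group."""
    return sorted(
        country
        for country, (_term, g) in NATIONAL_PSI_TERMS.items()
        if g is group
    )


print(countries_in_group(PSIGroup.DATA))  # ['Croatia', 'Germany']
```

A table of this kind makes it easy to see at a glance which national definitions are likely to be narrower than the Directive's "document".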

Analysis of the definitions shows that most EU Member States use different terms to describe public sector information. From the open government data perspective it matters less which term is used, “document” or “data”; what matters more is whether the definition sets extra limits that go beyond the scope of the PSI Directive.

Firstly, it is risky to limit the PSI definition to administrative documents or documented information, because public bodies hold plenty of information that does not qualify as administrative documents, “documents” or “documented information” in bureaucratic terms. For example, live traffic data from a municipality’s sensors or cameras does not fit the requirements of an administrative document.

Secondly, tying the definition to ownership of information (e.g. “belongs to a public sector institution”) should also be avoided, because some works belong to the public domain and, according to the Revised PSI Directive, should be provided (e.g. by archives and museums) as public domain works. There are also discussions [7] in the open data community about whether PSI belongs to the public sector or to the public domain (because it was produced with public money).

Thirdly, it is a common mistake to define PSI as information released for re-use, e.g. “Document held by a public sector body: a “document” regarding which the public sector body is entitled to allow re-use” [8]. Limiting PSI to information that the institution has already provided for re-use should be avoided, because it restricts the right of access to information and discourages requests for new information that the institution has not yet provided. On the other hand, such a limitation is the right of each EU member country under recital 9 of the PSI Directive: “This Directive does not contain an obligation to allow re-use of documents. The decision whether or not to authorise re-use will remain with the Member States or the public sector body concerned. This Directive should apply to documents that are made accessible for re-use when public sector bodies license, sell, disseminate, exchange or give out information” [2].

Finally, implementation of the Revised PSI Directive is changing PSI terminology, because the PSI concept was updated to include metadata, open and machine-readable formats, and an emerging understanding of what open data is. An example is Spain’s PSI regulation from 2015: Document: All information or part thereof, whatever the medium or form of expression, whether textual, graphic, audio visual or audiovisual, including associated metadata and data content with the highest levels of accuracy and disaggregation [9].

It is to be hoped that implementation of the Revised PSI Directive will help the Community adopt PSI definitions constructed to support the open data concept, as Greece has done [10].

Analysis of the Legal Requirements Applied to OGD Licensing in National PSI Law.

In each country, all public sector data released as open government data (in other words, PSI ready for re-use) is regulated by national PSI law. Depending on the country, there may also be PSI laws of lands (e.g. the Wiener Informationsweiterverwendungsgesetz (WIWG)), municipalities or public institutions, but those laws follow the federal or national PSI regulation. Our analysis is limited to the main national PSI regulation.

The analysis found differences among EU Member States in the legal requirements applied to OGD licensing. In most cases those differences are not significant and follow the rules of the EU PSI Directive, but there are some distinctive exceptions: in Spain, a re-user of PSI can be fined up to 100000 Eur for violating the re-use policy; in Croatia, a public authority that prevents or restricts the exercise of the right of access to information and re-use of information can be fined up to 100000 HRK (~13000 Eur).

To make those requirements available in a machine-readable format, a first version of the ontology has been developed.

3 The Ontology of Open Government Data Licenses Framework for a Mashup Model (OGDL4M)

The Ontology of Open Government Data Licenses Framework for a Mashup Model (OGDL4M) is an OWL ontology that formalizes legal knowledge of the open government data licensing framework in order to represent the legal requirements applied to open government datasets in a mashup model. OGDL4M is still under development and we expect to present it by the end of 2016. This section describes the part of OGDL4M dedicated to the legal requirements for open government data licensing, the terms of use and the sanctions for violations that come from the national laws on the re-use of public sector information (PSI) of the EU Member States.

3.1 State of Art

At the moment there are no similar ontologies representing the national-level PSI domain of EU member countries, but there are ontologies that address licensing (L4LOD [11], RDFLicense [12]), intellectual property (IPROnto [13], CopyrightOnto [14]), linked data rights (ODRL v.2.1 [15]) and legal norms (LKIF [16]), as well as the expression language ccREL [17].

The main scholars working on subjects related to this ontology are M. Palmirani [18, 19], S. Peroni, P. Casanovas [20], V. Rodríguez-Doncel [21], S. Villata, F. Gandon, A. Kasten, D. Paehler, R. García, and J. Delgado.

3.2 Merged Ontologies

The OGDL4M ontology re-uses some elements of other ontologies (Table 2):

Table 2. Merged ontologies objects

3.3 Objective

The objective of this part of the ontology is to help create a theoretical model that can inform an automatic or semi-automatic computational model representing the national PSI rules of EU member countries, especially when the licensing regime is not clear or when conditions for re-use are not provided.

3.4 Formation of List of All the Relevant Terminology and Production of Glossary

We have developed a table in which we list the terms and provide, for each, a legal description, the legal source and a normalized definition (Table 3).

Table 3. Example of the glossary

3.5 Overview

OGDL4M consists of a core part, which presents the general concept, and other parts based on each country profile.

Fig. 1 presents a fragment of the core part of OGDL4M. The class LKIF:LegalSource serves as the source of all possible regulatory sources that could apply to a dataset released by the public sector. For example, if an information system wants to evaluate which legal requirements (class ConditionsOfPSIReuse) apply to a dataset (class OpenGovDatasets), it must investigate all possible legal sources (class LKIF:LegalSource).

Fig. 1. The fragment of OGDL4M core part: legal source.

The classes LegalNotice, TermsOfUse and License represent the forms of regulation commonly used to express the connection between a dataset and its legal regulation. A usual mistake is to apply those forms without taking into account another important class, LKIF:Legal_Document, which represents the different regulation coming from different legal areas: personal data protection, copyright law, the EU database sui generis right, and PSI law, which is divided into the country level (national PSI law) and the level of lands, municipalities and institutions (localized PSI law).
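The relationship between the forms of regulation and the underlying legal documents can be sketched in plain Python. This is only an illustration under stated assumptions: the ontology itself is expressed in OWL, the exact subclass relations are not given in the text (here every form and every legal document is assumed to be a kind of legal source), and the helper `lacks_legal_document` is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class LegalSource:
    """Stands in for LKIF:LegalSource: anything that may regulate a dataset."""
    name: str


# Forms of regulation commonly attached to a dataset (assumed subclasses).
class LegalNotice(LegalSource):
    pass


class TermsOfUse(LegalSource):
    pass


class License(LegalSource):
    pass


class LegalDocument(LegalSource):
    """Stands in for LKIF:Legal_Document: regulation from a legal area."""


class PersonalDataProtectionLaw(LegalDocument):
    pass


class CopyrightLaw(LegalDocument):
    pass


class DatabaseSuiGenerisRight(LegalDocument):
    pass


class NationalPSILaw(LegalDocument):
    pass


class LocalizedPSILaw(LegalDocument):
    pass


@dataclass
class OpenGovDataset:
    """Stands in for the class OpenGovDatasets."""
    title: str
    legal_sources: List[LegalSource] = field(default_factory=list)


def lacks_legal_document(dataset: OpenGovDataset) -> bool:
    """True when only forms (notice, terms, license) are attached and no
    underlying legal document has been taken into account."""
    return not any(isinstance(s, LegalDocument) for s in dataset.legal_sources)


# Hypothetical dataset that carries a license but no link to PSI law.
traffic = OpenGovDataset("traffic counts", [License("CC BY")])
print(lacks_legal_document(traffic))  # True
```

A check of this kind flags exactly the mistake described above: a license is attached, but the legal documents that actually govern the dataset have not been consulted.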

Fig. 2 presents a fragment of the core part of OGDL4M that shows how different national PSI regulation can be modeled. National PSI regulation provides rules that state whether the PSI re-use requirements are obligatory or only recommended, or whether some, all or none of the requirements are left unregulated by national law and must or may be regulated by local PSI law.

Fig. 2. The fragment of OGDL4M core part: general requirements.

The class NationalPSILaw represents national PSI law, which is legally binding and sets the country’s general legal rules for the conditions of PSI re-use. The class GeneralRequirements is a subclass of NationalPSILaw and represents those general rules. The rules may be obligatory (class ObligatoryGR) or only recommended (class RecommendedGR). When the rules are obligatory, any conflicting legal rule set on a dataset is not valid. For example, in Finland OGD can be released only as part of the public domain, so no other rules can apply to OGD released by a public institution in Finland, in particular no license that does not represent the public domain (such as CC BY); and if the license is missing, it is clear that the dataset is part of the public domain.

In the other cases, when national PSI regulation only recommends following some rules, PSI policy is usually delegated to a lower authority. The class SpecialRequirements is used to link to local PSI law (of a land, municipality, institution or other public authority) and to signal that, for the country profile in question, the ontology is of limited use without deeper analysis.
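The distinction between obligatory and recommended general requirements can be illustrated with a short Python sketch. It is a simplification under stated assumptions: the names `GeneralRequirements`, `Resolution` and `resolve` are hypothetical, conditions are reduced to plain strings, and the country code "XX" with its local law reference is a placeholder rather than a real profile.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GeneralRequirements:
    country: str                     # ISO 3166 alpha-2 code
    conditions: str                  # conditions set by national PSI law
    obligatory: bool                 # True ~ ObligatoryGR, False ~ RecommendedGR
    local_law: Optional[str] = None  # ~ SpecialRequirements link, if any


@dataclass(frozen=True)
class Resolution:
    conditions: Optional[str]     # conditions to assume (None if unknown)
    stated_license_ignored: bool  # a conflicting stated license was overridden
    needs_local_analysis: bool    # local PSI law must still be consulted


def resolve(gr: GeneralRequirements, stated_license: Optional[str]) -> Resolution:
    """Decide which conditions govern a dataset, given what the portal states."""
    if gr.obligatory:
        # Obligatory national rules prevail over any conflicting statement,
        # and also apply when no license is stated at all.
        conflict = stated_license is not None and stated_license != gr.conditions
        return Resolution(gr.conditions, conflict, False)
    # Recommended rules: the stated license is only provisional until the
    # local PSI law has been analysed.
    return Resolution(stated_license, False, True)


finland = GeneralRequirements("FI", "public domain", obligatory=True)
print(resolve(finland, "CC BY").conditions)  # public domain
print(resolve(finland, None).conditions)     # public domain
```

For a profile with only recommended rules, for instance `GeneralRequirements("XX", "open license", obligatory=False, local_law="hypothetical land PSI law")`, the sketch returns the stated license together with `needs_local_analysis=True`, reflecting the limited use of the ontology without deeper analysis.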

3.6 OGDL4M Model for the Country Profile

The legal requirements applied to OGD licensing in national PSI law are modeled by identifying which requirements are obligatory and which are recommended. Each requirement is presented together with its legal source (the specific part of the law), which is necessary for quick cross-checking and for evaluating whether the norm is still valid. If there are sanctions for violating the PSI re-use policy, the class Sanctioning Regime is used. In a country profile the ISO 3166 code is attached to the PSILaw, Jurisdiction and GeneralRequirements classes.
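The naming convention for country profiles can be sketched as follows. The helper `profile_class_names` is hypothetical, and the sketch assumes that the two-letter (alpha-2) ISO 3166 code is appended as a suffix, as in the class names PSILawFI and GeneralRequirementsES used in this paper.

```python
from typing import Dict

# Base classes that receive a country suffix in a country profile.
PROFILE_CLASSES = ("PSILaw", "Jurisdiction", "GeneralRequirements")


def profile_class_names(iso_code: str) -> Dict[str, str]:
    """Map each base class to its country-specific name, e.g. PSILaw -> PSILawFI."""
    code = iso_code.upper()
    if len(code) != 2 or not (code.isascii() and code.isalpha()):
        raise ValueError(f"expected a two-letter ISO 3166 code, got {iso_code!r}")
    return {base: base + code for base in PROFILE_CLASSES}


print(profile_class_names("fi")["PSILaw"])  # PSILawFI
```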

Fig. 3 presents the OGDL4M model for Finland. The class PSILawFI represents Finland’s legally binding PSI law, the Act on the Openness of Government Activities with its amendments [22]. The model shows that the general requirements (class GeneralRequirementsFI) are set by Chapter 1 Sect. 1(1) of the Act on the Openness of Government Activities and are obligatory. Only one legal requirement applies to OGD: PSI belongs to the public domain.

Fig. 3. The fragment of OGDL4M representing Finland’s legal requirements to OGD.

Fig. 4 presents the OGDL4M model for Spain. The class PSILawES represents Spain’s legally binding PSI law, the Law on the re-use of public sector information with its amendments [9]. The general requirements (class GeneralRequirementsES) are obligatory. The model shows that (1) OGD may be released with no conditions or license (class NoConditionsForReuse) or (2) OGD may be regulated only by a standard license. A standard license must satisfy a number of conditions: it should be open, must not limit competition, must not restrict re-use, etc. According to the model there can be only two licensing regimes in Spain, but in reality we found 33 during the Survey. Licensing regimes that do not follow the regulation of Spain’s PSI law are not correctly applied.

Fig. 4. The fragment of OGDL4M representing Spain’s legal requirements to OGD.
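The check implied by the Spanish profile can be sketched in Python. This is a simplified illustration, not the ontology itself: the names `LicensingRegime` and `conforms_to_es_profile` are hypothetical, a standard license is approximated by the three conditions named in the text (the full list in the law is longer), and the example regimes are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LicensingRegime:
    name: str
    has_license: bool                 # False ~ class NoConditionsForReuse
    is_open: bool = False
    limits_competition: bool = False
    restricts_reuse: bool = False


def conforms_to_es_profile(regime: LicensingRegime) -> bool:
    """Check a regime against the two options allowed by the Spanish model."""
    if not regime.has_license:
        # Option 1: OGD released with no conditions or license.
        return True
    # Option 2: a standard license, approximated here by three of its conditions.
    return (
        regime.is_open
        and not regime.limits_competition
        and not regime.restricts_reuse
    )


# Hypothetical regimes, for illustration only.
no_conditions = LicensingRegime("no conditions", has_license=False)
restrictive = LicensingRegime(
    "custom portal terms", has_license=True, is_open=True, restricts_reuse=True
)
print(conforms_to_es_profile(no_conditions))  # True
print(conforms_to_es_profile(restrictive))    # False
```

Running a check like this over the regimes found on a portal would separate those consistent with the national profile from those that are not correctly applied.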

Fig. 5 presents the specific conditions for re-use. Those conditions are essentially similar to the conditions of a non-derivative license (the content cannot be altered). This means that licensed OGD released by a public authority in Spain cannot be used in mashups. There is a conflict between the legal norm that requires re-use of PSI not to be limited and the norm that requires PSI not to be altered. The conditions that limit PSI re-use are backed by sanctions.

Fig. 5. The fragment of OGDL4M representing Spain’s legal requirements to OGD.

Fig. 6 explains the sanctioning regime. If OGD is released in Spain under a license, those sanctions apply; e.g. failure to indicate the date of the latest update of the information will cost the developer from 1000 to 10000 Eur.

Fig. 6. The fragment of OGDL4M representing Spain’s legal requirements to OGD.

4 Conclusions and Future Work

The legal analysis of the national PSI law of EU Member States has identified the main problem: national law is not harmonized with EU law, which is why the situation differs in most EU countries and requires deeper analysis of the national legal domain. The OGDL4M ontology could be a very useful tool for evaluating a country’s PSI policy and could in future be used for automatic or semi-automatic evaluation of the legal regulation of datasets released by the public bodies of EU member countries.

Moving forward we expect to enrich the ontology and present the completed version of OGDL4M by the end of 2016.