
1 Introduction to the Problem

Since 2009 the open data domain has attracted considerable attention from scientists: more than 34,000 papers have been published, of which over 2,200 focus explicitly on Open Government Data [1]. Still, few investigations focus on the legal side of opening government data. According to the Open Data (OD) community there is not much to discuss: just “give us the data with an open license” and you will get the first of five stars [2, 3]. This investigation shows, however, that dealing with the legal issues of licensing open government data is not such a simple task for public administration institutions. Much of the Open Government Data (OGD) published in governmental Open Data portals does not fulfil the requirement that the OD community classifies as the simplest first step (the first star). Several definitions are used in this paper; for the reader's convenience they are included in Annex A below.

1.1 Principles

An analysis of the OD principles provided by the Open Data community shows that free reuse of data is essential to the open data idea.

The Universal Participation principle declares that everyone must be able to use, reuse and redistribute: there should be no discrimination against fields of endeavor or against persons or groups. For example, ‘non-commercial’ restrictions that would prevent ‘commercial’ use, or restrictions of use to certain purposes (e.g. only in education), are not allowed [8].

The Open Knowledge Foundation suggests the following definition of open government data: “Data produced or commissioned by government or government controlled entities and it can be freely used, reused and redistributed by anyone” [9].

Why is free reuse so important? Reuse is one of the pillars of interoperability and of building a digital society ecosystem. So much so that Directive 2003/98/EC first introduced the concept of re-use rather than the concept of open data. Subsequently, Linked Open Data (LOD) provided a technical framework for supporting re-use and stressed the concept of free re-use. This characteristic is fundamental for implementing the digital economy. The answer therefore comes from the LOD domain: if there are no legal limitations on connecting datasets, then the LOD principles are satisfied. The first LOD step, or first star, requires an open license [3].

1.2 Goal, Research Questions and Methodology

Do governments respect the OD principles or not? What are the tendencies? If they do not, why not? These questions are too difficult to answer by analyzing only a few countries, because the results could be misleading. Data cannot be stopped by borders, so the answers can be found only by surveying the global OGD domain. The main goal of the survey is to present the state of the art of the current OGD licensing situation. In the paper we address several fundamental questions: does OGD need legal protection; if so, what kind of licenses should be used; does the CC0 license fulfil EU regulation? The OD community calls for fewer barriers to re-using OD, so should a license be used at all?

The methodology was to check the legal protection status (license/no license/legal notice) of the datasets provided in the national OGD portals listed in Annex B below. Because of a lack of resources it was not possible to identify all the OGD in every country, so only the key OGD portals were chosen. The criteria for choosing a portal were as follows: it should be presented by an official public institution as the main OGD portal of the country or the federal state; the OGD portal held by the European Commission was also included. Land, state, municipality or other portals held by private and public initiatives were out of the scope of the investigation.

During the survey the condition of the datasets or of the links to the datasets was not checked; only metadata was collected. In all cases, dataset containers (collections of datasets) were counted as single datasets. All information from the portals was taken as-is. Overall, information on 435,682 datasets was classified and investigated.

First, in the absence of specific licenses, we identified all the legal notices about obligations, limitations, liability, privacy rules, etc. published on the web in order to understand whether these fragmented legal regulations can fully replace the license instrument. The survey represents the penetration of the license culture in the OGD and also underlines the misuse of the license instrument, which is often adopted as an admission rather than a contract, especially in the EU. Second, we investigated the principles coming from the OD domain and how they comply with the re-use of PSI, copyright law and administrative law principles at the EU level. We also evaluated the impact of the PSI and related licenses on a mash-up scenario, presenting a comparative table on the compatibility of licenses with the main PSI and OD principles. Third, we analyzed case studies from Italy, Lithuania and the UK in an effort to model whether it is possible to release OGD without a license, with a CC0 license, with other CC licenses and the like, with respect to principles originating from the corresponding jurisdiction, and to determine whether the OGD of these countries is ready for mash-up in the global OGD domain.

2 The Survey of the Licensing of Open Government Data

The survey of the licensing of the OGD was carried out in January 2015. Its goal was to capture the state of the art of the licenses used in OGD portals to cover datasets.

During the first part of the survey the following was checked in the OGD portals: (1) whether there are datasets covered by any license; (2) if a dataset is not covered by a license, whether any legal notice or conditions for re-use apply to the dataset; (3) whether there are datasets without a license, or for which information about the license is not provided.

The results of the first part of the survey are as follows: (a) 56 % of all datasets from the investigated portals are covered by a license; (b) 17 % of all datasets are not covered by a license, or information about the license is not provided in the OGD portal, or no other set of re-use conditions is stated, or the dataset is license-free; (c) 27 % of all datasets are covered by a legal notice given in the portal or in the dataset metadata, or are indicated as being under a legal notice. Legal notices are used in the OGD portals of the European Union (EU) (100 %), Moldova (100 %), Spain (11 %), US Federal datasets (100 %), other US datasets (0.3 %) and Germany (only 3 datasets). In Spain and the US several different legal notices are in use.
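The three-way classification behind these shares can be sketched in a few lines. This is a minimal illustration, not the tooling used for the survey: the record fields `license` and `legal_notice` and the sample records are hypothetical.

```python
from collections import Counter

def classify(record):
    """Return the legal protection status of one dataset metadata record."""
    if record.get("license"):
        return "license"
    if record.get("legal_notice"):
        return "legal notice"
    return "no license"

def status_shares(records):
    """Return the percentage of records that fall into each status."""
    counts = Counter(classify(r) for r in records)
    total = sum(counts.values())
    return {status: 100 * n / total for status, n in counts.items()}

# Hypothetical sample records, for illustration only.
sample = [
    {"title": "A", "license": "CC BY 3.0"},
    {"title": "B", "license": "", "legal_notice": "Re-use authorised with attribution"},
    {"title": "C"},
    {"title": "D", "license": "Open Government Licence"},
]
print(status_shares(sample))
```

A record with an empty `license` field is treated the same as a record without the field, which mirrors the survey's rule that missing license information counts as "no license" unless a legal notice applies.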

The second part of the survey is dedicated to the multiplicity of licenses in the global OGD phenomenon. The variety of licenses was checked and the most popular licenses were identified. Licenses provided by national authorities and applied only locally are called local licenses; this category does not include localized Creative Commons licenses.

The results are as follows: (1) the most popular licenses are local licenses, which cover 90 % of licensed datasets (e.g. Open Government License – Canada, License Ouverte, Open Government License (UK), Non-Open Government License (UK), Data license Germany – attribution – version 1.0 and 2.0, Italian Open Data License 2.0 and 1.0, NLOD, Uruguay Open Data License); (2) the second most popular (6 %) are CC-BY licenses, including localizations (e.g. CC BY 3.0 AU, CC BY 3.0 NZ, CC BY 3.0 AT, CC BY 3.0 CL, CC BY 3.0 GR, etc.); (3) the third most popular (2 %) are licenses waiving copyright to the public domain: CC0 and the Open Data Commons Public Domain Dedication and License (PDDL); (4) all other licenses cover only 2 % of the datasets. That 2 % share is divided as follows: Open Data Commons Open Database License (ODbL) (45 %), CC BY-NC (noncommercial) including versions and localizations (38 %), CC BY-SA (attribution, share alike) including versions and localizations (10 %), Open Data Commons Attribution (3 %), GPL (2 %), Against DRM (1 %), CC BY-ND (no derivative works) (1 %), followed by CC BY-NC-ND (attribution, noncommercial, no derivative works), CC BY-NC-SA (attribution, noncommercial, share alike) and the GNU Free Documentation License (GFDL) (cf. Fig. 1).

Fig. 1. Results of the survey of the licensing of open government data.

Finally, the survey found that in the global licensing scenario by far the largest share of licensed datasets is governed by local licenses. Only 17 % of datasets are covered by neither a license nor a legal notice. Taking into account that OGD portals and the OGD domain are still at a developing stage, the numbers are likely to change in the near future. The second important finding is that the CC-BY license is becoming more and more important and can be understood as a standard in the global OGD scenario. Of the Creative Commons copyright licenses, CC-BY places the fewest restrictions on re-using a dataset and is classified as an open license. The third finding is that countries such as the Netherlands, the U.S., Italy, Costa Rica, Brazil, Belgium, New Zealand, France, Germany, Greece and Spain release datasets to the public domain. Last but not least, 27 % of the investigated datasets are “covered” by legal notices. This raises an emerging question: how to attach legal requirements to a dataset in the LOD domain.

3 Analysis of the Licenses for the Datasets Mash-up Scenario

In a dataset mash-up scenario, where two different datasets meet, the compatibility of their licenses (or of the legal regimes applied to the datasets) must be analyzed. This analysis is unnecessary only if a dataset is not covered by any license or legal notice, or is covered by a license dedicating it to the public domain, because such datasets are compatible with any licensed dataset.

The survey of the licensing of the OGD revealed the 6 most popular legal regimes for datasets. The compatibility of these licenses and legal notices is shown in Table 1 (footnote 1).

Table 1. Top licenses comparison for mashup model

The results are encouraging: most datasets from the investigated OGD portals are compatible with one another because they are under a suitable license regime.

The only remaining problem is to ensure that the attribution requirements are met; these are essentially statements about the source of the resource and links to the licenses.
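As a rough illustration of what meeting the attribution requirement in a mash-up involves, the sketch below collects one statement per source dataset, naming the source and linking to the license. The field names, the `example.org` sources and the output wording are assumptions made for this example; the attribution text actually required is defined by each license.

```python
def attribution_line(dataset):
    """Build one attribution statement: source of the resource plus license link."""
    return (f'"{dataset["title"]}" by {dataset["publisher"]} ({dataset["source_url"]}), '
            f'licensed under {dataset["license_name"]} ({dataset["license_url"]})')

def attribution_block(datasets):
    """Join the statements for all datasets in a mash-up, skipping exact duplicates."""
    lines = []
    for dataset in datasets:
        line = attribution_line(dataset)
        if line not in lines:
            lines.append(line)
    return "\n".join(lines)

# Hypothetical source datasets, for illustration only.
sources = [
    {
        "title": "Bus stops",
        "publisher": "City A",
        "source_url": "https://example.org/bus-stops",
        "license_name": "CC BY 3.0",
        "license_url": "https://creativecommons.org/licenses/by/3.0/",
    },
    {
        "title": "Road works",
        "publisher": "Region B",
        "source_url": "https://example.org/road-works",
        "license_name": "CC BY 3.0",
        "license_url": "https://creativecommons.org/licenses/by/3.0/",
    },
]
print(attribution_block(sources))
```

Keeping the statement machine-generated from the dataset metadata is what makes the requirement manageable when many sources are combined.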

On the other hand, datasets matter not only by quantity but also by quality, and there are still many datasets covered by other licenses. An example of the compatibility of Creative Commons licenses is shown in Table 2 (footnote 2).

Table 2. Creative commons licenses comparison for mashup model

Table 2 shows that not all CC licenses are compatible, which means that incompatible licenses are a barrier for LOD. Datasets covered by incompatible licenses are “out of the cloud of data” in the OGD domain and will not create any value in dataset mash-ups. Contract-type licenses are also a big barrier for LOD.
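Why some CC licenses cannot be combined can be made concrete with a small rule-based model. The sketch below is a simplification built on the publicly documented CC license elements (BY, SA, NC, ND); it does not reproduce the paper's tables, and it ignores license versions, localizations and non-CC licenses. The license labels and the function name are chosen for this example.

```python
# Conditions carried by each license (simplified; CC0 carries none).
CONDITIONS = {
    "CC0": frozenset(),
    "CC BY": frozenset({"BY"}),
    "CC BY-SA": frozenset({"BY", "SA"}),
    "CC BY-NC": frozenset({"BY", "NC"}),
    "CC BY-NC-SA": frozenset({"BY", "NC", "SA"}),
    "CC BY-ND": frozenset({"BY", "ND"}),
    "CC BY-NC-ND": frozenset({"BY", "NC", "ND"}),
}

def can_mash_up(license_a, license_b):
    """Return True if two datasets may be combined into one derived dataset."""
    a, b = CONDITIONS[license_a], CONDITIONS[license_b]
    # No-derivatives material cannot be adapted, so it blocks any mash-up.
    if "ND" in a or "ND" in b:
        return False
    # The derived dataset must honour the conditions of both sources.
    combined = a | b
    # A share-alike source requires the result to carry exactly its own conditions.
    for conditions in (a, b):
        if "SA" in conditions and combined != conditions:
            return False
    return True

print(can_mash_up("CC BY", "CC BY-SA"))      # attribution fits inside share-alike
print(can_mash_up("CC BY-SA", "CC BY-NC"))   # share-alike cannot absorb NC
print(can_mash_up("CC0", "CC BY-NC-ND"))     # ND blocks the mash-up
```

Under this model a public-domain dedication combines with everything except no-derivatives licenses, while the two share-alike variants cannot be combined with each other.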

The barrier will disappear only when contracts concluded by software agents are recognized in the PSI re-use domain. Otherwise, closed platforms could be used as pools of datasets in specific PSI re-use projects (e.g. in medicine, where sensitive personal data is held and identification and contracts are needed), or platforms such as ENGAGE [10] could be upgraded to solve the contract problem by unifying the contracts.

One of the biggest problems in the mash-up scenario is legal notices, which are not unified and have no common structure. Sometimes the notice is a document (e.g. the EU legal notice), sometimes only one sentence (Spain, U.S. datasets), and sometimes just a remark that a legal notice applies (without a reference to that notice). Legal notices are usually placed separately from the metadata of the dataset, which means that automatically connecting a legal notice with its dataset is very complicated; a lifecycle for legal notices in a dataset mash-up scenario is hardly realizable (footnote 3).

To sum up, the most used licenses and the unified legal notices protecting OGD are compatible in the global scenario. Some licenses (e.g. CC-BY-NC-ND) are not compatible in a mash-up scenario. There is still a considerable number of datasets, especially in Spain, covered by non-unified legal notices, and such a regime of legal protection is not suitable for regulating the legal lifecycle of a dataset.

4 European Case Analysis

4.1 Italy

4.1.1 OGD Regulation

Italian Open Government Data legislation supports public administrations in releasing open datasets at national, regional and local levels. The Italian open data process is quite advanced at the regional (22 bodies) and local (62 bodies) levels, less so at the ministerial level (26 bodies) (footnote 4). The legal framework of the OGD is composed of several different Acts. The fundamental pillars are: legislative decree n. 82/2005, the Digital Administration Code, and its modifications; the implementation of Directive 2003/98/EC with legislative decree n. 69/2009; and legislative decree n. 33/2013, the Transparency Act [11]. D.lgs. n. 82/2005 defines the Open Government Data modality, but with two levels for releasing data: (a) releasing data under a purely technical requirement, using open formats (e.g., XML, CSV, etc.); (b) implementing the open data paradigm, including licenses, reuse without commercial limitations, dataset production processes and quality checks. D.lgs. n. 69/2009 provides the definition of a public administration document and the modality and practical means by which the public administration may release documents in open format. D.lgs. n. 33/2013 contains a long list of public documents that must be published in digital format in a specific part of the official web site of the public administration, following a strict hierarchical web site tree, but not necessarily as Open Data. The framework is sufficient for implementing a concrete OGD plan; however, the legal scenario is confusing and contradictory.

The Transparency Act is mandatory for every public administration and its prescriptions are stronger than those of the Digital Administration Code. It obliges administrations to release a significant number of documents/datasets, but these are limited to the purpose of transparency (e.g., grants, budget, funds, accountability, performance), limited in time (after three years it is mandatory to move the data to another part of the web site, for the right to be forgotten) and subject to no requirement about licenses. So the documents/data are released in an open format (e.g., XML), but ownership and control of the datasets/documents remain in the hands of the public administration, which can decide to remove all the information from the publication portal at any moment. The Digital Administration Code includes wider principles of the open data paradigm, including the economic benefits produced for society, the improvement of citizens' quality of life, and the effectiveness of public administration services in the governance of the territory. However, it imposes stricter compliance with the privacy rules of the Italian Personal Data Protection Code (d.lgs. 196/2003 and the connected Guidelines (footnote 5)), so the public administration can (not must) publish a large variety of data (not limited to accountability matters) under the Digital Administration Code rules, but only in anonymized form. This double track is creating a confused situation in the public administration about licenses: the web site required by the Transparency Act would usually apply a non-open data license, considering that personal data is sometimes included in the documents (e.g., payments, salaries, grants), whereas the Open Data portal must publish only under open data licenses. The risk is having the same data/document released in different forms (anonymized and not anonymized) with two different licenses (e.g., funds for natural disasters).

4.1.2 OGD Licensing Review

The Italian situation regarding open data licenses is promising [12]. In 2010 Formez (the government agency for public administration training and learning programmes) defined the IODL 1.0 (Italian Open Data License). It is similar to CC BY-SA in that it imposes the same license on derived works. In 2012 Formez released the IODL 2.0, which removes the Share Alike clause. The current situation of licenses in the open data portals in Italy is the following: the license adopted by the largest number of public administrations is the IODL, but in terms of number of datasets CC BY covers the larger collection. This variety of licenses raises the problem of how to combine them in order to reuse large datasets coming from heterogeneous sources (cf. Fig. 2).

Fig. 2. Table of the statistics concerning the Italian licenses used in open data portals. (Dataset of data.gov.it visited February 2015.)

CC0 is one of the most used licenses, especially among technical experts, because it easily resolves the problem of dataset mash-ups. However, CC0 is a waiver, and the owner of the dataset is frequently the public administration. Under public law the owner is the State or the local administration, and for this reason the employee does not have the power to waive the IPR in favour of the community. Articles 10 and 53 of the Cultural Heritage Code (d.lgs. 22 January 2004, n. 42) define the dataset, and moreover the digital document, as “digital patrimony” of the State, which is inalienable. For this reason it is not appropriate to use CC0 for OGD.

4.1.3 OGD Italian Portal

The data.gov.it portal is the national open data portal and it hosts all the national, regional and local datasets in a single central catalogue. It holds more than 1,400 datasets, and the most used formats are CSV, JSON and XML. The most used license is CC BY, with 6,527 datasets. The portal also allows local open data portals to be integrated via an API in order to share the data and so build the national catalogue in CKAN (footnote 6).
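Because the catalogue is built on CKAN, license statistics of this kind can in principle be derived from the dataset metadata that CKAN's Action API returns as JSON, where each dataset carries a `license_id` field. The sketch below tallies licenses from a response shaped like the result of the `package_search` action; the response fragment is a hypothetical sample rather than data from data.gov.it, and the label `not specified` is an assumption of this example.

```python
from collections import Counter

def license_counts(search_response):
    """Count datasets per license in a CKAN package_search-style response."""
    datasets = search_response["result"]["results"]
    # Datasets with a missing or empty license_id are grouped under one label.
    return Counter(d.get("license_id") or "not specified" for d in datasets)

# Hypothetical response fragment, for illustration only.
response = {
    "success": True,
    "result": {
        "count": 3,
        "results": [
            {"name": "dataset-a", "license_id": "cc-by"},
            {"name": "dataset-b", "license_id": "cc-by"},
            {"name": "dataset-c", "license_id": None},
        ],
    },
}
print(license_counts(response).most_common())
```

The same tally applied to every page of search results would give the per-license dataset counts for a whole portal.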

4.2 Lithuania

4.2.1 OGD Regulation

Public sector information which could be provided for re-use is regulated by the Law on Management of State Information Resources. Article 10 Sect. 1 part 8 describes an important principle for the re-use of PSI: openness of information resources, which means that favorable conditions are created for natural and legal persons to re-use information managed by the institutions when carrying out statutory functions, independently of the legitimate operating objectives and legal form of those persons. Article 30 Sect. 3 notes that information from state information systems shall be prepared for PSI re-use.

The law divides PSI suitable for re-use into 3 parts: 1. data from state registers; 2. data from state information systems; 3. other PSI.

Article 26 Sect. 5 introduces obligations for re-users of state register data: the recipient may not change the data and the registry information obtained from the registry, and must indicate the data source when using them. This obligation means that data from state registers is in effect covered by CC BY-ND 4.0 or a similar local license. In addition, by default the information is provided for a charge, subject to the exceptions provided in this law and other laws of the Republic of Lithuania, European Union legal acts and the register’s regulations (Article 29).

Data from state information systems is provided free of charge. Article 35 Sect. 5 sets the same requirements as for register data: the data obtained from an institution may not be changed and its source must be identified when the data is used.

The conditions for re-using other PSI are not regulated by the law; under the openness principle, however, such PSI should be open without any restriction on its use, unless special requirements are set by another law.

The requirement of a contract rather than a license comes from Article 37 Sect. 3: “When information files containing information managed by the institution that is important for the entire state or several institutions are published on the institution’s website and the Republic of Lithuania laws and (or) other legal acts provide for special conditions of the use of such information, the institution shall establish electronic authorization, which includes the terms of use of such information files that must be followed. Such information files shall be provided to persons after their electronically expressed consent with the terms of the electronic authorization.”

Data providers must publish a legal notice according to Article 37 Sect. 1: “The institution shall disclose on its website the information about its managed information, terms and conditions for the use of this information. In cases when pursuant to the laws of the Republic of Lithuania and other legal acts institution shall not continue to process, update, provide or publish its managed information, it shall announce about the aforementioned on its website no later than two months in advance.”

To conclude, OGD can come only from state information sources that are not considered important. The most valuable data, from state registers and information systems, is locked behind “electronic authorization” and a “click contract” (basically an electronic contract, not a license), and the re-user is obliged not to change the data. Data from state registers is by default provided for a charge; exceptions can be made by law (cf. Table 3).

Table 3. Requirements to OGD coming from Law on Management of State Information Resources

4.2.2 OGD Licensing Review

OGD could be released without a license, but a legal notice should then be provided. Otherwise, there is a risk that copyright law will apply automatically.

Copyright law does not cover legal acts, bills, drafts, official translations of law, administrative documents, official symbols and signs, and separate data (Article 5 of the Law on Copyright and Related Rights). Therefore, CC0 or another public domain license could be applied only to datasets that carry data not protected by copyright and not taken from a state register or information system (e.g. the legal acts register). Other OGD could be covered by a CC-BY license, and free-of-charge information from state registers and information systems could be covered by a contract similar to CC BY-ND. No law forbids re-using PSI for commercial purposes. A draft local license (with restrictions equal to CC-BY) was developed in 2013 but has never been adopted. To sum up, OGD development in Lithuania is highly political: the most valuable government data requires a re-use contract and cannot be modified, while less valuable government data can be re-used freely without restrictions. A public domain license can be applied only in very rare cases.

4.2.3 OGD Portal and Other Initiatives

The central OGD portal (http://opendata.gov.lt) is in fact not designed as an OGD portal attractive to re-users but as a list of PSI resources (it implements Article 9 of the PSI directive only formally). The portal provides information only in the local language and consists of 263 links to the providers of public data resources available for re-use. There are also some institutional initiatives: e.g. the Ministry of Economy, led by pro-western politicians, started the first OGD portal in the country (http://data.ukmin.lt/duomenys.html), but the data has not been updated since 2012 (when a new minister was appointed, a representative of the Lithuanian Social Democratic Party, whose roots lie in the former communist party). Other initiatives come from NGOs (e.g. “Transparency International” Lithuania collects information about mass media owners from the government and provides it as open datasets) and from the private sector (e.g. the datasets on a CKAN data management system developed by a private person, http://atviriduomenys.lt/dataset, and the popular OGD visualization project http://freedata.lt/).

4.3 United Kingdom

4.3.1 OGD Regulation

The main act concerning OGD is the Protection of Freedoms Act 2012, which updated the Freedom of Information Act 2000. Other important acts are: the Re-use of Public Sector Information Regulations 2005 (PSI Act), the Copyright and Rights in Databases Regulations 1997, the Copyright, Designs and Patents Act 1988 and the Copyright and Related Rights Regulations 1996. The Freedom of Information Act 2000 regulates the right of access to information held by public authorities. The PSI Act implements the PSI directive. PSI Act 5(b) establishes an important exclusion: public information may not be provided for re-use if a third party owns: (a) copyright (within the meaning of Sect. 1 of the Copyright, Designs and Patents Act 1988), (b) database right (within the meaning of regulation 13 of the Copyright and Rights in Database Regulations 1997), (c) publication right (within the meaning of regulation 16 of the Copyright and Related Rights Regulations 1996), and (d) rights in performances (meaning the rights conferred by Part 2 of the Copyright, Designs and Patents Act 1988). This exclusion shows that in most cases OGD will not include material subject to third-party intellectual property rights (IPR); otherwise the IPR holder must give permission for re-use.

The Protection of Freedoms Act 2012 (FA) updated the Freedom of Information Act 2000 and is designed for OGD. FA Part 6 Sec 102 Sub 2(c) defines a dataset as follows: “means information comprising a collection of information held in electronic form where all or most of the information in the collection (a) has been obtained or recorded for the purpose of providing a public authority with information in connection with the provision of a service by the authority or the carrying out of any other function of the authority, (b) is factual information which (i) is not the product of analysis or interpretation other than calculation, and (ii) is not an official statistic, and (c) remains presented in a way that (except for the purpose of forming part of the collection) has not been organized, adapted or otherwise materially altered since it was obtained or recorded.”

FA Part 6 Sec 102 Sub 3 describes how datasets containing copyright works should be released: “When communicating the relevant copyright work to the applicant, the public authority must make the relevant copyright work available for re-use by the applicant in accordance with the terms of the specified licence.”

The UK also has the unique institution of Crown Copyright. It applies to works made by “an officer of the Crown, this includes items such as legislation and documents and reports produced by government bodies. Crown Copyright will last for a period of 125 years from the end of the calendar year in which the work was made. If the work was commercially published within 75 years of the end of the calendar year in which it was made, Crown copyright will last for 50 years from the end of the calendar year in which it was published. Parliamentary Copyright will apply to work that is made by or under the direction or control of the House of Commons or the House of Lords and will last until 50 years from the end of the calendar year in which the work was made” [13]. The release of OGD also depends on the Data Protection Act 1998, the Freedom of Information (Scotland) Act 2002, the Environmental Information Regulations 2004 and the Environmental Information (Scotland) Regulations 2004. Further information concerning OGD can be found in the UK Government Licensing Framework [14].

4.3.2 OGD Licensing Review

There are 3 types of OGD licenses: the Open Government Licence, the Non-Commercial Government Licence and the Charged Licence. The first is an open license, the second limits commercial re-use, and the third applies to information for which the public body charges. The Open Government Licence v3.0 (UK) satisfies the open license criteria and is suitable for the LOD domain. Given this, a move to current versions of the Creative Commons licenses is unlikely. A revision of the PSI directive is planned for 2018; if a decision to have a single license in the EU is agreed, this could change. Release under a license dedicating data to the public domain, or without a license, is possible only for datasets that contain no copyrighted material or whose copyright and database rights have expired. For example, this year Crown copyright expired for works made up to 1890, but since a digitized copy or an adaptation of a work suitable for an OGD dataset could itself be protected by Crown copyright, digital copies of works made up to 1890 can still be protected. To conclude, the release of OGD without a license or under a public domain license is not likely in this century unless the regulation changes.

4.3.3 OGD Portal

On 9 January 2015 the OGD portal (http://data.gov.uk/) consisted of 16,234 datasets: 11,679 datasets were covered by the Open Government Licence v3.0 (UK) and 4,555 datasets were covered by the Non-Open Government Licence. There were also 4,074 unpublished datasets, whose metadata is available in the portal together with the reasons for not publishing them. The OGD portal lists more than 350 apps, of which the most popular is called Scope Nights: Astronomy Weather Reports.

5 Conclusions and Future Work

The survey underlines some important findings: (i) the majority of the OGD does not fulfil the OD principle of free re-use of data, although in most cases the limitation on re-use amounts only to an attribution requirement and a link to the license/legal notice; (ii) legal notices fragment the legal protection across different parts of the web site containing OD, without any prospect of re-use; (iii) there is a risk that datasets released without a license in an EU jurisdiction could be automatically protected by U.S. copyright; (iv) the legal requirements in the different jurisdictions preclude the use of a single license or contribute to an aversion to the use of licenses altogether; (v) the CC licenses are used as brands for communicating an attitude and philosophy rather than as a real legal permission or obligation framework; (vi) OGD licenses do not always comply with PSI re-use policy, e.g. there are restrictions on commercial re-use; (vii) globally, most OGD datasets are covered by licenses that are ready for the mash-up scenario, but at the country level the results could be different.

The regulation of the OGD depends on national intellectual property law, public law and database copyright regulation. In the investigated EU member countries the CC0 license could be used as an exception in rare cases rather than as a rule. Most U.S. federal datasets are in the public domain, but the U.S. OGD portal still sets an attribution requirement in its legal notice. In the EU, in most cases copyright applies to works automatically, without a copyright notice or registration. In the US, by contrast, works are protected after a notice is made that the work is copyrighted, and registration is required to extend the copyright term.

Interoperability among datasets is fundamental for the economic exploitation of OD and for developing an inclusive society. Only a good license framework for the OGD can assure legal protection in the long term and safeguard the end-user in the chain of re-use (e.g. re-use of re-use, derivative works, etc.). In the future, an ontology of the global regulation of the OGD domain needs to be developed. It could be used as a tool for automatic or semiautomatic mash-up of licensed and unlicensed open government data.