Assessing FAIR Data Principles Against the 5-Star Open Data Principles

Hasnain, Ali; Rebholz-Schuhmann, Dietrich

doi:10.1007/978-3-319-98192-5_60

Ali Hasnain²⁶ &
Dietrich Rebholz-Schuhmann²⁶

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 11155))

Included in the following conference series:

European Semantic Web Conference

3516 Accesses
16 Citations
2 Altmetric

Abstract

Access to biomedical data is increasingly important to enable data driven science in the research community. The Linked Open Data (LOD) principles (by Tim Berner-Lee) have been suggested to judge the quality of data by its accessibility (open data access), by its format and structures, and by its interoperability with other data sources. The objective is to use interoperable data sources across the Web with ease.

The FAIR (findable, accessible, interoperable, reusable) data principles have been introduced for similar reasons with a stronger emphasis on achieving reusability. In this manuscript we assess the FAIR principles against the LOD principles to determine, to which degree, the FAIR principles reuse LOD principles, and to which degree they extend the LOD principles. This assessment helps to clarify the relationship between both schemes and gives a better understanding, what extension FAIR represents in comparison to LOD.

We conclude, that LOD gives a clear mandate to the openness of data, whereas FAIR asks for a stated license for access and thus includes the concept of reusability under consideration of the license agreement. Furthermore, FAIR makes strong reference to the contextual information required to improve reuse of the data, e.g., provenance information. According to the LOD principles, such meta-data would be considered interoperable data as well, however, the requirement of extending of data with meta-data does indicate that FAIR is an extension of the LOD (in contrast to the inverse).

You have full access to this open access chapter, Download conference paper PDF

Ontological Representation of FAIR Principles: A Blueprint for FAIRer Data Sources

Towards a More Reproducible Biomedical Research Environment: Endorsement and Adoption of the FAIR Principles

Investigating the FAIRness of Science and Technology Open Data: A Focus in the Scandinavian Countries

Keywords

1 Introduction

The advent of the World Wide Web [2] has enabled public publishing and consumption of information on a unique scale in terms of cost, accessibility and size. In the past few years, the linked open data cloud has earned a fair amount of attention and it is becoming the standard for publishing data on the Web [5, 8, 11].

One of the ambitions behind the linked data effort is the ability to create a Web of interlinked data which can be queried using a unified query language and protocol, regardless of where the data is stored [7, 9]. Core to this achievement is the adoption of the resource description framework (RDF) as the knowledge representation formalism as well as the SPARQL protocol for the retrieval of data. Ideally, all data is openly available and accessible through a maintained IT infrastructure [12].

Since the idea of linked data is in place, different schemes, proposals and recommendations (with guiding principles) have emerged from researchers and practitioners to make it open and easy to use by others [13]. One such scheme is proposed by Tim Berners-Lee in year 2010 known as “Linked Open Data 5-star [1]”. Under the star scheme, one star can be get if the information has been made public under an open licence. More stars denote the quality of data leading to better structure and interoperability for reuse.

While LOD has had some uptake across the web, the number of datasets using this protocol compared to the other technologies is not up to the mark and significantly modest [9]. But whether or not one uses LOD, one does need to ensure that the datasets are designed specifically for the web and for reuse by humans and machines. To provide guidance for creating such data content independent of the technology used, recently the FAIR principles were issued through the Future of Research Communications and e-Scholarship (FORCE11) [3]. The FAIR principles provided by Wilkinson et al. [17] put forth characteristics that contemporary data resources, tools, vocabularies and infrastructures should exhibit to assist discovery and reuse by third-parties through the web. FAIR stands for: Findable, Accessible, Interoperable and Re-usable.

In this paper we want to highlight the overlapping aspects proposed by Linked Open Data 5-star and FAIR principles and present our position on how these two are related.

2 Related Work

Cox et al. [4] presented a rating system known as the 5* OzNome Data tool^{Footnote 1} which allows users to carry out a self-assessment by classifying the 14 facets into data quality guidelines. These data quality guidelines are provided and derived by FAIR principles (Findable, Accessible, Interoperable, Reusable) along with an additional dimension called Trustable or being Trusted. In doing the self-assessment, one can explore ways in which they are able to improve their data collection and how it is accessed by others. This gives data providers targets to improve their data collection and publishing process.

Related work can also be taken into account from the perspective of quality analysis of datasets. Although this paper doesn’t provide insights or comparative quality analysis of existing approaches, but both the FAIR and 5-star principles targets the quality aspects of data.

Yamaguchi et al. presented YummyData [18] which is designed to improve the findability and reusability of Life Science Linked Open Data (LS-LOD) by monitoring the states of their Linked Data implementations and content.

SPARQLES^{Footnote 2} [15] monitors SPARQL endpoints registered in DataHub to determine their availability, performance, interoperability, and discoverability.

Hasnain et al. [9, 10] presented SPORTAL (SPARQL portal) that provides the analysis in terms of discoverability, findability and accessibility aspects of Linked Open Data publish as public SPARQL endpoints.

3 LOD 5-Star

Tim Berners-Lee proposed 5-star scheme for Linked Open Data that provide guidelines for data providers and publishers in order to make data more accessible, available and reusable over the Web.

make your stuff available on the Web (whatever format) under an open license
make it available as structured data (e.g., Excel instead of image scan of a table)
make it available in a non-proprietary open format (e.g., CSV instead of Excel)
use URIs to denote things, so that people can point at your stuff
link your data to other data to provide context

In the following subsection we look into the importance of achieving these stars from user and publisher perspective

One : Achieving one star means that the user can be able to (1) access the data, (2) consume the data, (3) store it locally, (4) manipulate the data and 5 share the data. Whereas being publisher (1) it is easy and simple to publish the data.
Two : Achieving two stars the user can be able to (1) process the data, (2) aggregate the data, (3) perform calculations, (4) visualise the data and can be exported into another (structured) format. Whereas being data publisher it is still simple to publish.
Three : Achieving three stars helps data user to do all what can be done using two stars Web data and additionally one can manipulate the data without any proprietary software package. Similarly as data publisher a converter may be needed to export the data from the proprietary format.
Four : Achieving four stars helps data user to do all what can be done using three stars Web data and additionally (1) data can be linked using URI’s, (2) data can be partly accessed and (3) existing tools and libraries can be reused. On the other hand as data publisher (1) using RDF “Graph” of data can be more effort than tabular (Excel/CSV) or tree (XML/JSON) data, data can be combined safely with other data, (2) data can be merged and combined safely, (3) data can be presented using URIs which is a global scheme for representing data. Being publisher (1) one has fine-granular control over the data items and can optimise their access (load balancing, caching, etc.), (2) data can easily be sliced and diced, (3) URIs must be assigned to data.
Five : Achieving five stars helps data user to do all what can be done using four stars Web data and additionally (1) More data can be discovered while consuming the data, (2) data schema can directly be learned, (3) consumers have to deal with broken data links, and (4) One have to be cautious for linking new data with existing for trust and provenance related issues. On the other hand being data provider or publisher one (1) must have to make data discover-able, (2) can increase the value of the data, (3) can gain the same benefits from the links as the consumers, (4) need to invest resources to link data to existing data over the Web and (5) may face some overhead to repair broken or incorrect links.

From the first star to the last star, the openness of the data is supported by better structure, better interoperability, and thus better reuse through open access, openness of data standards and the interlinking of data as a World Wide Web (WWW) data resource.

4 Fair Data

As proposed by Wilkinson et al. [17] the FAIR principles provide guidelines so that any resource over the Web including data resources, tools, vocabularies and infrastructures should exhibit certain characteristics to ensure the discovery and reuse by anyone. These principles also provide guidelines for data publishing, exploration, sharing, and reuse in both manual as well as automated settings. There have been a number of recent domain specific practices and guidelines for data publishing and management [6, 14, 16], FAIR guiding principles stand unique in number of ways as these are domain-independent and high-level that can be applied to both metadata as well as data.

In the following we present the elements of FAIR Principles directly taken from Wilkinson et al. [17]. It is worth noting that these elements are related but independent of each other.

The FAIR Guiding Principles [17]:

To be Findable:

F1. (meta)data are assigned a globally unique and persistent identifier

\(=>\) corresponds to the usage of URIs (i.e. Four ).
F2. data are described with rich metadata (defined by R1 below)

\(=>\) this certainly requires structured representation (i.e. Two ) and at best would relate to linkage to other data for context (i.e. Five ).
F3. metadata clearly and explicitly include the identifier of the data it describes

\(=>\) corresponds to the usage of URIs (i.e. Four ).
F4. (meta)data are registered or indexed in a searchable resource

\(=>\) this is a statement about technologies making use of the data, which is the consequence of data access over the Web (i.e. One )

To be Accessible:

A1. (meta)data are retrievable by their identifier using a standardised communications protocol

\(=>\) corresponds to the usage of URIs (i.e. Four ), although other identifiers could be used as well.
A1.1 the protocol is open, free, and universally implementable

\(=>\) corresponds again to URIs and related open data standards (i.e. Four ).
A1.2 the protocol allows for an authentication and authorisation procedure, where necessary

\(=>\) the LOD only covers the open access part as the primary requirement (i.e. One ).
A2. metadata are accessible, even when the data are no longer available

\(=>\) At best, this corresponds to the URIs, but long-term acquisition is not a matter of data standards but relates to IT infrastructures.

To be Interoperable:

I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation

\(=>\) this corresponds to the use of “non-proprietary open formats” (i.e. Three ).
I2. (meta)data use vocabularies that follow FAIR principles

\(=>\) this statement is self-referential and refers to contextual standardised data (i.e. Five ).
I3. (meta)data include qualified references to other (meta)data

\(=>\) again this refers to contextual standardised data (i.e. Five ).

To be Reusable:

R1. meta(data) are richly described with a plurality of accurate and relevant attributes

\(=>\) this refers to the structured data, but makes a stronger reference to contextual information (i.e. Three ).
R1.1. (meta)data are released with a clear and accessible data usage license

\(=>\) a statement about the delivery of (meta)data, which is meant to be open according to LOD (i.e. One ).
R1.2. (meta)data are associated with detailed provenance

\(=>\) provenance is part of the contextual data (i.e. Five ).
R1.3. (meta)data meet domain-relevant community standards

\(=>\) again, community standards are part of the contextual data and should be openly available (according to LOD, i.e. Five ).

Like LOD five stars these principles are incremental and can be considered in any combination result in achieving the higher degrees of ‘FAIRness’. In other words, the better the adoption of FAIR guidelines the higher is the degree of ‘FAIRness’ claimed.

For each FAIR principle, we give a judgement, which LOD star is best aligned with the FAIR principle, if any applies. As can be seen from the Table 1, more than one LOD principle could be linked to a FAIR principle, but we only give a judgement to the highest LOD 5-star principle.

Table 1. FAIR principles mapping to LOD 5-star scheme.

Full size table

5 Distinctions and Overlaps

5.1 LOD and Non-data Assets

The LOD 5 stars scheme is for the Open Data. This scheme provides the guidelines for data providers and publishers in order to make data more accessible, available and reusable. Whereas the FAIR principles can equally be applied to any non-data assets in the same manner as data.

FAIR does not impose any constraint that the data must be openly available and requires that access to the license agreement is made available. The concern can be raised that restricted data will also limit reusability only by the limitation that access is not openly available.

Furthermore, FAIR asks for meta-data to be provided with the data to improve interoperability, which relates to the contextual information from the LOD principles through interoperable data linked as one or several interoperable open data sources. Here, the FAIR data principles envision data about data to improve reusability.

Next, FAIR sees access to the FAIR data as a task which would have to be achieved through a supported IT infrastructure. This is certainly a requirement which would play a more important role on data that is not openly available (non open access data), since the data cannot be openly replicated without license agreement.

Finally, the LOD principles reside on URIs as the key element to achieve openness, interoperability and reuse, whereas FAIR would allow for a wider range of identifiers that would achieve the same purpose. This principle certainly – again – applies mainly to data sources that are more restricted in comparison to Linked Open Data.

5.2 Tool and Technology Independent

The high-level FAIR Guiding Principles are independent of implementation choices, and do not suggest any specific technology, standard, or implementation-solution. On the other hand, these principles are not, themselves, a standard or a specification. They act as a guide to data publishers as well as consumers to assist them in evaluating whether any specific implementation choices are able to support their digital research artefacts Findable, Accessible, Interoperable, and Reusable [17]. Same is the case with LOD 5-star scheme, which are not themselves a standard or a specification but provides the guideline for choosing the tools and technologies to make linked open data more available, searchable and open.

6 Discussion and Conclusion

The LOD 5-star scheme has been developed under the assumption that linked and open data will lead to reusability, shared benefits and other advantages. This could lead to the conclusion that the LOD scheme has been defined for a more narrowly defined set of data, i.e. data that is openly available, in contrast to the larger range of data, i.e. data that has to used in compliance with license restrictions. It is straight forward to assume that data will be openly accessibly in the future, however, in recent periods this state did not necessarily come about.

On the other side, the FAIR data principles have been defined to drive re-usability of data. The used approach for the definition of FAIR data makes strong reference to the existence of meta-data for the reusable data, in particular to cover license agreements, however, the FAIR approach did not (seem to) capitalise on the LOD data principles. A priori, we can argue that FAIR data would cover a broader scope and would include precious data that is not given in the open space, e.g. data being provided by publishing companies, and still we could take the opposite position pointing out that even data under license agreements would be best served in the linked open data space.

Another point of distinction is related to the requirements of accessibility as an essential element of reusability. As pointed out in our reasoning, this aspect is directly depending on IT infrastructures that serve the data. In this sense, the FAIR principles are more driven by implementation considerations than the LOD principles are. To some degree, we can reason that the FAIR principles cover a wider scope (e.g., including the licensed data), but - by contrary - also have to realise the burden to deliver the data to the users without making strong assumptions on the openness of the data.

As an overall conclusion, the FAIR principles are focused to user needs around a wide range of existing data, whereas the LOD principles form an idealistic approach to a world using open data. LOD principles appear to be more stringent in its own principles, whereas the FAIR data principles can be seen as reusing the LOD principles without making explicit reference and with added aspects of separating data into the core data as well as the meta-data, including license considerations. The future will tell, whether the FAIR data principles lead to the reusability of data under the consideration that the access to meta-data and to license agreements open the avenues of reusabiltiy, or whether the LOD principles are the gatekeepers to reusability, since – a priori – the data should be openly available anyways.

Notes

1.
(http://oznome.csiro.au/5star/) la:20-03-2018.
2.
http://sparqles.ai.wu.ac.at/ l.a (10-04-2018).

References

Berners-Lee, T.: Is your linked open data 5 star. Repéré à (2010). https://www.w3.org/DesignIssues/LinkedData.html
Berners-Lee, T., Fischetti, M., Foreword By-Dertouzos, M.L.: Weaving the web: the original design and ultimate destiny of the World Wide Web by its inventor. HarperInformation (2000)
Google Scholar
Bourne, P.E., et al.: Improving The future of research communications and e-Scholarship (Dagstuhl Perspectives Workshop 11331). Dagstuhl Manifestos 1(1), 41–60 (2012). http://drops.dagstuhl.de/opus/volltexte/2012/3445
Cox, S., Yu, J.: Oznome 5-star tool: a rating system for making data fair and trustable. In: Proceedings of the 2018 eResearch Australasia Conference (2017)
Google Scholar
Hasnain, A., Fox, R., Decker, S., Deus, H.F.: Cataloguing and linking life sciences LOD Cloud. In: 1st International Workshop on Ontology Engineering in a Data-Driven World Collocated with EKAW 2012 (2012)
Google Scholar
Hasnain, A., et al.: Linked biomedical dataspace: lessons learned integrating data for drug discovery. In: Mika, P., et al. (eds.) ISWC 2014. LNCS, vol. 8796, pp. 114–130. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-11964-9_8
Chapter Google Scholar
Hasnain, A., et al.: BioFed: federated query processing over life sciences linked open data. JBMS 8(1), 13 (2017)
Google Scholar
Hasnain, A., Mehmood, Q., Sana e Zainab, S., Decker, S.: A provenance assisted roadmap for life sciences linked open data cloud. In: Klinov, P., Mouromtsev, D. (eds.) KESW 2015. CCIS, vol. 518, pp. 72–86. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24543-0_6
Chapter Google Scholar
Hasnain, A., Mehmood, Q., e Zainab, S.S., Hogan, A.: SPORTAL: profiling the content of public SPARQL endpoints. Int. J. Semant. Web Inf. Syst. (IJSWIS) 12(3), 134–163 (2016). http://www.igi-global.com/article/sportal/160175
Hasnain, A., Mehmood, Q., e Zainab, S.S., Hogan, A.: SPORTAL: Searching for public SPARQL endpoints. In: International Semantic Web Conference (Posters & Demos) (2016)
Google Scholar
Hasnain, A., et al.: A roadmap for navigating the life sciences linked open data cloud. In: Supnithi, T., Yamaguchi, T., Pan, J.Z., Wuwongse, V., Buranarach, M. (eds.) JIST 2014. LNCS, vol. 8943, pp. 97–112. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-15615-6_8
Chapter Google Scholar
Hasnain, S.M.A.: Cataloguing and linking publicly available biomedical SPARQL endpoints for federation-addressing aPosteriori data integration. Ph.D. thesis (2017)
Google Scholar
Saleem, M., Hasnain, A., Ngomo, A.C.N.: LargeRDFBench: a billion triples benchmark for SPARQL endpoint federation. J. Web Semant. 48, 85–125 (2018)
Google Scholar
Sandve, G.K., Nekrutenko, A., Taylor, J., Hovig, E.: Ten simple rules for reproducible computational research. PLoS Comput. Biol. 9(10), e1003285 (2013)
Article Google Scholar
Vandenbussche, P.Y., Umbrich, J., Matteis, L., Hogan, A., Buil-Aranda, C.: SPARQLES: monitoring public SPARQL endpoints. Semant. Web 8(6), 1049–1065 (2017)
Article Google Scholar
White, E.P., Baldridge, E., Brym, Z.T., Locey, K.J., McGlinn, D.J., Supp, S.R.: Nine simple ways to make it easier to (re) use your data. PeerJ PrePrints (2013)
Google Scholar
Wilkinson, M.D., et al.: The fair guiding principles for scientific data management and stewardship. Sci. Data 3, 160018 (2016)
Google Scholar
Yamamoto, Y., Yamaguchi, A., Splendiani, A.: YummyData: providing high-quality open life science data. In: Database 2018 (2018)
Google Scholar

Download references

Acknowledgement

This publication has emanated from research conducted with the financial support of Science Foundation Ireland (SFI) under Grant Number SFI/12/RC/2289, co-funded by the European Regional Development Fund.

Author information

Authors and Affiliations

Insight Centre for Data Analytics, National University of Ireland, Galway, Ireland
Ali Hasnain & Dietrich Rebholz-Schuhmann

Authors

Ali Hasnain
View author publications
You can also search for this author in PubMed Google Scholar
Dietrich Rebholz-Schuhmann
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Ali Hasnain .

Editor information

Editors and Affiliations

University of Bologna, Bologna, Italy
Aldo Gangemi
IBM Research - Almaden, San Jose, CA, USA
Anna Lisa Gentile
CNR-ISTC, Rome, Italy
Andrea Giovanni Nuzzolese
Technische Universität Dresden, Dresden, Germany
Sebastian Rudolph
Karlsruhe Institute of Technology, Karlsruhe, Germany
Maria Maleshkova
University of Mannheim, Mannheim, Germany
Heiko Paulheim
University of Aberdeen, Aberdeen, UK
Jeff Z Pan
CNR-ISTC, Rome, Italy
Mehwish Alam

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Hasnain, A., Rebholz-Schuhmann, D. (2018). Assessing FAIR Data Principles Against the 5-Star Open Data Principles. In: Gangemi, A., et al. The Semantic Web: ESWC 2018 Satellite Events. ESWC 2018. Lecture Notes in Computer Science(), vol 11155. Springer, Cham. https://doi.org/10.1007/978-3-319-98192-5_60

Download citation

DOI: https://doi.org/10.1007/978-3-319-98192-5_60
Published: 02 August 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-98191-8
Online ISBN: 978-3-319-98192-5
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Assessing FAIR Data Principles Against the 5-Star Open Data Principles

Abstract

Similar content being viewed by others

Ontological Representation of FAIR Principles: A Blueprint for FAIRer Data Sources

Towards a More Reproducible Biomedical Research Environment: Endorsement and Adoption of the FAIR Principles

Investigating the FAIRness of Science and Technology Open Data: A Focus in the Scandinavian Countries

Keywords

1 Introduction

2 Related Work

3 LOD 5-Star

4 Fair Data

5 Distinctions and Overlaps

5.1 LOD and Non-data Assets

5.2 Tool and Technology Independent

6 Discussion and Conclusion

Notes

References

Acknowledgement

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Navigation

Assessing FAIR Data Principles Against the 5-Star Open Data Principles

Abstract

Similar content being viewed by others

Ontological Representation of FAIR Principles: A Blueprint for FAIRer Data Sources

Towards a More Reproducible Biomedical Research Environment: Endorsement and Adoption of the FAIR Principles

Investigating the FAIRness of Science and Technology Open Data: A Focus in the Scandinavian Countries

Keywords

1 Introduction

2 Related Work

3 LOD 5-Star

4 Fair Data

5 Distinctions and Overlaps

5.1 LOD and Non-data Assets

5.2 Tool and Technology Independent

6 Discussion and Conclusion

Notes

References

Acknowledgement

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation