1 Introduction

Currently, there are over 519 open data portals or catalogs available globally (footnote 1) and over 8,000 datasets published on the European Union (EU) Open Data Portal [13], in response to calls for unprecedented government openness and the EU’s Public Sector Information directive [7]. Unfortunately, barriers such as limited access to open data by citizens and third parties, and the lack of dedicated resources within government organizations to sustainably publish high-value datasets [17], have significantly limited the expected innovation and benefits of open data. While releasing data on the processes and decisions of governments should, in general, improve transparency, recent studies have shown that transparency depends not only on how visible information is made but also on how understandable it is. Few studies, such as [23], have discussed the importance of open data platforms in enhancing transparency through features for accessing, using and collaborating on open data. In our opinion, the innovation potential of open data is significantly affected by the transparency-related affordances of the underlying open data platforms. We therefore believe that a good understanding of the limitations of existing platforms, and of stakeholders’ needs with respect to the transparency-related affordances of open data platforms, is imperative for harnessing the open data resources held on them.

Although there are a few existing studies on open data platforms, such as [14], none of them explicitly addresses the affordances of these platforms with respect to their support for both data and organizational transparency. Building on our previous work reported in [1], the study presented here fills this gap by first reviewing the features of eleven open data platforms. The selected features are those that affect the accessibility and understandability (or transparency) of datasets managed on the platforms. The eleven platforms reviewed in this study are CKAN [6], DKAN [11], Socrata [27], PublishMyData [24], Information Workbench [16], Enigma [12], Junar [18], OpenDataSoft [22], Callimachus [4], DataTank [8] and Semantic MediaWiki [26]. Next, the paper describes findings from the expert interviews and the collective intelligence workshop co-organized with Dublin City Council to identify the perceived shortcomings of current-generation open data platforms and the desired features of next-generation ones, from the perspectives of different categories of open data stakeholders.

The rest of the paper is structured as follows: Sect. 2 reviews existing platforms, Sect. 3 presents the methodology for the review and evaluation, Sect. 4 presents the findings from the study, Sect. 5 discusses these findings, and Sect. 6 gives concluding remarks.

2 Review of Existing Platforms

A platform, as a general concept, comprises three aspects: (1) a stable low-variety core, (2) a changeable set of complements and (3) interfaces which enable the core and the complements to operate as a single system [3]. In our study, we consider an Open Data Platform to be a software platform comprising a software ecosystem that supports different end-user interactions with open data, including search and discovery of datasets, publishing of datasets, analysis and visualization of datasets, as well as sharing and development of stories from datasets. As part of the background study for our investigation, we reviewed eleven open data platforms against features which we considered pertinent for both data and organizational transparency. Data transparency relates to the accessibility and understandability of data. Organizational transparency relates to the degree to which both internal and external agents can observe or access different kinds of information about an organization’s functioning [19]. The platforms were selected based on their popularity and availability. The features, together with our observations, are briefly described as follows:

Metadata Schema and Data File Formats: open data platforms support different metadata schemas and a variety of data file formats. Metadata refers to information about a dataset, such as the title, author, subjects, keywords, publisher, revision history, changes, and the source of the data. Metadata also includes structural information about the data, such as keys, indexes and columns. Metadata enables search capabilities and interoperability between different open data platforms. Several file formats are associated with the datasets managed on open data platforms. Examples of file formats include CSV, JSON, XLS, RDF, PDF and HTML.
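To make the distinction between descriptive and structural metadata concrete, the following is a minimal sketch of a metadata record. It is not the schema of any reviewed platform: the class names `DatasetMetadata` and `Column`, the chosen fields, and the sample record are assumptions made purely for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Column:
    """Structural metadata for one column of a tabular dataset."""
    name: str
    datatype: str
    is_key: bool = False


@dataclass
class DatasetMetadata:
    """Descriptive and structural metadata for one dataset (illustrative fields)."""
    title: str
    author: str
    publisher: str
    source: str
    keywords: List[str] = field(default_factory=list)
    formats: List[str] = field(default_factory=list)
    columns: List[Column] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialise the record so that another platform could import it."""
        return json.dumps(asdict(self), sort_keys=True)


# Hypothetical record, invented purely for illustration.
record = DatasetMetadata(
    title="Air quality readings",
    author="Environment unit",
    publisher="Example City Council",
    source="https://data.example.org/air-quality",
    keywords=["environment", "air quality"],
    formats=["CSV", "JSON"],
    columns=[Column("station_id", "string", is_key=True), Column("no2", "float")],
)
print(record.to_json())
```

Serialising the record to a neutral format such as JSON is one simple way in which a shared metadata description could support interoperability between platform instances.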

Search facility for datasets: contemporary platforms provide keyword-based search over the metadata associated with each dataset, together with filtering of results; emerging platforms such as Enigma offer record-level search and information filtering at multiple levels.
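As an illustration of keyword-based search over metadata with result filtering, consider the sketch below. It assumes a small in-memory catalog of hypothetical records and simple substring matching; the function name `search` and the record layout are our own, and production platforms typically rely on a dedicated search index instead.

```python
def search(catalog, query, file_format=None):
    """Return titles of datasets whose metadata contains every query term.

    Matching is a case-insensitive substring test over the title, publisher
    and keywords. An optional file format narrows the results further.
    """
    terms = query.lower().split()
    hits = []
    for ds in catalog:
        haystack = " ".join([ds["title"], ds["publisher"], *ds["keywords"]]).lower()
        if all(term in haystack for term in terms):
            if file_format is None or file_format in ds["formats"]:
                hits.append(ds["title"])
    return hits


# Hypothetical catalog entries, invented for illustration.
CATALOG = [
    {"title": "Bus stop locations", "publisher": "City transport office",
     "keywords": ["transport", "bus"], "formats": ["CSV", "JSON"]},
    {"title": "Annual budget", "publisher": "Finance department",
     "keywords": ["spending", "budget"], "formats": ["XLS"]},
    {"title": "Cycle lane network", "publisher": "City transport office",
     "keywords": ["transport", "cycling"], "formats": ["JSON"]},
]

print(search(CATALOG, "transport"))
print(search(CATALOG, "transport", file_format="CSV"))
```

Record-level search would apply the same idea to the rows of each dataset, not only to its metadata.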

Social Media and Collaboration: platforms increasingly allow interaction between users through social sharing features. Specifically, this feature enables users to comment on, share, review, and rate available datasets.

Dataset Publishing: this feature supports the dataset publishing process, including dataset refinement, upload, and linking to other datasets.

Federation and Harvesting: federation allows data replication across different instances of a platform, enabling seamless integration between the various independent platform instances. This allows searching across multiple instances of the platform. Harvesting allows automated extraction of data from different open data sources and catalogs into one or more catalogs.
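The harvesting step can be sketched as merging records from several source catalogs into a single catalog. The sketch below assumes that every record carries a shared identifier and an ISO-formatted modification date, and that the most recently modified copy should win; these are our assumptions, and the function name `harvest` and the sample sources are hypothetical.

```python
def harvest(sources):
    """Merge records from several catalogs into one, keyed by dataset identifier.

    `sources` maps a source name to a list of records. Each record is a dict
    with at least "id" and "modified" (an ISO date string, so plain string
    comparison orders the dates correctly).
    """
    merged = {}
    for source_name, records in sources.items():
        for rec in records:
            current = merged.get(rec["id"])
            # Keep the most recently modified copy when sources overlap.
            if current is None or rec["modified"] > current["modified"]:
                merged[rec["id"]] = {**rec, "harvested_from": source_name}
    return merged


# Hypothetical source catalogs, invented for illustration.
SOURCES = {
    "portal_a": [
        {"id": "ds-1", "title": "Parking zones", "modified": "2015-01-10"},
        {"id": "ds-2", "title": "Street trees", "modified": "2015-02-01"},
    ],
    "portal_b": [
        {"id": "ds-2", "title": "Street trees (updated)", "modified": "2015-03-15"},
        {"id": "ds-3", "title": "Noise levels", "modified": "2015-03-20"},
    ],
}

catalog = harvest(SOURCES)
for ident in sorted(catalog):
    print(ident, catalog[ident]["title"], catalog[ident]["harvested_from"])
```

Recording the originating source alongside each harvested record keeps the provenance of the data visible, which matters for the understandability of the merged catalog.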

Extensibility mechanism: this feature enables adaptation and extension of the platform with complementary features through mechanisms such as APIs and libraries, connectors, plugins and extensions.

Data Analysis tools: this feature enables simple statistical operations, online analytical processing (OLAP) operations, and the use of dashboards and analysis widgets. It also provides access to external analytical tools such as the R programming language.

Visualisation tools: this feature enables the use of maps and charts in visualizing datasets. It employs existing map services such as OpenStreetMap, Google Maps and Bing Maps. Libraries such as D3.js and recline.js are also supported.

Personalization tools: this feature allows users to modify the platform’s look-and-feel for branding purposes and to adopt a specific portal view for different users based on their preferences. This could include how datasets and search results are displayed.

Customization tools: this feature allows portal administrators to define the metadata schema, the extensions to use, and the configuration of the data store and its limits.

Dataset licensing service: this feature allows licensing information to be added to the dataset as part of its metadata.

Resource Accessibility: this feature enables access to the data through APIs. These APIs provide clear specifications for external requests using protocols such as REST (Representational State Transfer) or SOAP (Simple Object Access Protocol).
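As a hedged example of REST-style access, the sketch below queries a CKAN-style Action API, in which dataset search is exposed at `/api/3/action/package_search` and returns a JSON document with `success` and `result` fields. The portal address is a placeholder, the helper names are our own, and other platforms expose different endpoints and response formats.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://data.example.org"  # hypothetical portal address


def build_search_url(base_url, query, rows=5):
    """Build the request URL for a CKAN-style dataset search."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url}/api/3/action/package_search?{params}"


def parse_search_response(body):
    """Extract dataset titles from the JSON body of a search response."""
    payload = json.loads(body)
    if not payload.get("success"):
        raise ValueError("portal reported an unsuccessful request")
    return [pkg["title"] for pkg in payload["result"]["results"]]


def search_portal(base_url, query, rows=5):
    """Perform the request over HTTP (needs network access and a live portal)."""
    with urlopen(build_search_url(base_url, query, rows), timeout=10) as resp:
        return parse_search_response(resp.read().decode("utf-8"))


# Hand-written sample response, so the parsing step can be tried offline.
SAMPLE_BODY = json.dumps({
    "success": True,
    "result": {"count": 1, "results": [
        {"name": "air-quality", "title": "Air quality readings"},
    ]},
})

print(build_search_url(BASE_URL, "air quality", rows=2))
print(parse_search_response(SAMPLE_BODY))
```

Separating URL construction and response parsing from the network call keeps the example testable without a live portal; `search_portal` would be called with the address of an actual CKAN instance.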

Technical Environment: describes the programming environment of the open data platform, which is vital information for extending platforms.

Others: we capture here all other features available on the platforms, such as data download statistics, that could affect the accessibility of datasets.

Table 1 presents a summary of the features of each of the eleven platforms reviewed. Our observations on these features are summarised in Sect. 4, and we refer the reader to [25] for a detailed discussion.

Table 1. Summary of platform features

3 Methodology

3.1 Research Objectives

The aim of the study is to determine the needs of stakeholders regarding the desirable affordances of the next-generation open data platform that could reduce the barriers to the adoption and use of open data. The work explicitly aims at answering the following questions:

(Q1) To what extent are transparency-enhancing features supported on selected open data platforms? In answering this question, we evaluated the selected platforms against a set of criteria that capture direct and indirect support for dataset transparency and socialisation around datasets. These criteria were partly based on past studies [14]. The remaining features were identified iteratively, based on whether they affect data and organizational transparency. Finally, the twelve features elaborated in Sect. 2 were employed for the analysis: Metadata, Data and File Formats; Flexible search facility for datasets; Social Media and Collaboration; Dataset Publishing; Federation and Harvesting; Data Analysis; Visualisation tools; Personalisation; Customisation; Dataset licensing; Accessibility; and Extensibility mechanisms. We also reviewed extant literature on open data platforms.

(Q2) What are the perceived limitations of open data platforms? To tackle this question, we considered the barriers identified by stakeholders during the collective intelligence workshops and interviews that are related to data transparency and organizational transparency. We associated these barriers with specific stakeholder categories and specified the nature of transparency quality impacted by each barrier. We present the analytical model for Transparency Qualities in Sect. 3.2.

(Q3) What specific platform features are suggested by stakeholders? To answer this question, we analysed the solutions to the identified barriers that stakeholders contributed during interviews and workshops. The specific features were organized under three categories: (a) information needs of stakeholders: what kinds of datasets do stakeholders highlight as important to them? (b) social and collaboration needs: what features relate to how stakeholders wish to interact and collaboratively make sense of the published data? (c) understandability, usability, and decision-making needs: what platform features do stakeholders desire in order to draw inferences from datasets, make sense of data easily, and make decisions based on available data?

3.2 Analytical Framework

Our study is conceptually grounded in computer-mediated transparency as characterized by Meijer in [21]. Computer-mediated transparency is characterised as unidirectional (one-way) between the parties involved in the transparency relationship and decontextualized with respect to the information being shared. As a necessary condition for any form of transparency, computer-mediated transparency should ensure that external or receiving parties are capable of processing the information that has been made available [15]. Thus, we argue in our study that the features of open data platforms should explicitly mediate effective transparency in terms of access and understandability. In particular, our study conceptualizes transparency as a quality which should be “satisficed” by an open data platform [5]. Specifically, we employ the deconstruction of the transparency construct described in [5], as shown in Fig. 1, to underpin our analysis of pathologies and platform-related barriers.

Fig. 1. Transparency Qualities and Sub-Qualities [5]

3.3 Data Gathering

We employed four data gathering methods in this study. The first source of information is extant literature on open data platforms. The second is a survey of selected open data platforms through hands-on use and exploration, in addition to a review of the accompanying documentation. Expert interviews are the third source of information. Lastly, we conducted a Collective Intelligence (CI) workshop involving stakeholders such as publishers, data intermediaries and wranglers, platform developers and end-users.

The CI workshop was co-organized with Dublin City Council in April 2015 to identify and understand the perceived barriers and issues with using the current set of open data platforms, as well as the essential and expected features of next-generation platforms. There were ten participants in the workshop from different stakeholder categories, including data consumers, suppliers, mediators (or intermediaries) and enablers. Data consumers are end-users of open data, such as the general public or app developers. Data suppliers include all entities involved in the publishing of datasets, including non-governmental organizations and private sector entities.

4 Results

4.1 Features of Existing Platforms

We studied the features available on the platforms by analysing extant literature and documentation about the platforms, in addition to a systematic review of selected instances of these platforms. We observed that CKAN, DKAN, Socrata and Semantic MediaWiki clearly stand out by providing fully-developed features that support between nine and twelve of the review criteria described in Sect. 2 (details are presented in Table 1). Other solutions support between one and seven fully-functional features. Personalisation and customisation are very common in state-of-the-art platforms, and most of these platforms also support social media integration for social sharing and posting to social media pages. In contrast, support for metadata schema adaptation, advanced visualisation of datasets and granular accessibility of datasets is limited. Features such as publishing pipelines were also found to be relatively limited on existing platforms. Table 1 provides a summary of features for the eleven platforms.

4.2 Platform Pathologies

This section presents our findings based on the analysis of the information captured from the interviews and workshop sessions about barriers and limitations of current open data platforms, and the desired affordances to address the identified shortcomings. The categories of stakeholders engaged include consumers, enablers, suppliers, and mediators of open data.

The findings show that the most common obstacle to using open data platforms is the perceived poor quality of the open data provided on the platforms. According to the stakeholders, poor data quality is largely characterised by poor metadata, failure to present data appropriately to different audiences and difficulty in locating data of interest. Other barriers relate to the irrelevance of datasets, the poor usability of platforms and the lack of examples of prior use of available datasets. We highlight in Table 2 some of the issues identified by stakeholders. For each obstacle identified, we specified a category for the problem, such as “Non-relevancy” or “Poor awareness”, and also associated the barrier with a high-level transparency quality attribute (such as Accessibility) and a more specific quality attribute (such as “Availability”). This coding is underpinned by the model described in Sect. 3.2. Poor usability stood out as the most prominent shortcoming.

Table 2. Shortcomings of selected open data platforms

4.3 Desired Affordances

The essential features that stakeholders desired for next-generation open data platforms, obtained from the workshop and interviews, were grouped under three related categories: (1) Information needs, (2) Social and Collaboration needs, and (3) Understandability, Usability and Decision-making needs.

Regarding information needs, stakeholders requested that future platforms ensure easy access to datasets related to their immediate environment and communities. They also requested the inclusion of datasets of interest, such as crime statistics, public health data and data about the environment. Under social and collaboration needs, stakeholders asked for dataset rating, feedback comments on datasets, collaborative curation of datasets, and prioritization of and voting on dataset requests. They also requested incentives for using the platform, through reward systems and gamification. For better understandability, usability and improved decision-making on upcoming platforms, stakeholders asked for customizable dashboards, custom visualization tools, data mining tools, support for linked data and map-based search. Some also asked for a question-answering feature. Details of the desired affordances are shown in Table 3.

Table 3. Desired features in future open data platforms

5 Discussion

Contemporary open data platforms provide limited data cataloging features and largely lack interactivity, which implies low user engagement. While the design of current open data platforms aligns well with the needs of technical users (developers and data scientists), the findings from our study show that ordinary end-users (members of the public) require greater usability and user-friendliness as a precondition for improved transparency. We argue that the principle of citizen-centric, one-stop service design applied to e-government service portals should be mapped onto next-generation open data portals, in order to offer the public one-stop access to “data services”. In particular, we argue that tools supporting social interaction between platform users are essential for improved transparency in the context of monitorial, deliberative and participatory democracy [19, 20, 21]. Specifically, based on the feedback received from the interviewed experts and workshop participants, features such as content sharing and the ability to hold discussions around datasets are pivotal. The integration of a social media platform with an open data platform was one of the key subjects of the research reported in [2]. Moreover, the stakeholders expressed a strong need for anonymity while using open data. This observation supports the thesis that citizens perceive open data portals as government-owned websites and fear that their privacy may be endangered. Our findings are consistent with the Open Data Barometer reports [9, 10].

We argue that next-generation open data platforms should augment and extend existing solutions. Therefore, based on the review of contemporary platforms, we have identified platforms with an open architecture that can be extended to accommodate the newly desired features. We observed that platforms like CKAN, DKAN and Semantic MediaWiki are the best candidates, as they are the most extensible and are free and open source. Moreover, these platforms provide a rich set of extension mechanisms, with guidelines to support developers in building new extensions. The Callimachus and DataTank platforms are also open source and could be modified; however, since they implement no explicit extension mechanism, any modification would require significantly more effort. Thus, we believe that existing platforms provide a good foundation for next-generation platforms with the desired set of affordances.

6 Conclusions

In this work, we elaborate on the shortcomings of current-generation open data platforms and on the core affordances for future ones. This study complements state-of-the-art knowledge on open data platforms from the perspective of different stakeholder groups. While related studies have dealt mainly with the technical aspects of the platforms, our findings provide deeper insight into socio-technical issues, based on the direct contributions of different categories of stakeholders. Guided by our results, we argue that some open data platforms, such as CKAN, Socrata, DKAN and Semantic MediaWiki, are extensible and provide suitable support for better transparency of datasets. Specifically, CKAN, DKAN and Semantic MediaWiki are open-source platforms which stand out regarding extensibility. Thus, we consider these platforms a plausible base infrastructure for building next-generation open data platforms with significantly improved potential for adoption and use by public administrations, citizens and other stakeholders.