1 Introduction

In 2019, the Government of the Russian Federation approved the Concept for the creation and operation of the National Data Governance System (hereinafter referred to as the System) [22], developed to implement the objectives of the federal project “Digital Government Management” as part of the national program “Digital Economy of the Russian Federation” (Footnote 1). The goal of the System is to increase the efficiency of government data creation, collection and use for the delivery of public services at the state and municipal levels, the implementation of state and municipal functions, and the provision of access to information in accordance with the needs of citizens and business [22].

The Concept authors plan to achieve this goal by “ensuring the completeness, relevance, consistency and coherence of government data”. The task of developing the regulatory, legal and methodological framework includes the requirement to establish “the rules for creating the government data model based on the principles of continuous development, gradual filling, and consistency, including the development of descriptions and relations of entities as well as their formats”. According to the definition, this government data model is “the totality of government data descriptions, organizational and technological rules and standards used to manage government data, including the description of relations between data types, as well as between the objects defined by them” for the purposes of cross-agency information exchange (interaction). Among the declared functions of the System is the function “to maintain government data model, including the description of the structure, contents and relations of government data, their suppliers and users, ensuring the historicity and versioning of the model, as well as the management of government data lifecycle” [22].

However, the Concept Road Map does not contain any action focused on the development of such a government data model. This clearly contradicts one of the System principles: “to ensure the ontological unity of government data contained in the information resources of public sector bodies and organizations” [22].

Many researchers agree that the lack of accessible ontologies and standards inhibits the development of government data as well as the achievement of the fifth level in the 5-star model [4]. Proposed by Tim Berners-Lee, this model is used for OGD maturity assessment in Europe [18]. According to the 5-star model, government data should be open, linked and published in a machine-readable format, providing context to the data consumers [4]. This means that “maintaining the government data model” requires developing and implementing methods and tools for creating and reusing such ontologies (and other semantic models). It is also important to support the collaboration of experts working on the domain models that will form the basis of the future government data model. Life cycle management should cover not only the data, but also the models describing them [1]. Unfortunately, the Concept and its Road Map do not declare such functions or actions.

Nevertheless, according to the Concept “the linking of government data in various information systems” serves to achieve significant economic effects: increasing the accuracy of planning and forecasting, the speed and quality of government decisions made within the framework of public administration tasks due to the use of “big data” tools and machine learning technologies.

In 2018 we conducted the first round of this research, which aimed to answer the question “if Russia is ready for digital transformation of e-Government” [2]. We suggested an assessment methodology based on a detailed review showing the importance of Linked Open Data (LOD) for establishing a data-centric and model-oriented paradigm, achieving new e-government maturity levels, and moving to data-driven digital government [2]. This year, new challenges posed by the implementation of recently adopted data strategies in Russia have motivated us to focus mostly on linked open data and to check how the situation has changed over the past two years.

Reaching data-centricity through LOD has proved complicated not only in Russia, but also in other countries. This challenge arises even though open government data has long been recognized as a driver of innovative public services and of increased public value: a driver that changes the approach to public service development and makes proactive delivery possible in accordance with the demand and expectations of consumers.

Government organizations produce and publish many open datasets. Some of them do this only to fulfill regulatory requirements and perform their direct functions, but the majority hope that customers will effectively use the published data for analysis, visualization, and/or application in new digital services. Unfortunately, many researchers agree that this potential has not yet been realized at the expected level. Among the most common barriers that impede the use of OGD they name the “lack of quality, lack of license and lack of technical know-how” [16, 19]. According to our previous review and practical experience in OGD dissemination and reuse, there are other significant barriers: (1) the absence of models for the preparation and interpretation of data, such as semantic assets of various levels (ontologies, thesauruses, glossaries, dictionaries) providing the data with semantic annotation; (2) the lack of methods and tools for the development and distribution of these models, as well as for the collaboration of domain experts. To ensure semantic interoperability in a heterogeneous e-government information environment with many legacy systems and services, it is not enough just to assign URIs to the concepts describing data. It is necessary to define all the relations sufficient for unambiguous data interpretation and reuse, so that the meaning of the data is not lost or distorted during machine processing.

Given the adoption of the new National Data Governance System Concept in Russia, we consider it important to conduct the second round of our research and update the assessment results. The main objective of this paper is to reexamine Russian e-government readiness for digital transformation in terms of data-centricity, using the same criteria to ensure comparability of results.

Among the ways to overcome the barriers mentioned above and obtain the expected value from the open government data produced by public administration, experts suggest the use of Linked Open Statistical Data (LOSD) and open cubes [3, 16, 21]. Indeed, most OGD is closely connected to statistics: demographic data (for example, census data) and economic or social indicators (for example, the number of new enterprises or the unemployment rate) [9]. Open multidimensional statistics is one of the main OGD domains. It provides an important basis for accelerating socio-economic development by creating new socially significant public services and innovative projects using disruptive technologies. In Russia, the Digital Analytical Platform for providing statistical data will be a part of the National Data Governance System, with the aim of improving statistical data production and dissemination [22]. Given the importance of this topic, we review existing approaches to Linked Open Statistical Data and the use of ontologies (i.e. semantic models) in this field, which serve to support current Rosstat initiatives and ongoing research on LOSD implementation for statistics in Russia.

2 Methods

The study of digitalization experience in Russia and abroad conducted in 2018 [2] provided a detailed review and laid the groundwork for identifying the following criteria characterizing the data-centric and model-oriented approach to the development of data-driven e-government:

  • Linked open government data publication (use of semantic models).

  • Application of information exchange models to achieve semantic interoperability.

  • Use of open data model standards [2].

This year in the second round of assessment we used the same methodology to ensure the comparability of current research results. Therefore, we have conducted a comparative analysis of results (2018-2020), based on the previously identified criteria, showing the level of digital transformation in Russia in:

1. Information sharing practice. We again used the 5-star model [4] to measure the maturity of open data published on the Russian Portal (Footnote 2) and the EU OD Portal (Footnote 3), and then compared these results with the previous round of research (Sect. 4.1). The use of the 5-star model is consistent with existing methods of open data quality evaluation [4, 18].

2. Preconditions to the establishment of data-centric and model-oriented paradigm in digital government. Since such prerequisites are usually reflected in academic papers, we used Google Scholar to find works devoted to digital transformation in the public sector of Russia and abroad. We took 2019–2020 as the new publication period and compared the results with those of the previous round to draw conclusions (Sect. 4.2).

According to expert opinion, linked OGD sets play an important role in the development of new public services, providing the data layer for innovative applications [16]. Moreover, a number of works devoted to the development of linked open statistical data [14, 15, 20] show how LOSD (and LOD in general) projects, supported by the groundwork of semantic models and standards [2], can bring new life to disparate sets of open government data and significantly increase their value for the development of new digital public services and data-driven government. We present a review of the existing practice in LOSD and the application of semantic models such as ontologies (Sect. 3). We consider this approach necessary for the implementation of linked open statistics in Russia.

3 Review

In 2018 we conducted the first round of this research to study whether Russia was ready for the digital transformation of the public sector. We presented a rather detailed analysis and review to demonstrate the role of data-centricity supported by models in digital transformation. At that time we highlighted the need to make an effort and move towards a data-centric paradigm to achieve the goals of the Russian Digital Economy program. The first review set the basis both for the developed research methods and for the identified assessment criteria, but it did not focus on the experience and the role of LOD in statistics.

In statistics, Linked Open Data enables a comprehensive analysis of disparate and isolated datasets. In fact, many national statistical institutes and public agencies already actively follow the linked paradigm in publishing statistical data on the Internet [15]. Many standard vocabularies have already been proposed in this domain (for example, QB, SKOS, XKOS), and the necessary semantic models have been developed (for example, in the LOD2 project (Footnote 4)) [1]. Within the framework of the European Statistical System, the LOSD ESSnet has recently been established to collect and analyze best practices for publishing Linked Open Statistical Data implemented by statistical organizations of various levels (national institutes, Eurostat). Pan-European programs (for example, the already mentioned ISA (see Footnote 2)) also support and encourage LOSD development.
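To make the linked paradigm more concrete, the following minimal Python sketch renders a single statistical observation as Turtle text using terms of the RDF Data Cube vocabulary (QB) mentioned above. The `example.org` namespace, the dataset and region names, and the indicator value are invented for illustration and do not come from any real portal; the sketch only builds a string and does not validate the resulting RDF.

```python
# Namespaces of the RDF Data Cube (QB) vocabulary and the SDMX-RDF helper vocabularies.
PREFIXES = {
    "qb": "http://purl.org/linked-data/cube#",
    "sdmx-dimension": "http://purl.org/linked-data/sdmx/2009/dimension#",
    "sdmx-measure": "http://purl.org/linked-data/sdmx/2009/measure#",
    "ex": "http://example.org/stat/",  # hypothetical namespace, for illustration only
}


def observation_to_turtle(obs_id, dataset, area, period, value):
    """Render one qb:Observation as a Turtle snippet (string only, no validation)."""
    header = "\n".join(f"@prefix {p}: <{uri}> ." for p, uri in PREFIXES.items())
    body = (
        f"ex:{obs_id} a qb:Observation ;\n"
        f"    qb:dataSet ex:{dataset} ;\n"
        f"    sdmx-dimension:refArea ex:{area} ;\n"
        f'    sdmx-dimension:refPeriod "{period}" ;\n'
        f"    sdmx-measure:obsValue {value} ."
    )
    return header + "\n\n" + body


# Illustrative values only: an unemployment-rate observation for a made-up region.
turtle = observation_to_turtle("obs1", "unemployment", "regionA", "2019", 4.6)
print(turtle)
```

Because the area and the dataset are identified by URIs rather than by free-text labels, observations published by different agencies can refer to the same region or indicator without ambiguity.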

In the process of LO(S)D creation, developers need to significantly expand existing semantic standards to reflect the specific features of classifying and managing statistical concepts. The management of statistical concepts requires the use of both hierarchies (e.g. in statistical classifications) and associations, because they are more informative. At the same time, common (standard) relations are not sufficient to describe statistical concepts, since it is also necessary to express cause-effect or temporal interconnections. Thus, to remove the restrictions of SKOS (Footnote 5), in 2013 UNECE and Eurostat proposed the eXtended Knowledge Organization System (XKOS) [8]. Another example of SKOS extension is the Japan Open Data Project, which provides an expanded set of external dictionaries and models.
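The difference between standard and extended relations can be sketched in Python as plain subject-predicate-object triples. The concepts under `example.org` are hypothetical. The SKOS properties `broader` and `related` are standard; the XKOS namespace and the property names `causes` and `before` are given here as an assumption based on the XKOS specification and should be verified against it before use.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
XKOS = "http://rdf-vocabulary.ddialliance.org/xkos#"  # assumed namespace, verify against the spec
EX = "http://example.org/concept/"  # hypothetical concept scheme

# (subject, predicate, object) triples describing statistical concepts.
triples = [
    # Hierarchy and plain association: expressible in SKOS alone.
    (EX + "youthUnemployment", SKOS + "broader", EX + "unemployment"),
    (EX + "unemployment", SKOS + "related", EX + "labourForce"),
    # Cause-effect and temporal relations: need an extension such as XKOS.
    (EX + "enterpriseClosure", XKOS + "causes", EX + "unemployment"),
    (EX + "census2010", XKOS + "before", EX + "census2020"),
]


def to_ntriples(rows):
    """Serialize URI-only triples as N-Triples lines."""
    return [f"<{s}> <{p}> <{o}> ." for s, p, o in rows]


def count_by_vocabulary(rows):
    """Count how many relations come from SKOS and how many from XKOS."""
    counts = {"skos": 0, "xkos": 0}
    for _, predicate, _ in rows:
        if predicate.startswith(SKOS):
            counts["skos"] += 1
        elif predicate.startswith(XKOS):
            counts["xkos"] += 1
    return counts


lines = to_ntriples(triples)
print("\n".join(lines))
print(count_by_vocabulary(triples))
```

In this toy example, half of the relations cannot be stated with SKOS properties alone, which is the kind of gap the extension is meant to close.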

Ontologies, successfully used over many years in the less formalized Semantic Web environment, provide naming, definition, and description of domain concepts, as well as various relations between these concepts. In official statistics there are also some vocabularies, ontologies, and other semantic models, but as a rule they have no formal representation and are usually not consistent with each other. Being one of the leaders in the digital transformation of the public sector, the United Kingdom pays great attention to the development of ontologies for government data, including the statistics domain (Footnote 6). The Italian National Institute of Statistics (Istat) also reports on the use of ontologies for the integration and dissemination of statistical data. Istat follows the Ontology-Based Data Management (OBDM) approach, proposed for integrating several heterogeneous data sources, and has applied this experience on the Istat Linked Open Data Portal (Footnote 7).

The High-Level Group for the Modernization of Official Statistics is responsible for the implementation of ontologies at UNECE. It includes a special group formed to support standards and find ways to develop, improve, integrate, and promote their implementation, which is necessary for modernizing statistics. This group has operational responsibility for maintaining and developing the Generic Activity Model for Statistical Organizations (GAMSO) [10], the Generic Statistical Business Process Model (GSBPM), the Generic Statistical Information Model (GSIM) [12] and the Common Statistical Production Architecture (CSPA) [6].

An important activity for LOSD development is the creation of the Core Ontology for Official Statistics (COOS), which began in November 2018. Its main objective is to solve the problem of heterogeneity and fragmentation inherent in existing semantic models in statistics. An additional, indirect but important task is to bring together the expert community interested in developing ontologies for statistical data [7].

Following this review, which proves the importance of LOD supported by semantic models in statistics, we extended our research with a brief study of the initiatives taken in Russia in LOSD production and dissemination.

4 Results and Discussion

4.1 Information Sharing Practice

Experts widely use the 5-star open data deployment scheme [4] to evaluate the maturity level of open data.
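As a rough illustration of how such an assessment can be automated, the sketch below assigns a star level to a dataset from its publication format. The format-to-star table is a simplifying assumption made for this example: the full scheme also requires an open license, and the fifth star depends on whether the data actually links to other data, which a file extension cannot show.

```python
# Simplified mapping from publication format to the highest star it can support.
# This table is an assumption for illustration, not an official classification.
FORMAT_STARS = {
    "pdf": 1,                        # on the web, any format
    "xls": 2, "xlsx": 2,             # structured, but proprietary format
    "csv": 3, "json": 3, "xml": 3,   # structured, non-proprietary format
    "rdf": 4, "ttl": 4,              # URIs used to denote things
}


def star_level(fmt, links_to_other_data=False):
    """Return an approximate 5-star level for a dataset published in `fmt`."""
    stars = FORMAT_STARS.get(fmt.lower(), 1)  # unknown formats still count as "on the web"
    if stars == 4 and links_to_other_data:
        stars = 5  # linked to other data to provide context
    return stars


for fmt in ("pdf", "xlsx", "CSV", "rdf"):
    print(fmt, star_level(fmt))
print("rdf (linked)", star_level("rdf", links_to_other_data=True))
```

Under this mapping a CSV dataset is placed at the third star, while an RDF dataset reaches the fifth star only when it is linked to other data.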

Table 1 presents aggregated statistics on OGD publication in Russia (criterion 1) for the first assessment (2018) and for the present study (2020).

Table 1. Russian OGD Publication in 2018 vs 2020.

In May 2020, the number of datasets available on the official Russian Federation Open Data Portal (Footnote 8) remains small: 23,775 sets have been published over the entire period. This is about 9 times fewer than in the USA (Footnote 9) (211,000 sets) and about 46 times fewer than on the European Data Portal (1,086,559 sets). Thus, the amount of “open data available on the web (whatever format)” is still extremely small in Russia. Over the past two years, no more than 3,000 datasets were added. In this respect, Russian OGD is still at the 1st level of the 5-star model. However, over 60% of OGD is available in CSV, which corresponds to the 3rd star. We should note that over this period not a single new dataset in RDF format has been published, and the 5 existing RDF datasets have not been updated since 2016–2018. This confirms that there is still no practice of LOD publication and development in Russia.
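The ratios quoted above can be checked with a few lines of Python. The counts are those given in the text; the variable names are chosen for this example, and the figure of 3,000 added datasets is an upper bound, so the resulting share is an upper bound as well.

```python
# Dataset counts as reported in the text (May 2020).
RUSSIA = 23_775
USA = 211_000
EU = 1_086_559
ADDED_SINCE_2018_MAX = 3_000  # "no more than 3,000" is an upper bound


def ratio(larger, smaller):
    """How many times `larger` exceeds `smaller`."""
    return larger / smaller


usa_ratio = ratio(USA, RUSSIA)
eu_ratio = ratio(EU, RUSSIA)
growth_share = ADDED_SINCE_2018_MAX / RUSSIA

print(f"USA / Russia: {usa_ratio:.1f}")
print(f"EU / Russia: {eu_ratio:.1f}")
print(f"Added in two years: at most {growth_share:.1%} of the current total")
```

Rounded to whole numbers, the two ratios give the factors of 9 and 46 stated in the text.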

The study of the open data registry (Footnote 10) shows no positive changes in dataset updating. In 2020, of 15,234 datasets published under the tag “State”, only 705 have the status “updated”. In total, 2,036 datasets have been updated since 2018. Many datasets have a reference to the data structure, which is given in the card describing the dataset. However, it remains difficult to understand what kind of data a dataset contains and how it can be interpreted.

It is not possible to identify the datasets containing linked open data: there is no special section or dedicated tag on the Open Data Portal of Russia. Despite the regulatory requirements, official web sites also show no signs of pilot projects presenting “linked open data sets with the possibility of visualization” [17].

We must admit that LOD creation is difficult not only for Russian e-government but also for many other countries. For example, the EU Open Data Maturity Report [18] analyzes the data available in European countries against the requirements of the 5-Star Open Data model [4]. It reveals that most data (above 50%) on the national data portals meets the requirements of the first three stars. In 68% of countries more than 90% of the data still corresponds only to 1*, and 64% of countries achieved the 3* level with 50–90% of data. However, data meeting the requirements of four and five stars, that is, using Uniform Resource Identifiers (URIs) (4*) and linking data so that a person or a machine can explore the web of data (5*), is not yet common in the public sector open data of Europe. Only in 7% of countries does most data (50–90%) correspond to 4*, and 96% of countries do not have open data at level 5*.

Nevertheless, the European Union considers the implementation of LOD as a crucial objective for the development of OGD. It is one of the maturity indicators confirming data quality. Unfortunately, Russia uses neither the 5-star model adopted in many countries (and we followed its requirements in this and previous work), nor any other measurable indicators to evaluate the quality of open data (Fig. 1).

Fig. 1. The percentage of EU data provided in conformity to the 5-Star model [18].

Therefore, these updated research results cast doubt on the possibility of following a data-centric paradigm in Russian e-government without changing the attitude to OGD publication and to linking data for its interpretation. Published datasets do not attract any special interest from experts, the press or business. Using open data for the benefit of the state and society has not become essential and has not even become common practice. That is why there is no “customer demand” for data quality, relevance and availability. In turn, as long as data customers do not use (linked) open data regularly, data providers do not see the need to preserve its unique and correct meaning for further information sharing. At the same time, tracing the requirements of the new conceptual documents regulating data governance in Russia, it is impossible to identify the continuity of the existing Open Government legacy resources or their correspondence to the future systems and services being developed within the framework of the National Data Governance System.

4.2 Preconditions to the Establishment of Data-Centric and Model-Oriented Paradigm in Digital Government

The volume of research papers published in the areas outlined in Sect. 2 is a good indicator of the relevance and practical significance of studies in semantics for digital government, which aim to provide effective information sharing and digitalization of the public sector. Thus, we consider it important to re-analyze recently published academic papers using the same keywords and refined search queries as in our previous study [2]. We again use Google Scholar (GS) as the source, keeping the same indexing system to maintain the integrity of this study and the comparability of its results.

To determine the relevance of the publications returned by a search query, we made an expert evaluation of the first thirty papers to assess how relevant they are to the chosen topic and criteria, using the same approach as in the previous study. The group of experts included specialists in semantic interoperability, LOD, domain models and programming from the community of our Center for Semantic Integration (Footnote 11). Table 2 presents comparative results for both periods.

Table 2. Search results (GS).

Table 2 shows a significant growth in Russian academic studies dedicated to the application of open standards, open models (ontologies) and especially LOD to overcome the challenges of digitalization (Fig. 2). Thus, we should note not only the increasing relevance and applicability of this topic. We can also conclude that by 2020 the Russian academic community has finally formed the prerequisites for establishing a data-centric and model-oriented e-government, expressed in a wide range of research works. The current trend also demonstrates the establishment of an academic basis and the availability of experts actively working in this direction. This means that their competence can and should be used in the implementation of the Concept.

Fig. 2. The growth of published research papers.

Nevertheless, we must again highlight the lack of studies describing the experience of implementing the developed models and the actual use of LOD in Russian e-government. The relevance of search results retrieved for these queries in Russian is extremely low. This brings us to the conclusion that either there are no papers describing LOD pilot projects carried out in accordance with the roadmap [19], or they have not yet been presented to the academic community or to practitioners. The lack of new research suggests that these pilot projects have not been continued and that the development of LOD in Russia remains difficult.

On the other hand, we note only a slight increase (by 6%–7%) in the number of academic papers published in English for the “Open Standards” and “Linked Open Data” criteria, as well as a significant decrease (by 13%) for the criterion “Data Models, Semantics, Ontologies”. Until 2018 there were many more publications, especially those retrieved by searching for “semantic interoperability”. Apparently, the reason for this dynamic is the completion of academic work in this area (its peak was in 2010–2016), while even its practical implementation is already at the final stage. Indeed, in Europe, for example, the achievement of interoperability has been regulated for over 10 years by the ISA program established by the European Commission. This year the second stage of the program comes to an end, leaving a good basis formed of implemented strategies, developed architectures, accepted standards, and applied models. LOD creation is also a practical-level task. According to the “Open Data Maturity Report”, the availability of LOD indicates open data quality [18].

Among the papers published in Russian over the entire search period, we found just over fifty works using the query “semantic interoperability in e-government”. Their relevance is only 50%. The reason is rather clear: the state does not support any research on this topic, despite its importance for e-government development, and only a few enthusiasts continue their work.

4.3 Linked Open Statistical Data Initiatives in Russia

The statistics domain in Russia demonstrates the effective implementation of international open standards for open government data. The Federal State Statistics Service uses object models, including SDMX and DDI (Footnote 12), to improve customer understanding of data and increase the interoperability of statistical information systems. However, object models cannot be deepened and extended. They are not interconnected, do not form multidimensional structures of concepts, and do not reflect the variability of relations and associations that is essential for representing real-world entities. The effective reuse of open statistics is hampered. The primary reasons for that are (1) the heterogeneity of the open statistical data environment, (2) data fragmentation and (3) the lack of meaningful interpretation [11, 13].

Semantic Web technologies (SW) serve to overcome these challenges. In accordance with SW principles data is presented in a standard form covering the data associations and relations. Semantic annotation allows both people and machines to determine a unique meaningful interpretation of data using semantic models (ontologies, thesauruses, glossaries, and dictionaries), which have no restrictions on complexity, coherence, and variation. These “embedded semantics offer significant advantages such as reasoning over data and operating with heterogeneous data sources” [5].
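A minimal Python sketch can illustrate how semantic annotation helps when operating with heterogeneous data sources. Two sources use different local column names, but each column is annotated with a concept URI from a shared model, so the rows can be joined without manual mapping. All URIs, column names and values below are hypothetical and serve only as an illustration.

```python
# Hypothetical concept URIs standing in for terms of a shared ontology.
REGION = "http://example.org/onto/region"
POPULATION = "http://example.org/onto/population"
UNEMPLOYMENT = "http://example.org/onto/unemploymentRate"

# Two sources describing the same regions with different local column names.
source_a = {
    "annotations": {"territory": REGION, "pop": POPULATION},
    "rows": [
        {"territory": "Region A", "pop": 1_000_000},
        {"territory": "Region B", "pop": 500_000},
    ],
}
source_b = {
    "annotations": {"region_name": REGION, "unempl_rate": UNEMPLOYMENT},
    "rows": [{"region_name": "Region A", "unempl_rate": 4.6}],
}


def normalize(source):
    """Rename local column names to the concept URIs given in the annotations."""
    mapping = source["annotations"]
    return [{mapping[col]: val for col, val in row.items()} for row in source["rows"]]


def merge_on(key_concept, *sources):
    """Join rows from several annotated sources on a shared concept URI."""
    merged = {}
    for source in sources:
        for row in normalize(source):
            merged.setdefault(row[key_concept], {}).update(row)
    return merged


merged = merge_on(REGION, source_a, source_b)
for region, values in merged.items():
    print(region, values)
```

The join key is the concept URI rather than a column label, so a third source with yet another naming convention could be added by supplying only its annotations.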

In Sect. 4.1 we emphasized the small percentage of LOD published among other datasets on OGD portals in Russia. We suppose that the main reason for this is the considerable effort required to complete this task. First, it is necessary to create and disseminate semantic models, and then to use them for linking data as well as for adding semantic annotation. This is complicated, labor-intensive, and multi-aspect work that can be accomplished only through significant joint efforts of IT specialists and domain experts. At the same time, the effort invested in “linking” the data is rewarded by reaching a new level of statistical analysis using visualization tools and by the opportunity to take full advantage of the knowledge and context generated through the application of the semantic approach to the development of data-driven government.

In September 2020 the Russian Federal State Statistics Service, together with Plekhanov Russian University of Economics, started research aimed at developing the concept and the roadmap for the production and dissemination of linked open statistical data, based on the study of international experience in terms of applied regulatory, methodological and technological approaches. This initiative is the first step towards LOSD in Russia, and presenting its results will be the objective of our future work.

5 Conclusion and Recommendations

In this research we aimed to reexamine the readiness of Russian e-government and the public sector in general for digital transformation, with a focus on LOD development, and to compare it to the results obtained in 2018. Following three main criteria [2], we re-analyzed datasets published on official government open data portals and searched for academic studies on semantic interoperability, the application of data models, and the experience of Linked Open Data development. Since statistics is one of the key cross-sector domains of OGD, we gave a review of existing practice in Linked Open Statistical Data. The analysis of current LOSD initiatives at both the strategic and practical levels is useful for further research on LOSD implementation in the Federal State Statistics Service.

This study shows that the academic community in Russia has already provided a rather solid background for the establishment of a data-centric paradigm in digital government. The number of studies relevant to this topic has significantly increased since 2018. There are some papers describing projects devoted to the development of semantic models in the public sector and other domains.

However, at the state level there are still no conditions for data-centricity, despite the newly adopted conceptual documents. At present, neither the existing international experience nor the competencies of Russian experts acquired over the past 10 years are taken into account. The concept of “Government as platform”, fundamentally important for digitalization, is often replaced by the development of “platforms for the government”, and the implementation of data-driven government is reduced to data governance.

To achieve the objectives of the Concept for the creation and operation of the National Data Governance System, in particular “to provide the ontological unity of government data contained in the information resources of public sector bodies and organizations” [22], we propose to focus on building the groundwork for data-driven government through the development, distribution and reuse of semantic assets (ontologies, thesauri, dictionaries and other semantic models) necessary for the semantic annotation of data and sustainable information sharing. Encouraged by the academic and expert community, we expand the recommendations given earlier [2] and suggest organizing, at the state level, the collaboration of domain experts and IT specialists for LOD production and dissemination, improving the quality of government data. Within a joint Center of competence, among other things, they could work on:

  • cataloging and managing of semantic assets, providing their dissemination and reuse.

  • development and dissemination of information exchange models for cross-sector interaction in distributed heterogeneous information systems of e-government.

  • LOD implementation for the comparison, analysis, and visualization of government data.

  • providing the informational, methodological, organizational, regulatory, and legal support to the expert community in the field of semantic integration and semantic analysis.