
1 Introduction

Ontology-based Data Access (OBDA) [1, 2] is concerned with providing end-users and applications with a way to query legacy databases through a high-level ontology that models both the business logic and the underlying data sources. Modern knowledge-based applications have replaced hard-coded business logic with a high-level representation of the business intelligence that is decoupled from the application code, which allows for improved flexibility. In Semantic Web applications [3], the business intelligence is represented by ontologies expressed in the Web Ontology Language 2 (OWL 2) [4]. Briefly, an ontology is a logical theory formed by a collection of concepts and roles together with a set of concept and role assertions [5]. The relationships holding among the concepts and roles in the ontology are described in terms of inclusion and equality axioms. Ontologies used to represent business logic are then used by ontology reasoners to derive implicit knowledge (i.e. knowledge not explicitly present in the database). The conclusions that can be drawn include making explicit the implicit terminology of concepts defined by the ontology, determining whether a certain individual is a member of a concept, whether two individuals are related through a role, whether a concept is subsumed by another concept, or whether a role is subsumed by another role.

Thus, the classic OBDA architecture [1] is composed of a global ontology, a legacy database, and a bridge between the two. The bridge between the ontology and the data sources is provided by mappings that define how to express records of the database as ontological assertions. Relational databases are comprised of relations (tables), which in turn are defined by data schemas specifying the names and domains of table attributes, as well as any integrity constraints that might apply to them, and are composed of records. Ontologies, on the other hand, are composed of axioms and of concept and role assertions. The mappings define how to populate the ontology in terms of the elements of the database: basically, the concept and role fillers are defined by SQL queries that indicate how to populate them. Notice that in the case of several databases, a federation system can be used that allows the set of databases to be seen as a single unified database. In this work, however, we will not take this possibility into account.

In this research, we are concerned with providing tools for performing OBDA with relational and non-relational data sources. Several tools have been developed by other research groups (see for instance [6,7,8,9], which we reviewed in [10]). Some of those tools are closed-source while others are open-source; some are downloadable and can be used as stand-alone applications or as programming libraries. While they are often a good starting point for building applications, they are frequently not flexible enough. In that regard, we are developing a tool, currently in a prototypical state, that can access an H2 database, allows the user to explicitly formulate mappings, and populates an ontology that can be saved for later querying and visualization. See [10, 11] for previous reports on the functionality of the application and its prospective application areas.

We present the advances we have made on the development of such a tool. In particular, we have added a form that allows end-users to fully specify the nature of mappings in a high-level manner, as well as by writing SQL queries. We also added a module that allows testing how our application behaves in the presence of increasing demands. We introduce a language that allows the user to precisely define the contents of a CSV file; we use that information to interpret the contents of the CSV file and then translate it into OWL. We also discuss how this materialization tool could be used in the context of an e-government application. We provide a downloadable prototype and a user manual at . We assume that the reader has a basic knowledge of Description Logics (DL) [12], relational databases [13] and the Web Ontology Language [4].

This work consolidates and extends results presented in [14]. We have included a new language for specifying the underlying schema of a CSV file, its implementation, and an analysis of its performance. We have also included an analysis of how this prototypical application could be integrated into an electronic government setting where public open data has to be machine processed.

The rest of the paper is structured as follows. In Sect. 2, we briefly recapitulate the concepts associated with materializing ontologies from tables. In Sect. 3, we present a novel development in the system that allows a naïve user to define a mapping from tables to ontologies in a visual manner. In Sect. 4, we present an alternative language for describing CSV meta information. In Sect. 5, we show an empirical evaluation of the performance of the prototype creating tables and ontologies. In Sect. 6, we present a case study where we show how the proposed application could be used for supporting data handling in an e-government application in a municipality. In Sect. 7, we review related work. In Sect. 8, we conclude and foresee future work.

2 Materialization of OWL Ontologies from Relational Databases

An ontology is a logical theory formed by a set of axioms and assertions describing the business logic. The mappings describe how to map relational views into the ontological vocabulary. Given a data access instance formed by a relational database \(\mathcal {D}\), an ontological vocabulary \(\mathcal {V}\), a set of ontological axioms \(\mathcal {O}\) over \(\mathcal {V}\), and a set of mappings \(\mathcal {M}\) between \(\mathcal {V}\) and \(\mathcal {D}\), there are two approaches to answering a query Q over \(\mathcal {V}\): (i) materialization: ontological facts are materialized (i.e. classes and properties participating in mappings are populated with individuals by evaluating the SQL queries participating in the mappings), yielding a set of ontological facts \(\mathcal {A}\), and then Q is evaluated against \(\mathcal {O}\) and \(\mathcal {A}\) with standard query-answering engines for ontologies; or (ii) virtualization: Q is first rewritten into SQL using \(\mathcal {O}\) and \(\mathcal {M}\), and then the SQL is executed over \(\mathcal {D}\).

In this work, we will only use the materialization approach. Materializing an OWL ontology from a relational database requires exporting the database contents as a text file in OWL format. For doing this, we need to export the schema information of each table as Tbox axioms and the instance data of the tables as Abox assertions. Here, we review the formalization for exporting database relations as ontologies as we presented it in [10], following the directions given by [1, 15]. Building an ontology from a database requires creating at least a class \(C_T\) for every table T, and for every attribute a of domain d in T we need two DL inclusion axioms \(C_T \sqsubseteq \exists a\) and \(\exists a^{-} \sqsubseteq d\). Primary key values \(k^j\) serve the purpose of establishing the membership of individuals to classes as DL Abox assertions of the form \(C_T(C_T\#k^j)\). For indicating that \(a^j\) is the value of attribute a, we will use a role expression of the form \(C_T\#a(C_T\#k^j,C_T\#a^j)\). When it is clear from context, we might drop the prefix \(C_T\#\) to simplify our notation. A foreign key fk in table \(T_1\) referencing a primary key field in table \(T_2\) will also require adding two Tbox axioms \(C_{T_1} \sqsubseteq \exists \mathsf {ref\_} fk \) and \(\exists \mathsf {ref\_} fk ^{-} \sqsubseteq C_{T_2}\) and an Abox assertion \(\mathsf {ref\_} fk (k^j, fk^t)\) expressing that the individual named \(k^j\) in \(C_{T_1}\) is related to the individual named \(fk^t\) in \(C_{T_2}\). Besides, if we want to consider only a subset of a table for its mapping into an ontology, we may define an SQL query that acts as an SQL filter. In this work, we will only deal with the translation into OWL of single tables and one-to-many relations (see [10] for details):

Definition 1 (Mapping of a table with a single primary key)

Let T be a table with schema \(T(\underline{k},a_1,\ldots ,a_n)\) and instance \(\{ (k^1, a_1^1, \ldots , a_n^1), \ldots , (k^m,\) \(a_1^m, \ldots ,\) \(a_n^m)\}\). To map T into a DL terminology \(\mathcal {T}\), we have to create a class T and, for each attribute \(a_i\) of domain \(D_i\), add two axioms: \(T \sqsubseteq \exists a_i\), indicating that every T has an attribute \(a_i\), and \(\exists a_i^{-} \sqsubseteq D_i\), meaning that the domain of \(a_i\) is \(D_i\). The assertional box \(\mathcal {A}\) for T will contain \(\{T(k^1), \ldots , T(k^m)\}\). Given a key value \(k^j\), \(j=1,\ldots ,m\), for every attribute \(a_i\), \(i=1,\ldots ,n\), of the schema and instance value \(a_i^j\) (i.e. the value of the i-th attribute of the j-th individual), produce a property assertion \(a_i(k^j,a_i^j)\).
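The translation of Definition 1 can be sketched as follows. This is an illustrative Python rendering only (the prototype itself is written in Java), and the function name `table_to_dl` as well as the plain-string encoding of axioms are ours, introduced for exposition:

```python
# Sketch of Definition 1: given a table name, its key attribute, the list of
# (attribute, domain) pairs and the rows, emit Tbox axioms and Abox assertions
# as plain strings. Illustrative only; names are our own conventions.
def table_to_dl(name, key, attrs, rows):
    tbox = []
    for a, d in attrs:
        tbox.append(f"{name} ⊑ ∃{a}")   # every member of the class has an a
        tbox.append(f"∃{a}⁻ ⊑ {d}")     # the range of a is the domain d
    abox = []
    for row in rows:
        k = row[key]
        abox.append(f"{name}({name}#{k})")  # class membership from the key
        for a, _ in attrs:
            abox.append(f"{a}({name}#{k},{row[a]})")  # one property per attribute
    return tbox, abox

tbox, abox = table_to_dl(
    "Person", "personID",
    [("name", "string"), ("sex", "boolean")],
    [{"personID": 1, "name": "John", "sex": True}],
)
```

Running the sketch on a one-row `Person` table produces the axioms `Person ⊑ ∃name` and `∃name⁻ ⊑ string` together with the assertions `Person(Person#1)` and `name(Person#1,John)`, mirroring the definition.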

Example 1

Consider a table for representing people with schema \(\mathsf {Person}\)\((\underline{\mathsf {personID}},\) \(\mathsf {name}, \mathsf {sex}, \mathsf {birthDate}, \mathsf {weight})\) and instance as on the left side of Fig. 1. This table is created by the SQL script presented on the right side of Fig. 1.

Fig. 1.

On the left, relational instance of the table \(\mathsf {Person}\) and, on the right, SQL script for creating the table \(\mathsf {Person}\)

The table \(\mathsf {Person}\) is interpreted in Description Logics according to Definition 1, as \(\Sigma =(\mathcal {T},\mathcal {A})\) in Fig. 2. Description Logic ontologies are implemented in the OWL language, which includes an XML serialization that we partially present in Fig. 3 by showing the representation for John.

Fig. 2.

Ontology \(\Sigma =(\mathcal {T},\mathcal {A})\) representing the table Person from Example 1

Fig. 3.

Part of the OWL code for the definition of the class \(\mathsf {Person}\) from Example 1

We now recall how to map two tables participating in a one-to-many relationship.

Definition 2 (Mapping of a one-to-many relationship)

Let \(A(\underline{k_1}, a_1, \ldots ,\) \(a_n)\) and \(B(\underline{k_2}, b_1, \ldots , b_m, k_1)\) be two tables participating in a one-to-many relationship where \(k_1\) is both the primary key in A and a foreign key in B. Tables A and B are translated into DL according to Definition 1. Besides, two axioms are added: \(B \sqsubseteq \exists \mathsf {ref\_}k_1.A\) and \(\exists \mathsf {ref\_}k_1^{-}.B \sqsubseteq A\). And for every tuple \((k_1^i, a_1^i, \ldots , a_n^i)\) of A related to a tuple \((k_2^j, b_1^j, \ldots , b_m^j, k_1^i)\) in B, an assertion \(\mathsf {ref\_}k_1(k_2^j,k_1^i)\) is added.
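Under the same illustrative conventions as the sketch for Definition 1, the extra axioms and assertions that Definition 2 contributes for a foreign key can be rendered like this (again, `one_to_many_to_dl` is a name we introduce for exposition, not code of the actual prototype):

```python
# Sketch of Definition 2: the two foreign-key Tbox axioms plus one ref_ assertion
# per row of the "many" side B referencing the "one" side A. Illustrative only.
def one_to_many_to_dl(a_name, b_name, fk, b_key, b_rows):
    tbox = [
        f"{b_name} ⊑ ∃ref_{fk}.{a_name}",
        f"∃ref_{fk}⁻.{b_name} ⊑ {a_name}",
    ]
    abox = [
        f"ref_{fk}({b_name}#{r[b_key]},{a_name}#{r[fk]})" for r in b_rows
    ]
    return tbox, abox

fk_tbox, fk_abox = one_to_many_to_dl(
    "Person", "Phone", "personID", "phoneNumber",
    [{"phoneNumber": "555-0000", "personID": 1},
     {"phoneNumber": "555-0001", "personID": 1}],
)
```

For the two phones of Example 2 below, this yields the axiom `Phone ⊑ ∃ref_personID.Person` and the assertions `ref_personID(Phone#555-0000,Person#1)` and `ref_personID(Phone#555-0001,Person#1)`.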

Example 2

(Continues Example 1). Consider a one-to-many relation of table \(\mathsf {Person}\) from Example 1 with a table \(\mathsf {Phone}(\underline{\mathsf {phoneNumber}}, \mathsf {personID})\), populated as shown in Fig. 4. Notice that \(\mathsf {personID}\) is a foreign key referencing table \(\mathsf {Person}\).

Fig. 4.

Relational instance of table \(\mathsf {Phone}\) from Example 2

Notice that \(\mathsf {phoneNumber}\) is the primary key while \(\mathsf {personID}\) is a foreign key referencing key-values of the table \(\mathsf {Person}\). Concerning the one-to-many relation, and according to Definition 2, two axioms are added to the ontology: \(\mathsf {Phone}\sqsubseteq \) \(\exists \mathsf {ref\_}\mathsf {personID}.\mathsf {Person}\) and \(\exists \mathsf {ref\_}\mathsf {personID}^{-}.\mathsf {Phone}\sqsubseteq \mathsf {Person}\). Let \(p_1=\mathsf {Phone}\#\)\(\text{555-0000 }\) be an IRI for the first phone and \(p_2=\mathsf {Phone}\#\text{555-0001 }\) for the second one. The assertions \(\mathsf {Phone}(p_1)\), \(\mathsf {phoneNumber}(p_1,\text{555-0000 })\), \(\mathsf {personID}(p_1,1)\), \(\mathsf {ref\_}\mathsf {personID}(p_1,\) \(\mathsf {Person}\#1)\), \(\mathsf {Phone}(p_2)\), \(\mathsf {phoneNumber}(p_2,\text{555-0001 })\), \(\mathsf {personID}\)\((p_2,1)\), \(\mathsf {ref\_}\mathsf {personID}\) \((p_2,\) \(\mathsf {Person}\#1)\) are then added to the ontology, indicating that 555-0000 and 555-0001 are phone numbers and that the person with id 1 owns these phone numbers. Notice how the IRIs for the phones are built by concatenating the name of the class and the respective key values. Assertions prefixing the name of the field with \(\mathsf {ref\_}\), which relate the person and his/her phone, are added too.

3 Visual Mapping Specification

The specification of the mappings for obtaining the fillers of a concept from a table is usually a complex matter for naïve end-users. Remember that a mapping is basically an SQL query that defines how the fillers of a concept, property or role are computed in terms of the contents of a database. When there is no support for composing mappings, the user has to write such SQL from scratch. We believe that adding support for building the mappings will improve the experience of a prospective user of OBDA technology.

With the idea of supporting end-users in their quest of creating concepts for populating ontologies from database contents, we created a module that allows the user to visually specify a mapping from a table. The module retrieves the tables from the database and allows the user to select one. Once the table is selected, its fields can be selected too. The user can then state what conditions each field of the table has to satisfy. Besides, one field (usually the key field of the table) has to be selected to fill the concept. The module will then automatically generate the SQL filter for filling the concept by extracting the records from the table, and will also add a subclass axiom to the ontology.
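The SQL filter could be assembled from the form's field conditions along the following lines. This is a sketch of the idea rather than the code of our prototype; `build_concept_query` and the condition triples are our own illustrative conventions:

```python
# Sketch: turn a list of (field, operator, value) conditions collected by a
# visual form into the SELECT statement used as the mapping's SQL filter.
def build_concept_query(table, key_field, conditions):
    clauses = []
    for field, op, value in conditions:
        # strings are quoted as SQL literals; numbers and booleans are not
        literal = f"'{value}'" if isinstance(value, str) else str(value).lower()
        clauses.append(f'"{field}" {op} {literal}')
    return f'SELECT "{key_field}" FROM "{table}" WHERE ' + " AND ".join(clauses)

query = build_concept_query(
    "Person", "personID",
    [("weight", ">=", 100), ("birthDate", ">", "2001-12-31"), ("sex", "=", True)],
)
```

For the conditions of Example 3 below, the sketch emits a query selecting the `personID` values of people weighing at least 100, born after 2001, and with `sex` set to true, in the spirit of the query in Fig. 5.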

Example 3

Consider again the table \(\mathsf {Person}\) from Example 1 and suppose that some user of the system wants to define the concept “heavy, young, male individual”. Suppose also that the user models a heavy individual as somebody who weighs at least a hundred kilograms, a young individual as someone who was born after 2001, and a male individual as someone of male sex. People of male sex are encoded by setting the column named sex to true, while females are encoded as false. Although this is a trivial example, it shows how encoding decisions made during database modeling degrade the representation of the world in ways that cannot be recovered afterwards. The user will then visually specify the conditions for an individual to be a member of the concept \(\mathsf {YoungHeavyMalePerson}\) in a form like the one presented in Fig. 6. Notice how the user specifies which database field corresponds to the key (i.e. the name of the individuals), in this case \(\mathsf {personID}\). In turn, the system will generate an SQL query as shown in Fig. 5.

Fig. 5.

SQL query for the specification of the concept \(\mathsf {YoungHeavyMalePerson}\) of Example 3

Fig. 6.

Visual concept specification of the concept \(\mathsf {YoungHeavyMalePerson}\)

After the execution of the query that will compute the individuals that fill the concept, the system will add to the current ontology the triples expressing that those individuals are the fillers of the concept \(\mathsf {YoungHeavyMalePerson}\). Besides, in order to relate this concept to its superconcept, the axiom \(\mathsf {YoungHeavyMalePerson}\) \(\sqsubseteq \mathsf {Person}\) will be added to the current ontology as well.

This will lead to the situation presented in Fig. 7. The new class \(\mathsf {YoungHeavyMalePerson}\) is defined as a subclass of \(\mathsf {Person}\), and \(\mathsf {John}\), whose \(\mathsf {personID}\) role is “1”, becomes a member of \(\mathsf {YoungHeavyMalePerson}\). Notice also that no new individuals are defined, as \(\mathsf {John}\) is already present in the ontology because he is a \(\mathsf {Person}\). In this sense, we adhere to the unique name assumption as much as we can, although this is not required by the formalism. Also notice how the intensional definition of the concept is lost in the ontology (other than its being a subclass of \(\mathsf {Person}\)) and only its extension is maintained (as the set of its fillers).

Fig. 7.

Situation arising from the specification of a subclass of \(\mathsf {Person}\) named \({\mathsf {YoungHeavyMale}}{\mathsf {Person}}\)

Another feature that the current version of the system includes is the possibility of specifying a subclass by means of an explicit SQL query.

Example 4

Continuing Example 3, the concept \(\mathsf {FemalePerson}\) (which defines a subset of the table \(\mathsf {Person}\) formed by women) is specified by means of the SQL query:

SELECT “personID” FROM “Person” WHERE “sex” = false

This can be done by using the form presented in Fig. 8. Notice the additional OWL code in the ontology generated by our tool, presented in Fig. 9, expressing that a female person is a person (i.e. \(\mathsf {FemalePerson}\sqsubseteq \mathsf {Person}\) is an axiom in the ontology) and that Mary is both a female person and a person (i.e. \(\mathsf {FemalePerson}(\mathsf {Mary})\) and \(\mathsf {Person}(\mathsf {Mary})\) are assertions in the ontology).

Fig. 8.

Specification of the subclass \(\mathsf {FemalePerson}\) of \(\mathsf {Person}\) by a SQL query

Fig. 9.

Portion of OWL code for introducing subconcept \(\mathsf {FemalePerson}\)

4 Specification of Schemas for CSV Files

A comma-separated values (CSV) file is a delimited text file that uses a comma to separate values. Each line of the file is a data record, and each record consists of one or more fields, possibly enclosed in delimiters and separated by commas. A CSV file stores tabular data (numbers and text) in plain text, in which case each line has the same number of fields. Comma-separated files are used for the interchange of database information between machines of different architectures. The plain-text character of CSV files largely avoids incompatibilities such as byte order and word size, and the files are human-readable, so it is easier to deal with them in the absence of perfect documentation or communication.

Example 5

In Fig. 10, we show the CSV table for the table \(\mathsf {Person}\) of Example 1.

Fig. 10.

CSV file for the table \(\mathsf {Person}\) of Example 1

Despite its simplicity, the lack of both standardization and schema information in CSV files poses a disadvantage, forcing application programs to guess, or ask the user for, the delimiter and field-separator characters. For solving this problem, the W3C Working Group has proposed a format for specifying CSV metadata [16] based mostly on JSON (JavaScript Object Notation). Although this solution works in practice, we think that JSON files, despite their human readability, are not simple enough for naïve users. We therefore propose a simpler language for specifying the schema (or meta information) of a CSV file, defined by the BNF grammar presented on the left side of Fig. 11. We believe that our language is simple enough to be human-readable and expressive enough for its purpose. The declarations have to be sound (i.e. each field declared in the CSV meta-information file must be present in the CSV file), complete (i.e. each field in the CSV file must be declared in the CSV meta-information file) and ordered (i.e. the fields must appear in the same order in both files).

Fig. 11.

On the left, BNF grammar for the meta information language for defining CSV schemas, and, on the right, example of providing schema information for the CSV file in Fig. 10

On the right side of Fig. 11, we provide an example of a file defining the schema of a CSV file that represents the data provided in Example 5. Although the declarations for number-of-key-fields and number-of-fields may seem redundant, we think that they offer a way of validating that the user is doing things correctly. Valid identifiers begin with a letter and continue with letters and numbers. For now we consider only the types integer, real, string, boolean and date, but this list can be easily extended. If the number of rows to be translated is not specified, then all rows are assumed by default. If no field separator is specified, the comma is assumed by default. If no quotation character is specified, the double quotation mark is assumed by default. The parser validates that the numbers of key fields and fields declared match those that were defined.
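The grammar itself is given in Fig. 11; as an illustration of the defaults and validations just described, the following sketch reads simple "keyword: value" declarations. The keyword spellings and the declaration syntax below are assumptions made for exposition and may differ from the actual language:

```python
# Sketch of the schema-file parser: apply the defaults (comma separator, double
# quote, all rows) and check that the declared counts match the listed fields.
# Keyword names are illustrative assumptions, not the actual grammar of Fig. 11.
def parse_csv_schema(text):
    decl = {"field-separator": ",", "quotation-character": '"', "rows": "all"}
    fields, keys = [], []
    for line in text.strip().splitlines():
        if not line.strip():
            continue
        keyword, _, value = line.partition(":")
        keyword, value = keyword.strip(), value.strip()
        if keyword == "key-field":
            keys.append(value)
            fields.append(value)
        elif keyword == "field":
            fields.append(value)
        else:
            decl[keyword] = value
    # soundness checks: declared counts must match the fields actually listed
    if int(decl["number-of-fields"]) != len(fields):
        raise ValueError("field count mismatch")
    if int(decl["number-of-key-fields"]) != len(keys):
        raise ValueError("key count mismatch")
    decl["fields"], decl["keys"] = fields, keys
    return decl

schema = parse_csv_schema("""
number-of-key-fields: 1
number-of-fields: 2
key-field: personID integer
field: name string
""")
```

On this input the parser accepts the declarations, fills in the default separator and quotation character, and records one key field out of two fields in total.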

The contents of the CSV file are validated and processed according to the definitions given in the CSV schema file. The contents of the CSV file are then loaded into an H2 database, which is translated into OWL as explained in Sect. 2. Our implementation first parses the meta-information schema file and then the CSV file; it then generates an SQL script that is used to create an H2 table, which is finally translated into OWL.
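The generation step of this pipeline can be sketched as follows. The sketch uses generic SQL type names and simplified quoting; the real pipeline targets H2 and honours the separators and quotation characters declared in the schema file:

```python
# Sketch: combine a parsed CSV file with the declared field names and types
# to produce the SQL script (CREATE TABLE plus one INSERT per record) that
# creates the table to be translated into OWL. Illustrative only.
import csv
import io

def csv_to_sql_script(table, csv_text, fields, types, key):
    columns = ", ".join(f'"{f}" {t}' for f, t in zip(fields, types))
    script = [f'CREATE TABLE "{table}" ({columns}, PRIMARY KEY ("{key}"));']
    for record in csv.reader(io.StringIO(csv_text)):
        values = ", ".join(f"'{v}'" for v in record)
        script.append(f'INSERT INTO "{table}" VALUES ({values});')
    return "\n".join(script)

script = csv_to_sql_script(
    "Person", "1,John\n2,Mary\n",
    ["personID", "name"], ["INT", "VARCHAR(50)"], "personID",
)
```

For a two-record CSV file, the sketch emits one CREATE TABLE statement followed by two INSERT statements, which is the shape of the script handed to the database layer.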

5 Experimental Evaluation

We now discuss some of the tests we have performed in order to assess how our application handles increasing demands in database size. The performance of our system is affected mainly by the fact that we chose to materialize tables as triples (i.e. class membership, property and role assertions) and also by three factors: (i) the system is implemented in the Java programming language; (ii) the database management system that we use is H2; and (iii) the handling of the global ontology is done via the OWL API [17, 18].

Our tests were conducted on an ASUS notebook with an Intel Core i7 3.5 GHz CPU, 8 GB RAM, a 1 TB HDD, and Windows 10. They involved the creation of simple databases composed of a single table containing 100 fields of text type filled with an increasing number of records. Table 1 summarizes our results. As can be seen, our implementation starts having problems with tables of 100,000 records: although an ontology can be generated and saved to disk, when we try to load the ontology we saved previously, we get an error inside the code of the OWL API, indicating that the library cannot handle such a data load. When running a test for creating a database of a million records, the H2 database produces an error (which is understandable, as it is maintained in RAM). Likewise, in Table 2, we can see the times for loading CSV files of 100 fields containing integer values and, again, an increasing number of records. We therefore conclude that our application can only handle tables on the order of tens of thousands of records and is not able to handle tables of a hundred thousand records. Because of this limitation, we think that we will be forced to use query-rewriting techniques [1] for delegating the evaluation of queries to the database management system instead of the ontology management library.
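The shape of such a load test can be sketched as below, with SQLite standing in for H2 (the actual tests were run against H2 from the Java prototype): create a wide table of 100 text fields, fill it with an increasing number of records, and time each phase.

```python
# Sketch of the load-test harness: build a 100-field table with n_records rows
# and measure the insertion time. SQLite stands in for H2 here, so the numbers
# are not comparable to Tables 1 and 2; only the test's structure is shown.
import sqlite3
import time

def load_test(n_records, n_fields=100):
    con = sqlite3.connect(":memory:")
    cols = ", ".join(f"f{i} TEXT" for i in range(n_fields))
    con.execute(f"CREATE TABLE t (id INTEGER PRIMARY KEY, {cols})")
    row = tuple(["x"] * n_fields)
    placeholders = ", ".join("?" * n_fields)
    start = time.perf_counter()
    con.executemany(f"INSERT INTO t VALUES (NULL, {placeholders})",
                    [row] * n_records)
    elapsed = time.perf_counter() - start
    count = con.execute("SELECT COUNT(*) FROM t").fetchone()[0]
    con.close()
    return count, elapsed

count, elapsed = load_test(1000)
```

In the real harness the loop is repeated for growing record counts (1,000; 10,000; 100,000; ...) and the materialization step of Sect. 2 is timed as well.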

Table 1. Running times for ontology generation from H2 database
Table 2. Running times for ontology generation from CSV file

6 Case Study: Support for e-Gov in Municipalities

The importance of social policies has grown in recent years at all levels of government (whether municipal, provincial, national or international), since they represent one of the main tools to combat economic inequalities that occur globally [19] and serve to meet the needs of many vulnerable groups. The provision of public social action services to citizens has become an obligation for governments, since such services are a human right, as are access to water, energy, health, education and other services.

Despite their global relevance, universality in the provision of public services is a challenge for each government due to the variety of contexts in which such services are provided, including the needs of specific social groups, the capacities of each government, and context-specific conditions (territorial, political, cultural, economic, etc.) [19, 20]. In some municipalities of the province of Buenos Aires in Argentina, the following challenges are observed for the provision of public social action services: (i) the services are provided by several municipal government agencies and there is no consolidated information on how the services are being delivered; (ii) there are currently ad hoc applications that support the process for the delivery of each service, but these applications work in isolation, without sharing data; (iii) there is no strategy for the delivery of these services using multiple channels; (iv) the digital channels that could be used are not being exploited properly; (v) there is no software infrastructure that allows the rapid development of applications for the delivery of social action services [21].

Based on these challenges, it is necessary to find a way to publish data contained in legacy and current applications and used in various state institutions in a way that can be integrated, accessed, modified and consulted in a format that is uniform, distributed and scalable. In this sense, the technologies of the Semantic Web have matured enough to be considered as a viable solution for the publication and integration of institutional data. In particular, using semantics implies conceiving systems where the meaning of the data is explicitly specified and is taken into account to design their functionalities. This idea has become crucial for a wide variety of information processing applications and has received much attention in the artificial intelligence, database, web and data mining communities.

As a case study of the operation of the application developed, we present an example loosely based on the public data available from the Municipality of Bahía Blanca. A preliminary version was presented in [11]. Let us take as an example the three tables presented in Fig. 12, with the details of the beneficiaries of all social assistance in the selected period in the Municipality of Bahía Blanca, where we have a table called Program representing social assistance programs, another with the beneficiaries called Beneficiary, and a third called Person with the data of the people enrolled in the aid programs.

The Program table schema contains the program identifier (the primary key) and the program name. The Beneficiary table contains the id of the benefit (the primary key of the table), the document number of the beneficiary (a foreign key), the amount received, the date of reception, and the aid program under which the help was received (another foreign key). The Person table contains the person's document number (the primary key) and his/her last and first name.
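A plausible relational rendering of this description, sketched with SQLite, is the following. The column names are our assumptions based on the description above; the actual schemas are those of Fig. 12:

```python
# Sketch of the three municipal tables and their foreign keys, with one sample
# row each; column names are illustrative assumptions, not the real schemas.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Program (programID INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Person  (personID  INTEGER PRIMARY KEY, lastName TEXT, firstName TEXT);
CREATE TABLE Beneficiary (
    benefitID INTEGER PRIMARY KEY,
    personID  INTEGER REFERENCES Person(personID),
    amount    REAL,
    date      TEXT,
    programID INTEGER REFERENCES Program(programID)
);
INSERT INTO Program VALUES (1, 'Food assistance');
INSERT INTO Person  VALUES (7, 'Doe', 'Jane');
INSERT INTO Beneficiary VALUES (100, 7, 15000.0, '2019-03-01', 1);
""")
row = con.execute("""
    SELECT p.lastName, pr.name, b.amount
    FROM Beneficiary b
    JOIN Person  p  ON p.personID  = b.personID
    JOIN Program pr ON pr.programID = b.programID
""").fetchone()
```

The join shows how the two foreign keys of Beneficiary tie a person to the program under which a benefit was received; under the translation of Sect. 2 these foreign keys become the \(\mathsf {ref\_}\) roles of the materialized ontology.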

Fig. 12.

Relational tables from Bahia Blanca municipality public information site

The Tbox axioms in Fig. 13 represent the schema information of the tables in Fig. 12 and the Abox assertions in Fig. 14 represent the relational instance.

Fig. 13.

Terminology for the schema of the data from the municipality

Next, we present the technology for querying the OWL ontologies materialized from this database. SPARQL (SPARQL Protocol and RDF Query Language) [22] is a declarative query language for RDF that allows retrieving and manipulating data stored in the Semantic Web as RDF statements. A SPARQL endpoint accepts queries over any web-accessible RDF data and returns results via HTTP. The results of SPARQL queries can be returned in a variety of formats such as XML, JSON, RDF and HTML. With this solution, data from the municipality's administration can be published in a uniform, public, modern, open format that can be queried both by people and by applications. For instance, finding the ten people who, during the year 2019, collected the most from a plan of at least \(\$15,000\) can be done by means of the SPARQL query shown in Fig. 15. This allows both applications to use the data as a web service and end-users to query it in a web page.
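A query in the spirit of the one in Fig. 15 could look as follows. The prefix IRI and the property names are assumptions based on the translation of Sect. 2 (foreign keys become \(\mathsf {ref\_}\) properties), not the exact vocabulary of the materialized ontology:

```sparql
PREFIX : <http://example.org/municipality#>
SELECT ?lastName ?firstName (SUM(?amount) AS ?total)
WHERE {
  ?b :ref_personID ?p ;
     :amount ?amount ;
     :date ?date .
  ?p :lastName ?lastName ;
     :firstName ?firstName .
  FILTER (?amount >= 15000 &&
          ?date >= "2019-01-01" && ?date <= "2019-12-31")
}
GROUP BY ?lastName ?firstName
ORDER BY DESC(?total)
LIMIT 10
```

The aggregation groups benefits by person, keeps only those of at least \(\$15{,}000\) received during 2019, and returns the ten people with the highest totals.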

7 Related Work

With ViziQuer, Cerans et al. [23] provide an open-source tool for web-based creation and execution of visual diagrammatic queries over RDF/SPARQL data. The tool supports instance-level and statistics queries, providing visual counterparts for most SPARQL 1.1 SELECT query constructs, including aggregation and subqueries. A query environment can be created over a user-supplied SPARQL endpoint with a known data schema. ViziQuer provides a visual interface for expressing user queries in SPARQL against an ontology. In contrast, we provide the user with an interface for describing subclass expressions and inclusion axioms by means of restrictions imposed on the records of a relational table, with the aim of populating an ontology that can later be exposed and queried as a SPARQL endpoint.

Fig. 14.

Relational instance from the data of the municipality

Fig. 15.

On the left, the SPARQL query for finding who were the ten people who, during the year 2019, collected the most of a plan of at least \(\$15,000\); and, on the right, the result of the execution of the query by a SPARQL engine against the ontology data in Fig. 14

Christodoulou et al. [24] make the case that structural summaries over linked-data sources can inform query formulation and provide support for data integration and query processing over multiple linked-data sources. To fulfil this aim, they propose an approach that builds on a hierarchical clustering algorithm for inferring structural summaries over linked-data sources. Thus, their approach takes an RDF repository as input and reverse-engineers an ontology, using clustering techniques to detect prospective classes. In contrast, we take a database and the user proposes SQL queries to express subconcepts intensionally; when the SQL queries are executed, the fillers of the concept populate the ontology, building a de facto extensional definition of the concept. In that regard, the work of Christodoulou et al. can be considered complementary to ours.

Barrasa et al. [25] propose R2O, an extensible and declarative language to describe mappings between relational DB schemas and ontologies implemented in RDF(S) or OWL. R2O provides an extensible set of primitives with well-defined semantics. The language was conceived to be expressive enough to cope with complex mapping cases arising from situations of low similarity between the ontology and the DB models. R2O allows the user to express complex queries in terms of ontologies in a language that is similar to relational algebra but aimed at ontologies. Therefore, this approach is complementary to ours because it allows querying the ontology once it is published in OWL format.

8 Conclusions and Future Work

We presented a framework for performing ontology-based data access by means of a materialization approach. Our implementation is Java-based and relies on the H2 database management system for accessing and querying databases, and on a Java library called OWL API for maintaining an OWL ontology in main memory. We presented several enhancements made to the previous iteration of our prototype, which now includes visual mapping specification functionality and allows maintaining a global ontology that can be either created from scratch or loaded from disk, modified, and later saved back to disk. From the experimental evaluation to which we subjected our system, we conclude that our application is able to handle a moderate workload of tables of tens of thousands of records but fails to handle tables of hundreds of thousands of records. In this regard, we think that we will be forced to use query-rewriting techniques for delegating the evaluation of queries to the database management system instead of the ontology management library; part of our current research efforts are aimed in this direction. Another avenue for improvement lies in addressing the federation of databases in order to integrate multiple heterogeneous data sources. We also introduced a language for defining the schema information of a CSV file and showed how to interpret the contents of a CSV file in order to translate it into OWL. Finally, we discussed how this materialization tool could be used in the context of an e-government application, showing how relational data can be published as open data in OWL.