1 Introduction

RDF is a metadata model recommended by the W3C (World Wide Web Consortium) that can explicitly describe resources on the Web and the relationships between these resources. RDF has good machine readability, and its syntactic form is very close to the composition of knowledge. Thus, the RDF model has been extensively accepted as a representation model of knowledge graphs, a notion formally introduced by Google in 2012 with the aim of improving the performance of search engines. Nowadays, knowledge graphs have been widely applied in diverse domains, and many knowledge graphs have become available (e.g., DBpedia and Wikidata). With the increasing scale of knowledge graphs, efficient storage and querying of the huge amount of RDF data are of crucial importance. Traditionally, there are three main categories of RDF storage methods: memory-based (Atre & Hendler, 2009), disk-based (Wu et al., 2009), and database-based [3, 9, 11, 26, 35, 40-42, 45, 51, 52]. Among them, database storage has become the primary means to manage large-scale RDF data because of the mature techniques and numerous products of database systems (Ma et al., 2016). This is especially true for relational databases.

The real world is dynamic, and any individual may change from time to time. Data with temporal features are known as temporal data. Temporal data are available in many fields (e.g., geographic information systems, weather forecasts, dynamic social networks, and the Internet of Things). The issue of representing and managing temporal data has been investigated in the context of relational databases for a long time [10, 43, 44]. After realizing the importance and urgency of explicitly manipulating temporal data in relational databases in a standard way, the ISO (International Organization for Standardization) and the IEC (International Electrotechnical Commission) jointly published SQL:2011, the most recent revision of the SQL (Structured Query Language) standard, replacing SQL:2008. The most important new feature in SQL:2011 is the ability to create and manipulate temporal tables, in which rows are associated with one or more time periods (Kulkarni & Michels, 2012). In addition to the temporal relational database model, several emerging temporal data models have been proposed for dealing with temporal data in recent years. To represent and share temporal data on the Web, for example, the temporal XML model has been proposed and applied (Faisal & Sarwar, 2014); to manage and process big temporal data, temporal NoSQL databases (e.g., column-oriented NoSQL databases) have been proposed (Chen et al., 2022).

In the context of RDF, the classical RDF model can only represent static semantics, i.e., the current state of resources and their relationships. To capture the dynamic state of resources and their relationships, several temporal RDF models have been proposed by extending the static RDF model. Basically, we can identify three major types of temporal RDF models: the temporal RDF model for version control (Gutierrez et al., 2007), the temporal RDF model with time labels (Pugliese et al., 2008), and the temporal RDF model with the triple extension (Koubarakis & Kyzirakos, 2010). Among these, the temporal RDF model with time labels is widely accepted and used because it does not change the structure and extensibility of current RDF triples. In the current temporal RDF models, a temporal RDF triple contains a timestamp that is attached to either the predicate of the RDF triple or the whole RDF triple. An RDF triple with a temporal granularity of the whole triple declares a temporal statement (i.e., a temporal fact), but it is unclear which of the subject, predicate, and object in this triple is actually time-aware. An RDF triple with a temporal granularity in the predicate clearly indicates a time-aware predicate, but it fails to represent temporal information in the object of the triple. It is possible and common for the same subject and predicate to have several different objects over time. Therefore, it is essential to explicitly represent time-aware objects in temporal RDF triples.

For the current temporal RDF models, a few efforts have been devoted to temporal RDF querying (e.g., (Tappolet & Bernstein, 2009; Zaniolo et al., 2018)) and temporal RDF indexing (e.g., (Pugliese et al., 2008; Yan et al., 2019)). Although storing classical RDF in databases has been widely investigated and applied, to the best of our knowledge, there is no work investigating temporal RDF storage. Temporal RDF has been used in knowledge graphs to represent temporal knowledge [21, 50]. Recently, temporal knowledge graphs have been receiving increasing attention for representation learning (e.g., (Chen et al., 2022)), but such work is mainly based on temporal triples with a temporal granularity of the whole triple. With the widespread application of knowledge graphs in diverse time-sensitive domains (e.g., the Internet of Things), a huge amount of temporal RDF data is proliferating and becoming available. Therefore, it is increasingly important to propose a more semantically expressive temporal RDF model and then efficiently manage large-scale temporal RDF data.

In this paper, we propose a new temporal RDF model named tRDF by exploiting the time label, which can be applied to the predicate or object of a triple as a timestamp. Clearly, the tRDF model is different from the existing temporal RDF models, whose time labels are attached to either the predicates of the RDF triples or the whole RDF triples. We present the syntax and semantics of the tRDF model. Based on the tRDF model, we particularly advocate storing temporal RDF data in temporal relational databases and, with this, propose mapping rules to map the tRDF model to temporal relational tables. To query the temporal RDF data that is actually stored in the temporal relational databases, we formalize a temporal SPARQL (Simple Protocol and RDF Query Language) for the tRDF model, termed tSPARQLt, and then provide the transformation of partial queries from tSPARQLt to SQL, the standard query language for relational databases. We validate our proposed model and approach through comparative experiments. Although there are some proposals for temporal RDF models, to the best of our knowledge, this paper is the first effort to model and query temporal RDF data with temporal relational databases.

The rest of the paper is organized as follows. Section 2 provides a brief overview of related work in RDF storage, temporal databases, and temporal RDF. Section 3 presents some preliminaries. Section 4 proposes a new temporal RDF model named tRDF and provides its syntax and semantics. Section 5 presents mapping rules and algorithms for storing tRDF data in relational databases. Section 6 formalizes a query language, called tSPARQLt, for the tRDF model and defines the rules for transforming partial tSPARQLt queries to SQL queries. Section 7 presents the experimental evaluations of our proposed storage and query method. Section 8 concludes this paper.

2 Related Work

In this section, we present the related work in three categories: RDF storage in relational databases, temporal databases, and temporal RDF.

2.1 RDF storage

Data storage is the foundation of implementing data management. To manage large-scale RDF data, many proposals have been developed to store RDF data, which are roughly categorized as memory-based storage (Atre & Hendler, 2009), disk-based storage (Wu et al., 2009), and database-based storage [3, 9, 11, 26, 35, 40-42, 45, 51, 52]. The memory-based and disk-based storages load RDF data as triples directly into memory and store RDF triples on the local hard disk, respectively; they are collectively referred to as the local storage approach. The local storage approach preserves the triadic structure and semantics of RDF triples well, but it also suffers from several drawbacks. Memory-based storage is clearly limited by the size of memory and is only applicable to storing a small amount of RDF data. Disk-based storage shifts the storage place from memory to disk and thereby satisfies the requirement of storing larger-scale RDF data. Disk-based RDF storage is a category of native stores (e.g., RDF-3X (Neumann & Weikum, 2008)) that use customized binary RDF data representations and are built directly on the file system (Bornea et al., 2013). Note that native RDF stores fail to provide full and strong support for data access and control management. Database management systems (DBMSs) are designed especially for efficient data storage and management. Therefore, non-native RDF stores built on top of existing DBMSs have become the mainstream method of RDF data storage (Ma et al., 2016).

Relational databases (RDBs) have been widely used for their solid theoretical foundation and strong technical support in products as well as development tools. Typically, there are three common ways to store RDF data with relational databases (Ma et al., 2016): vertical stores, horizontal stores, and type stores. The vertical stores (e.g., TripleT (Wolff et al., 2015)), also known as triple stores, create a single relational table with three columns, and each RDF triple is directly mapped into a tuple of the relational table. Here the subject, predicate, and object of an RDF triple become three attribute values of the corresponding tuple. The horizontal stores (e.g., C-Store (Stonebraker et al., 2005), Virtuoso (Erling & Mikhailov, 2009), and SW-Store (Abadi et al., 2009)) create either a single relational table that contains all predicates of RDF triples as the table's column names or a set of relational tables in which each relational table contains only one predicate as a column name. Note that, in the horizontal stores, the created relational table also contains a column representing the subject of RDF triples in addition to the column(s) from the predicate(s). In the horizontal stores with a single relational table, the RDF triples with the same subject are combined into one tuple of the relational table. In the horizontal stores with a set of relational tables, the RDF triples with the same predicate appear in the same relational table, where each RDF triple corresponds to a tuple of the relational table. The type stores (e.g., RDFBroker (Sintek & Kiesel, 2006), RDB2RDF (Salas et al., 2011), and Jena (McBride, 2002)) may create multiple relational tables, one for each type of subject, in which a relational table contains the properties of that subject type as n-ary table columns. In addition to the three basic relational stores of RDF above, there are also efforts in RDF data storage that use two or more of the three basic stores concurrently or revise the three basic stores (e.g., (Bornea et al., 2013)).

For large-scale RDF data management, it is essential to ensure the scalability of RDF stores by using optimization structures such as indexes and data partitioning. In (Weiss et al., 2008), an RDF storage scheme called Hexastore was proposed, which enhances the vertical partitioning idea and takes it to its logical conclusion; as a result, a sextuple indexing scheme is applied in Hexastore. Unlike Hexastore, which builds exhaustive indexing of pairs of positions in triples, RDF-3X (Neumann & Weikum, 2008) builds exhaustive indexing of all permutations of triple positions, and TripleT (Wolff et al., 2015) builds exhaustive indexing of all single positions. In addition, to significantly improve the scalability of massive RDF stores, distributed/parallel RDF stores have been developed (Papailiou et al., 2013). In contrast to centralized RDF stores, which are single-machine solutions, distributed RDF stores (e.g., 4store) partition triples across multiple machines and parallelize query processing (Ma et al., 2016). In distributed RDF stores, RDF data partitioning is a crucial issue. Distributed RDF stores adopt two categories of partitioning, horizontal partitioning and hash partitioning, according to how the RDF data are partitioned and how partitions are stored for access (Lee & Liu, 2013). Horizontal partitioning generally partitions an RDF dataset across multiple servers randomly, where the partitions are stored in distributed file systems (e.g., HDFS (Hadoop Distributed File System)), and queries are then processed by parallel access to the clustered servers using distributed programming models (e.g., Hadoop MapReduce). Hash partitioning partitions an RDF dataset across multiple nodes by hashing on the three components of RDF triples (i.e., subject, predicate, and object) or any combination of them, where the partitions are locally stored in a database like HBase or an RDF store like RDF-3X and then accessed through a local query interface.

To deal with big data, NoSQL databases have emerged as a new infrastructure for massive data storage and management. As a result, NoSQL databases have been applied to handle massive RDF data (Cudre-Mauroux et al., 2013), and several NoSQL stores for RDF management (e.g., RDFChain (Choi et al., 2013), Jena-HBase (Khadilkar et al., 2012), and Trinity.RDF (Shao et al., 2013)) have been proposed. RDF data management merits the use of NoSQL databases because of their scalability and high performance. Viewed from the theoretical foundation and technical support in products as well as development tools, however, relational databases will remain in a dominant position for a relatively long period of time. Concerning massive RDF data stored in relational and NoSQL databases, one can refer to a comprehensive review in Ma et al. (2016). Note that the existing approaches for RDF data stores, in both relational and NoSQL databases, cannot explicitly deal with temporal RDF data.

2.2 Temporal databases

Temporal data representation and management have been widely investigated in the context of relational databases. As early as the 1980s, the temporal relational model was proposed by including temporal columns in the relational model. In (Clifford & Croker, 1987), a historical relational model was proposed, in which several issues like relations, tuples, and field values with temporal information were discussed. Time in temporal relational databases can be classified into three types (Mckenzie & Snodgrass, 1991; O'Connor & Das, 2010): valid time, transaction time, and user-defined time. Temporal relational databases containing only valid time are called historical relational databases, those containing only transaction time are called rollback relational databases, and those containing both valid time and transaction time are called bi-temporal relational databases. With the development of temporal relational models, several query languages have been proposed. TQuel (Snodgrass, 1987), a well-known temporal query language, is an upward-compatible extension of Quel and was very helpful in promoting temporal data models and temporal query languages. Snodgrass further proposed a temporal query language, TSQL2, in (Snodgrass, 1994). TempSQL in Gadia (1988) is a temporal relational model that provides a complete temporal query language. These temporal query languages support both valid time and transaction time.

SQL:2011 is a milestone in the research and development of temporal relational databases. As the latest edition of the SQL standard, published by the ISO/IEC in 2011, SQL:2011 explicitly provides support for creating and manipulating temporal data with temporal tables (Kulkarni & Michels, 2012). In SQL:2011, a table may be associated with an application-time period, a system-time period, or both. Furthermore, a time (application- or system-) period contains a period start time and a period end time, which are declared as two special columns named by the user. After SQL:2011 was published, many efforts were made to extend traditional database management systems. Gao et al. in (Gao et al., 2018), for example, proposed a new framework for the design of temporal relational databases, which supports effective access to current and historical information; Lu et al. in Lu et al. (2019) contributed temporal extensions to distributed database management systems so that the efficiency of managing temporal data can be improved.

With the increased use of NoSQL databases for big data management, a few efforts have been devoted to dealing with temporal big data. Hu and Dessloch (Hu & Dessloch, 2015) proposed using column-oriented NoSQL databases (CoNoSQLDBs) for temporal data management and processing. In the context of spatio-temporal data, Zhong et al. (Zhong et al., 2013) combined NoSQL databases and Hadoop to achieve distributed storage of temporal data, and Fox et al. in (Fox et al., 2013) used the high scalability of NoSQL databases to achieve high-performance queries on temporal data. Unlike SQL:2011, however, the current NoSQL databases do not natively support temporal data management, and few temporal extensions to NoSQL databases are designed for storing temporal RDF data. Therefore, it is a good choice to apply SQL:2011 for modeling and processing temporal RDF data, just as relational databases are applied to common RDF data.

2.3 Temporal RDF models and query languages

It is recognized that in many practical applications of RDF, it is necessary to attach metadata to RDF triples (Hogan et al., 2010). In (Udrea et al., 2010), an annotated RDF model was formally proposed, where RDF triples are annotated by members of a partially ordered set. For the annotated RDF in Udrea et al. (2010), a general extension to RDF Schema (RDFS) was proposed in Straccia et al. (2010), and a query language, AnQL, was then developed for the annotated RDFS in Lopes et al. (2010). Annotations in RDF can support several specific domains to represent the temporal aspects, uncertainty, trust, and provenance of RDF triples. Temporal RDF models have been explicitly proposed to handle metadata with temporal information. Gutierrez et al. in (Gutierrez et al., 2007) proposed a temporal RDF model using a version control approach and presented the syntax and semantics of the proposed model. In (Pugliese et al., 2008), a temporal RDF model was proposed by adding timestamps to RDF predicates, and the concept of indeterminate temporal triples was introduced. Koubarakis and Kyzirakos (Koubarakis & Kyzirakos, 2010) proposed a quad-tuple structure with temporal information using the triple extension method. Among the above temporal RDF models, the time label method is widely used because it preserves the original triple structure of RDF. In (Grandi, 2009), a temporal RDF model was proposed that uses triple timestamping with temporal elements; the data model is equipped with manipulation operations to manage temporal versions of an ontology. A survey of temporal extensions to RDF is provided in Wang and Tansel (2019), in which the proposals for extending RDF to model temporal data are classified into explicit reification or implicit reification according to the kind of reification used. The time investigated in the existing temporal RDF models mainly focuses on valid time.

Ontologies can be seen as a formal representation of knowledge over RDF data. In addition to RDF with time, several studies have been proposed to manage domain knowledge evolution in the context of ontology versioning. In (Brahmia et al., 2022), temporal versioning of both ontology instances and ontology schemas was considered, where ontology schema changes are triggered by non-conservative updates to ontology instances; that is, ontology schema versioning is driven by instance updates. To address the problem of asynchronous versioning in the context of a materialized integration system, the principle of ontological continuity was proposed in Xuan et al. (2006) to support ontology changes. With the proposed principle, each old instance can be managed by using the new version of the ontology. Focusing on time-sensitive application domains, Canito, Corchado, and Marreiros (Canito et al., 2022) systematically reviewed the state of the art of the representation of time and ontology evolution in the predictive maintenance field. They identified that, although ontologies have many applications in predictive maintenance, there have been few studies on ontology evolution, and applications of time to the problem of ontology evolution remain an open issue.

To query temporal RDF data, a temporal SPARQL language called τ-SPARQL was proposed in (Tappolet & Bernstein, 2009) for temporal RDF graphs, where τ-SPARQL queries can be translated to standard SPARQL queries. A temporal extension of SPARQL was presented in Grandi (2010), which aimed at embedding several features of the TSQL2 consensual language. In (Zaniolo et al., 2018), a point-based temporal extension of SPARQL, called SPARQLT, was proposed for the main-memory RDF-TX system, which supports user-friendly by-example temporal queries on historical knowledge bases derived from Wikipedia. Based on classical OBDA (ontology-based data access) systems, Brandt et al. in Brandt et al. (2017) proposed a framework of temporal OBDA, which can extract information about temporal events in RDF format and provides a SPARQL-based query language for retrieving temporal information. Concentrating on the OBDA system for query answering with temporal data and ontologies, Kalayci et al. in (Kalayci et al., 2018) developed a tool called Ontop-temporal, which can access timestamped log data. To further improve the efficiency of querying large-scale temporal RDF data, a few efforts have worked on indexing temporal RDF. In (Pugliese et al., 2008), an index structure named tGRIN was proposed, which builds a specialized index for temporal RDF triples stored in Jena2, Sesame, and 3store. An index structure was proposed in (Zaniolo et al., 2018) for the original temporal RDF graphs, where a prefix path index for querying subjects of temporal RDF triples and a suffix index for querying objects of temporal RDF triples were built, respectively. In addition to the temporal SPARQL languages, a few proposals for spatiotemporal SPARQL languages have been developed. In (Perry et al., 2011), for example, SPARQL was extended to SPARQL-ST so that spatiotemporal queries can be supported, and in Koubarakis and Kyzirakos (2010), the query language stSPARQL was developed for a spatiotemporal RDF model in the context of the Semantic Sensor Web.

The RDF model has been applied as the infrastructure of knowledge graphs (Hogan et al., 2022). With temporal RDF triples, some recent efforts have been made to investigate temporal knowledge graphs (Huang et al., 2020; Lu et al., 2019), and there has been increasing interest in learning representations of temporal knowledge graphs (e.g., (Chen et al., 2022; Zhu et al., 2021)). As of yet, however, no proposals exist for storing and querying multi-granularity temporal RDF data using temporal relational databases.

3 Preliminaries

In this section, we introduce some preliminaries about the RDF model, SPARQL (Simple Protocol and RDF Query Language), and SQL:2011.

3.1 RDF Model

RDF is a metadata model proposed by the W3C to describe Web resources and their mutual relationships. It provides a general framework for the description and interchange of information. An RDF model is described with a set of RDF triples. An RDF triple in the form of (subject, predicate, object) (abbreviated as (S, P, O)) is a statement in which the subject is the resource being described, the predicate is the property being described with respect to the resource, and the object is the value of the property. Here, a resource is anything with a URI (Uniform Resource Identifier), and an object is a literal (if the corresponding predicate is an attribute of the resource) or another resource (if the corresponding predicate is a relationship between resources).

Definition 1 (RDF model)

An RDF model is a set of triples, and an RDF triple is formally defined as (S, P, O) ∈ (I ∪ B) × I × (I ∪ B ∪ L), where I, B, and L are infinite sets of IRIs (Internationalized Resource Identifiers), blank nodes and RDF literals, respectively.

An RDF model can be described in several formats such as RDF/XML, N-Triples, Turtle, RDFa (Resource Description Framework in Attributes), and JSON-LD (JSON for Linking Data). RDF/XML represents RDF data using the syntax of XML. Since the syntax of RDF/XML is verbose and complex to understand, N-Triples (NT) is often applied to represent RDF data; it is the format closest to the syntax of the RDF model and is easy to read and parse. Nowadays, many public RDF datasets (e.g., Wikidata and DBpedia) are published in the N-Triples format. In addition, Turtle is an optimization of RDF/XML, which makes the representation more compact by introducing prefixes; RDFa uses HTML5 to represent RDF data; JSON-LD uses key-value pairs to describe RDF data. Also, an RDF model can be represented as a directed, labeled graph, where the subjects and objects of triples are the vertices and the predicates of triples are the edges from subject vertices to object vertices.

In addition to its syntax, an RDF model has its semantic interpretation.

Definition 2 (RDF model semantics)

For an RDF model, its semantic interpretation I consists of the following elements:

  1. A non-empty set of resources, IR, called the domain of I.

  2. A set IP, called the set of properties of I.

  3. A set IL, called the set of literals, which contains all the objects of the literal type.

  4. A mapping IEXT from IP into the power set of IR × IR (i.e., the set of sets of <x, y> pairs with x and y in IR).

  5. A mapping IS from IRIs into IR ∪ IP.

  6. A partial mapping ILR from literals into IR.

3.2 SPARQL

SPARQL, recommended by the W3C, is a query language for RDF data. With a simple query statement structure, SPARQL is easy to understand and read, and it has a reasoning ability that can optimize the efficiency of RDF queries. Considering the essentially graphic structure of the RDF model, SPARQL queries evaluate the user's requirements against RDF datasets in a graph-matching way. A SPARQL query statement generally consists of four components: the query form, the dataset, the graph pattern with constraints, and the solution modifier.

Four query forms in SPARQL can be identified as follows.

  • SELECT: identifies and returns the matched datasets or graphs.

  • CONSTRUCT: creates a new RDF graph.

  • ASK: checks whether the RDF graph contains a result for a given query.

  • DESCRIBE: returns information about all graph nodes matched by the query.

Among these four query forms, SELECT is widely used for searching RDF. A SELECT query has the basic structure SELECT-FROM-WHERE. The SELECT clause indicates the set of variables to be shown in query answers, and a dataset in the FROM clause specifies the RDF data to be queried. A graph pattern in the WHERE clause describes the user's query requirement as a filter condition. We can identify three major kinds of graph patterns: the basic graph pattern, the group graph pattern, and the optional graph pattern. The basic graph pattern (BGP) consists of a number of triple patterns separated by ".". A triple pattern is a special kind of triple in which at least one of the subject, predicate, or object is represented by a variable. In SPARQL, a variable is introduced using "?" or "$" as a prefix. We can identify triple patterns like (S, P, ?O), (S, ?P, ?O), (S, ?P, O), (?S, ?P, O), (?S, P, O), (?S, P, ?O), and (?S, ?P, ?O). A basic graph pattern is a set of triple patterns surrounded by "{}", and all of them have to be matched in query evaluation. A group graph pattern (GGP) consists of a set of BGPs, all of which need to be matched in query evaluation. An optional graph pattern (OGP), starting with the keyword OPTIONAL, is followed by one or more BGPs, which are optional and not requested for a mandatory match in query evaluation. In the graph patterns, the keyword FILTER can be used to explicitly filter the set of eligible results. Also, SPARQL provides several solution modifiers (such as LIMIT, OFFSET, and ORDER BY) to arrange the query results so that users can better view the result set.

3.3 SQL:2011

SQL is the standard query language for relational databases. SQL:2011, published by the ISO/IEC in 2011, is the latest edition of the SQL standard. SQL:2011 replaces the previous edition, SQL:2008, and contains many new features, among which the most important is the ability to explicitly represent and deal with temporal data through temporal tables.

In SQL:2011, time periods are explicitly defined and associated with the rows of a table. A time period is demarcated by a start time and an end time. Here a period definition is a named table component, which actually identifies a pair of columns that capture the start time and the end time of the period; the start column and the end column of the period are special columns in the table. Note that SQL:2011 adopts a left-closed-right-open period model, defining a time period [start time, end time) that includes the start time but excludes the end time.

SQL:2011 distinguishes two types of time periods: the system-time period for transaction-time support and the application-time period for valid-time support. In an application-time period table, SQL:2011 applies the keywords PERIOD FOR to define an application-time period with a user-defined name, which contains two user-named columns that respectively represent the start time and end time of the period. Note that the period start and end columns must have the same data type, either DATE or a timestamp type. Assume that the user wants to create an application-time period table, atTable, which contains a period definition with the user-defined name atPeriod. This period contains two columns with the user-defined names atStart and atEnd, which act as the start and end columns of the period. Then this temporal table is formally defined as follows.

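The original definition is given as a figure; a minimal sketch of such a definition in SQL:2011 syntax, using the names given above (the id and info data columns are hypothetical), looks as follows:

    CREATE TABLE atTable (
      id INTEGER NOT NULL PRIMARY KEY,      -- hypothetical data column
      info VARCHAR(100),                    -- hypothetical data column
      atStart DATE NOT NULL,                -- start column of the period
      atEnd DATE NOT NULL,                  -- end column of the period
      PERIOD FOR atPeriod (atStart, atEnd)  -- application-time period definition
    );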

SQL:2011 also uses the regular query syntax SELECT-FROM-WHERE to query application-time period tables. Here SQL:2011 provides several period predicates to express conditions that involve periods, including CONTAINS, OVERLAPS, EQUALS, PRECEDES, SUCCEEDS, IMMEDIATELY PRECEDES, and IMMEDIATELY SUCCEEDS.
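For illustration, a period predicate can be used in the WHERE clause as follows (a sketch against the hypothetical atTable defined above):

    SELECT id, info
    FROM atTable
    WHERE atPeriod CONTAINS DATE '1900-01-01';  -- rows whose application-time period contains this day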

In system-versioned tables, SQL:2011 uses the keywords PERIOD FOR SYSTEM_TIME to define a system-time period with the standard-specified name SYSTEM_TIME. The declared system-time period contains two user-named columns that respectively represent the start and end columns of the SYSTEM_TIME period. Again, the period start and end columns must have the same data type, either DATE or a timestamp type; in practice, the TIMESTAMP type with the highest fractional-seconds precision is applied as the data type for the system-time period start and end columns. Note that a system-versioned table includes the keywords WITH SYSTEM VERSIONING in its definition. Assume that the user wants to create a system-versioned table, svTable, which contains a period definition with the standard-specified name SYSTEM_TIME and the keywords WITH SYSTEM VERSIONING. This period contains two columns with the user-defined names svStart and svEnd, which act as the start and end columns of the period. Then this temporal table is formally described as follows.

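As above, the original definition is a figure; a minimal sketch in SQL:2011 syntax, using the names given above (the id and info data columns are hypothetical), is:

    CREATE TABLE svTable (
      id INTEGER NOT NULL PRIMARY KEY,                      -- hypothetical data column
      info VARCHAR(100),                                    -- hypothetical data column
      svStart TIMESTAMP(12) GENERATED ALWAYS AS ROW START,  -- start column of SYSTEM_TIME
      svEnd TIMESTAMP(12) GENERATED ALWAYS AS ROW END,      -- end column of SYSTEM_TIME
      PERIOD FOR SYSTEM_TIME (svStart, svEnd)
    ) WITH SYSTEM VERSIONING;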

SQL:2011 provides three syntactic extensions for retrieving the content of a system-versioned table as of a given time point or between any two given time points. The first extension is the FOR SYSTEM_TIME AS OF syntax for querying the table content as of a specified time point; the second and third extensions allow for retrieving the content of a system-versioned table between any two time points. If a query on a system-versioned table specifies none of these three syntactic options, FOR SYSTEM_TIME AS OF CURRENT_TIMESTAMP is assumed by default, which returns only the current system rows as the result.
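In SQL:2011, these options are written FOR SYSTEM_TIME AS OF, FOR SYSTEM_TIME FROM ... TO ... (the second time point excluded), and FOR SYSTEM_TIME BETWEEN ... AND ... (the second time point included). Against the hypothetical svTable above, the three retrievals look roughly as follows (the timestamps are illustrative):

    SELECT id, info FROM svTable
      FOR SYSTEM_TIME AS OF TIMESTAMP '2011-01-02 00:00:00';

    SELECT id, info FROM svTable
      FOR SYSTEM_TIME FROM TIMESTAMP '2011-01-01 00:00:00'
                      TO TIMESTAMP '2011-12-31 00:00:00';

    SELECT id, info FROM svTable
      FOR SYSTEM_TIME BETWEEN TIMESTAMP '2011-01-01 00:00:00'
                      AND TIMESTAMP '2011-12-31 00:00:00';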

A temporal relational schema can be formally defined as R = (A1, tsA1, teA1, A2, tsA2, teA2, …, An, tsAn, teAn), where A1, A2, …, An are common attributes, and each of them (say Ai, 1 ≤ i ≤ n) may have two associated attributes, tsAi and teAi, representing that Ai carries a period (application-time or system-time) with start and end columns tsAi and teAi. A relational instance of R, written r(R), is a set of tuples r(R) = {t1, t2, …, tm}. A tuple of r(R), say tj (1 ≤ j ≤ m), is formally represented as tj = <aj1, tsaj1, teaj1, aj2, tsaj2, teaj2, …, ajn, tsajn, teajn>, where tj[Ai] = aji, tj[tsAi] = tsaji, and tj[teAi] = teaji. Here tj[X] denotes the value of tuple tj on column X.
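For illustration, a schema R = (Name, tsName, teName) declares a single column Name carrying a period with start and end columns tsName and teName; a tuple such as <John, 1995-01-01, 2000-01-01> (the values are hypothetical) then records that the value John of Name is associated with the period [1995-01-01, 2000-01-01).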

In SQL:2011, a table may be both a system-versioned table and an application-time period table, forming a so-called bitemporal table. Rows in bitemporal tables are associated with both a system-time period and an application-time period. Concerning temporal information in the RDF model, in this paper we pay attention only to the application-time period in RDF and do not consider the system-time period or bitemporal periods.

4 Temporal RDF Model

Most of the temporal RDF models proposed so far attach timestamps directly to the predicates of RDF triples or to whole RDF triples. Such temporal RDF models fail to represent the temporal objects of RDF triples. In this section, we propose a novel temporal RDF model termed tRDF.

4.1 Overview of the tRDF model

First, we adopt the left-closed-right-open time model [Ts, Te) of SQL:2011, where Ts and Te are the start time and end time of the time period, respectively. As a special case, a time interval can represent a time point, in which case the end time of the time period is set to the highest value of the data type; for example, [1885-05-18, 9999-12-31) means the time point 1885-05-18. For an RDF triple, its predicate or object may be extended with a time period. In this paper, we identify two types of temporal RDF triples: when the object is a resource, the time period is attached to the predicate to indicate a time-aware relationship between two resources; when the object is a literal, the time period is attached to the object to indicate the time-aware value of the resource on the property. The RDF model with these two types of temporal RDF triples is referred to as tRDF in this paper. We illustrate our tRDF model with examples.

Table 1 presents a classical RDF model containing 6 triples about personal information. Moreover, the graph representation of this RDF model is presented in Fig. 1, where prefixes are not shown in the figure.

Table 1 An example of a traditional RDF model
Fig. 1 An example of an RDF graph

As a temporal extension to the traditional RDF model given in Table 1, the tRDF model is shown in Table 2. Its graph representation is presented in Fig. 2, where prefixes are not shown in the figure.

Table 2 An example of a tRDF model
Fig. 2 An example of a tRDF graph

It can be seen that the tRDF model is based on time labels, so the tRDF model only needs to modify the timestamps of some temporal triples when temporal information changes. In Table 2, for example, assume that the name of Márton Garas was changed to NameB on January 1, 1900. Then the original temporal triple (Márton_Garas, name, Márton Garas [1885-05-18, 1930-06-26)) should be modified to (Márton_Garas, name, Márton Garas [1885-05-18, 1900-01-01)) and, meanwhile, a new triple (Márton_Garas, name, NameB [1900-01-01, 1930-06-26)) should be added. Of course, it is possible that the name of Márton Garas was later changed back to the original.

Now let us look at how to represent temporal information with the two existing temporal RDF models. With the temporal RDF model whose time labels are attached to whole RDF triples, we have temporal triples (Márton_Garas, name, Márton Garas)[1885-05-18, 1930-06-26), (Márton_Garas, gender, Male)[1885-05-18, 1930-06-26), (Márton_Garas, birthPlace, Novi_Sad)[1885-05-18, 9999-12-31) and (Márton_Garas, deathPlace, Budapest)[1930-06-26, 9999-12-31). With the temporal RDF model whose time labels are attached only to the predicates of triples, we have temporal triples (Márton_Garas, name[1885-05-18, 1930-06-26), Márton Garas), (Márton_Garas, gender[1885-05-18, 1930-06-26), Male), (Márton_Garas, birthPlace[1885-05-18, 9999-12-31), Novi_Sad) and (Márton_Garas, deathPlace[1930-06-26, 9999-12-31), Budapest). Although these two models can represent temporal information in triples, their semantics are ambiguous. With our tRDF model, we have temporal triples (Márton_Garas, name, Márton Garas [1885-05-18, 1930-06-26)), (Márton_Garas, gender, Male [1885-05-18, 1930-06-26)), (Márton_Garas, birthPlace[1885-05-18, 9999-12-31), Novi_Sad) and (Márton_Garas, deathPlace[1930-06-26, 9999-12-31), Budapest). Clearly, these triples more exactly describe the temporal semantics of real-world scenarios.

4.2 tRDF syntax

4.2.1 tRDF triple

In the tRDF model, time labels are applied as timestamps, which are added to the predicates or the objects of common RDF triples, depending on the type of object. The triples with timestamps in their predicates or objects are referred to as temporal triples in this paper. The syntax of the tRDF model is declared as a set of temporal triples. Following SQL:2011, a time period for a timestamp is uniformly expressed as [Ts, Te), where Ts and Te are the start time and end time of the time period, respectively. Here two cases are considered: Ts = Te (the time interval is a time point) and Ts < Te (the time interval is truly a period of time).

Definition 3 (tRDF triple)

Temporal triples in the tRDF model have the form (S,P[Ts,Te),O) if O is a resource or (S,P,O[Ts,Te)) if O is a literal. Here S, P, and O are, respectively, the subject, predicate, and object of the triple, Ts, Te ∈ T (T is a time domain), and Ts ≤ Te. The individual terms are described as follows.

  • (S,P,O) is a common triple of the traditional RDF model.

  • When O is a resource, P may be associated with a timestamp, and P[Ts,Te) is a temporal predicate of tRDF triple, indicating that the relationship between two resources, S and O, is valid during the time interval [Ts,Te).

  • When O is a literal, O may be associated with a timestamp and O[Ts,Te) is a temporal literal of the tRDF triple, indicating that the literal O is valid during the time interval [Ts,Te).

  • T is a time domain (a set of time points). For t ∈ T, the data type of t is xsd:date with the format of “yyyy-MM-dd”.

  • A timestamp of a tRDF triple is represented by a time interval [Ts,Te), where Ts, Te ∈ T. As a special case, Ts = Te is allowed, which signifies a time point instead of a time interval.

Let us look at the tRDF model shown in Table 2. It contains six temporal tRDF triples in N-Triples format. These triples describe the personal information of Márton_Garas, including date of birth, place of birth, name, gender, date of death, and place of death. They share a common subject, a resource identified by "http://dbpedia.org/resource/Márton_Garas", and contain two types of objects. Table 2 shows that, for an object that is a resource, a timestamp in the form of a time interval is attached to the predicate; for an object that is a literal, a timestamp is added to the object. Note that the date of birth, place of birth, date of death, and place of death are attached with time points (time intervals with the same start time and end time).

4.2.2 tRDF graph

The tRDF model can be represented as a directed graph. As shown in Fig. 3, in the tRDF graph model, nodes S and O represent the subject and object of the tRDF triple, respectively. When the object is a resource, the directed edge P[Ts,Te) represents a temporal predicate of the triple, i.e., a temporal relationship between S and O. When the object is a literal, the directed edge P represents a static predicate of the triple, and the node O[Ts,Te) represents a temporal object of the triple, i.e., a temporal value of S on P.

Fig. 3 Graphic representation of tRDF triple

Note that, by reifying its temporal information, a tRDF graph can be converted to an ordinary RDF graph. For this purpose, it is necessary to introduce several new vocabulary terms (e.g., startTime and endTime) to describe the time interval. In the tRDF graph, for the nodes with a time interval, we first introduce a new node T to represent the time interval and then use the startTime and endTime vocabulary terms to represent the start and end times of node T. The subject, predicate, and object of tRDF triples are represented with the vocabularies rdf:subject, rdf:predicate, and rdf:object, respectively. Note that, unlike the existing temporal RDF models, the tRDF model is converted according to the type of object. The two temporal RDF triples in Fig. 3 are converted to the ordinary RDF graph shown in Fig. 4.

Fig. 4 Ordinary RDF graph converted from the corresponding tRDF graph

4.3 tRDF semantics

The semantics of the classical RDF model includes three aspects: the interpretation, satisfaction, and entailment of the RDF model. The semantics of the tRDF model is also described from these three aspects.

4.3.1 Temporal interpretation

As with the classical RDF model, the tRDF model is interpreted using expressions and logical relational operators, except that temporal information is added.

Definition 4 (Temporal interpretation)

Let I be the interpretation of the RDF model and TI be the interpretation of the tRDF model, where the RDF model is obtained by removing all temporal information from the tRDF model. Then TI is defined by adding the following temporal elements into I:

  • A subset T of IR, indicating the set of time interval information.

  • A flag OR, indicating that the object is a resource.

  • A subset BP of IP (basic properties), indicating the set of predicates without temporal information when the object is a literal.

  • A subset TP of IR (temporal properties), indicating the set of predicates with temporal information when the object is a resource. Temporal-related contents (e.g., startTime and endTime) also need to be added to TP.

  • A subset BO of IR (basic objects), indicating the set of objects without temporal information when the object is a resource.

  • The literal set IL, extended with several temporal properties (e.g., startTime and endTime).

  • A mapping PT from TP × (T ∩ OR) × (T ∩ OR) into IP.

  • A mapping ILR from IL × T × T into IR.

4.3.2 Temporal satisfaction

The satisfaction of the temporal RDF model refers to the basic semantic relationship between the interpretation TI of the tRDF model and temporal RDF triples.

Definition 5 (Temporal satisfaction)

Given an interpretation TI of the tRDF model TM, TI satisfies a triple tm ∈ TM (written as TI ⊨ tm) if and only if:

  • for Ts, Te ∈ T with (S,P,O) ∈ (TI(Ts) ∩ TI(Te)), we have TI ⊨ (S,P[Ts,Te),O) when O is a resource;

  • for Ts, Te ∈ T with (S,P,O) ∈ (TI(Ts) ∩ TI(Te)), we have TI ⊨ (S,P,O[Ts,Te)) when O is a literal.

If TI ⊨ tm for all tm ∈ TM, then the temporal interpretation TI satisfies the tRDF model TM, written as TI ⊨ TM.

4.3.3 Temporal entailment

Temporal entailment represents the logical relationship between two entities (e.g., temporal inclusion), and it is mainly used for knowledge reasoning and logical deduction.

Definition 6 (Temporal entailment)

Let TM be the tRDF model and TI be an interpretation of TM.

When O is a resource,

  • for Ts, Te ∈ T with (S,subP,O) ∈ (TI(Ts) ∩ TI(Te)), we have TI ⊨ (S,subP[Ts,Te),O);

  • if TI ⊨ (S,P[Ts,Te),O) and there exists a time interval [Ts′,Te′) with Ts ≤ Ts′ ≤ Te′ ≤ Te, then TI ⊨ (S,P[Ts′,Te′),O).

When O is a literal,

  • for Ts, Te ∈ T with (S,P,subO) ∈ (TI(Ts) ∩ TI(Te)), we have TI ⊨ (S,P,subO[Ts,Te));

  • if TI ⊨ (S,P,O[Ts,Te)) and there exists a time interval [Ts′,Te′) with Ts ≤ Ts′ ≤ Te′ ≤ Te, then TI ⊨ (S,P,O[Ts′,Te′)).

5 Storage of tRDF

In this paper, we store tRDF data with SQL:2011, which supports temporal data manipulation. We first present the relational schema designed for storing temporal RDF data and then propose the rules and algorithms for mapping tRDF data to relational databases.

5.1 Design of database schema

Among the three methods of storing classical RDF data with relational databases, the horizontal stores suffer from problems such as multi-valued attributes and many null values (for the horizontal stores with a single table) or too many tables (for the horizontal stores with multiple tables); the type stores are applicable to scenarios in which the RDF triples have many types of subjects, and they may suffer from multi-valued attributes, some null values, and some extra tables (Ma et al., 2016). Many tables in an RDF triple store generally mean that many join operations are involved in querying. In addition, with the horizontal and type stores, newly inserted triples with new predicates force dynamic changes to the relational schema(s).

Based on the above understanding, in this paper we adopt the basic idea of the vertical stores to store tRDF data with SQL:2011. To overcome the shortcomings of the classical vertical stores and satisfy the need to store temporal information, we design five tables, named the Namespace, Subject, Property, Object, and Statement tables, rather than a single table. The schemas of these five relational tables are defined as follows.

Definition 7 (Schema of the relational table)

The schema of the relational table is a six-tuple P = (N, COL, DT, PK, FK, L).

  1. N = TN ∪ DN is a finite non-empty set of names, where TN is a set of names of entity tables and DN is a set of names of data types;

  2. COL is a finite non-empty set of column names of the tables. For ∀t ∈ TN, we have ∃COL(t);

  3. DT is a set of data types of the columns of the tables. For ∀c ∈ COL(t), we have ∃DT(c) ∈ DN;

  4. PK is a set of primary keys of the tables. For ∀t ∈ TN, we have ∃PK(t) ∈ COL(t);

  5. FK is a set of foreign keys of the tables. For ∀t ∈ TN, we have ∃n (n ≥ 0) FK(t) ⊆ COL(t);

  6. L ⊆ TN × TN is a set of relationships between the tables. The relationships between tables are represented by the reference from a foreign key FK(ti) to a primary key PK(tj). For ∀ti, tj ∈ TN, a reference to table tj's primary key PK(tj) by table ti's foreign key FK(ti) is written as FK(ti) → PK(tj).

The Property table shown in Table 3 contains the ID, NS_ID, Property, PTs, and PTe columns. The type of the ID column is BIGSERIAL, which means self-increment. PRIMARY KEY indicates that the ID column is the table's primary key, which uniquely identifies a record, and NOT NULL means the column is not allowed to be empty. The NS_ID column represents the IRI prefix of the complete predicate stored in the Namespace table; the REFERENCES Namespace (ID) clause means that this column is a foreign key referring to the ID column of the Namespace table. The Property column and the PTs and PTe columns correspond to the predicate and its temporal information, respectively. Here the PTs and PTe columns are allowed to be empty, and their data types must be the same. As noted earlier, for a tRDF triple, a timestamp is attached to its predicate when its object is a resource, whereas a timestamp is attached to its object when its object is a literal, in which case the PTs and PTe columns are empty. The PERIOD FOR keyword is used to define the valid time, and ProPeriod is the name of the time period.

Table 3 Property table creation with SQL:2011
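Based on the description above, the creation statement of Table 3 has roughly the following form (the VARCHAR length is an assumption; note that allowing PTs and PTe to be empty relaxes the usual SQL:2011 requirement that period columns be non-null):

    CREATE TABLE Property (
      ID BIGSERIAL NOT NULL PRIMARY KEY,       -- self-incrementing identifier
      NS_ID BIGINT REFERENCES Namespace (ID),  -- IRI prefix stored in the Namespace table
      Property VARCHAR(255) NOT NULL,          -- predicate without prefix
      PTs DATE,                                -- period start (empty when the object is a literal)
      PTe DATE,                                -- period end (empty when the object is a literal)
      PERIOD FOR ProPeriod (PTs, PTe)          -- valid-time period named ProPeriod
    );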

The structure of the Namespace table is shown in Table 4. This table, which is applied to store the prefixes of tRDF triples, consists of the primary key ID and the Prefix column. There are many duplicate IRI prefixes in tRDF data, and separating the prefix from the subject, predicate, and object can significantly save storage space. The Namespace table stores the IRI prefixes in the Prefix column and is linked to the Subject, Property, and Object tables through its primary key ID.

Table 4 Namespace table

The structure of the Subject table is shown in Table 5. This table, which stores the subjects of tRDF triples and is associated with the Statement table through its primary key ID, consists of the primary key ID, the foreign key NS_ID, and the Resource column. The NS_ID column acts as a foreign key referring to the prefix of the subject stored in the Namespace table. The Resource column stores the subjects without prefixes.

Table 5 Subject table

The structure of the Property table is shown in Table 6. This table, which stores the predicates of tRDF triples and is associated with the Statement table through its primary key ID, consists of the primary key ID, the foreign key NS_ID, and the Property, PTs, and PTe columns. The NS_ID column acts as a foreign key referring to the prefix of the predicate stored in the Namespace table. The Property column stores the predicates without prefixes. When the object is a resource, the PTs and PTe columns hold the start and end times of the time interval, respectively; when the object is a literal, the PTs and PTe columns are empty.

Table 6 Property table

The structure of the Object table is shown in Table 7. This table, which stores the objects of tRDF triples and is associated with the Statement table through its primary key ID, consists of the primary key ID, the foreign key NS_ID, and the Object, OTs, and OTe columns. Here the NS_ID column acts as a foreign key referring to the prefix of the object stored in the Namespace table. Note that when the object is a literal, the NS_ID of the record corresponds to the ID of a row with a null prefix in the Namespace table. The Object column stores the objects without prefixes. When the object is a resource, the OTs and OTe columns are empty; when the object is a literal, the OTs and OTe columns hold the start and end times of the time interval, respectively.

Table 7 Object table

The structure of the Statement table is shown in Table 8. This table, which stores the statements of tRDF triples by using integers, consists of the primary key ID and the foreign key columns Sid, Pid, and Oid. The Statement table uses the foreign keys Sid, Pid, and Oid to refer to the subjects, predicates, and objects stored in the Subject, Property, and Object tables.

Table 8 Statement table
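Based on the descriptions above, the other four tables have roughly the following form (the column types and lengths, and the period name ObjPeriod, are assumptions):

    CREATE TABLE Namespace (
      ID BIGSERIAL NOT NULL PRIMARY KEY,
      Prefix VARCHAR(255)                      -- IRI prefix; null for literal objects
    );

    CREATE TABLE Subject (
      ID BIGSERIAL NOT NULL PRIMARY KEY,
      NS_ID BIGINT REFERENCES Namespace (ID),
      Resource VARCHAR(255) NOT NULL           -- subject without prefix
    );

    CREATE TABLE Object (
      ID BIGSERIAL NOT NULL PRIMARY KEY,
      NS_ID BIGINT REFERENCES Namespace (ID),
      Object VARCHAR(255) NOT NULL,            -- object without prefix
      OTs DATE,                                -- period start (empty when the object is a resource)
      OTe DATE,                                -- period end (empty when the object is a resource)
      PERIOD FOR ObjPeriod (OTs, OTe)          -- hypothetical period name
    );

    CREATE TABLE Statement (
      ID BIGSERIAL NOT NULL PRIMARY KEY,
      Sid BIGINT REFERENCES Subject (ID),      -- subject of the triple
      Pid BIGINT REFERENCES Property (ID),     -- predicate of the triple
      Oid BIGINT REFERENCES Object (ID)        -- object of the triple
    );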

The above tables are connected through their primary keys and foreign keys. The relationships between these tables are shown in Fig. 5.

Fig. 5 Relationships of the created relational schemas

5.2 Mapping rules

Based on the relational schemas designed in Sect. 5.1, we present the rules for mapping the tRDF model to the relational tables. First, we need to decompose each tRDF triple into the prefix N, the subject ES without prefix, the predicate EP without prefix, the object EO without prefix, and the temporal information Ts and Te. On this basis, the mapping rules for each part of a tRDF triple are given as follows; a worked example follows the list.

  • Rule 1: Insert the prefix N into the Prefix column of the Namespace table. Note that when the object is a literal, the Prefix column is allowed to be null according to the structure of the Namespace table.

  • Rule 2: Insert the prefix ID returned by Rule 1 and the corresponding subject ES without prefix into the NS_ID and Resource columns of the Subject table, respectively. The ID of each inserted record acts as the primary key referenced by the Statement table.

  • Rule 3: Insert the prefix ID returned by Rule 1 and the corresponding predicate EP without prefix into the NS_ID and Property columns of the Property table, respectively. When the object is a resource, insert the temporal information Ts and Te into the PTs and PTe columns of the Property table, respectively. The ID of each inserted record acts as the primary key referenced by the Statement table.

  • Rule 4: Insert the prefix ID returned by Rule 1 and the corresponding object EO without prefix into the NS_ID and Object columns of the Object table, respectively. When the object is a literal, insert the temporal information Ts and Te into the OTs and OTe columns of the Object table, respectively. The ID of each inserted record acts as the primary key referenced by the Statement table.

  • Rule 5: When the NS_ID, Property, and Object columns are the same as in Rules 3 and 4, the following four cases need to be considered according to the different temporal information, where PTs/OTs and PTe/OTe denote the stored start and end times in the Property or Object table and Ts and Te denote the times of the incoming triple.

  • Discard: if (PTs/OTs) ≤ Ts ∧ (PTe/OTe) ≥ Te, the new data will be discarded (the stored period already covers it).

  • Insert: if (PTs/OTs) > Te or (PTe/OTe) < Ts, the new data will be inserted (the periods are disjoint).

  • Cover: if (PTs/OTs) > Ts ∧ (PTe/OTe) < Te, the original data in the tables will be covered (the new period contains the stored one).

  • Combine: if (PTs/OTs) > Ts ∧ (PTe/OTe) ≥ Te, or (PTs/OTs) ≤ Ts ∧ (PTe/OTe) < Te, the original data in the tables will be combined with the new data (the periods overlap partially).

  • Rule 6: Insert the IDs of the subject, predicate, and object of each tRDF triple returned by Rules 2, 3, and 4 into the Sid, Pid, and Oid columns of the Statement table, respectively.
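As an illustration of Rules 1-4 and 6, mapping the tRDF triple (Márton_Garas, name, Márton Garas [1885-05-18, 1930-06-26)) of Table 2 could proceed as follows (the prefix IRIs and the returned IDs are illustrative assumptions):

    -- Rule 1: insert prefixes; a row with a null prefix serves literal objects
    INSERT INTO Namespace (Prefix) VALUES ('http://dbpedia.org/resource/');  -- assume ID 1
    INSERT INTO Namespace (Prefix) VALUES ('http://dbpedia.org/property/');  -- assume ID 2 (hypothetical predicate prefix)
    INSERT INTO Namespace (Prefix) VALUES (NULL);                            -- assume ID 3 (for literal objects)
    -- Rule 2: subject without prefix
    INSERT INTO Subject (NS_ID, Resource) VALUES (1, 'Márton_Garas');        -- assume ID 1
    -- Rule 3: predicate; the object is a literal, so PTs and PTe stay null
    INSERT INTO Property (NS_ID, Property) VALUES (2, 'name');               -- assume ID 1
    -- Rule 4: the literal object carries the timestamp
    INSERT INTO Object (NS_ID, Object, OTs, OTe)
      VALUES (3, 'Márton Garas', DATE '1885-05-18', DATE '1930-06-26');      -- assume ID 1
    -- Rule 6: link the subject, predicate, and object
    INSERT INTO Statement (Sid, Pid, Oid) VALUES (1, 1, 1);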

5.3 Mapping algorithms

To store tRDF data in relational databases, the original tRDF data need to be analyzed first. Through tRDF data analysis, the prefix, subject, predicate, object, and temporal information of each temporal RDF triple are obtained. The details of the tRDF data analysis are shown in Algorithm 1.

Algorithm 1 tRDF Data Analysis


With the mapping rules proposed in Sect. 5.2, the storage algorithm stores the prefix, the subject without prefix, the predicate without prefix, the object without prefix, and the temporal information of the tRDF triples obtained by Algorithm 1 into the relational tables. The details of the tRDF data storage are shown in Algorithm 2.

Algorithm 2 Storage of tRDF-to-relational table


6 Query of Temporal RDF

For the temporal RDF model tRDF, the traditional RDF query language SPARQL should be extended to support querying tRDF data. In this paper, we propose such a query language for tRDF, termed tSPARQLt. In this section, we first describe tSPARQLt in terms of its syntax and basic query statement. Furthermore, to query the tRDF data stored in temporal relational databases with tSPARQLt, it is necessary to transform tSPARQLt queries into corresponding SQL queries.

6.1 Syntax of tSPARQLt

Similar to the SPARQL syntax, we present the tSPARQLt syntax in terms of terms and triple patterns. In this paper, we use the N-Triples format to represent tRDF triples.

6.1.1 tRDF terms

Following are the four basic terms of the tSPARQLt syntax.

  • The syntax for IRIs. The subject, predicate, and object with a resource type of each RDF triple in SPARQL are composed of complete IRIs, where "<" and ">" are delimiters and not part of the IRI reference. In the tRDF model, a timestamp is attached to the predicate of a triple when its object is a resource. As a result, in tSPARQLt, when the object is a resource, a time interval [Ts,Te) may be attached after the predicate IRI of the tRDF triple.

  • The syntax for literals. Literals are used to represent strings, dates, numbers, or Booleans in SPARQL. A string literal is surrounded by quotation marks and may be followed by a language tag introduced by "@" or by an IRI identifier indicating the string type, introduced by "^^". A date literal is similar to a string literal: the date is represented as a string followed by an IRI identifier that indicates the date type. A number literal (e.g., INTEGER, DECIMAL, and DOUBLE) is interpreted with the numeric meaning of the corresponding type and does not take quotation marks or a trailing IRI to specify the data type. A Boolean literal can be written directly as TRUE or FALSE. In the tRDF model, a timestamp may be attached to an object that is a literal. So, for a tRDF triple whose object is a literal, in tSPARQLt, the time interval [Ts,Te) should be added after the literal object of the tRDF triple.

  • The syntax for query variables. SPARQL identifies a query variable by prefixing it with the mark "?" or "$", which is not part of the variable name. SPARQL has three types of query variables, ?S, ?P, and ?O, which denote queries on the subject, predicate, and object of RDF triples, respectively. For the tRDF model, two new query variables, ?Ts and ?Te, are introduced into tSPARQLt to represent queries on the start and end times of time intervals, respectively. tSPARQLt can then query the variables ?Ts and ?Te on the predicate when the object is a resource, and on the object when the object is a literal.

  • The syntax for blank nodes. The blank nodes in tSPARQLt are consistent with SPARQL.

6.1.2 tRDF triple pattern

A triple pattern in SPARQL is a special triple in which at least one of the subject, predicate, and object is a query variable, for example, {?S ?P ?O.}. In the tRDF model, a temporal triple carries temporal information on its predicate or object. Correspondingly, a triple pattern in tSPARQLt has the two new variables ?Ts and ?Te to represent time information. We identify two types of tRDF triple patterns as follows:

  • {?S ?P[?Ts,?Te) ?O.} for the case that the object is a resource.

  • {?S ?P ?O[?Ts,?Te).} for the case that the object is a literal.
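For instance, instantiating these two patterns with a resource and predicates that appear in the experimental dataset of Sect. 7 (the literal value "male" is illustrative):

{<http://dbpedia.org/resource/Fiatau_Penitala_Teo> <http://dbpedia.org/ontology/deathPlace>[?Ts,?Te) ?O.}

{<http://dbpedia.org/resource/Fiatau_Penitala_Teo> <http://xmlns.com/foaf/0.1/gender> "male"[?Ts,?Te).}

The first pattern queries the time interval attached to the predicate, since the object (a death place) is a resource; the second queries the time interval attached to the literal object.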

6.2 Query statement of tSPARQLt

The query statement structure of tSPARQLt is similar to that of SPARQL; it contains four clauses, which are summarized in Table 9 and correspond to the query form, dataset, graph pattern with constraints, and solution modifier. In the following, we explain these four clauses in detail.

Table 9 tSPARQLt query statement

6.2.1 Query form of tSPARQLt

Like SPARQL, tSPARQLt includes four query types, identified by the keywords SELECT, CONSTRUCT, ASK, and DESCRIBE. SELECT returns all variables, or a subset of them, obtained by a query pattern match. CONSTRUCT produces a tRDF graph made up of the matched triples. ASK determines whether the queried dataset contains the desired match, returning true or false as a result. DESCRIBE returns tRDF information for all resources that match the query. Among these four query types, SELECT is very useful for data retrieval.

The use of SELECT in tSPARQLt is basically the same as in SPARQL. First, the statement SELECT * returns all variables and the values bound to them. Second, the statement SELECT ?variable1 ?variable2… returns only the variables and bindings specified by the given variable names. The main difference between the SELECT of tSPARQLt and the SELECT of SPARQL is that they work on different RDF models. The tRDF model contains temporal information, so the SELECT of tSPARQLt may contain two query variables, ?Ts and ?Te, which describe the start time and end time of a time interval. The SELECT of tSPARQLt can thus query the time variables ?Ts and ?Te, whereas the SELECT of SPARQL cannot.

6.2.2 Dataset in tSPARQLt

The dataset in tSPARQLt means the data source identified by the IRI enclosed in "[]." The dataset can include zero or more named graphs. Like SPARQL, tSPARQLt always contains a default graph. A query statement without the clause FROM is evaluated against the default graph. When a named graph is specified in the query statement, the match is against the named graph only; both the clauses FROM and FROM NAMED can be used in the query. The former is for a traditional RDF dataset, and the latter is for a temporally extended tRDF dataset. The specific form of the dataset in tSPARQLt is presented in Table 10.

Table 10 Dataset declaration in tSPARQLt

6.2.3 Graph patterns of tSPARQLt

Like SPARQL, the queries of tSPARQLt are evaluated based on graph pattern matching. Nevertheless, the graph patterns of tSPARQLt differ from those of SPARQL because tSPARQLt is designed for the tRDF model, so its graph patterns contain temporal information as well.

The statement SELECT ?Ts ?Te WHERE {?S P[?Ts,?Te) ?O}, for example, matches all triples with the given predicate P, regardless of the subject, object, and temporal information. For the matched triples whose objects are a resource, the start time and end time of their predicates can be returned by the statement with pattern {?S P[?Ts,?Te) ?O}. And the statement SELECT ?Ts ?Te WHERE {?S ?P O[?Ts,?Te)} matches all triples with the given object O, regardless of the subject, predicate, and temporal information. For the matched triples whose objects are a literal, the start time and end time of their objects can be returned by the statement with pattern {?S ?P O[?Ts,?Te)}. Also, the statement SELECT ?S WHERE {?S P[Ts,Te) ?O} matches all triples with the given predicate P[Ts,Te). This statement with pattern {?S P[Ts,Te) ?O} can return subjects of the matched triples whose objects are a resource. Furthermore, the statement SELECT ?S WHERE {?S ?P O[Ts,Te)} matches all triples with the given object O[Ts,Te). This statement with pattern {?S ?P O[Ts,Te)} can return subjects of the matched triples whose objects are a literal.

A graph pattern of tSPARQLt can combine the three conventional variables (i.e., ?S, ?P, and ?O) with the two temporal variables (i.e., ?Ts and ?Te). With the above examples of basic graph patterns, we can construct diverse graph patterns for tSPARQLt, including basic graph patterns, group graph patterns, and optional graph patterns.

Basic graph pattern

tSPARQLt uses graph patterns to match and filter tRDF data, and graph patterns are, therefore, the most important part of tSPARQLt query statements. A graph pattern is built from basic graph patterns (BGPs), and a basic graph pattern consists of multiple triple patterns, all of which need to be matched. Of course, it is possible that a basic graph pattern contains only one triple pattern. SPARQL contains eight basic triple patterns, depending on which positions are query variables. In tSPARQLt, the basic triple patterns are greatly extended by adding the two new query variables for time information. Table 11 presents the basic triple patterns in tSPARQLt when the object is a resource. There are similar basic triple patterns in tSPARQLt when the object is a literal, which are not shown here.

Table 11 Triple patterns of tSPARQLt when the object is a resource

Group graph pattern

The group graph pattern in tSPARQLt consists of a set of basic graph patterns, each of which must be matched during query evaluation. In tSPARQLt, a basic graph pattern is composed of triple patterns. Consider the example SELECT ?S ?O WHERE {?S P1[Ts1,Te1) ?O. ?S P2[Ts2,Te2) ?O. ?S P3[Ts3,Te3) ?O.}. This statement returns the subjects and objects that simultaneously satisfy all three triple patterns with the predicates P1[Ts1,Te1), P2[Ts2,Te2), and P3[Ts3,Te3).

Optional graph pattern

The optional graph pattern in tSPARQLt, which is identified by the keyword OPTIONAL, contains basic graph patterns or group graph patterns that need not be matched in query evaluation. In tSPARQLt, triple patterns are used to construct the optional graph pattern. The statement SELECT ?S ?O WHERE {?S P1[Ts1,Te1) ?O. OPTIONAL {?S P2[Ts2,Te2) ?O}}, for example, matches all triples whose predicate must be P1[Ts1,Te1) and may additionally be P2[Ts2,Te2), and returns the subjects and objects of such triples.

Constraints in graph pattern

Like SPARQL, tSPARQLt also uses the keyword FILTER to filter the results obtained with the graph patterns. The clause FILTER can only occur in the graph patterns. There are two major scenarios to use FILTER: restricting the value of strings and restricting the values of some types (e.g., numeric types, xsd:string, xsd:boolean and xsd:dateTime). We discuss the FILTER of these two scenarios in tSPARQLt as follows.

  • Restriction of strings. The FILTER of tSPARQLt also uses the function regex() to match string literals. Using the str function, regex() can also match the lexical forms of other literals. The clause WHERE {?S P[Ts,Te) ?O. FILTER REGEX (?O, "^Knowledge Graphs", "i")}, for example, searches all triples whose predicates are P[Ts,Te) and whose objects begin with "Knowledge Graphs," where the "i" flag indicates that the match in regex() is case-insensitive. The FILTER of tSPARQLt is no different from the FILTER of SPARQL in this scenario.

  • Restriction of some data types. The FILTER can be used to restrict expressions, which consist of variables, operators, and RDF terms. In tSPARQLt, the used variables can be ?Ts and ?Te, and the tRDF terms can be time values. The clause WHERE {?S P[?Ts,?Te) ?O. FILTER (?Ts > 2007-01-02)}, for example, filters the matched triples to those whose predicate start time is later than January 2, 2007.

To simplify time expressions in the FILTER of tSPARQLt, it is necessary to use temporal predicates. Temporal query languages have been widely studied in the context of relational databases. Following SQL:2011, we introduce and apply seven temporal predicates in the FILTER of tSPARQLt, including CONTAINS, OVERLAPS, EQUALS, PRECEDES, SUCCEEDS, IMMEDIATELY PRECEDES, and IMMEDIATELY SUCCEEDS. Their usage samples are as follows.

  1. FILTER (X CONTAINS Y)

  2. FILTER (X OVERLAPS Y)

  3. FILTER (X EQUALS Y)

  4. FILTER (X PRECEDES Y)

  5. FILTER (X IMMEDIATELY PRECEDES Y)

  6. FILTER (X SUCCEEDS Y)

  7. FILTER (X IMMEDIATELY SUCCEEDS Y)

Here, X can be a temporal variable such as ?Ts and ?Te, or a pair like (?Ts, ?Te); Y can be a time point such as 2020-1-1, or a time period such as (2020-1-1, 2021-12-31).
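For example, the following sketch (the predicate P and the dates are illustrative) keeps only the matches whose predicate interval overlaps the year 2020:

SELECT ?S
WHERE {?S P[?Ts,?Te) ?O.
       FILTER ((?Ts, ?Te) OVERLAPS (2020-1-1, 2020-12-31))}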

6.2.4 Solution modifiers of tSPARQLt

The returned results of a tSPARQLt query may be large, unordered, and redundant. Like SPARQL, tSPARQLt can use six modifiers, which are ORDER BY, DISTINCT, LIMIT, PROJECT, REDUCED, and OFFSET, to optimize the result set. With these modifiers, a more intuitive and easy-to-understand result sequence is created. The solution modifiers that are commonly used in tSPARQLt are briefly presented as follows.

  • ORDER BY: It is followed by variable name(s) and an optional order modifier ASC (ascending order) or DESC (descending order), ranking the results in ascending/descending order according to the variable name(s). Here is an example: SELECT ?S WHERE {?S P[Ts,Te) ?O.} ORDER BY DESC (?S). Note that tSPARQLt may use the two new time variables, so it can also order the results according to them, for example, SELECT ?S WHERE {?S P[?Ts,?Te) ?O.} ORDER BY ?Ts.

  • DISTINCT: It is followed by variable name(s) in the statement SELECT, eliminating possible duplicates of the result set on the variable name(s), for example, SELECT DISTINCT ?S WHERE {?S P[Ts,Te) ?O.}.

  • LIMIT: It is followed by an integer, indicating the maximum number of returned results, for example, SELECT ?S WHERE {?S P[Ts,Te) ?O.} LIMIT 20.

  • OFFSET: It is followed by an integer, setting the offset of the returned result. It is often used in conjunction with LIMIT, for example, SELECT ?S WHERE {?S P[Ts,Te) ?O.} LIMIT 20 OFFSET 10.

6.3 Transformation from tSPARQLt to SQL

tSPARQLt queries are not supported by relational databases. So, to query tRDF data stored in relational tables with tSPARQLt queries, it is necessary to transform the tSPARQLt queries into the corresponding SQL queries. In this section, we first discuss the rules of transforming the basic query statement in tSPARQLt to SQL. We further present concrete transformation cases where major tSPARQLt queries are transformed into their corresponding SQL statements. With the transformation of basic statements, more complex statements can be transformed.

6.3.1 Transformation rules

The basic query statement of tSPARQLt proposed in Sect. 6.2 contains four clauses. In the following, we investigate how to transform each of these four clauses into SQL.

6.3.2 Transformation of query types

As we know, tSPARQLt includes four types of queries: SELECT, ASK, CONSTRUCT and DESCRIBE. Here we only focus on SELECT, which is widely applied for RDF data query. The keyword SELECT in tSPARQLt directly corresponds to the keyword SELECT in SQL. The keyword SELECT of tSPARQLt is followed by variable names prefixed with "?", which are separated by spaces. The keyword SELECT of SQL is followed by column names of tables, which are separated by ",".

First, the statement SELECT * is used both in tSPARQLt and SQL. In tSPARQLt, it returns all variable names and their bound values; in SQL, it returns all columns of the records (i.e., tuples) in the table. The transformation rules for the statement SELECT are summarized as follows.

  • The statement SELECT * in tSPARQLt can make a direct transformation without any change.

  • The statement SELECT ?Variablename1 ?Variablename2 …?VariablenameN in tSPARQLt needs to be transformed to the column names (i.e., attributes) of the corresponding tables in SQL, where the columns are named according to the actual meaning of the variable names (i.e., subject, predicate, object or time). The result of the SQL statement looks like SELECT column1,column2,…, columnN.
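As a sketch of these two rules (the table name tRDF_table stands in for the table(s) selected in the next step, and the column names follow the conceptual schema used throughout Sect. 6.3):

tSPARQLt: SELECT ?S ?Ts ?Te WHERE {?S P[?Ts,?Te) ?O.}

SQL: SELECT Subject, PTs, PTe FROM tRDF_table WHERE Predicate = P;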

Transformation of the dataset

The keyword FROM in tSPARQLt directly corresponds to the keyword FROM in SQL. The keyword FROM of tSPARQLt is followed by the tRDF dataset to be queried, which is identified by the IRI enclosed in "[]." This dataset can be either the default graph or a named graph and may contain zero or more graphs.

The clause FROM can be left out in tSPARQLt, which means that the query statement is executed against the default graph. Unlike tSPARQLt, in relational databases the clause FROM must exist and is used to specify the table(s) that contain(s) the column names referenced in the clause WHERE. For the transformation of the clause FROM in tSPARQLt, it is necessary to determine the tables used by SQL according to the variables that appear in the clause SELECT of tSPARQLt. The transformation rules for the dataset are summarized as follows.

  • The query statements in SQL must contain the clause FROM, regardless of whether the clause FROM appears in the tSPARQLt query.

  • The corresponding relational tables are selected according to the query variables provided in the clause SELECT of tSPARQLt. The selected table names are added to the clause of SQL (e.g., SELECT column1,column2,…,columnN FROM table_name).

Transformation of Graph Patterns

In tSPARQLt, a graph pattern is introduced by the keyword WHERE and is enclosed in "{}." For example, the clause WHERE {?S P[Ts,Te) ?O} uses a basic graph pattern for the tRDF model, where the objects of triples are a resource. The clause WHERE in tSPARQLt directly corresponds to the clause WHERE in SQL. Unlike tSPARQLt, the clause WHERE in SQL is followed by query conditions rather than graph patterns. The query conditions of SQL are not enclosed in "{}" and are connected with the keywords AND/OR. The query conditions in SQL must correspond to all triple patterns in the tSPARQLt graph patterns. The key to transforming the graph patterns of tSPARQLt into the query conditions of SQL is to transform the triple patterns contained in the graph patterns.

To transform the clause WHERE of tSPARQLt, it is necessary to understand the meaning of each triple pattern in tSPARQLt first. A basic graph pattern {?S P[Ts,Te) ?O}, for example, means that it will match all tRDF triples whose objects are a resource and predicates are P[Ts,Te). The corresponding meaning of this basic graph pattern in SQL is to get all rows whose predicates are P, starting and ending at Ts and Te, respectively. Based on this understanding, the above basic graph pattern is transformed to a clause of SQL WHERE Predicate = P AND PTs = Ts AND PTe = Te. In addition, a pluralistic basic graph pattern in tSPARQLt consists of a set of basic graph patterns. For its transformation, each basic graph pattern is first transformed separately, and then these transformed conditions are connected with the keyword OR in SQL.

Moreover, the group graph pattern is a combination of many basic graph patterns. Its transformation is similar to that of the pluralistic basic graph pattern, except for the keyword AND rather than OR. We summarize the transformations of graph patterns in tSPARQLt as follows.

  • The keyword WHERE in tSPARQLt can be transformed to the keyword WHERE in SQL directly.

  • In SQL, the "{}" in tSPARQLt is not required, and the keyword WHERE is directly followed by the query conditions.

  • A basic graph pattern in tSPARQLt is transformed to the query conditions in SQL. The triple pattern {S ?P[Ts,Te) ?O}, for example, corresponds to the condition Subject = S AND PTs = Ts AND PTe = Te in SQL; the triple pattern {S ?P[?Ts,?Te) ?O} corresponds to the condition Subject = S in SQL.

  • The transformations of the pluralistic basic graph pattern and the group graph pattern in tSPARQLt build on that of the single basic graph pattern. First, their triple patterns are respectively transformed to the corresponding query conditions in SQL. Then, as described above, these transformed conditions are connected with OR (for the pluralistic basic graph pattern) or AND (for the group graph pattern); see the sketch below.
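As a sketch of a group graph pattern transformation under the conceptual schema used above (tRDF_table is illustrative; at this conceptual level the conditions are simply AND-connected, whereas a practical implementation would realize them with a self-join, in line with the optimization remark in Sect. 6.3.3):

tSPARQLt: SELECT ?S ?O WHERE {?S P1[Ts1,Te1) ?O. ?S P2[Ts2,Te2) ?O.}

SQL: SELECT Subject, Object FROM tRDF_table
WHERE (Predicate = P1 AND PTs = Ts1 AND PTe = Te1)
AND (Predicate = P2 AND PTs = Ts2 AND PTe = Te2);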

As to the clause FILTER used in the graph patterns of tSPARQLt, it is transformed to SQL by adding conditions with the keyword AND, since the clause FILTER only takes effect within a basic graph pattern. For a group graph pattern, the clause FILTER in each basic graph pattern is separately transformed into a condition, which is placed in the SQL query statement. These conditions are finally connected with the keyword AND. We summarize the transformations of constraints in tSPARQLt as follows.

  • A restriction on string values in tSPARQLt is transformed to the corresponding regular expression with the same semantics in SQL.

  • A restriction on number or date in tSPARQLt is transformed with the comparison operators and keywords such as NOT, LIKE, IN, NOT IN, BETWEEN in SQL.

  • For multiple restrictions in the clause FILTER of tSPARQLt, each restriction is first transformed to a condition in SQL with the two transformations above. Then these transformed conditions are connected together with AND. The final conditions are placed in the clause WHERE of SQL.
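For instance, the date restriction from Sect. 6.2.3 is transformed as follows (a sketch; tRDF_table is illustrative):

tSPARQLt: SELECT ?S WHERE {?S P[?Ts,?Te) ?O. FILTER (?Ts > 2007-01-02)}

SQL: SELECT Subject FROM tRDF_table
WHERE Predicate = P AND PTs > '2007-01-02';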

Transformation of solution modifiers

Generally speaking, tSPARQLt and SQL use very similar solution modifiers. In the following, we briefly describe some of the common ones. First, the clause ORDER BY is used both in tSPARQLt and SQL, following the clause WHERE. However, the clause ORDER BY is followed by variable names in tSPARQLt and by column names of tables in SQL. In addition, although tSPARQLt and SQL both use the keywords ASC and DESC to represent ascending/descending order, they apply them differently. In tSPARQLt, the keyword ASC or DESC is followed by a ranking variable enclosed in "()"; in SQL, the keyword ASC or DESC directly follows a column name without "()." We summarize the transformations of the clause ORDER BY in tSPARQLt as follows.

  • Essentially, the variable names in tSPARQLt need to be transformed to the corresponding column names in SQL, maintaining their relative positions.

  • ASC(?variablename) and DESC(?variablename) in tSPARQLt are transformed as columnname ASC and columnname DESC in SQL, respectively.

Second, the clause DISTINCT is used in the same way both for tSPARQLt and SQL, which follows the keyword SELECT and can eliminate duplicate information in the returned results. In addition, the clauses LIMIT and OFFSET are also used in the same way both in tSPARQLt and SQL. The clause LIMIT is applied to limit the number of returned results, and the clause OFFSET is applied to specify the offset of returned results. These two modifiers follow the clause WHERE in tSPARQLt and SQL. So, the transformations of the three modifiers from tSPARQLt to SQL are straightforward.
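A sketch combining these modifiers (tRDF_table is illustrative):

tSPARQLt: SELECT DISTINCT ?S WHERE {?S P[?Ts,?Te) ?O.} ORDER BY DESC(?Ts) LIMIT 20 OFFSET 10

SQL: SELECT DISTINCT Subject FROM tRDF_table
WHERE Predicate = P
ORDER BY PTs DESC LIMIT 20 OFFSET 10;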

6.3.3 Transformation examples

In the above, we presented the principles of transforming tSPARQLt to SQL. In this section, we take the tRDF model whose triple objects are a resource and whose triple predicates may be time-aware as an example to show how diverse tSPARQLt queries with SELECT are transformed to the corresponding SQL queries. These transformation examples are presented in Table 12. Similar transformation examples exist for querying the tRDF model whose triple objects are a literal and may be time-aware; we do not give them due to limited space.

Table 12 Transformation examples of SELECT query for the tRDF model with temporal predicates

Note that Table 12 gives some transformation examples for tSPARQLt query statements. With them, we can transform more complex tSPARQLt query statements. For example, we can modify the triple pattern in Example 2 to obtain the Predicate-Object lists, Subject-Predicate lists, and Object lists in the basic graph pattern. Other graph patterns, such as the group graph pattern, can be obtained by extending Example 5, and filtering data can be achieved by adding more constraints on the basis of Example 7. It should be noted that the above transformation examples are only at a conceptual level. In practice, the transformed SQL statements should be optimized against the database structure.

Discussions. Generally speaking, both SPARQL for RDF and SQL for RDBMSs are structure-based query languages with SELECT-FROM-WHERE. They have direct correspondences: the clauses SELECT, FROM, and WHERE in SPARQL directly correspond to the clauses SELECT, FROM, and WHERE in SQL, respectively. Also, they use very similar solution modifiers: the clauses ORDER BY, DISTINCT, LIMIT, and OFFSET are used in the same way in both languages. It is demonstrated in Chebotko et al. (2009) that SPARQL-to-SQL translation is semantics preserving. The temporal RDF query language tSPARQLt and the temporal RDBMS query language SQL:2011 still have direct correspondences and similar solution modifiers. On this basis, the semantics-preserving tSPARQLt-to-SQL:2011 translation covers all primary situations: triple patterns, basic graph patterns, optional graph patterns, alternative graph patterns, and value constraints. The core of the tSPARQLt-to-SQL:2011 translation is the mapping of the clause WHERE. Any tSPARQLt query, including a complex one, is built from the above primary situations or a combination of them, and its translation to SQL:2011 is, therefore, semantics preserving. The transformation examples in Table 12 demonstrate the preservation of semantics.

7 Experimental Evaluations

We designed several comparative experiments to verify the feasibility and effectiveness of our proposed storage and query methods in terms of storage time and query efficiency. The experiments were implemented on the Eclipse platform with JDK13 and PostgreSQLFootnote 8 and performed on a system with an Intel(R) Core(TM) i5-4210H 2.9 GHz processor, 8.00 GB RAM, and the Windows 10 operating system. PostgreSQL is one of the most powerful open-source relational database management systems (RDBMSs). Although other popular RDBMSs are available, such as Oracle, MySQL, and SQL Server,Footnote 9 PostgreSQL conforms to the primary features required for SQL:2011 Core conformance; so far, no RDBMS fully conforms to this standard. To avoid possible errors that may occur in a single experiment, the experimental results given below were obtained by averaging the results of five runs.

7.1 Datasets

There are many classical RDF datasets (Ma et al., 2016). Among them, DBpediaFootnote 10 contains 10,310,048 triples describing personal information. As shown in Table 1, these triples contain temporal information that appears as elements of triples. Following our temporal RDF model, we re-formatted the triples with temporal information in DBpedia. Basically, we extracted and identified two categories of temporal information and added them as annotations to the predicates or objects of the original triples according to the object type. This way, we obtained the tRDF triples shown in Table 2. Unlike temporal RDF datasets whose temporal information is randomly generated, our tRDF dataset comes directly from the original RDF data with temporal information and is, therefore, more authentic and significant. To test the performance of our proposed storage and query methods on datasets of different sizes, we divided the constructed tRDF dataset into four tRDF datasets (i.e., Dataset1, Dataset2, Dataset3, and Dataset4) of different sizes. These four temporal RDF datasets are shown in Table 13.

Table 13 Four temporal RDF datasets with different sizes

7.2 Experimental results

We verified the feasibility of the tRDF data storage and query methods proposed in the paper from two aspects: storage time and query efficiency. To facilitate the following discussion, our storage method proposed in Sect. 5 is referred to as the PostSQL mapping. We compare the PostSQL mapping with the vertical mapping.

7.2.1 Storage time

Each tRDF dataset listed in Table 13 is stored in PostgreSQL with vertical mapping and PostSQL mapping, respectively. The times of storing these four tRDF datasets with these two methods are shown in Fig. 6. It is shown in Fig. 6 that when the number of tRDF triples in the datasets is less than one million, the storage times of both methods increase linearly with the number of triples. However, the storage times increase exponentially with the number of triples when the amount of data exceeds one million.

Fig. 6 Time of storage

It is also shown in Fig. 6 that, for the storage of Dataset1, the vertical mapping and the PostSQL mapping take 0.69 min and 3.43 min, respectively. Their difference in storage time is not very significant because of the small size of the dataset. For storing a given dataset, the PostSQL mapping takes more time than the vertical mapping for two reasons. First, the PostSQL mapping needs to analyze the tRDF triples to obtain the prefix and temporal information and then store the obtained information in different tables; tRDF data analysis inevitably leads to more time consumption. Second, the vertical mapping directly stores triples in one table and only executes one query to check the uniqueness of the data. In contrast, the PostSQL mapping stores triples in multiple tables and needs to check the uniqueness of each table, which inevitably increases the storage time.

7.2.2 Query efficiency

Based on the PostSQL mapping method proposed in the paper, we used five query statements with different complexities to test their query efficiencies over datasets with different sizes. These five query statements are presented in Table 14.

Table 14 Five query statements

In Table 14, Q1 is used to search all records in the Statement table of the PostgreSQL database and displays the first 500 records. Q2 is used to search all records with the subject "http://dbpedia.org/resource/Fiatau_Penitala_Teo." Q3 specifies the predicate on the basis of Q2, which is used to search the records with the predicate "http://xmlns.com/foaf/0.1/gender" and the subject "http://dbpedia.org/resource/Fiatau_Penitala_Teo." Q4 is used to search the records whose temporal information on the object starts on July 23, 1911, and ends between November 1, 1920, and December 1, 1998. Q5 is used to search the records whose predicate is "http://dbpedia.org/ontology/deathPlace," in which the temporal information on the predicate starts later than January 1, 1990, and displays the first 500 records. The experimental results of these five query statements are shown in Fig. 7.

Fig. 7 Query time of different queries over datasets

Figure 7a, b, c, d, and e correspond to the query results of Q1 to Q5, respectively. First, it is shown in Fig. 7 that, for a query that does not involve temporal information, the query efficiency of both mapping methods is almost independent of the size of the dataset. With the vertical mapping, the maximum time differences of executing each of Q1, Q2, and Q3 on Dataset4 and Dataset1 are 39 ms, 23 ms, and 54 ms, respectively; with the PostSQL mapping, they are 43 ms, 72 ms, and 19 ms, respectively. For a query that involves temporal information, however, the execution time of both mapping methods is significantly affected by the size of the dataset. For the two queries Q4 and Q5 based on the PostSQL mapping, their execution times on Dataset4 are 16.21 and 36.54 times longer than on Dataset1, respectively. In addition, by comparing Fig. 7d and e with Fig. 7a, b, and c, it can be observed that, on the same dataset, queries involving temporal information generally need more execution time than queries without temporal information. For the queries without temporal information, the vertical mapping and the PostSQL mapping have approximately the same query efficiency. For the queries with temporal information (e.g., Q4 and Q5), executions based on the PostSQL mapping take more time, because the PostSQL mapping stores data in multiple relational tables and such queries inevitably involve multiple tables. Still, it is shown in Fig. 7d and e that, for queries Q4 and Q5 over Dataset4 with about 10 million triples, the executions based on the PostSQL mapping take only 1.95 s and 1.90 s longer, respectively, than those based on the vertical mapping.

8 Conclusion and Future Work

In this paper, we proposed a novel temporal RDF model, termed tRDF, in which a temporal triple contains a temporal predicate or a temporal object. We defined the syntax and semantics of tRDF in detail. With the proposed temporal RDF model, we proposed storage and query methods to manage tRDF data with SQL:2011. We developed the rules and algorithms that map tRDF data to relational databases. In addition, we extended the SPARQL query language and provided a temporal query language, termed tSPARQLt, for querying the tRDF model. To efficiently query the temporal RDF data stored in relational databases, we further defined transformation rules for transforming the basic query statements from tSPARQLt to SQL. Finally, we designed comparative experiments to verify the feasibility and effectiveness of our proposed storage and query methods in terms of storage time and query efficiency.

In this paper, we used PostgreSQL to verify our storage and query methods against a temporal RDF dataset generated from DBpedia. In our future work, we plan to generate several massive temporal RDF datasets from, for example, Wikipedia and YAGO, and then verify our methods against these temporal RDF datasets with other relational database management systems such as Oracle, MySQL, and SQL Server. Considering that queries with aggregations are usually used by professionals, we will investigate how to further extend tSPARQLt for aggregate queries over temporal RDF data and how to transform them to SQL. Scalability is a crucial issue for large-scale temporal RDF data management. In this direction, we will introduce optimization structures such as indexes and data partitioning and also explore the storage of massive temporal RDF data with NoSQL databases.