A Semantic Framework for Designing Temporal SQL Databases

Gao, Qiao; Lee, Mong Li; Dobbie, Gillian; Zeng, Zhong

doi:10.1007/978-3-030-00847-5_27

Qiao Gao²⁰,
Mong Li Lee²⁰,
Gillian Dobbie²¹ &
…
Zhong Zeng²²

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 11157))

Included in the following conference series:

International Conference on Conceptual Modeling

2243 Accesses
5 Citations

Abstract

Many real world applications need to capture a mix of temporal and non-temporal entities, relationships and attributes. These concepts add complexity when designing database schemas and it is difficult to capture the temporal semantics precisely. We propose a new framework for designing SQL databases that distinguishes between temporal and non-temporal concepts while also distinguishing between entities, relationships and attributes at every step. The framework first utilizes an Entity-Relationship (ER) diagram to capture the real world semantics. Temporal constructs in the ER diagram are then annotated. Finally we map the temporal ER diagram to a normal form database schema that reduces redundant data by separating current data from historical data. We also describe how data consistency is maintained during updates. Experiment results show that we can generate database schemas that support efficient access to both current and historical information.

Access provided by CONRICYT-eBooks. Download conference paper PDF

Temporal Data Management – An Overview

A Comparison of Different Forms of Temporal Data Management

τJSchema: A Framework for Managing Temporal JSON-Based NoSQL Databases

Keywords

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

1 Introduction

Many organizations, especially in regulated industries such as finance and healthcare, need to manage and maintain data that changes over time. SQL:2011 [10] introduces temporal tables where a relational table can be associated with an explicit time period [Start, End) to restrict the valid times of its tuples. However, designing database schemas that capture both temporal and non-temporal data with entity, relationship and attribute semantics is complex, and may lead to incorrect temporal semantics and data redundancy if the semantics are not carefully considered at every step of the design process.

A database that keeps the history of entities, relationships and their attributes will contain both current valid and historical data. Information about an entity/relationship and its temporal and non-temporal attributes may be stored over several tuples in multiple temporal and/or non-temporal tables. Temporal joins are needed to enforce constraints between the time periods of tuples from two or more temporal tables. For queries that primarily focus on current valid data, the volume of historical data will hamper the query evaluation process. Designing database schemas that reduce temporal joins and support efficient access to both current and historical information is crucial.

Figure 1 shows an example company database whose data are stored using the temporal tables in SQL:2011. The schema of the database was designed using a standard technique of deriving tables from an ER diagram and adding attributes Start and End to each table to indicate the validity of the data values.

Temporal Semantics. The complexity of mixing temporal and non-temporal data may lead to difficulty in enforcing the intended temporal semantics when updates occur. Consider the temporal table Employee which has a non-temporal attribute Phone whose value may change over time but only the current value is of interest and captured, and a temporal attribute Salary whose changes over time are tracked. Suppose an employee Alice wants to update her phone number to 90000011 on 2017-03-01. There are two possible ways to do the update, both of which are problematic:

a.
Set the valid end time of the tuple $t_{13}$ to 2017-03-01, and insert a new tuple <e01, Alice, 8000, 90000011, 2017-03-01, 9999-12-31> to the table Employee, where the date “9999-12-31” means now. However, this violates the user requirement to keep only the latest phone number, that is, Phone is non-temporal, because tuples $t_{11}$ and $t_{12}$ still maintain the previous phone number of Alice.
b.
Set the value of attribute Phone in the tuples $t_{11}$, $t_{12}$, $t_{13}$ to Alice’s new phone number. In this case, the tuple $t_{11}$ becomes <e01, Alice, 90000011, 5000, 2000-01-01, 2010-01-01> which does not provide the correct valid time for Alice’s phone number.

Data Redundancy. Having a schema where a table has both temporal and non-temporal attributes may lead to data redundancy when a temporal attribute in the table is updated. For instance, if Alice (e01) has a salary raise on 2012-01-01 from $\$7000$ to $\$8000$, a new tuple $t_{13}$ will be inserted into the Employee table, and the end date of the tuple $t_{12}$ is set to 2012-01-01. Note that Alice’s non-temporal attributes Ename and Phone are replicated, since we store them together with the temporal attribute Salary in one relation. Further, the temporal table EmpDep captures the temporal relationship between employees and departments. This table also has a temporal attribute Position, and any change in the job position of an employee will lead to replication of data.

Costly Query Evaluation. Processing a routine query involving temporal tables can become complex and costly. If a user wants to find the department that Alice currently works in, we need to carry out temporal joins of the tables Employee, EmpDep and Department to retrieve the current records of Alice. The historical records in each table participating in the temporal join will slow down the query evaluation process.

To address the above issues, we propose a framework that utilizes a high-level conceptual model, such as the Entity-Relationship (ER) model, to facilitate the database design process. We first construct a normal form ER diagram without considering the temporal aspects [12]. Then we annotate the entity types, relationship types and attributes based on the temporal aspects of the database. Finally, we map the temporal ER diagram to a set of normal form relations. We automatically generate two sets of normal form relations to increase the speed of queries, one for the current database state, and another for the historical database state. The key contributions of the mapping algorithm include:

a.
Mapping each temporal and all non-temporal attributes to separate tables, thus reducing data redundancy;
b.
Associating time periods with each temporal concept such as entity, relationship and attribute in the same relation instead of the tuple, further reducing data duplication;
c.
Separating the current and historical data of entities and relationships, thus increasing the speed of queries.

With the separation of current and historical data, we examine how data consistency is maintained when there are updates. Experimental results show that our framework leads to temporal database schemas that reduce data redundancy, and supports efficient querying of both current and historical information.

2 Temporal ER Diagram

An ER diagram models real world entities that interact with each other via relationships, and their attributes. The works in [5, 6, 8] have proposed temporal ER diagrams to capture the temporal semantics in the real world. Here, we describe a temporal ER diagram ($ERD^T$) that contains the essential constructs required for our mapping algorithms. We also analyze the semantics when we have temporal constructs, e.g., temporal entities that participate in non-temporal relationships, or non-temporal entities that participate in temporal relationships.

Capturing both real world semantics and temporal aspects of entities and relationships typically makes an ER diagram complicated and hard to design. As such, we propose to first use an ER diagram to capture the entities, relationships and attributes of a database application, and then annotate these constructs with a superscript T to indicate that they are temporal. We have 3 types of temporal constructs depending on the changes that an application wants to keep track of.

Temporal Entity Type. A temporal entity type indicates that all the entities of this type are temporal. Each entity is associated with some implicit time period values [Start, End). The entity is valid within these time periods.

Temporal Relationship Type. A temporal relationship type indicates that all the relationships of this type are temporal. A relationship may be associated with more than one time period values.

Temporal Attribute. A temporal attribute indicates that the value of this attribute of an entity or relationship may change over time periods and the database keeps track of the changes.

An $ERD^T$ can have combinations of non-temporal and temporal constructs, e.g., a temporal entity/relationship can have non-temporal attributes, and a non-temporal entity/relationship can have temporal attributes. Not all the participating entity types of a temporal relationship need to be temporal, and some participating entity types of a non-temporal relationship can be temporal. An $ERD^T$ becomes a traditional ER diagram if all its entity types, relationship types and attributes are non-temporal. Figure 2 shows the temporal ER diagram of a company database. Department is a non-temporal entity type. Employee is a non-temporal entity type with temporal attributes Salary and Hobby. Project is a temporal entity type. EmpDep is a temporal relationship type with a temporal attribute Position. Note that the temporal relationship type SPJ has an aggregate entity type SP.

3 Mapping Algorithms

In this section, we present our mapping algorithms to generate the schema for a temporal database from a temporal ER diagram. The schema separates the current and historical data to facilitate the processing of queries. Depending on the workload of the application, the schema can be a set of normal form relations or a set of nested relations. The latter facilitates the frequent retrieval of all the information pertaining to some entitiy/relationship.

3.1 Current Database Schema

We can store the current state of the database in a set of normal form relations. Since the temporal ER diagram models time period attributes (Start and End) implicitly, these attributes will be added to the database schema. Details of the mapping is given in [4]. The key steps of the mapping are:

a.
All non-temporal or temporal single-valued attributes of a non-temporal or temporal entity/relationship type are mapped to one relation.
b.
Each non-temporal/temporal multivalued attribute together with the identifier of its entity/relationship type is mapped to a separate relation.
c.
For each temporal entity/relationship type in $ERD^T$, we add an attribute $R\_Start$ to its corresponding current relation $R\_curr$. This $R\_Start$ attribute indicates the start time of the temporal entities/relationships.
d.
For each temporal attribute A, we add an attribute $A\_Start$ to the current relation $R\_curr$ containing this temporal attribute A to indicate the start time of the attribute values of A.

All the single valued attributes (temporal or non-temporal) are put in one relation, and a start time for each of the temporal single valued attributes, and a start time for the entity/relationship types is added if the entity/relationship type is temporal. This removes the issue of data redundancy because the current relation has only one current value for each single-valued temporal attribute. Since our $ERD^T$ is in normal form and the current state database is a snapshot database, the relations obtained are in 3NF or 4NF [12].

Key of a Current Relation. The identifier of a temporal/non-temporal entity type becomes the key of the corresponding current relation for its single valued attributes. The current relation corresponding to a temporal/non-temporal relationship type contains the identifiers of its participating entity types. For the current relation corresponding to a temporal/non-temporal multivalued attribute, the identifier of its entity/relationship type together with this multivalued attribute form the key. Note that the Start time of a temporal entity/relationship/attribute is not part of the key.

Example 1

[Current Data]. Based on the $ERD^T$ in Fig. 2, we generate some normal form relations for the current database schema.

1.
Non-temporal entity type Employee with temporal attributes Salary and Hobby:

$Employee\_curr (\underline{Eid},Name,Phone,Salary,Salary\_Start)$

$EmployeeHobby\_curr (\underline{Eid,Hobby},Hobby\_Start)$
2.
Temporal entity type Project with non-temporal attributes Tittle and Budget:

$Project\_curr (\underline{Jid},Title,Budget, Project\_Start)$
3.
Non-temporal relationship type SP with temporal attribute Price:

$SP\_curr (\underline{Sid,Pid},Price, Price\_Start)$
4.
Temporal relationship type SPJ with non-temporal attribute Quantity:

$SPJ\_curr (\underline{Sid,Pid,Jid},Quantity,SPJ\_Start)$
5.
Temporal relationship type EmpDep with temporal attribute Position:

$EmpDep\_curr (\underline{Eid},Did,Position$, $Position\_Start, EmpDep\_Start)$

$\square $

Note that we can have more than one start time in a relation, e.g. in relation $EmpDep\_curr$, the start time for relation $EmpDep\_Start$ records the existence of relationship EmpDep, while attribute $Position\_Start$ records the start time of temporal attribute Position.

3.2 Historical Database Schema

Our mapping algorithm separates the current and historical data by generating another set of relations for the historical data. Generating the normal form relations for the historical database is similar to that for the current database. The main differences are:

a.
For each temporal entity/relationship type in $ERD^T$, its corresponding historical relation $R\_hist$ contains both $R\_Start$ and $R\_End$ attributes indicating the start and end time of the temporal entities/relationships.
b.
For each temporal single valued or multivalued attribute A, we generate a separate historical relation which contains the primary key of its original current relation together with a time period, i.e. $A\_Start$ and $A\_End$ attributes.
c.
Each non-temporal multivalued attribute A of a temporal entity/relationship type is associated with $R\_Start$ and $R\_End$ attributes to indicate the set of attribute values for each time period of the corresponding relationship R.
d.
The non-temporal attributes of non-temporal entity/relationship type is not stored in any historical relations.

Note that we separate temporal attributes and non-temporal attributes in the historical relations. Further, we create one historical table for each temporal attribute and each temporal entity/relationship type.

Key of Historical Relation. The start time $R\_Start$ for a temporal entity or relationship type R and the start time $A\_Start$ for a temporal attribute A are part of the key in the corresponding historical relations.

Example 2

[Historical Data]. Consider the $ERD^T$ in Fig. 2. The relations to store the historical information are as follows:

1.
Non-temporal entity type Employee with temporal attributes:

$EmployeeSalary\_hist (\underline{Eid,Salary\_Start },Salary\_End,Salary)$

$EmployeeHobby\_hist (\underline{Eid,Hobby,Hobby\_Start},Hobby\_End)$
2.
Temporal entity type Project:

$Project\_hist (\underline{Jid, Project\_Start}, Project\_End,Title,Budget)$
3.
Non-temporal relationship type SP with temporal attribute Price:

$SP\_hist (\underline{Sid,Pid, Price\_Start},Price\_End,Price)$
4.
Temporal relationship type SPJ:

$SPJ\_hist (\underline{Sid,Pid,Jid,SPJ\_Start},SPJ\_End,Quantity)$
5.
Temporal relationship type EmpDep with temporal attribute Position:

$EmpDep\_hist (\underline{Eid,EmpDep\_Start},EmpDep\_End,Did)$

$EmpDepPosition\_hist (\underline{Eid, Position\_Start},Position\_End$, Did, Position)

$\square $

3.3 Schema with Nested Relations

In order to minimize the cost of temporal joins, we propose to store all the attributes of an entity/relationship type in one nested relation. Since the $ERD^T$ is in normal form, we can generate a set of normal form nested relations [13] for the $ERD^T$ including the Start and End attributes.

Current Nested Relations. The difference between the nested relations for current data and the normal form relations lies in the mapping of multivalued attributes to a first-level repeating group inside the relation of its corresponding entity/relationship type. This removes the need to join the relations for the entity/relationship and the multivalued attribute.

Example 3

[Current Data in Nested Relations]. Consider the $ERD^T$ in Fig. 2. Nested relations generated for the current state of the database include

1.
Non-temporal entity type Employee with temporal attributes Salary and Hobby:

$Employee\_currN (\underline{Eid},Name,Phone$, $Salary,Salary\_Start$, $(\underline{Hobby},$ $Hobby\_Start)^*)$
2.
Temporal entity type Project with non-temporal attributes Title and Budget:

$Project\_currN (\underline{Jid},Title,Budget, Project\_Start)$
3.
Non-temporal relationship type SP with temporal attribute Price:

$SP\_currN (\underline{Sid,Pid},Price, Price\_Start)$
4.
Temporal relationship type SPJ with non-temporal attribute Quantity:

$SPJ\_currN (\underline{Sid,Pid,Jid},Quantity,SPJ\_Start)$
5.
Temporal relationship type Join with temporal attribute Position:

$EmpDep\_currN (\underline{Eid},Did,Position, Position\_Start$, $EmpDep\_Start)$

$\square $

Note that the multivalued temporal attribute Hobby is stored in a first-level repeating group in relation $Employee\_curr$.

Historical Nested Relations. The intuition behind historical nested relations is to put each single-valued or multivalued temporal attribute with its start and end time attributes into a repeating group, since a temporal attribute becomes a multivalued attribute in historical relation.

For each temporal entity/relationship type, all its non-temporal single-valued attributes together with attributes $R\_Start$ and $R\_End$ form the first-level repeating group of a nested relation R. The time period constrains the temporal entity/relationship as well as all its attributes. Each non-temporal multivalued attribute forms a second-level repeating group since it is constrained by the time period of the temporal entity/relationship. Each single-valued or multivalued temporal attribute A together with their time period $A\_Start$ and $A\_End$ also form a second-level repeating group.

Note that if an entity/relationship type is non-temporal but has some temporal attributes, we generate the nested relation that contains only the temporal attributes as repeating groups. Details of the mapping algorithm is given in [4].

Key for Historical Nested Relation. The start time $R\_Start$ for a temporal entity or relationship type R and the start time $A\_Start$ for a temporal attribute A is a part of the key for its corresponding historical relation or repeating group.

Example 4

[Historical Data in Nested Relations]. Consider the $ERD^T$ in Fig. 2. The nested relations to store the historical information of temporal entities, relationships and attributes are generated as follows. “$()^*$” denotes a repeating group, and underlined attributes in a repeating group form the key of this group.

1.
Non-temporal entity type Employee with two temporal attributes:

$Employee\_histN (\underline{Eid}, (Salary,$ $\underline{Salary\_Start},Salary\_End)^*, $

$(\underline{Hobby,Hobby\_Start},Hobby\_End)^*)$
2.
Temporal entity type Project:

$Project\_histN (\underline{Jid}, (Title,Budget,\underline{Project\_Start}, Project\_End)^*)$
3.
Non-temporal relationship type SP with temporal attribute Price:

$SP\_histN (\underline{Sid,Pid},(Price, \underline{Price\_Start}, Price\_End)^*)$
4.
Temporal relationship type SPJ:

$SPJ\_histN (\underline{Sid,Pid,Jid},$ $(Quantity,\underline{SPJ\_Start},SPJ\_End)^*)$
5.
Temporal relationship type EmpDep with temporal attribute Position:

$EmpDep\_histN (\underline{Eid,Did},$ $((Position,\underline{Position\_Start},Position\_End)^*,$

$\underline{EmpDep\_Start},EmpDep\_End)^*)$

$\square $

Note that the historical relation for the non-temporal entity type Employee does not contain non-temporal attributes (e.g. Name), which can be obtained by joining the historical relation with the corresponding current relation in Example 3.

4 Maintaining Data Consistency

Since we separate the current and historical data, we need to ensure consistency when updates occur. Figure 3 shows the schema generated from the temporal ER diagram in Fig. 2. We focus on updates to the current data since historical data are typically not updated but archived for subsequent analysis. Update operations on the current data include modifying the values of temporal and non-temporal attributes, and inserting or deleting tuples. These tuples may correspond to multivalued attributes or some entity/relationship.

Modify Attribute Values. If a non-temporal attribute such as an employee’s phone number is modified, we simply replace the old attribute value in the current relation with the new value since the database does not keep the old values of non-temporal attributes. However, if a temporal attribute such as an employee’s salary is changed, we need to insert a corresponding tuple containing the old attribute value with a valid end time to the historical relation. The start date for the new salary value in the current relation is changed to reflect the valid time of the change. Figure 4 shows the changes (in italics) in the current and historical Employee relations after updating the phone number and salary of Alice to 90000011 on 2017-03-01 and $10,000 on 2017-06-01 respectively.

Insert/Delete Tuples. Inserting new tuples to the current data does not affect the historical data, and will not lead to any inconsistency. However, deleting a tuple t from a current relation $R\_curr$ may require the insertion of a corresponding tuple to the historical relation with valid end time. We have 3 cases:

1.
$R\_curr$ corresponds to a multivalued attribute of some entity/relationship. If the attribute is temporal, we insert a corresponding tuple $t'$ to $R\_hist$ with a valid end time, and delete t from $R\_curr$. Otherwise, we simply delete t from $R\_curr$. For example, if we delete tuple <e01, Swimming, 2000-01-01> from $EmployeeHobby\_curr$ in Fig. 3 with a valid end time 2017-08-01, we will insert a tuple <e01, Swimming, 2000-01-01, 2017-08-01> with valid end time 2017-08-01 to the historical relation $EmployeeHobby\_hist$.
2.
$R\_curr$ corresponds to an entity type t with identifier e.

Since the information of an entity may be stored across multiple relations, its deletion may trigger the deletion of tuples that correspond to its multivalued attributes and the relationships the entity participates in.
- Case 2(a) Entity type is non-temporal.
  
  We delete all tuples with identifier e from current and historical relations, i.e., remove all information about this entity from the database.
- Case 2(b) Entity type is temporal.
  
  We delete tuples with identifier e from the current relations, and insert corresponding tuples with valid end times in the historical relations.
For example, Alice leaves the company and we delete tuple <e01, Alice, 90000001, 8000,2012-01-01> from $Employee\_curr$ in Fig. 3. This triggers the deletion of all tuples with $eid=e01$ from both current and historical relations. In contrast, suppose project j04 is completed in 2017-10-01 and we delete it from current relation $Project\_curr$. Since this is a temporal entity, the database will store its information in the historical relations. As such, we delete tuples involving j04 from current relations and insert the corresponding tuples with valid end times in historical relations. If employee is a temporal entity type, then we need to create historical relations to store the non-temporal single-valued and multivalued attributes of employees who have left the company, i.e., they have been deleted from current relations.
3.
$R\_curr$ corresponds to a relationship type.

If the relationship is temporal, then insert a corresponding tuple with valid end time into the historical relation $R\_hist$. Otherwise, delete tuple t from $R\_curr$. For example, if employee Smith (e03) no longer works in the department d01 on 2017-09-01, then we insert tuple <e03, d01, 2012-01-01, 2017-09-01> into the historical relation $EmpDep\_hist$ and a tuple <e03, d01, Engineer, 2005-01-01, 2017-09-01> is inserted into the historical relation $EmpDepPosition\_hist$. The tuple <e03, d01, Engineer, 2005-01-01, 2012-01-01> is deleted from the current relation $EmpDep\_curr$.

5 Experiments

We evaluate the performance of our approach for designing temporal databases. We use Oracle 12c Enterprise Edition hosted on Solaris 10 with 2.60 GHz CPU and 128 GB RAM. Oracle implements an object-relational model and supports multi-level nested tables as objects. Three schemas are generated based on the Employee database^{Footnote 1} which captures employees who work for and manage departments in a company, and keeps track of the changes of the employees’ salaries, job titles, departments worked for and managed.

S1: This is the traditional schema which mixes current and historical data.
S2: This is our normal form relations that separates current and historical data.
S3: This is our normal form nested relations that stores current and historical data separately.

We have 3 datasets for each schema: original Employee dataset $D_1$, and two synthetically generated datasets $D_2$ and $D_3$. Table 1 shows the statistics of these datasets. In $D_2$, each current entity/relationship has 10 historical entities/relationships, and each temporal relationship/attribute is associated with one time period. This increases the size of historical data to evaluate our approach that separates the current and historical states of the database. In $D_3$, each current entity/relationship has 1 historical entity/relationship, and each temporal relationship/attribute is associated with 10 time periods. This increases the number of repeating groups to evaluate our approach that stores historical data in nested relations. Table 2 gives the descriptions of our 9 test queries, and Fig. 5 shows the CPU time of SQL execution for each query on the 3 datasets.

Table 1. Statistics of Datasets

Full size table

Table 2. Queries for Employee database

Full size table

$Q_1$ and $Q_2$ query the current data only. These queries run much faster on all 3 datasets with schema S2 and S3 compared to S1. The gap in their runtime widens when the ratio of historical to current data increases (as in $D_2$). The query times for S2 and S3 are the same since they have the same current relations.

$Q_3$ to $Q_5$ retrieve historical information of current/resigned employees. $Q_3$ requires traditional join and an aggregation function on the temporal attribute. $Q_4$ finds all historical records for a large set of entities, while Q5 finds the historical records for a few entities. All these queries need to retrieve the historical record for each entity/relationship. S3 gives the best performance on all 3 datasets as it reduces the joins between temporal attributes and their entities/relationships.

$Q_6$ to $Q_7$ are queries on temporal relationships or constrain temporal attributes with some values. As the queries focus on either historical or current data, S1 is slower that S2. S3 is expensive since it is difficult to retrieve entities or relationships whose attributes satisfy some constraints from nested relations.

$Q_8$ involves a temporal join between current data and historical data. Since S1 combines current and historical data, we need to join two Emptitle relations and check that the time period of “Engineer” is before that of “Senior Engineer”. This is time consuming. Separating current and historical data avoids such temporal joins. S3 does not perform as well as S2 because this query constrains the value of the temporal attributes, which is slow for nested relations. $Q_9$ retrieves both current and historical data. The runtimes for this query is similar for all three schemes since we need to scan current and historical data.

The results of our experiments demonstrate that separating the data into current and historical relations (for both flat and nested relations) improves the efficiency when queries are focused on either current or historical data. The normal form relations for storing historical data (S2) is a better design for temporal join between current and historical data, and is also good for querying historical data with some constraints on temporal attributes. The nested relations which stores historical data (S3) shows good performance for queries that retrieve two or more historical records for entities/relationships.

6 Related Work

Research in conceptual modeling captures the temporal aspects of a database by introducing new temporal constructs to the standard ER diagram [1, 3, 9, 16,17,18], or extending existing constructs to include temporal semantics [2, 11]. Our $ERD^T$ is a simplified temporal ER diagram that contains the essential constructs for our mapping algorithm.

There are two main approaches to generate a temporal database schema from an ER diagram. One approach is to map a traditional ER model to a database schema, annotate it with temporal functional dependencies, and use the decomposition approach to obtain a set of normal form relations including temporal relations [14]. However, these relations do not capture the semantics of temporal entity, relationship and attribute. Further, when the database schema becomes complex, it is not easy for users to specify the temporal functional dependencies compared to annotating an ER diagram as in our proposed approach.

Another approach is to map a temporal ER model to a temporal database schema. The works in [16, 18] map their temporal ER models to the standard ER model first (which may include adding explicit time period attributes) before translating the ER diagram to the database schema. However, the semantics of temporal entity, relationship and attribute are lost during the mapping since the time attributes, i.e., Start and End, are treated as regular attributes in the standard ER model. The works in [7, 11, 15, 17] map their temporal ER models directly to a database schema. [17] generates a set of 3NF database schema that does not capture the semantics of temporal attributes since they use a temporal ER model which does not support temporal attribute and attributes of relationship type. [7] maps a temporal ER diagram to a surrogate-based relational model (RM/T model) which does not capture the semantics of temporal relationship type. [11] adds an implicit valid time period for each generated relation, thus making all the attributes in the database temporal. This may increase the data redundancy when there are one or more temporal attributes in a relation. All these methods do not separate the current data from historical data.

Although commercial databases such as Microsoft SQL server and IBM DB2 10 implemented the feature of historical and current tables, a historical table is a mirror of its current table, and the purpose is to track modifications in the current table. Further, these tables are only available to data involving transaction time, and not data involving valid time. Moreover, SQL Server and DB2 cannot distinguish between temporal attributes and temporal entities, which needs to be handled differently when updates occur. In contrast, our proposed framework provides a principled approach to generate database schema that captures these temporal semantics and supports efficient access of data involving valid time.

7 Conclusion

The requirement to capture a mix of temporal and non-temporal entities, relationships and attributes adds complexity to the design of database schemas. We proposed a semantic framework that precisely captures temporal semantics. Each step in the framework distinguishes temporal and non-temporal entities, relationships and attributes. Our mapping algorithm generates current and historical schemas from a temporal ER diagram. This accelerates querying of current data, and provides the flexibility of mapping to a set of nested relations. We discussed how consistency between current and historical data is maintained during updates. Experiments showed that the schemas obtained provide efficient access to both current and historical information. Future work includes studying the physical design over our temporal database schema, e.g., temporal indexing.

Notes

1.
https://dev.mysql.com/doc/employee/en/.

References

Elmasri, R., El-Assal, I., Kouramajian, V.: Semantics of temporal data in an extended ER model. In: ER (1990)
Google Scholar
Elmasri, R., Wuu, G.T.J.: A temporal model and query language for ER databases. In: IEEE ICDE (1990)
Google Scholar
Ferg, S.: Modelling the time dimension in an entity-relationship diagram. In: ER (1985)
Google Scholar
Gao, Q., Lee, M.L., Ling, T.W., Dobbie, G., Zeng, Z.: A semantic framework for designing temporal SQL databases. Technical report, TRB3/18, NUS (2018)
Google Scholar
Gregersen, H., Jensen, C.: Conceptual modeling of time-varying information. Technical report, TimeCenter TR-35 (1998)
Google Scholar
Gregersen, H., Jensen, C.S.: Temporal entity-relationship models - a survey. IEEE TKDE 11, 464–497 (1999)
Google Scholar
Gregersen, H., Mark, L., Jensen, C.S.: Mapping temporal ER diagrams to relational schemas. Technical report, TimeCenter TR-39 (1998)
Google Scholar
Khatri, V., Ram, S., Snodgrass, R.T., et al.: Capturing telic/atelic temporal data semantics: Generalizing conventional conceptual models. IEEE TKDE 26, 528–548 (2014)
Google Scholar
Klopprogge, M.R.: TERM: an approach to include time dimension in the entity-relationship model. In: ER (1981)
Google Scholar
Kulkarni, K., Michels, J.E.: Temporal features in SQL: 2011. Sigmod Rec. 41, 34–43 (2012)
Article Google Scholar
Lai, V.S., Kuilboer, J.P., Guynes, J.L.: Temporal databases: model design and commercialization prospects. ACM SIGMIS Database 25, 6–18 (1994)
Article Google Scholar
Ling, T.W.: A normal form for entity-relationship diagrams. In: ER (1985)
Google Scholar
Ling, T.W., Yan, L.: NF-NR: a practical normal form for nested relations. J. Syst. Integr. 4, 309–340 (1994)
Article Google Scholar
Papazoglou, M., Spaccapietra, S., Tari, Z.: Advances in Object-Oriented Data Modeling. MIT Press, Cambridge (2000). Chap. 7
Google Scholar
Snodgrass, R.T.: Developing Time-Oriented Database Applications in SQL. Morgan Kaufmann Publishers, San Francisco (2000). Chap. 11
Google Scholar
Tauzovich, B.: Towards temporal extensions to the entity-relationship model. In: ER (1991)
Google Scholar
Theodoulidis, C., Loucopoulos, P., Wangler, B.: A conceptual modelling formalism for temporal database applications. Inf. Syst. 16, 401–416 (1991)
Article Google Scholar
Zimanyi, E., Parent, C., Spaccapietra, S., Pirotte, A.: TERC+: a temporal conceptual model. In: International Symposium Digital Media Information Base (1997)
Google Scholar

Download references

Author information

Authors and Affiliations

National University of Singapore, Singapore, Singapore
Qiao Gao & Mong Li Lee
University of Auckland, Auckland, New Zealand
Gillian Dobbie
Data Center Technology Lab, Huawei, Hangzhou, China
Zhong Zeng

Authors

Qiao Gao
View author publications
You can also search for this author in PubMed Google Scholar
Mong Li Lee
View author publications
You can also search for this author in PubMed Google Scholar
Gillian Dobbie
View author publications
You can also search for this author in PubMed Google Scholar
Zhong Zeng
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Qiao Gao .

Editor information

Editors and Affiliations

Lucentia, University of Alicante, Alicante, Spain
Juan C. Trujillo
Miami University, Oxford, OH, USA
Karen C. Davis
Renmin University of China, Beijing, China
Xiaoyong Du
Northwestern Polytechnical University, Xian, China
Zhanhuai Li
Department of Computer Science, National University of Singapore, Singapore, Singapore
Tok Wang Ling
Department of Computer Science and Technology, Tsinghua University, Beijing, Beijing, China
Guoliang Li
National University of Singapore, Singapore, Singapore
Mong Li Lee

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Gao, Q., Lee, M.L., Dobbie, G., Zeng, Z. (2018). A Semantic Framework for Designing Temporal SQL Databases. In: Trujillo, J., et al. Conceptual Modeling. ER 2018. Lecture Notes in Computer Science(), vol 11157. Springer, Cham. https://doi.org/10.1007/978-3-030-00847-5_27

Download citation

DOI: https://doi.org/10.1007/978-3-030-00847-5_27
Published: 26 September 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00846-8
Online ISBN: 978-3-030-00847-5
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

A Semantic Framework for Designing Temporal SQL Databases

Abstract