
1 Introduction

Time is one of the fundamental dimensions of the physical space-time universe [45] and a philosophical a priori [32]; it is one of the most important categories for perceiving the physical and social world. Time is a very special dimension of our perception. It flows even if we do nothing; it changes without requiring our contribution. It has a direction. And unlike space, we can never revisit a particular point in time, while we can revisit a particular point in space. Time points are in a total order.

When it comes to data modeling, we have hardly observed an application domain where time is not a relevant aspect of discourse. Hence, modeling the temporal aspects of a universe of discourse correctly is indispensable for data models. We are not aware of any real-world data model that includes no temporal information, i.e., in which no time or date data type is used in the schema.

The Covid pandemic and the publication of pandemic-related data showed us that the treatment of time in data models is frequently insufficient and leads to numerous misinterpretations when temporal aspects are not adequately modeled or communicated. In many countries, the time points of infection, of developing symptoms, of negative or positive tests, of registering symptoms and tests in databases, of publishing data, etc. were not clearly distinguished and represented. This led to unnecessary and disturbing differences in the statistical evaluation of the available data.

In this paper, we will discuss some fundamental aspects of representing time in data models. Conceptual (data) models are essential in many areas [25] - from requirements engineering [13], software design [21, 37], and database development to the meta-modeling of process enactment services [14]. These temporal models can then be implemented in various forms, e.g., in temporal databases [3, 23, 40, 41].

First, we show how time itself can be modeled, which data types are available for representing time, and which basic objects represent time: time points and durations. Many applications have requirements relating to the concept of time [13]. These requirements have to be checked and mapped to data models and process models.

Most textbook exercises for data modeling discuss snapshot models, i.e., the representation of the state of the universe of discourse at a particular point in time, predominantly the current state. In real applications, nowadays, we find that it is necessary to represent the state of affairs over longer periods of time. We show the fundamental concepts of including history and temporal dimensions in data models and point out that conceptual models featuring time as a fundamental modeling dimension are available (e.g., [8, 31, 34, 36]). This also has serious consequences for data models and in particular for integrity constraints. For example, 1-to-1 relationships in snapshot models might become n-to-m relationships in temporal data models. Hence we discuss the consequences of moving from a snapshot to a temporal representation.

In many application domains, data is typically no longer deleted or updated in place but merely declared as outdated. There is a trend to collect data over longer periods of time in order to prepare it for data mining, knowledge discovery, machine learning, and other techniques of data science. On the one hand, this is a tremendous source for gaining insights and computing forecasts; on the other hand, it requires adequate modeling of the temporal dimensions to avoid erroneous inferences (see, e.g., [2, 39, 44]).

Data models typically can be seen as sequences of snapshots, as a series of static views on the universe of discourse. Process models, on the other hand, focus on the changes over time, the dynamics of the universe of discourse and as such naturally feature a temporal dimension (e.g., [7, 12, 13, 17, 18, 22, 33]). In process mining, these views are combined to derive process models and other useful information about the dynamics of organizations from the data collected and represented with a necessary temporal dimension [35, 42].

Even data models that explicitly feature time, as do most multidimensional data models which allow storing time series of measures, frequently fall short of representing the evolution of the other data in relation to the measurements. We show how temporal data warehouses [20] address this problem. From a more general view, we have to deal with the problem that not only the data changes over time, but also the data models themselves evolve. Multi-version databases [26] strive to address this problem of schema evolution.

The main focus of this paper is on temporal integrity constraints, which allow the representation of constraints between time points of the data model. A particular difficulty with temporal constraints is checking whether there are inconsistencies, i.e., whether the constraints are in conflict. As we will show, this is more difficult than checking other constraints, as satisfiability is not a sufficient criterion for guaranteeing that violations of the constraints can be avoided. Introducing the notion of controllability, we propose adequate procedures to check whether a set of constraints is in conflict.

The aim of this paper is to provide a broad overview of the issues of including time as an explicit, special dimension in data models, to discuss some of the problems involved, and to review some established and some quite recent techniques for dealing with these problems. As a focal point, we see in the treatment of time in data models the inherent difficulties and opportunities of integrating static and dynamic perceptions of a universe of discourse into a holistic view.

2 Representing Time

How is time measured? Do we measure time equally anywhere on Earth? How can we represent time and how can we reason about it? Let us start this discussion at the very beginning.

When we talk about time, we usually talk about discrete time measured in chronons, representing time spans of different granularity such as seconds, minutes, hours, days, and so forth. In order to allow a reasonably large group of people to relate to a common time, we use calendars that are relative to some periodic event. For example, the solar calendar measures time relative to Earth's rotation around the sun, defining a year to consist of approximately 365 days. However, there is more than one calendar system (e.g., lunar, lunisolar, Gregorian, ...), bringing us back to one of the initial questions: “Do we measure time equally anywhere on Earth?”. Although there have been several proposals for a universal calendar, the simple answer to this question is no, and this fact has to be considered when engineering time in data models.

When we reason about time, we usually refer to events that either happen at a point in time or within an interval. For example, “John finished his presentation at 16:00” refers to an instantaneous event at a specific point in time, while “John was giving an hour-long presentation” refers to an interval specified by a pair of time points (the start and end of his presentation). Sometimes we use temporal relations, describing the time at which an event happens relative to some other event. For example, “John started his presentation after Jane finished hers”. This gives us, on the one hand, more flexibility for describing points in time and, on the other hand, the tools to formulate complex temporal propositions.

From a technical perspective, we encode time as natural numbers forming a time axis starting at time point zero. The time at which an event happens can be defined as a time point positioned on this time axis. This position can be either absolute with respect to zero, or relative to some other time point, defined by some temporal relation. As an example, let us assume that there are two time points A and B. Time point A happens at 16 (absolute) and B happens at \(A+2\). We can see that there is a temporal relation between A and B stating that B happens 2 time units after A. Similar to this after relation, Allen’s Interval Algebra [1] defines a total of 13 possible temporal relations (i.e., before, during, overlaps, etc.).
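Allen's 13 relations can be determined directly from the interval endpoints. The following sketch is our own illustration (not code from the cited work [1]), assuming intervals are pairs (start, end) of time points with start < end:

```python
# Illustrative sketch of Allen's 13 interval relations, computed from
# interval endpoints. An interval is a pair (start, end) with start < end.

def allen_relation(x, y):
    """Return the name of the Allen relation holding between intervals x and y."""
    xs, xe = x
    ys, ye = y
    if xe < ys: return "before"
    if xs > ye: return "after"
    if xe == ys: return "meets"
    if xs == ye: return "met-by"
    if xs == ys and xe == ye: return "equal"
    if xs == ys: return "starts" if xe < ye else "started-by"
    if xe == ye: return "finishes" if xs > ys else "finished-by"
    if ys < xs and xe < ye: return "during"
    if xs < ys and ye < xe: return "contains"
    # remaining cases: proper overlap
    return "overlaps" if xs < ys else "overlapped-by"
```

For instance, `allen_relation((1, 3), (3, 5))` yields `"meets"`: the first interval ends exactly where the second one begins.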

Given the positions of two arbitrary time points on a time axis, we may also compute their (temporal) distance, in other words, the duration or interval between them. In the above example, it is easy to see that the duration between A and B is 2. We can also use these definitions to formulate constraints. For example, we could require that an additional time point C has to occur somewhere between A and B, expressed by the simple inequality \(A < C < B\). In Sect. 6, we give a more detailed presentation of temporal constraints in data models.
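The example above can be sketched in a few lines of code (our illustration, assuming discrete time encoded as integers): A is placed absolutely, B relatively to A, and the constraint on C is a simple inequality check.

```python
# Time points on a discrete time axis starting at zero.
A = 16          # absolute position on the time axis
B = A + 2       # relative position: B happens 2 time units after A

def duration(p, q):
    """Temporal distance between two time points."""
    return abs(q - p)

def between(c, lo, hi):
    """Check the constraint lo < c < hi for time point c."""
    return lo < c < hi

assert duration(A, B) == 2     # the distance between A and B
assert between(17, A, B)       # C = 17 satisfies A < C < B
assert not between(16, A, B)   # C = A violates the strict lower bound
```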

With this representation of time, i.e., the provision of abstract data types for time (or, correspondingly, for dates), we can analyze in which ways time is included in data models. In the next section, we discuss the difference between data models which represent the universe of discourse at a particular point in time and data models which also include past (and possibly future) states.

3 Snapshots vs. Historical Data Models

Several applications require handling information that varies over time [31]. Most notable examples include medical record keeping, bank account management, and stock management. In the context of stock management in the retail industry, for instance, the availability of goods is subject to high variability over time; similarly, prices of goods may change in relation to demand. For such applications, the conceptual design of a data model is often a necessary step preceding the implementation, e.g., into a database system.

For a data model which has to adequately represent such time-varying information, it is essential to support semantics for evolving entities and relationships between entities. A fundamental distinction between data models is hence based on the existence of a notion for changes over time. The lack of such a notion characterizes snapshot data models; on the contrary, its awareness and the existence of modeling constructs for it enable historical data models.

In a snapshot data model, the represented information corresponds to a timeless state of things in the real world. There is no notion of time, hence any change to data in an instance leads to the loss of past information, which may be undesirable in several applications.

In a historical data model, the history of past states is kept, forming a sequence of snapshots. This requires adequate conceptual modeling, in particular for ensuring that, in and across snapshots, relations between entities remain consistent with the real world despite data changes.

For example, at a given point in time, a person can be married to at most one other person (a 1:1 relationship between instances in a data model). If two people divorce, the marriage relationship is deleted in the current data model state. If a divorced person marries another person, a new relationship is introduced in the data model. In a historical data model, this situation requires adequately modeling the 1:1 marriage relationship as an n:m relationship enriched with temporal information, in order to record all past relationships in a consistent manner (i.e., with no concurrent marriages).
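The marriage example can be sketched as follows (a hypothetical relation with made-up names, not a schema from the cited literature): each tuple carries a validity interval, and the integrity constraint "no concurrent marriages" becomes a check that, per person, validity intervals do not overlap.

```python
# Historical n:m marriage relation with validity intervals.
# (person_a, person_b, valid_from, valid_to); None = still valid.
marriages = [
    ("Ann", "Bob", 1990, 1999),
    ("Ann", "Carl", 2001, None),
    ("Bob", "Dora", 1998, None),   # overlaps Ann-Bob: inconsistent
]

def overlaps(s1, e1, s2, e2):
    """Do two validity intervals intersect? Open ends count as infinity."""
    e1 = e1 if e1 is not None else float("inf")
    e2 = e2 if e2 is not None else float("inf")
    return s1 < e2 and s2 < e1

def concurrent_marriages(rows):
    """Return pairs of tuples in which the same person is married twice at once."""
    bad = []
    for i, (a1, b1, s1, e1) in enumerate(rows):
        for a2, b2, s2, e2 in rows[i + 1:]:
            if {a1, b1} & {a2, b2} and overlaps(s1, e1, s2, e2):
                bad.append(((a1, b1, s1, e1), (a2, b2, s2, e2)))
    return bad

violations = concurrent_marriages(marriages)  # the Ann-Bob / Bob-Dora conflict
```

In a snapshot model the same constraint is a simple cardinality restriction; only in the historical model does it become an interval-overlap condition.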

Reaching a conceptual model that is capable of capturing the temporal relations of instances is indeed an interesting problem. Like other models, the traditional, widely adopted Entity-Relationship (ER) model lacks expressiveness for time-varying information. Several works have proposed temporal extensions to the ER model over the years: we refer the reader to [24] for an overview.

A possible solution allowing the modeling of time-varying information in an (extended) ER model is proposed in [6]. In their work, the authors propose an extension to the ER model that features the representation of advanced requirements on evolving data. In particular, the proposed model enables expressing temporal semantics for entity attributes, relationships, keys, and inheritance. A further advantage of the contribution is the support for the widely adopted relational model for the implementation of the data model.

4 Temporal Databases

Databases have been for decades the prime technology for implementing data models. As suggested in [40], a fundamental distinction can be drawn between database types based on their support for temporal data models. In particular, [40] identifies four types of databases: static, historical, static rollback, and temporal databases. We highlight their peculiarities here.

In a static database, the real world is represented through a snapshot at a particular point in time, which is considered to be the sole state of things over time. Of course, a snapshot may yield a state which does not necessarily reflect the current reality, as reality constantly mutates. Thus, as soon as any change in reality is recorded in such a static database, the past state, i.e., the information it yields about the real world, is discarded and lost forever. After a change is recorded, the new state is the one considered to correctly represent reality (currently and in the past as well as in the future), residing in the database timelessly.

To partially overcome the limitations of static databases, a historical database adds information about when the recorded information has a counterpart in the real world. This temporal information is known as valid time. Technically, valid time is implemented as an additional field of database tables and can be seen as a time axis associated with the data model, which thereby becomes three-dimensional. The limitation of the valid time model is that it does not track updates to a given tuple. Thus, the database stores information about a reality grounded in time (the valid time), which, however, has a timeless representation. Consequently, it is not possible to reconstruct the history of changes a tuple underwent.

The third type of database is the static rollback database, also known as a transaction time database. In a static rollback database, records are extended with temporal information about when the record was inserted, and possibly until when it is to be considered valid information. This temporal information is known as transaction time. Similar to valid time, transaction time is implemented as an additional field storing a time interval which records when a tuple was inserted and (potentially) when it was (logically) deleted. The limitation of this model is dual to that of the valid time model: the database stores information about a timeless reality, which, however, has a time-grounded representation (the transaction time), so it does not track when facts are to be considered to hold in the real world.

Combining the previous approaches, i.e., storing both valid time and transaction time, allows database managers to overcome the respective limitations of databases which use either valid time or transaction time only. Storing both times means associating tuples with two time axes, thus making the data model four-dimensional. The benefit of such an extension is that tuples can be seen as valid at any moment as of any moment in time, thus capturing retroactive and proactive changes. Databases handling both valid and transaction time are better known as bitemporal databases.
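The bitemporal idea can be sketched as follows (our illustration with made-up salary data, not an excerpt from any database system): each tuple carries a valid-time interval and a transaction-time interval, and a query fixes a point on each of the two time axes.

```python
# Bitemporal tuples: (fact, valid_from, valid_to, tx_from, tx_to).
# Valid time: when the fact held in the real world.
# Transaction time: when the database believed it. OPEN = "until further notice".
OPEN = float("inf")

salary = [
    ("salary=100", 2010, 2015, 2010, 2012),  # recorded in 2010, superseded in 2012
    ("salary=120", 2010, 2015, 2012, OPEN),  # retroactive correction made in 2012
]

def as_of(rows, valid_at, tx_at):
    """What did the database, as of tx_at, say held in reality at valid_at?"""
    return [f for (f, vf, vt, tf, tt) in rows
            if vf <= valid_at < vt and tf <= tx_at < tt]

# As of 2011 the database said the 2011 salary was 100; after the 2012
# retroactive correction it says the 2011 salary was 120.
```

Such "as of" queries over both axes are exactly what neither a pure valid-time nor a pure transaction-time database can answer.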

While temporal databases play a significant role in storing time-varying information, they are often insufficient for real-world problems that require analyzing historical data, possibly of high dimensionality and resulting from the aggregation of multiple data sources.

5 Temporal Multiversion Data Models

The multidimensional modeling of data warehouses [30] has typically cared about representing time, as the analysis of data over time was always one of the features promised by data warehouse technology. One of the dimensions of a multidimensional data model is typically time, mostly in the form of valid time or transaction time. The typical textbook examples show how facts like the number of sales can be analyzed along dimensions like geographic area, product hierarchy, and, in particular, time.

The time dimension allows representing the changes of facts indexed by the other dimensions over time, i.e., the representation of time series. The hierarchical organization of the time dimension offers the possibility of analyzing these time series on different levels of temporal granularity.

So one could, for example, ask queries like “which products have increased in sales in Europe over the last 2 years?”. OLAP operators help to analyze the data and obtain insights.

There is, however, a fundamental problem with this multidimensional approach, as the dimensions are considered orthogonal. Orthogonality with respect to the time dimension means time invariance; in other words, it is not possible to represent changes of the other dimensions' data and structure over time. Inadequate treatment of these temporal and evolution issues leads to wrong analyses of the available data [28] and, in consequence, to bad decisions.

For example, to answer a query like “what was the increase in the population of the European Union in the last 40 years?” one has to be aware of the following: First of all, the geopolitical entity “European Union” has only existed since 1993, succeeding the “European Community”, which itself was originally named “European Economic Community”. Furthermore, in the considered period (1981 to 2021), the European Union grew from 12 to 28 members and then shrank to 27 members in 2020, when the United Kingdom left the union. Finally, with the reunification of East and West Germany in 1990, one of the member countries underwent a massive internal reorganization. Comparing the numbers of 1990 and 1991, where the organization itself, i.e., the set of member states, did not change, may indicate a massive increase in inhabitants. In reality, the 1991 number also contains the 16.4 million people of former East Germany. So if querying the number of inhabitants from 1981 to 2021, how can the resulting numbers be compared?

Temporal data warehousing [9, 16, 20] took up this issue of so-called slowly changing dimensions [19] and developed a series of data models permitting both the multidimensional representation of facts changing over time and the representation of changing master data and dimension structure data. Time series analyses can then be made by selecting reference time points for the structures and by applying explicit mappings between the different versions of master and dimension data. A particular challenge for data warehousing and data mining of time series with changing schemas and master data is that the changes are frequently hidden and have to be detected when data is loaded into the data warehouse [15].

Not only data warehouses suffer from the problem of changing dimension data; in general, the schemas of databases evolve over time due to changing requirements, changes in the universe of discourse, technological progress, or changes in transaction profiles. The dominant way of dealing with these changes is to transform the already stored data to the new schema in a schema evolution process [10, 38]. However, there are applications where it is necessary to keep the old data in its original form and make it available for querying through the new schema. Databases with multi-version schemas have the ambition to support such applications [26, 27].

6 Temporal Integrity Constraints

Data in many different areas, such as health, business, law and research, often have to fulfill, besides functional requirements, temporal requirements [13]. Such temporal requirements may be expressed as either descriptive or prescriptive temporal constraints. Descriptive constraints originate from given temporal properties of the environment (e.g., the law) relevant for the correctness or integrity of the data; prescriptive constraints originate from desirable properties (e.g., business goals) within the context of its use.

Independently of their origin, temporal constraints require data or events to be valid only in a certain time interval. We can express such constraints either as absolute (i.e., bound to a specific date) or as temporal relations between time points, e.g., in the form of before or after some event such as a transaction. In general, temporal constraints can be expressed as lower-bound or upper-bound constraints, or as combinations of them, between two time points. A lower-bound constraint defines a minimum time span that needs to elapse between two time points, while an upper-bound constraint defines a maximum such time span. Lower-bound constraints are used in dynamic systems such as business processes to define precedence constraints, e.g., between actions, and thereby partial orderings. A pair of a lower- and an upper-bound constraint between two time points defines a duration constraint.

An example of a lower-bound constraint comes from the COVID-19 regulations in Austria, which require that a vaccination is only valid if at least 22 days have elapsed since the administration of the first dose of the vaccine. Another COVID-19 regulation in Austria states that a certificate of antibodies (as verification of recovery) is valid for at most three months after registration. The protection against an infection provided by the vaccination, however, can be seen as a duration, defined by the combination of a lower-bound constraint stating the minimum and an upper-bound constraint stating the maximum time of protection.
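These bound constraints can be encoded in a few lines (our own sketch with time points as day numbers, not an official rule engine; the 90-day value is a simplification of "three months"):

```python
def lower_bound_ok(t1, t2, min_days):
    """Lower-bound constraint: at least min_days must elapse between t1 and t2."""
    return t2 - t1 >= min_days

def upper_bound_ok(t1, t2, max_days):
    """Upper-bound constraint: at most max_days may elapse between t1 and t2."""
    return t2 - t1 <= max_days

def duration_ok(t1, t2, lo, hi):
    """Duration constraint: a lower- and an upper-bound constraint combined."""
    return lower_bound_ok(t1, t2, lo) and upper_bound_ok(t1, t2, hi)

first_dose, check_day = 0, 25
assert lower_bound_ok(first_dose, check_day, 22)    # 22-day rule already met

registration, today = 0, 100
assert not upper_bound_ok(registration, today, 90)  # certificate expired
```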

A crucial aspect in the design phase of time-constrained data models is verifying whether a set of temporal integrity constraints may be contradictory, meaning that one constraint cannot be satisfied without violating another. The absence of such contradictions is what we refer to as temporal correctness. Interestingly, the absence of contradictions in a set of temporal constraints is seen in the literature from different perspectives and according to different goals, which leads to a plethora of notions of temporal correctness [13]. Examples of such notions are the following:

  • Satisfiability, also called consistency: the most relaxed notion for temporal correctness. A set of temporal constraints is satisfiable (consistent) if there exists at least one configuration, or setting of the involved variables, which satisfies all constraints. Satisfiability, however, does not prohibit the existence of possibly invalid configurations for which some temporal constraint is violated.

    For example, a constraint “take this drug 2 days before you get an infection” would be satisfiable, as for each time point of an infection there would be a proper time point for the application of the drug. However, the earlier time point cannot be chosen in reaction to the later observation. Hence the constraint is satisfiable, but satisfiability does not guarantee that time failures, i.e., violations of temporal constraints, can be avoided.

  • Strong controllability: a set of temporal constraints is strongly controllable if for all possible configurations of the involved variables all constraints are satisfied. Strong controllability may, however, be too restrictive and not always achievable, since not all variables involved in temporal constraints may be controllable; some depend on external influences.

    The constraint in the example above, therefore, is satisfiable, but not strongly controllable.

  • Conditional controllability: this notion is meaningful only in the presence of conditional constraints, i.e., constraints which must hold only under the conditions attached to them. A set of temporal constraints is conditionally controllable if there exists a configuration of the involved variables satisfying all constraints for all possible observed values of the conditions.

  • Dynamic controllability: this notion is meaningful only in dynamic systems, in which a controller is able to dynamically assign the variables involved in temporal constraints. A set of temporal constraints is dynamically controllable if the controller can assign values in response to the past observations leading to the current state, so that all constraints can be fulfilled despite uncertainties about future events.
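The difference between the first two notions can be made concrete with a toy version of the drug/infection example above (our own illustration; the day ranges are made up): satisfiability quantifies existentially over the drug time for each infection time, while strong controllability demands one fixed drug time that works for all of them.

```python
# Constraint: the drug must be taken exactly 2 days before an infection
# whose time is uncontrollable (it may occur on any day in [10, 20]).
infection_times = range(10, 21)   # uncontrollable variable
drug_times = range(0, 30)         # controllable variable

def satisfiable():
    # For every possible infection time there EXISTS a fitting drug time.
    return all(any(drug == inf - 2 for drug in drug_times)
               for inf in infection_times)

def strongly_controllable():
    # One FIXED drug time must satisfy the constraint for ALL infection times.
    return any(all(drug == inf - 2 for inf in infection_times)
               for drug in drug_times)

assert satisfiable()                # a fitting drug time always exists...
assert not strongly_controllable()  # ...but none can be fixed in advance
```

The order of the quantifiers is the whole difference: swapping "for all" and "there exists" turns a satisfiable constraint set into an uncontrollable one.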

Fig. 1. Example of an STNU

How can we make sure that a data model meets a certain notion of temporal correctness such as the above? Depending on the context of the application, temporal aspects are encoded in models with different degrees of abstraction and expressiveness. For example, in the area of Business Process Management practitioners use process models and activity diagrams, while in the area of formal verification temporal logic is the most widely adopted formalism. Integrity constraints, too, can be of different complexity and of different types. Applications might state different requirements for the correctness of temporal models. And finally, models might be descriptive, representing some universe of discourse, or prescriptive, representing goals or obligations.

Depending on the temporal data model type, the specification formalism, and the considered temporal correctness notion, different techniques are available for the verification of such temporal correctness.

For static systems, for instance, satisfiability may be sufficient while dynamic controllability may not be needed. In such a case, any SAT solver may be adopted. For a dynamic system such as a business process, dynamic controllability may be the more suitable notion to adopt. In this case, verification approaches frequently found in the literature are based on transformations to various types of Simple Temporal Networks (STNs) [11, 29] or Timed Automata [46]. STNs are a formally grounded approach for representing temporal constraint satisfaction problems in the form of graphs, in which nodes encode time points and edges encode binary constraints between time points.
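The consistency of an STN can be checked with a standard shortest-path argument (a textbook technique, not code from the cited works): each constraint \(t_j - t_i \le w\) becomes a weighted edge, and the network is consistent exactly when the resulting graph contains no negative cycle, which Floyd-Warshall detects.

```python
# Minimal STN consistency check via Floyd-Warshall negative-cycle detection.

def stn_consistent(n, edges):
    """n time points 0..n-1; edges = [(i, j, w)] each meaning t_j - t_i <= w."""
    INF = float("inf")
    # distance matrix of the constraint graph
    d = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    for i, j, w in edges:
        d[i][j] = min(d[i][j], w)
    # all-pairs shortest paths
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    # a negative diagonal entry signals a negative cycle, i.e., a contradiction
    return all(d[i][i] >= 0 for i in range(n))

# B must occur between 2 and 5 time units after A: consistent.
assert stn_consistent(2, [(0, 1, 5), (1, 0, -2)])
# B at least 6 but at most 5 time units after A: contradictory.
assert not stn_consistent(2, [(0, 1, 5), (1, 0, -6)])
```

The same distance matrix also yields the tightest implied constraints between every pair of time points, which is the constraint propagation mentioned below.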

Considerable attention has been devoted to a particular STN type known as the STN with Uncertainty (STNU), due to its expressiveness for capturing real-world dynamic phenomena [43]. An example of an STNU is shown in Fig. 1.

An STNU includes semantics for uncontrollable assignments of time points through the inclusion of contingent time points and edges. A contingent edge between two time points A and C (with C contingent) is expressed as \((A, l, u, C)\), meaning that C occurs some uncontrollable time \(\varDelta \in [ l,u ]\) after A. Contingencies enable the representation of real-world situations characterized by uncontrollable, uncertain fragments. However, the presence of temporal uncertainty demands advanced techniques for checking the temporal correctness of an STNU.

Efficient algorithms have been proposed over the years, most of which are based on the notion of constraint propagation, i.e., the inference of implicit constraints [5]. Notably, these approaches have polynomial complexity, making the STNU a highly suitable formalism for representing a broad range of temporal models and verifying their temporal correctness.

This glimpse at checking the controllability of sets of temporal constraints should highlight that checking temporal aspects in data models requires methods beyond the widely applied model checking approaches. As we showed, satisfiability, which is typically considered in model checkers [4], is not sufficient for many applications, and stricter properties like controllability and dynamic controllability are necessary to ascertain that violations of temporal constraints are certainly avoidable.

7 Conclusions

Time is a peculiar phenomenon. Temporal aspects play a crucial role in many application domains of information technology. Capturing temporal aspects correctly and representing time properly in data models is indispensable for the development of information systems which yield accurate and useful answers to information needs. Hence, the representation of time and temporal aspects can be found in many technologies, from conceptual modeling, information systems design, and software development to data science. We gave an overview of the perception of time in different data models and discussed techniques to represent and deal with time, including the formulation of temporal constraints and the checking of fundamental properties of temporal models.