Keywords

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

1 Introduction

Statistical models for the analysis of highly complex data and processes have gained strong attraction within the last decade. Graphical models have been proven to be a sufficient approach of dealing with high dimensionality and uncertainty. A reduction of complexity is achieved by using factorizations of joint densities. Plotting the graph offers a simple and intuitive visualization of the underlying dependence structure. Besides, several statistical models can be seen as special cases of a general graphical model formalism (e.g. mixture models, factor analysis).

Most research on graphical models have focussed on cross-sectional data, henceforth referred to as static graphical models. As most prominent cases, this class includes directed acyclic graphs (also known as Bayesian networks) introduced by Pearl (1988) as well as undirected graphical models, which are also labelled as conditional independence graphs or Markov random fields (cf. Lauritzen 1996). Applications of static models can be found in different scientific fields including machine learning, decision modelling, artificial intelligence or image analysis. Markov random fields have played a prominent role in spatial econometrics and spatial statistics with regard to lattice data. Currently, a strong increase of interest in resp. usage of (social) network analysis emerged in economics as well as risk assessment or natural sciences. A profound treatment of static graphical models is presented in Lauritzen (1996), Cowell et al. (1999), Edwards (2000), Cox and Wermuth (1996), Pearl (1988), Spirtes (2000), Whittacker (2008) and Koller and Friedman (2010).

Recently, several extensions have been introduced aiming to model temporal dynamics and changing dependence structures including time series data as well as event histories. The objective of this paper is to review these extensions.

2 Graph Theoretic Preliminaries

Let \(\mathcal{G} = (\mathcal{V},\mathcal{E})\) denote a graph with \(\mathcal{V} = \left \{v_{1},\ldots,v_{k}\right \}\) as finite set of vertices and \(\mathcal{E}\subseteq \mathcal{V}\times \mathcal{V}\) as set of edges—joining the vertices where \(\mathcal{E}(\mathcal{G}) \cap \mathcal{V}(\mathcal{G}) =\emptyset\). Two vertices v i , v j are called adjacent if and only if \((v_{i},v_{j}) \in \mathcal{E}(\mathcal{G})\). Otherwise, v i , v j are non-adjacent. If an edge e i consists of a pair of identical nodes (v i , v i ) we refer to e i as a loop of \(\mathcal{G}\). Furthermore, if a distinct pair (v i , v j ) is joined by more than one edge, we label this multiple edges or parallel edges. In order to emphasize graphs containing multiple edges we term any such graph multigraph. Similarly, a graph is simple if it is not a multigraph. An undirected or unoriented edge exists if the pairs (v i , v j ) and (v j , v i ) are both in the edge set \(\mathcal{E}(\mathcal{G})\) given v i v j . We use \(v_{j} \sim v_{i}\) to indicate undirected edges. To specify adjacency in case of undirected edges we define the neighbourhood as \(\mathop{\mathrm{ne}}\nolimits \left (v_{j}\right ) = \{v_{i}: v_{j} \sim v_{i}\}\). If all edges in \(\mathcal{E}(\mathcal{G})\) are undirected \(\mathcal{G}\) is an undirected graph. In contrast, an edge is called directed or oriented if the ordered pair \((v_{i},v_{j}) \in \mathcal{E}(\mathcal{G}),\ v_{i}\neq v_{j}\). Thus, only \((v_{i},v_{j}) \in \mathcal{E}(\mathcal{G})\) while \((v_{j},v_{i})\notin \mathcal{E}(\mathcal{G})\). Directed edges are also referred to as arcs and we write v i ⟶ v j if there is a directed edge from v i to v j . Formally, for v i v j we define pa(v j ) = {v i : v i ⟶ v j } as the parents. In addition, if v i ⟶ v j we call {v j } children of v i and define ch(v i ) = {v j : v i ⟶ v j } (see Kolaczyk 2009; Bondy and Murty 2008). Additionally, let \(\mathop{\mathrm{co - pa}}\nolimits (v_{i}) = \{v_{j}:\mathop{ \mathrm{ch}}\nolimits (v_{j}) \cap \mathop{\mathrm{ch}}\nolimits (v_{i})\neq \emptyset \}\) be the co-parents of v i . Hence, if co − pa(v i ) = v j it follows that v i and v j share a common child whereas v i and v j have not necessarily to be joined by an edge. Consequently, a directed graph or digraph is a graph exclusively build on directed edges. A detailed discussion on digraphs is given in Bang-Jensen (2001). In addition, a multiple digraph is a multiple graph exclusively build on directed edges. Following Kolaczyk (2009), mutual edges are directed parallel edges with diametrically direction (e. g. if v i ⟶ v j and v j ⟶ v i are in \(\mathcal{E}(\mathcal{G})\)) whereas multi-arcs are multiple edges with identical orientation.

Let \(\left (v_{0},e_{1},v_{1},e_{2},\ldots,v_{k-1},e_{k},v_{k}\right )\) be a sequence of vertices and edges of \(\mathcal{G}\) with endpoints v 0 and v k such that \(\forall \ e_{i},1 \leq i \leq k\) the pair v i−1 and v i is joined by e i . We call this sequence w ik of potentially repeating pairs of vertices a walk of length k in \(\mathcal{G}\). If a walk passes through every node of a sequence exactly once we label this as a path. A path with identical endpoints is a cycle and a cycle of length one is a loop. Thus, a directed acyclic graph is a digraph without any directed cycles which is linked to undirected graphs by an operation termed moralization. In a first step an undirected edge is inserted joining every co-parents in \(\mathcal{G}\). Hereafter, every directed edge in \(\mathcal{G}\) is substituted by an undirected edge.

A graph build on directed as well as undirected edges is called a mixed graph. As a special case hereof, a chain graph is a simple mixed graph without any partially directed cycle where \(\mathcal{V}(\mathcal{G})\) is partitioned into k blocks \(\mathcal{B}_{k}\) such that \(\mathcal{V}(\mathcal{G}) = \mathcal{B}_{1} \cup \mathcal{B}_{2} \cup \ldots \cup \mathcal{B}_{k-1} \cup \mathcal{B}_{k}\) and

  1. 1.

    v i ⟶ v j if and only if \(v_{i} \in \mathcal{B}_{i}\) and \(v_{j} \in \mathcal{B}_{j},i < j\)

  2. 2.

    \(v_{i} \sim v_{j}\) if and only if \(v_{i} \in \mathcal{B}_{i}\) and \(v_{j} \in \mathcal{B}_{i}\).

Hence, undirected and directed acyclic graphs are special cases of chain graphs.

3 Causality in Graphical Models

Graphical models relate conditional independence statements among random variables of a multivariate probability distribution to graphs such that the nodes represent random variables and edges encode the dependence structure. Recently, these models have been extended to the time domain based on different definitions of causality. Eichler (2013), Didelez (2011) as well as Eichler and Didelez (2010) have discussed alternative formulations including Granger causality, Sims causality, local dependence and causality in terms of interventions. Hereof, Granger causality and local dependence have been most prominent with regard to graphical models. As shown in Florens and Fougere (1996) local dependence can be seen as a continuous time version of Granger causality.

Definition 1 (Granger Causality).

Let {X(t)} and {Y (t)} be stochastic processes on \(\left (\varOmega,\mathcal{F}, \mathbb{P}\right )\) where \(t \in T \subset \mathbb{Z}\). Given {Ω(t)} as all information in the universe \(\{X(t)\} \subset \{\varOmega (t)\}\) is causal with respect to {Y (t)} if the prediction is less precise based on {Ω(t)}∖{X(t)} (ct. Granger 1969; Lütkepohl 2005).

Different levels of Granger causal relations have been discussed in Florens and Fougere (1996). Obviously, conditional on all information in the universe seems impracticable and might be replaced by all available information with regard to a vector valued process.

Local dependence has been introduced by Schweder (1970) who focussed on transition intensities in discrete state-space Markov processes. As an extension hereof, Aalen (1987) considered non-causality in continuous time restricted to processes satisfying the Doob–Meyer decomposition.

Definition 2 (Local Independence).

Let \(\mathcal{F}_{t}\) denote the information which is available at time \(t \in T \subset \mathbb{R}_{+} = \left [0,\infty \right )\). Then, {Y (t)} is locally independent of {X(t)} given {Z(t)} if the compensator \(\varLambda ^{Y }(t)\) of {Y (t)} remains unchanged whether conditional on \(\mathcal{F}_{t-}^{XY Z}\) or conditional on \(\mathcal{F}_{t-}^{Y Z}\) (ct. Schweder 1970; Aalen 1987; Florens and Fougere 1996).

4 Graphical Modelling of Temporal Processes

Several graphical models have been developed to approach temporal stochastic processes either in the time as well as in the frequency domain. Misleadingly, different models have been named identical. Generally, two different classes can be differentiated with regard to the underlying definition of the nodes. Firstly, the vertex set can encode random variables at different times. Secondly, the components of a vector valued process can be represented by individual nodes which lead to a coarser modelling of the graphical structure.

4.1 Time Series Data

Most of the research regarding graphical modelling of time series data focussed on vector valued stationary processes in discrete time. Different approaches of graphical modelling towards autoregressive processes are summarized in Songsiri et al. (2010).

Static graphical models have been applied to time series data manifold. Chain graph models in which time slices are represented by blocks have been discussed in Lynggarrd and Walther (1993) and Dahlhaus and Eichler (2003). Furthermore, Queen and Smith (1992) and Anacleto and Queen (2013) introduced a dynamic chain graph model based on multivariate Bayesian dynamic models.

Extensions of directed acyclic graphs (so-called dynamic Bayesian networks) have extensively been treated by Murphy (2002). Defined as a sequence of directed acyclic graphs such that time is displayed in form of stacked time slices these models include hidden Markov models and Kalman filters besides other latent state-space models as special cases. An overview of different models belonging to this class of graphs is given in Barber and Cemgil (2010). Anacleto et al. (2013a,b) extended the multiregression dynamic model as introduced in Queen and Smith (1993) and used directed acyclic graphs to analyse multivariate time series from traffic flows. These graphs have been called Bayesian dynamic graphical models.

Several papers have focussed on structural vector autoregressive processes and discussed the sufficiency of learning structural constraints from static graphical models. Moneta (2008) proposed the usage of a structural learning algorithm with regard to directed acyclic graphs in order to obtain the contemporaneous dependence structure. Alternatively, Oxley et al. (2008), Meurk et al. (2007) as well as Penny and Reale (2004) presented a two-step estimation procedure based on a novel graph linkage called demoralization. Thereby, the optimal directed acyclic graph is chosen from a list of suitable graphs obtained from an undirected graphical model by the inverse moralization operation.

Additionally, sequences of undirected Gaussian graphical models arranged in independent and identically distributed blocks have been used by Talih and Hengartner (2005), Talih (2003) and Cai and Li (2012) to capture changing dependence structures of multivariate time series. Thus, a new block only emerges if a new edge is included or excluded into \(\mathcal{E}(\mathcal{G})\) such that consecutive blocks contain different graphs. Xuan and Murphy (2007) presented a similar approach. Gao and Tian (2010) proposed a mixed graphical model called latent ancestral graph to model latent variables in case of structural vector autoregressive processes.

A first approach focussing on graphical models related to components of vector valued stationary time series in discrete time has been presented by Brillinger (1996) in frequency domain. Dahlhaus (2000) introduced a refined version hereof called partial correlation graph based on partial spectral coherence between components of a multivariate time series. The resulting graph is a simple undirected graph in which the nodes correspond to the components of a vector valued process. Partial spectral coherence measures the dependence of two components after removing linear time invariant effect of the remaining series (cf. Brillinger 1981). Thus, two components are conditional orthogonal if the partial error processes are uncorrelated after filtration. This is equivalent to conditional independence only in cases of Gaussian time series. Efficiently, instead of computing the correlation between partial error processes conditional orthogonality can equivalently be achieved from the partial spectra coherence and similarly been read of zero entries of the inverse spectral matrix. Partial correlation graphs have been applied in various fields by Gather et al. (2002), Fried et al. (2004), Feiler et al. (2005) and Allali et al. (2008). Avventi et al. (2013) discussed the usage of these graphs in case of autoregressive moving average processes. Additionally, structural learning has been treated in Bach and Jordan (2004).

Eichler (19992012) has introduced a mixed graph called Granger causality graph since it encodes Granger causal as well as contemporaneous relationships between time series components. This graph has also been called dynamic chain graph by Murphy (2002). Corander and Villani (20032006) discussed Granger causal graphs from a Bayesian perspective. Additionally, Marttinen and Corander (2009) dealt with the task of Bayesian learning of such graphs. Application of Granger causal graphs are presented in Wild et al. (2010), Allali et al. (2011) and Arnold et al. (2007). Focussing on latent variables Eichler (2010) recently introduced extended Granger causality graphs related to autoregressive moving average processes which he called dynamic maximal anchestral graphs. In difference to Granger causality graphs these graphs consist of one additional edge type.

4.2 Event History Data

Based on the concept of local dependence as described in Sect. 3 Didelez (2000) introduced extensions of directed acyclic graphs with regard to counting processes which she termed local dependence graphs. These models are defined in case of marked point processes (Didelez 2008) as well as composable finite Markov processes (Didelez 2007). Similarly, Gottard (2007) presented so-called graphical duration models as extensions of chain graph models displaying marked point processes. Dreassi and Gottard (2007) dealt with Bayesian estimation of this duration models. This graphs might also allow to model frailty terms besides the hierarchical structures discussed in Gottard and Rampichini (2007). Additionally, Fosen et al. (2006) derived dynamic path analysis models based upon Aalens additive hazard model. A further discussion of these models is given in Aalen et al. (2008), whereas the large sample properties are derived in Martinussen (2010). Alternatively, dynamic Bayesian networks build on two time slices have been used to model duration data. Confusingly, Donat et al. (20082010) named these models also graphical duration models.

Graphical modelling of counting processes in the frequency domain has been described in Dahlhaus et al. (1997) and Eichler et al. (2003). In both papers partial correlation graphs are used to model intensity functions.

4.3 Beyond Discrete Time

Besides discrete time a limited number of papers have been published with regard to stochastic processes that evolve in continuous time. Shelton et al. (2010) and Nodelman et al. (20022003) introduced continuous time Bayesian networks (CTBN). Alternatively, El-Hay et al. (2006) presented an extended version of undirected graphical models which they called continuous time Markov network (CTMN).

5 Discussion

This review has shown the state of the art and the great variety of graphical models in the temporal domain. Mostly, they are closely related to classical graphical models and satisfy the traditional Markov properties. The main difference between the approaches arises with regard to the inherent definition of the nodes. Relating the vertices to the components of a multivariate process leads to a coarser modelling of the dependence structure. Oppositely, the dimension of the graphical model strongly increases in case of temporally separated nodes as \(dim(\mathcal{G}) = N \times T\). This might negatively impact the costs of estimation and computational efficiency, especially in case of high dimensional data structures evolving in time. Contrary, parameter as well as structural estimation can be built on already existing algorithms when applying static models to time series data.

Additionally, Granger causality graphs are strongly effected by the choice of the correct time intervals taking into account. Hence, larger intervals correspond to marginalization over time and create additional correlation. Nonetheless, the graphs are suitable models for structural learning in high dimensional time series and might easily be extended to more complex dimensions. Furthermore, extensions to non-linear relationships can be achieved conditioning on \(\sigma\)-algebras.