Detecting Causality in Non-stationary Time Series Using Partial Symbolic Transfer Entropy: Evidence in Financial Data

Papana, Angeliki; Kyrtsou, Catherine; Kugiumtzis, Dimitris; Diks, Cees

doi:10.1007/s10614-015-9491-x


Published: 03 February 2015

Volume 47, pages 341–365, (2016)

Computational Economics


Angeliki Papana¹ (ORCID: orcid.org/0000-0003-4472-8274),
Catherine Kyrtsou¹,²,³,⁴,
Dimitris Kugiumtzis⁵ &
Cees Diks⁶


Abstract

In this paper, a framework is developed for the identification of causal effects from non-stationary time series. Focusing on causality measures that make use of delay vectors from time series, the idea is to account for non-stationarity by considering the ranks of the components of the delay vectors rather than the components themselves. As an exemplary measure, we introduce the partial symbolic transfer entropy (PSTE), which is an extension of the bivariate symbolic transfer entropy quantifying only the direct causal effects among the variables of a multivariate system. Through Monte Carlo simulations it is shown that the PSTE is directly applicable to non-stationary in mean and variance time series and it is not affected by the existence of outliers and VAR filtering. For stationary time series, the PSTE is also compared to the linear conditional Granger causality index (CGCI). Finally, the causal effects among three financial variables are investigated. Computations of the PSTE and the CGCI on both the initial returns and the VAR filtered returns, and the PSTE on the original non-stationary time series, show consistency of the PSTE in estimating the causal effects.


1 Introduction

The investigation of interactions among the components of a multivariate system addresses three major issues: the detection of the couplings, their direction, and the quantification of the coupling strengths. When evaluating the causal influence between two variables from a multivariate time series, it is necessary to take the effects of the remaining variables into account. Multivariate analysis is required to distinguish between direct and indirect causal effects.

The concept of Granger causality is instrumental in the study of dynamic interactions in multivariate systems (Granger 1969). Linear Granger causality suggests that causes always precede their effects and it is implemented by fitting autoregressive models. However, the selected model should be appropriately matched to the underlying dynamics of the examined system, otherwise model misspecification may lead to spurious identification of causality.

Stationarity is not expected when examining real data possessing non-constant mean and variance. Preliminary data treatment (i.e. detrending, differencing, filtering) can be used to deal with non-stationarity, e.g. see Wei (2006) and Bossomaier et al. (2013).

In econometrics, causality in time series that are non-stationary in the mean is typically investigated through vector error correction models (VECM), and it is subdivided into short-run and long-run causality (Lee et al. 2002; Cheng et al. 2010). In this respect, cointegration between two variables implies the existence of long-run causality in at least one direction, and a cointegration test can be viewed as an indirect test of long-run dependence (Engle and Granger 1987). Tests for cointegration and causality are thus jointly applied to investigate long- and short-run relationships among variables. Regarding non-stationarity in variance, several methods have been proposed in the literature, e.g. model fitting allowing for a time-varying variance and heteroskedasticity tests (Xu and Phillips 2008; Kim and Park 2010), but we are not aware of any works treating the problem of causality and non-stationarity in variance jointly.

Most Granger causality measures are developed for stationary time series, e.g. conditional Granger causality (Geweke 1982), partial directed coherence (Baccala and Sameshima 2001), coarse-grained information rates (Paluš et al. 2001), extended Granger causality (Chen et al. 2004), and conditional mutual information (Vejmelka and Paluš 2008). Methods, such as transfer entropy (Schreiber 2000) from information theory and linear Granger causality, are theoretically invariant under a rather broad class of transformations (Barnett and Seth 2011). However, in practice, data transformations may have an impact on causal inference. Recently, many model-free causality measures have been developed to address nonlinear signal properties, as for example state space and information measures. On the other hand, these methods involve more free parameters and are more data demanding than linear model-based methods, such as linear Granger causality.

In financial applications, most causality tests are not applied to the raw data but to the (log) returns. One example is the modified test of nonlinear Granger causality introduced by Hiemstra and Jones (1994) and corrected by Diks and Panchenko (2006), which is usually applied to the vector autoregressive (VAR) filtered residuals. It is, however, reported that linear filtering of the data before the application of a causality test can lead to serious distortions, e.g. see Kyrtsou (2005) and Karagianni and Kyrtsou (2011). On the other hand, it is claimed that the estimation of information-theoretical quantities is typically improved by diminishing long-range second-order temporal structure using VAR filters, provided that the interactions between time series are not purely linear (Gomez-Herrero 2010). The influence of filtering on the different causality tests remains open for further investigation, but it is not within the scope of the present work.

The developments above highlight the importance of building causality tests that can detect causal effects directly in non-stationary time series. In this work, we propose a general framework to address non-stationarity when estimating causality, which encompasses all causality measures that involve delay vectors in their computation. Specifically, we suggest forming and using the rank vectors of the sample vectors reconstructed from the time series, instead of the delay vectors themselves.

The idea of using ranks instead of the values of a vector variable dates back to Spearman (1904) and Kendall (1938), who suggested it for estimating the statistical dependence between two variables. This idea has since been adopted for the estimation of correlation and causality measures. Along these lines, the symbolic transfer entropy (STE) (Staniek and Lehnertz 2008) and the generalized measure of association (Fadlallah et al. 2012) have been introduced.

To demonstrate the efficiency of the proposed framework based on rank vectors, we extend the bivariate information causality measure of STE (Staniek and Lehnertz 2008) to the multivariate case, called partial symbolic transfer entropy (PSTE), in order to account only for direct causal effects among the components of a complex system. The PSTE, as the STE, is estimated on rank vectors. It is evaluated on multivariate time series of known coupled and uncoupled systems, on stationary and non-stationary time series in mean and in variance, on time series with outliers, and on VAR filtered time series as well. Complementarily and for comparison reasons, the conditional Granger causality index (CGCI) is also considered.

Corrected versions of the STE and PSTE (namely TERV and PTERV) have recently been introduced in Kugiumtzis (2012, 2013), but here we consider the initial definition of STE, as used in different applications (Kowalski et al. 2010; Ku et al. 2011; Martini et al. 2011). To gain further insight into the performance of the suggested approach, besides an extensive simulation experiment, we look for causal relationships among three well-known financial time series, namely the 3-month Treasury Bill, the 10-year Treasury Bond and the volatility index VIX.

The structure of the paper is as follows. In Sect. 2, the multivariate causality measures of PSTE and conditional Granger causality index are presented, and their statistical significance is discussed. In Sect. 3, the two causality measures are evaluated in a simulation study, while their performance is also examined in three financial time series. Finally, conclusions are discussed in Sect. 4.

2 Materials and Methods

Let us consider the bivariate process $\{(x_{1,t},x_{2,t})\}$, i.e. two simultaneously observed time series $\{x_{1,t}\}$, $\{x_{2,t}\}$, $t=1,\ldots ,n$, derived from the dynamical systems $X_1$ and $X_2$, respectively. The delay vectors for $X_1$ and $X_2$ are defined as $\mathbf x _{1,t} = (x_{1,t}, x_{1,t-\tau _1},\ldots , x_{1,t-(m_1-1)\tau _1})'$ and $\mathbf x _{2,t} = (x_{2,t}, x_{2,t-\tau _2},\ldots , x_{2,t-(m_2-1)\tau _2})'$, where $t=1,\ldots ,n^{\prime }$, $n^{\prime } = n-h-\max \{(m_1-1)\tau _1,(m_2-1)\tau _2\}$, $m_1$ and $m_2$ are the embedding dimensions, $\tau _1$ and $\tau _2$ are the time delays, and $h$ is the number of steps ahead at which the interaction is addressed. The rank vectors are formed by ordering the amplitude values of the delay vectors. For the delay vector $\mathbf x _{1,i}$, the $m_1$ amplitude values are arranged in ascending order so that $x_{1,i -(r_{i,1}-1)\tau _1} \le x_{1,i -(r_{i,2}-1)\tau _1} \le \ldots \le x_{1,i -(r_{i,m_1}-1)\tau _1}$, where $r_{i,j}$, $j=1,\ldots ,m_1$, are all different and $r_{i,j} \in \{1,\ldots ,m_1\}$. Therefore, every delay vector is uniquely mapped onto one of the $m_1!$ possible permutations. The rank vectors for $X_1$ are defined as $\hat{\mathbf{x }}_{1,i} = (r_{i,1}, r_{i,2},\ldots , r_{i,m_1})$, and accordingly for $\mathbf x _{2,i}$. The advantage of using ranks is that vectors formed by time series segments at different levels of magnitude can be compared in terms of distance, and thus similar data patterns can be searched for regardless of their magnitude levels, accounting in this way for non-stationarity.
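The mapping from delay vectors to rank vectors can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `rank_vectors` is hypothetical, NumPy is assumed to be available, and ties between equal amplitudes are broken by component order, a convention that the text above does not specify.

```python
import numpy as np

def rank_vectors(x, m, tau=1):
    """Map each delay vector of x to its rank vector (entries in 1..m).

    Row k corresponds to time i = k + (m - 1) * tau (0-based) and to the
    delay vector (x[i], x[i - tau], ..., x[i - (m - 1) * tau]).
    """
    x = np.asarray(x, dtype=float)
    start = (m - 1) * tau
    if len(x) <= start:
        raise ValueError("time series too short for this m and tau")
    # Column j holds the component with lag j * tau.
    delay = np.column_stack(
        [x[start - j * tau : len(x) - j * tau] for j in range(m)]
    )
    # A stable argsort lists the component positions in ascending order of
    # amplitude; equal values keep their original order (an assumption).
    return np.argsort(delay, axis=1, kind="stable") + 1

x = np.array([4.0, 7.0, 9.0, 10.0, 6.0, 11.0, 3.0])
ranks = rank_vectors(x, m=3)
# A change of level and scale leaves every rank vector unchanged.
same = np.array_equal(ranks, rank_vectors(3.0 * x + 100.0, m=3))
print(ranks)
print(same)
```

Because only the ordering within each delay vector is used, any strictly increasing transformation of the series yields the same rank vectors, which is the property the framework relies on.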

To indicate the suitability of this approach for non-stationary time series, we take the example of a stationary time series $\{x_{t}\}$ and the same series with outliers added to it, denoted as $\{y_{t}\}$ (see Fig. 1). We also construct the time series $\{z_{t}\}$ by adding a linear trend to $\{x_{t}\}$: $z_t = x_t + 0.1t$ (Fig. 1c). Further, we consider the embedding dimension $m=4$ and the time delay $\tau =1$, and we highlight all the delay vectors with corresponding rank vector $\{2,1,4,3\}$. For $\{x_t\}$, we observe 8 delay vectors in total with corresponding rank vector $\{2,1,4,3\}$. In $\{y_t\}$ there are again 8 such delay vectors, all of which are at the same time points as in $\{x_t\}$, while in $\{z_t\}$ there are 6 such delay vectors in total, all of which are at the same time points as in $\{x_t\}$. We note that all the highlighted delay vectors have identical rank vectors ($\{2,1,4,3\}$), whereas the corresponding sample vectors (delay vectors) are not necessarily close.

Thus one can base the distance measure on the relative magnitude ordering and not on the sample values of the delay vectors of the time series. The estimation of the probability of occurrence of the rank vectors can be more robust than in the case of the delay vectors. The number of possible rank vectors is $m!=4!=24$, whereas a binning approach for the delay vectors with $b$ bins for each component gives $b^m$ possible vectors.

Therefore, measures that make use of embedding point distances, e.g. interdependence measures (Arnhold et al. 1999; Romano et al. 2007; Chicharro and Andrzejak 2009) (see Footnote 1) and information measures, can be modified to use ranks instead of samples. As an exemplary measure that uses rank vectors, we introduce here the PSTE.

2.1 Partial Symbolic Transfer Entropy

The transfer entropy (TE) is an information measure related to the concept of Granger causality, which has been utilized for the detection of the directional couplings and the asymmetry in the interaction of subsystems (Schreiber 2000). The TE and its multivariate extension, the partial transfer entropy (PTE), incorporate time dependence by relating previous values of two variables $X_1$ and $X_2$ in order to predict $X_1$ (or similarly $X_2$) $h$ steps ahead. The TE quantifies the deviation from the generalized Markov property, $p(x_{1,i+h}|\mathbf x _{1,i},\mathbf x _{2,i}) = p(x_{1,i+h}|\mathbf x _{1,i}) $, where $p$ denotes the transition probability density. If the generalized Markov property holds, then $X_2$ does not drive $X_1$. Different techniques have been proposed to estimate the TE and PTE from observed data, e.g. binning, kernel methods and nearest neighbor estimators (Cover and Thomas 1991; Silverman 1986; Kraskov et al. 2004).

The STE has been introduced to provide an alternative way of estimating the TE, i.e. in terms of rank vectors (Staniek and Lehnertz 2008). For each of $x_{1,t+h}$, $\mathbf x _{1,t}$ and $\mathbf x _{2,t}$, the rank vectors are first formed, denoted $\hat{\mathbf{x }}_{1,t+h}$, $\hat{\mathbf{x }}_{1,t}$ and $\hat{\mathbf{x }}_{2,t}$. Note that the scalar future response $x_{1,t+h}$ is treated as an embedding vector $\mathbf x _{1,t+h}$. Then the STE is expressed similarly to the TE as

$$\begin{aligned} \text{ STE }_{X_2 \rightarrow X_1} = \sum p(\hat{\mathbf{x }}_{1,t+h},\hat{\mathbf{x }}_{1,t},\hat{\mathbf{x }}_{2,t}) \log \frac{p(\hat{\mathbf{x }}_{1,t+h}|\hat{\mathbf{x }}_{1,t}, \hat{\mathbf{x }}_{2,t})}{p(\hat{\mathbf{x }}_{1,t+h}|\hat{\mathbf{x }}_{1,t})}, \end{aligned}$$

(1)

where $p(\hat{\mathbf{x }}_{1,t+h},\hat{\mathbf{x }}_{1,t},\hat{\mathbf{x }}_{2,t})$, $p(\hat{\mathbf{x }}_{1,t+h}|\hat{\mathbf{x }}_{1,t},\hat{\mathbf{x }}_{2,t})$ and $p(\hat{\mathbf{x }}_{1,t+h}|\hat{\mathbf{x }}_{1,t})$ are the joint and conditional distributions estimated on the rank vectors as relative frequencies, respectively.

The PSTE is the extension of the STE that accounts only for direct causal effects in multivariate systems. It is defined conditioning on the set of the remaining variables $Z=\{X_3, X_4,\ldots ,X_K\}$ of a multivariate system of $K$ observed variables

$$\begin{aligned} \text{ PSTE }_{X_2 \rightarrow X_1|Z} = \sum p(\hat{\mathbf{x }}_{1,t+h},\hat{\mathbf{x }}_{1,t},\hat{\mathbf{x }}_{2,t}, \hat{\mathbf{z }}_t) \log \frac{p(\hat{\mathbf{x }}_{1,t+h}|\hat{\mathbf{x }}_{1,t}, \hat{\mathbf{x }}_{2,t},\hat{\mathbf{z }}_t)}{p(\hat{\mathbf{x }}_{1,t+h}|\hat{\mathbf{x }}_{1,t},\hat{\mathbf{z }}_t)}, \end{aligned}$$

(2)

where the rank vector $\hat{\mathbf{z }}_{t}$ is formulated as the concatenation of the rank vectors for each of the delay vectors of the variables in $Z$.
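A plug-in estimate of Eq. 2 from relative frequencies can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the names `rank_vectors` and `pste` are hypothetical, the natural logarithm is used (the text does not fix the base), the same $m$ and $\tau$ are applied to all variables, and ties are broken by component order. With an empty conditioning set the function reduces to the STE of Eq. 1.

```python
from collections import Counter
from math import log

import numpy as np

def rank_vectors(x, m, tau=1):
    """Rank vectors (entries in 1..m) of the delay vectors of x."""
    x = np.asarray(x, dtype=float)
    start = (m - 1) * tau
    delay = np.column_stack(
        [x[start - j * tau : len(x) - j * tau] for j in range(m)]
    )
    return np.argsort(delay, axis=1, kind="stable") + 1

def pste(x1, x2, z=(), m=2, tau=1, h=1):
    """Plug-in estimate of PSTE_{X2 -> X1 | Z}; z is a sequence of series."""
    r1 = rank_vectors(x1, m, tau)
    r2 = rank_vectors(x2, m, tau)
    n = len(r1) - h
    fut = [tuple(r) for r in r1[h:].tolist()]   # rank vector at time t + h
    a = [tuple(r) for r in r1[:n].tolist()]     # rank vector of X1 at t
    b = [tuple(r) for r in r2[:n].tolist()]     # rank vector of X2 at t
    if len(z) > 0:
        # Concatenate the rank vectors of all conditioning variables.
        rz = np.hstack([rank_vectors(zk, m, tau) for zk in z])
        c = [tuple(r) for r in rz[:n].tolist()]
    else:
        c = [()] * n
    joint = Counter(zip(fut, a, b, c))
    n_abc = Counter(zip(a, b, c))
    n_fac = Counter(zip(fut, a, c))
    n_ac = Counter(zip(a, c))
    total = 0.0
    for (f, ai, bi, ci), cnt in joint.items():
        p_full = cnt / n_abc[(ai, bi, ci)]             # p(f | a, b, z)
        p_red = n_fac[(f, ai, ci)] / n_ac[(ai, ci)]    # p(f | a, z)
        total += cnt / n * log(p_full / p_red)
    return total

# Toy example (illustrative data only): drv drives resp with lag one.
rng = np.random.default_rng(0)
n_obs = 2000
drv = rng.normal(size=n_obs)
resp = np.zeros(n_obs)
resp[1:] = 0.9 * drv[:-1] + 0.1 * rng.normal(size=n_obs - 1)
other = rng.normal(size=n_obs)
fwd = pste(resp, drv, z=[other])   # drv -> resp | other
bwd = pste(drv, resp, z=[other])   # resp -> drv | other
print(fwd, bwd)
```

Counting occurrences with `Counter` keeps the estimator close to the definition. The number of distinct joint symbols grows quickly with $m$ and with the number of conditioning variables, so small embedding dimensions are the practical choice for short series.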

The PSTE is a measure formed on nonparametric estimators from information theoretical arguments. Its definition is built on the probability distributions or equivalently on conditional entropies, and quantifies the reduction in conditional uncertainty of $\hat{\mathbf{x }}_{1,t+h}$ when the conditioning changes from $\hat{\mathbf{x }}_{1,t},\hat{\mathbf{z }}_{t}$ to $\hat{\mathbf{x }}_{2,t},\hat{\mathbf{x }}_{1,t},\hat{\mathbf{z }}_{t}$. Causality is defined in terms of predictive power using an information theoretical statistic rather than linear modeling tools, and thus it accounts for nonlinearity in the data. As with the PSTE, other causality measures computed from the delay vectors of the time series could also be estimated on the corresponding rank vectors.
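The equivalence with conditional entropies can be written out explicitly. Writing $H(\cdot \,|\, \cdot )$ for the Shannon conditional entropy of the rank vectors (a notation introduced here only for illustration), splitting the logarithm in Eq. 2 and marginalizing the second sum over $\hat{\mathbf{x }}_{2,t}$ gives

$$\begin{aligned} \text{ PSTE }_{X_2 \rightarrow X_1|Z}&= \sum p(\hat{\mathbf{x }}_{1,t+h},\hat{\mathbf{x }}_{1,t},\hat{\mathbf{x }}_{2,t}, \hat{\mathbf{z }}_t) \log p(\hat{\mathbf{x }}_{1,t+h}|\hat{\mathbf{x }}_{1,t}, \hat{\mathbf{x }}_{2,t},\hat{\mathbf{z }}_t) - \sum p(\hat{\mathbf{x }}_{1,t+h},\hat{\mathbf{x }}_{1,t},\hat{\mathbf{z }}_t) \log p(\hat{\mathbf{x }}_{1,t+h}|\hat{\mathbf{x }}_{1,t},\hat{\mathbf{z }}_t) \\&= H(\hat{\mathbf{x }}_{1,t+h}|\hat{\mathbf{x }}_{1,t},\hat{\mathbf{z }}_t) - H(\hat{\mathbf{x }}_{1,t+h}|\hat{\mathbf{x }}_{1,t}, \hat{\mathbf{x }}_{2,t},\hat{\mathbf{z }}_t). \end{aligned}$$

The measure is therefore zero exactly when $\hat{\mathbf{x }}_{2,t}$ leaves the conditional distribution of $\hat{\mathbf{x }}_{1,t+h}$ unchanged, and positive otherwise.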

2.2 Conditional Granger Causality Index

For comparison reasons, the Conditional Granger Causality Index (CGCI) is also considered in this study (Geweke 1982). To define CGCI from $X_2$ to $X_1$ for a multivariate time series of the variables $\{X_1,X_2,\ldots ,X_K\}$, two vector autoregressive models (VAR) are considered, the unrestricted model

$$\begin{aligned} x_{1,t+1} = \sum _{j=0}^{P-1} a_{1,j}x_{1,t-j}+ \sum _{j=0}^{P-1} a_{2,j}x_{2,t-j} + \sum _{i=3}^K \sum _{j=0}^{P-1} a_{i,j}x_{i,t-j} + \epsilon _{U,t+1}, \end{aligned}$$

(3)

and the restricted model

$$\begin{aligned} x_{1,t+1} = \sum _{j=0}^{P-1} a_{1,j}x_{1,t-j}+ \sum _{i=3}^K \sum _{j=0}^{P-1} a_{i,j}x_{i,t-j} + \epsilon _{R,t+1}, \end{aligned}$$

(4)

where $a_{i,j}$ are coefficients and $\epsilon _{U,t}$ and $\epsilon _{R,t}$ are residual terms. If the variance $s_{U}^2$ of the residuals of the unrestricted model in Eq. 3 for $X_1$ is statistically significantly less than the residual variance $s_{R}^2$ of the restricted model for $X_1$ in Eq. 4 that does not include $X_2$, then there is statistical evidence that the variable $X_2$ Granger causes $X_1$. The magnitude of the effect of $X_2$ on $X_1$ in the presence of the other variables is given by the CGCI defined as

$$\begin{aligned} \text{ CGCI }_{X_2 \rightarrow X_1|Z} = \ln \left( s_{R}^2 / s_{U}^2\right) . \end{aligned}$$

(5)

The CGCI is a causality measure able to detect the direct causal effects in multivariate systems with linear couplings.
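Eqs. 3 to 5 can be illustrated with an ordinary least squares fit. The sketch below is one possible reading, not the authors' implementation: the function name `cgci` is hypothetical, an intercept is added to both models (Eqs. 3 and 4 do not include one), and the residual variances are computed without a degrees-of-freedom correction.

```python
import numpy as np

def cgci(x, driver, response, P):
    """ln(s_R^2 / s_U^2) for driver -> response given all other columns.

    x has shape (n, K); driver and response are 0-based column indices.
    """
    x = np.asarray(x, dtype=float)
    n, K = x.shape
    y = x[P:, response]                      # targets x_{response, t+1}

    def residual_variance(cols):
        # Regressors x_{i, t-j}, j = 0..P-1, aligned with the targets.
        lags = [x[P - 1 - j : n - 1 - j, i] for i in cols for j in range(P)]
        A = np.column_stack([np.ones(len(y))] + lags)  # intercept: assumption
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        return np.var(y - A @ coef)

    unrestricted = list(range(K))
    restricted = [i for i in unrestricted if i != driver]
    return np.log(residual_variance(restricted) / residual_variance(unrestricted))

# Toy VAR(1) (illustrative data only): column 1 drives column 0.
rng = np.random.default_rng(1)
n_obs = 1000
noise = rng.normal(size=(n_obs, 3))
data = np.zeros((n_obs, 3))
for t in range(1, n_obs):
    data[t, 0] = 0.5 * data[t - 1, 0] + 0.4 * data[t - 1, 1] + noise[t, 0]
    data[t, 1] = 0.5 * data[t - 1, 1] + noise[t, 1]
    data[t, 2] = 0.5 * data[t - 1, 2] + noise[t, 2]
g_true = cgci(data, driver=1, response=0, P=2)
g_none = cgci(data, driver=0, response=1, P=2)
print(g_true, g_none)
```

Since the restricted model is nested in the unrestricted one, the in-sample index is never negative; whether a small positive value is significant is a separate question, addressed by the test described in Sect. 2.3.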

2.3 Statistical Significance of the PSTE and CGCI

Kugiumtzis (2013) discussed the parametric approximation of the distribution of the PSTE (and of the corrected version PTERV) under the null hypothesis $H_0$ of no coupling, but found it insufficient in general and always inferior to the approximation based on resampling. Therefore, the statistical significance of the PSTE is assessed by a randomization test making use of time-shifted surrogates (Quian Quiroga et al. 2002). The surrogate time series are formed by time-shifting the time series of the driving variable by a random time step, while the other time series remain intact. In this way, the driving and the response time series become independent of each other and the couplings are destroyed. To explain time-shifting further, we draw a random integer $d$ (with $d$ less than the time series length $n$), and the first $d$ values of the driving time series are moved to the end, so that the new driving series is $\{x_{d+1},\ldots ,x_{n},x_{1},\ldots ,x_d\}$.
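The time-shifting step can be sketched as a circular rotation of the driving series. The function names are hypothetical and NumPy is assumed; the shift $d$ is drawn uniformly from $1,\ldots ,n-1$, which is one way to satisfy the condition that $d$ is less than $n$.

```python
import numpy as np

def time_shift(x, d):
    """Move the first d values of x to the end: (x_{d+1},...,x_n,x_1,...,x_d)."""
    x = np.asarray(x)
    return np.concatenate([x[d:], x[:d]])

def random_time_shift(x, rng):
    """Time-shifted surrogate with a random shift d, 1 <= d < len(x)."""
    d = int(rng.integers(1, len(x)))   # upper bound is exclusive
    return time_shift(x, d)

example = time_shift(np.arange(1, 7), 2)
surrogate = random_time_shift(np.arange(10), np.random.default_rng(0))
print(example)
print(surrogate)
```

Only the driving series is rotated; the response and the conditioning series are passed to the causality measure unchanged, so their own dynamics are preserved in every surrogate.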

To test $H_0$, denote $q_0$ the PSTE value estimated from the original data and $q_1,\ldots ,q_M$ the PSTE values estimated from the $M$ surrogate multivariate time series. $H_0$ is rejected if $q_0$ lies at the tail of the distribution of $q_1,\ldots ,q_M$. The $p$-values for the two-sided test are derived by rank ordering. Letting the original value have rank $i$ in the ordered list of $M+1$ values, the $p$-value equals $2i/(M+1)$ if $i \le (M+1)/2$ and $2(M+1-i)/(M+1)$ if $i > (M+1)/2$ (the correction of the rank approximation of the cumulative density function in Yu and Huang (2001) is applied).
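The rank-ordering rule for the two-sided $p$-value can be written as a short function. This sketch implements only the formula as stated; the correction of Yu and Huang (2001) mentioned above is not implemented here, and counting only strictly smaller surrogate values when ranking is an assumption about tie handling. The function name is hypothetical.

```python
def two_sided_p_value(q0, surrogate_values):
    """Two-sided p-value of q0 from its rank among M surrogate values."""
    M = len(surrogate_values)
    # Rank of q0 (1-based) in the ascending list of all M + 1 values.
    i = 1 + sum(1 for q in surrogate_values if q < q0)
    if i <= (M + 1) / 2:
        return 2 * i / (M + 1)
    return 2 * (M + 1 - i) / (M + 1)

surrogates = list(range(1, 100))            # M = 99 illustrative values
p_low = two_sided_p_value(0.5, surrogates)  # original value is the smallest
p_mid = two_sided_p_value(50.5, surrogates) # original value is in the middle
print(p_low, p_mid)
```

Without a correction, an original value larger than all surrogates receives a $p$-value of exactly zero under this formula, which is presumably one reason a corrected rank approximation is applied in the paper.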

The statistical significance of the CGCI can be assessed by means of a parametric test, i.e. the $F$-test for the null hypothesis that the coefficients for the driving variable in the unrestricted model are zero (Brandt and Williams 2007). For example, applying the $F$-significance test for each of the $P$ coefficients $a_{2,j}$ in Eq. 3, constitutes the parametric significance test for CGCI to test the null hypothesis that variable $X_2$ is not driving $X_1$.

3 Results

The effectiveness of the PSTE in detecting direct nonlinear causal effects at different settings is assessed based on a simulation study. The PSTE and the CGCI are complementarily used, in order to determine both the linear and nonlinear couplings from the simulation systems. The two causality measures are estimated from 100 realizations of different simulation systems with linear and/or nonlinear couplings, for different coupling strengths and for all directions. However, the CGCI is only estimated on stationary data.

3.1 Simulation Study

The PSTE and CGCI are evaluated on multivariate time series from coupled and uncoupled systems of different types: stationary, non-stationary in mean and in variance, with outliers, and with linear and/or nonlinear causal effects. We also apply the PSTE to VAR filtered time series in order to assess its ability to capture remaining nonlinear couplings. Specifically, the following simulation systems are examined:

(1)
A stationary system in three variables with one linear coupling ($X_2 \rightarrow X_3$) and two nonlinear ones ($X_1 \rightarrow X_2$, $X_1 \rightarrow X_3$) (Gourévitch et al. 2006, Model 7) (see Fig. 2a)
$$\begin{aligned} x_{1,t}&= 3.4x_{1,t-1}(1-x_{1,t-1})^2 \exp {(-x_{1,t-1}^2)} + 0.4\epsilon _{1,t} \\ x_{2,t}&= 3.4x_{2,t-1}(1-x_{2,t-1})^2 \exp {(-x_{2,t-1}^2)} + 0.5x_{1,t-1}x_{2,t-1} +0.4\epsilon _{2,t} \\ x_{3,t}&= 3.4x_{3,t-1}(1-x_{3,t-1})^2 \exp {(-x_{3,t-1}^2)} + 0.3x_{2,t-1}+ 0.5x_{1,t-1}^2 + 0.4\epsilon _{3,t}, \end{aligned}$$
where $\epsilon _{i,t}$, $i=1,2,3$, are Gaussian white noise terms with unit covariance matrix.
(2)
A stationary system in three variables, with only nonlinear couplings ($X_1 \rightarrow X_2$, $X_1 \rightarrow X_3$) (see Fig. 2b)
$$\begin{aligned} x_{1,t}&= 0.7x_{1,t-1} + \epsilon _{1,t} \\ x_{2,t}&= 0.3x_{2,t-1} + 0.5x_{2,t-2} x_{1,t-1} + \epsilon _{2,t} \\ x_{3,t}&= 0.3x_{3,t-1} + 0.5x_{3,t-2} x_{1,t-1} + \epsilon _{3,t}. \end{aligned}$$
The model restricted to the two first variables was introduced in Baghli (2006). The term product of the variables in the second and third equation causes the variables $X_2$ and $X_3$ to have marginal distributions with long tails.
(3)
A stationary system of three coupled Hénon maps with nonlinear couplings ($X_1 \rightarrow X_2$, $X_2 \rightarrow X_3$) (see Fig. 2c)
$$\begin{aligned} x_{1,t}&= 1.4 - x_{1,t-1}^2 + 0.3x_{1,t-2} \\ x_{2,t}&= 1.4 - c x_{1,t-1} x_{2,t-1} - (1-c)x_{2,t-1}^2 + 0.3x_{2,t-2} \\ x_{3,t}&= 1.4 - c x_{2,t-1} x_{3,t-1} - (1-c)x_{3,t-1}^2 + 0.3x_{3,t-2}, \end{aligned}$$
with equal coupling strengths $c$ for $X_1 \rightarrow X_2$ and $X_2 \rightarrow X_3$, with $c = 0$, 0.05, 0.3, 0.5. The time series of this system become completely synchronized for coupling strengths $c \ge 0.7$.
(4)
A system of four coupled Hénon maps with nonlinear couplings (two unidirectional $X_1 \rightarrow X_2$, $X_4 \rightarrow X_3$ and a bidirectional coupling $X_2 \leftrightarrow X_3$) (see Fig. 2d), defined as
$$\begin{aligned} x_{i,t}&= 1.4 - x_{i,t-1}^2 + 0.3x_{i,t-2}, i=1,4 \\ x_{i,t}&= 1.4 - \left( 0.5c(x_{i-1,t-1}+x_{i+1,t-1})+(1-c)x_{i,t-1}\right) ^2 + 0.3x_{i,t-2}, i=2,3 \end{aligned}$$
for coupling strengths $c=0$ (uncoupled case), $c=0.2$ (weak coupling) and $c=0.4$ (strong coupling).
(5)
A stationary system with outliers, from the three coupled Hénon maps (system 3), where outliers drawn from the standard uniform distribution have been randomly added to each variable. The number of outliers constitutes $1~\%$ of the total number of data points.
(6)
A non-stationary system in level (mean), from the three coupled Hénon maps (system 3), where a stochastic trend $\eta _t = \eta _{t-1} + \epsilon _t$ is added to each variable; $\epsilon _t$ is Gaussian white noise with unit variance. The CGCI is estimated on the detrended time series.
(7)
A non-stationary system in level (mean), from the three coupled Hénon maps (system 3) where a deterministic trend $\eta _t = a \cdot t$ is added to each variable, and $a$ is a constant. The value of $a$ is randomly set for each realization of the system and normally distributed with mean $0.01$ and standard deviation $0.02$. The CGCI is estimated on the first differences of the data.
(8)
A system which is non-stationary in variance, resulting from the addition of an integrated generalized autoregressive conditional heteroskedasticity process of order (1,1), IGARCH (1,1), to system 2:
$$\begin{aligned} z_t&= \sigma _t \epsilon _t \\ \sigma _{t}^2&= \alpha _0 + \alpha _1 \epsilon _{t-1}^2 + \beta _1 \sigma _{t-1}^2, \end{aligned}$$
where $\epsilon _t$ is Gaussian white noise with unit variance, $\alpha _0 = 0.2$, $\alpha _1 = 0.9$ and $\beta _1 = 0.1$. The $z_{i,t}$ of IGARCH (1,1) is first multiplied by a factor $g$ and then added to each $x_i$, $i=1,2,3$ of system 2, so that the derived time series of $y_i$ is $y_{i,t}=x_{i,t} + g z_{i,t}$, $i=1,2,3$.
(9)
It is common practice in financial applications to estimate causality measures or apply causality tests to the VAR residuals of the data in order to specify the underlying nature of the couplings. However, the influence of the filtering on the different causality measures and tests has not been fully investigated so far. For this reason, we consider here the VAR filtered residuals of system 1. The order of the VAR filter is selected by Schwarz's Bayesian Information Criterion (BIC) (Schwartz 1978) for each realization.
(10)
Finally, we consider a VAR(3) process in three variables with linear causal effects $X_2 \rightarrow X_1$ and $X_3 \rightarrow X_1$, which is non-stationary in mean and has one co-integrating relationship between the variables (see Sharp (2010), Model 8, p.78):
$$\begin{aligned} x_{1,t}&= 0.4x_{1,t-1}+0.4x_{2,t-1}+0.5x_{3,t-1} \\&+\, 0.2x_{1,t-2}-0.2x_{2,t-2} \\&-\, 0.2x_{1,t-3}+0.15x_{2,t-3}+0.1x_{3,t-3}+\epsilon _{1,t} \\ x_{2,t}&= 0.6x_{2,t-1}+0.2x_{2,t-2} +0.2x_{2,t-3}+\epsilon _{2,t} \\ x_{3,t}&= 0.4x_{3,t-1}+0.3x_{3,t-2} +0.3x_{3,t-3}+\epsilon _{3,t}, \end{aligned}$$
where $\epsilon _{i,t}$, $i = 1,\ldots ,3$, are mutually independent Gaussian white noise processes with unit standard deviation. Further, in order to generate a system that is non-stationary both in mean and in variance, we add to this stochastic system an IGARCH(1,1) process multiplied by the factor $g=0.2$, as for System 8.
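As an illustration of how such realizations can be generated, the sketch below iterates the three coupled Hénon maps of system 3. The function name, the fixed initial conditions and the length of the discarded transient are assumptions made for this example; the paper does not state them.

```python
import numpy as np

def coupled_henon3(n, c, burn_in=1000):
    """Simulate n points of the three coupled Henon maps (system 3).

    Column 0 drives column 1 and column 1 drives column 2, both with
    coupling strength c.
    """
    total = n + burn_in
    x = np.zeros((total, 3))
    # Hypothetical initial conditions, chosen distinct so that the three
    # maps do not start on identical orbits.
    x[0] = [0.1, 0.2, 0.3]
    x[1] = [0.3, 0.1, 0.2]
    for t in range(2, total):
        x[t, 0] = 1.4 - x[t - 1, 0] ** 2 + 0.3 * x[t - 2, 0]
        x[t, 1] = (1.4 - c * x[t - 1, 0] * x[t - 1, 1]
                   - (1 - c) * x[t - 1, 1] ** 2 + 0.3 * x[t - 2, 1])
        x[t, 2] = (1.4 - c * x[t - 1, 1] * x[t - 1, 2]
                   - (1 - c) * x[t - 1, 2] ** 2 + 0.3 * x[t - 2, 2])
    return x[burn_in:]                      # drop the transient

series = coupled_henon3(n=512, c=0.3)
print(series.shape)
```

The other systems in the list follow the same pattern: iterate the stated recursion, discard a transient, and then add outliers, trends or the scaled IGARCH(1,1) component as described for each case.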

The time series lengths $n=512$ and 2048 are considered in the simulation study, to test the effectiveness of the measures on relatively small and large time series lengths. Larger time series lengths have not been considered due to the long calculation time that is required. For the PSTE, the time lag $\tau _i$ for all variables is set to $\tau =1$, as all the systems are discrete in time. The embedding dimension $m_i$ is identical for all variables (denoted as $m$) and for each system it is set according to its complexity. The number of time steps ahead $h$ equals 1, as in the original definition of transfer entropy (Schreiber 2000). For the estimation of the order $P$ of the VAR model used in CGCI, the Bayesian Information Criterion (BIC) (Schwartz 1978) is applied to model orders from 1 to 5 for all systems, taking into consideration that the true model order for each system lies within this range.

3.2 Results from Simulation Study

The performance of the PSTE and the CGCI is quantified by the percentage of statistically significant values in the 100 realizations for all the ordered couples of variables in the system, i.e. the percentage of rejections of the null hypothesis $H_0$ of no causal effects. For both measures, the causal effects are always regarded to be conditioned on the remaining variables. The true causal directions are appropriately highlighted in the respective Tables.

System 1 The optimal choice for the embedding dimension $m$ is 1, since the equations of system 1 are given only in terms of the first lag. By definition, however, we can only set $m \ge 2$ to estimate the PSTE. For $m=2$, the PSTE correctly detects the direct linear causal effect $X_2 \rightarrow X_3$ and, to a lesser extent, the nonlinear causal effect $X_1 \rightarrow X_2$. For these directions, the power of the test increases with $n$. Nevertheless, the PSTE fails to recognize the nonlinear causal effect $X_1 \rightarrow X_3$ (see Table 1). The percentages of significant PSTE values in the directions of no causal effects are low (between 1 and $8\,\%$). Its inability to detect the relationship $X_1 \rightarrow X_3$ is probably due to the fact that the effect of $X_2$ on $X_3$ is much larger than that of $X_1$ on $X_3$. The weak coupling of $X_1$ on $X_3$ might arise from the small values of the variable $X_1$, which become even smaller when squared ($x_1^2$ enters the equation for $x_3$).

Table 1 Percentage of statistically significant PSTE ($m=2$) and CGCI ($P=2$) values for the simulation system 1
