18.1 Introduction

The concept of Granger causality has been widely used to study the dynamic relationships between economic time series [4]. In practice, only a subset of the variables of the original multivariate system may be observed, and the omission of important variables can lead to spurious causalities between the observed variables. The problem of spurious causality therefore has to be addressed. Moreover, for a better understanding of the causal structure of a multivariate system, it is important to discriminate between direct and indirect causal effects.

Transfer entropy (TE) is an information theoretic measure that quantifies the statistical dependence of two variables (or subsystems) evolving in time. Although TE effectively detects causal relationships and asymmetry in the interaction of two variables, it does not distinguish between direct and indirect relationships in the presence of other variables. Partial transfer entropy (PTE) is an extension of TE that conditions on the ensemble of the remaining variables and can therefore detect direct causal effects [20]. As reported in [13], using the nearest neighbor estimate, PTE can effectively detect direct coupling even in moderately high dimensions. The corrected transfer entropy (CTE) was proposed as a correction to TE [12], aiming at reducing the estimation bias of TE. For its estimation, instead of making a formal surrogate data test, the surrogates were used within the estimation procedure of the measure, and the CTE was estimated based on correlation sums.

We introduce here the corrected partial transfer entropy (CPTE), which combines PTE and CTE: it carries the bias reduction of CTE over to PTE, so that the measure goes to the zero level when there is no causal effect. Similarly to CTE, the surrogates are used within the estimation procedure of CPTE, instead of performing a significance test for PTE. Further, for the estimation of CPTE, the nearest neighbor estimate is implemented, since it has been shown to be robust to the time series length and to its free parameter (the number of neighbors) and efficient for high dimensional data (e.g., see [21]).

The paper is organized as follows. In Sect. 18.2, the information causality measures, transfer entropy and partial transfer entropy are introduced and the suggested measure, corrected partial transfer entropy (CPTE) is presented. In Sect. 18.3, CPTE is evaluated on a simulation study using coupled stochastic systems with linear and nonlinear causal effects. As an example of a real application, the direct causal effects among economic variables are investigated in Sect. 18.4. Finally, in Sect. 18.5, the results from the simulation study and the application are discussed, while the usefulness and the limitations of the nonparametric causality test are addressed.

18.2 Methodology

In this section, we introduce the information causality measures transfer entropy (TE) and partial transfer entropy (PTE), and define the corrected partial transfer entropy (CPTE), a measure able to detect direct causal effects in multivariate systems. Transfer entropy (TE) is a nonlinear measure that quantifies the amount of information explained in Y at h time steps ahead from the state of X, accounting for the concurrent state of Y [19]. Let \(x_{t}\), \(y_{t}\) be two time series and \(\mathbf{x}_{t} = (x_{t},x_{t-\tau },\ldots,x_{t-(m-1)\tau })^{{\prime}}\) and \(\mathbf{y}_{t} = (y_{t},y_{t-\tau },\ldots,y_{t-(m-1)\tau })^{{\prime}}\) the reconstructed vectors of the state space of each system, where τ is the delay time and m is the embedding dimension. TE from X to Y is defined as

$$\displaystyle\begin{array}{rcl} \text{TE}_{X\rightarrow Y }& =& -H(y_{t+h}\vert \mathbf{x}_{t},\mathbf{y}_{t}) + H(y_{t+h}\vert \mathbf{y}_{t}) \\ & =& -H(y_{t+h},\mathbf{x}_{t},\mathbf{y}_{t}) + H(\mathbf{x}_{t},\mathbf{y}_{t}) + H(y_{t+h},\mathbf{y}_{t}) - H(\mathbf{y}_{t}),{}\end{array}$$
(18.1)

where H(X) is the Shannon entropy of the variable X. For a discrete variable X, the Shannon entropy is defined as \(H(X) = -\sum p(x_{i})\log p(x_{i})\), where \(p(x_{i})\) is the probability mass function of the outcome \(x_{i}\), typically estimated by the relative frequency of \(x_{i}\). The partial transfer entropy (PTE) is the extension of TE that accounts for the causal effect on the response Y of the other observed variables of a multivariate system besides the driving X, denoted collectively as Z. PTE is defined as

$$\displaystyle{ \text{PTE}_{X\rightarrow Y \vert Z} = - H(y_{t+h}\vert \mathbf{x}_{t},\mathbf{y}_{t},\mathbf{z}_{t}) + H(y_{t+h}\vert \mathbf{y}_{t},\mathbf{z}_{t}), }$$
(18.2)

where \(\mathbf{z}_{t}\) is the stacked vector of the reconstructed points for the variables in Z.
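
As an illustration of Eq. (18.1), the following minimal Python sketch builds the delay vectors and evaluates TE from the four joint entropies, using the plug-in (relative frequency) entropy estimate on discretized data. The function names, the equiprobable binning, and the default of two bins are assumptions made for this example; they are not part of the study, which relies on a nearest neighbor estimate instead.

```python
import numpy as np

def embed(x, m, tau=1):
    """Delay vectors (x_t, x_{t-tau}, ..., x_{t-(m-1)tau}), one row per t."""
    x = np.asarray(x)
    start = (m - 1) * tau
    return np.column_stack([x[start - j * tau: len(x) - j * tau] for j in range(m)])

def entropy(*cols):
    """Plug-in Shannon entropy (in nats) of the joint variable formed by cols."""
    joint = np.column_stack(cols)
    _, counts = np.unique(joint, axis=0, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))

def transfer_entropy_binned(x, y, m=1, tau=1, h=1, bins=2):
    """TE from X to Y following the second line of Eq. (18.1).

    Assumption of this sketch: each series is discretized into `bins`
    equiprobable symbols before the entropies are estimated.
    """
    def symbolize(s):
        edges = np.quantile(s, np.linspace(0, 1, bins + 1)[1:-1])
        return np.digitize(s, edges)

    xs, ys = symbolize(np.asarray(x)), symbolize(np.asarray(y))
    start = (m - 1) * tau
    xt = embed(xs, m, tau)[:-h]          # x_t vectors that have a future y_{t+h}
    yt = embed(ys, m, tau)[:-h]          # y_t vectors
    y_future = ys[start + h:]            # y_{t+h}
    return (-entropy(y_future, xt, yt) + entropy(xt, yt)
            + entropy(y_future, yt) - entropy(yt))

# Usage: X drives Y with a delay of one time step.
rng = np.random.default_rng(0)
x = rng.normal(size=2000)
y = np.zeros_like(x)
y[1:] = x[:-1] + 0.1 * rng.normal(size=len(x) - 1)
te_xy = transfer_entropy_binned(x, y)    # driving direction
te_yx = transfer_entropy_binned(y, x)    # opposite direction
```

The same four-entropy decomposition extends to PTE in Eq. (18.2) by appending the conditioning vectors to every entropy term.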

The information measure PTE is more general than partial correlation since it is not restricted to linear inter-dependence and relates present and past (vectors \(\mathbf{x}_{t},\mathbf{y}_{t},\mathbf{z}_{t}\)) with future (\(y_{t+h}\)). Following the definition of Shannon entropy for discrete variables, one would discretize the data of X, Y, and Z first, but such a binning estimate is inappropriate for high dimensional variables (m > 1). Instead, we consider here the nearest neighbor estimate. The joint and marginal densities are approximated at each point using the k-nearest neighbors and their distances from the point (for details see [6]). The k-nearest neighbor estimate has been found to be very robust to the time series length, insensitive to its free parameter k, and particularly useful for high dimensional data [11, 21].
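
A nearest neighbor estimate of PTE can be sketched as follows, writing PTE as the conditional mutual information between \(y_{t+h}\) and \(\mathbf{x}_{t}\) given \((\mathbf{y}_{t},\mathbf{z}_{t})\). The sketch assumes a Kraskov-type estimator with the maximum norm and neighbor counts in the projected subspaces; the details of the implementation used in the study may differ. The function names `knn_cmi` and `pte_knn` are hypothetical, and the brute-force distance computation is only suitable for short series.

```python
import numpy as np

def _cheb_dist(a):
    """Pairwise maximum-norm distances between the rows of a."""
    return np.abs(a[:, None, :] - a[None, :, :]).max(axis=2)

def knn_cmi(a, b, c, k=10):
    """k-nearest neighbor estimate (nats) of I(A; B | C), assumed Kraskov-type."""
    a, b, c = (np.asarray(v, float).reshape(len(v), -1) for v in (a, b, c))
    n = len(a)
    d_c = _cheb_dist(c)
    d_ac = np.maximum(_cheb_dist(a), d_c)
    d_bc = np.maximum(_cheb_dist(b), d_c)
    d_joint = np.maximum(d_ac, d_bc)
    np.fill_diagonal(d_joint, np.inf)                      # exclude the point itself
    eps = np.partition(d_joint, k - 1, axis=1)[:, k - 1]   # distance to k-th neighbor
    # Neighbors strictly closer than eps in each subspace; the point itself
    # (distance 0) is counted by the comparison and removed by the "- 1".
    n_ac = (d_ac < eps[:, None]).sum(axis=1) - 1
    n_bc = (d_bc < eps[:, None]).sum(axis=1) - 1
    n_c = (d_c < eps[:, None]).sum(axis=1) - 1
    # For integer j, digamma(j) = H_{j-1} - gamma; the constant gamma cancels below.
    harmonic = np.concatenate([[0.0], np.cumsum(1.0 / np.arange(1, n + 1))])
    psi = lambda j: harmonic[j - 1]
    return psi(k) + np.mean(psi(n_c + 1) - psi(n_ac + 1) - psi(n_bc + 1))

def pte_knn(x, y, z, m=1, tau=1, h=1, k=10):
    """PTE from X to Y given Z, with z of shape (n,) or (n, number of variables)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    z = np.asarray(z, float).reshape(len(x), -1)
    idx = np.arange((m - 1) * tau, len(x) - h)             # times t with full history
    lags = np.arange(m) * tau
    emb = lambda s: s[idx[:, None] - lags[None, :]]        # delay vectors at times t
    zt = np.hstack([emb(z[:, j]) for j in range(z.shape[1])])
    cond = np.hstack([emb(y), zt])
    return knn_cmi(y[idx + h], emb(x), cond, k)

# Usage: X drives Y at lag one, Z is an unrelated third variable.
rng = np.random.default_rng(0)
n = 400
x, z = rng.normal(size=n), rng.normal(size=n)
y = np.zeros(n)
y[1:] = x[:-1] + 0.2 * rng.normal(size=n - 1)
pte_direct = pte_knn(x, y, z)   # X -> Y | Z
pte_null = pte_knn(y, x, z)     # Y -> X | Z
```

Because the estimate is not constrained to be nonnegative, values in uncoupled directions scatter around a small system-dependent level rather than being exactly zero.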

Asymptotic properties for TE and PTE are mainly known for their binning estimate, which stem from the asymptotic properties of the estimates of entropy and mutual information for discrete variables (e.g., see [5, 10, 17]). Thus parametric significance testing for TE and PTE is possible assuming the binning estimate, but it was found to be less accurate than resampling testing making use of appropriate surrogates [7]. The nearest neighbor estimates of TE and PTE do not have parametric approximate distributions, and we employ resampling techniques in this study.

Theoretically, both PTE and TE should be zero when there is no driving-response effect (X → Y). However, any entropy estimate gives positive TE and PTE at a level depending on the system, the embedding parameters, and the estimation method. We introduce the Corrected Partial Transfer Entropy (CPTE), designed to give zero values in case of no causal effects and positive values otherwise. In order to define \(\text{CPTE}_{X\rightarrow Y \vert Z}\), we compute M surrogate PTE values by randomizing the driving time series X using time shifted surrogates [15]. These M values form the null distribution of PTE for a significance test. We denote by \(q_{0}\) the PTE value on the original set of time series and by \(q(1-\alpha )\) the \((1-\alpha )\)-quantile of the M surrogate PTE values, where α corresponds to the significance level for a one-sided test. \(\text{CPTE}_{X\rightarrow Y \vert Z}\) is defined as follows:

$$\displaystyle{ \begin{array}{rclll} \text{CPTE}_{X\rightarrow Y \vert Z}& =&0, &\quad \text{if}&q_{0} < q(1-\alpha ) \\ & =&q_{0} - q(1-\alpha ),&\quad \text{if}&q_{0} \geq q(1-\alpha ) \end{array} }$$
(18.3)

In essence, we correct for the bias given by \(q(1-\alpha )\) and obtain either a positive value, if the null hypothesis of no direct causal effect is rejected, or a zero value, if PTE is found statistically insignificant.
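
The correction of Eq. (18.3) can be sketched in a few lines of Python. The sketch takes any PTE estimator as a function argument; the name `cpte`, the circular shifting of the driving series, and the minimum shift of one tenth of the series length are assumptions made for this example, and the toy statistic in the usage part is only a cheap stand-in for a PTE estimate.

```python
import numpy as np

def cpte(x, y, z, pte_func, n_surr=100, alpha=0.05, min_shift=None, seed=None):
    """Corrected PTE from X to Y given Z, following Eq. (18.3).

    pte_func(x, y, z) must return a PTE estimate. Surrogates are built by
    circularly shifting the driving series x by a random number of samples
    (assumed here to lie between min_shift and n - min_shift).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x)
    n = len(x)
    if min_shift is None:
        min_shift = n // 10                      # assumption of this sketch
    q0 = pte_func(x, y, z)                       # PTE on the original series
    shifts = rng.integers(min_shift, n - min_shift, size=n_surr)
    surrogates = np.array([pte_func(np.roll(x, s), y, z) for s in shifts])
    q_crit = np.quantile(surrogates, 1 - alpha)  # q(1 - alpha)
    return float(max(q0 - q_crit, 0.0))

# Usage with a toy stand-in statistic (squared lag-one cross-correlation),
# used only to keep the example fast; it is not a PTE estimator.
def toy_stat(x, y, z):
    return np.corrcoef(x[:-1], y[1:])[0, 1] ** 2

rng = np.random.default_rng(0)
n = 300
x, z = rng.normal(size=n), rng.normal(size=n)
y = np.zeros(n)
y[1:] = x[:-1] + 0.2 * rng.normal(size=n - 1)
corrected = cpte(x, y, z, toy_stat, seed=1)
```

Using the surrogates inside the measure means that a single number carries both the test decision (zero or positive) and the bias-corrected magnitude.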

18.3 Evaluation of CPTE on Simulated Systems

CPTE is evaluated by means of Monte Carlo simulations of different multivariate coupled stochastic systems with linear and nonlinear causal effects. In this section, we present the simulation systems we used and display the results from the simulation study.

18.3.1 Simulation Setup

CPTE is computed on 100 realizations of the following coupled systems, for all pairs of variables conditioned on the rest of the variables and for all directions.

  1. A VAR(1) model with three variables, where \(X_{1}\) drives \(X_{2}\) and \(X_{2}\) drives \(X_{3}\)

    $$\displaystyle\begin{array}{rcl} x_{1,t}& =& \theta _{t} {}\\ x_{2,t}& =& x_{1,t-1} +\eta _{t} {}\\ x_{3,t}& =& 0.5x_{3,t-1} + x_{2,t-1} +\epsilon _{t}, {}\\ \end{array}$$

    where \(\theta _{t}\), \(\eta _{t}\), \(\epsilon _{t}\) are Gaussian white noise processes with zero mean, diagonal covariance matrix, and standard deviations 1, 0.2, and 0.3, respectively.

  2. A VAR(5) model with four variables, where \(X_{1}\) drives \(X_{3}\), \(X_{2}\) drives \(X_{1}\), \(X_{2}\) drives \(X_{3}\), and \(X_{4}\) drives \(X_{2}\) [22, Eq. 12]

    $$\displaystyle\begin{array}{rcl} x_{1,t}& =& 0.8x_{1,t-1} + 0.65x_{2,t-4} +\epsilon _{1,t} {}\\ x_{2,t}& =& 0.6x_{2,t-1} + 0.6x_{4,t-5} +\epsilon _{2,t} {}\\ x_{3,t}& =& 0.5x_{3,t-3} - 0.6x_{1,t-1} + 0.4x_{2,t-4} +\epsilon _{3,t} {}\\ x_{4,t}& =& 1.2x_{4,t-1} - 0.7x_{4,t-2} +\epsilon _{4,t} {}\\ \end{array}$$
  3. A VAR(4) model with five variables, where \(X_{1}\) drives \(X_{2}\), \(X_{1}\) drives \(X_{4}\), \(X_{2}\) drives \(X_{4}\), \(X_{4}\) drives \(X_{5}\), \(X_{5}\) drives \(X_{1}\), \(X_{5}\) drives \(X_{2}\), and \(X_{5}\) drives \(X_{3}\) [18]

    $$\displaystyle\begin{array}{rcl} x_{1,t}& =& 0.4x_{1,t-1} - 0.5x_{1,t-2} + 0.4x_{5,t-1} +\epsilon _{1,t} {}\\ x_{2,t}& =& 0.4x_{2,t-1} - 0.3x_{1,t-4} + 0.4x_{5,t-2} +\epsilon _{2,t} {}\\ x_{3,t}& =& 0.5x_{3,t-1} - 0.7x_{3,t-2} - 0.3x_{5,t-3} +\epsilon _{3,t} {}\\ x_{4,t}& =& 0.8x_{4,t-3} + 0.4x_{1,t-2} + 0.3x_{2,t-3} +\epsilon _{4,t} {}\\ x_{5,t}& =& 0.7x_{5,t-1} - 0.5x_{5,t-2} - 0.4x_{4,t-1} +\epsilon _{5,t} {}\\ \end{array}$$
  4. A coupled system of three variables with linear and nonlinear causal effects, where \(X_{1}\) drives \(X_{2}\), \(X_{2}\) drives \(X_{3}\), and \(X_{1}\) drives \(X_{3}\) [3, Model 7]

    $$\displaystyle\begin{array}{rcl} x_{1,t}& =& 3.4x_{1,t-1}(1 - x_{1,t-1})^{2}\exp (-x_{1,t-1}^{2}) + 0.4\epsilon _{1,t} {}\\ x_{2,t}& =& 3.4x_{2,t-1}(1 - x_{2,t-1})^{2}\exp (-x_{2,t-1}^{2}) + 0.5x_{1,t-1}x_{2,t-1} + 0.4\epsilon _{2,t} {}\\ x_{3,t}& =& 3.4x_{3,t-1}(1 - x_{3,t-1})^{2}\exp (-x_{3,t-1}^{2}) + 0.3x_{2,t-1} + 0.5x_{1,t-1}^{2} + 0.4\epsilon _{3,t} {}\\ \end{array}$$

The first three simulation systems are stochastic systems with only linear causal effects, while the fourth one has both linear and nonlinear causal effects. For all simulation systems, the time step h for the estimation of CPTE is set either to one (as originally defined for TE in [19]) or to m. The embedding dimension m is adapted to the system complexity, the delay time τ is set to one, and we use α = 0.05. The number of neighbors k is set to 10; the choice of k has been found not to be crucial in the implementation of TE or PTE (e.g., see [6, 11, 13]). We consider the time series lengths n = 512 and 2,048 in order to examine the performance of the measure for both short and long time series.
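
To make the setup concrete, one realization of the first system (the VAR(1) chain) can be generated as in the following Python sketch. The function name, the burn-in of 100 discarded samples, and the zero initial state are assumptions of this example and are not specified in the text.

```python
import numpy as np

def simulate_system1(n, burn_in=100, seed=None):
    """One realization of system 1: X1 -> X2 -> X3, returned with shape (n, 3)."""
    rng = np.random.default_rng(seed)
    total = n + burn_in
    theta = rng.normal(0.0, 1.0, total)   # noise of x1, standard deviation 1
    eta = rng.normal(0.0, 0.2, total)     # noise of x2, standard deviation 0.2
    eps = rng.normal(0.0, 0.3, total)     # noise of x3, standard deviation 0.3
    x = np.zeros((total, 3))              # zero initial state (assumption)
    for t in range(1, total):
        x[t, 0] = theta[t]
        x[t, 1] = x[t - 1, 0] + eta[t]
        x[t, 2] = 0.5 * x[t - 1, 2] + x[t - 1, 1] + eps[t]
    return x[burn_in:]                    # drop the transient

# Usage: the two series lengths considered in the study.
short_series = simulate_system1(512, seed=0)
long_series = simulate_system1(2048, seed=0)
```

Repeating the call with 100 different seeds would give the set of realizations on which the rejection percentages are computed.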

18.3.2 Results from Simulation Study

In order to evaluate the performance of CPTE, we display the percentages of rejection of the null hypothesis of no causal effect from the 100 realizations of the coupled systems.

For the first simulation system, if we set h = 1 and m = 1, the percentages of statistically significant CPTE in the directions of the direct causal effects \(X_{1} \rightarrow X_{2}\) and \(X_{2} \rightarrow X_{3}\) are 100 %, while for the other directions, where there is no causal effect, the percentages vary from 2 % to 11 % (see Table 18.1). The choice h = 1 and m = 1 is well suited to this system, and only direct causal effects are found significant. For different h or m values, indirect effects are detected by CPTE. For example, if we set h = 1 and m = 2, the indirect causal effect \(X_{1} \rightarrow X_{3}\) is detected by CPTE. In this case, however, the effect is indeed direct if two time lags are considered, since the expression of \(x_{3,t}\) after substituting \(x_{2,t-1}\) becomes \(x_{3,t} = 0.5x_{3,t-1} + x_{1,t-2} +\epsilon _{t} +\eta _{t-1}\). The same holds for h = 2 and m = 1; here the direct causal effect \(X_{1} \rightarrow X_{2}\) cannot be detected, as the expression of \(x_{2,t}\) is \(x_{2,t} =\theta _{t-1} +\eta _{t}\), so that \(x_{2,t}\) does not depend on \(x_{1,t-2}\).

Table 18.1 Percentages of statistically significant CPTE for system 1, h = 1, m = 1

Concerning the second system, the largest lag in the equations is 5, and therefore, by setting h = 1 and m = 5, CPTE correctly detects the direct causal effects \(X_{1} \rightarrow X_{3}\), \(X_{2} \rightarrow X_{1}\), and \(X_{4} \rightarrow X_{2}\). For the true direct effect \(X_{2} \rightarrow X_{3}\), which is comparatively weak in the system, the percentages of significant CPTE values increase with n, indicating that longer time series are required to detect this interaction (see Table 18.2). By increasing h, indirect effects become statistically significant; e.g., for h = 5, CPTE again correctly detects all the direct interactions, even for short time series, but it also indicates the indirect driving of \(X_{4}\) on \(X_{1}\) (50 % for n = 512 and 100 % for n = 2,048) and of \(X_{4}\) on \(X_{3}\) (35 % for n = 512 and 74 % for n = 2,048).

Table 18.2 Percentage of statistically significant CPTE for system 2, h = 1, m = 5

The third simulation system has five variables and the largest lag is 4, so we set m = 4. For h = 1, CPTE correctly detects all the direct causal effects with a confidence increasing with n; e.g., the percentage of detection rises from 34 % for n = 512 to 96 % for n = 2,048 for the weakest direct causal effect \(X_{2} \rightarrow X_{4}\). However, for larger n, CPTE also indicates the indirect driving \(X_{5} \rightarrow X_{4}\) with percentage 52 % (see Table 18.3). For h = 4, the performance of CPTE worsens and it fails to detect some direct causal effects. For example, the percentages of significant CPTE values in the direction \(X_{1} \rightarrow X_{4}\) are 11 % and 24 % for n = 512 and 2,048, respectively. For other couplings, the improvement of the detection from n = 512 to n = 2,048 is larger: 17 % to 53 % for \(X_{2} \rightarrow X_{4}\), 18 % to 47 % for \(X_{5} \rightarrow X_{2}\), and 45 % to 98 % for \(X_{4} \rightarrow X_{5}\).

Table 18.3 Percentage of statistically significant CPTE for system 3, h = 1, m = 4

The last simulation system involves linear interactions (\(X_{2} \rightarrow X_{3}\)) and nonlinear interactions (\(X_{1} \rightarrow X_{2}\) and \(X_{1} \rightarrow X_{3}\)), all at lag one. For h = 1 and m = 2, CPTE correctly detects these causal effects for both short and long time series, while the percentage of detection remains low in the absence of coupling, as shown in Table 18.4. Again, if h is larger than 1, false detections are observed. However, increasing n enhances the performance of CPTE, and for h = 2 and n = 4,096 the percentages of significant CPTE for \(X_{1} \rightarrow X_{2}\), \(X_{2} \rightarrow X_{3}\), and \(X_{1} \rightarrow X_{3}\) are 97 %, 100 %, and 77 %, respectively. Therefore, the effect of the selection of the free parameters h and m on CPTE is larger for shorter time series.

Table 18.4 Percentages of statistically significant CPTE values for system 4, for h = 1, 2, m = 2, τ = 1, k = 10, and n = 512, 2,048, with each pair of variables conditioned on the third variable

18.4 Application on Economic Data

As a real application, we investigate the causal effects among economic time series. Specifically, the goal of this section is to investigate the impact of monetary policy on financial uncertainty and the long-term rate, taking the direct effects of this relationship into account. The data are daily measurements from 05/01/2007 up to 18/05/2012. They consist of the 3-month Treasury Bill returns as a monetary policy tool, denoted as \(X_{1}\), the 10-year Treasury Note to represent long-term behavior, denoted as \(X_{2}\), and the option-implied expected volatility on the S&P 500 returns index (VIX), denoted as \(X_{3}\), in order to take financial uncertainty into consideration.

In similar studies, instead of the 3-month TBill, changes in monetary policy are mirrored in the evolution of the Fed Funds rate, which is directly controlled by the Fed. However, as pointed out in [1, 8], the 3-month TBill rate can adequately reflect the Fed Funds movements.

An in-depth investigation of the interrelations among the three variables starts by estimating CPTE for all pairs of variables conditioned on the third variable. In order to remove any linear interdependence from the returns series, CPTE is applied to the VAR filtered variables. As shown in [2], information theoretic quantities, such as transfer entropy, perform better when VAR residuals are used. CPTE indicates the nonlinear driving of \(X_{1}\) on \(X_{2}\) (\(\mathrm{CPTE}_{X_{1}\rightarrow X_{2}} = 0.0024\)) for h = 1, m = 1, τ = 1, and k = 10. The “stability” of the results is expected to be lost as the embedding dimension m increases; indeed, CPTE for larger m values does not indicate any causal effect.
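
The VAR filtering step can be sketched with ordinary least squares as follows. The function name `var_residuals` and the VAR order `p` are assumptions of this example; the order used for the economic data is not stated in the text, so it is left as a parameter.

```python
import numpy as np

def var_residuals(data, p=1):
    """Residuals of a VAR(p) model with intercept, fitted by least squares.

    data has shape (n, d), one column per variable; the result has shape
    (n - p, d) and holds the part of each series that is not linearly
    explained by the p most recent values of all d series.
    """
    data = np.asarray(data, float)
    n = data.shape[0]
    lagged = [data[p - j: n - j] for j in range(1, p + 1)]   # lags 1, ..., p
    design = np.hstack([np.ones((n - p, 1))] + lagged)
    target = data[p:]
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return target - design @ coef

# Usage on synthetic data standing in for the three return series.
rng = np.random.default_rng(1)
n = 500
returns = np.zeros((n, 3))
for t in range(1, n):
    returns[t] = 0.5 * returns[t - 1] + rng.normal(size=3)
filtered = var_residuals(returns, p=2)
```

Any causal effect found on the filtered series then reflects dependence beyond what the linear VAR model already captures.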

In order to further analyze the directions of those causal effects, PTE values from the VAR filtered returns are also calculated. The statistical significance of PTE is assessed with a surrogate data test. The respective p-values of the two-sided surrogate test are obtained by means of time shifted surrogates. If the original PTE value lies in the tail of the empirical distribution of the surrogate PTE values, then the hypothesis of no causal effect is rejected. It is worth noting that the two-sided surrogate test for PTE indicates the same causal effects as CPTE, revealing the driving \(X_{1} \rightarrow X_{2}\) (p-value = 0.03). The PTE values for this direction of causality are much larger than those for the remaining relationships.

18.5 Conclusions

Corrected Partial Transfer Entropy (CPTE) is a nonparametric causality measure able to detect only the direct causal effects among the components (variables) of a multivariate system. CPTE is defined exploiting the concept of surrogate data in order to reduce the bias in Partial Transfer Entropy (PTE), giving zero values in case of no causal effects and otherwise positive values.

CPTE correctly detected the direct causal effects for all tested stochastic simulation systems, but only for a suitable selection of the free parameters. CPTE is sensitive to the selection of the free parameters h and m, especially for short time series. The selection of the step ahead h = 1 turns out to be more appropriate than h = m in all cases. A suitable selection of the free parameters seems to be crucial in most cases in order to avoid spurious detections of causal effects. The more complicated a system is, the longer the time series that are needed.

In the real application, CPTE indicated the direct driving of the 3-month TBill returns on the 10-year TNote returns, without, however, excluding the presence of indirect dependencies among these interest rate variables and the VIX. Determining the 3-month TBill as the “node” variable of our 3-dimensional system highlights the interest in examining its underlying dynamics jointly with the transmission mechanisms of monetary policy. Although the transfer entropy (TE) method has recently been applied to financial data, the partial transfer entropy is a relatively new technique in this field. TE is estimated on the returns of the economic variables (log-returns) and does not rely upon cointegration aspects (e.g., see [9, 14, 16]). Given the well-documented long-term comovement between the 3-month TBill and the 10-year TNote, the impact of non-stationarity on the performance of the above tests is an important issue meriting further investigation. This point reveals new insights about the informational content of Granger-causality type tests. The results from real data should be handled with care due to their high degree of sensitivity to the specific properties of the variables under study.