Abstract
Orthogonal decompositions are essential tools for the study of weakly stationary time series. Some examples are given by the classical Wold decomposition of Wold (A study in the analysis of stationary time series, Almqvist & Wiksells Boktryckeri, Uppsala, 1938) and the extended Wold decomposition of Ortu et al. (Quant Econ 11(1):203–230, 2020), which makes it possible to disentangle shocks with heterogeneous degrees of persistence from a given weakly stationary process. The analysis becomes more involved when dealing with vector processes because of the presence of different simultaneous shocks. In this paper, we recast the standard treatment of multivariate time series in terms of Hilbert A-modules (where matrices replace the field of scalars) and we prove the abstract Wold theorem for self-dual pre-Hilbert A-modules with an isometric operator. This theorem allows us to easily retrieve the multivariate classical Wold decomposition and the multivariate version of the extended Wold decomposition. The theory helps in handling matrix coefficients and computing orthogonal projections on closed submodules. The orthogonality notion is key to decomposing the given vector process into uncorrelated subseries, and it implies a variance decomposition.
1 Introduction
A vector random process \(\textbf{x}=\{ x_t\}_{t\in \mathbb {Z}}\) is a collection of m univariate time series \(x_{i,t}\). Although a weakly stationary univariate process generally depends on a single source of randomness (as ensured by the Wold decomposition of Wold 1938), each variable \(x_{i,t}\) of a weakly stationary multivariate process can be affected by m possibly different shocks \(\varepsilon _{1,t}, \varepsilon _{2,t}, \ldots , \varepsilon _{m,t}\). The representation of vector processes involves matrix coefficients, and their study has proved fruitful in numerous macroeconomic and financial applications (Lütkepohl 2005).
In this paper, we recast the standard treatment of multivariate time series in terms of Hilbert A-modules and prove the Abstract Wold Theorem for Hilbert A-modules (Theorem 1). Our abstract A-module framework features a notion of orthogonality that, by means of Theorem 1, makes it possible to retrieve two important orthogonal decompositions for weakly stationary vector processes, which we illustrate in Sects. 3 and 4, respectively. One is the celebrated multivariate classical Wold decomposition (MCWD, henceforth), summarized in Theorem 2. The other is the multivariate extended Wold decomposition (MEWD, henceforth), which constitutes the multivariate version of the extended Wold decomposition of Ortu et al. (2020a) and is used by Bandi et al. (2019, 2021) in financial economics settings. See Theorem 4.
Both the MCWD and the MEWD rely on orthogonal innovations. However, only in the MEWD are shocks associated with different degrees of persistence. In econometrics, persistence is usually addressed by spectral analysis techniques in the frequency domain. For instance, the cross-spectrum and the squared coherency may be used to quantify the linear association between the single time series in a vector process (Brockwell and Davis 2006, Section 11.6). By contrast, the MEWD makes it possible to disentangle uncorrelated persistent components from a weakly stationary vector process by using exclusively the time domain. Each vector component is associated with a specific persistence level (or time scale), and it is sensitive to a family of shocks with a precise duration. With respect to the univariate extended Wold decomposition of Ortu et al. (2020a), the illustration of the MEWD is richer because of the presence of different simultaneous shocks.
To derive the MCWD and the MEWD, we first embed multivariate time series in an abstract A-module framework from which orthogonal decompositions naturally arise. We use the algebra A of square matrices, define orthogonal projections on closed submodules and prove the Abstract Wold Theorem for Hilbert A-modules, which is key to decomposing vector processes into sums of uncorrelated components. We provide a self-contained compendium on Hilbert A-modules over a non-commutative and finite dimensional algebra (such as the one of matrices) in “Appendix A”. Indeed, the application of Hilbert A-modules in economic theory and statistics is not new: some examples are given by Hansen and Richard (1987), Gallant et al. (1990), Wiener and Masani (1957) and Cerreia-Vioglio et al. (2022).
To enter into the details of our construction, we consider the vector space of square-integrable m-dimensional random vectors and we substitute the field of scalars with the algebra of \(m\times m\) matrices, obtaining an A-module H. We, then, endow H with an A-valued inner product which generalizes the inner product of \(L^{2}\) and naturally conveys a notion of orthogonality. Such a structure is a Hilbert A-module. The properties of self-duality of H and complementability of closed submodules, which we discuss in Sect. 2.1, are crucial for the Abstract Wold Theorem for Hilbert A-modules (Theorem 1).
This theorem generalizes the Abstract Wold Theorem for Hilbert spaces (Sz-Nagy et al. 2010, Theorem 1.1), which makes it possible to orthogonally decompose a Hilbert space by means of an isometric operator. When the latter theorem is applied to the Hilbert space generated by the past realizations of a weakly stationary univariate time series, with the lag operator as isometry, one obtains the classical Wold decomposition (see, e.g., Wold 1938; Brockwell and Davis 2006, Section 5.7 or Severino 2016). The orthogonality induced by the theorem is reflected in the white noise property of the fundamental innovations. However, other choices of the isometry are possible. For instance, the univariate persistence-based decomposition of Ortu et al. (2020a) is obtained by using as isometry the scaling operator on the Hilbert space generated by the past fundamental innovations. In this decomposition, the orthogonality ensured by the Abstract Wold Theorem is reflected in the absence of correlation between persistent components.
The MEWD for a weakly stationary vector process \(x_t\) comes from the application of Theorem 1 to the Hilbert A-module generated by the multivariate fundamental innovations given by the MCWD of \(x_t\). The employed isometry is the scaling operator, adapted to A-modules. The A-module orthogonality given by the theorem guarantees that any entry of a multivariate persistent component at a given time scale is uncorrelated with any entry of any persistent component at a different scale (Theorem 4). This absence of correlation is crucial to associate each scale-specific response with the effect on the related time scale, without spurious correlation with the simultaneous effects at the other scales. In addition, the orthogonality of the components induces a variance decomposition that makes it possible to classify each entry of \(x_t\) as a short-, medium- or long-term process, according to the variance explained by the persistent components (Sect. 4.3).
The next section introduces the Hilbert A-module framework in which we embed weakly stationary vector processes and illustrates Theorem 1. Section 3 revisits the MCWD by emphasizing its connection with Theorem 1. Section 4 describes the MEWD and the persistence-based variance decomposition. As for the appendices, “Appendix A” provides a comprehensive and self-contained treatment of Hilbert A-modules and Theorem 1, “Appendix B” contains the proofs about the MEWD and “Appendix C” provides some illustrations of the MEWD, including an application to the bivariate macroeconomic model of Blanchard and Quah (1989).
2 Hilbert A-modules for multivariate time series
In the first subsection, we condense the notions of Hilbert A-module theory that lead to the Abstract Wold Theorem for Hilbert A-modules (Theorem 1). “Appendix A” contains all the details (see also Cerreia-Vioglio et al. 2017, 2019). After that, we describe the Hilbert A-module \(L^2(\mathbb {R}^m,\Omega , \mathcal {F}, \mathbb {P})\) and the submodules that we will need for the MCWD and the MEWD.
2.1 Hilbert A-modules
Hilbert A-modules are a generalization of Hilbert spaces in which the scalar field \(\mathbb {R}\) is replaced by an abstract algebra A. Although these structures have been studied in the mathematical literature (Kaplansky 1953), few works deal with the case of our interest, where the algebra is real, non-commutative and finite dimensional (Goldstine and Horwitz 1966). Unlike in Hilbert spaces, the Riesz representation theorem for linear functionals (Theorem 5.5 in Brezis 2011) is not always valid, and closed submodules are not always complemented. Moreover, the self-duality property that we discuss is key in deriving Theorem 1.
We consider a real normed operator algebra A (the algebra of square matrices) with norm \(\Vert \ \Vert _{A}\), an involution \(^{*}:A\rightarrow A\), an order \(\ge \) and a trace functional \(\bar{\varphi }:A\rightarrow \mathbb {R}\). Then, we consider an A-module H with outer product \(\cdot :A\times H\rightarrow H\) and we define an A-valued inner product \(\langle \, , \rangle _{H}:H\times H\rightarrow A\), which satisfies the A-valued versions of the usual properties of inner products. This makes H a pre-Hilbert A-module.
A-valued operators \(f: H \rightarrow A\) constitute the generalization of linear functionals, and the properties of A-linearity and boundedness can be defined accordingly. Importantly, H is self-dual when for each \(f: H \rightarrow A\) which is A-linear and bounded there exists \(y \in H\) such that \( f(x)=\langle x,y \rangle _H\) for all \(x \in H\). In other words, a version of the Riesz Theorem holds.
To put the theory to work, H can be endowed with two (real-valued) norms: \(\Vert \ \Vert _{H}\) and \(\Vert \ \Vert _{\bar{\varphi }}\). The first one is \(\Vert \ \Vert _{H}:H\rightarrow [ 0,+\infty )\) defined by
$$\begin{aligned} \Vert x\Vert _{H}=\Vert \langle x,x \rangle _{H}\Vert _{A}^{1/2} \qquad \forall x\in H. \end{aligned}$$
The second one is induced by the (real-valued) inner product \(\langle \, , \rangle _{\bar{\varphi }}:H\times H\rightarrow \mathbb {R}\) defined by \( \langle x,y \rangle _{\bar{\varphi }}=\bar{\varphi } ( \langle x,y \rangle _{H}) \ \ \forall x,y\in H\), which makes H a pre-Hilbert space. The norm is \(\Vert \ \Vert _{\bar{\varphi } }:H\rightarrow [0,+\infty )\) defined by
$$\begin{aligned} \Vert x\Vert _{\bar{\varphi }}=\langle x,x \rangle _{\bar{\varphi }}^{1/2}=\bar{\varphi } \left( \langle x,x \rangle _{H}\right) ^{1/2} \qquad \forall x\in H. \end{aligned}$$
Proposition 12 in “Appendix A.3.1” shows that, if A is finite dimensional, then A admits a trace \(\bar{\varphi }\) and the norms \(\Vert \ \Vert _{H}\) and \(\Vert \ \Vert _{\bar{\varphi }}\) are equivalent. In our construction, the use of \(\Vert \ \Vert _{\bar{\varphi }}\) is convenient to establish the convergence of sequences in H, while \(\langle \, , \rangle _{H}\) naturally induces the notion of orthogonality: \(x,y\in H\) are orthogonal when \(\langle x,y\rangle _{H}=0\), where 0 denotes the null element of A.
We say that H is a Hilbert A-module when it is \(\Vert \ \Vert _{H}\) complete. Theorem 15 in “Appendix A.5” establishes two characterizations of \(\Vert \ \Vert _{H}\) completeness when A is finite dimensional: H is \(\Vert \ \Vert _{H}\) complete if and only if it is \(\Vert \ \Vert _{\bar{\varphi }}\) complete if and only if it is self-dual. The link between completeness and self-duality is, then, established.
The last ingredient for Theorem 1 regards the orthogonal complement of a given submodule \(M\subseteq H\), i.e., \(M^{\bot }= \{ x\in H: \langle x,y \rangle _{H}=0\quad \forall y\in M \}\). By Proposition 17 in “Appendix A.5.1”, if A is finite dimensional and H is self-dual, M is \(\Vert \ \Vert _{H}\) closed if and only if \(H=M\oplus M^{\bot } \) (complementability). The projection map on closed submodules is, then, well-defined.
The decomposition in Theorem 1 is due to an isometry \(\textbf{T}:H\rightarrow H\), i.e., an A-linear operator such that \(\langle \textbf{T}x,\textbf{T}y \rangle _{H}= \langle x,y \rangle _{H}\) for all \(x,y\in H\). Moreover, the theorem prescribes the determination of a wandering submodule L such that \(\textbf{T}^{n} L \bot \textbf{T} ^{m} L\) for all \(m,n\in \mathbb {N}_{0}\) with \(m\not =n\). If H is self-dual, a wandering submodule is \(L=(\textbf{T}H) ^{\bot }\).
Theorem 1
(Abstract Wold Theorem for Hilbert A-modules) Let A be finite dimensional and H a pre-Hilbert A-module. If H is self-dual and \(\textbf{T}:H\rightarrow H\) is an isometry, then \(H=\widehat{H}\oplus \widetilde{H}\) where
$$\begin{aligned} \widehat{H}=\bigcap _{n=0}^{+\infty }\textbf{T}^{n}H \qquad \text{ and } \qquad \widetilde{H}=\bigoplus _{n=0}^{+\infty }\textbf{T}^{n}L, \qquad \text{ with } L=\left( \textbf{T}H\right) ^{\bot }. \end{aligned}$$
Moreover, \((\widehat{H},\widetilde{H})\) is the unique orthogonal decomposition of H into submodules such that \(\textbf{T} \widehat{H}=\widehat{H}\) and \(\widetilde{H}=\bigoplus _{n=0}^{\infty }\textbf{T}^{n}L\) for a wandering submodule L.
Proof
See “Appendix A.6”. \(\blacksquare \)
Theorem 1 provides a (generalized) version for Hilbert A-modules of the Abstract Wold Theorem for Hilbert spaces (Sz-Nagy et al. 2010, Theorem 1.1). The properties of self-duality and complementability play a crucial role in this wider setting.
2.2 The Hilbert A-module \(L^2(\mathbb {R}^m,\Omega , \mathcal {F}, \mathbb {P})\)
A probability space \((\Omega , \mathcal {F}, \mathbb {P})\) is given and, as usual, any two \(\mathcal {F}\)-measurable random vectors are defined to be equivalent when they coincide almost surely. We, then, consider the vector space \(L^2(\mathbb {R}^m,\Omega , \mathcal {F}, \mathbb {P})\) of (equivalence classes of) measurable square-integrable random vectors x that take values in \(\mathbb {R}^m\). We endow \(L^2(\mathbb {R}^m,\Omega , \mathcal {F}, \mathbb {P})\) with the structure of a pre-Hilbert A-module, and we denote it by H. The whole discussion is summarized in Table 1.
First of all, we consider the algebra \(A=\mathbb {R}^{m\times m}\) of real \( m\times m\) matrices. The unit in A is the identity matrix, and the product in A is the usual row-by-column product. A is normed by the operator norm \(\Vert \ \Vert _A\) such that, for any \(a=\{ a_{i,j}\}_{i,j=1,\ldots ,m}\) in A,
$$\begin{aligned} \Vert a\Vert _A=\sup _{v\in \mathbb {R}^m,\, v\ne 0}\frac{\Vert av\Vert _2}{\Vert v\Vert _2}, \end{aligned}$$
where \(\Vert \ \Vert _2\) is the \(L^2\) norm in \(\mathbb {R}^m\). In particular, \( \Vert a\Vert _A^2\) is the largest eigenvalue of the positive semidefinite matrix \( a^{\prime }a\) (Meyer 2000, Section 5.2), where \(a^{\prime } \) is the transpose of a. The map that associates any matrix a with \( a^{\prime }\) defines an involution in A. This map induces an order \(\ge \) such that, for any \(a,b \in A\), \(a \ge b\) when \(a-b\) is a symmetric and positive semidefinite matrix (equivalently, \(a-b \ge \textbf{0}\)).
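As a quick numerical illustration (not part of the formal development), the characterization of \(\Vert a\Vert _A^2\) as the largest eigenvalue of \(a^{\prime }a\) can be checked with NumPy; the matrix entries below are arbitrary illustrative values:

```python
import numpy as np

# Illustrative 2 x 2 matrix a (arbitrary entries).
a = np.array([[1.0, 2.0],
              [0.0, 3.0]])

# Operator norm ||a||_A = sup_{v != 0} ||a v||_2 / ||v||_2,
# i.e., the largest singular value of a.
op_norm = np.linalg.norm(a, ord=2)

# ||a||_A^2 equals the largest eigenvalue of the positive semidefinite a'a.
top_eig = np.linalg.eigvalsh(a.T @ a)[-1]
assert np.isclose(op_norm ** 2, top_eig)
```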
We use as outer product \(A \times H \rightarrow H\) the standard matrix-by-vector product. This operation makes H an A-module. Then, we define the A-valued inner product \(\langle \, , \rangle _H: H \times H \rightarrow A\) that associates any \(x=[x_1,\ldots ,x_m]^{\prime }, y=[y_1,\ldots ,y_m]^{\prime }\in H \) with the matrix
$$\begin{aligned} \langle x,y \rangle _H=\mathbb {E}\left[ xy^{\prime }\right] =\left\{ \mathbb {E}[x_i y_j]\right\} _{i,j=1,\ldots ,m}. \end{aligned}$$
\(\langle \, , \rangle _H\) satisfies the usual properties of inner products. In addition, if x has zero mean, \(\langle x,x\rangle _H\) is the covariance matrix of x. Importantly, two random vectors \(x,y \in H\) are orthogonal when \(\langle x, y \rangle _H=\textbf{0}\), that is \(\mathbb {E}[x_i y_j]=0\) for all \(i,j =1,\ldots ,m\). If x and y have zero mean, this means that any \(x_i\) is uncorrelated with any \(y_j\).
We now define the two equivalent (real-valued) norms \(\Vert \ \Vert _H\) and \(\Vert \ \Vert _{\bar{\varphi }}\). Regarding \(\Vert \ \Vert _H: H \rightarrow [0,+\infty )\), we have
$$\begin{aligned} \Vert x\Vert _H=\Vert \langle x,x \rangle _H\Vert _A^{1/2}=\Vert \mathbb {E}\left[ xx^{\prime }\right] \Vert _A^{1/2} \qquad \forall x\in H. \end{aligned}$$
If x has zero mean, \(\Vert x\Vert _H^2\) is the largest eigenvalue of the covariance matrix of x.
To construct \(\Vert \ \Vert _{\bar{\varphi }}\), we first consider the trace functional \(\bar{\varphi }: A \rightarrow \mathbb {R}\) defined, for any matrix a, by its trace \(\bar{\varphi } (a) = \textrm{Tr}(a)\). Then, H is a pre-Hilbert space with the inner product \(\langle \, , \rangle _{\bar{\varphi }}: H \times H \rightarrow \mathbb {R}\) defined by
$$\begin{aligned} \langle x,y \rangle _{\bar{\varphi }}=\textrm{Tr}\left( \mathbb {E}\left[ xy^{\prime }\right] \right) =\sum _{i=1}^{m}\mathbb {E}[x_i y_i] \qquad \forall x,y\in H, \end{aligned}$$
which coincides with the usual inner product of \(L^2( \mathbb {R}^m)\). The associated norm \(\Vert \ \Vert _{\bar{\varphi }}: H \rightarrow [0,+\infty )\) is
$$\begin{aligned} \Vert x\Vert _{\bar{\varphi }}=\left( \sum _{i=1}^{m}\mathbb {E}\left[ x_i^2\right] \right) ^{1/2} \qquad \forall x\in H. \end{aligned}$$
If x has zero mean, \(\Vert x\Vert _{\bar{\varphi }}^2\) is the sum of the eigenvalues of the covariance matrix of x.
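The two norms can be illustrated numerically. The NumPy sketch below (the Gaussian distribution and the covariance matrix C are illustrative assumptions) estimates \(\langle x,x\rangle _H\) by a sample mean and checks the norm equivalence \(\Vert x\Vert _H^2 \leqslant \Vert x\Vert _{\bar{\varphi }}^2 \leqslant m\,\Vert x\Vert _H^2\):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 200_000  # dimension and Monte Carlo sample size

# Illustrative covariance matrix of a zero-mean Gaussian vector x.
C = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.3],
              [0.0, 0.3, 0.5]])
x = rng.multivariate_normal(np.zeros(m), C, size=n)  # shape (n, m)

# A-valued inner product <x, x>_H = E[x x'], estimated by the sample mean.
inner_H = x.T @ x / n

# ||x||_H^2: largest eigenvalue of the covariance matrix (operator norm in A).
norm_H_sq = np.linalg.eigvalsh(inner_H)[-1]
# ||x||_phibar^2: trace of the covariance matrix (sum of its eigenvalues).
norm_phibar_sq = np.trace(inner_H)

# Norm equivalence: ||x||_H^2 <= ||x||_phibar^2 <= m * ||x||_H^2.
assert norm_H_sq <= norm_phibar_sq <= m * norm_H_sq
```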
Proposition 20 in “Appendix B.1” shows that H is \(\Vert \ \Vert _{\bar{\varphi }}\) complete, i.e., it is a Hilbert A-module. Thus, the self-duality property holds.
2.3 The submodules generated by weakly stationary vector processes
Now, consider a zero-mean weakly stationary multivariate process \(\textbf{x} =\{x_t\}_{t \in \mathbb {Z}}\) such that \(x_t=[x_{1,t},\ldots , x_{m,t}]^{\prime }\in H\) for all \(t \in \mathbb {Z}\). The autocovariance function \(\Gamma : \mathbb {Z} \rightarrow A\) associates any integer n with the matrix \(\Gamma _n\) with entries \(\Gamma _n(p,q)= \mathbb {E}[x_{p,t}x_{q,t+n}]\) for \(p,q=1,\ldots ,m\). If \(\Gamma _n = \textbf{0}\) for all \(n\ne 0\), \(\textbf{x}\) is a multivariate white noise, which has unit variance when \( \Gamma _0 \) is the identity matrix. In the unit variance case, the single time series of the multivariate white noise are mutually uncorrelated, also contemporaneously.
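The autocovariance function and the white noise property can be checked on simulated data. A minimal sketch, assuming i.i.d. Gaussian shocks as one special case of a unit variance multivariate white noise (weak stationarity only restricts the moments); the helper `gamma` is ours:

```python
import numpy as np

rng = np.random.default_rng(1)
m, T = 2, 100_000

# An i.i.d. Gaussian sequence is one example of a unit variance
# multivariate white noise.
eps = rng.standard_normal((T, m))

def gamma(x, n):
    """Sample autocovariance matrix: entry (p, q) estimates E[x_{p,t} x_{q,t+n}]."""
    x0, xn = (x[:-n], x[n:]) if n > 0 else (x, x)
    return x0.T @ xn / len(x0)

# Gamma_0 is close to the identity (unit variance); Gamma_n vanishes for n != 0.
print(np.round(gamma(eps, 0), 2))  # approximately the identity matrix
print(np.round(gamma(eps, 3), 2))  # approximately the zero matrix
```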
The sequence \(\{x_{t-n}\}_{n\in \mathbb {N}_0}\) spans the Hilbert submodule \( \mathcal {H}_t(\textbf{x}) \subseteq H\) defined by
$$\begin{aligned} \mathcal {H}_t(\textbf{x})=\textrm{cl}\left\{ \sum _{n=0}^{N} a_n x_{t-n}: \quad a_n \in A, \ N \in \mathbb {N}_0 \right\} , \end{aligned}$$(1)
with the closure taken with respect to \(\Vert \ \Vert _{H}\).
As we will see in Sect. 3, when \(\textbf{x}\) is regular, the multivariate classical Wold decomposition (MCWD) can be obtained by applying Theorem 1 to \(\mathcal {H}_t(\textbf{x})\) with the lag operator as isometry.
An outcome of the MCWD is the unit variance multivariate white noise of fundamental innovations \(\varvec{\varepsilon }=\{\varepsilon _t \}_{t\in \mathbb {Z}}\). The sequence \(\{\varepsilon _{t-n}\}_{n\in \mathbb {N}_0}\) generates the submodule \(\mathcal {H}_t(\varvec{\varepsilon }) \subseteq H\) given by
$$\begin{aligned} \mathcal {H}_t(\varvec{\varepsilon })=\left\{ \sum _{k=0}^{+\infty } a_k \varepsilon _{t-k}: \quad a_k \in A \right\} , \end{aligned}$$(2)
with \(\sum _{k=0}^{+\infty } \textrm{Tr}\left( a_k a^{\prime }_k \right) < +\infty \).
Proposition 20 in “Appendix B.1” shows that \(\mathcal {H}_t(\varvec{ \varepsilon })\) is a closed submodule of H and so it is a Hilbert submodule. In Sect. 4, we apply Theorem 1 to \(\mathcal {H}_t(\varvec{\varepsilon })\) with the scaling operator as isometry in order to derive the multivariate extended Wold decomposition (MEWD) of \(x_t\).
3 Multivariate classical Wold decomposition
In the MCWD, a zero-mean regular weakly stationary vector process \(\textbf{x} =\{ x_t\}_{t\in \mathbb {Z}}\) is decomposed into the infinite sum of uncorrelated multivariate innovations that occur at different times, plus a deterministic component (Theorem 7.2 in Bierens 2005). Wiener and Masani (1957) and Rozanov (1967) provide a proof in the complex field. Here, we provide the roadmap to derive the MCWD via the Abstract Wold Theorem for Hilbert A-modules in the real case. A similar derivation in the univariate case is illustrated in Severino (2016).
We consider the Hilbert submodule \(\mathcal {H}_t(\textbf{x}) \subseteq H\) of Eq. (1). The lag operator \(\textbf{L}: \mathcal {H}_t(\textbf{x}) \rightarrow \mathcal {H}_t(\textbf{x})\) acts on the generators of \(\mathcal {H}_t(\textbf{x})\) as
$$\begin{aligned} \textbf{L}x_{t-n}=x_{t-n-1} \qquad \forall n\in \mathbb {N}_0. \end{aligned}$$
\(\textbf{L}\) is A-linear and bounded; hence, it can be extended by continuity to the whole of \( \mathcal {H}_t(\textbf{x})\). Importantly, \(\textbf{L}\) is isometric on \(\mathcal {H}_t(\textbf{x})\). Theorem 1 can, then, be applied.
Theorem 1 requires determining the images of \( \mathcal {H}_t(\textbf{x})\) through the powers of \(\textbf{L}\) and the wandering submodule \(\mathcal {L}_t^\textbf{L}\). Since, in a self-dual pre-Hilbert A-module, the image of a closed submodule through an isometry is a closed submodule (Lemma 18 in “Appendix A.6”), one can prove that \(\textbf{L}^j \mathcal {H}_t(\textbf{x})=\mathcal {H}_{t-j}(\textbf{x})\) for any \(j \in \mathbb {N}\).
Then, it is possible to show that \(\mathcal {H}_t(\textbf{x})\) can be decomposed into the direct sum
$$\begin{aligned} \mathcal {H}_t(\textbf{x})=\mathcal {H}_{t-1}(\textbf{x})\oplus \mathcal {L}_t^\textbf{L}, \end{aligned}$$
i.e., \(\mathcal {L}_t^\textbf{L}\) is the linear span of \(x_t- \mathcal {P}_{ \mathcal {H}_{t-1}(\textbf{x})} x_t\), where \(\mathcal {P}_M\) denotes the orthogonal projection on the closed submodule M.
Since \(\textbf{x}\) is regular, for any \(t \in \mathbb {Z}\), \(\langle x_t- \mathcal {P}_{\mathcal {H}_{t-1}(\textbf{x})} x_t, x_t- \mathcal {P}_{\mathcal {H}_{t-1}(\textbf{x})} x_t \rangle _H\) is a symmetric positive definite matrix (Bierens 2012, Section 6). Hence, by Theorem 7.2.6 in Horn and Johnson (1990), there exists a symmetric positive definite square root matrix \(\sigma \) such that
$$\begin{aligned} \sigma ^2=\left\langle x_t- \mathcal {P}_{\mathcal {H}_{t-1}(\textbf{x})} x_t,\, x_t- \mathcal {P}_{\mathcal {H}_{t-1}(\textbf{x})} x_t \right\rangle _H. \end{aligned}$$
As \(\sigma \) is invertible, we define the fundamental innovation process \(\varvec{\varepsilon }=\{ \varepsilon _t \}_{t\in \mathbb {Z}}\) by setting, for any \(t \in \mathbb {Z}\), \( \varepsilon _t= \sigma ^{-1} ( x_t- \mathcal {P}_{\mathcal {H}_{t-1}( \textbf{x})} x_t ). \) In particular, \(\varvec{\varepsilon }\) is a unit variance white noise.
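The construction of \(\sigma \) and of the normalized innovations can be sketched numerically. The matrix V below is an illustrative stand-in for the prediction-error covariance above:

```python
import numpy as np

# Illustrative symmetric positive definite matrix playing the role of the
# one-step prediction error covariance <x_t - P x_t, x_t - P x_t>_H.
V = np.array([[2.0, 0.6],
              [0.6, 1.0]])

# Symmetric positive definite square root via the spectral decomposition:
# V = Q diag(lam) Q'  =>  sigma = Q diag(sqrt(lam)) Q'.
lam, Q = np.linalg.eigh(V)
sigma = Q @ np.diag(np.sqrt(lam)) @ Q.T
assert np.allclose(sigma @ sigma, V) and np.allclose(sigma, sigma.T)

# Normalizing the prediction error by sigma^{-1} yields unit variance
# innovations: sigma^{-1} V (sigma^{-1})' = I.
sigma_inv = np.linalg.inv(sigma)
assert np.allclose(sigma_inv @ V @ sigma_inv.T, np.eye(2))
```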
It is, then, possible to show that the lag and the projection operator commute: for any \(k,j \in \mathbb {N}_0\), \(\textbf{L}^j\mathcal {P}_{\mathcal {H}_{t-k-1}(\textbf{x})} x_{t-k}=\mathcal {P}_{\mathcal {H}_{t-k-j-1}(\textbf{x})} x_{t-k-j}\). Therefore, the covariance matrix of \(x_t- \mathcal {P}_{ \mathcal {H}_{t-1}(\textbf{x})} x_t\) does not depend on the time index \(t \in \mathbb {Z}\) and, for any \(j \in \mathbb {N}\),
$$\begin{aligned} \textbf{L}^j\mathcal {L}_t^\textbf{L}=\mathcal {L}_{t-j}^\textbf{L}. \end{aligned}$$
As a result, Theorem 1 implies that \(\mathcal {H} _t(\textbf{x})=\widehat{\mathcal {H}}_t(\textbf{x}) \oplus \widetilde{ \mathcal {H}}_t(\textbf{x})\) with
$$\begin{aligned} \widehat{\mathcal {H}}_t(\textbf{x})=\bigcap _{j=0}^{+\infty }\mathcal {H}_{t-j}(\textbf{x}) \qquad \text{ and } \qquad \widetilde{\mathcal {H}}_t(\textbf{x})=\bigoplus _{j=0}^{+\infty }\textbf{L}^j\mathcal {L}_t^\textbf{L}. \end{aligned}$$
The consequences for the process \(\textbf{x}\) are, then, straightforward.
Theorem 2
(Multivariate classical Wold decomposition) Let \(\textbf{x}=\{x_t\}_{t \in \mathbb {Z}}\) be a zero-mean regular weakly stationary m-dimensional process. Then, for any \(t \in \mathbb {Z}\), \(x_t\) decomposes as
$$\begin{aligned} x_t=\sum _{h=0}^{+\infty }\alpha _h \varepsilon _{t-h} + \nu _t, \end{aligned}$$
where the equality is in norm and
1.
\(\varvec{\varepsilon }=\{\varepsilon _t\}_{t\in \mathbb {Z}}\) is a unit variance m-dimensional white noise;
2.
for any \(h \in \mathbb {N}_0\), the \(m\times m\) matrices \(\alpha _h\) do not depend on t,
$$\begin{aligned} \alpha _h=\mathbb {E}\left[ x_t \varepsilon ^{\prime }_{t-h}\right] \qquad \text{ and } \qquad \sum _{h=0}^{+\infty } \textrm{Tr}\left( \alpha _h \alpha ^{\prime }_h \right) < +\infty ; \end{aligned}$$
3.
\(\varvec{\nu }=\{\nu _t\}_{t\in \mathbb {Z}}\) is a zero-mean weakly stationary m-dimensional process,
$$\begin{aligned} \nu _t\in \bigcap _{j=0}^{+\infty }\mathcal {H}_{t-j}(\textbf{x}) \qquad \text{ and } \qquad \mathbb {E}\left[ \nu _t \varepsilon ^{\prime }_{t-h}\right] = \textbf{0}\quad \forall h \in \mathbb {N}_0; \end{aligned}$$
4.
$$\begin{aligned} \nu _t \in \textrm{cl}\left\{ \sum _{h=1}^{+\infty } a_h \nu _{t-h} \quad \in \bigcap _{j=1}^{+\infty }\mathcal {H}_{t-j}(\textbf{x}): \quad a_h \in A \right\} . \end{aligned}$$
Proof
Wiener and Masani (1957, Theorem 6.11) and Rozanov (1967, Chapter II, Section 3) provide a proof in the complex field. A detailed proof using real Hilbert A-modules is available upon request. \(\blacksquare \)
The process \(\varvec{\nu }\) constitutes the (predictable) deterministic component of \(\textbf{x}\). If each \(\nu _t\) is the null vector, \(\textbf{x}\) is a purely non-deterministic process.
In our approach, the multivariate impulse responses \(\alpha _h\) are fully characterized by the projection on Hilbert submodules via the inner product \(\alpha _h= \langle x_t, \varepsilon _{t-h} \rangle _H\). This feature generalizes the OLS methodology employed in the univariate case by exploiting orthogonal projections in a more general sense. Indeed, each projection matrix \(\alpha _h\) minimizes the distance of the outcome \(x_t\) from the submodule generated by the vector innovation \(\varepsilon _{t-h}\).
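The characterization \(\alpha _h= \langle x_t, \varepsilon _{t-h} \rangle _H\) can be verified on simulated data. A sketch assuming a toy vector MA(1) with known, illustrative impulse-response matrices:

```python
import numpy as np

rng = np.random.default_rng(2)
m, T = 2, 400_000

# Toy vector MA(1): x_t = a0 eps_t + a1 eps_{t-1}, with illustrative
# impulse-response matrices and unit variance Gaussian innovations.
a0 = np.eye(m)
a1 = np.array([[0.5, 0.2],
               [0.0, -0.3]])
eps = rng.standard_normal((T + 1, m))
x = eps[1:] @ a0.T + eps[:-1] @ a1.T  # row t corresponds to time t

# alpha_h = <x_t, eps_{t-h}>_H = E[x_t eps_{t-h}'], estimated by sample means.
alpha0_hat = x.T @ eps[1:] / T   # h = 0
alpha1_hat = x.T @ eps[:-1] / T  # h = 1

assert np.allclose(alpha0_hat, a0, atol=0.02)
assert np.allclose(alpha1_hat, a1, atol=0.02)
```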
4 Multivariate extended Wold decomposition
In this section, we generalize the extended Wold decomposition for weakly stationary time series of Ortu et al. (2020a) to multidimensional processes, by exploiting our Hilbert-module framework. We apply the Abstract Wold Theorem for Hilbert A-modules to \(\mathcal {H}_t( \varvec{\varepsilon })\), and, later on, we deduce the decomposition for a weakly stationary vector process \(\textbf{x}\) with fundamental innovations given by \(\varvec{\varepsilon }\). We also provide a persistence-based variance decomposition.
4.1 The orthogonal decomposition of \(\mathcal {H}_t(\varepsilon )\) induced by \(\textbf{R}\)
Let \(\varvec{\varepsilon }=\{ \varepsilon _{t}\}_{t\in \mathbb {Z}}\) be a unit variance m-dimensional white noise and consider the Hilbert submodule \(\mathcal {H}_t(\varvec{\varepsilon }) \subseteq H\) of Eq. (2). In order to apply Theorem 1, we consider the scaling operator \(\textbf{R}: \mathcal {H}_t(\varvec{\varepsilon }) \rightarrow \mathcal {H}_t(\varvec{\varepsilon })\) that makes (normalized) two-by-two averages of subsequent innovations:
$$\begin{aligned} \textbf{R}: \ \sum _{k=0}^{+\infty } a_k \varepsilon _{t-k} \ \longmapsto \ \sum _{k=0}^{+\infty } \frac{a_{\lfloor k/2\rfloor }}{\sqrt{2}}\, \varepsilon _{t-k}. \end{aligned}$$
Here, the function \(\lfloor \cdot \rfloor \) associates any \(c\in \mathbb {R}\) with the integer \(\lfloor c\rfloor =\max \{ n \in \mathbb {Z}: n \leqslant c\} \). In the proof of Theorem 3, we show that \(\textbf{R}\) is well-defined, A-linear and isometric on \(\mathcal {H}_t(\varvec{ \varepsilon })\).
Following Ortu et al. (2020a), to illustrate the decomposition of \(\mathcal {H}_{t}(\varvec{\varepsilon })\) induced by \(\textbf{R}\), from the white noise \(\varvec{\varepsilon }\), we define the (multivariate) detail process at scale 1, \(\varvec{\varepsilon ^{(1)}}=\{\varepsilon _{t}^{(1)}\}_{t\in \mathbb {Z}}\), by
$$\begin{aligned} \varepsilon _{t}^{(1)}=\frac{\varepsilon _{t}-\varepsilon _{t-1}}{\sqrt{2}} \qquad \forall t\in \mathbb {Z}. \end{aligned}$$
Each \(\varepsilon _{t}^{(1)}\) has zero mean and unit variance: \(\mathbb {E} [\varepsilon _{t}^{(1)}{\varepsilon _{t}^{(1)}}^{\prime }]=I\). In general, for any \(j\in \mathbb {N}\), we define the (multivariate) detail process at scale j, \(\varvec{\varepsilon ^{(j)}}=\{\varepsilon _{t}^{(j)}\}_{t\in \mathbb {Z}}\), by
$$\begin{aligned} \varepsilon _{t}^{(j)}=\frac{1}{\sqrt{2^j}}\left( \sum _{i=0}^{2^{j-1}-1} \varepsilon _{t-i}-\sum _{i=0}^{2^{j-1}-1} \varepsilon _{t-2^{j-1}-i} \right) . \end{aligned}$$(3)
This definition is the natural multivariate counterpart of eq. (6) in Ortu et al. (2020a), which takes the same form. Equation (3) is reminiscent of the iterated application of the discrete Haar transform to the series of \(\varepsilon _{t}\) (Addison 2002, Chapter 3). High scales involve detail processes that have not been faded out by repeated applications of this transform; they are, therefore, associated with a high degree of persistence. For instance, if t evolves daily, one can interpret the details at scale 1 as 2-day shocks, those at scale 2 as 4-day shocks, those at scale 3 as 8-day shocks, and so on. In short, scale j involves \(2^j\)-day multivariate shocks and defines the degree of persistence j.
In order to avoid overlap among the vectors \(\varepsilon _{t}^{(j)}\), at any scale j we consider the subseries of \(\varvec{\varepsilon ^{(j)}}\) defined on the support \(S^{(j)}_t=\{t-k2^j: k \in \mathbb {Z}\}\). Indeed, each detail at scale j is a vector \(MA(2^j-1)\) of the white noise \(\varvec{\varepsilon }\). Some spurious correlation is present between the details \(\varepsilon _{t-k2^j}^{(j)}\) and \(\varepsilon _{\tau -k2^j}^{(j)}\) with \(\vert {t-\tau } \vert \leqslant 2^j -1\), but each subseries \(\{ \varepsilon _{t-k2^j}^{(j)}\}_{k \in \mathbb {Z}}\) is a unit variance white noise. We formalize this fact in the first point of Theorem 4, and we will associate high scales with more persistent detail processes.
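The construction of the details and the white noise property of each subsampled series can be illustrated numerically. A sketch assuming i.i.d. Gaussian innovations; the helper `detail` is ours:

```python
import numpy as np

rng = np.random.default_rng(3)
m, T = 2, 2 ** 18

eps = rng.standard_normal((T, m))  # unit variance multivariate white noise

def detail(eps, j, t):
    """eps_t^{(j)}: normalized difference of two adjacent block sums of
    length 2^{j-1} ending at time t (Haar-type filter of Eq. (3))."""
    h = 2 ** (j - 1)
    first = eps[t - h + 1 : t + 1].sum(axis=0)           # eps_{t-i}, i = 0..h-1
    second = eps[t - 2 * h + 1 : t - h + 1].sum(axis=0)  # eps_{t-h-i}, i = 0..h-1
    return (first - second) / np.sqrt(2 ** j)

# Subsample the scale-2 details on the support S_t^{(2)} = {t - 4k}: this
# subseries should behave as a unit variance white noise.
j = 2
d = np.array([detail(eps, j, t) for t in range(2 ** j - 1, T, 2 ** j)])
cov0 = d.T @ d / len(d)                  # ~ identity: unit variance
cov1 = d[:-1].T @ d[1:] / (len(d) - 1)   # ~ zero: no serial correlation
assert np.allclose(cov0, np.eye(m), atol=0.05)
assert np.allclose(cov1, np.zeros((m, m)), atol=0.05)
```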
We now state the orthogonal decomposition of \(\mathcal {H}_t(\varvec{ \varepsilon })\) implied by Theorem 1 when we use the scaling operator as isometry.
Theorem 3
Let \(\varvec{\varepsilon }\) be a unit variance m-dimensional white noise. The Hilbert A-module \(\mathcal {H}_t(\varvec{\varepsilon })\) decomposes into the orthogonal sum
$$\begin{aligned} \mathcal {H}_t(\varvec{\varepsilon })=\bigoplus _{j=1}^{+\infty }\textbf{R}^{j-1}\mathcal {L}_t^\textbf{R}, \end{aligned}$$
where
$$\begin{aligned} \textbf{R}^{j-1}\mathcal {L}_t^\textbf{R}=\left\{ \sum _{k=0}^{+\infty }\beta _k^{(j)}\varepsilon _{t-k2^j}^{(j)}: \quad \beta _k^{(j)}\in A, \ \sum _{k=0}^{+\infty } \textrm{Tr}\left( \beta _k^{(j)} {\beta _k^{(j)}}^{\prime }\right) <+\infty \right\} . \end{aligned}$$(4)
Proof
See “Appendix B.2”. \(\blacksquare \)
The main steps of the proof are the following.
We first determine the invariant submodule \(\widehat{\mathcal {H}}_t( \varvec{\varepsilon })\) prescribed by Theorem 1. From the definition of \(\textbf{R}\), the submodule \(\textbf{R}\mathcal {H} _t(\varvec{\varepsilon })\) consists of the linear combinations of innovations \(\varepsilon _t\) whose (matrix) coefficients are equal to each other two-by-two. Similarly, for any \(j \in \mathbb {N}\), the submodule \(\textbf{R}^j\mathcal {H}_t(\varvec{\varepsilon })\) consists of the linear combinations of vectors \(\varepsilon _t\) with (matrix) coefficients equal to each other \(2^j\)-by-\(2^j\).
It follows that the intersection of all \(\textbf{R}^j\mathcal {H}_t( \varvec{\varepsilon })\) contains only the zero vector, that is \(\widehat{ \mathcal {H}}_t(\varvec{\varepsilon })\) is the null submodule: \(\widehat{ \mathcal {H}}_t(\varvec{\varepsilon })=\{0\}\).
We now turn to the submodule \(\widetilde{\mathcal {H}}_t(\varvec{ \varepsilon })\). The wandering submodule \(\mathcal {L}_t^\textbf{R}\) associated with \(\textbf{R}\) is the orthogonal complement of \(\textbf{R} \mathcal {H}_t(\varvec{\varepsilon })\) in \(\mathcal {H}_t(\varvec{ \varepsilon })\). As \(\textbf{R}\) is linear and bounded, such submodule coincides with the kernel of the adjoint operator \(\mathbf {R}^{*}\) (Proposition 19 in “Appendix A.7”). Therefore,
$$\begin{aligned} \mathcal {L}_t^\textbf{R}=\ker \mathbf {R}^{*}=\left\{ \sum _{k=0}^{+\infty }\beta _k^{(1)}\varepsilon _{t-2k}^{(1)}: \quad \beta _k^{(1)}\in A, \ \sum _{k=0}^{+\infty } \textrm{Tr}\left( \beta _k^{(1)} {\beta _k^{(1)}}^{\prime }\right) <+\infty \right\} . \end{aligned}$$
Hence, \(\mathcal {L}_t^\textbf{R}\) contains the moving averages generated by the detail process at scale 1 on the support \(S_t^{(1)}\). More generally, for each \(j \in \mathbb {N}\), the image of \(\mathcal {L}_t^\textbf{R}\) through \(\textbf{R}^{j-1}\) is the submodule in Eq. (4), which consists of the moving averages generated by the detail process at scale j on \(S_t^{(j)}\).
4.2 The multivariate extended Wold decomposition of \(x_t\)
As in Sect. 3, we consider a zero-mean regular weakly stationary m-dimensional process \(\textbf{x}=\{x_t\}_{t \in \mathbb {Z}}\). We also require \(\textbf{x}\) to be purely non-deterministic in order to focus on the persistence generated by the shocks aggregation.
Theorem 2 (the MCWD) ensures that \(x_t\) belongs to \(\mathcal {H} _t(\varvec{\varepsilon })\), where \(\varvec{\varepsilon }\) is the process of fundamental innovations of \(\textbf{x}\). As a result, the orthogonal decomposition of \(\mathcal {H}_t(\varvec{\varepsilon })\) of Theorem 3 induces a decomposition of \(x_t\). Indeed, there exists a sequence \(\{ g_t^{(j)}\}_{j\in \mathbb {N}}\) of random processes such that
$$\begin{aligned} x_t=\sum _{j=1}^{+\infty } g_t^{(j)}, \end{aligned}$$(5)
where each \(g_t^{(j)}\) is the orthogonal projection of \(x_t\) on the submodule \(\textbf{R}^{j-1} \mathcal {L}_t^\textbf{R}\) of \(\mathcal {H}_t( \varvec{\varepsilon })\). We refer to \(g_t^{(j)}\) as the (multivariate) persistent component at scale j. Clearly, given t, the components \(g_t^{(j)}\) are orthogonal to each other. In addition, since each \(g_t^{(j)}\) belongs to \(\textbf{R}^{j-1} \mathcal {L}_t^\textbf{R}\),
$$\begin{aligned} g_t^{(j)}=\sum _{k=0}^{+\infty }\beta _k^{(j)}\varepsilon _{t-k2^j}^{(j)}, \end{aligned}$$
where \(\Vert \sum _{k=0}^{\infty } \beta _k^{(j)} \varepsilon _{t-k2^j}^{(j)} \Vert ^2_{ \bar{\varphi }}=\sum _{k=0}^{\infty } \textrm{Tr} ( \beta _k^{(j)} {\beta _k^{(j)} }^{\prime })\) is finite. Each \(\beta _k^{(j)}\) is obtained by projecting \(x_t\) on the submodule generated by the detail \(\varepsilon _{t-k2^j}^{(j)}\) and so
$$\begin{aligned} \beta _k^{(j)}=\left\langle x_t, \varepsilon _{t-k2^j}^{(j)}\right\rangle _H=\mathbb {E}\left[ x_t {\varepsilon _{t-k2^j}^{(j)}}^{\prime }\right] . \end{aligned}$$
By replacing the expression of \(g_t^{(j)}\) into Eq. (5), we obtain the MEWD of \(x_t\) stated in Eq. (6). The explicit expression of the matrices \(\beta _k^{(j)}\) in Eq. (7) below turns out to be the multivariate version of eq. (7) in Ortu et al. (2020a), which takes the same form.
Theorem 4
(Multivariate extended Wold decomposition) Let \(\textbf{x}\) be a zero-mean regular weakly stationary purely non-deterministic m-dimensional process. Then, \(x_t\) decomposes as
$$\begin{aligned} x_t=\sum _{j=1}^{+\infty }\sum _{k=0}^{+\infty }\beta _k^{(j)}\varepsilon _{t-k2^j}^{(j)}, \end{aligned}$$(6)
where the equality is in norm and
1.
for any fixed \(j\in \mathbb {N}\), the m-dimensional process \( \varvec{\varepsilon ^{(j)}}=\{\varepsilon ^{(j)}_{t}\}_{t \in \mathbb {Z}}\) with
$$\begin{aligned} \varepsilon _{t}^{(j)}=\frac{1}{\sqrt{2^j}}\left( \sum _{i=0}^{2^{j-1}-1} \varepsilon _{t-i}-\sum _{i=0}^{2^{j-1}-1} \varepsilon _{t-2^{j-1}-i} \right) \end{aligned}$$
is a \(MA(2^j-1)\) with respect to the classical Wold innovations of \(\textbf{x}\), and \(\{\varepsilon ^{(j)}_{t-k2^j}\}_{k \in \mathbb {Z}}\) is a unit variance white noise;
2.
for any \(j\in \mathbb {N}\), \(k \in \mathbb {N}_0\), the \(m\times m\) matrices \(\beta _k^{(j)}\) are uniquely determined via
$$\begin{aligned} \beta _k^{(j)}=\frac{1}{\sqrt{2^j}}\left( \sum _{i=0}^{2^{j-1}-1}\alpha _{k2^j+i}-\sum _{i=0}^{2^{j-1}-1} \alpha _{k2^j+2^{j-1}+i}\right) ; \end{aligned}$$(7)
hence, they are independent of t and \(\sum _{k=0}^{\infty } \textrm{Tr} (\beta _k^{(j)} {\beta _k^{(j)}}^{\prime })<+\infty \) for any \(j\in \mathbb {N}\);
3.
letting
$$\begin{aligned} g_t^{(j)}=\sum _{k=0}^{+\infty }\beta _k^{(j)}\varepsilon _{t-k2^j}^{(j)}, \end{aligned}$$(8)
for any \(j,l \in \mathbb {N}\) and \(p,q,t \in \mathbb {Z}\), \(\mathbb {E}[ g_{t-p}^{(j)}{g_{t-q}^{(l)}}^{\prime }]\) depends at most on j, l and \(p-q\). Moreover, \(\mathbb {E}[ g_{t-m2^j}^{(j)}{g_{t-n2^l}^{(l)}}^{\prime }]=\textbf{0}\) for all \(j \ne l\), \(m,n \in \mathbb {N}_0\) and \(t \in \mathbb {Z}\).
Proof
See “Appendix B.3.” \(\blacksquare \)
The matrix \(\beta _k^{(j)}\) is the (multivariate) scale-specific response associated with the innovation at scale j and time translation \( k2^j\). Since the details at different scales can be expressed in terms of the fundamental innovations \(\varepsilon _t\), the MEWD and the MCWD share the same shocks. For this reason, we can retrieve the matrices \(\beta _k^{(j)}\) from the matrices \(\alpha _h\) of the MCWD through Eq. (7).
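For illustration, the mapping of Eq. (7) from the MCWD coefficients \(\alpha _h\) to the scale-specific responses \(\beta _k^{(j)}\) can be sketched in a few lines of code; the function name and the truncation at K lags are our own choices, not part of the paper.

```python
import numpy as np

def scale_responses(alpha, j, K):
    """Sketch of Eq. (7): build the scale-j responses beta_k^{(j)},
    k = 0, ..., K-1, from the MCWD coefficient matrices alpha_h.
    `alpha` must contain at least (K - 1) * 2**j + 2**j matrices."""
    h = 2 ** (j - 1)  # half-window length 2^{j-1}
    betas = []
    for k in range(K):
        pos = sum(alpha[k * 2**j + i] for i in range(h))
        neg = sum(alpha[k * 2**j + h + i] for i in range(h))
        betas.append((pos - neg) / np.sqrt(2**j))
    return betas

# At scale j = 1, Eq. (7) reduces to
# beta_k^{(1)} = (alpha_{2k} - alpha_{2k+1}) / sqrt(2).
```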
An orthogonal decomposition of \(\mathcal {H}_t(\varvec{\varepsilon })\) into a finite number of submodules is also possible. Indeed, \(\mathcal {H}_t( \varvec{\varepsilon })= \textbf{R}\mathcal {H}_t(\varvec{\varepsilon }) \oplus \mathcal {L}_t^\textbf{R}\) and, by iteratively applying the scaling operator,
$$\begin{aligned} \mathcal {H}_t( \varvec{\varepsilon })=\bigoplus _{j=1}^{J} \textbf{R}^{j-1} \mathcal {L}_t^\textbf{R}\oplus \textbf{R}^{J}\mathcal {H}_t(\varvec{\varepsilon }). \end{aligned}$$
The (multivariate) residual component at scale j is the orthogonal projection of \(x_t\) on the submodule \(\textbf{R}^j\mathcal {H}_t(\varvec{ \varepsilon })\), and we denote it by \(\pi _t^{(j)}\). As can be seen in the proof of Theorem 4, \(\pi _t^{(j)}\) has the expression:
As a result, a MEWD of \(x_t\) holds both in the finite case, i.e., when a maximum scale J is chosen, and in the infinite one:
$$\begin{aligned} x_t=\sum _{j=1}^{J} g_t^{(j)}+\pi _t^{(J)}, \qquad x_t=\sum _{j=1}^{+\infty } g_t^{(j)}. \end{aligned}$$
According to the third point of Theorem 4, when t is fixed, the orthogonality among persistent components involves all the shifted vectors \(g_{t-m2^j}^{(j)}\) and \(g_{t-n2^l}^{(l)}\), for any \(m,n \in \mathbb {Z}\), with time translation proportional to \(2^j\) and \(2^l\), respectively. In general, the cross-covariance matrix between \(g_{t-p}^{(j)}\) and \(g_{t-q}^{(l)}\) depends at most on the scales j, l and on the difference \(p-q\).
By the MEWD, \(x_t\) is decomposed into the sum of orthogonal components \( g_t^{(j)}\) associated with different persistence levels j. Each vector \( g_t^{(j)}\) has innovations on a grid \(S_t^{(j)}=\{ t-k2^j: k \in \mathbb {Z} \}\) with time interval between two indices proportional to \(2^j\). When the scale j increases, the support \(S_t^{(j)}\) becomes sparser and the degree of persistence of details rises. For instance, if j is high and a scale-specific response \(\beta _k^{(j)}\) is markedly different from the null matrix, \(x_t\) is affected by an important low-frequency component. Conversely, if j is low and some \(\beta _k^{(j)}\) differs from zero, a high-frequency component is not negligible and impinges on \(x_t\) in the short term.
A justification of the fact that the iterated application of \(\textbf{R}\) increases persistence is due to spectral analysis considerations, and it is developed in detail in Ortu et al. (2020a) for the univariate case: see Section 2.1 therein and the supplementary material (Ortu et al. 2020b). The scaling operator defines, in fact, an approximate low-pass filter. In addition, as suggested in the same paper, bases different from 2 can be used to derive similar persistence-based decompositions.
The MEWD properly generalizes the univariate Extended Wold Decomposition of Ortu et al. (2020a). Indeed, in case the matrix coefficients \(\alpha _h\) are diagonal, for any \(i=1,\ldots , m\) the entry \(x_{i,t}\) depends only on the innovations \(\varepsilon _{i,t}\) and the scale-specific responses \(\beta _k^{(j)}\) are diagonal matrices, too. Each \(x_{i,t}\) satisfies the decompositions
with
as in the univariate Extended Wold Decomposition.
Finally, the MEWD of \(x_{t}\) turns out to be a refinement of the MCWD, where \( \varvec{\varepsilon }\) is the process of fundamental innovations of \( \textbf{x}\). Nonetheless, such a persistence-based decomposition also holds when \(\varvec{\varepsilon }\) is any unit variance white noise that allows a moving average representation of \(x_{t}\). Furthermore, if \( \varvec{\varepsilon }\) has a positive definite covariance matrix \(\xi \), then \(\xi =\zeta \zeta \) for some symmetric positive definite \(\zeta \in A\). In this case, \(\eta _{t}=\zeta ^{-1}\varepsilon _{t}\) defines a unit variance white noise and the MCWD and the MEWD of \(x_{t}\) become, respectively,
with \(\widetilde{\alpha }_{h}=\alpha _{h}\zeta \) and \(\widetilde{\beta } _{k}^{(j)}=\beta _{k}^{(j)}\zeta \). Alternatively, other tools can be used to factorize \(\xi \), such as the Cholesky decomposition (see, for instance, the application in “Appendix C.3”).
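As a sketch of the factorization \(\xi =\zeta \zeta \), the symmetric positive definite square root can be obtained from the spectral decomposition of \(\xi \); the function and the numerical covariance below are our illustration, not part of the paper.

```python
import numpy as np

def symmetric_sqrt(xi):
    """Symmetric positive definite zeta with xi = zeta @ zeta,
    computed via the spectral decomposition of the covariance xi."""
    eigvals, eigvecs = np.linalg.eigh(xi)
    return eigvecs @ np.diag(np.sqrt(eigvals)) @ eigvecs.T

xi = np.array([[2.0, 0.5],
               [0.5, 1.0]])   # an illustrative positive definite covariance
zeta = symmetric_sqrt(xi)
# eta_t = zeta^{-1} eps_t then has identity covariance:
# zeta^{-1} xi zeta^{-1} = I.
```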
4.3 Persistence-based variance decomposition
One of the strengths of the MEWD is that it allows us to define a variance decomposition across the different persistence layers. The notion of orthogonality makes it possible to decompose the variance of each entry \(x_{i,t}\) of the vector \(x_t\) into the sum of the variances of the corresponding entries of each persistent component \(g_t^{(j)}\). This disentangles the exposure towards shocks with heterogeneous persistence without distinguishing among the different univariate disturbances. As a further step, the null correlation between the univariate innovations in the (multivariate) details \(\varepsilon _t^{(j)}\) allows us to decompose the variance of each entry of \(g_t^{(j)}\) in order to quantify the contribution of each source of randomness. In so doing, one can identify the main time scales at which each univariate shock impacts the aggregate process \(x_t\).
To illustrate the variance decompositions, we consider a weakly stationary bivariate process \(x_t=[ y_t, z_t]^{\prime }\) with unit variance white noise \(\varepsilon _t= [ u_t, v_t ]^{\prime }\). We focus on \(y_t\) and on the first entry of the persistent components \(g_t^{(j)}\). The portion of variance of \( y_t\) associated with the latter is
and
Operationally, in order to assess the overall importance of each persistence level in explaining the total variance of \(y_t\), we can compute, for each scale j, the ratio \({\textrm{var}}^{(j)}(y_t) / {\textrm{var}}(y_t)\). In addition, to capture the effect of the persistence of each single shock, we can compute, at any scale j, the ratios
Although a persistence-based variance decomposition is already present in Ortu et al. (2020a), the orthogonality notion employed in the multivariate case makes it possible to quantify both the persistence of each time series in the vector process and the persistence of single shocks impacting each of them.
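The variance ratios described above can be sketched numerically. The sketch below assumes the scale-specific innovations have identity covariance and that the residual component is negligible at the chosen maximum scale; the function and variable names are ours.

```python
import numpy as np

def variance_shares(betas_by_scale, entry=0):
    """Persistence-based variance decomposition for one entry of x_t.
    betas_by_scale[s] lists the m x m responses beta_k^{(j)} at the s-th
    scale; since the univariate innovations in eps^{(j)} are uncorrelated
    with unit variance, each squared coefficient adds to the variance.
    Returns per-scale shares and per-scale, per-shock shares."""
    per_scale, per_shock = [], []
    for betas in betas_by_scale:
        B = np.stack(betas)                           # shape (K, m, m)
        contrib = (B[:, entry, :] ** 2).sum(axis=0)   # one entry per shock
        per_shock.append(contrib)
        per_scale.append(contrib.sum())
    total = sum(per_scale)
    return [v / total for v in per_scale], [c / total for c in per_shock]
```

In the bivariate example of the text, `entry=0` selects \(y_t\); the first output lists \({\textrm{var}}^{(j)}(y_t)/{\textrm{var}}(y_t)\) across scales, and the second splits each scale's share between the shocks \(u_t\) and \(v_t\).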
5 Conclusions
In this paper, we recast the standard treatment of multivariate time series in a Hilbert A-module framework and prove the Abstract Wold Theorem for Hilbert A-modules. This result allows us to revisit the MCWD and to derive the multivariate version of the Extended Wold Decomposition of Ortu et al. (2020a) by using two different isometric operators. Interestingly, the MEWD provides a decomposition of the given vector process into uncorrelated persistent components driven by shocks with longer and longer duration. The orthogonality ensured by the theorem induces a variance decomposition that makes it possible to assess the relative importance of each persistent component. Moreover, scale-specific responses allow us to isolate, on different time scales, dynamics that are not recognizable from the impulse responses of the aggregated process (some illustrations are in “Appendix C”).
Statistical inference on scale-specific responses is an important issue that goes beyond the scope of this paper. As to the estimators of the parameters in moving average models, one can refer to the literature summarized in the introduction of Ghysels et al. (2003), especially to the seminal work of Durbin (1959). However, a simple way to statistically test whether scale-specific responses are zero is to use the bootstrap procedure (Efron and Tibshirani 1986) in the pipeline described in “Appendix C.2”. The bootstrap empirical distribution of scale-specific responses can then be used to obtain the test p-value.
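A schematic version of such a bootstrap p-value might look as follows. The resampling scheme here is a plain iid draw purely for illustration, whereas the actual pipeline of “Appendix C.2” may resample model residuals instead; the function is our sketch, not the paper's procedure.

```python
import numpy as np

def bootstrap_pvalue(stat_fn, data, n_boot=999, rng=None):
    """Two-sided bootstrap p-value for H0: the statistic is zero.
    `stat_fn` maps a sample to a scalar estimate (e.g., one entry of a
    scale-specific response); replicates are centered at their mean so
    that they approximate the null distribution. Illustrative sketch only."""
    rng = np.random.default_rng(rng)
    obs = stat_fn(data)
    reps = np.empty(n_boot)
    for b in range(n_boot):
        resample = rng.choice(data, size=len(data), replace=True)
        reps[b] = stat_fn(resample)
    centered = reps - reps.mean()
    return (1 + np.sum(np.abs(centered) >= np.abs(obs))) / (n_boot + 1)
```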
Data Availability
Data are publicly available.
Code Availability
Codes are available upon request.
Notes
An application of the extended Wold decomposition to market returns is provided in Di Virgilio et al. (2019).
We will use Latin letters a, b, c to denote elements of A, Latin letters x, y, z to denote elements of H, and Greek letters \(\alpha ,\beta \) to denote elements of \(\mathbb {R}\).
It is routine to show that the following statements, which we will use later on, are true:
-
1.
\(\langle z,x+y\rangle _{H}=\langle z,x\rangle _{H}+\langle z,y\rangle _{H}\) for all \(x,y,z\in H\);
-
2.
\(\langle x,a\cdot y\rangle _{H}=\langle x,y\rangle _{H}a^{*}\) for all \(a\in A\) and for all \(x,y\in H\);
-
3.
\(\langle x,\alpha y\rangle _{H}=\alpha \langle x,y\rangle _{H}\) for all \(\alpha \in \mathbb {R} \) and for all \(x,y\in H\).
-
1.
A nonempty subset N of H is a submodule if and only if, for each \(a,b\in A\) and \(x,y\in N\), \(a\cdot x+b\cdot y\in N\).
We are relying on the following fact, whose proof is routine. Given a self-dual pre-Hilbert A-module H, if M, N, P, Q are four \(\Vert \; \Vert _{H}\) closed submodules such that \( H=M\oplus N,\ N=P\oplus Q,\ N=M^{\bot }\) and \(P\bot Q\), then \(Q^{\bot }=M\oplus P\).
With the notation \(\bigoplus _{n=0}^{\infty }\textbf{T}^{n}L\), we mean the \( \Vert \; \Vert _{H}\) closure of the set \(\bigcup _{k\in \mathbb {N} _{0}}M_{k}\).
Note that \(\textbf{S} \textbf{T}\in B^{\sim }( H) \subseteq B( H) \), thus \( \textbf{ST}\in B( H) \). We only need to prove A-linearity. Indeed, for each \(a,b\in A\) and \(x,y\in H\)
$$\begin{aligned} \left( \textbf{ST}\right) \left( a\cdot x+b\cdot y\right)&=\textbf{S} \left( \textbf{T}\left( a\cdot x+b\cdot y\right) \right) = \textbf{S}\left( a\cdot \textbf{T}x +b\cdot \textbf{T}y \right) \\&=a\cdot \textbf{S}\left( \textbf{T}x \right) +b\cdot \textbf{S}\left( \textbf{T}y \right) =a\cdot \left( \textbf{ST}\right) x +b\cdot \left( \textbf{ST}\right) y. \end{aligned}$$
In the literature, other assumptions are also considered, e.g., stability (Lütkepohl 2005, Chapter 2).
Abbreviations
- MCWD: Multivariate classical Wold decomposition
- MEWD: Multivariate extended Wold decomposition
References
Abramovich, Y.A., Aliprantis, C.D.: An Invitation to Operator Theory. American Mathematical Society, Providence (2002)
Addison, P.S.: The Illustrated Wavelet Transform Handbook: Introductory Theory and Applications in Science, Engineering, Medicine and Finance. IOP Publishing, London (2002)
Aliprantis, C.D., Border, K.C.: Infinite Dimensional Analysis: A Hitchhiker’s Guide, 3rd edn. Springer, Berlin (2006)
Bandi, F.M., Perron, B., Tamoni, A., Tebaldi, C.: The scale of predictability. J. Econom. 208(1), 120–140 (2019)
Bandi, F.M., Chaudhuri, S., Lo, A.W., Tamoni, A.: Spectral factor models. J. Financ. Econ. 142(1), 214–238 (2021)
Bierens, H.J.: Introduction to the Mathematical and Statistical Foundations of Econometrics. Cambridge University Press, Cambridge (2005)
Bierens, H.J.: The Wold decomposition (2012). http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.231.2308
Blanchard, O.J., Quah, D.: The dynamic effects of aggregate demand and supply disturbances. Am. Econ. Rev. 79(4), 655–673 (1989)
Brezis, H.: Functional Analysis. Sobolev Spaces and Partial Differential Equations. Springer, New York (2011)
Brockwell, P.J., Davis, R.A.: Time Series: Theory and Methods, 2nd edn. Springer, New York (2006)
Cerreia-Vioglio, S., Maccheroni, F.A., Marinacci, M.: Hilbert \(A\)-modules. J. Math. Anal. Appl. 446(1), 970–1017 (2017)
Cerreia-Vioglio, S., Maccheroni, F.A., Marinacci, M.: Orthogonal decompositions in Hilbert \(A\)-modules. J. Math. Anal. Appl. 470(2), 846–875 (2019)
Cerreia-Vioglio, S., Ortu, F., Rotondi, F., Severino, F.: On horizon-consistent mean-variance portfolio allocation. Ann. Oper. Res. (2022). https://doi.org/10.1007/s10479-022-04798-x
Di Virgilio, D., Ortu, F., Severino, F., Tebaldi, C.: Optimal asset allocation with heterogeneous persistent shocks and myopic and intertemporal hedging demand. In: Venezia, I. (ed.) Behavioral Finance: The Coming of Age, pp. 57–108. World Scientific, Singapore (2019)
Durbin, J.: Efficient estimation of parameters in moving-average models. Biometrika 46(3/4), 306–316 (1959)
Efron, B., Tibshirani, R.: Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy. Stat. Sci. 1, 54–75 (1986)
Frank, M.: Hilbert C*-modules and related subjects—a guided reference overview (1996). arXiv:funct-an/9605003
Gallant, A.R., Hansen, L.P., Tauchen, G.: Using conditional moments of asset payoffs to infer the volatility of intertemporal marginal rates of substitution. J. Econom. 45(1–2), 141–179 (1990)
Ghysels, E., Khalaf, L., Vodounou, C.: Simulation based inference in moving average models. Ann. Econ. Stat. 69, 85–99 (2003)
Goldstine, H.H., Horwitz, L.P.: Hilbert space with non-associative scalars II. Math. Ann. 164(4), 291–316 (1966)
Hansen, L.P., Richard, S.F.: The role of conditioning information in deducing testable restrictions implied by dynamic asset pricing models. Econometrica 55(3), 587–613 (1987)
Horn, R.A., Johnson, C.R.: Matrix Analysis. Cambridge University Press, Cambridge (1990)
Kaplansky, I.: Modules over operator algebras. Am. J. Math. 75(4), 839–858 (1953)
Lütkepohl, H.: New Introduction to Multiple Time Series Analysis. Springer, Berlin (2005)
Meyer, C.D.: Matrix Analysis and Applied Linear Algebra. SIAM, Philadelphia (2000)
Ortu, F., Severino, F., Tamoni, A., Tebaldi, C.: A persistence-based Wold-type decomposition for stationary time series. Quant. Econ. 11(1), 203–230 (2020a)
Ortu, F., Severino, F., Tamoni, A., Tebaldi, C.: Supplement to A persistence-based Wold-type decomposition for stationary time series. Quant. Econ. (2020b). https://doi.org/10.3982/QE994
Raeburn, I., Williams, D.P.: Morita Equivalence and Continuous-Trace \(C^{\ast }\)-Algebras. American Mathematical Society, Providence (1998)
Rozanov, Y.A.: Stationary Random Processes. Holden-Day, San Francisco (1967)
Severino, F.: Isometric operators on Hilbert spaces and Wold decomposition of stationary time series. Decis. Econ. Finance 39(2), 203–234 (2016)
Sz-Nagy, B., Foias, C., Bercovici, H., Kérchy, L.: Harmonic Analysis of Operators on Hilbert Spaces. Springer, New York (2010)
Trefethen, L.N., Bau, D., III.: Numerical Linear Algebra. SIAM, Philadelphia (1997)
Wiener, N., Masani, P.: The prediction theory of multivariate stochastic processes. Acta Math. 98(1–4), 111–150 (1957)
Wold, H.: A Study in the Analysis of Stationary Time Series. Almqvist & Wiksells Boktryckeri, Uppsala (1938)
Acknowledgements
We thank Giorgio Primiceri for valuable insights and the participants at the 14th International Conference on Computational and Financial Econometrics (CFE), University of London (virtual, 2020), at the 2021 Annual Congress of the Swiss Society of Economics and Statistics (SSES), Universität Zürich (virtual, 2021) and at the 6th Canadian Conference in Applied Statistics, Concordia University (virtual, 2021).
Funding
S. Cerreia-Vioglio acknowledges the financial support of ERC Grant SDDM-TEA.
Ethics declarations
Conflict of interest
The authors declare that they do not have any financial or non-financial conflicts of interest.
Appendices
“Appendix A” focuses on the Hilbert A-module theory. “Appendix B” illustrates the derivation of the multivariate extended Wold decomposition (MEWD). “Appendix C” provides some illustrations of the MEWD, including an application to Blanchard and Quah (1989).
Appendix A Hilbert A-modules
In this appendix, we present a primer on Hilbert A-modules. The purpose is twofold: (a) to give a uniform and self-contained treatment of the topic, and (b) to present tools and results that are key for our theory and that we could not find in the literature. We will mostly focus our attention on the case in which A is the algebra of square matrices with real entries, but we will keep our setting abstract in order to avoid getting lost in unnecessary details.
Hilbert A-modules are to algebras as Hilbert spaces are to the real/complex field. In particular, one starts from the observation that the scalar field \(\mathbb {R}\) in a Hilbert space can be replaced by an abstract algebra A: for example, the algebra of matrices. All definitions (e.g., Definition 1) are then kept identical to the ones of the scalar case. Since the seminal paper of Kaplansky (1953), Hilbert A-modules have been widely studied in mathematics. In applications, their use seems to be scarcer, even though there are notable exceptions. In economics, a particular Hilbert A-module was studied and used by Hansen and Richard (1987) to prove a conditional version of the fundamental theorem of asset pricing, while in statistics, Wiener and Masani (1957) studied the complex version of the Hilbert A-module we use in our application, to provide a proof for the multivariate classical Wold decomposition (MCWD). This is consistent with the mathematical literature, which, starting with Kaplansky (1953), focused on complex algebras and developed very rapidly in a non-systematic, scattered way (see, e.g., Frank 1996 for an account). On the other hand, the real case received little attention. One notable exception is the paper of Goldstine and Horwitz (1966), which deals with the case we have at hand here: the algebra of square real matrices. As the appendix progresses, we will highlight the overlaps between our work and theirs. Most notably, they prove the Riesz representation theorem and the projection theorem, in reverse order. Both results are instrumental in proving the main result of this appendix, the Abstract Wold Theorem for Hilbert A-modules (Theorem 1).
The reader might be tempted to think that Hilbert A-modules behave exactly like Hilbert spaces. Indeed, one key feature which makes them appealing for applications is that most of the statements valid for Hilbert spaces seemingly carry over to this more general structure; the caveat is that the proofs do not always generalize in a similar fashion.
1.1 A.1 \(C^{*}\)-algebras: the new scalars
Let A be a real \(C^{*}\)-algebra with (multiplicative) unit e which is \(^{*}\)-isomorphic to the real \(C^{*}\)-algebra of bounded operators over a real Hilbert space. In particular, A is a real normed algebra with multiplicative unit e, we denote by \( \Vert \ \Vert _{A}\) the norm of A. We denote the norm dual of A by \( A^{*}\). Recall that A is also a \(C^{*}\)-algebra with unit, that is, there exists an involution \(^{*}:A\rightarrow A\) such that for each \( a,b\in A\) and \(\alpha \in \mathbb {R} \)
The involution also behaves well with respect to the norm, that is,
The algebra A is also naturally ordered by the order \(\ge \) induced by the closed convex cone of positive elements, which in addition satisfy \(a=a^{*}\) (in the real case, the requirement \(a=a^{*}\) is not redundant). We write \(A_{+}=\{a\in A:a\ge 0\}\). The following properties will be very useful in what follows:
-
1.
\(\Vert a \Vert _{A}=\Vert a^{*} \Vert _{A}\);
-
2.
If \(a\in A\), then we have that \(a^{*}a\in A_{+}\);
-
3.
If \(a\ge 0\), then \(bab^{*}\le \Vert a \Vert _{A}bb^{*}\);
-
4.
If \(a\ge b\ge 0\), then \(\Vert a \Vert _{A}\ge \Vert b \Vert _{A}\);
-
5.
If A is finite dimensional, then there exists a (continuous) linear functional \(\bar{\varphi }:A\rightarrow \mathbb {R}\) such that
$$\begin{aligned} a&\ge 0\implies \bar{\varphi }\left( a\right) \ge 0 \\ a&\ge 0\text { and }\bar{\varphi }\left( a\right) =0\iff a=0 \\ \bar{\varphi }\left( a\right)&=\bar{\varphi }\left( a^{*}\right) \qquad \forall a\in A \\ \exists K&>0\text { such that }\left\| a\right\| _{A}\le \bar{\varphi } \left( a\right) \le K\left\| a\right\| _{A}\qquad \forall a\ge 0. \end{aligned}$$
We will call a continuous and linear functional \(\bar{\varphi }\) that satisfies the first three properties of point 5 strictly positive, and a functional as in point 5 a trace. Since \(A_{+}\) is a closed convex cone, there exists a closed and convex set \(C\subseteq A^{*}\) such that
$$\begin{aligned} a\ge 0\iff \varphi \left( a\right) \ge 0\qquad \forall \varphi \in C. \end{aligned}$$(A1)
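For the matrix algebra used in the paper, the usual trace is such a functional. The quick numerical check below is our illustration; it takes \(\Vert \; \Vert _{A}\) to be the operator norm, for which \(K=m\) works in point 5.

```python
import numpy as np

# For A the m x m real matrices, phi(a) = Tr(a) is a trace in the above
# sense: for a >= 0 the operator norm of a is lambda_max(a), and
# lambda_max(a) <= Tr(a) <= m * lambda_max(a), so K = m suffices.
m = 3
rng = np.random.default_rng(0)
b = rng.standard_normal((m, m))
a = b @ b.T                       # a >= 0: symmetric positive semidefinite
op_norm = np.linalg.norm(a, 2)    # spectral norm = largest eigenvalue of a
assert op_norm <= np.trace(a) <= m * op_norm
assert np.isclose(np.trace(a), np.trace(a.T))   # phi(a) = phi(a*)
```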
1.2 A.2 Pre-Hilbert A-modules
Consider A as above. We next proceed by defining the objects we study in this paper.
Definition 1
An abelian group \(( H,+) \) is an A-module if and only if an outer product \(\cdot :A\times H\rightarrow H\) is well-defined with the following properties, for each \(a,b\in A\) and for each \( x,y\in H\):
-
(1)
\(a\cdot ( x+y) =a\cdot x+a\cdot y\);
-
(2)
\(( a+b) \cdot x=a\cdot x+b\cdot x\);
-
(3)
\(a\cdot ( b\cdot x) = ( ab ) \cdot x\);
-
(4)
\(e\cdot x=x\).
An A-module is a pre-Hilbert A-module if and only if an inner product \(\langle \;,\; \rangle _{H}:H\times H\rightarrow A\) is well-defined with the following properties, for each \( a\in A\) and for each \(x,y,z\in H\):
-
(5)
\(\langle x,x \rangle _{H}\ge 0\), with equality if and only if \( x=0 \);
-
(6)
\(\langle x,y \rangle _{H}=\langle y,x \rangle _{H}^{*}\);
-
(7)
\(\langle x+y,z \rangle _{H}= \langle x,z \rangle _{H}+ \langle y,z \rangle _{H}\);
-
(8)
\(\langle a\cdot x,y \rangle _{H}=a \langle x,y \rangle _{H}\).
For \(A=\mathbb {R}\), conditions (1)–(4) define vector spaces, while (5)–(8) define pre-Hilbert spaces (see Footnotes 2 and 3).
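For the application in the paper, A is the algebra of m × m real matrices and H a space of m-dimensional random vectors with \(\langle x,y \rangle _{H}=\mathbb {E}[xy^{\prime }]\). A finite-sample analogue makes the axioms easy to check numerically; the empirical inner product below is our illustration.

```python
import numpy as np

# Finite-sample analogue: store n draws of an m-dimensional random vector
# as the columns of an m x n matrix; the empirical inner product is then
# <X, Y> = X @ Y.T / n, and the outer product a . x acts as a @ X.
rng = np.random.default_rng(1)
m, n = 2, 500
X = rng.standard_normal((m, n))
Y = rng.standard_normal((m, n))
a = np.array([[1.0, 2.0],
              [0.0, 3.0]])
inner = lambda U, V: U @ V.T / n

assert np.allclose(inner(a @ X, Y), a @ inner(X, Y))   # property (8)
assert np.allclose(inner(X, Y), inner(Y, X).T)         # property (6)
```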
Given a pre-Hilbert A-module, by adapting the techniques of Raeburn and Williams (1998, Lemma 2.5) to the real case, we will show that
$$\begin{aligned} \langle x,y \rangle _{H}^{*} \langle x,y \rangle _{H}\le \Vert \langle x,x \rangle _{H} \Vert _{A} \langle y,y \rangle _{H}\qquad \forall x,y\in H, \end{aligned}$$
where \(\Vert \; \Vert _{A}\) is the norm of A.
Given an element \(y\in H\), note that \(\langle \;,\; \rangle _{H}\) induces an operator \(f:H\rightarrow A\) defined as \(f( x) = \langle x,y \rangle _{H}\) with the following properties:
-
\(\varvec{A}\)-linearity \(f( a\cdot x+b\cdot y) =af( x) +bf( y) \) for all \(a,b\in A\) and for all \(x,y\in H\);
-
Boundedness There exists \(M>0\) such that \(\Vert f( x) \Vert _{A}^{2}\le M \Vert \langle x,x \rangle _{H} \Vert _{A}\) for all \( x\in H\).
In light of this fact, we give the following definition:
Definition 2
Let H be a pre-Hilbert A-module. We say that H is self-dual if and only if for each \(f:H\rightarrow A\) which is A-linear and bounded there exists \(y\in H\) such that
It is rather easy to see that if H is self-dual, then each A-linear and bounded \(f:H\rightarrow A\) is represented by a unique vector y.
1.3 A.3 The vector space structure of H
In this section, we will first show that a pre-Hilbert A-module has a natural structure of vector space. Next, we will show that the A-valued inner product \(\langle \;,\; \rangle _{H}\) shares some of the properties of standard real-valued inner products. In particular, under the assumption that A admits a strictly positive functional \(\bar{\varphi }\), we will show that it also induces a real-valued inner product on H, thus making H into a pre-Hilbert space.
We use the outer product \(\cdot \) to define a scalar product:
$$\begin{aligned} \alpha \cdot ^{e}x=\left( \alpha e\right) \cdot x\qquad \forall \alpha \in \mathbb {R},\ \forall x\in H. \end{aligned}$$
We next show that \(\cdot ^{e}\) makes the abelian group H into a real vector space.
Proposition 5
Let H be an A-module. \(( H,+,\cdot ^{e}) \) is a real vector space.
Proof
By assumption, H is an abelian group. For each \( \alpha ,\beta \in \mathbb {R}\) and each \(x,y\in H\), we have that
-
(1)
\(\alpha \cdot ^{e}( x+y) =\alpha e\cdot ( x+y) =( \alpha e) \cdot x+( \alpha e) \cdot y=\alpha \cdot ^{e}x+\alpha \cdot ^{e}y\);
-
(2)
\(( \alpha +\beta ) \cdot ^{e}x=( ( \alpha +\beta ) e) \cdot x=( \alpha e+\beta e) \cdot x=( \alpha e) \cdot x+( \beta e) \cdot x=\alpha \cdot ^{e}x+\beta \cdot ^{e}x\);
-
(3)
\(\alpha \cdot ^{e}( \beta \cdot ^{e}x) =( \alpha e) \cdot ( ( \beta e) \cdot x) =( ( \alpha e) ( \beta e) ) \cdot x=( ( \alpha \beta ) e) \cdot x=( \alpha \beta ) \cdot ^{e}x\);
-
(4)
\(1\cdot ^{e}x=( 1e) \cdot x=e\cdot x=x\).
\(\blacksquare \)
From now on, we will often write \(\alpha x\) in place of \(\alpha \cdot ^{e}x\).
Corollary 6
Let H be an A-module. If \(f:H\rightarrow A\) is an A-linear operator, then f is linear.
Proof
Consider \(x,y\in H\) and \(\alpha ,\beta \in \mathbb {R} \). We have that
\(\blacksquare \)
Assume A admits a strictly positive functional \(\bar{\varphi }\). Define \( \langle \;,\; \rangle _{\bar{\varphi }}:H\times H\rightarrow \mathbb {R} \) by
$$\begin{aligned} \langle x,y \rangle _{\bar{\varphi }}=\bar{\varphi }\left( \langle x,y \rangle _{H}\right) \qquad \forall x,y\in H. \end{aligned}$$
Proposition 7
Let H be a pre-Hilbert A -module. If A admits a strictly positive functional \(\bar{\varphi }\), then \(\langle \;,\; \rangle _{\bar{\varphi }}\) is an inner product.
Proof
We prove four properties:
a. Consider \(x\in H\). By assumption, we have that \(\langle x,x\rangle _{H}\ge 0\). Since \(\bar{\varphi }\) is positive, it follows that \( \langle x,x \rangle _{\bar{\varphi }}=\bar{\varphi }( \langle x,x \rangle _{H}) \ge 0\). Since \(\bar{\varphi }\) is strictly positive and \(\langle x,x \rangle _{H}\ge 0 \), note also that
b. Consider \(x,y\in H\). Since \(\bar{\varphi }( a) =\bar{\varphi }( a^{*}) \) for all \(a\in A\), we have that
c. Consider \(x,y,z\in H\). Since \(\bar{\varphi }\) is linear, we obtain that
d. Consider \(x,y\in H\) and \(\alpha \in \mathbb {R} \). Since \(\bar{\varphi }\) is linear, we obtain that
Properties a–d yield the statement. \(\blacksquare \)
Corollary 8
Let H be a pre-Hilbert A-module. If A admits a strictly positive functional \(\bar{\varphi }\), then \(( H,+,\cdot ^{e}, \langle \;,\; \rangle _{\bar{\varphi }}) \) is a pre-Hilbert space.
Proposition 9
Let H be a pre-Hilbert A-module. The following statements are true:
-
1.
\(\langle x,y \rangle _{H}^{*} \langle x,y \rangle _{H}\le \Vert \langle x,x \rangle _{H} \Vert _{A} \langle y,y \rangle _{H}\) for all \( x,y\in H\);
-
2.
\(\Vert \langle x,y \rangle _{H} \Vert _{A}^{2}\le \Vert \langle x,x \rangle _{H} \Vert _{A} \Vert \langle y,y \rangle _{H} \Vert _{A}\) for all \( x,y\in H\);
-
3.
\(\Vert \langle x,y \rangle _{H}\Vert _{A}\le \Vert \langle x,x \rangle _{H} \Vert _{A}^{\frac{1}{2}} \Vert \langle y,y \rangle _{H} \Vert _{A}^{\frac{1}{2}}\) for all \(x,y\in H\).
Proof
Consider \(w,z\in H\) and assume that \(\langle w,z \rangle _{H}= \langle w,z \rangle _{H}^{*}\). Then, for each \(t\ge 0\)
Consider \(\varphi \in C\). It follows that
yielding that
Choose \(\bar{x},\bar{y}\in H\). Define \(w= \langle \bar{x},\bar{y} \rangle _{H}^{*} \cdot \bar{x}\) and \(z=\bar{y}\). It follows that
yielding that \(\langle w,z \rangle _{H}= \langle w,z \rangle _{H}^{*}\) and (A2) holds. In particular, we have that
Define \(a= \langle \bar{x},\bar{x} \rangle _{H}\) and \(b= \langle \bar{x}, \bar{y} \rangle _{H}^{*}\). Recall that \(bab^{*}\le \Vert a \Vert _{A}bb^{*}\) and \(bb^{*}\ge 0\). Thus, we have that
and \(\varphi ( \langle \bar{x},\bar{y} \rangle _{H}^{*} \langle \bar{x},\bar{y} \rangle _{H}) \ge 0\). We thus have that
Since \(\varphi \) was arbitrarily chosen, we have that (A3) holds for all \(\varphi \in C\), that is, by (A1)
Since \(\bar{x}\) and \(\bar{y}\) were arbitrarily chosen, the statement follows.
2. Consider \(x,y\in H\). Call \(a= \langle x,y \rangle _{H}\) and \(b= \Vert \langle x,x \rangle _{H} \Vert _{A} \langle y,y \rangle _{H}\). By point 1, we have that \(0\le a^{*}a\le b\). It follows that
3. It trivially follows from point 2. \(\blacksquare \)
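In the empirical matrix module sketched earlier, point 2 (the module Cauchy–Schwarz inequality) can be checked numerically, with \(\Vert \; \Vert _{A}\) the operator norm; this check is our illustration.

```python
import numpy as np

# Check ||<x,y>||_A^2 <= ||<x,x>||_A * ||<y,y>||_A on the empirical
# matrix-valued inner product <U, V> = U @ V.T / n.
rng = np.random.default_rng(2)
m, n = 3, 200
X = rng.standard_normal((m, n))
Y = rng.standard_normal((m, n))
inner = lambda U, V: U @ V.T / n
norm_A = lambda c: np.linalg.norm(c, 2)   # operator (spectral) norm

lhs = norm_A(inner(X, Y)) ** 2
rhs = norm_A(inner(X, X)) * norm_A(inner(Y, Y))
assert lhs <= rhs
```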
1.3.1 A.3.1 Topological structure
The \(\Vert \; \Vert _{H}\) norm
Define \(\Vert \; \Vert _{H}:H\rightarrow [ 0,+\infty )\) by
$$\begin{aligned} \Vert x \Vert _{H}=\Vert \langle x,x \rangle _{H} \Vert _{A}^{\frac{1}{2}}\qquad \forall x\in H. \end{aligned}$$
Proposition 10
Let H be a pre-Hilbert A-module. The following statements are true:
-
1.
\(\Vert \; \Vert _{H}\) is a norm;
-
2.
\(\Vert a\cdot x\Vert _{H}\le \Vert a\Vert _{A}\Vert x\Vert _{H}\) for all \(a\in A\) and for all \(x\in H\).
Proof
1. Note that
Note also that for each \(\alpha \in \mathbb {R} \) and \(x\in H\)
Finally, we have that for each \(x,y\in H\)
We can thus conclude that
proving that \(\Vert \; \Vert _{H}\) is a norm.
2. Given \(a\in A\) and \(x\in H\), define \(b=\left\langle x,x\right\rangle _{H}\ge 0\). We have
\(\blacksquare \)
By Proposition 9, it readily follows that
Corollary 11
Let H be a pre-Hilbert A-module. For each \(y\in H\), the functional \(\langle \cdot ,y \rangle _{H}:H\rightarrow A\) is A-linear, \( \Vert \; \Vert _{H}- \Vert \; \Vert _{A}\) continuous, and has norm \(\Vert y \Vert _{H}\).
Proof
Fix \(y\in H\). It is immediate to see that the operator induced by y is A-linear, thus, linear. Continuity easily follows from (A4). Since the norm of the linear operator is given by
the statement easily follows from (A4) and the definition of \(\Vert \; \Vert _{H}\). \(\blacksquare \)
The \(\Vert \; \Vert _{\bar{\varphi }}\) norm
Assume A admits a strictly positive functional \(\bar{\varphi }\). Define \( \Vert \; \Vert _{\bar{\varphi }}:H\rightarrow [ 0,+\infty ) \) by
$$\begin{aligned} \Vert x \Vert _{\bar{\varphi }}=\langle x,x \rangle _{\bar{\varphi }}^{\frac{1}{2}}=\bar{\varphi }\left( \langle x,x \rangle _{H}\right) ^{\frac{1}{2}}\qquad \forall x\in H. \end{aligned}$$
By Corollary 8, \(\langle \;,\; \rangle _{ \bar{\varphi }}\) is an inner product on H and it is immediate to see that \( \Vert \; \Vert _{\bar{\varphi }}\) is a norm and
Relations among norms
Assume A admits a strictly positive functional \(\bar{\varphi }\). Since \( \bar{\varphi }\) is a continuous linear functional, it follows that there exists \(K>0\) such that
This implies that
that is,
We can conclude that
Proposition 12
Let H be a pre-Hilbert A-module. If A is finite dimensional, then A admits a trace \(\bar{\varphi }\) and the norms \( \Vert \; \Vert _{\bar{\varphi }}\) and \(\Vert \; \Vert _{H}\) are equivalent.
Proof
Since A is finite dimensional, there exists \(K>0\) such that \(\Vert a \Vert _{A}\le \bar{\varphi }( a) \le K \Vert a \Vert _{A}\) for all \(a\ge 0\). It follows that
proving the statement. \(\blacksquare \)
1.4 A.4 Dual module
Given a pre-Hilbert A-module H, we define
$$\begin{aligned} H^{\sim }=\left\{ f:H\rightarrow A\ :\ f\text { is }A\text {-linear and bounded}\right\} . \end{aligned}$$
By definition of boundedness and \(\Vert \; \Vert _{H}\), we have that f is bounded if and only if there exists \(M>0\) such that
$$\begin{aligned} \Vert f\left( x\right) \Vert _{A}\le M \Vert x \Vert _{H}\qquad \forall x\in H. \end{aligned}$$
Recall that if \(f\in H^{\sim }\), then f is linear. Thus, in this case, we have that \(H^{\sim }\subseteq B( H,A) \), where the latter is the set of all bounded linear operators from H to A when H is endowed with \(\Vert \; \Vert _{H}\) and A is endowed with \(\Vert \; \Vert _{A}\).
Proposition 13
If H is a pre-Hilbert A-module, then \(H^{\sim }\) is an A-module.
Proof
Define \(+:H^{\sim }\times H^{\sim }\rightarrow H^{\sim }\) to be such that for each \(f,g\in H^{\sim }\)
$$\begin{aligned} \left( f+g\right) \left( x\right) =f\left( x\right) +g\left( x\right) \qquad \forall x\in H. \end{aligned}$$
In other words, \(+\) is the usual pointwise sum of operators. Define \( \cdot :A\times H^{\sim }\rightarrow H^{\sim }\) to be such that for each \(a\in A\) and for each \(f\in H^{\sim }\)
$$\begin{aligned} \left( a\cdot f\right) \left( x\right) =f\left( x\right) a^{*}\qquad \forall x\in H. \end{aligned}$$
It is immediate to verify that \(H^{\sim }\) is closed under \(+\) and \(\cdot \). In particular, \(( H^{\sim },+) \) is an abelian group. Note that for each \(a,b\in A\) and each \(f,g\in H^{\sim }\):
-
1.
\(( a\cdot ( f+g) ) ( x) =( ( f+g) ( x) ) a^{*}=( f( x) +g( x) ) a^{*}=f( x) a^{*}+g( x) a^{*}=( a\cdot f) ( x) +( a\cdot g) ( x) =( a\cdot f+a\cdot g) ( x) \) for all \(x\in H\), that is, \(a\cdot ( f+g) =a\cdot f+a\cdot g\).
-
2.
\(( ( a+b) \cdot f) ( x) =f( x) ( a+b) ^{*}=f( x) ( a^{*}+b^{*}) =f( x) a^{*}+f( x) b^{*}=( a\cdot f) ( x) +( b\cdot f) ( x) =( a\cdot f+b\cdot f) ( x) \) for all \(x\in H\), that is, \(( a+b) \cdot f=a\cdot f+b\cdot f\).
-
3.
\(( a\cdot ( b\cdot f) ) ( x) =( ( b\cdot f) ( x) ) a^{*}=( f( x) b^{*}) a^{*}=f( x) ( b^{*}a^{*}) =f( x) ( ab) ^{*}=( ( ab) \cdot f) ( x) \) for all \(x\in H\), that is, \(a\cdot ( b\cdot f) =( ab) \cdot f \).
-
4.
\(( e\cdot f) ( x) =f( x) e^{*}=f( x) e=f( x) \) for all \(x\in H\), that is, \(e\cdot f=f\).\(\blacksquare \)
Since \(H^{\sim }\) is an A-module, it is also a vector space. Note that the scalar product \(\cdot ^{e}\) coincides with the usual scalar product defined on B(H, A) once restricted to \(H^{\sim }\). Thus, we can also define a norm \( \Vert \; \Vert _{H^{\sim }}:H^{\sim }\rightarrow [0,+\infty )\) by
$$\begin{aligned} \Vert f \Vert _{H^{\sim }}=\sup _{\Vert x \Vert _{H}\le 1}\Vert f\left( x\right) \Vert _{A}\qquad \forall f\in H^{\sim }. \end{aligned}$$
Define \(S^{\sim }:H\rightarrow H^{\sim }\) by
$$\begin{aligned} S^{\sim }\left( y\right) =\langle \cdot ,y \rangle _{H}\qquad \forall y\in H. \end{aligned}$$
Given Corollary 11 and the properties of \(\langle \;,\; \rangle _{H}\), the map \(S^{\sim }\) is well-defined and linear. In fact, for each \(\alpha ,\beta \in \mathbb {R} \) and for each \(x,y,z\in H\)
proving that
Proposition 14
Let H be a pre-Hilbert A-module. The following statements are true:
-
1.
\(H^{\sim }\) is \(\Vert \; \Vert _{H^{\sim }}\) complete.
-
2.
\(S^{\sim }\) is an isometry, that is, \(\Vert S^{\sim }( y) \Vert _{H^{\sim }}= \Vert y \Vert _{H}\) for all \(y\in H\).
-
3.
If H is self-dual, then \(S^{\sim }\) is onto and H is \(\Vert \; \Vert _{H}\) complete.
Proof
1. By Proposition 13, \(H^{\sim } \) is an A-module. In particular, \(H^{\sim }\) is a vector subspace of B(H, A). Consider a \(\Vert \; \Vert _{H^{\sim }}\) Cauchy sequence \( \{f_{n}\}_{n\in \mathbb {N} }\subseteq H^{\sim }\subseteq B(H,A)\). By Aliprantis and Border (2006, Theorem 6.6) and since A is \(\left\| \; \right\| _{A}\) complete, we have that there exists \(f\in B(H,A)\) such that \(f_{n}\overset{\Vert \; \Vert _{H^{\sim }}}{\rightarrow }f\). We are left to show that f is A-linear. First, observe that \(f:H\rightarrow A\) is such that
$$\begin{aligned} f( x) =\lim _{n}f_{n}( x) \quad \text {for all } x\in H, \end{aligned}$$
where the limit is in norm \(\Vert \; \Vert _{A}\). We can conclude that for each \(a,b\in A\) and \(x,y\in H\)
$$\begin{aligned} af_{n}( x) +bf_{n}( y) \overset{\Vert \; \Vert _{A}}{\rightarrow }af( x) +bf( y) . \end{aligned}$$
At the same time, \(af_{n}(x)+bf_{n}(y)=f_{n}(a\cdot x+b\cdot y)\overset{ \Vert \; \Vert _{A}}{\rightarrow }f(a\cdot x+b\cdot y)\) for all \(a,b\in A\) and \(x,y\in H\). By the uniqueness of the limit, we can conclude that \( f(a\cdot x+b\cdot y)=af(x)+bf(y)\) for all \(a,b\in A\) and \(x,y\in H\), proving the statement.
2. Recall that \(S^{\sim }:H\rightarrow H^{\sim }\) is defined by
$$\begin{aligned} S^{\sim }( y) ( x) =\langle x,y\rangle _{H} \quad \text {for all } x,y\in H. \end{aligned}$$
By Corollary 11, it follows that \(\Vert S^{\sim }(y)\Vert _{H^{\sim }}=\Vert y\Vert _{H}\) for all \(y\in H\).
3. If H is self-dual, it is immediate to see that \(S^{\sim }\) is onto. Consider a \(\Vert \; \Vert _{H}\) Cauchy sequence \(\{ x_{n} \} _{n\in \mathbb {N} }\subseteq H\). Since \(S^{\sim }\) is an isometry, it follows that \(\{ S^{\sim }( x_{n}) \} _{n\in \mathbb {N} }\) is a \(\Vert \; \Vert _{H^{\sim }}\) Cauchy sequence in \(H^{\sim }\). Since \(H^{\sim }\) is \(\Vert \; \Vert _{H^{\sim }}\) complete and \( S^{\sim } \) is onto, it follows that there exists \(f\in H^{\sim }\) such that \( S^{\sim }( x_{n}) \overset{ \Vert \; \Vert _{H^{\sim }}}{\rightarrow } f=S^{\sim }( x) \) for some \(x\in H\). Since \(S^{\sim }\) is an isometry, we have that \(x_{n}\overset{ \Vert \; \Vert _{H}}{\rightarrow }x\), proving that H is \(\Vert \; \Vert _{H}\) complete. \(\blacksquare \)
1.5 A.5 Self-duality
Theorem 15
Let A be finite dimensional and H a pre-Hilbert A-module. The following statements are equivalent:
-
(i)
H is \(\Vert \; \Vert _{H}\) complete, that is, H is a Hilbert A-module;
-
(ii)
H is \(\Vert \; \Vert _{\bar{\varphi }}\) complete;
-
(iii)
H is self-dual.
Proof
Since A is finite dimensional, it admits a trace \(\bar{\varphi }\).
(i) implies (ii). By Proposition 12 and since A is finite dimensional, \(\Vert \; \Vert _{\bar{\varphi }}\) and \(\Vert \; \Vert _{H}\) are equivalent. It follows that H is \(\Vert \; \Vert _{\bar{\varphi }}\) complete.
(ii) implies (iii). By Corollary 8 and since H is \( \Vert \; \Vert _{\bar{\varphi }}\) complete, it follows that H is a Hilbert space with inner product \(\langle \text { },\; \rangle _{\bar{ \varphi }}\). Consider \(f:H\rightarrow A\) which is A-linear and bounded. In particular, by the proof of Proposition 12, we have that there exists \(M>0\) such that
$$\begin{aligned} \Vert f( x) \Vert _{A}\le M \Vert x \Vert _{\bar{\varphi }} \quad \text {for all } x\in H. \end{aligned}$$
We can conclude that \(f:H\rightarrow A\) is linear and \(\Vert \; \Vert _{ \bar{\varphi }}-\Vert \; \Vert _{A}\) continuous. Consider the linear functional \(l=\bar{\varphi }\circ f\). Since \(\bar{\varphi }\) is \(\Vert \; \Vert _{A}\) continuous and f is \(\Vert \; \Vert _{\bar{\varphi } }-\Vert \; \Vert _{A}\) continuous, we have that l is \(\Vert \; \Vert _{\bar{\varphi }}\) continuous. By the standard Riesz representation theorem (Brezis 2011, Theorem 5.5), there exists (a unique) \(y\in H\) such that \(l(x)=\langle x,y\rangle _{\bar{\varphi }}\) for all \(x\in H\). It follows that, for all \(x\in H\),
$$\begin{aligned} \bar{\varphi }( f( x) ) =l( x) =\langle x,y\rangle _{\bar{\varphi }}=\bar{\varphi }( \langle x,y\rangle _{H}) . \end{aligned}$$
(A7)
Fix \(\bar{x}\in H\). Define \(a=(f(\bar{x})-\langle \bar{x},y\rangle _{H})^{*}\in A\). By Eq. (A7), we have that
$$\begin{aligned} \bar{\varphi }( aa^{*}) =\bar{\varphi }( a( f( \bar{x}) -\langle \bar{x},y\rangle _{H}) ) =\bar{\varphi }( f( a\cdot \bar{x}) ) -\bar{\varphi }( \langle a\cdot \bar{x},y\rangle _{H}) =0. \end{aligned}$$
Since \(\bar{\varphi }\) is a trace and \(aa^{*}\ge 0\), this implies that \( aa^{*}=0\), that is, \(\Vert a^{*}\Vert _{A}^{2}=\Vert aa^{*}\Vert _{A}=0\). We can conclude that \(f(\bar{x})-\langle \bar{x},y\rangle _{H}=a^{*}=0\). Since \(\bar{x}\) was arbitrarily chosen, it follows that \( f(x)=\langle x,y\rangle _{H}\) for all \(x\in H\), proving that H is self-dual.
(iii) implies (i). By point 3 of Proposition 14, it follows that H is \(\Vert \; \Vert _{H}\) complete. \(\blacksquare \)
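The finite-dimensional setting of Theorem 15 can be illustrated concretely. The sketch below is an assumption-laden toy example, not taken from the paper: A is the algebra of \(m\times m\) real matrices, \(H=A^{n}\) with \(\langle x,y\rangle _{H}=\sum _{i}x_{i}y_{i}^{\prime }\), and \(\bar{\varphi }\) is the trace. It checks numerically the norm equivalence behind the implication (i) implies (ii), namely \(\Vert x\Vert _{H}\le \Vert x\Vert _{\bar{\varphi }}\le \sqrt{m}\,\Vert x\Vert _{H}\).

```python
import numpy as np

# Toy module (assumptions): A = m x m real matrices, H = A^n,
# <x, y>_H = sum_i x_i y_i', phi-bar = trace.
rng = np.random.default_rng(0)
m, n = 3, 4

def inner_H(x, y):
    # A-valued inner product
    return sum(xi @ yi.T for xi, yi in zip(x, y))

def norm_H(x):
    # ||x||_H = ||<x, x>_H||_A^(1/2), with || ||_A the operator norm on A
    return np.linalg.norm(inner_H(x, x), 2) ** 0.5

def norm_phibar(x):
    # ||x||_phibar = (Tr <x, x>_H)^(1/2)
    return float(np.trace(inner_H(x, x))) ** 0.5

# since <x, x>_H is positive semidefinite, its largest eigenvalue is bounded
# by its trace, and its trace by m times the largest eigenvalue
equivalent = True
for _ in range(200):
    x = [rng.standard_normal((m, m)) for _ in range(n)]
    equivalent &= norm_H(x) - 1e-9 <= norm_phibar(x) <= m ** 0.5 * norm_H(x) + 1e-9
print(bool(equivalent))  # True
```

Because the two norms are equivalent, completeness in one is completeness in the other, which is exactly how the proof of (i) implies (ii) proceeds.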
The implication (ii) implies (iii) can be found in Goldstine and Horwitz (1966), although a few mathematical differences are present. Namely, Goldstine and Horwitz use a different norm over A. The characterization of self-duality, that is, the remaining implications, is novel to the best of our knowledge. A similar observation holds for the implication (i) implies (ii) of Proposition 17.
1.5.1 A.5.1 Orthogonal decompositions
Pre-Hilbert A-modules behave very much like Hilbert spaces also in terms of orthogonal decompositions. Consider a pre-Hilbert A-module H and let \( M\subseteq H\). Define
$$\begin{aligned} M^{\bot }=\left\{ x\in H:\langle x,y \rangle _{H}=0 \text { for all } y\in M\right\} . \end{aligned}$$
If M is nonempty, then it is immediate to prove that \(M^{\bot }\) is a submodule. It is also immediate to show that \(M\cap M^{\bot }=\{0\}\) and that \(M^{\bot \bot }\supseteq M\) where
$$\begin{aligned} M^{\bot \bot }=\left( M^{\bot }\right) ^{\bot }. \end{aligned}$$
Before stating our result on orthogonal decompositions, we need an ancillary fact.
Lemma 16
Let H be a pre-Hilbert A-module. If \( M\subseteq H\), then \(M^{\bot }\) is \(\Vert \; \Vert _{H}\) closed.
Proof
Fix \(z\in H\) and define \(\ker \{ z \} = \{ x\in H: \langle x,z \rangle _{H}=0 \} \). Consider a sequence \(\{ x_{n} \} _{n\in \mathbb {N} }\subseteq \ker \{ z \} \) such that \(x_{n}\overset{ \Vert \; \Vert _{H} }{\rightarrow }x\). Since \(S^{\sim }( z) \) is \(\Vert \; \Vert _{H}- \Vert \; \Vert _{A}\) continuous, it follows that \(S^{\sim }( z) ( x) =0\), that is, \(\ker \{ z \} \) is closed. Since \(M^{\bot }=\bigcap _{y\in M}\ker \{ y \} \), the statement follows. \(\blacksquare \)
Proposition 17
Let A be finite dimensional and H a pre-Hilbert A-module. If H is self-dual and M is a submodule of H, then the following statements are equivalent:
-
(i)
M is \(\Vert \; \Vert _{H}\) closed;
-
(ii)
\(H=M\oplus M^{\bot }\);
-
(iii)
\(M=M^{\bot \bot }\).
Proof
(i) implies (ii). Clearly, \(M\oplus M^{\bot }\subseteq H\). We next prove the opposite inclusion. Since M is a submodule of H, if we define \(\langle \;,\; \rangle _{M}\) as the restriction of \(\langle \;,\; \rangle _{H}\) to \(M\times M\), then \((M,+,\cdot ,\langle \;,\; \rangle _{M})\) is a pre-Hilbert A-module. It is immediate to see that \(\Vert \; \Vert _{M}=\Vert \; \Vert _{H}\) once the latter is restricted to M. By Theorem 15 and since M is \(\Vert \; \Vert _{H}\) closed, it follows that M is \(\Vert \; \Vert _{M}\) complete and is itself self-dual. Fix \(y\in H\). The map defined on M by \( x\mapsto \langle x,y\rangle _{H}\) is A-linear and bounded. Since M is self-dual, it follows that there exists a unique \(y_{1}\in M\) such that \( \left\langle x,y_{1}\right\rangle _{H}=\left\langle x,y_{1}\right\rangle _{M}=\left\langle x,y\right\rangle _{H}\) for all \(x\in M\). Define \( y_{2}=y-y_{1}\). It follows that \(\left\langle x,y-y_{1}\right\rangle _{H}=0\) for all \(x\in M\), that is, \(y_{2}\in M^{\bot }\). It is also immediate to see that \(y_{1}+y_{2}=y\). Since y was arbitrarily chosen, we can conclude that \(H\subseteq M\oplus M^{\bot }\).
(ii) implies (iii). Since \(M\subseteq M^{\bot \bot }\), we only need to prove the opposite inclusion. By assumption, if \(x\in M^{\bot \bot }\), then there exists \(x_{M}\in M\ \)and \(x_{M^{\bot }}\in M^{\bot }\) such that \( x=x_{M}+x_{M^{\bot }}\). Since \(M\subseteq M^{\bot \bot }\), we have that \( M^{\bot }\ni x_{M^{\bot }}=x-x_{M}\in M^{\bot \bot }\). Since \(M^{\bot }\cap M^{\bot \bot }=\{0\}\), this implies that \(x-x_{M}=0\), that is, \(x=x_{M}\in M\), proving the opposite inclusion and the statement.
(iii) implies (i). By Lemma 16 and since \( M=M^{\bot \bot }=(M^{\bot })^{\bot }\), it follows that M is \(\Vert \; \Vert _{H}\) closed.\(\blacksquare \)
We conclude with a last piece of notation: given \(M,N\subseteq H\), we write \( M\bot N\) if and only if \(\langle x,y \rangle _{H}=0\) for all \(x\in M\) and \( y\in N\). Clearly, we have that \(M\bot M^{\bot }\) for all \(M\subseteq H\).
Proposition 17 allows us to define the (orthogonal) projection of an element \(x \in H\) on a \(\Vert \; \Vert _{H}\) closed submodule M.
Definition 3
Let A be finite dimensional, H a Hilbert A -module, and \(M\subseteq H\) a \(\Vert \; \Vert _{H}\) closed submodule. We call projection on M the linear map \(\mathcal {P} _{M}:H\rightarrow M\) such that, for any \(x\in H\),
$$\begin{aligned} \mathcal {P}_{M}x=x_{M}, \end{aligned}$$
where \(x_{M}\in M\) and \(x_{M^{\bot }}\in M^{\bot }\) are the unique elements that satisfy \(x=x_{M}+x_{M^{\bot }}\).
Given \(x\in H\) and \(y\in M\), we have that \(y=\mathcal {P}_{M}x\) if and only if \(\langle x-y,z\rangle _{H}=0\) for all \(z\in M\). Moreover, since \(\mathcal { P}_{M}x\in M\) and \(x-\mathcal {P}_{M}x\in M^{\perp }\), \(\mathcal {P}_{M}x\) minimizes the distance between x and the submodule M since, for all \(z\in M\),
$$\begin{aligned} \langle x-z,x-z\rangle _{H}=\langle x-\mathcal {P}_{M}x,x-\mathcal {P}_{M}x\rangle _{H}+\langle \mathcal {P}_{M}x-z,\mathcal {P}_{M}x-z\rangle _{H}\ge \langle x-\mathcal {P}_{M}x,x-\mathcal {P}_{M}x\rangle _{H}. \end{aligned}$$
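The projection can be made concrete in a toy module. The following sketch is not from the paper: it assumes \(H=A^{n}\) with \(A\) the \(m\times m\) real matrices, \(\langle x,y\rangle _{H}=\sum _{i}x_{i}y_{i}^{\prime }\), and a submodule \(M=A\cdot v\) generated by a single \(v\) with \(\langle v,v\rangle _{H}=e\), for which \(\mathcal {P}_{M}x=\langle x,v\rangle _{H}\cdot v\) is an explicit candidate projection. The code checks the two properties stated above: orthogonality of the residual and minimality of the distance.

```python
import numpy as np

# Assumptions (illustrative, not the paper's construction): H = A^n with
# A = m x m real matrices, <x, y>_H = sum_i x_i y_i', module action
# a . x = (a x_1, ..., a x_n), and M = A.v with <v, v>_H = identity.
rng = np.random.default_rng(1)
m, n = 2, 3

def inner_H(x, y):
    return sum(xi @ yi.T for xi, yi in zip(x, y))

def scal(a, x):
    # module action a . x = (a x_1, ..., a x_n)
    return [a @ xi for xi in x]

def sub(x, y):
    return [xi - yi for xi, yi in zip(x, y)]

def norm_phibar(x):
    # trace norm ||x||_phibar = (Tr <x, x>_H)^(1/2)
    return float(np.trace(inner_H(x, x))) ** 0.5

v = [np.eye(m) / np.sqrt(n)] * n          # <v, v>_H = n * (I/n) = I
x = [rng.standard_normal((m, m)) for _ in range(n)]

Px = scal(inner_H(x, v), v)               # candidate projection <x, v>_H . v
r = sub(x, Px)                            # residual x - P_M x

orthogonal = np.allclose(inner_H(r, v), 0)    # residual lies in M^perp
# P_M x minimizes the distance from x to M (compared with random a . v)
minimal = all(
    norm_phibar(r)
    <= norm_phibar(sub(x, scal(rng.standard_normal((m, m)), v))) + 1e-9
    for _ in range(100)
)
print(orthogonal, minimal)  # True True
```

The minimality check is the Pythagorean identity in the trace norm: the residual is orthogonal to every element of M, so no other point of M can be closer.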
1.6 A.6 The abstract Wold theorem for Hilbert A-modules
In this section, we prove a (generalized) version for Hilbert A-modules of the Abstract Wold Theorem (see, e.g., Sz-Nagy et al. 2010, Theorem 1.1). It is important to observe that the properties of self-duality and complementability (see Theorem 15 and Proposition 17) are fundamental in allowing us to follow the proof strategy used for Hilbert spaces.
Definition 4
We say that \(\textbf{T}:H\rightarrow H\) is an isometry if and only if \(\textbf{T}\) is A-linear and such that
$$\begin{aligned} \langle \textbf{T}x,\textbf{T}y\rangle _{H}=\langle x,y\rangle _{H} \quad \text {for all } x,y\in H. \end{aligned}$$
(A8)
Note that an isometry in this sense satisfies the usual property
$$\begin{aligned} \Vert \textbf{T}x\Vert _{H}= \Vert x \Vert _{H} \quad \text {for all } x\in H. \end{aligned}$$
(A9)
It is immediate to prove by induction that for each \(n\in \mathbb {N}_{0}\) the iterate \(\textbf{T}^{n}\) satisfies Eqs. (A8) and (A9). In particular, by Abramovich and Aliprantis (2002, Theorem 2.5), if H is \(\Vert \; \Vert _{H}\) complete, \(\textbf{T}^{n}H\) is a \(\Vert \; \Vert _{H}\) closed submodule of H.
Definition 5
Let \(\textbf{T}:H\rightarrow H\) be an isometry. We say that a submodule L is wandering if and only if \(\textbf{T} ^{n} L \bot \textbf{T}^{m} L\) for all \(m,n\in \mathbb {N}_{0}\) such that \( m\not =n\).
Lemma 18
Let A be finite dimensional and H a pre-Hilbert A-module. If H is self-dual and \(\textbf{T}:H\rightarrow H\) an isometry, then the following statements are true:
-
1.
If M is \(\Vert \ \Vert _{H}\) closed, so is \(\textbf{T}M \).
-
2.
If \(L=(\textbf{T}H) ^{\bot }\), then L is wandering.
-
3.
If \(L=(\textbf{T}H) ^{\bot }\), then for each \(n\in \mathbb {N} _{0}\)
$$\begin{aligned} \textbf{T}^{n} H =\textbf{T}^{n} L \oplus \textbf{T}^{n+1} H \quad \text {and} \quad \textbf{T}^{n} L \bot \textbf{T}^{n+1} H . \end{aligned}$$ -
4.
If \(L=(\textbf{T}H) ^{\bot }\), then for each \(k\in \mathbb {N} _{0}\)
$$\begin{aligned} \bigoplus _{n=0}^{k}\textbf{T}^{n} L =\left( \textbf{T}^{k+1} H\right) ^{\bot }. \end{aligned}$$
Proof
1. Since \(\textbf{T}\) is A-linear, that is, for each \(a,b\in A\) and each \(x,y\in H\)
$$\begin{aligned} \textbf{T}( a\cdot x+b\cdot y) =a\cdot \textbf{T}x+b\cdot \textbf{T}y, \end{aligned}$$
we have that \(\textbf{T}\) is linear. By the proof of Abramovich and Aliprantis (2002, Theorem 2.5) and since \(\textbf{T }\) satisfies Eq. (A9), we have that \(\textbf{T}M\) is closed.
2. Observe that \(\textbf{T}^{n}H \subseteq H\) for all \(n\in \mathbb {N} _{0}\). It follows that \(\textbf{T}^{n} H \subseteq \textbf{T}H\) for all \(n\in \mathbb {N} \). Since \(L\subseteq H\), it also follows that \(\textbf{T}^{n} L \subseteq \textbf{T}^{n} H \subseteq \textbf{T}H \) for all \(n\in \mathbb {N} \). Since \(\textbf{T}H \bot L\), this implies that \(\textbf{T}^{n} L \bot L\) for all \(n\in \mathbb {N} \). Next, consider \(m,n\in \mathbb {N} _{0}\) such that \(m\not =n\). Without loss of generality, assume that \(n>m\). By the previous part of the proof, we have that \(\textbf{T}^{n-m} L \bot L\). By Eq. (A8), we can conclude that \(\textbf{T}^{n} L \bot \textbf{T}^{m} L \).
3. We proceed by induction.
Initial Step. \(n=0\). By definition of L, point 1, and Proposition 17 and since H is self-dual, we have that L is a \(\Vert \; \Vert _{H}\) closed submodule and
$$\begin{aligned} H=L\oplus \textbf{T}H \quad \text {and} \quad L\bot \textbf{T}H, \end{aligned}$$
(A10)
proving the step.
Inductive Step. Assume the statement is true for n. By assumption, it follows that \(\textbf{T}^{n}H=\textbf{T}^{n}L\oplus \textbf{T} ^{n+1}H\) and \(\textbf{T}^{n}L\bot \textbf{T}^{n+1}H\). By Eq. (A8), we have that
$$\begin{aligned} \textbf{T}^{n+1}L\bot \textbf{T}^{n+2}H, \end{aligned}$$
as well as
$$\begin{aligned} \textbf{T}^{n+1}H=\textbf{T}\left( \textbf{T}^{n}L\oplus \textbf{T}^{n+1}H\right) =\textbf{T}^{n+1}L\oplus \textbf{T}^{n+2}H, \end{aligned}$$
where the last equality follows from (A10). The statement follows by induction.
4. We proceed by induction.
Initial Step. \(k=0\). By definition of L,
$$\begin{aligned} \bigoplus _{n=0}^{0}\textbf{T}^{n}L=L=\left( \textbf{T}H\right) ^{\bot }. \end{aligned}$$
Inductive Step. Assume the statement is true for k. By assumption, it follows that \(\bigoplus _{n=0}^{k}\textbf{T}^{n}L=(\textbf{T} ^{k+1}H)^{\bot }\). By Proposition 17 and since H is self-dual and since \(\textbf{T}^{k+1}H\) is a \(\Vert \; \Vert _{H}\) closed submodule, this implies that
$$\begin{aligned} H=\textbf{T}^{k+1}H\oplus \left( \textbf{T}^{k+1}H\right) ^{\bot }=\textbf{T}^{k+1}H\oplus \bigoplus _{n=0}^{k}\textbf{T}^{n}L. \end{aligned}$$
At the same time, by point 3, we also have that \(\textbf{T}^{k+1}H=\textbf{T} ^{k+1}L\oplus \textbf{T}^{k+2}H\) and \(\textbf{T}^{k+1}L\bot \textbf{T} ^{k+2}H \). We can conclude that
$$\begin{aligned} \left( \textbf{T}^{k+2}H\right) ^{\bot }=\bigoplus _{n=0}^{k}\textbf{T}^{n}L\oplus \textbf{T}^{k+1}L=\bigoplus _{n=0}^{k+1}\textbf{T}^{n}L. \end{aligned}$$
The statement follows by induction. \(\blacksquare \)
Theorem 1
Let A be finite dimensional and H a pre-Hilbert A-module. If H is self-dual and \(\textbf{T}:H\rightarrow H\) is an isometry, then \(H=\widehat{H}\oplus \widetilde{H}\) where
$$\begin{aligned} \widehat{H}=\bigcap _{n=0}^{\infty }\textbf{T}^{n}H \quad \text {and} \quad \widetilde{H}=\bigoplus _{n=0}^{\infty }\textbf{T}^{n}L \quad \text {with } L=\left( \textbf{T}H\right) ^{\bot }. \end{aligned}$$
Moreover, \(( \widehat{H}, \widetilde{H}) \) is the unique orthogonal decomposition of H into submodules such that \({\textbf {T}}\widehat{H} =\widehat{H}\) and \(\widetilde{H}=\bigoplus _{n=0}^{\infty } {\textbf {T}}^{n} L \) for some wandering submodule L.
Proof
Define \(L=(\textbf{T}H)^{\bot }\). Define also \( M_{k}=\bigoplus _{n=0}^{k}\textbf{T}^{n}L\) for all \(k\in \mathbb {N} _{0}\), \(\widetilde{H}=\bigoplus _{n=0}^{\infty }\textbf{T}^{n}L\), and \(\widehat{H}=\widetilde{H}^{\bot }\). It is immediate to see that \(\widehat{H}\) and \(\widetilde{H}\) are two \(\Vert \; \Vert _{H}\) closed submodules. Note that \(M_{k}\subseteq M_{k+1}\) for all \(k\in \mathbb {N} _{0}\) and \(\widetilde{H}=\textrm{cl}_{\Vert \; \Vert _{H}}(\bigcup _{k\in \mathbb {N} _{0}}M_{k})\). By construction, we have that \(\widehat{H}\bot M_{k}\) for all \( k\in \mathbb {N} _{0}\). By Lemma 18, it follows that \(M_{k}=( \textbf{T}^{k+1}H)^{\bot }\) for all \(k\in \mathbb {N} _{0}\). By Proposition 17, this implies that if \( x\in \widehat{H}\), then \(x\in M_{k}^{\bot }=\textbf{T}^{k+1}H\) for all \(k\in \mathbb {N} _{0}\). We can conclude that \(x\in \bigcap _{n=1}^{\infty }\textbf{T}^{n}H\cap H=\bigcap _{n=0}^{\infty }\textbf{T}^{n}H\). Vice versa, since \(M_{k}=(\textbf{ T}^{k+1}H)^{\bot }\) for all \(k\in \mathbb {N} _{0}\), if \(x\in \bigcap _{n=0}^{\infty }\textbf{T}^{n}H\), then
$$\begin{aligned} x\bot M_{k} \quad \text {for all } k\in \mathbb {N} _{0}. \end{aligned}$$
Since \(\textrm{cl}_{\Vert \; \Vert _{H}}(\bigcup _{n=0}^{\infty }M_{n})= \widetilde{H}\), this implies that \(x\in \widetilde{H}^{\bot }=\widehat{H}\). In other words, we proved that \(\widehat{H}=\bigcap _{n=0}^{\infty }\textbf{T} ^{n}H\).
We next prove uniqueness. Since \(\textbf{T}^{n+1}H\subseteq \textbf{T} ^{n}H\subseteq H\) for all \(n\in \mathbb {N} _{0}\), it follows that
$$\begin{aligned} \textbf{T}\widehat{H}=\textbf{T}\bigcap _{n=0}^{\infty }\textbf{T}^{n}H=\bigcap _{n=1}^{\infty }\textbf{T}^{n}H=\bigcap _{n=0}^{\infty }\textbf{T}^{n}H=\widehat{H}. \end{aligned}$$
Assume that \((\widehat{H}^{\prime },\widetilde{H}^{^{\prime }})\) is another decomposition. Consider the wandering submodule \(L^{\prime }\) generating \( \widetilde{H}^{\prime }\). By construction and since \(L^{\prime }\) is wandering, we have that \(L^{\prime }\bot \textbf{T}\widetilde{H}^{\prime }\) and \(L^{\prime }\oplus \textbf{T}\widetilde{H}^{\prime }=\widetilde{H} ^{\prime }\). By construction and Eq. (A8), this implies that
$$\begin{aligned} L^{\prime }\bot \textbf{T}H \quad \text {and} \quad L^{\prime }\oplus \textbf{T}H=L^{\prime }\oplus \textbf{T}\widetilde{H}^{\prime }\oplus \textbf{T}\widehat{H}^{\prime }=\widetilde{H}^{\prime }\oplus \widehat{H}^{\prime }=H, \end{aligned}$$
that is, \(L^{\prime }=( \textbf{T}H) ^{\bot }=L\) and \(\widetilde{H}^{\prime }=\bigoplus _{n=0}^{\infty }\textbf{T}^{n}L=\widetilde{H}\),
proving the statement. \(\blacksquare \)
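The abstract Wold decomposition can be visualized in the standard shift example. The sketch below is an illustration under stated assumptions, not the paper's construction: H is the set of finitely supported sequences in \(\ell ^{2}(A)\) with A the \(m\times m\) real matrices, and \(\textbf{T}\) is the right shift. Then L consists of sequences supported on coordinate 0, and the purely wandering part exhausts H.

```python
import numpy as np

# Assumed toy module: H = finitely supported A-valued sequences,
# <x, y>_H = sum_i x_i y_i', and T the right shift (an isometry).
rng = np.random.default_rng(2)
m = 2
Z = np.zeros((m, m))

def inner_H(x, y):
    # zero-pad the shorter sequence to a common length
    k = max(len(x), len(y))
    xp, yp = x + [Z] * (k - len(x)), y + [Z] * (k - len(y))
    return sum(xi @ yi.T for xi, yi in zip(xp, yp))

def T(x, times=1):
    # right shift: T(x_0, x_1, ...) = (0, x_0, x_1, ...)
    return [Z] * times + x

x = [rng.standard_normal((m, m)) for _ in range(4)]
y = [rng.standard_normal((m, m)) for _ in range(4)]
# Eq. (A8): the shift preserves the A-valued inner product
satisfies_A8 = np.allclose(inner_H(T(x), T(y)), inner_H(x, y))

# L = (T H)^perp = sequences supported on coordinate 0 is wandering:
# distinct iterates T^i L and T^j L are orthogonal
l1, l2 = [rng.standard_normal((m, m))], [rng.standard_normal((m, m))]
wandering = all(
    np.allclose(inner_H(T(l1, i), T(l2, j)), 0)
    for i in range(5) for j in range(5) if i != j
)
print(satisfies_A8, wandering)  # True True
```

In this example every element of \(\textbf{T}^{n}H\) vanishes on the first n coordinates, so \(\bigcap _{n}\textbf{T}^{n}H=\{0\}\) and the decomposition reduces to its wandering part, mirroring the purely non-deterministic case used later for time series.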
1.7 A.7 Adjoints
Given a pre-Hilbert A-module, we denote by \(B^{\sim }( H) \) the collection of all bounded A-linear operators. In other words, \(\textbf{T}\in B^{\sim }( H) \) if and only if
$$\begin{aligned} \textbf{T}( a\cdot x+b\cdot y) =a\cdot \textbf{T}x+b\cdot \textbf{T}y \quad \text {for all } a,b\in A \text { and } x,y\in H \end{aligned}$$
and there exists \(M>0\) such that
$$\begin{aligned} \Vert \textbf{T}x\Vert _{H}\le M \Vert x \Vert _{H} \quad \text {for all } x\in H. \end{aligned}$$
Since any A-linear operator is linear, we have that \(B^{\sim }\left( H\right) \subseteq B( H) \) where the latter is the set of all bounded and linear operators from H to H.
Given \(\textbf{T}\in B^{\sim }( H) \), we define the adjoint of \(\textbf{T}\), denoted by \(\textbf{T}^{*}\), to be such that
$$\begin{aligned} \langle \textbf{T}x,y\rangle _{H}=\langle x,\textbf{T}^{*}y\rangle _{H} \quad \text {for all } x,y\in H. \end{aligned}$$
(A11)
The next result shows that adjoints are well-defined and all the properties that hold for Hilbert spaces are satisfied once suitably adjusted to Hilbert modules. It is immediate to see that \(B^{\sim }( H) \) is a vector subspace of B(H). Note also that if \(\textbf{S},\textbf{T}\in B^{\sim }( H) \), then the composition of \(\textbf{S}\) with \(\textbf{T}\) is also in \(B^{\sim }( H) \).
Proposition 19
Let H be a self-dual pre-Hilbert A-module. The following statements are true:
-
1.
\(^{*}:B^{\sim }(H)\rightarrow B^{\sim }(H)\) is well-defined, injective, and linear;
-
2.
\(\textbf{T}^{**}=\textbf{T}\) for all \(\textbf{T}\in B^{\sim }( H) \);
-
3.
\(^{*}:B^{\sim }(H)\rightarrow B^{\sim }(H)\) is surjective;
-
4.
\(\Vert \textbf{T}\Vert = \Vert \textbf{T}^{*} \Vert \) for all \( \textbf{T}\in B^{\sim }( H) \);
-
5.
\(\Vert \textbf{ST}\Vert \le \Vert \textbf{S}\Vert \Vert \textbf{T} \Vert \) for all \(\textbf{S},\textbf{T}\in B^{\sim }( H) \);
-
6.
\(\Vert \textbf{T}^{*}\textbf{T}\Vert = \Vert \textbf{TT}^{*} \Vert = \Vert \textbf{T}\Vert ^{2}\) for all \(\textbf{T}\in B^{\sim }( H) \);
-
7.
\((\textbf{ST})^{*}=\textbf{T}^{*}\textbf{S}^{*}\) for all \( \textbf{S},\textbf{T}\in B^{\sim }(H)\);
-
8.
For all \(\textbf{T}\in B^{\sim }(H)\), \(\textrm{ker}(\textbf{T}^{*})=\textbf{T}(H)^{\bot }\), where
$$\begin{aligned} \textrm{ker}\left( \textbf{T}^{*}\right) =\left\{ x\in H:\textbf{T} ^{*}x=0\right\} . \end{aligned}$$
Proof
1. Consider \(\textbf{T}\in B^{\sim }(H)\). Fix \( y\in H\). Since \(\textbf{T}\) is A-linear and bounded, note that the element y induces a bounded A-linear operator on H to A via the map
$$\begin{aligned} x\mapsto \langle \textbf{T}x,y\rangle _{H}. \end{aligned}$$
Since H is self-dual, there exists a unique \(z_{y}\in H\) such that \( \langle \textbf{T}x,y\rangle _{H}=\langle x,z_{y}\rangle _{H}\) for all \(x\in H\). We define \(\textbf{T}^{*}:H\rightarrow H\) to be such that \(\textbf{T} ^{*}y=z_{y}\). It follows that \(\textbf{T}^{*}\) is well-defined and satisfies Eq. (A11). Next, observe that for each \(y_{1},y_{2}\in H \) and \(a_{1},a_{2}\in A\)
$$\begin{aligned} \langle x,\textbf{T}^{*}( a_{1}\cdot y_{1}+a_{2}\cdot y_{2}) \rangle _{H}=\langle \textbf{T}x,a_{1}\cdot y_{1}+a_{2}\cdot y_{2}\rangle _{H}=\langle x,a_{1}\cdot \textbf{T}^{*}y_{1}+a_{2}\cdot \textbf{T}^{*}y_{2}\rangle _{H} \quad \text {for all } x\in H, \end{aligned}$$
yielding that \(\textbf{T}^{*}\) is A-linear and, in particular, linear. Finally, note that
$$\begin{aligned} \Vert \textbf{T}^{*}y\Vert _{H}\le \Vert \textbf{T}\Vert \Vert y\Vert _{H} \quad \text {for all } y\in H, \end{aligned}$$
proving that \(\textbf{T}^{*}\in B^{\sim }(H)\) and \(^{*}\) is well-defined. Next, fix \(y\in H\). Consider \(\textbf{S},\textbf{T}\in B^{\sim }(H)\) and \(\alpha ,\beta \in \mathbb {R} \). Observe that
$$\begin{aligned} \langle x,( \alpha \textbf{S}+\beta \textbf{T}) ^{*}y\rangle _{H}=\langle ( \alpha \textbf{S}+\beta \textbf{T}) x,y\rangle _{H}=\alpha \langle \textbf{S}x,y\rangle _{H}+\beta \langle \textbf{T}x,y\rangle _{H}=\langle x,\alpha \textbf{S}^{*}y+\beta \textbf{T}^{*}y\rangle _{H} \quad \text {for all } x\in H. \end{aligned}$$
We can conclude that \((\alpha \textbf{S}+\beta \textbf{T})^{*}y=(\alpha \textbf{S}^{*}+\beta \textbf{T}^{*})y\). Since y was arbitrarily chosen, we can conclude that \((\alpha \textbf{S}+\beta \textbf{T})^{*}=\alpha \textbf{S}^{*}+\beta \textbf{T}^{*}\), that is, \(^{*}\) is linear. Next, fix \(x\in H\) and assume that \(\textbf{T}^{*}=\textbf{S} ^{*}\). It follows that
$$\begin{aligned} \langle \textbf{T}x-\textbf{S}x,y\rangle _{H}=\langle x,\textbf{T}^{*}y\rangle _{H}-\langle x,\textbf{S}^{*}y\rangle _{H}=0 \quad \text {for all } y\in H. \end{aligned}$$
We can conclude that \(\textbf{T}x=\textbf{S}x\). Since x was arbitrarily chosen, we can conclude that \(\textbf{T}=\textbf{S}\), that is, \(^{*}\) is injective.
2. Fix \(x\in H\). By definition of \(\textbf{T}^{*}\) and \(\textbf{T}^{**}\), we have that
$$\begin{aligned} \langle y,\textbf{T}^{**}x\rangle _{H}=\langle \textbf{T}^{*}y,x\rangle _{H}=\langle y,\textbf{T}x\rangle _{H} \quad \text {for all } y\in H. \end{aligned}$$
We can conclude that \(\textbf{T}x =\textbf{T}^{**} x \). Since x was arbitrarily chosen, we can conclude that \(\textbf{T}=\textbf{T}^{**}\).
3. Consider \(\textbf{S}\in B^{\sim }( H) \) and consider \(\textbf{T}=\textbf{S} ^{*}\). By point 2, it follows that \(\textbf{T}^{*}=\textbf{S}^{**}=\textbf{S}\), that is, \(^{*}\) is surjective.
4. By the proof of point 1, we have that
$$\begin{aligned} \Vert \textbf{T}^{*}\Vert \le \Vert \textbf{T}\Vert \quad \text {for all } \textbf{T}\in B^{\sim }( H) . \end{aligned}$$
In particular, we have that \(\Vert \textbf{T}^{**} \Vert \le \Vert \textbf{T}^{*} \Vert \le \Vert \textbf{T}\Vert \ \)for all \(\textbf{T}\in B^{\sim }( H) \). By point 2, we can conclude that \(\Vert \textbf{T}\Vert \le \Vert \textbf{T}^{*} \Vert \le \Vert \textbf{T}\Vert \ \)for all \( \textbf{T}\in B^{\sim }( H) \), proving the statement.
5. Consider \(\textbf{S},\textbf{T}\in B^{\sim }( H) \). We have that
$$\begin{aligned} \Vert \textbf{ST}x\Vert _{H}\le \Vert \textbf{S}\Vert \Vert \textbf{T}x\Vert _{H}\le \Vert \textbf{S}\Vert \Vert \textbf{T}\Vert \Vert x\Vert _{H} \quad \text {for all } x\in H. \end{aligned}$$
6. Consider \(\textbf{S}\in B^{\sim }( H) \). By points 4 and 5, observe that
$$\begin{aligned} \Vert \textbf{S}\Vert ^{2}=\sup _{\Vert x\Vert _{H}\le 1}\Vert \langle \textbf{S}x,\textbf{S}x\rangle _{H}\Vert _{A}=\sup _{\Vert x\Vert _{H}\le 1}\Vert \langle x,\textbf{S}^{*}\textbf{S}x\rangle _{H}\Vert _{A}\le \Vert \textbf{S}^{*}\textbf{S}\Vert \le \Vert \textbf{S}^{*}\Vert \Vert \textbf{S}\Vert =\Vert \textbf{S}\Vert ^{2}, \end{aligned}$$
yielding that \(\Vert \textbf{S}^{*}\textbf{S}\Vert = \Vert \textbf{S} \Vert ^{2}\). If we choose \(\textbf{S}=\textbf{T}\), then \(\Vert \textbf{T} ^{*}\textbf{T}\Vert = \Vert \textbf{T}\Vert ^{2}\). If we choose \(\textbf{S }=\textbf{T}^{*}\), then \(\Vert \textbf{TT}^{*} \Vert = \Vert \textbf{T }^{*} \Vert ^{2}= \Vert \textbf{T}\Vert ^{2}\).
7. Consider \(\textbf{S},\textbf{T}\in B^{\sim }(H)\). Fix \(y\in H\). We have that, for each \(x\in H\),
$$\begin{aligned} \langle x,( \textbf{ST}) ^{*}y\rangle _{H}=\langle \textbf{ST}x,y\rangle _{H}=\langle \textbf{T}x,\textbf{S}^{*}y\rangle _{H}=\langle x,\textbf{T}^{*}\textbf{S}^{*}y\rangle _{H}. \end{aligned}$$
It follows that \((\textbf{ST})^{*}y=\textbf{T}^{*}\textbf{S}^{*}y \). Since y was arbitrarily chosen, it follows that \((\textbf{ST})^{*}y=\textbf{T}^{*}\textbf{S}^{*}y\) for all \(y\in H\), that is, \(( \textbf{ST})^{*}=\textbf{T}^{*}\textbf{S}^{*}\).
8. First, we show that \(\textrm{ker}(\textbf{T}^{*})\) is included in \(( \textbf{T}H)^{\bot }\). Equivalently, we prove that each \(\bar{y}\in \textrm{ ker}(\textbf{T}^{*})\) is orthogonal to any \(y\in \textbf{T}H\). Note that \(\textbf{T}^{*}\bar{y}=0\) and that \(y=\textbf{T}x\) for some \(x\in H\). By the definition of adjoint operator,
$$\begin{aligned} \langle y,\bar{y}\rangle _{H}=\langle \textbf{T}x,\bar{y}\rangle _{H}=\langle x,\textbf{T}^{*}\bar{y}\rangle _{H}=0, \end{aligned}$$
proving the orthogonality of \(\bar{y}\) and y. Conversely, consider \(\bar{y} \in (\textbf{T}H)^{\bot }\), that is \(\langle \textbf{T}x,\bar{y}\rangle _{H}=0\) for all \(x\in H\). Since \(\textbf{T}^{*}\) is the adjoint operator, \(\langle x,\textbf{T}^{*}\bar{y}\rangle _{H}=0\) for all \(x\in H \), yielding that \(\textbf{T}^{*}\bar{y}=0\).\(\blacksquare \)
Point 1 can also be found in Goldstine and Horwitz (1966). Also in this case, there is a technical difference in terms of norm used over A.
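A numerical sketch of the adjoint machinery, under illustrative assumptions rather than the paper's setting: take H to be finitely supported sequences in \(\ell ^{2}(A)\) with A the \(m\times m\) real matrices and \(\textbf{T}\) the right shift, whose adjoint is the left shift. The code verifies Eq. (A11) and point 8 of Proposition 19, \(\textrm{ker}(\textbf{T}^{*})=\textbf{T}(H)^{\bot }\), on random elements.

```python
import numpy as np

# Assumed example: H = finitely supported A-valued sequences with
# <x, y>_H = sum_i x_i y_i'; T = right shift, so T^* = left shift.
rng = np.random.default_rng(3)
m = 2
Z = np.zeros((m, m))

def inner_H(x, y):
    k = max(len(x), len(y))
    xp, yp = x + [Z] * (k - len(x)), y + [Z] * (k - len(y))
    return sum(xi @ yi.T for xi, yi in zip(xp, yp))

def T(x):
    return [Z] + x            # right shift (an isometry)

def T_star(x):
    return x[1:] or [Z]       # left shift (drops coordinate 0)

x = [rng.standard_normal((m, m)) for _ in range(4)]
y = [rng.standard_normal((m, m)) for _ in range(5)]
# Eq. (A11): <T x, y>_H = <x, T^* y>_H
adjoint_ok = np.allclose(inner_H(T(x), y), inner_H(x, T_star(y)))

# a sequence supported on coordinate 0 lies both in ker(T^*) and in (T H)^perp
e0 = [rng.standard_normal((m, m))]
in_kernel = all(np.allclose(c, 0) for c in T_star(e0))
in_TH_perp = np.allclose(inner_H(e0, T(x)), 0)
print(adjoint_ok, in_kernel, in_TH_perp)  # True True True
```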
Appendix B Proofs about the MEWD
1.1 B.1 Properties of \(L^2(\mathbb {R}^m,\Omega , \mathcal {F}, \mathbb {P})\) and \(\mathcal {H}_t(\varepsilon )\)
Proposition 20
\(H=L^2(\mathbb {R}^m,\Omega , \mathcal {F}, \mathbb {P})\) is a Hilbert A-module. Moreover, if \(\varvec{ \varepsilon }\) is an m-dimensional white noise, the submodule \(\mathcal {H}_t( \varvec{\varepsilon })\) of H defined in Eq. (2) is closed.
Proof
We already described in Sect. 2.2 that H is a pre-Hilbert A-module. Here, we show that H is also \(\Vert \ \Vert _{\bar{\varphi }}\) complete. We consider a Cauchy sequence \(\{ x^{(n)}\}_n \subset H\), i.e., for any \(\varepsilon >0\) there exists \(N>0\) such that
$$\begin{aligned} \Vert x^{(n)}-x^{(l)}\Vert _{\bar{\varphi }}<\varepsilon \quad \text {for all } n,l>N. \end{aligned}$$
For any entry \(i=1,\ldots ,m\), the sequence \(\{ x_i^{(n)}\}_n \subset L^2(\Omega , \mathcal {F}, \mathbb {P})\) satisfies the Cauchy condition. Since \( L^2(\Omega , \mathcal {F}, \mathbb {P})\) is complete, there exists \(x_i \in L^2(\Omega , \mathcal {F}, \mathbb {P})\) such that \(\mathbb {E}[ (x_i^{(n)} - x_i)^2] < \varepsilon ^2 / m\) for all \(n > N_i\). As a result, by defining \( x=[x_1, \ldots , x_m]^{\prime }\in H\), we have
$$\begin{aligned} \Vert x^{(n)}-x\Vert _{\bar{\varphi }}^{2}=\sum _{i=1}^{m}\mathbb {E}[ (x_i^{(n)} - x_i)^2] <\varepsilon ^{2} \quad \text {for all } n>\max _{i} N_i, \end{aligned}$$
and so H is \(\Vert \ \Vert _{\bar{\varphi }}\) complete, i.e., it is a Hilbert A -module by Theorem 15.
As to the submodule \(\mathcal {H}_t(\varvec{\varepsilon })\), take \(x \in H\) such that there exists a sequence \(\{ x^{(n)} \}_n \subset \mathcal {H}_t( \varvec{\varepsilon })\) with \(\Vert x^{(n)} - x\Vert _{\bar{\varphi }} \rightarrow 0\). We show that \(x \in \mathcal {H}_t(\varvec{\varepsilon })\), too. Any \(x^{(n)}\) can be written as \(x^{(n)}=\sum _{k=0}^{\infty } \langle x^{(n)}, \varepsilon _{t-k} \rangle _H \varepsilon _{t-k}\) because, if \( x^{(n)}=\sum _{k=0}^{\infty } a_k^{(n)} \varepsilon _{t-k}\) with \(a_k^{(n)} \in A\),
$$\begin{aligned} \langle x^{(n)}, \varepsilon _{t-k} \rangle _H=\sum _{j=0}^{\infty } a_j^{(n)}\, \mathbb {E}[ \varepsilon _{t-j} \varepsilon _{t-k}^{\prime }] =a_k^{(n)}. \end{aligned}$$
In addition, the limit x can be decomposed as \(x=\sum _{k=0}^{\infty } \langle x, \varepsilon _{t-k} \rangle _H \varepsilon _{t-k} + \nu \) with \(\nu \in H\) such that \(\langle \nu , \varepsilon _{t-k} \rangle _H=\textbf{0}\) for all \(k \in \mathbb {N}_0\). This implies that \(\langle \nu , \varepsilon _{t-k} \rangle _{\bar{\varphi }}=\textrm{Tr} ( \langle \nu , \varepsilon _{t-k} \rangle _H )=0\). In consequence,
$$\begin{aligned} \Vert x^{(n)}-x\Vert _{\bar{\varphi }}^{2}=\Big \Vert x^{(n)}-\sum _{k=0}^{\infty } \langle x, \varepsilon _{t-k} \rangle _H \varepsilon _{t-k}\Big \Vert _{\bar{\varphi }}^{2}+\Vert \nu \Vert _{\bar{\varphi }}^{2}\ge \Vert \nu \Vert _{\bar{\varphi }}^{2}. \end{aligned}$$
As \(\Vert x^{(n)} - x\Vert _{\bar{\varphi }}\) is arbitrarily small, \(\Vert \nu \Vert _{\bar{ \varphi }}=0\) and so \(\nu =0\). Thus, \(x=\sum _{k=0}^{\infty } \langle x, \varepsilon _{t-k} \rangle _H \varepsilon _{t-k}\) belongs to \(\mathcal {H}_t( \varvec{\varepsilon })\). \(\blacksquare \)
1.2 B.2 Proof of Theorem 3
By Proposition 20, \(\mathcal {H}_t( \varvec{\varepsilon })\) is a closed submodule of \(H=L^2(\mathbb {R} ^m,\Omega , \mathcal {F}, \mathbb {P})\), which is a Hilbert A-module. Hence, \( \mathcal {H}_t(\varvec{\varepsilon })\) is a Hilbert A-module too and so, by Theorem 15, it is self-dual. Before applying the Abstract Wold Theorem for Hilbert A-modules (Theorem 1), we prove that the scaling operator \(\textbf{R} \) is well-defined, A-linear and isometric on \(\mathcal {H}_t(\varvec{ \varepsilon })\).
To show that \(\textbf{R}\) is well-defined on \(\mathcal {H}_t(\varvec{ \varepsilon })\), consider any \(X=\sum _{k=0}^{\infty } a_k \varepsilon _{t-k} \in \mathcal {H}_t(\varvec{\varepsilon })\), i.e., \(\Vert X\Vert _{\bar{\varphi } }^2 = \sum _{k=0}^{\infty } \textrm{Tr}( a_k a^{\prime }_k ) <+\infty \). Then,
$$\begin{aligned} \Vert \textbf{R}X\Vert _{\bar{\varphi }}^{2}=\Big \Vert \sum _{k=0}^{\infty } \frac{a_k}{\sqrt{2}} \left( \varepsilon _{t-2k}+\varepsilon _{t-2k-1}\right) \Big \Vert _{\bar{\varphi }}^{2}=\sum _{k=0}^{\infty } 2\, \textrm{Tr}\Big ( \frac{a_k}{\sqrt{2}} \frac{a^{\prime }_k}{\sqrt{2}}\Big ) =\sum _{k=0}^{\infty } \textrm{Tr}( a_k a^{\prime }_k ) \end{aligned}$$
and this quantity is finite. Thus, \(\textbf{R}\) is well-defined and it is a bounded operator.
About A-linearity, consider any matrix \(m \in A\) and \(X=\sum _{k=0}^{ \infty } a_k \varepsilon _{t-k}, Y=\sum _{k=0}^{\infty } b_k \varepsilon _{t-k}\) in \(\mathcal {H}_t(\varvec{\varepsilon })\). The element \(X+m Y=\sum _{k=0}^{\infty } c_k \varepsilon _{t-k}\) has for coefficients the matrices \(c_k = a_k +m b_k\) for any k in \(\mathbb {N}_0\). Then, \(\textbf{R}\) maps \(X+m Y\) to the element
$$\begin{aligned} \textbf{R}(X+mY)=\sum _{k=0}^{\infty } \frac{c_k}{\sqrt{2}} \left( \varepsilon _{t-2k}+\varepsilon _{t-2k-1}\right) =\sum _{k=0}^{\infty } \frac{a_k+m b_k}{\sqrt{2}} \left( \varepsilon _{t-2k}+\varepsilon _{t-2k-1}\right) . \end{aligned}$$
As a result, \(\textbf{R}(X+m Y)=\textbf{R}X+m \textbf{R}Y\), i.e., \(\textbf{R }\) is A-linear.
To prove that \(\textbf{R}\) is isometric on \(\mathcal {H}_t(\varvec{ \varepsilon })\), consider again any X and Y as before. Since \(\varvec{ \varepsilon }\) is a multivariate white noise,
$$\begin{aligned} \langle \textbf{R}X, \textbf{R}Y \rangle _H=\sum _{k=0}^{\infty } 2\, \frac{a_k}{\sqrt{2}} \frac{b^{\prime }_k}{\sqrt{2}}=\sum _{k=0}^{\infty } a_k b^{\prime }_k=\langle X, Y \rangle _H. \end{aligned}$$
Hence, \(\textbf{R}\) is an isometry on \(\mathcal {H}_t(\varvec{\varepsilon } )\). Theorem 1 provides the orthogonal decomposition \(\mathcal {H}_t(\varvec{\varepsilon })=\widehat{\mathcal {H}}^{} _t(\varvec{\varepsilon }) \oplus \widetilde{\mathcal {H}}_t(\varvec{ \varepsilon })\), where
$$\begin{aligned} \widehat{\mathcal {H}}_t(\varvec{\varepsilon })=\bigcap _{j=0}^{\infty } \textbf{R}^j \mathcal {H}_t(\varvec{\varepsilon }) \quad \text {and} \quad \widetilde{\mathcal {H}}_t(\varvec{\varepsilon })=\bigoplus _{j=0}^{\infty } \textbf{R}^j \mathcal {L}_t^\textbf{R}, \end{aligned}$$
and \(\mathcal {L}_t^\textbf{R}=\mathcal {H}_t(\varvec{\varepsilon }) \ominus \textbf{R}\mathcal {H}_t(\varvec{\varepsilon })\) is the wandering submodule.
First, we show that \(\widehat{\mathcal {H}}_t(\varvec{\varepsilon })\) is the null submodule. Indeed, the submodules \(\textbf{R}^j \mathcal {H}_t( \varvec{\varepsilon })\) consist of linear combinations of innovations \( \varepsilon _t\) with matrix coefficients equal to each other \(2^j\)-by-\(2^j\):
$$\begin{aligned} \textbf{R}^j \mathcal {H}_t(\varvec{\varepsilon })=\left\{ \sum _{k=0}^{\infty } \frac{c_k}{2^{j/2}} \left( \varepsilon _{t-k2^j}+ \cdots +\varepsilon _{t-k2^j-2^j+1}\right) : \ c_k \in A, \ \sum _{k=0}^{\infty } \textrm{Tr}( c_k c^{\prime }_k) <+\infty \right\} . \end{aligned}$$
Therefore, \(\widehat{\mathcal {H}}_t(\varvec{\varepsilon })\) can just include vectors as \(\sum _{h=0}^{\infty }c \varepsilon _{t-h}\) with \(c \in A\). Such vectors must belong to \(\mathcal {H}_t(\varvec{\varepsilon })\); hence,
$$\begin{aligned} \Big \Vert \sum _{h=0}^{\infty }c\, \varepsilon _{t-h}\Big \Vert _{\bar{\varphi }}^{2}=\sum _{k=0}^{\infty } \textrm{Tr}( c c^{\prime }) =\sum _{k=0}^{\infty } \sum _{p,q=1}^{m} c(p,q)^2 \end{aligned}$$
is finite. Since the addends do not depend on k, \(c(p,q) = 0\) for all \( p,q=1,\ldots ,m\) and so c is the null matrix. Consequently, \(\widehat{ \mathcal {H}}_t(\varvec{\varepsilon })=\{0\}\) and \(\mathcal {H}_t( \varvec{\varepsilon })= \widetilde{\mathcal {H}}_t(\varvec{\varepsilon } )\).
We now turn to the submodule \(\widetilde{\mathcal {H}}_t(\varvec{ \varepsilon }^{})\). As the orthogonal complement of \(\textbf{R}\mathcal {H}_t( \varvec{\varepsilon })\) is the kernel of the adjoint operator \(\textbf{R}^*\) (Proposition 19), we determine \(\textbf{R}^*\). In particular, \(\textbf{R}^*: \mathcal {H}_t(\varvec{\varepsilon }) \rightarrow \mathcal {H}_t(\varvec{\varepsilon })\) is defined by
$$\begin{aligned} \textbf{R}^*: \sum _{k=0}^{\infty } a_k \varepsilon _{t-k} \mapsto \sum _{k=0}^{\infty } \frac{a_{2k}+a_{2k+1}}{\sqrt{2}}\, \varepsilon _{t-k}. \end{aligned}$$
To prove that \(\textbf{R}^*\) is well-defined, we take any \( Y=\sum _{k=0}^{\infty } a_k\varepsilon _{t-k}\) in \(\mathcal {H}_t(\varvec{ \varepsilon })\), i.e.,
$$\begin{aligned} \Vert Y\Vert _{\bar{\varphi }}^2 = \sum _{k=0}^{\infty } \textrm{Tr}( a_k a^{\prime }_k ) <+\infty . \end{aligned}$$
Similarly,
$$\begin{aligned} \Vert \textbf{R}^*Y\Vert _{\bar{\varphi }}^{2}=\sum _{k=0}^{\infty } \textrm{Tr}\Big ( \frac{(a_{2k}+a_{2k+1})(a_{2k}+a_{2k+1})^{\prime }}{2}\Big ) \end{aligned}$$
and so
$$\begin{aligned} \Vert \textbf{R}^*Y\Vert _{\bar{\varphi }}^{2} \le \sum _{k=0}^{\infty } \textrm{Tr}( a_{2k} a^{\prime }_{2k}) +\textrm{Tr}( a_{2k+1} a^{\prime }_{2k+1}) =\Vert Y\Vert _{\bar{\varphi }}^2. \end{aligned}$$
We deduce that \(\Vert \textbf{R}^*Y\Vert ^2\) is finite and \(\textbf{R}^*\) is well-defined.
We now establish that \(\langle \textbf{R}X, Y \rangle _H = \langle X, \textbf{R}^*Y \rangle _H\) for any \(X=\sum _{h=0}^{\infty } b_h \varepsilon _{t-h} \) and \(Y=\sum _{k=0}^{\infty } a_k \varepsilon _{t-k}\) in \( \mathcal {H}_t(\varvec{\varepsilon })\). By the unit variance white noise properties of \(\varvec{\varepsilon }\),
$$\begin{aligned} \langle \textbf{R}X, Y \rangle _H=\sum _{h=0}^{\infty } \frac{b_h}{\sqrt{2}}\left( a_{2h}+a_{2h+1}\right) ^{\prime }=\sum _{h=0}^{\infty } b_h \Big ( \frac{a_{2h}+a_{2h+1}}{\sqrt{2}}\Big ) ^{\prime }=\langle X, \textbf{R}^*Y \rangle _H. \end{aligned}$$
Therefore, \(\textbf{R}^*\) is the adjoint of the scaling operator.
Regarding the kernel of \(\textbf{R}^*\), we show that
$$\begin{aligned} \textrm{ker}(\textbf{R}^{*})=\left\{ \sum _{k=0}^{\infty } d_k^{(1)} \left( \varepsilon _{t-2k}- \varepsilon _{t-2k-1}\right) \in \mathcal {H}_t(\varvec{\varepsilon }) \right\} . \end{aligned}$$
(B12)
Any element \(X=\sum _{k= 0}^{\infty } d_k^{(1)} ( \varepsilon _{t-2k}- \varepsilon _{t-2k-1})\) of \(\mathcal {H}_t(\varvec{\varepsilon })\) can be rewritten as \(X=\sum _{h=0}^{\infty } a_h \varepsilon _{t-h}\) with \( a_{2k+1}=-a_{2k}\) for every \(k \in \mathbb {N}_0\), i.e., \(a_{2k}+a_{2k+1}= \textbf{0}\). Consequently, \(\textbf{R}^*X=0\) and so
$$\begin{aligned} \left\{ \sum _{k=0}^{\infty } d_k^{(1)} \left( \varepsilon _{t-2k}- \varepsilon _{t-2k-1}\right) \in \mathcal {H}_t(\varvec{\varepsilon }) \right\} \subseteq \textrm{ker}(\textbf{R}^{*}). \end{aligned}$$
Conversely, take any \(X=\sum _{h=0}^{\infty } a_h \varepsilon _{t-h}\) in \( \textrm{ker}(\textbf{R}^*)\). Since \(\Vert \textbf{R}^*X \Vert _{\bar{\varphi }}=0\),
$$\begin{aligned} \sum _{k=0}^{\infty } \sum _{p,q=1}^{m} \frac{\left( a_{2k}(p,q)+a_{2k+1}(p,q)\right) ^2}{2}=0. \end{aligned}$$
It follows that \(a_{2k+1}(p,q) = - a_{2k}(p,q) \) for any \(k \in \mathbb {N}_0\) and \(p,q=1,\ldots ,m\). Therefore, \(a_{2k+1}=-a_{2k}\) for any \(k \in \mathbb {N} _0\). As a result, \(X=\sum _{k= 0}^{\infty } d_k^{(1)} ( \varepsilon _{t-2k}- \varepsilon _{t-2k-1})\) with \(d_k^{(1)}=a_{2k}\) and so the converse inclusion in (B12) holds. As a result,
$$\begin{aligned} \mathcal {L}_t^\textbf{R}=\textrm{ker}(\textbf{R}^{*})=\left\{ \sum _{k=0}^{\infty } d_k^{(1)} \left( \varepsilon _{t-2k}- \varepsilon _{t-2k-1}\right) \in \mathcal {H}_t(\varvec{\varepsilon }) \right\} . \end{aligned}$$
Moreover,
$$\begin{aligned} \textbf{R}^0 \mathcal {L}_t^\textbf{R}=\mathcal {L}_t^\textbf{R}=\left\{ \sum _{k=0}^{\infty } d_k^{(1)} \left( \varepsilon _{t-2k}- \varepsilon _{t-2k-1}\right) \in \mathcal {H}_t(\varvec{\varepsilon }) \right\} \end{aligned}$$
and, for any \(j \in \mathbb {N}\),
$$\begin{aligned} \textbf{R}^j \mathcal {L}_t^\textbf{R}=\left\{ \sum _{k=0}^{\infty } d_k^{(j+1)} \left( \sum _{i=0}^{2^j-1} \varepsilon _{t-k2^{j+1}-i} - \sum _{i=0}^{2^j-1} \varepsilon _{t-k2^{j+1}-2^j-i}\right) \in \mathcal {H}_t(\varvec{\varepsilon }) \right\} . \end{aligned}$$
(B13)
As the case with \(j \in \mathbb {N}\) follows by induction, we focus on \( \textbf{R}\mathcal {L}_t^\textbf{R}\) and prove that
$$\begin{aligned} \textbf{R}\mathcal {L}_t^\textbf{R}=\left\{ \sum _{k=0}^{\infty } d_k^{(2)} \left( \varepsilon _{t-4k}+\varepsilon _{t-4k-1}-\varepsilon _{t-4k-2}- \varepsilon _{t-4k-3}\right) \in \mathcal {H}_t(\varvec{\varepsilon }) \right\} . \end{aligned}$$
Consider any \(Y \in \textbf{R}\mathcal {L}_t^\textbf{R}\). As Y is the image of some \(X \in \mathcal {L}_t^\textbf{R}\), there exists a sequence of matrices \(\{d_k^{(1)}\}_{k}\) such that \(X=\sum _{k=0}^{\infty } d_k^{(1)} ( \varepsilon _{t-2k}- \varepsilon _{t-2k-1})\) and
As a result, \(\textbf{R}\mathcal {L}_t^\textbf{R}\) is included in the module in (B13).
Conversely, consider any \(Y=\sum _{k=0}^{\infty } d_k^{(2)} (\varepsilon _{t-4k}+\varepsilon _{t-4k-1}-\varepsilon _{t-4k-2}- \varepsilon _{t-4k-3})\) in \(\mathcal {H}_t(\varvec{\varepsilon })\). Then, Y belongs to \(\textbf{R}\mathcal {L}_t^\textbf{R}\) too, because it is the image of \(X=\sum _{k=0}^{\infty } \sqrt{2}d_k^{(2)} \left( \varepsilon _{t-2k}- \varepsilon _{t-2k-1} \right) \) in \(\mathcal {L}_t^\textbf{R}\). Consequently, the relation in (B13) holds.
The decomposition of the Hilbert A-module \(\mathcal {H}_t(\varvec{ \varepsilon })\) is, then, achieved. \(\blacksquare \)
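The key operator identities used in this proof can be checked numerically on truncated coefficient sequences. The sketch below rests on an explicit assumption consistent with the computations above (it is not the paper's formal definition): acting on the coefficients \((a_k)\) of \(X=\sum _k a_k \varepsilon _{t-k}\), the scaling operator is \(\textbf{R}:(a_k)\mapsto (a_0,a_0,a_1,a_1,\ldots )/\sqrt{2}\) and its adjoint is \(\textbf{R}^*:(a_k)\mapsto ((a_0+a_1)/\sqrt{2},(a_2+a_3)/\sqrt{2},\ldots )\).

```python
import numpy as np

# Assumed coefficient-level form of R and R^*, consistent with the proof above.
rng = np.random.default_rng(4)
m = 2
Z = np.zeros((m, m))
s2 = np.sqrt(2.0)

def inner_H(a, b):
    # orthonormal innovations: <sum a_k eps_{t-k}, sum b_k eps_{t-k}>_H = sum a_k b_k'
    k = max(len(a), len(b))
    ap, bp = a + [Z] * (k - len(a)), b + [Z] * (k - len(b))
    return sum(ai @ bi.T for ai, bi in zip(ap, bp))

def R(a):
    # each coefficient is spread, normalized, over two consecutive lags
    return [c for ak in a for c in (ak / s2, ak / s2)]

def R_star(a):
    # pairwise sums of coefficients, normalized
    a = a + [Z] * (len(a) % 2)
    return [(a[2 * k] + a[2 * k + 1]) / s2 for k in range(len(a) // 2)]

X = [rng.standard_normal((m, m)) for _ in range(4)]
Y = [rng.standard_normal((m, m)) for _ in range(8)]

isometric = np.allclose(inner_H(R(X), R(X)), inner_H(X, X))      # R isometry
adjoint = np.allclose(inner_H(R(X), Y), inner_H(X, R_star(Y)))   # <RX,Y> = <X,R*Y>

# elements sum_k d_k (eps_{t-2k} - eps_{t-2k-1}) are killed by R^*,
# matching the characterization (B12) of the wandering submodule
d = rng.standard_normal((m, m))
W = [d, -d, d, -d]
in_kernel = all(np.allclose(c, 0) for c in R_star(W))
print(isometric, adjoint, in_kernel)  # True True True
```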
1.3 B.3 Proof of Theorem 4
By applying the MCWD (Theorem 2) to the zero-mean, weakly stationary purely non-deterministic process \(\textbf{x}\), we find that \(x_t\) belongs to the Hilbert A-module \(\mathcal {H}_t(\varvec{\varepsilon })\), where \(\varvec{\varepsilon }\) is the unit variance white noise of classical Wold innovations of \(\textbf{x}\). Notably, \(\mathcal {H}_t( \varvec{\varepsilon })\) orthogonally decomposes as in Theorem 3. By denoting \(g_t^{(j)}\) the orthogonal projections of \(x_t\) on the submodules \(\textbf{R}^{j-1}\mathcal {L}_t^\textbf{R}\), we find that \( x_t=\sum _{j=1}^{\infty }g_t^{(j)}\), where the equality is in norm (recall that \(\Vert \ \Vert _H\) and \(\Vert \ \Vert _{\bar{\varphi }}\) are equivalent by Proposition 12). Then, by using the characterizations of submodules \(\textbf{R}^{j-1}\mathcal {L}_t^\textbf{R}\), for any scale \(j \in \mathbb {N}\) we find a sequence of matrices \(\{\beta _k^{(j)}\}_k\) such that Eq. (8) holds with \(\sum _{k=0}^{\infty } \textrm{Tr} (\beta _k^{(j)} {\beta _k^{(j)}}^{\prime }) < +\infty \). As a consequence, we can decompose \(x_t\) as in Eq. (6).
1. As we can see in Eq. (3), the process \( \varvec{\varepsilon _{t}^{(j)}}\) is an \(MA(2^j-1)\) with respect to the fundamental innovations \(\varvec{\varepsilon }\). In addition, the subprocess \(\{\varepsilon ^{(j)}_{t-k2^j}\}_{k \in \mathbb {Z}}\) is weakly stationary. Indeed, since \(\varvec{\varepsilon }\) is a multivariate white noise, \(\mathbb {E}[ \varepsilon _{t-k2^j}^{(j)} {\varepsilon _{t-k2^j}^{(j)}} ^{\prime }]\) is finite and it does not depend on k: for any \(k \in \mathbb { Z}\),
$$\begin{aligned} \mathbb {E}[ \varepsilon _{t-k2^j}^{(j)} {\varepsilon _{t-k2^j}^{(j)}}^{\prime }] =\frac{1}{2^j} \sum _{i=0}^{2^j-1} \mathbb {E}[ \varepsilon _{t-k2^j-i}\, \varepsilon _{t-k2^j-i}^{\prime }] =I_m. \end{aligned}$$
In addition, \(\mathbb {E}[ \varepsilon _{t-k2^j}^{(j)}]=0\) for any \(k \in \mathbb {Z}\) and the expectation does not depend on k. Regarding the cross-moment matrix on the support \(S^{(j)}_t = \{ t-k2^j: k \in \mathbb {N} _0\}\), for any \(h \ne k\),
$$\begin{aligned} \mathbb {E}[ \varepsilon _{t-h2^j}^{(j)} {\varepsilon _{t-k2^j}^{(j)}}^{\prime }] =\frac{1}{2^j} \sum _{i=h2^j}^{h2^j+2^j-1}\ \sum _{l=k2^j}^{k2^j+2^j-1} \pm \, \mathbb {E}[ \varepsilon _{t-i}\, \varepsilon _{t-l}^{\prime }] . \end{aligned}$$
The sets of indices \(\{h2^j, \ldots , h2^j+2^j-1\}\) and \(\{k2^j, \ldots , k2^j+2^j-1\}\) are disjoint because \(h \ne k\), so the last sums are null. Consequently, \(\mathbb {E}[ \varepsilon _{t-h2^j}^{(j)} { \varepsilon _{t-k2^j}^{(j)}}^{\prime }]=\textbf{0}\) for all \(h \ne k\). Thus, \(\{\varepsilon _{t-k2^j}^{(j)}\}_{k\in \mathbb {Z}}\) turns out to be weakly stationary on \(S^{(j)}_t\). In particular, it is a unit variance white noise.
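The disjoint-support argument above can be checked numerically. The sketch below assumes the Haar-type filter form of the detail innovations underlying Eq. (3) (normalized to unit Euclidean norm; this explicit form, like the function names, is our assumption for illustration): the population covariance between two detail innovations on the dyadic support reduces to the inner product of two shifted filters, which vanishes whenever \(h \ne k\).

```python
import numpy as np

def haar_filter(j):
    # Haar-type filter of length 2**j, normalized to unit Euclidean norm:
    # +1 on the first 2**(j-1) lags, -1 on the remaining 2**(j-1) lags.
    half = 2 ** (j - 1)
    return np.concatenate([np.ones(half), -np.ones(half)]) / np.sqrt(2 ** j)

def detail_cov(j, h, k):
    # Population covariance of eps^{(j)}_{t-h*2^j} and eps^{(j)}_{t-k*2^j}
    # when eps is a unit-variance white noise: the inner product of the
    # Haar filter with its copy shifted by (h - k) * 2^j lags.
    psi = haar_filter(j)
    shift = abs(h - k) * 2 ** j
    a = np.zeros(len(psi) + shift)
    b = np.zeros_like(a)
    a[:len(psi)] = psi
    b[shift:shift + len(psi)] = psi
    return float(a @ b)
```

Since the shift is an integer multiple of the filter length, the supports are disjoint for \(h \ne k\) and the covariance is exactly zero, while at lag zero it equals the squared norm of the filter, i.e., one.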
2. In order to find the exact expression of the matrices \(\beta _k^{(j)}\), we exploit the orthogonal decompositions of the Hilbert A-module \(\mathcal {H} _t(\varvec{\varepsilon })\) at different scales \(J \in \mathbb {N}\):
We call \(\pi _t^{(j)}\) the orthogonal projection of \(x_t\) on \(\textbf{R}^j \mathcal {H}_t(\varvec{\varepsilon })\), and we proceed inductively.
We begin with the decomposition \(x_t=\pi _t^{(1)}+g_t^{(1)}\) coming from scale \(J=1\), i.e., \(\mathcal {H}_t(\varvec{\varepsilon })= \textbf{R} \mathcal {H}_t(\varvec{\varepsilon }) \oplus \mathcal {L}_t^\textbf{R}\). By using the characterization of submodules \(\textbf{R}\mathcal {H}_t( \varvec{\varepsilon })\) and \(\mathcal {L}_t^\textbf{R}\) described in the proof of Theorem 3, we set
for some sequences of matrices \(\{c_k^{(1)}\}_k\) and \(\{d_k^{(1)}\}_k\) (equivalently, \(\{\gamma _k^{(1)}\}_k\) and \(\{\beta _k^{(1)}\}_k\), where we set \(\sqrt{2} c_k^{(1)}=\gamma _k^{(1)}\) and \(\sqrt{2} d_k^{(1)}=\beta _k^{(1)}\)) to be determined so that \(x_t=\pi _t^{(1)}+g_t^{(1)}\). The expressions above may be rewritten as
However, from Theorem 2 we know that
where the same fundamental innovations \(\varepsilon _t\) appear. By the uniqueness of the MCWD representation, the two expressions for \(x_t\) must coincide. As a result, \(c_k^{(1)}\) and \(d_k^{(1)}\) are the solutions of the linear system
that is,
In particular, we find
Now, we focus on the scale \(J=2\). We exploit the decomposition of the submodule \(\textbf{R}\mathcal {H}_t(\varvec{\varepsilon })= \textbf{R}^2 \mathcal {H}_t(\varvec{\varepsilon }) \oplus \textbf{R}\mathcal {L}_t^ \textbf{R}\), which implies the relation \(\pi _t^{(1)}=\pi _t^{(2)}+g_t^{(2)}\). We follow the same track as in the previous case, by using the features of the elements in \(\textbf{R}^2\mathcal {H}_t(\varvec{\varepsilon })\) and in \(\textbf{R}\mathcal {L}_t^\textbf{R}\) and, finally, by comparing the expression of \(\pi _t^{(2)}+g_t^{(2)}\) with the (unique) representation of \( \pi _t^{(1)}\) found before. Since
by solving a linear system, we get
At the generic scale \(J=j\), we retrieve the expressions of \(\beta _k^{(j)}\) and \(\gamma _k^{(j)}\) of Eqs. (7) and (9), where \(\pi _t^{(j)}\) is also defined.
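The resulting formula can be sketched numerically. Assuming the Haar-difference form of \(\beta _k^{(j)}\) in Eq. (7) as we recall it from the scalar case of Ortu et al. (2020) — an assumption here, and the function names are ours — the scale-specific responses are computed from the MA coefficients \(\alpha _h\):

```python
import numpy as np

def scale_responses(alpha, j, K):
    # beta_k^{(j)} as Haar-type differences of the MA coefficients alpha_h:
    # beta_k^{(j)} = 2^{-j/2} * ( sum_{0 <= i < 2^{j-1}} alpha_{k 2^j + i}
    #                             - sum_{2^{j-1} <= i < 2^j} alpha_{k 2^j + i} )
    half = 2 ** (j - 1)
    betas = []
    for k in range(K):
        base = k * 2 ** j
        pos = sum(alpha[base + i] for i in range(half))
        neg = sum(alpha[base + half + i] for i in range(half))
        betas.append((pos - neg) / np.sqrt(2 ** j))
    return betas
```

As a sanity check, summing \(\textrm{Tr}(\beta _k^{(j)} {\beta _k^{(j)}}^{\prime })\) over scales and lags approximately recovers \(\sum _h \textrm{Tr}(\alpha _h \alpha _h^{\prime })\), consistent with the variance decomposition implied by the orthogonality of the submodules.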
3. First of all, when t is fixed, \(\langle g_{t}^{(j)}, g_{t}^{(l)} \rangle _H = \mathbb {E}[ g_{t}^{(j)}{g_{t}^{(l)}}^{\prime }]=\textbf{0}\) for all \(j \ne l\) because \(g_{t}^{(j)}\) and \(g_{t}^{(l)}\) are, respectively, the projections of \(x_t\) on the submodules \(\textbf{R}^{j-1}\mathcal {L}_t^ \textbf{R}\) and \(\textbf{R}^{l-1}\mathcal {L}_t^\textbf{R}\), which are orthogonal by construction. Now, consider any \(g_{t-m2^j}^{(j)}\) with \(m \in \mathbb {N}_0\). Clearly, \(g_{t-m2^j}^{(j)}\) belongs to \(\textbf{R}^{j-1} \mathcal {L}_{t-m2^j}^\textbf{R}\) but, by the definition of \(g_t^{(j)}\), we can write
with \(\beta _K^{(j)} = \textbf{0}\) if \(K = 0, \ldots , m-1\) and \( \beta _K^{(j)}=\beta _k^{(j)}\) if \(K = m+k\) for some \(k \in \mathbb {N}_0\). As a result, \(g_{t-m2^j}^{(j)}\) belongs to \(\textbf{R}^{j-1}\mathcal {L}_t^ \textbf{R}\), too. Similarly, at scale l, for any \(n \in \mathbb {N}_0\), it is easy to see that \(g_{t-n2^l}^{(l)}\) belongs to \(\textbf{R}^{l-1} \mathcal {L}_t^\textbf{R}\). Hence, the orthogonality of such submodules guarantees that \(\mathbb {E}[ g_{t-m2^j}^{(j)} {g_{t-n2^l}^{(l)}}^{\prime }]= \textbf{0}\) for all \(j \ne l\) and \(m,n \in \mathbb {N}_0\).
As for the general requirement about \(\mathbb {E}[ g_{t-p}^{(j)}{g_{t-q}^{(l)} }^{\prime }]\) for any \(j,l \in \mathbb {N}\) and \(p,q,t \in \mathbb {Z}\), we have
and so
where the matrices \(\beta _k^{(j)}\), \(\beta _h^{(l)}\) do not depend on t and \(\Gamma _n\) denotes the autocovariance matrix of \(\varvec{\varepsilon }\) at lag \(n \in \mathbb {Z}\). Hence, after the summations over u, v and k, h, the only remaining variables are \(j, l, p-q\). Therefore, \(\mathbb {E}[ g_{t-p}^{(j)}{g_{t-q}^{(l)}}^{\prime }]\) depends at most on \(j,l,p-q\). \(\blacksquare \)
Appendix C Illustrations of the MEWD
To put the MEWD into practice, we first compute the scale-specific responses of weakly stationary VAR(1) and VARMA(1, 1) processes in closed form. Afterwards, we briefly describe how to estimate the MEWD of weakly stationary processes in general. Finally, we analyze the persistent dynamics of the bivariate process of Blanchard and Quah (1989) through the MEWD.
1.1 C.1 The MEWD of VAR(1) and VARMA(1,1)
Consider a weakly stationary purely non-deterministic vector ARMA(1, 1) process, or VARMA(1, 1), \(\textbf{x}=\{x_t\}_{t\in \mathbb {Z}}\) defined by \( x_t=\rho x_{t-1}+\varepsilon _t+\theta \varepsilon _{t-1}, \) where \(\rho , \theta \in A\), \(\rho +\theta \ne \textbf{0}\) and \(\varvec{ \varepsilon }=\{\varepsilon _t\}_{t\in \mathbb {Z}}\) is a multivariate unit variance white noise. We assume the stationarity condition \(\Vert \rho \Vert _A<1\) (Footnote 8).
By using the lag operator \(\textbf{L}\), we can rewrite the previous equation as \((I-\rho \textbf{L})x_t=(I+\theta \textbf{L})\varepsilon _t\). Since \( \Vert \rho \Vert _A<1\), the operator \(\sum _{l=0}^{\infty }( \rho \textbf{L})^l\) is well defined, and it is the inverse of \((I-\rho \textbf{L})\). Therefore, the moving average representation of \(x_t\) is
with \(\alpha _0=I\) and \(\alpha _h = \rho ^{h-1}(\rho +\theta )\) for all \(h \in \mathbb {N}\).
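This closed form can be verified by iterating the defining recursion on a unit impulse and comparing the two horizon-by-horizon responses; the sketch below is illustrative and the function names are ours.

```python
import numpy as np

def varma11_ma(rho, theta, H):
    # MA coefficients of x_t = rho x_{t-1} + eps_t + theta eps_{t-1}:
    # alpha_0 = I and alpha_h = rho^{h-1} (rho + theta) for h >= 1.
    m = rho.shape[0]
    alphas = [np.eye(m)]
    for h in range(1, H):
        alphas.append(np.linalg.matrix_power(rho, h - 1) @ (rho + theta))
    return alphas

def varma11_impulse(rho, theta, H):
    # Response of x at horizon h to a unit shock at time 0, obtained by
    # iterating the defining recursion directly (eps_0 = I, eps_h = 0 else).
    m = rho.shape[0]
    out, x_prev = [], np.zeros((m, m))
    for h in range(H):
        eps_now = np.eye(m) if h == 0 else np.zeros((m, m))
        eps_lag = np.eye(m) if h == 1 else np.zeros((m, m))
        x = rho @ x_prev + eps_now + theta @ eps_lag
        out.append(x)
        x_prev = x
    return out
```

At \(h=0\) the iteration returns the identity, and at \(h=1\) it returns \(\rho +\theta \), matching the closed form above.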
We compute the scale-specific responses by Eq. (7). For a fixed scale \(j \in \mathbb {N}\), we obtain
By setting \(\theta =0\), we find the scale-specific responses of a VAR(1):
As an example, consider a weakly stationary bivariate \({\textrm{VAR}}(1)\) process with \( x_t=[ y_t, z_t]^{\prime }\), \(\varepsilon _t= [ u_t, v_t ]^{\prime }\) as unit variance white noise and \(\rho =[a, b; c, d]\), that is
For any \(j\in \mathbb {N}\) and \(k \in \mathbb {N}_0\), the scale-specific responses \(\beta _k^{(j)}\) turn out to be
1.2 C.2 Estimation of the MEWD
When the moving average representation of the weakly stationary process \(\textbf{x}=\{x_t\}_{t\in \mathbb {Z}}\) is not known, the MEWD can be obtained by following a procedure similar to that of Subsections 3.1.1 and 3.2.1 in Ortu et al. (2020a) and Section 5 in Di Virgilio et al. (2019). The first step is to estimate a vector autoregressive form for \(x_t\). Then, a moving average representation can be retrieved. Finally, detail processes and scale-specific responses can be obtained from Eqs. (3) and (7), respectively. More details can be found in the next subsection, in the special case of the model of Blanchard and Quah (1989).
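The second step, inverting the estimated VAR into a moving average representation, uses the standard recursion \(c_0=I\), \(c_h=\sum _{i=1}^{\min (h,N)} A_i c_{h-i}\). A minimal sketch (the matrices \(A_i\) stand for the estimated autoregressive coefficients; function names are ours):

```python
import numpy as np

def var_to_ma(A, H):
    # MA coefficients c_h implied by the VAR x_t = sum_i A[i-1] x_{t-i} + eta_t:
    # c_0 = I and c_h = sum_{i=1}^{min(h, N)} A_i c_{h-i}.
    m = A[0].shape[0]
    c = [np.eye(m)]
    for h in range(1, H):
        c.append(sum(A[i - 1] @ c[h - i] for i in range(1, min(h, len(A)) + 1)))
    return c
```

For a VAR(1) this reduces to \(c_h = A_1^h\); more generally, \(c_h\) coincides with the top-left block of the h-th power of the companion matrix.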
1.3 C.3 Blanchard and Quah’s model
The macroeconomic model of Blanchard and Quah (1989) studies the impulse responses of GNP and unemployment to demand and supply shocks. We reproduce the model and analyze the responses through the MEWD in order to quantify the persistence of different disturbances.
We consider the algebra A of \(2\times 2\) matrices, and we take into account a zero-mean weakly stationary purely non-deterministic bivariate time series \(\textbf{x}=\{ x_t \}_{t\in \mathbb {Z}}\) such that
where \(\varvec{\varepsilon }=\{ \varepsilon _t \}_{t\in \mathbb {Z}}\) is a unit variance bivariate white noise and \(\sum _{h=0}^{\infty }\textrm{Tr} (\alpha _h \alpha ^{\prime }_h)\) is finite. In Blanchard and Quah (1989), \(x_t=[y_t, z_t]^{\prime }\), where \(y_t\) is the first-difference process of log real GNP (or output growth) and \(z_t\) is the seasonally adjusted unemployment rate for males aged more than 20. The impulse responses satisfy the long-run restriction \( \sum _{h=0}^{\infty }\alpha _h(1,1)=0, \) which is crucial for the identification. Indeed, \(x_t\) is assumed to also admit the MA representation
where \(\varvec{\eta }=\{ \eta _t \}_{t\in \mathbb {Z}}\) is a bivariate white noise with covariance matrix \(\omega \), generally different from the identity. Here, \(\eta _t\) are the reduced-form residuals, while \( \varepsilon _t\) are the structural shocks. Equation (C15) provides the formulation obtained by estimating the time series parameters from the data. Specifically, we first estimate a vector autoregressive form for \(x_t\), that is
The matrix \(\omega \) is estimated as the covariance matrix of the residuals in the multivariate regression. Then, the autoregressive form implies that
From the last expression, we can find the MA matrix coefficients of Eq. (C15):
The MA representations (C14) and (C15) of \(x_t\) are related by \(\eta _t=\alpha _0 \varepsilon _t\) and \(\alpha _h = c_h \alpha _0\), where the matrix \(\alpha _0\) is such that \(\omega = \alpha _0 \alpha _0^{\prime }\). However, many choices for \( \alpha _0\) are possible since the factorization of \(\omega \) provides only three conditions for the identification of \(\alpha _0\). The long-run restriction (together with the sign restrictions) is an additional requirement that ensures the identification of structural shocks (Lütkepohl 2005, Section 9.1.4).
By the Cholesky factorization, there exists a unique lower triangular matrix s, such that \(\omega = ss^{\prime }\) (Trefethen and Bau III 1997, Lecture 23). Any \(\alpha _0\) such that \(\omega =\alpha _0 \alpha _0^{\prime }\) is an orthonormal transformation of s, namely \( \alpha _0=sr^{\prime }\) with \(r \in A\) orthonormal. The long-run restriction and the sign restrictions \(r(1,2)<0\), \(r(2,1)>0\) imply that r is uniquely determined by
By using these parameters, we get the MA of Eq. (C14), where the univariate shocks are simultaneously uncorrelated.
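The identification step just described can be sketched as follows, under the stated restrictions: \(s\) is the lower triangular Cholesky factor of \(\omega \), and the rotation \(r\) is chosen so that the long-run response of the first variable to the first shock vanishes, with the sign restrictions \(r(1,2)<0\), \(r(2,1)>0\). Here `C1` stands for the cumulated reduced-form responses \(\sum _h c_h\); names and layout are ours.

```python
import numpy as np

def bq_alpha0(omega, C1):
    # Identify alpha_0 = s r' where omega = s s' (Cholesky, s lower
    # triangular) and the orthonormal rotation r makes the long-run
    # response (C1 @ alpha_0)[0, 0] equal to zero, with the sign
    # restrictions r[0, 1] < 0 and r[1, 0] > 0 as in the text.
    s = np.linalg.cholesky(omega)
    w = (C1 @ s)[0, :]                      # row whose rotation must vanish
    t = np.sign(w[0]) / np.hypot(w[0], w[1])
    cos_phi, sin_phi = w[1] * t, -w[0] * t  # w . (cos, sin) = 0, sin < 0
    r = np.array([[cos_phi, sin_phi], [-sin_phi, cos_phi]])
    return s @ r.T
```

By construction \(\alpha _0 \alpha _0^{\prime } = s r^{\prime } r s^{\prime } = \omega \), and the (1, 1) entry of the long-run matrix \(C1\,\alpha _0\) is zero.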
The data of Blanchard and Quah (1989), which are freely available, are quarterly and span from 1950:Q2 to 1987:Q4. The maximum autoregressive lag N is set equal to 8. To reduce non-stationarity, \( z_t\) is linearly detrended, while \(y_t\) is demeaned by splitting the sample into two parts: before and after 1973:Q4. The bivariate structural innovation \(\varepsilon _t = [ u_t, v_t]^{\prime }\) consists of the demand shock \(u_t\) and the supply shock \(v_t\). The impulse responses of output are obtained by cumulating the impulse responses of \(y_t\). They are plotted (together with the impulse responses of \(z_t\)) in the top panels of Fig. 1, which reproduce Figures 1 and 2 in Blanchard and Quah (1989). The impulse responses of output with respect to \(u_t\) converge to zero in the long term by the long-run restriction. This phenomenon is not present in the impulse responses of output with respect to \(v_t\): demand shocks have transitory effects, while supply disturbances have a permanent impact on output.
Demand shocks have opposite hump-shaped effects on output and unemployment, with a peak after two or four quarters. Moreover, the impact of \(u_{t}\) vanishes after 3 or 5 years: demand disturbances have similarly relevant effects on GNP and employment but, eventually, the subsequent adjustment of prices and wages leads the economy back to equilibrium. As for supply shocks, the influence of the innovations \(v_{t}\) on output cumulates over time, reaching a peak after 2 years. Except for the first quarter, output is increasing. Then, the output response declines and stabilizes at a steady level 5 years after the initial shock. A different reaction, however, characterizes the unemployment rate. Indeed, even if the supply disturbance is favorable (due, e.g., to a productivity increase), in the short term \(z_{t}\) rises, plausibly because of wage rigidities. After several quarters \(z_{t}\) drops and, later, slowly reverts to its original value. No effect is present after 5 years.
The lower graphs of Fig. 1 display the (non-cumulated) impulse responses of \(y_t\), together with those of \(z_t\). The non-monotonic responses of output in the top panels correspond to oscillatory, sign-changing responses of \(y_t\) in the bottom panels. The latter reveal the presence of contrasting reactions that, overall, generate the hump-shaped responses. We now use the MEWD to shed light on the persistence of the shocks causing opposite reactions on different time scales.
From a sample of 159 data points, we estimate the first 7 scales in the MEWD and we plot in Fig. 2 the variance decompositions of \(y_t\) and \(z_t\) illustrated in Sect. 4.3. The two left panels, inspired by Eq. (10), regard the decomposition across scales without distinguishing between the two sources of randomness. The middle and right panels further disentangle the contribution across time scales of demand and supply shocks, following Eq. (11). From the figure, it is apparent that the unemployment rate is a longer-term phenomenon than output growth. Indeed, most of the variance of \(z_t\) is explained by persistent components at scales 4, 5 and 6, involving shocks from 4 to 16 years. On the contrary, the variance of \(y_t\) comes from scales 1–3, i.e., from disturbances lasting from 6 months to 2 years, with an additional sizable contribution given by scale 4. Moreover, the analysis of individual shocks reveals that demand shocks account for most of the variability of both output growth and unemployment. In agreement with the long-run restriction, the response of \( y_t\) to supply shocks operates mainly at scale 4 (involving 4-year innovations), while the response of the same variable to demand shocks is more concentrated on lower scales (1–3). However, demand shocks explain relatively more variance.
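One natural reading of the variance decompositions of Eqs. (10) and (11) is the following: with unit-variance, mutually uncorrelated shocks, the fraction of \(\mathrm{Var}(x_{i,t})\) attributed to scale \(j\) and shock \(l\) is \(\sum _k \beta _k^{(j)}(i,l)^2\) divided by the total variance of \(x_{i,t}\). A minimal sketch under this reading (array layout and names are ours):

```python
import numpy as np

def variance_shares(betas):
    # betas[j] is the list of scale-(j+1) response matrices beta_k^{(j+1)}.
    # Returns shares of shape (J, m, m): shares[j, i, l] is the fraction of
    # Var(x_i) attributed to scale j+1 and shock l, assuming unit-variance,
    # mutually uncorrelated shocks.
    contrib = np.array([sum(b ** 2 for b in bs) for bs in betas])  # (J, m, m)
    total = contrib.sum(axis=(0, 2))                               # Var of each x_i
    return contrib / total[None, :, None]
```

Summing the shares of each variable over scales and shocks gives one, so the bar heights in a figure like Fig. 2 add up to the full variance.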
We can compare the results of Fig. 2 with the analysis of the spectral density matrix of \(x_t\). Following Example 11.8.1 in Brockwell and Davis (2006), we rewrite Eq. (C14) by using a time-invariant linear filter U:
where we assume that the entries of \(U(\textbf{L})\) are absolutely summable. Then, the spectral density matrix of \(x_t\) is, for any \(-\pi < \lambda \leqslant \pi \),
where the spectral density of output growth satisfies
the spectral density of unemployment is similar, i.e.,
the cross-spectrum satisfies
and \(f_{zy}(\lambda )\) is the complex conjugate of \(f_{yz}(\lambda )\).
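The spectral formulas above can be evaluated numerically. The sketch below assumes the standard linear-filter form \(f(\lambda )=\frac{1}{2\pi } A(e^{-i\lambda }) A(e^{-i\lambda })^*\) with \(A(z)=\sum _h \alpha _h z^h\) and a unit variance white noise input (an assumption consistent with, but not quoted from, Eqs. (C16)–(C17)); integrating the (1, 1) entry over \((-\pi ,\pi ]\) should recover the variance of output growth.

```python
import numpy as np

def spectral_density(alpha, lam):
    # f(lambda) = (1 / 2 pi) A(e^{-i lambda}) A(e^{-i lambda})^*, where
    # A(z) = sum_h alpha_h z^h and the input is a unit-variance white noise.
    A = sum(a * np.exp(-1j * lam * h) for h, a in enumerate(alpha))
    return (A @ A.conj().T).real / (2 * np.pi)
```

A midpoint-rule integration of any diagonal entry over one period matches the corresponding diagonal entry of \(\sum _h \alpha _h \alpha _h^{\prime }\), i.e., the variance of that component.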
The left panels of Fig. 3 display the spectral densities of output growth and unemployment. The middle panels represent the contributions of demand shocks to such densities, i.e., the first summation terms in Eqs. (C16) and (C17). Similarly, the right panels display the contribution of supply shocks to \(f_y\) and \(f_z\). Demand shocks keep a prominent role with respect to supply disturbances, as was already clear from Fig. 2. The peaks of \(f_y\) are consistent with the variance of output growth explained by scales 1–4 (top-left panels in both figures). Moreover, the low density at 0 is consistent with the negligible variance of output explained by high scales. Supply shocks feature a peak of frequencies at roughly 0.23, in line with the peak of variance at scale 4 (top-right panels in the same figures). The spectral analysis confirms the persistent nature of unemployment, which features a concentration of frequencies around 0 (bottom-left panels). Hence, Figs. 2 and 3 are mutually consistent. However, the variance decomposition of the MEWD provides simple interpretations directly in the time domain.
Figure 4 displays the scale-specific responses \(\beta _{k}^{(j)}\) of \(x_{t}\). As for supply shocks, we confirm the positive contemporaneous reaction of output growth to 4-year innovations, captured by \(\beta _{0}^{(4)}(1,2)\). Regarding the reaction of output growth to demand shocks, the positive reaction causing the surge of the hump in cumulated responses is principally due to the coefficients \( \beta _{0}^{(j)}(1,1)\) with \(j=1,\ldots ,4\). Such responses are simultaneous with the shock and positive. The responses at other lags and scales mainly contribute to stabilizing, and eventually fading out, this positive effect over time. In line with the variance decompositions, the responses of the unemployment rate to demand shocks are mainly dictated by the coefficients \( \beta _{0}^{(j)}(2,1)\) with \(j=4,5,6\), which are negative and occur instantaneously (lag zero). These scale-specific responses generate the fall in the hump of the responses of \(z_{t}\) to demand shocks. Interestingly, other scale-specific coefficients, such as \(\beta _{0}^{(2)}(2,1)\), are positive, proving the coexistence of contrasting reactions at different time scales. The simultaneous positive impact of demand shocks at scale 2 contributes to delaying the large drop in unemployment and to making the hump arise.
Hence, by the MEWD we can disaggregate demand/supply calendar-time shocks and quantify the impact of innovations with heterogeneous duration in Blanchard and Quah's model. First, the permanent effect of supply shocks on GNP growth (assumed by the long-run restriction) turns out to be linked to 4-year innovations. Overall, demand shocks are more important than supply shocks in explaining the variance. In particular, output growth positively responds to demand shocks lasting from 6 months to 2 years, while the unemployment rate evolves primarily on a 4-, 8- or 16-year basis. In addition, the analysis of scale-specific responses can help in disentangling the positive and negative reactions at different time scales that, aggregated, generate the hump-shaped behavior of cumulated responses. The economic rigidities advocated by Blanchard and Quah (1989) can be identified at specific scales and lags, providing useful tools to the policy maker.
Cite this article
Cerreia-Vioglio, S., Ortu, F., Severino, F. et al. Multivariate Wold decompositions: a Hilbert A-module approach. Decisions Econ Finan 46, 45–96 (2023). https://doi.org/10.1007/s10203-023-00392-3