1 Introduction

A vector random process \(\textbf{x}=\{ x_t\}_{t\in \mathbb {Z}}\) is a collection of m univariate time series \(x_{i,t}\). Although weakly stationary univariate processes generally depend on a unique source of randomness [as ensured by the Wold decomposition, Wold (1938)], each variable \(x_{i,t}\) of a weakly stationary multivariate process can be affected by m possibly different shocks \(\varepsilon _{1,t}, \varepsilon _{2,t}, \ldots , \varepsilon _{m,t}\). The representation of vector processes involves matrix coefficients, and their study has proved fruitful in numerous macroeconomic and financial applications (Lütkepohl 2005).

In this paper, we recast the standard treatment of multivariate time series in terms of Hilbert A-modules and prove the Abstract Wold Theorem for Hilbert A-modules (Theorem 1). Our abstract A-module framework features a notion of orthogonality that, by means of Theorem 1, makes it easy to retrieve two important orthogonal decompositions for weakly stationary vector processes, which we illustrate in Sects. 3 and 4, respectively. One is the celebrated multivariate classical Wold decomposition (MCWD, henceforth), summarized in Theorem 2. The other is the multivariate extended Wold decomposition (MEWD, henceforth), which constitutes the multivariate version of the extended Wold decomposition of Ortu et al. (2020a) and is used by Bandi et al. (2019, 2021) in financial economics settings. See Theorem 4.

Both the MCWD and the MEWD rely on orthogonal innovations. However, only in the MEWD are shocks associated with different degrees of persistence. In econometrics, persistence is usually addressed by spectral analysis techniques that operate in the frequency domain. For instance, cross-spectrum and squared coherency may be used to quantify the linear association between single time series in a vector process (Brockwell and Davis 2006, Section 11.6). On the contrary, the MEWD makes it possible to disentangle uncorrelated persistent components of a weakly stationary vector process by working exclusively in the time domain. Each vector component is associated with a specific persistence level (or time scale), and it is sensitive to a family of shocks with a precise duration. Compared with the univariate extended Wold decomposition of Ortu et al. (2020a), the picture offered by the MEWD is richer because several simultaneous shocks, each with its own persistence, are at play.

To derive the MCWD and the MEWD, we first embed multivariate time series in an abstract A-module framework from which orthogonal decompositions naturally arise. We use the algebra A of square matrices, define orthogonal projections on closed submodules and prove the Abstract Wold Theorem for Hilbert A-modules, which is key to decomposing vector processes into sums of uncorrelated components. We provide a self-contained compendium on Hilbert A-modules over a non-commutative, finite-dimensional algebra (such as the algebra of matrices) in “Appendix A”. Indeed, the application of Hilbert A-modules in economic theory and statistics is not new: some examples are given by Hansen and Richard (1987), Gallant et al. (1990), Wiener and Masani (1957) and Cerreia-Vioglio et al. (2022).

To enter into the details of our construction, we consider the vector space of square-integrable m-dimensional random vectors and substitute the field of scalars with the algebra of \(m\times m\) matrices, obtaining an A-module H. We then endow H with an A-valued inner product which generalizes the inner product of \(L^{2}\) and naturally conveys a notion of orthogonality. Such a structure is a Hilbert A-module. The properties of self-duality of H and complementability of closed submodules, which we discuss in Sect. 2.1, are crucial for the Abstract Wold Theorem for Hilbert A-modules (Theorem 1).

This theorem generalizes the Abstract Wold Theorem for Hilbert spaces (Sz-Nagy et al. 2010, Theorem 1.1), which permits the orthogonal decomposition of a Hilbert space by means of an isometric operator. When this theorem is applied to the Hilbert space generated by the past realizations of a weakly stationary univariate time series, with the lag operator as isometry, the classical Wold decomposition obtains (see, e.g., Wold 1938; Brockwell and Davis 2006, Section 5.7; or Severino 2016). The orthogonality induced by the theorem is responsible for the white noise property of the fundamental innovations. However, other choices of isometry are possible. For instance, the univariate persistence-based decomposition of Ortu et al. (2020a) is obtained by using as isometry the scaling operator on the Hilbert space generated by the past fundamental innovations. In this decomposition, the orthogonality ensured by the Abstract Wold Theorem manifests itself as the absence of correlation between persistent components.

The MEWD for a weakly stationary vector process \(x_t\) comes from the application of Theorem 1 to the Hilbert A-module generated by the multivariate fundamental innovations given by the MCWD of \(x_t\). The employed isometry is the scaling operator, adapted to A-modules. The A-module orthogonality given by the theorem guarantees that any entry of a multivariate persistent component at a given time scale is uncorrelated with any entry of any persistent component at a different scale (Theorem 4). This absence of correlation is crucial to associate each scale-specific response with the effect on the related time scale, without spurious correlation with the simultaneous effects at the other scales. In addition, the orthogonality of components induces a variance decomposition that makes it possible to classify each entry of \(x_t\) as a short-, medium- or long-term process, according to the variance explained by the persistent components (Sect. 4.3).

The next section introduces the Hilbert A-module framework in which we embed weakly stationary vector processes and illustrates Theorem 1. Section 3 revisits the MCWD by emphasizing its connection with Theorem 1. Section 4 describes the MEWD and the persistence-based variance decomposition. As for the appendices, “Appendix A” provides a comprehensive and self-contained treatment of Hilbert A-modules and Theorem 1, “Appendix B” contains the proofs concerning the MEWD and “Appendix C” provides some illustrations of the MEWD, including an application to the bivariate macroeconomic model of Blanchard and Quah (1989).

2 Hilbert A-modules for multivariate time series

In the first subsection, we condense the notions of Hilbert A-module theory that lead to the Abstract Wold Theorem for Hilbert A-modules (Theorem 1). “Appendix A” contains all the details (see also Cerreia-Vioglio et al. 2017, 2019). After that, we describe the Hilbert A-module \(L^2(\mathbb {R}^m,\Omega , \mathcal {F}, \mathbb {P})\) and the submodules that we will need for the MCWD and the MEWD.

2.1 Hilbert A-modules

Hilbert A-modules are a generalization of Hilbert spaces in which the scalar field \(\mathbb {R}\) is replaced by an abstract algebra A. Although these structures have been studied in the mathematical literature (Kaplansky 1953), few works deal with the case of our interest, where the algebra is real, non-commutative and finite dimensional (Goldstine and Horwitz 1966). In contrast to the case of Hilbert spaces, the Riesz Theorem on the representation of linear functionals (Theorem 5.5 in Brezis 2011) is not always valid and closed submodules are not always complemented. Moreover, the self-duality property that we discuss below is key in deriving Theorem 1.

We consider a real normed operator algebra A (the algebra of square matrices) with norm \(\Vert \ \Vert _{A}\), an involution \(^{*}:A\rightarrow A\), an order \(\ge \) and a trace functional \(\bar{\varphi }:A\rightarrow \mathbb {R}\). Then, we consider an A-module H with outer product \(\cdot :A\times H\rightarrow H\) and we define an A-valued inner product \(\langle \, , \rangle _{H}:H\times H\rightarrow A\), which satisfies the A-valued versions of the usual properties of inner products. This makes H a pre-Hilbert A-module.

A-valued operators \(f: H \rightarrow A\) constitute the generalization of linear functionals, and the properties of A-linearity and boundedness can be defined accordingly. Importantly, H is self-dual when, for each A-linear and bounded \(f: H \rightarrow A\), there exists \(y \in H\) such that \( f(x)=\langle x,y \rangle _H\) for all \(x \in H\). In other words, a version of the Riesz Theorem holds.

To put the theory to work, H can be endowed with two (real-valued) norms: \(\Vert \ \Vert _{H}\) and \(\Vert \ \Vert _{\bar{\varphi }}\). The first one is \(\Vert \ \Vert _{H}:H\rightarrow [ 0,+\infty )\) defined by

$$\begin{aligned} \left\| x\right\| _{H}=\sqrt{\left\| \left\langle x,x\right\rangle _{H}\right\| _{A}}\qquad \forall x\in H. \end{aligned}$$

The second one is induced by the (real-valued) inner product \(\langle \, , \rangle _{\bar{\varphi }}:H\times H\rightarrow \mathbb {R}\) defined by \( \langle x,y \rangle _{\bar{\varphi }}=\bar{\varphi } ( \langle x,y \rangle _{H}) \ \ \forall x,y\in H\), which makes H a pre-Hilbert space. The norm is \(\Vert \ \Vert _{\bar{\varphi } }:H\rightarrow [0,+\infty )\) defined by

$$\begin{aligned} \left\| x\right\| _{\bar{\varphi }}=\sqrt{\left\langle x,x\right\rangle _{ \bar{\varphi }}}=\sqrt{\bar{\varphi }\left( \left\langle x,x\right\rangle _{H}\right) }\qquad \forall x\in H. \end{aligned}$$

Proposition 12 in “Appendix A.3.1” shows that, if A is finite dimensional, then A admits a trace \(\bar{\varphi }\) and the norms \(\Vert \ \Vert _{H}\) and \(\Vert \ \Vert _{\bar{\varphi }}\) are equivalent. In our construction, the use of \(\Vert \ \Vert _{\bar{\varphi }}\) is convenient to establish the convergence of sequences in H, while \(\langle \, , \rangle _{H}\) naturally induces the notion of orthogonality: \(x,y\in H\) are orthogonal when \(\langle x,y\rangle _{H}=0\), where 0 is the zero element of A.

We say that H is a Hilbert A-module when it is \(\Vert \ \Vert _{H}\) complete. Theorem 15 in “Appendix A.5” establishes two characterizations of \(\Vert \ \Vert _{H}\) completeness when A is finite dimensional: H is \(\Vert \ \Vert _{H}\) complete if and only if it is \(\Vert \ \Vert _{\bar{\varphi }}\) complete if and only if it is self-dual. The link between completeness and self-duality is, then, established.

The last ingredient for Theorem 1 regards the orthogonal complement of a given submodule \(M\subseteq H\), i.e., \(M^{\bot }= \{ x\in H: \langle x,y \rangle _{H}=0\quad \forall y\in M \}\). By Proposition 17 in “Appendix A.5.1”, if A is finite dimensional and H is self-dual, M is \(\Vert \ \Vert _{H}\) closed if and only if \(H=M\oplus M^{\bot } \) (complementability). The projection map on closed submodules is, then, well-defined.

The decomposition in Theorem 1 is due to an isometry \(\textbf{T}:H\rightarrow H\), i.e., an A-linear operator such that \(\langle \textbf{T}x,\textbf{T}y \rangle _{H}= \langle x,y \rangle _{H}\) for all \(x,y\in H\). Moreover, the theorem prescribes the determination of a wandering submodule L such that \(\textbf{T}^{n} L \bot \textbf{T} ^{m} L\) for all \(m,n\in \mathbb {N}_{0}\) with \(m\not =n\). If H is self-dual, a wandering submodule is \(L=(\textbf{T}H) ^{\bot }\).

Theorem 1

(Abstract Wold Theorem for Hilbert A-modules) Let A be finite dimensional and H a pre-Hilbert A-module. If H is self-dual and \(\textbf{T}:H\rightarrow H\) is an isometry, then \(H=\widehat{H}\oplus \widetilde{H}\) where

$$\begin{aligned} \widehat{H}=\bigcap _{n=0}^{\infty }\textbf{T}^{n}H,\qquad \widetilde{H} =\bigoplus _{n=0}^{\infty }\textbf{T}^{n}L,\qquad L=\left( \textbf{T}H\right) ^{\bot }. \end{aligned}$$

Moreover, \((\widehat{H},\widetilde{H})\) is the unique orthogonal decomposition of H into submodules such that \(\textbf{T}\widehat{H}=\widehat{H}\) and \(\widetilde{H}=\bigoplus _{n=0}^{\infty }\textbf{T}^{n}L\) for some wandering submodule L.

Proof

See “Appendix A.6”. \(\blacksquare \)

Theorem 1 provides a (generalized) version for Hilbert A-modules of the Abstract Wold Theorem for Hilbert spaces (Sz-Nagy et al. 2010, Theorem 1.1). The properties of self-duality and complementability play a crucial role in this wider setting.

Table 1: Relations between the general Hilbert A-module theory and the Hilbert A-module of square-integrable random vectors with zero mean

2.2 The Hilbert A-module \(L^2(\mathbb {R}^m,\Omega , \mathcal {F}, \mathbb {P})\)

A probability space \((\Omega , \mathcal {F}, \mathbb {P})\) is given and, as usual, any two \(\mathcal {F}\)-measurable random vectors are defined to be equivalent when they coincide almost surely. We, then, consider the vector space \(L^2(\mathbb {R}^m,\Omega , \mathcal {F}, \mathbb {P})\) of (equivalence classes of) measurable square-integrable random vectors x that take values in \(\mathbb {R}^m\). We endow \(L^2(\mathbb {R}^m,\Omega , \mathcal {F}, \mathbb {P})\) with the structure of a pre-Hilbert A-module, and we denote it by H. The whole discussion is summarized in Table 1.

First of all, we consider the algebra \(A=\mathbb {R}^{m\times m}\) of real \( m\times m\) matrices. The unit in A is the identity matrix, and the product in A is the usual row-by-column product. A is normed by the operator norm \(\Vert \ \Vert _A\) such that, for any \(a=\{ a_{i,j}\}_{i,j=1,\ldots ,m}\) in A,

$$\begin{aligned} \Vert a\Vert _A = \sup _{x\in \mathbb {R}^m, \Vert x\Vert _2=1} \Vert ax\Vert _2, \end{aligned}$$

where \(\Vert \ \Vert _2\) is the \(L^2\) norm in \(\mathbb {R}^m\). In particular, \( \Vert a\Vert _A^2\) is the largest eigenvalue of the positive semidefinite matrix \( a^{\prime }a\) (Meyer 2000, Section 5.2), where \(a^{\prime } \) is the transpose of a. The map that associates any matrix a with \( a^{\prime }\) defines an involution in A. This map induces an order \(\ge \) such that, for any \(a,b \in A\), \(a \ge b\) when \(a-b\) is a symmetric and positive semidefinite matrix (equivalently, \(a-b \ge \textbf{0}\)).

We use as outer product \(A \times H \rightarrow H\) the standard matrix-by-vector product. This operation makes H an A-module. Then, we define the A-valued inner product \(\langle \, , \rangle _H: H \times H \rightarrow A\) that associates any \(x=[x_1,\ldots ,x_m]^{\prime }, y=[y_1,\ldots ,y_m]^{\prime }\in H \) with the matrix

$$\begin{aligned} \langle x, y \rangle _H = \mathbb {E}\left[ x y^{\prime }\right] =\left\{ \mathbb {E}\left[ x_iy_j\right] \right\} _{i,j=1,\ldots ,m}. \end{aligned}$$

\(\langle \, , \rangle _H\) satisfies the usual properties of inner products. In addition, if x has zero mean, \(\langle x,x\rangle _H\) is the covariance matrix of x. Importantly, two random vectors \(x,y \in H\) are orthogonal when \(\langle x, y \rangle _H=\textbf{0}\), that is \(\mathbb {E}[x_i y_j]=0\) for all \(i,j =1,\ldots ,m\). If x and y have zero mean, this means that any \(x_i\) is uncorrelated with any \(y_j\).
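For readers who prefer a computational view, the following minimal numpy sketch (our own illustration, not part of the paper's formal development; the sample size and function name are hypothetical) estimates the A-valued inner product \(\langle x, y \rangle _H = \mathbb {E}[xy^{\prime }]\) by a sample average and checks the orthogonality of two independent zero-mean random vectors.

```python
import numpy as np

rng = np.random.default_rng(0)

def inner_H(X, Y):
    """Sample analogue of <x, y>_H = E[x y']: an m x m matrix.

    Columns of X and Y are i.i.d. draws of the random vectors x and y.
    """
    return X @ Y.T / X.shape[1]

m, n = 2, 100_000
x = rng.standard_normal((m, n))   # zero-mean random vector x
y = rng.standard_normal((m, n))   # independent of x

print(inner_H(x, x))              # ~ identity: the covariance matrix of x
print(inner_H(x, y))              # ~ the zero matrix: x and y are orthogonal
```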

We now define the two equivalent (real-valued) norms \(\Vert \ \Vert _H\) and \(\Vert \ \Vert _{\bar{\varphi }}\). Regarding \(\Vert \ \Vert _H: H \rightarrow [0,+\infty )\), we have

$$\begin{aligned} \Vert x\Vert _H=\sqrt{\Vert \langle x,x \rangle _H\Vert _A} = \sqrt{\Vert \mathbb {E}\left[ xx^{\prime }\right] \Vert _A} \qquad \forall x \in H. \end{aligned}$$

If x has zero mean, \(\Vert x\Vert _H^2\) is the largest eigenvalue of the covariance matrix of x.

To construct \(\Vert \ \Vert _{\bar{\varphi }}\), we first consider the trace functional \(\bar{\varphi }: A \rightarrow \mathbb {R}\) defined, for any matrix a, by its trace \(\bar{\varphi } (a) = \textrm{Tr}(a)\). Then, H is a pre-Hilbert space with the inner product \(\langle \, , \rangle _{\bar{\varphi }}: H \times H \rightarrow \mathbb {R}\) defined by

$$\begin{aligned} \langle x,y \rangle _{\bar{\varphi }}=\bar{\varphi }\left( \langle x,y \rangle _H\right) =\textrm{Tr}\left( \mathbb {E}\left[ xy^{\prime }\right] \right) = \sum _{i=1}^m \mathbb {E}\left[ x_i y_i\right] \qquad \forall x,y \in H, \end{aligned}$$

which coincides with the usual inner product of \(L^2( \mathbb {R}^m)\). The associated norm \(\Vert \ \Vert _{\bar{\varphi }}: H \rightarrow [0,+\infty )\) is

$$\begin{aligned} \Vert x\Vert _{\bar{\varphi }}=\sqrt{\langle x,x \rangle _{\bar{\varphi }}} =\sqrt{ \textrm{Tr}\left( \mathbb {E}\left[ xx^{\prime }\right] \right) } =\sqrt{ \sum _{i=1}^m \mathbb {E}\left[ x_i^2\right] } \qquad \forall x \in H. \end{aligned}$$

If x has zero mean, \(\Vert x\Vert _{\bar{\varphi }}^2\) is the sum of the eigenvalues of the covariance matrix of x.
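As a quick numerical illustration (a sketch of ours, with a hypothetical covariance matrix), for a zero-mean x with covariance \(\Sigma \) the two norms reduce to eigenvalue computations: \(\Vert x\Vert _H^2\) is the largest eigenvalue of \(\Sigma \), while \(\Vert x\Vert _{\bar{\varphi }}^2\) is the sum of all its eigenvalues.

```python
import numpy as np

# Hypothetical covariance matrix of a zero-mean random vector x.
Sigma = np.array([[4.0, 1.0],
                  [1.0, 2.0]])

eigvals = np.linalg.eigvalsh(Sigma)
norm_H = np.sqrt(eigvals.max())     # ||x||_H: sqrt of the largest eigenvalue
norm_phi = np.sqrt(eigvals.sum())   # ||x||_phi_bar: sqrt of Tr(Sigma)

# Norm equivalence in finite dimension:
# ||x||_H <= ||x||_phi_bar <= sqrt(m) * ||x||_H.
print(norm_H, norm_phi)
```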

Proposition 20 in “Appendix B.1” shows that H is \(\Vert \ \Vert _{\bar{\varphi }}\) complete, i.e., it is a Hilbert A-module. Thus, the self-duality property holds.

2.3 The submodules generated by weakly stationary vector processes

Now, consider a zero-mean weakly stationary multivariate process \(\textbf{x} =\{x_t\}_{t \in \mathbb {Z}}\) such that \(x_t=[x_{1,t},\ldots , x_{m,t}]^{\prime }\in H\) for all \(t \in \mathbb {Z}\). The autocovariance function \(\Gamma : \mathbb {Z} \rightarrow A\) associates any integer n with the matrix \(\Gamma _n\) with entries \(\Gamma _n(p,q)= \mathbb {E}[x_{p,t}x_{q,t+n}]\) for \(p,q=1,\ldots ,m\). If \(\Gamma _n = \textbf{0}\) for any \(n\ne 0\), then \(\textbf{x}\) is a multivariate white noise, which has unit variance when \(\Gamma _0\) is the identity matrix. In this case, the component time series of the multivariate white noise are mutually uncorrelated, also contemporaneously.
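A sample version of the autocovariance function (again a sketch of ours; the estimator and variable names are illustrative) can be used to check the white noise property on simulated data:

```python
import numpy as np

rng = np.random.default_rng(1)

def autocov(X, n):
    """Sample Gamma_n with entries E[x_{p,t} x_{q,t+n}], for n >= 0."""
    T = X.shape[1]
    return X[:, :T - n] @ X[:, n:].T / (T - n)

m, T = 2, 200_000
eps = rng.standard_normal((m, T))   # unit variance multivariate white noise
print(autocov(eps, 0))              # ~ identity matrix (Gamma_0)
print(autocov(eps, 3))              # ~ zero matrix (Gamma_n for n != 0)
```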

The sequence \(\{x_{t-n}\}_{n\in \mathbb {N}_0}\) spans the Hilbert submodule \( \mathcal {H}_t(\textbf{x}) \subseteq H\) defined by

$$\begin{aligned} \mathcal {H}_t(\textbf{x})=\textrm{cl}\left\{ \sum _{k=0}^{+\infty } a_k x_{t-k} : \quad a_k \in A, \quad \sum _{k=0}^{+\infty } \sum _{h=0}^{+\infty } \textrm{Tr }\left( a_k \Gamma _{k-h} a^{\prime }_h\right) <+\infty \right\} \end{aligned}$$
(1)

with

$$\begin{aligned} \left\| \sum _{k=0}^{+\infty } a_k x_{t-k} \right\| _{\bar{\varphi }}^2 = \sum _{k=0}^{+\infty } \sum _{h=0}^{+\infty } \textrm{Tr}\left( a_k \Gamma _{k-h} a^{\prime }_h\right) . \end{aligned}$$

As we will see in Sect. 3, when \(\textbf{x}\) is regular, the multivariate classical Wold decomposition (MCWD) can be obtained by applying Theorem 1 to \(\mathcal {H}_t(\textbf{x})\) with the lag operator as isometry.

An outcome of the MCWD is the unit variance multivariate white noise of fundamental innovations \(\varvec{\varepsilon }=\{\varepsilon _t \}_{t\in \mathbb {Z}}\). The sequence \(\{\varepsilon _{t-n}\}_{n\in \mathbb {N} _0}\) generates the submodule \(\mathcal {H}_t(\varvec{\varepsilon }) \subseteq H\) given by

$$\begin{aligned} \mathcal {H}_t(\varvec{\varepsilon })=\left\{ \sum _{k=0}^{+\infty } a_k \varepsilon _{t-k} : \quad a_k \in A, \quad \sum _{k=0}^{+\infty } \textrm{Tr} \left( a_k a^{\prime }_k \right) <+\infty \right\} \end{aligned}$$
(2)

with

$$\begin{aligned} \left\| \sum _{k=0}^{+\infty } a_k \varepsilon _{t-k} \right\| _{\bar{\varphi } }^2 = \sum _{k=0}^{+\infty } \textrm{Tr}\left( a_k a^{\prime }_k\right) . \end{aligned}$$

Proposition 20 in “Appendix B.1” shows that \(\mathcal {H}_t(\varvec{ \varepsilon })\) is a closed submodule of H and so it is a Hilbert submodule. In Sect. 4, we apply Theorem 1 to \(\mathcal {H}_t(\varvec{\varepsilon })\) with the scaling operator as isometry in order to derive the multivariate extended Wold decomposition (MEWD) of \(x_t\).

3 Multivariate classical Wold decomposition

In the MCWD, a zero-mean regular weakly stationary vector process \(\textbf{x} =\{ x_t\}_{t\in \mathbb {Z}}\) is decomposed into the infinite sum of uncorrelated multivariate innovations that occur at different times, plus a deterministic component (Theorem 7.2 in Bierens 2005). Wiener and Masani (1957) and Rozanov (1967) provide a proof in the complex field. Here, we provide the roadmap to derive the MCWD via the Abstract Wold Theorem for Hilbert A-modules in the real case. A similar derivation in the univariate case is illustrated in Severino (2016).

We consider the Hilbert submodule \(\mathcal {H}_t(\textbf{x}) \subseteq H\) of Eq. (1). The lag operator \(\textbf{L}: \mathcal {H}_t(\textbf{x}) \rightarrow \mathcal {H}_t(\textbf{x})\) acts on generators of \(\mathcal {H}_t(\textbf{x})\) as

$$\begin{aligned} \textbf{L}: \qquad \sum _{k=0}^{+\infty } a_k x_{t-k} \quad \mapsto \quad \sum _{k=0}^{+\infty } a_k x_{t-1-k}. \end{aligned}$$

\(\textbf{L}\) is A-linear and bounded on these generators; hence, it extends by continuity to the whole \(\mathcal {H}_t(\textbf{x})\). Importantly, \(\textbf{L}\) is isometric on \(\mathcal {H}_t(\textbf{x})\). Theorem 1 can, then, be applied.

Theorem 1 requires the determination of the images of \(\mathcal {H}_t(\textbf{x})\) under the powers of \(\textbf{L}\) and of the wandering submodule \(\mathcal {L}_t^\textbf{L}\). Since, in a self-dual pre-Hilbert A-module, the image of a closed submodule under an isometry is a closed submodule (Lemma 18 in “Appendix A.6”), one can prove that \(\textbf{L}^j \mathcal {H}_t(\textbf{x})=\mathcal {H}_{t-j}(\textbf{x})\) for any \(j \in \mathbb {N}\).

Then, it is possible to show that \(\mathcal {H}_t(\textbf{x})\) can be decomposed into the direct sum

$$\begin{aligned} \mathcal {H}_t(\textbf{x})=\mathcal {H}_{t-1}(\textbf{x})\oplus \textrm{span} \left\{ x_t- \mathcal {P}_{\mathcal {H}_{t-1}(\textbf{x})} x_t\right\} , \end{aligned}$$

i.e., \(\mathcal {L}_t^\textbf{L}\) is the linear span of \(x_t- \mathcal {P}_{ \mathcal {H}_{t-1}(\textbf{x})} x_t\), where \(\mathcal {P}_M\) denotes the orthogonal projection on the closed submodule M.

Since \(\textbf{x}\) is regular, for any \(t \in \mathbb {Z}\), \(\langle x_t- \mathcal {P}_{\mathcal {H}_{t-1}(\textbf{x})} x_t, x_t- \mathcal {P}_{\mathcal {H }_{t-1}(\textbf{x})} x_t \rangle _H\) is a symmetric positive definite matrix (Bierens 2012, Section 6). Hence, by Theorem 7.2.6 in Horn and Johnson (1990), there exists a symmetric positive definite square root matrix \(\sigma \) such that

$$\begin{aligned} \left\langle x_t- \mathcal {P}_{\mathcal {H}_{t-1}(\textbf{x})} x_t, x_t- \mathcal {P}_{\mathcal {H}_{t-1}(\textbf{x})} x_t \right\rangle _H = \sigma \sigma . \end{aligned}$$

As \(\sigma \) is invertible, we define the fundamental innovation process \(\varvec{\varepsilon }=\{ \varepsilon _t \}_{t\in \mathbb {Z}}\) by setting, for any \(t \in \mathbb {Z}\), \( \varepsilon _t= \sigma ^{-1} ( x_t- \mathcal {P}_{\mathcal {H}_{t-1}( \textbf{x})} x_t ). \) In particular, \(\varvec{\varepsilon }\) is a unit variance white noise.
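To make this construction concrete, here is a small sketch for a toy bivariate VAR(1) (entirely our own example: the matrices \(\Phi \) and \(\Sigma _u\) are hypothetical). For such a process, \(\mathcal {P}_{\mathcal {H}_{t-1}(\textbf{x})} x_t = \Phi x_{t-1}\), so the prediction error is the VAR disturbance \(u_t\), \(\sigma \) is the symmetric positive definite square root of its covariance matrix, and \(\varepsilon _t = \sigma ^{-1} u_t\).

```python
import numpy as np

rng = np.random.default_rng(2)

Phi = np.array([[0.5, 0.1],
                [0.0, 0.3]])        # hypothetical stable VAR(1) matrix
Sigma_u = np.array([[1.0, 0.4],
                    [0.4, 0.8]])    # hypothetical disturbance covariance

# Symmetric positive definite square root: Sigma_u = sigma @ sigma.
w, V = np.linalg.eigh(Sigma_u)
sigma = V @ np.diag(np.sqrt(w)) @ V.T

T = 100_000
u = np.linalg.cholesky(Sigma_u) @ rng.standard_normal((2, T))
eps = np.linalg.solve(sigma, u)     # fundamental innovations

print(eps @ eps.T / T)              # ~ identity: unit variance white noise
```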

It is, then, possible to show that the lag and the projection operator commute: for any \(k,j \in \mathbb {N}_0\), \(\textbf{L}^j\mathcal {P}_{\mathcal {H }_{t-k-1}(\textbf{x})} x_{t-k}=\mathcal {P}_{\mathcal {H}_{t-k-j-1}(\textbf{x} )} x_{t-k-j}\). Therefore, the covariance matrix of \(x_t- \mathcal {P}_{ \mathcal {H}_{t-1}(\textbf{x})} x_t\) is not dependent on the time index \(t \in \mathbb {Z}\) and, for any \(j \in \mathbb {N}\),

$$\begin{aligned} \textbf{L}^j\mathcal {L}_t^\textbf{L}=\textrm{span}\left\{ x_{t-j}- \mathcal {P} _{\mathcal {H}_{t-j-1}(\textbf{x})} x_{t-j}\right\} . \end{aligned}$$

As a result, Theorem 1 implies that \(\mathcal {H} _t(\textbf{x})=\widehat{\mathcal {H}}_t(\textbf{x}) \oplus \widetilde{ \mathcal {H}}_t(\textbf{x})\) with

$$\begin{aligned} \widehat{\mathcal {H}}_t(\textbf{x})=\bigcap _{j=0}^{+\infty }\mathcal {H}_{t-j}( \textbf{x}), \qquad \quad \widetilde{\mathcal {H}}_t(\textbf{x} )=\bigoplus _{j=0}^{+\infty }\textrm{span}\left\{ x_{t-j}- \mathcal {P}_{ \mathcal {H}_{t-j-1}(\textbf{x})} x_{t-j}\right\} . \end{aligned}$$

The consequences for the process \(\textbf{x}\) are, then, straightforward.

Theorem 2

(Multivariate classical Wold decomposition) Let \(\textbf{x}=\{x_t\}_{t \in \mathbb {Z}}\) be a zero-mean regular weakly stationary m-dimensional process. Then, for any \(t \in \mathbb {Z}\), \(x_t\) decomposes as

$$\begin{aligned} x_t=\sum _{h=0}^{+\infty } \alpha _h \varepsilon _{t-h}+\nu _t, \end{aligned}$$

where the equality is in norm and

  1.

    \(\varvec{\varepsilon }=\{\varepsilon _t\}_{t\in \mathbb {Z}}\) is a unit variance m-dimensional white noise;

  2.

    for any \(h \in \mathbb {N}_0\), the \(m\times m\) matrices \(\alpha _h\) do not depend on t,

    $$\begin{aligned} \alpha _h=\mathbb {E}\left[ x_t \varepsilon ^{\prime }_{t-h}\right] \qquad \text{ and } \qquad \sum _{h=0}^{+\infty } \textrm{Tr}\left( \alpha _h \alpha ^{\prime }_h \right) < +\infty ; \end{aligned}$$
  3.

    \(\varvec{\nu }=\{\nu _t\}_{t\in \mathbb {Z}}\) is a zero-mean weakly stationary m-dimensional process,

    $$\begin{aligned} \nu _t\in \bigcap _{j=0}^{+\infty }\mathcal {H}_{t-j}(\textbf{x}) \qquad \text{ and } \qquad \mathbb {E}\left[ \nu _t \varepsilon ^{\prime }_{t-h}\right] = \textbf{0}\quad \forall h \in \mathbb {N}_0; \end{aligned}$$
  4.
    $$\begin{aligned} \nu _t \in \textrm{cl}\left\{ \sum _{h=1}^{+\infty } a_h \nu _{t-h} \quad \in \bigcap _{j=1}^{+\infty }\mathcal {H}_{t-j}(\textbf{x}): \quad a_h \in A \right\} . \end{aligned}$$

Proof

Wiener and Masani (1957, Theorem 6.11) and Rozanov (1967, Chapter II, Section 3) provide a proof in the complex field. A detailed proof using real Hilbert A-modules is available upon request. \(\blacksquare \)

The process \(\varvec{\nu }\) constitutes the (predictable) deterministic component of \(\textbf{x}\). If each \(\nu _t\) is the null vector, \(\textbf{x}\) is a purely non-deterministic process.

In our approach, the multivariate impulse responses \(\alpha _h\) are fully characterized by projections on Hilbert submodules via the inner product: \(\alpha _h= \langle x_t, \varepsilon _{t-h} \rangle _H\). This feature generalizes the OLS methodology employed in the univariate case by exploiting orthogonal projections in a broader sense. Indeed, each projection matrix \(\alpha _h\) minimizes the distance of the outcome \(x_t\) from the submodule generated by the vector innovation \(\varepsilon _{t-h}\).
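As an illustration of the characterization \(\alpha _h= \langle x_t, \varepsilon _{t-h} \rangle _H\), the next sketch (our toy VAR(1) from above, with illustrative matrices) verifies by simulation that the sample moment \(\mathbb {E}[x_t \varepsilon ^{\prime }_{t-h}]\) recovers the theoretical Wold matrices, which for a VAR(1) are \(\alpha _h = \Phi ^h \sigma \).

```python
import numpy as np

rng = np.random.default_rng(3)

Phi = np.array([[0.5, 0.1],
                [0.0, 0.3]])         # hypothetical VAR(1) matrix
sigma = np.array([[0.98, 0.21],
                  [0.21, 0.87]])     # illustrative symmetric PD root

T = 200_000
eps = rng.standard_normal((2, T))    # unit variance fundamental innovations
x = np.zeros((2, T))
for t in range(1, T):
    x[:, t] = Phi @ x[:, t - 1] + sigma @ eps[:, t]

h = 2
alpha_hat = x[:, h:] @ eps[:, :T - h].T / (T - h)   # sample E[x_t eps'_{t-h}]
print(alpha_hat)
print(np.linalg.matrix_power(Phi, h) @ sigma)       # theoretical Phi^h sigma
```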

4 Multivariate extended Wold decomposition

In this section, we generalize the extended Wold decomposition for weakly stationary time series of Ortu et al. (2020a) to multidimensional processes, by exploiting our Hilbert-module framework. We apply the Abstract Wold Theorem for Hilbert A-modules to \(\mathcal {H}_t( \varvec{\varepsilon })\), and, later on, we deduce the decomposition for a weakly stationary vector process \(\textbf{x}\) with fundamental innovations given by \(\varvec{\varepsilon }\). We also provide a persistence-based variance decomposition.

4.1 The orthogonal decomposition of \(\mathcal {H}_t(\varepsilon )\) induced by \(\textbf{R}\)

Let \(\varvec{\varepsilon }=\{ \varepsilon _{t}\}_{t\in \mathbb {Z}}\) be a unit variance m-dimensional white noise and consider the Hilbert submodule \(\mathcal {H}_t(\varvec{\varepsilon }) \subseteq H\) of Eq. (2). In order to apply Theorem 1, we consider the scaling operator \(\textbf{R}: \mathcal {H}_t(\varvec{\varepsilon }) \rightarrow \mathcal {H}_t(\varvec{\varepsilon })\) that takes (normalized) two-by-two averages of consecutive innovations:

$$\begin{aligned} \textbf{R}: \qquad \sum _{k=0}^{+\infty } a_k \varepsilon _{t-k} \quad \mapsto \quad \sum _{k=0}^{+\infty } \frac{a_k}{\sqrt{2}}\left( \varepsilon _{t-2k}+ \varepsilon _{t-2k-1}\right) =\sum _{k=0}^{+\infty } \frac{a_{\lfloor \frac{k}{2} \rfloor }}{\sqrt{2}}\varepsilon _{t-k}. \end{aligned}$$

Here, the function \(\lfloor \cdot \rfloor \) associates any \(c\in \mathbb {R}\) with the integer \(\lfloor c\rfloor =\max \{ n \in \mathbb {Z}: n \leqslant c\} \). In the proof of Theorem 3, we show that \(\textbf{R}\) is well-defined, A-linear and isometric on \(\mathcal {H}_t(\varvec{ \varepsilon })\).
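On the MA coefficients, \(\textbf{R}\) acts by duplicating each matrix coefficient and rescaling by \(1/\sqrt{2}\). The following sketch (our own illustration; array shapes and names are hypothetical) makes the isometry visible through the \(\bar{\varphi }\)-norm formula of Sect. 2.3, \(\Vert \sum _k a_k \varepsilon _{t-k}\Vert _{\bar{\varphi }}^2=\sum _k \textrm{Tr}(a_k a_k^{\prime })\):

```python
import numpy as np

def scale_R(coeffs):
    """Coefficient action of R: (a_0, a_1, ...) -> (a_0, a_0, a_1, a_1, ...) / sqrt(2).

    coeffs: array of shape (K, m, m) stacking matrix MA coefficients.
    """
    return np.repeat(coeffs, 2, axis=0) / np.sqrt(2)

rng = np.random.default_rng(7)
a = rng.standard_normal((5, 2, 2))

# The squared phi-norm sum_k Tr(a_k a_k') is preserved: R is isometric.
sq_norm = lambda c: sum(np.trace(ck @ ck.T) for ck in c)
print(sq_norm(a), sq_norm(scale_R(a)))   # equal up to rounding
```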

Following Ortu et al. (2020a), to illustrate the decomposition of \(\mathcal {H}_{t}(\varvec{\varepsilon })\) induced by \(\textbf{R}\), we define from the white noise \(\varvec{\varepsilon }\) the (multivariate) detail process at scale 1, \(\varvec{ \varepsilon ^{(1)}}=\{\varepsilon _{t}^{(1)}\}_{t\in \mathbb {Z}}\), by

$$\begin{aligned} \varepsilon _{t}^{(1)}=\frac{\varepsilon _{t}-\varepsilon _{t-1}}{\sqrt{2}} ,\qquad t\in \mathbb {Z}. \end{aligned}$$

Each \(\varepsilon _{t}^{(1)}\) has zero mean and unit variance: \(\mathbb {E} [\varepsilon _{t}^{(1)}{\varepsilon _{t}^{(1)}}^{\prime }]=I\). In general, for any \(j\in \mathbb {N}\), we define the (multivariate) detail process at scale j \(\varvec{\varepsilon ^{(j)}}=\{\varepsilon _{t}^{(j)}\}_{t\in \mathbb {Z}}\) by

$$\begin{aligned} \varepsilon _{t}^{(j)}=\frac{1}{\sqrt{2^{j}}}\left( \sum _{i=0}^{2^{j-1}-1}\varepsilon _{t-i}-\sum _{i=0}^{2^{j-1}-1}\varepsilon _{t-2^{j-1}-i}\right) ,\qquad t\in \mathbb {Z}. \end{aligned}$$
(3)

This definition is the natural multivariate counterpart of eq. (6) in Ortu et al. (2020a), which takes the same form. Equation (3) is reminiscent of the iterated application of the discrete Haar transform to the series of \(\varepsilon _{t}\) (Addison 2002, Chapter 3). High scales involve detail processes that have not been faded out by repeated applications of this transform. They are, therefore, associated with a high degree of persistence. For instance, if t evolves daily, one can interpret the details at scale 1 as 2-day shocks, those at scale 2 as 4-day shocks, those at scale 3 as 8-day shocks and so on. In a few words, scale j involves \(2^j\)-day multivariate shocks and defines the degree of persistence j.

In order to avoid overlap among the vectors \(\varepsilon _{t}^{(j)}\), at any scale j we consider the subseries of \(\varvec{\varepsilon ^{(j)}}\) defined on the support \(S^{(j)}_t=\{t-k2^j: k \in \mathbb {Z}\}\). Indeed, each detail at scale j is a vector \(MA(2^j-1)\) driven by the white noise \(\varvec{\varepsilon }\). Some spurious correlation is present between the details \(\varepsilon _{t-k2^j}^{(j)}\) and \(\varepsilon _{\tau -k2^j}^{(j)}\) with \(\vert {t-\tau } \vert \leqslant 2^j -1\), but each subseries \(\{ \varepsilon _{t-k2^j}^{(j)}\}_{k \in \mathbb {Z}}\) is a unit variance white noise. We formalize this fact in the first point of Theorem 4, and we associate high scales with more persistent detail processes.
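The following sketch (ours; innovations and scale are illustrative) computes the details of Eq. (3) directly and checks that, on the support \(S_t^{(j)}\), the subseries is a unit variance white noise:

```python
import numpy as np

rng = np.random.default_rng(4)

def detail(eps, j, t):
    """Multivariate detail at scale j, Eq. (3): normalized difference of
    two adjacent block sums of 2**(j-1) innovations each."""
    half = 2 ** (j - 1)
    plus = sum(eps[:, t - i] for i in range(half))
    minus = sum(eps[:, t - half - i] for i in range(half))
    return (plus - minus) / np.sqrt(2 ** j)

m, T, j = 2, 60_000, 3
eps = rng.standard_normal((m, T))

# Subseries on the support S_t^{(j)} = {t - k 2^j : k in Z}.
D = np.stack([detail(eps, j, t) for t in range(2 ** j, T, 2 ** j)], axis=1)
print(D @ D.T / D.shape[1])     # ~ identity: unit variance on the grid
```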

We now state the orthogonal decomposition of \(\mathcal {H}_t(\varvec{ \varepsilon })\) implied by Theorem 1 when we use the scaling operator as isometry.

Theorem 3

Let \(\varvec{\varepsilon }\) be a unit variance m -dimensional white noise. The Hilbert A-module \(\mathcal {H}_t(\varvec{ \varepsilon })\) decomposes into the orthogonal sum

$$\begin{aligned} \mathcal {H}_t(\varvec{\varepsilon })=\bigoplus _{j=1}^{+\infty }\textbf{R} ^{j-1}\mathcal {L}_t^{\textbf{R}}, \end{aligned}$$

where

$$\begin{aligned} \textbf{R}^{j-1}\mathcal {L}_t^\textbf{R}=\left\{ \sum _{k=0}^{+\infty } b_k^{(j)} \varepsilon _{t-k2^j}^{(j)} \in \mathcal {H}_t(\varvec{ \varepsilon }) : \quad b_k^{(j)} \in A\right\} . \end{aligned}$$
(4)

Proof

See “Appendix B.2”. \(\blacksquare \)

The main steps of the proof are the following.

We first determine the invariant submodule \(\widehat{\mathcal {H}}_t(\varvec{\varepsilon })\) prescribed by Theorem 1. From the definition of \(\textbf{R}\), the submodule \(\textbf{R}\mathcal {H}_t(\varvec{\varepsilon })\) consists of the linear combinations of innovations \(\varepsilon _t\) whose (matrix) coefficients are equal in pairs. Similarly, for any \(j \in \mathbb {N}\), the submodule \(\textbf{R}^j\mathcal {H}_t(\varvec{\varepsilon })\) consists of the linear combinations of vectors \(\varepsilon _t\) with (matrix) coefficients equal in blocks of \(2^j\):

$$\begin{aligned} \textbf{R}^j\mathcal {H}_t(\varvec{\varepsilon })=\left\{ \sum _{k = 0}^{+\infty }c_{k}^{(j)} \left( \sum _{i=0}^{2^j-1}\varepsilon _{t-k2^j-i}\right) \in \mathcal {H}_t( \varvec{\varepsilon }) : \quad c_{k}^{(j)} \in A \right\} . \end{aligned}$$

It follows that the intersection of all \(\textbf{R}^j\mathcal {H}_t( \varvec{\varepsilon })\) contains only the zero vector, that is \(\widehat{ \mathcal {H}}_t(\varvec{\varepsilon })\) is the null submodule: \(\widehat{ \mathcal {H}}_t(\varvec{\varepsilon })=\{0\}\).

We now turn to the submodule \(\widetilde{\mathcal {H}}_t(\varvec{\varepsilon })\). The wandering submodule \(\mathcal {L}_t^\textbf{R}\) associated with \(\textbf{R}\) is the orthogonal complement of \(\textbf{R}\mathcal {H}_t(\varvec{\varepsilon })\) in \(\mathcal {H}_t(\varvec{\varepsilon })\). As \(\textbf{R}\) is A-linear and bounded, this submodule coincides with the kernel of the adjoint operator \(\mathbf {R^*}\) (Proposition 19 in “Appendix A.7”). Therefore,

$$\begin{aligned} \mathcal {L}_t^\textbf{R}=\left\{ \sum _{k= 0}^{+\infty } b_k^{(1)} \varepsilon _{t-2k}^{(1)} \in \mathcal {H}_t(\varvec{\varepsilon }) : \quad b_k^{(1)} \in A\right\} . \end{aligned}$$

Hence, \(\mathcal {L}_t^\textbf{R}\) contains the moving averages generated by the detail process at scale 1 on the support \(S_t^{(1)}\). More generally, for each \(j \in \mathbb {N}\), the image of \(\mathcal {L}_t^\textbf{R}\) through \(\textbf{R}^{j-1}\) is the submodule in Eq. (4), which consists of the moving averages generated by the detail process at scale j on \(S_t^{(j)}\).

4.2 The multivariate extended Wold decomposition of \(x_t\)

As in Sect. 3, we consider a zero-mean regular weakly stationary m-dimensional process \(\textbf{x}=\{x_t\}_{t \in \mathbb {Z}}\). We also require \(\textbf{x}\) to be purely non-deterministic in order to focus on the persistence generated by the aggregation of shocks.

Theorem 2 (the MCWD) ensures that \(x_t\) belongs to \(\mathcal {H} _t(\varvec{\varepsilon })\), where \(\varvec{\varepsilon }\) is the process of fundamental innovations of \(\textbf{x}\). As a result, the orthogonal decomposition of \(\mathcal {H}_t(\varvec{\varepsilon })\) of Theorem 3 induces a decomposition of \(x_t\). Indeed, there exists a sequence \(\{ g_t^{(j)}\}_{j\in \mathbb {N}}\) of random processes such that

$$\begin{aligned} x_t=\sum _{j=1}^{+\infty } g_t^{(j)}, \end{aligned}$$
(5)

where each \(g_t^{(j)}\) is the orthogonal projection of \(x_t\) on the submodule \(\textbf{R}^{j-1} \mathcal {L}_t^\textbf{R}\) of \(\mathcal {H}_t( \varvec{\varepsilon })\). We refer to \(g_t^{(j)}\) as the (multivariate) persistent component at scale j. Clearly, given t, the components \(g_t^{(j)}\) are orthogonal to each other. In addition, since each \(g_t^{(j)}\) belongs to \(\textbf{R}^{j-1} \mathcal {L}_t^\textbf{R}\),

$$\begin{aligned} g_t^{(j)}=\sum _{k=0}^{+\infty } \beta _k^{(j)} \varepsilon _{t-k2^j}^{(j)} \end{aligned}$$

where \(\Vert \sum _{k=0}^{\infty } \beta _k^{(j)} \varepsilon _{t-k2^j}^{(j)} \Vert ^2_{ \bar{\varphi }}=\sum _{k=0}^{\infty } \textrm{Tr} ( \beta _k^{(j)} {\beta _k^{(j)} }^{\prime })\) is finite. Each \(\beta _k^{(j)}\) is obtained by projecting \(x_t\) on the submodule generated by the detail \(\varepsilon _{t-k2^j}^{(j)}\) and so

$$\begin{aligned} \beta _k^{(j)}=\left\langle x_t, \varepsilon _{t-k2^j}^{(j)}\right\rangle _H = \mathbb {E} \left[ x_t {\varepsilon _{t-k2^j}^{(j)}}^{\prime }\right] . \end{aligned}$$

By substituting the expression of \(g_t^{(j)}\) into Eq. (5), we obtain the MEWD of \(x_t\) stated in Eq. (6). The explicit expression of the matrices \(\beta _k^{(j)}\) in Eq. (7) below turns out to be the multivariate version of eq. (7) in Ortu et al. (2020a), which takes the same form.

Theorem 4

(Multivariate extended Wold decomposition) Let \(\textbf{x}\) be a zero-mean regular weakly stationary purely non-deterministic m-dimensional process. Then, \(x_t\) decomposes as

$$\begin{aligned} x_t= \sum _{j=1}^{+\infty }\sum _{k=0}^{+\infty }\beta _k^{(j)} \varepsilon _{t-k2^j}^{(j)}, \end{aligned}$$
(6)

where the equality is in norm and

  1.

    for any fixed \(j\in \mathbb {N}\), the m-dimensional process \( \varvec{\varepsilon ^{(j)}}=\{\varepsilon ^{(j)}_{t}\}_{t \in \mathbb {Z}}\) with

    $$\begin{aligned} \varepsilon _{t}^{(j)}=\frac{1}{\sqrt{2^j}}\left( \sum _{i=0}^{2^{j-1}-1} \varepsilon _{t-i}-\sum _{i=0}^{2^{j-1}-1} \varepsilon _{t-2^{j-1}-i} \right) \end{aligned}$$

    is a \(MA(2^j-1)\) with respect to the classical Wold innovations of \(\textbf{x }\) and \(\{\varepsilon ^{(j)}_{t-k2^j}\}_{k \in \mathbb {Z}}\) is a unit variance white noise;

  2.

    for any \(j\in \mathbb {N}\), \(k \in \mathbb {N}_0\), the \(m\times m\) matrices \(\beta _k^{(j)}\) are uniquely determined via

    $$\begin{aligned} \beta _k^{(j)}=\frac{1}{\sqrt{2^j}}\left( \sum _{i=0}^{2^{j-1}-1}\alpha _{k2^j+i}-\sum _{i=0}^{2^{j-1}-1} \alpha _{k2^j+2^{j-1}+i}\right) ; \end{aligned}$$
    (7)

    hence, they are independent of t and \(\sum _{k=0}^{\infty } \textrm{Tr} (\beta _k^{(j)} {\beta _k^{(j)}}^{\prime })<+\infty \) for any \(j\in \mathbb {N}\);

  3.

    letting

    $$\begin{aligned} g_t^{(j)}=\sum _{k=0}^{+\infty }\beta _k^{(j)}\varepsilon _{t-k2^j}^{(j)}, \end{aligned}$$
    (8)

    for any \(j,l \in \mathbb {N}\), \(p,q,t \in \mathbb {Z}\), \(\mathbb {E}[ g_{t-p}^{(j)}{g_{t-q}^{(l)}}^{\prime }]\) depends at most on \(j,l,p-q\). Moreover, \(\mathbb {E}[ g_{t-m2^j}^{(j)}{g_{t-n2^l}^{(l)}}^{\prime }]=\textbf{0}\) for all \(j \ne l\), \(m,n \in \mathbb {N}_0\) and \(t \in \mathbb {Z}\).

Proof

See “Appendix B.3.” \(\blacksquare \)

The matrix \(\beta _k^{(j)}\) is the (multivariate) scale-specific response associated with the innovation at scale j and time translation \( k2^j\). Since the details at different scales can be expressed in terms of the fundamental innovations \(\varepsilon _t\), the MEWD and the MCWD share the same shocks. For this reason, we can retrieve the matrices \(\beta _k^{(j)}\) from the matrices \(\alpha _h\) of the MCWD through Eq. (7).
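A direct implementation of Eq. (7) (a sketch of ours; the input array of Wold matrices is hypothetical, e.g., truncated coefficients of the toy VAR(1) used earlier) reads:

```python
import numpy as np

def beta(alpha, j, k):
    """Scale-specific response beta_k^{(j)} from Eq. (7).

    alpha: array of shape (H, m, m) stacking the classical Wold matrices
    alpha_0, ..., alpha_{H-1}, with H >= (k + 1) * 2**j.
    """
    half = 2 ** (j - 1)
    base = k * 2 ** j
    first = alpha[base: base + half].sum(axis=0)
    second = alpha[base + half: base + 2 * half].sum(axis=0)
    return (first - second) / np.sqrt(2 ** j)

# Illustrative input: alpha_h = Phi^h sigma for the toy VAR(1) of Sect. 3.
Phi = np.array([[0.5, 0.1], [0.0, 0.3]])
sigma = np.array([[0.98, 0.21], [0.21, 0.87]])
alpha = np.stack([np.linalg.matrix_power(Phi, h) @ sigma for h in range(64)])
print(beta(alpha, j=2, k=0))
```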

An orthogonal decomposition of \(\mathcal {H}_t(\varvec{\varepsilon })\) into a finite number of submodules is also possible. Indeed, \(\mathcal {H}_t( \varvec{\varepsilon })= \textbf{R}\mathcal {H}_t(\varvec{\varepsilon }) \oplus \mathcal {L}_t^\textbf{R}\) and, by iteratively applying the scaling operator,

$$\begin{aligned} \mathcal {H}_t(\varvec{\varepsilon })= \textbf{R}^J\mathcal {H}_t( \varvec{\varepsilon }) \oplus \bigoplus _{j=1}^{J}\textbf{R}^{j-1}\mathcal { L}_t^\textbf{R}. \end{aligned}$$

The (multivariate) residual component at scale j is the orthogonal projection of \(x_t\) on the submodule \(\textbf{R}^j\mathcal {H}_t(\varvec{ \varepsilon })\), and we denote it by \(\pi _t^{(j)}\). As can be seen in the proof of Theorem 4, \(\pi _t^{(j)}\) has the expression:

$$\begin{aligned} \pi _t^{(j)}=\sum _{k=0}^{+\infty } \gamma _k^{(j)} \left( \frac{1}{\sqrt{2^j}} \sum _{i=0}^{2^j-1} \varepsilon _{t- k2^j-i} \right) , \qquad \qquad \gamma _k^{(j)}=\frac{1}{\sqrt{2^j}} \sum _{i=0}^{2^j-1}\alpha _{k2^j+i}. \end{aligned}$$
(9)

As a result, a MEWD of \(x_t\) holds both in the finite case, i.e., when a maximum scale J is chosen, and in the infinite one:

$$\begin{aligned} x_t=\pi _t^{(J)} + \sum _{j=1}^J g_t^{(j)} \qquad \qquad \text{ or } \qquad \qquad x_t=\sum _{j=1}^{+\infty }g_t^{(j)}. \end{aligned}$$
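The finite and infinite versions are consistent with the MCWD at the level of moving average coefficients. The sketch below (ours; it restates the hypothetical `beta` helper and toy Wold matrices from the previous snippet) checks numerically that, for each lag h, the coefficient on \(\varepsilon _{t-h}\) implied by \(\pi _t^{(J)} + \sum _{j=1}^J g_t^{(j)}\) recombines into \(\alpha _h\).

```python
import numpy as np

def beta(alpha, j, k):
    # Eq. (7), as in the previous sketch.
    half = 2 ** (j - 1)
    base = k * 2 ** j
    return (alpha[base: base + half].sum(axis=0)
            - alpha[base + half: base + 2 * half].sum(axis=0)) / np.sqrt(2 ** j)

def implied_coeff(alpha, J, h):
    """Coefficient on eps_{t-h} implied by pi^{(J)} + sum_{j<=J} g^{(j)}."""
    total = np.zeros_like(alpha[0])
    for j in range(1, J + 1):
        k, i = divmod(h, 2 ** j)
        sign = 1.0 if i < 2 ** (j - 1) else -1.0      # sign inside Eq. (3)
        total += sign / np.sqrt(2 ** j) * beta(alpha, j, k)   # from g^{(j)}
    k = h // 2 ** J                                           # from pi^{(J)}
    gamma = alpha[k * 2 ** J: (k + 1) * 2 ** J].sum(axis=0) / np.sqrt(2 ** J)
    return total + gamma / np.sqrt(2 ** J)

# Hypothetical Wold matrices (toy VAR(1) as before).
Phi = np.array([[0.5, 0.1], [0.0, 0.3]])
sigma = np.array([[0.98, 0.21], [0.21, 0.87]])
alpha = np.stack([np.linalg.matrix_power(Phi, h) @ sigma for h in range(64)])

J = 3
assert all(np.allclose(implied_coeff(alpha, J, h), alpha[h]) for h in range(16))
```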

According to the third point of Theorem 4, when t is fixed, the orthogonality among persistent components involves all the shifted vectors \(g_{t-m2^j}^{(j)}\) and \(g_{t-n2^l}^{(l)}\), for any \(m,n \in \mathbb {Z}\), with time translation proportional to \(2^j\) and \(2^l\), respectively. In general, the cross-covariance matrix between \(g_{t-p}^{(j)}\) and \(g_{t-q}^{(l)}\) depends at most on the scales jl and on the difference \(p-q\).

By the MEWD, \(x_t\) is decomposed into the sum of orthogonal components \( g_t^{(j)}\) associated with different persistence levels j. Each vector \( g_t^{(j)}\) has innovations on a grid \(S_t^{(j)}=\{ t-k2^j: k \in \mathbb {Z} \}\) with time interval between two indices proportional to \(2^j\). When the scale j increases, the support \(S_t^{(j)}\) becomes sparser and the degree of persistence of details rises. For instance, if j is high and a scale-specific response \(\beta _k^{(j)}\) is remarkably different from the null matrix, \(x_t\) is affected by an important low-frequency component. On the contrary, if j is low and some \(\beta _k^{(j)}\) differs from zero, a high-frequency component is not negligible and impinges on \(x_t\) in the short term.

The fact that the iterated application of \(\textbf{R}\) increases persistence is justified by spectral analysis considerations, developed in detail in Ortu et al. (2020a) for the univariate case: see Section 2.1 therein and the supplementary material (Ortu et al. 2020b). The scaling operator defines, in fact, an approximate low-pass filter. In addition, as suggested in the same paper, bases other than 2 can be used to derive similar persistence-based decompositions.

The MEWD properly generalizes the univariate Extended Wold Decomposition of Ortu et al. (2020a). Indeed, when the matrix coefficients \(\alpha _h\) are diagonal, for any \(i=1,\ldots , m\) the entry \(x_{i,t}\) depends only on the innovations \(\varepsilon _{i,t}\), and the scale-specific responses \(\beta _k^{(j)}\) are diagonal matrices, too. Each \(x_{i,t}\) satisfies the decompositions

$$\begin{aligned} x_{i,t}=\sum _{h=0}^{+\infty }\alpha _h(i,i) \varepsilon _{i,t-h} \qquad \textrm{ and} \qquad x_{i,t}=\sum _{j=1}^{+\infty } \sum _{k=0}^{+\infty } \beta _k^{(j)} (i,i) \varepsilon _{i,t-k2^j}^{(j)} \end{aligned}$$

with

$$\begin{aligned} \beta _{k}^{(j)}(i,i)=\frac{1}{\sqrt{2^j}}\left( \sum _{p=0}^{2^{j-1}-1} \alpha _{k2^j+p}(i,i) - \sum _{p=0}^{2^{j-1}-1} \alpha _{k2^j+2^{j-1}+p}(i,i) \right) , \end{aligned}$$

as in the univariate Extended Wold Decomposition.

Finally, the MEWD of \(x_{t}\) turns out to be a refinement of the MCWD, where \(\varvec{\varepsilon }\) is the process of fundamental innovations of \(\textbf{x}\). Nonetheless, such a persistence-based decomposition also holds when \(\varvec{\varepsilon }\) is any unit variance white noise that allows a moving average representation of \(x_{t}\). Furthermore, in case \(\varvec{\varepsilon }\) has a positive definite covariance matrix \(\xi \), then \(\xi =\zeta \zeta \) for some symmetric positive definite \(\zeta \in A\). In this case, \(\eta _{t}=\zeta ^{-1}\varepsilon _{t}\) defines a unit variance white noise and the MCWD and the MEWD of \(x_{t}\) become, respectively,

$$\begin{aligned} x_{t}=\sum _{h=0}^{+\infty }\widetilde{\alpha }_{h}\eta _{t-h},\qquad \qquad x_{t}=\sum _{j=1}^{+\infty }\sum _{k=0}^{+\infty }\widetilde{\beta }_{k}^{(j)}\eta _{t-k2^{j}}^{(j)} \end{aligned}$$

with \(\widetilde{\alpha }_{h}=\alpha _{h}\zeta \) and \(\widetilde{\beta }_{k}^{(j)}=\beta _{k}^{(j)}\zeta \), where the details \(\eta _{t}^{(j)}\) are built from \(\varvec{\eta }\) as in Eq. (3). Alternatively, other tools can be used to factorize \(\xi \), such as the Cholesky decomposition (see, for instance, the application in “Appendix C.3”).
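A minimal sketch of this normalization step (ours; \(\xi \) is a hypothetical covariance matrix) using the symmetric positive definite square root, with the Cholesky factorization as the alternative:

```python
import numpy as np

rng = np.random.default_rng(5)

xi = np.array([[1.0, 0.4],
               [0.4, 0.8]])      # hypothetical covariance of the white noise

w, V = np.linalg.eigh(xi)
zeta = V @ np.diag(np.sqrt(w)) @ V.T     # symmetric PD root: xi = zeta @ zeta

eps = np.linalg.cholesky(xi) @ rng.standard_normal((2, 100_000))
eta = np.linalg.solve(zeta, eps)         # normalized white noise
print(eta @ eta.T / eta.shape[1])        # ~ identity matrix

# Alternative factorization of xi: Cholesky factor L with xi = L @ L'.
L = np.linalg.cholesky(xi)
eta_chol = np.linalg.solve(L, eps)
print(eta_chol @ eta_chol.T / eta_chol.shape[1])   # ~ identity as well
```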

4.3 Persistence-based variance decomposition

One of the strengths of the MEWD is that it allows us to define a variance decomposition across the different persistence layers. The notion of orthogonality permits the decomposition of the variance of each entry \(x_{i,t}\) of the vector \(x_t\) into the sum of the variances of the corresponding entries of each persistent component \(g_t^{(j)}\). This makes it possible to disentangle the exposure towards shocks with heterogeneous persistence without distinguishing among the different univariate disturbances. As a further step, the null correlation between the univariate innovations in the (multivariate) details \(\varepsilon _t^{(j)}\) allows us to decompose the variance of each entry of \(g_t^{(j)}\) in order to quantify the contribution of each source of randomness. In so doing, one can identify the main time scales at which each univariate shock impacts the aggregate process \(x_t\).

To illustrate the variance decompositions, we consider a weakly stationary bivariate process \(x_t=[ y_t, z_t]^{\prime }\) with unit variance white noise \(\varepsilon _t= [ u_t, v_t ]^{\prime }\). We focus on \(y_t\) and on the first entry of the persistent components \(g_t^{(j)}\). The portion of variance of \( y_t\) associated with the latter is

$$\begin{aligned} {\text {var}}^{(j)}\left( y_t\right) = \sum _{k = 0}^{+\infty } \left[ \left( \beta _k^{(j)}(1,1)\right) ^2 + \left( \beta _k^{(j)}(1,2)\right) ^2 \right] \end{aligned}$$
(10)

and

$$\begin{aligned} {\text {var}}\left( y_t\right) = \sum _{j=1}^{+\infty } {\text {var}}^{(j)}\left( y_t\right) . \end{aligned}$$

Operationally, in order to assess the overall importance of each persistence level in explaining the total variance of \(y_t\), we can compute, for each scale j, the ratio \({\textrm{var}}^{(j)}(y_t) / {\textrm{var}}(y_t)\). In addition, to capture the effect of the persistence of each single shock, we can compute, at any scale j, the ratios

$$\begin{aligned} \sum _{k = 0}^{+\infty } \left( \beta _k^{(j)}(1,1)\right) ^2 \Big / {\textrm{var}}\left( y_t\right) , \qquad \qquad \sum _{k = 0}^{+\infty } \left( \beta _k^{(j)}(1,2)\right) ^2 \Big / {\textrm{var}}\left( y_t\right) . \end{aligned}$$
(11)
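The ratios in Eqs. (10)-(11) are straightforward to compute once the scale-specific responses are available. The sketch below (ours; the \(\beta \) arrays are randomly generated placeholders, not estimates) shows the bookkeeping:

```python
import numpy as np

rng = np.random.default_rng(6)

# Hypothetical scale-specific responses: betas[j] has shape (K_j, 2, 2),
# stacking beta_k^{(j)} for k = 0, ..., K_j - 1 (truncated at K_j terms).
betas = {j: rng.standard_normal((8, 2, 2)) * 2.0 ** (-j) for j in (1, 2, 3)}

var_u = {j: (b[:, 0, 0] ** 2).sum() for j, b in betas.items()}  # u_t shocks
var_v = {j: (b[:, 0, 1] ** 2).sum() for j, b in betas.items()}  # v_t shocks
var_y = sum(var_u.values()) + sum(var_v.values())               # var(y_t)

for j in betas:
    print(j,
          (var_u[j] + var_v[j]) / var_y,   # var^{(j)}(y_t) / var(y_t)
          var_u[j] / var_y,                # first ratio in Eq. (11)
          var_v[j] / var_y)                # second ratio in Eq. (11)
```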

Although a persistence-based variance decomposition is already present in Ortu et al. (2020a), the orthogonality notion employed in the multivariate case makes it possible to quantify both the persistence of each time series in the vector process and the persistence of single shocks impacting each of them.

5 Conclusions

In this paper, we recast the standard treatment of multivariate time series in a Hilbert A-module framework and prove the Abstract Wold Theorem for Hilbert A-modules. This result allows us to revisit the MCWD and to derive the multivariate version of the Extended Wold Decomposition of Ortu et al. (2020a), by using two different isometric operators. Interestingly, the MEWD provides a decomposition of the given vector process into uncorrelated persistent components driven by shocks with longer and longer duration. The orthogonality ensured by the theorem induces a variance decomposition that makes it possible to establish the relative importance of each persistent component. Moreover, scale-specific responses allow us to isolate, on different time scales, dynamics that are not recognizable from the impulse responses of the aggregated process (some illustrations are in “Appendix C”).

The statistical inference on scale-specific responses is an important issue that goes beyond the scope of this paper. As for estimators of the parameters of moving average models, one can refer to the literature summarized in the introduction of Ghysels et al. (2003), especially the seminal work of Durbin (1959). However, a simple way to statistically test whether scale-specific responses are zero is to use the bootstrap procedure (Efron and Tibshirani 1986) in the pipeline described in “Appendix C.2”. The bootstrap empirical distribution of the scale-specific responses can then be employed to obtain the p value of the test.