1 Introduction

In the setting of a covariance-stationary, mean zero time series \(\{X_t , t\in \mathbb {Z}\}\), the standard linear prediction problem amounts to extrapolating the one-step ahead (or multiple steps ahead) value based on the observed past; this is typically done via orthogonal projection of the random variable of interest onto the linear span of the observed data—see e.g. Ch. 2 of Brockwell and Davis (1991). The interpolation problem is closely related, but now we assume that the entire time series has been observed except for one particular value. Because of stationarity, and without loss of generality, we may assume that the missing value is \(X_0\), i.e., the data consist of \(\{ X_t , t\in \mathbb {Z}{\setminus } \{0\} \}\). Then, the goal is to ‘predict’ \(X_0\) as a linear function of the data \(\{ X_t , t\in \mathbb {Z}{\setminus } \{0\} \}\); this is also done via orthogonal projection—see Kolmogorov (1941) and Wiener (1949), who pioneered the projection technique, working independently, during World War II.

The one-step ahead prediction problem has as its solution the autoregressive coefficients of the process; see e.g. Ch. 6.1 of McElroy and Politis (2020). By contrast, the optimal interpolation problem leads to the notion of inverse autocorrelations; see e.g. Politis (1992). Cleveland (1972) apparently coined this term, but the solution to the interpolation problem was developed much earlier; see Chapter 2.3 of Grenander and Rosenblatt (1957), as well as the original monograph Wiener (1949). Later developments include Masani (1960), Chapter 2 of Rozanov (1967), Salehi (1979), Franke (1984), Pourahmadi (1989), and Cheng and Pourahmadi (1997). More recently, further results have been obtained by Bondon (2005), Kasahara et al. (2009), Inoue et al. (2016), Inoue (2020), and Inoue (2021); Lepot et al. (2017) provides a recent review.

Interpolation and extrapolation have also been well studied in the context of multivariate time series; see Ch. 6 of Hannan (1970), as well as some of the above references, such as Masani (1960). In the context outlined above, interpolation results have only been obtained for the case of a single missing value or consecutive stretches of missing values—as discussed in Inoue (2021). (There is an extensive literature treating the missing value problem for finite-length samples, for example using the Kalman filter, but in this note we are focused on infinite-length samples.) The key novelty of our work, with respect to prior literature, is to provide interpolation formulas for general, possibly non-consecutive, patterns of missing values in multivariate time series. In Sect. 2 we derive the form of the optimal linear interpolator in the general multivariate case; application to the univariate case is immediate. Sect. 3 establishes the validity of the optimal interpolating filter in the case of possibly slow decay of the autocovariances. Illustrations using two simple models are provided in Sect. 4. “Appendix A” presents some background on optimal linear prediction in the multivariate case, while “Appendix B” contains the technical proofs.

Remark 1

The objective of our paper is to derive the formula for the mean squared error optimal linear interpolator with multiple missing values; this is a theoretical result without regard to any data, following the Wiener–Kolmogorov theory of linear prediction as Hilbert space projection. Having derived the formula of the optimal filter, its coefficients could then be estimated from data by one of the usual methods, e.g., Method of Moments, Least Squares, or pseudo-Maximum Likelihood.

2 Optimal linear interpolation of multiple missing values

Consider a covariance-stationary, mean zero, multivariate time series \(\{ X_t , t\in \mathbb {Z}\}\) with autocovariance at lag h denoted by \(\gamma _h= \mathbb {E}X_t X_{t-h}^\prime \), where \(^\prime \) denotes the transpose; each \( X_t\) is an N-dimensional column vector, and therefore \(\gamma _h\) is an \(N\times N\) matrix. A typical assumption is that the autocovariance generating function \(\gamma (z)= \sum _{h \in \mathbb {Z}} \gamma _h z^h\) is well-defined for z in some annulus about the unit circle of the complex plane; this convergent series in both positive and negative powers of z is called a Laurent series, and it characterizes a holomorphic function of a complex variable. In this case, the spectral density of the time series \(\{ X_t \}\) is given by the function \(f(\lambda )=\gamma (e^{-i\lambda }) \) for \(\lambda \in [-\pi , \pi ]\).

If the determinant of \(\gamma (z)\) has no zeroes on the unit circle, then the function \(\xi (z) = { \gamma (z) }^{-1}\) is also holomorphic on some annulus about the unit circle. Consequently, it has a converging Laurent series expansion \(\xi (z) = \sum _{h \in \mathbb {Z}} \xi _h z^h\) for some coefficients \( { \{ \xi _h \} }_{h \in \mathbb {Z}}\) that we will call the inverse autocovariances; therefore, \(\xi (z) \) is the inverse autocovariance generating function. We can also define the inverse autocorrelation at lag h by \(\zeta _h = \xi _0^{-1} \xi _h\); see also Ch. 6.3 of McElroy and Politis (2020).
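To make this concrete, the following minimal sketch (an illustration of ours, not part of the development above) treats a univariate AR(1) model with illustrative parameters \(\phi \) and \(\sigma ^2\), evaluates \(1/f(\lambda )\) on a grid, and reads off the inverse autocovariances as its Fourier coefficients; for the AR(1) these can be compared with the closed forms \(\xi _0 = (1+\phi ^2)/\sigma ^2\) and \(\xi _{\pm 1} = -\phi /\sigma ^2\).

```python
import numpy as np

# A minimal sketch (our illustration; the AR(1) parameters phi and sigma2 are
# arbitrary choices): recover the inverse autocovariances xi_h as the Fourier
# (Laurent) coefficients of 1/f and compare with the AR(1) closed forms.
phi, sigma2 = 0.6, 1.0
n = 1024
lam = 2 * np.pi * np.arange(n) / n                    # grid on [0, 2*pi)
f = sigma2 / np.abs(1 - phi * np.exp(-1j * lam))**2   # AR(1) spectral density

xi = np.fft.ifft(1.0 / f).real                        # xi[h] approximates xi_h for h >= 0
zeta = xi / xi[0]                                     # inverse autocorrelations

assert np.isclose(xi[0], (1 + phi**2) / sigma2)       # xi_0 = (1 + phi^2) / sigma2
assert np.isclose(xi[1], -phi / sigma2)               # xi_1 = -phi / sigma2
assert np.isclose(zeta[1], -phi / (1 + phi**2))       # inverse autocorrelation at lag 1
```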

Our goal is to derive a linear filter to predict a finite number of missing values, each of which is assumed to be completely missing (i.e., for each random vector, either all or none of its components are available). There will be one filter for each missing value, which is viewed as the target—just like there is one filter for each forecast lead in a multi-step ahead forecasting problem.

We need a reference time t, in terms of which to formulate the filters. Because of stationarity, this is not too important; hence, for ease of exposition, we choose \(t=0\) as the reference point.

Then, the missing value times will be denoted as \(k \in M\); we assume there are m missing values, where \(m = |M| < \infty \) (here |M| denotes the number of elements in the set M). The elements of M are indexed in some way, and can be written \(M = \{ j_1, j_2, \ldots , j_m \}\). Without loss of generality, suppose these are ordered such that \(j_1< j_2< \cdots < j_m\) (these integers are of course distinct).

Fix \(k \in M\). The dataset available to predict the specific missing value \(X_k\) is \({ \{ X_{j} \} }_{j \not \in M}\). To predict \(X_{k}\) from \( { \{ X_{ j } \} }_{j \not \in M}\) we can employ the linear predictor

$$\begin{aligned} \widehat{X}_{k} = \sum _{j \not \in M} \pi _j X_{j}. \end{aligned}$$
(1)

The filter coefficients \(\pi _j\) are \(N \times N\)-dimensional matrices; they depend on k, but we suppress this to facilitate the notation. The challenge is to identify the optimal linear predictor, i.e., identify the optimal coefficients \(\pi _j\) that minimize the Mean Square Error (MSE) of prediction, defined as \(\mathbb {E}({X}_{k} -\widehat{X}_{k} ) { ({X}_{k} -\widehat{X}_{k} )}^{\prime }\).

Note that the filter (1) does not rely on any values that are missing. Hence, we can re-write the predictor (1) as

$$\begin{aligned} \widehat{X}_{k} = \sum _{j \in \mathbb {Z}} \pi _j X_{j} \end{aligned}$$
(2)

with the understanding that \(\pi _j = 0\) for \(j \in M\); we call these the zero constraints of the filter. Evidently, (2) can be expressed as \(\widehat{X}_k = \pi (B^{-1}) X_0\), where B is the backshift operator (and \(B^{-1}\) is a forward shift), i.e., \(BX_t=X_{t-1}\). The function \(\pi (z)\) is then defined via

$$\begin{aligned} \pi (z) = \sum _{j \in \mathbb {Z}} \pi _j z^j = \sum _{j \not \in M} \pi _j z^j. \end{aligned}$$
(3)

In our main result below, we identify the function \(\pi (z)\) associated with the optimal linear filter. Note that knowledge of \(\pi (z)\) for all z in an annulus about the unit circle is tantamount to knowing the coefficient sequence \(\{ \pi _j , j\in \mathbb {Z}\}\); to see why, we can plug \(z=e^{- i \lambda }\) (where \(i = \sqrt{-1}\)) into (3), yielding the Fourier series

$$\begin{aligned} \pi (e^{- i \lambda }) = \sum _{j \in \mathbb {Z}} \pi _j e^{- ij \lambda } . \end{aligned}$$
(4)

Consequently, the j-th Fourier coefficient of the function \( \pi (e^{- i \lambda }) \) is given by

$$\begin{aligned} \pi _j= {(2 \pi )}^{-1} \int _{-\pi }^{\pi } \pi (e^{- i \lambda }) e^{ij \lambda }d\lambda . \end{aligned}$$
(5)

To facilitate notation, in what follows we will denote \(\langle g(z) \rangle = {(2 \pi )}^{-1} \int _{-\pi }^{\pi } g(e^{-i \lambda }) d\lambda \); we will also employ the following assumption:

Assumption A

Assume that the autocovariance generating function \(\gamma (z)= \sum _{h \in \mathbb {Z}} \gamma _h z^h\) is well-defined for z in some annulus about the unit circle of the complex plane. Further assume that the determinant of \(\gamma (z)\) has no zeroes on the unit circle, i.e., that \(\det \gamma (z)\ne 0\) whenever \(|z|=1\).

Theorem 1

Assume Assumption A. Then, we can compute the inverse autocovariances via

$$\begin{aligned} \xi _h = \langle z^{-h} {\gamma (z)}^{-1} \rangle . \end{aligned}$$
(6)

Furthermore, the MSE-optimal linear predictor of \(X_{k}\) given data \( { \{ X_{j } \} }_{j \not \in M}\), where \(M = \{ j_1, j_2, \) \( \ldots , j_m \}\), is given by (1). Letting \(k = j_u\), and setting \(\pi _{j_r} = 0\) for \(1 \le r \le m\), the formula for \(\pi (z) = \sum _{j \in \mathbb {Z}} \pi _j z^j\) is given by

$$\begin{aligned} \pi (z) = I z^{k} - {E}_u \, \Xi ^{-1} \, Z \, \xi (z^{-1}), \end{aligned}$$
(7)

where \({E}_u = \underline{e}_{u}^{\prime } \otimes I\). Here, I denotes the \(N \times N\) identity matrix, \( \otimes \) denotes Kronecker product, \(\underline{e}_u\) is an m-dimensional unit vector (with a one in position u, for \(1 \le u \le m\), and zeros elsewhere), \(\Xi \) is an \(mN \times mN\)-dimensional block matrix with rs-th block entry (for \(1 \le r,s \le m\)) given by \(\xi _{ j_r - j_s}\), and \(Z = \underline{z} \otimes I\), where \(\underline{z}\) is a vector with entries \(z^{j_r}\) for \(1 \le r \le m\).

Finally, the optimal predictor’s MSE matrix equals \( {E}_u \, \Xi ^{- \prime } \, {E}_u ^{\prime }\), where \(-\prime \) denotes inverse transpose.

Remark 2

The filter formula (7) automatically generates the zero constraints, because the jth coefficient is given by

$$\begin{aligned} \pi _j = I 1_{ \{ j = k \} } - \sum _{r=1}^m \Xi ^{-1}_{u,r} \, \xi _{ j_r - j}, \end{aligned}$$
(8)

where \(1_{ \{ j = k \} }\) is the indicator of set \( { \{ j = k \} }\). If \(j \in M\) then \(j = j_v\) for some \(1 \le v \le m\), implying

$$\begin{aligned} \pi _j = I 1_{ \{ v = u \} } - \sum _{r=1}^m \Xi ^{-1}_{u,r} \, \Xi _{r,v} = I 1_{ \{ v = u \} } - { \left( \Xi ^{-1} \Xi \right) }_{u,v} = 0 \end{aligned}$$

(since \(\xi _{j_r - j_v} = \Xi _{r,v}\)). The pattern of the non-zero weights depends on whether the missing values are clustered (i.e., contiguous) or isolated from one another, as this impacts the structure of \(\Xi \).

The results are somewhat easier to state in the univariate case (\(N=1\)), as summarized in the following corollary. Note that in this case the autocovariance generating function \(\gamma (z)\), as well as its inverse \({\gamma (z)}^{-1}\), are real-valued (scalar) functions. Historically, this univariate case has been of great interest; Masani (1960), Rozanov (1967), and Inoue (2021) discuss the situation where all the missing values are consecutive.

Corollary 1

Assume Assumption A, and that \(N=1\), i.e., the univariate case. Then, we can compute the inverse autocovariances via (6) as before.

The MSE-optimal linear predictor of \(X_{k}\) given data \( { \{ X_{j } \} }_{j \not \in M}\), where \(M = \{ j_1, j_2, \ldots , j_m \}\), is given by (1). Letting \(k = j_u\), and setting \(\pi _{j_r} = 0\) for \(1 \le r \le m\), the formula for \(\pi (z) = \sum _{j \in \mathbb {Z}} \pi _j z^j\) is given by

$$\begin{aligned} \pi (z) = z^{k} - \underline{e}_{u}^{\prime } \, \Xi ^{-1} \, \underline{z} \, \xi (z^{-1}), \end{aligned}$$
(9)

where \(\underline{e}_u\) is an m-dimensional unit vector (with a one in position u, for \(1 \le u \le m\), and zeros elsewhere), \(\Xi \) is an \(m \times m\)-dimensional matrix with rs-th entry (for \(1 \le r,s \le m\)) given by \(\xi _{ j_r - j_s}\), and \(\underline{z}\) is a vector with entries \(z^{j_r}\) for \(1 \le r \le m\).

Finally, the optimal predictor’s MSE equals \( \underline{e}_{u}^{\prime } \, \Xi ^{- \prime } \, \underline{e}_{u} \), where \(-\prime \) denotes inverse transpose.
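As a numerical sanity check of Corollary 1 (and of formula (8)), the following sketch is ours and uses illustrative choices throughout: an ARMA(1,1)-type spectral density with parameters \(\phi \) and \(\psi \), and the non-consecutive missing pattern \(M = \{ -2, 0, 5 \}\). It computes the inverse autocovariances via (6), forms \(\Xi \), evaluates the filter coefficients, verifies the zero constraints, and compares the weights with a direct projection of \(X_k\) onto a long observed window.

```python
import numpy as np

# A minimal univariate sketch of Corollary 1 / formula (8); the ARMA(1,1)-type
# spectral density and the missing pattern M = {-2, 0, 5} are illustrative
# choices of ours, not taken from the paper.
n = 2**13
lam = 2 * np.pi * np.arange(n) / n
phi, psi, sigma2 = 0.5, 0.3, 1.0
f = sigma2 * np.abs(1 + psi * np.exp(-1j * lam))**2 \
          / np.abs(1 - phi * np.exp(-1j * lam))**2

gam = np.fft.ifft(f).real        # autocovariances gamma_h (circular indexing)
xi = np.fft.ifft(1.0 / f).real   # inverse autocovariances xi_h, as in eq. (6)

def at(a, h):                    # lag lookup, valid for |h| much smaller than n
    return a[h % n]

M = [-2, 0, 5]                   # missing times j_1 < j_2 < j_3
u = 1                            # target k = M[u] = 0
Xi = np.array([[at(xi, r - s) for s in M] for r in M])
w = np.linalg.inv(Xi)[u]         # row u of Xi^{-1}

J = 30
js = np.arange(-J, J + 1)
pi = np.array([(j == M[u]) - w @ [at(xi, jr - j) for jr in M] for j in js])
assert np.allclose(pi[np.isin(js, M)], 0)            # zero constraints hold

# Independent check: projection of X_0 onto the observed X_j with |j| <= J
obs = js[~np.isin(js, M)]
C = np.array([[at(gam, a - b) for b in obs] for a in obs])
c = np.array([at(gam, a - M[u]) for a in obs])
assert np.allclose(pi[~np.isin(js, M)], np.linalg.solve(C, c), atol=1e-6)
print("interpolation MSE:", np.linalg.inv(Xi)[u, u])
```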

3 Optimal linear interpolation in the case of possibly slow decay of the autocovariances

In the previous section, it was assumed that the autocovariance generating function \(\gamma (z)\) is well-defined over an annulus about the unit circle. However, this presupposes that the autocovariances \(\gamma _h\) decay to zero exponentially fast as the lag h increases. While this is true in many cases, e.g. in stationary ARMA models, there are certainly situations where \(\gamma _h\) only decays slowly in h, with a polynomial rate. Even in such a case, the optimal interpolating filter would still be given by Eq. (7) of Theorem 1, provided the infinite sums implicit in this formula are well-defined.

Recall that the coefficients of the optimal filter can be computed via the values of \(\gamma (z)\) for z on the unit circle; see Eqs. (4) and (5). Hence, we may focus our attention on the spectral density \(f(\lambda )=\gamma (e^{-i\lambda }) \), and try to find a sufficient condition for the optimal filter (7) to be well-defined. To this end, we are aided by Wiener’s lemma; see Wiener (1932) for the original, and Sun (2007) or Shin and Sun (2013) for extensions and elaborations.

In the univariate case (\(N=1\)), the classical Wiener’s lemma states that if a periodic function \(f(\lambda )\) has an absolutely convergent Fourier series and never vanishes, then \(1/ f(\lambda )\) also has an absolutely convergent Fourier series. We will formulate a multivariate version of Wiener’s lemma that will help us define the inverse autocovariances when \(\gamma (z)\) is well-defined only on the unit circle. Let \({\Vert \cdot \Vert }_2\) denote the matrix 2-norm; we will require the following assumption:

Assumption B

Assume that the autocovariances \(\gamma _h\) have summable matrix 2-norm, i.e., \(\sum _{h \in \mathbb {Z}} { \Vert \gamma _h \Vert }_2 < \infty \), and that the determinant of \(f(\lambda )\), denoted by \(\det f(\lambda )\), does not vanish for \(\lambda \in [-\pi , \pi ]\).

The following is a multivariate version of Wiener’s lemma that is sufficient for our purposes. The result is not necessarily new in view of the general Banach space theorems of Bochner and Phillips (1942). Nevertheless, for completeness and concreteness, we provide a statement and proof in the finite-dimensional matrix setting.

Lemma 1

Suppose that \(f(\lambda ) = \gamma (e^{-i \lambda })\) satisfies Assumption B. Then the inverse \( { f(\lambda )}^{-1}\) is well-defined for \(\lambda \in [-\pi , \pi ]\), and has Fourier coefficients \(\{ \xi _h \}\) with summable matrix 2-norm.

Under the conditions of the above Lemma, the function \(\xi (e^{-i\lambda }) = { \gamma (e^{-i\lambda }) }^{-1}\) will have an absolutely convergent Fourier series \(\xi (e^{-i\lambda }) = \sum _{h \in \mathbb {Z}} \xi _h e^{-i h\lambda }\). The coefficients \( { \{ \xi _h \} }_{h \in \mathbb {Z}}\) are the inverse autocovariances; note that this is identical to Eq. (6). The inverse autocorrelations are defined by \(\zeta _h = \xi _0^{-1} \xi _h\) as before. In view of Lemma 1, the following is immediate.

Theorem 2

Assume Assumption B instead of Assumption A. Then, the conclusions and formulas of Theorem 1 remain valid.

Remark 3

It is also shown in the proof of Lemma 1 that the absolute value of each component of both f and \(f^{-1}\) has an absolutely convergent Fourier series—and in particular is absolutely integrable over \([-\pi , \pi ]\)—hence implying that (1.2) and (1.3) of Inoue (2021) hold. Therefore Assumption B implies “minimality”—a concept that Masani (1960) attributes to A. Kolmogorov—which essentially states that \(X_t\) (for any \(t \in \mathbb {Z}\)) cannot be perfectly predicted from all the other random variables \({ \{ X_s \} }_{s \ne t}\) of the stochastic process. Hence, Theorem 2 could also be proved by assuming minimality instead of Assumption B. However, we note that Assumption B is a weak assumption that is convenient to verify. For example, in the univariate case, absolute summability of the autocovariances is easily checked, and it moreover ensures that the spectral density f is well-defined as a continuous function.

For completeness, we also state the corollary in the univariate case.

Corollary 2

Assume \(N=1\). Instead of Assumption A, assume that the autocovariances \(\gamma _h\) are absolutely summable, and that \(f(\lambda )\) does not vanish for \(\lambda \in [-\pi , \pi ]\). Then, the conclusions and formulas of Corollary 1 remain valid.
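As an illustration of the setting of this section, consider the toy autocovariance sequence \(\gamma _h = (1+|h|)^{-2}\) (our own choice; it is absolutely summable but decays only polynomially, and positivity of the resulting f is checked numerically in the sketch below rather than assumed). For a single missing value, \(M = \{ 0 \}\), formula (8) with \(m=1\) reduces to \(\pi _j = -\zeta _{-j}\) for \(j \ne 0\), so the interpolation weights are simply the negated inverse autocorrelations and inherit the slow decay.

```python
import numpy as np

# Sketch under Assumption B (our illustrative choice of a polynomially
# decaying, absolutely summable autocovariance sequence); positivity of f
# is checked numerically, not assumed.
n, H = 2**14, 2000
g = np.zeros(n)
g[0] = 1.0
h = np.arange(1, H + 1)
g[h] = (1.0 + h)**-2                  # gamma_h = (1 + |h|)^{-2}, truncated at H
g[n - h] = g[h]                       # symmetric negative lags (circular layout)

f = np.fft.fft(g).real                # f(lambda_k) = sum_h gamma_h e^{-i h lambda_k}
assert f.min() > 0                    # spectral density bounded away from zero

xi = np.fft.ifft(1.0 / f).real        # inverse autocovariances, as in eq. (6)
zeta = xi / xi[0]                     # inverse autocorrelations

# Single missing value M = {0}: formula (8) with m = 1 gives pi_0 = 0 and
# pi_j = -zeta_{-j} for j != 0; the weights decay slowly, like gamma_h itself.
print([round(-zeta[j], 4) for j in range(1, 9)])
```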

4 Illustrations

In this section we consider two illustrations: a VAR(1) and a VMA(1), i.e., a Vector Autoregressive model of order 1, and a Vector Moving Average model of order 1; these are both examples where the autocovariance generating function is well-defined over an annulus. Within each multivariate example, we also examine the special case \(N=1\).

4.1 VAR(1)

Suppose that \(\{ X_t \}\) is a VAR(1) process with coefficient matrix \(\Phi \) and innovation covariance \(\Sigma \), i.e., \(X_t= \Phi X_{t-1}+Z_t\) where \(Z_t\) is a mean zero white noise with \(\Sigma =\mathbb {E}Z_t Z_t^\prime .\) In this case,

$$\begin{aligned} \gamma (z)&= {(I - \Phi z)}^{-1} \, \Sigma {( I - \Phi ^{\prime } z^{-1} )}^{-1} \\ \xi (z)&= (I - \Phi ^{\prime } z^{-1}) \, \Sigma ^{-1} \, (I - \Phi z) \\ \xi _0&= \Sigma ^{-1} + \Phi ^{\prime } \, \Sigma ^{-1} \, \Phi \\ \xi _1&= - \Sigma ^{-1} \, \Phi . \end{aligned}$$

All inverse autocovariances other than \(\xi _{-1}, \xi _0, \xi _1\) are zero, and hence the summation in (8) involves at most three terms. The structure of \(\Xi \) is

$$\begin{aligned} \Xi = \left[ \begin{array}{ccccc} \xi _0 &{} \xi _{j_1 - j_2} &{} 0 &{} \ldots &{} 0 \\ \xi _{j_2 - j_1} &{} \xi _0 &{} \xi _{j_2 - j_3} &{} 0 &{} \ldots \\ 0 &{} \xi _{j_3 - j_2} &{} \xi _0 &{} \xi _{j_3 - j_4 } &{} 0 \\ \vdots &{} \ddots &{} \ddots &{} \ddots &{} \ddots \\ 0 &{} \ldots &{} 0 &{} \xi _{j_{m} - j_{m-1}} &{} \xi _0 \end{array} \right] . \end{aligned}$$

Only the block diagonal, the super-diagonal, and the sub-diagonal are non-zero, and some of these entries can even be zero—the matrix is not block Toeplitz in general. For instance, each entry of the super-diagonal is an inverse autocovariance at a lag given by a difference of consecutive elements of M, and is zero unless that gap equals 1. Hence, entries of the super-diagonal are zero unless they correspond to consecutive (clustered) missing values.

To make the illustration more concrete, suppose that \(M = \{ 0, 1, 3 \}\), which means there are two consecutive missing values at times 0 and 1, but an isolated missing value at time 3. Then we find

$$\begin{aligned} \Xi&= \left[ \begin{array}{ccc} \xi _0 &{} \xi _{-1} &{} 0 \\ \xi _{1} &{} \xi _0 &{} 0 \\ 0 &{} 0 &{} \xi _0 \end{array} \right] \\ \Xi ^{-1}&= \left[ \begin{array}{ccc} \xi _0^{-1} + \xi _0^{-1} \xi _{-1} S^{-1} \xi _{1} \xi _0^{-1} &{} - \xi _0^{-1} \xi _{-1} S^{-1} &{} 0 \\ - S^{-1} \xi _{1} \xi _0^{-1} &{} S^{-1} &{} 0 \\ 0 &{} 0 &{} \xi _0^{-1} \end{array} \right] \\ S&= \xi _0 - \xi _{1} \xi _0^{-1} \xi _{-1}. \end{aligned}$$

Because of this structure of missing values, \(\pi _0 = \pi _1 = \pi _3 = 0\); also from (8) and the fact that \(\xi _h = 0\) for \(|h| > 1\), we find \(\pi _j = 0\) if \(j > 4 \) or \(j < -1\). So only \(\pi _{-1}\), \(\pi _2\), and \(\pi _4\) can be non-zero, and their values depend on which of the three missing value filters we are considering. Writing \(\zeta _h = \xi _0^{-1} \xi _h\) as before, for \(k = 0\) the coefficients are

$$\begin{aligned} \pi _{-1}&= - \left( \xi _0^{-1} + \xi _0^{-1} \xi _{-1} S^{-1} \xi _{1} \xi _0^{-1} \right) \xi _{1} = - { \left( I - \zeta _{-1} \zeta _{1} \right) }^{-1} \zeta _{1} \\ \pi _2&= \xi _0^{-1} \xi _{-1} S^{-1} \xi _{-1} = \zeta _{-1} \, {( I - \zeta _{1} \zeta _{-1} )}^{-1} \zeta _{-1} \\ \pi _4&= 0, \end{aligned}$$

which shows that only observations on either side of the cluster of missing values get weighted; also their weights are different, which makes sense because observation \(X_{-1}\) has more to say about \(X_0\) than does observation \(X_{2}\), because it is closer (and hence has a higher degree of association) in lag distance.

Special case: \(N=1\). In the univariate case, the coefficients simplify to

$$\begin{aligned} \pi _{-1} = - \frac{ \zeta _1 }{ 1 - \zeta _1^2 } \quad \pi _2 = \frac{ \zeta _1^2 }{ 1 - \zeta _1^2}, \end{aligned}$$

using \(\zeta _{-1} = \zeta _1\). Clearly \(\pi _2 = - \zeta _1 \pi _{-1} \) and hence \(|\pi _2 | < |\pi _{-1}|\). Next, the coefficients for the \(k= 1\) filter are

$$\begin{aligned} \pi _{-1}&= S^{-1} \xi _{1} \xi _0^{-1} \xi _{1} = \zeta _{1} \,{( I - \zeta _{-1} \zeta _{1} )}^{-1} \, \zeta _{1} \\ \pi _2&= - S^{-1} \xi _{-1} = - {( I - \zeta _{1} \zeta _{-1} )}^{-1} \zeta _{-1} \\ \pi _4&= 0. \end{aligned}$$

Again there is no contribution from \(X_{4}\), but we see the weights for \(\pi _{-1}\) and \(\pi _2\) are similar to those of \(\pi _2\) and \(\pi _{-1}\) (respectively) in the \(k=0\) filter, only with \(\zeta _1\) and \(\zeta _{-1}\) interchanged. This is appropriate, and in the univariate case the weights exactly correspond. Finally, the coefficients for the \(k=3\) filter are

$$\begin{aligned} \pi _{-1}&= 0 \\ \pi _2&= - \xi _0^{-1} \xi _{1} = - \zeta _{1} \\ \pi _4&= - \xi _0^{-1} \xi _{-1} = - \zeta _{-1}, \end{aligned}$$

which provides equal weights in the univariate case for the observations on either side of the isolated missing value \(X_{3}\). As a final remark, it is simple to see that the weight \(\pi _j\) does not depend on \(\Sigma \) when \(N=1\), because it cancels out, but when \(N > 1\) the matrix \(\Sigma \) cannot be manipulated in this way, and the filter coefficients will depend on \(\Sigma \).
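The following sketch provides a numerical check of this example for \(N = 2\), with an illustrative stable \(\Phi \) and positive definite \(\Sigma \) of our own choosing: it assembles \(\Xi \) from the inverse autocovariances displayed above, applies formula (8) for the \(k = 0\) filter, and confirms both the zero constraints and the closed-form expressions for \(\pi _{-1}\), \(\pi _2\), and \(\pi _4\).

```python
import numpy as np

# Numerical check of the VAR(1) illustration with N = 2 and M = {0, 1, 3};
# Phi and Sigma are illustrative choices (any stable Phi and positive
# definite Sigma would do).
N = 2
Phi = np.array([[0.5, 0.1], [0.2, 0.3]])
Sigma = np.array([[1.0, 0.3], [0.3, 2.0]])
Sig_inv = np.linalg.inv(Sigma)

# Inverse autocovariances of the VAR(1): only lags -1, 0, 1 are non-zero
xi = {0: Sig_inv + Phi.T @ Sig_inv @ Phi,
      1: -Sig_inv @ Phi,
      -1: -Phi.T @ Sig_inv}

def xi_at(h):
    return xi.get(h, np.zeros((N, N)))

M = [0, 1, 3]
Xi = np.block([[xi_at(r - s) for s in M] for r in M])   # 3N x 3N block matrix
Xi_inv = np.linalg.inv(Xi)

def blk(A, r, s):                                       # (r, s) block of A
    return A[r * N:(r + 1) * N, s * N:(s + 1) * N]

u = 0                                                   # target k = M[u] = 0
def pi(j):                                              # formula (8)
    out = np.eye(N) * (j == M[u])
    for r, jr in enumerate(M):
        out = out - blk(Xi_inv, u, r) @ xi_at(jr - j)
    return out

zeta = {h: np.linalg.solve(xi[0], xi[h]) for h in (-1, 1)}  # zeta_h = xi_0^{-1} xi_h
I = np.eye(N)
assert all(np.allclose(pi(j), 0) for j in M)                # zero constraints
assert np.allclose(pi(-1), -np.linalg.solve(I - zeta[-1] @ zeta[1], zeta[1]))
assert np.allclose(pi(2), zeta[-1] @ np.linalg.inv(I - zeta[1] @ zeta[-1]) @ zeta[-1])
assert np.allclose(pi(4), 0)
```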

4.2 VMA (1)

Suppose that \(\{ X_t \}\) is a VMA(1) process written with a minus convention: \( X_t = Z_t - \Theta ^{\prime } Z_{t-1}\), where \(\{ Z_t \}\) is a mean zero white noise with covariance matrix \(\Sigma \). This is an atypical parameterization of a VMA(1), with \(-\Theta ^{\prime }\) appearing where one would usually expect \(\Theta \), but it makes the following formulas less cluttered. It follows that

$$\begin{aligned} \gamma (z)&= {(I - \Theta ^{\prime } z)} \, \Sigma \, {( I - \Theta z^{-1} )} \\ \xi (z)&= {(I - \Theta z^{-1})}^{-1} \, \Sigma ^{-1} \, { (I - \Theta ^{\prime } z)}^{-1} \\ \xi _h&= \sum _{k \ge 0} \Theta ^{h+k} \Sigma ^{-1} \Theta ^{\prime k} = \Theta ^h \xi _0 \quad h \ge 0. \end{aligned}$$

As a result the structure of \(\Xi \) is

$$\begin{aligned} \Xi = \left[ \begin{array}{cccc} \xi _0 &{} \xi _0 {\Theta ^{j_2 - j_1} }^{\prime } &{} \xi _0 { \Theta ^{j_3 - j_1}}^{\prime } &{} \ldots \\ \Theta ^{j_2 - j_1} \xi _0 &{} \xi _0 &{} \xi _0 { \Theta ^{j_3 - j_2}}^{\prime } &{} \ldots \\ \vdots &{} \ddots &{} \ddots &{} \ddots \\ \ldots &{} { \Theta ^{ j_m - j_{m-2}} } \xi _0 &{} {\Theta ^{ j_m - j_{m-1}} } \xi _0 &{} \xi _0 \end{array} \right] . \end{aligned}$$

This block matrix is highly structured, but in general is not block Toeplitz.

For a concrete illustration, suppose that \(M = \{ 0, \ell \}\) for some \(\ell > 0\), which means there are two missing values that are clustered when \(\ell = 1\). Then we find

$$\begin{aligned} \Xi&= \left[ \begin{array}{ccc} \xi _0 &{} \xi _0 { \Theta ^{\ell }}^{\prime } \\ \Theta ^{\ell } \xi _0 &{} \xi _0 \end{array} \right] \\ \Xi ^{-1}&= \left[ \begin{array}{cc} ( I + \zeta _{-\ell } {[ I - \zeta _{ \ell } \zeta _{-\ell } ]}^{-1} \zeta _{\ell } ) \xi _0^{-1} &{} - \zeta _{-\ell } {[ I - \zeta _{\ell } \zeta _{-\ell } ]}^{-1} \xi _0^{-1} \\ - {[ I - \zeta _{ \ell } \zeta _{-\ell } ]}^{-1} \zeta _{\ell } \xi _0^{-1} &{} {[ I - \zeta _{ \ell } \zeta _{-\ell } ]}^{-1} \xi _0^{-1} \end{array} \right] \\ \pi _j&= {\left\{ \begin{array}{ll} I 1_{ \{ j = 0 \} } - {[ I - \zeta _{- \ell } \zeta _{ \ell } ]}^{-1} (\zeta _{-j} - \zeta _{-\ell } \zeta _{ \ell -j} ) &{}\quad \text{ if } \; k = 0 \\ I 1_{ \{ j = \ell \} } - {[ I - \zeta _{ \ell } \zeta _{-\ell } ]}^{-1} ( \zeta _{ \ell - j} - \zeta _{ \ell } \zeta _{-j} ) &{}\quad \text{ if } \; k = \ell . \end{array}\right. } \end{aligned}$$

We can directly see that these coefficients satisfy the zero constraints.

Special case: \(N=1\). In the univariate case, we can make some further simplifications, because \(\zeta _h = \theta ^{|h|} \), where the scalar \(\theta \) plays the role of \(\Theta ^{\prime }\); recall that in the multivariate case \(\zeta _h \ne \zeta _{-h}\) in general. If \(k = 0\), then

$$\begin{aligned} \pi _j = {\left\{ \begin{array}{ll} 1_{ \{ j = 0 \} } - \theta ^{-j} &{}\quad \text{ if } \; j \le 0 \\ - {(1 - \theta ^{2 \ell } )}^{-1} ( \theta ^j - \theta ^{2 \ell - j} ) &{}\quad \text{ if } \; 0< j < \ell \\ 0 &{}\quad \text{ if } \; j \ge \ell , \end{array}\right. } \end{aligned}$$

and if \(k = \ell \) then

$$\begin{aligned} \pi _j = {\left\{ \begin{array}{ll} 1_{ \{ j = \ell \} } - \theta ^{j - \ell } &{}\quad \text{ if } \; j \ge \ell \\ - {(1 - \theta ^{2 \ell } )}^{-1} ( \theta ^{\ell -j} - \theta ^{ \ell + j} ) &{}\quad \text{ if } \; 0< j < \ell \\ 0 &{}\quad \text{ if } \; j \le 0. \end{array}\right. } \end{aligned}$$

One interesting feature is that the coefficients truncate on one side of the filter, past where the second missing value is placed—this differs from the case of a single missing value, where all coefficients (except \(\pi _0\)) are nonzero.
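A brief numerical check of the univariate case (with illustrative values of \(\theta \) and \(\ell \) of our own choosing): the sketch below evaluates the \(k=0\) filter from the piecewise display above and compares it with a direct projection of \(X_0\) onto a long observed window, confirming in particular the one-sided truncation past the second missing value.

```python
import numpy as np

# Univariate MA(1) check, X_t = Z_t - theta*Z_{t-1}, with M = {0, ell};
# theta, sigma2 and ell are illustrative choices of ours.
theta, sigma2, ell = 0.6, 1.0, 4

def gamma(h):                        # MA(1) autocovariances
    return sigma2 * {0: 1 + theta**2, 1: -theta, -1: -theta}.get(int(h), 0.0)

def pi_k0(j):                        # k = 0 filter from the piecewise display
    if j <= 0:
        return float(j == 0) - theta**(-j)
    if j < ell:
        return -(theta**j - theta**(2 * ell - j)) / (1 - theta**(2 * ell))
    return 0.0                       # truncation past the second missing value

# Compare with a direct projection of X_0 onto the observed window |j| <= J
J = 40
js = np.arange(-J, J + 1)
obs = js[(js != 0) & (js != ell)]
C = np.array([[gamma(a - b) for b in obs] for a in obs])
c = np.array([gamma(a) for a in obs])           # covariances with X_0
proj = np.linalg.solve(C, c)
assert np.allclose(proj, [pi_k0(j) for j in obs], atol=1e-6)
print({j: round(pi_k0(j), 4) for j in range(-3, ell + 2)})
```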