Abstract
The problem of linear interpolation in the context of a multivariate time series having multiple (possibly non-consecutive) missing values is studied. A concise formula for the optimal interpolating filter is derived, and illustrations using two simple models are provided.
1 Introduction
In the setting of a covariance-stationary, mean zero time series \(\{X_t , t\in \mathbb {Z}\}\), the standard linear prediction problem amounts to extrapolating the one-step ahead (or multiple steps ahead) value based on the observed past; this is typically done via orthogonal projection of the random variable of interest on the linear span of the observed data—see e.g. Ch. 2 of Brockwell and Davis (1991). The interpolation problem is closely related but now we assume that the entire time series has been observed except one particular value. Because of stationarity, and without loss of generality, we may assume that the missing value is \(X_0\), i.e., the data consist of \(\{ X_t , t\in \mathbb {Z}{\setminus } \{0\} \}\). Then, the goal is to ‘predict’ \(X_0\) as a linear function of the data \(\{ X_t , t\in \mathbb {Z}{\setminus } \{0\} \}\); this is also done via orthogonal projection—see Kolmogorov (1941) and Wiener (1949), who pioneered the projection technique working independently during World War II.
The one-step ahead prediction problem has as its solution the autoregressive coefficients of the process; see e.g. Ch. 6.1 of McElroy and Politis (2020) . By contrast, the optimal interpolation problem leads to the notion of inverse autocorrelations; see e.g. Politis (1992). Cleveland (1972) apparently coined this term, but the solution to the interpolation problem was developed much earlier; see Chapter 2.3 of Grenander and Rosenblatt (1957), as well as the original monograph Wiener (1949). Later developments include Masani (1960), Chapter 2 of Rozanov (1967), Salehi (1979), Franke (1984), Pourahmadi (1989), and Cheng and Pourahmadi (1997). More recently, further results have been obtained by Bondon (2005), Kasahara et al. (2009), Inoue et al. (2016), Inoue (2020), and Inoue (2021); Lepot et al. (2017) provides a recent review.
Interpolation and extrapolation have also been well studied in the context of a multivariate time series; see Ch. 6 of Hannan (1970), as well as some of the above references, such as Masani (1960). In the context outlined above, interpolation results have only been obtained for the case of a single missing value or consecutive stretches of missing values—as discussed in Inoue (2021). (There is an extensive literature treating the missing value problem for finite-length samples, for example using the Kalman filter, but in this note we are focused on infinite-length samples.) The key novelty of our work, with respect to prior literature, is to provide interpolation formulas for the case of general, possibly non-consecutive, patterns of missing values present in multivariate time series. In Sect. 2 we derive the form of the optimal linear interpolator in the general multivariate case; application to the univariate case is immediate. Sect. 3 shows the validity of the optimal interpolating filter in the case of possibly slow decay of the autocovariances. Illustrations using two simple models are provided in Sect. 4. “Appendix A” presents some background on optimal linear prediction in the multivariate case, while “Appendix B” contains the technical proofs.
Remark 1
The objective of our paper is to derive the formula for the mean squared error optimal linear interpolator with multiple missing values; this is a theoretical result without regard to any data, following the Wiener–Kolmogorov theory of linear prediction as Hilbert space projection. Having derived the formula of the optimal filter, its coefficients could indeed be estimated from data by one of the usual methods, e.g., Method of Moments, Least Squares, pseudo Maximum Likelihood, etc.
2 Optimal linear interpolation of multiple missing values
Consider a covariance-stationary, mean zero, multivariate time series \(\{ X_t , t\in \mathbb {Z}\}\) with autocovariance at lag h denoted by \(\gamma _h= \mathbb {E}X_t X_{t-h}^\prime \) where \(^\prime \) denotes the transpose; each \( X_t\) is an N-dimensional column vector, and therefore \(\gamma _h\) is an \(N\times N\) matrix. A typical assumption is that the autocovariance generating function \(\gamma (z)= \sum _{h \in \mathbb {Z}} \gamma _h z^h\) is well-defined for z in some annulus about the unit circle of the complex plane; this converging power series in both powers of z and \(z^{-1}\) is called a Laurent series, and characterizes a holomorphic function of a complex variable. In this case, the spectral density of the time series \(\{ X_t \}\) is given by the function \(f(\lambda )=\gamma (e^{-i\lambda }) \) for \(\lambda \in [-\pi , \pi ]\).
If the determinant of \(\gamma (z)\) has no zeroes on the unit circle, then the function \(\xi (z) = { \gamma (z) }^{-1}\) is also holomorphic on some annulus about the unit circle. Consequently, it has a converging Laurent series expansion \(\xi (z) = \sum _{h \in \mathbb {Z}} \xi _h z^h\) for some coefficients \( { \{ \xi _h \} }_{h \in \mathbb {Z}}\) that we will call the inverse autocovariances; therefore, \(\xi (z) \) is the inverse autocovariance generating function. We can also define the inverse autocorrelation at lag h by \(\zeta _h = \xi _0^{-1} \xi _h\); see also Ch. 6.3 of McElroy and Politis (2020).
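As a quick numerical sanity check (a sketch, not part of the development; the AR(1) parameter values below are illustrative), the inverse autocovariances can be recovered as Fourier coefficients of \(1/f(\lambda )\) evaluated on a frequency grid:

```python
import numpy as np

# Numerical sketch (not from the paper): approximate the inverse
# autocovariances xi_h as Fourier coefficients of 1/f(lambda) on a grid,
# using a univariate AR(1) whose inverse autocovariances are known in
# closed form. The parameter values phi, sigma2 are illustrative.
phi, sigma2 = 0.6, 2.0
n = 4096                                   # frequency grid size
lam = 2 * np.pi * np.arange(n) / n

# AR(1) spectral density: f(lambda) = sigma2 / |1 - phi e^{-i lambda}|^2
f = sigma2 / np.abs(1 - phi * np.exp(-1j * lam)) ** 2

# xi_h = (2 pi)^{-1} int (1/f(lambda)) e^{i h lambda} d lambda, via inverse FFT
xi = np.fft.ifft(1.0 / f).real             # xi[h] for h = 0, 1, ..., n-1

# Closed form for AR(1): xi_0 = (1 + phi^2)/sigma2, xi_{+-1} = -phi/sigma2,
# and xi_h = 0 for |h| > 1; here 1/f is a trigonometric polynomial of
# degree 1, so the grid approximation is exact up to rounding.
assert abs(xi[0] - (1 + phi**2) / sigma2) < 1e-10
assert abs(xi[1] + phi / sigma2) < 1e-10
assert abs(xi[2]) < 1e-10
```

For the AR(1), \(\gamma (z)^{-1} = (1 - \phi z)(1 - \phi z^{-1})/\sigma ^2\) is a Laurent polynomial, which is why only \(\xi _0\) and \(\xi _{\pm 1}\) are non-zero.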
Our goal is to derive a linear filter to predict each of a finite number of values that are assumed to be completely missing (i.e., either all or none of the components of each random vector are available).Footnote 1 There will be one filter for each missing value, which is viewed as the target—just like there is one filter for each forecast lead in a multi-step ahead forecasting problem.
We need a reference time t, in terms of which to formulate the filters. Because of stationarity, this is not too important; hence, for ease of exposition, we choose \(t=0\) as the reference point.
Then, the missing value times will be denoted as \(k \in M\); we assume there are m missing values where \(m = |M| < \infty \) (here |M| denotes the number of elements in the set M). The elements of M are indexed in some way, and can be written \(M = \{ j_1, j_2, \ldots , j_m \}\). Without loss of generality, suppose these are ordered such that \(j_1< j_2< \cdots < j_m\) (these integers are of course distinct).
Fix \(k \in M\). The dataset available to predict the specific missing value \(X_k\) is \({ \{ X_{j} \} }_{j \not \in M}\). To predict \(X_{k}\) from \( { \{ X_{ j } \} }_{j \not \in M}\) we can employ the linear predictor

\[ \widehat{X}_{k} = \sum _{j \not \in M} \pi _j X_{j}. \quad (1) \]
The filter coefficients \(\pi _j\) are \(N \times N\)-dimensional matrices; they depend on k, but we suppress this to facilitate the notation. The challenge is to identify the optimal linear predictor, i.e., identify the optimal coefficients \(\pi _j\) that minimize the Mean Square Error (MSE) of predictionFootnote 2 that is defined as \(\mathbb {E}({X}_{k} -\widehat{X}_{k} ) { ({X}_{k} -\widehat{X}_{k} )}^{\prime }\).
Note that the filter (1) does not rely on any values that are missing. Hence, we can re-write the predictor (1) as

\[ \widehat{X}_{k} = \sum _{j \in \mathbb {Z}} \pi _j X_{j}, \quad (2) \]
with the understanding that \(\pi _j = 0\) for \(j \in M\); we call these the zero constraints of the filter. Evidently, (2) can be expressed as \(\widehat{X}_k = \pi (B^{-1}) X_0\), where B is the backshift operator (and \(B^{-1}\) is a forward shift), i.e., \(BX_t=X_{t-1}\). The function \(\pi (z)\) is then defined via

\[ \pi (z) = \sum _{j \in \mathbb {Z}} \pi _j z^{j}. \quad (3) \]
In our main result below, we identify the function \(\pi (z)\) associated with the optimal linear filter. Note that knowledge of \(\pi (z)\) for all z in an annulus about the unit circle is tantamount to knowing the coefficient sequence \(\{ \pi _j , j\in \mathbb {Z}\}\); to see why, we can plug \(z=e^{- i \lambda }\) (where \(i = \sqrt{-1}\)) into (3), yielding the Fourier series

\[ \pi (e^{- i \lambda }) = \sum _{j \in \mathbb {Z}} \pi _j e^{- i j \lambda }. \quad (4) \]
Consequently, the j-th Fourier coefficient of the function \( \pi (e^{- i \lambda }) \) is given by

\[ \pi _j = \frac{1}{2 \pi } \int _{-\pi }^{\pi } \pi (e^{- i \lambda }) \, e^{ i j \lambda } \, d\lambda . \quad (5) \]
To facilitate notation, in what follows we will denote \(\langle g(z) \rangle = {(2 \pi )}^{-1} \int _{-\pi }^{\pi } g(e^{-i \lambda }) d\lambda \); we will also employ the following assumption:
Assumption A
Assume that the autocovariance generating function \(\gamma (z)= \sum _{h \in \mathbb {Z}} \gamma _h z^h\) is well-defined for z in some annulus about the unit circle of the complex plane. Further assume that the determinant of \(\gamma (z)\) has no zeroes on the unit circle, i.e., that \(\det \gamma (z)\ne 0\) whenever \(|z|=1\).
Theorem 1
Assume Assumption A. Then, we can compute the inverse autocovariances via

\[ \xi _h = \langle z^{-h} \, {\gamma (z)}^{-1} \rangle . \quad (6) \]
Furthermore, the MSE-optimal linear predictor of \(X_{k}\) given data \( { \{ X_{j } \} }_{j \not \in M}\), where \(M = \{ j_1, j_2, \ldots , j_m \}\), is given by (1). Letting \(k = j_u\), and setting \(\pi _{j_r} = 0\) for \(1 \le r \le m\), the formula for \(\pi (z) = \sum _{j \in \mathbb {Z}} \pi _j z^j\) is given by

\[ \pi (z) = I \, z^{k} - {E}_u \, \Xi ^{-1} \, Z \, \xi (z^{-1}), \quad (7) \]
where \({E}_u = \underline{e}_{u}^{\prime } \otimes I\). Here, I denotes the \(N \times N\) identity matrix, \( \otimes \) denotes Kronecker product, \(\underline{e}_u\) is a dimension m unit vector (with one in position u for \(1 \le u \le m\), and zero else), \(\Xi \) is an \(mN \times mN\)-dimensional block matrix with rs-th block entry (for \(1 \le r,s \le m\)) given by \(\xi _{ j_r - j_s}\), and \(Z = \underline{z} \otimes I\), where \(\underline{z}\) is a vector with entries \(z^{j_r}\) for \(1 \le r \le m\).
Finally, the optimal predictor’s MSE matrix equals \( {E}_u \, \Xi ^{- \prime } \, {E}_u ^{\prime }\), where \(-\prime \) denotes inverse transpose.
Remark 2
The filter formula (7) automatically generates the zero constraints, because the jth coefficient is given by

\[ \pi _j = 1_{ \{ j = k \} } \, I - \sum _{r=1}^{m} { ( \Xi ^{-1} ) }_{ur} \, \xi _{ j_r - j }, \quad (8) \]
where \(1_{ \{ j = k \} }\) is the indicator of set \( { \{ j = k \} }\). If \(j \in M\) then \(j = j_v\) for some \(1 \le v \le m\), implying

\[ \pi _{j_v} = 1_{ \{ j_v = k \} } \, I - \sum _{r=1}^{m} { ( \Xi ^{-1} ) }_{ur} \, \xi _{ j_r - j_v } = 1_{ \{ v = u \} } \, I - { ( \Xi ^{-1} \, \Xi ) }_{uv} = 0 \]
(since \(\xi _{j_r - j_v} = \Xi _{r,v}\)). The pattern of the non-zero weights depends on whether the missing values are clustered (i.e., contiguous) or isolated from one another, as this impacts the structure of \(\Xi \).
The results are somewhat easier to state in the univariate case (\(N=1\)), as summarized in the following corollary. Note that in this case the autocovariance generating function \(\gamma (z)\), as well as its inverse \({\gamma (z)}^{-1}\), are real-valued (scalar) functions. Historically, this univariate case has been of great interest; Masani (1960), Rozanov (1967), and Inoue (2021) discuss the situation where all the missing values are consecutive.
Corollary 1
Assume Assumption A, and that \(N=1\), i.e., the univariate case. Then, we can compute the inverse autocovariances via (6) as before.
The MSE-optimal linear predictor of \(X_{k}\) given data \( { \{ X_{j } \} }_{j \not \in M}\), where \(M = \{ j_1, j_2, \ldots , j_m \}\), is given by (1). Letting \(k = j_u\), and setting \(\pi _{j_r} = 0\) for \(1 \le r \le m\), the formula for \(\pi (z) = \sum _{j \in \mathbb {Z}} \pi _j z^j\) is given by

\[ \pi (z) = z^{k} - \underline{e}_{u}^{\prime } \, \Xi ^{-1} \, \underline{z} \, \xi (z^{-1}), \]
where \(\underline{e}_u\) is a dimension m unit vector (with one in position u for \(1 \le u \le m\), and zero else), \(\Xi \) is an \(m \times m\)-dimensional matrix with rs-th entry (for \(1 \le r,s \le m\)) given by \(\xi _{ j_r - j_s}\), and \(\underline{z}\) is a vector with entries \(z^{j_r}\) for \(1 \le r \le m\).
Finally, the optimal predictor’s MSE equals \( \underline{e}_{u}^{\prime } \, \Xi ^{- \prime } \, \underline{e}_{u} \), where \(-\prime \) denotes inverse transpose.
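Corollary 1 can be implemented directly once the \(\xi _h\) are available. The following Python sketch (illustrative AR(1) parameter values, not from the paper) builds \(\Xi \), extracts the u-th row of \(\Xi ^{-1}\), and recovers the classical single-missing-value interpolator as a special case:

```python
import numpy as np

# Sketch of Corollary 1 for a univariate AR(1), whose inverse
# autocovariances are available in closed form (phi, sigma2 illustrative):
# xi_0 = (1 + phi^2)/sigma2, xi_{+-1} = -phi/sigma2, xi_h = 0 otherwise.
phi, sigma2 = 0.6, 2.0

def xi(h):
    if h == 0:
        return (1 + phi**2) / sigma2
    return -phi / sigma2 if abs(h) == 1 else 0.0

def interpolator(M, k, lags):
    """Optimal weights pi_j (j in lags) and MSE for missing times M, target k."""
    m, u = len(M), M.index(k)
    Xi = np.array([[xi(r - s) for s in M] for r in M])  # entries xi_{j_r - j_s}
    Xi_inv = np.linalg.inv(Xi)
    pi = {j: (1.0 if j == k else 0.0)
             - sum(Xi_inv[u, r] * xi(M[r] - j) for r in range(m))
          for j in lags}
    return pi, Xi_inv[u, u]                 # MSE = uu-th entry of Xi^{-1}

# Single missing value at time 0: the classical answer weights the two
# neighbors equally, pi_{+-1} = phi/(1 + phi^2), with MSE sigma2/(1 + phi^2).
pi, mse = interpolator([0], 0, range(-3, 4))
assert abs(pi[0]) < 1e-12                   # zero constraint at the gap
assert abs(pi[1] - phi / (1 + phi**2)) < 1e-12
assert abs(pi[-1] - pi[1]) < 1e-12          # symmetric weights
assert abs(mse - sigma2 / (1 + phi**2)) < 1e-12
```

The same `interpolator` routine handles arbitrary (non-consecutive) missing-value patterns M, which is the point of the corollary.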
3 Optimal linear interpolation in the case of possibly slow decay of the autocovariances
In the previous section, it was assumed that the autocovariance generating function \(\gamma (z)\) is well-defined over an annulus about the unit circle. However, this presupposes that the autocovariances \(\gamma _h\) decay to zero exponentially fast as the lag h increases. While this is true in many cases, e.g. in stationary ARMA models, there are certainly situations where \(\gamma _h\) only decays slowly in h, with a polynomial rate. Even in such a case, the optimal interpolating filter would still be given by Eq. (7) of Theorem 1, provided the infinite sums implicit in this formula are well-defined.
Recall that the coefficients of the optimal filter can be computed via the values of \(\gamma (z)\) for z on the unit circle; see Eqs. (4) and (5). Hence, we may focus our attention on the spectral density \(f(\lambda )=\gamma (e^{-i\lambda }) \), and try to find a sufficient condition for the optimal filter (7) to be well-defined. To this end, we are aided by Wiener’s lemma; see Wiener (1932) for the original, and Sun (2007) or Shin and Sun (2013) for extensions and elaborations.
In the univariate case (\(N=1\)), the classical Wiener’s lemma states that if a periodic function \(f(\lambda )\) has an absolutely convergent Fourier series and never vanishes, then \(1/ f(\lambda )\) also has an absolutely convergent Fourier series. We will formulate a multivariate version of Wiener’s lemma that will help us define the inverse autocovariances when \(\gamma (z)\) is well-defined only on the unit circle. Let \({\Vert \cdot \Vert }_2\) denote the matrix 2-norm;Footnote 3 we will require the following assumption:
Assumption B
Assume that the autocovariances \(\gamma _h\) have summable matrix 2-norm, i.e., \(\sum _{h \in \mathbb {Z}} { \Vert \gamma _h \Vert }_2 < \infty \), and that the determinant of \(f(\lambda )\), denoted by \(\det f(\lambda )\), does not vanish for \(\lambda \in [-\pi , \pi ]\).
The following is a multivariate version of Wiener’s lemma that is sufficient for our purposes. The result is not necessarily new in view of the general Banach space theorems of Bochner and Phillips (1942). Nevertheless, for completeness and concreteness, we provide a statement and proof in the finite-dimensional matrix setting.
Lemma 1
Suppose that \(f(\lambda ) = \gamma (e^{-i \lambda })\) satisfies Assumption B. Then the inverse \( { f(\lambda )}^{-1}\) is well-defined for \(\lambda \in [-\pi , \pi ]\), and has Fourier coefficients \(\{ \xi _h \}\) with summable matrix 2-norm.
Under the conditions of the above Lemma, the function \(\xi (e^{-i\lambda }) = { \gamma (e^{-i\lambda }) }^{-1}\) will have an absolutely convergent Fourier series \(\xi (e^{-i\lambda }) = \sum _{h \in \mathbb {Z}} \xi _h e^{-i h\lambda }\). The coefficients \( { \{ \xi _h \} }_{h \in \mathbb {Z}}\) are the inverse autocovariances; note that this is identical to Eq. (6). The inverse autocorrelations are defined by \(\zeta _h = \xi _0^{-1} \xi _h\) as before. In view of Lemma 1, the following is immediate.
Theorem 2
Assume Assumption B instead of Assumption A. Then, the conclusions and formulas of Theorem 1 remain valid.
Remark 3
It is also shown in the proof of Lemma 1 that the absolute value of each component of both f and \(f^{-1}\) has an absolutely convergent Fourier series—and in particular is absolutely integrable over \([-\pi , \pi ]\)—hence implying that (1.2) and (1.3) of Inoue (2021) hold. Therefore Assumption B implies “minimality”—a concept that Masani (1960) attributes to A. Kolmogorov—which essentially states that \(X_t\) (for any \(t \in \mathbb {Z}\)) cannot be perfectly predicted from all the other random variables \({ \{ X_s \} }_{s \ne t}\) of the stochastic process. Hence, Theorem 2 could also be proved by assuming minimality instead of Assumption B. However, we note that Assumption B is a weak assumption that is convenient to verify. For example, in the univariate case, assuming absolute summability of the autocovariances is easily checked, and moreover is necessary to ensure that the spectral density f is well-defined.
For completeness, we also state the corollary in the univariate case.
Corollary 2
Assume \(N=1\). Instead of Assumption A, assume that the autocovariances \(\gamma _h\) are absolutely summable, and that \(f(\lambda )\) does not vanish for \(\lambda \in [-\pi , \pi ]\). Then, the conclusions and formulas of Corollary 1 remain valid.
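As an illustration of the corollary's scope (a sketch under an assumed model, not from the paper), take autocovariances decaying only polynomially, \(\gamma _h = (1+|h|)^{-3}\): these are absolutely summable, and the resulting spectral density is bounded away from zero, so the inverse autocovariances exist and are themselves summable:

```python
import numpy as np

# Sketch of Assumption B with polynomially decaying autocovariances
# gamma_h = (1 + |h|)^{-3} (an illustrative model, not from the paper).
# Since sum_{h>=1} (1 + h)^{-3} < 0.21, the spectral density stays positive,
# so Wiener's lemma applies and the xi_h should be absolutely summable.
n, H = 2 ** 14, 2000
gamma = (1.0 + np.arange(1, H + 1)) ** -3.0

# Build f(lambda) = gamma_0 + 2 sum_h gamma_h cos(h lambda) on the FFT grid
g = np.zeros(n)
g[0] = 1.0
g[1:H + 1] = gamma
g[n - H:] = gamma[::-1]                    # negative lags, wrapped around
f = np.fft.fft(g).real

assert f.min() > 0.5                       # bounded away from zero (Assumption B)

xi = np.fft.ifft(1.0 / f).real             # Fourier coefficients of 1/f
total = np.abs(xi).sum()
tail = np.abs(xi[200:n - 200]).sum()       # mass at lags |h| > 200
assert tail < 0.01 * total                 # summability: negligible tail mass
```

Note that no ARMA structure is assumed here; only the summability of the \(\gamma _h\) and the positivity of f are used.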
4 Illustrations
In this section we consider two illustrations: a VAR(1) and a VMA(1), i.e., a Vector Autoregressive model of order 1 and a Vector Moving Average model of order 1; these are both examples where the autocovariance generating function is well-defined over an annulus. Within the multivariate examples, we also examine the special case \(N=1\).
4.1 VAR(1)
Suppose that \(\{ X_t \}\) is a VAR(1) process with coefficient matrix \(\Phi \) and innovation covariance \(\Sigma \), i.e., \(X_t= \Phi X_{t-1}+Z_t\) where \(Z_t\) is a mean zero white noise with \(\Sigma =\mathbb {E}Z_t Z_t^\prime .\) In this case,

\[ \xi _0 = \Sigma ^{-1} + \Phi ^{\prime } \Sigma ^{-1} \Phi , \qquad \xi _1 = - \Sigma ^{-1} \Phi , \qquad \xi _{-1} = - \Phi ^{\prime } \Sigma ^{-1}. \]
All but three of the inverse autocovariances are zero, and hence the summation in (8) can involve at most three terms, namely those involving \(\xi _{-1}, \xi _0, \xi _1\). As for the structure of \(\Xi \): only the block diagonal, the super-diagonal, and the sub-diagonal can be non-zero, and some of these entries can even be zero—the matrix is not Toeplitz in general. For instance, on the super-diagonal each entry is an inverse autocovariance based at a lag that is a consecutive difference of elements of M, and is zero unless this lag equals 1. Hence, entries of the super-diagonal are zero unless they correspond to consecutive (clustered) missing values.
To make the illustration more concrete, suppose that \(M = \{ 0, 1, 3 \}\), which means there are two consecutive missing values at times 0 and 1, but an isolated missing value at time 3. Then we find

\[ \Xi = \begin{bmatrix} \xi _0 & \xi _{-1} & 0 \\ \xi _1 & \xi _0 & 0 \\ 0 & 0 & \xi _0 \end{bmatrix}. \]
Because of this structure of missing values, \(\pi _0 = \pi _1 = \pi _3 = 0\); also from (8) and the fact that \(\xi _h = 0\) for \(|h| > 1\), we find \(\pi _j = 0\) if \(j > 4 \) or \(j < -1\). So only \(\pi _{-1}\), \(\pi _2\), and \(\pi _4\) are non-zero, and their values depend on which of the three missing value filters we are considering. Setting \(\zeta _h = \xi _0^{-1} \xi _h\), for \(k = 0\) the coefficients are

\[ \pi _{-1} = - { ( I - \zeta _{-1} \zeta _1 ) }^{-1} \zeta _1, \qquad \pi _2 = { ( I - \zeta _{-1} \zeta _1 ) }^{-1} \zeta _{-1}^{\,2}, \qquad \pi _4 = 0, \]
which shows that only observations at either side of the cluster of missing values get weighted; also their weights are different, which makes sense because observation \(X_{-1}\) has more to say about \(X_0\) than does observation \(X_{2}\), because it is closer (and hence has a higher degree of association) in lag distance.
Special case: \(N=1\). In the univariate case, the coefficients simplify to

\[ \pi _{-1} = \frac{ - \zeta _1 }{ 1 - \zeta _1^2 }, \qquad \pi _2 = \frac{ \zeta _1^2 }{ 1 - \zeta _1^2 }, \qquad \pi _4 = 0, \]
using \(\zeta _{-1} = \zeta _1\). Clearly \(\pi _2 = - \zeta _1 \pi _{-1} \) and hence \(|\pi _2 | < |\pi _{-1}|\). Next, the coefficients for the \(k= 1\) filter are

\[ \pi _{-1} = { ( I - \zeta _1 \zeta _{-1} ) }^{-1} \zeta _1^{\,2}, \qquad \pi _2 = - { ( I - \zeta _1 \zeta _{-1} ) }^{-1} \zeta _{-1}, \qquad \pi _4 = 0. \]
Again there is no contribution from \(X_{4}\), but we see the weights for \(\pi _{-1}\) and \(\pi _2\) are similar to those of \(\pi _2\) and \(\pi _{-1}\) (respectively) in the \(k=0\) filter, only with \(\zeta _1\) and \(\zeta _{-1}\) interchanged. This is appropriate, and in the univariate case the weights exactly correspond. Finally, the coefficients for the \(k=3\) filter are

\[ \pi _2 = - \zeta _1, \qquad \pi _4 = - \zeta _{-1}, \qquad \pi _{-1} = 0, \]
which provides equal weights in the univariate case for the observations on either side of the isolated missing value \(X_{3}\). As a final remark, it is simple to see that the weight \(\pi _j\) does not depend on \(\Sigma \) when \(N=1\), because it cancels out, but when \(N > 1\) the matrix \(\Sigma \) cannot be manipulated in this way, and the filter coefficients will depend on \(\Sigma \).
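The claims of this subsection are easy to verify numerically; the following sketch (illustrative parameter values) reproduces the \(M = \{0, 1, 3\}\) filters in the univariate case:

```python
import numpy as np

# Numerical check of the M = {0, 1, 3} illustration for a univariate AR(1)
# (phi illustrative); xi_0 = (1 + phi^2)/sigma2, xi_{+-1} = -phi/sigma2.
phi, sigma2 = 0.6, 1.0
zeta1 = -phi / (1 + phi**2)                # inverse autocorrelation at lag 1

def xi(h):
    if h == 0:
        return (1 + phi**2) / sigma2
    return -phi / sigma2 if abs(h) == 1 else 0.0

M = [0, 1, 3]
Xi_inv = np.linalg.inv(np.array([[xi(r - s) for s in M] for r in M]))

def pi(k, j):
    """Weight on X_j in the optimal interpolator of the missing X_k."""
    u = M.index(k)
    return (1.0 if j == k else 0.0) - sum(
        Xi_inv[u, r] * xi(M[r] - j) for r in range(len(M)))

# k = 0: only the edges of the cluster, X_{-1} and X_2, receive weight
assert abs(pi(0, 4)) < 1e-12
assert abs(pi(0, 2) - (-zeta1) * pi(0, -1)) < 1e-12   # pi_2 = -zeta_1 pi_{-1}
# k = 1: the weights on X_{-1} and X_2 swap relative to the k = 0 filter
assert abs(pi(1, -1) - pi(0, 2)) < 1e-12
assert abs(pi(1, 2) - pi(0, -1)) < 1e-12
# k = 3: equal weights on X_2 and X_4, the neighbors of the isolated gap
assert abs(pi(3, 2) - pi(3, 4)) < 1e-12
assert abs(pi(3, 2) + zeta1) < 1e-12
```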
4.2 VMA(1)
Suppose that \(\{ X_t \}\) is a VMA(1) process written with a minus convention: \( X_t = Z_t - \Theta ^{\prime } Z_{t-1}\), where \(\{ Z_t \}\) is a mean zero white noise with covariance matrix \(\Sigma \). This is an atypical parameterization, with \(-\Theta ^{\prime }\) where one would usually expect \(\Theta \), adopted only so that the following formulas are less cluttered. It follows that
As a result, \(\Xi \) has rs-th block entry \(\xi _{j_r - j_s}\); this block matrix is highly structured, but in general is not block Toeplitz.
For a concrete illustration, suppose that \(M = \{ 0, \ell \}\) for some \(\ell > 0\), which means there are two missing values that are clustered when \(\ell = 1\). Then, writing \(k = j_u\), we find from (8) that

\[ \Xi = \begin{bmatrix} \xi _0 & \xi _{-\ell } \\ \xi _{\ell } & \xi _0 \end{bmatrix}, \qquad \pi _j = 1_{ \{ j = k \} } \, I - { ( \Xi ^{-1} ) }_{u1} \, \xi _{-j} - { ( \Xi ^{-1} ) }_{u2} \, \xi _{\ell - j}. \]
We can directly see that these coefficients satisfy the zero constraints.
Special case: \(N=1\). In the univariate case, we can make some further simplifications, because \(\zeta _h = \theta ^{|h|} \), writing the scalar \(\theta \) for \(\Theta ^{\prime }\); recall that in the multivariate case \(\zeta _h \ne \zeta _{-h}\) in general. If \(k = 0\), then

\[ \pi _j = \begin{cases} - \theta ^{|j|} & j < 0 \\ - ( \theta ^{j} - \theta ^{2 \ell - j} ) / ( 1 - \theta ^{2 \ell } ) & 0< j < \ell \\ 0 & j = 0 \text{ or } j \ge \ell , \end{cases} \]
and if \(k = \ell \) then

\[ \pi _j = \begin{cases} 0 & j \le 0 \text{ or } j = \ell \\ - ( \theta ^{\ell - j} - \theta ^{\ell + j} ) / ( 1 - \theta ^{2 \ell } ) & 0< j < \ell \\ - \theta ^{j - \ell } & j > \ell , \end{cases} \]
One interesting feature is that the coefficients truncate on one side of the filter, past where the second missing value is placed—this differs from the case of a single missing value, where all coefficients (except \(\pi _0\)) are nonzero.
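This truncation feature is easy to confirm numerically (a sketch with illustrative parameter values, using the closed-form univariate inverse autocovariances of the MA(1)):

```python
import numpy as np

# Numerical check of the MA(1) truncation feature for M = {0, ell}
# (theta, ell illustrative). Univariate: zeta_h = theta^{|h|}, and
# xi_h = theta^{|h|} / (sigma2 (1 - theta^2)).
theta, sigma2, ell = 0.5, 1.0, 3

def xi(h):
    return theta ** abs(h) / (sigma2 * (1 - theta**2))

M = [0, ell]
Xi_inv = np.linalg.inv(np.array([[xi(r - s) for s in M] for r in M]))

def pi(k, j):
    u = M.index(k)
    return (1.0 if j == k else 0.0) - sum(
        Xi_inv[u, r] * xi(M[r] - j) for r in range(len(M)))

# Zero constraints at both missing times
assert abs(pi(0, 0)) < 1e-12 and abs(pi(0, ell)) < 1e-12
# k = 0: the filter truncates past the second missing value ...
assert all(abs(pi(0, j)) < 1e-12 for j in range(ell + 1, ell + 6))
# ... but not on the other side, where pi_{-h} = -theta^h as for a single gap
assert abs(pi(0, -1) + theta) < 1e-12
assert abs(pi(0, -2) + theta**2) < 1e-12
# k = ell: the mirror image, truncating for j < 0
assert all(abs(pi(ell, j)) < 1e-12 for j in range(-5, 0))
```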
Data availability
There is no data or code for this article.
Notes
This assumption precludes the handling of mixed frequency data—where a lower frequency section of the time series is viewed as a higher frequency time series observed with missing values—as well as ragged edge data (where the vector components at a particular time point may be partially missing).
Minimization of the MSE matrix is considered in the usual way, in the sense of an ordering of non-negative definite matrices; see “Appendix A”.
Alternatively, we can use the matrix Frobenius norm in the result’s formulation.
References
Artin M (1991) Algebra. Prentice-Hall, New Jersey
Bochner S, Phillips RS (1942) Absolutely convergent Fourier expansions for non-commutative normed rings. Ann Math 43:409–418
Bondon P (2005) Influence of missing values on the prediction of a stationary time series. J Time Ser Anal 26(4):519–525
Brockwell PJ, Davis RA (1991) Time series: theory and methods, 2nd edn. Springer, New York
Cheng R, Pourahmadi M (1997) Prediction with incomplete past and interpolation of missing values. Stat Probab Lett 33(4):341–346
Cleveland WS (1972) The inverse autocorrelations of a time series and their applications. Technometrics 14(2):277–293
Franke J (1984) On the robust prediction and interpolation of time series in the presence of correlated noise. J Time Ser Anal 5(4):227–244
Grenander U, Rosenblatt M (1957) Statistical analysis of stationary time series. Wiley, New York
Hannan EJ (1970) Multiple time series. John Wiley, New York
Inoue A (2020) Closed-form expression for finite predictor coefficients of multivariate ARMA processes. J Multivar Anal 176:104578
Inoue A (2021) Explicit formulas for the inverses of Toeplitz matrices, with applications. arXiv preprint arXiv:2105.01165
Inoue A, Kasahara Y, Pourahmadi M (2016) The intersection of past and future for multivariate stationary processes. Proc Am Math Soc 144(4):1779–1786
Kasahara Y, Pourahmadi M, Inoue A (2009) Duals of random vectors and processes with applications to prediction problems with missing values. Stat Probab Lett 79(14):1637–1646
Kolmogorov AN (1941) Stationary sequences in Hilbert space. Byul Moskov Gos Univ Ser Mat 2(6):3–40
Lepot M, Aubin JB, Clemens FH (2017) Interpolation in time series: an introductive overview of existing methods, their performance criteria and uncertainty assessment. Water 9(10):796
Masani P (1960) The prediction theory of multivariate stochastic processes, III. Acta Math 104(1–2):141–162
McElroy TS, Politis DN (2020) Time series: a first course with bootstrap starter. Chapman and Hall/CRC Press, Boca Raton
Politis DN (1992) Moving average processes and maximum entropy. IEEE Trans Inf Theory 38(3):1174–1177
Pourahmadi M (1989) Estimation and interpolation of missing values of a stationary time series. J Time Ser Anal 10(2):149–169
Rozanov IA (1967) Stationary random processes. Holden-Day
Salehi H (1979) Algorithms for linear interpolator and interpolation error for stationary stochastic processes. Ann Probab 840–846
Shin CE, Sun Q (2013) Wiener’s lemma: localization and various approaches. Appl Math J Chin Univ 28(4):465–484
Sun Q (2007) Wiener’s lemma for infinite matrices. Trans Am Math Soc 359:3099–3123
Wiener N (1932) Tauberian theorems. Ann Math 33(1):1–100
Wiener N (1949) Extrapolation, interpolation, and smoothing of stationary time series: with engineering applications. MIT Press, Cambridge
Acknowledgements
Many thanks are due to Qiyu Sun for his helpful suggestions with regards to the multivariate Wiener’s lemma. We are also grateful to two anonymous referees for helpful suggestions. Research of the second author was partially supported by NSF grant DMS 19-14556.
Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
This report is released to inform interested parties of research and to encourage discussion. The views expressed on statistical issues are those of the authors and not those of the U.S. Census Bureau.
Appendices
Appendix A: Minimal MSE matrix and optimal linear prediction
For symmetric non-negative definite matrices A and B, we say that \(A \ge B\) if and only if \(A - B \) is non-negative definite, which we can denote as \(A - B \ge 0\). A predictor is said to have minimal MSE matrix \(\Sigma \) if and only if any other predictor has MSE matrix \(\widetilde{\Sigma } \ge \Sigma \). Any optimal linear prediction satisfies the normal equations, and hence has minimal MSE matrix, which can be shown as follows.
Let \(\widehat{X}\) be the optimal linear predictor of some random vector X. Then,

\[ \Sigma = \mathbb {E}\big [ ( X - \widehat{X} ) { ( X - \widehat{X} ) }^{\prime } \big ], \]
by definition of MSE matrix. By the normal equations, the error \(X - \widehat{X}\) is orthogonal to all linear functions of the information set (i.e., the random variables used to form the prediction). Hence any other linear predictor \(\widetilde{X}\) has MSE matrix

\[ \widetilde{\Sigma } = \mathbb {E}\big [ ( X - \widetilde{X} ) { ( X - \widetilde{X} ) }^{\prime } \big ] = \Sigma + \mathbb {E}\big [ ( \widehat{X} - \widetilde{X} ) { ( \widehat{X} - \widetilde{X} ) }^{\prime } \big ], \]
using the fact that \(\widehat{X} - \widetilde{X}\) is a linear function of the information set, and hence is orthogonal to \(X - \widehat{X}\). Therefore

\[ \widetilde{\Sigma } - \Sigma = \mathbb {E}\big [ ( \widehat{X} - \widetilde{X} ) { ( \widehat{X} - \widetilde{X} ) }^{\prime } \big ], \]
a non-negative definite matrix; hence, \(\widetilde{\Sigma } \ge \Sigma \). \(\square \)
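The argument above can be illustrated numerically (a sketch with an arbitrary covariance structure, not from the paper; \(A^*\) below denotes the optimal coefficient matrix): the difference of MSE matrices equals \((\widetilde{A} - A^*)\,\mathrm{Var}(Y)\,(\widetilde{A} - A^*)^{\prime }\), which is non-negative definite:

```python
import numpy as np

# Numerical illustration of the MSE-matrix ordering (arbitrary covariance
# structure; not from the paper). The optimal linear predictor of X from Y
# is A* Y with A* = Cov(X,Y) Var(Y)^{-1}; any other linear predictor has an
# MSE matrix that exceeds the optimal one in the PSD ordering.
rng = np.random.default_rng(0)
S = rng.standard_normal((4, 4))
V = S @ S.T + np.eye(4)                    # joint covariance of (X, Y), each in R^2
Vxx, Vxy, Vyy = V[:2, :2], V[:2, 2:], V[2:, 2:]

A_opt = Vxy @ np.linalg.inv(Vyy)
Sigma_opt = Vxx - A_opt @ Vxy.T            # MSE matrix of the projection

A_other = A_opt + 0.3 * rng.standard_normal((2, 2))
Sigma_other = (Vxx - A_other @ Vxy.T - Vxy @ A_other.T
               + A_other @ Vyy @ A_other.T)

# Sigma_other - Sigma_opt = (A_other - A*) Vyy (A_other - A*)' >= 0
diff = Sigma_other - Sigma_opt
assert np.all(np.linalg.eigvalsh(diff) >= -1e-10)
assert np.all(np.linalg.eigvalsh(Sigma_opt) > 0)
```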
Appendix B: Technical proofs
Proof of Theorem 1
First note that Assumption A implies that \(\xi (z) = {\gamma (z)}^{-1}\) is also holomorphic on some annulus about the unit circle, i.e., it has the converging Laurent series expansion \(\xi (z) = \sum _{h \in \mathbb {Z}} \xi _h z^h\); hence, Eq. (6) follows. Now, the normal equations for the optimal predictor are

\[ \mathbb {E}\big [ ( X_{k} - \widehat{X}_{k} ) \, X_{\ell }^{\prime } \big ] = 0 \]
for all \(\ell \not \in M\). This is equivalent to

\[ \gamma _{k - \ell } - \sum _{j \in \mathbb {Z}} \pi _j \, \gamma _{j - \ell } = 0, \]
or

\[ \big \langle \, z^{\ell } \, \big ( I \, z^{-k} - \pi (z^{-1}) \big ) \, \gamma (z) \, \big \rangle = 0. \]
Mapping \(z \mapsto z^{-1}\) by change of variable and consolidating, we have

\[ \big \langle \, z^{-\ell } \, \big ( I \, z^{k} - \pi (z) \big ) \, \gamma (z^{-1}) \, \big \rangle = 0 \]
for all \(\ell \not \in M\). We know the function \( z^{- \ell } ( I z^k - \pi (z) ) \gamma (z^{-1})\) is a Laurent series, and the above equation says that the \(\ell \)th coefficient of \((I z^k - \pi (z) ) \gamma (z^{-1})\) is zero unless \(\ell \in M\). Hence there exist real \(N \times N\)-dimensional coefficient matrices \({ \{ c_h \} }_{h \in M}\) such that

\[ \big ( I \, z^{k} - \pi (z) \big ) \, \gamma (z^{-1}) = \sum _{h \in M} c_h \, z^{h}. \]
Solving for \(\pi (z)\), we obtain

\[ \pi (z) = I \, z^{k} - \sum _{h \in M} c_h \, z^{h} \, \xi (z^{-1}). \quad \mathrm{(A.1)} \]
Now we apply the zero constraints by integrating (A.1) against \(z^{- \ell }\) for each \(\ell \in M\):

\[ 0 = 1_{ \{ j_v = k \} } \, I - \sum _{r=1}^{m} c_{j_r} \, \xi _{ j_r - j_v }, \qquad v = 1, \ldots , m. \]
These are m linear equations in m (matrix) unknowns, and are easily solved. Let \(C = [ c_{j_1}, c_{j_2}, \ldots , c_{j_m} ]\); recalling that \(k = j_u\), our system is written

\[ C \, \Xi = [ \, 0, \ldots , 0, I, 0, \ldots , 0 \, ] = {E}_u, \]
where the identity matrix is in position u among m slots. Therefore \( C = {E}_u \, \Xi ^{-1}\), with invertibility guaranteed because the corresponding spectral density \(\xi (e^{-i \lambda })\) is invertible for all \(\lambda \). Plugging into (A.1) yields (7). Note that \( {E}_u \, \Xi ^{-1}\) is just the u-th block row of \(\Xi ^{-1}\).
To compute the MSE, recall that B is the backshift operator. The prediction error is

\[ X_{k} - \widehat{X}_{k} = \sum _{h \in M} c_h \, \xi (B) \, X_{h}, \]
which has mean zero with variance matrix

\[ \sum _{r=1}^{m} \sum _{s=1}^{m} c_{j_r} \, \xi _{ j_r - j_s} \, c_{j_s}^{\prime } = C \, \Xi \, C^{\prime } = {E}_u \, \Xi ^{- \prime } \, {E}_u^{\prime }; \]
this is the uu-th block matrix of \(\Xi ^{-1}\). \(\square \)
Proof of Lemma 1
The crux of the argument is that for each \(\lambda \) the matrix \(f(\lambda )\) is invertible if and only if its determinant is non-zero; but \(\det f(\lambda )\) is a scalar quantity, to which we can apply the original Wiener’s lemma. Recall the formula

\[ { f(\lambda )}^{-1} = \frac{ { f(\lambda )}^{\sharp } }{ \det f(\lambda ) }, \quad \mathrm{(A.2)} \]
where \({ f(\lambda )}^{\sharp }\) is called the adjoint matrix. Note that \({ f(\lambda )}^{\sharp }\) is well-defined for any \(\lambda \), because its matrix entries are given by finite sums and products of the entries of \(f(\lambda )\); see e.g. Artin (1991). First, we establish that the Fourier coefficients of the adjoint have square summable matrix 2-norm. Let the kth Fourier coefficient of the adjoint be denoted \(\tau _k = \langle z^{-k} { \gamma (z)}^{\sharp } \rangle \). Let \({ \Vert \cdot \Vert }_F\) denote the Frobenius norm; then, by the Plancherel identity we have

\[ \sum _{k \in \mathbb {Z}} { \Vert \tau _k \Vert }_F^2 = \frac{1}{2 \pi } \int _{-\pi }^{\pi } { \Vert { f(\lambda )}^{\sharp } \Vert }_F^2 \, d\lambda , \]
which also shows that a sufficient condition for the \(\{ \tau _k \}\) to have square summable matrix 2-norm (in view of the fact that \({ \Vert A \Vert }_2 \le { \Vert A \Vert }_F\) for any matrix A) is that \( { \Vert {\gamma (z)}^{\sharp } \Vert }_F^2\) has finite integral. By the definition of adjoint,

\[ { \big ( { f(\lambda )}^{\sharp } \big ) }_{rs} = {(-1)}^{r+s} \, \det f(\lambda )_{s,r}, \]
where the s, r subscript means that we remove the s-th row and r-th column before computing the determinant. Because \( f (\lambda )\) is Hermitian, we do not need the absolute value bars, and for any r, s we obtain
In the above, we have used the fact that for a non-negative definite matrix the determinant is bounded by the Nth power of the largest eigenvalue; we also used a well-known expression for the matrix 2-norm, took the sub-matrix of \(f( \lambda )\) term by term, used the triangle inequality, noted that the 2-norm is bounded by the Frobenius norm, and finally used the fact that the Frobenius norm of a sub-matrix is bounded above by the Frobenius norm of the whole matrix. Because the matrix Frobenius norm is bounded by a constant times the matrix 2-norm, it follows that the Frobenius norm of the \(\{ \gamma _k \}\) is summable, and hence the \(\{ \tau _k \}\) have square summable matrix 2-norm.
Next, we show that \(\det f(\lambda )\) has absolutely summable Fourier coefficients. From the identity

\[ \det f(\lambda ) \, I_N = f(\lambda ) \, { f(\lambda )}^{\sharp }, \]
we find that the Fourier coefficients of \( \det f(\lambda ) \, I_N \) are given by the convolution

\[ d_k \, I_N = \sum _{j \in \mathbb {Z}} \gamma _j \, \tau _{k-j}, \]

where \(d_k\) denotes the kth Fourier coefficient of \(\det f(\lambda )\).
Applying the triangle inequality and the Hölder inequality for sequences (and letting \(C = {\Vert I_N \Vert }_2\)),
We have already shown square summability of the adjoint’s Fourier coefficients (with respect to the matrix 2-norm), and the \(\{ \gamma _j \}\) have summable, and hence square summable, matrix 2-norm by Assumption B. Hence, \(\det f(\lambda )\) has absolutely summable Fourier coefficients, and we can apply Wiener’s lemma, thereby concluding that \(1/ \det f(\lambda )\) has an absolutely convergent Fourier series.
Finally, having shown the square summability of both factors on the right hand side of Eq. (A.2), a Hölder inequality argument similar to that leading to Eq. (A.3) implies that the Fourier coefficients of \( { f(\lambda )}^{-1}\) have summable matrix 2-norm. This completes the proof, but we also show that any component of f or \(f^{-1}\) is absolutely integrable:

\[ \frac{1}{2 \pi } \int _{-\pi }^{\pi } | f_{rs}(\lambda ) | \, d\lambda \le \sum _{h \in \mathbb {Z}} { \Vert \gamma _h \Vert }_2, \]
which is finite for any \(1 \le r,s \le N\). A similar argument holds for \(f^{-1}\). \(\square \)
McElroy, T.S., Politis, D.N. Optimal linear interpolation of multiple missing values. Stat Inference Stoch Process 25, 471–483 (2022). https://doi.org/10.1007/s11203-022-09269-5