Abstract
We propose a generalized moments LASSO estimator, combining the LASSO with GMM, for penalized variable selection and estimation under the spatial error model with spatially autoregressive errors. We establish parameter consistency and sign consistency of variable selection for the proposed estimator, both in the low-dimensional setting, where the parameter dimension p is smaller than the sample size n, and in the high-dimensional setting, where p exceeds and grows with n. Finite sample performance of the method is examined by simulation and compared against the LASSO for IID data. The methods are applied to estimation of a spatial Durbin model for the Aveiro housing market (Portugal).
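The two-step structure of the estimator can be sketched in code. This is a minimal illustrative sketch, not the authors' implementation: the function names are ours, the spatial parameter estimate is taken as given (standing in for the Kelejian and Prucha GMM step), and the LASSO is solved by plain coordinate descent.

```python
# Sketch of the generalized moments LASSO (illustrative; not the paper's code).
# Step 1 (not shown): estimate rho by GMM, e.g. Kelejian and Prucha (1999).
# Step 2: transform the model by (I - rho_hat * M) and run the LASSO.

def mat_vec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def spatial_transform(X, y, M, rho):
    """Apply (I - rho*M) to y and to each column of X."""
    n, p = len(X), len(X[0])
    My = mat_vec(M, y)
    y_t = [y[i] - rho * My[i] for i in range(n)]
    X_t = [[X[i][j] - rho * sum(M[i][k] * X[k][j] for k in range(n))
            for j in range(p)] for i in range(n)]
    return X_t, y_t

def soft_threshold(z, g):
    """Soft-thresholding operator, the proximal map of the l1 penalty."""
    if z > g:
        return z - g
    if z < -g:
        return z + g
    return 0.0

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2)||y - Xb||^2 + lam * ||b||_1."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # partial residual excluding coordinate j
            r = [y[i] - sum(X[i][k] * b[k] for k in range(p) if k != j)
                 for i in range(n)]
            num = sum(X[i][j] * r[i] for i in range(n))
            den = sum(X[i][j] ** 2 for i in range(n))
            b[j] = soft_threshold(num, lam) / den if den > 0 else 0.0
    return b

def gm_lasso(X, y, M, rho_hat, lam):
    """Generalized moments LASSO: LASSO on the (I - rho_hat*M)-transformed data."""
    X_t, y_t = spatial_transform(X, y, M, rho_hat)
    return lasso_cd(X_t, y_t, lam)
```

In a noiseless design with the true spatial parameter, `gm_lasso` with zero penalty recovers the coefficients exactly, while a very large penalty shrinks every coefficient to zero.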
References
Ahrens, A. and Bhattacharjee, A. (2015). Two-step lasso estimation of the spatial weights matrix. Econometrics (MDPI) 3, 1, 128–155.
Ando, T. and Bai, J. (2016). Panel data models with grouped factor structure under unknown group membership. J. Appl. Econ. 31, 163–191.
Anselin, L. (1988). Spatial Econometrics: Methods and Models. Kluwer Academic, Boston.
Bai, Z.D. (1999). Methodologies in spectral analysis of large dimensional random matrices: A review. Statistica Sinica 9, 611–677.
Bailey, N., Holly, S. and Pesaran, M.H. (2016). A two-stage approach to spatio-temporal analysis with strong and weak cross-sectional dependence. J. Appl. Econ. 31, 249–280.
Belloni, A. and Chernozhukov, V. (2011). High dimensional sparse econometric models: An introduction. arXiv:1106.5242v2.
Belloni, A. and Chernozhukov, V. (2013). Least squares after model selection in high-dimensional sparse models. Bernoulli 19, 521–547.
Belloni, A., Chernozhukov, V. and Wang, L. (2011). Square-root LASSO: pivotal recovery of sparse signals via conic programming. Biometrika 98, 791–806.
Belloni, A., Chen, D., Chernozhukov, V. and Hansen, C. (2012). Sparse models and methods for optimal instruments with an application to eminent domain. Econometrica 80, 2369–2429.
Belloni, A., Chernozhukov, V. and Wei, Y. (2016). Post-selection inference for generalized linear models with many controls. J. Bus. Econ. Stat., forthcoming.
Bhattacharjee, A., Castro, E.A. and Marques, J.L. (2012). Understanding spatial diffusion with factor-based hedonic pricing models: the urban housing market of Aveiro, Portugal. Spat. Econ. Anal. 7, 1, 133–167.
Bhattacharjee, A., Castro, E., Maiti, T. and Marques, J. (2016). Endogenous spatial regression and delineation of submarkets: A new framework with application to housing markets. J. Appl. Econ. 31, 32–57.
Bickel, P.J., Ritov, Y. and Tsybakov, A.B. (2009). Simultaneous analysis of LASSO and Dantzig selector. Ann. Stat. 37, 1705–1732.
Brady, R.R. (2011). Measuring the diffusion of housing prices across space and over time. J. Appl. Econ. 26, 2, 213–231.
Bühlmann, P. and van de Geer, S. (2011). Statistics for High-Dimensional Data. Springer.
Caner, M. and Zhang, H.H. (2014). Adaptive elastic net for generalized methods of moments. J. Bus. Econ. Stat. 32, 1, 30–47.
Castle, J.L. and Hendry, D.F. (2014). Model selection in under-specified equations with breaks. J. Econ. 178, 286–293.
Castle, J.L., Doornik, J.A., Hendry, D.F. and Pretis, F. (2015). Detecting location shifts during model selection by step-indicator saturation. Econometrics (MDPI) 3, 2, 240–264.
Chudik, A. and Pesaran, M.H. (2011). Infinite-dimensional VARs and factor models. J. Econ. 163, 1, 4–22.
Chudik, A., Grossman, V. and Pesaran, M.H. (2016). A multi-country approach to forecasting output growth using PMIs. J. Econ., forthcoming.
Cliff, A.D. and Ord, J.K. (1973). Spatial Autocorrelation. Pion, London.
Cuaresma, C. and Feldkircher, M. (2013). Spatial filtering, model uncertainty and the speed of income convergence in Europe. J. Appl. Econ. 28, 4, 720–741.
Feng, W., Lim, C., Maiti, T. and Zhang, Z. (2016). Spatial regression and estimation of disease risks: A clustering based approach. Stat. Anal. Data Min., forthcoming.
Flores-Lagunes, A. and Schnier, K.E. (2012). Estimation of sample selection models with spatial dependence. J. Appl. Econ. 27, 2, 173–204.
Friedman, J., Hastie, T. and Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33, 1–22.
Fu, W. and Knight, K. (2000). Asymptotics for LASSO-type estimators. Ann. Stat. 28, 1356–1378.
Geyer, C.J. (1996). On the asymptotics of convex stochastic optimization. Unpublished manuscript.
Hall, P. and Horowitz, J.L. (2005). Nonparametric methods for inference in the presence of instrumental variables. Ann. Stat. 33, 2904–2929.
Hendry, D.F., Johansen, S. and Santos, C. (2008). Automatic selection of indicators in a fully saturated regression. Comput. Stat. 33, 317–335; Erratum, 337–339.
Ishwaran, H. and Rao, J.S. (2005). Spike and slab variable selection: Frequentist and Bayesian strategies. Ann. Stat. 33, 2, 730–773.
Johansen, S. and Nielsen, B. (2009). An analysis of the indicator saturation estimator as a robust regression estimator. In The Methodology and Practice of Econometrics (J.L. Castle and N. Shephard, eds.). Oxford University Press, Oxford, pp. 1–36.
Kapoor, M., Kelejian, H.H. and Prucha, I.R. (2007). Panel data models with spatially correlated error components. J. Econ. 140, 97–130.
Kelejian, H.H. and Prucha, I.R. (1999). A generalized moments estimator for the autoregressive parameter in a spatial model. Int. Econ. Rev. 40, 509–533.
Kelejian, H.H. and Prucha, I.R. (2010). Specification and estimation of spatial autoregressive models with autoregressive and heteroskedastic disturbances. J. Econ. 157, 53–67.
Kock, A.B. and Callot, L. (2015). Oracle inequalities for high dimensional vector autoregressions. J. Econ. 186, 2, 325–344.
Lam, C. and Souza, P.C. (2016). Regularization for spatial panel time series using the adaptive LASSO. J. Reg. Sci., forthcoming.
Lee, L.-F. (2004). Asymptotic distributions of quasi-maximum likelihood estimators for spatial autoregressive models. Econometrica 72, 6, 1899–1925.
Lee, L.-F. and Yu, J. (2010). Estimation of spatial autoregressive panel data models with fixed effects. J. Econ. 154, 165–185.
Lee, L.-F. and Yu, J. (2016). Identification of spatial Durbin panel models. J. Appl. Econ. 31, 1, 133–162.
Lin, X. and Lee, L.-F. (2010). GMM estimation of spatial autoregressive models with unknown heteroskedasticity. J. Econ. 157, 1, 34–52.
Lounici, K. (2008). Sup-norm convergence rate and sign concentration property of Lasso and Dantzig estimators. Electron. J. Stat. 2, 90–102.
Meinshausen, N. and Bühlmann, P. (2006). High dimensional graphs and variable selection with the LASSO. Ann. Stat. 34, 1436–1462.
Nandy, S., Lim, C. and Maiti, T. (2016). Additive model building for spatial regression. J. R. Stat. Soc. Series B, forthcoming.
Nowak, A. and Smith, P. (2017). Textual analysis in real estate. J. Appl. Econ., forthcoming.
Pesaran, M.H., Schuermann, T. and Weiner, S.M. (2004). Modelling regional interdependencies using a global error-correcting macroeconometric model. J. Bus. Econ. Stat. 22, 2, 129–162.
Pollard, D. (1991). Asymptotics for least absolute deviation regression estimators. Econ. Theory 7, 186–199.
Stock, J.H. and Watson, M.W. (2002). Forecasting using principal components from a large number of predictors. J. Am. Stat. Assoc. 97, 1167–1179.
Su, L. and Yang, Z. (2015). QML estimation of dynamic panel data models with spatial errors. J. Econ. 185, 1, 230–258.
Tibshirani, R. (1996). Regression shrinkage and selection via the LASSO. J. R. Stat. Soc. Series B 58, 267–288.
Varian, H.R. (2014). Big data: new tricks for econometrics. J. Econ. Perspect. 28, 3–28.
Whittle, P. (1954). On stationary processes in the plane. Biometrika 41, 434–449.
Yu, J., de Jong, R.M. and Lee, L.-F. (2008). Quasi-maximum likelihood estimators for spatial dynamic panel data with fixed effects when both N and T are large. J. Econ. 146, 1, 118–134.
Zhao, P. and Yu, B. (2006). On model selection consistency of LASSO. J. Mach. Learn. Res. 7, 2541–2563.
Zou, H. (2006). The adaptive lasso and its oracle properties. J. Am. Stat. Assoc. 101, 1418–1429.
Zou, H. and Zhang, H. (2009). On the adaptive elastic-net with a diverging number of parameters. Ann. Stat. 37, 1733–1751.
Acknowledgments
We thank organisers and participants in seminars at the University of Illinois and Indian Statistical Institute, the USC Dornsife INET Conference on Big Data in Economics, and invited presentations at the 26th (EC)2 Conference and American Statistical Association JSM (Business & Economic Statistics Section) for valuable comments and suggestions. The usual disclaimer applies.
Appendix: Proofs of technical results
Proof of Theorem 1.
Define a random function of ρ and ϕ,
By the definition of the LASSO estimator, for any fixed ρ, Zn(ϕ,ρ) is minimized at \(\phi =\hat {\beta }_{L}(\rho )\).
However, we do not know the true value of ρ; instead, we use the GMM estimator \(\hat {\rho }_{n}\) as a substitute. Then the function \(Z_{n}(\phi , \hat {\rho }_{n})\) is minimized at the generalized moments LASSO estimator \( \phi =\hat {\beta }_{L}(\hat {\rho }_{n})\). Furthermore, denote by β the true value of the unknown parameter, and let
Then, it is easy to see that for any given ρ, Z(β,ρ) is minimized at ϕ = β. For each ϕ ∈ Rp,
where
Since \(\frac {\lambda _{n}}{n}\rightarrow 0\), we have Φ3 → 0. Also,
by Assumption 6 and the weak law of large numbers.
Moreover, since \(\hat {\rho }_{n}\) is a consistent estimator of ρ,
Therefore, \(Z_{n}(\phi ,\hat {\rho }_{n})-Z(\phi ,\rho )\rightarrow _{p}0\) for any ϕ ∈ Rp. Combined with the fact that \(Z_{n}(\phi ,\hat {\rho }_{n})\) is a convex function of ϕ, we have
for any compact set K, and \(\hat {\beta }_{L}(\hat {\rho }_{n})=O_{p}(1)\), by applying the convexity lemma in Pollard (1991). From the above result we have
which implies that
For asymptotic normality of the estimator, we need λn to grow slowly; we further assume that \(\lambda _{n}=O(\sqrt {n})\). From the above proof, we already know that
is minimized at \(\phi =\hat {\beta }_{L}(\hat {\rho }_{n})\). Now define \(w=\sqrt { n}(\phi -\beta )\). Then \(nZ_{n}(\phi ,\hat {\rho }_{n})\) can be treated as a function of w and
is minimized at \(\sqrt {n}\left (\hat {\beta }_{L}(\hat {\rho }_{n})-\beta \right ) \). The same is true for
It follows that
Also, define
where
It is easy to see that
where U ∼ N(0,σ2C(ρ)). Also
where we use the consistency of \(\hat {\rho }_{n}\) established in the proof above. Thus Vn(w) →DV (w), and since Vn is convex and V has a unique minimum, it follows from Geyer (1996) that
□
Proof of Proposition 1.
By the definition of the estimator in the second estimation step,
where the estimator is the minimizer of the penalized least squares criterion when the true spatial parameter ρ is replaced by its consistent estimator \(\hat { \rho }_{n}\). Let φ = ϕ − β, which equals \(\frac {w}{ \sqrt {n}}\) in the proof of Theorem 1. The following argument is similar to the proof of Theorem 1. Define
Then
Separate Dn(φ) into two parts, Dn1(φ) and Dn2(φ). Let
where
Differentiating Dn(φ) with respect to φ, we have
Note here that both \(\hat {\varphi }\) and Wn are vectors of dimension p × 1. Let \(\hat {\varphi }(1)\), Wn(1) and \(\hat {\varphi }(2)\), Wn(2) denote the first q and the last p − q entries of \(\hat {\varphi }\) and Wn, respectively. Then by definition:
Hence if there exists \(\hat {\varphi }\) such that
then by Lemma 1 and the uniqueness of the LASSO solution, \(sign(\hat {\beta }_{L}(\hat {\rho }_{n})(1))=sign(\beta (1))\) and \(\hat {\beta }_{L}(\hat {\rho } _{n})(2)=\beta (2)= 0\).
The existence of such \(\hat {\varphi }\) is implied by
where (A.1) coincides with An and (A.2) contains Bn. The result of Proposition 1 follows. □
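For reference, the sign-consistency argument above rests on the standard subgradient (KKT) conditions for the LASSO. In generic notation for the transformed data (writing \(\tilde{X}=(I-\hat{\rho}_n M_n)X_n\) and \(\tilde{y}=(I-\hat{\rho}_n M_n)y_n\); the exact normalization of \(\lambda_n\) here is our assumption, and may differ from the paper's), they read:

```latex
\tilde{X}_{(j)}'\bigl(\tilde{y} - \tilde{X}\hat{\phi}\bigr)
  = \lambda_n \,\operatorname{sign}(\hat{\phi}_j)
  \quad \text{if } \hat{\phi}_j \neq 0,
\qquad
\bigl|\tilde{X}_{(j)}'\bigl(\tilde{y} - \tilde{X}\hat{\phi}\bigr)\bigr|
  \leq \lambda_n
  \quad \text{if } \hat{\phi}_j = 0 .
```

Requiring the first condition to hold with the true signs on the first q coordinates, and the second to hold strictly on the remaining p − q coordinates, yields conditions of the form (A.1) and (A.2).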
Proof of Theorem 2.
From Proposition 1, we have
Thus,
where \(z^{n}=({z_{1}^{n}},{\cdots } ,{z_{q}^{n}})^{\prime }=(C_{11}^{n})^{-1}W^{n}(1)\), \(\zeta ^{n}=({\zeta _{1}^{n}},{\cdots } ,\zeta _{p-q}^{n})^{\prime }=C_{21}^{n}\)\((C_{11}^{n})^{-1}W^{n}(1)-W^{n}(2)\) and \(b^{n}=({b_{1}^{n}},{\cdots } ,{b_{q}^{n}})^{\prime }=(C_{11}^{n})^{-1}sign(\beta (1))\).
Since \(\hat {\rho }_{n}\) is a consistent estimator of ρ, arguing as in the proof of Theorem 1 and under the regularity conditions in Assumption 6, we have
This is because
The final step follows from Assumptions 3 and 6, together with the consistency of \(\hat {\rho }_{n}\). Thus, \((C_{11}^{n}(\hat {\rho }_{n}))^{-1}\rightarrow _{p}(C_{11}(\rho ))^{-1}\). Similarly,
Since \(X_{n}^{\prime }(I-\rho M_{n}^{\prime })\epsilon _{n}/\sqrt {n} \rightarrow _{d}N(0,\sigma ^{2}C(\rho ))\), we have
Thus Wn(1) →DN(0,σ2C11(ρ)). Applying Slutsky’s theorem, we have
Making use of the above result, combined with the fact that
we have
Hence all the \({z_{i}^{n}}\) and \({\zeta _{i}^{n}}\) converge in distribution to Gaussian random variables with mean 0 and finite variance bounded by \(s^{2}(\rho )\) for some function s(ρ). For t > 0, the Gaussian distribution has its tail probability bounded by
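The bound invoked here is presumably the standard Gaussian tail inequality: for a mean-zero Gaussian variable Z with variance at most \(s^{2}(\rho)\),

```latex
P\bigl(|Z| > t\bigr) \;\leq\; 2\exp\!\Bigl(-\frac{t^{2}}{2\,s^{2}(\rho)}\Bigr),
\qquad t > 0 .
```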
Since λn/n → 0 and \(\lambda _{n}/n^{\frac {1+c}{2} }\geqslant r\) with 0 ≤ c < 1, we have
and
Theorem 2 follows.□
Proof of Proposition 2.
Let \(r=\sigma \sqrt {\frac {\log 2p}{n}}\), and denote
Then
Further define
therefore,
where the final inequality holds because, if A1 + A2 > r, then at least one of A1 > r/2 and A2 > r/2 must hold. Since \(\hat {\rho }_{n}\) is a consistent estimator of ρ, that is, \(\hat {\rho }_{n}\rightarrow _{p}\rho \), we have, for all t > 0, defining \(c=\frac {1}{2}\exp (-\frac {t^{2}}{2})\), when n is large enough,
and
Then, it is easy to see that
and
Next, we need the tail probability of \(\max _{1\leqslant j\leqslant p}|\epsilon _{n}^{\prime }(I-\rho M_{n}^{\prime })^{-1}M_{n}^{\prime }M_{n}X_{n}^{(j)}|\) and \(\max _{1\leqslant j\leqslant p}|\epsilon _{n}^{\prime }(I-\rho M_{n}^{\prime })^{-1}(M_{n}^{\prime }+M_{n})X_{n}^{(j)}|\). Note, however, that we do not assume a Gaussian distribution for the error 𝜖n; we only assume zero mean and finite second moments (Assumption ??). Thus, we use the moment inequality derived from Nemirovski's inequality:
for any design matrix U, with U(j) as its j-th column. By assumption, the row and column sums of Mn and (I − ρMn)− 1 are uniformly bounded in absolute value, and each element of Xn is nonstochastic and uniformly bounded in absolute value. Moreover, if An and Bn are matrices conformable for multiplication whose row and column sums are uniformly bounded in absolute value, then the row and column sums of AnBn are also uniformly bounded in absolute value; this result extends to products of three or more matrices.
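One common statement of Nemirovski's inequality (see, e.g., Bühlmann and van de Geer, 2011) is the following; the exact form used in the paper may differ. For independent mean-zero random vectors \(Y_1,\dots,Y_n \in \mathbb{R}^p\) with \(p \geq 2\),

```latex
\mathbb{E}\,\Bigl\| \sum_{i=1}^{n} Y_i \Bigr\|_{\infty}^{2}
  \;\leq\; 8\log(2p) \sum_{i=1}^{n} \mathbb{E}\,\| Y_i \|_{\infty}^{2} .
```

Applied with \(Y_i = \epsilon_i U_{i\cdot}\), this gives \(\mathbb{E}\max_{1\leq j\leq p}|\epsilon_n' U^{(j)}|^{2} \leq 8\sigma^{2}\log(2p)\sum_{i}\max_{j} U_{ij}^{2}\), which is of order \(n\log(2p)\) when the elements of U are uniformly bounded.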
Thus, the row and column sums of I − ρMn, \((I-\rho M_{n}^{\prime })^{-1}M_{n}^{\prime }M_{n}\) and \((I-\rho M_{n}^{\prime })^{-1}(M_{n}^{\prime }+M_{n})\) are all uniformly bounded in absolute value. Hence every element of the matrices \((I-\rho M_{n})X_{n}^{(j)}\), \((I-\rho M_{n}^{\prime })^{-1}M_{n}^{\prime }M_{n}X_{n}^{(j)}\) and \((I-\rho M_{n}^{\prime })^{-1}(M_{n}^{\prime }+M_{n})X_{n}^{(j)}\) is bounded in absolute value; denote their common bound by κB.
Then, we have
and similarly,
As a result,
Substituting the above probability bounds, we have
This implies the result:
□
Proof of Theorem 3.
On the set I, with \(\lambda _{n}\geqslant 2\lambda _{0}\),
where the final inequality follows from the fact that
Further, combining the oracle inequality with the proposition regarding the set I, the result follows. □
Proof of Theorem 4.
Using the result of Proposition 1 and the line of proof of Theorem 2, we have
where \(z^{n}=({z_{1}^{n}},{\cdots } ,{z_{q}^{n}})^{\prime }=(C_{11}^{n})^{-1}W^{n}(1)\), \(\zeta ^{n}=({\zeta _{1}^{n}},{\cdots } ,\zeta _{p-q}^{n})^{\prime }=\)\(C_{21}^{n}(C_{11}^{n})^{-1}W^{n}(1)-W^{n}(2)\) and \(b^{n}=({b_{1}^{n}},{\cdots } ,{b_{q}^{n}})^{\prime }=(C_{11}^{n})^{-1}sign(\beta (1))\).
Replace all occurrences of \(\hat {\rho }_{n}\) in the notation above with the true parameter value ρ, and denote the resulting quantities by \({C_{0}^{n}}\), \({W_{0}^{n}}\), \( {z_{0}^{n}}\), \({\zeta _{0}^{n}}\) and \({b_{0}^{n}}\) for notational simplicity. Then each element in the first term on the right-hand side of the above inequality is:
for any δ > 0.
Since \(C^{n}-{C_{0}^{n}}\rightarrow _{p}0\) and \(W^{n}-{W_{0}^{n}}\rightarrow _{p}0\), we have \(z^{n}-{z_{0}^{n}}=o_{p}(1)\), \(\zeta ^{n}-{\zeta _{0}^{n}}=o_{p}(1)\) and \( b^{n}-{b_{0}^{n}}=o_{p}(1)\). Note that here we cannot use \(C=\lim _{n\rightarrow \infty }\frac {1}{n} X_{n}^{\prime }{\Sigma } (\rho )X_{n}\) as defined in Assumption 6, since it may not be nonsingular, or may not even converge, in the high-dimensional context.
Thus, A1 + A3 + A4 < 3δ, and
Now if we write \({z_{0}^{n}}=H_{A}^{\prime }\epsilon _{n}\), where \(H_{A}^{\prime }=({h_{1}^{a}},{\cdots } ,{h_{q}^{a}})^{\prime }=(C_{11}^{0})^{-1} \frac {1}{\sqrt {n}}\) [(I − ρMn)Xn] (1)′, then
Therefore, \(z_{0i}^{n}=({h_{i}^{a}})^{\prime }\epsilon _{n}\) with
Similarly,
If we write \({\zeta _{0}^{n}}=H_{B}^{\prime }\epsilon _{n}\) where \(H_{B}^{\prime }=({h_{1}^{b}},{\cdots } ,h_{p-q}^{b})^{\prime }=C_{21}^{0}(C_{11}^{0})^{-1}n^{-\frac {1}{2}}[(I-\rho M_{n})X_{n}](1)^{\prime }-n^{-\frac {1}{2}}[(I-\rho M_{n})X_{n}](2)^{\prime }\), then
Since I − [(I −ρMn)Xn](1){[(I −ρMn)Xn](1)′[(I −ρMn)Xn](1)}− 1[(I −ρMn)Xn](1)′ has eigenvalues between 0 and 1, \(\zeta _{0i}^{n}=({h_{i}^{b}})^{\prime }\epsilon _{n}\) with
Also note that,
Now given (A.3) and (A.4), it can be shown that \( E({\epsilon _{i}^{n}})^{4}<\infty \) in Assumption 1 implies \( E({z_{i}^{n}})^{4}<\infty \) and \(E({\zeta _{i}^{n}})^{4}<\infty \). In fact, given any constant n-dimensional vector α,
For IID errors with bounded 4th moments, we have their tail probability bounded by
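The bound here is presumably the fourth-moment Markov inequality: for a mean-zero random variable z with \(\mathbb{E}\,z^{4}<\infty\),

```latex
P\bigl(|z| > t\bigr) \;\leq\; \frac{\mathbb{E}\,z^{4}}{t^{4}}, \qquad t > 0 .
```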
Therefore, for \(\lambda _{n}/\sqrt {n}=O(n^{\frac {c_{2}-c_{1}}{2}})\), using (A.5), if we make δ arbitrarily small, we have
where r(ρ) is the bound for the absolute value of the elements in the matrix \((C_{11}^{n}(\rho ))^{-1}\). Likewise,
Adding these two terms, Theorem 4 follows.□
Cai, L., Bhattacharjee, A., Calantone, R. et al. Variable Selection with Spatially Autoregressive Errors: A Generalized Moments LASSO Estimator. Sankhya B 81 (Suppl 1), 146–200 (2019). https://doi.org/10.1007/s13571-018-0176-z