Abstract
We propose a generalized moments LASSO estimator, combining the LASSO with GMM, for penalized variable selection and estimation under the spatial error model with spatially autoregressive errors. We establish parameter consistency and sign consistency of variable selection for the proposed estimator, both in the low-dimensional setting, where the parameter dimension p is smaller than the sample size n, and in the high-dimensional setting, where p exceeds and grows with n. Finite sample performance of the method is examined by simulation and compared against the LASSO for IID data. The methods are applied to estimation of a spatial Durbin model for the Aveiro housing market (Portugal).
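The two-step structure of the estimator can be sketched in code. This is a minimal illustrative sketch, not the authors' implementation: the function names are ours, the spatial parameter estimate is taken as given (standing in for the Kelejian and Prucha GMM step), and the LASSO is solved by plain coordinate descent.

```python
# Sketch of the generalized moments LASSO (illustrative; not the paper's code).
# Step 1 (not shown): estimate rho by GMM, e.g. Kelejian and Prucha (1999).
# Step 2: transform the model by (I - rho_hat * M) and run the LASSO.

def mat_vec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def spatial_transform(X, y, M, rho):
    """Apply (I - rho*M) to y and to each column of X."""
    n, p = len(X), len(X[0])
    My = mat_vec(M, y)
    y_t = [y[i] - rho * My[i] for i in range(n)]
    X_t = [[X[i][j] - rho * sum(M[i][k] * X[k][j] for k in range(n))
            for j in range(p)] for i in range(n)]
    return X_t, y_t

def soft_threshold(z, g):
    """Soft-thresholding operator, the proximal map of the l1 penalty."""
    if z > g:
        return z - g
    if z < -g:
        return z + g
    return 0.0

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2)||y - Xb||^2 + lam * ||b||_1."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # partial residual excluding coordinate j
            r = [y[i] - sum(X[i][k] * b[k] for k in range(p) if k != j)
                 for i in range(n)]
            num = sum(X[i][j] * r[i] for i in range(n))
            den = sum(X[i][j] ** 2 for i in range(n))
            b[j] = soft_threshold(num, lam) / den if den > 0 else 0.0
    return b

def gm_lasso(X, y, M, rho_hat, lam):
    """Generalized moments LASSO: LASSO on the (I - rho_hat*M)-transformed data."""
    X_t, y_t = spatial_transform(X, y, M, rho_hat)
    return lasso_cd(X_t, y_t, lam)
```

In a noiseless design with the true spatial parameter, `gm_lasso` with zero penalty recovers the coefficients exactly, while a very large penalty shrinks every coefficient to zero.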
References
Ahrens, A. and Bhattacharjee, A. (2015). Two-step lasso estimation of the spatial weights matrix. Econometrics (MDPI) 3, 1, 128–155.
Ando, T. and Bai, J. (2016). Panel data models with grouped factor structure under unknown group membership. J. Appl. Econ. 31, 163–191.
Anselin, L. (1988). Spatial Econometrics: Methods and Models. Kluwer Academic, Boston.
Bai, Z.D. (1999). Methodologies in spectral analysis of large dimensional random matrices: A review. Statistica Sinica 9, 611–677.
Bailey, N., Holly, S. and Pesaran, M.H. (2016). A two-stage approach to spatio-temporal analysis with strong and weak cross-sectional dependence. J. Appl. Econ. 31, 249–280.
Belloni, A. and Chernozhukov, V. (2011). High dimensional sparse econometric models: An introduction. arXiv:1106.5242v2.
Belloni, A. and Chernozhukov, V. (2013). Least squares after model selection in high-dimensional sparse models. Bernoulli 19, 521–547.
Belloni, A., Chernozhukov, V. and Wang, L. (2011). Square-root LASSO: pivotal recovery of sparse signals via conic programming. Biometrika 98, 791–806.
Belloni, A., Chen, D., Chernozhukov, V. and Hansen, C. (2012). Sparse models and methods for optimal instruments with an application to eminent domain. Econometrica 80, 2369–2429.
Belloni, A., Chernozhukov, V. and Wei, Y. (2016). Post-selection inference for generalized linear models with many controls. J. Bus. Econ. Stat., forthcoming.
Bhattacharjee, A., Castro, E.A. and Marques, J.L. (2012). Understanding spatial diffusion with factor-based hedonic pricing models: the urban housing market of Aveiro, Portugal. Spat. Econ. Anal. 7, 1, 133–167.
Bhattacharjee, A., Castro, E., Maiti, T. and Marques, J. (2016). Endogenous spatial regression and delineation of submarkets: A new framework with application to housing markets. J. Appl. Econ. 31, 32–57.
Bickel, P.J., Ritov, Y. and Tsybakov, A.B. (2009). Simultaneous analysis of LASSO and Dantzig selector. Ann. Stat. 37, 1705–1732.
Brady, R.R. (2011). Measuring the diffusion of housing prices across space and over time. J. Appl. Econ. 26, 2, 213–231.
Bühlmann, P. and van de Geer, S. (2011). Statistics for High-Dimensional Data. Springer.
Caner, M. and Zhang, H.H. (2014). Adaptive elastic net for generalized methods of moments. J. Bus. Econ. Stat. 32, 1, 30–47.
Castle, J.L. and Hendry, D.F. (2014). Model selection in under-specified equations with breaks. J. Econ. 178, 286–293.
Castle, J.L., Doornik, J.A., Hendry, D.F. and Pretis, F. (2015). Detecting location shifts during model selection by step-indicator saturation. Econometrics (MDPI) 3, 2, 240–264.
Chudik, A. and Pesaran, M.H. (2011). Infinite-dimensional VARs and factor models. J. Econ. 163, 1, 4–22.
Chudik, A., Grossman, V. and Pesaran, M.H. (2016). A multi-country approach to forecasting output growth using PMIs. J. Econ., forthcoming.
Cliff, A.D. and Ord, J.K. (1973). Spatial Autocorrelation. Pion, London.
Cuaresma, C. and Feldkircher, M. (2013). Spatial filtering, model uncertainty and the speed of income convergence in Europe. J. Appl. Econ. 28, 4, 720–741.
Feng, W., Lim, C., Maiti, T. and Zhang, Z. (2016). Spatial regression and estimation of disease risks: A clustering based approach. Stat. Anal. Data Min., forthcoming.
Flores-Lagunes, A. and Schnier, K.E. (2012). Estimation of sample selection models with spatial dependence. J. Appl. Econ. 27, 2, 173–204.
Friedman, J., Hastie, T. and Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33, 1–22.
Fu, W. and Knight, K. (2000). Asymptotics for LASSO-type estimators. Ann. Stat. 28, 1356–1378.
Geyer, C.J. (1996). On the asymptotics of convex stochastic optimization. Unpublished manuscript.
Hall, P. and Horowitz, J.L. (2005). Nonparametric methods for inference in the presence of instrumental variables. Ann. Stat. 33, 2904–2929.
Hendry, D.F., Johansen, S. and Santos, C. (2008). Automatic selection of indicators in a fully saturated regression. Comput. Stat. 33, 317–335; Erratum, 337–339.
Ishwaran, H. and Rao, J.S. (2005). Spike and slab variable selection: Frequentist and Bayesian strategies. Ann. Stat. 33, 2, 730–773.
Johansen, S. and Nielsen, B. (2009). An analysis of the indicator saturation estimator as a robust regression estimator. In The Methodology and Practice of Econometrics (J.L. Castle and N. Shephard, eds.). Oxford University Press, Oxford, pp. 1–36.
Kapoor, M., Kelejian, H.H. and Prucha, I.R. (2007). Panel data models with spatially correlated error components. J. Econ. 140, 97–130.
Kelejian, H.H. and Prucha, I.R. (1999). A generalized moments estimator for the autoregressive parameter in a spatial model. Int. Econ. Rev. 40, 509–533.
Kelejian, H.H. and Prucha, I.R. (2010). Specification and estimation of spatial autoregressive models with autoregressive and heteroskedastic disturbances. J. Econ. 157, 53–67.
Kock, A.B. and Callot, L. (2015). Oracle inequalities for high dimensional vector autoregressions. J. Econ. 186, 2, 325–344.
Lam, C. and Souza, P.C. (2016). Regularization for spatial panel time series using the adaptive LASSO. J. Reg. Sci., forthcoming.
Lee, L.-F. (2004). Asymptotic distributions of quasi-maximum likelihood estimators for spatial autoregressive models. Econometrica 72, 6, 1899–1925.
Lee, L.-F. and Yu, J. (2010). Estimation of spatial autoregressive panel data models with fixed effects. J. Econ. 154, 165–185.
Lee, L.-F. and Yu, J. (2016). Identification of spatial Durbin panel models. J. Appl. Econ. 31, 1, 133–162.
Lin, X. and Lee, L.-F. (2010). GMM estimation of spatial autoregressive models with unknown heteroskedasticity. J. Econ. 157, 1, 34–52.
Lounici, K. (2008). Sup-norm convergence rate and sign concentration property of Lasso and Dantzig estimators. Electron. J. Stat. 2, 90–102.
Meinshausen, N. and Bühlmann, P. (2006). High dimensional graphs and variable selection with the LASSO. Ann. Stat. 34, 1436–1462.
Nandy, S., Lim, C. and Maiti, T. (2016). Additive model building for spatial regression. J. R. Stat. Soc. Series B, forthcoming.
Nowak, A. and Smith, P. (2017). Textual analysis in real estate. J. Appl. Econ., forthcoming.
Pesaran, M.H., Schuermann, T. and Weiner, S.M. (2004). Modelling regional interdependencies using a global error-correcting macroeconometric model. J. Bus. Econ. Stat. 22, 2, 129–162.
Pollard, D. (1991). Asymptotics for least absolute deviation regression estimators. Econ. Theory 7, 186–199.
Stock, J.H. and Watson, M.W. (2002). Forecasting using principal components from a large number of predictors. J. Am. Stat. Assoc. 97, 1167–1179.
Su, L. and Yang, Z. (2015). QML estimation of dynamic panel data models with spatial errors. J. Econ. 185, 1, 230–258.
Tibshirani, R. (1996). Regression shrinkage and selection via the LASSO. J. R. Stat. Soc. Series B 58, 267–288.
Varian, H.R. (2014). Big data: new tricks for econometrics. J. Econ. Perspect. 28, 3–28.
Whittle, P. (1954). On stationary processes in the plane. Biometrika 41, 434–449.
Yu, J., de Jong, R.M. and Lee, L.-F. (2008). Quasi-maximum likelihood estimators for spatial dynamic panel data with fixed effects when both N and T are large. J. Econ. 146, 1, 118–134.
Zhao, P. and Yu, B. (2006). On model selection consistency of LASSO. J. Mach. Learn. Res. 7, 2541–2563.
Zou, H. (2006). The adaptive lasso and its oracle properties. J. Am. Stat. Assoc. 101, 1418–1429.
Zou, H. and Zhang, H. (2009). On the adaptive elastic-net with a diverging number of parameters. Ann. Stat. 37, 1733–1751.
Acknowledgments
We thank organisers and participants in seminars at the University of Illinois and Indian Statistical Institute, the USC Dornsife INET Conference on Big Data in Economics, and invited presentations at the 26th (EC)2 Conference and American Statistical Association JSM (Business & Economic Statistics Section) for valuable comments and suggestions. The usual disclaimer applies.
Appendix: Proofs of technical results
Proof of Theorem 1.
Define a random function of ρ and ϕ,
By the definition of the LASSO estimator, for any fixed ρ, Zn(ϕ,ρ) is minimized at \(\phi =\hat {\beta }_{L}(\rho )\).
However, we do not know the true value of ρ; instead, we use the GMM estimator \(\hat {\rho }_{n}\) as a substitute. Then the function \(Z_{n}(\phi , \hat {\rho }_{n})\) is minimized at the generalized moments LASSO estimator \( \phi =\hat {\beta }_{L}(\hat {\rho }_{n})\). Furthermore, denote by β the true value of the unknown parameter, and let
Then, it is easy to see that for any given ρ, Z(β,ρ) is minimized at ϕ = β. For each ϕ ∈ Rp,
where
Since \(\frac {\lambda _{n}}{n}\rightarrow 0\), we have Φ3 → 0. Also,
by Assumption 6 and the weak law of large numbers.
Moreover, since \(\hat {\rho }_{n}\) is a consistent estimator of ρ,
Therefore, \(Z_{n}(\phi ,\hat {\rho }_{n})-Z(\phi ,\rho )\rightarrow _{p}0\) for any ϕ ∈ Rp. Combined with the fact that \(Z_{n}(\phi ,\hat {\rho }_{n})\) is a convex function of ϕ, we have
for any compact set K, and \(\hat {\beta }_{L}(\hat {\rho }_{n})=O_{p}(1)\), by applying the convexity lemma in Pollard (1991). From the above result we have
which implies that
For asymptotic normality of the estimator, we need λn to grow slowly; we further assume that \(\lambda _{n}=O(\sqrt {n})\). From the above proof, we already know that
is minimized at \(\phi =\hat {\beta }_{L}(\hat {\rho }_{n})\). Now define \(w=\sqrt { n}(\phi -\beta )\). Then \(nZ_{n}(\phi ,\hat {\rho }_{n})\) can be treated as a function of w and
is minimized at \(\sqrt {n}\left (\hat {\beta }_{L}(\hat {\rho }_{n})-\beta \right ) \). The same is true for
It follows that
Also, define
where
It is easy to see that
where U ∼ N(0,σ2C(ρ)). Also
where we use the consistency of \(\hat {\rho }_{n}\) established in the proof above. Thus Vn(w) →DV (w), and since Vn is convex and V has a unique minimum, it follows from Geyer (1996) that
□
Proof of Proposition 1.
By the definition of the estimator in the second estimation step,
where the estimator is the minimizer of the penalized least squares criterion when the true spatial parameter ρ is replaced by its consistent estimator \(\hat { \rho }_{n}\). Let φ = ϕ − β, which equals \(\frac {w}{ \sqrt {n}}\) in the proof of Theorem 1. The following argument is similar to the proof of Theorem 1. Define
Then
Separate Dn(φ) into two parts, Dn1(φ) and Dn2(φ). Let
where
Differentiating Dn(φ) with respect to φ, we have
Note here that both \(\hat {\varphi }\) and Wn are vectors of dimension p × 1. Let \(\hat {\varphi }(1)\), Wn(1) and \(\hat {\varphi }(2)\), Wn(2) denote the first q and the last p − q entries of \(\hat {\varphi }\) and Wn, respectively. Then by definition:
Hence if there exists \(\hat {\varphi }\) such that
then by Lemma 1 and the uniqueness of the LASSO solution, \(sign(\hat {\beta }_{L}(\hat {\rho }_{n})(1))=sign(\beta (1))\) and \(\hat {\beta }_{L}(\hat {\rho } _{n})(2)=\beta (2)= 0\).
The existence of such \(\hat {\varphi }\) is implied by
where (A.1) coincides with An and (A.2) contains Bn. The result of Proposition 1 follows. □
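For reference, the sign-consistency argument above rests on the standard subgradient (KKT) conditions for the LASSO. In generic notation for the transformed data (writing \(\tilde{X}=(I-\hat{\rho}_n M_n)X_n\) and \(\tilde{y}=(I-\hat{\rho}_n M_n)y_n\); the exact normalization of \(\lambda_n\) here is our assumption, and may differ from the paper's), they read:

```latex
\tilde{X}_{(j)}'\bigl(\tilde{y} - \tilde{X}\hat{\phi}\bigr)
  = \lambda_n \,\operatorname{sign}(\hat{\phi}_j)
  \quad \text{if } \hat{\phi}_j \neq 0,
\qquad
\bigl|\tilde{X}_{(j)}'\bigl(\tilde{y} - \tilde{X}\hat{\phi}\bigr)\bigr|
  \leq \lambda_n
  \quad \text{if } \hat{\phi}_j = 0 .
```

Requiring the first condition to hold with the true signs on the first q coordinates, and the second to hold strictly on the remaining p − q coordinates, yields conditions of the form (A.1) and (A.2).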
Proof of Theorem 2.
From Proposition 1, we have
Thus,
where \(z^{n}=({z_{1}^{n}},{\cdots } ,{z_{q}^{n}})^{\prime }=(C_{11}^{n})^{-1}W^{n}(1)\), \(\zeta ^{n}=({\zeta _{1}^{n}},{\cdots } ,\zeta _{p-q}^{n})^{\prime }=C_{21}^{n}\)\((C_{11}^{n})^{-1}W^{n}(1)-W^{n}(2)\) and \(b^{n}=({b_{1}^{n}},{\cdots } ,{b_{q}^{n}})^{\prime }=(C_{11}^{n})^{-1}sign(\beta (1))\).
Since \(\hat {\rho }_{n}\) is a consistent estimator of ρ, arguing as in the proof of Theorem 1 and under the regularity conditions in Assumption 6, we have
This is because
The final step follows from Assumptions 3 and 6, together with the consistency of \(\hat {\rho }_{n}\). Thus, \((C_{11}^{n}(\hat {\rho }_{n}))^{-1}\rightarrow _{p}(C_{11}(\rho ))^{-1}\). Similarly,
Since \(X_{n}^{\prime }(I-\rho M_{n}^{\prime })\epsilon _{n}/\sqrt {n} \rightarrow _{d}N(0,\sigma ^{2}C(\rho ))\), we have
Thus Wn(1) →DN(0,σ2C11(ρ)). Applying Slutsky’s theorem, we have
Making use of the above result, combined with the fact that
we have
Hence all the \({z_{i}^{n}}\) and \({\zeta _{i}^{n}}\) converge in distribution to Gaussian random variables with mean 0 and finite variance bounded by \(s^{2}(\rho )\) for some function s(ρ). For t > 0, the Gaussian distribution has its tail probability bounded by
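The bound invoked here is presumably the standard Gaussian tail inequality: for a mean-zero Gaussian variable Z with variance at most \(s^{2}(\rho)\),

```latex
P\bigl(|Z| > t\bigr) \;\leq\; 2\exp\!\Bigl(-\frac{t^{2}}{2\,s^{2}(\rho)}\Bigr),
\qquad t > 0 .
```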
Since λn/n → 0 and \(\lambda _{n}/n^{\frac {1+c}{2} }\geqslant r\) with 0 ≤ c < 1, we have
and
Theorem 2 follows.□
Proof of Proposition 2.
Let \(r=\sigma \sqrt {\frac {\log 2p}{n}}\), and denote
Then
Further define
therefore,
where the final inequality holds because, if A1 + A2 > r, then at least one of A1 > r/2 and A2 > r/2 must hold. Since \(\hat {\rho }_{n}\) is a consistent estimator of ρ, that is, \(\hat {\rho }_{n}\rightarrow _{p}\rho \), we have, for all t > 0, defining \(c=\frac {1}{2}\exp (-\frac {t^{2}}{2})\), when n is large enough,
and
Then, it is easy to see that
and
Next, we need the tail probability of \(\max _{1\leqslant j\leqslant p}|\epsilon _{n}^{\prime }(I-\rho M_{n}^{\prime })^{-1}M_{n}^{\prime }M_{n}X_{n}^{(j)}|\) and \(\max _{1\leqslant j\leqslant p}|\epsilon _{n}^{\prime }(I-\rho M_{n}^{\prime })^{-1}(M_{n}^{\prime }+M_{n})X_{n}^{(j)}|\). Note, however, that we do not assume a Gaussian distribution for the error 𝜖n; we only assume zero mean and finite second moments (Assumption ??). Thus, we use the moment inequality derived from Nemirovski's inequality:
for any design matrix U, with U(j) as its j-th column. By assumption, the row and column sums of Mn and (I − ρMn)− 1 are uniformly bounded in absolute value, and each element of Xn is nonstochastic and uniformly bounded in absolute value. Moreover, if An and Bn are matrices conformable for multiplication whose row and column sums are uniformly bounded in absolute value, then the row and column sums of AnBn are also uniformly bounded in absolute value; this result extends to products of three or more matrices.
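One common statement of Nemirovski's inequality (see, e.g., Bühlmann and van de Geer, 2011) is the following; the exact form used in the paper may differ. For independent mean-zero random vectors \(Y_1,\dots,Y_n \in \mathbb{R}^p\) with \(p \geq 2\),

```latex
\mathbb{E}\,\Bigl\| \sum_{i=1}^{n} Y_i \Bigr\|_{\infty}^{2}
  \;\leq\; 8\log(2p) \sum_{i=1}^{n} \mathbb{E}\,\| Y_i \|_{\infty}^{2} .
```

Applied with \(Y_i = \epsilon_i U_{i\cdot}\), this gives \(\mathbb{E}\max_{1\leq j\leq p}|\epsilon_n' U^{(j)}|^{2} \leq 8\sigma^{2}\log(2p)\sum_{i}\max_{j} U_{ij}^{2}\), which is of order \(n\log(2p)\) when the elements of U are uniformly bounded.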
Thus, the row and column sums of I − ρMn, \((I-\rho M_{n}^{\prime })^{-1}M_{n}^{\prime }M_{n}\) and \((I-\rho M_{n}^{\prime })^{-1}(M_{n}^{\prime }+M_{n})\) are all uniformly bounded in absolute value. Hence every element of the matrices \((I-\rho M_{n})X_{n}^{(j)}\), \((I-\rho M_{n}^{\prime })^{-1}M_{n}^{\prime }M_{n}X_{n}^{(j)}\) and \((I-\rho M_{n}^{\prime })^{-1}(M_{n}^{\prime }+M_{n})X_{n}^{(j)}\) is bounded in absolute value; denote their common bound by κB.
Then, we have
and similarly,
As a result,
Substituting the above probability bounds, we have
This implies the result:
□
Proof of Theorem 3.
On the set I, with \(\lambda _{n}\geqslant 2\lambda _{0}\),
where the final inequality follows from the fact that
Further, combining the oracle inequality with the proposition regarding the set I, the result follows. □
Proof of Theorem 4.
Using the result of Proposition 1 and the line of proof of Theorem 2, we have
where \(z^{n}=({z_{1}^{n}},{\cdots } ,{z_{q}^{n}})^{\prime }=(C_{11}^{n})^{-1}W^{n}(1)\), \(\zeta ^{n}=({\zeta _{1}^{n}},{\cdots } ,\zeta _{p-q}^{n})^{\prime }=\)\(C_{21}^{n}(C_{11}^{n})^{-1}W^{n}(1)-W^{n}(2)\) and \(b^{n}=({b_{1}^{n}},{\cdots } ,{b_{q}^{n}})^{\prime }=(C_{11}^{n})^{-1}sign(\beta (1))\).
Replace all occurrences of \(\hat {\rho }_{n}\) in the notation above with the true parameter value ρ, and denote the resulting quantities by \({C_{0}^{n}}\), \({W_{0}^{n}}\), \( {z_{0}^{n}}\), \({\zeta _{0}^{n}}\) and \({b_{0}^{n}}\) for notational simplicity. Then each element in the first term on the right-hand side of the above inequality is:
for any δ > 0.
Since \(C^{n}-{C_{0}^{n}}\rightarrow _{p}0\) and \(W^{n}-{W_{0}^{n}}\rightarrow _{p}0\), we have \(z^{n}-{z_{0}^{n}}=o_{p}(1)\), \(\zeta ^{n}-{\zeta _{0}^{n}}=o_{p}(1)\) and \( b^{n}-{b_{0}^{n}}=o_{p}(1)\). Note that here we cannot use \(C=\lim _{n\rightarrow \infty }\frac {1}{n} X_{n}^{\prime }{\Sigma } (\rho )X_{n}\) as defined in Assumption 6, since it may not be nonsingular, or may not even converge, in the high-dimensional context.
Thus, A1 + A3 + A4 < 3δ, and
Now if we write \({z_{0}^{n}}=H_{A}^{\prime }\epsilon _{n}\), where \(H_{A}^{\prime }=({h_{1}^{a}},{\cdots } ,{h_{q}^{a}})^{\prime }=(C_{11}^{0})^{-1} \frac {1}{\sqrt {n}}\) [(I − ρMn)Xn] (1)′, then
Therefore, \(z_{0i}^{n}=({h_{i}^{a}})^{\prime }\epsilon _{n}\) with
Similarly,
If we write \({\zeta _{0}^{n}}=H_{B}^{\prime }\epsilon _{n}\) where \(H_{B}^{\prime }=({h_{1}^{b}},{\cdots } ,h_{p-q}^{b})^{\prime }=C_{21}^{0}(C_{11}^{0})^{-1}n^{-\frac {1}{2}}[(I-\rho M_{n})X_{n}](1)^{\prime }-n^{-\frac {1}{2}}[(I-\rho M_{n})X_{n}](2)^{\prime }\), then
Since I − [(I −ρMn)Xn](1){[(I −ρMn)Xn](1)′[(I −ρMn)Xn](1)}− 1[(I −ρMn)Xn](1)′ has eigenvalues between 0 and 1, \(\zeta _{0i}^{n}=({h_{i}^{b}})^{\prime }\epsilon _{n}\) with
Also note that,
Now given (A.3) and (A.4), it can be shown that \( E({\epsilon _{i}^{n}})^{4}<\infty \) in Assumption 1 implies \( E({z_{i}^{n}})^{4}<\infty \) and \(E({\zeta _{i}^{n}})^{4}<\infty \). In fact, given any constant n-dimensional vector α,
For IID errors with bounded 4th moments, we have their tail probability bounded by
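The bound here is presumably the fourth-moment Markov inequality: for a mean-zero random variable z with \(\mathbb{E}\,z^{4}<\infty\),

```latex
P\bigl(|z| > t\bigr) \;\leq\; \frac{\mathbb{E}\,z^{4}}{t^{4}}, \qquad t > 0 .
```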
Therefore, for \(\lambda _{n}/\sqrt {n}=O(n^{\frac {c_{2}-c_{1}}{2}})\), using (A.5), if we make δ arbitrarily small, we have
where r(ρ) is the bound for the absolute value of the elements in the matrix \((C_{11}^{n}(\rho ))^{-1}\). Likewise,
Adding these two terms, Theorem 4 follows.□
Cai, L., Bhattacharjee, A., Calantone, R. et al. Variable Selection with Spatially Autoregressive Errors: A Generalized Moments LASSO Estimator. Sankhya B 81 (Suppl 1), 146–200 (2019). https://doi.org/10.1007/s13571-018-0176-z