Abstract
High-dimensional models, which involve many more parameters than observations, are attracting attention from diverse research fields. Model selection is a central issue in the analysis of such high-dimensional data. Recent literature on the theoretical understanding of high-dimensional models covers a wide range of penalized methods, including the LASSO and SCAD. This paper presents a systematic overview of recent developments in high-dimensional statistical models. We briefly review advances in theory and methods and offer guidelines for applying several penalized methods, including the settings in which each reviewed method is appropriate and its limitations along with potential remedies. In particular, we give a systematic review of the statistical theory of high-dimensional methods within a unified modeling framework under high-level conditions; this framework includes (generalized) linear regression and quantile regression as special cases. We hope this review helps researchers better understand the area and provides useful information for future study.
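As a minimal sketch of the model-selection behavior the abstract describes, the following example fits the LASSO (least squares with an L1 penalty) in a p > n setting using scikit-learn. The simulated data, sparsity pattern, and penalty level `alpha=0.1` are illustrative assumptions, not taken from the paper; SCAD is a nonconvex penalty not available in scikit-learn and is not shown here.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Simulated high-dimensional setting: n = 50 observations, p = 200
# predictors, and a sparse truth where only the first 3 coefficients
# are nonzero.
rng = np.random.default_rng(0)
n, p = 50, 200
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = [3.0, -2.0, 1.5]
y = X @ beta + 0.5 * rng.standard_normal(n)

# The L1 penalty shrinks many coefficients exactly to zero, so the
# fitted model performs estimation and variable selection at once.
model = Lasso(alpha=0.1).fit(X, y)
selected = np.flatnonzero(model.coef_)
print(selected)
```

Because the penalty zeroes out most coefficients, the selected set is far smaller than p and, at this signal strength, contains the three truly active predictors.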
Lee, E.R., Cho, J. & Yu, K. A systematic review on model selection in high-dimensional regression. J. Korean Stat. Soc. 48, 1–12 (2019). https://doi.org/10.1016/j.jkss.2018.10.001