Abstract
Various forms of penalized estimators with good statistical and computational properties have been proposed for variable selection that respects the grouping structure in the variables. The attractive properties of these shrinkage and selection estimators, however, depend critically on the choice of the tuning parameter. One method for choosing the tuning parameter is via information criteria, such as the Bayesian information criterion (BIC). In this paper, we consider the problem of consistent tuning parameter selection in high-dimensional generalized linear regression with grouping structures. We extend the results of the extended regularized information criterion (ERIC) to group selection methods involving concave penalties and then investigate selection consistency when the number of variables in each group diverges. Moreover, we show that the ERIC-type selector enables consistent identification of the true model and that the resulting estimator possesses the oracle property even when the number of groups is much larger than the sample size. Simulations show that the ERIC-type selector can significantly outperform the BIC and cross-validation selectors in selecting the true grouped variables, and an empirical example is given to illustrate its use.
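To illustrate the workflow the abstract describes, the following is a minimal, self-contained sketch of tuning-parameter selection for a linear group lasso via an ERIC-type information criterion. It is not the paper's method: the criterion form `n*log(RSS/n) + 2*nu*df*log(n/lam)` with `df` taken as the number of nonzero coefficients, the constant `nu`, the λ grid, and the simulated data are all illustrative assumptions, and the fit uses a generic proximal-gradient (ISTA) solver rather than the group-descent algorithms cited in the references.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated grouped design: 5 groups of 3 predictors; groups 0 and 1 are active.
n, group_size, n_groups = 200, 3, 5
p = group_size * n_groups
groups = [np.arange(g * group_size, (g + 1) * group_size) for g in range(n_groups)]
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[groups[0]] = 2.0
beta_true[groups[1]] = -1.5
y = X @ beta_true + 0.5 * rng.standard_normal(n)

def group_lasso_fit(X, y, lam, groups, n_iter=1000):
    """Proximal gradient (ISTA) for the linear group lasso:
       (1/2n)||y - Xb||^2 + lam * sum_g sqrt(p_g) * ||b_g||_2."""
    n_obs = len(y)
    step = 1.0 / np.linalg.eigvalsh(X.T @ X / n_obs)[-1]  # 1 / Lipschitz const
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        b = b - step * X.T @ (X @ b - y) / n_obs          # gradient step
        for g in groups:                                   # block soft-threshold
            norm = np.linalg.norm(b[g])
            thresh = step * lam * np.sqrt(len(g))
            b[g] = 0.0 if norm <= thresh else (1 - thresh / norm) * b[g]
    return b

def eric_type(X, y, b, lam, nu=0.5):
    """ERIC-type criterion (illustrative form, not the paper's exact definition):
       n*log(RSS/n) + 2*nu*df*log(n/lam), df = number of nonzero coefficients."""
    n_obs = len(y)
    rss = np.sum((y - X @ b) ** 2)
    df = np.count_nonzero(b)
    return n_obs * np.log(rss / n_obs) + 2.0 * nu * df * np.log(n_obs / lam)

# Fit over a grid of tuning parameters and pick the criterion minimizer.
lams = np.logspace(-2, 0.5, 25)
fits = [group_lasso_fit(X, y, lam, groups) for lam in lams]
crit = [eric_type(X, y, b, lam) for b, lam in zip(fits, lams)]
beta_hat = fits[int(np.argmin(crit))]
selected = [g_idx for g_idx, g in enumerate(groups)
            if np.linalg.norm(beta_hat[g]) > 1e-8]
print("selected groups:", selected)
```

With a strong signal like this, the criterion minimizer typically retains the two active groups while excluding the noise groups, which is the selection-consistency behavior the paper establishes theoretically; a BIC-style selector would replace the `log(n/lam)` factor with `log(n)`.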
References
Breheny P, Huang J. Group descent algorithms for nonconvex penalized linear and logistic regression models with grouped predictors. Stat Comput, 2015, 25: 173–187
Chen J, Chen Z. Extended Bayesian information criteria for model selection with large model spaces. Biometrika, 2008, 95: 759–771
Cortez P, Silva A M G. Using data mining to predict secondary school student performance. In: Proceedings of 5th Annual Future Business Technology Conference. http://hdl.handle.net/1822/8024, 2008
Fan J, Li R. Variable selection via nonconcave penalized likelihood and its oracle properties. J Amer Statist Assoc, 2001, 96: 1348–1360
Fan J, Peng H. Nonconcave penalized likelihood with a diverging number of parameters. Ann Statist, 2004, 32: 928–961
Fan J, Song R. Sure independence screening in generalized linear models with NP-dimensionality. Ann Statist, 2010, 38: 3567–3604
Fan Y, Tang C Y. Tuning parameter selection in high dimensional penalized likelihood. J R Stat Soc Ser B Stat Methodol, 2013, 75: 531–552
Friedman J, Hastie T, Tibshirani R. A note on the group LASSO and a sparse group LASSO. arXiv:1001.0736, 2010
Gao X, Carroll R J. Data integration with high dimensionality. Biometrika, 2017, 104: 251–272
Huang J, Breheny P, Ma S. A selective review of group selection in high-dimensional models. Statist Sci, 2012, 27: 481–499
Huang J, Ma S, Zhang C H. Adaptive LASSO for sparse high-dimensional regression models. Statist Sinica, 2008, 18: 1603–1618
Hui F K C, Warton D I, Foster S D. Tuning parameter selection for the adaptive LASSO using ERIC. J Amer Statist Assoc, 2015, 110: 262–269
Kim Y, Kwon S, Choi H. Consistent model selection criteria on high dimensions. J Mach Learn Res, 2012, 13: 1037–1057
McCullagh P, Nelder J A. Generalized Linear Models. Boca Raton: CRC Press, 1989
Meier L, Van De Geer S, Bühlmann P. The group LASSO for logistic regression. J R Stat Soc Ser B Stat Methodol, 2008, 70: 53–71
Wang H, Leng C. A note on adaptive group LASSO. Comput Statist Data Anal, 2008, 52: 5277–5286
Wang H, Li B, Leng C. Shrinkage tuning parameter selection with a diverging number of parameters. J R Stat Soc Ser B Stat Methodol, 2009, 71: 671–683
Wang L, Chen G, Li H. Group SCAD regression analysis for microarray time course gene expression data. Bioinformatics, 2007, 23: 1486–1494
Wei F, Huang J. Consistent group selection in high-dimensional linear regression. Bernoulli, 2010, 16: 1369–1384
Yuan M, Lin Y. Model selection and estimation in regression with grouped variables. J R Stat Soc Ser B Stat Methodol, 2006, 68: 49–67
Zhang Y, Li R, Tsai C L. Regularization parameter selections via generalized information criterion. J Amer Statist Assoc, 2010, 105: 312–323
Zhang Y, Shen X. Model selection procedure for high-dimensional data. Stat Anal Data Min, 2010, 3: 350–358
Zou H. The adaptive Lasso and its oracle properties. J Amer Statist Assoc, 2006, 101: 1418–1429
Zou H, Li R. One-step sparse estimates in nonconcave penalized likelihood models. Ann Statist, 2008, 36: 1509–1533
Zou H, Zhang H H. On the adaptive elastic-net with a diverging number of parameters. Ann Statist, 2009, 37: 1733–1751
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant Nos. 11571337 and 71631006) and the Fundamental Research Funds for the Central Universities (Grant No. WK2040160028).
Cite this article
Li, Y., Wu, Y. & Jin, B. Consistent tuning parameter selection in high-dimensional group-penalized regression. Sci. China Math. 62, 751–770 (2019). https://doi.org/10.1007/s11425-017-9189-9
Keywords
- Bayesian information criterion
- group selection
- penalized likelihood
- regularization parameter
- ultra-high dimensionality