Abstract
In this paper we review some recent developments in high-dimensional data analysis, focusing on the estimation of covariance and precision matrices, asymptotic results on the eigenstructure in principal components analysis, and related issues such as testing the equality of two covariance matrices, determining the number of principal components, and detecting hubs in a complex network.
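As a concrete illustration of the thresholding estimators of a sparse covariance matrix reviewed here (in the spirit of Bickel & Levina, 2008a, and Rothman et al., 2009), the following is a minimal Python/NumPy sketch; the function name and the tuning value `lam` are illustrative, not from any particular cited paper:

```python
import numpy as np

def soft_threshold_cov(X, lam):
    """Entrywise soft-thresholding of the sample covariance matrix;
    off-diagonal entries are shrunk toward zero, the diagonal is kept."""
    S = np.cov(X, rowvar=False)                        # p x p sample covariance
    T = np.sign(S) * np.maximum(np.abs(S) - lam, 0.0)  # soft-threshold entries
    np.fill_diagonal(T, np.diag(S))                    # restore the diagonal
    return T

# Toy example: n = 50 observations of a p = 5 dimensional vector
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
Sigma_hat = soft_threshold_cov(X, lam=0.2)
```

In practice the threshold `lam` is chosen by cross-validation, and the resulting estimator is consistent in operator norm under sparsity conditions on the true covariance matrix, which is the theme of several of the works reviewed below.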
References
Ahn, S. C., & Horenstein, A. R. (2013). Eigenvalue ratio test for the number of factors. Econometrica, 81, 1203–1227.
Alessi, L., Barigozzi, M., & Capasso, M. (2010). Improved penalization for determining the number of factors in approximate factor models. Statistics & Probability Letters, 80, 1806–1813.
Bai, Z. D. (1993). Convergence rate of expected spectral distributions of large random matrices. The Annals of Probability, 21, 649–672.
Bai, J., & Li, K. (2012). Statistical analysis of factor models of high dimension. The Annals of Probability, 40, 437–465.
Bai, J., & Ng, S. (2002). Determining the number of factors in approximate factor models. Econometrica, 70, 191–221.
Bai, Z. D., & Yin, Y. Q. (1993). Limit of the smallest eigenvalue of a large-dimensional sample covariance matrix. The Annals of Probability, 21, 1275–1294.
Bao, Z. G., Pan, G. M., & Zhou, W. (2011). Tracy-Widom law for the extreme eigenvalues of sample correlation matrices. Preprint. Available at arXiv:1110.5208.
Berthet, Q., & Rigollet, P. (2013). Optimal detection of sparse principal components in high dimension. The Annals of Statistics, 41, 1780–1815.
Bickel, P. J., & Levina, E. (2008a). Covariance regularization by thresholding. The Annals of Statistics, 36, 2577–2604.
Bickel, P. J., & Levina, E. (2008b). Regularized estimation of large covariance matrices. The Annals of Statistics, 36, 199–227.
Bien, J., & Tibshirani, R. J. (2011). Sparse estimation of a covariance matrix. Biometrika, 98, 807–820.
Birnbaum, A., Johnstone, I. M., Nadler, B., & Paul, D. (2013). Minimax bounds for sparse PCA with noisy high-dimensional data. The Annals of Statistics, 41, 1055–1084.
Bonacich, P. (1987). Power and centrality: A family of measures. The American Journal of Sociology, 92, 1170–1182.
Butte, A. J., Tamayo, P., Slonim, D., Golub, T. R., & Kohane, I. S. (2000). Discovering functional relationships between RNA expression and chemotherapeutic susceptibility using relevance networks. Proceedings of the National Academy of Sciences, 97, 12182–12186.
Cai, T. T., & Liu, W. (2011). A direct estimation approach to sparse linear discriminant analysis. Journal of the American Statistical Association, 106, 1566–1577.
Cai, T. T., Liu, W., & Luo, X. (2011). A constrained l1 minimization approach to sparse precision matrix estimation. Journal of the American Statistical Association, 106, 672–684.
Cai, T. T., Liu, W., & Xia, Y. (2013). Two-sample covariance matrix testing and support recovery in high-dimensional and sparse settings. Journal of the American Statistical Association, 108, 265–277.
Cai, T. T., Liu, W., & Zhou, H. H. (2016). Estimating sparse precision matrix: Optimal rates of convergence and adaptive estimation. The Annals of Statistics, 44, 455–488.
Cai, T. T., Ma, Z., & Wu, Y. (2015). Optimal estimation and rank detection for sparse spiked covariance matrices. Probability Theory and Related Fields, 161, 781–815.
Cai, T. T., Ren, Z., & Zhou, H. H. (2013). Optimal rates of convergence for estimating Toeplitz covariance matrices. Probability Theory and Related Fields, 156, 101–143.
Cai, T. T., & Yuan, M. (2012). Adaptive covariance matrix estimation through block thresholding. The Annals of Statistics, 40, 2014–2042.
Cai, T. T., Zhang, C. H., & Zhou, H. H. (2010). Optimal rates of convergence for covariance matrix estimation. The Annals of Statistics, 38, 2118–2144.
Cai, T. T., & Zhou, H. H. (2012). Minimax estimation of large covariance matrices under l1 norm (with discussion). Statistica Sinica, 22, 1319–1378.
Chandrasekaran, V., Parrilo, P. A., & Willsky, A. S. (2012). Latent variable graphical model selection via convex optimization. The Annals of Statistics, 40, 1935–1967.
Chaudhuri, S., Alur, R., & Cerny, P. (2007). Model checking on trees with path equivalences. In 13th international conference on tools and algorithms for the construction and analysis of systems.
Choi, Y., Taylor, J., & Tibshirani, R. (2017). Selecting the number of principal components: Estimation of the rank of a noisy matrix. The Annals of Statistics, 45, 2590–2617.
Chun, M., Kim, C., & Chang, I. (2016). Uncovering multiloci-ordering by algebraic property of Laplacian matrix and its Fiedler vector. Bioinformatics, 32, 801–807.
Dempster, A. P. (1972). Covariance selection. Biometrics, 28, 157–175.
Edwards, D. (2000). Introduction to graphical modelling (2nd ed.). New York: Springer.
El Karoui, N. (2008a). Operator norm consistent estimation of large-dimensional sparse covariance matrices. The Annals of Statistics, 36, 2717–2756.
El Karoui, N. (2008b). Spectrum estimation for large dimensional covariance matrices using random matrix theory. The Annals of Statistics, 36, 2757–2790.
Fan, J., Fan, Y., & Lv, J. (2008). High dimensional covariance matrix estimation using a factor model. Journal of Econometrics, 147, 186–197.
Fan, J., Liao, Y., & Liu, H. (2016). An overview on the estimation of large covariance and precision matrices. The Econometrics Journal, 19, C1–C32.
Fan, J., Liao, Y., & Mincheva, M. (2011). High-dimensional covariance matrix estimation in approximate factor models. The Annals of Statistics, 39, 3320–3356.
Fan, J., Liao, Y., & Mincheva, M. (2013). Large covariance estimation by thresholding principal orthogonal complements (with discussion). Journal of the Royal Statistical Society. Series B., 75, 603–680.
Fan, J., Liao, Y., & Wang, W. (2016). Projected principal component analysis in factor models. The Annals of Statistics, 44, 219–254.
Friedman, J., Hastie, T., & Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9, 432–441.
Hallin, M., & Liška, R. (2007). Determining the number of factors in the general dynamic factor model. Journal of the American Statistical Association, 102, 603–617.
Hong, Y. (2015). A study on the adjacency matrix and hub in networks (Ph.D. thesis). Pusan National University. Unpublished.
Huang, J. Z., Liu, N., Pourahmadi, M., & Liu, L. (2006). Covariance matrix selection and estimation via penalised normal likelihood. Biometrika, 93, 85–98.
Johnstone, I. M. (2001). On the distribution of the largest eigenvalue in principal component analysis. The Annals of Statistics, 29, 295–327.
Johnstone, I. M. (2008). Multivariate analysis and Jacobi ensembles: Largest eigenvalue, Tracy-Widom limits and rates of convergence. The Annals of Statistics, 36, 2638–2716.
Johnstone, I. M., & Lu, A. Y. (2009). On consistency and sparsity for principal components analysis in high dimensions (with discussion). Journal of the American Statistical Association, 104, 682–693.
Jolliffe, I. T. (2002). Principal component analysis (2nd ed.). New York: Springer.
Katz, L. (1953). A new status index derived from sociometric analysis. Psychometrika, 18, 39–43.
Kim, C., Cheon, M., Kang, M., & Chang, I. (2008). A simple and exact Laplacian clustering of complex networking phenomena: Application to gene expression profiles. Proceedings of the National Academy of Sciences, 105, 4083–4087.
Lam, C., & Fan, J. (2009). Sparsistency and rates of convergence in large covariance matrix estimation. The Annals of Statistics, 37, 4254–4278.
Lam, C., & Yao, Q. (2012). Factor modeling for high-dimensional time series: Inference for the number of factors. The Annals of Statistics, 40, 694–726.
Lam, C., Yao, Q., & Bathia, N. (2011). Estimation of latent factors for high-dimensional time series. Biometrika, 98, 901–918.
Levina, E., & Vershynin, R. (2012). Partial estimation of covariance matrices. Probability Theory and Related Fields, 153, 405–419.
Li, J., & Chen, S. X. (2012). Two sample tests for high-dimensional covariance matrices. The Annals of Statistics, 40, 908–940.
Ma, Z. (2013). Sparse principal component analysis and iterative thresholding. The Annals of Statistics, 41, 772–801.
Marcenko, V. A., & Pastur, L. A. (1967). Distribution of eigenvalues for some sets of random matrices. Mathematics of the USSR-Sbornik, 1, 507–536.
Mardia, K. V., Kent, J. T., & Bibby, J. M. (1979). Multivariate analysis. New York: Academic Press.
Meinshausen, N., & Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. The Annals of Statistics, 34, 1436–1462.
Mieghem, P. V. (2010). Graph spectra for complex networks. New York: Cambridge University Press.
Nadler, B. (2008). Finite sample approximation results for principal component analysis: A matrix perturbation approach. The Annals of Statistics, 36, 2791–2817.
Newman, M. (2010). Networks: An introduction. New York: Oxford University Press.
Paul, D. (2007). Asymptotics of sample eigenstructure for a large dimensional spiked covariance model. Statistica Sinica, 17, 1617–1642.
Peng, J., Wang, P., Zhou, N., & Zhu, J. (2009). Partial correlation estimation by joint sparse regression models. Journal of the American Statistical Association, 104, 735–746.
Pillai, N. S., & Yin, J. (2012). Edge universality of correlation matrices. The Annals of Statistics, 40, 1737–1763.
Pourahmadi, M. (2013). High-dimensional covariance estimation. New York: John Wiley & Sons.
Rothman, A. J., Levina, E., & Zhu, J. (2009). Generalized thresholding of large covariance matrices. Journal of the American Statistical Association, 104, 177–186.
Schott, J. R. (2007). A test for the equality of covariance matrices when the dimension is large relative to the sample sizes. Computational Statistics & Data Analysis, 51, 6535–6542.
Shen, D., Shen, H., & Marron, J. S. (2013). Consistency of sparse PCA in high dimension, low sample size contexts. Journal of Multivariate Analysis, 115, 317–333.
Srivastava, M. S., & Yanagihara, H. (2010). Testing the equality of several covariance matrices with fewer observations than the dimension. Journal of Multivariate Analysis, 101, 1319–1329.
Stock, J. H., & Watson, M. W. (2002). Forecasting using principal components from a large number of predictors. Journal of the American Statistical Association, 97, 1167–1179.
Tracy, C. A., & Widom, H. (1996). On orthogonal and symplectic matrix ensembles. Communications in Mathematical Physics, 177, 727–754.
Tracy, C. A., & Widom, H. (2000). The distribution of the largest eigenvalue in the Gaussian ensembles: β = 1, 2, 4. CRM Series in Mathematical Physics, 4, 461–472.
Vu, V. Q., Cho, J., Lei, J., & Rohe, K. (2013). Fantope projection and selection: A near-optimal convex relaxation of sparse PCA. In Advances in neural information processing systems (pp. 2670–2678).
Vu, V. Q., & Lei, J. (2013). Minimax sparse principal subspace estimation in high dimensions. The Annals of Statistics, 41, 2905–2947.
Wang, W., & Fan, J. (2017). Asymptotics of empirical eigenstructure for high dimensional spiked covariance. The Annals of Statistics, 45, 1342–1374.
Whittaker, J. (1990). Graphical models in applied multivariate statistics. New York: John Wiley & Sons.
Wigner, E. P. (1955). Characteristic vectors of bordered matrices with infinite dimensions. Annals of Mathematics, 62, 548–564.
Wigner, E. P. (1958). On the distribution of the roots of certain symmetric matrices. Annals of Mathematics, 67, 325–328.
Xia, Y., Cai, T., & Cai, T. T. (2015). Testing differential networks with applications to the detection of gene-gene interactions. Biometrika, 102, 247–266.
Yuan, M. (2010). High dimensional inverse covariance matrix estimation via linear programming. Journal of Machine Learning Research (JMLR), 11, 2261–2286.
Yuan, M., & Lin, Y. (2007). Model selection and estimation in the Gaussian graphical model. Biometrika, 94, 19–35.
Zou, H., Hastie, T., & Tibshirani, R. (2006). Sparse principal component analysis. Journal of Computational and Graphical Statistics, 15, 265–286.
Hong, Y., Kim, C. Recent developments in high dimensional covariance estimation and its related issues, a review. J. Korean Stat. Soc. 47, 239–247 (2018). https://doi.org/10.1016/j.jkss.2018.04.005