Abstract
In this paper we review some recent developments in high-dimensional data analysis, focusing on the estimation of covariance and precision matrices, asymptotic results on the eigenstructure in principal components analysis, and related issues such as testing the equality of two covariance matrices, determining the number of principal components, and detecting hubs in a complex network.
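As a concrete illustration of the thresholding estimators of a sparse covariance matrix reviewed here (in the spirit of Bickel & Levina, 2008a, and Rothman et al., 2009), the following is a minimal Python/NumPy sketch; the function name and the tuning value `lam` are illustrative, not from any particular cited paper:

```python
import numpy as np

def soft_threshold_cov(X, lam):
    """Entrywise soft-thresholding of the sample covariance matrix;
    off-diagonal entries are shrunk toward zero, the diagonal is kept."""
    S = np.cov(X, rowvar=False)                        # p x p sample covariance
    T = np.sign(S) * np.maximum(np.abs(S) - lam, 0.0)  # soft-threshold entries
    np.fill_diagonal(T, np.diag(S))                    # restore the diagonal
    return T

# Toy example: n = 50 observations of a p = 5 dimensional vector
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
Sigma_hat = soft_threshold_cov(X, lam=0.2)
```

In practice the threshold `lam` is chosen by cross-validation, and the resulting estimator is consistent in operator norm under sparsity conditions on the true covariance matrix, which is the theme of several of the works reviewed below.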
References
Ahn, S. C., & Horenstein, A. R. (2013). Eigenvalue ratio test for the number of factors. Econometrica, 81, 1203–1227.
Alessi, L., Barigozzi, M., & Capasso, M. (2010). Improved penalization for determining the number of factors in approximate factor models. Statistics & Probability Letters, 80, 1806–1813.
Bai, Z. D. (1993). Convergence rate of expected spectral distributions of large random matrices. The Annals of Probability, 21, 649–672.
Bai, J., & Li, K. (2012). Statistical analysis of factor models of high dimension. The Annals of Probability, 40, 437–465.
Bai, J., & Ng, S. (2002). Determining the number of factors in approximate factor models. Econometrica, 70, 191–221.
Bai, Z. D., & Yin, Y. Q. (1993). Limit of the smallest eigenvalue of a large-dimensional sample covariance matrix. The Annals of Probability, 21, 1275–1294.
Bao, Z. G., Pan, G. M., & Zhou, W. (2011). Tracy-Widom law for the extreme eigenvalues of sample correlation matrices. Preprint. Available at arXiv:1110.5208.
Berthet, Q., & Rigollet, P. (2013). Optimal detection of sparse principal components in high dimension. The Annals of Statistics, 41, 1780–1815.
Bickel, P. J., & Levina, E. (2008a). Covariance regularization by thresholding. The Annals of Statistics, 36, 2577–2604.
Bickel, P. J., & Levina, E. (2008b). Regularized estimation of large covariance matrices. The Annals of Statistics, 36, 199–227.
Bien, J., & Tibshirani, R. J. (2011). Sparse estimation of a covariance matrix. Biometrika, 98, 807–820.
Birnbaum, A., Johnstone, I. M., Nadler, B., & Paul, D. (2013). Minimax bounds for sparse PCA with noisy high-dimensional data. The Annals of Statistics, 41, 1055–1084.
Bonacich, P. (1987). Power and centrality: A family of measures. The American Journal of Sociology, 92, 1170–1182.
Butte, A. J., Tamayo, P., Slonim, D., Golub, T. R., & Kohane, I. S. (2000). Discovering functional relationships between RNA expression and chemotherapeutic susceptibility using relevance networks. Proceedings of the National Academy of Sciences, 97, 12182–12186.
Cai, T. T., & Liu, W. (2011). A direct estimation approach to sparse linear discriminant analysis. Journal of the American Statistical Association, 106, 1566–1577.
Cai, T. T., Liu, W., & Luo, X. (2011). A constrained l1 minimization approach to sparse precision matrix estimation. Journal of the American Statistical Association, 106, 672–684.
Cai, T. T., Liu, W., & Xia, Y. (2013). Two-sample covariance matrix testing and support recovery in high-dimensional and sparse settings. Journal of the American Statistical Association, 108, 265–277.
Cai, T. T., Liu, W., & Zhou, H. H. (2016). Estimating sparse precision matrix: Optimal rates of convergence and adaptive estimation. The Annals of Statistics, 44, 455–488.
Cai, T. T., Ma, Z., & Wu, Y. (2015). Optimal estimation and rank detection for sparse spiked covariance matrices. Probability Theory and Related Fields, 161, 781–815.
Cai, T. T., Ren, Z., & Zhou, H. H. (2013). Optimal rates of convergence for estimating Toeplitz covariance matrices. Probability Theory and Related Fields, 156, 101–143.
Cai, T. T., & Yuan, M. (2012). Adaptive covariance matrix estimation through block thresholding. The Annals of Statistics, 40, 2014–2042.
Cai, T. T., Zhang, C. H., & Zhou, H. H. (2010). Optimal rates of convergence for covariance matrix estimation. The Annals of Statistics, 38, 2118–2144.
Cai, T. T., & Zhou, H. H. (2012). Minimax estimation of large covariance matrices under l1 norm (with discussion). Statistica Sinica, 22, 1319–1378.
Chandrasekaran, V., Parrilo, P. A., & Willsky, A. S. (2012). Latent variable graphical model selection via convex optimization. The Annals of Statistics, 40, 1935–1967.
Chaudhuri, S., Alur, R., & Cerny, P. (2007). Model checking on trees with path equivalences. In 13th international conference on tools and algorithms for the construction and analysis of systems.
Choi, Y., Taylor, J., & Tibshirani, R. (2017). Selecting the number of principal components: Estimation of the rank of a noisy matrix. The Annals of Statistics, 45, 2590–2617.
Chun, M., Kim, C., & Chang, I. (2016). Uncovering multiloci-ordering by algebraic property of Laplacian matrix and its Fiedler vector. Bioinformatics, 32, 801–807.
Dempster, A. P. (1972). Covariance selection. Biometrics, 28, 157–175.
Edwards, D. (2000). Introduction to graphical modelling (2nd ed.). New York: Springer.
El Karoui, N. (2008a). Operator norm consistent estimation of large-dimensional sparse covariance matrices. The Annals of Statistics, 36, 2717–2756.
El Karoui, N. (2008b). Spectrum estimation for large dimensional covariance matrices using random matrix theory. The Annals of Statistics, 36, 2757–2790.
Fan, J., Fan, Y., & Lv, J. (2008). High dimensional covariance matrix estimation using a factor model. Journal of Econometrics, 147, 186–197.
Fan, J., Liao, Y., & Liu, H. (2016). An overview on the estimation of large covariance and precision matrices. The Econometrics Journal, 19, C1–C32.
Fan, J., Liao, Y., & Mincheva, M. (2011). High-dimensional covariance matrix estimation in approximate factor models. The Annals of Statistics, 39, 3320–3356.
Fan, J., Liao, Y., & Mincheva, M. (2013). Large covariance estimation by thresholding principal orthogonal complements (with discussion). Journal of the Royal Statistical Society. Series B., 75, 603–680.
Fan, J., Liao, Y., & Wang, W. (2016). Projected principal component analysis in factor models. The Annals of Statistics, 44, 219–254.
Friedman, J., Hastie, T., & Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9, 432–441.
Hallin, M., & Liška, R. (2007). Determining the number of factors in the general dynamic factor model. Journal of the American Statistical Association, 102, 603–617.
Hong, Y. (2015). A study on the adjacency matrix and hub in networks (Ph.D. thesis). Pusan National University. Unpublished.
Huang, J. Z., Liu, N., Pourahmadi, M., & Liu, L. (2006). Covariance matrix selection and estimation via penalised normal likelihood. Biometrika, 93, 85–98.
Johnstone, I. M. (2001). On the distribution of the largest eigenvalue in principal component analysis. The Annals of Statistics, 29, 295–327.
Johnstone, I. M. (2008). Multivariate analysis and Jacobi ensembles: Largest eigenvalue, Tracy-Widom limits and rates of convergence. The Annals of Statistics, 36, 2638–2716.
Johnstone, I. M., & Lu, A. Y. (2009). On consistency and sparsity for principal components analysis in high dimensions (with discussion). Journal of the American Statistical Association, 104, 682–693.
Jolliffe, I. T. (2002). Principal component analysis (2nd ed.). New York: Springer.
Katz, L. (1953). A new status index derived from sociometric analysis. Psychometrika, 18, 39–43.
Kim, C., Cheon, M., Kang, M., & Chang, I. (2008). A simple and exact Laplacian clustering of complex networking phenomena: Application to gene expression profiles. Proceedings of the National Academy of Sciences, 105, 4083–4087.
Lam, C., & Fan, J. (2009). Sparsistency and rates of convergence in large covariance matrix estimation. The Annals of Statistics, 37, 4254–4278.
Lam, C., & Yao, Q. (2012). Factor modeling for high-dimensional time series: Inference for the number of factors. The Annals of Statistics, 40, 694–726.
Lam, C., Yao, Q., & Bathia, N. (2011). Estimation of latent factors for high-dimensional time series. Biometrika, 98, 901–918.
Levina, E., & Vershynin, R. (2012). Partial estimation of covariance matrices. Probability Theory and Related Fields, 153, 405–419.
Li, J., & Chen, S. X. (2012). Two sample tests for high-dimensional covariance matrices. The Annals of Statistics, 40, 908–940.
Ma, Z. (2013). Sparse principal component analysis and iterative thresholding. The Annals of Statistics, 41, 772–801.
Marcenko, V. A., & Pastur, L. A. (1967). Distribution of eigenvalues for some sets of random matrices. Mathematics of the USSR-Sbornik, 1, 507–536.
Mardia, K. V., Kent, J. T., & Bibby, J. M. (1979). Multivariate analysis. New York: Academic Press.
Meinshausen, N., & Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. The Annals of Statistics, 34, 1436–1462.
Mieghem, P. V. (2010). Graph spectra for complex networks. New York: Cambridge University Press.
Nadler, B. (2008). Finite sample approximation results for principal component analysis: A matrix perturbation approach. The Annals of Statistics, 36, 2791–2817.
Newman, M. (2010). Networks: An introduction. New York: Oxford University Press.
Paul, D. (2007). Asymptotics of sample eigenstructure for a large dimensional spiked covariance model. Statistica Sinica, 17, 1617–1642.
Peng, J., Wang, P., Zhou, N., & Zhu, J. (2009). Partial correlation estimation by joint sparse regression models. Journal of the American Statistical Association, 104, 735–746.
Pillai, N. S., & Yin, J. (2012). Edge universality of correlation matrices. The Annals of Statistics, 40, 1737–1763.
Pourahmadi, M. (2013). High-dimensional covariance estimation. New York: John Wiley & Sons.
Rothman, A. J., Levina, E., & Zhu, J. (2009). Generalized thresholding of large covariance matrices. Journal of the American Statistical Association, 104, 177–186.
Schott, J. R. (2007). A test for the equality of covariance matrices when the dimension is large relative to the sample sizes. Computational Statistics & Data Analysis, 51, 6535–6542.
Shen, D., Shen, H., & Marron, J. S. (2013). Consistency of sparse PCA in high dimension, low sample size contexts. Journal of Multivariate Analysis, 115, 317–333.
Srivastava, M. S., & Yanagihara, H. (2010). Testing the equality of several covariance matrices with fewer observations than the dimension. Journal of Multivariate Analysis, 101, 1319–1329.
Stock, J. H., & Watson, M. W. (2002). Forecasting using principal components from a large number of predictors. Journal of the American Statistical Association, 97, 1167–1179.
Tracy, C. A., & Widom, H. (1996). On orthogonal and symplectic matrix ensembles. Communications in Mathematical Physics, 177, 727–754.
Tracy, C. A., & Widom, H. (2000). The distribution of the largest eigenvalue in the Gaussian ensembles: β = 1, 2, 4. CRM Series in Mathematical Physics, 4, 461–472.
Vu, V. Q., Cho, J., Lei, J., & Rohe, K. (2013). Fantope projection and selection: A near-optimal convex relaxation of sparse PCA. In Advances in neural information processing systems (pp. 2670–2678).
Vu, V. Q., & Lei, J. (2013). Minimax sparse principal subspace estimation in high dimensions. The Annals of Statistics, 41, 2905–2947.
Wang, W., & Fan, J. (2017). Asymptotics of empirical eigenstructure for high dimensional spiked covariance. The Annals of Statistics, 45, 1342–1374.
Whittaker, J. (1990). Graphical models in applied multivariate statistics. New York: John Wiley & Sons.
Wigner, E. P. (1955). Characteristic vectors of bordered matrices with infinite dimensions. Annals of Mathematics, 62, 548–564.
Wigner, E. P. (1958). On the distribution of the roots of certain symmetric matrices. Annals of Mathematics, 67, 325–328.
Xia, Y., Cai, T., & Cai, T. T. (2015). Testing differential networks with applications to the detection of gene-gene interactions. Biometrika, 102, 247–266.
Yuan, M. (2010). High dimensional inverse covariance matrix estimation via linear programming. Journal of Machine Learning Research (JMLR), 11, 2261–2286.
Yuan, M., & Lin, Y. (2007). Model selection and estimation in the Gaussian graphical model. Biometrika, 94, 19–35.
Zou, H., Hastie, T., & Tibshirani, R. (2006). Sparse principal component analysis. Journal of Computational and Graphical Statistics, 15, 265–286.
Hong, Y., Kim, C. Recent developments in high dimensional covariance estimation and its related issues, a review. J. Korean Stat. Soc. 47, 239–247 (2018). https://doi.org/10.1016/j.jkss.2018.04.005