1 Introduction

Principal component analysis (PCA) [15] is a well-established tool for data analysis and dimensionality reduction [10–13, 27, 31]. The goal of PCA is to find a sequence of orthogonal factors that represent the directions of largest variance. PCA is used in many applications, including machine learning [32], image processing [28], neurocomputing, engineering, and computer networks, and is especially useful for large datasets. However, despite its power and popularity, a major limitation of PCA is that the derived principal components (PCs) are difficult to interpret and explain, because they tend to be linear combinations of all the original variables.

Over the past decade, sparse principal component analysis (SPCA) has been used to improve the interpretability of PCs. SPCA aims to strike a reasonable balance between statistical fidelity and interpretability by approximating the PCs with sparse loading vectors. Briefly, SPCA methods can be divided into two groups: (1) ad hoc methods [16, 4] and (2) sparsity-penalization methods [17, 36, 1, 2, 23, 18]. Ad hoc methods post-process the components obtained from classical PCA; for example, Jolliffe [16] uses rotation techniques in the standard PCA subspace to find sparse loading vectors, while Cadima and Jolliffe [4] simply set the PCA loadings with small absolute values to zero. Sparsity-penalization methods usually formulate the SPCA problem as an optimization program by adding a sparsity-penalizing term to the PCA framework. For example, Jolliffe et al. [17] maximize the Rayleigh quotient of the data covariance matrix under an l1-norm penalty in the SCoTLASS algorithm. Zou et al. [36] formulate sparse PCA as a regression-type optimization problem by imposing the LASSO penalty on the regression coefficients. In the DSPCA algorithm, d'Aspremont et al. [1] solve a convex relaxation of sparse PCA, while Moghaddam et al. [23] and d'Aspremont et al. [2] use greedy methods to solve the combinatorial problems encountered in sparse PCA. Finally, Journée et al. [18] propose the generalized power method for sparse PCA (GP-SPCA), in which sparse PCA is formulated as two single-unit and two block optimization problems. GP-SPCA has optimal convergence properties when either the objective function or the feasible set is strongly convex [18].

Massive amounts of data are now collected, shared, combined, and used on an ever-growing scale. The analysis of such "big data" has become essential in many commercial and scientific applications, from image analysis [20, 21] to genome sequencing. Parallel algorithms are essential for processing such large-scale, high-dimensional data. Fortunately, modern graphics processing units (GPUs) have a highly parallel structure that makes them ideally suited to big-data algorithms as well as graphics [25].

In this study, we consider how to build compact, unsupervised representations of large-scale, high-dimensional data using sparse PCA, with an emphasis on executing the algorithm in a GPU environment. The work can be regarded as a set of parallel optimization procedures for SPCA; specifically, we construct parallel implementations of the four optimization formulations used in GP-SPCA. To the best of our knowledge, GP-SPCA has not previously been implemented on GPUs. We compare the GPU implementation (on an NVIDIA Tesla C2050) with the corresponding CPU implementation (on a six-core 3.33 GHz high-performance cluster) and show that the parallel GPU implementation of GP-SPCA is up to 11 times faster than the corresponding CPU implementation and up to 107 times faster than the corresponding MATLAB implementation. We also conduct extensive comparative experiments with SPCA and PCA on several benchmark datasets, which provide further evidence that SPCA outperforms PCA in the majority of cases.

The remainder of this paper is organized as follows. GP-SPCA is briefly introduced in Section 2. The implementation of GP-SPCA on GPUs using CUBLAS is described in Section 3, and the experiments are presented in Section 4. We conclude in Section 5.

2 Generalized power method of SPCA

Let A ∈ R^{p×n} be a matrix encoding p samples of n variables. SPCA aims to find principal components that are both sparse and explain as much of the variance in the data as possible, thereby striking a reasonable trade-off between statistical fidelity and interpretability. GP-SPCA considers two single-unit and two block formulations of SPCA for extracting m sparse principal components, with m = 1 for the two single-unit formulations and p ≥ m ≥ 1 for the two block formulations. GP-SPCA maximizes a convex function on the unit Euclidean sphere in R^p (for m = 1) or on the Stiefel manifold in R^{p×m} (for m > 1). Depending on the type of penalty (l1 or l0) used to enforce sparsity, there are four formulations of SPCA: single-unit SPCA via the l1-penalty (GP-SPCA-SL1), single-unit SPCA via the l0-penalty (GP-SPCA-SL0), block SPCA via the l1-penalty (GP-SPCA-BL1), and block SPCA via the l0-penalty (GP-SPCA-BL0).

Denote the unit Euclidean ball (resp. sphere) in R^k by B^k = {y ∈ R^k | y^T y ≤ 1} (resp. S^k = {y ∈ R^k | y^T y = 1}). Denote the space of n × m matrices with unit-norm columns by [S^n]^m = {Y ∈ R^{n×m} | Diag(Y^T Y) = I_m}, where Diag(·) extracts the diagonal of its argument as a diagonal matrix. Denote the Stiefel manifold by S_m^p = {Y ∈ R^{p×m} | Y^T Y = I_m}, and write sign(t) for the sign of t ∈ R and t_+ = max{0, t}. The characteristics of the four variants of GP-SPCA are summarized in Table 1 [18]. We implement all four formulations of GP-SPCA on the GPU to improve efficiency.

Table 1 The four variant formulations of GP-SPCA
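As a concrete instance of the formulations in Table 1, the single-unit l1 variant rewrites the penalized problem as the maximization of a convex function on the unit sphere. Following [18], with a_i denoting the i-th column of A,

φ_l1(γ)^2 = max_{x ∈ S^p} Σ_{i=1}^{n} (|a_i^T x| − γ)_+^2,

where the parameter γ ≥ 0 controls sparsity: the i-th entry of the recovered loading vector is proportional to sign(a_i^T x)(|a_i^T x| − γ)_+, so every variable with |a_i^T x| ≤ γ receives an exactly zero loading.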

GP-SPCA has optimal convergence properties when either the objective function or the feasible set is strongly convex, which is the case for the single-unit formulations and can be enforced in the block cases [18].
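On the sphere, each iteration of the generalized power method maximizes the linearization of the objective over the feasible set, which amounts to a gradient step followed by renormalization. The following single-threaded C sketch of one iteration of the single-unit l1 update illustrates the structure (the function name and memory layout are our own choices; in the GPU implementation the two inner loops become CUBLAS matrix-vector products):

```c
#include <math.h>
#include <stdlib.h>

/* One iteration of single-unit GP-SPCA with the l1 penalty (cf. [18]):
 *   y = sum_i sign(a_i^T x) (|a_i^T x| - gamma)_+ a_i,  then  x <- y / ||y||.
 * A is p-by-n in column-major order; x is a unit vector of length p. */
static void gp_spca_sl1_step(const double *A, int p, int n,
                             double gamma, double *x)
{
    double *y = calloc(p, sizeof(double));
    for (int i = 0; i < n; ++i) {
        const double *ai = A + (size_t)i * p;  /* i-th column of A */
        double dot = 0.0;                      /* a_i^T x */
        for (int k = 0; k < p; ++k) dot += ai[k] * x[k];
        double w = fabs(dot) - gamma;          /* soft-thresholded weight */
        if (w > 0.0) {
            if (dot < 0.0) w = -w;             /* restore sign(a_i^T x) */
            for (int k = 0; k < p; ++k) y[k] += w * ai[k];
        }
    }
    double nrm = 0.0;                          /* renormalize onto S^p */
    for (int k = 0; k < p; ++k) nrm += y[k] * y[k];
    nrm = sqrt(nrm);
    if (nrm > 0.0)
        for (int k = 0; k < p; ++k) x[k] = y[k] / nrm;
    free(y);
}
```

The iteration is run until the relative increase of the objective falls below a tolerance, after which the sparse loading vector is recovered from the thresholded inner products as described above.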

3 GPU implementation of SPCA (GP-SPCA)

GPUs were originally designed for computer graphics but are now widely used for general-purpose computing. There is a large gap between the floating-point capability of the CPU and the GPU because the GPU is specialized for intensive, highly parallel computation, and is therefore designed to devote more transistors to data processing rather than to data caching and flow control, as shown in Fig. 1 [25].

Fig. 1 The difference between GPU and CPU [25]. The GPU is especially well-suited for data-parallel computation, in which the same program is executed on many elements in parallel

CUDA™ is a general-purpose parallel computing architecture designed by NVIDIA, comprising a parallel programming model and an instruction set architecture. CUDA guides the programmer to partition a problem into sub-problems that can be solved independently by parallel blocks of threads organized in a thread hierarchy; Fig. 2 illustrates the hierarchy of threads, blocks, and grids used in CUDA. In addition to the CUDA programming environment, NVIDIA supplies toolkits for the programmer: CUBLAS [26] is one such library, implementing the Basic Linear Algebra Subprograms (BLAS) on the GPU.
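As a toy illustration of this hierarchy (independent of GP-SPCA), the kernel below assigns one vector element to each thread, which locates its element from its block index within the grid and its thread index within the block:

```cuda
/* y = alpha * x + y, one element per thread. */
__global__ void saxpy(int n, float alpha, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                 /* the last block may overshoot n */
        y[i] = alpha * x[i] + y[i];
}

/* Launched with enough 256-thread blocks to cover all n elements:
 *     saxpy<<<(n + 255) / 256, 256>>>(n, a, d_x, d_y);              */
```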

Fig. 2 Grids of thread blocks [25]. A program is divided into several grids, each of which is partitioned into blocks of threads that execute independently of each other

Here we implement all four formulations of GP-SPCA on the GPU using CUBLAS. Data space is allocated both in host memory (CPU) and in device memory (GPU). Data are initialized in host memory and transferred to device memory, after which the parallel computation is performed on the device. The results are transferred back to host memory when the computation is complete.
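The sketch below illustrates this host-device workflow, using a single CUBLAS call that computes the Gram matrix A^T A as a stand-in for the matrix products that dominate GP-SPCA; the dimensions are illustrative and error checking is omitted for brevity:

```c
#include <cuda_runtime.h>
#include <cublas_v2.h>
#include <stdlib.h>

int main(void)
{
    const int p = 1024, n = 4096;            /* illustrative sizes */
    const float alpha = 1.0f, beta = 0.0f;

    /* 1. Initialize the data matrix A (p-by-n, column-major) on the host. */
    float *hA = (float *)malloc(sizeof(float) * p * n);
    float *hC = (float *)malloc(sizeof(float) * n * n);
    for (int i = 0; i < p * n; ++i) hA[i] = rand() / (float)RAND_MAX;

    /* 2. Allocate device memory and transfer the data to the GPU. */
    float *dA, *dC;
    cudaMalloc((void **)&dA, sizeof(float) * p * n);
    cudaMalloc((void **)&dC, sizeof(float) * n * n);
    cudaMemcpy(dA, hA, sizeof(float) * p * n, cudaMemcpyHostToDevice);

    /* 3. Compute on the device: C = A^T * A via CUBLAS SGEMM. */
    cublasHandle_t handle;
    cublasCreate(&handle);
    cublasSgemm(handle, CUBLAS_OP_T, CUBLAS_OP_N, n, n, p,
                &alpha, dA, p, dA, p, &beta, dC, n);

    /* 4. Transfer the result back to the host once computation is done. */
    cudaMemcpy(hC, dC, sizeof(float) * n * n, cudaMemcpyDeviceToHost);

    cublasDestroy(handle);
    cudaFree(dA); cudaFree(dC);
    free(hA); free(hC);
    return 0;
}
```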

4 Experiments

In this section, we conduct comparative experiments to evaluate the efficiency of GPU computing and the effectiveness of GP-SPCA.

4.1 Efficiency of GPU computing

To compare the efficiency of GPU and CPU computing, we first implement GP-SPCA on the CPU using GSL CBLAS [9], a highly efficient implementation of BLAS. We also compare both implementations with the MATLAB implementation presented in [18].

A six-core 3.33 GHz high-performance cluster was used for the CPU implementation, and an NVIDIA Tesla C2050 for the GPU implementation. Twenty test instances were generated for each input matrix A ∈ R^{P×N} (N ∈ [5.0 × 10^2, 3.2 × 10^4], P = N/10). Here, m = 5 is the number of sparse PCs, and γ ∈ {0.01, 0.05} is the parameter, introduced above, that balances the sparsity and variance of the PCs.

Figure 3 shows the average running time for input matrices of different sizes and different parameters. The difference in processing time between the CPU and GPU increases with the size of the input matrix, with the GPU implementation up to 11 times faster than the corresponding CBLAS implementation and up to 107 times faster than the MATLAB implementation.

Fig. 3 A comparison of GP-SPCA performed on a GPU (Tesla C2050) and a CPU. The x-axis indicates the size of the input matrix and the y-axis denotes computation time. a. GP-SPCA-SL0, m = 5, γ = 0.01. b. GP-SPCA-BL0, m = 5, γ = 0.01. c. GP-SPCA-SL1, m = 5, γ = 0.05. d. GP-SPCA-BL1, m = 5, γ = 0.05

4.2 Effectiveness of GP-SPCA

To evaluate the effectiveness of GP-SPCA in practice, we next conducted GP-SPCA and PCA experiments on several benchmark datasets: the USPS database [14], the COIL20 database [24], and the Isolet spoken letter recognition database [3]. For each experiment, we used GP-SPCA and PCA to learn the projection functions from the training samples, before mapping all samples (both training and test) into the lower-dimensional subspace, where recognition is performed with a nearest-neighbor classifier.
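The evaluation pipeline itself is straightforward. The sketch below (with hypothetical names, not the code used in our experiments) projects a sample with the learned loading matrix W and classifies it against the projected training set with the nearest-neighbor rule:

```c
#include <float.h>
#include <stddef.h>

/* z = W^T v : map a d-dimensional sample into the m-dimensional subspace.
 * W is d-by-m, column-major; its columns are the PCA or GP-SPCA loadings. */
static void project(const double *W, int d, int m,
                    const double *v, double *z)
{
    for (int j = 0; j < m; ++j) {
        double s = 0.0;
        for (int k = 0; k < d; ++k) s += W[(size_t)j * d + k] * v[k];
        z[j] = s;
    }
}

/* Return the label of the projected training sample nearest to z. */
static int nearest_neighbor(const double *Ztrain, const int *labels,
                            int ntrain, int m, const double *z)
{
    int best = -1;
    double best_dist = DBL_MAX;
    for (int i = 0; i < ntrain; ++i) {
        double dist = 0.0;
        for (int j = 0; j < m; ++j) {
            double diff = Ztrain[(size_t)i * m + j] - z[j];
            dist += diff * diff;      /* squared Euclidean distance */
        }
        if (dist < best_dist) { best_dist = dist; best = labels[i]; }
    }
    return best;
}
```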

4.2.1 USPS database

The USPS database [14] contains a total of 9,298 handwritten digit images of 16 × 16 pixels (Fig. 4). Each pixel has 256 grey levels, so each image is represented by a 256-dimensional vector. The database was split into 7,291 training images and 2,007 test images as in [5, 6], and the parameter γ was set to 0.1.

Fig. 4 Examples of handwriting in the USPS database

The results of SPCA and PCA in recognizing the ten handwritten digits are shown in Fig. 5, from which we can see that SPCA outperforms PCA in most cases.

Fig. 5 Recognition rates of SPCA and PCA on USPS

4.2.2 COIL20 database

The COIL20 database [24] contains 1,440 images of 20 objects (for examples, see Fig. 6). The images of each object are taken five degrees apart as the object rotates on a turntable, so each object is represented by 72 images of 32 × 32 pixels, with 256 grey levels per pixel. Thus, each image is represented by a 1,024-dimensional vector. We randomly selected two groups of 24 and 36 examples of each object as training sets, and used the remaining images as test sets. The parameter γ was set to 0.3 for the 24-example group and 0.1 for the 36-example group. All experiments were repeated five times.

Fig. 6 COIL20 examples

Figure 7 shows that SPCA outperforms PCA in both cases. Figure 8, which shows the recognition rates for selected objects, demonstrates that SPCA outperforms PCA in most cases.

Fig. 7 The average recognition rates of SPCA and PCA on COIL20 data

Fig. 8 The recognition results of selected objects

4.2.3 Isolet spoken letter recognition database

The Isolet spoken letter recognition database [3] contains recordings of 150 subjects, each of whom spoke each letter of the alphabet twice. The features include spectral coefficients, contour features, sonorant features, pre-sonorant features, and post-sonorant features, as in [22]. The speakers were grouped into five sets of 30 speakers each; in the first experiment, three groups were used for training and two for testing, while in the second experiment (to evaluate robustness), four groups were used for training and the remaining one for testing. The parameter γ was set to 10^-6 for the first experiment and 0.02 for the second, and each experiment was repeated five times. Figures 9 and 10 show the average recognition rates and the recognition rates for each character, respectively. SPCA is superior to PCA in the majority of cases.

Fig. 9 The average recognition rates of SPCA and PCA on Isolet data

Fig. 10 Recognition rates for each character

5 Conclusion

Sparse PCA is a reasonable method for balancing statistical fidelity and interpretability. In this paper, we present a parallel GPU implementation of GP-SPCA, one of the most efficient SPCA approaches. Specifically, we construct parallel implementations of its four optimization formulations for the GPU and compare them with a CPU implementation using CBLAS. Using real-world data, we experimentally validate the effectiveness of GP-SPCA and demonstrate that the parallel GPU implementation can significantly improve performance. This work has several potential applications in large-scale, high-dimensional reduction problems such as video indexing [33, 7, 8] and web image annotation [34, 35, 19, 29, 30], which will be the subject of future study.