Abstract
We clarify the mathematical equivalence between low-dimensional singular value decomposition and low-order tensor principal component analysis for two- and three-dimensional images. Furthermore, we show that the two- and three-dimensional discrete cosine transforms are, respectively, acceptable approximations to two- and three-dimensional singular value decomposition and classical principal component analysis. Moreover, for the practical computation in two-dimensional singular value decomposition, we introduce the marginal eigenvector method, which was proposed for image compression. For three-dimensional singular value decomposition, we also present an iterative algorithm. To evaluate the performances of the marginal eigenvector method and two-dimensional discrete cosine transform for dimension reduction, we compute recognition rates for six datasets of two-dimensional image patterns. To evaluate the performances of the iterative algorithm and three-dimensional discrete cosine transform for dimension reduction, we compute recognition rates for datasets of gait patterns and human organs. For two- and three-dimensional images, the two- and three-dimensional discrete cosine transforms give almost the same recognition rates as the marginal eigenvector method and iterative algorithm, respectively.
1 Introduction
In this paper, we clarify the equivalence between Nth-order tensor principal component analysis (TPCA) and N-dimensional singular value decomposition (NDSVD) for \(N=2, 3\). We also show that the NDSVD is theoretically equivalent to classical principal component analysis (PCA), which is based on a vector representation. We refer to the classical PCA for vectors as vector PCA. Furthermore, for the practical computation in the 2DSVD, we introduce the marginal eigenvector (MEV) method, which was proposed for image compression in 1981 by Otsu. Moreover, we present an extension of the properties represented in second-order tensors [1] to third-order tensors. These results imply that the TPCA for Nth-order tensors can be approximated by the N-dimensional discrete cosine transform (NDDCT). Table 1 summarises the abbreviations that we use in this paper.
A pattern is assumed to be a square-integrable function defined on a finite support in N-dimensional Euclidean space. For planar and volumetric patterns, the dimensions of the Euclidean spaces are two and three, respectively. Organs are essentially spatial textures, that is, functions defined in three-dimensional Euclidean space. Furthermore, for video sequences [2] and volumetric sequences [3], the dimensions of the data spaces are three and four, respectively, since these applications analyse planar and volumetric spatio-temporal data. Moreover, planar multichannel images [4, 5] are also expressed as three-way arrays. For such data, the elements of a two-dimensional array are extended along an additional axis that expresses the channels of each element. Multichannel image pattern recognition has been a central issue in remote sensing of the Earth and planets [6]. In seismic data analysis, the dimension of the data space is five, since waves emitted by a planar source array and migrated to a planar receiver array are analysed [7].
For numerical computation, we deal with sampled patterns. In traditional pattern recognition, these sampled patterns are embedded in a Euclidean space of appropriate dimension as vectors. The other way is to deal with sampled patterns as higher-dimensional array data. These array data are expressed by tensors to preserve the multilinearity of the functions in the original pattern space. Tensors allow us to express multidimensional array data in multilinear forms. Figure 1 illustrates the relations among sampling, vector representation and tensor representation of a multilinear structure.
For applications of modern pattern recognition techniques such as deep learning [8] and machine learning for big data [9], we are mathematically and numerically required to evaluate the performance of tensor-based pattern recognition of multilinear data. Importantly, for fast image pattern recognition, a compact representation of these image data is desirable. Tensor expressions fulfil these requirements in applications of pattern recognition of multidimensional array data.
The vector PCA method is a traditional method for data compression. It has been extended to higher-dimensional array data [10, 11]. For example, the tensor PCA method constructs a small-size tensor using the orthogonal decomposition of a tensor, while the classical PCA (vector PCA) estimates a low-dimensional linear subspace from vectorised data. Second-order TPCA, which directly decomposes an image matrix, is the tensor PCA method [10, 11] for two-dimensional images. A survey [11] reported that there are three basic projections for a tensor. Second- and third-order TPCA use tensor-to-tensor projections consisting of 1- and 2-mode projections for two-dimensional images, and of 1-, 2- and 3-mode projections for three-dimensional images, respectively.
For image representation, two-dimensional principal component analysis (2DPCA) [12] has been proposed. However, the projection method in the 2DPCA is not a bilinear form since the 2DPCA uses only the 2-mode projection. The MEV method [13], which is based on both 1- and 2-mode projections, has been proposed for image compression. The 2DSVD [14, 15], which is also based on both 1- and 2-mode projections, has been proposed for image compression as an extension of the SVD [16]. The projections in the MEV method and the 2DSVD are equivalent to the tensor-to-tensor projection for a second-order tensor. This mathematical property implies that the 2DSVD is a special case of the TPCA. However, the compression rate of the 2DSVD is smaller than that of the 2DDCT [14] for the same reconstruction quality. An iterative algorithm [17] for the second-order TPCA has been proposed. This iterative algorithm is referred to as generalised principal component analysis (GPCA). This GPCA method is a two-dimensional version of the iterative algorithm for the SVD [18, 19].
The origin of the TPCA for third-order tensors is the decomposition of tensors proposed by Tucker [20]. For the Tucker decomposition of second- and third-order tensors, Kroonenberg and de Leeuw [21] discussed the convergence properties of alternating least squares (ALS) algorithms. In general, the Tucker decomposition does not require orthogonality constraints on the decomposed tensors. Cichocki et al. [22] pointed out that the existence of such constraints is the difference between the TPCA and parallel factor analysis. In Ref. [22], in addition to orthogonality constraints, sparsity and nonnegativity constraints for tensor decomposition are studied. For practical computation in higher-order TPCA, the higher-order singular value decomposition (HOSVD) [23] was formulated. In Refs. [23, 24], the ALS algorithm for the smaller-size tensor approximation of higher-order tensors was studied. For pattern recognition, there are variants of the TPCA [10, 25–27]. Multilinear principal component analysis (MPCA) [10] is an iterative algorithm used for the TPCA and is the N-dimensional version of the GPCA. This iterative algorithm is the same as the ALS algorithm. Robust MPCA [25] is a robust version of the TPCA for image pattern recognition with outliers. Uncorrelated MPCA [26] searches for a tensor-to-vector projection that captures most of the variation in the original tensorial input by determining the maximum number of uncorrelated features. Sparse higher-order principal component analysis [27] searches for the minimum number of bases for input tensors by assuming sparsity in the tensor decomposition.
For image pattern recognition, the two-dimensional tensor subspace method (2DTSM) [28], which measures the similarity between an input image and each tensor subspace of a class, has been proposed as an extension of the subspace method [29]. The 2DTSM adopts the MEV method to construct each tensor subspace of a class. We extend the 2DTSM for use with three-dimensional images in this paper.
To evaluate the performances of the MEV method and the 2DDCT for the dimension reduction of two-dimensional images, we compute recognition rates for image patterns. To evaluate the performances of the HOSVD and the NDDCT [30] for three-dimensional images, we compute recognition rates for image sequences in gait classification and voxel images of livers in computational anatomy. For the two-dimensional images, the results show that the MEV method and the 2DDCT have almost the same performances in terms of the recognition rates for images in six datasets. Furthermore, for the three-dimensional images, the results show that the HOSVD and the NDDCT have almost the same performances in terms of the recognition rates for images in a dataset of gait patterns.
The 2DDCT-II is used for coding in MPEG. In MPEG, a digital image of \(n \times n\) pixels is first partitioned into blocks of \(N\times N\) pixels. Usually, the size of each block is \(8 \times 8\) pixels. Then, each \(N\times N\) block is transformed to the frequency domain using the 2DDCT-II. Finally, each transformed value in each block is encoded. If we use MPEG for a sequence of two-dimensional images, that is, for a tensor of size \(N \times N \times N\), we apply the block partitioning and the 2DDCT encoding N times, once for each of the N frames. Note that we use the 2DDCT and 3DDCT without partitioning of the two- and three-dimensional images, respectively, in this paper.
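Although, as noted above, we apply the 2DDCT and 3DDCT without partitioning in this paper, the blockwise MPEG-style step can be sketched as follows. This is a minimal Python illustration only; the use of SciPy's dctn routine and the assumption that the image tiles evenly into blocks are choices of this sketch, not part of the MPEG standard.

```python
import numpy as np
from scipy.fft import dctn, idctn

def blockwise_dct2(image: np.ndarray, block: int = 8) -> np.ndarray:
    """Apply the 2DDCT-II independently to each block x block tile."""
    h, w = image.shape
    assert h % block == 0 and w % block == 0  # assume the image tiles evenly
    out = np.empty_like(image, dtype=float)
    for i in range(0, h, block):
        for j in range(0, w, block):
            out[i:i+block, j:j+block] = dctn(
                image[i:i+block, j:j+block], type=2, norm='ortho')
    return out

# Example: transform a random 32x32 image and check invertibility of one block.
img = np.random.rand(32, 32)
coeffs = blockwise_dct2(img)
block0 = idctn(coeffs[:8, :8], type=2, norm='ortho')
assert np.allclose(block0, img[:8, :8])
```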
2 Tensor projection for images
We briefly summarise the multilinear projection for multidimensional arrays from Ref. [11]. An Nth-order tensor \({\mathcal {X}}\) defined in \({\mathbb {R}}^{I_1 \times I_2 \times \cdots \times I_N}\) is expressed as \({\mathcal {X}} = ({x}_{i_1,i_2,\ldots ,i_N})\) for \({x}_{i_1,i_2,\ldots ,i_N}\in {\mathbb {R}}\), using N indices \(i_n\). Each subscript n denotes the n-mode of \({\mathcal {X}}\). For the outer products of N vectors \(\{ \varvec{u}^{(n)} \in {\mathbb {R}}^{I_n} \}_{n=1}^N\), if the tensor \({\mathcal {X}}\) satisfies the condition \({\mathcal {X}} = \varvec{u}^{(1)} \circ \varvec{u}^{(2)} \circ \cdots \circ \varvec{u}^{(N)}\),
where \(\circ \) denotes the outer product, we call this tensor \({\mathcal {X}}\) a rank-one tensor. For \({\mathcal {X}}\), the n-mode vectors, \(n=1,2,\ldots ,N\), are defined as the \(I_n\)-dimensional vectors obtained from \({\mathcal {X}}\) by varying the index \(i_n\) while fixing all the other indices. The unfolding of \({\mathcal {X}}\) along the n-mode vectors of \({\mathcal {X}}\) is defined as the matrix \({\mathcal {X}}_{(n)} \in {\mathbb {R}}^{I_n \times (\prod _{m \ne n} I_m)}\), where the column vectors of \({\mathcal {X}}_{(n)}\) are the n-mode vectors of \({\mathcal {X}}\). Figure 2 shows two examples of n-mode unfolding for second- and third-order tensors. The n-mode product \({\mathcal {X}}\times _n \varvec{U}\) of a matrix \(\varvec{U} \in {\mathbb {R}}^{J_n\times I_n}\) and a tensor \({\mathcal {X}}\) is a tensor \(\mathcal {G} \in {\mathbb {R}}^{I_1 \times I_2 \times \cdots \times I_{n-1} \times J_n \times I_{n+1}\times \cdots \times I_N}\), with elements \(g_{i_1,\ldots ,i_{n-1},j_n,i_{n+1},\ldots ,i_N} = \sum _{i_n=1}^{I_n} x_{i_1,i_2,\ldots ,i_N}\, u_{j_n,i_n}\), in the manner of Ref. [22]. We define the inner product of two tensors \({\mathcal {X}}=(x_{i_1,i_2,\ldots ,i_N}), {\mathcal {Y}}=(y_{i_1,i_2,\ldots ,i_N}) \in {\mathbb {R}}^{I_1 \times I_2 \times \cdots \times I_N}\) by \(\langle {\mathcal {X}}, {\mathcal {Y}} \rangle = \sum _{i_1=1}^{I_1} \sum _{i_2=1}^{I_2} \cdots \sum _{i_N=1}^{I_N} x_{i_1,i_2,\ldots ,i_N}\, y_{i_1,i_2,\ldots ,i_N}\).
Using this inner product, the Frobenius norm of a tensor \({\mathcal {X}}\) is \(\Vert {\mathcal {X}} \Vert _{\mathrm {F}} = \sqrt{\langle {\mathcal {X}}, {\mathcal {X}} \rangle }\). For the Frobenius norm of a tensor, we have \(\Vert {\mathcal {X}} \Vert _{\mathrm {F}} = \Vert \mathrm {vec}\,{\mathcal {X}} \Vert _2\), where vec and \(\Vert \cdot \Vert _2\) are the vectorisation operator and the Euclidean norm of a vector, respectively. For two tensors \({\mathcal {X}}_1\) and \({\mathcal {X}}_2\), we define the distance between them as \(d({\mathcal {X}}_1, {\mathcal {X}}_2) = \Vert {\mathcal {X}}_1 - {\mathcal {X}}_2 \Vert _{\mathrm {F}}\).
Although this definition is a tensor-based measure, this distance is equivalent to the Euclidean distance between the vectorised tensors \({\mathcal {X}}_1\) and \({\mathcal {X}}_2\).
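As a minimal sketch of these basic operations (the n-mode unfolding, the n-mode product and the Frobenius norm), the following Python fragment uses NumPy. The column ordering of the unfolding is convention dependent; the ordering chosen here is an assumption of this sketch.

```python
import numpy as np

def unfold(X: np.ndarray, n: int) -> np.ndarray:
    # n-mode unfolding: rows are indexed by i_n; the ordering of the columns
    # (the remaining indices) is convention dependent but fixed here.
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1)

def mode_product(X: np.ndarray, U: np.ndarray, n: int) -> np.ndarray:
    # n-mode product X x_n U for U in R^{J_n x I_n}: contract the n-mode
    # index of X with the second index of U, then restore the axis order.
    return np.moveaxis(np.tensordot(U, X, axes=(1, n)), 0, n)

X = np.random.rand(4, 5, 6)
U = np.random.rand(3, 5)
print(unfold(X, 1).shape)            # (5, 24): 1-mode vectors as columns
print(mode_product(X, U, 1).shape)   # (4, 3, 6)
# The Frobenius norm of X equals the Euclidean norm of vec(X).
assert np.isclose(np.linalg.norm(X), np.linalg.norm(X.ravel()))
```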
For a tensor, a multilinear projection maps the input tensor data from one space to another space. We have three basic multilinear projections, that is, the vector-to-vector projection (VVP), tensor-to-vector projection (TVP) and tensor-to-tensor projection (TTP). The VVP is a linear projection from a vector to another vector. To use the VVP for tensors, we need to reshape tensors into vectors before the projection. The TVP, which is also referred to as the rank-one projection [31, 32], consists of elementary multilinear projections (EMPs). An EMP projects a tensor to a scalar. Using d EMPs, the TVP obtains a d-dimensional vector projected from a tensor. The TTP projects a tensor to another tensor of the same order. In this paper, we focus on methods of finding the optimal projection for the TTP.
As the tensor \({\mathcal {X}}\) is in the tensor space \({\mathbb {R}}^{I_1} \otimes {\mathbb {R}}^{I_2} \otimes \cdots \otimes {\mathbb {R}}^{I_N}\), the tensor space can be interpreted as the Kronecker product of N vector spaces \({\mathbb {R}}^{I_1},{\mathbb {R}}^{I_2}, \dots , {\mathbb {R}}^{I_N}\). To project \({\mathcal {X}} \in {\mathbb {R}}^{I_1} \otimes {\mathbb {R}}^{I_2} \otimes \cdots \otimes {\mathbb {R}}^{I_N}\) to another tensor \({\mathcal {Y}}\) in a lower-dimensional tensor space \({\mathbb {R}}^{P_1} \otimes {\mathbb {R}}^{P_2} \otimes \cdots \otimes {\mathbb {R}}^{P_N}\), where \(P_n \le I_n \) for \(n=1,2, \ldots , N\), we need N matrices \(\{ \varvec{U}^{(n)} \in {\mathbb {R}}^{I_n \times P_n} \}_{n=1}^N\). Using the N matrices, the TTP is given by \({\mathcal {Y}} = {\mathcal {X}} \times _1 {\varvec{U}^{(1)}}^{\top } \times _2 {\varvec{U}^{(2)}}^{\top } \times _3 \cdots \times _N {\varvec{U}^{(N)}}^{\top }\).
This projection is established in N steps, where at the nth step, each n-mode vector is projected to a \(P_n\)-dimensional space by \(\varvec{U}^{(n)}\). Figures 3a and 4a show the steps for the projection of second- and third-order tensors to lower-dimensional tensors, respectively. Figures 3b, c and 4b–d show the procedures used to project second- and third-order tensors, respectively.
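Under the same assumptions as the previous fragment, the TTP can be sketched as N successive n-mode products; the orthonormal factor matrices below are generated only for illustration.

```python
import numpy as np

def mode_product(X, U, n):
    # n-mode product X x_n U via tensordot (equivalent to the previous sketch).
    return np.moveaxis(np.tensordot(U, X, axes=(1, n)), 0, n)

def ttp(X, Us):
    # At the nth step, each n-mode vector is projected to a P_n-dimensional
    # space by U^(n) in R^{I_n x P_n}, applied transposed.
    Y = X
    for n, U in enumerate(Us):
        Y = mode_product(Y, U.T, n)
    return Y

X = np.random.rand(10, 12, 14)
# Orthonormal I_n x P_n factors for the target dimensions (3, 4, 5).
Us = [np.linalg.qr(np.random.rand(I, P))[0] for I, P in zip(X.shape, (3, 4, 5))]
print(ttp(X, Us).shape)  # (3, 4, 5)
```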
3 Decomposition of tensors
3.1 Two-dimensional singular value decomposition
A second-order tensor \({\mathcal {X}}\in {\mathbb {R}}^{I_1 \times I_2}\), which is the matrix \(\varvec{X} = (x_{i_1,i_2}) \in {\mathbb {R}}^{I_1 \times I_2} \), is indexed by a pair of indices \((i_1, i_2)\). For a tensor \({\mathcal {X}}\), the unfolding of \({\mathcal {X}}\) is defined by \({\mathcal {X}}_{(1)} = \varvec{X}\) and \({\mathcal {X}}_{(2)} = \varvec{X}^{\top }\). The 1- and 2-mode products of a tensor by a matrix \(\varvec{U}^{\top }\) are given by \({\mathcal {X}} \times _1 \varvec{U}^{\top } = \varvec{U}^{\top } \varvec{X}\) and \({\mathcal {X}} \times _2 \varvec{U}^{\top } = \varvec{X} \varvec{U}\), respectively. For a tensor \({\mathcal {X}}\) in the tensor space \({\mathbb {R}}^{I_1} \otimes {\mathbb {R}}^{I_2}\), using two matrices, we have the TTP \({\mathcal {Y}} = {\mathcal {X}} \times _1 {\varvec{U}^{(1)}}^{\top } \times _2 {\varvec{U}^{(2)}}^{\top }\), which projects \({\mathcal {X}}\) to a lower-dimensional tensor space. Setting \(m=I_1\) and \(n=I_2\) and replacing the tensor \({\mathcal {X}}\) with a matrix \(\varvec{X}\in {\mathbb {R}}^{m\times n}\), we have the TTP \(\varvec{Y} = \varvec{U}^{\top } \varvec{X} \varvec{V}\) for the second-order tensor with a matrix representation. From the reduced matrix \(\varvec{Y}\), we have the reconstruction \(\hat{\varvec{X}} = \varvec{U} \varvec{Y} \varvec{V}^{\top }\).
For a collection of matrices \(\{\varvec{X}_i\}_{i=1}^N \subset {\mathbb {R}}^{m\times n}\) satisfying the zero-expectation condition \(\mathrm {E} (\varvec{X}_i ) = 0\), the orthogonal projection-based data reduction \(\varvec{Y}_i = \varvec{U}^{\top } \varvec{X}_i \varvec{V}\),
where \(\varvec{U}=[\varvec{u}_1,\ldots ,\varvec{u}_m]\) and \(\varvec{V}=[\varvec{v}_1,\ldots ,\varvec{v}_n]\), is performed by minimising the criterion
and maximising the criteria
with respect to the conditions
where \(\varvec{I}_m\) and \(\varvec{I}_n\) are the identity matrices in \({\mathbb {R}}^{m\times m}\) and \({\mathbb {R}}^{n\times n}\), respectively.
Eigendecomposition problems are derived by computing the extremals of
For the covariance matrices \(\varvec{M} = \frac{1}{N}\sum _{i=1}^N \varvec{X}_i\varvec{X}_i^{\top }\) and \(\varvec{N} = \frac{1}{N}\sum _{i=1}^N \varvec{X}_i^{\top }\varvec{X}_i\), the optimisation of \(J_-\) and \(J_+\) derives the eigenvalue decomposition
where \(\varvec{{\varSigma }}\in {\mathbb {R}}^{m\times m}\) and \(\varvec{\varLambda }\in {\mathbb {R}}^{n\times n}\) are diagonal matrices satisfying the relationships \(\lambda _i=\sigma _i\) and \(\mathrm {rank}(\varvec{M})=\mathrm {rank}(\varvec{N})=K\) for \(\varvec{{\varSigma }} = \mathrm {diag}(\sigma _1, \sigma _2, \ldots , \sigma _m)\) and \(\varvec{\varLambda } = \mathrm {diag}(\lambda _1, \lambda _2, \ldots , \lambda _n)\).
The optimisation of \(J_V\) and \(J_U\) derives the eigendecomposition problems in Eq. (25). Using a set of orthonormal vectors \(\{ \varvec{e}_k \}_{k=1}^K\), where only the kth element of \(\varvec{e}_k\) is 1 and the others are 0, we set orthogonal projection matrices \(\varvec{P}_1 = \sum _{k=1}^{k_1} \varvec{e}_{k} \varvec{e}_{k}^{\top } \) and \(\varvec{P}_2 = \sum _{k=1}^{k_2} \varvec{e}_{k} \varvec{e}_{k}^{\top }\). Using these \(\varvec{P}_1\) and \(\varvec{P}_2\), the low-rank matrix approximation [33, 34] is achieved by
where \(\varvec{P}_1\) and \(\varvec{P}_2\) select \(k_1\) and \(k_2\) bases of the orthogonal matrices \(\varvec{U}\) and \(\varvec{V}\), respectively. The low-rank approximation using Eq. (28) is called the 2DSVD method in the context of image compression [14, 15]. Moreover, the method based on the transform \(\varvec{Y} = \varvec{X} \varvec{V}\) is called the 2DPCA [12]. This 2DPCA, proposed by Yang et al., is not a bilinear form since it uses only the 2-mode projection. Therefore, the 2DPCA is not the TPCA for second-order tensors.
For the 2DSVD, we have the following theorem.
Theorem 1
The 2DSVD method is equivalent to the vector PCA method.
Proof
The equation \(\varvec{Y} = \varvec{U}^{\top } \varvec{X} \varvec{V}\) is equivalent to \(\mathrm {vec}\, \varvec{Y} = (\varvec{V} \otimes \varvec{U})^{\top }\, \mathrm {vec}\, \varvec{X}\).
\(\square \)
Furthermore, the 2DDCT is an acceptable approximation of the 2DSVD since the 2DDCT is an acceptable approximation of the PCA for the reduction of images [1, 28, 29]. Moreover, the projection that selects \(K=k_1 k_2\) bases of the tensor space spanned by \(\varvec{u}_i \otimes \varvec{v}_j\), \(i=1,2, \ldots , m\) and \(j=1,2,\ldots ,n\), is \(\mathrm {vec}\, \varvec{Y} = \varvec{P} \varvec{W}^{\top }\, \mathrm {vec}\, \varvec{X}\),
where \(\varvec{W}\) and \(\varvec{P}\) are an orthogonal matrix and a projection matrix, respectively. Therefore, the 2DSVD is equivalent to the TPCA for matrices because matrices are second-order tensors. In our application, an \(n \times n\) digital array is directly compressed by the separable 2DDCT-II using matrix products with \(\mathcal {O}(n^3)\) operations. If we apply the fast Fourier transform to the computation of the 2DDCT-II, the computational complexity is reduced to \(\mathcal {O}(n^2 \log n)\).
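For concreteness, the following sketch writes the 2DDCT-II in the same bilinear form \(\varvec{C} \varvec{X} \varvec{C}^{\top }\) as the 2DSVD projection \(\varvec{U}^{\top } \varvec{X} \varvec{V}\). The explicit construction of the orthonormal DCT-II matrix (instead of an FFT routine) and the truncation to the low-frequency block are illustrative choices of this sketch.

```python
import numpy as np
from scipy.fft import dctn

def dct2_matrix(n: int) -> np.ndarray:
    # Orthonormal DCT-II matrix C, so that C @ X @ C.T is the 2DDCT-II of X.
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    C[0, :] /= np.sqrt(2.0)
    return C

n = 32
X = np.random.rand(n, n)
C = dct2_matrix(n)
Y = C @ X @ C.T          # separable 2DDCT-II by two matrix products
assert np.allclose(Y, dctn(X, type=2, norm='ortho'))
# Keeping the k1 x k2 low-frequency block plays the role of P1 and P2 in Eq. (28).
k1 = k2 = 8
X_hat = C.T[:, :k1] @ Y[:k1, :k2] @ C[:k2, :]   # reconstruction from k1*k2 coefficients
```

Unlike the 2DSVD, the matrix \(\varvec{C}\) is fixed and data independent, which is why no eigendecomposition is needed for the 2DDCT.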
3.2 Three-dimensional singular value decomposition
A third-order tensor \({\mathcal {X}}\in {\mathbb {R}}^{I_1 \times I_2 \times I_3}\), which is the array \(\varvec{X} = (x_{i_1,i_2,i_3}) \in {\mathbb {R}}^{I_1 \times I_2 \times I_3} \), is indexed by a triple of indices \((i_1, i_2, i_3)\). Here we summarise the HOSVD for third-order tensors. For a collection of tensors \(\{{\mathcal {X}}_i\}_{i=1}^N \subset {\mathbb {R}}^{I_1 \times I_2 \times I_3}\) satisfying the zero-expectation condition \(\mathrm {E} ({\mathcal {X}}_i ) = 0\), we compute the projection \({\mathcal {Y}}_i = {\mathcal {X}}_i \times _1 {\varvec{U}^{(1)}}^{\top } \times _2 {\varvec{U}^{(2)}}^{\top } \times _3 {\varvec{U}^{(3)}}^{\top }\), where \(\varvec{U}^{(j)}=[\varvec{u}^{(j)}_1,\dots ,\varvec{u}^{(j)}_{I_j}]\), that minimises the criterion
and maximises the criteria
with respect to the conditions
where \(\varvec{I}_j, \ j=1, 2, 3\), are the identity matrices in \({\mathbb {R}}^{I_j\times I_j}\). For this criterion, by fixing two of \(\varvec{U}^{(1)}\), \(\varvec{U}^{(2)}\) and \(\varvec{U}^{(3)}\), we have the following criteria
where \({\mathcal {X}}_{i,(j)}, \ j=1, 2, 3\), are the j-mode unfoldings of the tensor \({\mathcal {X}}_i\), with respect to Eq. (36).
Eigendecomposition problems are derived by computing the extremals of
As an extension of the two-dimensional problem, we define the system of minimisation problems
For matrices \(\varvec{M}^{(j)} = \frac{1}{N} \sum _{i=1}^N {\mathcal {X}}_{i,(j)}{\mathcal {X}}_{i,(j)}^{\top }\), \(j=1,2,3\), the optimisation of \(J_-\) derives the eigenvalue decomposition
where \(\varvec{{\varSigma }}^{(j)} \in {\mathbb {R}}^{I_j \times I_j}, \ j=1, 2, 3\), are diagonal matrices satisfying the relationships \(\sigma ^{(j)}_k =\sigma ^{(j')}_k\), \(k \in \{ 1, 2, \ldots , K\}\), \(K=\mathrm {rank}(\varvec{M}^{(1)})=\mathrm {rank}(\varvec{M}^{(2)})=\mathrm {rank}(\varvec{M}^{(3)})\), for \(\varvec{{\varSigma }}^{(j)} = \mathrm {diag}(\sigma ^{(j)}_1, \sigma ^{(j)}_2, \ldots , \sigma ^{(j)}_{I_j})\).
The optimisation of each \(J_j\) derives the eigendecomposition problems in Eq. (40). However, for the optimisation of \( \{ J_j \}_{j=1}^3\), there is no closed-form solution to this maximisation problem [23, 24]. Algorithm 1 is the iterative procedure of the MPCA. For Algorithm 1, we have the following property.
Property 1
The MPCA without iteration is equivalent to the HOSVD if the dimensions of the projected tensor coincide with those of the modes of the original tensor.
For third-order tensors, there are \(3!\) combinations for selecting the order of the modes \(\xi _1, \xi _2\) and \(\xi _3 \) in a tensor-to-tensor projection for Algorithm 1. On the other hand, only one combination exists for selecting the order of the modes for second-order tensors.
Property 2
For third-order tensors, the selection of the order of the modes does not affect the results of a tensor-to-tensor projection.
From these two properties, we adopt Algorithm 1 [10] to solve the optimisation of \( \{ J_j \}_{j=1}^3\). For a set of orthonormal vectors \(\{ \varvec{e}_k \}_{k=1}^K\), where only the kth element of \(\varvec{e}_k\) is 1 and the others are 0, we set orthogonal projection matrices \(\varvec{P}^{(j)} = \sum _{k=1}^{k_j} \varvec{e}_{k} \varvec{e}_{k}^{\top }\) for \(j=1, 2, 3\). Using these \(\{ \varvec{P}^{(j)} \}_{j=1}^3\), the low-rank tensor approximation [24] is achieved by
where \(\varvec{P}^{(j)}\) selects \(k_j\) bases of orthogonal matrices \(\varvec{U}^{(j)}\). The low-rank approximation using Eq. (42) is used for compression in the TPCA.
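Since Algorithm 1 itself is not reproduced here, the following Python fragment is a sketch of our reading of the ALS iteration it describes (reusing the unfold and mode_product conventions of Sect. 2): each \(\varvec{U}^{(j)}\) is initialised from the j-mode covariance \(\varvec{M}^{(j)}\), and then updated, mode by mode, from the data projected in the other modes.

```python
import numpy as np

def unfold(X, n):
    # n-mode unfolding, as in the sketch of Sect. 2.
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1)

def mode_product(X, U, n):
    # n-mode product X x_n U, as in the sketch of Sect. 2.
    return np.moveaxis(np.tensordot(U, X, axes=(1, n)), 0, n)

def leading_eigvecs(M, k):
    # Eigenvectors of a symmetric matrix, ordered by decreasing eigenvalue.
    _, vecs = np.linalg.eigh(M)
    return vecs[:, ::-1][:, :k]

def mpca(tensors, ranks, n_iter=3):
    # ALS sketch for a list of equally sized, zero-mean tensors.
    N = tensors[0].ndim
    # Initialisation: leading eigenvectors of the j-mode covariances M^(j).
    Us = [leading_eigvecs(sum(unfold(X, j) @ unfold(X, j).T for X in tensors)
                          / len(tensors), ranks[j]) for j in range(N)]
    for _ in range(n_iter):
        for j in range(N):  # update U^(j) with the other mode projections fixed
            M = np.zeros((tensors[0].shape[j],) * 2)
            for X in tensors:
                Y = X
                for m in range(N):
                    if m != j:
                        Y = mode_product(Y, Us[m].T, m)
                Yj = unfold(Y, j)
                M += Yj @ Yj.T / len(tensors)
            Us[j] = leading_eigvecs(M, ranks[j])
    return Us
```

Setting n_iter to 0 leaves only the initialisation step, which corresponds to the HOSVD of Property 1.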
For the HOSVD for third-order tensors, we have the following theorem.
Theorem 2
The HOSVD method is equivalent to the vector PCA method.
Proof
The equation \({\mathcal {Y}} = {\mathcal {X}} \times _1 {\varvec{U}^{(1)}}^{\top } \times _2 {\varvec{U}^{(2)}}^{\top } \times _3 {\varvec{U}^{(3)}}^{\top }\) is equivalent to \(\mathrm {vec}\, {\mathcal {Y}} = ( \varvec{U}^{(3)} \otimes \varvec{U}^{(2)} \otimes \varvec{U}^{(1)} )^{\top }\, \mathrm {vec}\, {\mathcal {X}}\).
\(\square \)
This theorem implies that the 3DDCT is an acceptable approximation of the HOSVD for third-order tensors since this is analogous to the approximation of the PCA by the 2DDCT [29].
Furthermore, we have the following theorem.
Theorem 3
The compression computed by the HOSVD is equivalent to the compression computed by the TPCA.
Proof
The projection that selects \(K=k_1 k_2 k_3\) bases of the tensor space spanned by \(\varvec{u}^{(1)}_{i_1} \circ \varvec{u}^{(2)}_{i_2} \circ \varvec{u}^{(3)}_{i_3}, \ i_j = 1, 2, \ldots , k_j\) for \(j= 1, 2, 3\), is \(\mathrm {vec}\, {\mathcal {Y}} = \varvec{P} \varvec{W}^{\top }\, \mathrm {vec}\, {\mathcal {X}}\), where \(\varvec{W}\) and \(\varvec{P}\) are an orthogonal matrix and a projection matrix, respectively. Therefore, the HOSVD is equivalent to the TPCA for third-order tensors. \(\square \)
In our application, an \(n \times n \times n\) digital array is directly compressed by the separable 3DDCT-II using matrix products with \(\mathcal {O}(n^4)\) operations. If we apply the fast Fourier transform to the computation of the 3DDCT-II, the computational complexity is reduced to \(\mathcal {O}(n^3 \log n)\).
4 Tensor subspace method
4.1 Second-order tensors
As an extension of the subspace method for vector data, we introduced a linear tensor subspace method for matrices called the 2DTSM [28]. For a matrix \(\varvec{X}\), setting \(\varvec{P}_{\mathrm {L}}\) and \(\varvec{P}_{\mathrm {R}}\) to be orthogonal projections, we call the operation \(\varvec{Y} = \varvec{P}_{\mathrm {L}}^{\top } \varvec{X} \varvec{P}_{\mathrm {R}}\) the orthogonal projection of \(\varvec{X}\) to \(\varvec{Y}\). Therefore, using this expression for a collection of matrices \(\{\varvec{X}_i \}_{i=1}^N\), such that \(\varvec{X}_i \in {{\mathbb {R}}^{m \times n}}\) and \(\mathrm {E}(\varvec{X}_i)=0\), the solutions of
define a bilinear subspace that approximates \(\{\varvec{X}_i \}_{i=1}^N\). Here, the norm \(\Vert \varvec{X} \Vert _{\mathrm {F}}\) for matrix \(\varvec{X}\) represents the Frobenius norm. Therefore, using the solutions of Eq. (47), if a query matrix \(\varvec{G}\) satisfies the condition
we conclude that \(\varvec{G} \in \mathcal {C}_k(\delta )\), \(k=1,2, \ldots , N_{\mathcal {C}}\) for \(N_{\mathcal {C}}\) categories if
In the practical computation to find the projections \(\varvec{P}_{\mathrm {L}}\) and \(\varvec{P}_{\mathrm {R}}\) in Eq. (47), we adopt the MEV method [13]. This projection considers the distributions of the column and row vectors of the sampled images. We define two matrices
Using these two matrices, we have
Furthermore, for the two matrices \(\varvec{M}_{\mathrm {r}}\) and \(\varvec{M}_{\mathrm {c}}\), using the Lagrange multipliers \(\varLambda _{\mathrm {r}}\) and \(\varLambda _{\mathrm {c}}\), we find projections satisfying
where \(\varvec{I}\) is the identity matrix. The solutions of Eqs. (54) and (55) are given as the solutions of the eigenproblems of \(\varvec{M}_{\mathrm {r}}\) and \(\varvec{M}_{\mathrm {c}}\), respectively. We set \(\{ \varvec{u}_j \}_{j=1}^{k_1} \) and \(\{ \varvec{v}_j \}_{j=1}^{k_2} \) as the eigenvectors of \(\varvec{M}_{\mathrm {r}}\) and \(\varvec{M}_{\mathrm {c}}\), respectively. We normalise the eigenvectors of \(\varvec{M}_{\mathrm {r}}\) and \(\varvec{M}_{\mathrm {c}}\) so that \(\Vert \varvec{u}_j \Vert _2 =1\) and \(\Vert \varvec{v}_j \Vert _2=1\) for the eigenvalues \(\lambda _1^{\mathrm {r}} \ge \lambda _2^{\mathrm {r}} \ge \cdots \ge \lambda _j^{\mathrm {r}} \ge \cdots \ge \lambda _m^{\mathrm {r}}\) and \(\lambda _1^{\mathrm {c}} \ge \lambda _2^{\mathrm {c}} \ge \cdots \ge \lambda _j^{\mathrm {c}} \ge \cdots \ge \lambda _n^{\mathrm {c}}\), respectively. Therefore, for given numbers \(k_1 \le m\) and \(k_2 \le n\), the operators \(\varvec{P}_{\mathrm {L}}\) and \(\varvec{P}_{\mathrm {R}}\) are defined as \(\varvec{P}_{\mathrm {L},k_1} = [\varvec{u}_1,\varvec{u}_2,\ldots ,\varvec{u}_{k_1}]\) and \(\varvec{P}_{\mathrm {R},k_2} = [\varvec{v}_1,\varvec{v}_2,\dots , \varvec{v}_{k_2}]\), that is, the matrices consisting of the leading eigenvectors. These projections are equivalent to the projections obtained by the 2DSVD [14] using Eq. (28). Practically, the computational complexity of solving the eigendecomposition problem for an \(n \times n\) matrix is \(\mathcal {O}(n^3)\).
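A minimal sketch of the MEV computation, under the assumption, consistent with Sect. 3.1, that \(\varvec{M}_{\mathrm {r}}\) and \(\varvec{M}_{\mathrm {c}}\) are the averaged row and column covariance matrices:

```python
import numpy as np

def mev_projections(mats, k1, k2):
    # Assumed forms of the marginal covariances (cf. M and N in Sect. 3.1).
    Mr = sum(X @ X.T for X in mats) / len(mats)
    Mc = sum(X.T @ X for X in mats) / len(mats)
    _, U = np.linalg.eigh(Mr)
    _, V = np.linalg.eigh(Mc)
    PL = U[:, ::-1][:, :k1]   # P_L: leading k1 eigenvectors of M_r
    PR = V[:, ::-1][:, :k2]   # P_R: leading k2 eigenvectors of M_c
    return PL, PR

# Compression of a zero-mean sample X and its reconstruction:
# Y = PL.T @ X @ PR;  X_hat = PL @ Y @ PR.T
```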
4.2 Third-order tensors
As an extension of the subspace method for vector data, we introduce a new linear tensor subspace method for third-order tensors. This method is a three-dimensional version of the 2DTSM [28]. For a third-order tensor \({\mathcal {X}}\), setting \(\varvec{P}_j, \ j=1, 2, 3\), to be orthogonal projections, we call the operation \({\mathcal {Y}} = {\mathcal {X}} \times _1 \varvec{P}_1^{\top } \times _2 \varvec{P}_2^{\top } \times _3 \varvec{P}_3^{\top }\) the orthogonal projection of \({\mathcal {X}}\) to \({\mathcal {Y}}\). Therefore, using this expression for a collection of tensors \(\{{\mathcal {X}}_i \}_{i=1}^M\), such that \({\mathcal {X}}_i \in {{\mathbb {R}}^{I_1 \times I_2 \times I_3}}\) and \(\mathrm {E}({\mathcal {X}}_i)=0\), the solutions of Eq. (57) define a trilinear subspace that approximates \(\{{\mathcal {X}}_i \}_{i=1}^M\). Here, the norm \(\Vert {\mathcal {X}} \Vert _{\mathrm {F}}\) for a tensor \({\mathcal {X}}\) is the Frobenius norm. Therefore, using the solutions of Eq. (57), if a query tensor \(\mathcal {G}\) satisfies the condition
we conclude that \(\mathcal {G} \in \mathcal {C}_k(\delta )\), \(k=1,2, \ldots , N_{\mathcal {C}}\) for \(N_{\mathcal {C}}\) categories if
For the practical computation of projections \(\{ \varvec{P}_{k,j} \}_{j=1}^3\), we adopt the iterative method described in Algorithm 1.
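The classification step of the tensor subspace method can be sketched as follows, under our reading of Eqs. (58) and (59): the similarity of a query to a class is taken as the norm of its triple projection relative to its own norm, and the query is assigned to the class of largest similarity (mode_product follows the sketch of Sect. 2).

```python
import numpy as np

def mode_product(X, U, n):
    # n-mode product, as in the sketch of Sect. 2.
    return np.moveaxis(np.tensordot(U, X, axes=(1, n)), 0, n)

def tsm_similarity(G, Ps):
    # Assumed criterion: ||G x_1 P1^T x_2 P2^T x_3 P3^T||_F / ||G||_F.
    Y = G
    for j, P in enumerate(Ps):
        Y = mode_product(Y, P.T, j)
    return np.linalg.norm(Y) / np.linalg.norm(G)

def classify(G, class_bases):
    # Assign the query to the class whose multilinear subspace fits it best.
    return int(np.argmax([tsm_similarity(G, Ps) for Ps in class_bases]))
```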
5 Numerical examples
5.1 Second-order tensors
To validate the relation between the 2DDCT and the MEV method (2DSVD), we compute the recognition rate using six image datasets: cropped versions of the extended YaleB dataset [35], ORL face dataset [36], ETH80 dataset [4], NEC animal dataset [37], MNIST dataset [38] and ETL9G character dataset [39]. Figure 5 shows examples of images belonging to the same class in each dataset. For the validation, we compress images in these datasets by the MEV method and the 2DDCT. Table 2 shows the sizes and the number of images in each dataset and the parameters used in the compression.
Using the compressed images belonging to the same class in each dataset, we compute the cumulative contribution ratio (CCR) for the eigenvalues of the covariance matrices of the 1- and 2-modes for the images.
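Here, for the eigenvalues \(\lambda _1 \ge \lambda _2 \ge \cdots \ge \lambda _n\) of the covariance matrix of each mode, we take the CCR of the k largest eigenvalues in its standard form, \(\mathrm {CCR}(k) = \sum _{i=1}^{k} \lambda _i / \sum _{i=1}^{n} \lambda _i\), so that \(\mathrm {CCR}(n)=1\); this is the definition assumed throughout the plots below.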
Figure 6 shows the CCRs for the six datasets. In Fig. 6a–f, we define the compression ratio as \(32^2/k^2\) for the case of using \(k \in \{1, 2, \ldots , 32\}\) eigenvectors, corresponding to the k largest eigenvalues of each mode to construct the linear subspace. As shown in Fig. 6a–d, the CCR curves for the MEV method and the 2DDCT are coincident. In Fig. 6b, the CCR curves for the 1- and 2-modes are coincident for the MEV method and the 2DDCT. This means that the eigenvalues of the 1- and 2-modes are coincident. In Fig. 6a, c–f, the CCR curves are approximately the same except for a few of the largest eigenvalues.
Figure 7 summarises the computational times of the dimension reduction for the six datasets. For the computation, we use an Intel Xeon X5570 quad-core 2.93-GHz processor. For the computation of the 2DDCT, we construct the DCT-II matrices without using the FFT. Note that the computational times of the TTP in the MEV method and the 2DDCT are the same. In Fig. 7, the computational time of the 2DDCT is smaller than that of the MEV method.
For the validation, we compute the recognition rate using the original and dimension-reduced images of the six datasets with the 2DTSM as the classifier. The MNIST dataset is predivided into training and test data before it is distributed. For the YaleB, ORL, ETH80 and NEC datasets, images labelled with even numbers are used as training data and the other images are used as test data. The recognition rate is defined as the successful label estimation ratio for 1000 label estimations. In each estimation of a label for a query, queries are randomly chosen from the test data. For both the 1- and 2-modes, we evaluate the results for linear subspaces with one to 32 dimensions.
Figure 8a–f shows the recognition rates for the original and compressed images in the six datasets. In Fig. 8, we define the compression ratio as the original dimension divided by \(k^2\), where k is the number of selected principal axes for both the 1- and 2-modes. According to these results, the MEV method and the 2DDCT give almost the same recognition rate.
5.2 Third-order tensors
To validate the relation between the HOSVD and the 3DDCT, we compute recognition rates using the OU-ISIR dataset [2] and the computational anatomy (CA) dataset. Figure 9 shows examples of sequences of silhouette images from two different categories in the OU-ISIR dataset. Figure 10 shows examples of voxel images of human livers in the CA datasets. In Fig. 10, voxel images are rendered using the computed tomography (CT) images of human livers. Table 3 summarises the sizes of tensors of the two datasets. For the compression of the silhouette-image sequences, we use the HOSVD and the 3DDCT. For the practical computation of the HOSVD, we use the iterative method described in Algorithm 1 [10]. If we set the number of iterations to 0 in Algorithm 1, we have the three-dimensional version of the MEV method. If we set the number of bases to the size of the original tensors in Algorithm 1, we call the method full projection (FP). If we set the number of bases to less than the size of the original tensors in Algorithm 1, we call the method full projection truncation (FPT).
Firstly, we observe the properties of the iterative method. Using sequences of silhouette images from a category, we compute the sum of the energies of projected tensors after k iterations and the CCR of the eigenvalues for the three modes. For the computation of tensor products, we select different orders of selection of the modes. Since there are three modes, we have \(3!=6\) orders. For FPT, we set the numbers of bases for each mode to \(64 \times 64 \times 64\) and \(32 \times 32 \times 32\). Figure 11 shows the sum of the energies \({\varPsi }_k\) after every 10 iterations for the FP and FPT with the six different orders of computation of the tensor projection. Figure 12 shows the CCRs of the three modes in the FP for the different orders of computation of the tensor projections. Figure 13 summarises the CCRs of the three modes for projections to the three different sizes. Figure 14 shows the CCRs of each mode for FP and FPT.
In Fig. 11, the iterations do not significantly affect the sum of the energies of the projected tensors. According to both Figs. 11 and 12, changing the order of computation of the tensor projections does not give different results. In Fig. 13, the CCRs of the three decompositions are almost coincident in the three modes. Figure 14 shows that the eigenvalues obtained by FP and FPT are not coincident. The decomposition of tensors for the FPT gives larger eigenvalues with a smaller number of bases than those obtained by FP.
Next, we compute the CCRs of the eigenvalues obtained by 10 iterations of Algorithm 1 for compressed tensors. For the compression of the tensors from \(128\times 88 \times 90\) to \(32 \times 32 \times 32\), we adopt the FP, the FPT and the 3DDCT. Figure 15 shows the CCRs of each mode for the three types of compressed tensor. The tensors compressed by the 3DDCT give larger eigenvalues than those compressed by the FP and the FPT with a smaller number of bases. The FP and the FPT give the same CCRs for each mode.
Thirdly, we compute the recognition rate of the sequences of silhouette images by the TSM. In this validation, we use the original sizes of the tensors and compressed tensors for comparison. For the compression, we adopt the HOSVD, the FP, the FPT and the 3DDCT. Using these four methods, we compress the tensors to the sizes \(32 \times 32 \times 32\), \(16 \times 16 \times 16\) and \(8 \times 8 \times 8\). The OU-ISIR dataset contains sequences of images of 34 people with nine different walking speeds. We use the sequences with walking speeds of 2, 4, 6, 8 and 10 km/h for learning data and the sequences with walking speeds of 3, 5, 7 and 9 km/h for test data. The recognition rate is defined as the successful label estimation ratio for 1000 label estimations. In each estimation of a label for a query, queries are randomly chosen from the test dataset. For the 1-, 2- and 3-modes, we evaluate the results for multilinear subspaces with sizes from one to the dimension of the compressed tensors.
Figure 16a–c shows the recognition rates for the four compression methods with three different sizes of the compressed tensors of the OU-ISIR dataset. For the images of size \(32 \times 32 \times 32\) shown in Fig. 16a, the recognition rates for all four types of compressed tensor are almost coincident with those of the original tensors if the compression ratio is higher than \(10^3\). If the compression ratio is less than \(10^3\), the recognition ratio of the FPT is higher than those of the HOSVD, the FP and the 3DDCT. The recognition ratio of the 3DDCT is lower than those of the other methods since the silhouette images are binary images. For the images of sizes \(16 \times 16 \times 16\) and \(8 \times 8 \times 8\) shown in Fig. 16b, c, although the recognition rates for the four types of compressed tensor are almost the same, the recognition rates are smaller than those for the original tensors. This recognition property depends on the size of the images, and the images used for the comparison are too small to evaluate our methods for the recognition of sequences of silhouette images. In Fig. 16a–c, the HOSVD and the FP give the same recognition rate. These results imply that the decomposition for the FP is independent of the number of iterations.
Fourthly, we compute the recognition rate of the voxel images of human livers by the TSM. For the computation, we use the CA dataset, which contains livers of 25 males and 7 females. We use the voxel images of the livers of 13 males and 4 females as training data. The remaining voxel images are used as test data. In the recognition, we estimate the gender from the voxel images of the livers. The recognition rate is defined as the successful estimation ratio for 1000 gender estimations. In each estimation of the gender for a query, queries are randomly chosen from the test dataset. For the 1-, 2- and 3-modes, we evaluate the results for multilinear subspaces with sizes from one to the dimension of the compressed tensors.
Figure 16d–f shows the recognition rates for the four compression methods with three different sizes of the compressed tensors. For the voxel images of sizes \(32 \times 32 \times 32\) and \(16 \times 16 \times 16\) shown in Fig. 16d, e, the recognition rates for all four types of compressed tensor are almost coincident with those of the original tensors. For the voxel images of size \(8 \times 8 \times 8\) shown in Fig. 16f, the recognition rates for all four types of compressed tensor are almost coincident with those of the original tensors if the compression ratio is higher than \(10^4\). Compared with the recognition rates for the sequences of silhouette images, the 3DDCT gives a better approximation for those of the volumetric liver data, since livers are essentially volumetric objects consisting of textures of liver tissue.
Finally, using an Intel Xeon X5570 quad-core 2.93-GHz processor, we compare the computational times of the four reduction methods for the OU-ISIR data and the CA data. In the comparison, for the TPCA, we set the number of iterations in Algorithm 1 to three. For the computation of the 3DDCT, we construct three DCT-II matrices without using the FFT. Note that the computational times of the TTP in the four methods are the same. Therefore, we show only the mean of the computational time. Figure 17 shows the comparison of the computational times for the two datasets. For both the OU-ISIR and CA datasets, the 3DDCT gives the fastest construction of projection matrices among the four methods.
For voxel images, the 3DDCT gives an acceptable approximation of the HOSVD, the FP and the FPT in the context of pattern recognition. Even for sequences of binary silhouette images, the 3DDCT gives an acceptable approximation of the HOSVD, the FP and the FPT in the context of pattern recognition. Moreover, Figs. 14, 15 and 16a–c show that the differences in the energies of the projected tensors and in the CCRs of the eigenvalues among these decomposition methods are not important in the context of pattern recognition.
6 Conclusions
We first clarified the equivalence between Nth-order tensor principal component analysis and N-dimensional singular value decomposition for \(N=2,3\). Second, we introduced the marginal eigenvector method and the iterative algorithm as practical computation methods for two- and three-dimensional singular value decomposition, respectively. Furthermore, we introduced the N-dimensional discrete cosine transform as an approximation of N-dimensional singular value decomposition. Finally, to evaluate the effects of the N-dimensional discrete cosine transform and N-dimensional singular value decomposition for \(N=2,3\) on tensor pattern recognition, we presented two numerical examples.
For the first example, we presented two validations for two-dimensional images. Using the images compressed by the marginal eigenvector method and the two-dimensional discrete cosine transform, we computed the cumulative contribution ratio of the eigenvalues of a tensor subspace as the first validation. As the second validation, we computed the accuracy of image pattern recognition for images compressed by the marginal eigenvector method and the two-dimensional discrete cosine transform. All the results in these two validations demonstrated the equivalent performance of the marginal eigenvector method and the two-dimensional discrete cosine transform for two-dimensional image pattern recognition.
For the second example, we presented two validations for sequences of two-dimensional images and for voxel images of human livers. Using the sequences of binary images compressed by the iterative algorithm of the higher-order singular value decomposition and by the three-dimensional discrete cosine transform, we computed the cumulative contribution ratio of the eigenvalues of a tensor subspace as the first validation. As the second validation, using the sequence images and the voxel images, we computed the accuracy of tensor pattern recognition for tensors compressed by the iterative algorithm and by the three-dimensional discrete cosine transform. All the results in these two validations demonstrated the equivalent performance of the higher-order singular value decomposition and the three-dimensional discrete cosine transform for third-order tensor pattern recognition. Furthermore, for the decomposition procedure, the results showed that the tensor-to-tensor projection is independent of the order of selection of the modes. Moreover, for the sequences compressed by the higher-order singular value decomposition, the results showed that the cumulative contribution ratio and the recognition ratio are independent of the number of iterations in the decomposition procedure.
These two numerical examples illustrated that the N-dimensional discrete cosine transform can be an acceptable approximation of N-dimensional singular value decomposition for \(N=2,3\) in tensor pattern recognition if we adopt the Euclidean distance as the metric of the pattern space. These examples also imply that the approximation of the higher-order singular value decomposition by the discrete cosine transform may be valid for tensors of order higher than three without an iterative computation method. These approximations are useful for the practical and fast computation of tensor recognition.
References
Itoh, H., Imiya, A., Sakai, T.: Low-dimensional tensor principal component analysis. In: Proceedings of the CAIP, Part I, LNCS, vol. 9256, pp. 715–726 (2015)
Makihara, Y., Mannami, H., Tsuji, A., Hossain, M.A., Sugiura, K., Mori, A., Yagi, Y.: The OU-ISIR gait database comprising the treadmill dataset. IPSJ Trans. Comput. Vis. Appl. 4, 53–62 (2012)
von Siebenthal, M., Cattin, P.H., Gamper, U., Lomax, A., Székely, G.: 4D MR imaging using internal respiratory gating. In: Proceedings of the MICCAI. pp. 336–343 (2005)
Leibe, B., Schiele, B.: Analyzing appearance and contour based methods for object categorization. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), vol. 2, pp. 409–415 (2003)
Everingham, M., Eslami, S.M.A., Gool, L.V., Williams, C.K.I., Winn, J., Zisserman, A.: The PASCAL visual object classes challenge: a retrospective. Int. J. Comput. Vis. 111, 98–136 (2015)
Boyer, K.L., Ünsalan, C.: Multispectral Satellite Image Understanding: From Land Classification to Building and Road Detection. Springer, New York (2012)
Ely, G., Aeron, S., Hao, N., Kilmer, M.E.: 5D seismic data completion and denoising using a novel class of tensor decompositions. Geophysics 80(4), V83–V95 (2015)
Cohen, N., Shashua, A.: Simnets: a generalization of convolutional networks. In: Proceedings of the NIPS Workshop on Deep Learning (2014)
Dean, J., Corrado, G., Monga, R., Chen, K., Devin, M., Mao, M., Ranzato, M., Senior, A., Tucker, P., Yang, K., Le, Q.V., Ng, A.Y.: Large scale distributed deep networks. In: Proceedings of the NIPS, pp 1232–1240 (2012)
Lu, H., Plataniotis, K.N., Venetsanopoulos, A.N.: MPCA: multilinear principal component analysis of tensor objects. IEEE Trans. Neural Netw. 19(1), 18–39 (2008)
Lu, H., Plataniotis, K.N., Venetsanopoulos, A.N.: A survey of multilinear subspace learning for tensor data. Pattern Recognit. 44, 1540–1551 (2011)
Yang, J., Zhang, D., Frangi, A.F., Yang, J.-Y.: Two-dimensional PCA: a new approach to appearance-based face representation and recognition. IEEE Trans. PAMI 26, 131–137 (2004)
Otsu. N.: Mathematical studies on feature extraction in pattern recognition. PhD thesis, Electrotechnical Laboratory (1981)
Aase, S.O., Husoy, J.H., Waldemar, P.: A critique of SVD-based image coding systems. In: Proceedings of the IEEE International Symposium on Circuits and Systems vol. 4, pp. 13–16 (1999)
Ding, C., Ye, J.: Two-dimensional singular value decomposition (2DSVD) for 2D maps and images. In: Proceedings of the SIAM International Conference on Data Mining, pp 32–43 (2005)
Golub, G.H., Van Loan, C.F.: Matrix Computations. The Johns Hopkins University Press, Baltimore (1996)
Ye, J., Janardan, R., Qi, L.: GPCA: an efficient dimension reduction scheme for image compression and retrieval. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 354–363 (2004)
Helmke, U., Moore, J.B.: Singular-value decomposition via gradient and self-equivalent flows. Linear Algebra Appl. 169, 223–248 (1992)
Moore, J.B., Mahony, R.E., Helmke, U.: Numerical gradient algorithms for eigenvalue and singular value calculations. SIAM J. Matrix Anal. Appl. 15, 881–902 (1994)
Tucker, L.: Some mathematical notes on three-mode factor analysis. Psychometrika 31(3), 279–311 (1966)
Kroonenberg, P.M., de Leeuw, J.: Principal component analysis of three-mode data by means of alternating least squares algorithms. Psychometrika 45(1), 69–97 (1980)
Cichocki, A., Zdunek, R., Phan, A.H., Amari, S.: Nonnegative Matrix and Tensor Factorizations. Wiley, Hoboken (2009)
De Lathauwer, L., De Moor, B., Vandewalle, J.: A multilinear singular value decomposition. SIAM J. Matrix Anal. Appl. 21(4), 1253–1278 (2000)
De Lathauwer, L., De Moor, B., Vandewalle, J.: On the best rank-1 and rank-\((r_1, r_2, \ldots , r_N)\) approximation of higher-order tensors. SIAM J. Matrix Anal. Appl. 21(4), 1324–1342 (2000)
Inoue, K., Hara, K., Urahama, K.: Robust multilinear principal component analysis. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 591–597 (2009)
Lu, H., Plataniotis, K.N., Venetsanopoulos, A.N.: Uncorrelated multilinear principal component analysis for unsupervised multilinear subspace learning. IEEE Trans. Neural Netw. 20(11), 1820–1836 (2009)
Allen, G.I.: Sparse higher-order principal components analysis. In: Proceedings of the International Conference on Artificial Intelligence and Statistics, pp. 27–36 (2012)
Itoh, H., Sakai, T., Kawamoto, K., Imiya, A.: Topology-preserving dimension-reduction methods for image pattern recognition. In: Proceedings of the Scandinavian Conference on Image Analysis, pp. 195–204 (2013)
Oja, E.: Subspace Methods of Pattern Recognition. Research Studies Press, Baldock (1983)
Hamidi, M., Pearl, J.: Comparison of the cosine and fourier transforms of Markov-1 signals. IEEE Trans. Acoust. Speech Signal Process. 24(5), 428–429 (1976)
Wang, Y., Gong, S.: Tensor discriminant analysis for view-based object recognition. Proc. CVPR 3, 33–36 (2006)
Hua, G., Viola, P.A., Drucker, S.M.: Face recognition using discriminatively trained orthogonal rank one tensor projections. Proc. CVPR (2007)
Ye, J.: Generalized low rank approximations of matrices. In: Proceedings of the ICML (2004)
Liang, Z., Shi, P.: An analytical algorithm for generalized low-rank approximations of matrices. Pattern Recognit. 38, 2213–2216 (2005)
Georghiades, A.S., Belhumeur, P.N., Kriegman, D.J.: From few to many: illumination cone models for face recognition under variable lighting and pose. IEEE Trans. PAMI 23, 643–660 (2001)
Samaria, F., Harter, A.: Parameterisation of a stochastic model for human face identification. In: Proceedings of the IEEE Workshop on Applications of Computer Vision (1994)
Mobahi, H., Collobert, R., Weston, J.: Deep learning from temporal coherence in video. In: Proceedings of the 26th International Conference on Machine Learning, Montreal, Canada (2009)
LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86, 2278–2324 (1998)
Saito, T., Yamada, H., Yamada, K.: On the data base ETL9 of handprinted characters in JIS Chinese characters and its analysis. IEEJ J. Ind. Appl. 68(4), 757–764 (1985)
Acknowledgments
This research was supported by the “Multidisciplinary Computational Anatomy and Its Application to Highly Intelligent Diagnosis and Therapy” project funded by a Grant-in-Aid for Scientific Research on Innovative Areas from MEXT, Japan, and by Grants-in-Aid for Scientific Research funded by the Japan Society for the Promotion of Science.