1 Introduction

In this paper, we clarify the equivalence between Nth-order tensor principal component analysis (TPCA) and N-dimensional singular value decomposition (NDSVD) for \(N=2, 3\). We also show that the NDSVD is theoretically equivalent to classical principal component analysis (PCA), which is based on a vector representation. We refer to the classical PCA for vectors as vector PCA. Furthermore, for the practical computation of the 2DSVD, we introduce the marginal eigenvector (MEV) method, which was proposed for image compression in 1981 by Otsu. Moreover, we extend the properties established for second-order tensors [1] to third-order tensors. These results imply that the TPCA for Nth-order tensors can be approximated by the N-dimensional discrete cosine transform (NDDCT). Table 1 summarises the abbreviations that we use in this paper.

A pattern is assumed to be a square integrable function defined on a finite support in N-dimensional Euclidean space. For planar and volumetric patterns, the dimensions of the Euclidean spaces are two and three, respectively. Organs are essentially spatial textures, that is, functions defined in three-dimensional Euclidean space. Furthermore, for video sequences [2] and volumetric sequences [3], the dimensions of the data spaces are three and four, respectively, since these applications analyse planar and volumetric spatio-temporal data. Moreover, planar multichannel images [4, 5] are also expressed as three-way arrays. For these data, an additional axis is used to express the frequencies (channels) of the elements of the two-dimensional array. Multichannel image pattern recognition has been a central issue in remote sensing of earth and planets [6]. In seismic data analysis, the dimension of the data space is five, since waves emitted by a planar source array and migrated to a planar receiver array are analysed [7].

For numerical computation, we deal with sampled patterns. In traditional pattern recognition, these sampled patterns are embedded in an appropriate-dimensional Euclidean space as vectors. The other way is to deal with sampled patterns as higher-dimensional array data. These array data are expressed as tensors to preserve the multilinearity of the functions in the original pattern space. Tensors allow us to express multidimensional array data in multilinear forms. Figure 1 illustrates the relation among sampling, vector and tensor representations of multilinear structure.

For applications of modern pattern recognition techniques such as deep learning [8] and machine learning for big data [9], we are mathematically and numerically required to evaluate the performance of tensor-based pattern recognition of multilinear data. Importantly, for fast image pattern recognition, a compact representation of these image data is desirable. Tensor expressions fulfil these requirements in applications of pattern recognition of multidimensional array data.

Table 1 Glossary of abbreviations
Fig. 1

Sampling, vectors and tensors. The sampled values \(f({\varDelta }\varvec{z}), \varvec{z} \in \mathbb {Z}^n\) of a function \(f(\varvec{x}), \varvec{x} \in {\mathbb {R}}^n\) yield an array \(f_{\varvec{z}}, \varvec{z} \in \mathbb {Z}^{m \times m \times \dots \times m}\). This array \(f_{\varvec{z}}\) is expressed as a tensor \(\mathcal {F}\) to preserve the multilinearity of \(f(\varvec{x})\). An interpolation procedure reconstructs \(f(\varvec{x})\) from \(Sf(\varvec{x})\) through \(\mathcal {F}\). The vector \(\varvec{f}\), whose elements are the sample values of \(f(\varvec{z})\), is constructed from \(\mathcal {F}\) by applying the vectorisation operator \(\mathrm {vec}\) to the tensor \(\mathcal {F}\)

The vector PCA method is a traditional method for data compression. It has been extended to higher-dimensional array data [10, 11]. For example, the tensor PCA method constructs a small-size tensor using the orthogonal decomposition of a tensor, whereas the classical PCA (vector PCA) estimates a low-dimensional linear subspace. Second-order TPCA, which directly decomposes an image matrix, is the tensor PCA method [10, 11] for two-dimensional images. A survey [11] reported that there are three basic projections for a tensor. Second- and third-order TPCA use tensor-to-tensor projections consisting of 1- and 2-mode projections, and 1-, 2- and 3-mode projections for two-dimensional and three-dimensional images, respectively.

For image representation, two-dimensional principal component analysis (2DPCA) [12] has been proposed. However, the projection method in the 2DPCA is not a bilinear form since the 2DPCA uses only the 2-mode projection. The MEV method [13], which is based on both 1- and 2-mode projections, has been proposed for image compression. The 2DSVD [14, 15], which is also based on both 1- and 2-mode projections, has been proposed for image compression as an extension of the SVD [16]. The projections in the MEV method and the 2DSVD are equivalent to the tensor-to-tensor projection for a second-order tensor. This mathematical property implies that the 2DSVD is a special case of the TPCA. However, the compression rate of the 2DSVD is smaller than that of the 2DDCT [14] for the same reconstruction quality. An iterative algorithm [17] for the second-order TPCA has been proposed. This iterative algorithm is referred to as generalised principal component analysis (GPCA). This GPCA method is a two-dimensional version of the iterative algorithm for the SVD [18, 19].

The origin of the TPCA for third-order tensors is the decomposition of tensors proposed by Tucker [20]. For the Tucker decomposition of second- and third-order tensors, Kroonenberg and de Leeuw [21] discussed the convergence properties of alternating least squares (ALS) algorithms. In general, the Tucker decomposition does not require orthogonality constraints on the decomposed tensors. Cichocki et al. [22] noted that the existence of these constraints is the difference between the TPCA and parallel factor analysis. In Ref. [22], in addition to orthogonal constraints, sparse constraints and nonnegative constraints for tensor decomposition are studied. For practical computation in higher-order TPCA, the higher-order singular value decomposition (HOSVD) [23] was formulated. In Refs. [23, 24], the ALS algorithm for the smaller-size tensor approximation of higher-order tensors was studied. For pattern recognition, there are variants of the TPCA [10, 25–27]. Multilinear principal component analysis (MPCA) [10] is an iterative algorithm used for TPCA and is the N-dimensional version of the GPCA. This iterative algorithm is the same as the ALS algorithm. Robust MPCA [25] is a robust version of TPCA for image pattern recognition including outliers. Uncorrelated MPCA [26] searches for a tensor-to-vector projection that captures most of the variation in the original tensorial input while producing the maximum number of uncorrelated features. Sparse higher-order principal component analysis [27] searches for the minimum number of bases for input tensors by assuming sparsity in the tensor decomposition.

For image pattern recognition, the two-dimensional tensor subspace method (2DTSM) [28], which measures the similarity between an input image and each tensor subspace of a class, has been proposed as an extension of the subspace method [29]. The 2DTSM adopts the MEV method to construct the tensor subspace of each class. We extend the 2DTSM for use with three-dimensional images in this paper.

To evaluate the performances of the MEV method and the 2DDCT for the dimension reduction of two-dimensional images, we compute recognition rates for image patterns. To evaluate the performances of the HOSVD and the NDDCT [30] for three-dimensional images, we compute recognition rates for image sequences in gait classification and voxel images of livers in computational anatomy. For the two-dimensional images, the results show that the MEV method and the 2DDCT have almost the same performances in terms of the recognition rates for images in six datasets. Furthermore, for the three-dimensional images, the results show that the HOSVD and the NDDCT have almost the same performances in terms of the recognition rates for images in a dataset of gait patterns.

The 2DDCT-II is used for the coding in MPEG. In MPEG, a digital image of \(n \times n\) pixels is first partitioned into blocks of \(N\times N\) pixels. Usually, the size of each block is \(8 \times 8\) pixels. Then, each \(N\times N\) block is transformed to the frequency domain using the 2DDCT-II. Finally, each transformed value in each block is encoded. If we use MPEG for a sequence of two-dimensional images, that is, for a third-order tensor, we apply the block partitioning and the 2DDCT encoding to each of the frames. Note that we use the 2DDCT and 3DDCT without partitioning of the two- and three-dimensional images, respectively, in this paper.

2 Tensor projection for images

We briefly summarise the multilinear projection for multidimensional arrays from Ref. [11]. An Nth-order tensor \({\mathcal {X}}\) defined in \({\mathbb {R}}^{I_1 \times I_2 \times \cdots \times I_N}\) is expressed as

$$\begin{aligned} {\mathcal {X}} = \left( {x}_{i_1,i_2,\ldots ,i_N}\right) \end{aligned}$$
(1)

for \({x}_{i_1,i_2,\ldots ,i_N}\in {\mathbb {R}}\), using N indices \(i_n\). Each subscript n denotes the n-mode of \({\mathcal {X}}\). If the tensor \({\mathcal {X}}\) is expressed as the outer product of N vectors, that is, if it satisfies the condition

$$\begin{aligned} {\mathcal {X}} =\varvec{u}^{(1)} \circ \varvec{u}^{(2)} \circ \cdots \circ \varvec{u}^{(N)}, \end{aligned}$$
(2)

where \(\circ \) denotes the outer product, we call this tensor \({\mathcal {X}}\) a rank-one tensor. For \({\mathcal {X}}\), the n-mode vectors, \(n=1,2,\ldots ,N\), are defined as the \(I_n\)-dimensional vectors obtained from \({\mathcal {X}}\) by varying this index \(i_n\) while fixing all the other indices. The unfolding of \({\mathcal {X}}\) along the n-mode vectors of \({\mathcal {X}}\) is defined as

$$\begin{aligned} {\mathcal {X}}_{(n)} \in {\mathbb {R}}^{I_n \times \left( I_1 \times I_2 \times \cdots \times I_{n-1} \times I_{n+1} \times \cdots \times I_N\right) }, \end{aligned}$$
(3)

where the column vectors of \({\mathcal {X}}_{(n)}\) are the n-mode vectors of \({\mathcal {X}}\). Figure 2 shows two examples of n-mode unfolding for second- and third-order tensors. The n-mode product \({\mathcal {X}}\times _n \varvec{U}\) of a matrix \(\varvec{U} \in {\mathbb {R}}^{J_n\times I_n}\) and a tensor \({\mathcal {X}}\) is a tensor \(\mathcal {G} \in {\mathbb {R}}^{I_1 \times I_2 \times \cdots \times I_{n-1} \times J_n \times I_{n+1}\times \cdots \times I_N}\), with elements

$$\begin{aligned} g_{i_1,i_2,\ldots , i_{n-1},j_n,i_{n+1},\ldots ,i_N} = \sum _{i_n=1}^{I_n} x_{i_1,\ldots ,i_{n-1},i_n,i_{n+1},\ldots ,i_N} u_{j_n,i_n} \end{aligned}$$
(4)

in the manner of Ref. [22]. We define the inner product of two tensors \({\mathcal {X}}=(x_{i_1,i_2,\ldots ,i_N}), {\mathcal {Y}}=(y_{i_1,i_2,\ldots ,i_N}) \in {\mathbb {R}}^{I_1 \times I_2 \times \cdots \times I_N}\) by

$$\begin{aligned} \langle {\mathcal {X}}, {\mathcal {Y}} \rangle = \sum _{i_1} \sum _{i_2} \cdots \sum _{i_N} {x}_{i_1,i_2,\ldots ,i_N} {y}_{i_1,i_2,\ldots ,i_N}. \end{aligned}$$
(5)

Using this inner product, the Frobenius norm of a tensor \({\mathcal {X}}\) is

$$\begin{aligned} \Vert {\mathcal {X}} \Vert _{\mathrm {F}} = \sqrt{\langle {\mathcal {X}}, {\mathcal {X}} \rangle }. \end{aligned}$$
(6)

For the Frobenius norm of a tensor, we have

$$\begin{aligned} \Vert {\mathcal {X}} \Vert _{\mathrm {F}} = \Vert \mathrm {vec} \ {\mathcal {X}} \Vert _2, \end{aligned}$$
(7)

where \(\mathrm {vec}\) and \(\Vert \cdot \Vert _2\) are the vectorisation operator and the Euclidean norm of a vector, respectively. For the two tensors \({\mathcal {X}}_1\) and \({\mathcal {X}}_2\), we define the distance between them as

$$\begin{aligned} d\left( {\mathcal {X}}_1, {\mathcal {X}}_2 \right) = \left\| {\mathcal {X}}_1 - {\mathcal {X}}_2\right\| _{\mathrm {F}}. \end{aligned}$$
(8)

Although this definition is a tensor-based measure, this distance is equivalent to the Euclidean distance between the vectorised tensors \({\mathcal {X}}_1\) and \({\mathcal {X}}_2\).
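As a concrete illustration of these definitions, the following sketch in Python with NumPy (the helper names, sizes and random data are ours, not taken from the paper) computes an n-mode unfolding, an n-mode product as in Eq. (4) and checks the norm identity of Eq. (7).

import numpy as np

def unfold(X, n):
    # n-mode unfolding (Eq. (3)): rows are the n-mode vectors; n is 0-based here
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1)

def mode_product(X, U, n):
    # n-mode product X x_n U (Eq. (4)): multiply every n-mode vector of X by U
    return np.moveaxis(np.tensordot(U, X, axes=([1], [n])), 0, n)

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 5, 3))     # a third-order tensor
U = rng.standard_normal((2, 5))        # maps the 2-mode vectors from R^5 to R^2

print(unfold(X, 1).shape)              # (5, 12)
print(mode_product(X, U, 1).shape)     # (4, 2, 3)

# Frobenius norm of a tensor equals the Euclidean norm of its vectorisation, Eq. (7)
print(np.isclose(np.linalg.norm(X), np.linalg.norm(X.reshape(-1))))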

Fig. 2

Unfoldings of second- and third-order tensors. a 1- and 2-mode unfoldings of a second-order tensor \({\mathcal {X}} \in {\mathbb {R}}^{6 \times 8}\). b 1-, 2- and 3-mode unfoldings of the third-order tensor \({\mathcal {X}} \in {\mathbb {R}}^{4 \times 5 \times 3}\)

For a tensor, a multilinear projection maps the input tensor data from one space to another space. We have three basic multilinear projections, that is, the vector-to-vector projection (VVP), tensor-to-vector projection (TVP) and tensor-to-tensor projection (TTP). The VVP is a linear projection from a vector to another vector. To use the VVP for tensors, we need to reshape tensors into vectors before the projection. The TVP, which is also referred to as the rank-one projection [31, 32], consists of elementary multilinear projections (EMPs). An EMP projects a tensor to a scalar. Using d EMPs, the TVP obtains a d-dimensional vector projected from a tensor. The TTP projects a tensor to another tensor of the same order. In this paper, we focus on methods of finding the optimal projection for the TTP.

As the tensor \({\mathcal {X}}\) is in the tensor space \({\mathbb {R}}^{I_1} \otimes {\mathbb {R}}^{I_2} \otimes \cdots \otimes {\mathbb {R}}^{I_N}\), the tensor space can be interpreted as the Kronecker product of N vector spaces \({\mathbb {R}}^{I_1},{\mathbb {R}}^{I_2}, \dots , {\mathbb {R}}^{I_N}\). To project \({\mathcal {X}} \in {\mathbb {R}}^{I_1} \otimes {\mathbb {R}}^{I_2} \otimes \cdots \otimes {\mathbb {R}}^{I_N}\) to another tensor \({\mathcal {Y}}\) in a lower-dimensional tensor space \({\mathbb {R}}^{P_1} \otimes {\mathbb {R}}^{P_2} \otimes \cdots \otimes {\mathbb {R}}^{P_N}\), where \(P_n \le I_n \) for \(n=1,2, \ldots , N\), we need N matrices \(\{ \varvec{U}^{(n)} \in {\mathbb {R}}^{I_n \times P_n} \}_{n=1}^N\). Using the N matrices, the TTP is given by

$$\begin{aligned} {{\mathcal {Y}}} = {\mathcal {X}} \times _{1} \varvec{U}^{(1) \top } \times _{2} \varvec{U}^{(2) \top } \cdots \times _{N} \varvec{U}^{(N) \top }. \end{aligned}$$
(9)

This projection is established in N steps, where at the nth step, each n-mode vector is projected to a \(P_n\)-dimensional space by \(\varvec{U}^{(n)}\). Figures 3a and 4a show the steps for the projection of second- and third-order tensors to lower-dimensional tensors, respectively. Figures 3b, c and 4b–d show the procedures used to project second- and third-order tensors, respectively.
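A tensor-to-tensor projection as in Eq. (9) can thus be sketched as N successive n-mode products; the following self-contained Python fragment is a minimal illustration with sizes chosen to match Fig. 4 (the function names are ours).

import numpy as np

def mode_product(X, U, n):
    # n-mode product X x_n U for a 0-based mode index n
    return np.moveaxis(np.tensordot(U, X, axes=([1], [n])), 0, n)

def ttp(X, Us):
    # tensor-to-tensor projection, Eq. (9): apply U^(n)T along each mode n
    Y = X
    for n, U in enumerate(Us):
        Y = mode_product(Y, U.T, n)
    return Y

rng = np.random.default_rng(1)
X = rng.standard_normal((4, 5, 3))
Us = [rng.standard_normal((4, 3)),     # U^(1) in R^{I_1 x P_1}
      rng.standard_normal((5, 2)),     # U^(2) in R^{I_2 x P_2}
      rng.standard_normal((3, 1))]     # U^(3) in R^{I_3 x P_3}
print(ttp(X, Us).shape)                # (3, 2, 1)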

Fig. 3

Tensor-to-tensor projection of a second-order tensor \({\mathcal {X}} \in {\mathbb {R}}^{6 \times 8}\) to a lower-dimensional tensor \(\mathcal {Y} \in \mathbb {R}^{3 \times 3}\). a Tensor-to-tensor projection for a tensor \({\mathcal {X}}\). b 1-mode projection for \({\mathcal {X}}\) represented by a linear projection. c 2-mode projection for \({\mathcal {X}}\) represented by a linear projection

Fig. 4

Tensor-to-tensor projection of a third-order tensor \({\mathcal {X}} \in {\mathbb {R}}^{4 \times 5 \times 3}\) to a lower-dimensional tensor \({\mathcal {Y}} \in {\mathbb {R}}^{3 \times 2 \times 1}\). a Tensor-to-tensor projection for a tensor \({\mathcal {X}}\). b 1-mode projection for \({\mathcal {X}}\) represented by a linear projection. c 2-mode projection for \({\mathcal {X}}\) represented by a linear projection. d 3-mode projection for \({\mathcal {X}}\) represented by a linear projection

3 Decomposition of tensors

3.1 Two-dimensional singular value decomposition

A second-order tensor \({\mathcal {X}}\in {\mathbb {R}}^{I_1 \times I_2}\), which is the matrix \(\varvec{X} = (x_{i_1,i_2}) \in {\mathbb {R}}^{I_1 \times I_2} \), is indexed by a pair of indices \((i_1, i_2)\). For a tensor \({\mathcal {X}}\), the unfolding of \({\mathcal {X}}\) is defined by

$$\begin{aligned} {\mathcal {X}}_{(1)} = \varvec{X} \in {\mathbb {R}}^{I_1 \times I_2}, \ \ \ {\mathcal {X}}_{(2)} = \varvec{X}^{\top } \in {\mathbb {R}}^{I_2 \times I_1}. \end{aligned}$$
(10)

The 1- and 2-mode products of a tensor by a matrix \(\varvec{U}^{\top }\) are given by

$$\begin{aligned} {\mathcal {X}} \times _1 \varvec{U}^{\top } = \varvec{U}^{\top }\varvec{X}, \ \ \ {\mathcal {X}} \times _2 \varvec{U}^{\top } = \varvec{U}^{\top }\varvec{X}^{\top }, \end{aligned}$$
(11)

respectively. For a tensor \({\mathcal {X}}\) in the tensor space \({\mathbb {R}}^{I_1} \otimes {\mathbb {R}}^{I_2}\), using two matrices, we have the TTP

$$\begin{aligned} {\mathcal {Y}} = {\mathcal {X}} \times _1 \varvec{U}^{(1)\top } \times _2 \varvec{U}^{(2)\top }, \end{aligned}$$
(12)

which projects \({\mathcal {X}}\) to a lower-dimensional tensor space. Replacing the tensor \({\mathcal {X}}\) with the matrix \(\varvec{X}\in {\mathbb {R}}^{m\times n}\), where \(m=I_1\) and \(n=I_2\), we have the TTP

$$\begin{aligned} \varvec{Y} = \varvec{U}^{(1)\top } \varvec{X} \varvec{U}^{(2)} \end{aligned}$$
(13)

for the second-order tensor with a matrix representation. From the reduced matrix \(\varvec{Y}\), we have the reconstruction given by

$$\begin{aligned} \varvec{X} = \varvec{U}^{(1)} \varvec{Y} \varvec{U}^{(2) \top }. \end{aligned}$$
(14)

For a collection of matrices \(\{\varvec{X}_i\}_{i=1}^N \in {\mathbb {R}}^{m\times n}\) satisfying the zero expectation condition \(\mathrm {E} (\varvec{X}_i ) = 0\), the orthogonal projection-based data reduction

$$\begin{aligned} \hat{\varvec{X}_i} = \varvec{U}^{\top } \varvec{X}_i \varvec{V}, \end{aligned}$$
(15)

where \(\varvec{U}=[\varvec{u}_1,\ldots ,\varvec{u}_m]\) and \(\varvec{V}=[\varvec{v}_1,\ldots ,\varvec{v}_n]\), is performed by minimising the criterion

$$\begin{aligned} J_- = \mathrm {E} \left( \Vert \varvec{X}_i-\varvec{U} \hat{\varvec{X}_i}\varvec{V}^{\top } \Vert _{\mathrm {F}}^2 \right) \end{aligned}$$
(16)

and maximising the criteria

$$\begin{aligned} J_+= & {} \mathrm {E} \left( \left\| \varvec{U}^\top \varvec{X}_i\varvec{V} \right\| ^2_{\mathrm {F}} \right) , \end{aligned}$$
(17)
$$\begin{aligned} J_V= & {} \mathrm {E} \left( \left\| \varvec{V}^\top \varvec{X}_i^\top \varvec{X}_i\varvec{V} \right\| ^2_{\mathrm {F}} \right) , \end{aligned}$$
(18)
$$\begin{aligned} J_U= & {} \mathrm {E} \left( \left\| \varvec{U}^\top \varvec{X}_i\varvec{X}_i^{\top }\varvec{U} \right\| ^2_{\mathrm {F}} \right) , \end{aligned}$$
(19)

with respect to the conditions

$$\begin{aligned} \varvec{U}^\top \varvec{U}=\varvec{I}_m \quad \mathrm {and }\quad \varvec{V}^\top \varvec{V}=\varvec{I}_n, \end{aligned}$$
(20)

where \(\varvec{I}_m\) and \(\varvec{I}_n\) are the identity matrices in \({\mathbb {R}}^{m\times m}\) and \({\mathbb {R}}^{n\times n}\), respectively.

Eigen decomposition problems are derived by computing the extremals of

$$\begin{aligned} E_-= & {} J_- +tr\left( (\varvec{I}-\varvec{V}^{\top } \varvec{V}) \varvec{\varLambda }\right) + tr\left( (\varvec{I}-\varvec{U}^{\top } \varvec{U})\varvec{{\varSigma }}\right) , \end{aligned}$$
(21)
$$\begin{aligned} E_+= & {} J_+ +tr\left( (\varvec{I}-\varvec{V}^{\top } \varvec{V})\varvec{\varLambda }\right) + tr\left( (\varvec{I}-\varvec{U}^{\top } \varvec{U})\varvec{{\varSigma }}\right) , \end{aligned}$$
(22)
$$\begin{aligned} E_V= & {} J_V+ tr\left( (\varvec{I}-\varvec{V}^{\top } \varvec{V})\varvec{\varLambda }\right) , \end{aligned}$$
(23)
$$\begin{aligned} E_U= & {} J_U+ tr\left( (\varvec{I}-\varvec{U}^{\top } \varvec{U})\varvec{{\varSigma }}\right) . \end{aligned}$$
(24)

For the covariance matrices \(\varvec{M} = \frac{1}{N}\sum _{i=1}^N \varvec{X}_i\varvec{X}_i^{\top }\) and \(\varvec{N} = \frac{1}{N}\sum _{i=1}^N \varvec{X}_i^{\top }\varvec{X}_i\), the optimisation of \(J_-\) and \(J_+\) derives the eigenvalue decomposition

$$\begin{aligned} \varvec{M}\varvec{U}=\varvec{U}\varvec{{\varSigma }} \ \ \mathrm {and} \ \ \varvec{N}\varvec{V}=\varvec{V}\varvec{\varLambda }, \end{aligned}$$
(25)

where \(\varvec{{\varSigma }}\in {\mathbb {R}}^{m\times m}\) and \(\varvec{\varLambda }\in {\mathbb {R}}^{n\times n}\) are diagonal matrices satisfying the relationships \(\lambda _i=\sigma _i\) and \(\mathrm {rank}(\varvec{M})=\mathrm {rank}(\varvec{N})=K\) for

$$\begin{aligned} \varvec{{\varSigma }}= & {} \mathrm {diag} (\sigma _1,\sigma _2,\ldots , \sigma _K,0,\ldots ,0), \end{aligned}$$
(26)
$$\begin{aligned} \varvec{\varLambda }= & {} \mathrm {diag} (\lambda _1,\lambda _2,\ldots , \lambda _K,0,\ldots ,0). \end{aligned}$$
(27)

The optimisation of \(J_V\) and \(J_U\) derives the eigendecomposition problems in Eq. (25). Using a set of orthonormal vectors \(\{ \varvec{e}_k \}_{k=1}^K\), where only the kth element of \(\varvec{e}_k\) is 1 and the others are 0, we set orthogonal projection matrices \(\varvec{P}_1 = \sum _{k=1}^{k_1} \varvec{e}_{k} \varvec{e}_{k}^{\top } \) and \(\varvec{P}_2 = \sum _{k=1}^{k_2} \varvec{e}_{k} \varvec{e}_{k}^{\top }\). Using these \(\varvec{P}_1\) and \(\varvec{P}_2\), the low-rank matrix approximation [33, 34] is achieved by

$$\begin{aligned} \varvec{Y}_i = \left( \varvec{P}_1\varvec{U}\right) ^{\top } \varvec{X}_i \left( \varvec{P}_2 \varvec{V}\right) = \varvec{L}^{\top }\varvec{X}_i \varvec{R}, \end{aligned}$$
(28)

where \(\varvec{P}_1\) and \(\varvec{P}_2\) select \(k_1\) and \(k_2\) bases from the orthogonal matrices \(\varvec{U}\) and \(\varvec{V}\), respectively. The low-rank approximation using Eq. (28) is called the 2DSVD method in the context of image compression [14, 15]. Moreover, the method based on the transform

$$\begin{aligned} \varvec{Y}_i=\varvec{X}_i\varvec{R} \end{aligned}$$
(29)

is called the 2DPCA [12]. This 2DPCA, proposed by Yang and Zhang, is not a bilinear form. Therefore, the 2DPCA is not the TPCA for second-order tensors.
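The following Python sketch outlines the 2DSVD compression of Eq. (28) under the stated zero-mean assumption; the variable names, sample sizes and random data are illustrative assumptions of ours.

import numpy as np

def two_dsvd_bases(Xs, k1, k2):
    # covariance matrices of Eq. (25) and their leading eigenvectors
    M = sum(X @ X.T for X in Xs) / len(Xs)
    Nc = sum(X.T @ X for X in Xs) / len(Xs)
    wm, U = np.linalg.eigh(M)
    wn, V = np.linalg.eigh(Nc)
    L = U[:, np.argsort(wm)[::-1][:k1]]    # P_1 U: keep the k1 leading bases
    R = V[:, np.argsort(wn)[::-1][:k2]]    # P_2 V: keep the k2 leading bases
    return L, R

rng = np.random.default_rng(2)
Xs = [rng.standard_normal((32, 32)) for _ in range(100)]
mean = sum(Xs) / len(Xs)
Xs = [X - mean for X in Xs]                # enforce E(X_i) = 0
L, R = two_dsvd_bases(Xs, 8, 8)
Y0 = L.T @ Xs[0] @ R                       # compressed matrix, Eq. (28)
X0_rec = L @ Y0 @ R.T                      # reconstruction, cf. Eq. (14)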

For the 2DSVD, we have the following theorem.

Theorem 1

The 2DSVD method is equivalent to the vector PCA method.

Proof

The equation

$$\begin{aligned} (\varvec{P}_1\varvec{U})^{\top }\varvec{X}(\varvec{P}_2\varvec{V}) = \varvec{Y} \end{aligned}$$
(30)

is equivalent to

$$\begin{aligned} \left( \varvec{P}_2\varvec{V}\otimes \varvec{P}_1\varvec{U}\right) ^{\top } \mathrm {vec}\varvec{X} = \mathrm {vec} \varvec{Y}. \end{aligned}$$
(31)

\(\square \)
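Theorem 1 rests on the identity \(\mathrm {vec}(\varvec{A}^{\top }\varvec{X}\varvec{B}) = (\varvec{B} \otimes \varvec{A})^{\top }\mathrm {vec}\,\varvec{X}\) for the column-stacking vectorisation; a quick numerical check of Eqs. (30) and (31) in Python (our notation) is given below.

import numpy as np

rng = np.random.default_rng(3)
m, n, k1, k2 = 6, 5, 3, 2
X = rng.standard_normal((m, n))
A = rng.standard_normal((m, k1))       # plays the role of P_1 U
B = rng.standard_normal((n, k2))       # plays the role of P_2 V

vec = lambda Z: Z.flatten(order="F")   # column-stacking vectorisation

lhs = np.kron(B, A).T @ vec(X)         # (P_2 V kron P_1 U)^T vec X, cf. Eq. (31)
rhs = vec(A.T @ X @ B)                 # vec Y with Y = (P_1 U)^T X (P_2 V), Eq. (30)
print(np.allclose(lhs, rhs))           # True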

Furthermore, the 2DDCT is an acceptable approximation of the 2DSVD since the 2DDCT is an acceptable approximation of the PCA for the reduction of images [1, 28, 29]. Moreover, the projection that selects \(K=k_1 k_2\) bases of the tensor space spanned by \(\varvec{u}_i \otimes \varvec{v}_j\), \(i=1,2, \ldots , m\) and \(j=1,2,\ldots ,n\), is

$$\begin{aligned} \left( \varvec{P}_2\varvec{V}\otimes \varvec{P}_1\varvec{U}\right) =\left( \varvec{P}_2\otimes \varvec{P}_1\right) \left( \varvec{V}\otimes \varvec{U}\right) = \varvec{P}\varvec{W}, \end{aligned}$$
(32)

where \(\varvec{W}\) and \(\varvec{P}\) are an orthogonal matrix and a projection matrix, respectively. Therefore, the 2DSVD is equivalent to the TPCA for matrices because matrices are second-order tensors. In our application, an \(n \times n\) digital array is directly compressed by the 2DDCT-II with order \(\mathcal {O}(n^2)\). If we apply the fast Fourier transform to the computation of the 2DDCT-II, the computational complexity is \(\mathcal {O}(n \log n)\).
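In other words, the data-dependent bases \(\varvec{U}\) and \(\varvec{V}\) of the 2DSVD may be replaced by the fixed DCT-II basis. The sketch below builds the orthonormal DCT-II matrix in the standard way and uses its lowest-frequency rows in place of \(\varvec{P}_1\varvec{U}\) and \(\varvec{P}_2\varvec{V}\); the sizes are illustrative.

import numpy as np

def dct_ii_matrix(n):
    # orthonormal DCT-II matrix C in R^{n x n}; row k is the kth DCT-II basis vector
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    C[0, :] /= np.sqrt(2.0)
    return C

n, k = 32, 8
C = dct_ii_matrix(n)
X = np.random.default_rng(4).standard_normal((n, n))

Y = C[:k, :] @ X @ C[:k, :].T          # 2DDCT-based analogue of Eq. (28)
X_rec = C[:k, :].T @ Y @ C[:k, :]      # approximate reconstruction from k x k coefficients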

3.2 Three-dimensional singular value decomposition

A third-order tensor \({\mathcal {X}}\in {\mathbb {R}}^{I_1 \times I_2 \times I_3}\), which is the array \(\varvec{X} = (x_{i_1,i_2,i_3}) \in {\mathbb {R}}^{I_1 \times I_2 \times I_3} \), is indexed by a triple of indices \((i_1, i_2, i_3)\). Here we summarise the HOSVD for third-order tensors. For a collection of tensors \(\{{\mathcal {X}}_i\}_{i=1}^N \in {\mathbb {R}}^{I_1 \times I_2 \times I_3}\) satisfying the zero expectation condition \(\mathrm {E} ({\mathcal {X}}_i ) = 0\), we compute the projection

$$\begin{aligned} \hat{{\mathcal {X}}_i} = {\mathcal {X}}_i \times _1 \varvec{U}^{(1)\top } \times _2 \varvec{U}^{(2)\top } \times _3 \varvec{U}^{(3)\top }, \end{aligned}$$
(33)

where \(\varvec{U}^{(j)}=[\varvec{u}^{(j)}_1,\dots ,\varvec{u}^{(j)}_{I_j}]\), that minimises the criterion

$$\begin{aligned} J_- = \mathrm {E} \left( \Vert {\mathcal {X}}_i- \hat{{\mathcal {X}}}_i \times _1 \varvec{U}^{(1)} \times _2 \varvec{U}^{(2)} \times _3 \varvec{U}^{(3)} \Vert _{\mathrm {F}}^2 \right) \end{aligned}$$
(34)

and maximises the criterion

$$\begin{aligned} J_+ = \mathrm {E} \left( \left\| \hat{{\mathcal {X}}_i} \right\| ^2_{\mathrm {F}} \right) , \ \end{aligned}$$
(35)

with respect to the conditions

$$\begin{aligned} \varvec{U}^{(j)\top }\varvec{U}^{(j)} =\varvec{I}_j, \end{aligned}$$
(36)

where \(\varvec{I}_j\), \(j=1, 2, 3\), is the identity matrix in \({\mathbb {R}}^{I_j\times I_j}\). For this criterion, by fixing two of \(\varvec{U}^{(1)}\), \(\varvec{U}^{(2)}\) and \(\varvec{U}^{(3)}\), we have the following criteria

$$\begin{aligned} J_j = \mathrm {E} \left( \left\| \varvec{U}^{(j)\top } {\mathcal {X}}_{i,(j)}{\mathcal {X}}_{i,(j)}^{\top } \varvec{U}^{(j)} \right\| ^2_{\mathrm {F}} \right) , \end{aligned}$$
(37)

where \({\mathcal {X}}_{i,(j)}\), \(j=1, 2, 3\), is the j-mode unfolding of the tensor \({\mathcal {X}}_i\), with respect to Eq. (36).

Eigendecomposition problems are derived by computing the extremal of

$$\begin{aligned} E_-= & {} J_- + \sum _{j=1}^{3} tr((\varvec{I}_j - \varvec{U}^{(j)\top }\varvec{U}^{(j)}) \varvec{{\varSigma }}^{(j)} ). \end{aligned}$$
(38)

As an extension of the two-dimensional problem, we define the system of minimisation problems

$$\begin{aligned} E_j= & {} J_j+ tr((\varvec{I}_j - \varvec{U}^{(j)\top }\varvec{U}^{(j)} ) \varvec{{\varSigma }}^{(j)}), \ j=1, 2, 3. \end{aligned}$$
(39)

For matrices \(\varvec{M}^{(j)} = \frac{1}{N} \sum _{i=1}^N {\mathcal {X}}_{i,(j)}{\mathcal {X}}_{i,(j)}^{\top }\), \(j=1,2,3\), the optimisation of \(J_-\) derives the eigenvalue decomposition

$$\begin{aligned} \varvec{M}^{(j)} \varvec{U}^{(j)}=\varvec{U}^{(j)}\varvec{{\varSigma }}^{(j)}, \end{aligned}$$
(40)

where \(\varvec{{\varSigma }}^{(j)} \in {\mathbb {R}}^{I_j \times I_j}, \ j=1, 2, 3\), are diagonal matrices satisfying the relationships \(\sigma ^{(j)}_k =\sigma ^{(j')}_k\), \(k \in \{ 1, 2, \ldots , K\}\), \(K=\mathrm {rank}(\varvec{M}^{(1)})=\mathrm {rank}(\varvec{M}^{(2)})=\mathrm {rank}(\varvec{M}^{(3)})\) for

$$\begin{aligned} \varvec{{\varSigma }}^{(j)}= & {} \mathrm {diag} \left( \sigma ^{(j)}_1,\sigma ^{(j)}_2,\ldots , \sigma ^{(j)}_K,0,\ldots ,0\right) . \end{aligned}$$
(41)

The optimisation of each \(J_j\) derives the eigendecomposition problems in Eq. (40). However, for the optimisation of \( \{ J_j \}_{j=1}^3\), there is no closed-form solution to this maximisation problem [23, 24]. Algorithm 1 is the iterative procedure of the MPCA. For Algorithm 1, we have the following property.

Property 1

The MPCA without iteration is equivalent to the HOSVD if the dimensions of the projected tensor coincide with those of the modes of the original tensor.

For third-order tensors, there are 3! combinations in selecting the order of modes \(\xi _1, \xi _2\) and \(\xi _3 \) in a tensor-to-tensor projection for Algorithm 1. On the other hand, one combination exists in selecting the order of modes for the second-order tensors.

Property 2

For third-order tensors, the selection of the order of modes does not affect the results of a tensor-to-tensor projection.

From these two properties, we adopt Algorithm 1 [10] to solve the optimisation of \( \{ J_j \}_{j=1}^3\). For a set of orthonormal vectors \(\{ \varvec{e}_k \}_{k=1}^K\), where only the kth element of \(\varvec{e}_k\) is 1 and the others are 0, we set orthogonal projection matrices \(\varvec{P}^{(j)} = \sum _{k=1}^{k_j} \varvec{e}_{k} \varvec{e}_{k}^{\top }\) for \(j=1, 2, 3\). Using these \(\{ \varvec{P}^{(j)} \}_{j=1}^3\), the low-rank tensor approximation [24] is achieved by

$$\begin{aligned} {{\mathcal {Y}}} = {\mathcal {X}} \times _1 \left( \varvec{P}^{(1)}\varvec{U}^{(1)}\right) ^{\top } \times _2 \left( \varvec{P}^{(2)}\varvec{U}^{(2)}\right) ^{\top } \times _3 \left( \varvec{P}^{(3)}\varvec{U}^{(3)}\right) ^{\top }, \end{aligned}$$
(42)

where \(\varvec{P}^{(j)}\) selects \(k_j\) bases of orthogonal matrices \(\varvec{U}^{(j)}\). The low-rank approximation using Eq. (42) is used for compression in the TPCA.

Algorithm 1 The iterative procedure of the MPCA (presented as a figure in the original)
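Since Algorithm 1 appears only as a figure in the original, the following Python sketch shows an ALS-style iteration as we read it from Refs. [10, 23, 24]: the projections are initialised by the mode-wise eigendecomposition of Eq. (40) and then updated one mode at a time with the other two fixed. It is an illustrative reconstruction, not a verbatim transcription of the authors' pseudocode.

import numpy as np

def unfold(X, j):
    return np.moveaxis(X, j, 0).reshape(X.shape[j], -1)

def mode_product(X, U, j):
    return np.moveaxis(np.tensordot(U, X, axes=([1], [j])), 0, j)

def leading_eigenvectors(M, k):
    w, V = np.linalg.eigh(M)
    return V[:, np.argsort(w)[::-1][:k]]

def mpca(Xs, dims, n_iter=10):
    # ALS-style MPCA sketch for zero-mean third-order tensors Xs; dims = (k_1, k_2, k_3)
    Us = []
    for j in range(3):
        M = sum(unfold(X, j) @ unfold(X, j).T for X in Xs) / len(Xs)
        Us.append(leading_eigenvectors(M, dims[j]))      # initialisation, cf. Eq. (40)
    for _ in range(n_iter):
        for j in range(3):
            M = np.zeros((Xs[0].shape[j], Xs[0].shape[j]))
            for X in Xs:
                Y = X
                for l in range(3):
                    if l != j:
                        Y = mode_product(Y, Us[l].T, l)  # project the two fixed modes
                M += unfold(Y, j) @ unfold(Y, j).T
            Us[j] = leading_eigenvectors(M / len(Xs), dims[j])
    return Us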

For the HOSVD for third-order tensors, we have the following theorem.

Theorem 2

The HOSVD method is equivalent to the vector PCA method.

Proof

The equation

$$\begin{aligned} {\mathcal {X}} \times _1 \left( \varvec{P}^{(1)}\varvec{U}^{(1)}\right) ^{\top } \times _2 \left( \varvec{P}^{(2)}\varvec{U}^{(2)}\right) ^{\top } \times _3 \left( \varvec{P}^{(3)}\varvec{U}^{(3)}\right) ^{\top } = {\mathcal {Y}} \end{aligned}$$
(43)

is equivalent to

$$\begin{aligned} \left( \varvec{P}^{(3)}\varvec{U}^{(3)}\otimes \varvec{P}^{(2)}\varvec{U}^{(2)} \otimes \varvec{P}^{(1)}\varvec{U}^{(1)}\right) ^{\top } \mathrm {vec}{\mathcal {X}} = \mathrm {vec} {\mathcal {Y}}. \end{aligned}$$
(44)

\(\square \)

This theorem implies that the 3DDCT is an acceptable approximation of the HOSVD for third-order tensors since this is the analogy of the approximation of the PCA by the 2DDCT [29].

Furthermore, we have the following theorem.

Theorem 3

The compression computed by the HOSVD is equivalent to the compression computed by the TPCA.

Proof

The projection that selects \(K=k_1 k_2 k_3\) bases of the tensor space spanned by \(u^{(1)}_{i_1} \circ u^{(2)}_{i_2} \circ u^{(3)}_{i_3}, \ i_j = 1, 2, \ldots , k_j\) for \(j= 1, 2, 3\), is

$$\begin{aligned} \begin{aligned}&\left( \varvec{P}^{(3)}\varvec{U}^{(3)} \otimes \varvec{P}^{(2)}\varvec{U}^{(2)} \otimes \varvec{P}^{(1)}\varvec{U}^{(1)}\right) \\&\ \ \ = \left( \varvec{P}^{(3)} \otimes \varvec{P}^{(2)} \otimes \varvec{P}^{(1)}\right) \left( \varvec{U}^{(3)} \otimes \varvec{U}^{(2)} \otimes \varvec{U}^{(1)}\right) = \varvec{P}\varvec{W}, \end{aligned} \end{aligned}$$
(45)

where \(\varvec{W}\) and \(\varvec{P}\) are an orthogonal matrix and a projection matrix, respectively. Therefore, HOSVD is equivalent to TPCA for third-order tensors. \(\square \)

In our application, an \(n \times n \times n\) digital array is directly compressed by the 3DDCT-II with order \(\mathcal {O}(n^3)\). If we apply the fast Fourier transform to the computation of the 3DDCT-II, the computational complexity is \(\mathcal {O}(n \log n)\).
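Analogously to the two-dimensional case, the data-dependent bases of the HOSVD may be replaced by DCT-II matrices applied along each mode; the following sketch (our helper names, illustrative sizes) keeps the lowest frequencies of each mode as a 3DDCT counterpart of Eq. (42).

import numpy as np

def dct_ii_matrix(n):
    # orthonormal DCT-II matrix; row k is the kth DCT-II basis vector
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    C[0, :] /= np.sqrt(2.0)
    return C

def mode_product(X, U, j):
    return np.moveaxis(np.tensordot(U, X, axes=([1], [j])), 0, j)

def dct3_compress(X, ks):
    # keep the ks[j] lowest DCT-II frequencies along each mode j, cf. Eq. (42)
    Y = X
    for j, k in enumerate(ks):
        Y = mode_product(Y, dct_ii_matrix(X.shape[j])[:k, :], j)
    return Y

X = np.random.default_rng(5).standard_normal((64, 64, 64))
print(dct3_compress(X, (8, 8, 8)).shape)   # (8, 8, 8)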

4 Tensor subspace method

4.1 Second-order tensors

As an extension of the subspace method for vector data, we introduced a linear tensor subspace method for a matrix called the 2DTSM [28]. For a matrix \(\varvec{X}\), setting \(\varvec{P}_{\mathrm {L}}\) and \(\varvec{P}_{\mathrm {R}}\) to be orthogonal projections, we call the operation

$$\begin{aligned} \varvec{Y}=\varvec{P}_{\mathrm {L}}^{\top }\varvec{X}\varvec{P}_{\mathrm {R}} \end{aligned}$$
(46)

the orthogonal projection of \(\varvec{X}\) to \(\varvec{Y}\). Therefore, using this expression for a collection of matrices \(\{\varvec{X}_i \}_{i=1}^N\), such that \(\varvec{X}_i \in {{\mathbb {R}}^{m \times n}}\) and \(\mathrm {E}(\varvec{X}_i)=0\), the solutions of

$$\begin{aligned} \left( \varvec{P}_{\mathrm {L}}, \varvec{P}_R\right)= & {} \mathrm {arg} \ \mathrm {max} \ \mathrm {E} \left( \frac{ \Vert \varvec{P}_{\mathrm {L}}^{\top } \varvec{X}_i \varvec{P}_{\mathrm {R}} \Vert _{\mathrm {F}} }{\Vert \varvec{X}_i \Vert _{\mathrm {F}} }\right) \ \nonumber \\&w.r.t. \ \varvec{P}^{\top }_{\mathrm {L}} \varvec{P}_{\mathrm {L}}=\varvec{I}, \ \varvec{P}^{\top }_{\mathrm {R}} \varvec{P}_{\mathrm {R}}=\varvec{I} \end{aligned}$$
(47)

define a bilinear subspace that approximates \(\{\varvec{X}_i \}_{i=1}^N\). Here, the norm \(\Vert \varvec{X} \Vert _{\mathrm {F}}\) for matrix \(\varvec{X}\) represents the Frobenius norm. Therefore, using the solutions of Eq. (47), if a query matrix \(\varvec{G}\) satisfies the condition

$$\begin{aligned} {\mathrm {arg}} \left( \underset{i}{\max } \frac{\Vert \varvec{P}_{\mathrm {L},i}^{\top }\varvec{G}\varvec{P}_{\mathrm {R},i} \Vert _{\mathrm {F}}}{\Vert \varvec{G} \Vert _{\mathrm {F}}} \right) =\{ \varvec{P}_{{\mathrm {L}},k}, \varvec{P}_{\mathrm {R},k} \}, \end{aligned}$$
(48)

we conclude that \(\varvec{G} \in \mathcal {C}_k(\delta )\), \(k=1,2, \ldots , N_{\mathcal {C}}\) for \(N_{\mathcal {C}}\) categories if

$$\begin{aligned} {\mathcal {C}}_k (\delta ) = \left\{ \varvec{X} \, | \, \Vert \varvec{P}_{\mathrm {L},k}\varvec{P}_{\mathrm {L},k}^{\top }\varvec{X}\varvec{P}_{\mathrm {R},k}\varvec{P}_{\mathrm {R},k}^{\top } - \varvec{X}\Vert _{\mathrm {F}} \ll \delta \right\} . \end{aligned}$$
(49)

In practical computation to find the projections \(\varvec{P}_{\mathrm {L}}\) and \(\varvec{P}_{\mathrm {R}}\) in Eq. (47), we adopt the MEV method [13]. This projection takes into account the distributions of the column and row vectors of the sampled images. We define two matrices

$$\begin{aligned} \varvec{M}_{\mathrm {r}}= & {} \frac{1}{N} \sum _{i=1}^N \varvec{X}_i \varvec{X}_i^{\top } \in {\mathbb {R}}^{m \times m}, \end{aligned}$$
(50)
$$\begin{aligned} \varvec{M}_{\mathrm {c}}= & {} \frac{1}{N} \sum _{i=1}^N \varvec{X}_i^{\top } \varvec{X}_i \in {\mathbb {R}}^{n \times n} . \end{aligned}$$
(51)

Using these two matrices, we have

$$\begin{aligned} {\mathrm {E}}\left( \Vert \varvec{P}_{\mathrm {L}}^{\top } \varvec{X}_i \varvec{P}_{\mathrm {R}} \Vert _{\mathrm {F}}^2 \right)= & {} \frac{1}{N}\sum _{i=1}^N tr \left( \left( \varvec{P}_{\mathrm {L}}^{\top }\varvec{X}_i\varvec{P}_{\mathrm {R}} \right) \left( \varvec{P}_{\mathrm {L}}^{\top } \varvec{X}_i \varvec{P}_{\mathrm {R}} \right) ^{\top } \right) \nonumber \\= & {} tr \left( \varvec{P}_{\mathrm {L}}^{\top } \varvec{M}_{\mathrm {r}} \varvec{P}_{\mathrm {L}} \right) , \end{aligned}$$
(52)
$$\begin{aligned} \mathrm {E}\left( \Vert \varvec{P}_{\mathrm {L}}^{\top } \varvec{X}_i\varvec{P}_{\mathrm {R}} \Vert _{\mathrm {F}}^2 \right)= & {} \frac{1}{N}\sum _{i=1}^N tr \left( \left( \varvec{P}_{\mathrm {L}}^{\top }\varvec{X}_i\varvec{P}_{\mathrm {R}} \right) ^{\top } \left( \varvec{P}_{\mathrm {L}}^{\top } \varvec{X}_i \varvec{P}_{\mathrm {R}} \right) \right) \nonumber \\= & {} tr \left( \varvec{P}_{\mathrm {R}}^{\top } \varvec{M}_{\mathrm {c}} \varvec{P}_{\mathrm {R}} \right) . \end{aligned}$$
(53)

Furthermore, for the two matrices \(\varvec{M}_{\mathrm {r}}\) and \(\varvec{M}_{\mathrm {c}}\), using the Lagrange multipliers \(\varLambda _{\mathrm {r}}\) and \(\varLambda _{\mathrm {c}}\), we find the projections that maximise

$$\begin{aligned} J\left( \varvec{P}_{\mathrm {L}}\right)= & {} tr \left( \varvec{P}_{\mathrm {L}}^{\top } \varvec{M}_{\mathrm {r}} \varvec{P}_{\mathrm {L}} \right) - tr \left( \left( \varvec{P}_{\mathrm {L}}^{\top } \varvec{P}_{\mathrm {L}} - \varvec{I} \right) \varLambda _{\mathrm {r}} \right) , \end{aligned}$$
(54)
$$\begin{aligned} J\left( \varvec{P}_{\mathrm {R}}\right)= & {} tr \left( \varvec{P}_{\mathrm {R}}^{\top } \varvec{M}_{\mathrm {c}} \varvec{P}_{\mathrm {R}} \right) - tr \left( \left( \varvec{P}_{\mathrm {R}}^{\top } \varvec{P}_{\mathrm {R}} - \varvec{I} \right) \varLambda _{\mathrm {c}} \right) , \end{aligned}$$
(55)

where \(\varvec{I}\) is the identity matrix. The solutions of Eqs. (54) and (55) are given as the solutions of the eigenproblems of \(\varvec{M}_{\mathrm {r}}\) and \(\varvec{M}_{\mathrm {c}}\), respectively. We set \(\{ \varvec{u}_j \}_{j=1}^{k_1} \) and \(\{ \varvec{v}_j \}_{j=1}^{k_2} \) as the eigenvectors of \(\varvec{M}_{\mathrm {r}}\) and \(\varvec{M}_{\mathrm {c}}\), respectively. The eigenvectors are normalised so that \(\Vert \varvec{u}_j \Vert _2 =1\) and \(\Vert \varvec{v}_j \Vert _2=1\) and are ordered by the eigenvalues \(\lambda _1^{\mathrm {r}} \ge \lambda _2^{\mathrm {r}} \ge \cdots \ge \lambda _j^{\mathrm {r}} \ge \cdots \ge \lambda _m^{\mathrm {r}}\) and \(\lambda _1^{\mathrm {c}} \ge \lambda _2^{\mathrm {c}} \ge \cdots \ge \lambda _j^{\mathrm {c}} \ge \cdots \ge \lambda _n^{\mathrm {c}}\), respectively. Therefore, for given numbers \(k_1 \le m\) and \(k_2 \le n\), the operators \(\varvec{P}_{\mathrm {L}}\) and \(\varvec{P}_{\mathrm {R}}\) are defined as \(\varvec{P}_{\mathrm {L},k_1} = [\varvec{u}_1,\varvec{u}_2,\ldots ,\varvec{u}_{k_1}]\) and \(\varvec{P}_{\mathrm {R},k_2} = [\varvec{v}_1,\varvec{v}_2,\dots , \varvec{v}_{k_2}]\), that is, the matrices consisting of the leading eigenvectors. These projections are equivalent to the projections obtained by the 2DSVD [14] using Eq. (28). Practically, the computational complexity of solving the eigendecomposition problem for an \(n \times n\) matrix is \(\mathcal {O}(n^3)\).
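Putting the pieces together, the 2DTSM can be sketched as follows in Python: the MEV projections of each class are computed from Eqs. (50) and (51), and a query is assigned to the class maximising the similarity of Eq. (48). The class data, sizes and names below are illustrative assumptions.

import numpy as np

def mev_projections(Xs, k1, k2):
    # leading eigenvectors of M_r and M_c, Eqs. (50)-(51)
    Mr = sum(X @ X.T for X in Xs) / len(Xs)
    Mc = sum(X.T @ X for X in Xs) / len(Xs)
    wr, Ur = np.linalg.eigh(Mr)
    wc, Uc = np.linalg.eigh(Mc)
    PL = Ur[:, np.argsort(wr)[::-1][:k1]]
    PR = Uc[:, np.argsort(wc)[::-1][:k2]]
    return PL, PR

def classify(G, subspaces):
    # 2DTSM rule of Eq. (48): the class whose subspace retains most of ||G||_F
    sims = [np.linalg.norm(PL.T @ G @ PR) / np.linalg.norm(G) for PL, PR in subspaces]
    return int(np.argmax(sims))

rng = np.random.default_rng(6)
classes = [[rng.standard_normal((32, 32)) for _ in range(50)] for _ in range(3)]
subspaces = [mev_projections(Xs, 8, 8) for Xs in classes]
print(classify(classes[1][0], subspaces))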

4.2 Third-order tensors

As an extension of the subspace method for vector data, we introduce a new linear tensor subspace method for third-order tensors. This method is a three-dimensional version of the 2DTSM [28]. For a third-order tensor \({\mathcal {X}}\), setting \(\varvec{P}_j, \ j=1, 2, 3\), to be orthogonal projections, we call the operation

$$\begin{aligned} {\mathcal {Y}} = {\mathcal {X}} \times _1 \varvec{P}_1^{\top } \times _2 \varvec{P}_2^{\top } \times _3 \varvec{P}_3^{\top } \end{aligned}$$
(56)

the orthogonal projection of \({\mathcal {X}}\) to \({\mathcal {Y}}\). Therefore, using this expression for a collection of tensors \(\{{\mathcal {X}}_i \}_{i=1}^M\), such that \({\mathcal {X}}_i \in {{\mathbb {R}}^{I_1 \times I_2 \times I_3}}\) and \(\mathrm {E}({\mathcal {X}}_i)=0\), the solutions of

$$\begin{aligned} \{ \varvec{P}_j \}_{j=1}^3= & {} \mathrm {arg} \ \mathrm {max} \ \mathrm {E} \left( \frac{ \Vert {\mathcal {X}}_i \times _1 \varvec{P}_1^{\top } \times _2 \varvec{P}_2^{\top } \times _3 \varvec{P}_3^{\top } \Vert _{\mathrm {F}} }{\Vert {\mathcal {X}}_i \Vert _{\mathrm {F}} }\right) \nonumber \\&w.r.t. \ \ \varvec{P}_j^{\top }\varvec{P}_j =\varvec{I} \ for \ j=1, 2, 3 \end{aligned}$$
(57)

define a trilinear subspace that approximates \(\{{\mathcal {X}}_i \}_{i=1}^M\). Here, the norm \(\Vert {\mathcal {X}} \Vert _{\mathrm {F}}\) for a tensor \({\mathcal {X}}\) represents the Frobenius norm. Therefore, using the solutions of Eq. (57), if a query tensor \(\mathcal {G}\) satisfies the condition

$$\begin{aligned} \hbox {arg} \left( \underset{i}{\max } \frac{\left\| \mathcal {G} \times _1 \varvec{P}_{k,1}^{\top } \times _{2} \varvec{P}_{k,2}^{\top }\times _3 \varvec{P}_{k,3}^{\top } \right\| _{\mathrm {F}}}{\left\| \mathcal {G} \right\| _{\mathrm {F}}} \right) =\{ \varvec{P}_{k,j} \}_{j=1}^3, \end{aligned}$$
(58)

we conclude that \(\mathcal {G} \in \mathcal {C}_k(\delta )\), \(k=1,2, \ldots , N_{\mathcal {C}}\) for \(N_{\mathcal {C}}\) categories if

$$\begin{aligned} {\mathcal {C}}_k (\delta ) = \left\{ {\mathcal {X}} \ | \ \Vert {\mathcal {X}} \times _1 \left( \varvec{P}_{k,1}\varvec{P}_{k,1}^{\top }\right) \times _{2} \left( \varvec{P}_{k,2}\varvec{P}_{k,2}^{\top }\right) \times _3 \left( \varvec{P}_{k,3}\varvec{P}_{k,3}^{\top }\right) - {\mathcal {X}} \Vert _{\mathrm {F}} \ll \delta \right\} . \end{aligned}$$
(59)

For the practical computation of projections \(\{ \varvec{P}_{k,j} \}_{j=1}^3\), we adopt the iterative method described in Algorithm 1.
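The similarity of Eq. (58) is the direct trilinear analogue of the second-order case; a short sketch, assuming the class projections \(\{\varvec{P}_{k,j}\}_{j=1}^3\) have already been obtained by Algorithm 1, is given below.

import numpy as np

def mode_product(X, U, j):
    return np.moveaxis(np.tensordot(U, X, axes=([1], [j])), 0, j)

def tsm_similarity(G, Ps):
    # similarity of Eq. (58): energy of G retained by the trilinear subspace {P_1, P_2, P_3}
    Y = G
    for j, P in enumerate(Ps):
        Y = mode_product(Y, P.T, j)
    return np.linalg.norm(Y) / np.linalg.norm(G)

# a query G is assigned to the class k that maximises tsm_similarity(G, projections[k])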

5 Numerical examples

5.1 Second-order tensors

Fig. 5

Examples of images belonging to the same class in each dataset. a Face images of the same person with different conditions of illumination. b Face images of the same person with different camera positions. c Images of the same object with different camera positions, where the number of degrees of freedom is two. d Images of the same object with different camera positions, where the number of degrees of freedom is one. e Images of the same handwritten digit. f Images of the same handwritten Chinese character written by different people

Table 2 Sizes and number of images in each dataset

To validate the relation between the 2DDCT and the MEV method (2DSVD), we compute the recognition rate using six image datasets: cropped versions of the extended YaleB dataset [35], ORL face dataset [36], ETH80 dataset [4], NEC animal dataset [37], MNIST dataset [38] and ETL9G character dataset [39]. Figure 5 shows examples of images belonging to the same class in each dataset. For the validation, we compress images in these datasets by the MEV method and the 2DDCT. Table 2 shows the sizes and the number of images in each dataset and the parameters used in the compression.

Fig. 6

Cumulative contribution ratios for the six datasets. The vertical and horizontal axes represent the cumulative contribution ratio and the compression ratio, respectively. For the first reduced size \(D = 32\times 32 = 1024\) and the second reduced size \(K = k^2\), for the case of selecting the k highest eigenvalues, the compression ratio is given as D / K. Upward triangles and diamonds represent the cumulative contribution ratios of the eigenvalues of the covariance matrices for the 1- and 2-modes, respectively, of the images compressed by the MEV method. Downward triangles and squares represent the cumulative contribution ratios of the eigenvalues of the covariance matrices for the 1- and 2-modes, respectively, of the images compressed by the 2DDCT. a YaleB, b ORL, c ETH80, d NEC, e MNIST, f ETL9G

Fig. 7

Computational time of dimension reduction for tensors of the second order. a, b The computational time of construction of projection matrices for the MEV and 2DDCT, respectively. c The mean computational time of projecting images to a low-dimensional tensor space for the six datasets. In a–c, the vertical and horizontal axes represent the computational time and compression ratio, respectively. For the original reduced size D shown in Table 2 and the reduced size \(K = k_1 \times k_2\), the compression ratio is given as D / K. Here, we adopt \(k_1, k_2 \in \{4, 8, 16, 32, 64\}\)

Using the compressed images belonging to the same class in each dataset, we compute the cumulative contribution ratio (CCR) for the eigenvalues of the covariance matrices of the 1- and 2-modes for the images.
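The CCR is the normalised partial sum of the eigenvalues sorted in decreasing order; a short Python sketch is shown below (the covariance matrix in the comment is our illustrative example).

import numpy as np

def ccr(eigenvalues):
    # cumulative contribution ratio of eigenvalues sorted in decreasing order
    w = np.sort(np.asarray(eigenvalues))[::-1]
    return np.cumsum(w) / np.sum(w)

# example: CCR of the 1-mode covariance eigenvalues of a set of zero-mean images Xs
# Mr = sum(X @ X.T for X in Xs) / len(Xs)
# print(ccr(np.linalg.eigvalsh(Mr)))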

Figure 6 shows the CCRs for the six datasets. In Fig. 6a–f, we define the compression ratio as \(32^2/k^2\) for the case of using \(k \in \{1, 2, \ldots , 32\}\) eigenvectors, corresponding to the k largest eigenvalues of each mode to construct the linear subspace. As shown in Fig. 6a–d, the CCR curves for the MEV method and the 2DDCT are coincident. In Fig. 6b, the CCR curves for the 1- and 2-modes are coincident for the MEV method and the 2DDCT. This means that the eigenvalues of the 1- and 2-modes are coincident. In Fig. 6a, c–f, the CCR curves are approximately the same except for a few of the largest eigenvalues.

Figure 7 summarises the computational time of the dimension reduction for the six datasets. For the computation, we use an Intel Xeon X5570 quad-core 2.93-GHz processor. For the computation of the 2DDCT, we construct three DCT-II matrices without the FFT. Note that the computational times of the TTP in the MEV method and the 2DDCT are the same. In Fig. 7, the computational time of the 2DDCT is smaller than that of the MEV method.

For the validation, we compute the recognition rate using the original and dimension-reduced images of the six datasets with the 2DTSM as the classifier. The MNIST dataset is predivided into training and test data before it is distributed. For the YaleB, ORL, ETH80 and NEC datasets, images labelled with even numbers are used as training data and the other images are used as test data. The recognition rate is defined as the successful label estimation ratio for 1000 label estimations. In each estimation of a label for a query, queries are randomly chosen from the test data. For both the 1- and 2-modes, we evaluated the results for linear subspaces with one to 32 dimensions.

Figure 8a–f shows the recognition rates for the original and compressed images in the six datasets. In Fig. 8, we define the compression ratio as the original dimension divided by \(k^2\), where k is the number of selected principal axes for both the 1- and 2-modes. According to these results, the MEV method and the 2DDCT give almost the same recognition rate.

Fig. 8

Recognition rates for original and compressed images. The vertical and horizontal axes represent the recognition rate [%] and compression ratio, respectively. For the first reduced size D and the second reduced size \(K = k^2\), where \(k \in \{ 1, 2, \ldots , 32\}\) is the number of selected bases, the compression ratio is given as D / K. Circles, upward triangles and downward triangles show the recognition rates for the original dimension, the MEV method and the 2DDCT, respectively. a YaleB, b ORL, c ETH80, d NEC, e MNIST, f ETL9G

5.2 Third-order tensors

To validate the relation between the HOSVD and the 3DDCT, we compute recognition rates using the OU-ISIR dataset [2] and the computational anatomy (CA) dataset. Figure 9 shows examples of sequences of silhouette images from two different categories in the OU-ISIR dataset. Figure 10 shows examples of voxel images of human livers in the CA datasets. In Fig. 10, voxel images are rendered using the computed tomography (CT) images of human livers. Table 3 summarises the sizes of tensors of the two datasets. For the compression of the silhouette-image sequences, we use the HOSVD and the 3DDCT. For the practical computation of the HOSVD, we use the iterative method described in Algorithm 1 [10]. If we set the number of iterations to 0 in Algorithm 1, we have the three-dimensional version of the MEV method. If we set the number of bases to the size of the original tensors in Algorithm 1, we call the method full projection (FP). If we set the number of bases to less than the size of the original tensors in Algorithm 1, we call the method full projection truncation (FPT).

Fig. 9

Examples of sequences of silhouette images. These are binary images whose pixel values are 0 or 255. The figure illustrates the 1st, 21st, 41st, 61st, 81st silhouette images of sequences from a person walking at different speeds. Each sequence consists of 90 silhouette images of four steps. For each sequence, we manually selected the start and finish of the sequence from the original OU-ISIR treadmill dataset. Each sequence is obtained by resampling of the selected sequence with linear interpolation, where the linear interpolation is only used for mode 3. a Person \(\sharp \)001. b Person \(\sharp \)128

Fig. 10

Examples of the voxel images of human livers in the CA dataset. a, e show the volume rendering of the shapes of livers for a male and a female, respectively. These renderings are computed from CT images of livers. b–d and f–h are axial CT images used in the renderings for a and e, respectively

Table 3 Sizes and number of tensors of the resampled OU-ISIR and CA datasets

Firstly, we observe the properties of the iterative method. Using sequences of silhouette images from a category, we compute the sum of the energies of projected tensors after k iterations and the CCR of the eigenvalues for the three modes. For the computation of tensor products, we select different orders of selection of the modes. Since there are three modes, we have \(3!=6\) orders. For FPT, we set the numbers of bases for each mode to \(64 \times 64 \times 64\) and \(32 \times 32 \times 32\). Figure 11 shows the sum of the energies \({\varPsi }_k\) after every 10 iterations for the FP and FPT with the six different orders of computation of the tensor projection. Figure 12 shows the CCRs of the three modes in the FP for the different orders of computation of the tensor projections. Figure 13 summarises the CCRs of the three modes for projections to the three different sizes. Figure 14 shows the CCRs of each mode for FP and FPT.

In Fig. 11, the iterations do not significantly affect the sum of the energies of the projected tensors. According to both Figs. 11 and 12, changing the order of computation of the tensor projections does not give different results. In Fig. 13, the CCRs of the three decompositions are almost coincident in the three modes. Figure 14 shows that the eigenvalues obtained by FP and FPT are not coincident. The decomposition of tensors for the FPT gives larger eigenvalues with a smaller number of bases than those obtained by FP.

Next, we compute the CCRs of the eigenvalues obtained by 10 iterations of Algorithm 1 for compressed tensors. For the compression of the tensors from \(128\times 88 \times 90\) to \(32 \times 32 \times 32\), we adopt the FP, the FPT and the 3DDCT. Figure 15 shows the CCRs of each mode for the three types of compressed tensor. The tensors compressed by the 3DDCT give larger eigenvalues than those compressed by the FP and the FPT with a smaller number of bases. The FP and the FPT give the same CCRs for each mode.

Fig. 11

Convergence of the iteration described in Algorithm 1. a–c The sum of energies \({\varPsi }_k\) in each iteration for the given numbers of bases of \(128\times 88 \times 90\), \(64 \times 64 \times 64\) and \(32 \times 32 \times 32\), respectively. In a–c, the horizontal and vertical axes represent the number of iterations and \({\varPsi }_k\), respectively. For the computation of the tensor projections using Algorithm 1, we adopt the six orders of selection of the modes, where the legends in the figures summarise the six orders

Fig. 12

Cumulative contribution ratio of eigenvalues obtained by 10 iterations using Algorithm 1. a–c The cumulative contribution ratios for the 1-, 2- and 3-modes, respectively. Here, the given number of bases is \(128 \times 88 \times 90\) for Algorithm 1. For the computation of the tensor projections using Algorithm 1, we adopt six orders of the selection of modes, where the legends in the figures summarise the six orders. The horizontal and vertical axes represent the compression ratio and cumulative contribution ratio, respectively. For the original size \(D = 128 \times 88 \times 90\) and the reduced size \(K = k \times k^{\prime } \times k^{\prime \prime }\), the compression ratio is given as D / K

Thirdly, we compute the recognition rate of the sequences of silhouette images by the TSM. In this validation, we use the original sizes of the tensors and compressed tensors for comparison. For the compression, we adopt the HOSVD, the FP, the FPT and the 3DDCT. Using these four methods, we compress the tensors to the sizes \(32 \times 32 \times 32\), \(16 \times 16 \times 16\) and \(8 \times 8 \times 8\). The OU-ISIR dataset contains sequences of images of 34 people with nine different walking speeds. We use the sequences with walking speeds of 2, 4, 6, 8 and 10 km/h for learning data and the sequences with walking speeds of 3, 5, 7 and 9 km/h for test data. The recognition rate is defined as the successful label estimation ratio for 1000 label estimations. In each estimation of a label for a query, queries are randomly chosen from the test dataset. For the 1-, 2- and 3-modes, we evaluate the results for multilinear subspaces with sizes from one to the dimension of the compressed tensors.

Fig. 13

Cumulative contribution ratio of three modes. a–c The cumulative contribution ratios of the three modes for the input sizes \(128 \times 88 \times 90\), \(64 \times 64 \times 64\) and \(34 \times 34 \times 34\) in Algorithm 1, respectively. The horizontal and vertical axes represent the compression ratio and cumulative contribution ratio, respectively. For the original size \(D = 128 \times 88 \times 90\) and the reduced size \(K = k \times k^{\prime } \times k^{\prime \prime }\), the compression ratio is given as D / K

Fig. 14

Comparison of cumulative contribution ratio between full projection and full projection truncation. a–c A comparison of the cumulative contribution ratio for the 1-, 2- and 3-modes, respectively. The horizontal and vertical axes represent the compression ratio and cumulative contribution ratio, respectively. For the original size \(D = 128 \times 88 \times 90\) and the reduced size \(K = k \times k^{\prime } \times k^{\prime \prime }\), the compression ratio is given as D / K

Fig. 15

Comparison of cumulative contribution ratio for three types of compressed tensor. For the compression of tensors, we use Algorithm 1 and the 3DDCT. In Algorithm 1, we respectively adopt sizes of \(128 \times 88 \times 90\) and \(32 \times 32 \times 32\) for the computation by FP and FPT. For the three types of compressed tensor of \(32 \times 32 \times 32\), we apply 10 iterations of Algorithm 1. a–c The cumulative contribution ratios for the 1-, 2- and 3-modes, respectively. In a–c, the horizontal and vertical axes represent the compression ratio and cumulative contribution ratio, respectively. For the first reduced size \(K_1 = 32 \times 32 \times 32\) and the second reduced size \(K_2 = k \times k^{\prime } \times k^{\prime \prime }\), the compression ratio is given as \(K_1/K_2\)

Fig. 16

Recognition rates of gait patterns and liver data for original and compressed tensors. We adopt the reduced sizes of \(32 \times 32 \times 32\), \(16 \times 16 \times 16\) and \(8 \times 8 \times 8\). a–c and d–f The recognition rates for three reduced sizes of the OU-ISIR and CA datasets, respectively. For compression, we use the HOSVD, FP, FPT and 3DDCT. In a–f, the horizontal and vertical axes represent the compression ratio and recognition ratio [%], respectively. In a–c and d–f, for the original sizes \(D = 128 \times 88 \times 90\) and \(D = 89 \times 97 \times 76\), respectively, the compression ratio is given as D / K for the reduced size K

Fig. 17

Computational time of dimension reduction for tensors of the third order. a, b The computational time of construction of projection matrices for 306 sequences of silhouette images and 35 voxel images of livers, respectively. c The mean computational time of projecting images to a low-dimensional tensor space for the OU-ISIR and CA datasets. In a, b, we compare the HOSVD, FP, FPT and 3DDCT. In a–c, the vertical and horizontal axes represent the computational time and compression ratio, respectively

Figure 16a–c shows the recognition rates for the four compression methods with three different sizes of the compressed tensors of the OU-ISIR dataset. For the images of size \(32 \times 32 \times 32\) shown in Fig. 16a, the recognition rates for all four types of compressed tensor are almost coincident with those of the original tensors if the compression ratio is higher than \(10^3\). If the compression ratio is less than \(10^3\), the recognition ratio of the FPT is higher than those of the HOSVD, the FP and the 3DDCT. The recognition ratio of the 3DDCT is lower than those of the other methods since the silhouette images are binary images. For the images of sizes \(16 \times 16 \times 16\) and \(8 \times 8 \times 8\) shown in Fig. 16b, c, although the recognition rates for the four types of compressed tensor are almost the same, the recognition rates are smaller than those for the original tensors. This recognition property depends on the size of the images, and the images used for the comparison are too small to evaluate our methods for the recognition of sequences of silhouette images. In Fig. 16a–c, the HOSVD and the FP give the same recognition rate. These results imply that the decomposition for the FP is independent of the number of iterations.

Fourthly, we compute the recognition rate of the voxel images of human livers by the TSM. For the computation, we use the CA dataset. The CA dataset contains livers of 25 males and 7 females. We use the voxel images of livers of 13 males and 4 females as training data. The remaining voxel images are used as test data. In the recognition, we estimate the gender from the livers. The recognition rate is defined as the successful estimation ratio for 1000 gender estimations. In each estimation of a gender for a query, queries are randomly chosen from the test dataset. For the 1-, 2- and 3-modes, we evaluate the results for multilinear subspaces with sizes from one to the dimension of the compressed tensors.

Figure 16d–f shows the recognition rates for the four compression methods with three different sizes of the compressed tensors. For the voxel images of sizes \(32 \times 32 \times 32\) and \(16 \times 16 \times 16\) shown in Fig. 16d, e, the recognition rates for all four types of compressed tensor are almost coincident with those of the original tensors. For the voxel images of size \(8 \times 8 \times 8\) shown in Fig. 16f, the recognition rates for all four types of compressed tensor are almost coincident with those of the original tensors if the compression ratio is higher than \(10^4\). Compared with the recognition rates of the sequences of silhouette images, the 3DDCT gives a better approximation for the volume data of livers, since livers are essentially volumetric objects consisting of textures of liver tissue.

Finally, using an Intel Xeon X5570 quad-core 2.93-GHz processor, we compare the computational times of the four reduction methods for the OU-ISIR data and CA data. In the comparison, for the TPCA, we set 3 iterations for Algorithm 1. For the computation of the 3DDCT, we construct three DCT-II matrices; therefore, we do not use the FFT. Note that the computational times of the TTP in the four methods are the same. Therefore, we show only the mean of the computational time. Figure 17 shows the comparison of the computational times for the two datasets. In both cases, for the OU-ISIR and CA datasets, the 3DDCT gives the fastest construction of projection matrices among the four methods.

For voxel images, the 3DDCT gives an acceptable approximation of the HOSVD, the FP and the FPT in the context of pattern recognition. Even for sequences of binary silhouette images, the 3DDCT gives an acceptable approximation of the HOSVD, the FP and the FPT in the context of pattern recognition. Moreover, from Figs. 14, 15 and 16a–c, the differences in the energies of the projected tensors and in the CCRs of the eigenvalues in the decomposition of tensors among these methods are not important in the context of pattern recognition.

6 Conclusions

We first clarified the equivalence between Nth-order tensor principal component analysis and N-dimensional singular value decomposition for \(N=2,3\). Second, we introduced the marginal eigenvector method and the iterative algorithm as practical computation methods for two- and three-dimensional singular value decomposition, respectively. Furthermore, we introduced the N-dimensional discrete cosine transform as an approximation of N-dimensional singular value decomposition. Finally, to evaluate the effects of the N-dimensional discrete cosine transform and N-dimensional singular value decomposition for \(N=2,3\) on tensor pattern recognition, we presented two numerical examples.

For the first example, we presented two validations for two-dimensional images. Using the images compressed by the marginal eigenvalue method and two-dimensional discrete cosine transform, we computed the cumulative contribution ratio of the eigenvalues of a tensor subspace as the first validation. As the second validation, we computed the accuracy of image pattern recognition for images compressed by the marginal eigenvalue method and two-dimensional discrete cosine transform. All the results in these two validations demonstrated the equivalent performance of the marginal eigenvalue method and two-dimensional discrete cosine transform for two-dimensional image pattern recognition.

For the second example, we presented two validations for sequences of two-dimensional images and voxel images of human livers. Using the sequences of binary images compressed by the iterative algorithm of the higher-order singular value decomposition and by the three-dimensional discrete cosine transform, we computed the cumulative contribution ratio of the eigenvalues of a tensor subspace as the first validation. As the second validation, using the image sequences and voxel images, we computed the accuracy of tensor pattern recognition for the sequences and volumetric data compressed by the iterative algorithm and by the three-dimensional discrete cosine transform. All the results in these two validations demonstrated the equivalent performance of the higher-order singular value decomposition and the three-dimensional discrete cosine transform for third-order tensor pattern recognition. Furthermore, for the decomposition procedure, the results showed that the tensor projection is independent of the order of selection of the modes in the tensor projections. Moreover, for the sequences compressed by the higher-order singular value decomposition, these results showed that the cumulative contribution ratio and recognition ratio are independent of the number of iterations in the decomposition procedure.

These two numerical examples illustrated that the N-dimensional discrete cosine transform can be an acceptable approximation for N-dimensional singular value decomposition for \(N=2,3\) in tensor pattern recognition if we adopt the Euclidean distance as the metric of the pattern space. These examples also imply that the approximation of the higher-order singular value decomposition by the discrete cosine transform may be valid for tensors with order higher than the third order without an iterative computation method. These approximations are useful for the practical and fast computation of tensor recognition.