1 Introduction

Clustering is one of the most important unsupervised learning topics in machine learning, in which data samples are grouped into different clusters according to their similarity. It has been studied and applied in many research areas such as text mining [1,2,3,4,5], gene expression analysis [6,7,8,9], and image recognition [10,11,12]. In particular, many clustering algorithms have been used for image segmentation [13,14,15,16].

Among the various clustering methods, assigning data samples to clusters according to a cluster prototype is one of the most effective and well-studied approaches. k-means [17] is the most representative and classical prototype-based method: it clusters all data samples by minimizing the sum of distances from each data sample to its nearest cluster center. Since k-means uses a single point as the prototype of each cluster, it often fails when the data samples are not distributed around a few central points. In contrast to k-means, k-plane clustering (kPC) [18] and q-flat [19] use hyperplanes or affine subspaces as cluster prototypes and assign each data sample to its nearest hyperplane or \((m-q)\)-dimensional affine flat, where m is the original feature dimension. When the flat degenerates to a single point or to a hyperplane, q-flat reduces to k-means or kPC, respectively. As described above, k-means, kPC and q-flat exploit only the within-cluster structure by minimizing within-cluster distances, and therefore ignore the discriminative information between different clusters. k-proximal plane clustering (kPPC) [20] introduces between-cluster dissimilarity, which is a substantial improvement over kPC. Twin support vector clustering (TWSVC) [21] and least squares TWSVC (LSTWSVC) [22] also consider between-cluster separability, inspired by the twin support vector machine (TWSVM) [23] and the least squares twin support vector machine (LSTWSVM) [24] for classification. To improve robustness, \(l_{1}\)-norm-based clustering methods have also been investigated, such as robust TWSVC (RTWSVC) [25], fast RTWSVC (FRTWSVC) [25], and k-subspace discriminant clustering (kSDC) [26].

However, all of the above methods are vector based. If the input data are matrices, each matrix must be converted to a vector before a vector-based method can be applied. This leads to high-dimensional data and a high computational cost, and some of the underlying structural information is lost. To overcome these shortcomings, a two-dimensional embedded image clustering method (A2DEIC) [27], which can work directly with matrices instead of flattened vectors, was recently proposed. However, the objective function of A2DEIC is non-smooth and difficult to solve. Moreover, the A2DEIC algorithm is sensitive to its weighting parameter, and finding the optimal parameter is time-consuming. We also note that although much progress has been made in two-dimensional dimensionality reduction [28,29,30,31,32,33], little attention has been paid to two-dimensional clustering.

Recently, Li et al. [34] proposed a matrix-based dimensionality reduction method, two-dimensional Bhattacharyya bound linear discriminant analysis (2DBLDA). In 2DBLDA, the between-class distance and the within-class distance are weighted by a constant computed from the input data, and this constant allows the 2DBLDA objective to minimize an upper bound of the Bhattacharyya error. Moreover, the design of 2DBLDA avoids the small sample size problem, and the method can be solved by a standard eigenvalue decomposition. Inspired by 2DBLDA, in this paper we extend it to the clustering problem and propose a novel two-dimensional k-subspace clustering method (2DkSC) that considers both discriminative and underlying structural information. In particular, 2DkSC minimizes the within-cluster scatter while maximizing the between-cluster separation. Moreover, taking advantage of the 2DBLDA formulation, the data samples are clustered directly into k subspaces. In summary, 2DkSC has the following characteristics:

\(\bullet\) 2DkSC maximizes the matrix-based between-cluster distance, measured by the weighted pairwise distances between cluster centers, while minimizing the matrix-based within-cluster distance, and clusters the data samples into the resulting k subspaces directly. In this way, while preserving the original matrix structure of the data, 2DkSC exploits both local and discriminative information during clustering by seeking an appropriate reduced dimension for the low-dimensional subspaces.

\(\bullet\) The weighting constant between the between-cluster and within-cluster terms is determined by the data themselves, which makes the proposed 2DkSC adaptive and free of tuning parameters. Inherited from 2DBLDA, the constant is meaningful in the sense that it minimizes an upper bound of the Bhattacharyya error.

\(\bullet\) In image recognition experiments, 2DkSC achieves the highest ACC and NMI on five of the six datasets. For example, 2DkSC achieves 77.45% NMI on the Coil100 dataset, which is 3.89% higher than the vector-based q-flat algorithm and 6.50% higher than the matrix-based A2DEIC algorithm. This demonstrates the superiority of the proposed algorithm for image clustering.

The rest of the paper is organized as follows. In section 2, kPC, kPPC, q-flat and A2DEIC are briefly introduced. In section 3, our method is presented. The experiments and conclusions can be found in sections 4 and 5, respectively. Details of the weighting constant are provided in the appendix.

2 Related works

Consider a dataset \(T=\{{\textbf {X}}_1,\,{\textbf {X}}_2,\ldots ,\,{\textbf {X}}_N\}\), where \({\textbf {X}}_l\in \mathbb {R}^{m\times n}\) for \(l=1,\,2,\,\ldots ,N\); in particular, if a data sample is in vector form, then \(n=1\). The goal of clustering is to partition T into k disjoint clusters \(C_i\), \(i=1,2,\ldots ,k\), satisfying \(C_{i'}\cap C_i =\varnothing\) for \(i'\ne i\) and \(T=\cup _{i=1}^kC_i\). Correspondingly, \(y_l\in \{1,2,\cdots ,k\}\) denotes the cluster label of the data sample \({\textbf {X}}_l\). Assume that the i-th cluster contains \(N_{i}\) data samples, so that \(\sum \nolimits _{i=1}^{k}N_i=N\). Let \({\overline{{\textbf {X}}}}_i=\frac{1}{N_i}\sum \nolimits _{s=1}^{N_i}{\textbf {X}}_{s}^i\) be the mean of the data samples of the i-th cluster, \(i=1,2,\ldots ,k\), where \({\textbf {X}}_{s}^i\) is the s-th data sample of the i-th cluster. For a matrix \({\textbf {Q}}=({\textbf {q}}_1,\,{\textbf {q}}_2,\ldots ,{\textbf {q}}_n)\in \mathbb {R}^{m\times n}\), its Frobenius norm (F-norm) is defined as \(\Vert {\textbf {Q}}\Vert _F=\sqrt{\sum \nolimits _{i=1}^{n}\Vert {\textbf {q}}_i\Vert _2^2}\). The F-norm is a natural generalization of the vector \(l_2\)-norm to matrices.

2.1 kPC

kPC [18] divides the data samples into k clusters so that the data samples gather around their own cluster hyperplane. For the i-th cluster, the hyperplane is determined by minimizing the sum of the distances between the data samples of the i-th cluster and this hyperplane, i.e., by solving the following optimization problem

$$\begin{aligned} \begin{array}{ll} \underset{{\textbf {w}}_i,b_i}{\min }~&{}\Vert {\textbf {w}}_i^T{\textbf {A}}_i+b_i{\textbf {e}}_i\Vert _2^2\\ \hbox {s.t.}~~&{}\Vert {\textbf {w}}_i\Vert _2^2 = 1, \end{array} \end{aligned}$$
(1)

where \({\textbf {w}}_i\in \mathbb {R}^m\), \(b_i\in \mathbb {R}\), \({\textbf {A}}_i\in \mathbb {R}^{m\times N_i}\) is the matrix whose columns are the data samples with label i, and \({\textbf {e}}_i\) is a vector of ones of appropriate dimension, \(i=1,2,\ldots ,k\). The constraint normalizes the normal vector of the cluster center hyperplane.

The solution of problem (1) can be obtained by solving k eigenvalue problems, one per cluster. Once the k cluster center hyperplanes are obtained, a data sample \({\textbf {x}}\in \mathbb {R}^{m}\) is assigned to a cluster by

$$\begin{aligned} \text {Cluster}({\textbf {x}}) = \underset{i=1,2,\ldots ,k}{\arg \min }\,\,\Vert {\textbf {w}}_i^T{\textbf {x}}+b_i\Vert _2^2, \end{aligned}$$
(2)

kPC starts with a random initial assignment of the data samples. The k cluster center hyperplanes are then updated by solving (1), and each data sample is reassigned a label by (2). The final k cluster center hyperplanes are obtained when the overall objective function no longer decreases or the assignment of the data samples to the cluster center hyperplanes no longer changes.
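A minimal MATLAB sketch of one kPC round is given below (our own illustration under our own naming, not the authors' released code). For a fixed \({\textbf {w}}_i\), the optimal intercept in (1) is \(b_i=-{\textbf {w}}_i^T\) times the cluster mean, so \({\textbf {w}}_i\) is the eigenvector of the centered within-cluster scatter matrix associated with the smallest eigenvalue:

```matlab
% Sketch of one kPC round (illustrative only; names are ours).
% X: m-by-N data matrix (one sample per column), labels: 1-by-N current labels
function [W, b, labels] = kpc_update(X, labels, k)
    [m, ~] = size(X);
    W = zeros(m, k);  b = zeros(1, k);
    for i = 1:k
        Ai = X(:, labels == i);                  % samples currently in cluster i
        mu = mean(Ai, 2);
        S  = (Ai - mu) * (Ai - mu)';             % centered within-cluster scatter
        [V, D] = eig((S + S') / 2);              % symmetrize for numerical safety
        [~, idx] = min(diag(D));
        W(:, i) = V(:, idx);                     % w_i: smallest eigenvector of S
        b(i) = -W(:, i)' * mu;                   % optimal intercept for this w_i
    end
    dist = (W' * X + b(:)) .^ 2;                 % squared distances to each hyperplane, cf. (2)
    [~, labels] = min(dist, [], 1);              % reassignment step
end
```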

2.2 kPPC

In contrast to kPC, kPPC [20] introduces between-cluster information into the construction of each cluster hyperplane. kPPC not only requires the data samples in each cluster to be as close as possible to their own center hyperplane, but also pushes the data samples of the other clusters far away from this center hyperplane, by solving the following optimization problem

$$\begin{aligned} \begin{aligned} \underset{{\textbf {w}}_i,b_i}{\min }~&\Vert {\textbf {w}}_i^T{\textbf {A}}_i+b_i{\textbf {e}}_i\Vert _2^2- c\Vert {\textbf {w}}_i^T\widehat{{\textbf {A}}}_i+b_i\widehat{{\textbf {e}}}_i\Vert _2^2 \\ \hbox {s.t.}~~&\Vert {\textbf {w}}_i\Vert _2^2 = 1, \end{aligned} \end{aligned}$$
(3)

where \({\textbf {A}}_i\in \mathbb {R}^{m\times N_i}\) is the matrix consisting of the data samples with label i, \(\widehat{{\textbf {A}}}_i\in \mathbb {R}^{m\times (N-N_i)}\) is the matrix consisting of the data samples of the other labels, c is a positive parameter, and \(\widehat{{\textbf {e}}}_i\) is a vector of ones of appropriate dimension, like \({\textbf {e}}_i\).

Different from the random initialization in kPC, kPPC uses a Laplacian graph-based initialization, which makes kPPC more stable than kPC [20]. kPPC is also solved via an eigenvalue problem.

2.3 q-flat

q-flat [19] aims to partition the data samples into k clusters, each of which is well approximated by a flat, by minimizing the sum of squared distances from each data sample to its nearest flat. For the i-th cluster, q-flat minimizes the following problem to find its best-fit \((m-q)\)-dimensional flat

$$\begin{aligned} \begin{aligned} \underset{{\textbf {W}}_i, \varvec{\gamma }_i}{\min }~~&\Vert {\textbf {W}}_i^T{\textbf {X}}_i-\varvec{\gamma }_i{\textbf {e}}^T_i\Vert _F^2\\ \hbox {s.t. }\,&{\textbf {W}}_i^T{\textbf {W}}_i={\textbf {I}}, \end{aligned} \end{aligned}$$
(4)

where \({\textbf {W}}_{i}\in \mathbb {R}^{m\times q}\) with \(q\le m\), \(\varvec{\gamma }_i\in \mathbb {R}^{q}\), \(i=1,2,\ldots ,k\), \({\textbf {I}}\) is the identity matrix of appropriate dimension, and \({\textbf {e}}_i\) is a vector of ones of appropriate dimension.

In practice, q-flat also starts from a random initial assignment of the data samples and reassigns them with

$$\begin{aligned} \text {Cluster}({\textbf {x}}) = \underset{i=1,2,\ldots ,k}{\arg \min }\,\,\Vert {\textbf {W}}_i^T{\textbf {x}}-\varvec{\gamma }_i\Vert _2^2 \end{aligned}$$
(5)

after obtaining all \({\textbf {W}}_{i}\) and \(\varvec{\gamma }_i\).

Similar to kPC and kPPC, q-flat alternates between updating the k flats and reassigning the data samples in order to determine the k cluster flats and the k clusters.
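For a fixed \({\textbf {W}}_i\), the optimal offset in (4) is \(\varvec{\gamma }_i={\textbf {W}}_i^T\) times the cluster mean, so the columns of \({\textbf {W}}_i\) span the directions of smallest variance of the centered cluster data, which can be read off from a singular value decomposition. A minimal MATLAB sketch of one q-flat round under this reading (our own naming, not the original implementation):

```matlab
% Sketch of one q-flat round (illustrative only; names are ours).
% X: m-by-N data matrix, labels: 1-by-N current labels, q: number of constraints
function [W, Gamma, labels] = qflat_update(X, labels, k, q)
    [m, N] = size(X);
    W = zeros(m, q, k);  Gamma = zeros(q, k);
    for i = 1:k
        Xi = X(:, labels == i);
        mu = mean(Xi, 2);
        [U, ~, ~] = svd(Xi - mu);              % full SVD of the centered cluster data
        W(:, :, i) = U(:, end-q+1:end);        % q directions of smallest variance
        Gamma(:, i) = W(:, :, i)' * mu;        % optimal offset for fixed W_i
    end
    % reassignment step, cf. (5)
    D = zeros(k, N);
    for i = 1:k
        D(i, :) = sum((W(:, :, i)' * X - Gamma(:, i)) .^ 2, 1);
    end
    [~, labels] = min(D, [], 1);
end
```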

2.4 A2DEIC

Different from kPC, kPPC, and q-flat, A2DEIC [27] is an image clustering algorithm that deals directly with the matrix representation. It uses two projection matrices to map the original data samples into a low-dimensional subspace and performs clustering there. Given the image dataset T, A2DEIC minimizes the following objective function

$$\begin{aligned} \begin{aligned} \underset{{\textbf {U}}, {\textbf {V}}}{\min }~~&\sum \limits _{i=1}^{N}\sum \limits _{j=1}^{k}y_{ij}\Vert {\textbf {U}}^T({\textbf {X}}_i-{\overline{{\textbf {X}}}}_j){\textbf {V}}\Vert _F^2-\lambda \sum \limits _{i=1}^{N}\Vert {\textbf {U}}^T({\textbf {X}}_i-\overline{{\textbf {X}}}){\textbf {V}}\Vert _F^2\\ \hbox {s.t. }\,&{\textbf {U}}^T{\textbf {U}}={\textbf {I}}, {\textbf {V}}^T{\textbf {V}}={\textbf {I}}, \end{aligned} \end{aligned}$$
(6)

where \({\textbf {U}}\in \mathbb {R}^{m\times q_1}\) and \({\textbf {V}}\in \mathbb {R}^{n\times q_2}\) are projection matrices mapping the original data samples into a low-dimensional subspace \(\mathbb {R}^{q_1 \times q_2}\). \(y_{ij}\in \{0,1\}\) denotes the cluster indicator of data sample \({\textbf {X}}_i\): its value is 1 if \({\textbf {X}}_i\) is assigned to the j-th cluster, and 0 otherwise. \(\overline{{\textbf {X}}}\) is the mean of all data sample matrices and \({\overline{{\textbf {X}}}}_j\) is the mean of the data samples in the j-th cluster. \(\lambda\) is a positive parameter. A2DEIC is solved by an iterative algorithm.
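Since (6) fully specifies the A2DEIC objective, it can be evaluated directly for candidate projections. The following MATLAB sketch computes (6) for given \({\textbf {U}}\), \({\textbf {V}}\) and cluster labels; it is for illustration only and does not reproduce the authors' iterative solver (the cell-array data layout and the function name are ours):

```matlab
% Evaluate the A2DEIC objective (6) for given projections (illustrative only).
% Xs: 1-by-N cell array of m-by-n image matrices, labels: 1-by-N labels in {1,...,k}
% U: m-by-q1, V: n-by-q2, lambda: positive trade-off parameter
function J = a2deic_objective(Xs, labels, U, V, k, lambda)
    N = numel(Xs);
    Xbar = mean(cat(3, Xs{:}), 3);                           % mean of all samples
    J = 0;
    for j = 1:k
        Xbar_j = mean(cat(3, Xs{labels == j}), 3);           % mean of cluster j
        for i = find(labels == j)
            J = J + norm(U' * (Xs{i} - Xbar_j) * V, 'fro')^2;  % within-cluster term
        end
    end
    for i = 1:N
        J = J - lambda * norm(U' * (Xs{i} - Xbar) * V, 'fro')^2;  % global scatter term
    end
end
```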

3 Two-dimensional k-subspace clustering

3.1 Problem formulation

When the input data are in matrix (two-dimensional) form, such as images, vector-based algorithms must convert the matrices to vectors, which discards the spatial relationships between pixels and increases the computational cost. As discussed above, A2DEIC was proposed to process matrix input directly. However, the behavior of A2DEIC is greatly affected by its tuning parameter, and its optimization problem is complicated to solve. Inspired by 2DBLDA, we propose a new two-dimensional k-subspace clustering algorithm (2DkSC) for image matrices. Inheriting from 2DBLDA, 2DkSC automatically adapts to the given dataset, requires no parameter tuning, and its optimization problem can be solved efficiently. Moreover, it learns the clustering result in a most discriminant subspace of an appropriate dimension while preserving the original structural information of the image matrices.

Specifically, 2DkSC first initializes the cluster assignment and computes the k subspaces. A new assignment is then obtained from the k subspaces, and the whole procedure is repeated. For the i-th cluster, \(i=1,\ldots ,k\), we solve the following optimization problem

$$\begin{aligned} \begin{aligned} \underset{{\textbf {W}}_{i}}{\min }~~&\Delta _{i}\sum _{s=1}^{N_i}\Vert {\textbf {W}}_{i}^T({\textbf {X}}_s^i-\overline{{\textbf {X}}}_i)\Vert _{F}^{2}-\frac{1}{N}\sum _{j\not =i}\sqrt{N_iN_j}\Vert {\textbf {W}}_{i}^T(\overline{{\textbf {X}}}_{i}-\overline{{\textbf {X}}}_{j})\Vert _{F}^{2}\\ \hbox {s.t. }\,&{\textbf {W}}_{i}^T{\textbf {W}}_{i}={\textbf {I}}, \end{aligned} \end{aligned}$$
(7)

where \({\textbf {W}}_{i}\in \mathbb {R}^{m\times d}\) is the projection matrix for the i-th subspace, \(d \le m\), and \(\Delta _{i}=\frac{1}{4}\sum \nolimits _{j\not =i}\frac{\sqrt{N_iN_j}}{N}\Vert {\overline{{\textbf {X}}}_{i}-\overline{{\textbf {X}}}_{j}}\Vert _F^2\) is a weighting constant.

We now give the geometric meaning of model (7). Minimizing the first term in (7) pulls the data samples of the i-th cluster toward their own cluster center in the projected subspace. Minimizing the second (negative) term in (7), i.e., maximizing the between-cluster distances, keeps the centers of different clusters apart in the projected space, which guarantees between-cluster separability. The weighting constant \(\Delta _{i}\) in front of the first term balances the within-cluster and between-cluster terms; it is derived by minimizing an upper bound of the Bhattacharyya error, and the details can be found in the appendix. Since \(\Delta _{i}\) is determined by the given dataset, 2DkSC adapts to different data automatically. The constraint \({\textbf {W}}_{i}^T{\textbf {W}}_{i}={\textbf {I}}\) ensures that the obtained discriminant directions of the i-th cluster are orthonormal, which minimizes redundancy in the projected space.
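For completeness, the reduction of (7) to the eigenvalue problem of the next subsection follows from the identity \(\Vert {\textbf {W}}^T{\textbf {B}}\Vert _F^2={\textbf {tr}}({\textbf {W}}^T{\textbf {B}}{\textbf {B}}^T{\textbf {W}})\) applied to both terms of (7):

$$\begin{aligned} \Delta _{i}\sum _{s=1}^{N_i}\Vert {\textbf {W}}_{i}^T({\textbf {X}}_s^i-\overline{{\textbf {X}}}_i)\Vert _{F}^{2}-\frac{1}{N}\sum _{j\not =i}\sqrt{N_iN_j}\Vert {\textbf {W}}_{i}^T(\overline{{\textbf {X}}}_{i}-\overline{{\textbf {X}}}_{j})\Vert _{F}^{2}={\textbf {tr}}({\textbf {W}}_{i}^T{\textbf {M}}_{i}{\textbf {W}}_{i}), \end{aligned}$$

where \({\textbf {M}}_{i}\) is the matrix given in (9) below.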

3.2 Solving algorithm and computational complexity analysis

2DkSC can be solved by the following standard eigenvalue decomposition problem

$$\begin{aligned} \begin{aligned} \underset{{\textbf {W}}_{i}}{\min }&~~{\textbf {tr}}({\textbf {W}}_{i}^T{\textbf {M}}_{i}{} {\textbf {W}}_{i})\\ \hbox {s.t. }\,&{\textbf {W}}_{i}^T{\textbf {W}}_{i}={\textbf {I}}, \end{aligned} \end{aligned}$$
(8)

where

$$\begin{aligned} \begin{aligned} {\textbf {M}}_{i}=\,&\Delta _{i}\sum _{s=1}^{N_i}\left( {\textbf {X}}_s^i-\overline{{\textbf {X}}}_i\right) \left( {\textbf {X}}_s^i-\overline{{\textbf {X}}}_i\right) ^T-\frac{1}{N}\sum _{j\not =i}\sqrt{N_iN_j}\left( \overline{{\textbf {X}}}_{i}-\overline{{\textbf {X}}}_{j}\right) \left( \overline{{\textbf {X}}}_{i}-\overline{{\textbf {X}}}_{j}\right) ^T. \end{aligned} \end{aligned}$$
(9)

Starting from the initial cluster labels of all data samples, 2DkSC alternately updates the sample labels and the k clustering subspaces. After finding the optimal solution of model (8) for each cluster, a data sample \({\textbf {X}}_l\) is relabeled as follows

$$\begin{aligned} \text {Cluster}({\textbf {X}}_l) = \underset{i=1,2,\ldots ,k}{\arg \min }\,\,\Vert {\textbf {W}}_{i}^T({\textbf {X}}_l-\overline{{\textbf {X}}}_i)\Vert _{F}^{2}, ~l=1,\,2,\,\ldots ,N \end{aligned}$$
(10)

and the k clusters are updated accordingly. The updated clusters are then used to determine new projection directions by model (7). The entire process continues until the cluster assignment of all data samples repeats. The clustering procedure of 2DkSC is summarized in Algorithm 1.

Algorithm 1 The 2DkSC clustering algorithm
Input: dataset T, cluster number k, reduced dimension d. Output: cluster labels of all data samples.
1. Initialize the cluster labels of all data samples.
2. Repeat:
(a) For each cluster \(i=1,2,\ldots ,k\), compute \({\textbf {M}}_{i}\) by (9) and obtain \({\textbf {W}}_{i}\) by solving the eigenvalue problem (8).
(b) Reassign every data sample \({\textbf {X}}_l\) to a cluster by (10).
until the cluster assignment of all data samples repeats.

For 2DkSC, the main computational cost is solving the optimization problem (8). From Algorithm 1, the dominant cost is computing the matrix \({\textbf {M}}_{i}\) and performing its standard eigenvalue decomposition, whose complexity is \(O(m^3)\). Therefore, the computational complexity of Step (a) in Algorithm 1 is \(O(rkm^3)\), where r is the number of iterations and k is the number of clusters. The computational complexity of Step (b) is O(rkmnN). Since for high-dimensional data nN is typically much smaller than \(m^2\), the overall computational complexity of 2DkSC is \(O(rkm^3)\).
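The following MATLAB sketch illustrates the alternation of Algorithm 1: Step (a) builds \({\textbf {M}}_{i}\) from (9) and keeps the d eigenvectors with the smallest eigenvalues, and Step (b) reassigns the samples by (10) until the assignment no longer changes. It is our own illustration (the cell-array data layout and names are ours), not the released code:

```matlab
% Sketch of the 2DkSC iteration of Algorithm 1 (illustrative only; names are ours).
% Xs: 1-by-N cell array of m-by-n image matrices, labels0: 1-by-N initial labels,
% k: number of clusters, d: reduced dimension
function labels = two_d_ksc(Xs, labels0, k, d)
    N = numel(Xs);  labels = labels0;
    [m, ~] = size(Xs{1});
    while true
        W = zeros(m, d, k);  Means = cell(1, k);  Ni = zeros(1, k);
        for i = 1:k
            Means{i} = mean(cat(3, Xs{labels == i}), 3);       % cluster mean matrix
            Ni(i) = nnz(labels == i);
        end
        % Step (a): update the k projection subspaces by solving (8)
        for i = 1:k
            Delta = 0;  B = zeros(m, m);
            for j = [1:i-1, i+1:k]
                Dij = Means{i} - Means{j};
                wij = sqrt(Ni(i) * Ni(j)) / N;
                Delta = Delta + wij * norm(Dij, 'fro')^2 / 4;  % weighting constant Delta_i
                B = B + wij * (Dij * Dij');                    % between-cluster part of (9)
            end
            S = zeros(m, m);
            for s = find(labels == i)
                C = Xs{s} - Means{i};
                S = S + C * C';                                % within-cluster part of (9)
            end
            Mi = Delta * S - B;                                % matrix M_i in (9)
            [V, E] = eig((Mi + Mi') / 2);
            [~, idx] = sort(diag(E), 'ascend');
            W(:, :, i) = V(:, idx(1:d));                       % d smallest eigenvectors
        end
        % Step (b): reassign every sample by (10)
        newlabels = labels;
        for l = 1:N
            dist = zeros(1, k);
            for i = 1:k
                dist(i) = norm(W(:, :, i)' * (Xs{l} - Means{i}), 'fro')^2;
            end
            [~, newlabels(l)] = min(dist);
        end
        if isequal(newlabels, labels), break; end              % assignment repeated: stop
        labels = newlabels;
    end
end
```

Keeping the images as matrices throughout avoids the \(m\times n\)-dimensional vectorization discussed in section 1.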

To further illustrate the contribution of our method, we discuss the differences between the proposed 2DkSC and four closely related methods: kPPC, TWSVC, q-flat and A2DEIC.

(i) Difference from kPPC, TWSVC and q-flat: Compared with the vector-based clustering algorithms kPPC, TWSVC and q-flat, the proposed 2DkSC is a matrix-based method. The similarity between kPPC, TWSVC and 2DkSC is that their objective functions maximize the between-cluster distance while minimizing the within-cluster distance, whereas q-flat minimizes only the within-cluster distance. The weighting constant of 2DkSC is derived from the Bhattacharyya error bound and adapts to the data automatically, while the weighting parameters of kPPC and TWSVC must be tuned by grid search. In addition, 2DkSC and q-flat achieve clustering and dimensionality reduction simultaneously, while kPPC and TWSVC provide only clustering, without dimensionality reduction. 2DkSC and kPPC obtain their solutions by solving eigenvalue problems, while q-flat is solved by singular value decomposition and TWSVC by two quadratic programming problems.

(ii) Difference from A2DEIC: Although A2DEIC can also deal directly with matrix data, it is strongly influenced by its tuning parameter, and searching for the optimal parameter is difficult and time-consuming, while 2DkSC needs no parameter tuning and can solve its optimization problem efficiently. 2DkSC solves its optimization problem by a standard eigenvalue problem, while A2DEIC relies on an iterative technique.

4 Experiments

We compare the proposed approach with seven related clustering algorithms: k-means [17], q-flat [19], kPPC [20], TWSVC [21], FRTWSVC [25], kSDC [26], and A2DEIC [27]. All experiments are performed on a PC with an Intel 3.30 GHz CPU and 4 GB RAM under the Matlab 2017b platform. kPPC and A2DEIC obtain their solutions by solving eigenvalue problems, q-flat by singular value decomposition, TWSVC by two quadratic programming problems, FRTWSVC by a series of linear systems of equations, and kSDC by an alternating direction method of multipliers. As for parameter selection, the tuning parameter c in kPPC, TWSVC, FRTWSVC and A2DEIC is selected from the set \(\{2^{-8},2^{-7},\ldots ,2^{7}\}\) using the grid search technique for each investigated method. The cluster number k is set equal to the ground-truth number of classes of each dataset by default. For unknown k, one option is to estimate it with a non-parametric Bayesian method [35]; another is to run the clustering method with different numbers of clusters as input and pick the best one, whose quality can be measured by clustering accuracy or normalized mutual information. Once the optimal parameter is selected, it is used to learn the final clusters. For methods with random initialization, the average clustering result over ten runs is reported.
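As an illustration of this protocol, the following MATLAB sketch selects c by grid search and averages the accuracy over ten runs; run_method and cluster_acc are hypothetical placeholders for one of the compared methods and for the ACC measure defined in (11) below:

```matlab
% Sketch of the parameter-selection protocol (illustrative only).
% run_method and cluster_acc are hypothetical placeholders; X, k and truth
% denote the data, the cluster number and the ground-truth labels.
cgrid = 2 .^ (-8:7);                            % candidate values of c
bestacc = -inf;  bestc = cgrid(1);
for c = cgrid
    accs = zeros(1, 10);
    for run = 1:10                              % average over ten random initializations
        labels = run_method(X, k, c);           % e.g., kPPC/TWSVC/FRTWSVC/A2DEIC with parameter c
        accs(run) = cluster_acc(labels, truth); % clustering accuracy, cf. (11)
    end
    if mean(accs) > bestacc
        bestacc = mean(accs);  bestc = c;       % keep the best parameter
    end
end
```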

4.1 Evaluation metrics

Following most work on clustering, we use clustering accuracy (ACC) and normalized mutual information (NMI) [36,37,38] as evaluation measures; both lie in the range [0, 1], and a larger value indicates a more accurate clustering result. Suppose \(p_i\) is the label predicted by a clustering algorithm for data sample \({\textbf {X}}_i\) and \(t_i\) is the corresponding true label. The ACC is defined as follows:

$$\begin{aligned} \begin{aligned} ACC=\frac{\sum \nolimits _{i=1}^{N}\delta (t_i,map(p_i))}{N}, \end{aligned} \end{aligned}$$
(11)

where \(\delta (y_1,y_2)=1\) if \(y_1=y_2\) and \(\delta (y_1,y_2)=0\) otherwise, and \(map(p_i)\) is the best mapping function that permutes the clustering labels to match the true labels, obtained by the Kuhn-Munkres algorithm [38].

Let C denote the set of clusters given by the ground truth and \(C'\) the set obtained by a clustering algorithm. The mutual information \(MI(C,C')\) is defined as follows:

$$\begin{aligned} \begin{aligned} MI(C,C')=\sum \limits _{c_i\in C,c_j'\in C'}p(c_i,c_j')log_{2}\frac{p(c_i,c_j')}{p(c_i)p(c_j')}, \end{aligned} \end{aligned}$$
(12)

where \(p(c_i)\) and \(p(c_j')\) are the probabilities that a data sample arbitrarily selected from the dataset belongs to cluster \(c_i\) or \(c_j'\), respectively, and \(p(c_i,c_j')\) is the joint probability that the arbitrarily selected data sample belongs to both \(c_i\) and \(c_j'\) simultaneously. In our experiments, we use the normalized mutual information (NMI) as follows:

$$\begin{aligned} \begin{aligned} NMI(C,C')=\frac{MI(C,C')}{max(H(C),H(C'))}, \end{aligned} \end{aligned}$$
(13)

where H(C) and \(H(C')\) are the entropies of C and \(C'\), respectively. It is easy to verify that \(NMI(C,C')\) ranges from 0 to 1: \(NMI(C,C')=1\) if the two sets of clusters are identical, and \(NMI(C,C')=0\) if they are independent.
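A minimal MATLAB helper implementing (12) and (13) is sketched below (our own code, not a toolbox routine); the label-mapping step required for ACC in (11) is omitted, since it additionally needs a Kuhn-Munkres solver:

```matlab
% NMI computation following (12)-(13) (illustrative helper; names are ours).
% t: 1-by-N ground-truth labels, p: 1-by-N predicted labels
function nmi = cluster_nmi(t, p)
    N = numel(t);
    [~, ~, ti] = unique(t);                        % relabel ground truth to 1..|C|
    [~, ~, pj] = unique(p);                        % relabel predictions to 1..|C'|
    Pjoint = accumarray([ti(:), pj(:)], 1) / N;    % joint probabilities p(c_i, c_j')
    Pt = sum(Pjoint, 2);  Pp = sum(Pjoint, 1);     % marginals p(c_i) and p(c_j')
    Prod = Pt * Pp;                                % products p(c_i) p(c_j')
    nz = Pjoint > 0;
    MI = sum(Pjoint(nz) .* log2(Pjoint(nz) ./ Prod(nz)));   % mutual information (12)
    Ht = -sum(Pt(Pt > 0) .* log2(Pt(Pt > 0)));     % entropy H(C)
    Hp = -sum(Pp(Pp > 0) .* log2(Pp(Pp > 0)));     % entropy H(C')
    nmi = MI / max(Ht, Hp);                        % normalization (13)
end
```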

4.2 Datasets

The experiments are performed on six image datasets, including one object image, one handwritten image, and four face images.

Object recognition: We use the Coil100 dataset [39]. Coil100 contains 900 images with 100 different objects.

Handwritten digit recognition: We use the USPS dataset to evaluate handwritten digit clustering performance. The dataset contains 11000 samples from 10 classes, where each sample corresponds to one digit.

Face recognition: Four face image datasets (Yale [40], Indian [41], ORL and FERET [42]) are used. The Yale dataset contains 165 images of 15 individuals. The Indian dataset contains 242 face images of 22 females. The ORL dataset contains 400 images of 40 individuals. The FERET dataset contains 14126 images of 1199 individuals and 365 duplicate image sets; here we use a subset containing 1400 images of 200 individuals.

The numbers of samples and categories as well as the image sizes are listed in Table 1. Some of the gray images are shown in Fig. 1.

Table 1 The summary of data statistics
Fig. 1 Six image datasets

4.3 Experimental results

4.3.1 Performance analysis

The performance comparison of the different algorithms is shown in Tables 2 and 3, where the best results are indicated in bold. The p-values from the paired t-test at the 5% significance level are also reported. From the experimental results, the following observations can be made:

(1) 2DkSC achieves the best clustering results in terms of both average ACC and average NMI. Moreover, 2DkSC has the highest ACC and NMI on five of the six datasets. For example, 2DkSC achieves 77.45% NMI on the Coil100 dataset, which is 3.89% higher than the vector-based q-flat algorithm and 6.50% higher than the matrix-based A2DEIC algorithm. This demonstrates the superiority of the proposed algorithm for image clustering.

(2) As a two-dimensional embedding method for image clustering, A2DEIC achieves the second-best performance in terms of both average ACC and average NMI. The reason is that A2DEIC directly handles matrix representations, so the spatial information of the original data is preserved. For example, A2DEIC has the highest ACC of 70.30% on the Yale dataset, 4.02% higher than 2DkSC and 10.30% higher than the vector-based kSDC algorithm.

(3) We also find that q-flat performs better than the other vector-based algorithms, ranking third in both average ACC and average NMI. For example, q-flat has the highest NMI of 54.09% on the USPS dataset, 4.66% higher than 2DkSC and 13.13% higher than k-means. Similar to q-flat, kSDC is also a vector-based clustering algorithm, and it ranks fourth in both average ACC and average NMI. These results support the fact that q-flat and kSDC are able to capture the intrinsic structure of the data in a low-dimensional subspace.

(4) k-means performs better than the plane-based clustering algorithms kPPC, TWSVC, and FRTWSVC in terms of both average ACC and average NMI. TWSVC performs better than kPPC and FRTWSVC, and kPPC has the worst performance.

(5) In terms of CPU time, kPPC, TWSVC and kSDC are slower than the other methods. In contrast, 2DkSC costs the least CPU time among all the compared clustering algorithms. This is because 2DkSC requires no parameter tuning and its solution can be obtained quickly, which demonstrates the efficiency of the proposed method.

(6) The p-values between 2DkSC and the other methods show that, on most of the datasets, 2DkSC is statistically different from the other methods.

Table 2 ACC(%), CPU time (second) and p value for different algorithms
Table 3 NMI(%), CPU time (second) and p value for different algorithms

4.3.2 The influence of the dimension

To examine the discriminative ability, the clustering results of 2DkSC and A2DEIC for different reduced dimensions \(d = 1, 2,\ldots , m\) are shown in Fig. 2. The results show the following: (i) although the ACC and NMI curves of A2DEIC are obtained under its optimal parameter, its best ACC and NMI are not as good as those of our method; (ii) as the reduced dimension increases, the ACC and NMI of 2DkSC vary noticeably; (iii) 2DkSC achieves its best results at the optimal reduced dimension on all datasets; (iv) both A2DEIC and 2DkSC are strongly affected by the reduced dimension, and it is necessary to choose an appropriate reduced dimension.

Fig. 2 Clustering results of A2DEIC and 2DkSC along different dimensions

5 Conclusion

In this paper, a novel two-dimensional k-subspace clustering method named 2DkSC is investigated. Both discriminative and underlying structural information are embedded in 2DkSC, and it realizes dimensionality reduction and clustering simultaneously. The 2DkSC algorithm has no tuning parameters: its weighting constant is adaptively determined by the involved data, and its optimization problem has a closed-form solution. Experimental results on image recognition have shown the superiority of the proposed method. However, a drawback of 2DkSC is that it may not be very robust to noise, since it is based on the squared F-norm. Therefore, we will investigate robust two-dimensional subspace clustering algorithms in the future. Our MATLAB code can be downloaded from http://www.optimalgroup.org/Resources/Code/2DkSC.html.