1 Introduction

In recent years, considerable attention has been directed towards subspace clustering methods for the efficient processing of high-dimensional data. Subspace clustering is a valuable technique for grouping high-dimensional data distributed across a union of subspaces. Its fundamental premise is that data samples within the same cluster should reside in a common subspace. Subspace clustering has become a versatile tool, widely applied across computer vision domains such as face clustering [24, 55, 75], video analysis [66], image representation and compression [14], hyperspectral image processing [73], saliency detection [6], motion segmentation [65], and domain adaptation [28].

Existing subspace clustering methods fall into five broad categories [60]: iterative models [13], algebraic models [62], statistical models [8], spectral clustering-based models [6, 23, 30, 66], and deep learning-based models [15, 17, 18, 42, 78]. Among these, spectral clustering-based subspace clustering is the most widely studied, owing to its extensive exploration and practical applications [6, 54, 71]. It primarily consists of two steps: (1) construction of an affinity matrix and (2) spectral clustering [53]. Diverse forms of affinity matrices have been proposed, depending on the regularization term incorporated. The most notable models include sparse subspace clustering (SSC) [4, 5, 59] and clustering based on low-rank representation (LRR) [3, 12, 26, 63, 72, 77]. These approaches, rooted in sparse or low-rank representation techniques, acquire a coefficient matrix through a self-expression model: SSC enforces an \(l_1\) norm constraint on the coefficients, while LRR employs the \(l_*\) (nuclear) norm. Several works have focused on enhancing the robustness of subspace clustering algorithms by employing a multi-objective framework with subspace optimization techniques [41], integrating subspace fuzzy clustering techniques [64], or adjusting the threshold based on the distribution of data points within subspaces [22].

Various techniques for subspace clustering have been developed, each employing distinct constraints to derive an optimal affinity matrix [1, 2, 30, 44, 56, 61, 71]. However, these methods are limited to processing linear subspaces, proving inadequate for handling the prevalent nonlinear structures found in real-world data [21, 52]. Another constraint is the assumption that data can be separated into distinct subspaces. Recently, techniques based on multiview clustering [11, 45,46,47, 57, 58, 74, 76] have also gained attention.

To address the limitations associated with linear subspace clustering on nonlinear data, kernel self-expression methods have been introduced [16, 27, 38,39,40, 66]. Kernelized SSC (KSSC) [40] and kernelized LRR (KLRR) [37, 66] are two prominent methods that capture nonlinear structure information in the input space, exhibiting significant advancements in various applications. Despite their efficiency in processing nonlinear data, these kernel subspace clustering methods may lose some similarity information between samples during the reconstruction of the original data in kernel space.

Recently, transform learning-based approaches for subspace clustering [9, 10, 33,34,35] have garnered attention. These methods operate on data originally deemed inseparable into subspaces, transforming it into a high-dimensional feature space where linear separability into subspaces is achievable. Notably, these methods eliminate the need to manually choose a mapping function, as they autonomously learn the mapping from the data itself.

In practical scenarios, real-world data exhibiting manifold structures often entails complexities beyond mere sparsity or low-rank characteristics. Consequently, it becomes crucial to formulate a representation that can adeptly capture the intricate structural information inherent in the original data. Numerous methodologies have been devised to uncover underlying structures by delving into data relationships [33,34,35]. Recently introduced subspace clustering methodologies grounded in structure learning comprise Similarity Learning via Kernel-Preserving Embedding (SLKE) [20] and Structure Learning with Similarity Preserving (SLSP) [19]. SLKE constructs a model that preserves similarity information among data, resulting in improved performance. In contrast, SLSP establishes a structure learning framework that integrates the similarity information of the original data, addressing potential drawbacks associated with the SLKE algorithm, which could lead to the loss of certain low-order information. Despite these methods exhibiting commendable performance, their effectiveness is dependent on a learned similarity matrix that might lack an optimal block diagonal structure for spectral clustering.

In the literature, diverse norm regularization terms have been utilized in self-expressive models to obtain a block diagonal coefficient matrix, including the \(l_1\) norm, \(l_2\) norm, and nuclear norm. However, these regularization techniques exhibit two shortcomings: they cannot control the number of blocks in the coefficient matrix, and the learned coefficient matrix may be suboptimal due to data noise. Addressing these limitations, block diagonal representation (BDR) subspace clustering algorithms [30, 70] have been introduced, which directly pursue a block diagonal structure in the coefficient matrix. For example, implicit block diagonal low-rank representation (IBDLR) integrates block diagonal priors and implicit feature representation into the low-rank representation model, progressively enhancing clustering performance [67]. Notably, these BDR-based subspace clustering methods, while effective, have not been integrated with similarity-preserving mechanisms. The kernelized version of transform learning was introduced in [32].

This work proposes “Similarity Preserving Kernel Block Diagonal Representation based Transformed Subspace Clustering (KBD-TSC)”, which leverages the kernel self-expression framework. Kernelized transformed subspace clustering accounts for data that are not originally separable into subspaces by leveraging kernel self-expression-based transform learning. The proposed method transforms the data into a high-dimensional feature space in which they are linearly separable into subspaces; it does not require manually choosing a mapping function, as the mapping is learned from the data itself. Although kernel subspace clustering methods based on kernel self-expression can efficiently process nonlinearly structured data, some similarity information between samples may be lost when reconstructing the original data in kernel space. The integration of a similarity preserving regularizer and a block diagonal regularizer into the proposed model therefore facilitates enhanced preservation of similarity information between the original data points. Experimental results on nine datasets validate the effectiveness and robustness of the proposed KBD-TSC method.

The principal contributions of this paper are as follows:

  • A novel subspace clustering approach is proposed which accounts for the data that is not originally separable into subspaces by leveraging kernel self-expression-based transform learning.

  • A similarity preserving regularizer is incorporated in the proposed model to facilitate enhanced preservation of similarity information between the original data points.

  • A block diagonal representation is integrated into the proposed model to derive a similarity matrix characterized by an optimal block diagonal structure.

  • The proposed KBD-TSC model is evaluated on nine datasets featuring different types of manifolds, including handwritten digits clustering, face image clustering, object clustering, and text clustering. The experiments involved comparing the proposed model with several state-of-the-art approaches. The results strongly support the effectiveness of our proposed model.

The structure of the remaining paper is organized as follows: Sect. 2 provides a review of basic concepts in related work. Section 3 elaborates on the proposed algorithms and their solutions. Subsequently, Sect. 4 discusses experimental results, and Sect. 5 concludes the paper.

2 Background

2.1 Subspace clustering

The standard subspace clustering techniques are self-expression-oriented: the aim is to express each data point as a linear combination of the other data points lying in the same subspace. The basic prerequisite is that the data are separable into distinct subspaces. Let \(\varvec{X} = [x_1, x_2, \cdots , x_N] \in \Re ^{d \times N}\) be the matrix of data points, where every column vector \(\varvec{x_i}\) is drawn from a union of lower-dimensional subspaces \(\begin{Bmatrix} \varvec{S_1}\bigcup \varvec{S_2} \bigcup \cdots \varvec{S_n} \end{Bmatrix}\) of dimensions \(\begin{Bmatrix} d_k \end{Bmatrix}_{k=1}^n\), where n is the total number of subspaces. Subspace clustering aims to segment each set \(X_k\) of \(N_k\) points that belong to the same subspace \(S_k\) of dimension \(d_k\). The standard self-expression model is

$$\begin{aligned} \underset{{\textbf {Z}}}{\text {minimize }}\frac{1}{2}\left\| {\textbf {X}}- {\textbf {XZ}}\right\| _F^2 + \lambda (\Omega ({\textbf {Z}})), \; \text {s.t.\; diag}({\textbf {Z}})=0, \; {\textbf {Z}} \ge 0. \end{aligned}$$
(1)

where \(\Omega (\varvec{Z})\) is the regularization term and \(\lambda >0\) is a hyperparameter. \(\left\| {\textbf {Z}} \right\| _*\), \(\left\| {\textbf {Z}} \right\| _1\), and \(\left\| {\textbf {Z}} \right\| _F^2\) are three common regularizers. An affinity matrix is then constructed from \(\varvec{Z}\), to which a graph-cut (spectral clustering) technique is applied to obtain the clusters.
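To make this pipeline concrete, the following is a minimal sketch of Eq. 1 with the choice \(\Omega (\varvec{Z})=\Vert \varvec{Z}\Vert _F^2\), which admits a closed-form solution; the constraints \(\text {diag}(\varvec{Z})=0\) and \(\varvec{Z}\ge 0\) are imposed here by simple post-processing rather than exact constrained optimization, and the toy data and parameter values are illustrative only.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def self_expressive_affinity(X, lam=0.1):
    """Eq. (1) with Omega(Z) = ||Z||_F^2:
    minimize 0.5*||X - X Z||_F^2 + lam*||Z||_F^2  =>  Z = (X^T X + 2*lam*I)^{-1} X^T X.
    diag(Z) = 0 and Z >= 0 are imposed by post-processing."""
    N = X.shape[1]                      # columns of X are samples
    G = X.T @ X                         # N x N Gram matrix
    Z = np.linalg.solve(G + 2 * lam * np.eye(N), G)
    np.fill_diagonal(Z, 0.0)            # diag(Z) = 0
    Z = np.maximum(Z, 0.0)              # Z >= 0
    return 0.5 * (Z + Z.T)              # symmetric affinity matrix

# Usage: spectral clustering on the learned affinity (toy data).
X = np.random.randn(30, 100)            # 100 samples in R^30
W = self_expressive_affinity(X)
labels = SpectralClustering(n_clusters=5, affinity="precomputed").fit_predict(W)
```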

2.2 Kernelized subspace clustering

The kernelized version of subspace clustering maps the data into a kernel-induced feature space and performs self-expression there:

$$\begin{aligned} \underset{{\textbf {Z}}}{\text {minimize }} \frac{1}{2} \Vert \ker ({\textbf {X}}) - \ker ({\textbf {X}}){\textbf {Z}} \Vert _F^2 + \lambda (\Omega ({\textbf {Z}})) \nonumber \\ \equiv \underset{{\textbf {Z}}}{\text {minimize }} \frac{1}{2}Tr\left( {\textbf {K}}-2{\textbf {KZ}}+{\textbf {Z}}^\top {\textbf {KZ}}\right) + \lambda (\Omega ({\textbf {Z}})), \; \text {s.t. \; diag}({\textbf {Z}})=0, \; {\textbf {Z}} \ge 0. \end{aligned}$$
(2)

where \(\ker (\varvec{X})\) denotes the kernel mapping and \(\varvec{K}\) is the kernel matrix whose entries are \(K_{i,j} = \ker (\varvec{x}_i)^\top \ker (\varvec{x}_j)\).
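For illustration, a hedged sketch of Eq. 2 under the same Frobenius-norm choice of \(\Omega (\varvec{Z})\), for which the problem reduces to the closed form \(\varvec{Z} = (\varvec{K} + 2\lambda \varvec{I})^{-1}\varvec{K}\); the RBF kernel and the post-processing of the constraints are illustrative choices, not the exact setup used later in this paper.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def kernel_self_expression(X, lam=0.1, gamma=1e-3):
    """Eq. (2) with Omega(Z) = ||Z||_F^2:
    0.5*||ker(X) - ker(X) Z||_F^2 + lam*||Z||_F^2  =>  Z = (K + 2*lam*I)^{-1} K.
    Constraints diag(Z) = 0 and Z >= 0 are applied as post-processing."""
    K = rbf_kernel(X.T, gamma=gamma)     # K[i, j] = ker(x_i)^T ker(x_j)
    N = K.shape[0]
    Z = np.linalg.solve(K + 2 * lam * np.eye(N), K)
    np.fill_diagonal(Z, 0.0)
    return np.maximum(Z, 0.0)
```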

2.3 Kernelized transform learning

Transform learning is a more recent unsupervised representation learning methodology that acts as the analysis counterpart of dictionary learning. It seeks to learn the transform matrix and the coefficient matrix from the input data in such a way that the learned transform analyses the data and generates the coefficient matrix [48,49,50,51]. Since we anticipate that the input data can be divided into several groups, raw data (e.g., pixel values) are typically used as input to the clustering method. Transform learning, however, can be applied to any high-dimensional data, yielding effective latent-space representations of the transform matrix and the coefficient matrix. The nonlinearity of the data can be handled effectively and efficiently by the kernel approach: kernelization can be used when the learned transform coefficients are not linearly separable into distinct subspaces. The kernel transform learning formulation [32] is given as Eq. 3.

$$\begin{aligned} \underset{{\textbf {A}},{\textbf {Z}}}{\text {minimize }} \Vert {\textbf {AK}} - {\textbf {Z}} \Vert _F^2 + \epsilon (\Vert {\textbf {A}} \Vert _F^2 - {\text{log det}}({\textbf {A}})) + \mu \Vert {\textbf {Z}} \Vert _1. \end{aligned}$$
(3)

where \(\varvec{K} = \ker ({\textbf {X}})^\top \ker ({\textbf {X}})\) is the kernel matrix, and \(\epsilon\) and \(\mu\) are hyperparameters. The regularization terms introduced in Eq. 3 help prevent trivial solutions. The term \(-{\text{log det}}(\varvec{A})\) guarantees that \(\varvec{A}\) is of full rank, while \(\left\| \varvec{A} \right\| _F^2\) prevents the transform from growing arbitrarily large. The remaining term, \(\left\| \varvec{Z} \right\| _1\), makes the coefficients sparser.

Equation 3 is solved using an alternating minimization method. Equation 4 describes the iterative updates of \(\varvec{A}\) and \(\varvec{Z}\), which are carried out alternately.

$$\begin{aligned} \begin{array}{l} {\textbf {A}} \leftarrow \underset{{\textbf {A}}}{\text {minimize }} \left\| {\textbf {AK}} - {\textbf {Z}} \right\| _F^2 + \epsilon ( \left\| {\textbf {A}} \right\| _F^2 - {\text{log det}}({\textbf {A}}));\\ {\textbf {Z}} \leftarrow \underset{{\textbf {Z}}}{\text {minimize }} \left\| {\textbf {AK}} - {\textbf {Z}} \right\| _F^2 + \mu \left\| {\textbf {Z}} \right\| _1. \end{array} \end{aligned}$$
(4)

The update of \(\varvec{A}\) is straightforward, as the objective is directly differentiable; it can also be solved in closed form using the linear-algebraic techniques given in [31]. Equation 5 describes the one-step soft thresholding used to update \(\varvec{Z}\).

$$\begin{aligned} {\textbf {Z}} \leftarrow \text {sign}({\textbf {AK}}) \odot \max (0, \vert {\textbf {AK}}\vert - \mu ). \end{aligned}$$
(5)
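As a small illustration, the one-step soft thresholding of Eq. 5 can be transcribed as follows (element-wise operations):

```python
import numpy as np

def soft_threshold_update(A, K, mu):
    """Eq. (5): Z <- sign(AK) * max(0, |AK| - mu), applied element-wise."""
    AK = A @ K
    return np.sign(AK) * np.maximum(0.0, np.abs(AK) - mu)
```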

2.4 Transformed subspace clustering

Subspace clustering operates on the underlying assumption that the given data can be segmented into numerous sub-spaces. Yet, in the context of high-dimensional data, this assumption often does not hold true. To address this challenge, we employ clustering algorithms on the features derived from transform learning. This approach involves projecting the data onto a latent space, where we acquire separable coefficients that define distinct sub-spaces [34]. The modified subspace clustering can be articulated as follows, focusing on acquiring sparse representations based on the transformed features.

$$\begin{aligned} \underset{{\textbf {A,Z,C}}}{\text {minimize }} \Vert {\textbf {AX}} - {\textbf {Z}} \Vert _F^2 + \epsilon (\Vert {\textbf {A}} \Vert _F^2 - {\text{log det}}({\textbf {A}})) + \gamma \Vert {\textbf {Z}} - {\textbf {ZC}} \Vert _F^2 + \lambda (\Omega ({\textbf {C}})). \end{aligned}$$
(6)

Alternating minimization, as described in Eq. 7, is utilised to solve Eq. 6: in each iteration, \(\varvec{A}\), \(\varvec{Z}\), and \(\varvec{C}\) are updated alternately.

$$\begin{aligned} \begin{array}{*{20}{l}} {\textbf {A}} \leftarrow \underset{{\textbf {A}}}{\text {minimize }} \Vert {\textbf {AX}} - {\textbf {Z}} \Vert _F^2 + \epsilon (\Vert {\textbf {A}} \Vert _F^2 - {\text{log det}}({\textbf {A}})); \\ {\textbf {Z}} \leftarrow \underset{{\textbf {Z}}}{\text {minimize }} \Vert {\textbf {AX}} - {\textbf {Z}} \Vert _F^2 + \gamma (\Vert {\textbf {Z}}-{\textbf {ZC}} \Vert _F^2); \\ {\textbf {C}} \leftarrow \underset{{\textbf {C}}}{\text {minimize }} \Vert {\textbf {Z}}-{\textbf {ZC}} \Vert _F^2 + \lambda (\Omega ({\textbf {C}})). \\ \end{array} \end{aligned}$$
(7)

Once \(\varvec{C}\) is obtained, the affinity matrix is calculated from it and spectral clustering is applied to discover the clusters. Additionally, to obtain the kernelized transformed subspace clustering formulation, the subspace clustering loss is combined with the kernel transform learning model of Eq. 3.

2.5 Block diagonal representation

By adding a block diagonal regularization term, the BDR method [29] directly pursues a block diagonal coefficient matrix and achieves higher clustering performance. The BDR optimization model is expressed as follows:

$$\begin{aligned} \underset{{\textbf {Z}}}{\text {minimize }} \frac{1}{2} \Vert {\textbf {Z}} \Vert _{\fbox {m}} + \Vert \ker ({\textbf {X}}) - \ker ({\textbf {X}}){\textbf {Z}} \Vert _F^2, \; \text {s.t.}\; {\textbf {Z}} \ge 0, \; {\textbf {Z}}^\top = {\textbf {Z}}, \; \text {diag}({\textbf {Z}})=0. \end{aligned}$$
(8)

Here, \(\varvec{X}\) and \(\varvec{Z}\) represent the data matrix and the coefficient matrix, respectively, while \(\left\| \varvec{Z} \right\| _{\fbox {m}}\) denotes the m-block diagonal regularizer, defined in the BDR literature as the sum of the m smallest eigenvalues of the Laplacian of \(\varvec{Z}\).
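For reference, the value of the block diagonal regularizer can be computed as in the sketch below; it follows the standard definition from the BDR literature, where the Laplacian \(Diag(\varvec{Z}\textbf{1})-\varvec{Z}\) is used and the regularizer vanishes exactly when \(\varvec{Z}\) has at least m connected components, i.e., at least m diagonal blocks.

```python
import numpy as np

def m_block_diag_reg(Z, m):
    """m-block diagonal regularizer of a symmetric, nonnegative affinity Z:
    the sum of the m smallest eigenvalues of its graph Laplacian."""
    L = np.diag(Z.sum(axis=1)) - Z       # Laplacian: Diag(Z 1) - Z
    eigvals = np.linalg.eigvalsh(L)      # eigenvalues in ascending order
    return float(eigvals[:m].sum())
```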

3 Proposed method: KBD-TSC

The fundamental assumption underlying self-expression-based subspace clustering is the requirement for data to be segregable into distinct subspaces. Traditional methods also rely on the assumption of an inherent linear structure. However, when these assumptions are not met, particularly when data samples are non-separable into subspaces and linear subspace clustering methods struggle with non-linear structures, a more adaptable model is needed. In response to this challenge, we present a model designed to generalize effectively on non-linear structured data, even when they are not easily separable into subspaces.

The proposed KBD-TSC method is adept at preserving similarity information among samples while concurrently achieving an optimal block diagonal structure in the obtained similarity matrix. This is accomplished by embedding a nonlinear model that integrates kernelized transformed subspace clustering with a kernel self-expression framework. Incorporating a block diagonal regularization term into the kernel self-expression framework is pivotal for obtaining a similarity matrix characterized by a block diagonal structure. Furthermore, the preservation of similarity information is secured by minimizing the difference between two inner products: the inner products among the original data in kernel space and the inner products of the reconstructed data in kernel space. The entire optimization problem is solved using alternating minimization.

3.1 Similarity preserving model

To uphold similarity information among samples, our objective is to minimize the difference between two inner products. One corresponds to the inner product among the original data in kernel space, and the other corresponds to the inner product of the reconstructed data in kernel space, drawing inspiration from the work of Kang et al. [20].

$$\begin{aligned} \underset{{\textbf {Z}}}{\text {minimize }} \frac{1}{2} \Vert {\textbf {K}}-{\textbf {Z}}^\top {\textbf {KZ}} \Vert _F^2. \end{aligned}$$
(9)

where \(\varvec{K} = \ker ({\textbf {X}})^\top \ker ({\textbf {X}})\) is a positive semi-definite matrix.
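A direct evaluation of the similarity preserving objective in Eq. 9 (a one-line sketch; \(\varvec{K}\) is any precomputed kernel matrix and \(\varvec{Z}\) any coefficient matrix):

```python
import numpy as np

def similarity_preserving_loss(K, Z):
    """Eq. (9): 0.5 * ||K - Z^T K Z||_F^2, the gap between the kernel-space
    inner products of the original data (K) and of the reconstruction (Z^T K Z)."""
    return 0.5 * np.linalg.norm(K - Z.T @ K @ Z, "fro") ** 2
```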

3.2 Proposed algorithm

We introduce a model designed for effective generalization on non-linear manifolds. This model integrates kernelized self-expression transformed subspace clustering with a similarity-preserving kernel block diagonal representation. The kernelized component of transform learning characterizes non-linear data as a linear combination of itself in the transform domain. The incorporation of transform learning with subspace clustering loss facilitates the separation of data into subspaces. To enhance this process, a block diagonal regularization term is introduced, aiming to achieve a similarity matrix between samples with a block diagonal structure. Consequently, our proposed model not only preserves the similarity information among non-linear samples in the transform domain but also simultaneously acquires a similarity matrix with an optimal block diagonal structure.

The complete joint formulation for the proposed model is expressed as Eq. 10.

$$\begin{aligned} \underset{{\textbf {A,Z}}}{\text {minimize }}&\underbrace{\Vert {\textbf {AK}}-{\textbf {Z}} \Vert _F^2 + \epsilon (\Vert {\textbf {A}} \Vert _F^2 - {\text{log det}}({\textbf {A}}))}_{\text {Kernelized transform learning}} + \underbrace{\frac{1}{2}Tr\left( {\textbf {K}}-2{\textbf {KZ}}+{\textbf {Z}}^\top {\textbf {KZ}}\right) }_{\text {Kernel self-expression subspace clustering}} \nonumber \\&\quad + \underbrace{\alpha \Vert {\textbf {K}}-{\textbf {Z}}^\top {\textbf {KZ}} \Vert _F^2 + \gamma \Vert {\textbf {Z}} \Vert _{\fbox {k}}}_{\text {Similarity preserving with block diagonal}} \nonumber \\&\quad \text {s.t.}\; {\textbf {Z}}\ge 0, \; \text {diag}({\textbf {Z}}) = 0, \; {\textbf {Z}}^\top ={\textbf {Z}}. \end{aligned}$$
(10)

where \(\alpha , \epsilon , \gamma\) are positive hyperparameters.

To simplify and separate out the variables, let us introduce an auxiliary matrix \(\varvec{B}\) and a regularization term \(\left\| {\varvec{Z-B}} \right\| _F^2\) into our proposed model. Thus, the optimization problem in equation 10 can be translated to

$$\begin{aligned} \underset{{\textbf {A,Z,B}}}{\text {minimize }}&\left\| {{\textbf {AK}} - {\textbf {Z}}} \right\| _F^2+\epsilon \left( {\left\| {\textbf {A}} \right\| _F^2 - {\text{log det}}({\textbf {A}})} \right) + \frac{1}{2}Tr\left( {\textbf {K}}-2{\textbf {KZ}}+{\textbf {Z}}^\top {\textbf {KZ}} \right) \nonumber \\&\quad + \alpha {\left\| {\textbf {K}}- {\textbf {Z}}^\top {\textbf {KZ}} \right\| _F^2}+ \frac{\beta }{2} \left\| {{\textbf {Z-B}}} \right\| _F^2 + \gamma \left\| {\textbf {B}} \right\| _{\fbox {k}} \nonumber \\&\quad \text {s.t.}\; {\textbf {Z}}\ge 0, \; \text {diag}\left( {\textbf {Z}} \right) = 0, \; {\textbf {Z}}^\top ={\textbf {Z}}. \end{aligned}$$
(11)

3.3 Optimization of the proposed KBD-TSC model

To facilitate the solution of the problem in Eq. 11, three new auxiliary variables \(\varvec{J}\), \(\varvec{G}\), and \(\varvec{H}\) are introduced, which leads to the following equivalent problem:

$$\begin{aligned} \underset{{\textbf {A,Z,B,J,G,H}}}{\text {minimize }}&\left\| {{\textbf {AK}}-{\textbf {Z}}} \right\| _F^2+\epsilon \left( {\left\| {\textbf {A}} \right\| _F^2 - {\text{log det}}({\textbf {A}})} \right) + \frac{1}{2}Tr\left( {\textbf {K}}-2{\textbf {KJ}}+{\textbf {Z}}^\top {\textbf {K J}} \right) \nonumber \\&\quad + \alpha {\left\| {\textbf {K}}- {\textbf {G}}^\top {\textbf {K H}} \right\| _F^2} + \frac{\beta }{2} \left\| {{\textbf {J-B}}} \right\| _F^2 + \gamma \left\| {\textbf {B}} \right\| _{\fbox {k}} \nonumber \\&\quad \text {s.t.}\; {\textbf {B}}\ge 0, \; \text {diag}\left( {\textbf {B}} \right) = 0, \; {\textbf {B}}^\top ={\textbf {B}}, \; {\textbf {J}}={\textbf {Z}}, \; {\textbf {G}}={\textbf {Z}}, \; {\textbf {H}}={\textbf {Z}}. \end{aligned}$$
(12)

We use ADMM to solve Eq. 12; the corresponding augmented Lagrangian [25] is given as follows:

$$\begin{aligned} \mathcal {L}\left( {\textbf {A,Z,J,G,H,B}}, \lambda _1, \lambda _2, \lambda _3 \right)&= \left\| {{\textbf {AK-Z}}} \right\| _F^2+\epsilon \left( {\left\| {\textbf {A}} \right\| _F^2 - {\text{log det}}({\textbf {A}})} \right) + \frac{1}{2}Tr\left( {\textbf {K}}-2{\textbf {KJ}}+{\textbf {Z}}^\top {\textbf {K J}}\right) \nonumber \\&\quad + \alpha {\left\| {\textbf {K}}- {\textbf {G}}^\top {\textbf {K H}} \right\| _F^2} + \frac{\beta }{2} \left\| {{\textbf {J-B}}} \right\| _F^2 + \gamma \left\| {\textbf {B}} \right\| _{\fbox {k}} \nonumber \\&\quad + \frac{\mu }{2} \left[ \left\| {\textbf {J-Z}}+\frac{\lambda _1}{\mu } \right\| _F^2 + \left\| {\textbf {G-Z}}+\frac{\lambda _2}{\mu } \right\| _F^2 + \left\| {\textbf {H-Z}}+\frac{\lambda _3}{\mu } \right\| _F^2\right] . \end{aligned}$$
(13)

where \(\lambda _1, \lambda _2, \lambda _3\) are Lagrangian multipliers and \(\mu > 0\) is a penalty parameter. Now, these variables can be updated alternately. The updates for all variables are given as follows:

  • Update A: After keeping other variables fixed, \(\varvec{A}\) can be updated as follows:

    $$\begin{aligned} \underset{{\textbf {A}}}{\text {minimize }} \Vert {\textbf {AK-Z}} \Vert _F^2+ \epsilon \left( {\Vert {\textbf {A}} \Vert _F^2 - {\text{log det}}({\textbf {A}})} \right) . \end{aligned}$$
    (14)

    For updating the transform given the original data, as in Eq. 15, the closed-form solution of Eq. 16 can be used.

    $$\begin{aligned} \underset{{\textbf {T}}}{\text {minimize }} \Vert {\textbf {TX-Z}} \Vert _F^2 + \epsilon (\Vert {\textbf {T}} \Vert _F^2 - {\text{log det}}({\textbf {T}})). \end{aligned}$$
    (15)

    The update of the transform matrix \(\varvec{T}\) is straightforward, as each term is directly differentiable; however, it can be solved more efficiently in closed form using the linear-algebraic techniques of [51].

    $$\begin{aligned} \begin{gathered} {\textbf {X}}{{\textbf {X}}^\top } + \epsilon {\textbf {I}} = {\textbf {L}}{{\textbf {L}}^\top }, \\ {{\textbf {L}}^{ - 1}}{} {\textbf {X}}{{\textbf {Z}}^\top } = {\textbf {US}}{{\textbf {V}}^\top }, \\ {\textbf {T}} \leftarrow \frac{1}{2}{} {\textbf {V}}({\textbf {S}} + {({{\textbf {S}}^2} + 2\epsilon {\textbf {I}})^{1/2}}){{\textbf {U}}^\top }{{\textbf {L}}^{ - 1}}. \\ \end{gathered} \end{aligned}$$
    (16)

    Now, the solution of Eq. 14 follows the update of \(\varvec{T}\) in Eq. 16: here \(\varvec{A}\) plays the role of the transform matrix \(\varvec{T}\), and instead of passing the original samples \(\varvec{X}\), we input the kernel matrix \(\varvec{K}\) (a numerical sketch of this update is given after Algorithm 1).

  • Update J: After keeping other variables fixed, \(\varvec{J}\) can be updated as follows:

    $$\begin{aligned} \begin{array}{ll} \underset{{\textbf {J}}}{\text {minimize }} \frac{1}{2}Tr\left( {\textbf {K}}-2{\textbf {KJ+Z}}^\top {\textbf {KJ}} \right) + \frac{\beta }{2} \Vert {{\textbf {J-B}}} \Vert _F^2 + \frac{\mu }{2} \Vert {\textbf {J-Z}}+\frac{\lambda _1}{\mu } \Vert _F^2\\ s.t.\; {\textbf {B}}\ge 0, diag\left( {\textbf {B}} \right) = 0,{\textbf {B}}^\top ={\textbf {B}}. \end{array} \end{aligned}$$
    (17)

    Taking its first derivative and equating it to 0 gives:

    $$\begin{aligned} {\textbf {J}} = \left( {\textbf {K}} +{\textbf {B}}+\mu {\textbf {I}} \right) ^{-1}\left( {\textbf {K}} + \mu {\textbf {Z}} - \lambda _1 + \beta {\textbf {B}} \right) . \end{aligned}$$
    (18)
  • Update G: After keeping other variables fixed, \(\varvec{G}\) can be updated as follows:

    $$\begin{aligned} \underset{{\textbf {G}}}{\text {minimize }} \alpha \Vert {{\textbf {K-G}}^\top {\textbf {KH}}} \Vert _F^2 + \frac{\mu }{2} \Vert {\textbf {G-Z}}+\frac{\lambda _2}{\mu } \Vert _F^2. \end{aligned}$$
    (19)

    Taking its first derivative and equating to 0 gives:

    $$\begin{aligned} {\textbf {G}} = \left( 2 \alpha {\textbf {KH}} {\textbf {H}}^\top {\textbf {K}}^\top + \mu {\textbf {I}} \right) ^{-1}\left( 2 \alpha {\textbf {KH}} {\textbf {K}}^\top + \mu {\textbf {Z}} - \lambda _2\right) . \end{aligned}$$
    (20)
  • Update H: After keeping other variables fixed, \(\varvec{H}\) can be updated as follows:

    $$\begin{aligned} \underset{{\textbf {H}}}{\text {minimize }} \alpha \Vert {{\textbf {K}} - {\textbf {G}}^\top {\textbf {KH}}} \Vert _F^2 + \frac{\mu }{2} \left\Vert {\textbf {H-Z}}+\frac{\lambda _3}{\mu } \right\Vert _F^2. \end{aligned}$$
    (21)

    Taking its first derivative and equating to 0 gives:

    $$\begin{aligned} {\textbf {H}} = \left( 2 \alpha {\textbf {K}}^\top {\textbf {G}} {\textbf {G}}^\top {\textbf {K}} + \mu {\textbf {I}} \right) ^{-1}\left( 2 \alpha {\textbf {K}}^\top {\textbf { G K}} + \mu {\textbf {Z}} - \lambda _3\right) . \end{aligned}$$
    (22)
  • Update Z: After keeping other variables fixed, the sub-problem becomes:

    $$\begin{aligned} \underset{{\textbf {Z}}}{\text {minimize }} \frac{3 \mu }{2}\left\Vert {\textbf {Z}} - \frac{{\textbf {J}}+{\textbf {G}}+{\textbf {H}}+{\textbf {AK}}+ \left( \lambda _1 + \lambda _2 +\lambda _3 \right) / \mu }{3} \right\Vert _F^2. \end{aligned}$$
    (23)

    Taking its first derivative and equating to 0 gives:

    $$\begin{aligned} {\textbf {Z}} = \frac{{\textbf {J}}+{\textbf {G}}+{\textbf {H}}+{\textbf {AK}}+ \left( \lambda _1 + \lambda _2 +\lambda _3 \right) / \mu }{3}. \end{aligned}$$
    (24)
  • Update B: After keeping other variables fixed, \(\varvec{B}\) can be updated as follows:

    $$\begin{aligned} \underset{{\textbf {B}}}{\text {minimize }} \frac{\beta }{2} \Vert {{\textbf {J}} - {\textbf {B}}} \Vert _F^2 + \gamma \Vert {\textbf {B}} \Vert _{\fbox {k}} \; s.t.\; {\textbf {B}}\ge 0, diag\left( {\textbf {B}} \right) = 0, {\textbf {B}}^\top ={\textbf {B}}. \end{aligned}$$
    (25)

    Using the Ky Fan theorem [7], Eq. 25 can be rewritten as follows:

    $$\begin{aligned} \begin{array}{ll} \underset{{\textbf {B}}}{\text {minimize }} \frac{\beta }{2} \Vert {{\textbf {J}} - {\textbf {B}}} \Vert _F^2 + \gamma \left\langle diag\left( {\textbf {B}} \right) - {\textbf {B}}, {\textbf {S}} \right\rangle \\ s.t.\; {\textbf {B}}\ge {\textbf {0}}, diag\left( {\textbf {B}} \right) = 0, {\textbf {B}}^\top ={\textbf {B}}, {\textbf {0}}\preceq {\textbf {S}} \preceq {\textbf {I}}, Tr({\textbf {S}})=k. \end{array} \end{aligned}$$
    (26)

    where \(\varvec{S}=\varvec{UU}^\top\), \(\varvec{U}\) consists of k eigenvectors that correspond to k smallest eigenvalues of \(diag(\varvec{B})-\varvec{B}\). Now, equation 26 can be translated to:

    $$\begin{aligned} \underset{{\textbf {B}}}{\text {minimize }} \frac{1}{2}\left\| {\textbf {B}}-{\textbf {J}}+ \frac{\gamma }{\beta } \left( diag({\textbf {S}})1^\top -{\textbf {S}} \right) \right\| _F^2. \end{aligned}$$
    (27)

    Let us define

    $$\varvec{Q} = \varvec{J} - \frac{\gamma }{\beta } \left( diag(\varvec{S})1^\top -\varvec{S} \right) , \; \tilde{\varvec{Q}} = \varvec{Q}- Diag(diag(\varvec{Q})), \; \text {then} \; \varvec{B} = \max (0, (\tilde{\varvec{Q}}+ \tilde{\varvec{Q}}^\top )/2).$$

Once we obtain the matrix \(\varvec{B}\), the similarity matrix can be computed as \((\varvec{B} + \varvec{B} ^\top )/2\). After this, the clustering results can be obtained by applying spectral clustering to the similarity matrix. The step-by-step procedure is summarized in Algorithm 1, and a minimal numerical sketch of the closed-form \(\varvec{A}\) and \(\varvec{B}\) updates is given after it.

Algorithm 1 The proposed algorithm: KBD-TSC
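As referenced above, the following is a minimal numerical sketch of two of the closed-form steps used in Algorithm 1, stated under the assumption of a square transform and a symmetric kernel matrix: the transform update of Eq. 16 (used for \(\varvec{A}\) by passing \(\varvec{K}\) in place of \(\varvec{X}\)) and the Ky Fan step with the constrained projection that yields \(\varvec{B}\) (Eqs. 26-27, with the Laplacian form \(Diag(\varvec{B}\textbf{1})-\varvec{B}\) taken from the BDR literature). It sketches these individual steps only, not the full ADMM solver.

```python
import numpy as np

def transform_update(X, Z, eps):
    """Closed-form transform update (Eq. 16):
    X X^T + eps*I = L L^T,  L^{-1} X Z^T = U S V^T,
    T <- 0.5 * V (S + (S^2 + 2*eps*I)^{1/2}) U^T L^{-1}.
    For the A update of Eq. 14, pass the kernel matrix K in place of X."""
    L = np.linalg.cholesky(X @ X.T + eps * np.eye(X.shape[0]))
    Linv = np.linalg.inv(L)
    U, s, Vt = np.linalg.svd(Linv @ X @ Z.T, full_matrices=False)
    mid = np.diag(0.5 * (s + np.sqrt(s ** 2 + 2.0 * eps)))
    return Vt.T @ mid @ U.T @ Linv

def ky_fan_S(B, k):
    """Ky Fan step for Eq. 26: S = U U^T, where U holds the eigenvectors of the
    k smallest eigenvalues of the Laplacian of B."""
    Lap = np.diag(B.sum(axis=1)) - B         # Diag(B 1) - B
    _, V = np.linalg.eigh(Lap)               # eigenvalues in ascending order
    U = V[:, :k]
    return U @ U.T

def block_diag_B_update(J, S, gamma, beta):
    """B update (Eq. 27): project Q = J - (gamma/beta)*(diag(S) 1^T - S)
    onto {B : B = B^T, diag(B) = 0, B >= 0}."""
    Q = J - (gamma / beta) * (np.diag(S)[:, None] - S)
    Q_tilde = Q - np.diag(np.diag(Q))        # zero out the diagonal
    return np.maximum(0.0, 0.5 * (Q_tilde + Q_tilde.T))
```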

4 Experimental results and analysis

4.1 Dataset description

The proposed KBD-TSC algorithm is evaluated on nine datasets (six image datasets and three text datasets), which are as follows:

  1. Yale: It consists of 165 grayscale facial images of 15 individuals. The images are resized to 32\(\times\)32 pixels.

  2. Jaffe: This dataset consists of 213 facial images corresponding to 7 facial expressions. The images are resized to 26\(\times\)26 pixels.

  3. ORL: It consists of 400 facial images of 40 subjects. Each image is of size 26\(\times\)26 pixels.

  4. ARFaces: This dataset comprises 4000 facial images of 126 people.

  5. COIL20: It consists of 1440 images of 20 objects. Each image is of size 32\(\times\)32 pixels.

  6. BA: This dataset contains 1404 images of handwritten digits and uppercase letters. Each image is of size 20\(\times\)16 pixels.

  7. tr11: This text dataset consists of 414 samples, 6429 features, and 9 classes.

  8. tr41: This text dataset consists of 878 samples, 7454 features, and 10 classes.

  9. tr45: This text dataset consists of 690 samples, 8261 features, and 10 classes.

4.2 Baseline methods

We compare our proposed KBD-TSC method with several state-of-the-art methods, including spectral clustering (SC) [36], kernelized sparse subspace clustering (KSSC) [40], Kernel low-rank representation (KLRR) [37], Implicit block diagonal low-rank representation (IBDLR) [67], Similarity Learning via Kernel Preserving Embedding sparse (SLKEs) [20], Similarity Learning via Kernel Preserving Embedding low rank (SLKEr) [20], Structure learning with similarity preserving sparse (SLSPs) [19], Structure learning with similarity preserving low-rank (SLSPr) [19] and Kernel block diagonal representation subspace clustering with similarity preservation (KBDSP) [68].

4.3 Evaluation metrics

In the experiments, it is presumed that the number of clusters is known in advance. Under this setting, three metrics are commonly employed [42, 43]: accuracy, Normalized Mutual Information (NMI), and purity. They are described below:

  • Accuracy: The accuracy is defined as the ratio of the number of data instances that are assigned the same cluster as in the ground truth to the total number of data instances.

  • Normalized Mutual Information (NMI): This metric computes a normalized measure of the mutual information between the predicted cluster labels and the ground-truth labels. The range of NMI is [0, 1], where 0 signifies no correlation and 1 signifies perfect correlation.

  • Purity: Purity measures the extent to which data points within each cluster are assigned to the same true class [69]. A larger purity value indicates better clustering performance.
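For completeness, the three metrics can be computed as in the sketch below; accuracy uses the Hungarian algorithm to optimally match predicted clusters to ground-truth classes, and NMI is taken directly from scikit-learn. Integer-coded labels are assumed.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score

def clustering_accuracy(y_true, y_pred):
    """Accuracy after optimally matching cluster labels to class labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    D = int(max(y_pred.max(), y_true.max())) + 1
    count = np.zeros((D, D), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        count[p, t] += 1                          # contingency matrix
    rows, cols = linear_sum_assignment(-count)    # maximize matched instances
    return count[rows, cols].sum() / y_true.size

def purity(y_true, y_pred):
    """Fraction of samples belonging to the majority true class of their cluster."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return sum(np.bincount(y_true[y_pred == c]).max()
               for c in np.unique(y_pred)) / y_true.size

# NMI: normalized_mutual_info_score(y_true, y_pred)
```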

4.4 Kernel design

We have designed 12 different kernels in this work, including one linear kernel, four polynomial kernels, and seven Gaussian kernels (Table 1).

Table 1 Details of parameter values w.r.t. different kernel functions

For the Gaussian kernels, \(\sigma\) is set to the maximum pairwise distance between samples \(x_i\) and \(x_j\).
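The construction below is a hedged sketch of such a kernel family, with the columns of \(\varvec{X}\) as samples; the specific polynomial degrees and Gaussian bandwidth scalings are illustrative placeholders, since the exact values are those listed in Table 1.

```python
import numpy as np
from scipy.spatial.distance import cdist

def build_kernels(X, poly_degrees=(2, 3, 4, 5),
                  gauss_scales=(0.25, 0.5, 1, 2, 4, 8, 16)):
    """One linear kernel, polynomial kernels, and Gaussian kernels whose
    bandwidth is tied to sigma = max_{i,j} ||x_i - x_j|| (Sect. 4.4).
    Degrees and scale factors here are placeholders, not the values of Table 1."""
    S = X.T                                   # samples as rows
    D = cdist(S, S)                           # pairwise Euclidean distances
    sigma = D.max()                           # sigma = maximum pairwise distance
    kernels = [S @ S.T]                                        # 1 linear
    kernels += [(S @ S.T + 1.0) ** d for d in poly_degrees]    # 4 polynomial
    kernels += [np.exp(-D ** 2 / (2.0 * (c * sigma) ** 2))     # 7 Gaussian
                for c in gauss_scales]
    return kernels
```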

4.5 Computational complexity

In the proposed algorithm, the first part is the construction of the kernel matrix, which is bounded by \(O(n^2)\). The second part is the update step for the different variables, each of which is bounded by \(O(n^3)\) owing to the required matrix inversions and eigendecompositions. Thus, the proposed algorithm has an overall time complexity of \(O(tn^3)\), where t and n represent the number of iterations and the number of data samples, respectively.

4.6 Parameter sensitivity analysis

There are four hyperparameters in the proposed KBD-TSC algorithm (Algorithm 1), namely \(\epsilon , \alpha , \beta , \gamma\). The parameter \(\epsilon\) controls the good conditioning of the transform, \(\alpha\) balances the similarity-preserving term \(\left\| \varvec{K}-\varvec{Z}^T\varvec{KZ}\right\| _F^2\), \(\beta\) controls the term \(\left\| \varvec{Z}-\varvec{B}\right\| _F^2\), and \(\gamma\) controls the block-diagonal structure term \(\left\| \varvec{B}\right\| _{\fbox {k}}\). The YALE and JAFFE datasets are used for parameter evaluation using NMI. The parameters \(\epsilon , \alpha , \beta , \gamma\) take values from the sets \(\left\{ {1e-2, 1e-1, 0.5, 1}\right\}\), \(\left\{ {1e-5, 1e-4, 1e-3, 1e-2, 0.1, 1} \right\}\), \(\left\{ {1e-5, 1e-3, 0.1, 1} \right\}\), and \(\left\{ {1e-2, 1e-1, 1, 10, 30, 50} \right\}\), respectively, and tuning is performed via grid search. We also performed a parameter sensitivity analysis on the JAFFE dataset over a wide range of \(\alpha\), \(\beta\), and \(\gamma\) values against the NMI score, and observed that the proposed technique is not very sensitive to these hyperparameters. The best clustering performance is achieved with \(\epsilon = 0.1\), \(\alpha = 0.01\), \(\beta = 0.001\), and \(\gamma = 0.1\); hence, these values are fixed in all the experiments of this paper. The parameter settings of all experiments are given in Table 2, in which the recommended parameters are indicated in bold.

Table 2 Hyperparameter settings

4.7 Results and discussion

The experimental results for all nine datasets are reported in terms of accuracy, NMI, and purity in Tables 3, 4, and 5, respectively. For each dataset, the results are averaged over the 12 kernels and over ten runs. From the results, it can be observed that the proposed KBD-TSC approach outperforms the state-of-the-art methods.

Table 3 Comparison of clustering results based upon accuracy
Table 4 Comparison of clustering results based upon NMI score
Table 5 Comparison of clustering results based upon purity

More specifically, the results of Tables 3, 4, and 5 are discussed below:

  1. Compared to the SC algorithm, the proposed KBD-TSC method obtains better results on all evaluation metrics: accuracy, NMI, and purity. From Tables 3, 4, and 5, it can be observed that the average accuracy, NMI, and purity of the proposed method are 20.81%, 19.32%, and 23.44% higher than those of SC, respectively. The reason is that the input to spectral clustering is the learned \(\varvec{Z}\) instead of the raw kernel matrix.

  2. Our proposed KBD-TSC method also outperforms the kernel-based methods KSSC and KLRR. This is because of the similarity-preserving trick in the transform domain.

  3. In comparison to SLKEs and SLKEr, our proposed algorithm exhibits superior performance. This improvement can be attributed to two key aspects: firstly, the proposed framework for kernel self-expression has the capability to retain specific low-order details from the input data; secondly, the introduction of the term representing block diagonal structures in our model, within the latent transform space, facilitates the acquisition of a similarity matrix characterized by a block diagonal arrangement.

  4. SLSPs and SLSPr, which are capable of handling nonlinear datasets and preserving similarity information, give better performance than SC, KSSC, KLRR, SLKEs, and SLKEr. However, the proposed KBD-TSC algorithm consistently outperforms them in most instances. Specifically, the average values of accuracy, NMI, and purity in Tables 3, 4, and 5 indicate that the proposed method surpasses SLSPs by 15.39%, 16.47%, and 3.17%, respectively. These findings confirm that the introduced term representing block diagonal structures significantly contributes to improving performance.

  5. Both IBDLR and the proposed KBD-TSC method facilitate the acquisition of a desired affinity matrix with an optimal block diagonal structure by integrating the block diagonal representation term. Tables 3, 4, and 5 demonstrate that the proposed KBD-TSC method and IBDLR outperform the other compared algorithms on all datasets. This underscores the effectiveness of methods incorporating the block diagonal representation term, particularly for datasets with multiple classes. Remarkably, on the COIL20 and BA datasets, which are characterized by a larger number of instances, the proposed method demonstrates a performance improvement of almost 15% compared to alternative methods, with the exception of IBDLR on the COIL20 dataset. Additionally, the proposed KBD-TSC method surpasses the performance of IBDLR specifically on the COIL20 dataset, underscoring the advantages of incorporating a similarity-preserving strategy in the transform domain.

  6. In datasets with high-dimensional features such as TR11, TR41, and TR45, SLSPr demonstrates superior performance to IBDLR, attributed to its integration of a similarity-preserving mechanism. Capitalizing on both the similarity-preserving strategy and the block diagonal representation term, the proposed KBD-TSC consistently outperforms IBDLR and even surpasses SLSPr in most instances across the TR11, TR41, and TR45 datasets. These outcomes underscore the effectiveness of the proposed KBD-TSC method in managing datasets with intricate features, enabling the extraction of inherent data structures.

In a nutshell, the experimental results demonstrate the effectiveness of our proposed KBD-TSC method, which combines a similarity preserving regularizer, a transform learning-based kernel self-expressing model, and a block diagonal representation term.

4.8 Convergence analysis

The convergence plots of the proposed method are shown in Fig. 1. For all the datasets, the proposed method converges within 10 iterations.

Fig. 1 Convergence graph of the proposed method with nine different datasets in 30 iterations

4.9 Computational time

The experiments are conducted on a 64-bit Windows system with an Intel i7 processor and 32 GB RAM. The running times of the proposed method and the various state-of-the-art methods on all the datasets are shown in Table 6. From Table 6, it can be observed that the proposed KBD-TSC method is the fastest among all kernel-based techniques.

Table 6 Runtime comparison (in seconds)

4.10 Ablation experiments

For the ablation experiments, we compare the NMI score of the proposed objective function against the objective function without the similarity preserving term and the objective function without the block diagonal term. The results, reported in Table 7, show that both terms are important for effective clustering: the NMI score attains its best value on all datasets when both terms are included in the objective function. The variant without the similarity preserving term is labelled “similarity-” and the variant without the block diagonal term is labelled “block-”.

Table 7 Comparison of NMI score of the proposed objective function against the objective function without similarity preserving term and the objective function without block diagonal term for all the datasets

5 Conclusion

This paper presents a novel subspace clustering approach that integrates transform learning-based kernel block diagonal representation and a similarity-preserving strategy. The method exhibits effective performance even when the raw data lacks inherent separability into subspaces and demonstrates robust generalization capabilities for non-linear manifolds. The proposed KBD-TSC operates through a three-step process. Initially, it captures the non-linear structure of the input data by incorporating the kernel self-expressing framework into the transform learning-based framework. The second step introduces the block diagonal representation term to create a similarity matrix with a block diagonal structure. In the final step, the similarity-preserving term is introduced to capture pairwise similarity information between various data points. The effectiveness of the proposed approach is evaluated on nine benchmark datasets, showcasing its superiority over several state-of-the-art methods. In future work, we aim to extend the proposed method to multiple kernel learning.