1 Introduction

The purpose of motion segmentation [56] is to assign the trajectory points of different moving objects in a dynamic scene to their corresponding groups, as shown in Fig. 1. It is the cornerstone of many machine vision applications, including action recognition [22, 29, 30, 54], visual surveillance [6], event detection [55], motion synthesis [15], and many others [35, 36]. Kanatani et al. [24] showed that motion segmentation from tracked feature points can be cast as a subspace clustering problem, where each subspace represents an independent motion. Subspace clustering underpins many complex tasks, such as document clustering [1, 2], face clustering [5, 58], gene sequence analysis [51], image segmentation [7, 56], and motion segmentation [26, 60, 61]. The research topic of this paper is subspace clustering for motion segmentation.

Fig. 1
figure 1

Several image frames from the Cars5 sequence in the Hopkins155 motion dataset. Feature points with different colors belong to different moving objects (colour figure online)

Several widely used subspace clustering approaches have been designed, such as kernel k-means (KKM) [42], the efficient k-means algorithm [25] and spectral clustering (SC) [37, 50]. However, they cannot be directly applied to motion segmentation, because motion trajectory feature data has the following characteristics.

  • Non-linear structure: In motion segmentation, the trajectories of one motion mostly lie in a non-linear subspace (or submanifold) [20], because the camera always exhibits some degree of perspective distortion, which invalidates the affine camera model assumption.

  • Low rank: To capture smooth motion, the frame rate of motion capture data is very high. Although a high frame rate effectively avoids motion jitter between adjacent frames and improves the visual quality of the captured data, it also leads to high similarity between adjacent frames and high redundancy in the data as a whole. If a matrix is used to represent the motion sequence, its rank should therefore be low.

  • Containing noise: Obtaining real motion data with a tracker is a complicated process [40], so noise is inevitable during segmentation. A trajectory may correspond to some random motion, which invalidates the affine camera model assumption (outlying trajectory). Some features in some frames may be lost due to complex actions and fast motion, leaving missing entries in the trajectory (incomplete trajectory). Moreover, the tracker may unintentionally track some feature points incorrectly, producing serious errors in the track (corrupted trajectory). All of these erroneous trajectories are non-Gaussian noise embedded in the data, and their structure is much more complex [56].

  • Real-time operation: Compared with independent static data, motion data contains a large amount of sequential information, which is the only clue available to guide the clustering algorithm. Real-time operation is therefore necessary for motion segmentation applications.

In view of the above characteristics of motion trajectory feature data, several strong subspace clustering algorithms have been proposed. In particular, Sparse Subspace Clustering (SSC) [10] obtains a sparse representation of each data point by selecting within-cluster data. Low-rank Representation (LRR) [31] and Least Squares Regression (LSR) [34] encourage the learned matrix to be low-rank. All of these clustering algorithms, however, can only deal with linear data, not non-linear data. A few other subspace clustering methods, such as Kernel Sparse Subspace Clustering (KSSC) [38], KSSC on the SPD Riemannian Manifold (KSSCR) [57] and Low Rank Subspace Clustering (LRSC) [45], build nonlinear subspace clustering via kernel tricks. However, these methods with predefined kernels cannot guarantee that the data is low-rank after mapping to the feature space, which limits segmentation performance. Ji et al. [20] proposed a Low Rank Kernel Subspace Clustering (LRKSC) method, which guarantees the low-rank property by learning a low-rank kernel mapping. In general, LRKSC can construct an affinity matrix with block diagonal structure, leading to correct clustering. However, the block diagonal matrix is obtained indirectly through a low-rank structural prior, which is sensitive to noise.

To address the noise in motion data, several advanced noise-modeling methods have been put forward. In particular, [10, 20, 45] assumed that noise in motion data is sparse and used the \({l_1}\)-norm to handle it. A reconstruction error measure based on the \({l_2}\)-norm [34] was proposed, which is robust to small Gaussian noise. Liu et al. [31] proposed the \({l_{2,1}}\)-norm as a measure of data approximation, which is robust to sample-specific outliers. These methods can only deal with Gaussian or sparse noise and are very sensitive to non-Gaussian noise such as severely mis-tracked features, missing entries or large outliers. To overcome this, [3, 43, 52] used the correntropy-induced metric (CIM) to handle non-Gaussian and impulsive noise, achieving excellent performance. Unfortunately, no existing motion segmentation algorithm handles the various noises in motion data in a unified way.

Recently, Wang et al. [49] designed a new low-rank transfer subspace clustering method (LRTSC) for human motion segmentation by transferring existing well-labeled source data to target data. However, LRTSC has an obvious disadvantage: the data transfer operations take a lot of time, which conflicts with the real-time requirements of motion segmentation applications.

In view of the above problems of existing subspace clustering methods on motion data, we provide the following solutions.

  • To handle nonlinear data, we propose to use a kernel function to model motion trajectories.

  • To resist the various noises contained in motion trajectories, we design a variant of CIM, the kernelized correntropy-induced metric (KCIM), to measure data error.

  • To strengthen the block diagonal structure of the learned affinity matrix, we integrate the block diagonal regularizer (BDR) into kernel subspace clustering.

  • To improve motion segmentation performance in various scenarios, we propose a unified rank-constrained block diagonal subspace clustering method (RBDSC) and extend it to two robust versions. Furthermore, a new algorithm, HQ&AM, is designed to optimize these models.

The remainder of this work is structured as follows: (1) Sect. 2 summarizes related work. (2) Sect. 3 elaborates our clustering methods and the corresponding optimization. (3) Sect. 4 presents experiments and discussion. (4) The work is summarized in Sect. 5.

2 Related works

We first introduce motion segmentation methods, and then review SC-based subspace clustering algorithms, including SSC [10], LRR [31] and their extensions.

2.1 Motion segmentation

Although many approaches for motion segmentation have been proposed in the literature, they have not fully considered the important characteristics of motion trajectory feature data presented in Sect. 1.

Early factorization-based approaches [8, 11, 17, 24] attempted to directly detect the cutting points of the motion sequence. These methods require the motions to be independent of each other, which makes them easy to handle. However, in most dynamic scenes involving articulated objects or moving cameras, the motions are at least partially dependent.

To deal with partly dependent motions, Vidal et al. [47] put forward generalized principal component analysis (GPCA), a subspace segmentation method based on algebraic geometry. The algorithm places no restriction on the relative orientations of the motion subspaces, allowing, for example, arbitrary intersections among different subspaces, so it can handle partially dependent motions. However, GPCA works well only when the size and number of subspaces are small, and its performance decreases as the number of subspaces grows. It is therefore only suitable for short and simple motion sequences.

Many works, including [12, 23], formulate motion segmentation as a statistical clustering problem, solved by expectation–maximization (EM) or its variants. However, these iterative methods require good initialization and may fall into suboptimal local minima.

Recently, spectral clustering-based methods have been used for motion segmentation. They first learn an affinity matrix using local information around each trajectory, and then apply SC to obtain the segmentation [37]. SC-based methods are currently the most popular, and this work focuses on spectral clustering-based segmentation.

2.2 Spectral clustering-based subspace clustering

Lately, advanced SC-based subspace clustering algorithms such as SSC [10] and LRR [31] have drawn much attention. They all assume that a data point can be expressed as a linear combination of other points from the same subspace. SSC assumes that the data has the self-expression and sparse-representation properties [5, 10]. Specifically, for the data matrix \(\varvec{X}\), the coefficient matrix \(\varvec{D}\) satisfies

$$\begin{aligned} \min _{\mathbf{D}} \left\| \mathbf{D} \right\| _1 + \gamma \left\| \mathbf{E} \right\| _1 \quad \text {s.t.}\ \mathbf{X} = \mathbf{XD} + \mathbf{E},\ \mathbf{D} _{ii} = 0 \end{aligned}$$
(1)

where \({\left\| \cdot \right\| _1}\) promotes sparsity, \(\mathbf{E}\) is an error matrix and \(\gamma\) is a trade-off parameter.

To recover the subspace structures of data, [31] proposed LRR method, which aims at resolving the following rank minimization problem:

$$\begin{aligned} \min _{\mathbf{D},\,\mathbf{E}} \operatorname{rank}(\mathbf{D}) + \gamma \left\| \mathbf{E} \right\| _q \quad \text {s.t.}\ \mathbf{X} = \mathbf{MD} + \mathbf{E} \end{aligned}$$
(2)

where \(\varvec{M}\) denotes a dictionary matrix, \(\operatorname{rank}(\mathbf{D})\) seeks the lowest-rank representation of the data matrix \(\varvec{X}\) with respect to the dictionary \(\varvec{M}\), and \({\left\| \cdot \right\| _q}\) denotes a regularization strategy chosen to match the corruption: \({\left\| \cdot \right\| _1}\) characterizes sparse corruptions, \({\left\| \cdot \right\| _2}\) models small Gaussian noise, and \({\left\| \cdot \right\| _{2,1}}\) handles sample-specific corruptions (outliers).

To make Eq. (2) tractable, [31] proposed replacing the rank function with the nuclear norm, and a self-expressiveness strategy represents the dictionary in Eq. (2) by the data matrix \(\varvec{X}\) itself. Thus, Eq. (2) is rewritten as:

$$\begin{aligned} \min _{\mathbf{D},\,\mathbf{E}} \left\| \mathbf{D} \right\| _* + \gamma \left\| \mathbf{E} \right\| _q \quad \text {s.t.}\ \mathbf{X} = \mathbf{XD} + \mathbf{E} \end{aligned}$$
(3)

where \({\left\| \mathbf{D} \right\| _*}\) is the nuclear norm of matrix \(\varvec{D}\).

SSC and LRR achieve excellent performance on linear data, but they cannot deal with nonlinear data. Ji et al. [20] proposed the LRKSC method, which integrates the kernel trick, low-rank learning, and self-representation, so that nonlinear data can be handled while the low rank of the data is preserved. It is formulated as:

$$\begin{aligned} \begin{array}{l} \min \limits _{\mathbf{D},\mathbf{M}} \frac{\lambda _2}{2}\left\| \Phi (\mathbf{X}) - \Phi (\mathbf{X})\mathbf{D} \right\| _F^2 + \lambda _1 \left\| \mathbf{D} \right\| _1 + \left\| \mathbf{M} \right\| _* + \lambda _3 \left\| \mathbf{E} \right\| _1 \\ \quad \text {s.t.}\ \mathbf{D} \ge 0,\ \mathbf{D} _{ii} = 0,\ \mathbf{K} = \mathbf{M}^T\mathbf{M} + \mathbf{E} \end{array} \end{aligned}$$
(4)

where \(\varvec{K}\) is the kernel matrix and \(\Phi ^T(\mathbf{X})\Phi (\mathbf{X}) = \mathbf{M}^T\mathbf{M}\). However, this model has two drawbacks: (1) \({\left\| \mathbf{D} \right\| _1}\) can construct an affinity matrix with block diagonal structure, but that structure is fragile; (2) \({\left\| \mathbf{E} \right\| _1}\) can model arbitrary sparse corruptions, but it struggles with the large errors found in motion data.

3 The proposed methods

We propose a unified rank-constrained block diagonal subspace clustering method (RBDSC) for motion segmentation. Then, two robust RBDSC models for suppressing noise in motion data are designed: rank-constrained block diagonal subspace clustering with modeling noise (RBDSC-MN) and rank-constrained block diagonal subspace clustering with removing noise (RBDSC-RN). Finally, the optimization procedures of the proposed models are given. Table 1 presents the important notation used in the paper.

Table 1 The important notations used in this work

3.1 Rank-constrained block diagonal subspace clustering

To handle the nonlinear structure of motion trajectory feature data, we adopt a low-rank kernel mapping strategy that maps the data into a high-dimensional Hilbert space where linear pattern analysis applies, and we enforce low-rank constraints on the motion trajectories in the kernel space. Motivated by the LRR [31] and LRKSC [20] methods, we use a self-expressiveness-based subspace clustering framework to learn the affinity matrix. Combining low-rank kernel mapping and self-expressiveness, the low-rank kernel self-expressiveness minimization problem is expressed as:

$$\begin{aligned} \begin{array}{l} \min \limits _{\mathbf{D},\mathbf{M}} \frac{1}{2}\left\| \mathbf{M} - \mathbf{MD} \right\| _F^2 + \lambda _1 \mathfrak {R}(\mathbf{D}) + \left\| \mathbf{M} \right\| _* \\ \quad \text {s.t.}\ \mathbf{D} \ge 0,\ \mathbf{D} _{ii} = 0,\ \mathbf{K} = \mathbf{M}^T\mathbf{M} \end{array} \end{aligned}$$
(5)

where \({\lambda _1}\) is a nonnegative balancing parameter and \(\mathfrak {R}(\mathbf{D})\) is a regularization term on the matrix \(\varvec{D}\). Depending on the motivation, \(\mathfrak {R}(\mathbf{D})\) can take different forms, such as \({\left\| \mathbf{D} \right\| _*}\) or \({\left\| \mathbf{D} \right\| _1}\). The affinity matrices constructed with these regularizers share the block-diagonal property, which helps improve the performance of SC-based subspace clustering methods. However, the block diagonal structures obtained through \({\left\| \cdot \right\| _*}\) and \({\left\| \cdot \right\| _1}\) are generally fragile and break down at low signal-to-noise ratios, degrading the clustering results. Inspired by [33], we introduce BDR to directly pursue a block-diagonal representation of the coefficient matrix \(\varvec{D}\). We therefore propose a rank-constrained block diagonal representation method (RBDSC), formulated as:

$$\begin{aligned} \begin{array}{l} \min \limits _{\mathbf{D},\mathbf{M}} \frac{1}{2}\left\| \mathbf{M} - \mathbf{MD} \right\| _F^2 + \lambda _1 \left\| \mathbf{D} \right\| _k + \left\| \mathbf{M} \right\| _* \\ \quad \text {s.t.}\ \mathbf{D} \ge 0,\ \mathbf{D} _{ii} = 0,\ \mathbf{D} = \mathbf{D}^T,\ \mathbf{K} = \mathbf{M}^T\mathbf{M} \end{array} \end{aligned}$$
(6)

where \({\left\| \mathbf{D} \right\| _k}\) is the k-block-diagonal regularizer defined in Eq. (21) and k is the number of clusters or classes.

However, the constraints on the matrix \(\varvec{D}\) in Eq. (6) limit its representation ability. To alleviate this issue, we add an intermediate matrix \(\varvec{H}\) satisfying \(\mathbf{H} = \mathbf{D}\). Problem (6) then becomes

$$\begin{aligned} \begin{array}{l} \min \limits _{\mathbf{D},\mathbf{M},\mathbf{H}} \lambda _1 \left\| \mathbf{D} \right\| _k + \left\| \mathbf{M} \right\| _* + \frac{1}{2}\left\| \mathbf{M} - \mathbf{MH} \right\| _F^2 + \frac{\lambda _2}{2}\left\| \mathbf{H} - \mathbf{D} \right\| _F^2 + \frac{\lambda _3}{2}\left\| \mathbf{K} - \mathbf{M}^T\mathbf{M} \right\| _F^2 \\ \quad \text {s.t.}\ \mathbf{D} \ge 0,\ \mathbf{D} _{ii} = 0,\ \mathbf{D} = \mathbf{D}^T \end{array} \end{aligned}$$
(7)

3.2 Robust rank-constrained block diagonal subspace clustering

For LRKSC in Eq. (4), \({\left\| \mathbf{E} \right\| _1}\) can only model arbitrary sparse corruptions; it cannot handle the noise in motion data well, because that noise is complex and unpredictable. Unlike the \(L_1\)-norm, the correntropy-induced metric (CIM) [52] is a local metric that is particularly robust to complex noise [39] and non-Gaussian noise [52]. It has been widely used in computer vision, e.g., face clustering [4], object image clustering [3], feature selection [14], and semi-supervised learning [34]. The correntropy [16, 52] of two arbitrary variables w and v is formulated as

$$\begin{aligned} \hat{V}_\sigma (w,v) = \frac{1}{N}\sum \limits _{i = 1}^{N} g_\sigma (w_i - v_i) \end{aligned}$$
(8)

where \(g_\sigma (\cdot)\) denotes a kernel function with width \(\sigma\). The Gaussian kernel \(g_\sigma (w_i - v_i) = \exp (-\left\| w_i - v_i \right\| _2^2 / 2\sigma ^2)\) is usually used. CIM [52] then evaluates the similarity of the two variables w and v:

$$\begin{aligned} CIM(w,v) = \sqrt{1 - \hat{V}_\sigma (w,v)} = \sqrt{1 - \frac{1}{N}\sum \limits _{i = 1}^{N} g_\sigma (w_i - v_i)} \end{aligned}$$
(9)

Unfortunately, we cannot directly introduce CIM into our model, because CIM is designed for linear data, while motion tracking features are nonlinear. Fortunately, the motion data is linear in the Hilbert feature space [20], where correntropy can be applied. Thus, we design a variant of CIM, the kernelized correntropy-induced metric (KCIM), expressed as

$$\begin{aligned} KCIM(\phi (w),\phi (v)) = \sqrt{1 - \frac{1}{N}\sum \limits _{i = 1}^{N} g_\sigma (\phi (e_i))} \end{aligned}$$
(10)

where \(\phi (e_i) = \phi (w_i) - \phi (v_i)\) for \(i = 1,2,\ldots ,N\).
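For concreteness, below is a minimal NumPy sketch of CIM (Eq. (9)) and KCIM (Eq. (10)). The feature-space residual norm \(\left\| \phi (w_i) - \phi (v_i) \right\| _2^2\) is obtained with the kernel trick, so \(\phi\) is never computed explicitly; the function names and the `kernel` callable are illustrative assumptions, not code from this paper.

```python
import numpy as np

def cim(w, v, sigma=1.0):
    """CIM of Eq. (9) for paired 1-D arrays, with a Gaussian kernel."""
    g = np.exp(-(w - v) ** 2 / (2 * sigma ** 2))  # g_sigma(w_i - v_i)
    return np.sqrt(1.0 - g.mean())

def kcim(W, V, kernel, sigma=1.0):
    """KCIM of Eq. (10); W, V are (N, d) arrays of paired samples and
    kernel(a, b) -> scalar is any positive-definite kernel."""
    # ||phi(w_i) - phi(v_i)||^2 = k(w_i, w_i) - 2 k(w_i, v_i) + k(v_i, v_i)
    sq = np.array([kernel(w, w) - 2 * kernel(w, v) + kernel(v, v)
                   for w, v in zip(W, V)])
    g = np.exp(-sq / (2 * sigma ** 2))            # g_sigma(phi(e_i))
    return np.sqrt(1.0 - g.mean())
```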

Note that the KCIM loss saturates near 1 for a large error (large \(\phi (e_i)\)), so it penalizes large errors far less than the absolute error or the mean squared error would. Large noise usually comes from data loss, data corruption and outliers [39]. We therefore propose rank-constrained block diagonal subspace clustering with modeling noise (RBDSC-MN), which combines the rank-constrained block diagonal representation (RBDSC) with KCIM. The RBDSC-MN model is formulated as:

$$\begin{aligned} \begin{array}{l} \min \limits _{\mathbf{D},\mathbf{M},\mathbf{H},\mathbf{E}} \lambda _1 \left\| \mathbf{D} \right\| _k + \left\| \mathbf{M} \right\| _* + \frac{1}{2}\left\| \mathbf{M} - \mathbf{MH} \right\| _F^2 + \frac{\lambda _2}{2}\left\| \mathbf{H} - \mathbf{D} \right\| _F^2 \\ \qquad + \frac{\lambda _3}{2}\left\| \mathbf{E} - (\mathbf{K} - \mathbf{M}^T\mathbf{M}) \right\| _F^2 + \sum \limits _i \sum \limits _j \left( 1 - g_\sigma (\mathbf{E} _{ij}) \right) \\ \quad \text {s.t.}\ \mathbf{D} \ge 0,\ \mathbf{D} _{ii} = 0,\ \mathbf{D} = \mathbf{D}^T \end{array} \end{aligned}$$
(11)

where \(\mathbf{E} _{ij}\) is the (i, j) entry of matrix \(\varvec{E}\).

For larger outliers, prior information can be further exploited in the model of Eq. (11) to improve clustering accuracy. Inspired by the \({\left\| \cdot \right\| _{2,1}}\)-norm (group sparsity) in LRR [31] for handling large outliers [14, 31, 32], we design a new robust subspace clustering method, RBDSC-RN, to remove outlying trajectories. It is formulated as:

$$\begin{aligned} \begin{array}{l} \min \limits _{\mathbf{D},\mathbf{M},\mathbf{H},\mathbf{E}} \lambda _1 \left\| \mathbf{D} \right\| _k + \left\| \mathbf{M} \right\| _* + \frac{1}{2}\left\| \mathbf{M} - \mathbf{MH} \right\| _F^2 + \frac{\lambda _2}{2}\left\| \mathbf{H} - \mathbf{D} \right\| _F^2 \\ \qquad + \frac{\lambda _3}{2}\left\| \mathbf{E} - (\mathbf{K} - \mathbf{M}^T\mathbf{M}) \right\| _F^2 + \sum \limits _i \left( 1 - g_\sigma ({\left\| \mathbf{E}^i \right\| _2}) \right) \\ \quad \text {s.t.}\ \mathbf{D} \ge 0,\ \mathbf{D} _{ii} = 0,\ \mathbf{D} = \mathbf{D}^T \end{array} \end{aligned}$$
(12)

where \(\mathbf{E ^i}\) is the ith row of \(\varvec{E}\).

Thus, to effectively handle the errors in motion data, two robust rank-constrained block diagonal subspace clustering methods are used: the RBDSC-MN model in Eq. (11) for incomplete and corrupted trajectories, and the RBDSC-RN model in Eq. (12) for outlying trajectories.

3.3 Optimization method

Since the objective functions in Eqs. (7), (11) and (12) are non-convex, optimizing them is challenging. We design a novel algorithm, HQ&AM, which combines half-quadratic theory [52] with alternating minimization [33]. Because Eqs. (7), (11) and (12) are identical except for the error term \(\varvec{E}\), we first present the optimization of the RBDSC model (Eq. (7)), and then describe the optimization of the RBDSC-MN (Eq. (11)) and RBDSC-RN (Eq. (12)) models.

3.3.1 Optimization model RBDSC

(1) Updating \(\varvec{M}\):

By fixing (\(\varvec{D}\), \(\varvec{H}\)), the optimization sub-problem corresponding to \(\mathbf{M ^{t + 1}}\) in Eq. (7) is

$$\begin{aligned} \min _{\mathbf{M}} \left\| \mathbf{M} \right\| _* + \frac{1}{2}\left\| \mathbf{M} - \mathbf{MH} \right\| _F^2 + \frac{\lambda _3}{2}\left\| \mathbf{K} - \mathbf{M}^T\mathbf{M} \right\| _F^2 \end{aligned}$$
(13)

where, using the symmetry of \(\mathbf{H}\) (since \(\mathbf{H} \approx \mathbf{D} = \mathbf{D}^T\)), \(\left\| \mathbf{M} - \mathbf{MH} \right\| _F^2\) can be expanded as

$$\begin{aligned} \left\| \mathbf{M} - \mathbf{MH} \right\| _F^2 = \operatorname{tr}\!\left( (\mathbf{I} - 2\mathbf{H} + \mathbf{H}\mathbf{H}^T)\,\mathbf{M}^T\mathbf{M} \right) \end{aligned}$$
(14)

Equation (13) can be rewritten as

$$\begin{aligned} \min _{\mathbf{M}} \left\| \mathbf{M} \right\| _* + \frac{\lambda _3}{2}\left\| \mathbf{M}^T\mathbf{M} - \mathbf{K} + \frac{\mathbf{I} - 2\mathbf{H} + \mathbf{H}\mathbf{H}^T}{2\lambda _3} \right\| _F^2 \end{aligned}$$
(15)

We define \(\mathbf{K}_{\mathbf{G}} = \mathbf{K} - \frac{\mathbf{I} - 2\mathbf{H} + \mathbf{H}\mathbf{H}^T}{2\lambda _3}\); then Eq. (15) becomes

$$\begin{aligned} \min _{\mathbf{M}} \left\| \mathbf{M} \right\| _* + \frac{\lambda _3}{2}\left\| \mathbf{M}^T\mathbf{M} - \mathbf{K}_{\mathbf{G}} \right\| _F^2 \end{aligned}$$
(16)

We can obtain a closed-form solution by

$$\begin{aligned} \mathbf{M ^{t + 1}} = {\Sigma ^ * }{\Delta ^T} \end{aligned}$$
(17)

where \(\Sigma ^*\) and \(\Delta\) come from the SVD of \(\mathbf{K}_{\mathbf{G}}\). Let \(\mathbf{K}_{\mathbf{G}} = \Gamma \Lambda \Delta ^T\) be the SVD of \(\mathbf{K}_{\mathbf{G}}\), with \(\upsilon _i\) its ith singular value and \(\Delta\) the matrix of its right singular vectors. Then \(\Sigma ^* = \operatorname{Diag}(\varsigma _1^*, \ldots , \varsigma _N^*)\) with \(\varsigma _i^* = \arg \min _{\varsigma _i} \frac{\lambda _3}{2}(\upsilon _i - \varsigma _i^2)^2 + \varsigma _i\), where \(\varsigma _i \in \left\{ x \in \mathfrak {R}_+ \mid x^3 - \upsilon _i x + (2\lambda _3)^{-1} = 0 \right\} \cup \{0\}\).
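A hedged sketch of this \(\varvec{M}\)-update under our reading of Eq. (17): each singular value of \(\mathbf{K}_{\mathbf{G}}\) is shrunk by choosing, among the positive real roots of \(x^3 - \upsilon _i x + (2\lambda _3)^{-1} = 0\) and zero, the candidate that minimizes the scalar objective. Function names are ours, not from the paper's code.

```python
import numpy as np

def update_M(K_G, lam3):
    """M-update of Eq. (17): M^{t+1} = Sigma^* Delta^T."""
    _, vals, Vt = np.linalg.svd(K_G)   # K_G = Gamma Lambda Delta^T
    sig = np.zeros_like(vals)
    for i, v in enumerate(vals):
        # real positive roots of x^3 + 0*x^2 - v*x + 1/(2*lam3)
        roots = np.roots([1.0, 0.0, -v, 1.0 / (2.0 * lam3)])
        cands = [r.real for r in roots
                 if abs(r.imag) < 1e-10 and r.real > 0] + [0.0]
        sig[i] = min(cands, key=lambda s: 0.5 * lam3 * (v - s * s) ** 2 + s)
    return np.diag(sig) @ Vt           # Sigma^* Delta^T
```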

(2) Updating \(\varvec{H}\):

By fixing (\(\varvec{D}\), \(\varvec{M}\)), the optimization sub-problem corresponding to \(\mathbf{H ^{t + 1}}\) in Eq. (7) is

$$\begin{aligned} \min _{\mathbf{H}} \frac{1}{2}\left\| \mathbf{M} - \mathbf{MH} \right\| _F^2 + \frac{\lambda _2}{2}\left\| \mathbf{H} - \mathbf{D} \right\| _F^2 \end{aligned}$$
(18)

We can obtain its closed-form solution given by

$$\begin{aligned} \mathbf{H}^{t+1} = \left( \lambda _2 \mathbf{I} + \mathbf{M}^T\mathbf{M} \right)^{-1}\left( \lambda _2 \mathbf{D} + \mathbf{M}^T\mathbf{M} \right) \end{aligned}$$
(19)
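The same update as a short NumPy sketch, using a linear solve instead of an explicit inverse (a standard numerical choice; the function name is ours):

```python
import numpy as np

def update_H(M, D, lam2):
    """Closed-form H-update of Eq. (19)."""
    MtM = M.T @ M
    return np.linalg.solve(lam2 * np.eye(MtM.shape[0]) + MtM,
                           lam2 * D + MtM)
```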

(3) Updating \(\varvec{D}\):

By fixing (\(\varvec{M}\), \(\varvec{H}\)), the optimization sub-problem corresponding to \(\mathbf{D ^{t + 1}}\) in Eq. (7) is

$$\begin{aligned} \begin{array}{l} \min \limits _{\mathbf{D}} \lambda _1 \left\| \mathbf{D} \right\| _k + \frac{\lambda _2}{2}\left\| \mathbf{H} - \mathbf{D} \right\| _F^2 \\ \quad \text {s.t.}\ \mathbf{D} \ge 0,\ \mathbf{D} _{ii} = 0,\ \mathbf{D} = \mathbf{D}^T \end{array} \end{aligned}$$
(20)

Define \(\mathbf{L}_{\mathbf{D}} = \operatorname{Diag}(\mathbf{D}\varvec{1}) - \mathbf{D}\) as the Laplacian matrix corresponding to the matrix \(\varvec{D}\). The k-block-diagonal regularizer is the sum of the k smallest eigenvalues of \(\mathbf{L}_{\mathbf{D}}\):

$$\begin{aligned} \left\| \mathbf{D} \right\| _k = \sum \limits _{i = N+1-k}^{N} \lambda _i(\mathbf{L}_{\mathbf{D}}) \end{aligned}$$
(21)

where \(\lambda _i(\mathbf{L}_{\mathbf{D}})\) is the ith largest eigenvalue of \(\mathbf{L}_{\mathbf{D}}\), so the sum runs over its k smallest eigenvalues. According to [9], Eq. (21) can be rewritten as

$$\begin{aligned} \begin{array}{l} \sum \limits _{i = N+1-k}^{N} \lambda _i(\mathbf{L}_{\mathbf{D}}) = \min \limits _{\mathbf{Q}} \operatorname{Tr}(\mathbf{Q}\mathbf{L}_{\mathbf{D}}) \\ \quad \text {s.t.}\ \operatorname{Tr}(\mathbf{Q}) = k,\ \mathbf{0} \le \mathbf{Q} \le \mathbf{I} \end{array} \end{aligned}$$
(22)

So, Eq. (20) can be transformed to

$$\begin{aligned} \begin{array}{l} \min \limits _{\mathbf{D},\mathbf{Q}} \lambda _1 \operatorname{Tr}\!\left( \mathbf{Q}(\operatorname{Diag}(\mathbf{D}\varvec{1}) - \mathbf{D}) \right) + \frac{\lambda _2}{2}\left\| \mathbf{H} - \mathbf{D} \right\| _F^2 \\ \quad \text {s.t.}\ \mathbf{D} \ge 0,\ \mathbf{D} _{ii} = 0,\ \mathbf{D} = \mathbf{D}^T,\ \mathbf{0} \le \mathbf{Q} \le \mathbf{I},\ \operatorname{Tr}(\mathbf{Q}) = k \end{array} \end{aligned}$$
(23)
  • Fixing \(\varvec{D}\), update \(\varvec{Q}\) by

    $$\begin{aligned} \begin{array}{l} \mathbf{Q}^{t+1} = \arg \min \limits _{\mathbf{Q}} \lambda _1 \operatorname{Tr}\!\left( \mathbf{Q}(\operatorname{Diag}(\mathbf{D}\varvec{1}) - \mathbf{D}) \right) \\ \quad \text {s.t.}\ \operatorname{Tr}(\mathbf{Q}) = k,\ \mathbf{0} \le \mathbf{Q} \le \mathbf{I} \end{array} \end{aligned}$$
    (24)

    Inspired by [9], we get the solution of Eq. (24) as

    $$\begin{aligned} \mathbf{Q ^{t + 1}} = \mathbf{V _*}\mathbf{V _*}^T \end{aligned}$$
    (25)

    where \(\mathbf{V}_*\) is the matrix whose columns are the eigenvectors associated with the k smallest eigenvalues of \(\operatorname{Diag}(\mathbf{D}\varvec{1}) - \mathbf{D}\). (A sketch of this update and the following \(\varvec{D}\)-update is given after this list.)

  • Fixing \(\varvec{Q}\), update \(\varvec{D}\) by

    $$\begin{aligned} \begin{array}{l} \mathbf{D}^{t+1} = \arg \min \limits _{\mathbf{D}} \lambda _1 \operatorname{Tr}\!\left( \mathbf{Q}(\operatorname{Diag}(\mathbf{D}\varvec{1}) - \mathbf{D}) \right) + \frac{\lambda _2}{2}\left\| \mathbf{H} - \mathbf{D} \right\| _F^2 \\ \quad \text {s.t.}\ \mathbf{D} \ge 0,\ \mathbf{D} _{ii} = 0,\ \mathbf{D} = \mathbf{D}^T \end{array} \end{aligned}$$
    (26)

    Equation (26) can be rewritten as follows:

    $$\begin{aligned} \begin{array}{l} \mathbf{D}^{t+1} = \arg \min \limits _{\mathbf{D}} \frac{1}{2}\left\| \mathbf{D} - \left( \mathbf{H} - \frac{\lambda _1}{\lambda _2}(\operatorname{diag}(\mathbf{Q})\varvec{1}^T - \mathbf{Q}) \right) \right\| _F^2 \\ \quad \text {s.t.}\ \mathbf{D} \ge 0,\ \mathbf{D} _{ii} = 0,\ \mathbf{D} = \mathbf{D}^T \end{array} \end{aligned}$$
    (27)

    Let \(\mathbf{U} \in \mathbb{R}^{N \times N}\) with \(\mathbf{U} = \mathbf{H} - \frac{\lambda _1}{\lambda _2}(\operatorname{diag}(\mathbf{Q})\varvec{1}^T - \mathbf{Q})\), and define \(\hat{\mathbf{U}} = \mathbf{U} - \operatorname{Diag}(\operatorname{diag}(\mathbf{U}))\). The closed-form solution of Eq. (27) is

    $$\begin{aligned} \mathbf{D}^{t+1} = \left[ \frac{\hat{\mathbf{U}} + \hat{\mathbf{U}}^T}{2} \right]_+ \end{aligned}$$
    (28)
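A minimal sketch of the two updates above (Eqs. (25) and (28)); `numpy.linalg.eigh` returns eigenvalues in ascending order, so the first k eigenvectors are exactly those needed for \(\mathbf{Q}^{t+1}\). Function names are illustrative assumptions.

```python
import numpy as np

def update_Q(D, k):
    """Q-update of Eq. (25): Q = V_* V_*^T from the Laplacian of D."""
    L = np.diag(D.sum(axis=1)) - D        # Diag(D*1) - D
    _, vecs = np.linalg.eigh(L)           # ascending eigenvalues
    V = vecs[:, :k]                       # eigenvectors of k smallest
    return V @ V.T

def update_D(H, Q, lam1, lam2):
    """D-update of Eq. (28): zero diagonal, symmetrize, project onto >= 0."""
    U = H - (lam1 / lam2) * (np.diag(Q)[:, None] - Q)  # diag(Q)1^T - Q
    np.fill_diagonal(U, 0.0)              # U_hat: remove the diagonal
    return np.maximum((U + U.T) / 2.0, 0.0)
```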

3.3.2 Optimization model RBDSC-MN

(1) Updating \(\varvec{M}\):

By fixing (\(\varvec{D}\), \(\varvec{H}\)), the optimization sub-problem corresponding to \(\mathbf{M ^{t + 1}}\) in Eq. (11) is

$$\begin{aligned} \min _{\mathbf{M}} \left\| \mathbf{M} \right\| _* + \frac{1}{2}\left\| \mathbf{M} - \mathbf{MH} \right\| _F^2 + \frac{\lambda _3}{2}\left\| \mathbf{K} - \mathbf{M}^T\mathbf{M} - \mathbf{E} \right\| _F^2 \end{aligned}$$
(29)

Solving Eq. (29) proceeds exactly as for Eq. (13), except that \(\mathbf{K}_{\mathbf{G}}\) becomes

$$\begin{aligned} \mathbf{K}_{\mathbf{G}} = \mathbf{K} - \frac{\mathbf{I} - 2\mathbf{H} + \mathbf{H}\mathbf{H}^T}{2\lambda _3} - \mathbf{E} \end{aligned}$$
(30)

(2) Updating \(\varvec{H}, \varvec{D}\):

Comparing Eqs. (7) and (11), the updates of \(\varvec{H}\) and \(\varvec{D}\) in Eq. (11) are the same as Eqs. (18) and (20), respectively. Thus, the solutions for \(\varvec{H}\) and \(\varvec{D}\) are given by Eqs. (19) and (28), respectively.

(3) Updating \(\varvec{E}\):

By fixing (\(\varvec{M}, \varvec{H}, \varvec{D}\)), the optimization sub-problem corresponding to \(\mathbf{E ^{t + 1}}\) in Eq. (11) is

$$\begin{aligned} \min _{\mathbf{E}} \frac{\lambda _3}{2}\left\| \mathbf{E} - (\mathbf{K} - \mathbf{M}^T\mathbf{M}) \right\| _F^2 + \sum \limits _i \sum \limits _j \left( 1 - g_\sigma (\mathbf{E} _{ij}) \right) \end{aligned}$$
(31)

According to HQ analysis [52], the above problem (31) can be converted into

$$\begin{aligned} \min _{\mathbf{E},\mathbf{S}} \frac{\lambda _3}{2}\left\| \mathbf{E} - (\mathbf{K} - \mathbf{M}^T\mathbf{M}) \right\| _F^2 + \left\| \mathbf{S}^{\frac{1}{2}} \otimes \mathbf{E} \right\| _F^2 + \sum \limits _i \sum \limits _j \Psi (\mathbf{S} _{ij}) \end{aligned}$$
(32)

where \(\varvec{S}\) is the matrix of auxiliary variables \(\mathbf{S} _{ij}\), \(\otimes\) denotes the Hadamard (element-wise) product, and \(\Psi (\cdot)\) is the dual potential function of \(1 - g_\sigma (\cdot)\).

A new optimization algorithm HQ&AM is designed to solve Eq. (32). Specifically, in each iteration, \(\varvec{S}\) is optimized by HQ theory, and then \(\varvec{E}\) is optimized by alternating minimization.

  • Fixing \(\varvec{E}\), update \(\varvec{S}\) by

    $$\begin{aligned} \mathbf{S} _{ij}^{t+1} = g_\sigma (\mathbf{E} _{ij}^t) = \exp \!\left( -(\mathbf{E} _{ij}^t)^2 / 2\sigma ^2 \right) \end{aligned}$$
    (33)
  • Fixing \(\varvec{S}\), update \(\varvec{E}\) by

    $$\begin{aligned} \begin{array}{l} \mathbf{E}^{t+1} = \arg \min \limits _{\mathbf{E}} \frac{\lambda _3}{2}\left\| \mathbf{E} - (\mathbf{K} - \mathbf{M}^T\mathbf{M}) \right\| _F^2 + \left\| (\mathbf{S}^{t+1})^{\frac{1}{2}} \otimes \mathbf{E} \right\| _F^2 \\ \qquad = (\mathbf{K} - \mathbf{M}^T\mathbf{M}) ./ \left( \varvec{1} + \frac{2\mathbf{S}^{t+1}}{\lambda _3} \right) \end{array} \end{aligned}$$
    (34)

    where ./ denotes element-wise division. (A sketch of one such pass follows this list.)
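One HQ&AM pass for \(\varvec{E}\) in RBDSC-MN (Eqs. (33)–(34)), as a hedged sketch: \(\varvec{S}\) is refreshed from the current \(\varvec{E}\), and the new \(\varvec{E}\) follows in closed form. Names are ours, not from the paper's code.

```python
import numpy as np

def update_E_mn(K, M, E, lam3, sigma):
    """One S/E pass of Eqs. (33)-(34) for RBDSC-MN."""
    S = np.exp(-E ** 2 / (2 * sigma ** 2))   # Eq. (33), element-wise
    R = K - M.T @ M
    return R / (1.0 + 2.0 * S / lam3)        # Eq. (34), element-wise division
```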

3.3.3 Optimization model RBDSC-RN

(1) Updating \(\varvec{M}, \varvec{H}, \varvec{D}\):

Comparing Eqs. (11) and (12), the updates of \(\varvec{M}\), \(\varvec{H}\) and \(\varvec{D}\) in Eq. (12) are the same as Eqs. (29), (18) and (20), respectively. Thus, the solutions for \(\varvec{H}\) and \(\varvec{D}\) are given by Eqs. (19) and (28), and optimizing \(\varvec{M}\) proceeds as for Eq. (13) with \(\mathbf{K}_{\mathbf{G}} = \mathbf{K} - \frac{\mathbf{I} - 2\mathbf{H} + \mathbf{H}\mathbf{H}^T}{2\lambda _3} - \mathbf{E}\).

(2) Updating \(\varvec{E}\):

By fixing (\(\varvec{M}\), \(\varvec{H}\), \(\varvec{D}\)), the optimization sub-problem corresponding to \(\mathbf{E}^{t+1}\) in Eq. (12) is

$$\begin{aligned} \min _{\mathbf{E}} \frac{\lambda _3}{2}\left\| \mathbf{E} - (\mathbf{K} - \mathbf{M}^T\mathbf{M}) \right\| _F^2 + \sum \limits _i \left( 1 - g_\sigma ({\left\| \mathbf{E}^i \right\| _2}) \right) \end{aligned}$$
(35)

According to HQ analysis [52], the above problem (35) can be converted into

$$\begin{aligned} \min _{\mathbf{E},\mathbf{S}} \frac{\lambda _3}{2}\left\| \mathbf{E} - (\mathbf{K} - \mathbf{M}^T\mathbf{M}) \right\| _F^2 + \left\| \mathbf{S}^{\frac{1}{2}} \otimes \mathbf{E} \right\| _F^2 + \sum \limits _i \Psi (\mathbf{S} _i) \end{aligned}$$
(36)

where \(\varvec{S}\) is the matrix of auxiliary variables \(\mathbf{S} _i\).

  • Fixing \(\varvec{E}\), update \(\varvec{S}\) by

    $$\begin{aligned} \mathbf{S} _{ij}^{t+1} = g_\sigma \!\left( {\left\| (\mathbf{E}^t)^i \right\| _2} \right) = \exp \!\left( -\left\| (\mathbf{E}^t)^i \right\| _2^2 / 2\sigma ^2 \right), \quad i = 1,2,\ldots ,N \end{aligned}$$
    (37)

    where \({(\mathbf{E ^t})^i}\) is the ith row of matrix \({{\mathbf{ E }}^t}\) .

  • Fixing \(\varvec{S}\), update \(\varvec{E}\)

     The solution for \(\varvec{E}\) is then obtained by Eq. (34); a short sketch of this row-wise variant follows.
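A minimal sketch of the row-wise variant for RBDSC-RN (Eq. (37) followed by Eq. (34)); every entry in row i of \(\varvec{S}\) shares the weight computed from \(\left\| \mathbf{E}^i \right\| _2\). Names are ours.

```python
import numpy as np

def update_E_rn(K, M, E, lam3, sigma):
    """One S/E pass of Eqs. (37) and (34) for RBDSC-RN."""
    row_w = np.exp(-np.sum(E ** 2, axis=1) / (2 * sigma ** 2))  # Eq. (37)
    S = np.repeat(row_w[:, None], E.shape[1], axis=1)  # same weight per row
    return (K - M.T @ M) / (1.0 + 2.0 * S / lam3)      # Eq. (34)
```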

3.4 Clustering

Depending on whether the motion trajectory feature data contains noise and on the source of that noise, we select Algorithm 1, 2 or 3 to solve (7), (11) or (12), respectively.

[Algorithm 1: solving RBDSC, Eq. (7)]
[Algorithm 2: solving RBDSC-MN, Eq. (11)]
[Algorithm 3: solving RBDSC-RN, Eq. (12)]
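To make the overall flow concrete, below is a high-level sketch of Algorithm 1 (RBDSC, Eq. (7)) that reuses the update functions sketched in Sect. 3.3.1. The zero initialization, fixed iteration count, and the final spectral clustering on the symmetrized affinity \((|\mathbf{D}| + |\mathbf{D}^T|)/2\) are our assumptions, not the paper's released code.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def rbdsc(K, k, lam1=2.8, lam2=10.0, lam3=1.3e5, n_iter=100):
    """Alternating minimization for RBDSC (Eq. (7)), then spectral clustering."""
    N = K.shape[0]
    D = np.zeros((N, N))
    H = np.zeros((N, N))
    for _ in range(n_iter):
        K_G = K - (np.eye(N) - 2 * H + H @ H.T) / (2 * lam3)
        M = update_M(K_G, lam3)         # Eq. (17)
        H = update_H(M, D, lam2)        # Eq. (19)
        Q = update_Q(D, k)              # Eq. (25)
        D = update_D(H, Q, lam1, lam2)  # Eq. (28)
    W = (np.abs(D) + np.abs(D.T)) / 2.0  # affinity matrix for SC
    return SpectralClustering(n_clusters=k,
                              affinity='precomputed').fit_predict(W)
```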

4 Experimental evaluation

To demonstrate the superiority of our methods, we conduct extensive experiments on different video sequences. We first evaluate the proposed algorithms on rigid and non-rigid structure-from-motion: (1) rigid-body motion segmentation [33, 53] with complete trajectories: Hopkins155 Traffic sequences [44]; (2) non-rigid motion (e.g., human motion) segmentation [53]: Keck Gesture [21] and UT-Interaction [41]. Then, we run experiments using two frames of each sequence in the Hopkins155 dataset [44]. Finally, we conduct robustness experiments: (1) missing and corrupted trajectories: Hopkins12 motion sequences [46]; (2) outlying trajectories: Cars Turning, Books and Nrbooks.

4.1 Compared methods and settings

We compare the proposed methods with the following baselines: SSC [10], LRR [31], SSC-OMP [59], LSR [34], RSIM [19], LRKSC [20], BDR-B [33], BDR-Z [33]. Especially in the human motion segmentation experiment, in addition to comparing our models with the above models, we also compare them with specially designed human motion segmentation methods, including TSC [27], TSS [48], and LRTSC [49].

In the performance comparison, we use the publicly available source code where possible and tune parameters to obtain the best results. The misclassification ratio under the best mapping is widely used as a measure of clustering validity [10, 31, 34, 38, 57]. It is expressed as:

$$\begin{aligned} Err\% = \frac{\sum \nolimits _{i = 1}^{n} \upsilon (s_i, \operatorname{map}(g_i))}{n} \end{aligned}$$
(38)

where \(\upsilon (x,y) = \begin{cases} 1, & x \ne y \\ 0, & x = y \end{cases}\), \(s_i\) is the ground truth of the ith element, and \(\operatorname{map}(g_i)\) is the permutation mapping function that maps each cluster label \(g_i\) to the best-matching ground-truth label. \(Err\%\) is a negative metric: a lower value means better performance. As suggested in [20], we repeat each experiment 20 times and report the average result to mitigate the impact of initialization.
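A hedged sketch of Eq. (38): the Hungarian algorithm finds the best permutation mapping between predicted and ground-truth labels before counting mistakes (assuming the number of clusters equals the number of classes; the function name is ours).

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def err_rate(gt, pred):
    """Misclassification ratio (in %) under the best label mapping."""
    labels_p, labels_g = np.unique(pred), np.unique(gt)
    overlap = np.array([[np.sum((pred == p) & (gt == g)) for g in labels_g]
                        for p in labels_p])
    rows, cols = linear_sum_assignment(-overlap)   # maximize total overlap
    mapping = {labels_p[r]: labels_g[c] for r, c in zip(rows, cols)}
    mapped = np.array([mapping[p] for p in pred])
    return 100.0 * np.mean(mapped != gt)
```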

4.2 Rigid-body motion segmentation with complete trajectories: Hopkins155 traffic sequences [44]

The Hopkins155 dataset [44] is a standard benchmark for subspace-clustering-based motion segmentation algorithms. It includes the video sequences and the trajectory features tracked across all frames, and consists of 38 traffic sequences, 104 checkerboard sequences and 13 other sequences. The moving objects in the Hopkins155 Traffic sequences are cars and trucks, which produce typical rigid structure-from-motion. Figure 2 shows two examples from the Hopkins155 Traffic sequences.

Fig. 2
figure 2

Two examples from Hopkins155 Traffic sequences

The Hopkins155 Traffic sequences can be studied under the affine assumption, because rigid motion trajectories across multiple frames lie in an affine subspace. Since each video sequence in the Hopkins155 Traffic sequences has complete feature tracks and outliers have been manually removed from the dataset, we apply the RBDSC model (with Eq. (7) solved by Algorithm 1) to test the efficiency of our method. The parameters of RBDSC are set as \({\lambda _1} = 2.8, {\lambda _2} = 10, {\lambda _3} = 1.3 \times 10^5\), and the polynomial kernel \(k(x_1,x_2) = (0.8 + x_1^T x_2)^2\) is used to define \(\varvec{K}\) in our model. Table 2 reports the results of our RBDSC method and the baselines. RBDSC consistently obtains better results than the other advanced methods.
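For reference, the polynomial kernel above can be formed directly from the trajectory matrix; a short sketch assuming trajectories are stored as columns of X:

```python
import numpy as np

def poly_kernel(X, c=0.8, degree=2):
    """K_ij = (c + x_i^T x_j)^degree for columns x_i of X."""
    return (c + X.T @ X) ** degree
```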

Table 2 Segmentation errors (in %) on Hopkins155 traffic sequences

4.3 Non-rigid motion segmentation

Human motion is a typical non-rigid motion with great deformation, so it is very challenging to segment different moving objects. To demonstrate the effectiveness of our model for non-rigid motion, we conduct experiments on two typical human motion datasets, including Keck Gesture Dataset (Keck) [21] and UT-Interaction Dataset (UT) [41].

  • Keck [21] contains 14 different types of gestures, representing different military signals. Each of three subjects performs all 14 gestures and, in each sequence, repeats the same gesture three times. The resolution of the RGB frames is \(640 \times 480\). Figure 3 shows some examples of different gesture classes from the Keck dataset.

  • UT [41] contains 20 videos, each containing six types of human interaction: pushing, shaking hands, punching, kicking, pointing and hugging. The resolution of each video sequence is \(780 \times 480\), and the duration is about one minute. Figure 4 shows a few snapshots of human interaction videos from the UT dataset.

In these experiments, we use the low-level HoG features extracted in [49], with settings appropriate to each family of comparison methods. For the transfer-learning-based motion segmentation methods (TSC, TSS, LRTSC), we follow [49] and segment the test video sequence with the help of another (source) dataset: TSC, TSS and LRTSC all use the UT (Keck) dataset as the source and the Keck (UT) dataset as the target. For the other comparison algorithms (which use no source information), such as LRR, LSR, SSC and LRKSC, we only input video sequences from the target dataset. We set the parameters of RBDSC to \({\lambda _1} = 2.8, {\lambda _2} = 21, {\lambda _3} = 140\), and the kernel function \(k(x_1,x_2) = (0.8 + x_1^T x_2)^3\) is used to define \(\varvec{K}\) in our model. The results are recorded in Table 3. The error rate of RBDSC is clearly much lower than that of the 11 advanced clustering methods. In particular, on the UT dataset RBDSC achieves an average classification error rate 31.75% lower than LRTSC, the second-best model.

We then compare the computational cost of all methods, all implemented in MATLAB R2019a on a Windows computer with a 3.0 GHz Intel(R) Core(TM) i5-7400 CPU and 8.00 GB of memory. Figure 5 shows the running time (in seconds) of all methods on the UT dataset, ordered by ascending computational cost as SSC-OMP, LSR, RSIM, LRKSC, SSC, RBDSC, BDR-B, BDR-Z, LRR, TSS, TSC and LRTSC. RBDSC significantly reduces the computational cost compared with the second-best method in accuracy (LRTSC). These results demonstrate that RBDSC is an effective and time-saving method.

Fig. 3
figure 3

Examples of 7 different gesture classes in the Keck dataset [21]

Fig. 4
figure 4

Some snapshots of human interaction videos in the UT dataset [41]

Table 3 Segmentation Errors (in %) on two human motion datasets
Fig. 5
figure 5

Comparison of the running time of different algorithms on UT Dataset

4.4 Results on Two-frame of each sequence in the Hopkins155 Dataset [44]

The number of input frames matters greatly for real-time applications that must process sequentially arriving data. To evaluate our model with a small number of frames, we manually select the first and last frames [20, 28] of each video sequence in the Hopkins155 dataset to form a two-frame Hopkins155 dataset. Because two frames of a video sequence cannot accurately satisfy the affine camera model assumption, [13] proposed deriving the subspace from the epipolar constraint

$$\begin{aligned} {x'_d}^T \varvec{F} x_d = 0 \end{aligned}$$
(39)

where \(x_d = [x_d, y_d, 1]^T\) and \(x'_d = [x'_d, y'_d, 1]^T\) are the homogeneous coordinates of the two points corresponding to the 3-D point d in the two frames, and \(\varvec{F} \in \mathfrak {R}^{3 \times 3}\) is the fundamental matrix. Equation (39) can be rewritten as [18]

$$\begin{aligned} f^T \operatorname{vec}(x'_d x_d^T) = 0 \end{aligned}$$
(40)

where \(f \in \mathfrak {R}^{9}\) denotes the vectorized fundamental matrix \(\varvec{F}\), and \(\operatorname{vec}(x'_d x_d^T) = (x_d x'_d, x_d y'_d, x_d, y_d x'_d, y_d y'_d, y_d, x'_d, y'_d, 1)^T\). Thus, \(\operatorname{vec}(x'_d x_d^T)\) lies in an epipolar subspace of dimension 8 [18]. We assume that two perspective images with multiple motions span multiple epipolar subspaces, because different motions correspond to different fundamental matrices [20].
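A minimal sketch of the 9-D epipolar embedding in Eq. (40): each two-frame correspondence \((x, y) \rightarrow (x', y')\) maps to \(\operatorname{vec}(x'_d x_d^T)\), so points sharing one fundamental matrix lie in an 8-dimensional subspace. The function name and array layout are our assumptions.

```python
import numpy as np

def epipolar_embedding(pts1, pts2):
    """pts1, pts2: (N, 2) arrays of matched image coordinates."""
    x, y = pts1[:, 0], pts1[:, 1]
    xp, yp = pts2[:, 0], pts2[:, 1]
    return np.stack([x * xp, x * yp, x,
                     y * xp, y * yp, y,
                     xp, yp, np.ones_like(x)], axis=1)  # shape (N, 9)
```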

We replicate the data 30 times to increase the ambient dimension, and set the parameters of the RBDSC model (with Eq. (7) solved by Algorithm 1) to \({\lambda _1} = 3, {\lambda _2} = 5, {\lambda _3} = 1.3 \times 10^5\). The kernel function \(k(x_1,x_2) = (1.8 + x_1^T x_2)^2\) is used to define \(\varvec{K}\) in our model.

Table 4 reports the results of our RBDSC method and the baselines. In-depth analysis of these results yields the following observations.

  • Our model RBDSC generally achieves the lowest misclassification rate among these advanced algorithms.

  • RBDSC is superior to LRKSC overall, mainly because RBDSC introduces a direct block diagonal constraint [33], while LRKSC uses a low-rank prior to approximate the block diagonal structure only indirectly. Figure 6 compares the block-diagonal property of the affinity matrices obtained by different clustering methods (best viewed zoomed in on screen). RBDSC clearly has the best block diagonal structure.

  • Our method RBDSC consistently outperforms BDR-B and BDR-Z, which clearly demonstrates the benefits of combining the block diagonal regularizer (BDR) [33] with kernel mapping and the low-rank kernel learning strategy. Specifically, RBDSC can handle non-linear data via low-rank kernel mapping when the sequences do not strictly meet the affine camera model assumptions.

Table 4 Segmentation errors (in %) on two-frame Hopkins155 Dataset
Fig. 6
figure 6

Visual comparison of the block-diagonal structure of the affinity matrices obtained on the L1R2RCT_B_g23 sequence by different models

4.5 Experimental results on robustness

4.5.1 Missing and corrupted trajectories: Hopkins12 Dataset [46]

In practice, some trajectory features in some image frames are lost due to tracker limitations or occlusion. The Hopkins12 dataset contains 12 motion sequences: three 3-motion sequences and nine 2-motion sequences, all of which are missing some trajectory features. Corrupted trajectories can also emerge when the tracker unknowingly loses track of certain feature points. Segmenting the moving subjects in these video sequences is therefore challenging.

We test our model RBDSC-MN and several advanced subspace segmentation methods on the Hopkins12 dataset with incomplete or corrupted trajectories. Figure 7 shows some frames of 3-motion sequence in the Hopkins12 dataset.

Fig. 7
figure 7

Some frames from a 3-motion sequence in the Hopkins12 dataset [46]

We set the parameters of the RBDSC-MN model (with Eq. (11) solved by Algorithm 2) to \({\lambda _1} = 2.8, {\lambda _2} = 10, {\lambda _3} = 1000\), and the kernel function \(k(x_1,x_2) = (0.8 + x_1^T x_2)^2\) is used to define \(\varvec{K}\) in our model. The results of RBDSC-MN and the comparison methods on the Hopkins12 dataset are listed in Table 5, which clearly indicates the benefits of RBDSC-MN in the presence of missing trajectories. RSIM is the second-best method, but it relies on an additional optimization strategy on the Grassmann manifold to estimate the orthogonal matrix \(\varvec{V}\) and thereby reduce segmentation errors when trajectories are incomplete. The competitiveness of RBDSC-MN comes from using correntropy as the loss function, which lets the robust subspace clustering handle missing trajectories, a form of non-Gaussian noise. This demonstrates that RBDSC-MN can effectively handle non-Gaussian noise in motion segmentation without any preprocessing or postprocessing step.

Table 5 Segmentation errors (in %) on two-frame Hopkins12 Dataset with missing trajectories

4.5.2 Outlying trajectories: Cars Turning, Books and Nrbooks

Dynamic scenes often contain sample outliers that do not belong to any motion trajectory in the scene. Figure 8 displays three typical video trajectories containing both outlying trajectories (outliers) and missing data samples; more details are given in Table 6. We designed the RBDSC-RN model to remove outlier rows, and use three representative motion trajectories (Cars Turning, Books and Nrbooks) to evaluate its performance. Table 7 lists the subspace segmentation results of the different methods. RBDSC-RN clearly achieves the lowest mis-segmentation rates, mainly because it uses the non-convex row-correntropy to eliminate outliers and is therefore less affected by large outliers.

Fig. 8
figure 8

Example image frames from trajectories with outliers

Table 6 Detailed information about the three datasets, including Cars Turning, Books and Nrbooks3
Table 7 Segmentation errors (in %) on datasets with outlying trajectories

4.6 Parameter sensitivity

Our models contain three important parameters: \({\lambda _1}\), \({\lambda _2}\) and \({\lambda _3}\), where \({\lambda _1}\) controls the weight of the BDR term \({\left\| \varvec{D} \right\| _k}\), and \({\lambda _2}\) and \({\lambda _3}\) control the weights of \(\left\| \varvec{H} - \varvec{D} \right\| _F^2\) and \(\left\| \varvec{E} - (\varvec{K} - \varvec{M}^T\varvec{M}) \right\| _F^2\) (or \(\left\| \varvec{K} - \varvec{M}^T\varvec{M} \right\| _F^2\)), respectively. We evaluate the parameter sensitivity of the proposed model on the Hopkins12 dataset over a range of values and record the average misclassification rate of the 12 sequences. Figure 9 shows the results for \({\lambda _1} \in \{2.5, 2.6, 2.7, 2.8, 2.9, 3\}\), \({\lambda _2} \in \{5, 10, 15, 20, 25, 30\}\) and \({\lambda _3} \in \{0.01, 0.1, 1, 10, 100, 1000\}\). On the Hopkins12 dataset, our method achieves low segmentation errors when \({\lambda _1}\), \({\lambda _2}\) and \({\lambda _3}\) are no less than 2.8, 10 and 0.1, respectively.

Fig. 9
figure 9

Segmentation error rates on the Hopkins12 dataset versus the parameters \({\lambda _1}\), \({\lambda _2}\) and \({\lambda _3}\)

5 Conclusion

In this work, a unified rank-constrained block diagonal subspace clustering method for motion segmentation is proposed, which handles both rigid-body and non-rigid-body motion segmentation. In addition, two robust motion segmentation models (RBDSC-MN and RBDSC-RN) are put forward to deal with missing trajectories, corrupted trajectories and outlying trajectories. Experiments on commonly used video sequences show that our methods improve the accuracy of motion segmentation.

We have demonstrated the advantages of our algorithms theoretically and experimentally, but our methods have certain limitations. First, adding kernel mapping constraints and KCIM to the block diagonal representation improves the handling of non-linear data (i.e., motion trajectory features), but it brings some computational overhead; we plan to study how to speed up our methods, for example using GPUs. Second, our methods do not consider crowd motion segmentation, where many people move independently and each person's small actions are part of the overall crowd movement. In the future, we will conduct related research to make our methods suitable for crowd motion segmentation.