1 Introduction

The purpose of motion segmentation [56] is to assign the trajectory points of different moving objects in a dynamic scene to their corresponding groups, as shown in Fig. 1. It is the cornerstone of many machine vision applications, including action recognition [22, 29, 30, 54], visual surveillance [6], event detection [55], motion synthesis [15], and many others [35, 36]. Kanatani et al. [24] showed that motion segmentation from tracked feature points can be cast as a subspace clustering problem, where each subspace represents an independent motion. Subspace clustering underpins many complex tasks, such as document clustering [1, 2], face clustering [5, 58], gene sequence analysis [51], image segmentation [7, 56], and motion segmentation [26, 60, 61]. The research topic of this paper is subspace clustering for motion segmentation.

Fig. 1
figure 1

Several image frames from the Cars5 sequence in the Hopkins155 motion dataset. Feature points with different colors belong to different moving objects (colour figure online)

Several widely used subspace clustering approaches have been designed, such as kernel k-means (KKM) [42], the efficient k-means algorithm [25] and spectral clustering (SC) [37, 50]. However, they cannot be directly applied to motion segmentation, because motion trajectory feature data has the following characteristics.

  • Non-linear structure: In motion segmentation, the trajectories of one motion mostly lie in a non-linear subspace (or submanifold) [20], because the camera always exhibits some degree of perspective distortion, which invalidates the affine camera model assumption.

  • Low rank: To capture smooth motion, the frame rate of motion capture data is very high. Although a high frame rate effectively avoids motion jitter between adjacent frames and improves the visual quality of the captured data, it also leads to high similarity between adjacent frames and high redundancy in the data as a whole. If a matrix is used to represent the motion sequence, its rank should therefore be low.

  • Containing noise: Obtaining real motion data with a tracker is a complicated process [40], so noise is inevitable during segmentation. A trajectory may correspond to some random motion, which invalidates the affine camera model assumption (outlying trajectory). Some features in some frames may be lost due to complex actions and fast motion, leaving missing entries in the trajectory (incomplete trajectory). Moreover, the tracker may unintentionally track some feature points incorrectly, producing serious errors in the track (corrupted trajectory). All of these erroneous trajectories are non-Gaussian noise embedded in the data, and their structure is much more complex [56].

  • Real-time operation: Compared with independent static data, motion data contains a large amount of sequential information, which is the only clue available to guide the clustering algorithm. Real-time operation is therefore necessary for motion segmentation applications.

In view of the above characteristics of motion trajectory feature data, several strong subspace clustering algorithms have been proposed. In particular, Sparse Subspace Clustering (SSC) [10] obtains a sparse representation of each data point by selecting within-cluster data. Low-rank Representation (LRR) [31] and Least Squares Regression (LSR) [34] encourage the learned matrix to be low-rank. All of these clustering algorithms, however, can only deal with linear data, not non-linear data. A few other subspace clustering methods, such as Kernel Sparse Subspace Clustering (KSSC) [38], KSSC on the SPD Riemannian Manifold (KSSCR) [57] and Low Rank Subspace Clustering (LRSC) [45], build nonlinear subspace clustering via kernel tricks. However, these methods with predefined kernels cannot guarantee that the data is low-rank after mapping to the feature space, which limits segmentation performance. Ji et al. [20] proposed a Low Rank Kernel Subspace Clustering (LRKSC) method, which guarantees the low-rank property by learning a low-rank kernel mapping. In general, LRKSC can construct an affinity matrix with block diagonal structure, leading to correct clustering. However, the block diagonal matrix is obtained indirectly through a low-rank structural prior, which is sensitive to noise.

To address the noise in motion data, several advanced noise-modeling methods have been put forward. In particular, [10, 20, 45] assumed that noise in motion data is sparse and used the \({l_1}\)-norm to handle it. A reconstruction error measure based on the \({l_2}\)-norm [34] was proposed, which is robust to small Gaussian noise. Liu et al. [31] proposed the \({l_{2,1}}\)-norm as a measure of data approximation, which is robust to sample-specific outliers. These methods can only deal with Gaussian or sparse noise and are very sensitive to non-Gaussian noise such as severely mis-tracked features, missing entries or large outliers. To overcome this, [3, 43, 52] used the correntropy-induced metric (CIM) to handle non-Gaussian and impulsive noise, achieving excellent performance. Unfortunately, no existing motion segmentation algorithm handles the various noises in motion data in a unified way.

Recently, Wang et al. [49] designed a new low-rank transfer subspace clustering method (LRTSC) for human motion segmentation by transferring existing well-labeled source data to target data. However, LRTSC has an obvious disadvantage: the data transfer operations take a lot of time, which conflicts with the real-time requirements of motion segmentation applications.

In view of the above problems of existing subspace clustering methods on motion data, we provide the following solutions.

  • To handle nonlinear data, we propose to use a kernel function to model motion trajectories.

  • To resist the various noises contained in motion trajectories, we design a variant of CIM, the kernelized correntropy-induced metric (KCIM), to measure data error.

  • To strengthen the block diagonal structure of the learned affinity matrix, we integrate the block diagonal regularizer (BDR) into kernel subspace clustering.

  • To improve motion segmentation performance in various scenarios, we propose a unified rank-constrained block diagonal subspace clustering method (RBDSC) and extend it to two robust versions. Furthermore, a new algorithm, HQ&AM, is designed to optimize these models.

The remainder of this work is structured as follows: (1) Sect. 2 summarizes related work. (2) Sect. 3 elaborates our clustering methods and the corresponding optimization. (3) Sect. 4 presents experiments and discussion. (4) The work is summarized in Sect. 5.

2 Related works

We first introduce motion segmentation methods, and then review SC-based subspace clustering algorithms, including SSC [10], LRR [31] and their extensions.

2.1 Motion segmentation

Although many approaches for motion segmentation have been proposed in the literature, they have not fully considered the important characteristics of motion trajectory feature data presented in Sect. 1.

Early factorization-based approaches [8, 11, 17, 24] attempted to directly detect the cutting points of the motion sequence. These methods require the motions to be independent of each other, which makes them easy to handle. However, in most dynamic scenes involving articulated objects or moving cameras, the motions are at least partially dependent.

To deal with partly dependent motions, Vidal et al. [47] put forward generalized principal component analysis (GPCA), a subspace segmentation method based on algebraic geometry. The algorithm places no restriction on the relative orientations of the motion subspaces, allowing, for example, arbitrary intersections among different subspaces, so it can handle partially dependent motions. However, GPCA works well only when the size and number of subspaces are small, and its performance decreases as the number of subspaces grows. It is therefore only suitable for short and simple motion sequences.

Many works, including [12, 23], formulate motion segmentation as a statistical clustering problem, solved by expectation–maximization (EM) or its variants. However, these iterative methods require good initialization and may fall into suboptimal local minima.

Recently, spectral clustering-based methods have been used for motion segmentation. They first learn an affinity matrix using local information around each trajectory, and then apply SC to obtain the segmentation [37]. SC-based methods are currently the most popular, and this work focuses on spectral clustering-based segmentation.

2.2 Spectral clustering-based subspace clustering

Lately, advanced SC-based subspace clustering algorithms such as SSC [10] and LRR [31] have drawn much attention. They all assume that a data point can be expressed as a linear combination of other points from the same subspace. SSC assumes that the data has the self-expression and sparse-representation properties [5, 10]. Specifically, for the data matrix \(\varvec{X}\), the coefficient matrix \(\varvec{D}\) satisfies

$$\begin{aligned} \min _{\mathbf{D}} \left\| \mathbf{D} \right\| _1 + \gamma \left\| \mathbf{E} \right\| _1 \quad \text {s.t.}\ \mathbf{X} = \mathbf{XD} + \mathbf{E},\ \mathbf{D} _{ii} = 0 \end{aligned}$$
(1)

where \({\left\| \cdot \right\| _1}\) promotes sparsity, \(\mathbf{E}\) is an error matrix and \(\gamma\) is a trade-off parameter.

To recover the subspace structures of data, [31] proposed LRR method, which aims at resolving the following rank minimization problem:

$$\begin{aligned} \min _{\mathbf{D},\,\mathbf{E}} \operatorname{rank}(\mathbf{D}) + \gamma \left\| \mathbf{E} \right\| _q \quad \text {s.t.}\ \mathbf{X} = \mathbf{MD} + \mathbf{E} \end{aligned}$$
(2)

where \(\varvec{M}\) denotes a dictionary matrix, \(\operatorname{rank}(\mathbf{D})\) seeks the lowest-rank representation of the data matrix \(\varvec{X}\) with respect to the dictionary \(\varvec{M}\), and \({\left\| \cdot \right\| _q}\) denotes a regularization strategy chosen to match the corruption: \({\left\| \cdot \right\| _1}\) characterizes sparse corruptions, \({\left\| \cdot \right\| _2}\) models small Gaussian noise, and \({\left\| \cdot \right\| _{2,1}}\) handles sample-specific corruptions (outliers).

To make Eq. (2) tractable, [31] proposed replacing the rank function with the nuclear norm, and a self-expressiveness strategy represents the dictionary in Eq. (2) by the data matrix \(\varvec{X}\) itself. Thus, Eq. (2) is rewritten as:

$$\begin{aligned} \min _{\mathbf{D},\,\mathbf{E}} \left\| \mathbf{D} \right\| _* + \gamma \left\| \mathbf{E} \right\| _q \quad \text {s.t.}\ \mathbf{X} = \mathbf{XD} + \mathbf{E} \end{aligned}$$
(3)

where \({\left\| \mathbf{D} \right\| _*}\) is the nuclear norm of matrix \(\varvec{D}\).

SSC and LRR achieve excellent performance on linear data, but they cannot deal with nonlinear data. Ji et al. [20] proposed the LRKSC method, which integrates the kernel trick, low-rank learning, and self-representation, so that nonlinear data can be handled while the low rank of the data is preserved. It is formulated as:

$$\begin{aligned} \begin{array}{l} \min \limits _{\mathbf{D},\mathbf{M}} \frac{\lambda _2}{2}\left\| \Phi (\mathbf{X}) - \Phi (\mathbf{X})\mathbf{D} \right\| _F^2 + \lambda _1 \left\| \mathbf{D} \right\| _1 + \left\| \mathbf{M} \right\| _* + \lambda _3 \left\| \mathbf{E} \right\| _1 \\ \quad \text {s.t.}\ \mathbf{D} \ge 0,\ \mathbf{D} _{ii} = 0,\ \mathbf{K} = \mathbf{M}^T\mathbf{M} + \mathbf{E} \end{array} \end{aligned}$$
(4)

where \(\varvec{K}\) is the kernel matrix and \(\Phi ^T(\mathbf{X})\Phi (\mathbf{X}) = \mathbf{M}^T\mathbf{M}\). However, this model has two drawbacks: (1) \({\left\| \mathbf{D} \right\| _1}\) can construct an affinity matrix with block diagonal structure, but that structure is fragile; (2) \({\left\| \mathbf{E} \right\| _1}\) can model arbitrary sparse corruptions, but it struggles with the large errors found in motion data.

3 The proposed methods

We propose a unified rank-constrained block diagonal subspace clustering method (RBDSC) for motion segmentation. Then, two robust RBDSC models for suppressing noise in motion data are designed: rank-constrained block diagonal subspace clustering with modeling noise (RBDSC-MN) and rank-constrained block diagonal subspace clustering with removing noise (RBDSC-RN). Finally, the optimization procedures of the proposed models are given. Table 1 presents the important notation used in the paper.

Table 1 The important notations used in this work

3.1 Rank-constrained block diagonal subspace clustering

To handle the nonlinear structure of motion trajectory feature data, we adopt a low-rank kernel mapping strategy that maps the data into a high-dimensional Hilbert space where linear pattern analysis applies, and we enforce low-rank constraints on the motion trajectories in the kernel space. Motivated by the LRR [31] and LRKSC [20] methods, we use a self-expressiveness-based subspace clustering framework to learn the affinity matrix. Combining low-rank kernel mapping and self-expressiveness, the low-rank kernel self-expressiveness minimization problem is expressed as:

$$\begin{aligned} \begin{array}{l} \min \limits _{\mathbf{D},\mathbf{M}} \frac{1}{2}\left\| \mathbf{M} - \mathbf{MD} \right\| _F^2 + \lambda _1 \mathfrak {R}(\mathbf{D}) + \left\| \mathbf{M} \right\| _* \\ \quad \text {s.t.}\ \mathbf{D} \ge 0,\ \mathbf{D} _{ii} = 0,\ \mathbf{K} = \mathbf{M}^T\mathbf{M} \end{array} \end{aligned}$$
(5)

where \({\lambda _1}\) is a nonnegative balancing parameter and \(\mathfrak {R}(\mathbf{D})\) is a regularization term on the matrix \(\varvec{D}\). Depending on the motivation, \(\mathfrak {R}(\mathbf{D})\) can take different forms, such as \({\left\| \mathbf{D} \right\| _*}\) or \({\left\| \mathbf{D} \right\| _1}\). The affinity matrices constructed with these regularizers share the block-diagonal property, which helps improve the performance of SC-based subspace clustering methods. However, the block diagonal structures obtained through \({\left\| \cdot \right\| _*}\) and \({\left\| \cdot \right\| _1}\) are generally fragile and break down at low signal-to-noise ratios, degrading the clustering results. Inspired by [33], we introduce BDR to directly pursue a block-diagonal representation of the coefficient matrix \(\varvec{D}\). We therefore propose a rank-constrained block diagonal representation method (RBDSC), formulated as:

$$\begin{aligned} \begin{array}{l} \min \limits _{\mathbf{D},\mathbf{M}} \frac{1}{2}\left\| \mathbf{M} - \mathbf{MD} \right\| _F^2 + \lambda _1 \left\| \mathbf{D} \right\| _k + \left\| \mathbf{M} \right\| _* \\ \quad \text {s.t.}\ \mathbf{D} \ge 0,\ \mathbf{D} _{ii} = 0,\ \mathbf{D} = \mathbf{D}^T,\ \mathbf{K} = \mathbf{M}^T\mathbf{M} \end{array} \end{aligned}$$
(6)

where \({\left\| \mathbf{D} \right\| _k}\) is the k-block-diagonal regularizer defined in Eq. (21) and k is the number of clusters or classes.

However, the constraints on the matrix \(\varvec{D}\) in Eq. (6) limit its representation ability. To alleviate this issue, we add an intermediate matrix \(\varvec{H}\) satisfying \(\mathbf{H} = \mathbf{D}\). Problem (6) then becomes

$$\begin{aligned} \begin{array}{l} \min \limits _{\mathbf{D},\mathbf{M},\mathbf{H}} \lambda _1 \left\| \mathbf{D} \right\| _k + \left\| \mathbf{M} \right\| _* + \frac{1}{2}\left\| \mathbf{M} - \mathbf{MH} \right\| _F^2 + \frac{\lambda _2}{2}\left\| \mathbf{H} - \mathbf{D} \right\| _F^2 + \frac{\lambda _3}{2}\left\| \mathbf{K} - \mathbf{M}^T\mathbf{M} \right\| _F^2 \\ \quad \text {s.t.}\ \mathbf{D} \ge 0,\ \mathbf{D} _{ii} = 0,\ \mathbf{D} = \mathbf{D}^T \end{array} \end{aligned}$$
(7)

3.2 Robust rank-constrained block diagonal subspace clustering

For LRKSC in Eq. (4), \({\left\| \mathbf{E} \right\| _1}\) can only model arbitrary sparse corruptions; it cannot handle the noise in motion data well, because that noise is complex and unpredictable. Unlike the \(L_1\)-norm, the correntropy-induced metric (CIM) [52] is a local metric that is particularly robust to complex noise [39] and non-Gaussian noise [52]. It has been widely used in computer vision, e.g., face clustering [4], object image clustering [3], feature selection [14], and semi-supervised learning [34]. The correntropy [16, 52] of two arbitrary variables w and v is formulated as

$$\begin{aligned} \hat{V}_\sigma (w,v) = \frac{1}{N}\sum \limits _{i = 1}^{N} g_\sigma (w_i - v_i) \end{aligned}$$
(8)

where \(g_\sigma (\cdot)\) denotes a kernel function with width \(\sigma\). The Gaussian kernel \(g_\sigma (w_i - v_i) = \exp (-\left\| w_i - v_i \right\| _2^2 / 2\sigma ^2)\) is usually used. CIM [52] then evaluates the similarity of the two variables w and v:

$$\begin{aligned} CIM(w,v) = \sqrt{1 - \hat{V}_\sigma (w,v)} = \sqrt{1 - \frac{1}{N}\sum \limits _{i = 1}^{N} g_\sigma (w_i - v_i)} \end{aligned}$$
(9)

Unfortunately, we cannot directly introduce CIM into our model, because CIM is designed for linear data, while motion tracking features are nonlinear. Fortunately, the motion data is linear in the Hilbert feature space [20], where correntropy can be applied. Thus, we design a variant of CIM, the kernelized correntropy-induced metric (KCIM), expressed as

$$\begin{aligned} KCIM(\phi (w),\phi (v)) = \sqrt{1 - \frac{1}{N}\sum \limits _{i = 1}^{N} g_\sigma (\phi (e_i))} \end{aligned}$$
(10)

where \(\phi (e_i) = \phi (w_i) - \phi (v_i)\) for \(i = 1,2,\ldots ,N\).
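For concreteness, below is a minimal NumPy sketch of CIM (Eq. (9)) and KCIM (Eq. (10)). The feature-space residual norm \(\left\| \phi (w_i) - \phi (v_i) \right\| _2^2\) is obtained with the kernel trick, so \(\phi\) is never computed explicitly; the function names and the `kernel` callable are illustrative assumptions, not code from this paper.

```python
import numpy as np

def cim(w, v, sigma=1.0):
    """CIM of Eq. (9) for paired 1-D arrays, with a Gaussian kernel."""
    g = np.exp(-(w - v) ** 2 / (2 * sigma ** 2))  # g_sigma(w_i - v_i)
    return np.sqrt(1.0 - g.mean())

def kcim(W, V, kernel, sigma=1.0):
    """KCIM of Eq. (10); W, V are (N, d) arrays of paired samples and
    kernel(a, b) -> scalar is any positive-definite kernel."""
    # ||phi(w_i) - phi(v_i)||^2 = k(w_i, w_i) - 2 k(w_i, v_i) + k(v_i, v_i)
    sq = np.array([kernel(w, w) - 2 * kernel(w, v) + kernel(v, v)
                   for w, v in zip(W, V)])
    g = np.exp(-sq / (2 * sigma ** 2))            # g_sigma(phi(e_i))
    return np.sqrt(1.0 - g.mean())
```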

Note that the KCIM loss saturates near 1 for a large error (large \(\phi (e_i)\)), so it penalizes large errors far less than the absolute error or the mean squared error would. Large noise usually comes from data loss, data corruption and outliers [39]. We therefore propose rank-constrained block diagonal subspace clustering with modeling noise (RBDSC-MN), which combines the rank-constrained block diagonal representation (RBDSC) with KCIM. The RBDSC-MN model is formulated as:

$$\begin{aligned} \begin{array}{l} \min \limits _{\mathbf{D},\mathbf{M},\mathbf{H},\mathbf{E}} \lambda _1 \left\| \mathbf{D} \right\| _k + \left\| \mathbf{M} \right\| _* + \frac{1}{2}\left\| \mathbf{M} - \mathbf{MH} \right\| _F^2 + \frac{\lambda _2}{2}\left\| \mathbf{H} - \mathbf{D} \right\| _F^2 \\ \qquad + \frac{\lambda _3}{2}\left\| \mathbf{E} - (\mathbf{K} - \mathbf{M}^T\mathbf{M}) \right\| _F^2 + \sum \limits _i \sum \limits _j \left( 1 - g_\sigma (\mathbf{E} _{ij}) \right) \\ \quad \text {s.t.}\ \mathbf{D} \ge 0,\ \mathbf{D} _{ii} = 0,\ \mathbf{D} = \mathbf{D}^T \end{array} \end{aligned}$$
(11)

where \(\mathbf{E} _{ij}\) is the (i, j) entry of matrix \(\varvec{E}\).

For larger outliers, prior information can be further exploited in the model of Eq. (11) to improve clustering accuracy. Inspired by the \({\left\| \cdot \right\| _{2,1}}\)-norm (group sparsity) in LRR [31] for handling large outliers [14, 31, 32], we design a new robust subspace clustering method, RBDSC-RN, to remove outlying trajectories. It is formulated as:

$$\begin{aligned} \begin{array}{l} \min \limits _{\mathbf{D},\mathbf{M},\mathbf{H},\mathbf{E}} \lambda _1 \left\| \mathbf{D} \right\| _k + \left\| \mathbf{M} \right\| _* + \frac{1}{2}\left\| \mathbf{M} - \mathbf{MH} \right\| _F^2 + \frac{\lambda _2}{2}\left\| \mathbf{H} - \mathbf{D} \right\| _F^2 \\ \qquad + \frac{\lambda _3}{2}\left\| \mathbf{E} - (\mathbf{K} - \mathbf{M}^T\mathbf{M}) \right\| _F^2 + \sum \limits _i \left( 1 - g_\sigma ({\left\| \mathbf{E}^i \right\| _2}) \right) \\ \quad \text {s.t.}\ \mathbf{D} \ge 0,\ \mathbf{D} _{ii} = 0,\ \mathbf{D} = \mathbf{D}^T \end{array} \end{aligned}$$
(12)

where \(\mathbf{E ^i}\) is the ith row of \(\varvec{E}\).

Thus, to effectively handle the errors in motion data, two robust rank-constrained block diagonal subspace clustering methods are used: the RBDSC-MN model in Eq. (11) for incomplete and corrupted trajectories, and the RBDSC-RN model in Eq. (12) for outlying trajectories.

3.3 Optimization method

Since the objective functions in Eqs. (7), (11) and (12) are non-convex, optimizing them is challenging. We design a novel algorithm, HQ&AM, which combines half-quadratic theory [52] with alternating minimization [33]. Because Eqs. (7), (11) and (12) are identical except for the error term \(\varvec{E}\), we first present the optimization of the RBDSC model (Eq. (7)), and then describe the optimization of the RBDSC-MN (Eq. (11)) and RBDSC-RN (Eq. (12)) models.

3.3.1 Optimization model RBDSC

(1) Updating \(\varvec{M}\):

By fixing (\(\varvec{D}\), \(\varvec{H}\)), the optimization sub-problem corresponding to \(\mathbf{M ^{t + 1}}\) in Eq. (7) is

$$\begin{aligned} \min _{\mathbf{M}} \left\| \mathbf{M} \right\| _* + \frac{1}{2}\left\| \mathbf{M} - \mathbf{MH} \right\| _F^2 + \frac{\lambda _3}{2}\left\| \mathbf{K} - \mathbf{M}^T\mathbf{M} \right\| _F^2 \end{aligned}$$
(13)

where, using the symmetry of \(\mathbf{H}\) (since \(\mathbf{H} \approx \mathbf{D} = \mathbf{D}^T\)), \(\left\| \mathbf{M} - \mathbf{MH} \right\| _F^2\) can be expanded as

$$\begin{aligned} \left\| \mathbf{M} - \mathbf{MH} \right\| _F^2 = \operatorname{tr}\!\left( (\mathbf{I} - 2\mathbf{H} + \mathbf{H}\mathbf{H}^T)\,\mathbf{M}^T\mathbf{M} \right) \end{aligned}$$
(14)

Equation (13) can be rewritten as

$$\begin{aligned} \min _{\mathbf{M}} \left\| \mathbf{M} \right\| _* + \frac{\lambda _3}{2}\left\| \mathbf{M}^T\mathbf{M} - \mathbf{K} + \frac{\mathbf{I} - 2\mathbf{H} + \mathbf{H}\mathbf{H}^T}{2\lambda _3} \right\| _F^2 \end{aligned}$$
(15)

We define \(\mathbf{K}_{\mathbf{G}} = \mathbf{K} - \frac{\mathbf{I} - 2\mathbf{H} + \mathbf{H}\mathbf{H}^T}{2\lambda _3}\); then Eq. (15) becomes

$$\begin{aligned} \min _{\mathbf{M}} \left\| \mathbf{M} \right\| _* + \frac{\lambda _3}{2}\left\| \mathbf{M}^T\mathbf{M} - \mathbf{K}_{\mathbf{G}} \right\| _F^2 \end{aligned}$$
(16)

We can obtain a closed-form solution by

$$\begin{aligned} \mathbf{M ^{t + 1}} = {\Sigma ^ * }{\Delta ^T} \end{aligned}$$
(17)

where \(\Sigma ^*\) and \(\Delta\) come from the SVD of \(\mathbf{K}_{\mathbf{G}}\). Let \(\mathbf{K}_{\mathbf{G}} = \Gamma \Lambda \Delta ^T\) be the SVD of \(\mathbf{K}_{\mathbf{G}}\), with \(\upsilon _i\) its ith singular value and \(\Delta\) the matrix of its right singular vectors. Then \(\Sigma ^* = \operatorname{Diag}(\varsigma _1^*, \ldots , \varsigma _N^*)\) with \(\varsigma _i^* = \arg \min _{\varsigma _i} \frac{\lambda _3}{2}(\upsilon _i - \varsigma _i^2)^2 + \varsigma _i\), where \(\varsigma _i \in \left\{ x \in \mathfrak {R}_+ \mid x^3 - \upsilon _i x + (2\lambda _3)^{-1} = 0 \right\} \cup \{0\}\).
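A hedged sketch of this \(\varvec{M}\)-update under our reading of Eq. (17): each singular value of \(\mathbf{K}_{\mathbf{G}}\) is shrunk by choosing, among the positive real roots of \(x^3 - \upsilon _i x + (2\lambda _3)^{-1} = 0\) and zero, the candidate that minimizes the scalar objective. Function names are ours, not from the paper's code.

```python
import numpy as np

def update_M(K_G, lam3):
    """M-update of Eq. (17): M^{t+1} = Sigma^* Delta^T."""
    _, vals, Vt = np.linalg.svd(K_G)   # K_G = Gamma Lambda Delta^T
    sig = np.zeros_like(vals)
    for i, v in enumerate(vals):
        # real positive roots of x^3 + 0*x^2 - v*x + 1/(2*lam3)
        roots = np.roots([1.0, 0.0, -v, 1.0 / (2.0 * lam3)])
        cands = [r.real for r in roots
                 if abs(r.imag) < 1e-10 and r.real > 0] + [0.0]
        sig[i] = min(cands, key=lambda s: 0.5 * lam3 * (v - s * s) ** 2 + s)
    return np.diag(sig) @ Vt           # Sigma^* Delta^T
```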

(2) Updating \(\varvec{H}\):

By fixing (\(\varvec{D}\), \(\varvec{M}\)), the optimization sub-problem corresponding to \(\mathbf{H ^{t + 1}}\) in Eq. (7) is

$$\begin{aligned} \min _{\mathbf{H}} \frac{1}{2}\left\| \mathbf{M} - \mathbf{MH} \right\| _F^2 + \frac{\lambda _2}{2}\left\| \mathbf{H} - \mathbf{D} \right\| _F^2 \end{aligned}$$
(18)

We can obtain its closed-form solution given by

$$\begin{aligned} \mathbf{H}^{t+1} = \left( \lambda _2 \mathbf{I} + \mathbf{M}^T\mathbf{M} \right)^{-1}\left( \lambda _2 \mathbf{D} + \mathbf{M}^T\mathbf{M} \right) \end{aligned}$$
(19)
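The same update as a short NumPy sketch, using a linear solve instead of an explicit inverse (a standard numerical choice; the function name is ours):

```python
import numpy as np

def update_H(M, D, lam2):
    """Closed-form H-update of Eq. (19)."""
    MtM = M.T @ M
    return np.linalg.solve(lam2 * np.eye(MtM.shape[0]) + MtM,
                           lam2 * D + MtM)
```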

(3) Updating \(\varvec{D}\):

By fixing (\(\varvec{M}\), \(\varvec{H}\)), the optimization sub-problem corresponding to \(\mathbf{D ^{t + 1}}\) in Eq. (7) is

$$\begin{aligned} \begin{array}{l} \min \limits _{\mathbf{D}} \lambda _1 \left\| \mathbf{D} \right\| _k + \frac{\lambda _2}{2}\left\| \mathbf{H} - \mathbf{D} \right\| _F^2 \\ \quad \text {s.t.}\ \mathbf{D} \ge 0,\ \mathbf{D} _{ii} = 0,\ \mathbf{D} = \mathbf{D}^T \end{array} \end{aligned}$$
(20)

Define \(\mathbf{L}_{\mathbf{D}} = \operatorname{Diag}(\mathbf{D}\varvec{1}) - \mathbf{D}\) as the Laplacian matrix corresponding to the matrix \(\varvec{D}\). The k-block-diagonal regularizer is the sum of the k smallest eigenvalues of \(\mathbf{L}_{\mathbf{D}}\):

$$\begin{aligned} \left\| \mathbf{D} \right\| _k = \sum \limits _{i = N+1-k}^{N} \lambda _i(\mathbf{L}_{\mathbf{D}}) \end{aligned}$$
(21)

where \(\lambda _i(\mathbf{L}_{\mathbf{D}})\) is the ith largest eigenvalue of \(\mathbf{L}_{\mathbf{D}}\), so the sum runs over its k smallest eigenvalues. According to [9], Eq. (21) can be rewritten as

$$\begin{aligned} \begin{array}{l} \sum \limits _{i = N+1-k}^{N} \lambda _i(\mathbf{L}_{\mathbf{D}}) = \min \limits _{\mathbf{Q}} \operatorname{Tr}(\mathbf{Q}\mathbf{L}_{\mathbf{D}}) \\ \quad \text {s.t.}\ \operatorname{Tr}(\mathbf{Q}) = k,\ \mathbf{0} \le \mathbf{Q} \le \mathbf{I} \end{array} \end{aligned}$$
(22)

So, Eq. (20) can be transformed to

$$\begin{aligned} \begin{array}{l} \min \limits _{\mathbf{D},\mathbf{Q}} \lambda _1 \operatorname{Tr}\!\left( \mathbf{Q}(\operatorname{Diag}(\mathbf{D}\varvec{1}) - \mathbf{D}) \right) + \frac{\lambda _2}{2}\left\| \mathbf{H} - \mathbf{D} \right\| _F^2 \\ \quad \text {s.t.}\ \mathbf{D} \ge 0,\ \mathbf{D} _{ii} = 0,\ \mathbf{D} = \mathbf{D}^T,\ \mathbf{0} \le \mathbf{Q} \le \mathbf{I},\ \operatorname{Tr}(\mathbf{Q}) = k \end{array} \end{aligned}$$
(23)
  • Fixing \(\varvec{D}\), update \(\varvec{Q}\) by

    $$\begin{aligned} \begin{array}{l} \mathbf{Q}^{t+1} = \arg \min \limits _{\mathbf{Q}} \lambda _1 \operatorname{Tr}\!\left( \mathbf{Q}(\operatorname{Diag}(\mathbf{D}\varvec{1}) - \mathbf{D}) \right) \\ \quad \text {s.t.}\ \operatorname{Tr}(\mathbf{Q}) = k,\ \mathbf{0} \le \mathbf{Q} \le \mathbf{I} \end{array} \end{aligned}$$
    (24)

    Inspired by [9], we get the solution of Eq. (24) as

    $$\begin{aligned} \mathbf{Q ^{t + 1}} = \mathbf{V _*}\mathbf{V _*}^T \end{aligned}$$
    (25)

    where \(\mathbf{V}_*\) is the matrix whose columns are the eigenvectors associated with the k smallest eigenvalues of \(\operatorname{Diag}(\mathbf{D}\varvec{1}) - \mathbf{D}\). (A sketch of this update and the following \(\varvec{D}\)-update is given after this list.)

  • Fixing \(\varvec{Q}\), update \(\varvec{D}\) by

    $$\begin{aligned} \begin{array}{l} \mathbf{D}^{t+1} = \arg \min \limits _{\mathbf{D}} \lambda _1 \operatorname{Tr}\!\left( \mathbf{Q}(\operatorname{Diag}(\mathbf{D}\varvec{1}) - \mathbf{D}) \right) + \frac{\lambda _2}{2}\left\| \mathbf{H} - \mathbf{D} \right\| _F^2 \\ \quad \text {s.t.}\ \mathbf{D} \ge 0,\ \mathbf{D} _{ii} = 0,\ \mathbf{D} = \mathbf{D}^T \end{array} \end{aligned}$$
    (26)

    Equation (26) can be rewritten as follows:

    $$\begin{aligned} \begin{array}{l} \mathbf{D}^{t+1} = \arg \min \limits _{\mathbf{D}} \frac{1}{2}\left\| \mathbf{D} - \left( \mathbf{H} - \frac{\lambda _1}{\lambda _2}(\operatorname{diag}(\mathbf{Q})\varvec{1}^T - \mathbf{Q}) \right) \right\| _F^2 \\ \quad \text {s.t.}\ \mathbf{D} \ge 0,\ \mathbf{D} _{ii} = 0,\ \mathbf{D} = \mathbf{D}^T \end{array} \end{aligned}$$
    (27)

    Let \(\mathbf{U} \in \mathbb{R}^{N \times N}\) with \(\mathbf{U} = \mathbf{H} - \frac{\lambda _1}{\lambda _2}(\operatorname{diag}(\mathbf{Q})\varvec{1}^T - \mathbf{Q})\), and define \(\hat{\mathbf{U}} = \mathbf{U} - \operatorname{Diag}(\operatorname{diag}(\mathbf{U}))\). The closed-form solution of Eq. (27) is

    $$\begin{aligned} \mathbf{D}^{t+1} = \left[ \frac{\hat{\mathbf{U}} + \hat{\mathbf{U}}^T}{2} \right]_+ \end{aligned}$$
    (28)
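A minimal sketch of the two updates above (Eqs. (25) and (28)); `numpy.linalg.eigh` returns eigenvalues in ascending order, so the first k eigenvectors are exactly those needed for \(\mathbf{Q}^{t+1}\). Function names are illustrative assumptions.

```python
import numpy as np

def update_Q(D, k):
    """Q-update of Eq. (25): Q = V_* V_*^T from the Laplacian of D."""
    L = np.diag(D.sum(axis=1)) - D        # Diag(D*1) - D
    _, vecs = np.linalg.eigh(L)           # ascending eigenvalues
    V = vecs[:, :k]                       # eigenvectors of k smallest
    return V @ V.T

def update_D(H, Q, lam1, lam2):
    """D-update of Eq. (28): zero diagonal, symmetrize, project onto >= 0."""
    U = H - (lam1 / lam2) * (np.diag(Q)[:, None] - Q)  # diag(Q)1^T - Q
    np.fill_diagonal(U, 0.0)              # U_hat: remove the diagonal
    return np.maximum((U + U.T) / 2.0, 0.0)
```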

3.3.2 Optimization model RBDSC-MN

(1) Updating \(\varvec{M}\):

By fixing (\(\varvec{D}\), \(\varvec{H}\)), the optimization sub-problem corresponding to \(\mathbf{M ^{t + 1}}\) in Eq. (11) is

$$\begin{aligned} \min _{\mathbf{M}} \left\| \mathbf{M} \right\| _* + \frac{1}{2}\left\| \mathbf{M} - \mathbf{MH} \right\| _F^2 + \frac{\lambda _3}{2}\left\| \mathbf{K} - \mathbf{M}^T\mathbf{M} - \mathbf{E} \right\| _F^2 \end{aligned}$$
(29)

Solving Eq. (29) proceeds exactly as for Eq. (13), except that \(\mathbf{K}_{\mathbf{G}}\) becomes

$$\begin{aligned} \mathbf{K}_{\mathbf{G}} = \mathbf{K} - \frac{\mathbf{I} - 2\mathbf{H} + \mathbf{H}\mathbf{H}^T}{2\lambda _3} - \mathbf{E} \end{aligned}$$
(30)

(2) Updating \(\varvec{H}, \varvec{D}\):

Comparing Eqs. (7) and (11), the updates of \(\varvec{H}\) and \(\varvec{D}\) in Eq. (11) are the same as Eqs. (18) and (20), respectively. Thus, the solutions for \(\varvec{H}\) and \(\varvec{D}\) are given by Eqs. (19) and (28), respectively.

(3) Updating \(\varvec{E}\):

By fixing (\(\varvec{M}, \varvec{H}, \varvec{D}\)), the optimization sub-problem corresponding to \(\mathbf{E ^{t + 1}}\) in Eq. (11) is

$$\begin{aligned} \min _{\mathbf{E}} \frac{\lambda _3}{2}\left\| \mathbf{E} - (\mathbf{K} - \mathbf{M}^T\mathbf{M}) \right\| _F^2 + \sum \limits _i \sum \limits _j \left( 1 - g_\sigma (\mathbf{E} _{ij}) \right) \end{aligned}$$
(31)

According to HQ analysis [52], the above problem (31) can be converted into

$$\begin{aligned} \min _{\mathbf{E},\mathbf{S}} \frac{\lambda _3}{2}\left\| \mathbf{E} - (\mathbf{K} - \mathbf{M}^T\mathbf{M}) \right\| _F^2 + \left\| \mathbf{S}^{\frac{1}{2}} \otimes \mathbf{E} \right\| _F^2 + \sum \limits _i \sum \limits _j \Psi (\mathbf{S} _{ij}) \end{aligned}$$
(32)

where \(\varvec{S}\) is the matrix of auxiliary variables \(\mathbf{S} _{ij}\), \(\otimes\) denotes the Hadamard (element-wise) product, and \(\Psi (\cdot)\) is the dual potential function of \(1 - g_\sigma (\cdot)\).

A new optimization algorithm HQ&AM is designed to solve Eq. (32). Specifically, in each iteration, \(\varvec{S}\) is optimized by HQ theory, and then \(\varvec{E}\) is optimized by alternating minimization.

  • Fixing \(\varvec{E}\), update \(\varvec{S}\) by

    $$\begin{aligned} \mathbf{S} _{ij}^{t+1} = g_\sigma (\mathbf{E} _{ij}^t) = \exp \!\left( -(\mathbf{E} _{ij}^t)^2 / 2\sigma ^2 \right) \end{aligned}$$
    (33)
  • Fixing \(\varvec{S}\), update \(\varvec{E}\) by

    $$\begin{aligned} \begin{array}{l} \mathbf{E}^{t+1} = \arg \min \limits _{\mathbf{E}} \frac{\lambda _3}{2}\left\| \mathbf{E} - (\mathbf{K} - \mathbf{M}^T\mathbf{M}) \right\| _F^2 + \left\| (\mathbf{S}^{t+1})^{\frac{1}{2}} \otimes \mathbf{E} \right\| _F^2 \\ \qquad = (\mathbf{K} - \mathbf{M}^T\mathbf{M}) ./ \left( \varvec{1} + \frac{2\mathbf{S}^{t+1}}{\lambda _3} \right) \end{array} \end{aligned}$$
    (34)

    where ./ denotes element-wise division. (A sketch of one such pass follows this list.)
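One HQ&AM pass for \(\varvec{E}\) in RBDSC-MN (Eqs. (33)–(34)), as a hedged sketch: \(\varvec{S}\) is refreshed from the current \(\varvec{E}\), and the new \(\varvec{E}\) follows in closed form. Names are ours, not from the paper's code.

```python
import numpy as np

def update_E_mn(K, M, E, lam3, sigma):
    """One S/E pass of Eqs. (33)-(34) for RBDSC-MN."""
    S = np.exp(-E ** 2 / (2 * sigma ** 2))   # Eq. (33), element-wise
    R = K - M.T @ M
    return R / (1.0 + 2.0 * S / lam3)        # Eq. (34), element-wise division
```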

3.3.3 Optimization model RBDSC-RN

(1) Updating \(\varvec{M}, \varvec{H}, \varvec{D}\):

Comparing Eqs. (11) and (12), the updates of \(\varvec{M}\), \(\varvec{H}\) and \(\varvec{D}\) in Eq. (12) are the same as Eqs. (29), (18) and (20), respectively. Thus, the solutions for \(\varvec{H}\) and \(\varvec{D}\) are given by Eqs. (19) and (28), and optimizing \(\varvec{M}\) proceeds as for Eq. (13) with \(\mathbf{K}_{\mathbf{G}} = \mathbf{K} - \frac{\mathbf{I} - 2\mathbf{H} + \mathbf{H}\mathbf{H}^T}{2\lambda _3} - \mathbf{E}\).

(2) Updating \(\varvec{E}\):

By fixing (\(\varvec{M}\), \(\varvec{H}\), \(\varvec{D}\)), the optimization sub-problem corresponding to \(\mathbf{E}^{t+1}\) in Eq. (12) is

$$\begin{aligned} \min _{\mathbf{E}} \frac{\lambda _3}{2}\left\| \mathbf{E} - (\mathbf{K} - \mathbf{M}^T\mathbf{M}) \right\| _F^2 + \sum \limits _i \left( 1 - g_\sigma ({\left\| \mathbf{E}^i \right\| _2}) \right) \end{aligned}$$
(35)

According to HQ analysis [52], the above problem (35) can be converted into

$$\begin{aligned} \min _{\mathbf{E},\mathbf{S}} \frac{\lambda _3}{2}\left\| \mathbf{E} - (\mathbf{K} - \mathbf{M}^T\mathbf{M}) \right\| _F^2 + \left\| \mathbf{S}^{\frac{1}{2}} \otimes \mathbf{E} \right\| _F^2 + \sum \limits _i \Psi (\mathbf{S} _i) \end{aligned}$$
(36)

where \(\varvec{S}\) is the matrix of auxiliary variables \(\mathbf{S} _i\).

  • Fixing \(\varvec{E}\), update \(\varvec{S}\) by

    $$\begin{aligned} \mathbf{S} _{ij}^{t+1} = g_\sigma \!\left( {\left\| (\mathbf{E}^t)^i \right\| _2} \right) = \exp \!\left( -\left\| (\mathbf{E}^t)^i \right\| _2^2 / 2\sigma ^2 \right), \quad i = 1,2,\ldots ,N \end{aligned}$$
    (37)

    where \({(\mathbf{E ^t})^i}\) is the ith row of matrix \({{\mathbf{ E }}^t}\) .

  • Fixing \(\varvec{S}\), update \(\varvec{E}\)

     The solution for \(\varvec{E}\) is then obtained by Eq. (34); a short sketch of this row-wise variant follows.
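A minimal sketch of the row-wise variant for RBDSC-RN (Eq. (37) followed by Eq. (34)); every entry in row i of \(\varvec{S}\) shares the weight computed from \(\left\| \mathbf{E}^i \right\| _2\). Names are ours.

```python
import numpy as np

def update_E_rn(K, M, E, lam3, sigma):
    """One S/E pass of Eqs. (37) and (34) for RBDSC-RN."""
    row_w = np.exp(-np.sum(E ** 2, axis=1) / (2 * sigma ** 2))  # Eq. (37)
    S = np.repeat(row_w[:, None], E.shape[1], axis=1)  # same weight per row
    return (K - M.T @ M) / (1.0 + 2.0 * S / lam3)      # Eq. (34)
```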

3.4 Clustering

Depending on whether the motion trajectory feature data contains noise and on the source of that noise, we select Algorithm 1, 2 or 3 to solve (7), (11) or (12), respectively.

[Algorithm 1: solving RBDSC, Eq. (7)]
[Algorithm 2: solving RBDSC-MN, Eq. (11)]
[Algorithm 3: solving RBDSC-RN, Eq. (12)]
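To make the overall flow concrete, below is a high-level sketch of Algorithm 1 (RBDSC, Eq. (7)) that reuses the update functions sketched in Sect. 3.3.1. The zero initialization, fixed iteration count, and the final spectral clustering on the symmetrized affinity \((|\mathbf{D}| + |\mathbf{D}^T|)/2\) are our assumptions, not the paper's released code.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def rbdsc(K, k, lam1=2.8, lam2=10.0, lam3=1.3e5, n_iter=100):
    """Alternating minimization for RBDSC (Eq. (7)), then spectral clustering."""
    N = K.shape[0]
    D = np.zeros((N, N))
    H = np.zeros((N, N))
    for _ in range(n_iter):
        K_G = K - (np.eye(N) - 2 * H + H @ H.T) / (2 * lam3)
        M = update_M(K_G, lam3)         # Eq. (17)
        H = update_H(M, D, lam2)        # Eq. (19)
        Q = update_Q(D, k)              # Eq. (25)
        D = update_D(H, Q, lam1, lam2)  # Eq. (28)
    W = (np.abs(D) + np.abs(D.T)) / 2.0  # affinity matrix for SC
    return SpectralClustering(n_clusters=k,
                              affinity='precomputed').fit_predict(W)
```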

4 Experimental evaluation

To demonstrate the superiority of our methods, we conduct extensive experiments on different video sequences. We first evaluate the proposed algorithms on rigid and non-rigid structure-from-motion: (1) rigid-body motion segmentation [33, 53] with complete trajectories: Hopkins155 Traffic sequences [44]; (2) non-rigid motion (e.g., human motion) segmentation [53]: Keck Gesture [21] and UT-Interaction [41]. Then, we run experiments using two frames of each sequence in the Hopkins155 dataset [44]. Finally, we conduct robustness experiments: (1) missing and corrupted trajectories: Hopkins12 motion sequences [46]; (2) outlying trajectories: Cars Turning, Books and Nrbooks.

4.1 Compared methods and settings

We compare the proposed methods with the following baselines: SSC [10], LRR [31], SSC-OMP [59], LSR [34], RSIM [19], LRKSC [20], BDR-B [33], BDR-Z [33]. Especially in the human motion segmentation experiment, in addition to comparing our models with the above models, we also compare them with specially designed human motion segmentation methods, including TSC [27], TSS [48], and LRTSC [49].

In the performance comparison, we use the publicly available source code where possible and tune parameters to obtain the best results. The misclassification ratio under the best mapping is widely used as a measure of clustering validity [10, 31, 34, 38, 57]. It is expressed as:

$$\begin{aligned} Err\% = \frac{\sum \nolimits _{i = 1}^{n} \upsilon (s_i, \operatorname{map}(g_i))}{n} \end{aligned}$$
(38)

where \(\upsilon (x,y) = \begin{cases} 1, & x \ne y \\ 0, & x = y \end{cases}\), \(s_i\) is the ground truth of the ith element, and \(\operatorname{map}(g_i)\) is the permutation mapping function that maps each cluster label \(g_i\) to the best-matching ground-truth label. \(Err\%\) is a negative metric: a lower value means better performance. As suggested in [20], we repeat each experiment 20 times and report the average result to mitigate the impact of initialization.
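A hedged sketch of Eq. (38): the Hungarian algorithm finds the best permutation mapping between predicted and ground-truth labels before counting mistakes (assuming the number of clusters equals the number of classes; the function name is ours).

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def err_rate(gt, pred):
    """Misclassification ratio (in %) under the best label mapping."""
    labels_p, labels_g = np.unique(pred), np.unique(gt)
    overlap = np.array([[np.sum((pred == p) & (gt == g)) for g in labels_g]
                        for p in labels_p])
    rows, cols = linear_sum_assignment(-overlap)   # maximize total overlap
    mapping = {labels_p[r]: labels_g[c] for r, c in zip(rows, cols)}
    mapped = np.array([mapping[p] for p in pred])
    return 100.0 * np.mean(mapped != gt)
```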

4.2 Rigid-body motion segmentation with complete trajectories: Hopkins155 traffic sequences [44]

The Hopkins155 dataset [44] is a standard benchmark for subspace-clustering-based motion segmentation algorithms. It includes the video sequences and the trajectory features tracked across all frames, and consists of 38 traffic sequences, 104 checkerboard sequences and 13 other sequences. The moving objects in the Hopkins155 Traffic sequences are cars and trucks, which produce typical rigid structure-from-motion. Figure 2 shows two examples from the Hopkins155 Traffic sequences.

Fig. 2
figure 2

Two examples from Hopkins155 Traffic sequences

The Hopkins155 Traffic sequences can be studied under the affine assumption, because rigid motion trajectories across multiple frames lie in an affine subspace. Since each video sequence in the Hopkins155 Traffic sequences has complete feature tracks and outliers have been manually removed from the dataset, we apply the RBDSC model (with Eq. (7) solved by Algorithm 1) to test the efficiency of our method. The parameters of RBDSC are set as \({\lambda _1} = 2.8, {\lambda _2} = 10, {\lambda _3} = 1.3 \times 10^5\), and the polynomial kernel \(k(x_1,x_2) = (0.8 + x_1^T x_2)^2\) is used to define \(\varvec{K}\) in our model. Table 2 reports the results of our RBDSC method and the baselines. RBDSC consistently obtains better results than the other advanced methods.
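For reference, the polynomial kernel above can be formed directly from the trajectory matrix; a short sketch assuming trajectories are stored as columns of X:

```python
import numpy as np

def poly_kernel(X, c=0.8, degree=2):
    """K_ij = (c + x_i^T x_j)^degree for columns x_i of X."""
    return (c + X.T @ X) ** degree
```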

Table 2 Segmentation errors (in %) on Hopkins155 traffic sequences

4.3 Non-rigid motion segmentation

Human motion is a typical non-rigid motion with great deformation, so it is very challenging to segment different moving objects. To demonstrate the effectiveness of our model for non-rigid motion, we conduct experiments on two typical human motion datasets, including Keck Gesture Dataset (Keck) [21] and UT-Interaction Dataset (UT) [41].

  • Keck [21] contains 14 different types of gestures, representing different military signals. Each of three subjects performs all 14 gestures and, in each sequence, repeats the same gesture three times. The resolution of the RGB frames is \(640 \times 480\). Figure 3 shows some examples of different gesture classes from the Keck dataset.

  • UT [41] contains 20 videos, each containing six types of human interaction: pushing, shaking hands, punching, kicking, pointing and hugging. The resolution of each video sequence is \(780 \times 480\), and the duration is about one minute. Figure 4 shows a few snapshots of human interaction videos from the UT dataset.

In these experiments, we use the low-level HoG features extracted in [49], with settings appropriate to each family of comparison methods. For the transfer-learning-based motion segmentation methods (TSC, TSS, LRTSC), we follow [49] and segment the test video sequence with the help of another (source) dataset: TSC, TSS and LRTSC all use the UT (Keck) dataset as the source and the Keck (UT) dataset as the target. For the other comparison algorithms (which use no source information), such as LRR, LSR, SSC and LRKSC, we only input video sequences from the target dataset. We set the parameters of RBDSC to \({\lambda _1} = 2.8, {\lambda _2} = 21, {\lambda _3} = 140\), and the kernel function \(k(x_1,x_2) = (0.8 + x_1^T x_2)^3\) is used to define \(\varvec{K}\) in our model. The results are recorded in Table 3. The error rate of RBDSC is clearly much lower than that of the 11 advanced clustering methods. In particular, on the UT dataset RBDSC achieves an average classification error rate 31.75% lower than LRTSC, the second-best model.

We then compare the computational cost of all methods, all implemented in MATLAB R2019a on a Windows computer with a 3.0 GHz Intel(R) Core(TM) i5-7400 CPU and 8.00 GB of memory. Figure 5 shows the running time (in seconds) of all methods on the UT dataset, ordered by ascending computational cost as SSC-OMP, LSR, RSIM, LRKSC, SSC, RBDSC, BDR-B, BDR-Z, LRR, TSS, TSC and LRTSC. RBDSC significantly reduces the computational cost compared with the second-best method in accuracy (LRTSC). These results demonstrate that RBDSC is an effective and time-saving method.

Fig. 3
figure 3

Examples of 7 different gesture classes in the Keck dataset [21]

Fig. 4
figure 4

Some snapshots of human interaction videos in the UT dataset [41]

Table 3 Segmentation Errors (in %) on two human motion datasets
Fig. 5
figure 5

Comparison of the running time of different algorithms on UT Dataset

4.4 Results on Two-frame of each sequence in the Hopkins155 Dataset [44]

The number of input frames matters greatly for real-time applications that must process sequentially arriving data. To evaluate our model with a small number of frames, we manually select the first and last frames [20, 28] of each video sequence in the Hopkins155 dataset to form a two-frame Hopkins155 dataset. Because two frames of a video sequence cannot accurately satisfy the affine camera model assumption, [13] proposed deriving the subspace from the epipolar constraint

$$\begin{aligned} {x'_d}^T \varvec{F} x_d = 0 \end{aligned}$$
(39)

where \(x_d = [x_d, y_d, 1]^T\) and \(x'_d = [x'_d, y'_d, 1]^T\) are the homogeneous coordinates of the two points corresponding to the 3-D point d in the two frames, and \(\varvec{F} \in \mathfrak {R}^{3 \times 3}\) is the fundamental matrix. Equation (39) can be rewritten as [18]

$$\begin{aligned} f^T \operatorname{vec}(x'_d x_d^T) = 0 \end{aligned}$$
(40)

where \(f \in \mathfrak {R}^{9}\) denotes the vectorized fundamental matrix \(\varvec{F}\), and \(\operatorname{vec}(x'_d x_d^T) = (x_d x'_d, x_d y'_d, x_d, y_d x'_d, y_d y'_d, y_d, x'_d, y'_d, 1)^T\). Thus, \(\operatorname{vec}(x'_d x_d^T)\) lies in an epipolar subspace of dimension 8 [18]. We assume that two perspective images with multiple motions span multiple epipolar subspaces, because different motions correspond to different fundamental matrices [20].
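A minimal sketch of the 9-D epipolar embedding in Eq. (40): each two-frame correspondence \((x, y) \rightarrow (x', y')\) maps to \(\operatorname{vec}(x'_d x_d^T)\), so points sharing one fundamental matrix lie in an 8-dimensional subspace. The function name and array layout are our assumptions.

```python
import numpy as np

def epipolar_embedding(pts1, pts2):
    """pts1, pts2: (N, 2) arrays of matched image coordinates."""
    x, y = pts1[:, 0], pts1[:, 1]
    xp, yp = pts2[:, 0], pts2[:, 1]
    return np.stack([x * xp, x * yp, x,
                     y * xp, y * yp, y,
                     xp, yp, np.ones_like(x)], axis=1)  # shape (N, 9)
```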

We replicate the data 30 times to increase the ambient dimension, and set the parameters of the RBDSC model (with Eq. (7) solved by Algorithm 1) to \({\lambda _1} = 3, {\lambda _2} = 5, {\lambda _3} = 1.3 \times 10^5\). The kernel function \(k(x_1,x_2) = (1.8 + x_1^T x_2)^2\) is used to define \(\varvec{K}\) in our model.

Table 4 reports the results of our RBDSC method and the baselines. In-depth analysis of these results yields the following observations.

  • Our model RBDSC generally achieves the lowest misclassification rate among these advanced algorithms.

  • RBDSC is superior to LRKSC overall, mainly because RBDSC introduces a direct block diagonal constraint [33], while LRKSC uses a low-rank prior to approximate the block diagonal structure only indirectly. Figure 6 compares the block-diagonal property of the affinity matrices obtained by different clustering methods (best viewed zoomed in on screen). RBDSC clearly has the best block diagonal structure.

  • Our method RBDSC consistently outperforms BDR-B and BDR-Z, which clearly demonstrates the benefits of combining the block diagonal regularizer (BDR) [33] with kernel mapping and the low-rank kernel learning strategy. Specifically, RBDSC can handle non-linear data via low-rank kernel mapping when the sequences do not strictly meet the affine camera model assumptions.

Table 4 Segmentation errors (in %) on two-frame Hopkins155 Dataset
Fig. 6
figure 6

Visual comparison of the block-diagonal structure of the affinity matrices obtained on the L1R2RCT_B_g23 sequence by different models

4.5 Experimental results on robustness

4.5.1 Missing and corrupted trajectories: Hopkins12 Dataset [46]

In practice, some trajectory features in some image frames are lost due to tracker limitations or occlusion. The Hopkins12 dataset contains 12 motion sequences: three 3-motion sequences and nine 2-motion sequences, all of which are missing some trajectory features. Corrupted trajectories can also emerge when the tracker unknowingly loses track of certain feature points. Segmenting the moving subjects in these video sequences is therefore challenging.

We test our model RBDSC-MN and several advanced subspace segmentation methods on the Hopkins12 dataset with incomplete or corrupted trajectories. Figure 7 shows some frames of 3-motion sequence in the Hopkins12 dataset.

Fig. 7
figure 7

Some frames from a 3-motion sequence in the Hopkins12 dataset [46]

We set the parameters of the RBDSC-MN model (with Eq. (11) solved by Algorithm 2) to \({\lambda _1} = 2.8, {\lambda _2} = 10, {\lambda _3} = 1000\), and the kernel function \(k(x_1,x_2) = (0.8 + x_1^T x_2)^2\) is used to define \(\varvec{K}\) in our model. The results of RBDSC-MN and the comparison methods on the Hopkins12 dataset are listed in Table 5, which clearly indicates the benefits of RBDSC-MN in the presence of missing trajectories. RSIM is the second-best method, but it relies on an additional optimization strategy on the Grassmann manifold to estimate the orthogonal matrix \(\varvec{V}\) and thereby reduce segmentation errors when trajectories are incomplete. The competitiveness of RBDSC-MN comes from using correntropy as the loss function, which lets the robust subspace clustering handle missing trajectories, a form of non-Gaussian noise. This demonstrates that RBDSC-MN can effectively handle non-Gaussian noise in motion segmentation without any preprocessing or postprocessing step.

Table 5 Segmentation errors (in %) on two-frame Hopkins12 Dataset with missing trajectories

4.5.2 Outlying trajectories: Cars Turning, Books and Nrbooks

Dynamic scenes often contain sample outliers that do not belong to any motion trajectory in the scene. Figure 8 displays three typical video trajectories containing both outlying trajectories (outliers) and missing data samples; more details are given in Table 6. We designed the RBDSC-RN model to remove outlier rows, and use three representative motion trajectories (Cars Turning, Books and Nrbooks) to evaluate its performance. Table 7 lists the subspace segmentation results of the different methods. RBDSC-RN clearly achieves the lowest mis-segmentation rates, mainly because it uses the non-convex row-correntropy to eliminate outliers and is therefore less affected by large outliers.

Fig. 8
figure 8

Example image frames from trajectories with outliers

Table 6 Detailed information about the three datasets, including Cars Turning, Books and Nrbooks3
Table 7 Segmentation errors (in %) on datasets with outlying trajectories

4.6 Parameter sensitivity

Our models contain three important parameters: \({\lambda _1}\), \({\lambda _2}\) and \({\lambda _3}\), where \({\lambda _1}\) controls the weight of the BDR term \({\left\| \varvec{D} \right\| _k}\), and \({\lambda _2}\) and \({\lambda _3}\) control the weights of \(\left\| \varvec{H} - \varvec{D} \right\| _F^2\) and \(\left\| \varvec{E} - (\varvec{K} - \varvec{M}^T\varvec{M}) \right\| _F^2\) (or \(\left\| \varvec{K} - \varvec{M}^T\varvec{M} \right\| _F^2\)), respectively. We evaluate the parameter sensitivity of the proposed model on the Hopkins12 dataset over a range of values and record the average misclassification rate of the 12 sequences. Figure 9 shows the results for \({\lambda _1} \in \{2.5, 2.6, 2.7, 2.8, 2.9, 3\}\), \({\lambda _2} \in \{5, 10, 15, 20, 25, 30\}\) and \({\lambda _3} \in \{0.01, 0.1, 1, 10, 100, 1000\}\). On the Hopkins12 dataset, our method achieves low segmentation errors when \({\lambda _1}\), \({\lambda _2}\) and \({\lambda _3}\) are no less than 2.8, 10 and 0.1, respectively.

Fig. 9
figure 9

Segmentation error rates on the Hopkins12 dataset versus the parameters \({\lambda _1}\), \({\lambda _2}\) and \({\lambda _3}\)

5 Conclusion

In this work, a unified rank-constrained block diagonal subspace clustering method for motion segmentation is proposed, which handles both rigid-body and non-rigid-body motion segmentation. In addition, two robust motion segmentation models (RBDSC-MN and RBDSC-RN) are put forward to deal with missing trajectories, corrupted trajectories and outlying trajectories. Experiments on commonly used video sequences show that our methods improve the accuracy of motion segmentation.

We have demonstrated the advantages of our algorithms theoretically and experimentally, but our methods have certain limitations. First, adding kernel mapping constraints and KCIM to the block diagonal representation improves the handling of non-linear data (i.e., motion trajectory features), but it brings some computational overhead; we plan to study how to speed up our methods, for example using GPUs. Second, our methods do not consider crowd motion segmentation, where many people move independently and each person's small actions are part of the overall crowd movement. In the future, we will conduct related research to make our methods suitable for crowd motion segmentation.