1 Introduction

The effective use of high-dimensional data requires a strong framework and a flexible methodology that tightly connect information science with the original purpose of analyzing datasets drawn from various scientific disciplines. A number of manifold learning methods have been proposed for solving the problem of nonlinear dimensionality reduction (NLDR), e.g., isometric feature mapping (ISOMAP) [1], locally linear embedding (LLE) [2], and Laplacian eigenmaps (LE) [3]. These methods assume that real-world data can be embedded into a lower-dimensional space while preserving the geometrical structure [4]. If low-dimensional representations of the data can be obtained, then clustering, visualization, and searching become more convenient and effective. Consequently, manifold learning algorithms have been widely exploited in a broad range of applications, including image classification [5, 6], image retrieval [7], activity recognition [8] and biomedical information detection [9].

Classical manifold learning algorithms often assume that the data reside on a single manifold. In practice, however, a mixture of manifolds typically occurs. For instance, in handwritten digit recognition, each digit forms its own manifold in the feature space; in computer vision, motion segmentation requires separating feature trajectories that lie on distinct manifolds, one per moving object. These manifolds may intersect or partially overlap, and they may differ in dimensionality, orientation, and density.

At present, several multi-manifold learning methods have been proposed. K-Manifolds [10] was first proposed to roughly classify unorganized data lying on multiple intersecting nonlinear manifolds. Unfortunately, this method is limited to intersecting manifolds because the estimation of geodesic distances fails when the clusters are widely separated. In contrast, the decomposition-composition (D-C) method [11] and the DC-Isomap algorithm [12] can only classify well-separated multi-manifolds. Their decomposition processes split the multi-manifold into several sub-manifolds, and their composition processes then produce the low-dimensional embedding. Many spectral clustering [13,14,15] and multi-manifold clustering [16,17,18] methods are capable of clustering multi-manifolds into several sub-manifolds; however, these methods are not capable of dimensionality reduction for multi-manifolds.

t-SNE [19, 20] and Umap [21] are two dimensionality reduction and visualization techniques that produce significantly better visualizations by reducing the tendency to crowd points together in the center of the map. In other words, they can cluster high-dimensional data that lie on several different, but related, low-dimensional manifolds and then visualize them in 2D or 3D space. However, they are not suitable for intersecting manifolds, especially in low-dimensional spaces: the points at the intersections remain very similar after mapping, and the “crowding problem” persists. Some supervised or semi-supervised manifold learning methods have been proposed to cluster and visualize multi-manifolds. These methods typically fall into one of two categories [22]: distance-based and constraint-based. The distance-based methods modify the distance metric using class information so that the objective function separates the classes. The S-Isomap algorithm [23] utilizes class information to form a geodesic distance matrix that distorts the topology among multi-manifolds, although this method can also be applied to a single manifold with noise. The multi-manifold semi-supervised learning (MMSSL) algorithm [24] is a semi-supervised version of Isomap that is not guaranteed to correctly cross the intersections of the manifolds. M-Isomap [25] first introduces local pairwise constraints to construct the intrinsic and penalty graphs and then uses the graph embedding framework to determine the final optimization criterion. However, M-Isomap only preserves the local geodesic distances; thus, it only focuses on increasing the distances between the local interclass points rather than the holistic interclass separation. The multi-manifold discriminant Isomap (MMD-Isomap) algorithm [26] improves on M-Isomap by constructing global pairwise constraints and optimizing its objective function. M-Isomap and MMD-Isomap are both supervised versions of Isomap. The semi-supervised local multi-manifold Isomap (SSMM-Isomap) [27] improves these two methods by extending manifold feature learning to a semi-supervised scenario, a linear extension scenario and a local feature learning scenario simultaneously. Although these constraint-based methods perform well for multi-manifold classification, it is difficult to select the parameters and to separate the embeddings of the sub-manifolds, especially for more than three intersecting manifolds.

Compared with supervised learning methods, learning descriptors from unlabeled data is preferable because in many cases (e.g., image registration, template matching, or object tracking, especially in applications involving images from sensors that are relatively uncommon in the vision community, such as infrared, SAR, hyperspectral and medical images) collecting labels is expensive and even impractical, while unlabeled local images are ubiquitous and easy to collect [28].

In this paper, we propose an unsupervised multi-manifold discriminant Isomap (UMD-Isomap) algorithm for the nonlinear dimensionality reduction and clustering of data lying on multi-manifolds. Our UMD-Isomap algorithm constructs the global pairwise constraints by training m mixtures of probabilistic principal component analyzers (MPPCA) [29, 30] and propagating the local tangent subspaces, and it improves the objective functions of MMD-Isomap [26] and SSMM-Isomap [27] to reduce the dimensionality of multi-manifolds. The pairwise constraints encode the information of the sub-manifolds, and a priori determination of the number of classes or sub-manifolds is not required. The sub-manifolds are constructed dynamically through the process of tangent subspace propagation. If the number of sub-manifolds is determined in advance, the pairwise constraints can be re-constructed by ensembling c sets of clustering results. It is noteworthy to highlight several aspects that demonstrate the contributions of the proposed UMD-Isomap method:

  • An unsupervised method is used for the dimensionality reduction and clustering of multi-manifold data; this approach not only creates the global pairwise constraints for clustering the multi-manifold but also establishes a more suitable objective function for its dimensionality reduction.

  • An effective method is utilized to compute the global pairwise constraints for multi-manifolds. First, m MPPCA “patches” are trained on the multi-manifold dataset. Then their tangent spaces are propagated to construct the global pairwise constraints. At the same time, the sub-manifolds are created. If the number of sub-manifolds is known, the pairwise constraints can be re-constructed by ensembling c sets of cluster results.

  • A suitable objective function is optimized, and two effective methods are used for the dimensionality reduction of the multi-manifold in lower-dimensional spaces. The method not only improves on MMD-Isomap and SSMM-Isomap, but its parameters are also easy to determine, and it is suited to more than three intersecting manifolds.

The rest of the paper is organized as follows. Section 2 reviews the related works. Section 3 describes the proposed method with details. Section 4 presents the experimental results and discussions. The conclusion is presented in Section 5.

2 Related works and necessary definitions

Suppose that the set of data points \({X}=\{{{x}_{1}},...,{{x}_{n}}\}\) lies in the high-dimensional space \({{R}^{D}}\) and that the feature space is \({{R}^{d}}\).

2.1 Mixtures of probabilistic PCA

In the mixtures of probabilistic PCA (MPPCA) model proposed in [29, 30], each d-dimensional data vector \(x_n\) in the i.i.d. sample \({{\mathbf {X}}=\{x_n\}_{n=1}^N}\) is generated in two steps. First, a natural number j is drawn from the distribution \({p(j)=\pi _j,j=1,...,m}\) under the constraint \({\sum _{j=1}^m \pi _j=1}\). Second, given this j, \(x_n\) is generated from the restricted factor analysis model

$$\begin{aligned} x_n|j={{\mathbf {A}}_j}{{\mathbf {y}}_{nj}}+\mu _j+\epsilon _{nj}; \nonumber \\ {\mathbf {y}}_{nj}\sim {\mathcal {N}}({\mathbf {0}},{\mathbf {I}}),\epsilon _{nj}\sim {\mathcal {N}}({\mathbf {0}},\sigma _j^2{\mathbf {I}}), \end{aligned}$$
(1)

where \({\mathbf {I}}\) denotes an identity matrix of the appropriate size, \(\mu _j\) is a d-dimensional mean vector, \({\mathbf {A}}_j\) is a \(d \times k_j\) factor loading matrix, \({\mathbf {y}}_{nj}\) is an independent \(k_j\)-dimensional latent factor vector and \(\sigma _j^2\) is the noise variance associated with component j. Clearly, this is a mixture of m probabilistic PCA sub-models with mixture proportions \(\pi _j\). Unlike the traditional factor analysis model, which assumes that \(\epsilon _{nj}\) has a diagonal covariance, MPPCA assumes a scalar covariance.

Let \(\theta =\{{\pi _j},{\theta _j};j=1,...,m\}\), \(\theta _{j}=\left( {{\mathbf {A}}_j},{\mu _j},{\sigma }_j^2\right)\), and \({\varvec{\Sigma }}_{j}={{\mathbf {A}}}_{j}{\mathbf {A}}_{j}^{'}+{\sigma }_j^2{\mathbf {I}}\). Under the MPPCA model, the log likelihood of observing the data \({\mathbf {X}}\) is

$$\begin{aligned} {{\mathcal {L}}({\mathbf {X}}|\theta )}={{\sum }_{n=1}^N log\left[ {\sum }_{j=1}^m \pi _j p(x_n | \theta _j)\right] }, \end{aligned}$$
(2)

where \({p(x_n|\theta _j)}={(2\pi )^{-d/2} |{\varvec{\Sigma }}_j| ^{-1/2}\cdot \exp \{-\frac{1}{2}(x_n-\mu _j)^T{\varvec{\Sigma }}_j^{-1}(x_n-\mu _j)\}}\). The maximum likelihood estimate \({\hat{\theta }}\) is defined as

$$\begin{aligned} {\hat{\theta }}=\mathop {argmax}\limits _{\theta }{{\mathcal {L}}({\mathbf {X}}|\theta )}. \end{aligned}$$
(3)

If the prior \(p(\theta )\) for model parameter \(\theta\) is available, then the maximum a posteriori estimate \({\hat{\theta }}\) can be obtained as

$$\begin{aligned} {{\hat{\theta }}=\mathop {argmax}\limits _{\theta } \{ {\mathcal {L}}({\mathbf {X}}|\theta )+logp(\theta )\}}. \end{aligned}$$
(4)

Given the number of components m and the subspace dimensions \({\mathbf {k}}=(k_1,k_2,...,k_m)\), the parameter \(\theta\) in the MPPCA model can be estimated by the well-known expectation-maximization (EM) algorithm [31].
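
As a concrete illustration, the following minimal sketch (Python/NumPy with assumed parameter containers, not the authors' Matlab implementation or the EM fit itself) evaluates the log likelihood of Eq. (2) for given MPPCA parameters, using the component covariances \(\Sigma _j = A_j A_j^T + \sigma _j^2 I\).

```python
import numpy as np
from scipy.stats import multivariate_normal

def mppca_log_likelihood(X, pis, mus, As, sigma2s):
    """Evaluate Eq. (2) for given MPPCA parameters (illustrative sketch).
    X       : (N, d) data matrix
    pis     : (m,) mixing proportions, summing to one
    mus     : list of m mean vectors, each of shape (d,)
    As      : list of m factor loading matrices, each of shape (d, k_j)
    sigma2s : (m,) noise variances."""
    N, d = X.shape
    m = len(pis)
    densities = np.zeros((N, m))
    for j in range(m):
        # Sigma_j = A_j A_j' + sigma_j^2 I  (scalar, not diagonal, noise covariance)
        Sigma_j = As[j] @ As[j].T + sigma2s[j] * np.eye(d)
        densities[:, j] = pis[j] * multivariate_normal.pdf(X, mean=mus[j], cov=Sigma_j)
    # a log-sum-exp formulation would be numerically safer for high dimensions
    return np.sum(np.log(densities.sum(axis=1)))
```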

2.2 MMD-Isomap and SSMM-Isomap

MMD-Isomap is a supervised multi-manifold learning method built on Isomap [26]. The algorithm uses global pairwise constraints [32,33,34,35,36] to formulate its optimization problem. In the pairwise constraints, some pairs of points belong to the same class, and their relationships are recorded in the must-link (ML) set; other pairs of points belong to different classes, and their relationships are recorded in the cannot-link (CL) set. The global pairwise ML and CL sets are defined as:

$$\begin{aligned} ML&= \{(x_i,x_j)| i\ne j,l(x_i)=l(x_j)\}, \end{aligned}$$
(5)
$$\begin{aligned} CL&= \{(x_i,x_j)| i\ne j,l(x_i)\ne l(x_j)\}, \end{aligned}$$
(6)

where \(l(x_i)\epsilon \{1,2,...,c\}\) is the class label of \(x_i\), \(i=1,2,...,N\), and c is the number of classes. MMD-Isomap then aims to preserve the global geometric structure of the intra-class data and to separate the inter-class data by solving the following two problems:

$$\begin{aligned} J_{ML}&={\min _Y}{{\sum }_{(x_i,x_j)\epsilon ML}(d(y_i,y_j)-{d_G}(x_i,x_j))^2} , \end{aligned}$$
(7)
$$\begin{aligned} J_{CL}&={\max _Y}{{\sum }_{(x_i,x_j)\epsilon CL}d^2(y_i,y_j)} . \end{aligned}$$
(8)

where \(d(y_i,y_j)=\Vert y_i-y_j\Vert\) denotes the Euclidean distance between the low-dimensional representations \(y_i\) and \(y_j\), and \(d_G(x_i,x_j)\) is the shortest path distance approximating the geodesic distance. Minimizing Eq. (7) is equivalent to preserving the pairwise distance between \(d(y_i,y_j)\) and \(d_G(x_i,x_j)\) for every sample pair \((x_i,x_j)\epsilon {ML}\), and maximizing Eq. (8) aims to separate all the sample pairs \((x_i,x_j)\epsilon {CL}\). Finally, MMD-Isomap solves the criterion:

$$\begin{aligned} J=\, & {} {\min _Y} \sigma (Y), \nonumber \\ \sigma (Y)=\, & {} \frac{1-\alpha }{|ML|} {\sum _{(x_i,x_j)\epsilon ML}(d(y_i,y_j)-d_G(x_i,x_j))^2 -\frac{\alpha }{|CL|}{\sum _{(x_i,x_j)\epsilon CL}d^2(y_i,y_j)}}, \end{aligned}$$
(9)

and \(\alpha \epsilon (0,1)\) is a weighting parameter that trades off the effects of discrimination over the ML and CL constraints. Note that MMD-Isomap solves Eq. (9) using the Scaling by MAjorizing a COmplicated Function (SMACOF) [37, 38] approach.
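
For illustration, the label-based constraint sets of Eqs. (5) and (6) can be represented as boolean indicator matrices; the sketch below is a straightforward (assumed) encoding rather than code from [26].

```python
import numpy as np

def build_pairwise_constraints(labels):
    """Build ML / CL indicator matrices from class labels, as in Eqs. (5)-(6)."""
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(labels), dtype=bool)   # exclude pairs with i == j
    ML = same & off_diag                          # same class, different points
    CL = ~same                                    # different classes
    return ML, CL
```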

SSMM-Isomap is an improvement of MMD-Isomap [27]. First, the objective function of MMD-Isomap is refined. Then, two solution schemes are presented: SMACOF and eigen-decomposition. The main contribution of SSMM-Isomap is its objective function:

$$\begin{aligned}&J = {\min _{Y,P}} \sigma (Y,P),\nonumber \\&s.t.YY^T=I, \nonumber \\&\sigma (Y,P)=\frac{1-\alpha }{|ML|} {\sum _{(x_i,x_j)\epsilon ML}(d(y_i,y_j)-d_G(x_i,x_j))^2-\frac{\alpha }{|CL|}{\sum _{(x_i,x_j)\epsilon CL}d^2(y_i,y_j)}} \nonumber \\&+\beta \sum \limits _{i=1}^{N} \Vert {y_i-\sum \limits _{x_j\in N(x_i)} W_{ij}y_j}\Vert _2^{2}+\gamma \sum \limits _{i=1}^{N} \Vert {Px_i-y_i}\Vert _2^{2}, \end{aligned}$$
(10)

The first two terms in SSMM-Isomap are the same as in MMD-Isomap. The third term is added to enable the model to compute low-dimensional local manifold features from both labeled and unlabeled data; the reconstruction weight matrix W of the training samples is computed by minimizing the LLE-style optimization problem [2]. The fourth term is included to enable SSMM-Isomap to learn an explicit projection, or feature extractor, for handling external new data: it encodes the mismatch between the features embedded by the extractor and the reduced manifold features, so that the learnt extractor P can embed external new data efficiently by projection [27].

3 Unsupervised multi-manifold discriminant ISOMAP

3.1 Problem formulation

The proposed UMD-Isomap algorithm can be seen as a refinement of MMD-Isomap and SSMM-Isomap. The algorithm not only computes the pairwise constraints ML and CL by training MPPCA “patches” and propagating their local tangent spaces, but also improves the objective functions of MMD-Isomap and SSMM-Isomap. Furthermore, two solutions are proposed to minimize the objective function and obtain low-dimensional embedding results.

It is difficult to determine whether a connected component consists of a single manifold or multiple intersecting manifolds, and how many manifolds exist in the connected component. Although the data are located globally on or close to multiple smooth nonlinear manifolds, locally, each point and its neighbors lie on a linear patch of a manifold [2, 39]. Moreover, the local tangent space at each point provides a good low-dimensional linear approximation to the local geometric structure of the manifold. As will become clear shortly, at the intersections of different manifolds, points on the same manifold have similar local tangent spaces, whereas points on different manifolds have dissimilar tangent spaces [16].

Generally, the tangent space at each point can be estimated by performing a singular value decomposition (SVD) of the point and its neighbors [40, 41]. When two points x and y are very close to each other, their local tangent spaces are very similar even if they come from different manifolds. This occurs because their local neighborhoods, based on the Euclidean distance, have a large overlap, resulting in similar local covariance matrices. Therefore, this traditional definition of the local tangent space does not work well for intersecting manifolds.

The global nonlinear manifolds can be well approximated locally by a series of local linear manifolds, and probabilistic principal component analyzers can successfully deal with intersecting linear manifolds [29]. Moreover, the points approximated by the same linear analyzer usually have similar local tangent spaces, which can also be well approximated by the principal subspace of the local analyzer [16]. Therefore, we can train many local linear analyzers to approximate the underlying manifolds. Subsequently, the local tangent space of a given sample is determined by the principal subspace of its corresponding local analyzer. Thus, our UMD-Isomap first trains m MPPCA “patches” and then propagates the points in these patches to create the sub-manifolds. At the same time, the points in the same sub-manifold must be linked, and the points in different sub-manifolds cannot be linked; that is, the pairwise constraints ML and CL are created. Subsequently, the optimized objective function is proposed based on the pairwise constraints, and the two solutions (UMDwithSMACOF and UMDwithEd) produce the final embedding results of the multi-manifold.

It is noteworthy that the number of classes created by the UMD-Isomap algorithm may be higher than the real number of sub-manifolds because the algorithm partitions the dataset according to its topological structure without regard to class labels. The number of classes does not have to be determined in advance in our algorithm because it is affected only by the parameters m and \(\theta _0\) and by the propagation of the tangent subspaces. If the number of classes is known in advance, the proposed algorithm can merge its clustering results using clustering ensemble methods. For example, the MNIST dataset contains thousands of handwritten digit images of 10 digits, and the Coil20 dataset contains 1440 images of 20 objects. After running the normal UMD-Isomap algorithm, the number of classes (sub-manifolds) may be higher than 10 (or 20). In this case, the UMD-Isomap algorithm creates \(c^*\) sets of ML and CL and accumulates a consensus similarity matrix for a clustering ensemble algorithm. The result of the clustering ensemble algorithm consists of new pairwise constraints with a fixed number of sub-manifolds.

[Algorithm 1: UMD-Isomap]

3.2 Construct the neighborhood

In the UMD-Isomap algorithm (Algorithm 1), the neighborhood of \(x_i\) can be computed by the k-nearest neighbor (K-NN) algorithm or by the dynamical neighborhood algorithm [42]. K-NN is simple and efficient, but the parameter K is not easy to determine, and one global setting may not work well for the entire manifold. The selection of the neighborhood should be data-driven and depend mainly on the local topological structure. The dynamical neighborhood algorithm yields a more accurate neighborhood relationship because it considers the sampling density and the manifold curvature. Thus, the neighborhood graph determined by the dynamical neighborhood algorithm improves the accuracy of propagating the sub-manifolds.
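
As a point of reference, a plain K-NN neighborhood graph (the simpler of the two options; the dynamical neighborhood algorithm of [42] is not reproduced here) can be built as in the following sketch.

```python
import numpy as np

def knn_graph(X, K):
    """Symmetric K-nearest-neighbor adjacency matrix based on Euclidean distances."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)                 # a point is not its own neighbor
    nn_idx = np.argsort(dist, axis=1)[:, :K]       # indices of the K closest points
    A = np.zeros_like(dist, dtype=bool)
    rows = np.repeat(np.arange(X.shape[0]), K)
    A[rows, nn_idx.ravel()] = True
    return A | A.T                                  # symmetrize the graph
```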

3.3 Construct the pairwise constraints

It is difficult to determine whether two points lie on the same manifold by using only local geometric information. Intuitively, for two points in the same local area, if (a) they are close to each other and (b) they have similar tangent spaces, then there is a high probability that they lie on the same manifold; if they have dissimilar local tangent spaces, they are very likely to come from different manifolds [16].

UMD-Isomap defines the dissimilarity between the tangent spaces \({{\mathbf {T}}_i}\) and \({{\mathbf {T}}_j}\) as:

$$\begin{aligned} v_{ij} = \sqrt{1 - {\sigma ^2}(U_i^T{U_j})}, \end{aligned}$$
(11)

where \({U_i}\) and \({U_j}\) are the bases of the tangent subspaces \({{\mathbf {T}}_i}\) and \({{\mathbf {T}}_j}\), and \(\sigma\) is the minimum singular value of \(U_i^T{U_j}\).
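
In code, Eq. (11) amounts to one small SVD; the sketch below assumes the bases \(U_i\) and \(U_j\) have orthonormal columns.

```python
import numpy as np

def tangent_dissimilarity(U_i, U_j):
    """Eq. (11): dissimilarity between two tangent subspaces given orthonormal bases."""
    s = np.linalg.svd(U_i.T @ U_j, compute_uv=False)
    sigma_min = s.min()                       # cosine of the largest principal angle
    return np.sqrt(max(0.0, 1.0 - sigma_min ** 2))
```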

After m patches are trained by MPPCA, if a pair of points \((x_i,x_j)\) are neighbors and lie in different patches \(p_i\) and \(p_j\), then the patches \(p_i\) and \(p_j\) are adjacent to each other, recorded as \(\mathbf {N(p_i,p_j)}\), and the pair \((x_i,x_j)\) is recorded as a boundary point pair. Because the points within a “patch” belong to the same sub-manifold, UMD-Isomap only needs to compute the dissimilarity between the boundary point pairs.

Before propagating the tangent subspaces of the points in the “patches”, the relationship between these patches is defined first:

$$\begin{aligned} R_{ij} = {\left\{ \begin{array}{ll} 1,&{} if \; \mathbf {N(p_i,p_j)} \; and \; v(p_i,p_j)<\theta _0\\ -1,&{} if \; \mathbf {N(p_i,p_j)} \; and \; v(p_i,p_j)>\theta _0\\ 0,&{} otherwise. \end{array}\right. } \end{aligned}$$
(12)

where \(v(p_i,p_j)\) is the dissimilarity between the tangent subspaces of the two patches \(p_i\) and \(p_j\), and \(\theta _0\) is the threshold on the dissimilarity between the tangent spaces of these patches.

If \(R_{ij}=1\), patches \(p_i\) and \(p_j\) are neighbors, and it is possible that they lie on the same sub-manifold. If \(R_{ij}=-1\), patches \(p_i\) and \(p_j\) are neighbors, but they cannot lie on the same sub-manifold; that is, they lie in an intersecting area. If \(R_{ij}=0\), the patches \(p_i\) and \(p_j\) are not neighbors.

The first pair of “patches” to be propagated should be located at the boundary point pair with the minimum local tangent subspace dissimilarity. That is,

$$\begin{aligned} p_{start} = \{(p_i,p_j)|\arg {\min }(v_{st}),v_{st}<\theta _0,x_s \in p_i,x_t \in p_j, R_{ij}=1\} \end{aligned}$$
(13)

After the first pair of “patches” is determined, the next patch is determined as follows:

$$\begin{aligned} p_{next} = \{p_j|\arg {\min }(v_{st}),v_{st}<\theta _0,x_s \in p_i,x_t \in p_j,p_i \in Ped, p_j \notin Ped,R_{ij} \ne -1\}. \end{aligned}$$
(14)

where Ped is the set of “patches” that have already been propagated.

If \(p_{next}\) is empty, the propagated points in Ped must be linked because they lie on the same sub-manifold; thus \(ML_{Ped,Ped}=1\). These points cannot be linked to the remaining points because they lie on different sub-manifolds; thus \(CL_{Ped,nPed}=1\), where nPed is the set of “patches” that have not been propagated in this round. The “patches” in Ped cannot be propagated again; therefore, their relationships with the other “patches” are set to \(R_{ij}=-1\).
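
The propagation of Eqs. (13)-(14) can be sketched as follows; the function below is an illustrative, patch-level reading of ConstructPC (the relation matrix R, the boundary dissimilarities V, and the handling of isolated patches are assumptions, not the published code).

```python
import numpy as np

def propagate_patches(R, V):
    """Group MPPCA patches into sub-manifolds by tangent-subspace propagation.
    R : (m, m) relation matrix from Eq. (12)  (+1 / -1 / 0)
    V : (m, m) minimum boundary-pair dissimilarities v(p_i, p_j),
        with np.inf where the patches are not adjacent.
    Returns a list of sub-manifolds, each a set of patch indices; points of
    patches in the same set are must-linked, across sets cannot-linked."""
    m = R.shape[0]
    R = R.copy()
    remaining = set(range(m))
    sub_manifolds = []
    while remaining:
        # seed: the adjacent, compatible pair with the smallest dissimilarity (Eq. (13))
        best, seed = np.inf, None
        for i in remaining:
            for j in remaining:
                if i < j and R[i, j] == 1 and V[i, j] < best:
                    best, seed = V[i, j], (i, j)
        if seed is None:                       # no compatible pair left: isolated patches
            sub_manifolds.extend({i} for i in remaining)
            break
        ped = set(seed)
        while True:
            # next patch: closest unpropagated neighbor not forbidden by R (Eq. (14))
            best, nxt = np.inf, None
            for i in ped:
                for j in remaining - ped:
                    if R[i, j] != -1 and V[i, j] < best:
                        best, nxt = V[i, j], j
            if nxt is None:
                break
            ped.add(nxt)
        sub_manifolds.append(ped)
        for i in ped:                          # block these patches from later rounds
            R[i, :] = R[:, i] = -1
        remaining -= ped
    return sub_manifolds
```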

[Algorithm 2: ConstructPC]

3.4 Re-construct the pairwise constraints

An entry \(s_{ij}\) of ML indicates whether the pair of points \(x_i\) and \(x_j\) belongs to the same class. After running the MPPCA and ConstructPC algorithms \(c^*\) times, the consensus similarity matrix M is constructed by accumulating the \(c^*\) sets of ML. The entry \(m_{ij}\) of M denotes the number of runs in which the pair of points \(x_i\) and \(x_j\) falls in the same class. An ensemble clustering with \(l^*\) classes can then be derived from M in a number of ways, for example with hierarchical clustering or spectral clustering algorithms. Finally, the new pairwise constraints are re-constructed according to the \(l^*\) new classes.
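
The following sketch illustrates this ensemble idea (the choice of average-linkage hierarchical clustering and the exact normalization are assumptions, not the authors' implementation): the \(c^*\) ML matrices are accumulated into a consensus matrix, which is then cut into \(l^*\) clusters and converted back into pairwise constraints.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def reconstruct_constraints(ml_runs, n_classes):
    """Illustrative ReConstructPC sketch.
    ml_runs   : list of c* boolean (N, N) ML indicator matrices
    n_classes : the desired number l* of sub-manifolds."""
    M = sum(m.astype(float) for m in ml_runs) / len(ml_runs)   # consensus similarity
    D = 1.0 - M                                                 # co-association distance
    np.fill_diagonal(D, 0.0)
    Z = linkage(squareform(D, checks=False), method='average')
    labels = fcluster(Z, t=n_classes, criterion='maxclust')
    # rebuild ML / CL from the ensemble labels, as in Eqs. (5)-(6)
    same = labels[:, None] == labels[None, :]
    ML = same & ~np.eye(len(labels), dtype=bool)
    CL = ~same
    return ML, CL, labels
```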

[Algorithm 3: ReConstructPC]

3.5 Compute the embedding results

To tackle the dimensionality reduction of multi-manifolds, MMD-Isomap defines the objective function expressed in Eq. (9). The objective function attempts to preserve the global geometrical structures of the intra-class data points, so that \(d(y_i,y_j)\) is as close as possible to \(d_G(x_i,x_j)\). Meanwhile, it aims to separate the interclass data points so that the distances between them are as large as possible. However, a question remains: how large should the distances between the interclass data points be? The term \(\sum \limits _{(x_i,x_j)\epsilon CL}d^2(y_i,y_j)\) may become very large, depending on the dimension of the dataset, whereas the term \(\sum \limits _{(x_i,x_j)\epsilon ML}(d(y_i,y_j)-d_G(x_i,x_j))^2\) is controllable. Therefore, it is difficult to set a threshold for \(\sigma (Y)\) and to select a suitable parameter \(\alpha\). To accommodate this uncertainty, our proposed UMD-Isomap defines the following objective function:

$$\begin{aligned}&J = {\min _Y} \sigma (Y), \nonumber \\&\sigma (Y)=\frac{1-\alpha }{|ML|} {\sum _{(x_i,x_j)\epsilon ML}(d(y_i,y_j)-d_G(x_i,x_j))^2-\frac{\alpha }{|CL|}{\sum _{(x_i,x_j)\epsilon CL}(d(y_i,y_j)-C)^2}}. \end{aligned}$$
(15)

where \(C=\gamma * max(d_G(x_i,x_j))\), \(max(d_G(x_i,x_j))\) represents the maximum geodesic distance within the sub-manifolds, and \(\gamma\) is an adjustment parameter that increases the distances between the sub-manifolds. Thus, C is a constant that determines the distances between the interclass data points. The above problem can then be solved with the Scaling by MAjorizing a COmplicated Function (SMACOF) technique or with an eigen-decomposition technique, as described below.

3.5.1 Effective solution using SMACOF

Because of the slow convergence of the SMACOF algorithm, a large number of iterations may be required to achieve high accuracy, depending on the size of the data set and the metric used [26]. The stress \(\sigma (Y)\) in Eq. (15) can be decomposed as

$$\begin{aligned} \sigma (Y)&=\frac{1-\alpha }{|ML|} \sum _{(x_i,x_j)\epsilon ML}(d(y_i,y_j)-d_G(x_i,x_j))^2\nonumber \\&\quad -\frac{\alpha }{|CL|}{\sum _{(x_i,x_j)\epsilon CL}(d(y_i,y_j)-C)^2} \nonumber \\&=\eta _{con}^2 + \eta ^2(Y)-2\rho (Y) \end{aligned}$$
(16)

where

$$\begin{aligned} \eta _{con}^2&=\frac{1-\alpha }{|ML|}{\sum _{(x_i,x_j)\epsilon ML}d_G^2(x_i,x_j)} - \alpha C^2, \end{aligned}$$
(17)
$$\begin{aligned} \eta ^2(Y)&=\frac{1-\alpha }{|ML|}{\sum _{(x_i,x_j)\epsilon ML}d^2(y_i,y_j)}-\frac{\alpha }{|CL|}{\sum _{(x_i,x_j)\epsilon CL}d^2(y_i,y_j)},\end{aligned}$$
(18)
$$\begin{aligned} \rho (Y)&=\frac{1-\alpha }{|ML|}{\sum _{(x_i,x_j)\epsilon ML}d(y_i,y_j)d_G(x_i,x_j)} - \frac{\alpha C}{|CL|}{\sum _{(x_i,x_j)\epsilon CL}d(y_i,y_j)}. \end{aligned}$$
(19)

Let

$$\begin{aligned} w_{ij}^{ML}= \left\{ \begin{aligned}&\frac{1-\alpha }{|ML|},&if(x_i,x_j)\in ML\\&0,&otherwise \end{aligned} \right. , w_{ij}^{CL}= \left\{ \begin{aligned}&\frac{\alpha }{|CL|},&if(x_i,x_j)\in CL\\&0,&otherwise \end{aligned} \right. . \end{aligned}$$
(20)

then Eq. (18) can be rewritten as

$$\begin{aligned} \eta ^2(Y)&=\sum \limits _{i,j=1,i\ne j}^{N}w_{ij}^{ML}d^2(y_i,y_j)-\sum \limits _{i,j=1,i\ne j}^{N}w_{ij}^{CL}d^2(y_i,y_j) \nonumber \\&=tr(Y(\sum \limits _{i,j=1,i\ne j}^{N}w_{ij}^{ML}F^{ij})Y^T)-tr(Y(\sum \limits _{i,j=1,i\ne j}^{N}w_{ij}^{CL}F^{ij})Y^T)\nonumber \\&=tr(YVY^T) \end{aligned}$$
(21)

where

$$\begin{aligned} V=V_{ML}-V_{CL}, V_{ML}=\sum \limits _{i,j=1,i\ne j}^{N}w_{ij}^{ML}F^{ij}, V_{CL}=\sum \limits _{i,j=1,i\ne j}^{N}w_{ij}^{CL}F^{ij}, \end{aligned}$$
(22)

and \(F^{ij}\) is an \(N\times N\) matrix with elements \(F_{ii}^{ij}=F_{jj}^{ij}=1\), \(F_{ij}^{ij}=F_{ji}^{ij}=-1\), and all other elements zero.

Now, consider \(\rho (Y)\) in Eq. (19). Let

$$\begin{aligned} wb_{ij}^{ML}&= \left\{ \begin{aligned}&\frac{(1-\alpha )d_G(x_i,x_j)}{|ML|d(z_i,z_j)},&if(x_i,x_j)\in ML, d(z_i,z_j)\ne 0\\&0,&otherwise \end{aligned} \right. , \end{aligned}$$
(23)
$$\begin{aligned} wb_{ij}^{CL}&= \left\{ \begin{aligned}&\frac{\alpha C }{|CL|d(z_i,z_j)},&if(x_i,x_j)\in CL, d(z_i,z_j)\ne 0\\&0,&otherwise \end{aligned} \right. . \end{aligned}$$
(24)

Then we have

$$\begin{aligned} \rho (Y)&=\frac{1-\alpha }{|ML|}{\sum _{(x_i,x_j)\epsilon ML}d(y_i,y_j)d_G(x_i,x_j)} \nonumber \\&\quad - \frac{\alpha C}{|CL|}{\sum _{(x_i,x_j)\epsilon CL}d(y_i,y_j)} \nonumber \\&={\sum _{{\begin{array}{c} (x_i,x_j)\epsilon ML\\ d(z_i,z_j)\ne 0 \end{array}}}wb_{ij}^{ML}d(y_i,y_j)d(z_i,z_j)}\nonumber \\&\quad -{\sum _{{\begin{array}{c} (x_i,x_j)\epsilon CL\\ d(z_i,z_j)\ne 0\end{array}}}wb_{ij}^{CL}d(y_i,y_j)d(z_i,z_j)}. \end{aligned}$$
(25)

If we define

$$\begin{aligned} B(Z) = \sum \limits _{i,j=1\;i\ne j}^{N}wb_{ij}^{ML}F^{ij}-\sum \limits _{i,j=1\;i\ne j}^{N}wb_{ij}^{CL}F^{ij}, \end{aligned}$$
(26)

the Cauchy–Schwarz inequality implies that for all pairs of configurations Y and Z, we have \(\rho (Y) \ge tr(YB(Z)Z^T)\). Then Eq. (16) can be rewritten as

$$\begin{aligned} \sigma (Y)&=\eta _{con}^2 + \eta ^2(Y)-2\rho (Y) \nonumber \\&\le \eta _{con}^2 + tr(YVY^T) -2tr(YB(Z)Z^T). \end{aligned}$$
(27)

If we define \(\tau (Y,Z) = \eta _{con}^2 + tr(YVY^T) -2tr(YB(Z)Z^T)\), then \(\tau (Y,Z)\) is an auxiliary (majorizing) function of \(\sigma (Y)\). The value of Y at which \(\tau (Y,Z)\) attains its minimum can be calculated by setting the partial derivative to zero, which gives \(Y = ZB(Z)V^{-1}\).

The solution using SMACOF is described in the UMDwithSMACOF algorithm.

[Algorithm 4: UMDwithSMACOF]
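
The iteration can be summarized in a few lines of code. The sketch below is an assumed reading of UMDwithSMACOF (random initialization, a pseudo-inverse of V in case it is singular, and the stopping rule are choices made here for illustration, not taken from the published algorithm).

```python
import numpy as np

def umd_with_smacof(D_G, ml_mask, cl_mask, alpha, C, d, n_iter=300, tol=1e-6, seed=0):
    """Repeatedly apply the majorization update Y <- Z B(Z) V^+ from Section 3.5.1.
    D_G     : (N, N) geodesic distance matrix (used only on must-linked pairs)
    ml_mask : (N, N) boolean must-link indicator
    cl_mask : (N, N) boolean cannot-link indicator
    alpha   : trade-off weight of Eq. (15);  C : target inter-class distance
    d       : embedding dimension."""
    N = D_G.shape[0]
    D_G = np.where(ml_mask, D_G, 0.0)                  # entries outside ML are ignored
    w_ml = np.where(ml_mask, (1 - alpha) / ml_mask.sum(), 0.0)
    w_cl = np.where(cl_mask, alpha / cl_mask.sum(), 0.0)
    W = w_ml - w_cl
    V = np.diag(W.sum(axis=1)) - W                     # V = V_ML - V_CL, Eq. (22)
    V_pinv = np.linalg.pinv(V)                         # V may be singular
    Z = np.random.default_rng(seed).standard_normal((d, N))
    for _ in range(n_iter):
        dz = np.linalg.norm(Z[:, :, None] - Z[:, None, :], axis=0)
        with np.errstate(divide='ignore', invalid='ignore'):
            wb = np.where(dz > 0, (w_ml * D_G - w_cl * C) / dz, 0.0)  # Eqs. (23)-(24)
        B = np.diag(wb.sum(axis=1)) - wb               # B(Z), Eq. (26)
        Y = Z @ B @ V_pinv                             # majorization update Y = Z B(Z) V^{-1}
        if np.linalg.norm(Y - Z) < tol * np.linalg.norm(Z):
            return Y
        Z = Y
    return Z
```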

3.5.2 Effective solution using eigen-decomposition

It is also feasible to solve the objective function in Eq. (16) using the eigen-decomposition technique. To facilitate the optimization, we first transform Eq. (16) into the following trace-based form:

$$\begin{aligned} \sigma (Y)&=\frac{1-\alpha }{|ML|} \sum _{(x_i,x_j)\in ML}(d(y_i,y_j)-d_G(x_i,x_j))^2-\frac{\alpha }{|CL|}{\sum _{(x_i,x_j)\in CL}(d(y_i,y_j)-C)^2} \nonumber \\&=tr(YR^{ML}Y^T) - tr(YR^{CL}Y^T)\nonumber \\&=tr(Y(R^{ML}-R^{CL})Y^T). \end{aligned}$$
(28)

where

$$\begin{aligned} R^{ML}&=-\frac{1-\alpha }{2|ML|}(H(Q^{ML})H), \end{aligned}$$
(29)
$$\begin{aligned} R^{CL}&=-\frac{\alpha C}{2|CL|}(H(Q^{CL})H), \end{aligned}$$
(30)
$$\begin{aligned} Q_{ij}^{ML}&= d_{G}^{ML}(x_i,x_j), \end{aligned}$$
(31)
$$\begin{aligned} Q_{ij}^{CL}&=\bigg \{ \begin{array}{ll} 1,if(x_i,x_j)\in CL\\ 0,otherwise \end{array}, \end{aligned}$$
(32)

and \(H=I-{{\mathbf {1}}}{{\mathbf {1}}}^T/N\) is a centering matrix, \({\mathbf {1}}\) denotes a vector of all ones.

It should be noted that HQH is used to solve the objective function \(E=\sum ((d(y_i,y_j)-d_G(x_i,x_j))^2)\) in MDS [43] and Isomap [1]. Consequently, a solution based on the eigen-decomposition of \(\sigma (Y)\) only reflects the distances between the sub-manifolds but does not reflect their topological structure. In contrast, iterating on \(\sum \limits _{(x_i,x_j)\in ML}(d(y_i,y_j)-d_G(x_i,x_j))^2\) in the SMACOF-based solution yields sub-manifolds with better low-dimensional embeddings. In order to improve the eigen-decomposition-based solution and obtain accurate embedding results, the LLE-style term described in [2, 13] is added to the proposed objective function:

$$\begin{aligned} \beta \sum \limits _{i=1}^{N} \Vert {y_i-\sum \limits _{x_j\in N(x_i)} W_{ij}y_j}\Vert ^2 = \beta tr(Y(I-W)(I-W)^TY^T)=tr(YMY^T) \end{aligned}$$
(33)

where \(\beta\) is a trade-off parameter and \(M=\beta (I-W)(I-W)^T\) describes the pairwise local weighting relationships. The objective function in Eq. (28), augmented with this term, can then be rewritten as
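
For reference, the reconstruction weights \(W\) in the LLE-style term can be computed per point by solving a small regularized linear system; the sketch below is a standard LLE weight computation (assumed here, not taken from [27]).

```python
import numpy as np

def lle_weight_matrix(X, neighbors, reg=1e-3):
    """Reconstruction weights W for the LLE-style term of Eq. (33).
    X         : (N, D) data matrix
    neighbors : list of index arrays; neighbors[i] are the neighbors of x_i."""
    N = X.shape[0]
    W = np.zeros((N, N))
    for i in range(N):
        idx = neighbors[i]
        G = X[idx] - X[i]                            # local differences
        Cov = G @ G.T
        Cov += reg * np.trace(Cov) * np.eye(len(idx))  # regularize for numerical stability
        w = np.linalg.solve(Cov, np.ones(len(idx)))
        W[i, idx] = w / w.sum()                      # weights sum to one
    return W

# M = beta * (I - W) @ (I - W).T, as used in Eq. (33)
```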

$$\begin{aligned} \sigma (Y)=tr(Y(R^{ML}-R^{CL}+M)Y^T). \end{aligned}$$
(34)

Letting

$$\begin{aligned} V=R^{ML}-R^{CL}+M, \end{aligned}$$
(35)

we can obtain the embedding results \(Y=[\sqrt{{\lambda _1}}v_1,\sqrt{{\lambda _2}}v_2,...,\sqrt{{\lambda _d}}v_d]^T\), where \(v_i,i=1,2,...,d\), are the standard eigenvectors corresponding to the first d leading eigenvalues of the matrix V.
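
A compact sketch of this closed-form solution (assuming V has been assembled as in Eq. (35)) is given below; it follows the text above in taking the d leading eigenpairs and scaling the eigenvectors as in classical MDS.

```python
import numpy as np

def umd_eigen_embedding(V, d):
    """Embedding from the d leading eigenpairs of V = R^ML - R^CL + M (Eq. (35))."""
    eigvals, eigvecs = np.linalg.eigh(V)           # eigenvalues in ascending order
    idx = np.argsort(eigvals)[::-1][:d]            # d leading eigenvalues
    lam = np.clip(eigvals[idx], 0.0, None)         # guard against tiny negative values
    return (eigvecs[:, idx] * np.sqrt(lam)).T      # Y = [sqrt(l_1)v_1, ..., sqrt(l_d)v_d]^T
```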

[Algorithm 5: UMDwithEd]

3.6 Complexity analysis

In the proposed UMD-Isomap algorithm, the pairwise constraints are computed by MPPCA and ConstructPC (Algorithm 2), and the dimensionality reduction is carried out by one of two algorithms: UMDwithSMACOF (Algorithm 4) or UMDwithEd (Algorithm 5).

The complexity of MPPCA is \(O(mNDdt_1)\), where m denotes the number of “patches” trained by MPPCA and \(t_1\) is the number of iterations. The complexity of ConstructPC is \(O(\sum _{i=1}^{m}t_i)\), where \(t_i\) denotes the time required to compute the dissimilarity of the tangent subspaces between the points in the “Ped” set and their neighbors in the “nPed” set. Compared with \(O(mNDdt_1)\), the time complexity of ConstructPC is small. Hence, the complexity of computing the pairwise constraints can be approximated as \(O(mNDdt_1)\).

In the dimensionality reduction process, the geodesic distances between each pair of points in the ML set are computed using Dijkstra's algorithm. The embedding results of the original samples are then computed using SMACOF (Algorithm 4) or eigen-decomposition (Algorithm 5). The complexity of Dijkstra's algorithm is \(O(N^2)\); therefore, the complexity of computing the geodesic distance matrix is \(O(\sum _{{sM}_i}N_{{sM}_i}^2)\), where \({sM}_i\) denotes the sub-manifolds determined by the pairwise constraints ML and CL. The complexity of SMACOF is \(O(t_2dN^2)\), where \(t_2\) is the number of iterations. Compared with the computation of the geodesic distance matrix and the iterative calculation of the embedding with SMACOF, the cost of creating the pairwise constraints and of the eigen-decomposition is small. Therefore, the complexities of Algorithm 4 and Algorithm 5 are approximately \(O(\sum _{sM_i}N_{{sM}_i}^2+ t_2dN^2)\) and \(O(\sum _{{sM}_i}N_{{sM}_i}^2)\), respectively.

4 Experiments

In this section, the performance of the proposed UMD-Isomap is assessed using a series of synthetic and real-world datasets. First, the computation of the pairwise constraints is analyzed using four artificial datasets and three real datasets. Because the pairwise constraints reflect the clustering results on the sub-manifolds, the clustering accuracy is used as the evaluation criterion to assess the performance of computing the pairwise constraints. Subsequently, the two approaches for dimensionality reduction (i.e., UMDwithEd and UMDwithSMACOF) in the proposed UMD-Isomap are evaluated using two artificial datasets and three real datasets. The two-dimensional (2D) embedding results are visualized to assess their quality. Finally, the influence of the parameters and the time complexity are discussed.

All of the experiments were performed in Matlab on a PC with an Intel Core 3.6 GHz CPU and 8 GB RAM.

4.1 Clustering analysis

Clearly, the higher the classification accuracy, the better the algorithm performs. In the experiments, we evaluate the clustering performance with two standard clustering evaluation metrics, i.e., clustering accuracy (ACC) and normalized mutual information (NMI). The clustering accuracy is defined as the maximum classification accuracy over all possible alignments:

$$\begin{aligned} ACC = {\max _{align}} {\sum }_{i=1}^N \delta (t_i = l_i)/N, \end{aligned}$$
(36)

where \(t_i\) is the true label, \(l_i\) is the obtained label of \(x_i\), and \(\delta (\cdot )\) is the delta function. The normalized mutual information measures the similarity between the clustering results and the true classes:

$$\begin{aligned} NMI = \frac{\sum _{i=1}^{l^*} \sum _{j=1}^{l^*} p_{ij}log(\frac{p_{ij}}{p_i p_j})}{\sqrt{\sum _{i=1}^{l^*} p_i log p_i\sum _{j=1}^{l^*} p_j log p_j}}, \end{aligned}$$
(37)

where \(l^*\) is the number of sub-manifolds (or classes), \(p_i\) is the probability that a sample belongs to the ith cluster, and \(p_{ij}\) is the joint probability that a sample belongs to both the ith cluster and the jth class.
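
In practice, ACC can be computed by finding the best label alignment with the Hungarian algorithm, and NMI is available in standard libraries; the sketch below is one common way to compute both (using scipy and scikit-learn, not code from the paper).

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score

def clustering_accuracy(true_labels, pred_labels):
    """Eq. (36): best accuracy over all alignments of predicted to true labels."""
    true_labels = np.asarray(true_labels)
    pred_labels = np.asarray(pred_labels)
    _, t_idx = np.unique(true_labels, return_inverse=True)
    _, p_idx = np.unique(pred_labels, return_inverse=True)
    cont = np.zeros((t_idx.max() + 1, p_idx.max() + 1), dtype=int)
    np.add.at(cont, (t_idx, p_idx), 1)               # contingency table
    rows, cols = linear_sum_assignment(-cont)        # maximize matched counts
    return cont[rows, cols].sum() / len(true_labels)

# NMI (Eq. (37)) with the geometric-mean normalization used in the text:
# nmi = normalized_mutual_info_score(true_labels, pred_labels, average_method='geometric')
```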

In this subsection, the pairwise constraint computation in our UMD-Isomap is compared with several other methods, i.e., k-means on Isomap, K-manifolds [10], Ncut [14], t-SNE [19], Umap [20] and spectral multi-manifold clustering (SMMC) [16]. Neither Isomap nor K-manifolds performs well on intersecting nonlinear manifolds; however, since these methods are well known, and in order to confirm that the proposed algorithm outperforms them, they were included in the comparison.

Note that the clustering performance of UMD-Isomap is represented by the results of the ConstructPC or ReConstructPC algorithms, without considering the clustering results of the low-dimensional embedding produced by UMD-Isomap. Some supervised and semi-supervised algorithms use k-means to test the clustering or classification results of the low-dimensional embedding. Although the resulting clustering accuracy is high, it only reflects the dimensionality reduction performance and not the clustering performance, because supervised and semi-supervised algorithms rely on a large number of labels and cannot classify the multi-manifold by themselves.

Fig. 1: The classification results of Isomap, K-manifolds, Ncut, t-SNE, Umap, SMMC and two versions of UMD-Isomap on four artificial data sets; the different colors represent the different sub-manifolds. (a) \( \$ \) data; (b) Hybrid data; (c) three planes; (d) two spirals

Four artificial datasets with different characteristics and complexities are selected, and the clustering results are shown in Fig. 1 and Table 1. As illustrated in Fig. 1, the different colors denote different classes. In Table 1, the average clustering accuracy (ACC) and normalized mutual information (NMI) over 10 independent runs, the standard deviations, and the maximum values for the different methods are tabulated.

Table 1 The clustering accuracy and normalized mutual information (mean ± std) for different methods on four artificial data sets

The following is observed: (1) The Isomap, K-manifolds, Ncut, t-SNE and Umap methods do not perform well on intersecting multi-manifolds. (2) SMMC not only achieves satisfactory performance in grouping samples generated from intersecting manifolds but also performs well on the more general hybrid nonlinear manifold clustering problems; however, its results are not sufficiently robust because of the unstable results produced by MPPCA. (3) Both versions of UMD-Isomap achieve excellent clustering accuracy and normalized mutual information. However, the ConstructPC algorithm (in UMD-Isomap) is also not sufficiently robust, for the same reason as SMMC, and the number of classes it creates is larger than the true number (see the sixth row of Fig. 1). (4) The results of the ReConstructPC algorithm (in UMD-Isomap) are more robust because it ensembles \(c^*=5\) sets of ML and computes the new clustering results with the parameter \(l^*\).

Three widely known real-world datasets are also selected to evaluate the efficiency and accuracy of the different algorithms. The first is the Coil20 dataset, which consists of 1440 images of 20 objects, each viewed under a full 360\(^{\circ }\) of rotation. The MNIST dataset contains 10000 handwritten digit images (’0’ to ’9’). The USPS dataset consists of 9289 handwritten digit images (’0’ to ’9’). The clustering results on the real-world datasets cannot be visualized in \(R^2\) or \(R^3\) spaces; hence, we only compare the clustering accuracies (ACC) and normalized mutual information (NMI) of the seven algorithms on the Coil20, MNIST, and USPS datasets (Table 2).

The experimental results reveal a number of significant findings: (1) The clustering results of t-SNE and Umap on the three real-world datasets are desirable, which suggests that the “crowding problem” in high-dimensional space is not as serious as in low-dimensional space, and the clustering result of Umap on USPS is the best among the seven algorithms. (2) The clustering result of Ncut on Coil20 is the best among the seven algorithms because the sub-manifolds in Coil20 hardly intersect, but its clustering results on USPS are worse than those of t-SNE and our two algorithms because the USPS dataset has more samples and a lower dimensionality than the other two real-world datasets. (3) The SMMC algorithm is not appropriate for these real-world datasets because its accuracies are even lower than those of the Isomap, K-manifolds and Ncut algorithms. (4) The clustering accuracy is higher for ConstructPC than for ReConstructPC because the number of classes created by the ConstructPC algorithm is larger than the true one; the influence of the number of classes on the clustering accuracy is discussed in Sect. 4.3.1. (5) The clustering results of the two algorithms in our proposed UMD-Isomap are better than those of the other algorithms, demonstrating the effectiveness of the proposed approach.

In addition, the average computation times are shown in Table 3, providing a rough indication of the time complexity of each method. (1) The running time of the ConstructPC algorithm in UMD-Isomap is lower than those of the other algorithms, although it is still slightly higher than that of the Ncut algorithm on the Coil20 dataset because the time complexity of ConstructPC is related not only to the number of samples but also to the dimensionality of the dataset. (2) The running time of the ReConstructPC algorithm is larger because it computes the similarity matrix by running the ConstructPC algorithm \(c^*\) times. (3) The proposed method is more efficient than the other methods, which is also confirmed by the time complexity analysis.

In brief, the experiments on both the artificial datasets and the real-world datasets verify the effectiveness of the proposed algorithm.

Table 2 Classification accuracies (mean ± std, followed by the highest accuracy) and computation time (s) for different methods on three data sets
Table 3 The running time (s) for different methods on seven data sets

4.2 Dimensionality reduction and visualization

Herein, the two solutions for the dimensionality reduction process (i.e., UMDwithEd and UMDwithSMACOF) in the proposed UMD-Isomap are compared with the Isomap [1], t-SNE [19], Umap [20], MMD-Isomap [26], SSMM-Isomap [27] and S-Isomap [23] algorithms.

The 2D embedding results of the \( \$ \) and Hybrid datasets are visualized in Figs. 2 and 3, and those of the Coil20, Mnist and USPS datasets are displayed in Figs. 4, 5 and 6. The results show that: (1) Isomap cannot be applied to multi-manifolds, and the embeddings of the sub-manifolds mix together; (2) t-SNE and Umap can construct good embeddings for each sub-manifold of the high-dimensional real-world manifolds, but they are not suitable for low-dimensional intersecting manifolds; (3) MMD-Isomap and SSMM-Isomap-2 can compute accurate embeddings for each sub-manifold, but these embeddings overlap with one another; (4) S-Isomap can compute low-dimensional embeddings of multi-manifolds and visualize them as well, but the embedding of each sub-manifold is not accurate; (5) the proposed UMDwithEd algorithm achieves results similar to those of SSMM-Isomap-1, even though they have different objective functions, which suggests that the eigen-decomposition-based solution may not be well suited to multi-manifold datasets; (6) the proposed UMDwithSMACOF algorithm not only produces the most accurate embedding results but also visualizes the sub-manifolds separately. For example, Fig. 4(i) shows a two-dimensional representation of the Coil20 dataset discovered by UMDwithSMACOF, in which the embedding of each object forms a circle, reflecting the underlying rotational degree of freedom.

Fig. 2: The 2D embedding results obtained by each method on the \( \$ \) dataset

Fig. 3: The 2D embedding results obtained by each method on the Hybrid dataset

Fig. 4: The 2D embedding results obtained by each method on the Coil20 dataset

Fig. 5: The 2D embedding results obtained by each method on the Mnist dataset

Fig. 6: The 2D embedding results obtained by each method on the USPS dataset

The running times of the eight algorithms on five datasets are shown in Table 4. Based on these results, it can be concluded that: (1) the running times of the UMDwithSMACOF, SSMM-Isomap-2 and MMD-Isomap algorithms are longer because these algorithms all rely on iterations to minimize their objective functions; (2) the UMDwithEd algorithm has a better time complexity than the other algorithms, although UMDwithSMACOF achieves a more accurate embedding. Moreover, the running times of UMDwithEd on the three real-world datasets are clearly lower than those of the other methods because these datasets contain more sub-manifolds with fewer samples in each, which agrees with the time complexity analysis.

Table 4 Comparison of the running time of different algorithms on five data sets

4.3 Discussion on the parameters

In UMD-Isomap, m and \(\theta _0\) determine the clustering accuracy of the ConstructPC and ReConstructPC algorithms and the number of classes constructed by ConstructPC; the number of classes in turn affects the clustering accuracy of ConstructPC. \(\alpha\) is the trade-off parameter for the UMDwithSMACOF and UMDwithEd algorithms, \(\beta\) is an adjusting parameter used only in UMDwithEd, and \(\gamma\) is a parameter that enlarges the distances between the sub-manifolds. In this subsection, the influence of m, \(\theta _0\) and the number of classes on the clustering accuracy is discussed first. Then the parameters \(\alpha\), \(\beta\), and \(\gamma\) are discussed, respectively. Finally, the iteration processes of MPPCA and SMACOF are examined, because they are the bottleneck of the time complexity in ConstructPC and UMDwithSMACOF.

4.3.1 The parameters m and \(\theta _0\)

m is the number of MPPCA “patches”. The propagation process groups the MPPCA “patches” into sub-manifolds, so the number of classes cannot exceed m. \(\theta _0\) is the threshold on the dissimilarity of the tangent subspaces: the larger \(\theta _0\) is, the more classes are created, and the smaller \(\theta _0\) is, the fewer classes are created.

m and \(\theta _0\) determine the clustering accuracy of the ConstructPC and ReConstructPC algorithms and the number of classes constructed by ConstructPC. The number of classes and the clustering accuracy obtained by the ConstructPC algorithm on the \( \$ \) dataset are shown in Fig. 7. The true number of classes in the \( \$ \) dataset is 2, so the number of classes in Fig. 7(a) should be close to 2. If the number of classes produced by the ConstructPC algorithm is slightly larger than 2, the values of m and \(\theta _0\) can be accepted. Accordingly, m can be set between 70 and 120, and \(\theta _0\) can be set between 0.6 and 0.8.

Fig. 7: The influence of m and \(\theta _0\) on (a) the number of classes; (b) the clustering accuracy

4.3.2 The parameters \(\alpha\), \(\beta\), and \(\gamma\)

Fig. 8: The influence of \(\alpha\) on the UMDwithEd and UMDwithSMACOF algorithms

Fig. 9: The influence of \(\beta\) on the UMDwithEd algorithm

Fig. 10: The influence of \(\gamma\) on the UMDwithEd and UMDwithSMACOF algorithms

Our objective function attempts to preserve the global geometrical structures of the intra-class data points and to separate the interclass data points. The parameter \(\alpha\) plays a key role in the algorithm, determining the dimensionality reduction and visualization results for the multi-manifold. The smaller \(\alpha\) is, the better the global geometrical structures of the intra-class points are preserved; however, the interclass data points cannot be distinguished if the parameter is too small. Thus, a value of \(\alpha\) smaller than 0.5 is preferable, and it is usually set to 0.01. The larger the parameter \(\gamma\) is, the larger the interclass distances become; it is usually set to 3 or 4. The parameter \(\beta\) adjusts the accuracy of each single manifold in UMDwithEd. When \(\beta\) is small, the UMDwithEd algorithm only reflects the distance relationships between point pairs; when \(\beta\) becomes large, the topological structure encoded by the weight matrix of the LLE-style term dominates. This parameter is usually set to 0.1.

The three parameters are analyzed in Figs. 8, 9 and 10, respectively. It can be seen that (1) the UMDwithEd algorithm is not sensitive to \(\alpha\) and \(\gamma\) and is only sensitive to \(\beta\); (2) the UMDwithSMACOF algorithm is more sensitive to \(\alpha\) and constructs better embeddings when \(\alpha\) is between \(10^{-8}\) and 0.1; (3) the interclass distances in the UMDwithSMACOF algorithm can be adjusted by \(\gamma\); (4) our algorithms are quite robust to the parameters and can almost be regarded as parameter-free approaches.

4.3.3 Convergence analysis

Fig. 11: The time complexity vs. the iteration process. (a) The log likelihood and the iteration number in MPPCA; (b) the objective function and the iteration number in SMACOF

The bottleneck of the time complexity in ConstructPC and UMDwithSMACOF lies in the iteration processes of MPPCA and SMACOF. Fortunately, Fig. 11 shows that both algorithms converge rapidly and require only a few iterations to achieve good performance. Note that the log likelihood and the objective function values are each shifted by their minimum values for easier comparison.

5 Conclusions

In this paper, a novel algorithm named UMD-Isomap is proposed for the nonlinear dimensionality reduction and clustering of data lying on multi-manifolds. Unlike most existing manifold learning algorithms, the proposed algorithm is designed for the dimensionality reduction of multi-manifolds: each sub-manifold obtains its own low-dimensional embedding, and the low-dimensional embeddings of the different sub-manifolds do not overlap.

Our proposed algorithm is unsupervised: it runs ConstructPC (or ReConstructPC) to cluster the data lying on the multi-manifolds and transforms the clustering results into pairwise constraints. Compared with existing manifold clustering and spectral clustering algorithms, our algorithms achieve better performance in terms of clustering accuracy and normalized mutual information. The algorithm then runs UMDwithSMACOF or UMDwithEd to construct the low-dimensional embeddings of the multi-manifold. UMDwithSMACOF produces more accurate embeddings of the multi-manifolds, but its time complexity is higher because of its iterative process. UMDwithEd computes the embedding more efficiently and separates the sub-manifolds distinctly, and it is not sensitive to the parameters.

Since the pairwise constraints can be transformed into label vectors, the ConstructPC and ReConstructPC algorithms can be combined with any supervised learning algorithm to construct a corresponding unsupervised learning algorithm, and the UMDwithSMACOF and UMDwithEd algorithms can visualize 2D or 3D embeddings of any clustering results.

In the future, there are several directions for further study. First, the time complexity of the proposed algorithm is related to the dimensionality of the dataset, and the optimization of the solution will be explored in our future work. Second, the convergence of the model is only demonstrated empirically in this paper, and a theoretical proof will be our next task. Finally, we will also explore other applications of UMD-Isomap, such as image recognition, scene classification and speech recognition.