1 Introduction

Dimensionality reduction (DR), a fundamental problem in pattern recognition, projects the original high-dimensional data into a lower-dimensional space by discarding redundant or even irrelevant information according to a certain criterion. In recent years, since many emerging applications involve high-dimensional data, such as gene expression analysis, text mining, and image classification and retrieval, the technique has attracted considerable attention and has been studied extensively.

So far, many DR methods have been proposed. Among them, principal component analysis (PCA) [1] and linear discriminant analysis (LDA) [2, 3] are two representative methods. Recently, some methods based on manifold learning have shown their advantages, e.g. multidimensional scaling (MDS) [4], Laplacian eigenmaps (LE) [5, 6], locally linear embedding (LLE) [7], t-distributed stochastic neighbor embedding (t-SNE) [8] and Isomap [9, 10]. In particular, Isomap, which reveals the intrinsic geometric structure of a manifold by preserving the geodesic distances between all pairs of points, has produced encouraging results. However, the original Isomap is not good at extracting discriminative features for classification, as no class information of labeled data is considered. To handle this problem, several supervised versions of Isomap have been proposed. WeightedIso uses a constant factor to change the Euclidean distance between two data points with the same class label in the first step of Isomap [11]. In supervised Isomap (S-Isomap), two parameters are used to rescale the distances between points with the same and with different class labels [12]. Zhang et al. [13] proposed a pairwise-constrained marginal Isomap (M-Isomap), which incorporates the pairwise cannot-link (CL) and must-link (ML) constraints induced from the neighborhood graph into Isomap to guide discriminant manifold learning. Inspired by M-Isomap, multi-manifold discriminant Isomap (MMD-Isomap) [14] was presented by introducing two global pairwise constraints and defining a joint optimization objective.

Although supervised DR methods generally perform better than unsupervised ones, their performance depends heavily on the number of labeled samples, and it inevitably deteriorates when not enough labeled data are available [15, 16]. Active learning and semi-supervised learning are two promising learning paradigms for addressing this problem. Different from active learning, which tries to select the most informative samples to be labeled [17], semi-supervised learning makes use of the prior knowledge of the labeled samples and the discriminative information hidden in the unlabeled samples [18,19,20,21]. Semi-supervised discriminant analysis (SDA) [22] improves LDA by adding a regularization term to preserve the local structures of the data. In [23], semi-supervised Isomap (SS-Isomap) is proposed, which uses prior information on the exact mapping of certain data points to compute the low-dimensional coordinates of unknown points. This is not semi-supervised learning in the typical sense, where both labeled and unlabeled data are used for classification; it actually presents a method for out-of-sample mapping. Moreover, the authors reported that the experimental improvement of SS-Isomap over the basic Isomap is not significant. In [24], multiple view semi-supervised dimensionality reduction (MVSSDR), an improved version of semi-supervised dimensionality reduction (SSDR) [25], uses pairwise constraints to derive an embedding in each view and makes these embeddings comparable through a linear transformation.

Although some semi-supervised DR methods have been proposed, most of them compute linear projections, whereas nonlinear DR may play an important role in human perception and learning. As a popular nonlinear DR method, the original Isomap is unsupervised, and its performance can be improved by considering class label information when a sufficient number of labeled samples is available. When the labeled training data are limited, semi-supervised learning is helpful. In this paper, we study how to exploit useful information from both labeled and unlabeled samples in manifold learning for Isomap, and present a semi-supervised discriminant Isomap (SSD-Isomap). In the method, two pairwise constraints, must-link (ML) and likely-link (LL), are first defined: ML is constructed from the labeled samples, while LL is built for the unlabeled samples. Two graphs based on these constraints are then obtained, and the distances between points in LL pairs are rescaled by a scale parameter. Finally, the corresponding optimal nonlinear subspace that preserves the resulting distances of the data is sought.

The main contributions of this paper are summarized as follows:

  1.

    Besides the commonly used must-link (ML) constraint, a likely-link (LL) constraint is introduced to describe the neighborhoods of data points without class labels. Unlike the supervised versions of Isomap, such as S-Isomap, M-Isomap and MMD-Isomap, the local structure information of the unlabeled sample points is used to modify the initial values of the point distances.

  2.

    SSD-Isomap follows a procedure similar to that of Isomap to obtain the low-dimensional embedding after the geodesic distance matrix is initialized. Unlike M-Isomap and MMD-Isomap, no extra optimization algorithm is needed. Extensive experiments on data visualization, image retrieval and classification show that our method performs better than other state-of-the-art DR methods.

The rest of this paper is organized as follows. In Sect. 2, the related work is briefly introduced. Section 3 describes the details of the proposed SSD-Isomap. In Sect. 4, extensive experiments are carried out to evaluate the performance of the proposed method. Finally, Sect. 5 gives the conclusion.

2 Related work

In this section, we first summarize some notation used throughout the paper, and then review the work most closely related to Isomap.

Given a data set \({\mathbf{X}}=[{{\mathbf{x}}_1},{{\mathbf{x}}_2}, \ldots ,{{\mathbf{x}}_N}] \in {R^{M \times N}}\) with N points, DR seeks a mapping function that maps these points to a new data set \({\mathbf{Y}}=[{{\mathbf{y}}_1},{{\mathbf{y}}_2}, \ldots ,{{\mathbf{y}}_N}] \in {R^{m \times N}}\) in a lower-dimensional space of dimension m (\(m \ll M\)). \(d({{\mathbf{x}}_i},{{\mathbf{x}}_j})\) denotes the Euclidean distance between \({{\mathbf{x}}_i}\) and \({{\mathbf{x}}_j}\).

2.1 Isomap

Isomap is a classic global nonlinear DR algorithm which aims to find an optimal subspace that best preserves the geodesic distances between pairs of data points. It can be summarized as follows:

  1.

    Construct a weighted undirected neighborhood graph \(G(V,E)\), where node \({v_i} \in V\) corresponds to point \({{\mathbf{x}}_i}\). For every pair of data points, if \(d({{\mathbf{x}}_i},{{\mathbf{x}}_j})\) is smaller than a fixed radius \(\varepsilon\) or \({{\mathbf{x}}_j} \in {\text{KNN}}({{\mathbf{x}}_i})\) (i.e., \({{\mathbf{x}}_j}\) is among the K nearest neighbors of \({{\mathbf{x}}_i}\)), the weight of edge \(e({{\mathbf{x}}_i},{{\mathbf{x}}_j}) \in E\) is set to \(d({{\mathbf{x}}_i},{{\mathbf{x}}_j})\).

  2.

    Compute geodesic distances. Initialize the distance \({d_G}({\mathbf{x}_i},{\mathbf{x}_j})=d({\mathbf{x}_i},{\mathbf{x}_j})\) if \({{\mathbf{x}}_j}\) and \({{\mathbf{x}}_i}\) are neighbors; otherwise, let \({d_G}({\mathbf{x}_i},{\mathbf{x}_j})=\infty\). Estimate the geodesic distances between all pairs of data points by computing the shortest path distances \({d_G}({{\mathbf{x}}_i},{{\mathbf{x}}_j})\) in G. Dijkstra’s or Floyd’s algorithm can be applied to find the shortest paths.

  3.

    Construct the m-dimensional embedding. Define the Isomap criterion as follows:

    $$\mathop {\hbox{min} }\limits_{{\mathbf{Y}}} \sum\limits_{{{\mathbf{x}_i},{\mathbf{x}_j}}} {{{(d({\mathbf{y}_i},{\mathbf{y}_j}) - {d_G}({\mathbf{x}_i},{\mathbf{x}_j}))}^2}}$$
    (1)

Let \({\mathbf{H}}={\mathbf{I}} - (1/N){\mathbf{e}}{{\mathbf{e}}^T}\), where I is an \(N \times N\) identity matrix and e is the vector of all ones. Let Q be an \(N \times N\) matrix with elements \({{\mathbf{Q}}_{ij}}=d_{G}^{2}({\mathbf{x}_i},{\mathbf{x}_j})\). According to MDS, the lower-dimensional embedding Y is obtained as \({[\sqrt {{\lambda _1}} {\mathbf{v}_1},\sqrt {{\lambda _2}} {\mathbf{v}_2}, \ldots \sqrt {{\lambda _m}} {\mathbf{v}_m}]^T}\), where \(\{ {{\mathbf{v}}_i}\} _{{i=1}}^{m}\) denote the eigenvectors corresponding to the m leading eigenvalues \({\lambda _1} \geq {\lambda _2} \geq \cdots \geq {\lambda _m}\) of \(\mathbf{R}= - \mathbf{H}\mathbf{Q}\mathbf{H}/2\).
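As a concrete illustration of the three steps above, the following Python sketch builds the kNN graph, computes shortest-path geodesic distances and applies classical MDS. It is a minimal sketch with assumed function and variable names, not code from the paper, and it assumes the neighborhood graph is connected.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import shortest_path
from scipy.spatial.distance import cdist

def isomap(X, n_neighbors=7, n_components=2):
    """X: (N, M) data matrix, one sample per row; returns an (N, m) embedding."""
    N = X.shape[0]
    D = cdist(X, X)                                    # pairwise Euclidean distances

    # Step 1: weighted kNN neighborhood graph (zeros become non-edges below).
    G = np.zeros((N, N))
    knn = np.argsort(D, axis=1)[:, 1:n_neighbors + 1]  # skip column 0 (the point itself)
    for i in range(N):
        G[i, knn[i]] = D[i, knn[i]]
        G[knn[i], i] = D[i, knn[i]]                    # keep the graph symmetric

    # Step 2: geodesic distances = shortest paths in the graph (Dijkstra).
    DG = shortest_path(csr_matrix(G), method='D', directed=False)

    # Step 3: classical MDS on the squared geodesic distances (Eq. (1)).
    Q = DG ** 2
    H = np.eye(N) - np.ones((N, N)) / N                # centering matrix H
    R = -H @ Q @ H / 2
    w, V = np.linalg.eigh(R)
    top = np.argsort(w)[::-1][:n_components]           # m leading eigenvalues
    return V[:, top] * np.sqrt(np.maximum(w[top], 0))
```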

2.2 Supervised versions of Isomap

Since Isomap is an unsupervised DR algorithm, it is not good at extracting discriminative features for classification tasks. Some supervised versions have been presented that take the class label information into account [26]. In WeightedIso, the Euclidean distances between points with the same label are reduced by a constant rescaling factor, and Isomap is then applied to these new distances. S-Isomap [12] develops this idea and defines a different distance metric as follows:

$$\hat{d}({\mathbf{x}_i},{\mathbf{x}_j})=\begin{cases} \sqrt {1 - \exp ( - {d^2}({\mathbf{x}_i},{\mathbf{x}_j})/\beta )} & {\text{if}}\;l({\mathbf{x}_i})=l({\mathbf{x}_j}) \\ \sqrt {\exp ({d^2}({\mathbf{x}_i},{\mathbf{x}_j})/\beta )} - \alpha & {\text{if}}\;l({\mathbf{x}_i}) \ne l({\mathbf{x}_j}) \end{cases},$$
(2)

where \(l({\mathbf{x}_i})\) is the class label of \({\mathbf{x}_i}\). The parameters \(\alpha\) and \(\beta\) are used to control the range of \(d({{\mathbf{x}}_i},{{\mathbf{x}}_j})\). Usually, \(\alpha\) is set to a small positive value, and \(\beta\) is set to the average Euclidean distance between all pairs of data points. Both parameters modify the distances between data points with the help of the class labels, and can improve the classification performance.
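For illustration, a small sketch of the S-Isomap dissimilarity of Eq. (2) is given below, with \(\beta\) set to the average pairwise distance as described above; the function name and defaults are illustrative.

```python
import numpy as np
from scipy.spatial.distance import cdist

def s_isomap_distance(X, labels, alpha=0.5):
    """X: (N, M) labeled samples in rows; labels: (N,) array of class labels."""
    D = cdist(X, X)
    beta = D[np.triu_indices_from(D, k=1)].mean()   # average distance over all pairs
    same = labels[:, None] == labels[None, :]
    # Same-class pairs are compressed, different-class pairs are stretched;
    # note that exp(d^2/beta) may overflow for very distant pairs in this naive form.
    return np.where(same,
                    np.sqrt(1.0 - np.exp(-D ** 2 / beta)),
                    np.sqrt(np.exp(D ** 2 / beta)) - alpha)
```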

In M-Isomap [13], two local pairwise constraint sets, i.e., must-link (ML) and cannot-link (CL), are defined as

$$S_{{ML}} = \left\{ {({\mathbf{x}}_{i} ,{\mathbf{x}}_{j} )\left| {l\left( {{\mathbf{x}}_{i} } \right) = l({\mathbf{x}}_{j} ),{\mathbf{x}}_{j} \in {\text{KNN(}}{\mathbf{x}}_{i} )} \right.} \right\},$$
(3)
$$S_{{CL}} = \left\{ {({\mathbf{x}}_{i} ,{\mathbf{x}}_{j} )\left| {l({\mathbf{x}}_{i} ) \ne l({\mathbf{x}}_{j} ),{\mathbf{x}}_{j} \in {\text{KNN(}}{\mathbf{x}}_{i} )} \right.} \right\}.$$
(4)

According to Eqs. (3) and (4), two pairwise-constrained neighborhood graphs \({G_{ML}}\) and \({G_{CL}}\) can be constructed. M-Isomap optimizes the following two criteria:

$${J_{ML}}=\mathop {\hbox{min} }\limits_{{\mathbf{Y}}} \sum\limits_{{({\mathbf{x}_i},{\mathbf{x}_j}) \in {S_{ML}}}} {{{(d({\mathbf{y}_i},{\mathbf{y}_j}) - d_{G}^{{ML}}({\mathbf{x}_i},{\mathbf{x}_j}))}^2}} ,$$
(5)
$${J_{CL}}=\mathop {\hbox{max} }\limits_{{\mathbf{Y}}} \sum\limits_{{({\mathbf{x}_i},{\mathbf{x}_j}) \in {S_{CL}}}} {{{(d({\mathbf{y}_i},{\mathbf{y}_j}) - d_{G}^{{CL}}({\mathbf{x}_i},{\mathbf{x}_j}))}^2}} ,$$
(6)

where \(d_{G}^{{ML}}({\mathbf{x}_i},{\mathbf{x}_j})\) and \(d_{G}^{{CL}}({\mathbf{x}_i},{\mathbf{x}_j})\) are the shortest path distances between \({{\mathbf{x}}_i}\) and \({{\mathbf{x}}_j}\) in \({G_{ML}}\) and \({G_{CL}}\), respectively. By combining Eqs. (5) and (6), the optimization problem can be solved with the iterative trace ratio (ITR) algorithm. Inspired by M-Isomap, MMD-Isomap [14] was proposed by introducing two global pairwise constraint sets. The two optimization criteria are combined through a regularization parameter, and SMACOF is used to solve the objective.
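As an illustration, the constraint sets of Eqs. (3) and (4) could be assembled as in the sketch below; the subsequent ITR (or SMACOF) optimization is not shown, and all names are assumptions.

```python
import numpy as np
from scipy.spatial.distance import cdist

def build_ml_cl(X, labels, n_neighbors=7):
    """Collect neighboring index pairs with equal labels (S_ML) and different labels (S_CL)."""
    D = cdist(X, X)
    knn = np.argsort(D, axis=1)[:, 1:n_neighbors + 1]
    S_ML, S_CL = [], []
    for i in range(X.shape[0]):
        for j in knn[i]:
            (S_ML if labels[i] == labels[j] else S_CL).append((i, int(j)))
    return S_ML, S_CL
```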

3 Semi-supervised discriminant Isomap

The supervised versions of Isomap usually perform better than the original one, but the improvement largely depends on whether a sufficient number of labeled samples is available. According to the idea of semi-supervised learning, discriminative information hidden in the unlabeled samples can complement the prior knowledge. In view of this, a semi-supervised discriminant Isomap (SSD-Isomap) is presented. A “good” projection should be one in which two data points are close to each other in the new subspace if and only if they have the same label or are neighbors in the original feature space.

3.1 Pairwise-constrained graphs

Let \({\mathbf{X}_L}=\{ {\mathbf{x}_1}, \ldots ,{\mathbf{x}_L}\} \in {{\varvec{R}}^{M \times L}}\) and \({\mathbf{X}_U}=\{ {\mathbf{x}_{L+1}}, \ldots ,{\mathbf{x}_{L+U}}\} \in {{\varvec{R}}^{M \times U}}\) be the labeled and unlabeled data sets. A weighted undirected neighborhood graph \(G(V,E)\) is first constructed. The edge weight \(e({\mathbf{x}_i},{\mathbf{x}_j}) \in \{ 0,\,0.5,\,1\}\) indicates three types of connection between neighboring points: disconnection, weak connection and strong connection. Specifically, when the labeled samples \({\mathbf{x}_i}\) and \({\mathbf{x}_j}\) have the same class label, the connection between the two points is strong and \(e({\mathbf{x}_i},{\mathbf{x}_j})=1\); when \({\mathbf{x}_i} \in {{\mathbf{X}}_U}\) is among the K nearest neighbors of \({\mathbf{x}_j}\) or \({\mathbf{x}_j} \in {{\mathbf{X}}_U}\) is among the K nearest neighbors of \({\mathbf{x}_i}\), the connection is weak and \(e({\mathbf{x}_i},{\mathbf{x}_j})=0.5\); otherwise, there is no connection and \(e({\mathbf{x}_i},{\mathbf{x}_j})=0\). Accordingly, a global pairwise-constrained set (must-link, ML) for the labeled samples and a local pairwise-constrained set named likely-link (LL) for the unlabeled samples are defined as follows:

$$S_{{ML}}^{L} = \left\{ {({\mathbf{x}}_{i} ,{\mathbf{x}}_{j} )\left| {e({\mathbf{x}}_{i} ,{\mathbf{x}}_{j} )} \right. = 1,l({\mathbf{x}}_{j} ) = l({\mathbf{x}}_{i} )} \right\},$$
(7)
$$S_{{LL}}^{U} = \left\{ {({\mathbf{x}}_{i} ,{\mathbf{x}}_{j} )\left| {e({\mathbf{x}}_{i} ,{\mathbf{x}}_{j} )} \right. = 0.5,{\mkern 1mu} {\mathbf{x}}_{j} \in KNN({\mathbf{x}}_{i} )} \right\},$$
(8)

where \(l({\mathbf{x}_i})\) is the class label of \({\mathbf{x}_i}\) (\(i=1,2, \ldots ,L\)). Based on these definitions, an ML-constrained graph \({G^L}(V,E)\) for the labeled data and an LL-constrained graph \({G^U}(V,E)\) for the unlabeled data are constructed by keeping the edges with \(e({\mathbf{x}_i},{\mathbf{x}_j})=1\) and \(e({\mathbf{x}_i},{\mathbf{x}_j})=0.5\), respectively.
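Under the assumption that the first L columns of X are the labeled samples, the edge weights described above could be assigned as in the following sketch; the helper name and defaults are illustrative, not the authors' implementation.

```python
import numpy as np
from scipy.spatial.distance import cdist

def build_constraint_graphs(X, labels, L, n_neighbors=10):
    """X: (N, M) samples in rows, the first L of them labeled; labels: (L,) class labels.
    Returns the edge-weight matrix E with entries in {0, 0.5, 1}."""
    N = X.shape[0]
    D = cdist(X, X)
    knn = np.argsort(D, axis=1)[:, 1:n_neighbors + 1]
    E = np.zeros((N, N))
    # ML (Eq. (7)): every pair of labeled samples sharing a class label (strong, weight 1).
    for i in range(L):
        for j in range(L):
            if i != j and labels[i] == labels[j]:
                E[i, j] = 1.0
    # LL (Eq. (8)): an unlabeled x_j lying in KNN(x_i) gives a weak edge (weight 0.5).
    for i in range(N):
        for j in knn[i]:
            if j >= L and E[i, j] == 0:
                E[i, j] = E[j, i] = 0.5
    return E  # G^L keeps the edges with weight 1, G^U those with weight 0.5
```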

3.2 Geodesic distance metrics

Based on the constrained graph \({G^L}(V,E)\), \({d^{ML}}({\mathbf{x}_i},{\mathbf{x}_j})={\left\| {{\mathbf{x}_i} - {\mathbf{x}_j}} \right\|_2}\) is computed for each linked pair \(({\mathbf{x}_i},{\mathbf{x}_j}) \in S_{{ML}}^{L}\). The points of a pair \(({\mathbf{x}_i},{\mathbf{x}_j}) \in S_{{LL}}^{U}\) are likely to be connected, but they may come from different classes, in which case no edge should be placed between them. In view of this, a compromise factor \(\gamma\) is applied to reset their distance according to the graph \({G^U}(V,E)\):

$${d^{LL}}({\mathbf{x}_i},{\mathbf{x}_j})=\frac{{{\left\| {{\mathbf{x}_i} - {\mathbf{x}_j}} \right\|}_2}}{\gamma },$$
(9)

where \(\gamma \in (0,\;1)\). By computing \({d^{ML}}\) and \({d^{LL}}\) for all pairs \(({\mathbf{x}_i},{\mathbf{x}_j})\) (\(i,\,j=1,2, \ldots ,L+U\)), the distance matrix \({D_{SS}}\) is constructed. The geodesic distances between all pairs of points are then estimated by computing the shortest path distances \({d_{SSD}}({{\mathbf{x}}_i},{{\mathbf{x}}_j})\). As in Isomap, the shortest path between each pair of points is computed by Floyd’s algorithm: initialize \({d_{SSD}}({\mathbf{x}_i},{\mathbf{x}_j})={d_{SS}}({\mathbf{x}_i},{\mathbf{x}_j})\) for all pairs of points \({\mathbf{x}_i}\) and \({\mathbf{x}_j}\); then, for each value of \(k=1,2, \ldots ,L+U\) in turn, replace every entry \({d_{SSD}}({{\mathbf{x}}_i},{{\mathbf{x}}_j})\) by \(\hbox{min} \{ {d_{SSD}}({\mathbf{x}_i},{\mathbf{x}_j}),{d_{SSD}}({\mathbf{x}_i},{\mathbf{x}_k})+{d_{SSD}}({\mathbf{x}_k},{\mathbf{x}_j})\}\). Finally, the matrix \({D_{SSD}}\) contains the shortest path distances between all pairs of points.
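Continuing the sketch from Sect. 3.1, the distance matrix \({D_{SS}}\) and the shortest-path matrix \({D_{SSD}}\) could be computed as follows; names and defaults are assumptions rather than the reference implementation.

```python
import numpy as np
from scipy.spatial.distance import cdist

def geodesic_distances(X, E, gamma=0.1):
    """E: edge-weight matrix from build_constraint_graphs; returns the matrix D_SSD."""
    N = X.shape[0]
    D = cdist(X, X)                       # Euclidean distances
    D_SS = np.full((N, N), np.inf)        # unconnected pairs start at infinity
    np.fill_diagonal(D_SS, 0.0)
    D_SS[E == 1.0] = D[E == 1.0]          # d^ML: plain Euclidean distance
    D_SS[E == 0.5] = D[E == 0.5] / gamma  # d^LL: Eq. (9), enlarged since gamma < 1
    # Floyd's algorithm: relax every pair through each intermediate point k.
    D_SSD = D_SS.copy()
    for k in range(N):
        D_SSD = np.minimum(D_SSD, D_SSD[:, [k]] + D_SSD[[k], :])
    return D_SSD
```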

3.3 Objective function

Like Isomap and S-Isomap, SSD-Isomap seeks the projection that preserves the original manifold structure between all pairs of points in a lower-dimensional space. The objective function is defined as:

$${J_{SSD}}=\mathop {\hbox{min} }\limits_{\mathbf{Y}} \sum\limits_{\begin{subarray}{l} ({\mathbf{x}_i},{\mathbf{x}_j}) \in S_{{ML}}^{L} \\ \text{or}\;({\mathbf{x}_i},{\mathbf{x}_j}) \in S_{{LL}}^{U} \end{subarray} } {{{(d({\mathbf{y}_i},{\mathbf{y}_j}) - {d_{SSD}}({\mathbf{x}_i},{\mathbf{x}_j}))}^2}} ,$$
(10)

where \({\mathbf{Y}}=[{{\mathbf{y}}_1},{{\mathbf{y}}_2}, \ldots ,{{\mathbf{y}}_N}] \in {R^{m \times (L+U)}}\) is the matrix of lower-dimensional representations of \(\mathbf{X}\). Let \({\mathbf{R}^{SSD}}= - \mathbf{H}{\mathbf{Q}^{SSD}}\mathbf{H}/2\) with \(\mathbf{Q}_{{ij}}^{{SSD}}=d_{SSD}^{2}({\mathbf{x}_i},{\mathbf{x}_j})\); then Eq. (10) can be rewritten as follows:

$${J_{SSD}}(\mathbf{Y})=\mathop {\hbox{min} }\limits_{\mathbf{Y}} {\left\| {{\mathbf{Y}^T}\mathbf{Y} - {\mathbf{R}^{SSD}}} \right\|_F},$$
(11)

where \({\left\| {\, \cdot \,} \right\|_F}\) is the Frobenius matrix norm. The final embedding Y can be obtained by classic MDS. The whole training procedure of SSD-Isomap is listed in Table 1.

Table 1 SSD-Isomap
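A minimal end-to-end sketch of the procedure summarized in Table 1 is given below; it reuses the illustrative helpers from Sects. 3.1 and 3.2 and, like them, is an assumption-laden illustration rather than the reference implementation.

```python
import numpy as np

def ssd_isomap(X, labels, L, n_components=2, n_neighbors=10, gamma=0.1):
    """X: (N, M) samples in rows, the first L of them labeled; returns the (N, m) embedding Y.
    Assumes the graph induced by the ML/LL edges is connected."""
    N = X.shape[0]
    E = build_constraint_graphs(X, labels, L, n_neighbors)  # Sect. 3.1
    Q = geodesic_distances(X, E, gamma) ** 2                # Sect. 3.2, entries of Q^SSD
    H = np.eye(N) - np.ones((N, N)) / N                     # centering matrix
    R = -H @ Q @ H / 2                                      # R^SSD in Eq. (11)
    w, V = np.linalg.eigh(R)
    top = np.argsort(w)[::-1][:n_components]
    return V[:, top] * np.sqrt(np.maximum(w[top], 0))       # classic MDS embedding
```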

As one of the Isomap family of methods, SSD-Isomap uses the ML and LL constraints to guide the computation of the geodesic distance metrics. The computational complexity is \({\rm O}({N^3})\) when Floyd’s algorithm is used, and it can be reduced to \({\rm O}(k{N^2}\log N)\) when Dijkstra’s algorithm is used, where N and k are the sample size and neighborhood size, respectively. Based on the shortest paths between all pairs of samples, eigen-decomposition is applied to obtain the lower-dimensional embedding, whose time complexity is \({\rm O}({N^3})\). In fact, computational complexity is a bottleneck for all Isomap-type methods when they are applied to large data sets. However, landmark Isomap [10] presents an effective solution to this problem, reducing the computational cost to \({\rm O}(knN\log N)\) for the shortest-path calculation and \({\rm O}({n^2}N)\) for the MDS eigenvalue calculation, where n is the number of landmark points and \(n \ll N\). SSD-Isomap can therefore handle large-scale data sets by adopting this scheme.

4 Experiments

In this section, extensive experiments are carried out to evaluate the performance of the proposed SSD-Isomap. Its visualization, image retrieval and classification performance is compared with that of several state-of-the-art DR algorithms, including the unsupervised Isomap, MDS, Laplacian eigenmaps (LE), LLE and t-SNE, the supervised LDA, S-Isomap and M-Isomap, and the semi-supervised SDA, SSDR and MVSSDR. In this study, we test a synthetic data set, six benchmark image data sets, namely Corel [27,28,29], UC Merced LULC [30], Caltech101 [31], YALE [32, 33], ORL [34] and MNIST [35], and six UCI data sets [36]. All of the experiments were run in MATLAB on an Intel(R) Core(TM) i5-3470 CPU at 3.2 GHz with 12 GB RAM.

4.1 Visualization

An artificial data set, the Swiss roll, is used in this section. This 3-D data set with 1600 samples has 4 classes and is shown in Fig. 1. In the experiment, half of the data is randomly selected as the training (labeled) samples, and the rest is treated as test (unlabeled) samples. In Fig. 1, the samples of the four classes are marked by different colors. Labeled samples are denoted by circles, and the unlabeled samples of the four classes are denoted by stars, plus signs, squares and diamonds, respectively. For the unsupervised DR methods, all 1600 samples are used to obtain the lower-dimensional representations. For the supervised methods, the lower-dimensional embedding is computed only from the training samples, and the map, approximated by a BP neural network, is applied to all samples. For the semi-supervised methods, the map is established based on both labeled and unlabeled samples.
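As an illustration of this out-of-sample step, the sketch below fits a small feed-forward network to reproduce the learned embedding and then applies it to unseen points; scikit-learn's MLPRegressor and the chosen hyperparameters are stand-ins for the BP network used in the paper.

```python
from sklearn.neural_network import MLPRegressor

def fit_out_of_sample_map(X_train, Y_train, hidden=(64,)):
    """X_train: (N, M) high-dimensional inputs; Y_train: (N, m) embedding learned on them."""
    net = MLPRegressor(hidden_layer_sizes=hidden, max_iter=2000, random_state=0)
    net.fit(X_train, Y_train)   # learn the map R^M -> R^m
    return net                  # net.predict(X_new) embeds unseen samples
```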

Fig. 1 Data set of Swiss roll

To get the best performance of each algorithm, its parameters are carefully tuned. The number of neighbors K is set to 30 for S-Isomap, 7 for Isomap and LLE, and 150 for the proposed SSD-Isomap. We follow the same settings as [12] for S-Isomap: the parameter \(\alpha\) is set to 0.5, and the parameter \(\beta\) is set to the average Euclidean distance between all pairs of data points. In our algorithm, the parameter \(\gamma\) is determined by fivefold cross-validation and is set to 0.1. The visualization results are shown in Fig. 2.

Fig. 2 The 2D embedding obtained by different methods on the Swiss roll data set

It can be observed that the unsupervised methods, including Isomap, MDS, LE and LLE, cannot separate the four classes clearly. t-SNE achieves a clear separation, but the green and blue clusters are wrongly divided into two subparts. Among the supervised and semi-supervised methods, LDA, M-Isomap, SDA, SSDR and MVSSDR also fail to achieve separation. The clusters in the S-Isomap embedding space are generally separable, but there are still overlaps between the red and blue clusters and between the green and black clusters. Compared with the other methods, the proposed SSD-Isomap provides a better separation of the clusters.

4.2 Image retrieval

Three data sets, Caltech101, Corel1000 and UC Merced LULC, are used in this section. The Caltech101 data set, which has been widely adopted for object recognition and image retrieval tasks, has 101 categories with 40–800 images per category; the size of each image is roughly 300 × 200 pixels. The Corel1000 data set is a subset of the real-world photos in the COREL Photo Gallery; it has 10 categories with 100 images per category. The UC Merced LULC data set, obtained from aerial imagery, consists of images from 20 classes with a pixel resolution of 30 cm; each class contains 100 images of size 256 × 256 pixels. We extract basic color features and wavelet texture features to describe the images [37, 38]. The features include a color histogram (32 dimensions), color moments (64 dimensions), a color auto-correlogram (6 dimensions), wavelet moments (40 dimensions) and a Gabor transform with 4 scales and 6 orientations (48 dimensions). All the features are concatenated into a single vector, so each image is represented by a 190-dimensional feature vector, which is normalized to zero mean and unit variance.
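Assuming the individual feature blocks have already been extracted, assembling and normalizing the 190-dimensional descriptor could look like the following sketch; all names are hypothetical.

```python
import numpy as np

def assemble_features(hist32, moments64, correlogram6, wavelet40, gabor48):
    """Each argument: an (N, d) block of precomputed features for N images."""
    F = np.hstack([hist32, moments64, correlogram6, wavelet40, gabor48])  # (N, 190)
    return (F - F.mean(axis=0)) / (F.std(axis=0) + 1e-12)                 # zero mean, unit variance
```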

In our experiment on Caltech101, we use 10 of the 101 categories, and the images of each class are randomly split into a training set of 80 images and a test set of 20 images. For the Corel1000 and UC Merced LULC data sets, the image samples are evenly divided into training and test sets. Within the training set, the samples are further split equally into labeled and unlabeled subsets, and only the samples belonging to the labeled subset are used by the supervised DR methods. For the other methods, all samples in the training set are used to obtain the lower-dimensional projection. For the nonlinear DR methods, such as Isomap, S-Isomap, M-Isomap and SSD-Isomap, a BP neural network is constructed from the training set to approximate the mapping from the high-dimensional space to the low-dimensional one. In the reduced space, the L2 distance metric is used to measure the similarity between a query image from the test set and the labeled images. Precision is used as the quantitative index for performance evaluation.
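The retrieval protocol can be summarized by the following sketch, which ranks the labeled images by L2 distance in the reduced space and measures precision among the top k results; names are illustrative.

```python
import numpy as np
from scipy.spatial.distance import cdist

def precision_at_k(Y_query, Y_db, labels_query, labels_db, k=10):
    """Embeddings of the test queries and of the labeled database images, with their labels."""
    D = cdist(Y_query, Y_db)                 # L2 distances in the reduced space
    top = np.argsort(D, axis=1)[:, :k]       # indices of the k nearest database images
    hits = labels_db[top] == labels_query[:, None]
    return hits.mean()                       # precision averaged over all queries
```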

The dimension of the data is reduced to the number of classes of each data set for all DR methods except LDA. The parameters of each method are carefully tuned to obtain the best performance. Table 2 compares the precision of the top 5 to 25 retrieved images (in steps of 5), together with the average precisions. On each data set, the best performance is highlighted in bold. As we can see, SSD-Isomap obtains the highest precisions on the three data sets. According to the average performance, the 12 methods can be ranked as: (1) SSD-Isomap, SDA, S-Isomap, SSDR, LDA, MVSSDR, LE, MDS, Isomap, t-SNE, M-Isomap and LLE on Corel1000; (2) SSD-Isomap, LDA, SDA, SSDR, S-Isomap, MVSSDR, t-SNE, LE, MDS, Isomap, LLE and M-Isomap on LULC; (3) SSD-Isomap, S-Isomap, LDA, SDA, SSDR, M-Isomap, MVSSDR, MDS, LE, Isomap, t-SNE and LLE on Caltech101.

Table 2 Retrieval precisions of different methods

To compare the effect of the number of dimensions on retrieval performance, Fig. 3 presents the average precisions of different methods when the number of dimensions changes from 5 to 40 on the Caltech101 data set. It is obvious that the supervised and semi-supervised methods perform better than the unsupervised ones. SSD-Isomap generally achieves the highest precisions except when the dimension is reduced to 5 or 30. SDA slightly outperforms S-Isomap, and SSDR ranks between S-Isomap and M-Isomap. In the figure, many methods achieve their best performance when the number of dimensions is 10. As is known, high-dimensional data can often be represented efficiently in a space of much lower dimension without losing much information. The number of reduced dimensions is a key parameter for DR: if the dimension is too small, important features are projected onto the same dimension, and if it is too large, the projections become noisy. The performances of the different methods in Fig. 3 exhibit this phenomenon: as the number of dimensions grows, the retrieval precision of each method reaches a maximum in a ten-dimensional space and then decreases. Hence, empirically, the intrinsic dimension of the Caltech101 data set may be around 10. How to estimate the intrinsic dimension is still an open issue and beyond the scope of this paper, but Fig. 3 shows that the intrinsic dimension is significant for DR methods.

Fig. 3 Average precision comparison when the number of dimensions changes on the Caltech101 data set

4.3 Classification

In the classification experiments, three image data sets, MNIST, YALE and ORL, and six UCI data sets are used for performance comparison. The MNIST data set has 70,000 handwritten digit images of 28 × 28 pixels; each image is denoted by a 784-dimensional vector. The YALE face data set contains 165 images of 15 individuals; these face images are resized to 32 × 32 pixels with 256 gray levels, and each image is represented by a 1024-dimensional vector. The ORL data set contains 400 images of 40 persons, also resized to 32 × 32 pixels, so each image is likewise denoted by a 1024-dimensional vector. Table 3 gives the details of the six UCI data sets.

Table 3 Six UCI data sets

The same procedure as in Sect. 4.2 is adopted to generate the training and test sets for the MNIST, YALE and ORL data sets. For the UCI data sets, the two sets are obtained by fivefold cross-validation. A BP neural network is used to approximate the maps of the nonlinear DR methods. The final class labels of the test data are determined by a KNN classifier. The dimension of the data is reduced to the number of classes of each data set for all DR methods except LDA.
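A sketch of this classification protocol is given below, chaining the illustrative helpers from the previous sections with a KNN classifier; it is only an outline of the pipeline, not the exact experimental code.

```python
from sklearn.neighbors import KNeighborsClassifier

def classify(X_train, y_labeled, X_test, L, n_classes, n_neighbors=10, gamma=0.1):
    """X_train: training samples (first L labeled, labels in y_labeled); X_test: test samples."""
    Y_train = ssd_isomap(X_train, y_labeled, L, n_components=n_classes,
                         n_neighbors=n_neighbors, gamma=gamma)
    net = fit_out_of_sample_map(X_train, Y_train)                 # BP-style out-of-sample map
    Y_test = net.predict(X_test)
    knn = KNeighborsClassifier().fit(Y_train[:L], y_labeled)      # KNN on the labeled embeddings
    return knn.predict(Y_test)
```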

Table 4 presents the classification accuracies of the 12 methods on the nine data sets. It is interesting to find that LDA performs better than SSD-Isomap on the ORL and wine data sets; the manifold methods generally seem to perform worse on these two data sets. For the ORL data set, SSD-Isomap is inferior to LDA but comparable with SDA and SSDR. For the wine data set, LDA, SDA and SSDR outperform SSD-Isomap, but SSD-Isomap performs best among the manifold methods. In general, SSD-Isomap shows the best or second-best performance on most data sets except wine, and achieves the highest average accuracy. Figure 4 shows the classification accuracy obtained by SSD-Isomap with different values of \(\gamma\) on the wine data set. It can be seen that better performance is achieved when \(\gamma\) ranges from 0.2 to 0.6.

Table 4 Classification accuracy on the MNIST, YALE, ORL and UCI
Fig. 4 Classification accuracy of SSD-Isomap with different values of \(\gamma\) on the wine data set

5 Conclusions

A novel nonlinear DR method, SSD-Isomap, is presented in this paper. In SSD-Isomap, the original unsupervised Isomap is extended to the semi-supervised learning paradigm. We use a new LL constraint derived from the unlabeled data to depict the local structure of data points, and the popular ML constraint derived from the labeled data to represent the class information. Based on the two constraints, graphs are constructed and the geodesic distance matrix is initialized. The matrix is then processed by a procedure similar to that of Isomap, and the low-dimensional embedding is obtained. SSD-Isomap not only exploits useful information from both labeled and unlabeled data, but also makes use of the intrinsic nonlinear structure of the data. Therefore, it can achieve a discriminative lower-dimensional mapping. Extensive experimental comparisons between SSD-Isomap and other state-of-the-art methods have demonstrated that SSD-Isomap is more robust and effective in visualization, image retrieval and classification.