
1 LLE Principle

A given data set \( X = \{ x_{1} ,x_{2} , \ldots ,x_{N} \} \) consists of N real-valued vectors of dimension D, sampled from an underlying smooth manifold. LLE requires sufficiently dense sampling, so that each data point together with its neighbors lies on or close to a locally linear patch of the manifold. Reconstructing each data point from its set of neighbors yields linear coefficients that describe the local linear geometry of the manifold [1]. The simplest way to select the K neighbors of each data point is by Euclidean distance. The total reconstruction error is measured by the cost function \( \varepsilon (W) = \sum\limits_{i = 1}^{N} {||x_{i} - \sum\limits_{j = 1}^{N} {W_{ij} x_{j} ||}^{2} } \). The weight matrix W is an \( N \times N \) matrix whose entry \( W_{ij} \) represents the contribution of the j-th data point to the reconstruction of the i-th data point [2]. The weights are computed by minimizing the cost function subject to the following two constraints:

  1.

    Each data point may only be reconstructed from its neighbors: if the j-th data point is not a neighbor of the i-th data point, then \( W_{ij} = 0 \);

  2.

    The rows of the weight matrix sum to one, i.e. \( \sum\limits_{j} W_{ij} = 1 \). The optimal weights satisfying these two constraints can be obtained by solving a constrained least squares problem [3], as sketched below.
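As a concrete illustration of this constrained least squares step, the following Python sketch computes the reconstruction weights of a single point from its K neighbors. The function name and the small regularization term (useful when K > D and the local Gram matrix is singular) are our own additions, not part of the original description.

```python
import numpy as np

def lle_weights(x, nbrs, reg=1e-3):
    """Reconstruction weights of one point x from its K neighbors.

    x    : (D,)   the data point
    nbrs : (K, D) its K nearest neighbors
    reg  : regularization, needed when the K x K Gram matrix is singular
    """
    Z = nbrs - x                                  # shift the point to the origin
    C = Z @ Z.T                                   # local K x K Gram matrix
    C += reg * np.trace(C) * np.eye(len(nbrs))    # regularize for stability
    w = np.linalg.solve(C, np.ones(len(nbrs)))    # solve C w = 1
    return w / w.sum()                            # enforce sum_j W_ij = 1
```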

According to constraint 1, the inner sum of the cost function can be restricted to each point's K neighbors: \( \varepsilon (W) = \sum\limits_{i = 1}^{N} {||x_{i} - \sum\limits_{j = 1}^{K} {W_{ij} x_{j} ||}^{2} }.\) LLE then uses this neighborhood structure to preserve local geometry when mapping each high-dimensional observation vector \( x_{i} \) to a low-dimensional vector \( y_{i} \) on the manifold. In the mapping step, the d-dimensional coordinates \( y_{i} \) are chosen while \( W_{ij} \) is held fixed, and the \( y_{i} \) are optimized to minimize the embedding cost function \( \phi (Y) = \sum\limits_{i = 1}^{N} {||y_{i} - \sum\limits_{j = 1}^{N} {W_{ij} y_{j} ||}^{2} } \)
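A minimal end-to-end sketch of the two LLE steps, assuming the lle_weights helper above and a dense eigendecomposition (adequate for small N; practical implementations use sparse solvers). Minimizing \( \phi (Y) \) under the standard centering and unit-covariance constraints reduces to taking the bottom eigenvectors of \( M = (I - W)^{T} (I - W) \).

```python
import numpy as np

def lle_embedding(X, K=8, d=2, reg=1e-3):
    """Dense LLE sketch: neighbor search, weights, then embedding."""
    N = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    idx = np.argsort(dist, axis=1)[:, 1:K + 1]     # K nearest, excluding self
    W = np.zeros((N, N))
    for i in range(N):
        W[i, idx[i]] = lle_weights(X[i], X[idx[i]], reg)
    # phi(Y) = sum_i ||y_i - sum_j W_ij y_j||^2 is minimized by the
    # bottom eigenvectors of M = (I - W)^T (I - W)
    I = np.eye(N)
    vals, vecs = np.linalg.eigh((I - W).T @ (I - W))   # ascending eigenvalues
    return vecs[:, 1:d + 1]              # drop the constant eigenvector
```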

2 ISOMAP Principle

The key step of ISOMAP is calculating the geodesic distance between sample points. In ISOMAP, the geodesic distance is approximated as follows: the geodesic distance between a sample point \( x_{i} \) and a point inside its neighborhood is replaced by the Euclidean distance between them, while the geodesic distance between \( x_{i} \) and a point outside its neighborhood is replaced by the length of the shortest path between them along the manifold [4]. The procedure is as follows:

Select the neighborhood and construct the neighborhood graph G. First calculate the Euclidean distance between each sample point \( x_{i} \) and all other sample points. When \( x_{j} \) is one of the K points nearest to \( x_{i} \), the two points are considered adjacent, that is, graph G contains the edge \( x_{i} x_{j} \) (such a neighborhood is called a \( K \)-neighborhood). Alternatively, when the Euclidean distance \( d(x_{i} ,x_{j} ) \) between \( x_{i} \) and \( x_{j} \) is less than a fixed value \( \varepsilon \), graph G is considered to contain the edge \( x_{i} x_{j} \) (such a neighborhood is called an \( \varepsilon \)-neighborhood). The weight of edge \( x_{i} x_{j} \) is set to \( d(x_{i} ,x_{j} ) \).
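A sketch of this graph-construction step in Python, using the K-neighborhood variant; representing missing edges by infinity and symmetrizing the graph (which the K rule alone does not guarantee) are implementation choices of ours.

```python
import numpy as np

def neighborhood_graph(X, K=8):
    """K-neighborhood graph: edge weights d(x_i, x_j), np.inf where no edge."""
    N = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # Euclidean
    G = np.full((N, N), np.inf)
    np.fill_diagonal(G, 0.0)
    nearest = np.argsort(D, axis=1)[:, 1:K + 1]    # skip self at column 0
    for i in range(N):
        G[i, nearest[i]] = D[i, nearest[i]]
        G[nearest[i], i] = D[nearest[i], i]        # keep the graph symmetric
    return G
```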

Calculate the shortest paths. If graph G contains the edge \( x_{i} x_{j} \), initialize the shortest path as \( d_{G} (x_{i} ,x_{j} ) = d(x_{i} ,x_{j} ) \); otherwise set \( d_{G} (x_{i} ,x_{j} ) = \infty \). Then, for \( l = 1, \ldots ,N \), update \( d_{G} (x_{i} ,x_{j} ) = \hbox{min} \{ d_{G} (x_{i} ,x_{j} ),d_{G} (x_{i} ,x_{l} ) + d_{G} (x_{l} ,x_{j} )\} \). The resulting distance matrix \( D_{G} = [d_{G}^{2} (x_{i} ,x_{j} )]_{i,j = 1}^{N} \) consists of the squares of the shortest-path distances between the sample points of graph G [5].
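The update rule above is Floyd's algorithm; a vectorized sketch operating on the graph matrix from the previous snippet follows. Its O(N^3) cost is one source of the long ISOMAP running times seen in Sect. 3.

```python
import numpy as np

def shortest_paths(G):
    """Floyd's algorithm on a dense graph matrix with np.inf for non-edges."""
    D_G = G.copy()
    for l in range(len(D_G)):
        # d(i,j) = min(d(i,j), d(i,l) + d(l,j)) for all pairs (i, j) at once
        D_G = np.minimum(D_G, D_G[:, l:l + 1] + D_G[l:l + 1, :])
    return D_G
```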

Calculate the d-dimensional embedding. Apply MDS to the distance matrix \( D_{G}. \) Let

$$ B = - (I - 1_{N} 1_{N}^{T} /N)D_{G} (I - 1_{N} 1_{N}^{T} /N)/2 $$
(99.1)

Then the d largest eigenvalues \( \lambda_{1} , \ldots ,\lambda_{d} \) of B and the corresponding eigenvectors \( u_{1} , \ldots ,u_{d} \) form the matrix \( U = [u_{1} , \ldots ,u_{d} ] \), so that \( Y = diag(\lambda_{1}^{1/2} , \ldots \lambda_{d}^{1/2} )U^{T} \) is the d-dimensional embedding.
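A sketch of this final MDS step, assuming the shortest-path matrix from the previous snippet. Note that the paper's \( D_{G} \) in Eq. (99.1) holds squared shortest-path distances, so the distances are squared inside the function.

```python
import numpy as np

def mds_embedding(dist, d=2):
    """Classical MDS on shortest-path distances, following Eq. (99.1)."""
    N = len(dist)
    D_G = dist ** 2                         # Eq. (99.1) uses squared paths
    H = np.eye(N) - np.ones((N, N)) / N     # centering matrix I - 1 1^T / N
    B = -0.5 * H @ D_G @ H                  # double centering, Eq. (99.1)
    vals, vecs = np.linalg.eigh(B)          # ascending eigenvalues
    vals, vecs = vals[::-1][:d], vecs[:, ::-1][:, :d]    # d largest
    return (vecs * np.sqrt(np.maximum(vals, 0))).T  # Y = diag(l^1/2) U^T
```

Chaining the three snippets gives the full pipeline, e.g. `Y = mds_embedding(shortest_paths(neighborhood_graph(X, K=8)), d=2)`, where the columns of Y are the embedded points.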

3 Efficiency Comparison of LLE and ISOMAP

MATLAB is highly efficient at matrix computation and programming, so MATLAB 7.1 is used to run the calculations.

3.1 Effectiveness and Efficiency of LLE Dimensionality Reduction

3.1.1 Swiss Roll Data Sets

The number of data points is N = 800; the target dimension after dimensionality reduction is two, i.e. d = 2; the neighborhood graph is constructed with K nearest neighbors, K = 8 and 12.

It can be seen from Table 99.1 that different numbers of neighbor points lead to different relations among the data points in the low-dimensional space after dimensionality reduction, and that the more neighbor points there are, the longer the computing time is.

Table 99.1 Dimensionality reduction running times of the Swiss roll data set (different numbers of neighbors)

The number of data points is N = 800, 1,600 and 3,200; the target dimension after dimensionality reduction is two, i.e. d = 2; the neighborhood graph is constructed with K = 8 nearest neighbors.

In Table 99.2, the time for processing 1,600 data points is 3.601 times that for 800 data points, and the time for 3,200 data points is 3.807 times that for 800 data points.

Table 99.2 Dimensionality reduction running times of the Swiss roll data set (different sizes of data sets)

3.1.2 Twin Peaks Data Sets

The number of data points is N = 800; the target dimension after dimensionality reduction is two, i.e. d = 2; the neighborhood graph is constructed with K nearest neighbors, K = 8 and 12.

The number of data points is N = 800, 1,600 and 3,200; the target dimension after dimensionality reduction is two, i.e. d = 2; the neighborhood graph is constructed with K = 8 nearest neighbors.

Comparing Table 99.3 with Table 99.1, and Table 99.4 with Table 99.2, shows that under the same conditions the processing efficiency of LLE is similar across different data sets.

Table 99.3 Dimensionality reduction running times of the twin peaks data set (different numbers of neighbors)
Table 99.4 Dimensionality reduction running times of the twin peaks data set (different sizes of data sets)
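These timing experiments cannot be reproduced exactly here (the paper's MATLAB 7.1 code and hardware are not available), but a rough modern equivalent using scikit-learn's built-in implementations and data generator would look like the following sketch; absolute times will of course differ from those in Tables 99.1, 99.2, 99.3 and 99.4.

```python
import time
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap, LocallyLinearEmbedding

for n in (800, 1600, 3200):
    X, _ = make_swiss_roll(n_samples=n)            # 3-D test data
    for name, model in [("LLE", LocallyLinearEmbedding(n_neighbors=8, n_components=2)),
                        ("ISOMAP", Isomap(n_neighbors=8, n_components=2))]:
        t0 = time.perf_counter()
        model.fit_transform(X)                     # 3-D -> 2-D embedding
        print(f"{name}, N = {n}: {time.perf_counter() - t0:.3f} s")
```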

3.2 Effectiveness and Efficiency of ISOMAP Dimensionality Reduction

3.2.1 Swiss Roll Data Sets

The LLE efficiency experiments showed that repeated runs are unnecessary: the times are almost identical, differing by no more than one second. The ISOMAP efficiency experiments therefore run each condition only once.

In Fig. 99.1a the computing time is 55.819 s, and in Fig. 99.1c it is 455.437 s. It is clear that the computing time of ISOMAP is rather long and its efficiency is low.

Fig. 99.1 Dimensionality reduction results for the Swiss roll data set

4 Conclusion

The Swiss roll data set is used as test data, with 800 samples and K = 12 neighbors. LLE and ISOMAP reduce the data from three dimensions to two with processing times of 0.8952 s and 55.819 s respectively. When the number of samples increases to 1,600, the processing times are 3.1033 s and 455.437 s respectively. We conclude that LLE is considerably faster than ISOMAP.