
1 Introduction

Dimensionality reduction is one of the most useful tools for data analysis in data mining. Many dimensionality reduction algorithms extract their best features only when their parameters are carefully tuned. However, tuning parameters during dimensionality reduction significantly increases the time cost. Cloud computing [6, 7], with its supercomputing power, makes it possible to extract features more efficiently while tuning parameters.

The most popular dimensionality reduction algorithms include locally linear embedding (LLE) [1], ISOMAP [2], and Laplacian eigenmap (LE) [3]. These algorithms only provide the embedding results for the training samples. Many extensions attempt to solve the out-of-sample problem, such as locality preserving projections (LPP) [4, 5]. These algorithms preserve local information by constructing an adjacency graph, but they do not work well for classification because they are unsupervised.

Many supervised algorithms have been proposed to overcome the aforementioned drawbacks. Linear discriminant analysis (LDA) was proposed in [8,9,10]; Yan et al. proposed marginal Fisher analysis (MFA) [11]; Zhang et al. proposed discriminant neighborhood embedding (DNE) [12]; and Ding et al. proposed similarity-balanced discriminant neighborhood embedding (SBDNE) [14] and double adjacency graph-based discriminant neighborhood embedding (DAG-DNE) [13]. However, these algorithms may not account for the different degrees of importance of intra-class and inter-class information, which matters when learning the projection matrix.

Inspired by recent progress, in this study, we propose a novel supervised discriminant subspace learning algorithm called locality balanced double adjacency graphs-based discriminant neighbor embedding (LBDAG-DNE). Following DAG-DNE, LBDAG-DNE constructs two adjacency graphs, which link every sample to its homogeneous and heterogeneous neighbors, to preserve the intra-class and inter-class information, respectively. In addition, LBDAG-DNE introduces a parameter that balances the intra-class and inter-class information according to the requirements of the situation. Thus, LBDAG-DNE maintains the balance between intra-class and inter-class information and finds an optimal projection matrix. Experimental results validate the effectiveness of LBDAG-DNE in comparison with several related state-of-the-art methods.

The rest of this paper is structured as follows. In Sect. 2, we provide a summary of the classic algorithms. Our LBDAG-DNE algorithm is introduced in Sect. 3. The experimental results are presented in Sect. 4. Finally, we provide the concluding remarks in Sect. 5.

2 Related Work

Over the past few years, dimensionality reduction techniques have received much attention, and correspondingly, many algorithms have been proposed [11,12,13, 15]. We will briefly introduce some of the classic algorithms in this section.

Yan et al. [11] proposed MFA in 2005. By constructing two adjacency graphs, MFA finds a projection matrix that simultaneously minimizes the intra-class scatter and maximizes the inter-class scatter. However, it cannot determine the optimal discriminant subspace.

Soon after this, Zhang et al. [12] proposed DNE. It maintains the local structure and distinguishes homogeneous and heterogeneous neighbors by constructing an adjacency graph, which can determine the optimal discriminant subspace. However, DNE does not construct a link between each point and its heterogeneous neighbors when constructing the adjacency graph.

Recently, Ding et al. [13] proposed DAG-DNE, which can effectively solve the problem of DNE and LDNE, with each sample respectively linked to its homogeneous and heterogeneous neighbors by constructing double adjacency graphs. However, DAG-DNE simply considers intra-class information and inter-class information to have the same degree of importance. In actuality, they play different roles in the classification task.

In summary, the above algorithms tend to treat intra-class and inter-class information as equally important, even though the two play different roles in the classification task. Thus, when the data are projected into a low-dimensional space, some of the more important discriminative information may be missed.

3 Our Proposed LBDAG-DNE

3.1 LBDAG-DNE

Let \( \{ ({\mathbf{x}}_{i} ,y_{i} )\}_{i = 1}^{N} \) be a set of training samples, where \( {\mathbf{x}}_{i} \in R^{d} \) and \( y_{i} \in \{ 1,2, \ldots ,C\} \). LBDAG-DNE aims to find a projection matrix \( {\mathbf{P}} \) that projects the data from the high-dimensional space into a low-dimensional space via \( {\mathbf{v}}_{i} = {\mathbf{P}}^{T} {\mathbf{x}}_{i} \), so that neighbors belonging to the same class become compact while neighbors belonging to different classes become separable.

Similar to DAG-DNE, LBDAG-DNE requires the construction of two adjacency graphs. Let \( {\mathbf{F}}^{w} \) and \( {\mathbf{F}}^{b} \) be the intra-class and inter-class adjacency matrices, respectively. For a sample \( {\mathbf{x}}_{i} \), \( NH_{k}^{w} ({\mathbf{x}}_{i} ) \) and \( NH_{k}^{b} ({\mathbf{x}}_{i} ) \) denote its \( K \) homogeneous and heterogeneous neighbors, respectively.

The intra-class adjacency matrix \( {\mathbf{F}}^{w} \) is defined as

$$ F_{ij}^{w} = \left\{ {\begin{array}{*{20}l} { + 1,} \hfill & {{\mathbf{x}}_{i} \in NH_{k}^{w} ({\mathbf{x}}_{j} )\;or\;{\mathbf{x}}_{j} \in NH_{k}^{w} ({\mathbf{x}}_{i} )} \hfill \\ {0,} \hfill & {otherwise} \hfill \\ \end{array} } \right. $$
(1)

and the inter-class adjacency matrix \( {\mathbf{F}}^{b} \) is

$$ F_{ij}^{b} = \left\{ {\begin{array}{*{20}l} { + 1,} \hfill & {{\mathbf{x}}_{i} \in NH_{k}^{b} ({\mathbf{x}}_{j} )\;or\;{\mathbf{x}}_{j} \in NH_{k}^{b} ({\mathbf{x}}_{i} )} \hfill \\ {0,} \hfill & {otherwise} \hfill \\ \end{array} } \right. $$
(2)
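
To make the graph construction concrete, the following minimal NumPy sketch (ours, not from the paper; all function and variable names are illustrative) builds the two symmetric adjacency matrices from a data matrix with one sample per column and a label vector, assuming Euclidean distances and a neighborhood size \( K \).

```python
import numpy as np

def build_adjacency_graphs(X, y, K):
    """Construct the intra-class (Fw) and inter-class (Fb) adjacency matrices.

    X : (d, N) array with one sample per column.
    y : (N,) array of class labels.
    K : number of homogeneous / heterogeneous nearest neighbors.
    """
    N = X.shape[1]
    # Pairwise squared Euclidean distances between samples.
    sq = np.sum(X ** 2, axis=0)
    dist = sq[:, None] + sq[None, :] - 2.0 * X.T @ X
    np.fill_diagonal(dist, np.inf)  # a sample is never its own neighbor

    Fw = np.zeros((N, N))
    Fb = np.zeros((N, N))
    for i in range(N):
        same = np.where(y == y[i])[0]
        same = same[same != i]
        diff = np.where(y != y[i])[0]
        hom = same[np.argsort(dist[i, same])[:K]]  # K nearest homogeneous neighbors
        het = diff[np.argsort(dist[i, diff])[:K]]  # K nearest heterogeneous neighbors
        Fw[i, hom] = 1.0
        Fw[hom, i] = 1.0  # symmetric, matching the "or" condition in Eq. (1)
        Fb[i, het] = 1.0
        Fb[het, i] = 1.0  # symmetric, matching the "or" condition in Eq. (2)
    return Fw, Fb
```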

The intra-class scatter is defined as follows:

$$ \begin{aligned} \varPhi ({\mathbf{P}}) = & \,\sum\limits_{i,j} {||{\mathbf{P}}^{T} {\mathbf{x}}_{i} - {\mathbf{P}}^{T} {\mathbf{x}}_{j} ||^{2} } F_{ij}^{w} \\ \, = & \,2tr\{ {\mathbf{P}}^{T} {\mathbf{X}}({\mathbf{D}}^{w} - {\mathbf{F}}^{w} ){\mathbf{X}}^{T} {\mathbf{P}}\} \\ \end{aligned} $$
(3)

where \( {\mathbf{D}}^{w} \) is a diagonal matrix, and its entries are the column sums of \( {\mathbf{F}}^{w} \).

The inter-class scatter is as follows:

$$ \begin{aligned} \varPsi ({\mathbf{P}}) = & \,\sum\limits_{i,j} {||{\mathbf{P}}^{T} {\mathbf{x}}_{i} - {\mathbf{P}}^{T} {\mathbf{x}}_{j} ||^{2} } F_{ij}^{b} \\ \, = & \,2tr\{ {\mathbf{P}}^{T} {\mathbf{X}}({\mathbf{D}}^{b} - {\mathbf{F}}^{b} ){\mathbf{X}}^{T} {\mathbf{P}}\} \\ \end{aligned} $$
(4)

where \( {\mathbf{D}}^{b} \) is a diagonal matrix, and its entries are the column sums of \( {\mathbf{F}}^{b} \).
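
The trace forms in Eqs. (3) and (4) follow from the standard graph Laplacian identity. As a brief sketch for the intra-class case (the inter-class case is identical with \( {\mathbf{F}}^{b} \) and \( {\mathbf{D}}^{b} \)),

$$ \begin{aligned} \sum\limits_{i,j} {||{\mathbf{P}}^{T} {\mathbf{x}}_{i} - {\mathbf{P}}^{T} {\mathbf{x}}_{j} ||^{2} } F_{ij}^{w} = & \,\sum\limits_{i,j} {F_{ij}^{w} tr\{ {\mathbf{P}}^{T} ({\mathbf{x}}_{i} - {\mathbf{x}}_{j} )({\mathbf{x}}_{i} - {\mathbf{x}}_{j} )^{T} {\mathbf{P}}\} } \\ = & \,2tr\{ {\mathbf{P}}^{T} ({\mathbf{XD}}^{w} {\mathbf{X}}^{T} - {\mathbf{XF}}^{w} {\mathbf{X}}^{T} ){\mathbf{P}}\} \\ = & \,2tr\{ {\mathbf{P}}^{T} {\mathbf{X}}({\mathbf{D}}^{w} - {\mathbf{F}}^{w} ){\mathbf{X}}^{T} {\mathbf{P}}\} \\ \end{aligned} $$

where the second step uses the symmetry of \( {\mathbf{F}}^{w} \) and \( D_{ii}^{w} = \sum\nolimits_{j} {F_{ij}^{w} } \).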

The goal is to allow neighbors belonging to the same class be compact, while neighbors belonging to different classes become separable in the subspace. We need to maximize the margin of total inter-class scatter and total intra-class scatter, i.e.,

$$ \Theta ({\mathbf{P}}) =\Psi ({\mathbf{P}}) - \beta\Phi ({\mathbf{P}}) $$
(5)

where \( \beta \in [0,10] \) is a tuning parameter that controls the tradeoff between intra-class information and inter-class information.

LBDAG-DNE seeks the projection matrix \( {\mathbf{P}} \) by solving the following objective function. The derivation and theoretical justification are similar to those of DAG-DNE, and the details can be found in [13].

$$ \left\{ {\begin{array}{*{20}l} {\mathop {\text{max}}\limits_{{\mathbf{P}}} } \hfill & {{\text{tr}}\left\{ {{\mathbf{P}}^{T} {\mathbf{XSX}}^{T} {\mathbf{P}}} \right\}} \hfill \\ {s.t.} \hfill & {{\mathbf{P}}^{T} {\mathbf{P}}\,{ = }\,{\mathbf{I}}} \hfill \\ \end{array} } \right. $$
(6)

where \( {\mathbf{S}} = ({\mathbf{D}}^{b} - {\mathbf{F}}^{b} ) - \beta ({\mathbf{D}}^{w} - {\mathbf{F}}^{w} ) \).

The projection matrix \( {\mathbf{P}} \) can be found by solving the following eigenvalue problem:

$$ {\mathbf{XSX}}^{T} {\mathbf{P}} = \lambda {\mathbf{P}} $$
(7)

Thus, \( {\mathbf{P}} \) is composed of the \( r \) eigenvectors corresponding to the \( r \) largest eigenvalues.

The details for LBDAG-DNE are given in Algorithm 1.

Algorithm 1. LBDAG-DNE
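
Since the original Algorithm 1 figure is not reproduced here, the following is a minimal NumPy sketch of the overall procedure (ours, under the notation of Sect. 3.1; it reuses the build_adjacency_graphs helper sketched after Eq. (2)).

```python
import numpy as np

def lbdag_dne(X, y, K, beta, r):
    """Minimal sketch of LBDAG-DNE following Eqs. (1)-(7).

    X    : (d, N) training matrix, one sample per column.
    y    : (N,) class labels.
    K    : neighborhood size for both adjacency graphs.
    beta : balance parameter between inter-class and intra-class scatter.
    r    : dimensionality of the discriminant subspace.
    Returns the (d, r) projection matrix P.
    """
    Fw, Fb = build_adjacency_graphs(X, y, K)      # adjacency graphs, Eqs. (1)-(2)
    Dw = np.diag(Fw.sum(axis=0))                  # column sums of Fw
    Db = np.diag(Fb.sum(axis=0))                  # column sums of Fb
    S = (Db - Fb) - beta * (Dw - Fw)              # matrix S in Eq. (6)
    M = X @ S @ X.T                               # d x d symmetric matrix
    eigvals, eigvecs = np.linalg.eigh(M)          # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:r]           # indices of the r largest eigenvalues
    return eigvecs[:, top]                        # P satisfies P^T P = I, Eq. (7)

# A new sample x is then embedded as v = P.T @ x.
```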

3.2 Connection Between LBDAG-DNE and DAG-DNE

By constructing two adjacency graphs, DAG-DNE can maintain the local intrinsic structure for the original data in the subspace, allowing it to effectively find optimal discriminant directions. However, DAG-DNE simply considers the intra-class information and inter-class information to have the same degree of importance. In actuality, they play different roles in the classification task. Thus, when projected into the low-dimensional space, some more important discriminative information may be missed. LBDAG-DNE regulates the different levels of the intra-class information and inter-class information by introducing a balance factor. As a result, LBDAG-DNE can adjust the balance factor according to the actual situation to achieve a good performance.

4 Experiments and Analysis

4.1 Data Sets

We conducted experiments on two publicly available data sets: MNIST and UMIST. Brief descriptions of these data sets are given below (see Table 1 for some important statistics):

Table 1. Data sets used in our experiments

MNIST is a data set of handwritten digits. Each image is represented as a 784-dimensional vector.

UMIST is a face image data set covering a range of race, sex, and appearance, which we downsampled to a size of \( 32 \times 32 \) for computational efficiency.

4.2 Experimental Setup

All of the algorithms were implemented in MATLAB 2012b and executed on an Intel(R) Core(TM) i5 CPU at 2.50 GHz with 4 GB of RAM. Our experiments require the nearest neighbor parameter \( K \) to construct the adjacency graphs. For simplicity, the nearest neighbor (NN) classifier was used to classify test images in the projected spaces.
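
As an illustration of this protocol (a sketch under our own naming, assuming the lbdag_dne helper from Sect. 3), the test images are projected with the learned matrix and classified with a plain 1-NN rule:

```python
import numpy as np

def nn_classify(V_train, y_train, V_test):
    """1-NN classification in the projected space (columns are samples)."""
    sq_tr = np.sum(V_train ** 2, axis=0)
    sq_te = np.sum(V_test ** 2, axis=0)
    dist = sq_te[:, None] + sq_tr[None, :] - 2.0 * V_test.T @ V_train
    return y_train[np.argmin(dist, axis=1)]  # label of the nearest training sample

# Illustrative workflow (parameter values are placeholders, not the paper's settings):
# P = lbdag_dne(X_train, y_train, K=3, beta=1.0, r=20)
# y_pred = nn_classify(P.T @ X_train, y_train, P.T @ X_test)
```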

4.3 Comparison Algorithms

To demonstrate the effectiveness and efficiency of our proposed LBDAG-DNE, we compared it with the following three state-of-the-art algorithms:

(1) DNE: discriminant neighborhood embedding proposed in [12].

(2) MFA: marginal Fisher analysis proposed in [11].

(3) DAG-DNE: double adjacency graphs-based discriminant neighborhood embedding proposed in [13].

4.4 Performance Metric

The classification result was evaluated by comparing the obtained label of each sample with the label provided by the data set. We used the accuracy [11, 12] to measure the classification performance. Given a data point \( {\mathbf{x}}_{i} \), let \( c({\mathbf{x}}_{i} ) \) and \( c'({\mathbf{x}}_{i} ) \) be the obtained classification label and the label provided by the data set, respectively. The accuracy is defined as follows:

$$ Accuracy = \frac{{\sum\nolimits_{i = 1}^{N} {\delta (c({\mathbf{x}}_{i} ),c'({\mathbf{x}}_{i} ))} }}{N} $$
(9)

where \( N \) is the total number of samples, and \( \delta (a,b) \) is the delta function that equals one if \( a = b \) and equals zero otherwise.
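
As a quick illustration, Eq. (9) reduces to the mean of the element-wise delta between the predicted and provided labels; a minimal NumPy version (ours) is:

```python
import numpy as np

def accuracy(y_pred, y_true):
    """Classification accuracy as in Eq. (9): fraction of matching labels."""
    y_pred = np.asarray(y_pred)
    y_true = np.asarray(y_true)
    return np.mean(y_pred == y_true)
```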

4.5 Experimental Results

To evaluate the effectiveness and correctness of the proposed algorithm, experiments were carried out on the MNIST and UMIST databases, and the results were compared with those of DNE, MFA, and DAG-DNE.

In the parameter selection step, we randomly selected 60% of the images in the training set for training and used the remaining 40% as a validation set; the validation results were then used to choose \( \beta \).
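
A sketch of this selection step is given below (ours; the candidate grid for \( \beta \) is an assumption within the range \( [0,10] \) stated in Sect. 3, and the helpers lbdag_dne, nn_classify, and accuracy are the sketches given earlier).

```python
import numpy as np

def select_beta(X, y, K, r, betas, seed=0):
    """Choose beta on a random 60/40 split of the training data."""
    rng = np.random.default_rng(seed)
    N = X.shape[1]
    idx = rng.permutation(N)
    n_tr = int(0.6 * N)
    tr, va = idx[:n_tr], idx[n_tr:]
    best_beta, best_acc = None, -1.0
    for beta in betas:
        P = lbdag_dne(X[:, tr], y[tr], K, beta, r)
        y_pred = nn_classify(P.T @ X[:, tr], y[tr], P.T @ X[:, va])
        acc = accuracy(y_pred, y[va])
        if acc > best_acc:
            best_beta, best_acc = beta, acc
    return best_beta

# e.g. beta = select_beta(X_train, y_train, K=3, r=20, betas=np.linspace(0, 10, 21))
```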

4.5.1 Results with Handwritten Dataset

For the MNIST data set, we considered five classes, including the digits 1, 3, 5, 7, and 9. For each class, we randomly selected 50 samples from the original training set as our training samples, and 50 samples from the original test set as our test samples. Figure 1 shows some image samples from the MNIST dataset. The performances of the four methods are reported in Fig. 2. We used \( K = 1 \), 3, 5, and 7 to construct the adjacency graphs for all the methods.

Fig. 1. Sample images from the MNIST database

Fig. 2. Classification accuracy on the MNIST database

Here, we mainly focus on the effect of the dimensionality of the discriminant subspace on the classification accuracy under different choices of the nearest neighbor parameter \( K \). Without prior knowledge, \( K \) was set to 1, 3, 5, and 7. PCA was utilized to reduce the dimensionality from 784 to 80. We repeated 30 trials and report the average results. Figure 2(a), (c), (e), and (g) shows the accuracy of the four methods for different dimensions and different values of \( K \): the classification accuracies of all four methods increase rapidly and then become almost stable. More importantly, LBDAG-DNE clearly performs better than DNE, MFA, and DAG-DNE across a wide range of dimensionalities on the MNIST dataset, and its accuracy increases the most rapidly.

From Fig. 2(b), (d), (f), and (h), we can observe that LBDAG-DNE obtains good performance with a relatively low-dimensional discriminant subspace, which reduces the computational complexity while improving the classification performance.

Thus, the experimental results on the MNIST dataset illustrate that LBDAG-DNE outperforms the other algorithms. In spite of the variation in \( K \), LBDAG-DNE has the highest recognition accuracy among these methods.

4.5.2 Results with UMIST Dataset

For the UMIST dataset, we randomly selected 20% of the images from the database as training samples, with the remaining 80% used as test samples. Figure 3 shows some image samples from the UMIST dataset. We repeated 20 runs and report the average results and corresponding parameters in Table 2.

Fig. 3. Sample face images from the UMIST database.

Table 2. Best average recognition rates of all methods on UMIST dataset.

First, we consider the parameter selection. The nearest neighbor parameter \( K \) is selected from the set \( \{ 1,3\} \). Figure 4 illustrates the relationship between the accuracy and the value of \( \beta \), the tuning parameter that balances the tradeoff between intra-class and inter-class information. From Fig. 4, we see that the accuracy is not the highest when \( \beta = 1 \), which confirms that intra-class and inter-class information play different roles in the classification task.

Fig. 4. Average recognition rates vs. \( \beta \)

Figure 5(a) and (c) shows the accuracies of the four methods vs. the dimensionality of the subspace for different values of \( K \). Figure 5(b) and (d) shows the subspace dimension at which each method attains its best accuracy. As seen in Fig. 5(a) and (c), the classification accuracies of all four algorithms increase rapidly, with LBDAG-DNE increasing the fastest. From Fig. 5(b) and (d), we can see that LBDAG-DNE achieves good performance with the lowest-dimensional discriminant subspace.

Fig. 5. Recognition rates for different parameters on the UMIST database

Furthermore, Table 2 reports the best average recognition rates on the test sets for all of the methods, along with the corresponding dimension of the reduced subspace under different values of \( K \). In spite of the variation in \( K \), LBDAG-DNE has the highest recognition rate among these algorithms.

Based on the results of the handwriting and face recognition experiments, LBDAG-DNE achieves the best classification performance compared to DNE, MFA, and DAG-DNE. This suggests that intra-class and inter-class information have different degrees of importance for classification; in other words, they play different roles in the classification task. Moreover, the superiority of LBDAG-DNE was demonstrated in all of the experiments. By using LBDAG-DNE to extract effective features, we can reduce the computational complexity and improve the classification performance.

5 Conclusion

The superior computing power of cloud computing makes it possible to tune parameters to select the best features. In this paper, we proposed a novel supervised discriminant subspace learning algorithm, called LBDAG-DNE, with the goal of learning a good embedded subspace from the original high-dimensional space for classification. LBDAG-DNE maintains the intra-class and inter-class structure by constructing two adjacency graphs and, by introducing a balance parameter, regulates the different degrees of importance of intra-class and inter-class information. Thus, LBDAG-DNE can find an optimal projection matrix. Experimental results show that LBDAG-DNE achieves the best classification performance in comparison with several related state-of-the-art methods.