
1 Introduction

In many fields such as data mining and machine learning, we usually need to deal with high-dimensional data that may contain a large number of irrelevant and redundant features. These features make the data distribution sparse in the feature space and hinder data analysis tasks. The rapid growth of data dimension not only increases computational cost and memory consumption, but also degrades the classification performance of classifiers. In order to improve learning performance, a variety of dimensionality reduction methods have been developed, among which feature selection is one of the most effective techniques for processing high-dimensional data [1, 2].

The main goal of feature selection is to select an optimal feature subset that contains the most useful information in the original features and has the greatest correlation with the classification task. With this optimal feature subset, the training time of classifiers can be effectively shortened, and the learning performance can be further enhanced [3,4,5]. In recent years, feature selection techniques have become increasingly diverse, and many feature selection methods have been proposed [9,10,11,15,17,18]. Here, we consider RELIEF-based methods.

Kira and Rendell proposed the famous Relief algorithm in 1992, which uses the Euclidean distance as a metric to select features with large weights [6]. In 1994, Kononenko presented the extended algorithm RELIEF-F to solve the problem of multi-class classification [7]. On the basis of RELIEF, Sun et al. proposed an iterative RELIEF (I-RELIEF) algorithm to alleviate the deficiencies of RELIEF by exploring the framework of the expectation-maximization algorithm [8]. In order to better estimate feature weights, Sun et al. also proposed a logistic I-RELIEF (LIR) algorithm, which optimizes I-RELIEF in the form of logistic regression [16]. All of the above RELIEF-based algorithms are supervised feature selection methods, which can only use data with class labels. However, these methods cannot perform well when only a few labeled samples are available together with a large number of unlabeled ones. To remedy this, a semi-supervised logistic I-RELIEF (SLIR) method was presented [16]. In SLIR, both labeled and unlabeled data are used to calculate margin vectors of samples. However, SLIR was designed only for binary classification tasks. Tang et al. developed a multi-class semi-supervised logistic I-RELIEF (MSLIR) algorithm [19]. MSLIR designs a novel scheme to find margin vectors of unlabeled samples by calculating all possible candidate margin vectors and picking an optimal one under the current feature weights. However, although MSLIR can perform multi-class feature selection in semi-supervised settings and achieves better classification performance than the supervised method LIR, its prediction performance on unlabeled samples is unsatisfactory.

In order to solve the above issue, we propose a multi-class semi-supervised logistic I-RELIEF method based on nearest neighbors (MSLIR-NN) for multi-class feature selection. In MSLIR-NN, the nearest neighbor scheme is adopted to assign pseudo labels to unlabeled samples according to the labeled data in each iteration. In this way, unlabeled samples with pseudo labels can be treated as labeled ones, so the margin vectors of both labeled and unlabeled samples can be calculated easily. MSLIR-NN has a smaller computational complexity than MSLIR when calculating margin vectors of unlabeled samples. In experiments, both support vector machine (SVM) and nearest neighbor (NN) classifiers are used to ensure the fairness of the classification results. Experimental results show that MSLIR-NN greatly improves the prediction performance on unlabeled data and enhances the performance of classifiers.

The rest of this paper is organized as follows. The proposed method is described in detail in Sect. 2, where its connections to other related work are also discussed. Section 3 gives and analyzes experimental results. Section 4 concludes this paper.

2 Proposed Method: MSLIR-NN

In this section, we design a novel multi-class semi-supervised feature selection method, MSLIR-NN, which adopts the nearest neighbor scheme to assign pseudo labels to unlabeled samples in each iteration. In doing so, unlabeled samples with pseudo labels can be treated as labeled ones, so the margin vectors of labeled and unlabeled samples can be calculated easily. In the following, we describe MSLIR-NN in detail and discuss its connections to MSLIR and SLIR.

2.1 Margin Vectors

Assume that there is a labeled sample set \(D _{l}=\{(\varvec{\mathrm {x}}^{l}_{i},y^{l}_{i})\}^{L}_{i=1}\) and an unlabeled sample set \(D _{u}=\{\varvec{\mathrm {x}}^{u}_{i}\}^{U}_{i=1}\), where \(\mathbf x _i^l \in \mathbb {R}^I\), \(y^{l}_{i}\in \{1,2,\dots ,c\}\), \(\mathbf x _i^u \in \mathbb {R}^I\), I is the number of original features, c is the number of classes, and L and U are the numbers of labeled and unlabeled samples, respectively. Generally, \(L\ll U\).

It is well known that one of the main differences among RELIEF-based methods is the way margin vectors of samples are calculated. Without loss of generality, let \(\varvec{\mathrm {z}}_i^{l}\) and \(\varvec{\mathrm {z}}_i^{u}\) be the margin vectors of the labeled sample \(\varvec{\mathrm {x}}_i^{l}\) and the unlabeled sample \(\varvec{\mathrm {x}}_i^{u}\), respectively. It is easy to generate the margin vectors of labeled samples in semi-supervised RELIEF-based methods. Similar to MSLIR [19], the margin vector \(\mathbf z ^{l}_{i}\) of the labeled sample \(\varvec{\mathrm {x}}^{l}_{i}\) can be expressed as follows:

$$\begin{aligned} \varvec{\mathrm {z}}^l_{i}&=\sum _{\varvec{\mathrm {x}}^{l}_{k}\in M _{i}}{} P (\varvec{\mathrm {x}}^{l}_{k}=NM (\varvec{\mathrm {x}}^{l}_{i})|\varvec{\mathrm {w}})|\varvec{\mathrm {x}}^{l}_{i}-\varvec{\mathrm {x}}^{l}_{k}|\nonumber \\&-\sum _{\varvec{\mathrm {x}}^{l}_{k}\in H _{i}}{} P (\varvec{\mathrm {x}}^{l}_{k}=NH (\varvec{\mathrm {x}}^{l}_{i})|\varvec{\mathrm {w}})|\varvec{\mathrm {x}}^{l}_{i}-\varvec{\mathrm {x}}^{l}_{k}| \end{aligned}$$
(1)

where \(\mathbf w \) is the feature weight vector. The set \(M _{i}=\{\varvec{\mathrm {x}}^{l}_{k}|(\varvec{\mathrm {x}}^{l}_{k},y^{l}_{k})\in D _{l},y^{l}_{i}\ne y^{l}_{k},k=1,\cdots ,L, y^{l}_{k}\in \{1,\cdots ,c\}\}\) contains all labeled samples that have different labels from \(\varvec{\mathrm {x}}^{l}_{i}\), and the set \(H _{i}=\{\varvec{\mathrm {x}}^{l}_{k}|(\varvec{\mathrm {x}}^{l}_{k},y^{l}_{k})\in D _{l},y^{l}_{i}=y^{l}_{k},k=1,\cdots ,L, y^{l}_{k}\in \{1,\cdots ,c\}\}\) contains all labeled samples that have the same label as \(\varvec{\mathrm {x}}^{l}_{i}\). \(P (\varvec{\mathrm {x}}_{k}^{l}=NM (\varvec{\mathrm {x}}_{i}^l)|\varvec{\mathrm {w}})\) and \(P (\varvec{\mathrm {x}}_{k}^l=NH (\varvec{\mathrm {x}}_{i}^l)|\varvec{\mathrm {w}})\) denote the probabilities that the sample \(\varvec{\mathrm {x}}_{k}^l\) is the nearest miss and the nearest hit of \(\varvec{\mathrm {x}}_{i}^l\), respectively, where \(NM (\varvec{\mathrm {x}}_{i}^l)\) represents the nearest miss of \(\varvec{\mathrm {x}}_{i}^l\) (its nearest neighbor from a different class) and \(NH (\varvec{\mathrm {x}}_{i}^l)\) represents its nearest hit (its nearest neighbor from the same class).

For semi-supervised RELIEF-based methods, the key difficulty is how to define the margin vectors of unlabeled samples. Before calculating them, we first predict the pseudo labels of unlabeled data. Using the information contained in the labeled set \(D _{l}\), we apply the nearest neighbor scheme to predict the pseudo labels of the unlabeled data in \(D _{u}\). Note that \(\mathbf w \) changes over iterations, so the neighborhood of an unlabeled sample may change under different weights. In other words, \(\mathbf w \) affects the search for nearest neighbors. Thus, we search for nearest neighbors of unlabeled samples in the weighted feature space instead of the original input space. Then, we extend \(D_u\) to the set \(\hat{D}_{u}=\{(\mathbf x _i^u,\hat{y}_i^u)\}^{U}_{i=1}\), where \(\hat{y}_i^u\) is the pseudo label of \(\mathbf x _i^u\). Similar to (1), we define the margin vector \(\varvec{\mathrm {z}}_i^u\) of the unlabeled sample \(\mathbf x _{i}^u\):

$$\begin{aligned} \varvec{\mathrm {z}}^u_{i}&=\sum _{\varvec{\mathrm {x}}^{l}_{k}\in M '_{i}}{} P (\varvec{\mathrm {x}}^{l}_{k}=NM (\varvec{\mathrm {x}}^{u}_{i})|\varvec{\mathrm {w}})|\varvec{\mathrm {x}}^{u}_{i}-\varvec{\mathrm {x}}^{l}_{k}|\nonumber \\&-\sum _{\varvec{\mathrm {x}}^{l}_{k}\in H '_{i}}{} P (\varvec{\mathrm {x}}^{l}_{k}=NH (\varvec{\mathrm {x}}^{u}_{i})|\varvec{\mathrm {w}})|\varvec{\mathrm {x}}^{u}_{i}-\varvec{\mathrm {x}}^{l}_{k}| \end{aligned}$$
(2)

where the set \(M '_{i}=\{\varvec{\mathrm {x}}^{l}_{k}|(\varvec{\mathrm {x}}^{l}_{k},y^{l}_{k})\in D _{l},\hat{y}^{u}_{i}\ne y^{l}_{k},k=1,\cdots ,L, y^{l}_{k}\in \{1,\cdots ,c\}\}\) contains all labeled samples whose labels differ from the pseudo label of \(\varvec{\mathrm {x}}^{u}_{i}\), and the set \(H '_{i}=\{\varvec{\mathrm {x}}^{l}_{k}|(\varvec{\mathrm {x}}^{l}_{k},y^{l}_{k})\in D _{l},\hat{y}^{u}_{i}=y^{l}_{k},k=1,\cdots ,L, y^{l}_{k}\in \{1,\cdots ,c\}\}\) contains all labeled samples whose labels coincide with the pseudo label of \(\varvec{\mathrm {x}}^{u}_{i}\).
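To make the construction of (1) and (2) concrete, the following sketch (in NumPy) assigns pseudo labels by the nearest neighbor rule in the weighted feature space and then computes a margin vector. The exponential-kernel form of the probabilities \(P (\cdot |\varvec{\mathrm {w}})\) and the kernel width sigma are assumptions in the spirit of I-RELIEF, since this section does not define them explicitly; the function names nn_pseudo_labels and margin_vector are illustrative.

```python
import numpy as np

def nn_pseudo_labels(X_l, y_l, X_u, w):
    """Pseudo labels for unlabeled samples: 1-NN search in the weighted feature space."""
    # weighted l1 distances between every unlabeled and every labeled sample, shape (U, L)
    d = np.abs(X_u[:, None, :] * w - X_l[None, :, :] * w).sum(axis=2)
    return y_l[np.argmin(d, axis=1)]

def margin_vector(x, label, X_l, y_l, w, sigma=1.0):
    """Margin vector of one sample: Eq. (1) with its true label, or Eq. (2) with a pseudo label.
    For a labeled sample, the sample itself should be excluded from the hit set beforehand."""
    diff = np.abs(x - X_l)                              # |x - x_k| for every labeled sample, (L, I)
    dist = diff @ w                                     # weighted l1 distances, (L,)
    k = np.exp(-dist / sigma)                           # assumed I-RELIEF-style kernel
    miss = (y_l != label).astype(float)                 # indicator of M_i
    hit = (y_l == label).astype(float)                  # indicator of H_i
    p_miss = k * miss / max((k * miss).sum(), 1e-12)    # P(x_k = NM(x) | w) over M_i
    p_hit = k * hit / max((k * hit).sum(), 1e-12)       # P(x_k = NH(x) | w) over H_i
    return p_miss @ diff - p_hit @ diff                 # Eq. (1) / Eq. (2)
```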

2.2 Optimization Problem

After obtaining margin vectors of all samples, the optimization of MSLIR-NN can be described as:

$$\begin{aligned} \min _{\varvec{\mathrm {w}}}&\Vert \varvec{\mathrm {w}}\Vert _{1}+\alpha \sum ^{L}_{i=1}\log (1+\exp (-\varvec{\mathrm {w}}^{T}\varvec{\mathrm {z}}^{l}_{i}))+ \beta \sum ^{U}_{i=1}\log (1+\exp (-\varvec{\mathrm {w}}^{T}\varvec{\mathrm {z}}^{u}_{i}))\\ \nonumber s.t.&~~\varvec{\mathrm {w}}\ge 0 \end{aligned}$$
(3)

where \(\Vert \cdot \Vert _{1}\) is the 1-norm, and the regularization parameters \(\alpha \ge 0\) and \(\beta \ge 0\) control the importance of labeled and unlabeled samples, respectively.

To eliminate the constraint \(\varvec{\mathrm {w}}\ge 0\), let \(\varvec{\mathrm {w}}=[v^{2}_{1},\cdots ,v^{2}_{I}]^{T}\) and \(\varvec{\mathrm {v}}=[v_{1},\cdots ,v_{I}]\). Substituting \(\mathbf v \) into (3), we convert it into the following unconstrained optimization problem:

$$\begin{aligned} \min _{\varvec{\mathrm {v}}} J=\Vert \varvec{\mathrm {v}}\Vert ^{2}_{2}&+\alpha \sum ^{L}_{i=1}\log (1+\exp (-\sum ^{I}_{d=1}v^{2}_{d}z^{l}_{id}))\nonumber \\&+\beta \sum ^{U}_{i=1}\log (1+\exp (-\sum ^{I}_{d=1}v^{2}_{d}z^{u}_{id})) \end{aligned}$$
(4)

where \(\varvec{\mathrm {z}}^{l}_{i}=[z^{l}_{i1},\cdots ,z^{l}_{iI}]\) and \(\varvec{\mathrm {z}}^{u}_{i}=[z^{u}_{i1},\cdots ,z^{u}_{iI}]\). Problem (4) can be solved by the gradient descent method. The derivative of J with respect to \(\varvec{\mathrm {v}}\) can be written componentwise as follows:

$$\begin{aligned} \frac{\partial J}{\partial v_{k}}= 2v_{k}&-\alpha \sum ^{L}_{i=1}\frac{\exp (-\sum ^{I}_{d=1}v^{2}_{d}z^{l}_{id})(2v_{k}z^{l}_{ik})}{1+\exp (-\sum ^{I}_{d=1}v^{2}_{d}z^{l}_{id})}\nonumber \\&-\beta \sum ^{U}_{i=1}\frac{\exp (-\sum ^{I}_{d=1}v^{2}_{d}z^{u}_{id})(2v_{k}z^{u}_{ik})}{1+\exp (-\sum ^{I}_{d=1}v^{2}_{d}z^{u}_{id})} \end{aligned}$$
(5)

Let

$$\begin{aligned} Q =\alpha \sum ^{L}_{i=1}\frac{\exp (-\sum ^{I}_{d=1}v^{2}_{d}z^{l}_{id})(v_{k}z^{l}_{ik})}{1+\exp (-\sum ^{I}_{d=1}v^{2}_{d}z^{l}_{id})}+\beta \sum ^{U}_{i=1}\frac{\exp (-\sum ^{I}_{d=1}v^{2}_{d}z^{u}_{id})(v_{k}z^{u}_{ik})}{1+\exp (-\sum ^{I}_{d=1}v^{2}_{d}z^{u}_{id})} \end{aligned}$$

Then we can update \(v_k\) by

$$\begin{aligned} v_{k}\leftarrow v_{k}-\eta (v_{k}-Q) \end{aligned}$$
(6)

where \(\eta >0\) is the learning rate.
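As a minimal illustration of (4)-(6), the sketch below performs one gradient step on \(\varvec{\mathrm {v}}\), assuming the margin matrices Z_l (of size \(L\times I\)) and Z_u (of size \(U\times I\)) have already been computed; the function name and variable layout are illustrative, not taken from the paper's implementation.

```python
import numpy as np

def update_v(v, Z_l, Z_u, alpha, beta, eta):
    """One gradient-descent step on Eq. (4), using the update rule in Eq. (6)."""
    w = v ** 2                                    # w_d = v_d^2 guarantees w >= 0
    def part(Z, coeff):
        s = np.exp(-Z @ w)                        # exp(-sum_d v_d^2 z_id), one value per sample
        return coeff * (((s / (1.0 + s))[:, None] * Z).sum(axis=0))
    Q = v * (part(Z_l, alpha) + part(Z_u, beta))  # the vector Q defined after Eq. (5)
    return v - eta * (v - Q)                      # Eq. (6)
```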

2.3 Algorithm and Complexity Analysis

The description of MSLIR-NN is shown in Algorithm 1. Given a labeled dataset \(D_{l}\) and an unlabeled dataset \(D_{u}\), the weight vector \(\mathbf {w}\) is updated iteratively. Let \(\varvec{\mathrm {w}}_{(t-1)}\) denote the weight vector obtained in the \((t-1)\)-th iteration. In the t-th iteration, MSLIR-NN first computes the weighted labeled samples \(\varvec{\mathrm {x}}^{l*}=\varvec{\mathrm {x}}^l\circ \varvec{\mathrm {w}}_{(t-1)}\) and the weighted unlabeled samples \(\varvec{\mathrm {x}}^{u*} = \varvec{\mathrm {x}}^u\circ \varvec{\mathrm {w}}_{(t-1)}\), where \(\circ \) denotes element-wise multiplication. The pseudo labels of unlabeled samples are then determined in this weighted sample space using the NN scheme. Next, the margin vectors of \(\varvec{\mathrm {x}}^{l}\) and \(\varvec{\mathrm {x}}^{u}\) are calculated by (1) and (2), respectively. Finally, \(\varvec{\mathrm {w}}_{(t)}\) is obtained by solving (3). MSLIR-NN alternately updates the weight vector until convergence.

The computational complexity of MSLIR-NN mainly comprises three parts: the calculation of margin vectors for labeled samples, the calculation of margin vectors for unlabeled samples, and the solution of the optimization problem (3). The complexity of calculating margin vectors for labeled samples is identical to that of MSLIR and SLIR, namely \(O(dL^2)\) without considering the calculation of probability terms, where d is the dimension of samples and L is the number of labeled samples. For the calculation of margin vectors for unlabeled samples, the complexity of MSLIR-NN is about O(dUL), where U is the number of unlabeled samples. For the last part, MSLIR-NN has the same computational complexity as MSLIR and SLIR.

Algorithm 1. The MSLIR-NN algorithm.
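For readers who prefer code to pseudocode, a compact sketch of the loop in Algorithm 1 is given below. It reuses the hypothetical helpers nn_pseudo_labels, margin_vector and update_v sketched in Sects. 2.1 and 2.2, runs a fixed number of iterations T instead of an explicit convergence test, and performs a single gradient step per iteration; these simplifications are assumptions rather than the paper's exact procedure.

```python
import numpy as np

def mslir_nn(X_l, y_l, X_u, alpha=10.0, beta=0.1, eta=0.03, T=50):
    """Sketch of Algorithm 1: alternately assign pseudo labels and update feature weights."""
    I = X_l.shape[1]
    v = np.ones(I)                                   # w = v^2 starts from all-ones weights
    for _ in range(T):
        w = v ** 2
        y_u = nn_pseudo_labels(X_l, y_l, X_u, w)     # pseudo labels in the weighted space
        Z_l = np.vstack([margin_vector(x, y, X_l, y_l, w) for x, y in zip(X_l, y_l)])
        Z_u = np.vstack([margin_vector(x, y, X_l, y_l, w) for x, y in zip(X_u, y_u)])
        v = update_v(v, Z_l, Z_u, alpha, beta, eta)  # one step of (6) toward solving (3)
    return v ** 2                                    # final feature weights

# Features can then be ranked by the returned weights and the top-ranked ones fed to a classifier.
```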

2.4 Connections to Related Work

Four RELIEF-based methods, LIR, SLIR, MSLIR and MSLIR-NN, use the logistic regression formulation to optimize the feature weight vector \(\mathbf w \). We discuss the connections of MSLIR-NN to LIR, SLIR and MSLIR in the following.

MSLIR-NN is a semi-supervised learning method, as are SLIR and MSLIR, whereas LIR is designed for supervised learning. Based on LIR, SLIR introduces a term about unlabeled samples into the objective function. MSLIR modifies the objective function of SLIR to balance the contributions of labeled and unlabeled samples. Although MSLIR-NN has the same optimization function as MSLIR, MSLIR-NN computes margin vectors of unlabeled samples in a different way.

SLIR, MSLIR and MSLIR-NN all adopt LIR's way of calculating margin vectors of labeled samples. It is straightforward for SLIR to obtain margin vectors of unlabeled samples since SLIR deals only with binary classification tasks. For a given unlabeled sample, MSLIR first calculates all possible candidate margin vectors and takes an optimal one in the weighted feature space as its margin vector; the computational complexity of MSLIR for calculating margin vectors of unlabeled samples is therefore O(cdLU). MSLIR-NN instead first assigns a pseudo label to the unlabeled sample and then directly calculates its margin vector in the same way as for labeled samples. Compared to MSLIR, MSLIR-NN has a lower complexity, O(dLU), which is independent of the class number c and identical to that of SLIR.

3 Experiments

We conduct extensive experiments to demonstrate the efficiency and effectiveness of MSLIR-NN. Ten UCI datasets [20], including Pendigits, Satimage, Waveform, Wine, Vehicle, Iris, Breast, Heart, Wdbc and Pima, are adopted, where the first six datasets are multi-class and the remaining four are binary. Each dataset is randomly divided into training and test subsets, and the training subsets contain both labeled and unlabeled samples. A brief description of the datasets is listed in Table 1, where "#Training" and "#Test" represent the numbers of training and test samples, respectively, "#Labeled" and "#Unlabeled" are the numbers of labeled and unlabeled samples in a training set, "#Feature" represents the dimension of samples, and "#Class" indicates the number of categories. For each dataset, we add 100 additional noise features that are independently Gaussian distributed, and then normalize all features together with the original data.
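As an illustration of this preprocessing, the sketch below appends 100 independent Gaussian noise features and normalizes every feature column; the zero-mean, unit-variance scaling is an assumption, since the exact normalization scheme is not spelled out here, and the function name is illustrative.

```python
import numpy as np

def add_noise_and_normalize(X, n_noise=100, seed=0):
    """Append independent Gaussian noise features, then standardize all columns."""
    rng = np.random.default_rng(seed)
    X = np.hstack([X, rng.standard_normal((X.shape[0], n_noise))])  # irrelevant features
    return (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)           # assumed z-score scaling
```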

Table 1. Description of ten UCI datasets.

In our experiments, the compared methods include RELIEF-F, LIR, SLIR, MSLIR and MSLIR-NN. Both RELIEF-F and LIR use only labeled data, while SLIR, MSLIR and MSLIR-NN use both labeled and unlabeled data. The classification performance is tested on the same test subsets. In LIR, the regularization parameter \(\lambda \) and the learning rate \(\eta \) are set to 10 and 0.03, respectively. In MSLIR-NN, MSLIR and SLIR, the parameters \(\alpha \) and \(\beta \) are set to 10 and 0.1, respectively, and the learning rate is the same as that of LIR.

To reduce statistical error, each algorithm is run 10 times on each dataset, and the average result is taken as the final one. To ensure the reliability of experimental results, both the nearest neighbor (NN) and support vector machine (SVM) classifiers are used in the experiments. The Gaussian kernel parameter and the regularization parameter of SVM are selected by grid search, where both vary from \(2^{-10}\) to \(2^{10}\).
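A possible realization of this classifier setup with scikit-learn is sketched below; the 5-fold cross-validation used to score each grid point is an assumption, and the variable names are illustrative.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Gaussian (RBF) kernel SVM with C and gamma each searched over 2^-10, ..., 2^10.
param_grid = {"C": 2.0 ** np.arange(-10, 11), "gamma": 2.0 ** np.arange(-10, 11)}
svm = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
# Example use on the top-ranked features returned by a feature selection method:
# svm.fit(X_train[:, top_features], y_train); acc = svm.score(X_test[:, top_features], y_test)
```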

3.1 Experiments on Multi-class Datasets

For multi-class classification tasks, we compare the proposed algorithm with the other three methods, MSLIR, LIR and RELIEF-F. Experiments are implemented on the Pendigits, Satimage, Waveform, Wine, Vehicle and Iris datasets. The performance of these feature selection algorithms is evaluated by the classification accuracy with the selected features, and the experimental results obtained by SVM are shown in Fig. 1. From Fig. 1, we can observe that the classification accuracy of MSLIR-NN is the best among the compared methods, which indicates that features selected by MSLIR-NN have a greater correlation with the label information. In Figs. 1(a) and (c), the classification accuracy of the four algorithms increases as the number of selected features increases, which demonstrates that the chosen features are all useful for classification and the noisy features are excluded. In Figs. 1(b), (d), (e) and (f), the classification accuracy tends to level off or decrease, which indicates that feature selection is useful; in other words, not all original features are related to the classification task. For example, on the Wine dataset, when the number of selected features is eight, MSLIR-NN achieves the highest classification accuracy of 98%, and the last four selected features are likely to be irrelevant.

Fig. 1. Classification accuracy of SVM using four feature selection methods on multi-class datasets: (a) Pendigits, (b) Satimage, (c) Waveform, (d) Wine, (e) Vehicle and (f) Iris.

Table 2. Classification accuracy and standard deviations (%) of NN using four feature selection methods on multi-class datasets

The performance curves obtained by NN are shown in Fig. 2, from which we can draw similar conclusions to those from Fig. 1. We give the best average accuracies and the corresponding standard deviations of the four methods in Table 2, where the best results are in bold. Compared with the other three methods, MSLIR-NN achieves higher classification accuracy and smaller standard deviations, reflecting that our method is more stable than the previous MSLIR.

Fig. 2. Classification accuracy of NN using four feature selection methods on multi-class datasets: (a) Pendigits, (b) Satimage, (c) Waveform, (d) Wine, (e) Vehicle and (f) Iris.

3.2 Experiments on Binary Datasets

Similar to MSLIR, MSLIR-NN can also be applied to binary classification tasks. Since SLIR is only applicable to binary classification, we compare MSLIR-NN and MSLIR with it on the Breast, Heart, Wdbc and Pima datasets.

The experimental results obtained by SVM are given in Fig. 3. We can see that MSLIR-NN is much better than both MSLIR and SLIR, especially in Figs. 3(a), (c) and (d).

The performance curves obtained by NN are shown in Fig. 4. MSLIR-NN still has an advantage over the other two methods. We list the best average classification accuracies and the corresponding standard deviations in Table 3. Obviously, MSLIR-NN performs the best among the three methods on all four binary datasets, followed by MSLIR. Compared to MSLIR, MSLIR-NN improves the accuracy by 2.03% on Heart and by 5.73% on Pima.

Table 3. Classification accuracy and standard deviations (%) of NN using three feature selection methods on binary datasets
Fig. 3. Classification accuracy of SVM using three feature selection methods on binary datasets: (a) Breast, (b) Heart, (c) Wdbc, (d) Pima.

Fig. 4. Classification accuracy of NN using three feature selection methods on binary datasets: (a) Breast, (b) Heart, (c) Wdbc, (d) Pima.

3.3 Comparison of MSLIR-NN and MSLIR

MSLIR-NN and MSLIR have the same objective function but different ways of constructing margin vectors of unlabeled samples. Here, we compare them in two aspects: the prediction accuracy on unlabeled samples in the training subsets and the running time of feature selection.

Table 4 lists the accuracy on unlabeled samples and the running time of feature selection for the two methods. We can see that MSLIR-NN is significantly better than MSLIR in prediction performance, which indicates that the proposed algorithm can use only a few labeled samples to predict the labels of unlabeled data. Thus the margin vectors of unlabeled samples can be calculated more accurately, which improves the stability of the algorithm. The running time of feature selection reflects the computational complexity of the algorithms. Obviously, MSLIR-NN is much faster than MSLIR on all ten datasets, which supports our analysis of computational complexity in Sect. 2.3.

Table 4. Comparison of MSLIR-NN and MSLIR algorithms

4 Conclusions

In this paper, we propose MSLIR-NN, a multi-class semi-supervised feature selection method built on MSLIR by introducing the nearest neighbor scheme. MSLIR-NN has a lower computational complexity than MSLIR and improves the accuracy of label prediction for unlabeled data, which in turn benefits the calculation of margin vectors of unlabeled samples. Extensive experiments are performed on binary and multi-class classification tasks, where two classical classifiers, NN and SVM, are used to perform classification after feature selection. On multi-class datasets, MSLIR-NN is superior to the supervised methods LIR and RELIEF-F as well as the semi-supervised method MSLIR. On binary datasets, MSLIR-NN performs the best among the three semi-supervised methods. In the comparison with MSLIR, MSLIR-NN demonstrates both its ability to predict labels of unlabeled samples and its speed advantage. In general, MSLIR-NN can extract useful features and achieve better performance.