1 Introduction

Twin support vector machine (TSVM) has become a popular binary classifier because it greatly reduces the computational complexity of the support vector machine (SVM) [5]. TSVM obtains two nonparallel hyperplanes by solving two smaller, related SVM-type problems; as a supervised binary classifier, it requires a large amount of labeled data. Moreover, it is difficult for TSVM to handle multi-class classification problems directly. To inherit the advantages of TSVM, several multi-class versions of TSVM have been proposed for wider applications [9, 10].

Inspired by manifold regularization, Laplacian TSVM (LapTSVM) was proposed for semi-supervised learning and has become a useful extension of TSVM [8]. Similar to LapTSVM, Laplacian twin parametric-margin SVM (LapTPMSVM) [12] and Laplacian least squares TSVM (LapLSTSVM) [1] are the semi-supervised versions of twin parametric-margin SVM [7] and least squares TSVM [6], respectively, both of which are variants of TSVM. These three semi-supervised methods all aim to generate two nonparallel hyperplanes such that each hyperplane is close to its own class and as far as possible from the other. Laplacian pair-weight vector projection (LapPVP) was also motivated by the idea of nonparallel hyperplanes [11]. LapPVP is an excellent binary classifier compared with other nonparallel-hyperplane methods. Furthermore, LapPVP integrates the between-class scatter, the within-class scatter, and the Laplacian regularization for semi-supervised binary classification. Each class provides class-specific information through its between-class and within-class scatters, which enhances the discriminative power of the representation. The Laplacian regularization encodes the graph structure of the labeled and unlabeled data, i.e., the intrinsic geometrical structure of the data.

Although these semi-supervised binary classifiers are promising, their multi-class versions have rarely been explored. To solve semi-supervised multi-class classification problems, we extend the formulation of LapPVP and propose two novel semi-supervised multi-class methods, named one-versus-one LapPVP (OVO-LapPVP) and one-versus-rest LapPVP (OVR-LapPVP). Researchers have developed the “one-versus-one” (OVO) and “one-versus-rest” (OVR) strategies to decompose a multi-class problem into multiple binary sub-problems [3, 9]. Originally, LapPVP obtains a pair of projection vectors for two-class data. Both OVO-LapPVP and OVR-LapPVP solve a multi-class classification task using multiple LapPVPs but differ in the number of LapPVPs. Specifically, the multi-class versions of LapPVP have the following characteristics:

  (1) OVO-LapPVP obtains \(C(C-1)/2\) pairs of projection vectors for C-class data, whereas OVR-LapPVP obtains C projection vectors for C-class data, one for each class.

  (2) Both proposed multi-class methods are solved by eigenvalue decomposition, which avoids solving complex quadratic programming problems.

  (3) Experimental results on UCI datasets demonstrate the effectiveness of OVO-LapPVP and OVR-LapPVP.

2 Semi-supervised Multi-class LapPVPs

In this section, we describe the proposed OVO-LapPVP and OVR-LapPVP in detail. Consider a semi-supervised multi-class classification task. Let \(\textbf{X}_l = [\textbf{x}_1,\cdots ,\textbf{x}_l]^T\in \mathbb {R}^{l\times m}\) and \(\textbf{X}_u = [\textbf{x}_{l+1},\cdots ,\textbf{x}_{n}]^T\in \mathbb {R}^{(n-l)\times m}\) be the labeled and unlabeled sample matrices, respectively, where \(\textbf{x}_i \in \mathbb {R}^m\) is an m-dimensional sample, and l and n are the numbers of labeled and total samples, respectively. Let \(y_i\) be the label of \(\textbf{x}_i\), where \(y_i \in \{1,2,\ldots ,C\}\) and C is the number of classes. Let \(\textbf{X}_{l,c}\) be the labeled sample matrix belonging to the cth class. The total sample matrix can be denoted as \(\textbf{X} = [\textbf{X}_{l,1};\cdots ;\textbf{X}_{l,C}; \textbf{X}_u]\in \mathbb {R}^{n\times m}\).

In LapPVP, we construct an adjacency graph using the K-nearest-neighbor method. The element in the ith row and jth column of the adjacency matrix \(\textbf{S}\) induced by the adjacency graph is

$$\begin{aligned} S_{ij} = {\left\{ \begin{array}{ll} 1, &{} if~~\textbf{x}_i \in N_K(\textbf{x}_j)~~or~~\textbf{x}_j \in N_K(\textbf{x}_i) \\ 0, &{} otherwise \end{array}\right. }, \end{aligned}$$
(1)

where \(N_K(\textbf{x}_i)\) is the set of K nearest neighbors of \(\textbf{x}_i\). Then we get the classical Laplacian matrix \(\textbf{L} = \textbf{D} - \textbf{S}\), where \(\textbf{D}\) is the diagonal matrix with \({D}_{ii} = \sum _{j} {S}_{ij}\).
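For clarity, the following Python/NumPy sketch (illustrative code, not the implementation used in our experiments; the function name is ours) constructs the symmetric K-nearest-neighbor adjacency matrix of Eq. (1) and the Laplacian matrix \(\textbf{L} = \textbf{D} - \textbf{S}\):

```python
import numpy as np

def knn_laplacian(X, K=5):
    """X: (n, m) sample matrix; returns the (n, n) Laplacian L = D - S."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances
    sq = np.sum(X ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(dist, np.inf)          # exclude a point from its own neighbor set
    # Indices of the K nearest neighbors of each sample
    nn = np.argsort(dist, axis=1)[:, :K]
    S = np.zeros((n, n))
    rows = np.repeat(np.arange(n), K)
    S[rows, nn.ravel()] = 1.0
    S = np.maximum(S, S.T)                  # symmetrize: the "or" condition in Eq. (1)
    D = np.diag(S.sum(axis=1))
    return D - S
```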

2.1 OVO-LapPVP

The one-versus-one strategy is a popular technique for extending binary classifiers to multi-class ones. Using this strategy, OVO-LapPVP generates \(C(C-1)/2\) binary classifiers for C-class data. Given a C-class dataset, we obtain the set of class pairs \(Z =\{z=(c_1,c_2)\,|\, c_1,c_2=1,2,\cdots ,C,\ c_1<c_2\}\). To construct a binary classifier for class pair z, we take class \(c_1\) as the positive class and class \(c_2\) as the negative one. The corresponding LapPVP classifier is trained with the labeled samples of classes \(c_1\) and \(c_2\) and all unlabeled samples.

Let \(\textbf{X}_z = [\textbf{X}_{l,c_1}; \textbf{X}_{l,c_2}; \textbf{X}_u] \in \mathbb {R}^{n_z \times m}\) with \(z=(c_1,c_2)\), where \(n_z\) is the total number of labeled samples in classes \(c_1\) and \(c_2\) plus the number of unlabeled samples. The objective functions of OVO-LapPVP with respect to z can be expressed as follows:

$$\begin{aligned} \begin{aligned}&\max \limits _{\textbf{v}_{c_1}} \textbf{v}_{c_1}^T \textbf{H}_{c_1} \textbf{v}_{c_1} - \rho _{c_1} \textbf{v}_{c_1}^T \textbf{X}_z^T \textbf{L}_z \textbf{X}_z \textbf{v}_{c_1} \\&s.t. \quad \textbf{v}_{c_1}^T \textbf{v}_{c_1} = 1 \end{aligned} \end{aligned}$$
(2)

and

$$\begin{aligned} \begin{aligned}&\max \limits _{\textbf{v}_{c_2}} \textbf{v}_{c_2}^T \textbf{H}_{c_2} \textbf{v}_{c_2} - \rho _{c_2} \textbf{v}_{c_2}^T \textbf{X}_z^T \textbf{L}_z \textbf{X}_z \textbf{v}_{c_2} \\&s.t. \quad \textbf{v}_{c_2}^T \textbf{v}_{c_2}= 1 \end{aligned}, \end{aligned}$$
(3)

where \({\textbf{v}_{c_1}} \in \mathbb {R}^{m}\) and \({\textbf{v}_{c_2}}\in \mathbb {R}^{m}\) are projection vectors, \(\rho _{c_1}>0\) and \(\rho _{c_2}>0\) are regularization parameters, the Laplacian matrix \(\textbf{L}_z\in \mathbb {R}^{n_z\times n_z}\) is computed from \(\textbf{X}_z\), and the discriminative matrices \(\textbf{H}_{c_1}\in \mathbb {R}^{m \times m}\) and \(\textbf{H}_{c_2} \in \mathbb {R}^{m \times m}\) are defined as

$$\begin{aligned} \begin{aligned} \textbf{H}_{c_1}&= \alpha _{c_1} \left( \textbf{X}_z - \textbf{e}_z \textbf{u}_{c_1}^T\right) ^T \left( \textbf{X}_z - \textbf{e}_z \textbf{u}_{c_1}^T\right) \\&- (1-\alpha _{c_1}) (\textbf{X}_{c_1} - \textbf{e}_{c_1}\textbf{u}_{c_1}^T )^T (\textbf{X}_{c_1} - \textbf{e}_{c_1} \textbf{u}_{c_1}^T) \end{aligned} \end{aligned}$$
(4)

and

$$\begin{aligned} \begin{aligned} \textbf{H}_{c_2}&= \alpha _{c_2} \left( \textbf{X}_z - \textbf{e}_z \textbf{u}_{c_2}^T\right) ^T \left( \textbf{X}_z - \textbf{e}_z \textbf{u}_{c_2}^T\right) \\&- (1-\alpha _{c_2}) (\textbf{X}_{c_2} - \textbf{e}_{c_2}\textbf{u}_{c_2}^T )^T (\textbf{X}_{c_2} - \textbf{e}_{c_2} \textbf{u}_{c_2}^T), \end{aligned} \end{aligned}$$
(5)

where \(\alpha _{c_1}\in [0,1]\) and \(\alpha _{c_2}\in [0,1]\) are parameters to balance the between-class scatter matrix and the within-class one, class centers \(\textbf{u}_{c_j}=\frac{1}{|X_{l,c_j}|}\sum _{\textbf{x}_i\in X_{l,c_j}} \textbf{x}_{i}\) (\(j=1,2\)), \(X_{l,c_j}\) is the set of labeled samples in the \(c_j\)th class, \(\textbf{e}_z \in \mathbb {R}^{n_z}\) and \(\textbf{e}_{c_j} \in \mathbb {R}^{|X_{l,c_j}|}\) (\(j=1,2\)) are vectors of all ones.
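A minimal sketch of computing the discriminative matrices in Eqs. (4) and (5) is given below; the helper name and argument layout are illustrative assumptions:

```python
import numpy as np

def discriminative_matrix(X_z, X_c, alpha):
    """X_z: all samples of pair z; X_c: labeled samples of class c_j; alpha in [0, 1]."""
    u_c = X_c.mean(axis=0)                       # class center u_{c_j}
    Dz = X_z - u_c                               # X_z - e_z u_{c_j}^T
    Dc = X_c - u_c                               # X_{c_j} - e_{c_j} u_{c_j}^T
    # Weighted difference of the two scatter terms in Eqs. (4)-(5)
    return alpha * (Dz.T @ Dz) - (1.0 - alpha) * (Dc.T @ Dc)
```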

The following theorems describe the solutions to the optimization problems (2) and (3).

Theorem 1

The optimal solution \(\textbf{v}_{c_1}\) of the optimization problem (2) is the eigenvector corresponding to the maximum eigenvalue of the matrix \((\textbf{H}_{c_1}-\rho _{c_1}\textbf{X}_z^T \textbf{L}_z \textbf{X}_z)\).

Proof. To find the solution to (2), we first form the corresponding Lagrangian function with multiplier \(\lambda _{c_1}\). That is,

$$\begin{aligned} \begin{aligned} L(\textbf{v}_{c_1}, \lambda _{c_1}) =&\textbf{v}^T_{c_1} \textbf{H}_{c_1} \textbf{v}_{c_1} - \rho _{c_1} \textbf{v}^T_{c_1} \textbf{X}_z^T \textbf{L}_z \textbf{X}_z \textbf{v}_{c_1} - \lambda _{c_1} (\textbf{v}^T_{c_1}\textbf{v}_{c_1} -1). \end{aligned} \end{aligned}$$
(6)

Next, we take the partial derivative of \(L(\textbf{v}_{c_1}, \lambda _{c_1})\) with respect to the primal variable \(\textbf{v}_{c_1}\) and set it to zero, which yields

$$\begin{aligned} \begin{aligned}&\frac{\partial L(\textbf{v}_{c_1}, \lambda _{c_1})}{\partial \textbf{v}_{c_1}} = \left( \textbf{H}_{c_1}- \rho _{c_1} \textbf{X}_z^T \textbf{L}_z \textbf{X}_z\right) \textbf{v}_{c_1} - \lambda _{c_1} \textbf{v}_{c_1} = 0 \\&\Rightarrow \left( \textbf{H}_{c_1}- \rho _{c_1} \textbf{X}_z^T \textbf{L}_z \textbf{X}_z\right) \textbf{v}_{c_1} = \lambda _{c_1} \textbf{v}_{c_1}. \end{aligned} \end{aligned}$$
(7)

By introducing a Tikhonov regularization term into Eq. (7), the solution can be obtained by solving a standard eigenvalue problem. Finally, the solution \(\textbf{v}_{c_1}\) can be computed as the eigenvector corresponding to the maximum eigenvalue of \(\left( \textbf{H}_{c_1}- \rho _{c_1} \textbf{X}_z^T \textbf{L}_z \textbf{X}_z\right) \).    \(\square \)
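The computation in Theorem 1 can be sketched as follows; the small Tikhonov term \(\varepsilon \textbf{I}\) is an assumed numerical safeguard corresponding to the regularization mentioned above:

```python
import numpy as np

def solve_projection(H, X_z, L_z, rho, eps=1e-6):
    """Return the unit-norm eigenvector of (H - rho * X_z^T L_z X_z) with the largest eigenvalue."""
    M = H - rho * (X_z.T @ L_z @ X_z)
    M = 0.5 * (M + M.T) + eps * np.eye(M.shape[0])   # symmetrize and add Tikhonov term
    eigvals, eigvecs = np.linalg.eigh(M)             # eigenvalues in ascending order
    return eigvecs[:, -1]                            # eigenvector of the maximum eigenvalue
```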

Theorem 2

The optimal solution \(\textbf{v}_{c_2}\) of the optimization problem (3) is the eigenvector corresponding to the maximum eigenvalue of the matrix \((\textbf{H}_{c_2}-\rho _{c_2}\textbf{X}_z^T \textbf{L}_z \textbf{X}_z)\).

The proof of Theorem 2 is similar to that of Theorem 1 and is therefore omitted. Theorems 1 and 2 provide the optimal projection vectors \(\textbf{v}_{c_1}\) and \(\textbf{v}_{c_2}\), respectively, via eigenvalue decomposition. After obtaining \(\textbf{v}_{c_1}\) and \(\textbf{v}_{c_2}\) for class pair \(z = (c_1,c_2)\), we can compute the distances between an unknown test point \(\textbf{x}\) and the class centers \(\textbf{u}_{c_j}~(j=1,2)\) by

$$\begin{aligned} \begin{aligned}&d(\textbf{x},\textbf{u}_{c_1}) = \Vert \textbf{v}_{c_1}^T \textbf{x} - \textbf{v}_{c_1}^T \textbf{u}_{c_1}\Vert \end{aligned} \end{aligned}$$
(8)

and

$$\begin{aligned} \begin{aligned}&d(\textbf{x},\textbf{u}_{c_2}) = \Vert \textbf{v}_{c_2}^T \textbf{x} - \textbf{v}_{c_2}^T \textbf{u}_{c_2}\Vert \end{aligned}, \end{aligned}$$
(9)

where \(\Vert \cdot \Vert \) is the 2-norm of a vector. If \(d(\textbf{x},\textbf{u}_{c_1}) \le d(\textbf{x},\textbf{u}_{c_2})\) for class pair z, then \(\textbf{x}\) is more likely to belong to class \(c_1\).

To assign a class label to \(\textbf{x}\), we construct a vote vector \(\hat{\textbf{y}}=[\hat{y}_1,\cdots ,\hat{y}_C]^T\), where \(\hat{y}_c\) is the vote for the cth class. All votes \(\hat{y}_c\), \(c=1,\cdots ,C\), are initialized to 0. For each class pair \(z =(c_1,c_2)\), we update the vote for either class \(c_1\) or class \(c_2\). That is,

$$\begin{aligned} \left\{ \begin{array}{c l} \hat{y}_{c_1} \leftarrow \hat{y}_{c_1}+1, &{} if~~d(\textbf{x},\textbf{u}_{c_1}) \le d(\textbf{x},\textbf{u}_{c_2})\\ \hat{y}_{c_2} \leftarrow \hat{y}_{c_2}+1, &{} otherwise \end{array}. \right. \end{aligned}$$
(10)

The update procedure is performed over the whole set Z. Thus, the classification rule for \(\textbf{x}\) is defined as

$$\begin{aligned} c^*=\arg \max _{c=1,\cdots ,C} \hat{y}_c, \end{aligned}$$
(11)

where \(c^*\) is the estimated class label for \(\textbf{x}\). We summarize the specific procedure of OVO-LapPVP in Algorithm 1.

Algorithm 1. The procedure of OVO-LapPVP
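For illustration, the voting procedure of Eqs. (8)–(11) can be sketched as follows; the dictionary holding the trained pairs is an assumed data layout, not part of Algorithm 1:

```python
import numpy as np

def ovo_predict(x, classifiers, C):
    """classifiers: dict mapping (c1, c2) -> (v1, u1, v2, u2) for each trained class pair."""
    votes = np.zeros(C)
    for (c1, c2), (v1, u1, v2, u2) in classifiers.items():
        d1 = abs(v1 @ x - v1 @ u1)          # distance to the center of class c1, Eq. (8)
        d2 = abs(v2 @ x - v2 @ u2)          # distance to the center of class c2, Eq. (9)
        if d1 <= d2:                        # vote update, Eq. (10)
            votes[c1 - 1] += 1              # classes are labeled 1..C
        else:
            votes[c2 - 1] += 1
    return int(np.argmax(votes)) + 1        # majority vote, Eq. (11)
```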

2.2 OVR-LapPVP

Originally, LapPVP obtains a projection vector for each class that keeps data points of that class close to one another while keeping them far from points of the other class. In LapPVP, the within-class scatter requires only one class of training data, and the Laplacian regularization requires all data to construct the adjacency graph, whereas the between-class scatter captures the information of different classes. Hence, it is easy to generate multiple projection vectors for multi-class data by using the one-versus-rest strategy. This section proposes OVR-LapPVP, which constructs C projection vectors for C-class training data. In other words, each class has its own projection vector.

To construct a binary classifier for the cth class, we take the samples in class c as positive and the samples in the remaining \((C-1)\) classes as negative. Let \(\textbf{X}_{l,c}\) be the labeled sample matrix of the cth class, and let \(l_c\) be the number of labeled samples in the cth class. Then the total labeled sample matrix is \(\textbf{X}_l = [\textbf{X}_{l,1};\cdots ;\textbf{X}_{l,C}]\), and the training sample matrix is \(\textbf{X} = [\textbf{X}_l;\textbf{X}_u]\), where \(\textbf{X}_u\) is the unlabeled sample matrix.

Thus, the cth optimization formulation of OVR-LapPVP is given by

$$\begin{aligned} \begin{aligned}&\max \limits _{\textbf{v}_c}~~ \textbf{v}_c^T \textbf{H}_c \textbf{v}_c - \rho _c \textbf{v}_c^T \textbf{X}^T \textbf{L} \textbf{X} \textbf{v}_c, \\&s.t. \quad \textbf{v}_c^T \textbf{v}_c = 1, \end{aligned} \end{aligned}$$
(12)

where the discriminative matrix \(\textbf{H}_c\) is defined as

$$\begin{aligned} \begin{aligned} \textbf{H}_c = \alpha _c \textbf{B}_c - (1-\alpha _c) \textbf{W}_c, \end{aligned} \end{aligned}$$
(13)

where \(\alpha _c \in [0,1]\) is a parameter that balances the between-class scatter matrix \(\textbf{B}_c\) and the within-class scatter matrix \(\textbf{W}_c\) of the cth class; \(\textbf{B}_c\) and \(\textbf{W}_c\) respectively have the forms

$$\begin{aligned} \textbf{B}_c = \left( \textbf{X} - \textbf{e} \textbf{u}_c^T\right) ^T \left( \textbf{X} - \textbf{e} \textbf{u}_c^T\right) \end{aligned}$$
(14)

and

$$\begin{aligned} \textbf{W}_c = (\textbf{X}_{c} - \textbf{e}_c\textbf{u}_c^T )^T (\textbf{X}_{c} - \textbf{e}_c \textbf{u}_c^T), \end{aligned}$$
(15)

where \(\textbf{e}\in \mathbb {R}^n\) and \(\textbf{e}_c \in \mathbb {R}^{l_c}\) are vectors of all ones. To find the solution of the optimization problem (12), we have the following theorem.

Theorem 3

The optimal solution \(\textbf{v}_{c}\) of the optimization problem (12) is the eigenvector corresponding to the maximum eigenvalue of the matrix \((\textbf{H}_{c}-\rho _c\textbf{X}^T \textbf{L} \textbf{X})\).

Proof. The Lagrangian function of Eq. (12) can be written as

$$\begin{aligned} \begin{aligned} L(\textbf{v}_c, \lambda _c) =&\textbf{v}^T_c \textbf{H}_c \textbf{v}_c - \rho _c \textbf{v}^T_c \textbf{X}^T \textbf{L} \textbf{X} \textbf{v}_c - \lambda _c (\textbf{v}^T_c\textbf{v}_c -1). \end{aligned} \end{aligned}$$
(16)

Then, we take the partial derivative of \(L(\textbf{v}_c, \lambda _c)\) with respect to \(\textbf{v}_c\) and set it to zero, which gives

$$\begin{aligned} \begin{aligned}&\frac{\partial L({\textbf{v}}_c, \lambda _c)}{\partial {\textbf{v}}_c} = \left( {\textbf{H}}_c- \rho _c {\textbf{X}}^T {\textbf{L}} {\textbf{X}}\right) {\textbf{v}}_c - \lambda _c {\textbf{v}}_c = 0 \\&\Rightarrow \left( {\textbf{H}}_c- \rho _c {\textbf{X}}^T {\textbf{L}} {\textbf{X}}\right) {\textbf{v}}_c = \lambda _c {\textbf{v}}_c. \end{aligned} \end{aligned}$$
(17)

Thus, the cth optimization problem of OVR-LapPVP is also transformed into an eigenvalue decomposition problem. The solution \(\textbf{v}_c\) is obtained as the eigenvector corresponding to the maximum eigenvalue of the matrix \(\left( \textbf{H}_c- \rho _c \textbf{X}^T \textbf{L} \textbf{X}\right) \).    \(\square \)

After obtaining C projection vectors, one for each class, an unknown test sample \(\textbf{x}\) is classified according to the projected distances between it and all class centers. The decision function is defined as

$$\begin{aligned} c^* = \arg \min \limits _{c = 1,2,...,C} \Vert \textbf{v}_c^T \textbf{x} - \textbf{v}_c^T \textbf{u}_c\Vert \end{aligned}$$
(18)

where \(c^*\) is the estimated class label of \(\textbf{x}\). We summarize the specific procedure of OVR-LapPVP in Algorithm 2.

Algorithm 2. The procedure of OVR-LapPVP
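For illustration, the training and prediction steps of OVR-LapPVP (Eqs. (12)–(18)) can be sketched as follows, reusing the solve_projection helper sketched in Sect. 2.1; the per-class data layout is an assumed convention:

```python
import numpy as np

def ovr_train(X, X_labeled_per_class, L, alpha, rho):
    """X: all training samples; X_labeled_per_class: list of labeled matrices, one per class."""
    models = []
    for X_c in X_labeled_per_class:
        u_c = X_c.mean(axis=0)                         # class center u_c
        B_c = (X - u_c).T @ (X - u_c)                  # between-class scatter, Eq. (14)
        W_c = (X_c - u_c).T @ (X_c - u_c)              # within-class scatter, Eq. (15)
        H_c = alpha * B_c - (1.0 - alpha) * W_c        # discriminative matrix, Eq. (13)
        v_c = solve_projection(H_c, X, L, rho)         # Theorem 3 (see the sketch in Sect. 2.1)
        models.append((v_c, u_c))
    return models

def ovr_predict(x, models):
    dists = [abs(v @ x - v @ u) for v, u in models]    # projected distances, Eq. (18)
    return int(np.argmin(dists)) + 1                   # classes are labeled 1..C
```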

2.3 Comparison of Multi-class LapPVP

Here we compare the two proposed multi-class methods. OVO-LapPVP constructs \(C(C-1)/2\) binary classifiers by solving \(C(C-1)/2\) pairs of eigenvalue decomposition problems, so its computational complexity is approximately \(O(C^2m^3)\). OVR-LapPVP generates C projection vectors by solving C eigenvalue decomposition problems, so its computational complexity is approximately \(O(Cm^3)\). For instance, when \(C=6\), OVO-LapPVP trains 15 binary classifiers, whereas OVR-LapPVP trains only 6.

Obviously, OVO-LapPVP requires more training time than OVR-LapPVP. Moreover, the classification strategy of OVO-LapPVP is more complex than that of OVR-LapPVP because OVO-LapPVP handles more binary classifiers. On the other hand, OVR-LapPVP has a greater space complexity because each of its optimization problems is performed on all training data.

Table 1. Information of benchmark datasets

3 Experiments

3.1 Experimental Setup

To confirm the feasibility and effectiveness of the proposed multi-class methods, we need to compare them with other multi-class ones. However, few papers discuss semi-supervised multi-class tasks. Thus, the compared algorithms are obtained by extending binary semi-supervised classifiers (LapTSVM, LapTPMSVM, and LapLSTSVM) with both the OVO and OVR strategies, where LapTSVM, LapTPMSVM and LapLSTSVM, like LapPVP, are popular nonparallel classifiers for semi-supervised learning. This gives six multi-class algorithms: OVO-LapTSVM, OVR-LapTSVM, OVO-LapTPMSVM, OVR-LapTPMSVM, OVO-LapLSTSVM and OVR-LapLSTSVM.

Experiments are conducted on benchmark datasets that are from the University of California at Irvine (UCI) Machine Learning Repository [2]. The details of datasets are presented in Table 1. For each dataset, we randomly select \(70\%\) of samples from each class for training, and the remaining \(30\%\) for test. The features of data are all normalized to the interval [0, 1]. To reduce the running time of parameter selection, hyper-parameters are set to the same value for all binary classifiers in one multi-class method. In addition, the grid search method [4] is used for selecting the optimal hyper-parameters. In OVO-LapPVP and OVR-LapPVP, \(\alpha \) takes its value from the set \(\{2^{-10}, 2^{-9}, \dots , 2^{0}\}\), and \(\rho \) from \(\{2^{-5}, 2^{-4}, \dots , 2^{5}\}\). Moreover, the number of nearest neighbors varies in the set \(\{1, 3, 5, 7, 9\}\).
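A sketch of this grid search is given below; evaluate is a hypothetical routine that returns the validation accuracy of OVO-LapPVP or OVR-LapPVP under the given hyper-parameters:

```python
import itertools

def evaluate(alpha, rho, K):
    """Hypothetical routine: train OVO- or OVR-LapPVP with these hyper-parameters
    and return validation accuracy (implementation omitted)."""
    raise NotImplementedError

alphas = [2.0 ** p for p in range(-10, 1)]   # {2^-10, ..., 2^0}
rhos   = [2.0 ** p for p in range(-5, 6)]    # {2^-5, ..., 2^5}
Ks     = [1, 3, 5, 7, 9]                     # number of nearest neighbors

best = None
for alpha, rho, K in itertools.product(alphas, rhos, Ks):
    acc = evaluate(alpha, rho, K)
    if best is None or acc > best[0]:
        best = (acc, alpha, rho, K)          # keep the best (accuracy, alpha, rho, K)
```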

All MATLAB scripts of the compared algorithms were written by ourselves. For a fair comparison, we use the same MATLAB quadratic programming routine to solve the quadratic programming problems in the relevant methods. All methods are implemented in MATLAB R2015b on a personal computer with an Intel Core i5 (3.6 GHz) CPU and 8 GB of RAM.

Parameter Analysis. It is well known that the parameters of a classifier may greatly influence its classification performance. Here, we use the grid search method to analyze the impact of the parameters on the proposed methods using the Glass dataset. As mentioned before, the parameters take the same values for all binary classifiers within one method. In OVO-LapPVP and OVR-LapPVP, \(\alpha \) takes its value from the set \(\{2^{-10}, 2^{-9}, \dots , 2^{0}\}\), and \(\rho \) from \(\{2^{-5}, 2^{-4}, \dots , 2^{5}\}\). Here, K is selected from the set \(\{1, 3, 5, 7, 9\}\).

Fig. 1. Accuracy vs. parameters \((\alpha ,\rho )\) of OVO-LapPVP on the Glass dataset under different K

Fig. 2. Accuracy vs. parameters \((\alpha ,\rho )\) of OVR-LapPVP on the Glass dataset under different K

Figures 1 and 2 show the influence of the parameters of OVO-LapPVP and OVR-LapPVP, respectively, with \(20\%\) of labeled data and \(50\%\) of unlabeled data on the Glass dataset. It is clear that the pair of parameters \((\alpha , \rho )\) greatly affects the performance of OVO-LapPVP and OVR-LapPVP. Additionally, when the number of nearest neighbors varies, the optimal pair \((\alpha , \rho )\) changes considerably. Thus, the optimal parameters cannot be fixed for all datasets, and parameter selection may be an issue for our methods. However, the grid search method can help us find appropriate parameters for OVO-LapPVP and OVR-LapPVP in the experiments.

3.2 Results and Discussion

Experiments are conducted on nine UCI datasets to compare the classification performance of semi-supervised multi-class methods. We run 10 trials on each dataset and report the average accuracy.

Table 2. Average accuracy and standard deviation (%) obtained by eight multi-class methods with \(10\%\) of labeled data

Table 2 lists the results of the eight semi-supervised multi-class classification methods on nine datasets with \(10\%\) of labeled data and \(50\%\) of unlabeled data. It is clear that the proposed methods, OVO-LapPVP and OVR-LapPVP, perform better. On the Dnatest, Iris, Lungcancer, Wine, X8D5K and Zoo datasets, OVO-LapPVP has the highest accuracy, followed by OVR-LapPVP. On the Glass dataset, OVR-LapPVP has the highest accuracy, followed by OVO-LapPVP.

Tables 3 and 4 summarize the results of the eight multi-class methods obtained with \(20\%\) and \(30\%\) of labeled data, respectively. Both OVO-LapPVP and OVR-LapPVP outperform the other methods on all datasets except Balance and Waveform. The classification performance on the X8D5K dataset shows that the proposed semi-supervised classification methods are effective for multi-class classification tasks.

Table 3. Average accuracy and standard deviation (%) obtained by eight multi-class methods with \(20\%\) of labeled data

Observation of Tables 2, 3 and 4 indicates that OVO-LapPVP achieves the highest performance in 15 cases and the second highest in 6 cases, out of 27 cases in total. At the same time, OVR-LapPVP is the best in 9 cases and the second best in 12 cases. In short, our methods are superior to the other methods in 24 out of 27 cases. These findings suggest that OVO-LapPVP has the best performance for multi-class classification tasks among the compared methods, followed by OVR-LapPVP.

The running time of the eight methods on nine datasets with \(20\%\) of labeled and \(50\%\) of unlabeled training data is recorded in Fig. 3. The first, third, fifth, and seventh columns in every sub-figure represent the running time of the methods using the OVO strategy, and the remaining columns represent those using OVR. Obviously, the OVR-based methods take less time than the OVO-based ones because the OVR-based methods need to solve fewer optimization problems. The four methods that avoid complex quadratic programming problems, OVO-LapLSTSVM, OVR-LapLSTSVM, OVO-LapPVP and OVR-LapPVP, save much time. Therefore, OVO-LapPVP and OVR-LapPVP are promising for multi-class classification tasks.

Table 4. Average accuracy and standard deviation (%) obtained by eight multi-class methods with \(30\%\) of labeled data

Additionally, we give the ranks of the individual methods in Table 5. The method with the highest accuracy is ranked first, the method with the second highest accuracy is ranked second, and so on. Digits in the row “\(10\%\)” are the average ranks obtained from Table 2, and those in the rows “\(20\%\)” and “\(30\%\)” are from Tables 3 and 4, respectively. The row “Average rank” gives the average rank over the 27 cases, and the row “Friedman test” shows the rank difference between each method and the reference method, where OVR-LapPVP is taken as the reference. The smaller the average rank and the Friedman test value, the better the corresponding method performs. It is obvious that OVO-LapPVP is slightly better than OVR-LapPVP since its Friedman test value is less than 0. According to Table 5, we conclude that OVO-LapPVP has the greatest superiority, and OVR-LapPVP ranks second.

Table 5. Rank of eight methods on UCI datasets
Fig. 3. Running time of eight methods on UCI datasets

4 Conclusion

In this paper, we propose OVO-LapPVP and OVR-LapPVP for multi-class classification tasks. As extensions of LapPVP, both proposed methods take advantage of the discriminative information of the labeled data and the graph structure of the unlabeled data to obtain the optimal projection vectors. OVO-LapPVP adopts the OVO strategy, and OVR-LapPVP adopts the OVR strategy. That is, OVO-LapPVP obtains \(C(C-1)/2\) pairs of projection vectors, while OVR-LapPVP obtains C projection vectors for C-class data. Since OVO-LapPVP needs to solve more optimization problems than OVR-LapPVP, it costs more time than OVR-LapPVP on the same dataset.

Experiments conducted on the UCI datasets demonstrate that the proposed methods achieve better classification performance than other popular semi-supervised methods based on manifold regularization and the twin support vector machine. OVO-LapPVP outperforms OVR-LapPVP in classification performance but is inferior in running efficiency; therefore, OVO-LapPVP and OVR-LapPVP can be applied to different scenarios. Both methods are effective for multi-class classification problems in the linear case; in future work, we will consider kernel tricks for nonlinear cases.