1 Introduction

Twin support vector machine (TSVM) has become a popular binary classifier because it greatly reduces the computational complexity of the support vector machine (SVM) [5]. TSVM obtains two nonparallel hyperplanes by solving two smaller, related SVM-type problems; as a supervised binary classifier, it requires a large amount of labeled data. Moreover, it is difficult for TSVM to handle multi-class classification problems directly. To inherit the advantages of TSVM, several multi-class versions of TSVM have been proposed for wider applications [9, 10].

Inspired by manifold regularization, Laplacian TSVM (LapTSVM) was proposed for semi-supervised learning and has become a useful extension of TSVM [8]. Similar to LapTSVM, Laplacian twin parametric-margin SVM (LapTPMSVM) [12] and Laplacian least squares TSVM (LapLSTSVM) [1] are the semi-supervised versions of twin parametric-margin SVM [7] and least squares TSVM [6], respectively, both of which are variants of TSVM. These three semi-supervised methods all aim to generate two nonparallel hyperplanes such that each hyperplane is close to its own class and as far as possible from the other. Laplacian pair-weight vector projection (LapPVP) was also motivated by the idea of nonparallel hyperplanes [11]. LapPVP is an excellent binary classifier compared with other nonparallel-hyperplane methods. Furthermore, LapPVP integrates the between-class scatter, the within-class scatter, and the Laplacian regularization for semi-supervised binary classification. Each class provides class-specific information through its between-class and within-class scatters, which enhances the discriminative power of the representation. The Laplacian regularization encodes the graph structure of the labeled and unlabeled data, i.e., the intrinsic geometrical structure of the data.

Although these semi-supervised binary classifiers are promising, their multi-class versions have rarely been explored. To solve semi-supervised multi-class classification problems, we extend the formulation of LapPVP and propose two novel semi-supervised multi-class methods, named one-versus-one LapPVP (OVO-LapPVP) and one-versus-rest LapPVP (OVR-LapPVP). Researchers have developed the “one-versus-one” (OVO) and “one-versus-rest” (OVR) strategies to decompose a multi-class problem into multiple binary sub-problems [3, 9]. Originally, LapPVP obtains a pair of projection vectors for two-class data. Both OVO-LapPVP and OVR-LapPVP solve a multi-class classification task using multiple LapPVPs but differ in the number of LapPVPs. Specifically, the multi-class versions of LapPVP have the following characteristics:

  (1) OVO-LapPVP obtains \(C(C-1)/2\) pairs of projection vectors for C-class data, whereas OVR-LapPVP obtains C projection vectors for C-class data, one for each class.

  (2) Both proposed multi-class methods are solved by eigenvalue decomposition, which avoids solving complex quadratic programming problems.

  (3) Experimental results on UCI datasets demonstrate the effectiveness of OVO-LapPVP and OVR-LapPVP.

2 Semi-supervised Multi-class LapPVPs

In this section, we describe the proposed OVO-LapPVP and OVR-LapPVP in detail. Consider a semi-supervised multi-class classification task. Let \(\textbf{X}_l = [\textbf{x}_1,\cdots ,\textbf{x}_l]^T\in \mathbb {R}^{l\times m}\) and \(\textbf{X}_u = [\textbf{x}_{l+1},\cdots ,\textbf{x}_{n}]^T\in \mathbb {R}^{(n-l)\times m}\) be the labeled and unlabeled sample matrices, respectively, where \(\textbf{x}_i \in \mathbb {R}^m\) is an m-dimensional sample, and l and n are the numbers of labeled and total samples, respectively. Let \(y_i\) be the label of \(\textbf{x}_i\), where \(y_i \in \{1,2,\ldots ,C\}\) and C is the number of classes. Let \(\textbf{X}_{l,c}\) be the labeled sample matrix belonging to the cth class. The total sample matrix can be denoted as \(\textbf{X} = [\textbf{X}_{l,1};\cdots ;\textbf{X}_{l,C}; \textbf{X}_u]\in \mathbb {R}^{n\times m}\).

In LapPVP, we construct an adjacency graph using the K-nearest-neighbor method. The element in the ith row and jth column of the adjacency matrix \(\textbf{S}\) induced by the adjacency graph is

$$\begin{aligned} S_{ij} = {\left\{ \begin{array}{ll} 1, &{} if~~\textbf{x}_i \in N_K(\textbf{x}_j)~~or~~\textbf{x}_j \in N_K(\textbf{x}_i) \\ 0, &{} otherwise \end{array}\right. }, \end{aligned}$$
(1)

where \(N_K(\textbf{x}_i)\) is the set of K nearest neighbors of \(\textbf{x}_i\). Then we get the classical Laplacian matrix \(\textbf{L} = \textbf{D} - \textbf{S}\), where \(\textbf{D}\) is the diagonal matrix with \({D}_{ii} = \sum _{j} {S}_{ij}\).
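For clarity, the following Python/NumPy sketch (illustrative code, not the implementation used in our experiments; the function name is ours) constructs the symmetric K-nearest-neighbor adjacency matrix of Eq. (1) and the Laplacian matrix \(\textbf{L} = \textbf{D} - \textbf{S}\):

```python
import numpy as np

def knn_laplacian(X, K=5):
    """X: (n, m) sample matrix; returns the (n, n) Laplacian L = D - S."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances
    sq = np.sum(X ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(dist, np.inf)          # exclude a point from its own neighbor set
    # Indices of the K nearest neighbors of each sample
    nn = np.argsort(dist, axis=1)[:, :K]
    S = np.zeros((n, n))
    rows = np.repeat(np.arange(n), K)
    S[rows, nn.ravel()] = 1.0
    S = np.maximum(S, S.T)                  # symmetrize: the "or" condition in Eq. (1)
    D = np.diag(S.sum(axis=1))
    return D - S
```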

2.1 OVO-LapPVP

The one-versus-one strategy is a popular technique for extending binary classifiers to multi-class ones. Using this strategy, OVO-LapPVP generates \(C(C-1)/2\) binary classifiers for C-class data. Given a C-class dataset, we obtain the set of class pairs \(Z =\{z=(c_1,c_2)\,|\, c_1,c_2=1,2,\cdots ,C,\ c_1<c_2\}\). To construct a binary classifier for class pair z, we take class \(c_1\) as the positive class and class \(c_2\) as the negative one. The corresponding LapPVP classifier is trained with the labeled samples of classes \(c_1\) and \(c_2\) and all unlabeled samples.

Let \(\textbf{X}_z = [\textbf{X}_{l,c_1}; \textbf{X}_{l,c_2}; \textbf{X}_u] \in \mathbb {R}^{n_z \times m}\) with \(z=(c_1,c_2)\), where \(n_z\) is the total number of labeled samples in classes \(c_1\) and \(c_2\) plus the number of unlabeled samples. The objective functions of OVO-LapPVP with respect to z can be expressed as follows:

$$\begin{aligned} \begin{aligned}&\max \limits _{\textbf{v}_{c_1}} \textbf{v}_{c_1}^T \textbf{H}_{c_1} \textbf{v}_{c_1} - \rho _{c_1} \textbf{v}_{c_1}^T \textbf{X}_z^T \textbf{L}_z \textbf{X}_z \textbf{v}_{c_1} \\&s.t. \quad \textbf{v}_{c_1}^T \textbf{v}_{c_1} = 1 \end{aligned} \end{aligned}$$
(2)

and

$$\begin{aligned} \begin{aligned}&\max \limits _{\textbf{v}_{c_2}} \textbf{v}_{c_2}^T \textbf{H}_{c_2} \textbf{v}_{c_2} - \rho _{c_2} \textbf{v}_{c_2}^T \textbf{X}_z^T \textbf{L}_z \textbf{X}_z \textbf{v}_{c_2} \\&s.t. \quad \textbf{v}_{c_2}^T \textbf{v}_{c_2}= 1 \end{aligned}, \end{aligned}$$
(3)

where \({\textbf{v}_{c_1}} \in \mathbb {R}^{m}\) and \({\textbf{v}_{c_2}}\in \mathbb {R}^{m}\) are projection vectors, \(\rho _{c_1}>0\) and \(\rho _{c_2}>0\) are regularization parameters, the Laplacian matrix \(\textbf{L}_z\in \mathbb {R}^{n_z\times n_z}\) is computed from \(\textbf{X}_z\), and the discriminative matrices \(\textbf{H}_{c_1}\in \mathbb {R}^{m \times m}\) and \(\textbf{H}_{c_2} \in \mathbb {R}^{m \times m}\) are defined as

$$\begin{aligned} \begin{aligned} \textbf{H}_{c_1}&= \alpha _{c_1} \left( \textbf{X}_z - \textbf{e}_z \textbf{u}_{c_1}^T\right) ^T \left( \textbf{X}_z - \textbf{e}_z \textbf{u}_{c_1}^T\right) \\&- (1-\alpha _{c_1}) (\textbf{X}_{c_1} - \textbf{e}_{c_1}\textbf{u}_{c_1}^T )^T (\textbf{X}_{c_1} - \textbf{e}_{c_1} \textbf{u}_{c_1}^T) \end{aligned} \end{aligned}$$
(4)

and

$$\begin{aligned} \begin{aligned} \textbf{H}_{c_2}&= \alpha _{c_2} \left( \textbf{X}_z - \textbf{e}_z \textbf{u}_{c_2}^T\right) ^T \left( \textbf{X}_z - \textbf{e}_z \textbf{u}_{c_2}^T\right) \\&- (1-\alpha _{c_2}) (\textbf{X}_{c_2} - \textbf{e}_{c_2}\textbf{u}_{c_2}^T )^T (\textbf{X}_{c_2} - \textbf{e}_{c_2} \textbf{u}_{c_2}^T), \end{aligned} \end{aligned}$$
(5)

where \(\alpha _{c_1}\in [0,1]\) and \(\alpha _{c_2}\in [0,1]\) are parameters to balance the between-class scatter matrix and the within-class one, class centers \(\textbf{u}_{c_j}=\frac{1}{|X_{l,c_j}|}\sum _{\textbf{x}_i\in X_{l,c_j}} \textbf{x}_{i}\) (\(j=1,2\)), \(X_{l,c_j}\) is the set of labeled samples in the \(c_j\)th class, \(\textbf{e}_z \in \mathbb {R}^{n_z}\) and \(\textbf{e}_{c_j} \in \mathbb {R}^{|X_{l,c_j}|}\) (\(j=1,2\)) are vectors of all ones.
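A minimal sketch of computing the discriminative matrices in Eqs. (4) and (5) is given below; the helper name and argument layout are illustrative assumptions:

```python
import numpy as np

def discriminative_matrix(X_z, X_c, alpha):
    """X_z: all samples of pair z; X_c: labeled samples of class c_j; alpha in [0, 1]."""
    u_c = X_c.mean(axis=0)                       # class center u_{c_j}
    Dz = X_z - u_c                               # X_z - e_z u_{c_j}^T
    Dc = X_c - u_c                               # X_{c_j} - e_{c_j} u_{c_j}^T
    # Weighted difference of the two scatter terms in Eqs. (4)-(5)
    return alpha * (Dz.T @ Dz) - (1.0 - alpha) * (Dc.T @ Dc)
```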

The following theorems describe the solutions to the optimization problems (2) and (3).

Theorem 1

The optimal solution \(\textbf{v}_{c_1}\) of the optimization problem (2) is the eigenvector corresponding to the maximum eigenvalue of the matrix \((\textbf{H}_{c_1}-\rho _{c_1}\textbf{X}_z^T \textbf{L}_z \textbf{X}_z)\).

Proof. To find the solution to (2), we first form the corresponding Lagrangian function with multiplier \(\lambda _{c_1}\). That is,

$$\begin{aligned} \begin{aligned} L(\textbf{v}_{c_1}, \lambda _{c_1}) =&\textbf{v}^T_{c_1} \textbf{H}_{c_1} \textbf{v}_{c_1} - \rho _{c_1} \textbf{v}^T_{c_1} \textbf{X}_z^T \textbf{L}_z \textbf{X}_z \textbf{v}_{c_1} - \lambda _{c_1} (\textbf{v}^T_{c_1}\textbf{v}_{c_1} -1). \end{aligned} \end{aligned}$$
(6)

Next, we take the partial derivative of \(L(\textbf{v}_{c_1}, \lambda _{c_1})\) with respect to the primal variable \(\textbf{v}_{c_1}\) and set it to zero, which yields

$$\begin{aligned} \begin{aligned}&\frac{\partial L(\textbf{v}_{c_1}, \lambda _{c_1})}{\partial \textbf{v}_{c_1}} = \left( \textbf{H}_{c_1}- \rho _{c_1} \textbf{X}_z^T \textbf{L}_z \textbf{X}_z\right) \textbf{v}_{c_1} - \lambda _{c_1} \textbf{v}_{c_1} = 0 \\&\Rightarrow \left( \textbf{H}_{c_1}- \rho _{c_1} \textbf{X}_z^T \textbf{L}_z \textbf{X}_z\right) \textbf{v}_{c_1} = \lambda _{c_1} \textbf{v}_{c_1}. \end{aligned} \end{aligned}$$
(7)

By introducing a Tikhonov regularization term into Eq. (7), the solution can be obtained by solving a standard eigenvalue problem. Finally, the solution \(\textbf{v}_{c_1}\) can be computed as the eigenvector corresponding to the maximum eigenvalue of \(\left( \textbf{H}_{c_1}- \rho _{c_1} \textbf{X}_z^T \textbf{L}_z \textbf{X}_z\right) \).    \(\square \)
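The computation in Theorem 1 can be sketched as follows; the small Tikhonov term \(\varepsilon \textbf{I}\) is an assumed numerical safeguard corresponding to the regularization mentioned above:

```python
import numpy as np

def solve_projection(H, X_z, L_z, rho, eps=1e-6):
    """Return the unit-norm eigenvector of (H - rho * X_z^T L_z X_z) with the largest eigenvalue."""
    M = H - rho * (X_z.T @ L_z @ X_z)
    M = 0.5 * (M + M.T) + eps * np.eye(M.shape[0])   # symmetrize and add Tikhonov term
    eigvals, eigvecs = np.linalg.eigh(M)             # eigenvalues in ascending order
    return eigvecs[:, -1]                            # eigenvector of the maximum eigenvalue
```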

Theorem 2

The optimal solution \(\textbf{v}_{c_2}\) of the optimization problem (3) is the eigenvector corresponding to the maximum eigenvalue of the matrix \((\textbf{H}_{c_2}-\rho _{c_2}\textbf{X}_z^T \textbf{L}_z \textbf{X}_z)\).

The proof of Theorem 2 is similar to that of Theorem 1 and is therefore omitted. Theorems 1 and 2 provide the optimal projection vectors \(\textbf{v}_{c_1}\) and \(\textbf{v}_{c_2}\), respectively, via eigenvalue decomposition. After obtaining \(\textbf{v}_{c_1}\) and \(\textbf{v}_{c_2}\) for class pair \(z = (c_1,c_2)\), we can compute the distances between an unknown test point \(\textbf{x}\) and the class centers \(\textbf{u}_{c_j}~(j=1,2)\) by

$$\begin{aligned} \begin{aligned}&d(\textbf{x},\textbf{u}_{c_1}) = \Vert \textbf{v}_{c_1}^T \textbf{x} - \textbf{v}_{c_1}^T \textbf{u}_{c_1}\Vert \end{aligned} \end{aligned}$$
(8)

and

$$\begin{aligned} \begin{aligned}&d(\textbf{x},\textbf{u}_{c_2}) = \Vert \textbf{v}_{c_2}^T \textbf{x} - \textbf{v}_{c_2}^T \textbf{u}_{c_2}\Vert \end{aligned}, \end{aligned}$$
(9)

where \(\Vert \cdot \Vert \) is the 2-norm of a vector. If \(d(\textbf{x},\textbf{u}_{c_1}) \le d(\textbf{x},\textbf{u}_{c_2})\) for class pair z, then \(\textbf{x}\) is more likely to belong to class \(c_1\).

To assign a class label to \(\textbf{x}\), we construct a vote vector \(\hat{\textbf{y}}=[\hat{y}_1,\cdots ,\hat{y}_C]^T\), where \(\hat{y}_c\) is the vote for the cth class. All votes \(\hat{y}_c\), \(c=1,\cdots ,C\), are initialized to 0. For each class pair \(z =(c_1,c_2)\), we update the vote for either class \(c_1\) or class \(c_2\). That is,

$$\begin{aligned} \left\{ \begin{array}{c l} \hat{y}_{c_1} \leftarrow \hat{y}_{c_1}+1, &{} if~~d(\textbf{x},\textbf{u}_{c_1}) \le d(\textbf{x},\textbf{u}_{c_2})\\ \hat{y}_{c_2} \leftarrow \hat{y}_{c_2}+1, &{} otherwise \end{array}. \right. \end{aligned}$$
(10)

The update procedure is performed over the whole set Z. Thus, the classification rule for \(\textbf{x}\) is defined as

$$\begin{aligned} c^*=\arg \max _{c=1,\cdots ,C} \hat{y}_c, \end{aligned}$$
(11)

where \(c^*\) is the estimated class label for \(\textbf{x}\). We summarize the specific procedure of OVO-LapPVP in Algorithm 1.

Algorithm 1. The procedure of OVO-LapPVP
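For illustration, the voting procedure of Eqs. (8)–(11) can be sketched as follows; the dictionary holding the trained pairs is an assumed data layout, not part of Algorithm 1:

```python
import numpy as np

def ovo_predict(x, classifiers, C):
    """classifiers: dict mapping (c1, c2) -> (v1, u1, v2, u2) for each trained class pair."""
    votes = np.zeros(C)
    for (c1, c2), (v1, u1, v2, u2) in classifiers.items():
        d1 = abs(v1 @ x - v1 @ u1)          # distance to the center of class c1, Eq. (8)
        d2 = abs(v2 @ x - v2 @ u2)          # distance to the center of class c2, Eq. (9)
        if d1 <= d2:                        # vote update, Eq. (10)
            votes[c1 - 1] += 1              # classes are labeled 1..C
        else:
            votes[c2 - 1] += 1
    return int(np.argmax(votes)) + 1        # majority vote, Eq. (11)
```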

2.2 OVR-LapPVP

Originally, LapPVP obtains a projection vector for each class that keeps data points of that class close to one another while keeping them far from points of the other class. In LapPVP, the within-class scatter requires only one class of training data, and the Laplacian regularization requires all data to construct the adjacency graph, whereas the between-class scatter captures the information of different classes. Hence, it is easy to generate multiple projection vectors for multi-class data by using the one-versus-rest strategy. This section proposes OVR-LapPVP, which constructs C projection vectors for C-class training data. In other words, each class has its own projection vector.

To construct a binary classifier for the cth class, we take the samples in class c as positive and the samples in the remaining \((C-1)\) classes as negative. Let \(\textbf{X}_{l,c}\) be the labeled sample matrix of the cth class, and let \(l_c\) be the number of labeled samples in the cth class. Then the total labeled sample matrix is \(\textbf{X}_l = [\textbf{X}_{l,1};\cdots ;\textbf{X}_{l,C}]\), and the training sample matrix is \(\textbf{X} = [\textbf{X}_l;\textbf{X}_u]\), where \(\textbf{X}_u\) is the unlabeled sample matrix.

Thus, the cth optimization formulation of OVR-LapPVP is given by

$$\begin{aligned} \begin{aligned}&\max \limits _{\textbf{v}_c}~~ \textbf{v}_c^T \textbf{H}_c \textbf{v}_c - \rho _c \textbf{v}_c^T \textbf{X}^T \textbf{L} \textbf{X} \textbf{v}_c, \\&s.t. \quad \textbf{v}_c^T \textbf{v}_c = 1, \end{aligned} \end{aligned}$$
(12)

where the discriminative matrix \(\textbf{H}_c\) is defined as

$$\begin{aligned} \begin{aligned} \textbf{H}_c = \alpha _c \textbf{B}_c - (1-\alpha _c) \textbf{W}_c, \end{aligned} \end{aligned}$$
(13)

where \(\alpha _c \in [0,1]\) is a parameter that balances the between-class scatter matrix \(\textbf{B}_c\) and the within-class scatter matrix \(\textbf{W}_c\) of the cth class; \(\textbf{B}_c\) and \(\textbf{W}_c\) respectively have the forms

$$\begin{aligned} \textbf{B}_c = \left( \textbf{X} - \textbf{e} \textbf{u}_c^T\right) ^T \left( \textbf{X} - \textbf{e} \textbf{u}_c^T\right) \end{aligned}$$
(14)

and

$$\begin{aligned} \textbf{W}_c = (\textbf{X}_{c} - \textbf{e}_c\textbf{u}_c^T )^T (\textbf{X}_{c} - \textbf{e}_c \textbf{u}_c^T), \end{aligned}$$
(15)

where \(\textbf{e}\in \mathbb {R}^n\) and \(\textbf{e}_c \in \mathbb {R}^{l_c}\) are vectors of all ones. To find the solution of the optimization problem (12), we have the following theorem.

Theorem 3

The optimal solution \(\textbf{v}_{c}\) of the optimization problem (12) is the eigenvector corresponding to the maximum eigenvalue of the matrix \((\textbf{H}_{c}-\rho _c\textbf{X}^T \textbf{L} \textbf{X})\).

Proof. The Lagrangian function of Eq. (12) can be written as

$$\begin{aligned} \begin{aligned} L(\textbf{v}_c, \lambda _c) =&\textbf{v}^T_c \textbf{H}_c \textbf{v}_c - \rho _c \textbf{v}^T_c \textbf{X}^T \textbf{L} \textbf{X} \textbf{v}_c - \lambda _c (\textbf{v}^T_c\textbf{v}_c -1). \end{aligned} \end{aligned}$$
(16)

Then, we take the partial derivative of \(L(\textbf{v}_c, \lambda _c)\) with respect to \(\textbf{v}_c\) and set it to zero, which gives

$$\begin{aligned} \begin{aligned}&\frac{\partial L({\textbf{v}}_c, \lambda _c)}{\partial {\textbf{v}}_c} = \left( {\textbf{H}}_c- \rho _c {\textbf{X}}^T {\textbf{L}} {\textbf{X}}\right) {\textbf{v}}_c - \lambda _c {\textbf{v}}_c = 0 \\&\Rightarrow \left( {\textbf{H}}_c- \rho _c {\textbf{X}}^T {\textbf{L}} {\textbf{X}}\right) {\textbf{v}}_c = \lambda _c {\textbf{v}}_c. \end{aligned} \end{aligned}$$
(17)

Thus, the cth optimization problem of OVR-LapPVP is also transformed into an eigenvalue decomposition problem. The solution \(\textbf{v}_c\) is obtained as the eigenvector corresponding to the maximum eigenvalue of the matrix \(\left( \textbf{H}_c- \rho _c \textbf{X}^T \textbf{L} \textbf{X}\right) \).    \(\square \)

After obtaining C projection vectors, one for each class, an unknown test sample \(\textbf{x}\) is classified according to the projected distances between it and all class centers. The decision function is defined as

$$\begin{aligned} c^* = \arg \min \limits _{c = 1,2,...,C} \Vert \textbf{v}_c^T \textbf{x} - \textbf{v}_c^T \textbf{u}_c\Vert \end{aligned}$$
(18)

where \(c^*\) is the estimated class label of \(\textbf{x}\). We summarize the specific procedure of OVR-LapPVP in Algorithm 2.

Algorithm 2. The procedure of OVR-LapPVP
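For illustration, the training and prediction steps of OVR-LapPVP (Eqs. (12)–(18)) can be sketched as follows, reusing the solve_projection helper sketched in Sect. 2.1; the per-class data layout is an assumed convention:

```python
import numpy as np

def ovr_train(X, X_labeled_per_class, L, alpha, rho):
    """X: all training samples; X_labeled_per_class: list of labeled matrices, one per class."""
    models = []
    for X_c in X_labeled_per_class:
        u_c = X_c.mean(axis=0)                         # class center u_c
        B_c = (X - u_c).T @ (X - u_c)                  # between-class scatter, Eq. (14)
        W_c = (X_c - u_c).T @ (X_c - u_c)              # within-class scatter, Eq. (15)
        H_c = alpha * B_c - (1.0 - alpha) * W_c        # discriminative matrix, Eq. (13)
        v_c = solve_projection(H_c, X, L, rho)         # Theorem 3 (see the sketch in Sect. 2.1)
        models.append((v_c, u_c))
    return models

def ovr_predict(x, models):
    dists = [abs(v @ x - v @ u) for v, u in models]    # projected distances, Eq. (18)
    return int(np.argmin(dists)) + 1                   # classes are labeled 1..C
```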

2.3 Comparison of Multi-class LapPVP

Here we compare the two proposed multi-class methods. OVO-LapPVP constructs \(C(C-1)/2\) binary classifiers by solving \(C(C-1)/2\) pairs of eigenvalue decomposition problems, so its computational complexity is approximately \(O(C^2m^3)\). OVR-LapPVP generates C projection vectors by solving C eigenvalue decomposition problems, so its computational complexity is approximately \(O(Cm^3)\). For instance, when \(C=6\), OVO-LapPVP trains 15 binary classifiers, whereas OVR-LapPVP trains only 6.

Obviously, OVO-LapPVP requires more training time than OVR-LapPVP. Moreover, the classification strategy of OVO-LapPVP is more complex than that of OVR-LapPVP because OVO-LapPVP handles more binary classifiers. On the other hand, OVR-LapPVP has a greater space complexity because each of its optimization problems is performed on all training data.

Table 1. Information of benchmark datasets

3 Experiments

3.1 Experimental Setup

To confirm the feasibility and effectiveness of the proposed multi-class methods, we need to compare them with other multi-class ones. However, few papers discuss semi-supervised multi-class tasks. Thus, the compared algorithms are obtained by extending binary semi-supervised classifiers (LapTSVM, LapTPMSVM, and LapLSTSVM) with both the OVO and OVR strategies, where LapTSVM, LapTPMSVM and LapLSTSVM, like LapPVP, are popular nonparallel classifiers for semi-supervised learning. This gives six multi-class algorithms: OVO-LapTSVM, OVR-LapTSVM, OVO-LapTPMSVM, OVR-LapTPMSVM, OVO-LapLSTSVM and OVR-LapLSTSVM.

Experiments are conducted on benchmark datasets that are from the University of California at Irvine (UCI) Machine Learning Repository [2]. The details of datasets are presented in Table 1. For each dataset, we randomly select \(70\%\) of samples from each class for training, and the remaining \(30\%\) for test. The features of data are all normalized to the interval [0, 1]. To reduce the running time of parameter selection, hyper-parameters are set to the same value for all binary classifiers in one multi-class method. In addition, the grid search method [4] is used for selecting the optimal hyper-parameters. In OVO-LapPVP and OVR-LapPVP, \(\alpha \) takes its value from the set \(\{2^{-10}, 2^{-9}, \dots , 2^{0}\}\), and \(\rho \) from \(\{2^{-5}, 2^{-4}, \dots , 2^{5}\}\). Moreover, the number of nearest neighbors varies in the set \(\{1, 3, 5, 7, 9\}\).
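A sketch of this grid search is given below; evaluate is a hypothetical routine that returns the validation accuracy of OVO-LapPVP or OVR-LapPVP under the given hyper-parameters:

```python
import itertools

def evaluate(alpha, rho, K):
    """Hypothetical routine: train OVO- or OVR-LapPVP with these hyper-parameters
    and return validation accuracy (implementation omitted)."""
    raise NotImplementedError

alphas = [2.0 ** p for p in range(-10, 1)]   # {2^-10, ..., 2^0}
rhos   = [2.0 ** p for p in range(-5, 6)]    # {2^-5, ..., 2^5}
Ks     = [1, 3, 5, 7, 9]                     # number of nearest neighbors

best = None
for alpha, rho, K in itertools.product(alphas, rhos, Ks):
    acc = evaluate(alpha, rho, K)
    if best is None or acc > best[0]:
        best = (acc, alpha, rho, K)          # keep the best (accuracy, alpha, rho, K)
```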

All MATLAB scripts of the compared algorithms were written by ourselves. For a fair comparison, we use the same MATLAB quadratic programming routine to solve the quadratic programming problems in the relevant methods. All methods are implemented in MATLAB R2015b on a personal computer with an Intel Core i5 (3.6 GHz) CPU and 8 GB of RAM.

Parameter Analysis. It is well known that the parameters of a classifier may greatly influence its classification performance. Here, we use the grid search method to analyze the impact of the parameters on the proposed methods using the Glass dataset. As mentioned before, the parameters take the same values for all binary classifiers within one method. In OVO-LapPVP and OVR-LapPVP, \(\alpha \) takes its value from the set \(\{2^{-10}, 2^{-9}, \dots , 2^{0}\}\), and \(\rho \) from \(\{2^{-5}, 2^{-4}, \dots , 2^{5}\}\). Here, K is selected from the set \(\{1, 3, 5, 7, 9\}\).

Fig. 1. Accuracy vs. parameters \((\alpha ,\rho )\) of OVO-LapPVP on the Glass dataset under different K

Fig. 2. Accuracy vs. parameters \((\alpha ,\rho )\) of OVR-LapPVP on the Glass dataset under different K

Figures 1 and 2 show the influence of the parameters of OVO-LapPVP and OVR-LapPVP, respectively, with \(20\%\) of labeled data and \(50\%\) of unlabeled data on the Glass dataset. It is clear that the pair of parameters \((\alpha , \rho )\) greatly affects the performance of OVO-LapPVP and OVR-LapPVP. Additionally, when the number of nearest neighbors varies, the optimal pair \((\alpha , \rho )\) changes considerably. Thus, the optimal parameters cannot be fixed for all datasets, and parameter selection may be an issue for our methods. However, the grid search method can help us find appropriate parameters for OVO-LapPVP and OVR-LapPVP in the experiments.

3.2 Results and Discussion

Experiments are conducted on nine UCI datasets to compare the classification performance of semi-supervised multi-class methods. We run 10 trials on each dataset and report the average accuracy.

Table 2. Average accuracy and standard deviation (%) obtained by eight multi-class methods with \(10\%\) of labeled data

Table 2 lists the results of the eight semi-supervised multi-class classification methods on nine datasets with \(10\%\) of labeled data and \(50\%\) of unlabeled data. It is clear that the proposed methods, OVO-LapPVP and OVR-LapPVP, perform better. On the Dnatest, Iris, Lungcancer, Wine, X8D5K and Zoo datasets, OVO-LapPVP has the highest accuracy, followed by OVR-LapPVP. On the Glass dataset, OVR-LapPVP has the highest accuracy, followed by OVO-LapPVP.

Tables 3 and 4 summarize the results of the eight multi-class methods obtained with \(20\%\) and \(30\%\) of labeled data, respectively. Both OVO-LapPVP and OVR-LapPVP outperform the other methods on all datasets except Balance and Waveform. The classification performance on the X8D5K dataset shows that the proposed semi-supervised classification methods are effective for multi-class classification tasks.

Table 3. Average accuracy and standard deviation (%) obtained by eight multi-class methods with \(20\%\) of labeled data

Observation of Tables 2, 3 and 4 indicates that OVO-LapPVP achieves the highest performance in 15 cases and the second highest in 6 cases, out of 27 cases in total. At the same time, OVR-LapPVP is the best in 9 cases and the second best in 12 cases. In short, our methods are superior to the other methods in 24 out of 27 cases. These findings suggest that OVO-LapPVP has the best performance for multi-class classification tasks among the compared methods, followed by OVR-LapPVP.

The running time of the eight methods on nine datasets with \(20\%\) of labeled and \(50\%\) of unlabeled training data is recorded in Fig. 3. The first, third, fifth, and seventh columns in every sub-figure represent the running time of the methods using the OVO strategy, and the remaining columns represent those using OVR. Obviously, the OVR-based methods take less time than the OVO-based ones because the OVR-based methods need to solve fewer optimization problems. The four methods that avoid complex quadratic programming problems, OVO-LapLSTSVM, OVR-LapLSTSVM, OVO-LapPVP and OVR-LapPVP, save much time. Therefore, OVO-LapPVP and OVR-LapPVP are promising for multi-class classification tasks.

Table 4. Average accuracy and standard deviation (%) obtained by eight multi-class methods with \(30\%\) of labeled data

Additionally, we give the ranks of the individual methods in Table 5. The method with the highest accuracy is ranked first, the method with the second highest accuracy is ranked second, and so on. Digits in the row “\(10\%\)” are the average ranks obtained from Table 2, and those in the rows “\(20\%\)” and “\(30\%\)” are from Tables 3 and 4, respectively. The row “Average rank” gives the average rank over the 27 cases, and the row “Friedman test” shows the rank difference between each method and the reference method, where OVR-LapPVP is taken as the reference. The smaller the average rank and the Friedman test value, the better the corresponding method performs. It is obvious that OVO-LapPVP is slightly better than OVR-LapPVP since its Friedman test value is less than 0. According to Table 5, we conclude that OVO-LapPVP has the greatest superiority, and OVR-LapPVP ranks second.

Table 5. Rank of eight methods on UCI datasets
Fig. 3. Running time of eight methods on UCI datasets

4 Conclusion

In this paper, we propose OVO-LapPVP and OVR-LapPVP for multi-class classification tasks. As extensions of LapPVP, both proposed methods take advantage of the discriminative information of the labeled data and the graph structure of the unlabeled data to obtain the optimal projection vectors. OVO-LapPVP adopts the OVO strategy, and OVR-LapPVP adopts the OVR strategy. That is, OVO-LapPVP obtains \(C(C-1)/2\) pairs of projection vectors, while OVR-LapPVP obtains C projection vectors for C-class data. Since OVO-LapPVP needs to solve more optimization problems than OVR-LapPVP, it costs more time than OVR-LapPVP on the same dataset.

Experiments conducted on the UCI datasets demonstrate that the proposed methods achieve better classification performance than other popular semi-supervised methods based on manifold regularization and the twin support vector machine. OVO-LapPVP outperforms OVR-LapPVP in classification performance but is inferior in running efficiency; therefore, OVO-LapPVP and OVR-LapPVP can be applied to different scenarios. Both methods are effective for multi-class classification problems in the linear case; in future work, we will consider kernel tricks for nonlinear cases.