1 Introduction

In the information age, the mass production of information has caused serious information overload. Facing this dilemma, the support vector machine (SVM), a fast information classification algorithm, has become an effective solution. As a fully supervised statistical machine learning method, the SVM has been widely applied because of its good performance in information classification. However, to achieve a satisfactory classification standard, the SVM must be trained with large quantities of labeled data. In practice, this condition can rarely be fully met, as labeled data are usually difficult or expensive to acquire. In contrast, unlabeled data are abundant and easy to collect. Furthermore, relatively few labeled data lead to a frequent drawback, namely overfitting to the training data with a consequent loss of generality. To deal with this problem, the semi-supervised support vector machine (S3VM) learning method was proposed [1,2,3].

The semi-supervised support vector machine utilizes both labeled and unlabeled data for learning. The main goal of the S3VM is to employ a large collection of unlabeled data together with a limited amount of labeled data to improve classification accuracy. Because of its elegant properties, such as a unique global optimal solution and avoidance of the curse of dimensionality, many researchers have entered this area and applied the S3VM to many fields, such as text classification [4], multi-class human action recognition [5, 6], biomedical science [7, 8], graph reduction [9], image and video classification [10], and applications in industry and business [11, 12].

However, the main drawback of the S3VM is that its objective function is usually non-smooth. Solving it requires the heavy burden of two quadratic programming problems with matrix inversion, and fast algorithms cannot be used, which increases the computational complexity. Several advanced methods have been proposed to smooth the objective function. In 2005, Chapelle and Zien replaced the non-smooth term \(\max \{ 0,1 - \left| x \right|\}\) with \(\exp ( - 3x^{2} )\) and proposed the low density separation LDS-S3VM [3], but the approximation accuracy is not very high. In 2009, Liu et al. presented the polynomial function [13] \(P(x) = \frac{{1 - x^{2} }}{2} + \frac{1}{8}(1 - x^{2} )^{2} + \frac{1}{16}(1 - x^{2} )^{3} + \frac{5}{128}(1 - x^{2} )^{4} + \frac{7}{256}(1 - x^{2} )^{5} ,\) \(x \in [ - \frac{1}{k},\frac{1}{k}]\). However, this 10-order polynomial is too complex and requires too many calculations. Later, in 2013, Yang et al. offered a new smoothing strategy with the approximate function \(\rho_{\varepsilon } (x) = \sqrt {x^{2} + \varepsilon } \approx \left| x \right|\) [14] based on a robust difference of convex functions. This smooth method applied DC optimization algorithms to solve the S3VMs and did not add new variables or constraints to the corresponding S3VMs; it is a promising direction for facilitating S3VM research. Zhang et al. introduced a cubic spline function [15] \(s(x,k) = \frac{{k^{2} \left| x \right|^{3} }}{3} - kx^{2} - \frac{1}{3k} + 1,\) \((\left| x \right| \le \frac{1}{k}),\) and a quintic spline function [16] \(s(x,k) = - \frac{{k^{4} \left| x \right|^{5} }}{5} + \frac{1}{2}k^{3} x^{4} - kx^{2} - \frac{3}{10k} + 1,\) \((\left| x \right| \le \frac{1}{k})\) in 2015. Nevertheless, the above smoothing techniques are not entirely satisfactory.

Motivated by the works of [3, 13,14,15,16], a new research question gradually arises: is there another smoothing technique that improves accuracy while reducing the calculation scale? In this paper, a new class of Bézier smooth functions is applied. By employing the smooth Bézier function \(B_{n} (x)\) to approximate the non-smooth term \(\max \{ 0,1 - \left| t \right|\}\), a novel class of Bézier smooth semi-supervised support vector machines (BS4VMs) is derived. The new programming possesses the following attractive advantages: firstly, since the objective function becomes smooth and differentiable, fast gradient algorithms can be used to solve the BS4VMs, and much calculation time can be saved. Secondly, a new class of smooth functions is proposed, so the optimal smooth function can be selected for datasets of different scales. Lastly and most importantly, convergence analysis and experimental comparisons verify that BS4VMs are superior to the existing models in classification capability and efficiency.

To make the notation clearer, the definition of each variable involved in the equations is listed in Table 1. For example, all vectors are column vectors, and \(\nabla f(t)\) represents the gradient of the function \(f\).

Table 1 List of symbols

The rest of this paper is organized as follows. The preliminary background knowledge of the S3VM is introduced in Sect. 2. Section 3 shows how the BS4VMs are derived. The nonlinear BS4VMs with kernel functions are presented in Sect. 4, and a fast quasi-Newton algorithm for solving the programming follows in Sect. 5. The convergence analysis of the model is given in Sect. 6. The comparisons of the proposed algorithm with other advanced methods on four kinds of datasets are analyzed in Sect. 7. The discussion and conclusion follow in the last section.

2 Preliminary of semi-supervised support vector machine

The purpose of the S3VM for binary classification is to maximize the margin by using both labeled and unlabeled data. Consider a problem whose training data contain \(l\) labeled points \(\{ (x^{i} ,y_{i} )\}_{i = 1}^{l} ,y_{i} = \pm 1\) and \(u\) unlabeled points \(\{ x^{i} \}_{i = l + 1}^{l + u}\), where \(x^{i} { = (}x_{1}^{i} ,x_{2}^{i} ,...,x_{m}^{i} {)} \in {\mathbb{R}}^{m} .\) For linearly separable data, an optimal separating hyperplane with the largest margin should be sought for the S3VM classifier.

Let \(y \triangleq (y^{l} ,y^{l + u} )\) be a column vector, where \(y^{l} { = (}y_{{1}} {,}y_{2} ,...,y_{l} {)}^{\rm T}\) contains the known labels and \(y^{{l}{\text{ + u}}} { = (}y_{{l{ + 1}}} {,}y_{{l{ + }2}} ,...,y_{{l{ + }u}} {)}^{\rm T}\) contains the unknown labels. The label vector \(y^{l + u} = (y_{l + 1} ,...,y_{l + u} )^{\rm T}\) corresponding to the largest margin is the goal to be pursued. For the linear case, the S3VM can be described as

$$ \begin{gathered} \, J(w){ = }\min \frac{1}{2}\left\| w \right\|^{2} + C\sum\limits_{i = 1}^{l} {\xi_{i} } + C^{*} \sum\limits_{j = l + 1}^{l + u} {\xi_{j} } \hfill \\ s.t. \, y_{i} (w^{\rm T} x^{i} + b) \ge 1 - \xi_{i} ,i = 1,...,l \hfill \\ \, \left| {w^{\rm T} x^{j} + b} \right| \ge 1 - \xi_{j} ,j = l + 1,...,l + u, \hfill \\ \, \xi = \{ \xi_{1} ,\xi_{2} ,...,\xi_{l + u} \} \ge 0 \hfill \\ \end{gathered} $$
(1)

where \(C\) and \(C^{*} ,\) the penalty parameters for both labeled and unlabeled data, are greater than zero. The programming (1) can be changed into the unconstrained form of

$$ \, J(w){ = }\mathop {\min }\limits_{{w \in {\mathbb{R}}^{m} ,b \in {\mathbb{R}}}} \frac{1}{2}\left\| w \right\|^{2} + C\sum\limits_{i = 1}^{l} {L^{2} (y_{i} (w^{\rm T} x_{i} + b)) + } C^{*} \sum\limits_{i = l + 1}^{l + u} {L(\left| {w^{\rm T} x_{i} + b} \right|)} $$
(2)

in which \(L(t)\) is the hinge loss function and \(L(t) = \max (0,1 - t)\)[3].
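
To make the unconstrained form (2) concrete, a minimal numpy sketch of the objective evaluation is given below; the data, parameter values, and function names are illustrative assumptions rather than part of the original formulation.

```python
import numpy as np

def hinge(t):
    """Hinge loss L(t) = max(0, 1 - t)."""
    return np.maximum(0.0, 1.0 - t)

def s3vm_objective(w, b, X_lab, y_lab, X_unlab, C=1.0, C_star=1.0):
    """Unconstrained S3VM objective of Eq. (2):
    0.5*||w||^2 + C * sum_i L^2(y_i (w^T x_i + b)) + C^* * sum_j L(|w^T x_j + b|)."""
    margins_lab = y_lab * (X_lab @ w + b)       # y_i (w^T x_i + b) for labeled points
    scores_unlab = np.abs(X_unlab @ w + b)      # |w^T x_j + b| for unlabeled points
    reg = 0.5 * np.dot(w, w)
    labeled_term = C * np.sum(hinge(margins_lab) ** 2)
    unlabeled_term = C_star * np.sum(hinge(scores_unlab))
    return reg + labeled_term + unlabeled_term

# tiny illustrative example with random data
rng = np.random.default_rng(0)
X_lab, y_lab = rng.normal(size=(5, 3)), np.array([1, -1, 1, -1, 1])
X_unlab = rng.normal(size=(8, 3))
print(s3vm_objective(rng.normal(size=3), 0.0, X_lab, y_lab, X_unlab))
```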

3 Bézier smooth semi-supervised support vector for classification

3.1 Background knowledge about the Bézier function

Bézier curves were invented in 1968 by the French engineer Pierre Bézier for the initial purpose of designing automobile bodies [18]. For a series of interpolation points \(P_{0} ,P_{1} , \cdots P_{n - 1} ,P_{n}\) that need to be fitted, the intermediate points \(P_{1} , \cdots P_{n - 1}\) are used to specify the endpoint tangent vectors. Hence the Bézier curve passes through \(P_{0}\) and \(P_{n}\) and approximates the other control points, as illustrated in Fig. 1. To accomplish this goal, some weighting functions representing the influence of the control points at a given point of the Bézier curve are required. Any function satisfying the requirements is allowed, but in most cases the Bernstein polynomial is employed. A Bézier curve of degree \(n\) can be expressed as \(B(t) = \sum\nolimits_{i = 0}^{n} {C_{i}^{n} } (t)P_{i} ,\) where \(P_{i}\) is the control point or anchor point and \(C_{i}^{n} (t)\) is the Bernstein polynomial given by \(C_{i}^{n} (t){ = }\binom{n}{i}(1 - t)^{n - i} t^{i}\), in which \(i \in \{ 0,1,...,n\}\).
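
As a small illustration of the formula \(B(t) = \sum\nolimits_{i = 0}^{n} C_{i}^{n}(t)P_{i}\), the following sketch evaluates a Bézier curve from its control points with the Bernstein basis; the control points in the example are arbitrary values chosen for demonstration.

```python
import numpy as np
from math import comb

def bernstein(n, i, t):
    """Bernstein basis C_i^n(t) = binom(n, i) * (1 - t)^(n - i) * t^i."""
    return comb(n, i) * (1.0 - t) ** (n - i) * t ** i

def bezier_curve(control_points, t):
    """Evaluate B(t) = sum_i C_i^n(t) P_i for a degree-n Bézier curve."""
    P = np.asarray(control_points, dtype=float)           # shape (n + 1, dim)
    n = len(P) - 1
    weights = np.array([bernstein(n, i, t) for i in range(n + 1)])
    return weights @ P

# example: a quadratic curve that starts at P0, ends at P2 and is pulled toward P1
P = [(-1.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
for t in (0.0, 0.5, 1.0):
    print(t, bezier_curve(P, t))
```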

Fig. 1 Schematic diagram of the Bézier interpolation function

Many advantages of Bézier curves have been noticed:

  (1) They always pass through the anchor points \(P_{0}\) and \(P_{n}\).

  (2) They are always tangent to the lines of the path \(P_{0} \to P_{1}\) and \(P_{n - 1} \to P_{n}\).

  (3) They always lie within the convex hull of the control points [19].

Owing to these good properties, Bézier curves have been widely applied in computer graphics, such as technical illustration programs, CAD programs, trajectory guidance, and so forth [20,21,22,23].

For approximating the hinge loss function, the quadratic parametric Bézier function can be expressed as \(\left\{ {\begin{array}{*{20}c} {B_{2x} (t) = (2t - 1)/k} \\ {B_{2y} (t) = ( - 2t^{2} + 2t)/k} \\ \end{array} } \right.\) in which \(p_{0} = ( - \frac{1}{k},0),p_{1} = (0,\frac{1}{k}),p_{2} = (\frac{1}{k},0)\). Eliminating the parameter \(t\) gives \(y = B_{2} (x) = - \frac{1}{2k}(k^{2} x^{2} - 1)\). Similarly, the cubic parametric Bézier function \(\left\{ \begin{gathered} B_{3x} (t) = (2t^{3} - 3t^{2} + 3t - 1)/k \hfill \\ B_{3y} (t) = ( - 3t^{2} + 3t)/k \hfill \\ \end{gathered} \right.\) is acquired by interpolating the four points \(p_{0} ,p_{1} ,p_{2} ,p_{3}\), in which \(p_{0} = ( - \frac{1}{k},0),p_{1} = p_{2} = (0,\frac{1}{k}),\) \(p_{3} = (\frac{1}{k},0).\) From the general formula \(B(t) = \sum\nolimits_{i = 0}^{n} {C_{i}^{n} } (t)P_{i} ,\) the n-order Bézier function \(y = B_{n} (x)\) is acquired.
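
The following sketch constructs \(B_{n}(x)\) numerically from the control points listed above (\(p_{0} = (-\frac{1}{k},0)\), \(p_{n} = (\frac{1}{k},0)\), and intermediate points at \((0,\frac{1}{k})\); extending this pattern beyond \(n = 3\) is an assumption of the sketch, although it is consistent with the value \(B_{4}(0,k) = \frac{7}{8k}\) appearing in Theorem 2). With these control points \(x(t)\) is strictly increasing, so the parameter is eliminated numerically by bisection; with \(k = 1\) the surrogate can be compared directly with the non-smooth term \(\max \{ 0,1 - \left| x \right|\}\).

```python
import numpy as np

def bezier_smooth(x, n=4, k=1.0, tol=1e-12):
    """n-order Bézier surrogate B_n(x) on [-1/k, 1/k] (zero outside).

    Control points: p_0 = (-1/k, 0), p_n = (1/k, 0), all intermediate points
    at (0, 1/k).  With this pattern the curve has the parametric form
        x(t) = (t**n - (1 - t)**n) / k,   y(t) = (1 - t**n - (1 - t)**n) / k,
    and x(t) is strictly increasing, so t can be recovered by bisection.
    """
    if abs(x) >= 1.0 / k:
        return 0.0
    lo, hi = 0.0, 1.0
    while hi - lo > tol:                       # invert x(t) = x
        mid = 0.5 * (lo + hi)
        if (mid ** n - (1.0 - mid) ** n) / k < x:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)
    return (1.0 - t ** n - (1.0 - t) ** n) / k

def tent(x):
    """The non-smooth term max(0, 1 - |x|)."""
    return max(0.0, 1.0 - abs(x))

# with k = 1 the surrogate and the non-smooth term share the interval [-1, 1]
for x in (-1.0, -0.5, 0.0, 0.5, 1.0):
    print(x, tent(x), bezier_smooth(x, n=2), bezier_smooth(x, n=4))
# bezier_smooth(0.0, n=2) = 1/2 and bezier_smooth(0.0, n=4) = 7/8, matching B_2 and B_4
```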

Theorem 1

The Bézier curve \(B_{n} (x)\) is \((n - 1)\)-order smooth at the points \(x = \pm \frac{1}{k}\).

Proof

The proof is based on mathematical induction.

(i) \(\forall x \in \Omega ,B_{2} (x) = - \frac{1}{2k}(k^{2} x^{2} - 1)\) satisfies the following equalities at the points \(x = \pm \frac{1}{k}\)

$$ \left\{ \begin{gathered} B_{2} ( - \frac{1}{k}) = 0,B_{2} (\frac{1}{k}) = 0, \hfill \\ B_{2}^{\prime } ( - \frac{1}{k}) = 1,B_{2}^{\prime } (\frac{1}{k}) = - 1. \hfill \\ \end{gathered} \right. $$
(3)

So, \(B_{2} (x,k)\) is first-order smooth.

(ii) \(B_{3} (x)\) satisfies the following equalities at the points \(x = \pm \frac{1}{k}\),

$$ \left\{ {\begin{array}{*{20}c} \begin{gathered} B_{3} ( - \frac{1}{k}) = 0, \, B_{3} (\frac{1}{k}) = 0, \hfill \\ B_{3}^{\prime } ( - \frac{1}{k}) = 1, \, B_{3}^{\prime } (\frac{1}{k}) = - 1, \hfill \\ \end{gathered} \\ { \, B_{3}^{\prime \prime } ( - \frac{1}{k}) = 0, \, B_{3}^{\prime \prime } (\frac{1}{k}) = 0.} \\ \end{array} } \right. $$
(4)

Hence, \(B_{3} (x)\) is second-order smooth.

(iii) Let \(B_{{P_{0} P_{1} ...P_{n - 1} }}\) denote the Bézier curve determined by points \(P_{0} ,P_{1} ,...,P_{n - 1}\). Based on

$$ B(t) = B_{{P_{0} \cdots P_{n - 1} }} (t) = (1 - t)B_{{P_{0} \cdots P_{n - 2} }} (t) + tB_{{P_{1} \cdots P_{n - 1} }} (t), $$
(5)

according to mathematical induction, it can be proved that \(B_{n} (x)\) is \((n - 1)\)-order smooth.

3.2 Bézier smooth semi-supervised support vector for classification

From (2), the last term \(C^{*} \sum\nolimits_{i = l + 1}^{l + u} {L(\left| {w^{\rm T} x_{i} + b} \right|)}\) is non-smooth and difficult to solve [4], making formula (2) a mixed-integer quadratic program that is hard to solve. Replacing this term with the smooth function \(y = B_{n} (x),\) a new class of Bézier smooth semi-supervised support vector machines (BS4VMs) is derived, as described in formula (6)

$$ \mathop {\min }\limits_{w,b} \varphi (w,b) = \min \frac{1}{2}\left\| w \right\|^{2} + C\sum\limits_{i = 1}^{l} {L^{2} (y_{i} (w^{\rm T} x_{i} + b)) + } C^{*} \sum\limits_{i = l + 1}^{l + u} {B_{n} (w^{\rm T} x_{i} + b)} . $$
(6)

In this paper, without loss of generality, the 4-order Bézier interpolation function \(y = B_{4} (x)\) is taken into consideration. The higher the order of the Bézier function, the better the approximation. The approximation comparison of different smooth models can be seen in Fig. 2.
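
As a minimal illustration of (6), the sketch below evaluates the smoothed objective with the closed-form quadratic surrogate \(B_{2}\) on the unlabeled term (the paper adopts \(B_{4}\); \(B_{2}\) is used here only to keep the example short). The data, parameter values, and function names are assumptions for demonstration.

```python
import numpy as np

def b2(x, k=1.0):
    """Quadratic Bézier surrogate B_2(x) = (1 - k^2 x^2) / (2k) on |x| <= 1/k, else 0."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) <= 1.0 / k, (1.0 - (k * x) ** 2) / (2.0 * k), 0.0)

def bs4vm_objective(w, b, X_lab, y_lab, X_unlab, C=1.0, C_star=1.0, k=1.0):
    """Smoothed BS4VM objective of Eq. (6): squared hinge on labeled points and a
    Bézier surrogate (here B_2) on the unlabeled points."""
    margins_lab = y_lab * (X_lab @ w + b)
    hinge_sq = np.maximum(0.0, 1.0 - margins_lab) ** 2
    smooth_unlab = b2(X_unlab @ w + b, k)
    return 0.5 * np.dot(w, w) + C * hinge_sq.sum() + C_star * smooth_unlab.sum()

# since the objective is now smooth, it can be handed to any gradient-based optimizer
rng = np.random.default_rng(1)
X_lab, y_lab = rng.normal(size=(6, 4)), np.array([1, 1, -1, -1, 1, -1])
X_unlab = rng.normal(size=(10, 4))
print(bs4vm_objective(rng.normal(size=4), 0.0, X_lab, y_lab, X_unlab))
```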

Fig. 2 Approximation comparison among the proposed models and the Bézier model with \(k = 1\)

From Fig. 2, one can find that (1) the 4-order Bézier function performs best among the 3-order Bézier function, the exponential function, the 10-order polynomial, the cubic spline function, and the quintic spline function in approximating the hinge loss function; (2) the 3-order Bézier function performs almost the same as the 10-order polynomial, while its calculation complexity is much lower.

4 The nonlinear kernel for BS4VM

For the nonlinear case, the kernel function \(k(x^{i} ,x^{j} ) = \phi (x^{i} )^{\rm T} \phi (x^{j} )\) can be applied to map the original data into a high-dimensional Hilbert space. After this transformation, the data become linearly separable in the feature space. Let \(\phi :R^{m} \to R^{d} \left( {d > m} \right)\) be the mapping function of formula (1). The nonlinear kernel-based S3VM can be written as

$$ \begin{gathered} \, J(w){ = }\min \frac{1}{2}\left\| w \right\|^{2} + C\sum\limits_{i = 1}^{l} {\xi_{i} } + C^{*} \sum\limits_{j = l + 1}^{l + u} {\xi_{j} } \hfill \\ s.t. \, y_{i} (w^{\rm T} \phi (x^{i} ) + b) \ge 1 - \xi_{i} ,i = 1,...,l \hfill \\ \, \left| {w^{\rm T} \phi (x^{j} ) + b} \right| \ge 1 - \xi_{j} ,j = l + 1,...,l + u. \hfill \\ \, \xi = \{ \xi_{1} ,\xi_{2} ,...,\xi_{l + u} \} \ge 0 \hfill \\ \end{gathered} $$
(7)

In this paper, the Gaussian kernel \(k(x^{i} ,x^{j} ) = \exp ( - \left\| {x^{i} - x^{j} } \right\|_{2}^{2} /2\sigma^{2} )\) is adopted, and the kernel matrix \(K = [k(x^{i} ,x^{j} )]\) is positive semi-definite [17]. For formula (2), the variable \(w\) is expanded as \(w = \sum\nolimits_{i = 1}^{l + u} {u_{i} } \phi (x^{i} )\), in which \(u \in R^{l + u}\). The nonlinear S3VM is then achieved.

$$ \mathop {\min }\limits_{u,b} \varphi (u,b) = \min \frac{1}{2}\left\| u \right\|^{2} + C\sum\limits_{i = 1}^{l} {L^{2} (y_{i} (\sum\limits_{j = 1}^{l + u} {k(x_{i} ,x_{j} )u_{j} } + b)) + } C^{*} \sum\limits_{i = l + 1}^{l + u} {L(\left| {\sum\limits_{j = 1}^{l + u} {k(x_{i} ,x_{j} )u_{j} } + b} \right|)} . $$
(8)

Applying the n-order Bézier smooth function, the nonlinear BS4VM model with kernel function is offered.

$$ \mathop {\min }\limits_{u,b} \varphi (u,b) = \min \frac{1}{2}\left\| u \right\|^{2} + C\sum\limits_{i = 1}^{l} {L^{2} (y_{i} (\sum\limits_{j = 1}^{l + u} {k(x_{i} ,x_{j} )u_{j} } + b)) + } C^{*} \sum\limits_{i = l + 1}^{l + u} {B_{n} (\sum\limits_{j = 1}^{l + u} {k(x_{i} ,x_{j} )u_{j} } + b)} . $$
(9)

The objective function (9) is \(n - 1\)-order differentiable for any arbitrary kernel.
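
A hedged sketch of the kernelized objective in the spirit of (9) is given below, with the Gaussian kernel and, again, the quadratic surrogate \(B_{2}\) standing in for \(B_{n}\) on the unlabeled term; the expansion over all \(l + u\) training points and all names and values are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(X1, X2, sigma=1.0):
    """k(x^i, x^j) = exp(-||x^i - x^j||^2 / (2 sigma^2))."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def b2(x, k=1.0):
    """Quadratic Bézier surrogate used as the smooth unlabeled loss."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) <= 1.0 / k, (1.0 - (k * x) ** 2) / (2.0 * k), 0.0)

def kernel_bs4vm_objective(u, b, K, y_lab, l, C=1.0, C_star=1.0, k=1.0):
    """Kernelized BS4VM objective in the spirit of Eq. (9): K is the (l+u) x (l+u)
    kernel matrix over all training points, u the coefficient vector, and the first
    l rows of K correspond to the labeled examples."""
    f = K @ u + b                      # decision values sum_j k(x_i, x_j) u_j + b
    margins_lab = y_lab * f[:l]
    hinge_sq = np.maximum(0.0, 1.0 - margins_lab) ** 2
    smooth_unlab = b2(f[l:], k)
    return 0.5 * np.dot(u, u) + C * hinge_sq.sum() + C_star * smooth_unlab.sum()

rng = np.random.default_rng(2)
X, l = rng.normal(size=(12, 3)), 5
y_lab = np.array([1, -1, 1, -1, 1])
K = gaussian_kernel(X, X)
print(kernel_bs4vm_objective(rng.normal(size=12), 0.0, K, y_lab, l))
```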

5 One fast quasi-Newton method for solving BS4VM

In this section, the sub-LBFGS algorithm is employed to solve the semi-supervised problem (1) [2, 24, 25]. Differentiating (2) by means of the subgradient, the following is obtained:

$$ \partial J(w) = w + C\sum\limits_{i = 1}^{l} {\beta_{i} y_{i} x_{i} + } C^{*} \sum\limits_{i = l + 1}^{l + n} {\beta_{i} y_{i} x_{i} ,} $$
(10)

where \(\beta_{i} : = \begin{cases} 1 & {\text{if }} i \in E,\;E: = \{ i:1 - y_{i} w^{\rm T} x_{i} > 0\} , \\ \psi ,\;\psi \in (0,1) & {\text{if }} i \in M,\;M: = \{ i:1 - y_{i} w^{\rm T} x_{i} = 0\} , \\ 0 & {\text{if }} i \in W,\;W: = \{ i:1 - y_{i} w^{\rm T} x_{i} < 0\} , \end{cases}\) and \(E,M{\text{ and }}W\) denote the sets of points that are in error, on the margin, and well classified, respectively. For a given direction \(p\), it is required to find a subgradient \(g\). Based on formula (10), Eq. (11) is obtained:

$$ \begin{gathered} \mathop {\sup }\limits_{{g \in \partial J(w_{t} )}} g^{\rm T} p = \mathop {\sup }\limits_{{\beta_{i} ,i \in M_{t} }} (w + C\sum\limits_{{i \in M_{t} }}^{{}} {\beta_{i} y_{i} x_{i} + } C^{*} \sum\limits_{{i \in M_{t} }}^{{}} {\beta_{i} y_{i} x_{i} } )^{\rm T} p \hfill \\ { = }w^{\rm T} p + C\sum\limits_{{i \in M_{t} }}^{{}} {\mathop {\sup }\limits_{{\beta_{i} \in [0,1]}} \beta_{i} y_{i} x_{i}^{\rm T} p + C^{*} \sum\limits_{{i \in M_{t} }}^{{}} {\mathop {\sup }\limits_{{\beta_{i} \in [0,1]}} \beta_{i} y_{i} x_{i}^{\rm T} p.} } \hfill \\ \end{gathered} $$
(11)
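
The index sets \(E,M,W\) used in (10) and (11) can be computed directly from the current margins. The following small sketch illustrates this partition (the bias \(b\) is included in the margin, and the data and names are assumptions):

```python
import numpy as np

def partition_points(w, b, X, y, tol=1e-9):
    """Split training points into the sets E (in error), M (on the margin) and
    W (well classified) according to the sign of 1 - y_i (w^T x_i + b)."""
    r = 1.0 - y * (X @ w + b)
    E = np.where(r > tol)[0]
    M = np.where(np.abs(r) <= tol)[0]
    W = np.where(r < -tol)[0]
    return E, M, W

rng = np.random.default_rng(4)
X, y = rng.normal(size=(10, 3)), rng.choice([-1, 1], size=10)
print(partition_points(rng.normal(size=3), 0.0, X, y))
```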

Now the S3VM algorithm with sub-LBFGS optimization solving procedure can be offered (Algorithm 1).

In step 3 of Algorithm 1, a classifier is obtained by first running the BS4VM on the labeled examples alone. Steps 5–17 show the loop iteration process for solving the objective programming. Step 9 identifies pairs of unlabeled examples with temporary positive and negative labels such that switching these labels would decrease the value of the objective function, as sketched below.
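
The sketch below mirrors that label-switching idea in simplified form: given a callable that returns the objective value for a candidate labelling of the unlabeled points, pairs of opposite temporary labels are switched whenever the swap lowers the objective. It is an illustration of the described procedure, not the paper's Algorithm 1, and the toy objective in the demo is an assumption.

```python
import numpy as np

def switching_loop(objective, y_temp0, max_iter=100):
    """Pairwise label switching: starting from temporary labels y_temp0 for the
    unlabeled points, switch a (+1, -1) pair whenever the swap lowers objective(y)."""
    y_temp = y_temp0.copy()
    best = objective(y_temp)
    for _ in range(max_iter):
        improved = False
        pos = np.where(y_temp == 1)[0]
        neg = np.where(y_temp == -1)[0]
        for i in pos:
            for j in neg:
                y_try = y_temp.copy()
                y_try[i], y_try[j] = -1, 1          # switch one (+, -) pair
                val = objective(y_try)
                if val < best:                       # keep the swap if it helps
                    y_temp, best, improved = y_try, val, True
                    break                            # recompute index sets after a swap
            if improved:
                break
        if not improved:
            break
    return y_temp, best

# toy demo: the "objective" simply counts disagreements with a hidden labelling
hidden = np.array([1, 1, -1, -1, 1, -1])
obj = lambda y: np.sum(y != hidden)
y0 = np.array([1, -1, 1, -1, -1, 1])
print(switching_loop(obj, y0))
```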

6 Convergence analysis of the Bézier function and BS4VM

This section shows the approximation precision of the Bézier function with respect to the hinge loss function and the convergence of the BS4VM. In addition, the convergence conclusion also holds for the nonlinear BS4VM.

6.1 Approximation accuracy analysis of Bézier function

Theorem 2

Let \(x \in R,\) \(k > 0,\) \(L(x)\) stand for the hinge loss function, and \(B_{4} (x,k)\) be the Bézier function with five interpolation points. Then the following results hold:

$$ 0 \le B_{4} (x,k) \le L(\left| x \right|,k) $$
(12)
$$ 0 \le L^{2} (\left| x \right|,k) - B_{4}^{2} (x,k) \le \frac{15}{{64k^{2} }} $$

Proof

(i) It is obvious that \(L(\left| x \right|,k) - B_{4} (x,k){ = }0\) holds with \(\left| x \right| > \frac{1}{k}.\) For \(x \in [ - \frac{1}{k},0),\) \(L(\left| x \right|,k)\) and \(B_{4} (x,k)\) are monotonically increasing, and \(L(\left| x \right|,k) - B_{4} (x,k) \ge\) \(L(\left| {\frac{ - 1}{k}} \right|,k) - B_{4} (\frac{ - 1}{k},k){ = }0\) is easy to obtain. For \(x \in \left[ {0,\frac{1}{k}} \right],\) \(L(\left| x \right|,k)\) and \(B_{4} (x,k)\) are monotonically decreasing, and there will be \(L(\left| x \right|,k) - B_{4} (x,k) \ge L(\left| \frac{1}{k} \right|,k) - B_{4} (\frac{1}{k},k){ = }0\). So \(0 \le B_{4} (x,k) \le L(\left| x \right|,k)\) is achieved.

(ii) \(L^{2} (\left| x \right|,k) - B_{4}^{2} (x,k){ = }0\) holds with \(\left| x \right| > \frac{1}{k}.\) For \(x \in [ - \frac{1}{k},0),\) from (i), one can find \(L(\left| x \right|,k)\) and \(B_{4} (x,k)\) are monotonically increasing; therefore, \(0 \le L(\left| x \right|,k){ - }B_{4} (x,k) \le L(0,k){ - }B_{4} (0,k){ = }\frac{1}{8k}\) is established. As is known, \(L(\left| x \right|,k){ + }B_{4} (x,k) \le\) \(L(0,k) + B_{4} (0,k){ = }\frac{15}{{8k}},\)\(L^{2} (\left| x \right|,k) - B_{4}^{2} (x,k) \le \frac{1}{8k} \cdot \frac{15}{{8k}}{ = }\frac{15}{{64k^{2} }}\) will be derived. In short, \(0 \le L^{2} (\left| x \right|,k) - B_{4}^{2} (x,k) \le \frac{15}{{64k^{2} }}\) is proved.
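
A numerical spot-check of Theorem 2 is sketched below. The term \(L(\left| x \right|,k)\) is read as \(\max \{ 0,\frac{1}{k} - \left| x \right|\}\), which is what the values \(L(0,k) = \frac{1}{k}\) and \(L( \pm \frac{1}{k},k) = 0\) used in the proof imply, and \(B_{4}\) is built from the control-point pattern assumed earlier; both readings are assumptions of the sketch.

```python
import numpy as np

def b4(x, k=1.0, tol=1e-12):
    """4-order Bézier surrogate with control points (-1/k, 0), three copies of
    (0, 1/k), and (1/k, 0): x(t) = (t^4 - (1-t)^4)/k, y(t) = (1 - t^4 - (1-t)^4)/k,
    with the parameter recovered by bisection (x(t) is strictly increasing)."""
    if abs(x) >= 1.0 / k:
        return 0.0
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if (mid ** 4 - (1.0 - mid) ** 4) / k < x:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)
    return (1.0 - t ** 4 - (1.0 - t) ** 4) / k

def L(x, k=1.0):
    """Scaled hinge-type term with L(0, k) = 1/k and L(+-1/k, k) = 0."""
    return max(0.0, 1.0 / k - abs(x))

for k in (0.5, 1.0, 2.0):
    xs = np.linspace(-1.5 / k, 1.5 / k, 2001)
    gap = np.array([L(x, k) ** 2 - b4(x, k) ** 2 for x in xs])
    assert np.all(gap >= -1e-9)                      # 0 <= L^2 - B_4^2
    print(k, gap.max(), 15.0 / (64.0 * k ** 2))      # max gap vs the bound 15/(64 k^2)
```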

6.2 Convergence analysis of the BS4VM

Theorem 3

Let \(A \in R^{m \times n} ,b \in R^{m \times 1} ,\) and define two real functions \(g(x)\) and \(f(x,k)\) as follows:

$$ \begin{gathered} g(x) = \frac{1}{2}\left\| x \right\|_{2}^{2} + \frac{1}{2}\left\| {L(\left| {Ax + b} \right|)} \right\|_{2}^{2} + \frac{1}{2}\left\| {L(\left| {Ax + b} \right|)} \right\|, \hfill \\ f(x,k) = \frac{1}{2}\left\| x \right\|_{2}^{2} + \frac{1}{2}\left\| {B_{{4}} (Ax + b,k)} \right\|_{2}^{2} + \frac{1}{2}\left\| {B_{{4}} (Ax + b,k)} \right\|. \hfill \\ \end{gathered} $$
(13)

The following results can be achieved:

(1) \(\forall k > 0\), there will be \(\left\| {x_{k}^{*} - x^{*} } \right\| \le \frac{{{15}}}{{128k^{2} }}\)

(2) \(\mathop {\lim }\limits_{k \to \infty } \left\| {x_{k}^{*} - x^{*} } \right\| = 0\)

Proof

(i) Applying the first-order optimization condition and convex property of \(g(x)\) and \(f(x,k)\), formula (14) is attained,

$$ g(x_{k}^{*} ) - g(x^{*} ) \ge \nabla g(x^{*} )(x_{k}^{*} - x^{*} ) + \frac{1}{2}\left\| {x_{k}^{*} - x^{*} } \right\|_{2}^{2} = \frac{1}{2}\left\| {x_{k}^{*} - x^{*} } \right\|_{2}^{2} , $$
$$ f(x_{{}}^{*} ,k) - f(x_{k}^{*} ,k) \ge \nabla f(x^{*} )(x^{*} - x_{k}^{*} ) + \frac{1}{2}\left\| {x_{k}^{*} - x^{*} } \right\|_{2}^{2} = \frac{1}{2}\left\| {x_{k}^{*} - x^{*} } \right\|_{2}^{2} {. } $$
(14)

Based on formula (13) and the property \(B_{4} (x,k) \le L(\left| x \right|,k)\) from Theorem 2, formula (15) is acquired,

$$ \begin{gathered} \left\| {x_{k}^{*} - x^{*} } \right\| \le g(x_{k}^{*} ) - g(x^{*} ) + f(x_{{}}^{*} ,k) - f(x_{k}^{*} ,k) \hfill \\ { = (}f(x_{{}}^{*} ,k) - g(x^{*} ){) - (}f(x_{k}^{*} ,k) - g(x_{k}^{*} )) \hfill \\ \, \le g(x^{*} ) - f(x_{{}}^{*} ,k) \hfill \\ \, = \frac{1}{2}\left\| {L(Ax + b)} \right\|_{2}^{2} - \frac{1}{2}\left\| {B_{4} (Ax + b,k)} \right\|_{2}^{2} . \hfill \\ \end{gathered} $$
(15)

According to Theorem 2, for \(x \in [ - \frac{1}{k},\frac{1}{k}]\), \(L_{{}}^{2} (\left| x \right|,k) - B_{4}^{2} (x,k) \le L_{{}}^{2} (0,k) - B_{4}^{2} (0,k){ = }\frac{{{15}}}{{64k^{2} }}\). So \(\left\| {x_{k}^{*} - x^{*} } \right\| \le \frac{1}{2}[L_{{}}^{2} (\left| x \right|,k) - B_{4}^{2} (x,k)] \le \frac{{{15}}}{{128k^{2} }}\) holds.

(ii) As \(\left\| {x_{k}^{*} - x^{*} } \right\| \le \frac{15}{{128k^{2} }}\), it is easy to draw the conclusion of \(\mathop {\lim }\limits_{k \to \infty } \left\| {x_{k}^{*} - x^{*} } \right\| = 0\). Theorem 3 is proved.

7 The experiments and comparisons

This section evaluates the performance, effectiveness and complexity of the proposed BS4VMs from two dimensions. The longitudinal dimension is the comparison of BS4VMs with three other smooth models: LDS4VM (S3VM with low density separation) [3], CS4VM (S3VM with cubic spline function) [15], and QS4VM (S3VM with quintic spline function) [16]. The horizontal dimension is the comparison of BS4VMs of different orders. This part lists three kinds of BS4VMs: BS4VM-I (S3VM with 2-order Bézier function), BS4VM-II (S3VM with 3-order Bézier function), and BS4VM-III (S3VM with 4-order Bézier function). Experiments are carried out on four kinds of datasets: artificial datasets, UCI datasets, the USPS dataset, and the large-scale NDC dataset. These four kinds of datasets differ significantly. Subsection 7.1 shows the experiment on a small artificial dataset named "checkerboard," produced by uniformly distributing points over two-dimensional regions; it is a nonlinearly separable dataset. In subsection 7.2, the UCI datasets are real-world datasets generated by statistical departments, electronic sensors, and reports. Some of them are multi-class and irregular, so preprocessing is required, and they have different data sizes. In subsection 7.3, the handwritten symbol dataset consists of 16 × 16 grayscale images of handwritten digits from '0' to '9'. These data come from the US Postal Service and belong to real-world digital pattern recognition. The last kind of dataset is NDC, namely normally distributed clusters, generated by the NDC algorithm. The algorithm generates a series of random centers for multivariate normal distributions, randomly generates a fraction for each center and a separating plane, chooses classes for the centers based on the plane, and then randomly draws points from the distributions. The size can be changed by the experimenter, so the NDC data are a good choice for testing on large-scale datasets.

Because the complexity of the 10-order polynomial function in [13] is too high and its calculation time exceeds the acceptable range, this section omits the algorithm of [13] from the comparison. As the parameters \(C\) and \(C^{*}\) are not sensitive to the classification accuracy, \(C = C^{*}\) is set, varying from \(10^{-2}\) to \(10^{2}\). All classifiers are implemented on a PC running 64-bit Windows 10 with an Intel i7 processor (1.6 GHz) and 16 GB RAM. The models are coded in MATLAB R2009a.

Experiments are set up according to the following rules: the ratio of labeled points varies from 5 to 65%, and the rest are unlabeled points, similar to the unlabeled data ratio evolving from 20 to 80% in [26]. The labeled ratio is set according to real-world missing-label scenarios. A 5% labeled ratio means the majority of data labels are missing, a demanding condition for detecting a good classifier. On the other hand, if the labeled ratio is more than 70%, so many labels are available that the gap between semi-supervised SVM and fully supervised SVM is quite small. Therefore, the labeled ratio is set from 5 to 65% with an interval of 20%. The labeled data are used for training the LDS4VM, CS4VM, QS4VM, and BS4VM, which then predict the unlabeled points. Before simulation, all databases are normalized and the two classes are labeled − 1 and + 1. Each experiment is carried out with tenfold cross-validation.
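
A small sketch of the data preparation described above is given below: min–max normalization, recoding the two classes to − 1/ + 1, and splitting each dataset into a labeled part at a given ratio and an unlabeled remainder. The random split, the synthetic data, and the function names are illustrative assumptions.

```python
import numpy as np

def prepare_split(X, y, labeled_ratio=0.25, seed=0):
    """Normalize features to [0, 1], recode the two classes to -1/+1 and split the
    data into a labeled subset (given ratio) and an unlabeled remainder."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    Xn = (X - X.min(0)) / (X.max(0) - X.min(0) + 1e-12)    # min-max normalization
    classes = np.unique(y)
    y_pm = np.where(y == classes[0], -1, 1)                # two classes -> -1 / +1
    idx = rng.permutation(len(y))
    n_lab = max(1, int(labeled_ratio * len(y)))
    lab, unlab = idx[:n_lab], idx[n_lab:]
    # labels of the unlabeled part are kept only for evaluating the predictions
    return Xn[lab], y_pm[lab], Xn[unlab], y_pm[unlab]

# example with a synthetic two-class dataset at a 25% labeled ratio
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 1, (50, 5)), rng.normal(2, 1, (50, 5))])
y = np.array([0] * 50 + [1] * 50)
X_lab, y_lab, X_unlab, y_unlab = prepare_split(X, y, 0.25)
print(X_lab.shape, X_unlab.shape)   # (25, 5) (75, 5)
```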

7.1 Experiment based on artificial dataset

The first experiment is designed to demonstrate the effectiveness of the BS4VM on the artificial nonlinear "tried and true" checkerboard dataset [27]. The checkerboard dataset is generated by uniformly distributing points over two-dimensional regions and labeling the two classes "White" and "Black." Each dimension has 100 points, so the checkerboard dataset has 10,000 samples for training and testing the algorithms, as Fig. 3 shows. The comparison result can be seen in Table 2.

Fig. 3 Figure of the checkerboard dataset

Table 2 Test accuracy on the checkerboard dataset with different labeled ratios (the bold part is the best result)

Table 2 demonstrates that (1) with the increase in the labeled ratio, the classification accuracy climbs on the whole; (2) the higher the order of the smoothing polynomial, the better the classification accuracy; (3) the checkerboard dataset is not suitable for too few labeled samples, as the result with a labeled ratio of 5% is not satisfactory. Lastly, the comparison in Table 2 shows that the BS4VM achieves the best classification accuracy.

7.2 Results on UCI datasets

In this subsection, eight real-world UCI datasets are chosen to test the four classification algorithms. This collection of databases was created in 1987 and has been widely used by the machine learning community for empirical analysis. It provides various datasets from many areas of real life, such as disease diagnosis, manufacturing, business, and so on. The calculation results are given in Table 3.

Table 3 Tenfold cross-validation results of the average correction with different ratios of labeled points on eight public datasets for the four algorithms (the bold part is the best result)

Table 3 illustrates the detailed comparisons of the proposed model with the other three models on eight different datasets. From Table 3, one can find that with the increase in the labeled ratio, all the algorithms show better classification accuracy. For the Clean dataset with the labeled ratio varying from 25 to 65%, the experimental results of BS4VM (accuracy 68.35%, 71.28%, 75.45%) outperform the other three algorithms, LDS4VM (66.95%, 65.94%, 72.90%), CS4VM (66.11%, 66.13%, 70.66%), and QS4VM (66.53%, 67.94%, 71.41%). This conclusion also holds for the Lympho, Bupa, Tumor, WDBC, and Adult datasets in most scenarios. For the Balance and German datasets, the advantage in classification accuracy fluctuates, and BS4VM performs a little better than the other three methods.

To describe the dynamic behavior of the test accuracy for each dataset under various labeled ratios, Fig. 4 is given. It presents the overall trend of these algorithms: all the curves tend to climb as the labeled ratio increases. Taking Data (4) for example, the red line stands for the proposed BS4VM method, the blue and black lines denote QS4VM and CS4VM, and the purple line denotes LDS4VM. For the labeled ratio of 5%, the accuracy of CS4VM is better than that of BS4VM. But as the ratio rises, the red line is always above the other three lines, indicating that BS4VM performs best at higher labeled ratios.

Fig. 4 The accuracy comparison of LDS4VM, CS4VM, QS4VM, and BS4VM on eight publicly available datasets, with 5%, 25%, 45%, and 65% labeled data: (1) Lympho, (2) Bupa, (3) Tumor, (4) Clean, (5) Balance, (6) German, (7) WDBC, (8) Adult

To further analyze the statistical accuracy more clearly, the average ranks of all the classifiers are computed and listed in Table 4 and Fig. 5. Table 4 indicates the average ranks on the eight datasets, calculated from the average value of each algorithm under different labeled ratios. The smaller the rank value, the higher the simulation accuracy. From the last row of Table 4, one can notice that BS4VM ranks in first place for the eight datasets, whereas the others take the second, third and fourth places.

Table 4 Accuracy average ranks of LDS4VM, CS4VM, QS4VM, BS4VM with linear kernel
Fig. 5 Correction average ranks of LDS4VM, CS4VM, QS4VM, BS4VM in each dataset with different labeled ratios

To verify the advantage of the proposed BS4VM algorithm, the Friedman statistical test is employed. The Friedman statistic is distributed according to \(\chi_{F}^{2}\) with \(k - 1\) degrees of freedom, where \(k\) is the number of algorithms and \(N\) is the number of datasets.

For the above experiment on the UCI datasets, under the null hypothesis that all the algorithms are equivalent, the Friedman statistic can be calculated as [28]

$$ \chi_{F}^{2} = \frac{12N}{{k(k + 1)}}\left[\sum\limits_{i = 1}^{4} {R_{i}^{2} } - \frac{{k(k + 1)^{2} }}{4}\right] = \frac{12 \times 8}{{4 \times 5}}\left[3.2188^{2} + 2.7031^{2} + 2.8281^{2} + 1.25^{2} - \frac{{4 \times 5^{2} }}{4}\right] = 10.6954 $$
$$ F_{F} = \frac{{(N - 1)\chi_{F}^{2} }}{{N(k - 1) - \chi_{F}^{2} }} = \frac{7 \times 10.6954}{{8 \times 3 - 10.6954}} = 5.6264 $$

For four algorithms and eight datasets, \(F_{F}\) is distributed with \((k - 1) = 3\) and \((k - 1)(N - 1) = 21\) degrees of freedom. The critical value of \(F(3,21)\) at significance level \(\alpha = 0.05\) is 3.072. Obviously, \(F_{F} = 5.6264 > F(3,21) = 3.072\), so the null hypothesis is rejected, which verifies that the four algorithms differ significantly.

After the null hypothesis is rejected, the Nemenyi test can proceed, in which all classifiers are compared to each other [28]. The performance of two classifiers differs significantly if the corresponding average ranks differ by at least the critical difference \(CD = q_{\alpha } \sqrt {\frac{k(k + 1)}{{6N}}}\). For the UCI experiment, \(CD = 2.291\sqrt {\frac{4 \times 5}{{6 \times 8}}} = 1.4788\) at \(\alpha = 0.1.\) As the average rank difference between LDS4VM and BS4VM (3.2188 − 1.25 = 1.9688) is larger than the critical difference 1.4788, the performance of BS4VM is significantly better than that of LDS4VM. Similarly, the performance of BS4VM is clearly superior to that of QS4VM (2.8281 − 1.25 = 1.5781 > 1.4788). Since 2.7031 − 1.25 = 1.4531 < 1.4788, the Nemenyi test cannot detect a significant difference between CS4VM and BS4VM.
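
The Friedman and Nemenyi quantities above can be reproduced from the average ranks alone; a short sketch is given below (the value \(q_{\alpha } = 2.291\) for four classifiers at \(\alpha = 0.1\) is taken from the text).

```python
import numpy as np

def friedman_and_nemenyi(avg_ranks, N, q_alpha=2.291):
    """Friedman chi-square and F statistics from average ranks over N datasets,
    plus the Nemenyi critical difference CD = q_alpha * sqrt(k(k+1) / (6N))."""
    R = np.asarray(avg_ranks, dtype=float)
    k = len(R)
    chi2 = 12.0 * N / (k * (k + 1)) * (np.sum(R ** 2) - k * (k + 1) ** 2 / 4.0)
    F = (N - 1) * chi2 / (N * (k - 1) - chi2)
    CD = q_alpha * np.sqrt(k * (k + 1) / (6.0 * N))
    return chi2, F, CD

# average ranks of LDS4VM, CS4VM, QS4VM, BS4VM on the eight UCI datasets (Table 4)
print(friedman_and_nemenyi([3.2188, 2.7031, 2.8281, 1.25], N=8))
# -> roughly (10.695, 5.626, 1.4788), matching the values reported above
```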

Figure 5 visually presents the accuracy ranks of the experimental results with different labeled ratios. One can find that the advantage of BS4VM varies, but from a statistical point of view the BS4VM performs best, as Table 4 shows. The proposed algorithm shows satisfactory performance in Fig. 5b–d for most cases. This reminds us that, when comparing machine learning algorithms, statistical results over a quantity of datasets are more precise and credible than a single calculation on one dataset.

7.3 Results on handwritten symbol recognition

In this section, the USPS handwritten dataset is investigated to show the impact of the number of labeled data on the classification accuracy. The handwritten database consists of grayscale images of handwritten digits from '0' to '9', as shown in Fig. 6.

Fig. 6 Ten number symbols of the USPS database

The comparisons of four pairwise digit tasks, '0' versus '8', '2' versus '4', '1' versus '7', and '3' versus '6', are given, respectively. The calculated accuracies and the dynamic process can be seen in Table 5 and Fig. 7. From Table 5 and Fig. 7, the classification accuracies of the pairs '0' versus '8' and '1' versus '7' reach more than 80%, even almost 99%, while the classification accuracies of the pairs '2' versus '4' and '3' versus '6' are less than 80%, even below 52%. Thus, the generalization ability of the S3VM varies, and a suitable dataset should be considered if one plans to carry out the identification process.

Table 5 Tenfold cross-validation results of the average correction and the number of labeled points on USPS database for four algorithms (the bold part is the best result)
Fig. 7 The average test accuracy of LDS4VM, CS4VM, QS4VM, BS4VM on the USPS dataset with various labeled ratios

Table 6 and Fig. 8 present the accuracy ranks of each dataset with various labeled percentages. Table 6 shows that the BS4VM ranks in first place, while the other three algorithms perform similarly. Figure 8 shows the accuracy rank of each calculation. Taking Fig. 8d for example, when the labeled data exceed 50%, the proposed learning algorithm is well trained and shows satisfactory precision.

Table 6 Average ranks of the four algorithms with linear kernel on USPS accuracy values
Fig. 8 Correction average ranks of LDS4VM, CS4VM, QS4VM, BS4VM in each dataset with different labeled ratios

The Friedman statistical method can also be applied on USPS dataset to compare these algorithms from a quantitative perspective. For the four algorithms and four datasets,

$$ \chi_{F}^{2} = \frac{12N}{{k(k + 1)}}\left[\sum\limits_{i = 1}^{4} {R_{i}^{2} } - \frac{{k(k + 1)^{2} }}{4}\right] = \frac{12 \times 4}{{4 \times 5}}\left[2.65625^{2} + 2.8125^{2} + 2.75^{2} + 1.78125^{2} - \frac{{4 \times 5^{2} }}{4}\right] = 10.1203 $$
$$ F_{F} = \frac{{(N - 1)\chi_{F}^{2} }}{{N(k - 1) - \chi_{F}^{2} }} = \frac{3 \times 10.1203}{{4 \times 3 - 10.1203}} = 16.1521 $$

\(F_{F}\) is distributed with \((k - 1) = 3\) and \((k - 1)(N - 1) = 9\) degrees of freedom. The critical value of \(F(3,9)\) at significance level \(\alpha = 0.05\) is 3.863. Obviously, \(F_{F} = 16.1521 > F(3,9) = 3.863\), so the null hypothesis is rejected and the four algorithms are shown to differ significantly. This means the generalization ability and robustness of the BS4VM are promising.

7.4 Results on large-scale NDC dataset for nonlinear Gaussian kernel

In the last subsection, to further verify which algorithm performs best in both accuracy and calculation time among the BS4VMs, experiments on the NDC dataset with the nonlinear Gaussian kernel are carried out. The NDC data are designed with a large number of attributes or a large number of samples to test the robustness of the new algorithms; as described above, they consist of normally distributed clusters produced by the NDC generator, and their size can be scaled as needed. Since large-scale datasets are commonly classified in the real world, the test accuracy and the calculation time should both be considered.

Table 7 and Fig. 9 show the performance of the three kinds of BS4VMs, namely BS4VM-I, BS4VM-II, and BS4VM-III, with different orders of the Bézier function. One can notice that (1) the BS4VMs classify the NDC datasets very well, and most of the results exceed 96%; (2) as the labeled ratio and the number of attributes climb from NDC1 to NDC5, the computing time increases quickly, whereas with the rise in the number of samples from NDC6 to NDC10, the calculation time does not go up dramatically; (3) because these three algorithms belong to the same kind of smoothing technique, the accuracy differences are quite small, but the accuracy of BS4VM-III ranks first in most cases. Meanwhile, the computing times of BS4VM-I and BS4VM-II rank in the top two on account of the higher complexity of BS4VM-III.

Table 7 The test correction and calculation time comparisons for Gaussian kernel (the bold part is the best result)
Fig. 9 The average test accuracy and calculation time of BS4VM-I, BS4VM-II, and BS4VM-III on ten NDC datasets with various labeled ratios

To clarify the comparison results, Table 8 lists the average ranks of BS4VM-I, BS4VM-II and BS4VM-III with the Gaussian kernel on accuracy and calculation time for NDC. From the statistics, the accuracy average rank of BS4VM-III is 1.8875, smaller than the other two, indicating that this method is superior. The calculation-time ranks of BS4VM-I and BS4VM-II are equal, revealing that their computing complexity is essentially the same even though BS4VM-II uses a higher-order Bézier function.

Table 8 Average ranks of BS4VM-I, BS4VM-II and BS4VM-III with Gaussian kernel on NDC correction and time values

For the purpose of verifying whether the performances of the three algorithms have significant difference, the Friedman statistical method is utilized. For this experiment with three methods and ten datasets, statistical results \(\chi_{F}^{2}\) and \(F_{F}\) will be

$$ \chi_{F}^{2} = \frac{12N}{{k(k + 1)}}\left[\sum\limits_{i = 1}^{3} {R_{i}^{2} } - \frac{{k(k + 1)^{2} }}{4}\right] = \frac{12 \times 10}{{3 \times 4}}\left[2.175^{2} + 1.95^{2} + 1.8875^{2} - \frac{{3 \times 4^{2} }}{4}\right] = 0.958 $$
$$ F_{F} = \frac{{(N - 1)\chi_{F}^{2} }}{{N(k - 1) - \chi_{F}^{2} }} = \frac{9 \times 0.958}{{10 \times 3 - 0.958}} = 0.4528 $$

The critical value of \(F(2,18)\) at significance level \(\alpha = 0.05\) is 3.555. Visibly, \(F_{F} = 0.4528 < F(2,18) = 3.555\), so it is verified quantitatively that these three algorithms have no significant differences. It is suggested that if high accuracy matters most, a higher-order BS4VM should have priority, whereas if calculation time weighs heavily, a lower-order BS4VM should be chosen.

For the goal of visual expression, the diversities of classification correction and calculation time of each dataset with the variety of labeled ratio, the histogram Figs. 10 and 11 are given.

Fig. 10 Correction average ranks of BS4VM-I, BS4VM-II and BS4VM-III in each dataset with different labeled ratios

Fig. 11 Time average ranks of BS4VM-I, BS4VM-II and BS4VM-III in each dataset with different labeled ratios

From Fig. 10c and d, the classification precision of BS4VM-III lies at the forefront when the labeled proportion is above 45%. However, this superior performance comes at the cost of more complex calculation, as Fig. 11c and d shows. From the ranks of calculation time, BS4VM-I shows the best performance in Fig. 11a, b and d, since a lower-order Bézier function entails less computational complexity.

8 Conclusion

Considering that the non-smooth term of semi-supervised support vector machines blocks the improvement in classification accuracy, a new class of Bézier functions is utilized to approximate the hinge loss function, and a novel kind of Bézier smooth semi-supervised support vector machines (BS4VMs) is constructed. The convergence analysis proves that the proposed model theoretically approaches the non-smooth objective function. As the n-order Bézier function is \(n - 1\)-order smooth and differentiable, fast algorithms can be used to solve the programming. In contrast to the LDS4VM, CS4VM, and QS4VM, experiments on artificial data, UCI data, the USPS handwritten database, and NDC datasets clearly show that the BS4VMs have better performance and efficiency than the exponential function, cubic spline function, and quintic spline function. Moreover, the proposed algorithms show good performance on large-scale datasets. Since the advantages of BS4VMs of different orders vary, attention should be paid to whether performance or efficiency has priority when applying them. For further research, feature selection and fuzzy membership should be good ways to improve the accuracy on different kinds of datasets. Bézier functions for semi-supervised SVM regression and their generalization performance will be explored as well.