
1 Support Vector Machine

Support vector machines (SVMs) [2, 3, 13, 14, 18] were developed by Vapnik et al. (1995) and are gaining popularity due to many attractive features. As a very powerful tool for data classification and regression, the SVM has been used in many fields, such as text classification [5], facial expression recognition [9], gene analysis [4] and many others [1, 6,7,8, 10,11,12, 17, 19,20,21,22]. Recently, it has been used for fault classification in a water level control system [15], and an SVM-based fault classifier has been used to diagnose faults in a water level control process [16].

Classification problems can be restricted to two-class problems without loss of generality. The goal of support vector classification (SVC) is to separate the two classes by a hyperplane that also works well on unseen examples. The method finds the optimal hyperplane that maximizes the margin between the two classes of data. A set of data is said to be optimally separated by a hyperplane if it is separated without error and the distance between the hyperplane and the closest data points is maximal. Support vector classification can thus be thought of as a process that uses the given data to find a decision plane guaranteeing good predictive performance on unseen data, and finding this decision plane amounts to solving a quadratic programming problem.

In this paper, we study the problems of the support vector machine and the generalized support vector machine. We give sufficient conditions for the existence of solutions of the generalized support vector machine problem and present various examples to support these results.

Throughout this paper, \(\mathbb {N}\), \(\mathbb {R}\), \(\mathbb {R}^{n}\) and \(\mathbb {R}_{+}^{n}\) denote the set of all natural numbers, the set of all real numbers, the set of all n-tuples of real numbers and the set of all n-tuples of nonnegative real numbers, respectively.

Also, we consider \(\left\| \cdot \right\| \) and \(<\cdot ,\cdot>\) as the Euclidean norm and the usual inner product on \(\mathbb {R}^{n}\), respectively, that is, \(<\mathbf {x},\mathbf {y}>=\mathbf {x}\cdot \mathbf {y} =x_{1}y_{1}+x_{2}y_{2}+\cdots +x_{n}y_{n}\) for all \(\mathbf {x}=\left( x_{1},x_{2},\ldots ,x_{n}\right) \), \(\mathbf {y}=\left( y_{1},y_{2},\ldots ,y_{n}\right) \) in \(\mathbb {R}^{n}.\) Furthermore, for any two vectors \(\mathbf {x},\mathbf {y}\in \mathbb {R}^{n}\), we say that \(\mathbf {x}\le \mathbf {y}\) if and only if \(x_{i}\le y_{i}\) for all \(i\in \{1,2,\ldots ,n\}\), where \(x_{i}\) and \(y_{i}\) are the components of \(\mathbf {x}\) and \(\mathbf {y}\), respectively.

1.1 Data Classification

In practice, complex real-world applications are often not linearly separable. Kernel representations offer an alternative solution by projecting the data into a higher dimensional feature space, which increases the computational power of the linear learning machine.

In order to learn linear or non-linear relations with a linear machine, a set of non-linear features is selected. This is equivalent to applying a fixed non-linear mapping \(\varPhi \) that transforms data in the input space X to data in the feature space \(\digamma \), in which the linear machine can be used. For this classification, both spaces X and \(\digamma \) need to be vector spaces, whose dimensions may or may not be the same. When the given data are linearly separable, we take \(\varPhi \) to be the identity operator. For binary classification of data, we consider the decision function \(f:\mathbb {R}^{n}\rightarrow \mathbb {R}\), where the input \(\mathbf {x}=(x_{1},\ldots ,x_{n})\) is assigned to the positive class if \(f(\mathbf {x})\ge 0\) and otherwise to the negative class. The decision function is defined as

$$\begin{aligned} f\left( \mathbf {x}\right) =<\mathbf {w},\varPhi \left( \mathbf {x}\right) >+ b. \end{aligned}$$
(1)

This means the non-linear machine is built in two steps: first, a fixed non-linear mapping transforms the data into a feature space; then, a linear machine is used to classify the data in that feature space.

In addition, the vector \(\mathbf {w}\) is a linear combination of the support vectors in the training data and can be written as

$$\begin{aligned} \mathbf {w}=\sum _{i}\alpha _{i}\varPhi \left( \mathbf {x}_{i}\right) , \end{aligned}$$
(2)

where each \(\alpha _{i}\) is the Lagrange multiplier associated with the support vector \(\varPhi \left( \mathbf {x}_{i}\right) \).

So the decision function can be rewritten as

$$\begin{aligned} f\left( \mathbf {x}\right) =\sigma \left( \sum _{i}\alpha _{i}(\varPhi (\mathbf {x} _{i})\cdot \varPhi \left( \mathbf {x}\right) )+b\right) , \end{aligned}$$
(3)

where \(\sigma \) is the sign function.

The kernel K has an associated feature mapping \(\varPhi \): it takes two inputs and returns their similarity in the feature space \(\digamma \), that is, \(K:X\times X\rightarrow \mathbb {R}\) is defined as

$$\begin{aligned} K(\mathbf {x}_{i},\mathbf {x})=\varPhi \left( \mathbf {x}_{i}\right) \cdot \varPhi \left( \mathbf {x}\right) . \end{aligned}$$
(4)

Thus, the decision function from (3) becomes

$$\begin{aligned} f\left( \mathbf {x}\right) =\sigma (\sum _{i}\alpha _{i}K(\mathbf {x}_{i}, \mathbf {x})+b). \end{aligned}$$
(5)
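For concreteness, the kernelized decision rule (5) can be sketched in a few lines of Python. This is only an illustration; the names decision_function, support_vectors, alphas and kernel are ours and do not refer to any particular library.

```python
import numpy as np

def decision_function(x, support_vectors, alphas, b, kernel):
    """Evaluate Eq. (5): sigma( sum_i alpha_i K(x_i, x) + b )."""
    s = sum(a * kernel(x_i, x) for a, x_i in zip(alphas, support_vectors))
    return np.sign(s + b)
```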

Some useful kernels for real-valued vectors are defined below (a short code sketch of these kernels follows the list):

  (I) Linear kernel

    $$\begin{aligned} K(\mathbf {x}_{i},\mathbf {x})=\mathbf {x}_{i}\cdot \mathbf {x}. \end{aligned}$$

  (II) Polynomial kernel (of degree p)

    $$\begin{aligned} K(\mathbf {x}_{i},\mathbf {x})=\left( \mathbf {x}_{i}\cdot \mathbf {x}\right) ^{p} \quad \text{ or } \quad \left( \mathbf {x}_{i}\cdot \mathbf {x}+1\right) ^{p}, \end{aligned}$$

    where p is a tunable parameter.

  (III) Radial Basis Function (RBF) kernel

    $$\begin{aligned} K(\mathbf {x}_{i},\mathbf {x})=\exp [-\gamma ||\mathbf {x}_{i}-\mathbf {x}||^{2}], \end{aligned}$$

    where \(\gamma \) is a hyperparameter (also called the kernel bandwidth). The RBF kernel corresponds to an infinite-dimensional feature space.

  (IV) Sigmoid kernel

    $$\begin{aligned} K(\mathbf {x}_{i},\mathbf {x})=\tanh \left( k\mathbf {x}_{i}\cdot \mathbf {x}+\theta \right) , \end{aligned}$$

    where k is a scalar and \(\theta \) is the displacement.

  (V) Inverse multi-quadratic kernel

    $$\begin{aligned} K(\mathbf {x}_{i},\mathbf {x})=\left( \left\| \mathbf {x}_{i}-\mathbf {x}\right\| ^{2}+\gamma ^{-2}\right) ^{-1/2}, \end{aligned}$$

    where \(\gamma \) is a hyperparameter (also called the kernel bandwidth).
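As a rough illustration of how these kernels might be coded, here is a minimal Python sketch; the function names and default parameter values are ours, chosen only for the example.

```python
import numpy as np

def linear_kernel(x_i, x):
    return float(np.dot(x_i, x))

def polynomial_kernel(x_i, x, p=2, c=1.0):
    # (x_i . x)^p when c = 0, or (x_i . x + 1)^p when c = 1
    return float((np.dot(x_i, x) + c) ** p)

def rbf_kernel(x_i, x, gamma=1.0):
    return float(np.exp(-gamma * np.sum((np.asarray(x_i) - np.asarray(x)) ** 2)))

def sigmoid_kernel(x_i, x, k=1.0, theta=0.0):
    return float(np.tanh(k * np.dot(x_i, x) + theta))

def inverse_multiquadratic_kernel(x_i, x, gamma=1.0):
    return float((np.sum((np.asarray(x_i) - np.asarray(x)) ** 2) + gamma ** -2) ** -0.5)
```

Any of these callables can be passed as the kernel argument of the decision-function sketch given after Eq. (5).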

Now, from (1), we define the functional margin of an example \(\left( \varPhi \left( \mathbf {x}_{i}\right) ,y_{i}\right) \) with respect to a hyperplane \(\left( \mathbf {w},b\right) \) to be the quantity

$$\begin{aligned} \gamma _{i}=y_{i}\left( \left\langle \mathbf {w},\varPhi \left( \mathbf {x} _{i}\right) \right\rangle +b\right) , \end{aligned}$$

where \(y_{i}\in \{-1,1\}.\) Note that \(\gamma _{i}>0\) implies correct classification of \(\left( \mathbf {x}_{i},y_{i}\right) .\) If we replace the functional margin by the geometric margin, we obtain the corresponding quantity for the normalized linear function \((\frac{1}{\left\| \mathbf {w}\right\| }\mathbf {w},\frac{1}{\left\| \mathbf {w}\right\| }b)\), which therefore measures the Euclidean distance of the points from the decision boundary.

Thus the geometric margin can be written as

$$\begin{aligned} \tilde{\gamma }=\frac{1}{\left\| \mathbf {w}\right\| }\gamma . \end{aligned}$$
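Both margins are straightforward to compute once \(\mathbf {w}\), b and the mapped points are available; the following sketch (with helper names of our own choosing) mirrors the formulas above.

```python
import numpy as np

def functional_margin(w, b, phi_x, y):
    """gamma_i = y_i (<w, Phi(x_i)> + b); positive iff (x_i, y_i) is correctly classified."""
    return y * (np.dot(w, phi_x) + b)

def geometric_margin(w, b, phi_x, y):
    """gamma_i / ||w||: the signed Euclidean distance of Phi(x_i) from the decision boundary."""
    return functional_margin(w, b, phi_x, y) / np.linalg.norm(w)
```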

Finding the hyperplane with maximal geometric margin for a training set S means finding the maximal \(\tilde{\gamma }.\) For convenience, we let \(\gamma =1\), so the objective function can be written as

$$\begin{aligned} \max \frac{1}{\left\| \mathbf {w}\right\| }. \end{aligned}$$

Of course, there are constraints in this optimization problem. According to the definition of the margin, we have \(y_{i}\left( \left\langle \mathbf {w},\varPhi \left( \mathbf {x}_{i}\right) \right\rangle +b\right) \ge 1\), \(i=1,\ldots ,l.\) We rewrite the objective function together with these constraints in the equivalent form

$$\begin{aligned} \min \frac{1}{2}\left\| \mathbf {w}\right\| ^{2} \quad \text{ such } \text{ that } \quad y_{i}\left( \left\langle \mathbf {w},\varPhi \left( \mathbf {x}_{i}\right) \right\rangle +b\right) \ge 1, \ i=1,\ldots ,l. \end{aligned}$$
(6)
Fig. 1 The data points given in Example 1

Fig. 2 The data separation in three dimensional feature space

We refer to this problem as the SVM problem for data classification.
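Problem (6) is a convex quadratic program, so any QP modelling tool can solve it directly. A minimal sketch using the cvxpy library (assuming the mapped training points are stored row-wise in Phi_X and the labels \(y_{i}\in \{-1,1\}\) in y; the function name is ours) might look as follows.

```python
import cvxpy as cp
import numpy as np

def hard_margin_svm(Phi_X, y):
    """Solve problem (6): min (1/2)||w||^2 s.t. y_i (<w, Phi(x_i)> + b) >= 1."""
    n_samples, p = Phi_X.shape
    w = cp.Variable(p)
    b = cp.Variable()
    constraints = [cp.multiply(y, Phi_X @ w + b) >= 1]
    cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w)), constraints).solve()
    return w.value, b.value
```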

Example 1

Let us take the points \((0,2),(0,-2),\left( 1,1\right) ,\left( 1,-1\right) ,\left( -1,1\right) \), \(\left( -1,-1\right) \) as the positive class and the points \((2,0),(-2,0),\left( 2,1\right) ,\left( 2,-1\right) \), \(\left( -2,1\right) ,\left( -2,-1\right) \) as the negative class, as shown in Fig. 1.

By using the mapping function

$$\begin{aligned} \varPhi \left( \mathbf {x}\right) =\left( x_{1}^{2},\sqrt{2}x_{1}x_{2},x_{2}^{2}\right) , \end{aligned}$$

which transforms the data from the two-dimensional input space to a three-dimensional feature space. The positive class maps to the points \((1,\sqrt{2},1),\ (1,-\sqrt{2},1)\) and \((0,0,4)\), and the negative class maps to \((4,2\sqrt{2},1),\ (4,-2\sqrt{2},1)\) and (4, 0, 0), as shown in Fig. 2.

Now, using these data in the three-dimensional feature space, we consider the following. For the positive points, we have

$$\begin{aligned} \left( w_{1},w_{2},w_{3}\right) \left[ \begin{array}{c} 1 \\ \sqrt{2} \\ 1 \end{array} \right] +b\ge & {} 1, \\ \left( w_{1},w_{2},w_{3}\right) \left[ \begin{array}{c} 1 \\ -\sqrt{2} \\ 1 \end{array} \right] +b\ge & {} 1, \\ \left( w_{1},w_{2},w_{3}\right) \left[ \begin{array}{c} 0 \\ 0 \\ 4 \end{array} \right] +b\ge & {} 1, \end{aligned}$$

which implies

$$\begin{aligned} w_{1}+\sqrt{2}w_{2}+w_{3}+b\ge & {} 1, \\ w_{1}-\sqrt{2}w_{2}+w_{3}+b\ge & {} 1, \\ 4w_{3}+b\ge & {} 1. \end{aligned}$$

For negative points, we have

$$\begin{aligned} \left( w_{1},w_{2},w_{3}\right) \left[ \begin{array}{c} 4 \\ 2\sqrt{2} \\ 1 \end{array} \right] +b\le & {} -1, \\ \left( w_{1},w_{2},w_{3}\right) \left[ \begin{array}{c} 4 \\ -2\sqrt{2} \\ 1 \end{array} \right] +b\le & {} -1, \\ \left( w_{1},w_{2},w_{3}\right) \left[ \begin{array}{c} 4 \\ 0 \\ 0 \end{array} \right] +b\le & {} -1, \end{aligned}$$

implying that

$$\begin{aligned} 4w_{1}+2\sqrt{2}w_{2}+w_{3}+b\le & {} -1, \\ 4w_{1}-2\sqrt{2}w_{2}+w_{3}+b\le & {} -1, \\ 4w_{1}+b\le & {} -1. \end{aligned}$$

Solving the resulting optimization problem, we get \(\mathbf {w}=(-0.6667,0,0)\) with \(\left\| \mathbf {w}\right\| =0.6667\), as shown in Fig. 3.

Further, if we use the Radial Basis Function (RBF) kernel \(K(\mathbf {x}_{i},\mathbf {x})=\exp [-\gamma ||\mathbf {x}_{i}-\mathbf {x}||^{2}]\) with \(\gamma =1/3\), we get \(\mathbf {w}=(0.0031,0.0012)\), which is shown in Fig. 4.

Also, if we use the sigmoid kernel \(K(\mathbf {x}_{i},\mathbf {x})=\tanh \left( k\mathbf {x}_{i}\cdot \mathbf {x}+\theta \right) \) with \(k=1/3\) and \(\theta =2.85\), we get \(\mathbf {w}=\left( 0,0\right) \), as shown in Fig. 5.
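The degree-2 case of this example can be checked numerically with the QP sketch above; the setup below (again using cvxpy as an assumed dependency) maps the original points with \(\varPhi \) and solves (6).

```python
import cvxpy as cp
import numpy as np

def phi(x):
    """The degree-2 feature map of Example 1."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2.0) * x1 * x2, x2 ** 2])

pos = [(0, 2), (0, -2), (1, 1), (1, -1), (-1, 1), (-1, -1)]
neg = [(2, 0), (-2, 0), (2, 1), (2, -1), (-2, 1), (-2, -1)]
Phi_X = np.array([phi(x) for x in pos + neg])
y = np.array([1.0] * len(pos) + [-1.0] * len(neg))

w = cp.Variable(3)
b = cp.Variable()
cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w)),
           [cp.multiply(y, Phi_X @ w + b) >= 1]).solve()
# The solver should return w close to (-0.667, 0, 0), i.e. ||w|| about 0.667, matching the text.
```

The RBF and sigmoid results quoted above depend on the particular solver and parameter settings, so they are not reproduced here.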

Fig. 3 The data separation using Polynomial Kernel of degree 2

Fig. 4 The data separation using Radial Basis Function (RBF)

Fig. 5 The data separation using Sigmoid Kernel

Fig. 6 The data points given in Example 2

Example 2

Let us look at another example. The positive data are shown as red squares and the negative data as blue circles in Fig. 6.

This is also a non-linearly separable problem. Now, if we transform the original data into the feature space by using the mapping function \(\varPhi \left( \mathbf {x}\right) \), we can see that the data in the feature space are linearly separable; see Fig. 7.

Using the polynomial kernel with \(p=2\), we get \(\mathbf {w}=(-0.4898,-0.1633)\), which is shown in Fig. 8.

Next, if we use the Radial Basis Function (RBF) kernel \(K(\mathbf {x}_{i},\mathbf {x})=\exp [-\gamma ||\mathbf {x}_{i}-\mathbf {x}||^{2}]\) with \(\gamma =2\), we get \(\mathbf {w}=(-0.0016,0.0014)\), as shown in Fig. 9.

Fig. 7 The data separation in feature space of Example 2

Fig. 8 The data separation of Example 2 using Polynomial Kernel

Fig. 9 The data separation of Example 2 using RBF Kernel

Example 3

Consider the points \((0,0),(1,0),\left( -1,0\right) \) as the positive class and the points \((2,0),(3,0),\left( -2,0\right) ,\left( -3,0\right) \) as the negative class; see Fig. 10.

Note that no linear separator exists for these data in the input space. Now, if we use \(\varPhi \left( \mathbf {x}\right) =(x_{1}^{2},\sqrt{2}x_{1}x_{2},x_{2}^{2})\), then the two-dimensional data are transformed into a three-dimensional feature space, in which they can be separated by a hyperplane H, as shown in Fig. 11.
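A quick numerical check (our own throwaway script) confirms the separation: after the mapping, the first feature \(x_{1}^{2}\) is at most 1 for the positive class and at least 4 for the negative class, so any plane such as \(x_{1}^{2}=2.5\) serves as the hyperplane H.

```python
import numpy as np

def phi(x):
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2.0) * x1 * x2, x2 ** 2])

pos = [(0, 0), (1, 0), (-1, 0)]
neg = [(2, 0), (3, 0), (-2, 0), (-3, 0)]

print([float(phi(x)[0]) for x in pos])  # [0.0, 1.0, 1.0]
print([float(phi(x)[0]) for x in neg])  # [4.0, 9.0, 4.0, 9.0]
```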

Fig. 10 The data points given in Example 3

Fig. 11 Data separation of Example 3 by using Polynomial Kernel of degree 2

2 Generalized Support Vector Machines

Consider a new control function \(F:\mathbb {R}^{p}\rightarrow \mathbb {R}^{p}\) defined as

$$\begin{aligned} F\left( \mathbf {x}\right) =W\varPhi \left( \mathbf {x}\right) +B, \end{aligned}$$
(7)

where \(W\in \mathbb {R}^{p\times p}\) and \(B\in \mathbb {R}^{p}\) are parameters and p is the dimension of the feature space. In addition, W contains the \(\mathbf {w}_{i}\) as its rows, where each \(\mathbf {w}_{i}\) is a linear combination of the support vectors in the feature space and can be written as

$$\begin{aligned} \mathbf {w}_{i}=\sum _{j}\alpha _{j}^{(i)}\varPhi \left( \mathbf {x}_{j}\right) , \end{aligned}$$
(8)

where \(\varPhi \) is a mapping that transforms data in the input space X to data in the feature space \(\digamma \). From (7), we obtain

$$\begin{aligned} F\left( \mathbf {x}\right)= & {} \left[ \begin{array}{c} \sum _{j}\alpha _{j}^{(1)}\varPhi \left( \mathbf {x}_{j}\right) \\ \vdots \\ \sum _{j}\alpha _{j}^{(p)}\varPhi \left( \mathbf {x}_{j}\right) \end{array} \right] \varPhi \left( \mathbf {x}\right) +B \\= & {} \left[ \begin{array}{c} \sum _{j}\alpha _{j}^{(1)}\varPhi \left( \mathbf {x}_{j}\right) \varPhi \left( \mathbf {x}\right) \\ \vdots \\ \sum _{j}\alpha _{j}^{(p)}\varPhi \left( \mathbf {x}_{j}\right) \varPhi \left( \mathbf {x}\right) \end{array} \right] +B \\= & {} \left[ \begin{array}{c} \sum _{j}\alpha _{j}^{(1)}K(\mathbf {x}_{j},\mathbf {x)} \\ \vdots \\ \sum _{j}\alpha _{j}^{(p)}K(\mathbf {x}_{j},\mathbf {x)} \end{array} \right] +B \\= & {} \left[ \begin{array}{c} \sum _{j}\alpha _{j}^{(1)} \\ \vdots \\ \sum _{j}\alpha _{j}^{(p)} \end{array} \right] K(\mathbf {x}_{j},\mathbf {x)}+B, \end{aligned}$$

where \(K(\mathbf {x}_{j},\mathbf {x})\) is the kernel associated with the feature mapping \(\varPhi .\)
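Computationally, the third line of the derivation above is all that is needed to evaluate \(F(\mathbf {x})\): the i-th component is \(\sum _{j}\alpha _{j}^{(i)}K(\mathbf {x}_{j},\mathbf {x})\) plus the corresponding entry of B. A hedged sketch (the names are ours) follows.

```python
import numpy as np

def gsvm_F(x, support_vectors, alpha, B, kernel):
    """Evaluate F(x) = W Phi(x) + B through the kernel, cf. Eqs. (7) and (8).

    alpha is a (p x m) array whose i-th row holds the coefficients
    alpha_j^(i) of w_i; support_vectors holds the points x_j.
    """
    k = np.array([kernel(x_j, x) for x_j in support_vectors])  # vector of K(x_j, x)
    return np.asarray(alpha) @ k + np.asarray(B)  # i-th entry: sum_j alpha_j^(i) K(x_j, x) + B_i
```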

Define

$$\begin{aligned} \tilde{\gamma }_{k}^{*}= & {} \mathbf {y}_{k}\left( W\varPhi \left( \mathbf {x} _{k}\right) +B\right) \\= & {} \mathbf {y}_{k}\left( \left[ \begin{array}{c} \sum _{j}\alpha _{j}^{(1)} \\ \vdots \\ \sum _{j}\alpha _{j}^{(p)} \end{array} \right] K(\mathbf {x}_{j},\mathbf {x)}+B\right) \\= & {} \mathbf {y}_{k}(\mathbf {\zeta }K(\mathbf {x}_{j},\mathbf {x)}+B)\ge 1 \quad \text{ for } \quad k=1,2,\ldots ,l, \end{aligned}$$

where \(\mathbf {y}_{k}\in \left\{ \left( -1,-1,\ldots ,-1\right) ,\left( 1,1,\ldots ,1\right) \right\} \) is a p-dimensional vector, \(K(\mathbf {x}_{j},\mathbf {x})=\varPhi \left( \mathbf {x}_{j}\right) \varPhi \left( \mathbf {x}\right) \) and \(\mathbf {\zeta }=\left[ \begin{array}{c} \sum _{j}\alpha _{j}^{(1)} \\ \vdots \\ \sum _{j}\alpha _{j}^{(p)} \end{array} \right] \).

Definition 1

We define a map \(G:\mathbb {R}^{p}\rightarrow \mathbb {R}_{+}^{p}\) by

$$\begin{aligned} G\left( \mathbf {w}_{i}\right) =\left( \left\| \mathbf {w}_{i}\right\| ,\left\| \mathbf {w}_{i}\right\| ,\ldots ,\left\| \mathbf {w} _{i}\right\| \right) \quad \text{ for } \quad i=1,2,\ldots ,p, \end{aligned}$$
(9)

where \(\mathbf {w}_{i}\) are the rows of \(W_{p\times p}\) for \(i=1,2,\ldots ,p\).

Now, the problem is to find \(\mathbf {w}_{i}\in \mathbb {R}^{p}\) that satisfy

$$\begin{aligned} \min _{\mathbf {w}_{i}\in W}G\left( \mathbf {w}_{i}\right) \quad \text{ such } \text{ that } \quad \eta \ge 0, \end{aligned}$$
(10)

where \(\eta =\mathbf {y}_{k}\left( \mathbf {\zeta }K(\mathbf {x}_{j},\mathbf {x)} +B\right) -1.\)

We call this problem the Generalized Support Vector Machine (GSVM).

Note that if \(\left[ \begin{array}{c} \sum _{j}\alpha _{j}^{(1)} \\ \vdots \\ \sum _{j}\alpha _{j}^{(p)} \end{array} \right] K(\mathbf {x}_{j},\mathbf {x})=-B\), then \(\eta =-1\) and the GSVM problem has no solution.

Example 4

Consider the data points for the positive and negative classes given in Example 1. Then, by using the polynomial kernel of degree two, we obtain \((1,\sqrt{2},1)\), \((1,-\sqrt{2},1)\), (0, 0, 4) as the vectors of positive data and \((4,2\sqrt{2},1)\), \((4,-2\sqrt{2},1)\), (4, 0, 0) as the vectors of negative data in the feature space. From the positive data points, we have

$$\begin{aligned} \left[ \begin{array}{ccc} w_{11} &{} w_{12} &{} w_{13} \\ w_{21} &{} w_{22} &{} w_{23} \\ w_{31} &{} w_{32} &{} w_{33} \end{array} \right] \left[ \begin{array}{c} 1 \\ \sqrt{2} \\ 1 \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \\ b_{3} \end{array} \right]\ge & {} \left[ \begin{array}{c} 1 \\ 1 \\ 1 \end{array} \right] , \\ \left[ \begin{array}{ccc} w_{11} &{} w_{12} &{} w_{13} \\ w_{21} &{} w_{22} &{} w_{23} \\ w_{31} &{} w_{32} &{} w_{33} \end{array} \right] \left[ \begin{array}{c} 1 \\ -\sqrt{2} \\ 1 \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \\ b_{3} \end{array} \right]\ge & {} \left[ \begin{array}{c} 1 \\ 1 \\ 1 \end{array} \right] , \\ \left[ \begin{array}{ccc} w_{11} &{} w_{12} &{} w_{13} \\ w_{21} &{} w_{22} &{} w_{23} \\ w_{31} &{} w_{32} &{} w_{33} \end{array} \right] \left[ \begin{array}{c} 0 \\ 0 \\ 4 \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \\ b_{3} \end{array} \right]\ge & {} \left[ \begin{array}{c} 1 \\ 1 \\ 1 \end{array} \right] , \end{aligned}$$

which gives

$$\begin{aligned} w_{11}+\sqrt{2}w_{12}+w_{13}+b_{1}\ge & {} 1, \\ w_{21}+\sqrt{2}w_{22}+w_{23}+b_{2}\ge & {} 1, \\ w_{31}+\sqrt{2}w_{32}+w_{33}+b_{3}\ge & {} 1, \end{aligned}$$
$$\begin{aligned} w_{11}-\sqrt{2}w_{12}+w_{13}+b_{1}\ge & {} 1, \\ w_{21}-\sqrt{2}w_{22}+w_{23}+b_{2}\ge & {} 1, \\ w_{31}-\sqrt{2}w_{32}+w_{33}+b_{3}\ge & {} 1, \end{aligned}$$
$$\begin{aligned} 4w_{13}+b_{1}\ge & {} 1, \\ 4w_{23}+b_{2}\ge & {} 1, \\ 4w_{33}+b_{3}\ge & {} 1. \end{aligned}$$

Also from negative data points,

$$\begin{aligned} \left[ \begin{array}{ccc} w_{11} &{} w_{12} &{} w_{13} \\ w_{21} &{} w_{22} &{} w_{23} \\ w_{31} &{} w_{32} &{} w_{33} \end{array} \right] \left[ \begin{array}{c} 4 \\ 2\sqrt{2} \\ 1 \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \\ b_{3} \end{array} \right]\le & {} \left[ \begin{array}{c} -1 \\ -1 \\ -1 \end{array} \right] , \\ \left[ \begin{array}{ccc} w_{11} &{} w_{12} &{} w_{13} \\ w_{21} &{} w_{22} &{} w_{23} \\ w_{31} &{} w_{32} &{} w_{33} \end{array} \right] \left[ \begin{array}{c} 4 \\ -2\sqrt{2} \\ 1 \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \\ b_{3} \end{array} \right]\le & {} \left[ \begin{array}{c} -1 \\ -1 \\ -1 \end{array} \right] , \\ \left[ \begin{array}{ccc} w_{11} &{} w_{12} &{} w_{13} \\ w_{21} &{} w_{22} &{} w_{23} \\ w_{31} &{} w_{32} &{} w_{33} \end{array} \right] \left[ \begin{array}{c} 4 \\ 0 \\ 0 \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \\ b_{3} \end{array} \right]\le & {} \left[ \begin{array}{c} -1 \\ -1 \\ -1 \end{array} \right] , \end{aligned}$$

which gives

$$\begin{aligned} 4w_{11}+2\sqrt{2}w_{12}+w_{13}+b_{1}\le & {} -1, \\ 4w_{21}+2\sqrt{2}w_{22}+w_{23}+b_{2}\le & {} -1, \\ 4w_{31}+2\sqrt{2}w_{32}+w_{33}+b_{3}\le & {} -1, \end{aligned}$$
$$\begin{aligned} 4w_{11}-2\sqrt{2}w_{12}+w_{13}+b_{1}\le & {} -1, \\ 4w_{21}-2\sqrt{2}w_{22}+w_{23}+b_{2}\le & {} -1, \\ 4w_{31}-2\sqrt{2}w_{32}+w_{33}+b_{3}\le & {} -1, \end{aligned}$$
$$\begin{aligned} 4w_{11}+b_{1}\le & {} -1, \\ 4w_{21}+b_{2}\le & {} -1, \\ 4w_{31}+b_{3}\le & {} -1. \end{aligned}$$

By solving this system of inequalities, we get

$$\begin{aligned} W=\left[ \begin{array}{ccc} -1.39 &{} -0.512 &{} -0.627 \\ 0.667 &{} 0 &{} -0.667 \\ 0.667 &{} 0 &{} 0 \end{array} \right] \quad \text{ and } \quad B=\left[ \begin{array}{c} 3.742 \\ 1.047 \\ 1.51 \end{array} \right] , \end{aligned}$$

with the smallest norms of the \(\mathbf {w}_{i}\) given by

$$\begin{aligned} \min _{\mathbf {w}_{i}\in W}G\left( \mathbf {w}_{i}\right) =(0.667,0.667,0.667). \end{aligned}$$

Hence we get \(\mathbf {w}=(0.667,0,0)\), which minimizes \(G\left( \mathbf {w}_{i}\right) \) for \(i=1,2,3.\)
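For readers who want to experiment, the constraint system of this example can be fed to a convex solver. The sketch below (using cvxpy as an assumed dependency) scalarises the vector-valued objective G by summing the row norms of W, which is only one of several reasonable choices, so the solver's W and B need not coincide with the matrices reported above; each optimal row should, however, have norm close to 0.667, in line with Example 1.

```python
import cvxpy as cp
import numpy as np

# Mapped training points of Example 4 (feature space of the degree-2 kernel).
pos = np.array([[1, np.sqrt(2), 1], [1, -np.sqrt(2), 1], [0, 0, 4]], dtype=float)
neg = np.array([[4, 2 * np.sqrt(2), 1], [4, -2 * np.sqrt(2), 1], [4, 0, 0]], dtype=float)

W = cp.Variable((3, 3))
B = cp.Variable(3)
constraints = [W @ z + B >= 1 for z in pos] + [W @ z + B <= -1 for z in neg]
objective = cp.Minimize(sum(cp.norm(W[i, :]) for i in range(3)))  # sum of row norms
cp.Problem(objective, constraints).solve()
```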

If the data are linearly separable, then in the GSVM process the map \(\varPhi \) acts as the identity operator. In the next example, we illustrate this case.

Example 5

Let us consider the following three situations of data:

Situation 1. Suppose that we have the data \(\left( 2,0\right) ,\left( 0,2\right) ,\left( 2,1\right) \) as the positive class and the data \(\left( -1,0\right) ,\left( 0,-1\right) ,\left( -1,-1/2\right) \) as the negative class, as shown in Fig. 12.

Fig. 12 Data for situation 1 in Example 5

For positive points, we have \(\left( 2,0\right) \), \(\left( 0,2\right) , \) \(\left( 2,1\right) \), so

$$\begin{aligned} \left[ \begin{array}{cc} w_{11} &{} w_{12} \\ w_{21} &{} w_{22} \end{array} \right] \left[ \begin{array}{c} 2 \\ 0 \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \end{array} \right]\ge & {} \left[ \begin{array}{c} 1 \\ 1 \end{array} \right] , \\ \left[ \begin{array}{cc} w_{11} &{} w_{12} \\ w_{21} &{} w_{22} \end{array} \right] \left[ \begin{array}{c} 0 \\ 2 \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \end{array} \right]\ge & {} \left[ \begin{array}{c} 1 \\ 1 \end{array} \right] , \\ \left[ \begin{array}{cc} w_{11} &{} w_{12} \\ w_{21} &{} w_{22} \end{array} \right] \left[ \begin{array}{c} 2 \\ 1 \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \end{array} \right]\ge & {} \left[ \begin{array}{c} 1 \\ 1 \end{array} \right] , \end{aligned}$$

which implies

$$\begin{aligned} \left[ \begin{array}{c} 2w_{11} \\ 2w_{21} \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \end{array} \right]\ge & {} \left[ \begin{array}{c} 1 \\ 1 \end{array} \right] , \\ \left[ \begin{array}{c} 2w_{12} \\ 2w_{22} \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \end{array} \right]\ge & {} \left[ \begin{array}{c} 1 \\ 1 \end{array} \right] , \\ \left[ \begin{array}{c} 2w_{11}+w_{12} \\ 2w_{21}+w_{22} \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \end{array} \right]\ge & {} \left[ \begin{array}{c} 1 \\ 1 \end{array} \right] . \end{aligned}$$

Again, for the negative points, we have \(\left( -1,0\right) \), \(\left( 0,-1\right) ,\left( -1,-1/2\right) \) and

$$\begin{aligned} \left[ \begin{array}{cc} w_{11} &{} w_{12} \\ w_{21} &{} w_{22} \end{array} \right] \left[ \begin{array}{c} -1 \\ 0 \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \end{array} \right]\le & {} \left[ \begin{array}{c} -1 \\ -1 \end{array} \right] , \\ \left[ \begin{array}{cc} w_{11} &{} w_{12} \\ w_{21} &{} w_{22} \end{array} \right] \left[ \begin{array}{c} 0 \\ -1 \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \end{array} \right]\le & {} \left[ \begin{array}{c} -1 \\ -1 \end{array} \right] , \\ \left[ \begin{array}{cc} w_{11} &{} w_{12} \\ w_{21} &{} w_{22} \end{array} \right] \left[ \begin{array}{c} -1 \\ -1/2 \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \end{array} \right]\le & {} \left[ \begin{array}{c} -1 \\ -1 \end{array} \right] , \end{aligned}$$

which gives

$$\begin{aligned} \left[ \begin{array}{c} -w_{11} \\ -w_{21} \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \end{array} \right]\le & {} \left[ \begin{array}{c} -1 \\ -1 \end{array} \right] , \\ \left[ \begin{array}{c} -w_{12} \\ -w_{22} \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \end{array} \right]\le & {} \left[ \begin{array}{c} -1 \\ -1 \end{array} \right] , \\ \left[ \begin{array}{c} -w_{11}-\frac{1}{2}w_{12} \\ -w_{21}-\frac{1}{2}w_{22} \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \end{array} \right]\le & {} \left[ \begin{array}{c} -1 \\ -1 \end{array} \right] . \end{aligned}$$

From the above inequalities, we get

$$\begin{aligned} W=\left[ \begin{array}{cc} \frac{2}{3} &{} \frac{2}{3} \\ \frac{2}{3} &{} \frac{2}{3} \end{array} \right] \quad \text{ and } \quad B=\left[ \begin{array}{c} -\frac{1}{3} \\ -\frac{1}{3} \end{array} \right] . \end{aligned}$$

Thus we get

$$\begin{aligned} \min _{\mathbf {w}_{i}\in W}G\left( \mathbf {w}_{i}\right) =\left( \frac{2\sqrt{2}}{3} ,\frac{2\sqrt{2}}{3}\right) . \end{aligned}$$

Hence we get \(\mathbf {w}=(\frac{2}{3},\frac{2}{3})\) that minimizes \(G\left( \mathbf {w}_{i}\right) \) for \(i=1,2\).
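The same solver-based check works for this linearly separable situation, with \(\varPhi \) taken as the identity; again the row-norm scalarisation below is just one convenient choice, not the only possible formulation.

```python
import cvxpy as cp
import numpy as np

pos = np.array([[2, 0], [0, 2], [2, 1]], dtype=float)
neg = np.array([[-1, 0], [0, -1], [-1, -0.5]], dtype=float)

W = cp.Variable((2, 2))
B = cp.Variable(2)
constraints = [W @ x + B >= 1 for x in pos] + [W @ x + B <= -1 for x in neg]
cp.Problem(cp.Minimize(sum(cp.norm(W[i, :]) for i in range(2))), constraints).solve()
# Each row of W should come out with norm close to 2*sqrt(2)/3, matching the text.
```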

Situation 2. We consider the data (1, 0), (0, 1), (1/2, 1) as the positive class and the data \(\left( -4,0\right) ,\left( 0,-4\right) ,(-2,-4)\) as the negative class, as shown in Fig. 13.

Fig. 13 The data separation for situation 2

Now, for the positive points of Situation 2, we have (1, 0), (0, 1), (1/2, 1) and

$$\begin{aligned} \left[ \begin{array}{cc} w_{11} &{} w_{12} \\ w_{21} &{} w_{22} \end{array} \right] \left[ \begin{array}{c} 1 \\ 0 \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \end{array} \right]\ge & {} \left[ \begin{array}{c} 1 \\ 1 \end{array} \right] , \\ \left[ \begin{array}{cc} w_{11} &{} w_{12} \\ w_{21} &{} w_{22} \end{array} \right] \left[ \begin{array}{c} 0 \\ 1 \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \end{array} \right]\ge & {} \left[ \begin{array}{c} 1 \\ 1 \end{array} \right] , \\ \left[ \begin{array}{cc} w_{11} &{} w_{12} \\ w_{21} &{} w_{22} \end{array} \right] \left[ \begin{array}{c} \frac{1}{2} \\ 1 \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \end{array} \right]\ge & {} \left[ \begin{array}{c} 1 \\ 1 \end{array} \right] , \end{aligned}$$

which gives

$$\begin{aligned} \left[ \begin{array}{c} w_{11} \\ w_{21} \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \end{array} \right]\ge & {} \left[ \begin{array}{c} 1 \\ 1 \end{array} \right] , \\ \left[ \begin{array}{c} w_{12} \\ w_{22} \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \end{array} \right]\ge & {} \left[ \begin{array}{c} 1 \\ 1 \end{array} \right] , \\ \left[ \begin{array}{c} \frac{1}{2}w_{11}+w_{12} \\ \frac{1}{2}w_{21}+w_{22} \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \end{array} \right]\ge & {} \left[ \begin{array}{c} 1 \\ 1 \end{array} \right] . \end{aligned}$$

For the negative points in this case, we have

$$\begin{aligned} \left[ \begin{array}{cc} w_{11} &{} w_{12} \\ w_{21} &{} w_{22} \end{array} \right] \left[ \begin{array}{c} -4 \\ 0 \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \end{array} \right]\le & {} \left[ \begin{array}{c} -1 \\ -1 \end{array} \right] , \\ \left[ \begin{array}{cc} w_{11} &{} w_{12} \\ w_{21} &{} w_{22} \end{array} \right] \left[ \begin{array}{c} 0 \\ -4 \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \end{array} \right]\le & {} \left[ \begin{array}{c} -1 \\ -1 \end{array} \right] , \\ \left[ \begin{array}{cc} w_{11} &{} w_{12} \\ w_{21} &{} w_{22} \end{array} \right] \left[ \begin{array}{c} -2 \\ -4 \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \end{array} \right]\le & {} \left[ \begin{array}{c} -1 \\ -1 \end{array} \right] , \end{aligned}$$

which gives

$$\begin{aligned} \left[ \begin{array}{c} -4w_{11} \\ -4w_{21} \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \end{array} \right]\le & {} \left[ \begin{array}{c} -1 \\ -1 \end{array} \right] , \\ \left[ \begin{array}{c} -4w_{12} \\ -4w_{22} \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \end{array} \right]\le & {} \left[ \begin{array}{c} -1 \\ -1 \end{array} \right] , \\ \left[ \begin{array}{c} -2w_{11}-4w_{12} \\ -2w_{21}-4w_{22} \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \end{array} \right]\le & {} \left[ \begin{array}{c} -1 \\ -1 \end{array} \right] . \end{aligned}$$

Thus, we obtain that

$$\begin{aligned} W=\left[ \begin{array}{cc} \frac{2}{5} &{} \frac{2}{5} \\ \frac{2}{5} &{} \frac{2}{5} \end{array} \right] \quad \text{ and } \quad B=\left[ \begin{array}{c} \frac{3}{5} \\ \frac{3}{5} \end{array} \right] . \end{aligned}$$

Thus we get

$$\begin{aligned} \min _{i\in \{1,2\}} \ G\left( \mathbf {w}_{i}\right) =\left( \frac{2\sqrt{2}}{5 },\frac{2\sqrt{2}}{5}\right) . \end{aligned}$$

Hence we get \(\mathbf {w}=(\frac{2}{5},\frac{2}{5})\), which minimizes \(G\left( \mathbf {w}_{i}\right) \) for \(i=1,2.\)

In Situation 3, we combine these two groups of data. Now, we have the data \(\left( 2,0\right) ,\left( 0,2\right) ,\left( 2,1\right) ,(1,0),(0,1),(1/2,1)\) as the positive class and \(\left( -1,0\right) \), \(\left( 0,-1\right) \), \(\left( -1,-1/2\right) \), \(\left( -4,0\right) \), \(\left( 0,-4\right) \), \((-2,-4)\) as the negative class; see Fig. 14.

Fig. 14 The data separation for situation 3

For the positive points of the combined data, the margin constraints are active (hold with equality) at the support vectors (1, 0) and (0, 1), so we have

$$\begin{aligned} \left[ \begin{array}{cc} w_{11} &{} w_{12} \\ w_{21} &{} w_{22} \end{array} \right] \left[ \begin{array}{c} 1 \\ 0 \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \end{array} \right] =\left[ \begin{array}{c} 1 \\ 1 \end{array} \right] , \end{aligned}$$

and

$$\begin{aligned} \left[ \begin{array}{cc} w_{11} &{} w_{12} \\ w_{21} &{} w_{22} \end{array} \right] \left[ \begin{array}{c} 0 \\ 1 \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \end{array} \right] =\left[ \begin{array}{c} 1 \\ 1 \end{array} \right] , \end{aligned}$$

which gives

$$\begin{aligned} \left[ \begin{array}{c} w_{11} \\ w_{21} \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \end{array} \right] =\left[ \begin{array}{c} 1 \\ 1 \end{array} \right] \quad \text{ and } \quad \left[ \begin{array}{c} w_{12} \\ w_{22} \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \end{array} \right] =\left[ \begin{array}{c} 1 \\ 1 \end{array} \right] . \end{aligned}$$

Similarly, for the negative points, the constraints are active at the support vectors \(\left( -1,0\right) \) and \(\left( 0,-1\right) \), and we have

$$\begin{aligned} \left[ \begin{array}{cc} w_{11} &{} w_{12} \\ w_{21} &{} w_{22} \end{array} \right] \left[ \begin{array}{c} -1 \\ 0 \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \end{array} \right] =\left[ \begin{array}{c} -1 \\ -1 \end{array} \right] , \end{aligned}$$

and

$$\begin{aligned} \left[ \begin{array}{cc} w_{11} &{} w_{12} \\ w_{21} &{} w_{22} \end{array} \right] \left[ \begin{array}{c} 0 \\ -1 \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \end{array} \right] =\left[ \begin{array}{c} -1 \\ -1 \end{array} \right] , \end{aligned}$$

which gives

$$\begin{aligned} \left[ \begin{array}{c} -w_{11} \\ -w_{21} \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \end{array} \right] =\left[ \begin{array}{c} -1 \\ -1 \end{array} \right] \quad \text{ and } \quad \left[ \begin{array}{c} -w_{12} \\ -w_{22} \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \end{array} \right] =\left[ \begin{array}{c} -1 \\ -1 \end{array} \right] . \end{aligned}$$

From this, we obtain that

$$\begin{aligned} W=\left[ \begin{array}{cc} 1 &{} 1 \\ 1 &{} 1 \end{array} \right] \quad \text{ and } \quad B=\left[ \begin{array}{c} 0 \\ 0 \end{array} \right] . \end{aligned}$$

Thus we get

$$\begin{aligned} \min _{i\in \{1,2\}} \ G\left( \mathbf {w}_{i}\right) =(\sqrt{2},\sqrt{2} ). \end{aligned}$$

Hence we get \(\mathbf {w}=(1,1)\), which minimizes \(G\left( \mathbf {w}_{i}\right) \) for \(i=1,2.\)

The problem of GSVM defined in (10) is equivalent to

$$\begin{aligned} \text{ find } \quad \mathbf {w}_{i}\in W:\ \left\langle G^{\prime }\left( \mathbf {w}_{i}\right) ,\mathbf {v}-\mathbf {w}_{i}\right\rangle \ge 0 \quad \text{ for } \text{ all } \quad \mathbf {v}\in \mathbb {R}^{p} \quad \text{ with } \quad \eta \ge 0. \end{aligned}$$
(11)

Hence the GSVM problem reduces to a generalized variational inequality problem.

Note that if we take \(G^{\prime }\left( \mathbf {w}_{i}\right) = \frac{\mathbf {w}_{i}}{\left\| \mathbf {w}_{i}\right\| }\), then from (11), we obtain

$$\begin{aligned} \text{ find } \quad \mathbf {w}_{i}\in W: \ \ \left\langle \mathbf {w}_{i},\mathbf {v-w}_{i}\right\rangle \ge 0 \quad \text{ for } \text{ all } \quad \mathbf {v}\in \mathbb {R}^{p} \quad \text{ with } \quad \eta \ge 0, \end{aligned}$$
(12)

or

$$\begin{aligned} \text{ find } \quad \mathbf {w}_{i}\in W: \ \ \left\langle \mathbf {w}_{i},\mathbf {v}\right\rangle \ge \left\| \mathbf {w}_{i}\right\| ^{2} \quad \text{ for } \text{ all } \quad \mathbf {v}\in \mathbb {R}^{p} \quad \text{ with } \quad \eta \ge 0. \end{aligned}$$
(13)

We now study sufficient conditions for the existence of solutions of the GSVM problem.

Proposition 1

Let \(G: \mathbb {R}^{p}\rightarrow \mathbb {R}_{+}^{p}\) be a differentiable operator. An element \(\mathbf {w}^{*}\in \mathbb {R}^{p}\) minimizes G if and only if \(G^{\prime }\left( \mathbf {w}^{*}\right) =\mathbf {0}\), that is, \(\mathbf {w}^{*}\in \mathbb {R}^{p}\) solves GSVM if and only if \(G^{\prime }\left( \mathbf {w}^{*}\right) =\mathbf {0}.\)

Proof

Let \(G^{\prime }\left( \mathbf {w}^{*}\right) =\mathbf {0}\). Then, for all \(\mathbf {v}\in \mathbb {R}^{p}\) with \(\eta =\mathbf {y}_{k}\left( \mathbf {\zeta }K(\mathbf {x}_{j},\mathbf {x})+B\right) -1\ge 0\), we have

$$\begin{aligned}<G^{\prime }\left( \mathbf {w}^{*}\right) ,\mathbf {v}-\mathbf {w}^{*}> \ =\ <0,\mathbf {v}-\mathbf {w}^{*}>\ =\ \ 0, \end{aligned}$$

and consequently, the inequality

$$\begin{aligned} <G^{\prime }\left( \mathbf {w}^{*}\right) ,\mathbf {v}-\mathbf {w}^{*}> \ \ge \ 0 \end{aligned}$$

holds for all \(\mathbf {v}\in \mathbb {R}^{p}.\) Hence \(\mathbf {w}^{*}\in \mathbb {R}^{p}\) solves the GSVM problem.

Conversely, assume that \(\mathbf {w}^{*}\in \mathbb {R}^{p}\) satisfies

$$\begin{aligned} <G^{\prime }\left( \mathbf {w}^{*}\right) ,\mathbf {v}-\mathbf {w}^{*}> \ \ge 0 \ \ \forall \ \mathbf {v}\in \mathbb {R}^{p}\quad \text{ such } \text{ that } \quad \eta \ge 0. \end{aligned}$$

Taking \(\mathbf {v}=\mathbf {w}^{*}-G^{\prime }\left( \mathbf {w}^{*}\right) \) in the above inequality implies that

$$\begin{aligned} <G^{\prime }\left( \mathbf {w}^{*}\right) ,-G^{\prime }\left( \mathbf {w} ^{*}\right) >\ \ge \ 0, \end{aligned}$$

which further implies

$$\begin{aligned} -||G^{\prime }(\mathbf {w}^{*})||^{2}\ \ge \ 0, \end{aligned}$$

and we get \(G^{\prime }(\mathbf {w}^{*})=\mathbf {0}.\) \(\square \)

Remark 1

Note that if \(G^{\prime }\left( \mathbf {w}^{*}\right) =\mathbf {0}\) at some \(\mathbf {w}^{*}\in \mathbb {R}^{p}\), then we obtain \(\frac{\mathbf {w}^{*}}{\left\| \mathbf {w}^{*}\right\| }=\mathbf {0}\), which implies \(\mathbf {w}^{*}=\mathbf {0}.\) Thus it follows from Proposition 1 that if \(G^{\prime }\left( \mathbf {w}^{*}\right) =\mathbf {0}\) at some \(\mathbf {w}^{*}\in \mathbb {R}^{p}\), then \(\mathbf {w}^{*}=\mathbf {0}\) solves the GSVM problem.

Remark 2

If \(\mathbf {w}^{*}=\mathbf {0}\), then from (8), we obtain

$$\begin{aligned} \sum _{j}\alpha _{j}^{(*)}\varPhi \left( \mathbf {x}_{j}\right) =\mathbf {0,} \end{aligned}$$

which implies

$$\begin{aligned} \sum _{j}\alpha _{j}^{(*)}\varPhi \left( \mathbf {x}_{j}\right) \varPhi \left( \mathbf {x}\right) =0\mathbf {,} \end{aligned}$$

that is

$$\begin{aligned} \sum _{j}\alpha _{j}^{(*)}K\left( \mathbf {x}_{j},\mathbf {x}\right) =0 \mathbf {.} \end{aligned}$$
(14)

Since \(\alpha _{j}^{(*)}>0\) for all j, we have

$$\begin{aligned} K\left( \mathbf {x}_{j},\mathbf {x}\right) =0. \end{aligned}$$

Definition 2

Let K be a closed and convex subset of \(\mathbb {R}^{n}\). Then, for every point \(\mathbf {x}\in \mathbb {R}^{n}\), there exists a unique nearest point in K, denoted by \(P_{K}\left( \mathbf {x}\right) \), such that \(\left\| \mathbf {x}-P_{K}\left( \mathbf {x}\right) \right\| \le \left\| \mathbf {x}-\mathbf {y}\right\| \) for all \(\mathbf {y}\in K\); note also that \(P_{K}\left( \mathbf {x}\right) =\mathbf {x}\) if \(\mathbf {x}\in K\). The map \(P_{K}\) is called the metric projection of \(\mathbb {R}^{n}\) onto K. It is well known that \(P_{K}: \mathbb {R}^{n}\rightarrow K\) is characterized by the following properties (a small numerical illustration is given after the list):

  (i) \(P_{K}\left( \mathbf {x}\right) =\mathbf {z}\) for \(\mathbf {x}\in \mathbb {R}^{n}\) if and only if \(<\mathbf {z}-\mathbf {x},\mathbf {y}-\mathbf {z}>\ \ge \ 0\) for all \(\mathbf {y}\in K\);

  (ii) for every \(\mathbf {x},\mathbf {y}\in \mathbb {R}^{n}\), \(\left\| P_{K}\left( \mathbf {x}\right) -P_{K}\left( \mathbf {y}\right) \right\| ^{2}\ \le \ <\mathbf {x}-\mathbf {y},P_{K}\left( \mathbf {x}\right) -P_{K}\left( \mathbf {y}\right)>\);

  (iii) \(\left\| P_{K}\left( \mathbf {x}\right) -P_{K}\left( \mathbf {y}\right) \right\| \le \left\| \mathbf {x}-\mathbf {y}\right\| \) for every \(\mathbf {x},\mathbf {y}\in \mathbb {R}^{n}\), that is, \(P_{K}\) is a nonexpansive map.
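The following small Python illustration is only a sketch, with a box chosen as the closed convex set K; it shows the projection and a numerical check of the nonexpansiveness property (iii).

```python
import numpy as np

def project_box(x, lower, upper):
    """Metric projection P_K onto the box K = [lower, upper]^n, a closed convex set."""
    return np.clip(x, lower, upper)

# Property (iii), nonexpansiveness: ||P_K(x) - P_K(y)|| <= ||x - y||.
rng = np.random.default_rng(0)
x, y = rng.normal(size=5), rng.normal(size=5)
lhs = np.linalg.norm(project_box(x, 0.0, 1.0) - project_box(y, 0.0, 1.0))
rhs = np.linalg.norm(x - y)
assert lhs <= rhs + 1e-12
```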

Proposition 2

Let \(G: \mathbb {R}^{p}\rightarrow \mathbb {R}_{+}^{p}\) be a differentiable operator. An element \(\mathbf {w}^{*}\in \mathbb {R}^{p}\) solves the GSVM problem (11) if and only if \(\mathbf {w}^{*}\) is a fixed point of the map

$$\begin{aligned} P_{\mathbb {R}_{+}^{p}}\left( I-\rho G^{\prime }\right) : \mathbb {R}^{p}\rightarrow \mathbb {R}_{+}^{p} \quad \text{ for } \text{ any } \quad \rho >0, \end{aligned}$$

that is,

$$\begin{aligned} \mathbf {w}^{*}= & {} P_{\mathbb {R}_{+}^{p}}\left( I-\rho G^{\prime }\right) (\mathbf {w}^{*}) \\= & {} P_{\mathbb {R}_{+}^{p}}\left( \mathbf {w}^{*}-\rho G^{\prime }\left( \mathbf {w}^{*}\right) \right) , \end{aligned}$$

where \(P_{\mathbb {R}_{+}^{p}}\) is a projection map from \(\mathbb {R}^{p}\) to \(\mathbb {R}_{+}^{p}\) and \(\eta =\mathbf {y}_{k}\left( \mathbf {\zeta }K(\mathbf {x}_{j},\mathbf {x)}+B\right) -1\ge 0.\)

Proof

Suppose \(\mathbf {w}^{*}\in \mathbb {R}_{+}^{p}\) is a solution of GSVM. Then, for \(\eta =\mathbf {y}_{k}\left( \mathbf {\zeta }K(\mathbf {x}_{j},\mathbf {x})+B\right) -1\ge 0\), we have

$$\begin{aligned} <G^{\prime }\left( \mathbf {w}^{*}\right) ,\mathbf {w}-\mathbf {w}^{*}>\ \ge \ 0\quad \text{ for } \text{ all } \quad \mathbf {w}\in \mathbb {R}^{p}. \end{aligned}$$

Adding \(< \mathbf {w}^{*},\mathbf {w}-\mathbf {w}^{*}>\) to both sides, we get

$$\begin{aligned}< {\mathbf {w}}^{*},{\mathbf {w}}-{\mathbf {w}}^{*}> +< G^{\prime }\left( {\mathbf {w}}^{*}\right) ,{\mathbf {w}}-{\mathbf {w}}^{*}> \ge \ < {\mathbf {w}}^{*},{\mathbf {w}}-{\mathbf {w}}^{*} > \quad \text {for all} \quad {\mathbf {w}}\in \mathbb {R}^{p}, \end{aligned}$$

which further implies that

$$\begin{aligned} <\mathbf {w}^{*}-\left( \mathbf {w}^{*}-G^{\prime }\left( \mathbf {w} ^{*}\right) \right) ,\mathbf {w}-\mathbf {w}^{*}>\ \ge \ 0 \quad \text{ for } \text{ all } \quad \mathbf {w}\in \mathbb {R}^{p}, \end{aligned}$$

which is possible only if \(\mathbf {w}^{*}=P_{\mathbb {R}_{+}^{p}}\left( \mathbf {w}^{*}-\rho G^{\prime } \left( \mathbf {w}^{*}\right) \right) \), that is, \(\mathbf {w}^{*}\) is a fixed point of the map \(P_{\mathbb {R}_{+}^{p}}\left( I-\rho G^{\prime }\right) \).

Conversely, let \(\mathbf {w}^{*}=P_{\mathbb {R}_{+}^{p}}\left( \mathbf {w}^{*}-\rho G^{\prime } \left( \mathbf {w}^{*}\right) \right) \) with \(\eta =\mathbf {y}_{k}\left( \mathbf {\zeta }K(\mathbf {x}_{j},\mathbf {x)}+B\right) -1\ge 0\), then we have

$$\begin{aligned} <\mathbf {w}^{*}-\left( \mathbf {w}^{*}-G^{\prime }\left( \mathbf {w}^{*}\right) \right) ,\mathbf {w}-\mathbf {w}^{*}>\ \ge \ 0 \quad \text{ for } \text{ all } \quad \mathbf {w}\in \mathbb {R}^{p}, \end{aligned}$$

which implies

$$\begin{aligned} <G^{\prime }\left( \mathbf {w}^{*}\right) ,\mathbf {w}-\mathbf {w}^{*}>\ \ge \ 0 \quad \text{ for } \text{ all } \quad \mathbf {w}\in \mathbb {R}^{p}, \end{aligned}$$

and so \(\mathbf {w}^{*}\in \mathbb {R}_{+}^{p}\) is a solution of GSVM. \(\square \)
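Proposition 2 suggests a simple fixed-point (projected-gradient) iteration. The sketch below is only an illustration of that characterization, not a complete GSVM solver: the constraint \(\eta \ge 0\) is ignored, the step size rho and the stopping rule are arbitrary choices of ours, and grad_G stands for \(G^{\prime }\).

```python
import numpy as np

def project_nonneg(w):
    """Metric projection of R^p onto the nonnegative orthant R^p_+."""
    return np.maximum(w, 0.0)

def fixed_point_iteration(grad_G, w0, rho=0.1, tol=1e-8, max_iter=1000):
    """Iterate w <- P_{R^p_+}(w - rho * G'(w)) until it stabilizes, cf. Proposition 2."""
    w = np.asarray(w0, dtype=float)
    for _ in range(max_iter):
        w_next = project_nonneg(w - rho * grad_G(w))
        if np.linalg.norm(w_next - w) < tol:
            return w_next
        w = w_next
    return w
```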

3 Conclusion

We have studied linear and nonlinear data classification using the support vector machine and the generalized support vector machine. We also established sufficient conditions for the existence of solutions of the generalized support vector machine and presented several examples to support these results.