
1 Support Vector Machine

Support vector machines (SVMs) [2, 3, 13, 14, 18] were developed by Vapnik et al. (1995) and are gaining popularity due to many attractive features. As a very powerful tool for data classification and regression, the SVM has been used in many fields, such as text classification [5], facial expression recognition [9], gene analysis [4] and many others [1, 6,7,8, 10,11,12, 17, 19,20,21,22]. Recently, it has been used for fault classification in a water level control system [15], and an SVM-based fault classifier has been used to diagnose faults in a water level control process [16].

Classification problems can be restricted to two-class problems without loss of generality. The goal of support vector classification (SVC) is to separate the two classes by a hyperplane that also works well on unseen examples. The method finds the optimal hyperplane that maximizes the margin between the two classes of data. A set of data is said to be optimally separated by a hyperplane if it is separated without error and the distance between the hyperplane and the closest data points is maximal. Support vector classification can thus be thought of as a process that uses the given data to find a decision plane guaranteeing good predictive performance on unseen data, and finding this decision plane amounts to solving a quadratic programming problem.

In this paper, we study the problems of the support vector machine and the generalized support vector machine. We give sufficient conditions for the existence of solutions of the generalized support vector machine problem and present various examples to support these results.

Throughout this paper, \(\mathbb {N}\), \(\mathbb {R}\), \(\mathbb {R}^{n}\) and \(\mathbb {R}_{+}^{n}\) denote the set of all natural numbers, the set of all real numbers, the set of all n-tuples of real numbers and the set of all n-tuples of nonnegative real numbers, respectively.

Also, we consider \(\left\| \cdot \right\| \) and \(<\cdot ,\cdot>\) as the Euclidean norm and the usual inner product on \(\mathbb {R}^{n}\), respectively, that is, \(<\mathbf {x},\mathbf {y}>=\mathbf {x}\cdot \mathbf {y} =x_{1}y_{1}+x_{2}y_{2}+\cdots +x_{n}y_{n}\) for all \(\mathbf {x}=\left( x_{1},x_{2},\ldots ,x_{n}\right) \), \(\mathbf {y}=\left( y_{1},y_{2},\ldots ,y_{n}\right) \) in \(\mathbb {R}^{n}.\) Furthermore, for any two vectors \(\mathbf {x},\mathbf {y}\in \mathbb {R}^{n}\), we say that \(\mathbf {x}\le \mathbf {y}\) if and only if \(x_{i}\le y_{i}\) for all \(i\in \{1,2,\ldots ,n\}\), where \(x_{i}\) and \(y_{i}\) are the components of \(\mathbf {x}\) and \(\mathbf {y}\), respectively.

1.1 Data Classification

In practice, complex real-world applications are often not linearly separable. Kernel representations offer an alternative solution by projecting the data into a higher dimensional feature space, which increases the computational power of the linear learning machine.

In order to learn linear or non-linear relations with a linear machine, a set of non-linear features is selected. This is equivalent to applying a fixed non-linear mapping \(\varPhi \) that transforms data in the input space X to data in the feature space \(\digamma \), in which the linear machine can be used. For this classification, both spaces X and \(\digamma \) need to be vector spaces, whose dimensions may or may not be the same. When the given data are linearly separable, we take \(\varPhi \) to be the identity operator. For binary classification of data, we consider the decision function \(f:\mathbb {R}^{n}\rightarrow \mathbb {R}\), where the input \(\mathbf {x}=(x_{1},\ldots ,x_{n})\) is assigned to the positive class if \(f(\mathbf {x})\ge 0\) and otherwise to the negative class. The decision function is defined as

$$\begin{aligned} f\left( \mathbf {x}\right) =<\mathbf {w},\varPhi \left( \mathbf {x}\right) >+ b. \end{aligned}$$
(1)

This means the non-linear machine is built in two steps: first, a fixed non-linear mapping transforms the data into a feature space; then, a linear machine is used to classify the data in that feature space.

In addition, the vector \(\mathbf {w}\) is a linear combination of the support vectors in the training data and can be written as

$$\begin{aligned} \mathbf {w}=\sum _{i}\alpha _{i}\varPhi \left( \mathbf {x}_{i}\right) , \end{aligned}$$
(2)

where each \(\alpha _{i}\) is the Lagrange multiplier associated with the support vector \(\varPhi \left( \mathbf {x}_{i}\right) \).

So the decision function can be rewritten as

$$\begin{aligned} f\left( \mathbf {x}\right) =\sigma \left( \sum _{i}\alpha _{i}(\varPhi (\mathbf {x} _{i})\cdot \varPhi \left( \mathbf {x}\right) )+b\right) , \end{aligned}$$
(3)

where \(\sigma \) is the sign function.

The kernel K has an associated feature mapping \(\varPhi \): it takes two inputs and returns their similarity in the feature space \(\digamma \), that is, \(K:X\times X\rightarrow \mathbb {R}\) is defined as

$$\begin{aligned} K(\mathbf {x}_{i},\mathbf {x})=\varPhi \left( \mathbf {x}_{i}\right) \cdot \varPhi \left( \mathbf {x}\right) . \end{aligned}$$
(4)

Thus, the decision function from (3) becomes

$$\begin{aligned} f\left( \mathbf {x}\right) =\sigma (\sum _{i}\alpha _{i}K(\mathbf {x}_{i}, \mathbf {x})+b). \end{aligned}$$
(5)
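For concreteness, the kernelized decision rule (5) can be sketched in a few lines of Python. This is only an illustration; the names decision_function, support_vectors, alphas and kernel are ours and do not refer to any particular library.

```python
import numpy as np

def decision_function(x, support_vectors, alphas, b, kernel):
    """Evaluate Eq. (5): sigma( sum_i alpha_i K(x_i, x) + b )."""
    s = sum(a * kernel(x_i, x) for a, x_i in zip(alphas, support_vectors))
    return np.sign(s + b)
```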

Some useful kernels for real-valued vectors are defined below (a short code sketch of these kernels follows the list):

  (I) Linear kernel

    $$\begin{aligned} K(\mathbf {x}_{i},\mathbf {x})=\mathbf {x}_{i}\cdot \mathbf {x}. \end{aligned}$$

  (II) Polynomial kernel (of degree p)

    $$\begin{aligned} K(\mathbf {x}_{i},\mathbf {x})=\left( \mathbf {x}_{i}\cdot \mathbf {x}\right) ^{p} \quad \text{ or } \quad \left( \mathbf {x}_{i}\cdot \mathbf {x}+1\right) ^{p}, \end{aligned}$$

    where p is a tunable parameter.

  (III) Radial Basis Function (RBF) kernel

    $$\begin{aligned} K(\mathbf {x}_{i},\mathbf {x})=\exp [-\gamma ||\mathbf {x}_{i}-\mathbf {x}||^{2}], \end{aligned}$$

    where \(\gamma \) is a hyperparameter (also called the kernel bandwidth). The RBF kernel corresponds to an infinite-dimensional feature space.

  (IV) Sigmoid kernel

    $$\begin{aligned} K(\mathbf {x}_{i},\mathbf {x})=\tanh \left( k\mathbf {x}_{i}\cdot \mathbf {x}+\theta \right) , \end{aligned}$$

    where k is a scalar and \(\theta \) is the displacement.

  (V) Inverse multi-quadratic kernel

    $$\begin{aligned} K(\mathbf {x}_{i},\mathbf {x})=\left( \left\| \mathbf {x}_{i}-\mathbf {x}\right\| ^{2}+\gamma ^{-2}\right) ^{-1/2}, \end{aligned}$$

    where \(\gamma \) is a hyperparameter (also called the kernel bandwidth).
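As a rough illustration of how these kernels might be coded, here is a minimal Python sketch; the function names and default parameter values are ours, chosen only for the example.

```python
import numpy as np

def linear_kernel(x_i, x):
    return float(np.dot(x_i, x))

def polynomial_kernel(x_i, x, p=2, c=1.0):
    # (x_i . x)^p when c = 0, or (x_i . x + 1)^p when c = 1
    return float((np.dot(x_i, x) + c) ** p)

def rbf_kernel(x_i, x, gamma=1.0):
    return float(np.exp(-gamma * np.sum((np.asarray(x_i) - np.asarray(x)) ** 2)))

def sigmoid_kernel(x_i, x, k=1.0, theta=0.0):
    return float(np.tanh(k * np.dot(x_i, x) + theta))

def inverse_multiquadratic_kernel(x_i, x, gamma=1.0):
    return float((np.sum((np.asarray(x_i) - np.asarray(x)) ** 2) + gamma ** -2) ** -0.5)
```

Any of these callables can be passed as the kernel argument of the decision-function sketch given after Eq. (5).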

Now, from (1), we define the functional margin of an example \(\left( \varPhi \left( \mathbf {x}_{i}\right) ,y_{i}\right) \) with respect to a hyperplane \(\left( \mathbf {w},b\right) \) to be the quantity

$$\begin{aligned} \gamma _{i}=y_{i}\left( \left\langle \mathbf {w},\varPhi \left( \mathbf {x} _{i}\right) \right\rangle +b\right) , \end{aligned}$$

where \(y_{i}\in \{-1,1\}.\) Note that \(\gamma _{i}>0\) implies correct classification of \(\left( \mathbf {x}_{i},y_{i}\right) .\) If we replace the functional margin by the geometric margin, we obtain the corresponding quantity for the normalized linear function \((\frac{1}{\left\| \mathbf {w}\right\| }\mathbf {w},\frac{1}{\left\| \mathbf {w}\right\| }b)\), which therefore measures the Euclidean distance of the points from the decision boundary.

Thus the geometric margin can be written as

$$\begin{aligned} \tilde{\gamma }=\frac{1}{\left\| \mathbf {w}\right\| }\gamma . \end{aligned}$$
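Both margins are straightforward to compute once \(\mathbf {w}\), b and the mapped points are available; the following sketch (with helper names of our own choosing) mirrors the formulas above.

```python
import numpy as np

def functional_margin(w, b, phi_x, y):
    """gamma_i = y_i (<w, Phi(x_i)> + b); positive iff (x_i, y_i) is correctly classified."""
    return y * (np.dot(w, phi_x) + b)

def geometric_margin(w, b, phi_x, y):
    """gamma_i / ||w||: the signed Euclidean distance of Phi(x_i) from the decision boundary."""
    return functional_margin(w, b, phi_x, y) / np.linalg.norm(w)
```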

Finding the hyperplane with maximal geometric margin for a training set S means finding the maximal \(\tilde{\gamma }.\) For convenience, we let \(\gamma =1\), so the objective function can be written as

$$\begin{aligned} \max \frac{1}{\left\| \mathbf {w}\right\| }. \end{aligned}$$

Of course, there are constraints in this optimization problem. According to the definition of the margin, we have \(y_{i}\left( \left\langle \mathbf {w},\varPhi \left( \mathbf {x}_{i}\right) \right\rangle +b\right) \ge 1\), \(i=1,\ldots ,l.\) We rewrite the objective function together with these constraints in the equivalent form

$$\begin{aligned} \min \frac{1}{2}\left\| \mathbf {w}\right\| ^{2} \quad \text{ such } \text{ that } \quad y_{i}\left( \left\langle \mathbf {w},\varPhi \left( \mathbf {x}_{i}\right) \right\rangle +b\right) \ge 1, \ i=1,\ldots ,l. \end{aligned}$$
(6)
Fig. 1 The data points given in Example 1

Fig. 2 The data separation in three dimensional feature space

We refer to this problem as the SVM problem for data classification.
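Problem (6) is a convex quadratic program, so any QP modelling tool can solve it directly. A minimal sketch using the cvxpy library (assuming the mapped training points are stored row-wise in Phi_X and the labels \(y_{i}\in \{-1,1\}\) in y; the function name is ours) might look as follows.

```python
import cvxpy as cp
import numpy as np

def hard_margin_svm(Phi_X, y):
    """Solve problem (6): min (1/2)||w||^2 s.t. y_i (<w, Phi(x_i)> + b) >= 1."""
    n_samples, p = Phi_X.shape
    w = cp.Variable(p)
    b = cp.Variable()
    constraints = [cp.multiply(y, Phi_X @ w + b) >= 1]
    cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w)), constraints).solve()
    return w.value, b.value
```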

Example 1

Let us take the points \((0,2),(0,-2),\left( 1,1\right) ,\left( 1,-1\right) ,\left( -1,1\right) \), \(\left( -1,-1\right) \) as the positive class and the points \((2,0),(-2,0),\left( 2,1\right) ,\left( 2,-1\right) \), \(\left( -2,1\right) ,\left( -2,-1\right) \) as the negative class, as shown in Fig. 1.

By using the mapping function

$$\begin{aligned} \varPhi \left( \mathbf {x}\right) =\left( x_{1}^{2},\sqrt{2}x_{1}x_{2},x_{2}^{2}\right) , \end{aligned}$$

which transforms the data from the two-dimensional input space to a three-dimensional feature space. The positive class maps to the points \((1,\sqrt{2},1),\ (1,-\sqrt{2},1)\) and \((0,0,4)\), and the negative class maps to \((4,2\sqrt{2},1),\ (4,-2\sqrt{2},1)\) and (4, 0, 0), as shown in Fig. 2.

Now, using these data in the three-dimensional feature space, we consider the following. For the positive points, we have

$$\begin{aligned} \left( w_{1},w_{2},w_{3}\right) \left[ \begin{array}{c} 1 \\ \sqrt{2} \\ 1 \end{array} \right] +b\ge & {} 1, \\ \left( w_{1},w_{2},w_{3}\right) \left[ \begin{array}{c} 1 \\ -\sqrt{2} \\ 1 \end{array} \right] +b\ge & {} 1, \\ \left( w_{1},w_{2},w_{3}\right) \left[ \begin{array}{c} 0 \\ 0 \\ 4 \end{array} \right] +b\ge & {} 1, \end{aligned}$$

which implies

$$\begin{aligned} w_{1}+\sqrt{2}w_{2}+w_{3}+b\ge & {} 1, \\ w_{1}-\sqrt{2}w_{2}+w_{3}+b\ge & {} 1, \\ 4w_{3}+b\ge & {} 1. \end{aligned}$$

For negative points, we have

$$\begin{aligned} \left( w_{1},w_{2},w_{3}\right) \left[ \begin{array}{c} 4 \\ 2\sqrt{2} \\ 1 \end{array} \right] +b\le & {} -1, \\ \left( w_{1},w_{2},w_{3}\right) \left[ \begin{array}{c} 4 \\ -2\sqrt{2} \\ 1 \end{array} \right] +b\le & {} -1, \\ \left( w_{1},w_{2},w_{3}\right) \left[ \begin{array}{c} 4 \\ 0 \\ 0 \end{array} \right] +b\le & {} -1, \end{aligned}$$

implying that

$$\begin{aligned} 4w_{1}+2\sqrt{2}w_{2}+w_{3}+b\le & {} -1, \\ 4w_{1}-2\sqrt{2}w_{2}+w_{3}+b\le & {} -1, \\ 4w_{1}+b\le & {} -1. \end{aligned}$$

Solving the resulting optimization problem, we get \(\mathbf {w}=(-0.6667,0,0)\) with \(\left\| \mathbf {w}\right\| =0.6667\), as shown in Fig. 3.

Further, if we use the Radial Basis Function (RBF) kernel \(K(\mathbf {x}_{i},\mathbf {x})=\exp [-\gamma ||\mathbf {x}_{i}-\mathbf {x}||^{2}]\) with \(\gamma =1/3\), we get \(\mathbf {w}=(0.0031,0.0012)\), which is shown in Fig. 4.

Also, if we use the sigmoid kernel \(K(\mathbf {x}_{i},\mathbf {x})=\tanh \left( k\mathbf {x}_{i}\cdot \mathbf {x}+\theta \right) \) with \(k=1/3\) and \(\theta =2.85\), we get \(\mathbf {w}=\left( 0,0\right) \), as shown in Fig. 5.
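The degree-2 case of this example can be checked numerically with the QP sketch above; the setup below (again using cvxpy as an assumed dependency) maps the original points with \(\varPhi \) and solves (6).

```python
import cvxpy as cp
import numpy as np

def phi(x):
    """The degree-2 feature map of Example 1."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2.0) * x1 * x2, x2 ** 2])

pos = [(0, 2), (0, -2), (1, 1), (1, -1), (-1, 1), (-1, -1)]
neg = [(2, 0), (-2, 0), (2, 1), (2, -1), (-2, 1), (-2, -1)]
Phi_X = np.array([phi(x) for x in pos + neg])
y = np.array([1.0] * len(pos) + [-1.0] * len(neg))

w = cp.Variable(3)
b = cp.Variable()
cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w)),
           [cp.multiply(y, Phi_X @ w + b) >= 1]).solve()
# The solver should return w close to (-0.667, 0, 0), i.e. ||w|| about 0.667, matching the text.
```

The RBF and sigmoid results quoted above depend on the particular solver and parameter settings, so they are not reproduced here.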

Fig. 3 The data separation using Polynomial Kernel of degree 2

Fig. 4 The data separation using Radial Basis Function (RBF)

Fig. 5 The data separation using Sigmoid Kernel

Fig. 6 The data points given in Example 2

Example 2

Let us look at another example. The positive data are shown as red squares and the negative data as blue circles in Fig. 6.

This is also a non-linearly separable problem. Now, if we transform the original data into the feature space by using the mapping function \(\varPhi \left( \mathbf {x}\right) \), we can see that the data in the feature space are linearly separable; see Fig. 7.

Using the polynomial kernel with \(p=2\), we get \(\mathbf {w}=(-0.4898,-0.1633)\), which is shown in Fig. 8.

Next, if we use the Radial Basis Function (RBF) kernel \(K(\mathbf {x}_{i},\mathbf {x})=\exp [-\gamma ||\mathbf {x}_{i}-\mathbf {x}||^{2}]\) with \(\gamma =2\), we get \(\mathbf {w}=(-0.0016,0.0014)\), as shown in Fig. 9.

Fig. 7 The data separation in feature space of Example 2

Fig. 8 The data separation of Example 2 using Polynomial Kernel

Fig. 9 The data separation of Example 2 using RBF Kernel

Example 3

Consider the points \((0,0),(1,0),\left( -1,0\right) \) as the positive class and the points \((2,0),(3,0),\left( -2,0\right) ,\left( -3,0\right) \) as the negative class; see Fig. 10.

Note that no linear separator exists for these data in the input space. Now, if we use \(\varPhi \left( \mathbf {x}\right) =(x_{1}^{2},\sqrt{2}x_{1}x_{2},x_{2}^{2})\), then the two-dimensional data are transformed into a three-dimensional feature space, in which they can be separated by a hyperplane H, as shown in Fig. 11.
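A quick numerical check (our own throwaway script) confirms the separation: after the mapping, the first feature \(x_{1}^{2}\) is at most 1 for the positive class and at least 4 for the negative class, so any plane such as \(x_{1}^{2}=2.5\) serves as the hyperplane H.

```python
import numpy as np

def phi(x):
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2.0) * x1 * x2, x2 ** 2])

pos = [(0, 0), (1, 0), (-1, 0)]
neg = [(2, 0), (3, 0), (-2, 0), (-3, 0)]

print([float(phi(x)[0]) for x in pos])  # [0.0, 1.0, 1.0]
print([float(phi(x)[0]) for x in neg])  # [4.0, 9.0, 4.0, 9.0]
```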

Fig. 10 The data points given in Example 3

Fig. 11 Data separation of Example 3 by using Polynomial Kernel of degree 2

2 Generalized Support Vector Machines

Consider a new control function \(F:\mathbb {R}^{p}\rightarrow \mathbb {R}^{p}\) defined as

$$\begin{aligned} F\left( \mathbf {x}\right) =W\varPhi \left( \mathbf {x}\right) +B, \end{aligned}$$
(7)

where \(W\in \mathbb {R}^{p\times p}\) and \(B\in \mathbb {R}^{p}\) are parameters and p is the dimension of the feature space. In addition, W contains the \(\mathbf {w}_{i}\) as its rows, where each \(\mathbf {w}_{i}\) is a linear combination of the support vectors in the feature space and can be written as

$$\begin{aligned} \mathbf {w}_{i}=\sum _{j}\alpha _{j}^{(i)}\varPhi \left( \mathbf {x}_{j}\right) , \end{aligned}$$
(8)

where \(\varPhi \) is a mapping that transforms data in the input space X to data in the feature space \(\digamma \). From (7), we obtain

$$\begin{aligned} F\left( \mathbf {x}\right)= & {} \left[ \begin{array}{c} \sum _{j}\alpha _{j}^{(1)}\varPhi \left( \mathbf {x}_{j}\right) \\ \vdots \\ \sum _{j}\alpha _{j}^{(p)}\varPhi \left( \mathbf {x}_{j}\right) \end{array} \right] \varPhi \left( \mathbf {x}\right) +B \\= & {} \left[ \begin{array}{c} \sum _{j}\alpha _{j}^{(1)}\varPhi \left( \mathbf {x}_{j}\right) \varPhi \left( \mathbf {x}\right) \\ \vdots \\ \sum _{j}\alpha _{j}^{(p)}\varPhi \left( \mathbf {x}_{j}\right) \varPhi \left( \mathbf {x}\right) \end{array} \right] +B \\= & {} \left[ \begin{array}{c} \sum _{j}\alpha _{j}^{(1)}K(\mathbf {x}_{j},\mathbf {x)} \\ \vdots \\ \sum _{j}\alpha _{j}^{(p)}K(\mathbf {x}_{j},\mathbf {x)} \end{array} \right] +B \\= & {} \left[ \begin{array}{c} \sum _{j}\alpha _{j}^{(1)} \\ \vdots \\ \sum _{j}\alpha _{j}^{(p)} \end{array} \right] K(\mathbf {x}_{j},\mathbf {x)}+B, \end{aligned}$$

where \(K(\mathbf {x}_{j},\mathbf {x})\) is the kernel associated with the feature mapping \(\varPhi .\)
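Computationally, the third line of the derivation above is all that is needed to evaluate \(F(\mathbf {x})\): the i-th component is \(\sum _{j}\alpha _{j}^{(i)}K(\mathbf {x}_{j},\mathbf {x})\) plus the corresponding entry of B. A hedged sketch (the names are ours) follows.

```python
import numpy as np

def gsvm_F(x, support_vectors, alpha, B, kernel):
    """Evaluate F(x) = W Phi(x) + B through the kernel, cf. Eqs. (7) and (8).

    alpha is a (p x m) array whose i-th row holds the coefficients
    alpha_j^(i) of w_i; support_vectors holds the points x_j.
    """
    k = np.array([kernel(x_j, x) for x_j in support_vectors])  # vector of K(x_j, x)
    return np.asarray(alpha) @ k + np.asarray(B)  # i-th entry: sum_j alpha_j^(i) K(x_j, x) + B_i
```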

Define

$$\begin{aligned} \tilde{\gamma }_{k}^{*}= & {} \mathbf {y}_{k}\left( W\varPhi \left( \mathbf {x} _{k}\right) +B\right) \\= & {} \mathbf {y}_{k}\left( \left[ \begin{array}{c} \sum _{j}\alpha _{j}^{(1)} \\ \vdots \\ \sum _{j}\alpha _{j}^{(p)} \end{array} \right] K(\mathbf {x}_{j},\mathbf {x)}+B\right) \\= & {} \mathbf {y}_{k}(\mathbf {\zeta }K(\mathbf {x}_{j},\mathbf {x)}+B)\ge 1 \quad \text{ for } \quad k=1,2,\ldots ,l, \end{aligned}$$

where \(\mathbf {y}_{k}\in \left\{ \left( -1,-1,\ldots ,-1\right) ,\left( 1,1,\ldots ,1\right) \right\} \) is a p-dimensional vector, \(K(\mathbf {x}_{j},\mathbf {x})=\varPhi \left( \mathbf {x}_{j}\right) \varPhi \left( \mathbf {x}\right) \) and \(\mathbf {\zeta }=\left[ \begin{array}{c} \sum _{j}\alpha _{j}^{(1)} \\ \vdots \\ \sum _{j}\alpha _{j}^{(p)} \end{array} \right] \).

Definition 1

We define a map \(G:\mathbb {R}^{p}\rightarrow \mathbb {R}_{+}^{p}\) by

$$\begin{aligned} G\left( \mathbf {w}_{i}\right) =\left( \left\| \mathbf {w}_{i}\right\| ,\left\| \mathbf {w}_{i}\right\| ,\ldots ,\left\| \mathbf {w} _{i}\right\| \right) \quad \text{ for } \quad i=1,2,\ldots ,p, \end{aligned}$$
(9)

where \(\mathbf {w}_{i}\) are the rows of \(W_{p\times p}\) for \(i=1,2,\ldots ,p\).

Now, the problem is to find \(\mathbf {w}_{i}\in \mathbb {R}^{p}\) that satisfy

$$\begin{aligned} \min _{\mathbf {w}_{i}\in W}G\left( \mathbf {w}_{i}\right) \quad \text{ such } \text{ that } \quad \eta \ge 0, \end{aligned}$$
(10)

where \(\eta =\mathbf {y}_{k}\left( \mathbf {\zeta }K(\mathbf {x}_{j},\mathbf {x)} +B\right) -1.\)

We call this problem the Generalized Support Vector Machine (GSVM).

Note that if \(\left[ \begin{array}{c} \sum _{j}\alpha _{j}^{(1)} \\ \vdots \\ \sum _{j}\alpha _{j}^{(p)} \end{array} \right] K(\mathbf {x}_{j},\mathbf {x})=-B\), then \(\eta =-1\) and the GSVM problem has no solution.

Example 4

Consider the data points for the positive and negative classes given in Example 1. Then, by using the polynomial kernel of degree two, we obtain \((1,\sqrt{2},1)\), \((1,-\sqrt{2},1)\), (0, 0, 4) as the vectors of positive data and \((4,2\sqrt{2},1)\), \((4,-2\sqrt{2},1)\), (4, 0, 0) as the vectors of negative data in the feature space. From the positive data points, we have

$$\begin{aligned} \left[ \begin{array}{ccc} w_{11} &{} w_{12} &{} w_{13} \\ w_{21} &{} w_{22} &{} w_{23} \\ w_{31} &{} w_{32} &{} w_{33} \end{array} \right] \left[ \begin{array}{c} 1 \\ \sqrt{2} \\ 1 \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \\ b_{3} \end{array} \right]\ge & {} \left[ \begin{array}{c} 1 \\ 1 \\ 1 \end{array} \right] , \\ \left[ \begin{array}{ccc} w_{11} &{} w_{12} &{} w_{13} \\ w_{21} &{} w_{22} &{} w_{23} \\ w_{31} &{} w_{32} &{} w_{33} \end{array} \right] \left[ \begin{array}{c} 1 \\ -\sqrt{2} \\ 1 \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \\ b_{3} \end{array} \right]\ge & {} \left[ \begin{array}{c} 1 \\ 1 \\ 1 \end{array} \right] , \\ \left[ \begin{array}{ccc} w_{11} &{} w_{12} &{} w_{13} \\ w_{21} &{} w_{22} &{} w_{23} \\ w_{31} &{} w_{32} &{} w_{33} \end{array} \right] \left[ \begin{array}{c} 0 \\ 0 \\ 4 \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \\ b_{3} \end{array} \right]\ge & {} \left[ \begin{array}{c} 1 \\ 1 \\ 1 \end{array} \right] , \end{aligned}$$

which gives

$$\begin{aligned} w_{11}+\sqrt{2}w_{12}+w_{13}+b_{1}\ge & {} 1, \\ w_{21}+\sqrt{2}w_{22}+w_{23}+b_{2}\ge & {} 1, \\ w_{31}+\sqrt{2}w_{32}+w_{33}+b_{3}\ge & {} 1, \end{aligned}$$
$$\begin{aligned} w_{11}-\sqrt{2}w_{12}+w_{13}+b_{1}\ge & {} 1, \\ w_{21}-\sqrt{2}w_{22}+w_{23}+b_{2}\ge & {} 1, \\ w_{31}-\sqrt{2}w_{32}+w_{33}+b_{3}\ge & {} 1, \end{aligned}$$
$$\begin{aligned} 4w_{13}+b_{1}\ge & {} 1, \\ 4w_{23}+b_{2}\ge & {} 1, \\ 4w_{33}+b_{3}\ge & {} 1. \end{aligned}$$

Also from negative data points,

$$\begin{aligned} \left[ \begin{array}{ccc} w_{11} &{} w_{12} &{} w_{13} \\ w_{21} &{} w_{22} &{} w_{23} \\ w_{31} &{} w_{32} &{} w_{33} \end{array} \right] \left[ \begin{array}{c} 4 \\ 2\sqrt{2} \\ 1 \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \\ b_{3} \end{array} \right]\le & {} \left[ \begin{array}{c} -1 \\ -1 \\ -1 \end{array} \right] , \\ \left[ \begin{array}{ccc} w_{11} &{} w_{12} &{} w_{13} \\ w_{21} &{} w_{22} &{} w_{23} \\ w_{31} &{} w_{32} &{} w_{33} \end{array} \right] \left[ \begin{array}{c} 4 \\ -2\sqrt{2} \\ 1 \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \\ b_{3} \end{array} \right]\le & {} \left[ \begin{array}{c} -1 \\ -1 \\ -1 \end{array} \right] , \\ \left[ \begin{array}{ccc} w_{11} &{} w_{12} &{} w_{13} \\ w_{21} &{} w_{22} &{} w_{23} \\ w_{31} &{} w_{32} &{} w_{33} \end{array} \right] \left[ \begin{array}{c} 4 \\ 0 \\ 0 \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \\ b_{3} \end{array} \right]\le & {} \left[ \begin{array}{c} -1 \\ -1 \\ -1 \end{array} \right] , \end{aligned}$$

which gives

$$\begin{aligned} 4w_{11}+2\sqrt{2}w_{12}+w_{13}+b_{1}\le & {} -1, \\ 4w_{21}+2\sqrt{2}w_{22}+w_{23}+b_{2}\le & {} -1, \\ 4w_{31}+2\sqrt{2}w_{32}+w_{33}+b_{3}\le & {} -1, \end{aligned}$$
$$\begin{aligned} 4w_{11}-2\sqrt{2}w_{12}+w_{13}+b_{1}\le & {} -1, \\ 4w_{21}-2\sqrt{2}w_{22}+w_{23}+b_{2}\le & {} -1, \\ 4w_{31}-2\sqrt{2}w_{32}+w_{33}+b_{3}\le & {} -1, \end{aligned}$$
$$\begin{aligned} 4w_{11}+b_{1}\le & {} -1, \\ 4w_{21}+b_{2}\le & {} -1, \\ 4w_{31}+b_{3}\le & {} -1. \end{aligned}$$

By solving this system of inequalities, we get

$$\begin{aligned} W=\left[ \begin{array}{ccc} -1.39 &{} -0.512 &{} -0.627 \\ 0.667 &{} 0 &{} -0.667 \\ 0.667 &{} 0 &{} 0 \end{array} \right] \quad \text{ and } \quad B=\left[ \begin{array}{c} 3.742 \\ 1.047 \\ 1.51 \end{array} \right] , \end{aligned}$$

with the smallest norms of the \(\mathbf {w}_{i}\) given by

$$\begin{aligned} \min _{\mathbf {w}_{i}\in W}G\left( \mathbf {w}_{i}\right) =(0.667,0.667,0.667). \end{aligned}$$

Hence we get \(\mathbf {w}=(0.667,0,0)\), which minimizes \(G\left( \mathbf {w}_{i}\right) \) for \(i=1,2,3.\)
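For readers who want to experiment, the constraint system of this example can be fed to a convex solver. The sketch below (using cvxpy as an assumed dependency) scalarises the vector-valued objective G by summing the row norms of W, which is only one of several reasonable choices, so the solver's W and B need not coincide with the matrices reported above; each optimal row should, however, have norm close to 0.667, in line with Example 1.

```python
import cvxpy as cp
import numpy as np

# Mapped training points of Example 4 (feature space of the degree-2 kernel).
pos = np.array([[1, np.sqrt(2), 1], [1, -np.sqrt(2), 1], [0, 0, 4]], dtype=float)
neg = np.array([[4, 2 * np.sqrt(2), 1], [4, -2 * np.sqrt(2), 1], [4, 0, 0]], dtype=float)

W = cp.Variable((3, 3))
B = cp.Variable(3)
constraints = [W @ z + B >= 1 for z in pos] + [W @ z + B <= -1 for z in neg]
objective = cp.Minimize(sum(cp.norm(W[i, :]) for i in range(3)))  # sum of row norms
cp.Problem(objective, constraints).solve()
```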

If the data are linearly separable, then in the GSVM process the map \(\varPhi \) acts as the identity operator. In the next example, we illustrate this case.

Example 5

Let us consider the following three situations of data:

Situation 1. Suppose that we have the data \(\left( 2,0\right) ,\left( 0,2\right) ,\left( 2,1\right) \) as the positive class and the data \(\left( -1,0\right) ,\left( 0,-1\right) ,\left( -1,-1/2\right) \) as the negative class, as shown in Fig. 12.

Fig. 12 Data for situation 1 in Example 5

For positive points, we have \(\left( 2,0\right) \), \(\left( 0,2\right) , \) \(\left( 2,1\right) \), so

$$\begin{aligned} \left[ \begin{array}{cc} w_{11} &{} w_{12} \\ w_{21} &{} w_{22} \end{array} \right] \left[ \begin{array}{c} 2 \\ 0 \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \end{array} \right]\ge & {} \left[ \begin{array}{c} 1 \\ 1 \end{array} \right] , \\ \left[ \begin{array}{cc} w_{11} &{} w_{12} \\ w_{21} &{} w_{22} \end{array} \right] \left[ \begin{array}{c} 0 \\ 2 \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \end{array} \right]\ge & {} \left[ \begin{array}{c} 1 \\ 1 \end{array} \right] , \\ \left[ \begin{array}{cc} w_{11} &{} w_{12} \\ w_{21} &{} w_{22} \end{array} \right] \left[ \begin{array}{c} 2 \\ 1 \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \end{array} \right]\ge & {} \left[ \begin{array}{c} 1 \\ 1 \end{array} \right] , \end{aligned}$$

which implies

$$\begin{aligned} \left[ \begin{array}{c} 2w_{11} \\ 2w_{21} \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \end{array} \right]\ge & {} \left[ \begin{array}{c} 1 \\ 1 \end{array} \right] , \\ \left[ \begin{array}{c} 2w_{12} \\ 2w_{22} \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \end{array} \right]\ge & {} \left[ \begin{array}{c} 1 \\ 1 \end{array} \right] , \\ \left[ \begin{array}{c} 2w_{11}+w_{12} \\ 2w_{21}+w_{22} \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \end{array} \right]\ge & {} \left[ \begin{array}{c} 1 \\ 1 \end{array} \right] . \end{aligned}$$

Again, for the negative points, we have \(\left( -1,0\right) \), \(\left( 0,-1\right) ,\left( -1,-1/2\right) \) and

$$\begin{aligned} \left[ \begin{array}{cc} w_{11} &{} w_{12} \\ w_{21} &{} w_{22} \end{array} \right] \left[ \begin{array}{c} -1 \\ 0 \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \end{array} \right]\le & {} \left[ \begin{array}{c} -1 \\ -1 \end{array} \right] , \\ \left[ \begin{array}{cc} w_{11} &{} w_{12} \\ w_{21} &{} w_{22} \end{array} \right] \left[ \begin{array}{c} 0 \\ -1 \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \end{array} \right]\le & {} \left[ \begin{array}{c} -1 \\ -1 \end{array} \right] , \\ \left[ \begin{array}{cc} w_{11} &{} w_{12} \\ w_{21} &{} w_{22} \end{array} \right] \left[ \begin{array}{c} -1 \\ -1/2 \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \end{array} \right]\le & {} \left[ \begin{array}{c} -1 \\ -1 \end{array} \right] , \end{aligned}$$

which gives

$$\begin{aligned} \left[ \begin{array}{c} -w_{11} \\ -w_{21} \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \end{array} \right]\le & {} \left[ \begin{array}{c} -1 \\ -1 \end{array} \right] , \\ \left[ \begin{array}{c} -w_{12} \\ -w_{22} \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \end{array} \right]\le & {} \left[ \begin{array}{c} -1 \\ -1 \end{array} \right] , \\ \left[ \begin{array}{c} -w_{11}-\frac{1}{2}w_{12} \\ -w_{21}-\frac{1}{2}w_{22} \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \end{array} \right]\le & {} \left[ \begin{array}{c} -1 \\ -1 \end{array} \right] . \end{aligned}$$

From the above inequalities, we get

$$\begin{aligned} W=\left[ \begin{array}{cc} \frac{2}{3} &{} \frac{2}{3} \\ \frac{2}{3} &{} \frac{2}{3} \end{array} \right] \quad \text{ and } \quad B=\left[ \begin{array}{c} -\frac{1}{3} \\ -\frac{1}{3} \end{array} \right] . \end{aligned}$$

Thus we get

$$\begin{aligned} \min _{\mathbf {w}_{i}\in W}G\left( \mathbf {w}_{i}\right) =\left( \frac{2\sqrt{2}}{3} ,\frac{2\sqrt{2}}{3}\right) . \end{aligned}$$

Hence we get \(\mathbf {w}=(\frac{2}{3},\frac{2}{3})\) that minimizes \(G\left( \mathbf {w}_{i}\right) \) for \(i=1,2\).
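The same solver-based check works for this linearly separable situation, with \(\varPhi \) taken as the identity; again the row-norm scalarisation below is just one convenient choice, not the only possible formulation.

```python
import cvxpy as cp
import numpy as np

pos = np.array([[2, 0], [0, 2], [2, 1]], dtype=float)
neg = np.array([[-1, 0], [0, -1], [-1, -0.5]], dtype=float)

W = cp.Variable((2, 2))
B = cp.Variable(2)
constraints = [W @ x + B >= 1 for x in pos] + [W @ x + B <= -1 for x in neg]
cp.Problem(cp.Minimize(sum(cp.norm(W[i, :]) for i in range(2))), constraints).solve()
# Each row of W should come out with norm close to 2*sqrt(2)/3, matching the text.
```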

Situation 2. We consider the data (1, 0), (0, 1), (1/2, 1) as the positive class and the data \(\left( -4,0\right) ,\left( 0,-4\right) ,(-2,-4)\) as the negative class, as shown in Fig. 13.

Fig. 13 The data separation for situation 2

Now, for the positive points of Situation 2, we have (1, 0), (0, 1), (1/2, 1) and

$$\begin{aligned} \left[ \begin{array}{cc} w_{11} &{} w_{12} \\ w_{21} &{} w_{22} \end{array} \right] \left[ \begin{array}{c} 1 \\ 0 \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \end{array} \right]\ge & {} \left[ \begin{array}{c} 1 \\ 1 \end{array} \right] , \\ \left[ \begin{array}{cc} w_{11} &{} w_{12} \\ w_{21} &{} w_{22} \end{array} \right] \left[ \begin{array}{c} 0 \\ 1 \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \end{array} \right]\ge & {} \left[ \begin{array}{c} 1 \\ 1 \end{array} \right] , \\ \left[ \begin{array}{cc} w_{11} &{} w_{12} \\ w_{21} &{} w_{22} \end{array} \right] \left[ \begin{array}{c} \frac{1}{2} \\ 1 \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \end{array} \right]\ge & {} \left[ \begin{array}{c} 1 \\ 1 \end{array} \right] , \end{aligned}$$

which gives

$$\begin{aligned} \left[ \begin{array}{c} w_{11} \\ w_{21} \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \end{array} \right]\ge & {} \left[ \begin{array}{c} 1 \\ 1 \end{array} \right] , \\ \left[ \begin{array}{c} w_{12} \\ w_{22} \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \end{array} \right]\ge & {} \left[ \begin{array}{c} 1 \\ 1 \end{array} \right] , \\ \left[ \begin{array}{c} \frac{1}{2}w_{11}+w_{12} \\ \frac{1}{2}w_{21}+w_{22} \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \end{array} \right]\ge & {} \left[ \begin{array}{c} 1 \\ 1 \end{array} \right] . \end{aligned}$$

For the negative points in this case, we have

$$\begin{aligned} \left[ \begin{array}{cc} w_{11} &{} w_{12} \\ w_{21} &{} w_{22} \end{array} \right] \left[ \begin{array}{c} -4 \\ 0 \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \end{array} \right]\le & {} \left[ \begin{array}{c} -1 \\ -1 \end{array} \right] , \\ \left[ \begin{array}{cc} w_{11} &{} w_{12} \\ w_{21} &{} w_{22} \end{array} \right] \left[ \begin{array}{c} 0 \\ -4 \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \end{array} \right]\le & {} \left[ \begin{array}{c} -1 \\ -1 \end{array} \right] , \\ \left[ \begin{array}{cc} w_{11} &{} w_{12} \\ w_{21} &{} w_{22} \end{array} \right] \left[ \begin{array}{c} -2 \\ -4 \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \end{array} \right]\le & {} \left[ \begin{array}{c} -1 \\ -1 \end{array} \right] , \end{aligned}$$

which gives

$$\begin{aligned} \left[ \begin{array}{c} -4w_{11} \\ -4w_{21} \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \end{array} \right]\le & {} \left[ \begin{array}{c} -1 \\ -1 \end{array} \right] , \\ \left[ \begin{array}{c} -4w_{12} \\ -4w_{22} \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \end{array} \right]\le & {} \left[ \begin{array}{c} -1 \\ -1 \end{array} \right] , \\ \left[ \begin{array}{c} -2w_{11}-4w_{12} \\ -2w_{21}-4w_{22} \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \end{array} \right]\le & {} \left[ \begin{array}{c} -1 \\ -1 \end{array} \right] . \end{aligned}$$

Thus, we obtain that

$$\begin{aligned} W=\left[ \begin{array}{cc} \frac{2}{5} &{} \frac{2}{5} \\ \frac{2}{5} &{} \frac{2}{5} \end{array} \right] \quad \text{ and } \quad B=\left[ \begin{array}{c} \frac{3}{5} \\ \frac{3}{5} \end{array} \right] . \end{aligned}$$

Thus we get

$$\begin{aligned} \min _{i\in \{1,2\}} \ G\left( \mathbf {w}_{i}\right) =\left( \frac{2\sqrt{2}}{5 },\frac{2\sqrt{2}}{5}\right) . \end{aligned}$$

Hence we get \(\mathbf {w}=(\frac{2}{5},\frac{2}{5})\), which minimizes \(G\left( \mathbf {w}_{i}\right) \) for \(i=1,2.\)

In Situation 3, we combine these two groups of data. Now, we have the data \(\left( 2,0\right) ,\left( 0,2\right) ,\left( 2,1\right) ,(1,0),(0,1),(1/2,1)\) as the positive class and \(\left( -1,0\right) \), \(\left( 0,-1\right) \), \(\left( -1,-1/2\right) \), \(\left( -4,0\right) \), \(\left( 0,-4\right) \), \((-2,-4)\) as the negative class; see Fig. 14.

Fig. 14 The data separation for situation 3

For the positive points of the combined data, the margin constraints are active (hold with equality) at the support vectors (1, 0) and (0, 1), so we have

$$\begin{aligned} \left[ \begin{array}{cc} w_{11} &{} w_{12} \\ w_{21} &{} w_{22} \end{array} \right] \left[ \begin{array}{c} 1 \\ 0 \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \end{array} \right] =\left[ \begin{array}{c} 1 \\ 1 \end{array} \right] , \end{aligned}$$

and

$$\begin{aligned} \left[ \begin{array}{cc} w_{11} &{} w_{12} \\ w_{21} &{} w_{22} \end{array} \right] \left[ \begin{array}{c} 0 \\ 1 \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \end{array} \right] =\left[ \begin{array}{c} 1 \\ 1 \end{array} \right] , \end{aligned}$$

which gives

$$\begin{aligned} \left[ \begin{array}{c} w_{11} \\ w_{21} \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \end{array} \right] =\left[ \begin{array}{c} 1 \\ 1 \end{array} \right] \quad \text{ and } \quad \left[ \begin{array}{c} w_{12} \\ w_{22} \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \end{array} \right] =\left[ \begin{array}{c} 1 \\ 1 \end{array} \right] . \end{aligned}$$

Similarly, for the negative points, the constraints are active at the support vectors \(\left( -1,0\right) \) and \(\left( 0,-1\right) \), and we have

$$\begin{aligned} \left[ \begin{array}{cc} w_{11} &{} w_{12} \\ w_{21} &{} w_{22} \end{array} \right] \left[ \begin{array}{c} -1 \\ 0 \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \end{array} \right] =\left[ \begin{array}{c} -1 \\ -1 \end{array} \right] , \end{aligned}$$

and

$$\begin{aligned} \left[ \begin{array}{cc} w_{11} &{} w_{12} \\ w_{21} &{} w_{22} \end{array} \right] \left[ \begin{array}{c} 0 \\ -1 \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \end{array} \right] =\left[ \begin{array}{c} -1 \\ -1 \end{array} \right] , \end{aligned}$$

which gives

$$\begin{aligned} \left[ \begin{array}{c} -w_{11} \\ -w_{21} \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \end{array} \right] =\left[ \begin{array}{c} -1 \\ -1 \end{array} \right] \quad \text{ and } \quad \left[ \begin{array}{c} -w_{12} \\ -w_{22} \end{array} \right] +\left[ \begin{array}{c} b_{1} \\ b_{2} \end{array} \right] =\left[ \begin{array}{c} -1 \\ -1 \end{array} \right] . \end{aligned}$$

From this, we obtain that

$$\begin{aligned} W=\left[ \begin{array}{cc} 1 &{} 1 \\ 1 &{} 1 \end{array} \right] \quad \text{ and } \quad B=\left[ \begin{array}{c} 0 \\ 0 \end{array} \right] . \end{aligned}$$

Thus we get

$$\begin{aligned} \min _{i\in \{1,2\}} \ G\left( \mathbf {w}_{i}\right) =(\sqrt{2},\sqrt{2} ). \end{aligned}$$

Hence we get \(\mathbf {w}=(1,1)\), which minimizes \(G\left( \mathbf {w}_{i}\right) \) for \(i=1,2.\)

The problem of GSVM defined in (10) is equivalent to

$$\begin{aligned} \text{ find } \quad \mathbf {w}_{i}\in W:\ \left\langle G^{\prime }\left( \mathbf {w}_{i}\right) ,\mathbf {v}-\mathbf {w}_{i}\right\rangle \ge 0 \quad \text{ for } \text{ all } \quad \mathbf {v}\in \mathbb {R}^{p} \quad \text{ with } \quad \eta \ge 0. \end{aligned}$$
(11)

Hence the GSVM problem reduces to a generalized variational inequality problem.

Note that if we take \(G^{\prime }\left( \mathbf {w}_{i}\right) = \frac{\mathbf {w}_{i}}{\left\| \mathbf {w}_{i}\right\| }\), then from (11), we obtain

$$\begin{aligned} \text{ find } \quad \mathbf {w}_{i}\in W: \ \ \left\langle \mathbf {w}_{i},\mathbf {v-w}_{i}\right\rangle \ge 0 \quad \text{ for } \text{ all } \quad \mathbf {v}\in \mathbb {R}^{p} \quad \text{ with } \quad \eta \ge 0, \end{aligned}$$
(12)

or

$$\begin{aligned} \text{ find } \quad \mathbf {w}_{i}\in W: \ \ \left\langle \mathbf {w}_{i},\mathbf {v}\right\rangle \ge \left\| \mathbf {w}_{i}\right\| ^{2} \quad \text{ for } \text{ all } \quad \mathbf {v}\in \mathbb {R}^{p} \quad \text{ with } \quad \eta \ge 0. \end{aligned}$$
(13)

We now study sufficient conditions for the existence of solutions of the GSVM problem.

Proposition 1

Let \(G: \mathbb {R}^{p}\rightarrow \mathbb {R}_{+}^{p}\) be a differentiable operator. An element \(\mathbf {w}^{*}\in \mathbb {R}^{p}\) minimizes G if and only if \(G^{\prime }\left( \mathbf {w}^{*}\right) =\mathbf {0}\), that is, \(\mathbf {w}^{*}\in \mathbb {R}^{p}\) solves GSVM if and only if \(G^{\prime }\left( \mathbf {w}^{*}\right) =\mathbf {0}.\)

Proof

Let \(G^{\prime }\left( \mathbf {w}^{*}\right) =\mathbf {0}\). Then, for all \(\mathbf {v}\in \mathbb {R}^{p}\) with \(\eta =\mathbf {y}_{k}\left( \mathbf {\zeta }K(\mathbf {x}_{j},\mathbf {x})+B\right) -1\ge 0\), we have

$$\begin{aligned}<G^{\prime }\left( \mathbf {w}^{*}\right) ,\mathbf {v}-\mathbf {w}^{*}> \ =\ <0,\mathbf {v}-\mathbf {w}^{*}>\ =\ \ 0, \end{aligned}$$

and consequently, the inequality

$$\begin{aligned} <G^{\prime }\left( \mathbf {w}^{*}\right) ,\mathbf {v}-\mathbf {w}^{*}> \ \ge \ 0 \end{aligned}$$

holds for all \(\mathbf {v}\in \mathbb {R}^{p}.\) Hence \(\mathbf {w}^{*}\in \mathbb {R}^{p}\) solves the GSVM problem.

Conversely, assume that \(\mathbf {w}^{*}\in \mathbb {R}^{p}\) satisfies

$$\begin{aligned} <G^{\prime }\left( \mathbf {w}^{*}\right) ,\mathbf {v}-\mathbf {w}^{*}> \ \ge 0 \ \ \forall \ \mathbf {v}\in \mathbb {R}^{p}\quad \text{ such } \text{ that } \quad \eta \ge 0. \end{aligned}$$

Taking \(\mathbf {v}=\mathbf {w}^{*}-G^{\prime }\left( \mathbf {w}^{*}\right) \) in the above inequality implies that

$$\begin{aligned} <G^{\prime }\left( \mathbf {w}^{*}\right) ,-G^{\prime }\left( \mathbf {w} ^{*}\right) >\ \ge \ 0, \end{aligned}$$

which further implies

$$\begin{aligned} -||G^{\prime }(\mathbf {w}^{*})||^{2}\ \ge \ 0, \end{aligned}$$

and we get \(G^{\prime }(\mathbf {w}^{*})=\mathbf {0}.\) \(\square \)

Remark 1

Note that if \(G^{\prime }\left( \mathbf {w}^{*}\right) =\mathbf {0}\) at some \(\mathbf {w}^{*}\in \mathbb {R}^{p}\), then we obtain \(\frac{\mathbf {w}^{*}}{\left\| \mathbf {w}^{*}\right\| }=\mathbf {0}\), which implies \(\mathbf {w}^{*}=\mathbf {0}.\) Thus it follows from Proposition 1 that if \(G^{\prime }\left( \mathbf {w}^{*}\right) =\mathbf {0}\) at some \(\mathbf {w}^{*}\in \mathbb {R}^{p}\), then \(\mathbf {w}^{*}=\mathbf {0}\) solves the GSVM problem.

Remark 2

If \(\mathbf {w}^{*}=\mathbf {0}\), then from (8), we obtain

$$\begin{aligned} \sum _{j}\alpha _{j}^{(*)}\varPhi \left( \mathbf {x}_{j}\right) =\mathbf {0,} \end{aligned}$$

which implies

$$\begin{aligned} \sum _{j}\alpha _{j}^{(*)}\varPhi \left( \mathbf {x}_{j}\right) \varPhi \left( \mathbf {x}\right) =0\mathbf {,} \end{aligned}$$

that is

$$\begin{aligned} \sum _{j}\alpha _{j}^{(*)}K\left( \mathbf {x}_{j},\mathbf {x}\right) =0 \mathbf {.} \end{aligned}$$
(14)

Since \(\alpha _{j}^{(*)}>0\) for all j, we have

$$\begin{aligned} K\left( \mathbf {x}_{j},\mathbf {x}\right) =0. \end{aligned}$$

Definition 2

Let K be a closed and convex subset of \(\mathbb {R}^{n}\). Then, for every point \(\mathbf {x}\in \mathbb {R}^{n}\), there exists a unique nearest point in K, denoted by \(P_{K}\left( \mathbf {x}\right) \), such that \(\left\| \mathbf {x}-P_{K}\left( \mathbf {x}\right) \right\| \le \left\| \mathbf {x}-\mathbf {y}\right\| \) for all \(\mathbf {y}\in K\); note also that \(P_{K}\left( \mathbf {x}\right) =\mathbf {x}\) if \(\mathbf {x}\in K\). The map \(P_{K}\) is called the metric projection of \(\mathbb {R}^{n}\) onto K. It is well known that \(P_{K}: \mathbb {R}^{n}\rightarrow K\) is characterized by the following properties (a small numerical illustration is given after the list):

  (i) \(P_{K}\left( \mathbf {x}\right) =\mathbf {z}\) for \(\mathbf {x}\in \mathbb {R}^{n}\) if and only if \(<\mathbf {z}-\mathbf {x},\mathbf {y}-\mathbf {z}>\ \ge \ 0\) for all \(\mathbf {y}\in K\);

  (ii) for every \(\mathbf {x},\mathbf {y}\in \mathbb {R}^{n}\), \(\left\| P_{K}\left( \mathbf {x}\right) -P_{K}\left( \mathbf {y}\right) \right\| ^{2}\ \le \ <\mathbf {x}-\mathbf {y},P_{K}\left( \mathbf {x}\right) -P_{K}\left( \mathbf {y}\right)>\);

  (iii) \(\left\| P_{K}\left( \mathbf {x}\right) -P_{K}\left( \mathbf {y}\right) \right\| \le \left\| \mathbf {x}-\mathbf {y}\right\| \) for every \(\mathbf {x},\mathbf {y}\in \mathbb {R}^{n}\), that is, \(P_{K}\) is a nonexpansive map.
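The following small Python illustration is only a sketch, with a box chosen as the closed convex set K; it shows the projection and a numerical check of the nonexpansiveness property (iii).

```python
import numpy as np

def project_box(x, lower, upper):
    """Metric projection P_K onto the box K = [lower, upper]^n, a closed convex set."""
    return np.clip(x, lower, upper)

# Property (iii), nonexpansiveness: ||P_K(x) - P_K(y)|| <= ||x - y||.
rng = np.random.default_rng(0)
x, y = rng.normal(size=5), rng.normal(size=5)
lhs = np.linalg.norm(project_box(x, 0.0, 1.0) - project_box(y, 0.0, 1.0))
rhs = np.linalg.norm(x - y)
assert lhs <= rhs + 1e-12
```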

Proposition 2

Let \(G: \mathbb {R}^{p}\rightarrow \mathbb {R}_{+}^{p}\) be a differentiable operator. An element \(\mathbf {w}^{*}\in \mathbb {R}^{p}\) solves the GSVM problem (11) if and only if \(\mathbf {w}^{*}\) is a fixed point of the map

$$\begin{aligned} P_{\mathbb {R}_{+}^{p}}\left( I-\rho G^{\prime }\right) : \mathbb {R}^{p}\rightarrow \mathbb {R}_{+}^{p} \quad \text{ for } \text{ any } \quad \rho >0, \end{aligned}$$

that is,

$$\begin{aligned} \mathbf {w}^{*}= & {} P_{\mathbb {R}_{+}^{p}}\left( I-\rho G^{\prime }\right) (\mathbf {w}^{*}) \\= & {} P_{\mathbb {R}_{+}^{p}}\left( \mathbf {w}^{*}-\rho G^{\prime }\left( \mathbf {w}^{*}\right) \right) , \end{aligned}$$

where \(P_{\mathbb {R}_{+}^{p}}\) is a projection map from \(\mathbb {R}^{p}\) to \(\mathbb {R}_{+}^{p}\) and \(\eta =\mathbf {y}_{k}\left( \mathbf {\zeta }K(\mathbf {x}_{j},\mathbf {x)}+B\right) -1\ge 0.\)

Proof

Suppose \(\mathbf {w}^{*}\in \mathbb {R}_{+}^{p}\) is a solution of GSVM. Then, for \(\eta =\mathbf {y}_{k}\left( \mathbf {\zeta }K(\mathbf {x}_{j},\mathbf {x})+B\right) -1\ge 0\), we have

$$\begin{aligned} <G^{\prime }\left( \mathbf {w}^{*}\right) ,\mathbf {w}-\mathbf {w}^{*}>\ \ge \ 0\quad \text{ for } \text{ all } \quad \mathbf {w}\in \mathbb {R}^{p}. \end{aligned}$$

Adding \(< \mathbf {w}^{*},\mathbf {w}-\mathbf {w}^{*}>\) to both sides, we get

$$\begin{aligned}< {\mathbf {w}}^{*},{\mathbf {w}}-{\mathbf {w}}^{*}> +< G^{\prime }\left( {\mathbf {w}}^{*}\right) ,{\mathbf {w}}-{\mathbf {w}}^{*}> \ge \ < {\mathbf {w}}^{*},{\mathbf {w}}-{\mathbf {w}}^{*} > \quad \text {for all} \quad {\mathbf {w}}\in \mathbb {R}^{p}, \end{aligned}$$

which further implies that

$$\begin{aligned} <\mathbf {w}^{*}-\left( \mathbf {w}^{*}-G^{\prime }\left( \mathbf {w} ^{*}\right) \right) ,\mathbf {w}-\mathbf {w}^{*}>\ \ge \ 0 \quad \text{ for } \text{ all } \quad \mathbf {w}\in \mathbb {R}^{p}, \end{aligned}$$

which is possible only if \(\mathbf {w}^{*}=P_{\mathbb {R}_{+}^{p}}\left( \mathbf {w}^{*}-\rho G^{\prime } \left( \mathbf {w}^{*}\right) \right) \), that is, \(\mathbf {w}^{*}\) is a fixed point of the map \(P_{\mathbb {R}_{+}^{p}}\left( I-\rho G^{\prime }\right) \).

Conversely, let \(\mathbf {w}^{*}=P_{\mathbb {R}_{+}^{p}}\left( \mathbf {w}^{*}-\rho G^{\prime } \left( \mathbf {w}^{*}\right) \right) \) with \(\eta =\mathbf {y}_{k}\left( \mathbf {\zeta }K(\mathbf {x}_{j},\mathbf {x)}+B\right) -1\ge 0\), then we have

$$\begin{aligned} <\mathbf {w}^{*}-\left( \mathbf {w}^{*}-G^{\prime }\left( \mathbf {w}^{*}\right) \right) ,\mathbf {w}-\mathbf {w}^{*}>\ \ge \ 0 \quad \text{ for } \text{ all } \quad \mathbf {w}\in \mathbb {R}^{p}, \end{aligned}$$

which implies

$$\begin{aligned} <G^{\prime }\left( \mathbf {w}^{*}\right) ,\mathbf {w}-\mathbf {w}^{*}>\ \ge \ 0 \quad \text{ for } \text{ all } \quad \mathbf {w}\in \mathbb {R}^{p}, \end{aligned}$$

and so \(\mathbf {w}^{*}\in \mathbb {R}_{+}^{p}\) is a solution of GSVM. \(\square \)
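Proposition 2 suggests a simple fixed-point (projected-gradient) iteration. The sketch below is only an illustration of that characterization, not a complete GSVM solver: the constraint \(\eta \ge 0\) is ignored, the step size rho and the stopping rule are arbitrary choices of ours, and grad_G stands for \(G^{\prime }\).

```python
import numpy as np

def project_nonneg(w):
    """Metric projection of R^p onto the nonnegative orthant R^p_+."""
    return np.maximum(w, 0.0)

def fixed_point_iteration(grad_G, w0, rho=0.1, tol=1e-8, max_iter=1000):
    """Iterate w <- P_{R^p_+}(w - rho * G'(w)) until it stabilizes, cf. Proposition 2."""
    w = np.asarray(w0, dtype=float)
    for _ in range(max_iter):
        w_next = project_nonneg(w - rho * grad_G(w))
        if np.linalg.norm(w_next - w) < tol:
            return w_next
        w = w_next
    return w
```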

3 Conclusion

We have studied linear and nonlinear data classification using the support vector machine and the generalized support vector machine. We also established sufficient conditions for the existence of solutions of the generalized support vector machine and presented several examples to support these results.