Abstract
The support vector machine for linear and nonlinear classification of data is studied. The notion of a generalized support vector machine for data classification is used. The problem of the generalized support vector machine is shown to be equivalent to a generalized variational inequality problem, and various results on the existence of solutions are established. Moreover, examples supporting the results are provided.
Keywords
- Linear and nonlinear classification
- Support vector machine
- Generalized support vector machine
- Kernel function
1 Support Vector Machine
Support vector machines (SVM) [2, 3, 13, 14, 18], developed by Vapnik et al. (1995), are gaining popularity due to many attractive features. As a very powerful tool for data classification and regression, the SVM has been used in many fields, such as text classification [5], facial expression recognition [9], gene analysis [4] and many others [1, 6,7,8, 10,11,12, 17, 19,20,21,22]. Recently, it has been used for fault classification in a water level control system [15], and an SVM-based fault classifier has been used to diagnose faults in a water level control process [16].
Classification problems can be restricted to two-class problems without loss of generality. The goal of support vector classification (SVC) is to separate the two classes by a hyperplane that also works well on unseen examples. The method finds the optimal hyperplane that maximizes the margin between the two classes of data. A set of data is said to be optimally separated by a hyperplane if it is separated without error and the distance between the hyperplane and the closest data points is maximal. Support vector classification can thus be thought of as a process that uses given data to find a decision plane with good predictive performance on unseen data; finding this decision plane is a quadratic programming problem.
In this paper, we study the problems of the support vector machine and the generalized support vector machine. We establish sufficient conditions for the existence of solutions of generalized support vector machine problems and present various examples supporting these results.
Throughout this paper, by \(\mathbb {N}\), \(\mathbb {R}\), \(\mathbb {R}^{n}\) and \(\mathbb {R}_{+}^{n}\) we denote the set of all natural numbers, the set of all real numbers, the set of all n-tuples of real numbers, and the set of all n-tuples of nonnegative real numbers, respectively.
Also, we consider \(\left\| \cdot \right\| \) and \(<\cdot ,\cdot>\) as the Euclidean norm and the usual inner product on \(\mathbb {R}^{n}\), respectively, that is, \(<\mathbf {x},\mathbf {y}>=\mathbf {x}\cdot \mathbf {y} =x_{1}y_{1}+x_{2}y_{2}+\cdots +x_{n}y_{n}\) for all \(\mathbf {x}=\left( x_{1},x_{2},\ldots ,x_{n}\right) \), \(\mathbf {y}=\left( y_{1},y_{2},\ldots ,y_{n}\right) \) in \(\mathbb {R}^{n}.\) Furthermore, for any two vectors \(\mathbf {x,y}\in \mathbb {R}^{n}\), we say that \(\mathbf {x\le y}\) if and only if \(x_{i}\le y_{i}\) for all \(i\in \{1,2,\ldots ,n\}\), where \(x_{i}\) and \(y_{i}\) are the components of \(\mathbf {x}\) and \(\mathbf {y}\), respectively.
1.1 Data Classification
In practice, complex real-world applications are often not linearly separable. Kernel representations offer an alternative solution by projecting the data into a higher dimensional feature space, increasing the computational power of the linear learning machine.
In order to learn linear or non-linear relations with a linear machine, a set of non-linear features is selected. This is equivalent to applying a fixed non-linear mapping function \(\varPhi \) that transforms data in input space X to data in feature space \(\digamma \), in which the linear machine can be used. For this classification, both spaces X and \(\digamma \) need to be vector spaces, whose dimensions may or may not be the same. When the given data is linearly separable, we take \(\varPhi \) to be the identity operator. For binary classification of data, we consider the decision function \(f:\mathbb {R}^{n}\rightarrow \mathbb {R}\), where the input \(\mathbf {x}=(x_{1},\ldots ,x_{n})\) is assigned to the positive class if \(f(\mathbf {x})\ge 0\) and otherwise to the negative class. The decision function is defined as
$$\begin{aligned} f(\mathbf {x})=\left\langle \mathbf {w},\varPhi \left( \mathbf {x}\right) \right\rangle +b. \end{aligned}$$
(1)
This means the non-linear machine is built in two steps: first, a fixed non-linear mapping transforms the data into a feature space; then a linear machine is used to classify the data in the feature space.
In addition, the vector \(\mathbf {w}\) is a linear combination of the support vectors in the training data and can be written as
$$\begin{aligned} \mathbf {w}=\sum _{i}\alpha _{i}y_{i}\varPhi \left( \mathbf {x}_{i}\right) , \end{aligned}$$
(2)
where each \(\alpha _{i}\) is the Lagrange multiplier of the corresponding support vector.
So the decision function can be rewritten as
$$\begin{aligned} f(\mathbf {x})=\sigma \left( \sum _{i}\alpha _{i}y_{i}\left\langle \varPhi \left( \mathbf {x}_{i}\right) ,\varPhi \left( \mathbf {x}\right) \right\rangle +b\right) , \end{aligned}$$
(3)
where \(\sigma \) is the sign function.
The kernel K is associated with the mapping \(\varPhi \); it takes two inputs and gives their similarity in the feature space \(\digamma \), that is, \(K:X\times X\rightarrow \mathbb {R}\) is defined as
$$\begin{aligned} K(\mathbf {x}_{i},\mathbf {x})=\left\langle \varPhi \left( \mathbf {x}_{i}\right) ,\varPhi \left( \mathbf {x}\right) \right\rangle . \end{aligned}$$
Thus, the decision function from (3) becomes
$$\begin{aligned} f(\mathbf {x})=\sigma \left( \sum _{i}\alpha _{i}y_{i}K(\mathbf {x}_{i},\mathbf {x})+b\right) . \end{aligned}$$
(4)
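The kernelized decision function \(f(\mathbf {x})=\sigma \left( \sum _{i}\alpha _{i}y_{i}K(\mathbf {x}_{i},\mathbf {x})+b\right) \) can be sketched in a few lines of Python. The support vectors, multipliers and bias below are hypothetical illustrations, not values from the paper:

```python
def linear_kernel(xi, x):
    # K(x_i, x) = x_i . x
    return sum(a * b for a, b in zip(xi, x))

def decision(x, support_vectors, labels, alphas, b, kernel):
    # f(x) = sigma( sum_i alpha_i * y_i * K(x_i, x) + b ), sigma = sign
    s = sum(a * y * kernel(sv, x)
            for a, y, sv in zip(alphas, labels, support_vectors)) + b
    return 1 if s >= 0 else -1

# hypothetical support vectors with unit multipliers and zero bias
svs, ys, alphas, b = [(1.0, 0.0), (-1.0, 0.0)], [1, -1], [1.0, 1.0], 0.0
```

Swapping `linear_kernel` for any other kernel leaves `decision` unchanged; only the similarity measure varies.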
Some useful kernels for real-valued vectors are defined below:

(I) Linear kernel
$$\begin{aligned} K(\mathbf {x}_{i},\mathbf {x})=\mathbf {x}_{i}\cdot \mathbf {x}. \end{aligned}$$

(II) Polynomial kernel (of degree p)
$$\begin{aligned} K(\mathbf {x}_{i},\mathbf {x})=\left( \mathbf {x}_{i}\cdot \mathbf {x}\right) ^{p}\quad \text {or}\quad \left( \mathbf {x}_{i}\cdot \mathbf {x}+1\right) ^{p}, \end{aligned}$$
where p is a tunable parameter.

(III) Radial Basis Function (RBF) kernel
$$\begin{aligned} K(\mathbf {x}_{i},\mathbf {x})=\exp [-\gamma ||\mathbf {x}_{i}-\mathbf {x}||^{2}], \end{aligned}$$
where \(\gamma \) is a hyperparameter (also called the kernel bandwidth). The RBF kernel corresponds to an infinite-dimensional feature space.

(IV) Sigmoid kernel
$$\begin{aligned} K(\mathbf {x}_{i},\mathbf {x})=\tanh \left( k\mathbf {x}_{i}\cdot \mathbf {x}+\theta \right) , \end{aligned}$$
where k is a scalar and \(\theta \) is the displacement.

(V) Inverse multiquadric kernel
$$\begin{aligned} K(\mathbf {x}_{i},\mathbf {x})=\left( \left\| \mathbf {x}_{i}-\mathbf {x}\right\| ^{2}+\gamma ^{-2}\right) ^{-1/2}, \end{aligned}$$
where \(\gamma \) is a hyperparameter (also called the kernel bandwidth).
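The five kernels listed above can be written directly in Python using only the standard library; the default parameter values below are illustrative:

```python
import math

def linear(xi, x):
    # (I) K(x_i, x) = x_i . x
    return sum(a * b for a, b in zip(xi, x))

def polynomial(xi, x, p=2, c=0.0):
    # (II) (x_i . x)^p for c = 0, or (x_i . x + 1)^p for c = 1
    return (linear(xi, x) + c) ** p

def rbf(xi, x, gamma=1.0):
    # (III) exp(-gamma * ||x_i - x||^2)
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(xi, x)))

def sigmoid(xi, x, k=1.0, theta=0.0):
    # (IV) tanh(k * x_i . x + theta)
    return math.tanh(k * linear(xi, x) + theta)

def inverse_multiquadric(xi, x, gamma=1.0):
    # (V) (||x_i - x||^2 + gamma^-2)^(-1/2)
    return (sum((a - b) ** 2 for a, b in zip(xi, x)) + gamma ** -2) ** -0.5
```

Each function is symmetric in its two arguments, as a kernel must be.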
Now, from (1), we define the functional margin of an example \(\left( \varPhi \left( \mathbf {x}_{i}\right) ,y_{i}\right) \) with respect to a hyperplane \(\left( \mathbf {w},b\right) \) to be the quantity
$$\begin{aligned} \gamma _{i}=y_{i}\left( \left\langle \mathbf {w},\varPhi \left( \mathbf {x}_{i}\right) \right\rangle +b\right) , \end{aligned}$$
where \(y_{i}\in \{-1,1\}.\) Note that \(\gamma _{i}>0\) implies correct classification of \(\left( \mathbf {x}_{i},y_{i}\right) .\) If we replace functional margin by geometric margin we obtain the equivalent quantity for the normalized linear function \((\frac{1}{\left\| \mathbf {w}\right\| } \mathbf {w},\frac{1}{\left\| \mathbf {w}\right\| }b)\), which therefore measures the Euclidean distances of the points from the decision boundary in the input space.
The geometric margin can be written as
$$\begin{aligned} \tilde{\gamma }_{i}=\frac{\gamma _{i}}{\left\| \mathbf {w}\right\| }. \end{aligned}$$
To find the hyperplane with maximal geometric margin for a training set S means to find the maximal \(\tilde{\gamma }.\) For convenience, we let \(\gamma =1\), so the objective function can be written as
$$\begin{aligned} \min _{\mathbf {w},b}\ \frac{1}{2}\left\| \mathbf {w}\right\| ^{2}. \end{aligned}$$
Of course, there are constraints for this optimization problem. According to the definition of the margin, we have \(y_{i}\left( \left\langle \mathbf {w},\varPhi \left( \mathbf {x}_{i}\right) \right\rangle +b\right) \ge 1\), \(i=1,\ldots ,l.\) We rewrite the objective function together with the constraints in the equivalent form
$$\begin{aligned} \min _{\mathbf {w},b}\ \frac{1}{2}\left\| \mathbf {w}\right\| ^{2}\quad \text {subject to}\quad y_{i}\left( \left\langle \mathbf {w},\varPhi \left( \mathbf {x}_{i}\right) \right\rangle +b\right) \ge 1,\ i=1,\ldots ,l. \end{aligned}$$
We denote this problem by SVM for data classification.
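The functional and geometric margins, and the constraint \(y_{i}\left( \left\langle \mathbf {w},\varPhi \left( \mathbf {x}_{i}\right) \right\rangle +b\right) \ge 1\), can be checked numerically. The hyperplane \((\mathbf {w},b)\) below is a hypothetical example, not one computed in the paper:

```python
import math

def functional_margin(w, b, x, y):
    # gamma_i = y_i * (<w, x_i> + b); positive iff (x_i, y_i) is classified correctly
    return y * (sum(wj * xj for wj, xj in zip(w, x)) + b)

def geometric_margin(w, b, x, y):
    # functional margin of the normalized hyperplane (w/||w||, b/||w||),
    # i.e. the signed Euclidean distance to the decision boundary
    return functional_margin(w, b, x, y) / math.sqrt(sum(wj * wj for wj in w))

w, b = (2.0, 0.0), 0.0   # hypothetical separating hyperplane
```

Scaling \((\mathbf {w},b)\) changes the functional margin but not the geometric one, which is why the normalization \(\gamma =1\) can be imposed.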
Example 1
Let us take the points \((0,2),(0,-2),(1,1),(1,-1),(-1,1),(-1,-1)\) as the positive class and the points \((2,0),(-2,0),(2,1),(2,-1),(-2,1),(-2,-1)\) as the negative class, shown in Fig. 1.
By using the mapping function \(\varPhi \left( \mathbf {x}\right) =(x_{1}^{2},\sqrt{2}x_{1}x_{2},x_{2}^{2}),\)
which transforms data from the two-dimensional input space to a three-dimensional feature space, we obtain \((1,\sqrt{2},1),\ (1,-\sqrt{2},1)\) and \((0,0,4)\) as the positive class and \((4,2\sqrt{2},1),\ (4,-2\sqrt{2},1)\) and (4, 0, 0) as the negative class, shown in Fig. 2.
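The mapping \(\varPhi \left( \mathbf {x}\right) =(x_{1}^{2},\sqrt{2}x_{1}x_{2},x_{2}^{2})\) used for this example (it is stated explicitly in Example 3) can be verified in Python, together with the identity \(\left\langle \varPhi (\mathbf {x}),\varPhi (\mathbf {z})\right\rangle =(\mathbf {x}\cdot \mathbf {z})^{2}\), which shows that the degree-2 polynomial kernel computes the same similarity without the explicit mapping:

```python
import math

def phi(x):
    # Phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2)
    x1, x2 = x
    return (x1 ** 2, math.sqrt(2) * x1 * x2, x2 ** 2)

def poly2(x, z):
    # degree-2 polynomial kernel: (x . z)^2
    return (x[0] * z[0] + x[1] * z[1]) ** 2

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))
```

For instance, the positive point (0, 2) is mapped to (0, 0, 4) and the negative point (2, 1) to \((4,2\sqrt{2},1)\), exactly as listed above.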
Now by using this data in three dimensional feature space, we consider the following: For positive points, we have
which implies
For negative points, we have
implying that
From these equations, we get \(\mathbf {w}=(-0.6667,0,0)\) with \(\left\| \mathbf {w}\right\| =0.6667\), as shown in Fig. 3.
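This solution can be checked numerically against the hard-margin constraints \(y_{i}\left( \left\langle \mathbf {w},\varPhi (\mathbf {x}_{i})\right\rangle +b\right) \ge 1\) for the mapped points. The bias b is not stated in the text; the value \(b=5/3\) below is an assumption that is consistent with all constraints:

```python
import math

w = (-2 / 3, 0.0, 0.0)   # the w = (-0.6667, 0, 0) above, in exact form
b = 5 / 3                # assumed bias, not given in the text

positive = [(1, math.sqrt(2), 1), (1, -math.sqrt(2), 1), (0, 0, 4)]
negative = [(4, 2 * math.sqrt(2), 1), (4, -2 * math.sqrt(2), 1), (4, 0, 0)]

def margin(x, y):
    # y_i * (<w, x_i> + b), which must be >= 1 for feasibility
    return y * (sum(wi * xi for wi, xi in zip(w, x)) + b)

margins = [margin(x, 1) for x in positive] + [margin(x, -1) for x in negative]
```

Every margin is at least 1, and the support vectors attain it with equality.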
Further, if we use the Radial Basis Function (RBF) kernel \(K(\mathbf {x}_{i},\mathbf {x})=\exp [-\gamma ||\mathbf {x}_{i}-\mathbf {x}||^{2}]\) with \(\gamma =1/3\), we get \(\mathbf {w}=(0.0031,0.0012)\), which is shown in Fig. 4.
Also, if we use the Sigmoid kernel \(K(\mathbf {x}_{i},\mathbf {x})=\tanh \left( k\mathbf {x}_{i}\cdot \mathbf {x}+\theta \right) \) with \(k=1/3\) and \(\theta =2.85\), we get \(\mathbf {w}=\left( 0,0\right) \), shown in Fig. 5.
Example 2
Let us look at another example. The positive data are shown as red squares and the negative data as blue circles in Fig. 6.
It is also a nonlinearly separable problem. Now, if we transform the original data into the feature space by using the mapping function \(\varPhi \left( \mathbf {x}\right) \), we can see that the data in the feature space is linearly separable; see Fig. 7.
Using the Polynomial kernel with \(p=2\), we get \(\mathbf {w}=(-0.4898,-0.1633)\), which is shown in Fig. 8.
Next, if we use the Radial Basis Function (RBF) kernel \(K(\mathbf {x}_{i},\mathbf {x})=\exp [-\gamma ||\mathbf {x}_{i}-\mathbf {x}||^{2}]\) with \(\gamma =2\), we get \(\mathbf {w}=(-0.0016,0.0014)\), as shown in Fig. 9.
Example 3
Consider the points \((0,0),(1,0),\left( -1,0\right) \) as the positive class and the points \((2,0),(3,0),\left( -2,0\right) ,\left( -3,0\right) \) as the negative class; see Fig. 10.
Note that no linear separator exists for this data in the input space. Now, if we use \(\varPhi \left( \mathbf {x}\right) =(x_{1}^{2},\sqrt{2}x_{1}x_{2},x_{2}^{2})\), then it transforms the two-dimensional data into a three-dimensional feature space, where the data can be separated by a hyperplane H, as shown in Fig. 11.
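Since every point in this example has \(x_{2}=0\), each image under \(\varPhi \) collapses to \((x_{1}^{2},0,0)\), so the first feature coordinate alone separates the classes; any threshold between 1 and 4 (say 2.5, a choice of ours) defines a separating hyperplane H:

```python
import math

def phi(x):
    # Phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2)
    x1, x2 = x
    return (x1 ** 2, math.sqrt(2) * x1 * x2, x2 ** 2)

positive = [(0, 0), (1, 0), (-1, 0)]
negative = [(2, 0), (3, 0), (-2, 0), (-3, 0)]

# with x2 = 0, every image is (x1^2, 0, 0): the first feature separates
pos_feat = [phi(p)[0] for p in positive]   # values 0 and 1
neg_feat = [phi(p)[0] for p in negative]   # values 4 and 9
```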
2 Generalized Support Vector Machines
Consider a new control function \(F:\mathbb {R}^{p}\rightarrow \mathbb {R}^{p}\) defined as
where \(W\in \mathbb {R}^{p\times p}\), \(B\in \mathbb {R}^{p}\) are parameters and p is the dimension of the feature space. In addition, W contains the \(\mathbf {w}_{i}\) as its rows, where each \(\mathbf {w}_{i}\) is a linear combination of the support vectors in the feature space and can be written as
where \(\varPhi \) is a mapping that transforms data in input space X to data in feature space \(\digamma \). From (7), we obtain
where \(K(\mathbf {x}_{j},\mathbf {x})\) is the kernel associated with the mapping \(\varPhi .\)
Define
where \(\mathbf {y}_{k}\in \left\{ \left( -1,-1,\ldots ,-1\right) ,\left( 1,1,\ldots ,1\right) \right\} \) is a p-dimensional vector, \(K(\mathbf {x}_{j},\mathbf {x})=\varPhi \left( \mathbf {x}\right) \varPhi \left( \mathbf {x}_{j}\right) \) and \(\mathbf {\zeta }=\left[ \begin{array}{c} \sum _{j}\alpha _{j}^{(1)} \\ \vdots \\ \sum _{j}\alpha _{j}^{(p)} \end{array} \right] \).
Definition 1
We define a map \(G:\mathbb {R}^{p}\rightarrow \mathbb {R}_{+}^{p}\) by
where \(\mathbf {w}_{i}\) are the rows of \(W_{p\times p}\) for \(i=1,2,\ldots ,p\).
Now, the problem is to find \(\mathbf {w}_{i}\in \mathbb {R}^{p}\) that satisfy
where \(\eta =\mathbf {y}_{k}\left( \mathbf {\zeta }K(\mathbf {x}_{j},\mathbf {x})+B\right) -1.\)
We call this problem the Generalized Support Vector Machine (GSVM).
Note that if \(\left[ \begin{array}{c} \sum _{j}\alpha _{j}^{(1)} \\ \vdots \\ \sum _{j}\alpha _{j}^{(p)} \end{array} \right] K(\mathbf {x}_{j},\mathbf {x})=-B\), then \(\eta =-1\) and the GSVM problem has no solution.
Example 4
Consider the data points for the positive and negative classes given in Example 1. Then, by using the polynomial kernel of degree two, we obtain \((1,\sqrt{2},1)\), \((1,-\sqrt{2},1)\), (0, 0, 4) as the vectors of positive data and \((4,2\sqrt{2},1)\), \((4,-2\sqrt{2},1)\), (4, 0, 0) as the vectors of negative data in the feature space. From the positive data points, we have
which gives
Also from negative data points,
which gives
By solving these equations, we get
with smallest norm of \(\mathbf {w}_{i}\)
Hence we get \(\mathbf {w}=(0.667,0,0)\), which minimizes \(G\left( \mathbf {w}_{i}\right) \) for \(i=1,2,3.\)
If the data is linearly separable, then in the GSVM process the map \(\varPhi \) acts as the identity operator. In the next example, we show situations for this case.
Example 5
Let us consider the three categories of data:
Situation 1 Suppose that we have data \(\left( 2,0\right) ,\left( 0,2\right) ,\left( 2,1\right) \) as positive class and data \(\left( -1,0\right) ,\left( 0,-1\right) ,\left( -1,-1/2\right) \) as negative class shown in Fig. 12.
For positive points, we have \(\left( 2,0\right) \), \(\left( 0,2\right) , \) \(\left( 2,1\right) \), so
which implies
Again, for the negative points, we have \(\left( -1,0\right) \), \(\left( 0,-1\right) ,\left( -1,-1/2\right) \) and
which gives
From above equations, we get
Thus we get
Hence we get \(\mathbf {w}=(\frac{2}{3},\frac{2}{3})\), which minimizes \(G\left( \mathbf {w}_{i}\right) \) for \(i=1,2\).
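The solution \(\mathbf {w}=(\frac{2}{3},\frac{2}{3})\) for Situation 1 can be checked against the constraints \(y_{i}\left( \left\langle \mathbf {w},\mathbf {x}_{i}\right\rangle +b\right) \ge 1\) directly on the input data (here \(\varPhi \) is the identity). The bias is not given in the text; \(b=-1/3\) below is an assumption that satisfies every constraint:

```python
w = (2 / 3, 2 / 3)   # the w reported for Situation 1
b = -1 / 3           # assumed bias, not stated in the text

positive = [(2, 0), (0, 2), (2, 1)]
negative = [(-1, 0), (0, -1), (-1, -0.5)]

def margin(x, y):
    # y_i * (<w, x_i> + b) >= 1 for a feasible separating hyperplane
    return y * (w[0] * x[0] + w[1] * x[1] + b)

margins = [margin(x, 1) for x in positive] + [margin(x, -1) for x in negative]
```

The minimum margin equals 1, attained at the support vectors of both classes.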
Situation 2 We consider the data (1, 0), (0, 1), (1/2, 1) as the positive class and the data \(\left( -4,0\right) ,\left( 0,-4\right) ,(-2,-4)\) as the negative class, shown in Fig. 13.
Now, for the positive points of Situation 2, we have (1, 0), (0, 1), (1/2, 1) and
which gives
For negative points for this case, we have
which gives
Thus, we obtain that
Thus we get
Hence we get \(\mathbf {w}=(\frac{2}{5},\frac{2}{5})\), which minimizes \(G\left( \mathbf {w}_{i}\right) \) for \(i=1,2.\)
In the next Situation 3, we combine these two groups of data. Now, we have the data \(\left( 2,0\right) ,\left( 0,2\right) ,\left( 2,1\right) ,(1,0),(0,1),(1/2,1)\) as the positive class and \(\left( -1,0\right) \), \(\left( 0,-1\right) \), \(\left( -1,-1/2\right) \), \(\left( -4,0\right) \), \(\left( 0,-4\right) \), \((-2,-4)\) as the negative class; see Fig. 14.
For the positive points of the combination, we have
and
which gives
For negative points for this case, we have
and
which gives
From this, we obtain that
Thus we get
Hence we get \(\mathbf {w}=(1,1)\), which minimizes \(G\left( \mathbf {w}_{i}\right) \) for \(i=1,2.\)
The problem of GSVM defined in (10) is equivalent to
Hence the problem of GSVM becomes the problem of a generalized variational inequality.
Note that if we take \(G^{\prime }\left( \mathbf {w}_{i}\right) = \frac{\mathbf {w}_{i}}{\left\| \mathbf {w}_{i}\right\| }\), then from (11), we obtain
or
We now study sufficient conditions for the existence of solutions of GSVM problems.
Proposition 1
Let \(G: \mathbb {R}^{p}\rightarrow \mathbb {R}_{+}^{p}\) be a differentiable operator. An element \(\mathbf {w}^{*}\in \mathbb {R}^{p}\) minimizes G if and only if \(G^{\prime }\left( \mathbf {w}^{*}\right) =\mathbf {0}\), that is, \(\mathbf {w}^{*}\in \mathbb {R}^{p}\) solves GSVM if and only if \(G^{\prime }\left( \mathbf {w}^{*}\right) =\mathbf {0}.\)
Proof
Suppose \(G^{\prime }\left( \mathbf {w}^{*}\right) =\mathbf {0}\). Then for all \(\mathbf {v}\in \mathbb {R}^{p}\) with \(\eta =\mathbf {y}_{k}\left( \mathbf {\zeta }K(\mathbf {x}_{j},\mathbf {x})+B\right) -1\ge 0\),
and consequently, the inequality
holds for all \(\mathbf {v}\in \mathbb {R}^{p}.\) Hence \(\mathbf {w}^{*}\in \mathbb {R}^{p}\) solves problem of GSVM.
Conversely, assume that \(\mathbf {w}^{*}\in \mathbb {R}^{p}\) satisfies
Taking \(\mathbf {v}=\mathbf {w}^{*}-G^{\prime }\left( \mathbf {w}^{*}\right) \) in the above inequality implies that
which further implies
and we get \(G^{\prime }(\mathbf {w}^{*})=\mathbf {0}.\) \(\square \)
Remark 1
Note that if \(G^{\prime }\left( \mathbf {w}^{*}\right) =\mathbf {0}\) at some \(\mathbf {w}^{*}\in \mathbb {R}^{p}\), then we obtain \(\frac{\mathbf {w}^{*}}{\left\| \mathbf {w}^{*}\right\| }=\mathbf {0}\), which implies \(\mathbf {w}^{*}=\mathbf {0}.\) Thus it follows from Proposition 1 that if \(G^{\prime }\left( \mathbf {w}^{*}\right) =\mathbf {0}\) at some \(\mathbf {w}^{*}\in \mathbb {R}^{p}\), then \(\mathbf {w}^{*}=\mathbf {0}\) solves the GSVM problem.
Remark 2
If \(\mathbf {w}^{*}=\mathbf {0}\), then from (8), we obtain
which implies
that is
Since \(\alpha _{j}^{(*)}>0\) for all j, so we have
Definition 2
Let K be a closed and convex subset of \(\mathbb {R}^{n}\). Then, for every point \(\mathbf {x}\in \mathbb {R}^{n}\), there exists a unique nearest point in K, denoted by \(P_{K}\left( \mathbf {x}\right) \), such that \(\left\| \mathbf {x}-P_{K}\left( \mathbf {x}\right) \right\| \le \left\| \mathbf {x}-\mathbf {y}\right\| \) for all \(\mathbf {y}\in K\) and also note that \(P_{K}\left( \mathbf {x}\right) = \mathbf {x}\) if \(\mathbf {x}\in K\). \(P_{K}\) is called the metric projection of \(\mathbb {R}^{n}\) onto K. It is well known that \(P_{K}: \mathbb {R}^{n}\rightarrow K\) is characterized by the properties:
(i) \(P_{K}\left( \mathbf {x}\right) =\mathbf {z}\) for \(\mathbf {x}\in \mathbb {R}^{n}\) if and only if \(\left\langle \mathbf {z}-\mathbf {x},\mathbf {y}-\mathbf {z}\right\rangle \ge 0\) for all \(\mathbf {y}\in K\);

(ii) for every \(\mathbf {x},\mathbf {y}\in \mathbb {R}^{n}\), \(\left\| P_{K}\left( \mathbf {x}\right) -P_{K}\left( \mathbf {y}\right) \right\| ^{2}\le \left\langle \mathbf {x}-\mathbf {y},P_{K}\left( \mathbf {x}\right) -P_{K}\left( \mathbf {y}\right) \right\rangle \);

(iii) \(\left\| P_{K}\left( \mathbf {x}\right) -P_{K}\left( \mathbf {y}\right) \right\| \le \left\| \mathbf {x}-\mathbf {y}\right\| \) for every \(\mathbf {x},\mathbf {y}\in \mathbb {R}^{n}\), that is, \(P_{K}\) is a nonexpansive map.
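For \(K=\mathbb {R}_{+}^{p}\), the nonnegative orthant used below, the metric projection is simply componentwise clipping at zero. A minimal Python sketch, with a numerical check of the nonexpansiveness property (iii):

```python
import math

def project_nonneg(x):
    # P_{R^p_+}(x): the nearest point of the nonnegative orthant,
    # obtained by clipping each coordinate at 0
    return tuple(max(0.0, xi) for xi in x)

def dist(u, v):
    # Euclidean distance
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

Points already in \(\mathbb {R}_{+}^{p}\) are fixed by the projection, matching \(P_{K}\left( \mathbf {x}\right) =\mathbf {x}\) for \(\mathbf {x}\in K\).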
Proposition 2
Let \(G: \mathbb {R}^{p}\rightarrow \mathbb {R}_{+}^{p}\) be a differentiable operator. An element \(\mathbf {w}^{*}\in \mathbb {R}^{p}\) minimizes the mapping G defined in (11) if and only if \(\mathbf {w}^{*}\) is a fixed point of the map
that is,
where \(P_{\mathbb {R}_{+}^{p}}\) is the projection map from \(\mathbb {R}^{p}\) onto \(\mathbb {R}_{+}^{p}\) and \(\eta =\mathbf {y}_{k}\left( \mathbf {\zeta }K(\mathbf {x}_{j},\mathbf {x})+B\right) -1\ge 0.\)
Proof
Suppose \(\mathbf {w}^{*}\in \mathbb {R}_{+}^{p}\) is a solution of GSVM. Then for \(\eta =\mathbf {y}_{k}\left( \mathbf {\zeta }K(\mathbf {x}_{j},\mathbf {x})+B\right) -1\ge 0\), we have
Adding \(< \mathbf {w}^{*},\mathbf {w}-\mathbf {w}^{*}>\) on both sides, we get
which further implies that
which is possible only if \(\mathbf {w}^{*}=P_{\mathbb {R}_{+}^{p}}\left( \mathbf {w}^{*}-\rho G^{\prime }\left( \mathbf {w}^{*}\right) \right) \), that is, \(\mathbf {w}^{*}\) is a fixed point of the map \(\mathbf {w}\mapsto P_{\mathbb {R}_{+}^{p}}\left( \mathbf {w}-\rho G^{\prime }\left( \mathbf {w}\right) \right) .\)
Conversely, let \(\mathbf {w}^{*}=P_{\mathbb {R}_{+}^{p}}\left( \mathbf {w}^{*}-\rho G^{\prime }\left( \mathbf {w}^{*}\right) \right) \) with \(\eta =\mathbf {y}_{k}\left( \mathbf {\zeta }K(\mathbf {x}_{j},\mathbf {x})+B\right) -1\ge 0\); then we have
which implies
and so \(\mathbf {w}^{*}\in \mathbb {R}_{+}^{p}\) is the solution of GSVM. \(\square \)
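Proposition 2 suggests a projected fixed-point iteration \(\mathbf {w}\leftarrow P_{\mathbb {R}_{+}^{p}}\left( \mathbf {w}-\rho G^{\prime }\left( \mathbf {w}\right) \right) \). The sketch below uses \(G^{\prime }\left( \mathbf {w}\right) =\mathbf {w}/\left\| \mathbf {w}\right\| \) as in the text; the step size \(\rho \) and tolerances are our own illustrative choices, and the iterate converges to \(\mathbf {w}^{*}=\mathbf {0}\), consistent with Remark 1:

```python
import math

def g_prime(w):
    # G'(w) = w / ||w||, the gradient used in (11); undefined at w = 0
    n = math.sqrt(sum(wi * wi for wi in w))
    return tuple(wi / n for wi in w)

def project_nonneg(w):
    # P_{R^p_+}: clip each coordinate at 0
    return tuple(max(0.0, wi) for wi in w)

def fixed_point_iteration(w, rho=0.1, tol=1e-6, max_iter=1000):
    # w <- P_{R^p_+}(w - rho * G'(w)); a fixed point solves GSVM (Proposition 2)
    for _ in range(max_iter):
        if math.sqrt(sum(wi * wi for wi in w)) < tol:
            break  # reached w = 0, where G' is undefined
        w_new = project_nonneg(tuple(wi - rho * gi
                                     for wi, gi in zip(w, g_prime(w))))
        if max(abs(a - b) for a, b in zip(w, w_new)) < tol:
            break  # fixed point reached
        w = w_new
    return w
```

Starting from \(\mathbf {w}=(1,1)\), each step shrinks the iterate along the diagonal until the projection clips it to the origin.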
3 Conclusion
We have studied linear and nonlinear data classification using the support vector machine and the generalized support vector machine, established sufficient conditions for the existence of solutions of the generalized support vector machine, and presented several examples supporting these results.
References
Adankon, M.M., Cheriet, M.: Model selection for the LS-SVM. Application to handwriting recognition. Pattern Recognit. 42(12), 3264–3270 (2009)
Cortes, C., Vapnik, V.N.: Support-vector networks. Mach. Learn. 20(3), 273–297 (1995)
Cristianini, N., Shawe-Taylor, J.: An Introduction to Support Vector Machines and other Kernel Based Learning Methods. Cambridge University Press, Cambridge (2000)
Guyon, I., Weston, J., Barnhill, S., Vapnik, V.: Gene selection for cancer classification using support vector machines. Mach. Learn. 46(1–3), 389–422 (2002)
Joachims, T.: Text categorization with support vector machines: learning with many relevant features. In: Proceedings of the European Conference on Machine Learning. Springer, Heidelberg (1998)
Khan, N., Ksantini, R., Ahmad, I., Boufama, B.: A novel SVM+NDA model for classification with an application to face recognition. Pattern Recognit. 45(1), 66–79 (2012)
Li, S., Kwok, J.T., Zhu, H., Wang, Y.: Texture classification using the support vector machines. Pattern Recognit. 36(12), 2883–2893 (2003)
Liu, R., Wang, Y., Baba, T., Masumoto, D., Nagata, S.: SVM-based active feedback in image retrieval using clustering and unlabeled data. Pattern Recognit. 41(8), 2645–2655 (2008)
Michel, P., Kaliouby, R.E.: Real time facial expression recognition in video using support vector machines. In: Proceedings of ICMI’03, pp. 258–264 (2003)
Noble, W.S.: Support Vector Machine Applications in Computational Biology. MIT Press, Cambridge (2004)
Shao, Y., Lunetta, R.S.: Comparison of support vector machine, neural network, and CART algorithms for the land-cover classification using limited training data points. ISPRS J. Photogramm. Remote Sens. 70, 78–87 (2012)
Shao, Y.H., Chen, W.J., Deng, N.Y.: Nonparallel hyperplane support vector machine for binary classification problems. Inf. Sci. 263, 22–35 (2014)
Vapnik, V.N.: The Nature of Statistical Learning Theory. Springer, New York (1996)
Vapnik, V.N.: Statistical Learning Theory. Wiley, New York (1998)
Wang, D., Qi, X., Wen, S., Deng, M.: SVM based fault classifier design for a water level control system. In: Proceedings of 2013 International Conference on Advanced Mechatronic Systems, pp. 152–157. Luoyang, China (2013)
Wang, D., Qi, X., Wen, S., Dan, Y., Ouyang, L., Deng, M.: Robust nonlinear control and SVM classifier based fault diagnosis for a water level process. ICIC Express Lett. 5(1), 767–774 (2014)
Wang, X.Y., Wang, T., Bu, J.: Color image segmentation using pixel wise support vector machine classification. Pattern Recognit. 44(4), 777–787 (2011)
Weston, J., Watkins, C.: Multi-class support vector machines. Technical report CSD-TR-98-04, Department of Computer Science, Royal Holloway, University of London (1998)
Wu, Y.C., Lee, Y.-S., Yang, J.-C.: Robust and efficient multiclass SVM models for phrase pattern recognition. Pattern Recognit. 41(9), 2874–2889 (2008)
Xue, Z., Ming, D., Song, W., Wan, B., Jin, S.: Infrared gait recognition based on wavelet transform and support vector machine. Pattern Recognit. 43(8), 2904–2910 (2010)
Zhao, Z., Liu, J., Cox, J.: Safe and efficient screening for sparse support vector machine. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 14, pp. 542–551, New York, NY, USA (2014)
Zuo, R., Carranza, E.J.M.: Support vector machine: a tool for mapping mineral prospectivity. Comput. Geosci. 37(12), 1967–1975 (2011)
Acknowledgements
Talat Nazir and Xiaomin Qi are grateful to the Erasmus Mundus project FUSION for supporting the research visit to Mälardalen University, Sweden, and to the Research environment MAM in Mathematics and Applied Mathematics, Division of Applied Mathematics, the School of Education, Culture and Communication of Mälardalen University for creating an excellent research environment.
© 2016 Springer International Publishing Switzerland
Nazir, T., Qi, X., Silvestrov, S. (2016). Linear and Nonlinear Classifiers of Data with Support Vector Machines and Generalized Support Vector Machines. In: Silvestrov, S., Rančić, M. (eds) Engineering Mathematics II. Springer Proceedings in Mathematics & Statistics, vol 179. Springer, Cham. https://doi.org/10.1007/978-3-319-42105-6_18