1 Introduction

The support vector machine (SVM) is a learning method that avoids the over-fitting, under-fitting and local-minimum problems associated with the network structure of artificial neural networks. As a binary classification algorithm, it has widespread applications in various fields of engineering and medical science (Subasi 2013; Dukart et al. 2013; Alajlan et al. 2012; Haddi et al. 2013; Roy et al. 2016). Time series prediction using SVM has been one of the popular research areas in recent years. The SVMs proposed for time series prediction cover many practical application areas, from financial market prediction to electric load forecasting and other scientific fields (Sapankevych and Sankar 2009). Pattern recognition is another field to which SVM has been applied frequently (Byun and Lee 2002; Bashbaghi et al. 2017; Christlein et al. 2017; Liu et al. 2018; Shah et al. 2017; Solera-Urena et al. 2012). For example, automatic speech recognition using SVM has been presented in Solera-Urena et al. (2012). In communication systems, various problems can be solved by SVM. For example, in wireless sensor networks, SVM can be used to solve localization problems (Zhu and Wei 2017). In Garcia et al. (2006), a robust SVM has been proposed for channel estimation in orthogonal frequency division multiplexing (OFDM) systems.

From a mathematical point of view, SVM is based on structural risk minimization using the maximum-margin idea (Ozer et al. 2011). In fact, a convex objective function is optimized to find the classifier (decision boundary). The kernel function plays a dominant role in the generalization performance of SVM. To be more precise, the kernel function maps the input data into a higher-dimensional feature space in which the data can be separated linearly, so the classification can be performed more accurately.

Nowadays, pattern recognition plays an important role in industry. Many industrial robots and other machines, such as sorters, perform classification. The food industry is one of the industries in which classification accuracy is critical. For example, a 1% misclassification rate for agricultural products such as pistachios, olives and walnuts can decrease the profit significantly. Thus, improving the accuracy of classifiers such as SVM is necessary and directly applicable in today's modern industry.

Since SVM was proposed, extensive studies have been devoted to enhancing its performance using different strategies. Reducing the number of support vectors is one of these strategies. For example, the approach proposed in Downs et al. (2001) recognizes and eliminates unnecessary support vectors while leaving the solution unchanged. Also, \(\nu\)-support vector classification introduces a regularization parameter \(\nu\) that controls the number of support vectors and margin errors (Gu and Sheng 2017). A robust SVM has been proposed in Song et al. (2002) that addresses the over-fitting problem caused by outliers in the training set and reduces the number of support vectors. Feature extraction is another way to enhance SVM performance. In Caoa et al. (2003), three approaches, namely kernel principal component analysis (KPCA), principal component analysis (PCA) and independent component analysis (ICA), have been compared for feature extraction in SVM. Feature normalization is another solution proposed in the literature (Steinwart et al. 2004; Bi et al. 2005). Modifying or changing the kernel function with the aim of enhancing SVM performance is another idea that has been studied extensively (Ye and Suganthan 2012; Zhang et al. 2004; Kuo et al. 2014; Izquierdo-Verdiguier et al. 2013; Moghaddam and Hamidzadeh 2016).

This paper presents a novel SVM based on semi-parametric linear models. Regression analysis is a statistical approach for studying the relationship between independent variables and a dependent variable. Regression methods are usually classified into two main categories: parametric and nonparametric. Parametric models are very helpful for studying the relationship between variables; however, they may be applied at the risk of introducing modeling biases. Nonparametric models require no prior model structure and can provide useful insight for further parametric fitting, but they suffer from drawbacks such as the curse of dimensionality, difficulty of interpretation, and lack of extrapolation capability. Semi-parametric linear models are therefore more useful and are applied in many applications, since parametric and nonparametric components can coexist in the model (Hesamian et al. 2017; Zarei et al. 2020).

This paper is organized as follows. Section 2 presents a brief review of SVM. Section 3 describes the proposed SVM based on semi-parametric linear regression. The results of experiments on typical classification problems are presented in Sect. 4. Finally, Sect. 5 concludes the paper.

2 Support vector machine

2.1 Linear support vector machines

Without loss of generality, the classification problem in SVM is described for two-class problems. The aim is to classify the two groups using a function obtained from the available data. The main objective is to design a classifier that performs well on unseen data. Consider the data in Fig. 1. There are many acceptable linear decision boundaries that separate the data, but only one maximizes the margin, i.e., the distance between the classifier and the closest data point of each group. This linear classifier is known as the optimal separating hyper-plane. Intuitively, we expect this boundary to generalize better than the other possible boundaries (Gunn 1998).

Fig. 1 Optimal separating hyper-plane

Consider the training dataset \({\mathbf{x}}_{i} \in R^{d},\ i = 1,2,\ldots,N\), where each training sample has a label \(y_{i} \in \left\{ { - 1, + 1} \right\}\) and \(d\) is the dimension of the input space. Consider the hyper-plane described by

$$ {\mathbf{\theta }}^{T} .{\mathbf{x}} + b = 0 $$
(1)

The classification problem is to obtain the hyper-plane such that \({{\varvec{\uptheta}}}^{T} .{\mathbf{x}}_{i} + b \ge + 1\) for the positive class and \({{\varvec{\uptheta}}}^{T} .{\mathbf{x}}_{i} + b \le - 1\) for the negative class. According to Gunn (1998), in order to find the hyper-plane with the largest margin, the following minimization problem should be solved:

$$ \mathop {\min }\limits_{{{{\varvec{\uptheta}}},b}} \,J({{\varvec{\uptheta}}}) = \frac{{\left\| {{\varvec{\uptheta}}} \right\|^{2} }}{2} $$
(2)

The constraint of this minimization problem is

$$ y_{i} ({{\varvec{\uptheta}}}^{T} .{\mathbf{x}}_{i} + b) - 1 \ge 0 $$
(3)

This quadratic programming (QP) problem can be solved by standard techniques such as Lagrange multipliers (Gutschoven and Verlinde 2000; Osuna et al. 1997) or intelligent optimization techniques such as particle swarm optimization (PSO) and genetic algorithm (GA) (Fateh and Khorashadizadeh 2012; Zadeh et al. 2016).
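As an illustration of how (2)-(3) can be handled numerically, the following sketch solves the primal problem with SciPy's SLSQP solver on a hypothetical, linearly separable toy set; it is only a sketch and not the solver used in the cited references.

```python
# Sketch: solving the hard-margin primal (2)-(3) with SciPy's SLSQP.
# The toy data below are hypothetical and serve only as a linearly
# separable illustration.
import numpy as np
from scipy.optimize import minimize

X = np.array([[2.0, 2.0], [3.0, 3.0], [2.5, 3.5],
              [-2.0, -2.0], [-3.0, -1.5], [-2.5, -3.0]])
y = np.array([1, 1, 1, -1, -1, -1])
d = X.shape[1]

def objective(v):
    theta = v[:d]                     # J(theta) = ||theta||^2 / 2, cf. (2)
    return 0.5 * np.dot(theta, theta)

def margin_constraints(v):
    theta, b = v[:d], v[d]            # y_i (theta^T x_i + b) - 1 >= 0, cf. (3)
    return y * (X @ theta + b) - 1.0

res = minimize(objective,
               x0=np.zeros(d + 1),
               constraints={"type": "ineq", "fun": margin_constraints},
               method="SLSQP")

theta, b = res.x[:d], res.x[d]
print("theta =", theta, " b =", b)
print("margins:", y * (X @ theta + b))   # all should be >= 1
```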

2.2 Nonlinear support vector machines and kernels

Real-life classification problems may be difficult to solve with a linear SVM. Therefore, its extension to nonlinear decision boundaries is inevitable. In nonlinear problems, the kernel function maps the input space into a high-dimensional feature space through a nonlinear transformation. For example, consider Fig. 2. According to the Cover theorem, the input space can be converted into a new feature space in which the patterns are linearly separable with high probability, provided the dimensionality of the feature space is high enough (Haykin 1999). This nonlinear transformation is performed implicitly through so-called kernel functions.

Fig. 2 Transforming the input data to the feature space using the nonlinear function \(\varphi\)

2.2.1 Inner-product kernels

In order to deal with nonlinear classification problems using SVM, a mapping \(\varphi :\,R^{n} \to H\) is required. This mapping transforms the input data into the Euclidean space \(H\), which has a significantly higher dimension, and the linear SVM is then applied in this new space. As a result, the training algorithm uses the data only through dot products in \(H\) of the form \(\varphi ({\mathbf{x}}_{i} ) \cdot \varphi ({\mathbf{x}}_{j} )\). If the number of training vectors \(\varphi ({\mathbf{x}}_{i} )\) is very large, the calculation of these dot products becomes time-consuming and computationally expensive. Moreover, \(\varphi\) is not known a priori. In this situation, the Mercer theorem (Burges 1998) is used to replace \(\varphi ({\mathbf{x}}_{i} ) \cdot \varphi ({\mathbf{x}}_{j} )\) by a positive definite symmetric kernel function \(K({\mathbf{x}}_{i} ,{\mathbf{x}}_{j} )\), i.e., \(K({\mathbf{x}}_{i} ,{\mathbf{x}}_{j} ) = \varphi ({\mathbf{x}}_{i} ) \cdot \varphi ({\mathbf{x}}_{j} )\). To be more precise, kernel substitution paves the way for designing nonlinear methods from algorithms previously limited to linearly separable input data (Campbell 2000). In addition, it avoids the so-called curse of dimensionality (Vapnik 1995). Some typical kernel functions are listed in Table 1.

Table 1 Some typical kernels for nonlinear mapping in SVM
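Table 1 itself is not reproduced here; as a rough illustration, the sketch below implements the polynomial, Gaussian RBF and sigmoid kernels that are typically listed for SVM. The parameter values are hypothetical.

```python
# Sketch of typical inner-product kernels used with SVM (standard
# polynomial, Gaussian RBF and sigmoid choices; parameters are illustrative).
import numpy as np

def polynomial_kernel(x, z, degree=2, c=1.0):
    """K(x, z) = (x^T z + c)^degree"""
    return (np.dot(x, z) + c) ** degree

def rbf_kernel(x, z, gamma=0.5):
    """K(x, z) = exp(-gamma * ||x - z||^2)"""
    return np.exp(-gamma * np.sum((x - z) ** 2))

def sigmoid_kernel(x, z, kappa=1.0, delta=-1.0):
    """K(x, z) = tanh(kappa * x^T z + delta)"""
    return np.tanh(kappa * np.dot(x, z) + delta)

x = np.array([1.0, 2.0])
z = np.array([0.5, -1.0])
print(polynomial_kernel(x, z), rbf_kernel(x, z), sigmoid_kernel(x, z))
```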

2.3 Designing SVM based on Lagrange optimization

The optimization problem described by (2) and (3) is a typical quadratic programming problem, since the objective function \(J({{\varvec{\uptheta}}})\) is quadratic and the constraints are linear. Based on this constrained optimization problem and considering the training set \(\left\{ {\left( {{\mathbf{x}}_{i} ,d_{i} } \right)} \right\}_{i = 1}^{N}\) and the Lagrange multipliers \(\left\{ {\alpha_{i} } \right\}_{i = 1}^{N}\), another problem called the dual problem can be formulated in the form of

$$ \mathop {\max }\limits_{\alpha } \,Q(\alpha ) = \sum\limits_{i = 1}^{N} {\alpha_{i} - \frac{1}{2}} \sum\limits_{i = 1}^{N} {\sum\limits_{j = 1}^{N} {\alpha_{i} \alpha_{j} d_{i} d_{j} {\mathbf{x}}_{i}^{T} {\mathbf{x}}_{j} } } $$
(4)

subject to the constraints

$$ \sum\limits_{i = 1}^{N} {\alpha_{i} d_{i} } = 0 $$
(5)
$$ \alpha_{i} \ge 0\,\,\,\,\,\,\,i = 1,2,...,N $$
(6)

For non-separable patterns, the constraint (6) should be modified as (Byun and Lee 2002)

$$ 0 \le \alpha_{i} \le C\,\,\,\,\,\,\,\,\,\,\,\,i = 1,2,...,N $$
(7)

where C is a user-defined positive parameter. This optimization problem can be solved by several methods mentioned in Byun and Lee (2002).
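For concreteness, the following sketch solves the dual problem (4)-(7) with a linear kernel using SciPy's SLSQP routine; the toy data, the value of C and the bias computation are illustrative assumptions, not part of the methods cited above.

```python
# Sketch: the dual problem (4)-(7) with a linear kernel, solved by SLSQP.
# Toy data are hypothetical; C = 10 is an illustrative choice.
import numpy as np
from scipy.optimize import minimize

X = np.array([[1.0, 1.0], [2.0, 0.5], [1.5, 2.0],
              [-1.0, -1.0], [-2.0, -0.5], [-1.5, -2.0]])
d_lab = np.array([1, 1, 1, -1, -1, -1], dtype=float)   # labels d_i
N, C = len(X), 10.0

K = X @ X.T                                   # x_i^T x_j (linear kernel)
H = (d_lab[:, None] * d_lab[None, :]) * K

def neg_Q(alpha):                             # maximize Q(alpha) -> minimize -Q, cf. (4)
    return -(alpha.sum() - 0.5 * alpha @ H @ alpha)

res = minimize(neg_Q,
               x0=np.zeros(N),
               bounds=[(0.0, C)] * N,                     # box constraint (7)
               constraints={"type": "eq",
                            "fun": lambda a: a @ d_lab},  # equality constraint (5)
               method="SLSQP")

alpha = res.x
theta = (alpha * d_lab) @ X                   # theta = sum_i alpha_i d_i x_i
sv = alpha > 1e-6                             # support vectors
b = np.mean(d_lab[sv] - X[sv] @ theta)        # bias estimated from support vectors
print("alpha =", np.round(alpha, 4))
print("theta =", theta, " b =", b)
```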

3 The proposed SVM based on semi-parametric linear regression

Consider the hyper-plane that classifies the training dataset in the form of

$$ {{\varvec{\uptheta}}}^{T} .{\mathbf{x}}_{i} - b = 0\,\,\,\,\,\,\,\,\,\,\,y_{i} \in \{ - 1,1\} \,\,\,\,i = 1,2,...,m $$
(8)

in which \(y_{i} \in \{ - 1,1\},\ i = 1,2,\ldots,m\) is the label of each sample \({\mathbf{x}}_{i}\). Suppose that \({{\varvec{\uptheta}}},{\mathbf{x}}_{i} \in {\mathbb{R}}^{p}\), i.e., \({{\varvec{\uptheta}}}^{T} = \left[ \theta_{1}\ \ \theta_{2}\ \ \cdots\ \ \theta_{p} \right]\) and \({\mathbf{x}}_{i}^{T} = \left[ x_{i1}\ \ x_{i2}\ \ \cdots\ \ x_{ip} \right]\). The proposed method starts by rewriting (8) as

$$ {{\varvec{\uptheta}}}^{T} .{\mathbf{x}}_{i} - f({\mathbf{x}}_{i} ) = 0 $$
(9)

where \(f({\mathbf{x}}_{i} )\) is a term that will be specified later. In fact, instead of obtaining a constant value for the parameter \(b\) in (8), we make it dependent on the sample \({\mathbf{x}}_{i}\). In other words, \(f({\mathbf{x}}_{i} )\) is the term that results in the best classification performance with the smallest error. Since \(f({\mathbf{x}}_{i} )\) is unknown, we estimate it by

$$ \hat{f}({\mathbf{x}}_{i} ) = \sum\limits_{j = 1}^{m} {w_{j} } ({\mathbf{x}}_{i} )({{\varvec{\uptheta}}}^{T} .{\mathbf{x}}_{j} ) $$
(10)
$$ w_{j} ({\mathbf{x}}_{i} ) = \frac{{K\left( {\frac{{\left\| {{\mathbf{x}}_{j} - {\mathbf{x}}_{i} } \right\|}}{h}} \right)}}{{\sum\nolimits_{j = 1}^{m} {K\left( {\frac{{\left\| {{\mathbf{x}}_{j} - {\mathbf{x}}_{i} } \right\|}}{h}} \right)} }} $$
(11)

where \(K(\cdot)\) is the kernel function and \(h\) is the smoothing parameter, obtained by cross-validation (Campbell 2000). The kernel function has the following properties (Wasserman 2006):

$$ \int {K(u)du = 1} ,\,\,\,\,\,\int {uK(u)du = 0,\,\,\,\,\,\,\int {u^{2} K(u)du = \sigma_{k}^{2} > 0} } $$
(12)

Some commonly used kernels are illustrated in Fig. 3.
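As a small illustration of (10)-(11), the sketch below computes the smoothing-weight matrix with the Gaussian kernel that is used later in (24); the sample matrix X and the bandwidth h are hypothetical placeholders.

```python
# Sketch: smoothing weights w_j(x_i) of Eq. (11) with the Gaussian kernel of Eq. (24).
import numpy as np

def smoothing_weights(X, h):
    """Return W with W[i, j] = w_j(x_i): Gaussian kernel values normalized over j."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    K = np.exp(-sq_dists / h)                     # kernel of Eq. (24)
    return K / K.sum(axis=1, keepdims=True)       # normalization of Eq. (11)

X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.5], [3.0, 3.0]])   # placeholder samples
W = smoothing_weights(X, h=2.0)
print(np.round(W, 3))
print(W.sum(axis=1))    # each row sums to 1 by construction
```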

Fig. 3 Some commonly used kernels for the calculation of weights (https://en.wikipedia.org/wiki/Kernel_(statistics))

Substitution of \(\hat{f}({\mathbf{x}}_{i} )\) from (10) into (9) results in:

$$ {{\varvec{\uptheta}}}^{T} .{\mathbf{x}}_{i} - \sum\limits_{j = 1}^{m} {w_{j} } ({\mathbf{x}}_{i} )({{\varvec{\uptheta}}}^{T} .{\mathbf{x}}_{j} ) = 0\, $$
(13)

In other words, we have \({{\varvec{\uptheta}}}^{T} {\mathbf{x}}_{i}^{*} = 0\,\) in which

$$ {\mathbf{x}}_{i}^{*} = {\mathbf{x}}_{i} - \sum\limits_{j = 1}^{m} {w_{j} } ({\mathbf{x}}_{i} ){\mathbf{x}}_{j} $$
(14)
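A minimal sketch of the transformation (14), assuming the Gaussian kernel of (24) and the normalized weights of (11); the sample matrix and bandwidth are placeholders, not values from the paper.

```python
# Sketch: computing x_i^* of Eq. (14) from the smoothing weights of Eq. (11).
import numpy as np

def transform(X, h):
    """Return X_star with row i equal to x_i - sum_j w_j(x_i) x_j, cf. Eq. (14)."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    K = np.exp(-sq_dists / h)                 # Gaussian kernel, Eq. (24)
    W = K / K.sum(axis=1, keepdims=True)      # normalized weights, Eq. (11)
    return X - W @ X

X = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 5.0], [5.0, 4.0]])   # placeholder samples
print(transform(X, h=2.0))
```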

Now, in order to obtain the optimal values of the vector \({{\varvec{\uptheta}}}\) that yield the best classification performance, the following optimization problem should be solved:

$$ L = \mathop {\min }\limits_{{{\varvec{\uptheta}}}} \frac{1}{2}\left\| {{\varvec{\uptheta}}} \right\|^{2} \quad \text{subject to}\quad y_{i} ({{\varvec{\uptheta}}}^{T} .{\mathbf{x}}_{i}^{*} ) - 1 \ge 0 $$
(15)

Using Lagrange optimization, the cost function (15) is rewritten in the form of

$$ L_{p} = \mathop {\min }\limits_{{{\varvec{\uptheta}}}} \frac{1}{2}\left\| {{\varvec{\uptheta}}} \right\|^{2} - \sum\limits_{i = 1}^{m} {\alpha_{i} \left( {y_{i} ({{\varvec{\uptheta}}}^{T} .\,{\mathbf{x}}_{i}^{*} ) - 1} \right)} $$
(16)

where the \(\alpha_{i}\) are the Lagrange multipliers. Differentiating \(L_{p}\) with respect to \({{\varvec{\uptheta}}}\) and setting the result equal to zero \(\left( {i.e.,\frac{\partial }{{\partial {{\varvec{\uptheta}}}}}L_{p} = 0} \right)\), we get:

$$ {{\varvec{\uptheta}}} = \sum\limits_{i = 1}^{m} {\alpha_{i} y_{i} \,{\mathbf{x}}_{i}^{*} } $$
(17)

Due to the duality theorem (Bertsekas 1995), we can reformulate the cost function (16) as

$$ L_{D} = \sum\limits_{i = 1}^{m} {\alpha_{i} \left( {y_{i} ({{\varvec{\uptheta}}}^{T} .\,{\mathbf{x}}_{i}^{*} ) - 1} \right)} - \frac{1}{2}\sum\limits_{i = 1}^{m} {\sum\limits_{j = 1}^{m} {\alpha_{i} \alpha_{j} y_{i} y_{j} \left( {\,{\mathbf{x}}_{i}^{{{\mathbf{*}}T}} .{\mathbf{x}}_{j}^{{\mathbf{*}}} } \right)} } $$
(18)

Substitution of (17) into (18) results in

$$ L_{D} = \sum\limits_{i = 1}^{m} {\alpha_{i} \left[ {y_{i} \left( {\sum\limits_{j = 1}^{m} {\alpha_{j} y_{j} \left( {\,{\mathbf{x}}_{i}^{{{\mathbf{*}}T}} .{\mathbf{x}}_{j}^{{\mathbf{*}}} } \right)} } \right) - 1} \right]} - \frac{1}{2}\sum\limits_{i = 1}^{m} {\sum\limits_{j = 1}^{m} {\alpha_{i} \alpha_{j} y_{i} y_{j} \left( {\,{\mathbf{x}}_{i}^{{{\mathbf{*}}T}} .{\mathbf{x}}_{j}^{{\mathbf{*}}} } \right)} } $$
(19)

For classification problems in which the training set (input space) is not linearly separable, a nonlinear map is used to produce linearly separable data (feature space) (Haykin 2007). Suppose that \(\{ \varphi_{j} ({\mathbf{x}})\}_{j = 1}^{{m_{1} }}\) is a set of nonlinear transformations from the input space to the feature space and \(m_{1}\) is the dimension of the feature space. As a result, the hyper-plane acting as the decision surface is given by:

$$ \sum\limits_{j = 1}^{{m_{1} }} {\theta_{j} \varphi_{j} ({\mathbf{x}})} + b = 0 $$
(20)

Assuming \(\theta_{0} = b\) and \(\varphi_{0} ({\mathbf{x}}) = 1\), the hyper-plane (20) can be simply converted to

$$ \sum\limits_{j = 0}^{{m_{1} }} {\theta_{j} \varphi_{j} ({\mathbf{x}})} = 0 $$
(21)
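To make (20)-(21) concrete, the sketch below uses an explicit low-dimensional feature map whose monomials mirror the terms appearing in the decision boundaries of Examples 2-4; this particular map and the weight values are illustrative assumptions, not quantities from the paper.

```python
# Sketch: an explicit feature map phi and the decision surface of Eq. (21).
import numpy as np

def phi(x):
    """Map a 2-D input to (1, x1, x2, x1^2, x2^2, x1*x2); phi_0 = 1 absorbs b as theta_0."""
    x1, x2 = x
    return np.array([1.0, x1, x2, x1 ** 2, x2 ** 2, x1 * x2])

theta = np.array([0.5, -1.0, 2.0, 0.3, -0.7, 1.2])   # hypothetical weights, theta_0 = b
x = np.array([1.0, 2.0])
# Decision surface (21): sum_j theta_j phi_j(x) = 0; classify by the sign.
print(theta @ phi(x), np.sign(theta @ phi(x)))
```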

Adapting the optimal weight vector given in (17) to this new situation, in which we seek linear separability in the feature space, yields

$$ {{\varvec{\uptheta}}} = \sum\limits_{i = 1}^{m} {\alpha_{i} y_{i} \,\varphi ({\mathbf{x}}_{i}^{*} )} $$
(22)

As a result, the cost function of the dual problem given in (19) is rewritten as

$$ L_{D} = \mathop {\max }\limits_{{{\varvec{\upalpha}}}} \left\{ {\sum\limits_{i = 1}^{m} {\alpha_{i} \left[ {y_{i} \left( {\sum\limits_{j = 1}^{m} {\alpha_{j} y_{j} \psi \left( {{\mathbf{x}}_{i}^{*} ,{\mathbf{x}}_{j}^{*} } \right)} } \right) - 1} \right]} - \frac{1}{2}\sum\limits_{i = 1}^{m} {\sum\limits_{j = 1}^{m} {\alpha_{i} \alpha_{j} y_{i} y_{j} \psi \left( {{\mathbf{x}}_{i}^{*} ,{\mathbf{x}}_{j}^{*} } \right)} } } \right\} $$
(23)

where \(\psi \left( {{\mathbf{x}}_{i}^{*} ,{\mathbf{x}}_{j}^{*} } \right) = \varphi ({\mathbf{x}}_{i}^{*} )\varphi ({\mathbf{x}}_{j}^{*} )\) is the inner-product kernel.
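Putting the pieces of this section together, the following sketch centres the samples with (11), (14) and (24) and then fits a kernel SVM on the transformed samples. scikit-learn's SVC is used here as a stand-in for the dual optimization (23), and the toy data are hypothetical; it is a sketch of the pipeline under these assumptions, not the authors' implementation.

```python
# Sketch of the SP-SVM pipeline: centre the samples with the semi-parametric
# weights (Eqs. 11, 14, 24), then fit a polynomial-kernel SVM on x_i^*.
import numpy as np
from sklearn.svm import SVC

def transform(X, h):
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    K = np.exp(-sq_dists / h)                    # Gaussian kernel, Eq. (24)
    W = K / K.sum(axis=1, keepdims=True)         # weights, Eq. (11)
    return X - W @ X                             # x_i^*, Eq. (14)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(4, 1, (20, 2))])
y = np.array([-1] * 20 + [1] * 20)

X_star = transform(X, h=40.0)
clf = SVC(kernel="poly", degree=2, gamma=1.0, coef0=1.0)   # psi(u, v) = (u^T v + 1)^2
clf.fit(X_star, y)
print("training accuracy:", clf.score(X_star, y))
# Note: a new test point would need its weights computed against the training samples.
```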

4 Experimental results

In order to investigate the performance of the proposed method in classification problems, some examples are presented. Then, the Iris dataset is classified using the proposed method (SP-SVM), and the results of some previous related works on the Iris dataset are presented for comparison.

Example 1

Consider the training data and the corresponding labels presented in Fig. 4. In other words, this training data can be described as given in Table 2.

Fig. 4 Training data in Example 1

Table 2 Training data in Example 1

The kernel used for the semi-parametric regression is a Gaussian kernel:

$$ K\left( {\frac{{\left\| {{\mathbf{x}}_{j} - {\mathbf{x}}_{i} } \right\|}}{h}} \right) = \exp \left( { - \frac{{\left\| {{\mathbf{x}}_{j} - {\mathbf{x}}_{i} } \right\|^{2} }}{h}} \right) $$
(24)

and the polynomial kernel

$$ \psi \left( {{\mathbf{x}}_{i}^{*} ,{\mathbf{x}}_{j}^{*} } \right) = ({\mathbf{x}}_{j}^{*T} {\mathbf{x}}^{*}_{i} + 1) $$
(25)

has been selected for the nonlinear mapping in SVM. In this example, we have assumed that \(h^{ - 1} = 0.005\). The weights calculated by the proposed method are \(\theta_{1} = 3.09602,\theta_{2} = - 2.64302\), and the predicted outputs (labels) are

$$ y = \{ 0.999996,\; - 1.52512,\; 4.05512,\; 1.53007,\; - 0.999996,\; - 3.53841,\; 2.04821\} $$
(26)

It is obvious that all the data have been correctly classified, since the signs of the predicted values in (26) are the same as the signs of the labels in the training data (the third row in Table 2). In order to plot the decision boundary, one can use (14) to get

$$ \begin{gathered} {\mathbf{x}}_{i}^{*} = \left( {x_{i1}^{*} ,x_{i2}^{*} } \right)\quad i = 1,2,\ldots,8 \hfill \\ \left\{ \begin{gathered} x_{i1}^{*} = x_{i1} - \sum\limits_{j = 1}^{8} {\exp \left[ { - 0.005 \times \left( {\left( {x_{i1} - x_{j1} } \right)^{2} + \left( {x_{i2} - x_{j2} } \right)^{2} } \right)} \right] \times x_{j1} } \hfill \\ x_{i2}^{*} = x_{i2} - \sum\limits_{j = 1}^{8} {\exp \left[ { - 0.005 \times \left( {\left( {x_{i1} - x_{j1} } \right)^{2} + \left( {x_{i2} - x_{j2} } \right)^{2} } \right)} \right] \times x_{j2} } \hfill \\ \end{gathered} \right. \hfill \\ \end{gathered} $$

The values of \(\left( {x_{i1}^{*} \,,\,x_{i2}^{*} } \right)\) are presented in Table 3. Moreover, the decision boundary in the \(\left( {x_{i1}^{*} \,,\,x_{i2}^{*} } \right)\) plane is plotted in Fig. 5.

Table 3 Transformed values in Example 1
Fig. 5 Decision boundary of Example 1 in the \(\left( {x_{i1}^{*} ,x_{i2}^{*} } \right)\) plane

Thus, using the aforementioned optimal weights and values obtained for \(\left( {x_{i1}^{*} \,,\,x_{i2}^{*} } \right)\,\) and the equation \({{\varvec{\uptheta}}}^{T} {\mathbf{x}}_{i}^{*} = 0\,\), the decision boundary in the \(\left( {x_{i1}^{*} \,,\,x_{i2}^{*} } \right)\,\) plane is given by

$$ 3.096x_{i1}^{*} - 2.64x_{i2}^{*} = 0\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,i = 1,2,...,8 $$
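A sketch of the Example 1 decision rule, using the reported weights and \(h^{-1} = 0.005\); the coordinates in X are placeholders standing in for Table 2, which is not reproduced in the text, so only the procedure is illustrated.

```python
# Sketch: Example 1 transformation and sign-based prediction with the reported weights.
import numpy as np

theta = np.array([3.09602, -2.64302])      # weights reported in Example 1
inv_h = 0.005                              # h^{-1} = 0.005 as assumed in the example

def transform_example1(X):
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    K = np.exp(-inv_h * sq_dists)          # Gaussian kernel of Eq. (24)
    return X - K @ X                       # transformation displayed in Example 1

X = np.array([[1.0, 2.0], [3.0, 1.0], [-2.0, -1.0], [-1.0, -3.0]])  # placeholder data
X_star = transform_example1(X)
print(np.sign(X_star @ theta))             # predicted labels sign(theta^T x_i^*)
```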

Example 2

Consider the training data and the corresponding labels presented in Fig. 6. In other words, this training data can be described as given in Table 4.

Fig. 6 Training data in Example 2

Table 4 Training data in Example 2

The kernel used for the semi-parametric regression is the same as given in (24) with \(h = 40\), and the polynomial kernel

$$ \psi \left( {{\mathbf{x}}_{i}^{*} ,{\mathbf{x}}_{j}^{*} } \right) = ({\mathbf{x}}_{j}^{*T} {\mathbf{x}}^{*}_{i} + 1)^{2} $$
(27)

has been adopted for the nonlinear mapping in SVM. The weights calculated by the proposed method are \(\theta _{1} = - 33.9996,\theta _{2} = 21.9998,\theta _{3} = 9.99987, \theta _{4} = - 5.9996,\theta _{5} = 2.0001\) and the predicted outputs (labels) are

$$ y = \{ 0.99982,\; 6.99987,\; 0.9994,\; - 0.99987,\; - 10.9999,\; 9.9997,\; 0.9993,\; 0.9994\} $$
(28)

As can be seen, the proposed method classifies these data correctly. In order to obtain the decision boundary, one can use (14) to get

$$ \begin{gathered} {\mathbf{x}}_{i}^{*} = \left( {x_{i1}^{*} ,x_{i2}^{*} } \right)\quad i = 1,2,\ldots,8 \hfill \\ x_{ik}^{*} = x_{ik} - \sum\limits_{j = 1}^{8} {0.5 \times I\left[ {D_{ij} \le 40} \right] \times x_{jk} } \quad k = 1,2 \hfill \\ D_{ij} = \left( {x_{i1} - x_{j1} } \right)^{2} + \left( {x_{i2} - x_{j2} } \right)^{2} + \left( {x_{i1}^{2} - x_{j1}^{2} } \right)^{2} + \left( {x_{i2}^{2} - x_{j2}^{2} } \right)^{2} + \left( {x_{i1} x_{i2} - x_{j1} x_{j2} } \right)^{2} \hfill \\ I\left( A \right) = \left\{ \begin{gathered} 1\quad A \hfill \\ 0\quad A^{c} \hfill \\ \end{gathered} \right. \hfill \\ \end{gathered} $$

Thus, using the aforementioned optimal weights and values obtained for \(\left( {x_{i1}^{*} \,,\,x_{i2}^{*} } \right)\,\) and the equation \({{\varvec{\uptheta}}}^{T} {\mathbf{x}}_{i}^{*} = 0\,\), the decision boundary in the \(\left( {x_{i1}^{*} \,,\,x_{i2}^{*} } \right)\,\) plane is given by

$$ - 34x_{{i1}}^{*} + 22x_{{i2}}^{*} + 10\left( {x_{{i1}}^{2} } \right)^{*} - 6\left( {x_{{i2}}^{2} } \right)^{*} + 2\left( {x_{{i1}} x_{{i2}} } \right)^{*} = 0\;\;i = 1,2,...,8 $$
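A sketch of the indicator-based transformation used in Examples 2-4, assuming (as the starred monomials in the boundary equations suggest) that the same indicator weights are applied to the higher-order monomial coordinates as well; the data are placeholders.

```python
# Sketch: indicator-weighted transformation of Examples 2-4 on the monomial features.
import numpy as np

def monomials(X):
    """Stack the monomials (x1, x2, x1^2, x2^2, x1*x2) used in Examples 2-4."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1, x2, x1 ** 2, x2 ** 2, x1 * x2])

def transform_indicator(X, threshold=40.0, weight=0.5):
    F = monomials(X)
    sq_dists = ((F[:, None, :] - F[None, :, :]) ** 2).sum(axis=2)
    ind = (sq_dists <= threshold).astype(float)   # indicator I(A)
    # Assumption: the same weights are applied to every monomial coordinate,
    # which is what the starred monomials in the boundary equations suggest.
    return F - weight * (ind @ F)

X = np.array([[1.0, 0.5], [0.5, 1.0], [-1.0, -0.5], [-0.5, -1.0]])   # placeholder data
print(np.round(transform_indicator(X), 3))
```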

Example 3

Consider the training data and the related classes presented in Fig. 7. This training data can be described as given in Table 5.

Fig. 7 Training data in Example 3

Table 5 Training data in Example 3

The kernel used for the semi-parametric regression is the same as given in (24) with \(h = 40\), and the polynomial kernel \(\psi \left( {{\mathbf{x}}_{i}^{*} ,{\mathbf{x}}_{j}^{*} } \right) = ({\mathbf{x}}_{j}^{*T} {\mathbf{x}}^{*}_{i} + 1)^{2}\) has been adopted for feature transformation. The weights obtained by the proposed method are θ1 = − 28.33, θ2 = − 5.77, θ3 = 22.1099, θ4 = 9.11057, θ5 = − 19.5545 and the predicted outputs (labels) are

$$ y = \{ 15.22,\; - 0.99,\; 0.99,\; - 3.77,\; - 39.55,\; - 3.55,\; - 0.99,\; 43.1,\; 20.22,\; 0.99\} $$
(29)

As can be seen, the proposed method classifies these data correctly. In order to obtain the decision boundary, one can use (14) to get

$$ \begin{gathered} {\mathbf{x}}_{i}^{*} = \left( {x_{i1}^{*} ,x_{i2}^{*} } \right)\quad i = 1,2,\ldots,11 \hfill \\ x_{ik}^{*} = x_{ik} - \sum\limits_{j = 1}^{11} {0.5 \times I\left[ {D_{ij} \le 40} \right] \times x_{jk} } \quad k = 1,2 \hfill \\ D_{ij} = \left( {x_{i1} - x_{j1} } \right)^{2} + \left( {x_{i2} - x_{j2} } \right)^{2} + \left( {x_{i1}^{2} - x_{j1}^{2} } \right)^{2} + \left( {x_{i2}^{2} - x_{j2}^{2} } \right)^{2} + \left( {x_{i1} x_{i2} - x_{j1} x_{j2} } \right)^{2} \hfill \\ I\left( A \right) = \left\{ \begin{gathered} 1\quad A \hfill \\ 0\quad A^{c} \hfill \\ \end{gathered} \right. \hfill \\ \end{gathered} $$

Thus, using the aforementioned optimal weights and values obtained for \(\left( {x_{i1}^{*} \,,\,x_{i2}^{*} } \right)\,\) and the equation \({{\varvec{\uptheta}}}^{T} {\mathbf{x}}_{i}^{*} = 0\,\), the decision boundary in the \(\left( {x_{i1}^{*} \,,\,x_{i2}^{*} } \right)\,\) plane is given by

$$ - 28.33x_{i1}^{*} - 5.78x_{i2}^{*} + 22.11\left( {x_{i1}^{2} } \right)^{*} + 9.11\left( {x_{i2}^{2} } \right)^{*} - 19.55\left( {x_{i1} x_{i2} } \right)^{*} = 0\,\,\,\,\,\,\,\,i = 1,\,...,11 $$

Example 4

Consider the training data and the related classes presented in Fig. 8. This training data can be described as given in Table 6.

Fig. 8 Training data in Example 4

Table 6 Training data in Example 4

The kernel used for the semi-parametric regression is the same as given in (24) with \(h = 40\), and the polynomial kernel \(\psi \left( {{\mathbf{x}}_{i}^{*} ,{\mathbf{x}}_{j}^{*} } \right) = ({\mathbf{x}}_{j}^{*T} {\mathbf{x}}^{*}_{i} + 1)^{2}\) has been used for feature transformation. The weights achieved by the proposed method are \(\theta_{1} = - 6.15384,\theta_{2} = - 11.6923,\theta_{3} = 5.84615,\theta_{4} = 6.15383,\theta_{5} = - 4.76922\), and the predicted outputs (labels) are

$$ y = \{ 9.3,\,\, - 0.99,\,\,\,0.99,\,\,\,3.61,\,\, - 11.46,\,\,\, - 2.076,\,\,\, - 0.99,12.3,13.53,1,0.99\} $$
(30)

As can be seen, the proposed method classifies these data correctly. In order to obtain the decision boundary, one can use (14) to get

$$ \begin{gathered} {\mathbf{x}}_{i}^{*} = \left( {x_{i1}^{*} ,x_{i2}^{*} } \right)\quad i = 1,2,\ldots,11 \hfill \\ x_{ik}^{*} = x_{ik} - \sum\limits_{j = 1}^{11} {0.5 \times I\left[ {D_{ij} \le 40} \right] \times x_{jk} } \quad k = 1,2 \hfill \\ D_{ij} = \left( {x_{i1} - x_{j1} } \right)^{2} + \left( {x_{i2} - x_{j2} } \right)^{2} + \left( {x_{i1}^{2} - x_{j1}^{2} } \right)^{2} + \left( {x_{i2}^{2} - x_{j2}^{2} } \right)^{2} + \left( {x_{i1} x_{i2} - x_{j1} x_{j2} } \right)^{2} \hfill \\ I\left( A \right) = \left\{ \begin{gathered} 1\quad A \hfill \\ 0\quad A^{c} \hfill \\ \end{gathered} \right. \hfill \\ \end{gathered} $$

Thus, using the aforementioned optimal weights and values obtained for \(\left( {x_{i1}^{*} \,,\,x_{i2}^{*} } \right)\,\) and the equation \({{\varvec{\uptheta}}}^{T} {\mathbf{x}}_{i}^{*} = 0\,\), the decision boundary in the \(\left( {x_{i1}^{*} \,,\,x_{i2}^{*} } \right)\,\) plane is given by

$$ - 6.15x_{i1}^{*} - 11.698x_{i2}^{*} + 5.85\left( {x_{i1}^{2} } \right)^{*} + 6.15\left( {x_{i2}^{2} } \right)^{*} + 4.77\left( {x_{i1} x_{i2} } \right)^{*} = 0\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,i = 1,2,...,11 $$

Example 5 In order to test the performance of the proposed SVM on real data, the Iris dataset from the UCI Machine Learning Repository is used. Since there are 3 classes (+1, 0 and -1) in this dataset, the proposed method is applied twice. Since there are 4 inputs in this dataset, it is impossible to plot them and visualize the decision boundary directly. However, in order to decide which class should be separated in the first stage, consider Figs. 9, 10, 11, 12, 13 and 14, which show some two-dimensional plots of the input features. As seen in these figures, the green cluster with the desired output "+1" is completely separated from the other clusters. Therefore, in the first step, the data are classified into 2 groups: the green patterns are labeled "+1," and the black and red patterns are labeled "−1." In order to obtain the decision boundary, one can use (14) to obtain:

$$ \begin{gathered} {\mathbf{x}}_{i}^{*} = \left( {x_{i1}^{*} ,x_{i2}^{*} ,x_{i3}^{*} ,x_{i4}^{*} } \right)\quad i = 1,2,\ldots,150 \hfill \\ x_{ik}^{*} = x_{ik} - \sum\limits_{j = 1}^{11} {10 \times I\left[ {D_{ij} \le 40} \right] \times x_{jk} } \quad k = 1,\ldots,4 \hfill \\ D_{ij} = \sum\limits_{k = 1}^{4} {\left( {x_{ik} - x_{jk} } \right)^{2} } + \sum\limits_{k = 1}^{4} {\left( {x_{ik}^{2} - x_{jk}^{2} } \right)^{2} } + \sum\limits_{k < l} {\left( {x_{ik} x_{il} - x_{jk} x_{jl} } \right)^{2} } \hfill \\ I\left( A \right) = \left\{ \begin{gathered} 1\quad A \hfill \\ 0\quad A^{c} \hfill \\ \end{gathered} \right. \hfill \\ \end{gathered} $$
Fig. 9 Input features of Iris dataset in the \(\left( {x_{2} ,x_{1} } \right)\) plane

Fig. 10 Input features of Iris dataset in the \(\left( {x_{3} ,x_{1} } \right)\) plane

Fig. 11 Input features of Iris dataset in the \(\left( {x_{4} ,x_{1} } \right)\) plane

Fig. 12 Input features of Iris dataset in the \(\left( {x_{3} ,x_{2} } \right)\) plane

Fig. 13 Input features of Iris dataset in the \(\left( {x_{4} ,x_{2} } \right)\) plane

Fig. 14 Input features of Iris dataset in the \(\left( {x_{4} ,x_{3} } \right)\) plane

After running the optimization problem, the optimal weights are obtained and the optimal hyper-plane is given by

$$ \begin{gathered} - 0.104x_{i1}^{*} - 0.042x_{i2}^{*} + 0.0072x_{i3}^{*} + 0.0316x_{i4}^{*} + 0.0246\left( {x_{i1}^{2} } \right)^{*} \hfill \\ - 0.106\left( {x_{i2}^{2} } \right)^{*} + 0.227\left( {x_{i3}^{2} } \right)^{*} + 0.096\left( {x_{i4}^{2} } \right)^{*} - 0.0796\left( {x_{i1} x_{i2} } \right)^{*} \hfill \\ + 0.0743\left( {x_{i1} x_{i3} } \right)^{*} + 0.19\left( {x_{i1} x_{i4} } \right)^{*} + 0.1778\left( {x_{i2} x_{i3} } \right)^{*} \hfill \\ + 0.146\left( {x_{i2} x_{i4} } \right)^{*} + 0.188\left( {x_{i3} x_{i4} } \right)^{*} = 0\quad i = 1,2,\ldots,150 \hfill \\ \end{gathered} $$

This hyper-plane can classify the patterns correctly without any error.

Now, a hyper-plane should be calculated to classify the red and black patterns. Therefore, the red group is labeled by “+1” and the black group is labeled by “−1.” In order to obtain the decision boundary, one can use (14) to obtain:

$$ \begin{gathered} {\mathbf{x}}_{i}^{*} = \left( {x_{i1}^{*} ,x_{i2}^{*} ,x_{i3}^{*} ,x_{i4}^{*} } \right)\quad i = 1,2,\ldots,150 \hfill \\ x_{ik}^{*} = x_{ik} - \sum\limits_{j = 1}^{11} {8 \times I\left[ {D_{ij} \le 40} \right] \times x_{jk} } \quad k = 1,\ldots,4 \hfill \\ D_{ij} = \sum\limits_{k = 1}^{4} {\left( {x_{ik} - x_{jk} } \right)^{2} } + \sum\limits_{k = 1}^{4} {\left( {x_{ik}^{2} - x_{jk}^{2} } \right)^{2} } + \sum\limits_{k < l} {\left( {x_{ik} x_{il} - x_{jk} x_{jl} } \right)^{2} } \hfill \\ I\left( A \right) = \left\{ \begin{gathered} 1\quad A \hfill \\ 0\quad A^{c} \hfill \\ \end{gathered} \right. \hfill \\ \end{gathered} $$

After performing the optimization procedure, the optimal weights are obtained and the optimal hyper-plane is given by:

$$ \begin{gathered} 2.726x_{i1}^{*} + 15.808x_{i2}^{*} - 3.713x_{i3}^{*} - 7.835x_{i4}^{*} + 0.042\left( {x_{i1}^{2} } \right)^{*} \hfill \\ + 8.681\left( {x_{i2}^{2} } \right)^{*} + 1.687\left( {x_{i3}^{2} } \right)^{*} - 0.061\left( {x_{i4}^{2} } \right)^{*} - 7.84\left( {x_{i1} x_{i2} } \right)^{*} \hfill \\ + 0.811\left( {x_{i1} x_{i3} } \right)^{*} + 5.774\left( {x_{i1} x_{i4} } \right)^{*} - 3.98\left( {x_{i2} x_{i3} } \right)^{*} \hfill \\ - 8.94\left( {x_{i2} x_{i4} } \right)^{*} + 1.486\left( {x_{i3} x_{i4} } \right)^{*} = 0\quad i = 1,2,\ldots,150 \hfill \\ \end{gathered} $$

Using these two hyper-planes, only 4 patterns of the 150 patterns in Iris dataset will be classified wrongly. Therefore, the accuracy of the proposed method is 97.33%.
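As a rough illustration of the two-stage scheme on Iris, the sketch below first separates the "+1" class (setosa) from the rest and then separates the remaining two classes; a standard polynomial-kernel SVC is used as a stand-in for the SP-SVM hyper-planes reported above, so it will not reproduce the 97.33% figure exactly.

```python
# Sketch of the two-stage classification scheme used on the Iris data.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)            # y in {0, 1, 2}; class 0 is setosa

# Stage 1: setosa (+1) versus the other two classes (-1).
y1 = np.where(y == 0, 1, -1)
stage1 = SVC(kernel="poly", degree=2, gamma=1.0, coef0=1.0).fit(X, y1)

# Stage 2: versicolor (+1) versus virginica (-1) on the remaining samples.
mask = y != 0
y2 = np.where(y[mask] == 1, 1, -1)
stage2 = SVC(kernel="poly", degree=2, gamma=1.0, coef0=1.0).fit(X[mask], y2)

# Combine the two binary decisions into a three-class prediction.
pred = np.where(stage1.predict(X) == 1, 0,
                np.where(stage2.predict(X) == 1, 1, 2))
print("training accuracy:", np.mean(pred == y))
```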

In order to compare the performance of the proposed SVM with previous related works, consider the results in Siswantoro et al. (2016), Zahiri and Seyedin (2007) and Tanveer et al. (2021). In Siswantoro et al. (2016), a combination of a Kalman filter and a multi-layer perceptron (NN-LMKF) has been presented; a linear model based on the Kalman filter is used as a post-processing unit after the multi-layer perceptron. In Zahiri and Seyedin (2007), a swarm intelligence-based classifier (IPS) has been presented. In Tanveer et al. (2021), several classification algorithms based on K-nearest neighbors (KNN) have been presented; for example, the accuracies of K-nearest-neighbor-based weighted multi-class twin support vector machines (KWMTSVM), the support vector classification-regression machine for K-class classification (K-SVCR) and twin multi-class classification support vector machines (twin-KSVC) are reported in Tanveer et al. (2021). In Wu et al. (2019), the results of a random forest classifier on the Iris dataset have been presented. Table 7 compares the results of the aforementioned algorithms with those of the proposed method (SP-SVM). As shown in Table 7, the proposed method outperforms the aforementioned algorithms.

Table 7 Comparison of Iris dataset classification accuracy of different algorithms

5 Conclusion

In this paper, a new version of SVM has been proposed based on a semi-parametric linear model. The similarity between the hyper-plane in SVM and parametric or semi-parametric models in statistics has been the main motivation of this paper. In other words, analogous to semi-parametric regression models, in which the coefficients are functions of the input data, a new version of SVM has been developed in which the parameters of the hyper-plane are functions of the input data. As a result, the proposed classifier is more flexible than the conventional SVM, since borderline data can be assigned to the correct class owing to the fact that the parameters of the hyper-plane are not constant. The kernels used for the data transformation are simple kernels such as polynomial kernels, and the kernel used in the semi-parametric model is the Gaussian kernel. The numerical results show that the proposed method can successfully be used in classification problems.