1 Introduction

The support vector machine (SVM), introduced by Vapnik, is a successful modeling and prediction tool for classification and regression that has spread to many fields [1, 2]. Compared with other machine learning approaches such as artificial neural networks, SVM has many advantages. First, SVM solves a quadratic programming problem (QPP), ensuring that once an optimal solution is obtained, it is the unique global solution. Second, by maximizing the margin between two classes of samples, SVM derives a sparse and robust solution. Third, SVM implements the structural risk minimization principle rather than the empirical risk minimization principle, which minimizes the upper bound of the generalization error. The introduction of kernel functions extends the linear case to the nonlinear case and effectively overcomes the “curse of dimensionality” [3, 4]. Because of its good generalization performance, SVM has been successfully applied in various fields, including pattern recognition, text categorization, and financial regression.

The ν-support vector regression (ν-SVR) [5], which is based on statistical learning theory, has become a standard tool in regression tasks. The ν-SVR extends the standard SVR [2] of Vapnik by enforcing a fraction of the data samples to lie inside an 𝜖-tube while also minimizing the width of this tube [6]. It introduces a new parameter ν to control the fitting error, and it achieves better generalization performance than traditional techniques. In the ν-SVR, the 𝜖-insensitive loss function is defined as follows

$$\begin{array}{@{}rcl@{}} L_{\epsilon}(u)=\left\{ \begin{array}{cc} u-\epsilon,& u\geq \epsilon,\\ 0, & -\epsilon< u<\epsilon,\\ -u-\epsilon,& u\leq -\epsilon. \end{array} \right. \end{array} $$
(1)
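For concreteness, the following is a minimal NumPy sketch of this loss; the function name is ours and not taken from any referenced implementation.

```python
import numpy as np

def eps_insensitive_loss(u, eps):
    """Epsilon-insensitive loss of Eq. (1): zero inside the tube |u| < eps,
    and linear in the violation outside it."""
    return np.maximum(np.abs(u) - eps, 0.0)
```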

ν-SVR offers better generalization ability than many other machine learning methods. However, one of its main challenges is high computational complexity. In order to improve the computational speed, Peng proposed an efficient twin support vector machine for the regression problem (TSVR) [7,8,9] based on the twin support vector machine (TSVM) [11]. It aims at generating two nonparallel bound functions [10] by solving two smaller-sized QPPs such that each function determines the 𝜖-insensitive up- or down-bound of the unknown regressor. Each QPP involves only one group of constraints for all samples, which makes TSVR work faster than the standard SVR in theory [7, 12]. Since then, it has received much attention, and many variants have been proposed in the literature [13,14,15,16].

Recently, Huang extended the 𝜖-insensitive loss \(L_{\epsilon}(u)\) to the pinball loss \(L_{\epsilon }^{p}(u)\), in which samples lying in different positions are given different penalties [17,18,19,20], yielding better generalization performance. Inspired by this, we propose an asymmetric ν-twin support vector regression (Asy- ν-TSVR) in this paper, in which an asymmetric tube is used; as a result, Asy- ν-TSVR produces good generalization performance and is especially suitable for dealing with asymmetric noise [21,22,23,24].

Asy- ν-TSVR aims at finding two nonparallel functions: the 𝜖 1-insensitive down-bound function \(f_{1}(x)={w_{1}^{T}}x+b_{1}\) and the 𝜖 2-insensitive up-bound function \(f_{2}(x)={w_{2}^{T}}x+b_{2}\) [25, 26]. Similar to TSVR [7], Asy- ν-TSVR solves two smaller-sized QPPs rather than one large QPP, and each involves only one group of constraints for all samples. By introducing the pinball loss [27], the samples lying above and below the bounds are given different punishments [29, 30]. To verify the validity of the proposed algorithm, experiments on an artificial dataset, eleven benchmark datasets, and a real wheat dataset have been performed. Compared with ν-SVR, Asy- ν-SVR, least squares support vector regression (LS-SVR) [28], and TSVR, our proposed Asy- ν-TSVR has better generalization ability.

The paper is organized as follows: Section 2 briefly reviews ν-SVR, Asy- ν-SVR, and TSVR. Asy- ν-TSVR is proposed in Section 3, covering both the linear and nonlinear cases. The bounds are discussed in Section 4. Section 5 reports experiments on three kinds of datasets to investigate the feasibility and validity of the proposed algorithm. Section 6 ends the paper with concluding remarks.

2 Related works

In this section, we give a brief description of the ν-SVR, Asy- ν-SVR, and TSVR. Given a training set T = {(x 1,y 1),(x 2,y 2),⋯ ,(x l ,y l )}, where \(x_{i}\in R^{d}\) and \(y_{i}\in R\). For the sake of conciseness, let matrix A = (x 1,x 2,⋯ ,x l )T and Y = (y 1,y 2,⋯ ,y l )T. e is a vector of ones of appropriate dimensions.

2.1 ν-support vector regression

The nonlinear ν-SVR seeks a regression function \(f(x)=w^{T}\phi(x)+b\) in a high-dimensional feature space that tolerates small errors in fitting the given data points. This is achieved by utilizing the 𝜖-insensitive loss function \(L_{\epsilon}(u)\), which sets up an 𝜖-insensitive “tube”, kept as small as possible, within which errors are discarded. The ν-SVR is obtained by solving the following QPP,

$$\begin{array}{@{}rcl@{}} \displaystyle{\min_{w,b,\xi,\xi^{*},\epsilon}}~~&&\frac{1}{2}\|w\|^{2} +C\nu\epsilon+\frac{C}{l}\sum\limits_{i=1}^{l}(\xi_{i}+\xi_{i}^{*})\\ \text{s.t.}~~&&(w^{T}\phi(x_{i})+b)-y_{i} \leq \epsilon+\xi_{i},i=1,2,\cdots,l,\\ ~~&&y_{i} -(w^{T}\phi(x_{i})+b)\leq \epsilon+\xi_{i}^{*},i=1,2,\cdots,l,\\ ~~&&\epsilon\geq 0, \xi_{i}\geq 0, \xi_{i}^{*}\geq 0, i=1,2,\cdots,l. \end{array} $$
(2)

where C and ν are parameters chosen a priori. Parameter C controls the trade-off between the fitting errors and the flatness of the regression function. The parameter ν has a theoretical interpretation: it controls the fractions of the support vectors and the margin errors. More precisely, ν is an upper bound on the fraction of errors and a lower bound on the fraction of support vectors [31,32,33]. \(\xi_{i}\) and \(\xi _{i}^{*}\) are slack variables reflecting whether the samples lie inside the 𝜖-tube or not.

By introducing the Lagrange multipliers \(\alpha_{i}\) and \(\alpha _{i}^{*}\), we can derive the dual problem of the ν-SVR as follows

$$\begin{array}{@{}rcl@{}} \displaystyle{\min_{\alpha^{(*)}}}~~&&\!\!\!\frac{1}{2}\!\sum\limits_{i,j=1}^{l}\!(\alpha_{i}^{*}\,-\,\alpha_{i})(\alpha_{j}^{*} \,-\,\alpha_{j}\!)K(x_{i},x_{j}\!) \,-\,\sum\limits_{i=1}^{l}(\alpha_{i}^{*}\,-\,\alpha_{i})y_{i}\\ \text{s.t.}~~&&\sum\limits_{i=1}^{l}(\alpha_{i}-\alpha_{i}^{*})=0,\\ ~~&&0\leq\alpha_{i}^{({*})}\leq C/l,i=1,2,\cdots,l, \\ ~~&&\sum\limits_{i=1}^{l}(\alpha_{i}+\alpha_{i}^{*})\leq C\nu. \end{array} $$
(3)

Once the QPP (3) is solved, we obtain its solution \(\alpha ^{(*)} =(\alpha _{1},\alpha _{1}^{*},\alpha _{2},\alpha _{2}^{*},\cdots ,\alpha _{l},\alpha _{l}^{*})\) and the threshold b, and then the regression function,

$$\begin{array}{@{}rcl@{}} f(x)=\sum\limits_{i=1}^{l}(\alpha_{i}^{*}-\alpha_{i})K(x_{i},x)+b. \end{array} $$
(4)

Here K(x i ,x) represents a kernel function which gives the dot product in the high-dimensional feature space.
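As an illustration, the sketch below evaluates (4) with the Gaussian kernel used later in Section 5.3, assuming the dual variables and the threshold b have already been obtained from (3); the helper names are ours.

```python
import numpy as np

def rbf_kernel(X, Z, sigma):
    """Gaussian kernel K(x, z) = exp(-||x - z||^2 / sigma^2)."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / sigma ** 2)

def nu_svr_predict(X_test, X_train, alpha, alpha_star, b, sigma):
    """Regression function of Eq. (4): f(x) = sum_i (alpha_i^* - alpha_i) K(x_i, x) + b."""
    K = rbf_kernel(X_test, X_train, sigma)     # shape (n_test, l)
    return K @ (alpha_star - alpha) + b
```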

2.2 Asymmetric ν-support vector regression

The ν-SVR considers only one possible location of the 𝜖-tube: it imposes that the numbers of samples above and below the tube are equal. To further improve the accuracy, Huang allows these outliers to be divided asymmetrically over the two regions. To pursue an asymmetric tube, he introduces the following asymmetric loss function

$$\begin{array}{@{}rcl@{}} L_{\epsilon}^{p}(u)=\left\{ \begin{array}{cc} \frac{1}{2p}(u-\epsilon),& u\geq \epsilon,\\ 0, & -\epsilon< u<\epsilon,\\ \frac{1}{2(1-p)}(-u-\epsilon),& u\leq -\epsilon, \end{array} \right. \end{array} $$
(5)

where p is a parameter related to asymmetry. As we can learn that \(L_{\epsilon }^{p}(u)\) can reduce to L 𝜖 when p = 0.5.

The Asy- ν-SVR solves the following QPP,

$$\begin{array}{@{}rcl@{}} \displaystyle{\min_{w,b,\epsilon,\xi_{i},\xi_{i}^{*}}}~~&&\frac{1}{2\gamma}w^{T}w +\nu\epsilon+\frac{1}{l}\sum\limits_{i=1}^{l}(\xi_{i}+\xi_{i}^{*})\\ \text{s.t.}~~&&(w^{T}\phi(x_{i})+b-\epsilon)-y_{i} \leq 2(1-p)\xi_{i},i=1,\cdots, l,\\ ~~&&y_{i} -(w^{T}\phi(x_{i})+b+\epsilon)\leq 2p\xi_{i}^{*}, i=1,2, \cdots, l,\\ ~~&&\epsilon\geq 0,\xi_{i}\geq 0, \xi_{i}^{*}\geq 0, i=1,2, \cdots, l. \end{array} $$
(6)

The coefficients γ and ν control the trade-off among the margin, the size of the slack variables, and the width of the 𝜖-tube. The 𝜖, treated as an unknown, controls the width of the 𝜖-insensitive zone used to fit the training data, and it affects the number of support vectors. \(\xi_{i}\) and \(\xi _{i}^{*}\) are slack variables reflecting whether the samples lie inside the 𝜖-tube or not. Parameter p is related to the asymmetry, and the parameters γ, ν, and p are chosen in advance. Apparently, Asy- ν-SVR reduces to ν-SVR when p = 0.5, so Asy- ν-SVR is an extension of the ν-SVR.

We can derive the dual formulation of the Asy- ν-SVR as follows

$$\begin{array}{@{}rcl@{}} \displaystyle{\min_{\lambda^{*},\lambda}}~~&&\!\!\!\frac{1}{2}\!\sum\limits_{i,j=1}^{l} \!(\lambda_{i}^{*}\,-\,\lambda_{i})^{T}\!K(x_{i},x_{j}\!)(\lambda_{j}^{*}\,-\,\lambda_{j}\!) \,-\,\sum\limits_{i=1}^{l}\!y_{i}(\lambda_{i}^{*}\,-\,\lambda_{i}\!)\\ \text{s.t.}~~&&\sum\limits_{i=1}^{l}(\lambda_{i}^{*}-\lambda_{i})=0,\\ ~~&&\sum\limits_{i=1}^{l}(\lambda_{i}^{*}+\lambda_{i})\leq\nu\gamma,\\ ~~&&0\leq\lambda_{i}^{*}\leq \frac{\gamma}{2pl},i=1,2,\cdots,l, \\ ~~&&0\leq\lambda_{i}\leq \frac{\gamma}{2(1-p)l},i=1,2,\cdots,l. \end{array} $$
(7)

Once the QPP (7) is solved, we obtain its solution \(\lambda ^{(*)} =(\lambda _{1},\lambda _{1}^{*},\lambda _{2},\lambda _{2}^{*},\cdots ,\lambda _{l},\lambda _{l}^{*})\) and the threshold b, and then the following regressor

$$\begin{array}{@{}rcl@{}} f(x)=\sum\limits_{i=1}^{l}(\lambda_{i}^{*}-\lambda_{i})K(x,x_{i})+b. \end{array} $$
(8)

This extension gives an effective way to deal with skewed noise in regression problems.

2.3 Twin support vector regression

To improve the computational speed, Peng [7] proposed an efficient TSVR for the regression problem. TSVR generates an 𝜖-insensitive down-bound function \(f_{1}(x)={w_{1}^{T}}x+b_{1}\) and an 𝜖-insensitive up-bound function \(f_{2}(x)={w_{2}^{T}}x+b_{2}\). TSVR is illustrated in Fig. 1.

Fig. 1 Illustration of the TSVR

The final regressor f(x) is decided by the mean of these two bound functions, i.e.,

$$\begin{array}{@{}rcl@{}} f\!(x)\,=\,\frac{1}{2}(f_{1}(x)\,+\,f_{2}(x))\,=\,\frac{1}{2}(w_{1}\,+\,w_{2})^{T}x\,+\,\frac{1}{2}(b_{1}\,+\,b_{2}). \end{array} $$
(9)

For the nonlinear case, TSVR solves the following pair of smaller-sized QPPs,

$$\begin{array}{@{}rcl@{}} \displaystyle{\min_{w_{1},b_{1},\xi}} ~~&&\!\!\!\frac{1}{2}\|Y\,-\,e\epsilon_{1}\,-\,(K(A, A^{T})w_{1}\,+\,eb_{1})\|^{2}\,+\,C_{1}e^{T}\xi\\ \text{s.t.}~~&&Y-(K(A, A^{T})w_{1}+eb_{1})\geq e\epsilon_{1}-\xi,\\ ~~&&\xi\geq 0, \end{array} $$
(10)

and

$$\begin{array}{@{}rcl@{}} \displaystyle{\min_{w_{2},b_{2},\eta}} ~~&&\!\!\!\frac{1}{2}\|Y\,+\,e\epsilon_{2}\,-\,(K(A, A^{T})w_{2}\,+\,eb_{2})\|^{2}\,+\,C_{2}e^{T}\eta\\ \text{s.t.}~~&&(K(A, A^{T})w_{2}+eb_{2})-Y\geq e\epsilon_{2}-\eta,\\ ~~&&\eta\geq 0. \end{array} $$
(11)

In the objective function of (10) or (11), the first term minimizes the squared distances from the training points to f 1(x) + 𝜖 1 or f 2(x) − 𝜖 2, and the second term minimizes the sum of error variables. Parameters C 1 and C 2, chosen in advance, determine the trade-off between these two goals. The constraints require that the training points lie at least 𝜖 1 above the function f 1(x) and at least 𝜖 2 below the function f 2(x). ξ and η are slack vectors. For the outliers, the same penalty is given to them in TSVR.

After introducing the Lagrangian function L, and differentiating L with respect to variables, we can derive their dual formulations of (10) and (11) as follows

$$\begin{array}{@{}rcl@{}} \displaystyle{\max_{\alpha}} ~~&&\!\!\!-\frac{1}{2}\alpha^{T}\!H\!(H^{T}\!H)^{-1} H^{T}\alpha\,+\,f^{T}H\!(H^{T}\!H)^{-1}H^{T}\!\alpha\,-\,f^{T}\alpha\\ \text{s.t.}~~&& 0\leq \alpha \leq C_{1}e, \end{array} $$
(12)

and

$$\begin{array}{@{}rcl@{}} \displaystyle{\max_{\beta}} ~~&&\!\!\!-\frac{1}{2}\beta^{T}H\!(H^{T}\!H)^{-1}H^{T}\beta \,-\,h^{T}H\!(H^{T}\!H)^{-1}H^{T}\beta\,+\,h^{T}\beta\\ \text{s.t.}~~&&0 \leq \beta \leq C_{2}e, \end{array} $$
(13)

where \(H=[K(A,A^{T})~~e]\), \(f=Y-e\epsilon_{1}\), and \(h=Y+e\epsilon_{2}\).

Once the dual QPPs (12) and (13) are solved, we can get

$$\begin{array}{@{}rcl@{}} \left[\begin{array}{cc} w_{1}\\ b_{1} \end{array}\right]=(H^{T}H)^{-1}H^{T}(f-\alpha), \end{array} $$
(14)

and

$$\begin{array}{@{}rcl@{}} \left[\begin{array}{cc} w_{2}\\ b_{2} \end{array}\right]=(H^{T}H)^{-1}H^{T}(h+\beta). \end{array} $$
(15)

Note that TSVR comprises a pair of QPPs such that each QPP determines one of the up- or down-bound functions using only one group of constraints, compared with the standard SVR. Hence, TSVR solves two smaller-sized QPPs rather than a single large one, which implies that TSVR works faster than the standard SVR in theory.
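As an illustration of (14) and (15), the sketch below recovers the two bound functions from given dual solutions α and β; it assumes the kernel matrix K(A, A^T) and the dual vectors are already available, and the helper name is ours. A small regularization term can be added if H^T H is ill-conditioned, although that implementation detail is not discussed above.

```python
import numpy as np

def tsvr_bound_params(K_train, Y, eps1, eps2, alpha, beta):
    """Recover [w1; b1] and [w2; b2] of Eqs. (14)-(15) from the dual
    solutions alpha of (12) and beta of (13).
    K_train: (l, l) kernel matrix K(A, A^T); Y: (l,) target vector."""
    l = Y.shape[0]
    e = np.ones(l)
    H = np.hstack([K_train, e[:, None]])        # H = [K(A, A^T)  e]
    f = Y - eps1 * e                            # f = Y - e*eps1
    h = Y + eps2 * e                            # h = Y + e*eps2
    HtH_inv_Ht = np.linalg.solve(H.T @ H, H.T)  # (H^T H)^{-1} H^T
    u1 = HtH_inv_Ht @ (f - alpha)               # [w1; b1], Eq. (14)
    u2 = HtH_inv_Ht @ (h + beta)                # [w2; b2], Eq. (15)
    return u1[:-1], u1[-1], u2[:-1], u2[-1]
```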

3 Asymmetric ν-twin support vector regression

In TSVR, the same penalty is given to the points above the up-bound and below the down-bound. In fact, these two kinds of points have different effects on the regression function. Motivated by the studies above, we propose the following asymmetric ν-twin support vector regression based on the pinball loss function.

3.1 Linear case

We extend TSVR to the asymmetric case, where p is the parameter related to the asymmetry. Asy- ν-TSVR generates an 𝜖 1-insensitive down-bound function \(f_{1}(x)={w_{1}^{T}}x+b_{1}\) and an 𝜖 2-insensitive up-bound function \(f_{2}(x)={w_{2}^{T}}x+b_{2}\), and they are nonparallel. Asy- ν-TSVR is illustrated in Fig. 2.

Fig. 2 Illustration of the Asy- ν-TSVR

The final regressor f(x) is decided by the mean of these two bound functions, i.e.

$$\begin{array}{@{}rcl@{}} f(x)\,=\,\frac{1}{2}(f_{1}(x)\,+\,f_{2}(x))=\frac{1}{2}(w_{1}+w_{2})^{T}x+\frac{1}{2}(b_{1}+b_{2}). \end{array} $$

In TSVR, after replacing the 𝜖-insensitive loss function L 𝜖 by the pinball loss \(L_{\epsilon }^{p}(u)\), Asy- ν-TSVR solves the following pair of smaller-sized QPPs,

$$\begin{array}{@{}rcl@{}} \displaystyle{\min_{w_{1},b_{1},\epsilon_{1},\xi}}~~&&\!\!\!\frac{1}{2}\|Y\,-\,(Aw_{1}\,+\,eb_{1})\|^{2} \,+\,C_{1}\nu_{1}\epsilon_{1}\,+\,\frac{1}{l}C_{1}e^{T}\xi\\ \text{s.t.}~~&&Y-(Aw_{1}+eb_{1})\geq -e\epsilon_{1}-2(1-p)\xi,\\ ~~&&\epsilon_{1}\geq 0, \xi\geq 0, \end{array} $$
(16)

and

$$\begin{array}{@{}rcl@{}} \displaystyle{\min_{w_{2},b_{2},\epsilon_{2},\eta}}~~&&\!\!\!\frac{1}{2}\|Y\,-\,(Aw_{2}\,+\,eb_{2})\|^{2} \,+\,C_{2}\nu_{2}\epsilon_{2}\,+\,\frac{1}{l}C_{2}e^{T}\eta\\ \text{s.t.}~~&&(Aw_{2}+eb_{2})-Y\geq -e\epsilon_{2}-2p\eta,\\ ~~&&\epsilon_{2}\geq 0, \eta\geq 0 \end{array} $$
(17)

where C 1, C 2, ν 1, and ν 2 are parameters chosen in advance, and ξ and η are slack vectors.

The first term in the objective function of (16) or (17) minimizes the sum of squared distances from the estimated function \(f_{1}(x)={w_{1}^{T}}x+b_{1}\) or \(f_{2}(x)={w_{2}^{T}}x+b_{2}\) to the training points. The second term keeps the \(\epsilon_{1}\)-tube and \(\epsilon_{2}\)-tube as narrow as possible. The third term minimizes the sum of error variables. The constraints require the training points to lie above \(f_{1}(x)-\epsilon _{1}={w_{1}^{T}}x+b_{1}-\epsilon _{1}\) or below \(f_{2}(x)+\epsilon _{2}={w_{2}^{T}}x+b_{2}+\epsilon _{2}\) as far as possible. The slack vector ξ or η measures the error wherever this requirement is violated. Note that equal emphasis is given to ξ and η in TSVR. Here, for the outliers, we apply slightly different penalties through the parameter p, i.e., the pinball loss, which degrades into the 𝜖-insensitive loss when p = 0.5.

To derive the dual formulations of Asy- ν-TSVR, we first introduce the following Lagrangian function for the problem (16), which is

$$\begin{array}{@{}rcl@{}} \displaystyle ~~&&\!\!\!\!L(w_{1},b_{1},\epsilon_{1},\xi,\alpha,\beta,\gamma\!)\,=\, \frac{1}{2}\|Y\!\,-\,(Aw_{1}\!\,+\,eb_{1})\|^{2}\,+\,C_{1}\nu_{1}\epsilon_{1}+\!\frac{1}{l}C_{1}e^{T}\xi\\ ~~&&\!\!\!-\alpha^{T}\!(Y\,-\,(Aw_{1}\,+\,eb_{1})\,+\,e\epsilon_{1}\,+\,2(1\,-\,p)\xi)\,-\,\beta \epsilon_{1}\,-\,\gamma^{T}\xi, \end{array} $$
(18)

where α, β, and γ are the nonnegative Lagrange multipliers. After differentiating L in (18) with respect to the variables w 1, b 1, 𝜖 1, and ξ, we have

$$\begin{array}{@{}rcl@{}} \displaystyle \frac{\partial L}{\partial w_{1}}=-A^{T}(Y-(Aw_{1}+eb_{1}))+A^{T}\alpha=0, \end{array} $$
(19)
$$\begin{array}{@{}rcl@{}} \frac{\partial L}{\partial b_{1}}=-e^{T}(Y-(Aw_{1}+eb_{1}))+e^{T}\alpha=0, \end{array} $$
(20)
$$\begin{array}{@{}rcl@{}} \frac{\partial L}{\partial \epsilon_{1}}=C_{1}\nu_{1}-e^{T}\alpha-\beta=0, \end{array} $$
(21)
$$\begin{array}{@{}rcl@{}} \frac{\partial L}{\partial \xi}=\frac{1}{l}C_{1}e-2(1-p)\alpha-\gamma=0. \end{array} $$
(22)

Combining (19) and (20) leads to

$$\begin{array}{@{}rcl@{}} -\left[\begin{array}{cc} A^{T}\\ e^{T} \end{array}\right] \!\left( Y-[A~ e]\left[\begin{array}{cc} w_{1}\\ b_{1} \end{array}\right]\right)\,+\,\left[\begin{array}{cc} A^{T}\\ e^{T} \end{array}\right] \alpha=0. \end{array} $$
(23)

Define G = [A e], and μ 1 = [w 1,b 1]T, then we have

$$\begin{array}{@{}rcl@{}} \displaystyle-G^{T}(Y-G\mu_{1})+G^{T}\alpha=0. \end{array} $$
(24)

From (24), we can get

$$\begin{array}{@{}rcl@{}} \displaystyle \mu_{1}\,=\,(G^{T}G)^{-1}(G^{T}Y\,-\,G^{T}\alpha)\,=\,(G^{T}G)^{-1}G^{T}(Y\,-\,\alpha). \end{array} $$
(25)

Then,

$$\begin{array}{@{}rcl@{}} \displaystyle L\!\!&=&\!\!\frac{1}{2}\|Y\!\,-\,G\mu_{1}\!\|^{2}\,+\,C_{1}\nu_{1}\epsilon_{1}\,+\,\frac{1}{l}C_{1}e^{T}\xi\,-\,\alpha^{T}\!(Y\!\,-\,G\mu_{1} \,+\,e\epsilon_{1}\,+\,2(1\!\,-\,\!p)\xi)\\ &&-\beta\epsilon_{1}\,-\,\gamma^{T}\xi \,=\,\frac{1}{2}\|Y\,-\,G\mu_{1}\|^{2}\,-\,\alpha^{T}(Y\,-\,G\mu_{1})\\ &=&\!\!\frac{1}{2}\|Y\!\,-\,G\!(G^{T}\!G)^{-1}G^{T}\!(Y\!\,-\,\alpha)\|^{2} \,-\,\alpha^{T}\!(Y\!\,-\,G(G^{T}\!G)^{-1}G^{T}\!(Y\!\,-\,\alpha)\!). \end{array} $$
(26)

We can get

$$\begin{array}{@{}rcl@{}} \displaystyle L\,=\,-\frac{1}{2}\alpha^{T} G(G^{T}\!G)^{\!-1}G^{T} \!\alpha\,+\,Y^{T} \!G(G^{T}\!G)^{\!-1}G^{T} \!\alpha\,-\,Y^{T}\!\alpha. \end{array} $$
(27)

From (21) and (22), we can obtain the following constraints,

$$\begin{array}{@{}rcl@{}} \displaystyle e^{T}\alpha\leq C_{1}\nu_{1},~~0\leq\alpha\leq \frac{C_{1}}{2(1-p)l}e. \end{array} $$
(28)

Finally, we can derive the dual formulation of (16) as follows

$$\begin{array}{@{}rcl@{}} \displaystyle{\max_{\alpha}} ~~&&\!\!\!\!-\frac{1}{2}\alpha^{T} \!G(\!G^{T}\!G)^{\!-1}G^{T} \!\alpha\,+\,Y^{T} \!G(\!G^{T}\!G)^{\!-1}G^{T} \!\alpha\,-\,Y^{T}\!\alpha\\ \text{s.t.}~~&&e^{T}\alpha\leq C_{1}\nu_{1},\\ ~~&& 0\leq\alpha\leq \frac{C_{1}}{2(1-p)l}e. \end{array} $$
(29)

Similarly, we can obtain the dual formulation of (17) as

$$\begin{array}{@{}rcl@{}} \displaystyle{\max_{\gamma}} ~~&&\!\!\!-\frac{1}{2}\gamma^{T} \!G(\!G^{T}\!G)^{\!-1}G^{T} \!\gamma\!\,-\,\!Y^{T} \!G(G^{T}\!G)^{\!-1}G^{T} \!\gamma\,+\,Y^{T}\!\gamma\\ \text{s.t.}~~&&e^{T}\gamma\leq C_{2}\nu_{2},\\ ~~&&0\leq\gamma\leq \frac{C_{2}}{2pl}e. \end{array} $$
(30)

Once (30) is solved, we can obtain

$$\begin{array}{@{}rcl@{}} \left[\begin{array}{cc} w_{2}\\ b_{2} \end{array}\right]=(G^{T}G)^{-1}G^{T}(Y+\gamma). \end{array} $$
(31)
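To make the linear training procedure concrete, the following sketch solves the duals (29) and (30) with a generic constrained optimizer (SciPy's SLSQP, standing in for a dedicated QP solver) and then recovers the augmented vectors via (25) and (31); the function names and numerical settings are our own choices. In practice any off-the-shelf QP solver can replace the SLSQP call.

```python
import numpy as np
from scipy.optimize import minimize

def asy_nu_tsvr_linear(A, Y, C1, C2, nu1, nu2, p):
    """Sketch of linear Asy-nu-TSVR training: solve the duals (29)-(30)
    numerically and recover [w1; b1], [w2; b2] via (25) and (31)."""
    l = Y.shape[0]
    e = np.ones(l)
    G = np.hstack([A, e[:, None]])              # G = [A  e]
    P = G @ np.linalg.solve(G.T @ G, G.T)       # G (G^T G)^{-1} G^T

    def solve_dual(sign, C, nu, ub):
        # minimize 0.5 x^T P x + sign*(Y^T P x - Y^T x); sign = -1 for (29), +1 for (30)
        obj = lambda x: 0.5 * x @ P @ x + sign * (Y @ P @ x - Y @ x)
        grad = lambda x: P @ x + sign * (P @ Y - Y)
        cons = [{'type': 'ineq', 'fun': lambda x: C * nu - e @ x}]  # e^T x <= C*nu
        res = minimize(obj, x0=np.zeros(l), jac=grad, bounds=[(0.0, ub)] * l,
                       constraints=cons, method='SLSQP')
        return res.x

    alpha = solve_dual(-1.0, C1, nu1, C1 / (2.0 * (1.0 - p) * l))
    gamma = solve_dual(+1.0, C2, nu2, C2 / (2.0 * p * l))
    mu1 = np.linalg.solve(G.T @ G, G.T @ (Y - alpha))  # [w1; b1], Eq. (25)
    mu2 = np.linalg.solve(G.T @ G, G.T @ (Y + gamma))  # [w2; b2], Eq. (31)
    return mu1[:-1], mu1[-1], mu2[:-1], mu2[-1]
```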

3.2 Nonlinear case

In order to extend our model to the nonlinear case, we consider the following kernel-generated functions instead of linear functions,

$$\begin{array}{@{}rcl@{}} f_{1}(x)=K(x,A^{T})w_{1}+b_{1} ~~\text{and}~~ f_{2}(x)=K(x,A^{T})w_{2}+b_{2}. \end{array} $$
(32)

The corresponding formulations are designed as follows

$$\begin{array}{@{}rcl@{}} \displaystyle{\min_{w_{1},b_{1},\epsilon_{1},\xi}}~~&&\!\!\!\frac{1}{2}\|Y\,-\,(K(A,A^{T})w_{1}\,+\,eb_{1})\!\|^{2} \,+\,C_{1}\nu_{1}\epsilon_{1}\,+\,\frac{1}{l}C_{1}e^{T}\xi\\ \text{s.t.}~~&&Y\,-\,(K(A,A^{T})w_{1}\,+\,eb_{1})\!\geq \!-e\epsilon_{1}\,-\,2(1\,-\,p)\xi,\\ ~~&&\epsilon_{1}\geq 0,\xi\geq 0, \end{array} $$
(33)

and

$$\begin{array}{@{}rcl@{}} \displaystyle{\min_{w_{2},b_{2},\epsilon_{2},\eta}}~~&&\!\!\!\frac{1}{2}\|Y\,-\,(K(A,A^{T})w_{2}\,+\,eb_{2})\!\|^{2} \,+\,C_{2}\nu_{2}\epsilon_{2}\,+\,\frac{1}{l}C_{2}e^{T}\eta\\ \text{s.t.}~~&&(K(A,A^{T})w_{2}\,+\,eb_{2})\,-\,Y\!\geq\! -e\epsilon_{2}\,-\,2p\eta,\\ ~~&&\epsilon_{2}\geq 0,\eta\geq 0, \end{array} $$
(34)

where C 1, C 2, ν 1, and ν 2 are parameters chosen in advance, and ξ and η are slack vectors. The dual formulations of (33) and (34) can be derived as follows

$$\begin{array}{@{}rcl@{}} \displaystyle{\max_{\alpha}}~~&&\!\!\!-\frac{1}{2}\alpha^{T} H\!(H^{T}\!H)^{\!-1}H^{T} \!\alpha\,+\,Y^{T} H\!(H^{T}\!H)^{\!-1}H^{T} \!\alpha\,-\,Y^{T}\alpha\\ \text{s.t.}~~&&e^{T}\alpha\leq C_{1}\nu_{1},\\ ~~&&0\leq\alpha\leq \frac{C_{1}}{2(1-p)l}e, \end{array} $$
(35)

and

$$\begin{array}{@{}rcl@{}} \displaystyle{\max_{\gamma}}~~&&\!\!\!-\frac{1}{2}\gamma^{T} H\!(H^{T}\!H)^{-1}H^{T}\!\gamma\,-\,Y^{T} H\!(H^{T}\!H)^{-1}H^{T} \gamma\,+\,Y^{T}\gamma\\ \text{s.t.}~~&&e^{T}\gamma\leq C_{2}\nu_{2},\\ ~~&&0\leq\gamma\leq \frac{C_{2}}{2pl}e, \end{array} $$
(36)

where H = [K(A,A T) e]. Once (35) and (36) are solved, we can get the augmented vectors

$$\begin{array}{@{}rcl@{}} \left[\begin{array}{cc} w_{1}\\ b_{1} \end{array}\right]=(H^{T}H)^{-1}H^{T}(Y-\alpha), \end{array} $$
(37)

and

$$\begin{array}{@{}rcl@{}} \left[\begin{array}{cc} w_{2}\\ b_{2} \end{array}\right]=(H^{T}H)^{-1}H^{T}(Y+\gamma). \end{array} $$
(38)

After the optimal values w 1,w 2 and b 1,b 2 are calculated, the regression function is expressed as

$$\begin{array}{@{}rcl@{}} f\!(x)\,=\,\frac{1}{2}(f_{1}(x)\,+\,f_{2}(x)\!)\,=\,\frac{1}{2}(w_{1}\,+\,w_{2})^{T}\!K(x, A^{T})\,+\,\frac{1}{2}(b_{1}\,+\,b_{2}). \end{array} $$
(39)
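For completeness, a small sketch of the prediction step (39) is given below, assuming the augmented vectors have been recovered from (37) and (38) and using the Gaussian kernel of Section 5.3; all names are ours.

```python
import numpy as np

def rbf_kernel(X, Z, sigma):
    """Gaussian kernel k(x, z) = exp(-||x - z||^2 / sigma^2), as in Section 5.3."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / sigma ** 2)

def asy_nu_tsvr_predict(X_test, A_train, w1, b1, w2, b2, sigma):
    """Final nonlinear regressor of Eq. (39): the mean of the two
    kernel-generated bound functions of Eq. (32)."""
    K = rbf_kernel(X_test, A_train, sigma)   # row i holds K(x_i, A^T)
    f1 = K @ w1 + b1                         # eps1-insensitive down-bound
    f2 = K @ w2 + b2                         # eps2-insensitive up-bound
    return 0.5 * (f1 + f2)
```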

Compared with other algorithms, Asy- ν-TSVR has the following characteristics. First, it considers the sum of squared distances from the training points to the functions f 1(x) and f 2(x) themselves rather than to shifted versions of them. Second, Asy- ν-TSVR adopts the asymmetric loss function \(L_{\epsilon }^{p}(u)\) instead of the 𝜖-insensitive loss \(L_{\epsilon}(u)\), so different penalties are given to samples lying in different positions. Third, Asy- ν-TSVR degrades into the ν-TSVR when p = 0.5, so Asy- ν-TSVR is an extension of it.

4 Discussion about the bound

In this section, we discuss the bounds of three algorithms, i.e., ν-SVR, Asy- ν-SVR, and Asy- ν-TSVR. In ν-SVR, the points lying above and below the 𝜖-tube have the same upper bound. However, in Asy- ν-SVR and Asy- ν-TSVR, the points above the hyperplane \(w^{T}x+b+\epsilon=0\) and below the hyperplane \(w^{T}x+b-\epsilon=0\) have different upper bounds.

Proposition 1

The optimal solution in ν -SVR satisfies:

$$\begin{array}{@{}rcl@{}} ~~&&\sum\limits_{i=1}^{l}L(y_{i}< w^{T}x_{i}+b-\epsilon) \leq \frac{1}{2} l\nu,\\ ~~&&\sum\limits_{i=1}^{l}L(y_{i} > w^{T}x_{i}+b+\epsilon) \leq \frac{1}{2} l \nu,\\ ~~&&\sum\limits_{i=1}^{l}L(w^{T}x_{i}\,+\,b\,-\,\epsilon \!\leq\! y_{i} \!\leq\! w^{T}x_{i}\,+\,b\,+\,\epsilon) \!\geq l-l \nu, \end{array} $$
(40)

where L(a) stands for an indicator function, which is equal to one when a is true and zero otherwise.

Please refer to literature [5] for proof of this proposition.

Proposition 2

The optimal solution in Asy- ν -SVR satisfies:

$$\begin{array}{@{}rcl@{}} ~~&&\sum\limits_{i=1}^{l}L(y_{i}< w^{T}x_{i}+b-\epsilon) \leq (1-p)l \nu,\\ ~~&&\sum\limits_{i=1}^{l}L(y_{i} > w^{T}x_{i}+b+\epsilon) \leq p l \nu,\\ ~~&&\sum\limits_{i=1}^{l}L(w^{T}x_{i}\,+\,b\,-\,\epsilon \!\leq \!y_{i} \!\leq\! w^{T}x_{i}\,+\,b\!+\epsilon) \!\geq l-l\nu, \end{array} $$
(41)

where p is the parameter related to the asymmetry. This means that the numbers of points lying above and below the 𝜖-tube have different upper bounds, and when p = 0.5, they have the same upper bound as in ν-SVR.

The proof of this proposition is in [18]. We can find that there are different upper bounds for the samples lying above and below the 𝜖-tube in the Asy- ν-SVR.

Proposition 3

The optimal solution in Asy- ν -TSVR satisfies:

$$\begin{array}{@{}rcl@{}} ~~&&\!\!\sum\limits_{i=1}^{l}L(y_{i}< {w_{1}^{T}}x_{i}+b_{1}-\epsilon_{1}) \leq 2(1-p)l \nu_{1},\\ ~~&&\!\!\sum\limits_{i=1}^{l}L(y_{i} > {w_{2}^{T}}x_{i}+b_{2}+\epsilon_{2}) \leq 2p l \nu_{2},\\ ~~&&\!\!\sum\limits_{i=1}^{l}\!L({w_{1}^{T}}\!x_{i}\,+\,b_{1}\!\,-\,\epsilon_{1} \!\leq\! y_{i} \!\!\leq\! \!{w_{2}^{T}}x_{i}\,+\,b_{2}\,+\,\epsilon_{2})\!\!\geq \!l\!\,-\,2(\!1\,-\,\!p\!)l \nu_{1}\!\,-\,2p l \nu_{2}, \end{array} $$
(42)

where L(a) stands for an indicator function, which is equal to one when a is true and zero otherwise.

Proof

Any point below the line \({w_{1}^{T}}x+b_{1}-\epsilon _{1}=0\) satisfies \(({w_{1}^{T}}x_{i}+b_{1}-\epsilon _{1})-y_{i}=2(1-p)\xi _{i}\) with \(\xi_{i}> 0\). According to the complementary slackness condition, we have \(\gamma_{i}= 0\), and we can further get \(\alpha _{i}=\frac {C_{1}}{2(1-p)l}\) from (22). Since \(\beta\geq 0\), we get \(e^{T}\alpha\leq C_{1}\nu_{1}\) from (21), which means that \(\displaystyle \sum \limits _{i=1}^{l}\alpha _{i} \leq C_{1}\nu _{1}\). Since each such point contributes \(\frac {C_{1}}{2(1-p)l}\) to this sum, the number of such points is at most \(2(1-p)l\nu_{1}\), that is,

$$\begin{array}{@{}rcl@{}} \sum\limits_{i=1}^{l}L(y_{i}< {w_{1}^{T}}x_{i}+b_{1}-\epsilon_{1}) \leq 2(1-p)l \nu_{1}. \end{array} $$
(43)

Similarly, the points lying above the line \({w_{2}^{T}}x+b_{2}+\epsilon _{2}=0\) satisfy

$$\begin{array}{@{}rcl@{}} \sum\limits_{i=1}^{l}L(y_{i} > {w_{2}^{T}}x_{i}+b_{2}+\epsilon_{2}) \leq 2pl \nu_{2}. \end{array} $$
(44)

Combining (43) and (44), we can get

$$\begin{array}{@{}rcl@{}} \sum\limits_{i=1}^{l}\!L({w_{1}^{T}}\!x_{i}\!+b_{1}\!\,-\,\epsilon_{1} \!\!\leq \!y_{i}\! \leq\! {w_{2}^{T}}x_{i}\,+\,b_{2}\,+\,\epsilon_{2}) \!\geq \!l\!\,-\,2(1\!\,-\,\!p)l \nu_{1}\!\,-\,2pl \nu_{2}. \end{array} $$
(45)

Note that there are different upper bounds for the samples lying above the upper bound and below the down-bound in the Asy- ν-TSVR.
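As a sanity check, the bounds of Proposition 3 can be inspected empirically for a trained linear model; the sketch below simply counts the three indicator sums of (42) and returns them with their theoretical bounds. The helper name is ours.

```python
import numpy as np

def check_proposition3(X, Y, w1, b1, eps1, w2, b2, eps2, nu1, nu2, p):
    """Count the indicator sums of Eq. (42) for a linear Asy-nu-TSVR model
    and return them together with the bounds of Proposition 3."""
    l = Y.shape[0]
    f1 = X @ w1 + b1                                     # down-bound function
    f2 = X @ w2 + b2                                     # up-bound function
    below = int(np.sum(Y < f1 - eps1))                   # points under f1 - eps1
    above = int(np.sum(Y > f2 + eps2))                   # points over  f2 + eps2
    inside = int(np.sum((f1 - eps1 <= Y) & (Y <= f2 + eps2)))
    bounds = (2 * (1 - p) * l * nu1,                     # upper bound on `below`
              2 * p * l * nu2,                           # upper bound on `above`
              l - 2 * (1 - p) * l * nu1 - 2 * p * l * nu2)  # lower bound on `inside`
    return (below, above, inside), bounds
```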

5 Numerical experiments

To demonstrate the validity of our Asy- ν-TSVR, we compare it with four other algorithms, i.e., ν-SVR, LS-SVR, TSVR, and Asy- ν-SVR, using one artificial dataset, eleven benchmark datasets, and a real wheat dataset. For each dataset, we use fivefold cross-validation to evaluate the performance of the five algorithms. That is, the dataset is split randomly into five subsets; one of them is reserved as the test set, and the rest are used as the training set. This process is repeated five times, and the average of the five testing results is used as the performance measure.

5.1 Evaluation criteria

In order to evaluate the performance of our Asy- ν-TSVR, the evaluation criteria are specified before presenting the experimental results. The size of the testing set is denoted by m, y i denotes the real value of a sample point x i , \(\hat{y}_{i}\) denotes the predicted value of x i , and \(\displaystyle \bar {y}=\frac {1}{m}\sum \limits _{i=1}^{m}y_{i}\) is the mean of y 1,y 2,⋯ ,y m . We use the following criteria for algorithm evaluation [34].

MAE: Mean absolute error, defined as

$$\begin{array}{@{}rcl@{}} \text{MAE}= \displaystyle\frac{1}{m}\sum\limits_{i=1}^{m}|y_{i}-\hat{y}_{i}|. \end{array} $$
(46)

MAE is also a popular deviation measurement between the real and predicted values.

RMSE: Root mean squared error, defined as

$$\begin{array}{@{}rcl@{}} \text{RMSE}= \displaystyle\sqrt{\frac{1}{m}\sum\limits_{i=1}^{m}(y_{i}-\hat{y}_{i})^{2}}. \end{array} $$
(47)

SSE/SST: Ratio between sum of squared error and sum of squared deviation of testing samples, defined as

$$\begin{array}{@{}rcl@{}} \mathrm{SSE/SST}={\displaystyle\sum\limits_{i=1}^{m}(y_{i}-\hat{y}_{i})^{2}}/{\displaystyle\sum\limits_{i=1}^{m}(y_{i}-\bar{y})^{2}}. \end{array} $$
(48)

SSR/SST: Ratio between interpretable sum of squared deviation and real sum of squared deviation of testing samples, defined as

$$\begin{array}{@{}rcl@{}} \mathrm{SSR/SST}=\displaystyle\sum\limits_{i=1}^{m}(\hat{y}_{i}-\bar{y})^{2}/{\sum\limits_{i=1}^{m}(y_{i}-\bar{y})^{2}}. \end{array} $$
(49)

In most cases, small SSE/SST means there is good agreement between the estimates and the real values, and decreasing SSE/SST is usually accompanied by an increase in SSR/SST.

Time: The total training time and testing time.
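These criteria translate directly into code; the following sketch (with a function name of our choosing) computes (46)-(49) for a vector of predictions.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Evaluation criteria of Eqs. (46)-(49)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    m = y_true.size
    y_bar = y_true.mean()
    mae = np.abs(y_true - y_pred).sum() / m                 # Eq. (46)
    rmse = np.sqrt(((y_true - y_pred) ** 2).sum() / m)      # Eq. (47)
    sst = ((y_true - y_bar) ** 2).sum()
    sse_sst = ((y_true - y_pred) ** 2).sum() / sst          # Eq. (48)
    ssr_sst = ((y_pred - y_bar) ** 2).sum() / sst           # Eq. (49)
    return {'MAE': mae, 'RMSE': rmse, 'SSE/SST': sse_sst, 'SSR/SST': ssr_sst}
```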

5.2 Parameter selection

The performance of these five algorithms depends heavily on the choice of parameters. In our experiments, we choose the optimal parameter values by the grid search method. In all five algorithms, the Gaussian kernel parameter σ is selected from the set \(\{2^{i}\,|\,i=-4,-3,\cdots,8\}\). In TSVR and Asy- ν-TSVR, we set C 1 = C 2, ν 1 = ν 2, and 𝜖 1 = 𝜖 2 to reduce the computational complexity of parameter selection. The parameter C is searched from the set \(\{2^{i}\,|\,i=-3,-2,\cdots,8\}\). The optimal value of ν in the four algorithms is chosen from the set {0.1,0.2,⋯ ,0.9}, and the optimal value of 𝜖 is selected from the same set. Parameter p is searched from the set {0.2,0.4,0.45,0.5,0.55,0.6,0.8}.
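The grid search itself can be organized as in the following sketch, where train_and_score is a hypothetical callback that runs the fivefold cross-validation of Section 5 and returns the average MAE; the grids mirror the sets listed above.

```python
import itertools

sigma_grid = [2.0 ** i for i in range(-4, 9)]   # {2^i | i = -4, ..., 8}
C_grid     = [2.0 ** i for i in range(-3, 9)]   # {2^i | i = -3, ..., 8}
nu_grid    = [0.1 * i for i in range(1, 10)]    # {0.1, 0.2, ..., 0.9}
p_grid     = [0.2, 0.4, 0.45, 0.5, 0.55, 0.6, 0.8]

def grid_search(train_and_score):
    """Exhaustive search over the parameter grids of Section 5.2.
    train_and_score(sigma, C, nu, p) is assumed to return the average MAE."""
    best_score, best_params = float('inf'), None
    for sigma, C, nu, p in itertools.product(sigma_grid, C_grid, nu_grid, p_grid):
        # C1 = C2 = C and nu1 = nu2 = nu, as in the paper, to shrink the search space
        score = train_and_score(sigma=sigma, C=C, nu=nu, p=p)
        if score < best_score:
            best_score, best_params = score, {'sigma': sigma, 'C': C, 'nu': nu, 'p': p}
    return best_score, best_params
```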

5.3 Experiment on artificial dataset with noises

To evaluate the performance of Asy- ν-TSVR, we carry out an artificial experiment under different noise settings. We first generate 100 points X i , i = 1,⋯ ,100, following a uniform distribution in \([0,1]^{5}\), and the linear function \(Y(X)=w^{T}X+b+(\delta _{\chi ^{2}}-4)\) with w = [1;0.5;−0.5;−1;2] and b = −3 is used to calculate their values, where \(\delta _{\chi ^{2}}\) follows a chi-squared distribution with 4 degrees of freedom. The generated points (X i ,Y i ), i = 1,⋯ ,100, are regarded as samples. Subsequently, we randomly select 5% of the samples and replace their observed values by random values following a uniform distribution on [−15,15]; thus, a new set of samples is produced. Finally, a uniform distribution on [−30,30] is used to build another set of samples. Fivefold cross-validation is applied as before, and the experimental results of the five algorithms are listed in Table 1. Apparently, our proposed Asy- ν-TSVR always achieves the lowest MAE among the five algorithms on this artificial dataset under different noises. The reason may be that Asy- ν-TSVR adopts the pinball loss function whereas ν-SVR, LS-SVR, and TSVR adopt the 𝜖-insensitive loss, which makes Asy- ν-TSVR less sensitive to noise and gives it better generalization ability, since samples in different positions are given different punishments. The Gaussian kernel function \(k(x_{i},x_{j})=\exp(-\|x_{i}-x_{j}\|^{2}/\sigma^{2})\) is used on this artificial dataset. The optimal parameters used in the experiment are listed in the last column of Table 1.

Table 1 Testing errors on artificial datasets with different noises
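The data generation described above can be reproduced schematically as follows; the random seed and function name are our own choices, and the paper does not specify its random number generator.

```python
import numpy as np

def make_artificial(outlier_range, n=100, seed=0):
    """Artificial data of Section 5.3: a linear signal with chi-square noise
    (4 d.o.f., mean removed) and 5% uniform outliers.
    outlier_range is (-15, 15) or (-30, 30) for the two noise settings."""
    rng = np.random.default_rng(seed)
    w, b = np.array([1.0, 0.5, -0.5, -1.0, 2.0]), -3.0
    X = rng.uniform(0.0, 1.0, size=(n, 5))                  # uniform in [0, 1]^5
    Y = X @ w + b + (rng.chisquare(df=4, size=n) - 4.0)     # centered chi-square noise
    idx = rng.choice(n, size=int(0.05 * n), replace=False)  # 5% of the samples
    Y[idx] = rng.uniform(outlier_range[0], outlier_range[1], size=idx.size)
    return X, Y
```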

5.4 Experiments on benchmark datasets

To further verify the efficiency of our algorithm, we conduct experiments on eleven benchmark datasets from the UCI machine learning repository. The datasets are Auto Price, Bodyfat, Chwirut, Con. S, Diabetes, Machine-Cpu, Pyrimidines, Triazines, Housing, Istanbul Stock Exchange, and Yacht Hydrodynamics. Both the linear kernel and the Gaussian kernel are considered for the five algorithms. Moreover, statistical tests, including the Wilcoxon signed-rank test and the Friedman test, are used to demonstrate the validity of our proposed method. Finally, we also study the relationship between the efficiency of our Asy- ν-TSVR and the asymmetry of the datasets in Section 5.4.5.

5.4.1 Result comparison and discussion

The experimental results of the five algorithms are summarized in Table 2 when the linear kernel is employed and in Table 3 when the Gaussian kernel is used. In the error items, the first value denotes the mean of the five testing results, and the second value is the standard deviation (reported as plus or minus). Time denotes the mean running time over the five experiments, where each experiment's time consists of the training time and the testing time.

Table 2 Performance comparisons of five algorithms with linear case on eleven benchmark datasets
Table 3 Performance comparisons of five algorithms with Gaussian kernel on eleven benchmark datasets

In terms of the MAE criterion, from Table 2 we can find that Asy- ν-SVR produces the lowest testing error among the five algorithms in most cases when the linear kernel is employed, followed by Asy- ν-TSVR. Both of them employ the pinball loss, which implies that the pinball loss is more suitable than the 𝜖-insensitive loss for these datasets. In addition, in terms of the RMSE criterion, Asy- ν-TSVR yields testing errors comparable to those of Asy- ν-SVR and TSVR, which further shows that the pinball loss is suitable for these datasets. Meanwhile, we can find that small MAE and RMSE correspond to small SSE/SST and large SSR/SST in most cases.

In terms of both the MAE and RMSE criteria, from Table 3 we can find that Asy- ν-TSVR yields the lowest testing errors among the five algorithms on most datasets, followed by TSVR. On the Pyrimidines dataset, although Asy- ν-TSVR produces a slightly higher MAE than ν-SVR and Asy- ν-SVR, it produces the lowest RMSE among the five algorithms. In addition, we can also find that Asy- ν-SVR yields a slightly lower MAE than ν-SVR. This further testifies that the pinball loss function is effective in Asy- ν-TSVR. Meanwhile, we can find that small MAE and RMSE correspond to small SSE/SST and large SSR/SST in most cases.

In terms of computational time, from Tables 2 and 3 we can find that TSVR costs more running time than Asy- ν-TSVR in most cases, which means that the introduction of the pinball loss function does not increase the computational cost of Asy- ν-TSVR. In addition, ν-SVR and Asy- ν-SVR require more running time than TSVR and Asy- ν-TSVR. The main reason is that they solve one larger-sized QPP whereas TSVR and Asy- ν-TSVR solve a pair of smaller-sized QPPs. LS-SVR costs the least running time among the five algorithms since it solves a system of linear equations rather than a QPP.

To further examine the computational burden with different p values in our Asy- ν-TSVR, we analyze the experiments on the 11 benchmark datasets with different p values. The average computational time on each dataset is summarized in Table 4, and the MAE of Asy- ν-SVR and Asy- ν-TSVR is displayed in Table 5.

Table 4 The average of computational time with different p values for Asy- ν-TSVR with Gaussian kernel
Table 5 The average of MAE with different p values for Asy- ν-SVR and Asy- ν-TSVR with Gaussian kernel

From Table 4, we can find that Asy- ν-TSVR costs less time when the p value is near 0.5; that is, a smaller or larger p value increases the computational burden. From Table 5, we can see that the MAE of both Asy- ν-SVR and Asy- ν-TSVR is monotonic as p varies from 0.2 to 0.8 for most datasets. For example, the MAE is monotonically decreasing on Auto Price and Machine-Cpu, while it is monotonically increasing on Pyrimidines. The p value controls the imbalance ratio of the punishments on different samples. The experimental results in Table 5 verify that the introduction of the pinball loss can actually improve the performance of the model on asymmetric datasets, and different datasets favor different p values.

5.4.2 Wilcoxon signed-rank test

In these benchmark experiments, both the linear kernel and the nonlinear kernel are employed. To verify which is more effective, the Wilcoxon signed-rank test [35] is used.

The Wilcoxon signed-rank test is a nonparametric statistical hypothesis test used when comparing two related samples, matched samples, or repeated measurements on a single sample to assess whether their population mean ranks differ (i.e., it is a paired difference test). Here, we use the “signrank” function in Matlab to perform the Wilcoxon signed-rank test. For each algorithm, the null hypothesis H 0 is that the MAE in the linear case is larger than that in the nonlinear case.

Table 6 lists the p values of a right-sided Wilcoxon signed-rank test, the test statistic W, and a logical value h indicating the test decision. h = 1 indicates a rejection of the null hypothesis, and h = 0 indicates a failure to reject the null hypothesis at the 5% significance level. The results are h = 0 for all five algorithms, which is consistent with the nonlinear case being superior to the linear case.

Table 6 Results of the Wilcoxon signed-rank test between the linear case and nonlinear case on five algorithms
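Such a test can also be run with SciPy, as in the sketch below (the paper itself uses Matlab's signrank); the argument names are hypothetical placeholders for the per-dataset MAE values of one algorithm under the two kernels.

```python
from scipy.stats import wilcoxon

def compare_kernels(mae_linear, mae_gaussian, alpha=0.05):
    """One-sided Wilcoxon signed-rank test on paired per-dataset MAE values,
    testing whether the linear-kernel MAE tends to exceed the Gaussian-kernel MAE."""
    stat, p_value = wilcoxon(mae_linear, mae_gaussian, alternative='greater')
    h = int(p_value < alpha)    # 1 = reject the null hypothesis at level alpha
    return stat, p_value, h
```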

5.4.3 Paired t test

From Table 3, one can observe that our proposed Asy- ν-TSVR does not outperform the other four algorithms on all datasets in the nonlinear case. We use the five testing results corresponding to the lowest error to perform the paired t test. The null hypothesis H 0 is that there is no significant difference between the two algorithms tested; H 0 is rejected if the p value is less than 0.05 under the significance level α = 0.05. We compute the p values between Asy- ν-TSVR and the other algorithms. From the experimental results, we can find that there are significant differences between Asy- ν-TSVR and the other algorithms on the Chwirut, Housing, and Yacht Hydrodynamics datasets, whereas there is no significant difference on the other datasets.
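A corresponding SciPy sketch of such a paired t test, with hypothetical argument names, is:

```python
from scipy.stats import ttest_rel

def paired_t_test(errors_a, errors_b, alpha=0.05):
    """Paired t test on the five per-fold testing errors of two algorithms,
    as described in Section 5.4.3; significant if the p value is below alpha."""
    stat, p_value = ttest_rel(errors_a, errors_b)
    return p_value, bool(p_value < alpha)
```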

5.4.4 Friedman test

To further demonstrate the validity of our proposed Asy- ν-TSVR in the nonlinear case, the Friedman test [36, 37] is employed. We adopt the Friedman test with the corresponding post hoc tests, which is considered a simple, nonparametric yet safe test. For this, the average ranks of the five algorithms on MAE over all datasets are calculated and listed in Table 7. Under the null hypothesis that all algorithms are equivalent, one can compute the Friedman statistic according to (50),

$$\begin{array}{@{}rcl@{}} {\chi_{F}^{2}}=\frac{12N}{k(k+1)}\left[\sum\limits_{j}{R_{j}^{2}}-\frac{k(k+1)^{2}}{4}\right], \end{array} $$
(50)

where \(\displaystyle R_{j}=\frac {1}{N}\sum \limits _{i}{r_{i}^{j}}\), and \({r_{i}^{j}}\) denotes the rank of the j-th of the k algorithms on the i-th of the N datasets. Friedman’s \({\chi _{F}^{2}}\) is undesirably conservative, and a better statistic is

$$\begin{array}{@{}rcl@{}} F_{F}=\frac{(N-1){\chi_{F}^{2}}}{N(k-1)-{\chi_{F}^{2}}}, \end{array} $$
(51)

which is distributed according to the F-distribution with (k − 1) and (k − 1)(N − 1) degrees of freedom.

Table 7 Average ranks of five algorithms with Gaussian kernel on MAE values

We can obtain \({\chi _{F}^{2}}=12.09\) and F F = 3.789 according to (50) and (51), where F F is distributed according to the F-distribution with (4, 40) degrees of freedom. The critical value of F(4, 40) is 2.61 for the significance level α = 0.05; similarly, it is 3.13 for α = 0.025 and 3.83 for α = 0.01. Since the value of F F exceeds the critical values at the 5% and 2.5% significance levels, there is a significant difference between the five algorithms. Note that the average rank of Asy- ν-TSVR is far lower than those of the remaining algorithms, which means that our Asy- ν-TSVR is more valid than the other four algorithms.
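Both statistics are straightforward to compute from the average ranks; a sketch (with names of our choosing) is given below.

```python
import numpy as np

def friedman_statistics(avg_ranks, N):
    """Friedman statistic of Eq. (50) and the corrected statistic of Eq. (51),
    from the average ranks R_j of k algorithms over N datasets."""
    R = np.asarray(avg_ranks, dtype=float)
    k = R.size
    chi2_F = 12.0 * N / (k * (k + 1)) * (np.sum(R ** 2) - k * (k + 1) ** 2 / 4.0)
    F_F = (N - 1) * chi2_F / (N * (k - 1) - chi2_F)
    return chi2_F, F_F

# Consistency check with the values reported above (k = 5, N = 11):
# chi2_F = 12.09 gives F_F = 10 * 12.09 / (44 - 12.09) ≈ 3.79.
```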

5.4.5 Analyze the asymmetry of two datasets

In order to analyze the asymmetry of the datasets, a kernel function-based method is used to estimate the p.d.f. of \(y-\widehat f(x)\). In the following, the p.d.f. of \(y-\widehat f(x)\) obtained by ν-SVR and by Asy- ν-TSVR are shown: the results for the Chwirut dataset are given in Fig. 3 and those for the Bodyfat dataset in Fig. 4.
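One possible way to obtain such a density estimate is a Gaussian kernel density estimator, as in the sketch below; the paper does not state its exact estimator settings, so this is only an illustration.

```python
import numpy as np
from scipy.stats import gaussian_kde

def residual_density(y_true, y_pred, num_points=200):
    """Kernel density estimate of the residuals y - f_hat(x), used in
    Section 5.4.5 to visualise the (a)symmetry of a dataset."""
    residuals = np.asarray(y_true, float) - np.asarray(y_pred, float)
    kde = gaussian_kde(residuals)                   # Gaussian KDE with default bandwidth
    grid = np.linspace(residuals.min(), residuals.max(), num_points)
    return grid, kde(grid)
```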

Fig. 3 P.d.f. of y − f(x) with ν-SVR and Asy- ν-TSVR on the Chwirut dataset

Fig. 4 P.d.f. of y − f(x) with ν-SVR and Asy- ν-TSVR on the Bodyfat dataset

Apparently, Figs. 3 and 4 imply that the asymmetry of the Bodyfat dataset is not obvious, whereas the Chwirut dataset shows noticeable asymmetry. Nevertheless, we can find from Tables 2 and 3 that our proposed Asy- ν-TSVR outperforms the other algorithms on both of these datasets. This is because the pinball loss function enhances the generalization ability of our Asy- ν-TSVR for asymmetric datasets, and symmetric datasets can be regarded as a special case of asymmetric datasets.

5.5 Experiment on wheat dataset

There are 210 wheat samples from all over China in this real-data experiment [34]. The protein content of the wheat ranges from 9.83 to 20.26%, and the wet gluten ranges from 14.8 to 44.6%. The samples were provided by the Heilongjiang Research Institute of Agricultural Science. Each sample has 1193 spectral features. The samples were scanned in transmission mode using a commercial MATRIX-I spectrometer. They were acquired in a rectangular quartz cuvette of 1-mm path length with air as the reference at room temperature (20–24 °C). The reference spectrum was subtracted from the sample spectra to remove background noise, and the rectangular quartz cuvette was cleaned after each sample was scanned to minimize cross-contamination.

In this high-dimensional data experiment, we predict the protein content and the wet gluten of wheat using the 1193 spectral features. Since the Gaussian kernel yields better generalization performance than the linear kernel, we only consider the former; the prediction errors of the five algorithms are summarized in Tables 8 and 9.

Table 8 Performance comparisons of five algorithms with Gaussian kernel on protein content of wheat
Table 9 Performance comparisons of five algorithms with Gaussian kernel function on wet gluten of wheat

From Tables 8 and 9, we can learn that Asy- ν-TSVR yields the lowest prediction errors (0.9122% for protein content and 2.2957% for wet gluten) among the five algorithms, followed by TSVR. LS-SVR outperforms ν-SVR and Asy- ν-SVR on the prediction of both protein content and wet gluten. ν-SVR and Asy- ν-SVR produce comparable testing errors, with Asy- ν-SVR producing slightly lower errors than ν-SVR. This implies that the pinball loss is effective in ν-SVR and TSVR. In terms of computational time, LS-SVR costs the shortest running time since it solves a system of linear equations, whereas the other four algorithms solve a pair of smaller-sized QPPs or one larger-sized QPP.

6 Conclusion

In this paper, we propose an Asy- ν-TSVR for the regression problem. Asy- ν-TSVR solves a pair of smaller-sized QPPs instead of one larger-sized QPP as in the traditional ν-SVR, so it works faster than Asy- ν-SVR. Asy- ν-TSVR employs the pinball loss function \(L_{\epsilon }^{p}(u)\) instead of the 𝜖-insensitive loss function \(L_{\epsilon}(u)\), so it can effectively reduce the disturbance of noise and improve the generalization performance. Three kinds of experiments demonstrate the validity of our proposed Asy- ν-TSVR. Asy- ν-TSVR degrades into the ν-TSVR when p = 0.5, so our Asy- ν-TSVR is an extended version of the ν-TSVR, and it is applicable to both symmetric and asymmetric datasets. How to apply the pinball loss function to other TSVMs is our future work.