1 Introduction

The support vector machine (SVM) is a supervised machine learning model that has been presented for classification and regression analysis (Vapnik 1995; Yang and Xu 2017; Mao et al. 2014; Santhanama et al. 2016). SVM training is formulated as a convex quadratic optimization problem and then solved by a quadratic programming (QP) technique (Vapnik 1999). SVM can also classify nonlinearly separable samples by using a kernel function (Cortes and Vapnik 1995; Xue et al. 2018; Moghaddam and Hamidzadeh 2016).

In real-world applications, complex information is often subject to uncertainty, which reduces the efficiency and accuracy of decision-making systems. Machine learning can support such decision-making through classification algorithms, but this requires algorithms with high tolerance to uncertainty, notably noise. The classification hyperplane obtained by SVM is determined by the support vectors, so the presence of noise degrades standard SVM training and deviates the decision boundary from the optimal hyperplane. In machine learning, many techniques have been proposed to improve classification (Alcantud et al. 2019; Hamidzadeh and Moradi 2018, 2020; Hamidzadeh et al. 2014, 2017; Hamidzadeh and Namaei 2018; Hamidzadeh and Ghadamyari 2019; Javid and Hamidzadeh 2019). In particular, many methods improve SVM classification by identifying uncertain samples, such as noisy and outlier ones, in order to discard them (Han et al. 2016; Nguyen et al. 2018; Xu et al. 2016). Other solutions reduce the effect of unimportant samples by weighting them within weighted support vector machine classification methods (Karal 2017; Sheng et al. 2015; Zhou et al. 2016; Yang et al. 2005). However, the previous weighted SVM methods based on probabilities cannot accurately reduce the effect of noisy samples in SVM training.

Fuzzy aspects play a crucial role in real-world applications, owing to the sophisticated information present in domains such as medical diagnosis, pattern recognition, and big data analysis. Given this importance, probabilistic and fuzzy methods are effective (Singh et al. 2020; Sivasankar et al. 2020). The efficiency and speed of a classification-based model also matter, so a method that classifies both precisely and quickly is needed; to this end, fuzzy rough set theory is redesigned here. This paper presents a method to lessen the effect of noise in soft-margin SVM training using fuzzy rough set theory (Dubois and Prade 1990). In the proposed method, a weight coefficient is added to the penalty term of the Lagrangian formulation of the optimization problem. This coefficient, called the entropy degree, is built from the lower and upper approximation membership functions of fuzzy rough set theory. As a result, in the proposed method, WSVM-FRS (Weighted SVM-Fuzzy Rough Set), noisy samples receive a low degree and important samples a high degree. The results show good classification accuracy, precision, and recall for SVM training.

The rest of this paper is organized as follows: In Sect. 2, a survey of weighted support vector machine algorithms is presented. In Sect. 3, primary concepts of support vector machine and fuzzy rough set theory are presented. In Sect. 4, the proposed method is introduced. The experimental results are shown in Sect. 5. Finally, Sect. 6 contains conclusions and future works.

2 Related works

Several methods have been proposed for weighted support vector machines. This section surveys some important ones that deal with noisy and outlier samples.

The weighted support vector machine (WSVM) (Yang et al. 2005) was presented to mitigate the outlier sensitivity problem of SVM. Its basic idea is possibilistic c-means (PCM), extended into kernel space to generate different weight values for the main training data points and the outliers according to their relative importance in the training set. In Du et al. (2017), a fuzzy compensation multiclass SVM method was introduced to improve the outlier and noise sensitivity problem; it gives dual effects to the penalty term by treating every data point as belonging to both the positive and the negative class, but with different memberships. In Sheng et al. (2015), a method was presented to reduce noise sensitivity based on the fuzzy least squares support vector machine; by applying fuzzy inference and nonlinear correlation measurement, the effects of samples with low confidence can be reduced. WDRSVM (Li et al. 2016) is a weighted doubly regularized support vector machine that deals with noise by using the distance information both between classes and within each class. The lncosh loss (Karal 2017) was introduced to obtain support vector regression (SVR) models that cope with different noise distributions and is optimal in the maximum likelihood sense for hyper-secant error distributions. In Ding et al. (2017), WLMSVM, a weighted linear loss multiple birth support vector machine based on information granulation, was presented as a new classifier for multiclass classification that enhances the performance of multiple WLTSVM.

BWSVM (Sun et al. 2017) is a band-weighted support vector machine that quantifies the divergent contributions of different bands when implementing SVM; BWSVM adds an L1-norm penalty term on the band weights to the original SVM. In Li et al. (2017), new weighting mechanisms for both the loss and the penalty were introduced; the resulting weight partly adaptive elastic net handles the binary classification of noisy microarray data by using the distances from the sample points to both class centers. In Lu et al. (2017), a probabilistic weighted least squares SVM method was presented to model such processes under noise; this method increases robustness and accuracy even with outliers or non-Gaussian noise. RLS-SVM (Yang et al. 2014) is based on the truncated least squares loss function for regression and classification with noise. DS-RLSSVM (Zhou et al. 2016) was developed to model complex systems in the presence of various types of random noise; it integrates the distributed LS-SVM with fuzzy clustering to construct the evidence for the LS-SVM parameters. In Zhang et al. (2018), a method called feature weighted confidence with SVM was presented to incorporate prior knowledge into SVM through sample confidence, which it computes directly from the weights of prior features provided by SVM. In Xu et al. (2015), a new support vector weighted quantile regression approach, closely built upon the idea of the support vector machine, was introduced; it is estimated by solving a Lagrangian dual quadratic programming problem and implements nonlinear quantile regression by introducing a kernel function. In Tang et al. (2019), an approach integrating piecewise linear representation and a weighted support vector machine was introduced to forecast stock turning points. K-SRLSSVCR (Ma et al. 2019) is a robust least squares version of K-SVCR based on the squared ε-insensitive ramp loss and the truncated least squares loss, which partially suppress the impact of outliers on the model through their nonconvexity.

Overall, in real-world applications the productivity and speed of a classification-based model cannot be ignored, given the complexity of the information involved. It is therefore essential to introduce a method that decides and classifies precisely, and fuzzy aspects and probability distributions can be effective here. Although the previous methods have increased the precision and accuracy of the SVM classifier, they have not been able to handle uncertainty aspects such as noisy and outlier samples based on a probability distribution. To realize this aim with a more precise and quick method, the fuzzy rough set theory is redesigned. Following this strategy, this paper proposes a new weighted support vector machine method based on fuzzy rough set theory in order to decrease uncertainty.

3 Preliminaries

In this section, a brief overview of the constructive concepts of the proposed method is presented. In Sect. 3.1, the concepts of the support vector machine are reviewed. In Sect. 3.2, the basic concepts of fuzzy rough set theory are presented.

3.1 Support vector machine

Suppose that there is a set of training samples \(\{ (x_{i} ,y_{i} ),\;x_{i} \in R^{d} ,\;y_{i} \in \{ + 1, - 1\} ,\;i = 1, \ldots ,N\} ,\) where \(x_{i}\) represents the ith sample and \(y_{i}\) its corresponding class label. SVM aims to find a hyperplane with normal vector W that separates the positive training samples from the negative ones while maximizing the margin between the two classes. SVM can also be extended from two-class to multiclass problems (Hsu and Lin 2002), with \(y_{i} \in \{ 1, \ldots ,C_{i} \} ,\) where \(C_{i}\) represents the number of classes. Two common implementations of multiclass SVM classification are the one-against-all and the one-against-one method; in this paper, the one-against-one method is used. Maximizing the margin amounts to minimizing \(\left\| W \right\|\), which leads to the primal quadratic program of SVM (Vapnik 1995; Yang and Xu 2017; Mao et al. 2014; Santhanama et al. 2016):

$$ \begin{aligned} & \min \;\frac{1}{2}\,\left\| W \right\|^{2} + C\sum\limits_{i = 1}^{N} {\xi_{i} } \\ & s.t\;y_{i} (w^{\tau } \phi (x_{i} ) + b) \ge 1 - \xi_{i} ; \\ & \quad \xi_{i} \ge 0;\;i = 1, \ldots ,N \\ \end{aligned} $$
(1)

where \(\xi_{i}\) is the slack (error) term and \(C > 0\) is the regularization parameter. The above optimization problem can be converted into the form of (2) by introducing Lagrange multipliers:

$$ \begin{aligned} & \min \;\frac{1}{2}\left\| w \right\|^{2} + C\sum\limits_{i = 1}^{N} {\xi_{i} } - \sum\limits_{i = 1}^{N} {\alpha_{i} \left[ {y_{i} (w^{\tau } \phi (x_{i} ) + b) - 1 + \xi_{i} } \right]} - \sum\limits_{i = 1}^{N} {\xi_{i} } \mu_{i} \\ & s.t\;\alpha_{i} \ge 0;\mu_{i} \ge 0,i = 1, \ldots ,N \\ \end{aligned} $$
(2)

where w is the weight vector, b is the bias, \(\alpha_{i}\) are the Lagrangian coefficients, and \(\phi (x_{i} )\) is the kernel mapping of the training samples.

Setting the partial derivatives of (2) to zero and substituting the results back, the dual problem is constructed in the form of (3); it can be solved by quadratic programming.

$$ \begin{aligned} & \min \;\frac{1}{2}\sum\limits_{i} {\sum\limits_{j} {\alpha_{i} } } \alpha_{j} y_{i} y_{j} x_{i}^{\tau } x_{j} - \sum\limits_{i} {\alpha_{i} } = \frac{1}{2}\sum\limits_{i} {\sum\limits_{j} {\alpha_{i} \alpha_{j} y_{i} y_{j} k(x_{i} ,x_{j} )} } - \sum\limits_{i} {\alpha_{i} } \\ & s.t\;\sum\limits_{i} {\alpha_{i} } y_{i} = 0;\;0 \le \alpha_{i} \le C,\;\forall i \\ \end{aligned} $$
(3)

Hence, the solution has the form

$$ w = \sum\limits_{i = 1}^{n} {\alpha_{i} y_{i} \phi (x_{i} )} = \sum\limits_{i \in sv} {\alpha_{i} y_{i} \phi (x_{i} )} $$
(4)

where sv denotes the index set of support vectors:

$$ \begin{aligned} & sv = \{ i\,|\,0 < \alpha_{i} < C\} \\ & \forall i \in sv,\;w^{\tau } \phi (x_{i} ) + b = y_{i} \\ & y_{i} \in \{ + 1, - 1\} \\ & b_{i} = y_{i} - w^{\tau } \phi (x_{i} ) \\ \end{aligned} $$
(5)

where \(x_{i}\) is a support vector. The average of all these \(b_{i}\) defines the bias:

$$ b = \frac{1}{{\left| {sv} \right|}}\sum\limits_{i \in sv} {b_{i} = } \frac{1}{{\left| {sv} \right|}}\sum\limits_{i \in sv} {(y_{i} - w^{\tau } \phi (x_{i} ))} $$
(6)

Once the optimal pair (w, b) is determined, the decision function is obtained by

$$ f(z) = {\text{sign}}\left( {(w,\phi (z)) + b} \right) = {\text{sign}}\left( {\sum\limits_{i = 1}^{{N_{sv} }} {\alpha_{i} } y_{i} K(x_{i} ,z) + b} \right) $$
(7)

where \(N_{sv}\) is the number of support vectors.
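To make (3)-(7) concrete, the following Python sketch solves the dual problem with a generic quadratic programming solver and recovers the bias and decision function. It is a minimal illustration, assuming the cvxopt package as the QP backend and a precomputed kernel matrix; it is not the authors' implementation.

```python
import numpy as np
from cvxopt import matrix, solvers  # assumed QP backend

def svm_dual(K, y, C):
    """Solve the dual problem (3): K is the (N, N) kernel matrix, y in {-1, +1}."""
    N = len(y)
    P = matrix(np.outer(y, y) * K)                    # alpha_i alpha_j y_i y_j K(x_i, x_j)
    q = matrix(-np.ones(N))                           # the -sum_i alpha_i term
    G = matrix(np.vstack([-np.eye(N), np.eye(N)]))    # box constraint 0 <= alpha_i <= C
    h = matrix(np.hstack([np.zeros(N), C * np.ones(N)]))
    A = matrix(y.reshape(1, -1).astype(float))        # equality constraint sum_i alpha_i y_i = 0
    return np.ravel(solvers.qp(P, q, G, h, A, matrix(0.0))["x"])

def svm_bias(K, y, alpha, C, tol=1e-6):
    """Average b over the margin support vectors, Eqs. (5)-(6)."""
    sv = np.where((alpha > tol) & (alpha < C - tol))[0]
    return float(np.mean([y[i] - (alpha * y) @ K[:, i] for i in sv]))

def svm_predict(K_test, y, alpha, b):
    """Decision function (7); K_test[i, j] = K(x_i, z_j) for test points z_j."""
    return np.sign((alpha * y) @ K_test + b)
```

The three functions mirror (3), (6), and (7) in order: the QP returns the multipliers, the bias is averaged over the margin support vectors, and prediction only needs kernel evaluations against the support vectors.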

3.2 Fuzzy rough set

Fuzzy rough set theory (Dubois and Prade 1990) is built upon two other theories, namely fuzzy set theory (Zadeh 1965) and rough set theory (Pawlak 1982). Rough set theory divides the universe of discourse into a positive region (the lower approximation set), a boundary region, and a negative region. For each sample, fuzzy rough set theory returns a pair of memberships: the lower approximation membership expresses a degree of certainty, and the upper approximation membership a degree of possibility, of being included in the target set.

The lower and upper approximation memberships are constructed from the indiscernibility relationship (IR) between samples and the degree of membership in the target set, \(\mu_{F}\) (Verbiest et al. 2013b). The IR, shown in (8), measures the similarity of each pair of samples: it equals 1 when two samples are identical and 0 when they are completely different.

$$ {\text{IR}}(x_{i} ,x_{j} ) = \mathop \tau \limits_{{\alpha \in {\text{Dimension}}}} \left( {1 - \left| {x_{i} (\alpha ) - x_{j} (\alpha )} \right|^{2} } \right) $$
(8)

where \(\tau\) is a triangular norm (t-norm), \(\tau :[0,1]^{2} \to [0,1]\), aggregated over the dimensions (attributes) \(\alpha\).
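For illustration, a minimal Python sketch of (8) follows, assuming attributes normalized to [0, 1] and the minimum t-norm as the aggregator (the product t-norm would be an equally valid choice); it is an illustrative reading, not the authors' implementation.

```python
import numpy as np

def indiscernibility(xi, xj, tnorm=np.min):
    """Eq. (8): aggregate per-attribute similarities with a t-norm."""
    sims = 1.0 - np.abs(xi - xj) ** 2   # per-dimension similarity in [0, 1]
    return float(tnorm(sims))
```

For example, two samples differing by 0.1 in a single attribute yield a per-dimension similarity of 0.99 there and 1.0 elsewhere, so the minimum t-norm returns 0.99.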

According to relation (8), the differences between two samples in every dimension are aggregated by a t-norm; the result is the indiscernibility relation between the two samples under the decision boundary DB. The membership of each sample in the target class, \(\mu_{F} ,\) can be expressed in different ways, for example as a binary membership. With these two concepts, the lower and upper approximation memberships are defined in (9) and (10), respectively:

$$ \mu_{{\underline{{F_{{{\text{IR,}}\mu_{F} }} }} }} (x_{i} ) = \mathop {\inf }\limits_{{x_{j} \in T,x_{i} \ne x_{j} }} I({\text{IR}}(x_{i} ,x_{j} ),\mu_{F} (x_{j} )) $$
(9)
$$ \mu_{{\overline{{F_{{{\text{IR}},\mu_{F} }} }} }} (x_{i} ) = \mathop {\sup }\limits_{{x_{j} \in T,x_{i} \ne x_{j} }} \tau ({\text{IR}}(x_{i} ,x_{j} ),\mu_{F} (x_{j} )). $$
(10)

As shown in the above equations, the fuzzy operators, an implicator (I) and a t-norm (τ), combine two basic elements and produce several outputs, from which "inf" and "sup" select the final result. Hence, outlier and noise data can change the lower and upper approximation memberships over a wide range. Therefore, Verbiest et al. (2013a) propose adjusted versions of the memberships using the ordered weighted average (OWA) instead of "inf" and "sup." These forms of the memberships are presented as follows:

$$ \mu_{{\underline{{F_{{{\text{IR,}}\mu_{F} }} }} }} (x_{i} ) = \mathop {{\text{OWA}}_{\min } }\limits_{{x_{j} \in T,x_{i} \ne x_{j} }} I({\text{IR}}(x_{i} ,x_{j} ),\mu_{F} (x_{j} )) $$
(11)
$$ \mu_{{\overline{{F_{{{\text{IR}},\mu_{F} }} }} }} (x_{i} ) = \mathop {{\text{OWA}}_{\max } }\limits_{{x_{j} \in T,x_{i} \ne x_{j} }} \tau ({\text{IR}}(x_{i} ,x_{j} ),\mu_{F} (x_{j} )). $$
(12)

Many methods, like the popular fuzzy belief function framework (Shafer 1976), follow a similar trend to handle conflicting, incomplete, and uncertain information (Liu et al. 2011, 2015, 2016). It explains the probability of occurrence of a subset of the universe of discourse through the belief and plausibility functions, given in (13) and (14), respectively.

$$ \forall A \in T:{\text{Bel}}(A) = \sum\limits_{B:B \subseteq A} {P_{\alpha } } (B) $$
(13)
$$ \forall A \in T:{\text{Pl}}(A) = \sum\limits_{B:B \cap A \ne \phi } {P_{\alpha } } (B) $$
(14)

where \(P_{\alpha } (B)\) denotes the probability of B occurring, with α corresponding to the α-cut in the fuzzy concept (Chen et al. 2008). The relation between these two frameworks, as explained in (Dubois and Prade 1990; Chen et al. 2008; Yao and Lingras 1998; Wu et al. 2002; Liu et al. 2015), is as follows:

$$ \forall A \in T:{\text{Bel}}(A) = \mu_{{\underline{{F_{{{\text{IR,}}\mu_{F} }} }} }} (A) $$
(15)
$$ \forall A \in T:{\text{Pl}}(A) = \mu_{{\overline{{F_{{{\text{IR}},\mu_{F} }} }} }} (A). $$
(16)

Consequently, fuzzy rough set theory is powerful in handling vagueness and conflict among data; its effectiveness has been proved by remarkable results in fields such as KNN improvement (Verbiest et al. 2013a; Bian and Mazlack 2003; Derrac et al. 2013), fuzzy decision tree expansion (Zhai 2011), and the solid multiple traveling salesman problem (Changdar et al. 2016). However, it stands on common fuzzy operators, which may be improvable in some settings.

4 Proposed method

In this section, a novel method, namely WSVM-FRS (Weighted SVM-Fuzzy Rough Set), is proposed for SVM training. It introduces a novel data characteristic to reduce the effect of noise in soft-margin SVM training by favoring important samples over the others. In this method, a weight coefficient, built from the lower and upper approximation membership functions of fuzzy rough set theory, is added to the penalty term of the Lagrangian formulation of the optimization problem. Consequently, in WSVM-FRS noisy samples receive a low degree. The simplest case, samples with discrete attributes, is reviewed in the following paragraph.

In Fig. 1, assume that a curved line divides the universe of discourse as the boundary of the target region. The whole area is sectioned by the indiscernibility relation (IR) into equivalent squares, so each square contains samples with similar attributes. According to rough set theory, the upper approximation includes all squares having at least one sample inside the target region, whereas the squares lying entirely inside the target region form the positive region (the lower approximation set) of the universe segregated by the curved boundary and the IR. Consequently, as shown in Fig. 1, considering both approximation sets simultaneously, samples that are certainly inside the region gain higher values than those on its boundary, and negative samples take the value zero.

Fig. 1

An example for entropy degree (ED) expression

The approximation memberships constructing ED use (8) as the IR and (17) as the degree of belonging of samples to the target set, \(t_{{\text{s}}}.\) \(t_{{\text{s}}}\) expresses the reverse distance of samples from the center of the class:

$$ t_{{\text{s}}} = 1 - \left( {\frac{{\sum\limits_{{\alpha \in {\text{Dimension}}}} {\left| {\sum\nolimits_{{x_{i} \in T}} {x_{i} (\alpha )} - c^{*} (\alpha )} \right|} }}{{\left| {c^{*} } \right|}}} \right)^{2} $$
(17)

where T is the target set, α a dimension (attribute), and |·| denotes the absolute value. \(c^{*}\) is the center of the class, defined as follows:

$$ c^{*} = \frac{1}{N}\sum\limits_{{x_{i} \in y_{i} }} {x_{i} } $$
(18)

where N is the total number of training samples in the class.
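Read per sample, (17)-(18) can be sketched as follows; since (17) is stated over the whole target set, the per-sample form and the clipping to [0, 1] below are assumptions made for illustration only.

```python
import numpy as np

def class_center(X_class):
    """Eq. (18): the mean of the training samples of one class."""
    return X_class.mean(axis=0)

def target_membership(x, c_star):
    """Eq. (17), assumed per-sample form: reverse normalized distance from the class center."""
    d = np.sum(np.abs(x - c_star)) / np.linalg.norm(c_star)
    return float(np.clip(1.0 - d ** 2, 0.0, 1.0))
```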

Therefore, the lower and upper approximation memberships of (9) and (10) are reconstructed as follows:

$$ \mu_{{\underline{{F_{{{\text{IR}},\mu_{F} }} }} }} (x_{i} ) = \inf I({\text{IR}}(x_{i} ,c^{*} ),t_{{\text{s}}} ) $$
(19)
$$ \mu_{{\overline{{F_{{{\text{IR}},\mu_{F} }} }} }} (x_{i} ) = \sup \tau ({\text{IR}}(x_{i} ,c^{*} ),t_{{\text{s}}} ). $$
(20)

A measurement parameter is thus introduced that quantifies the relation among samples based on their entropy. Furthermore, WSVM-FRS sets the penalty and kernel parameters by grid search. This new data characteristic discovers the certainty of a sample: each sample \(x_{i}\) is recognized as a certain one if the entropy of its attributes is similar to that of the others.

Following these considerations, WSVM-FRS creates a map of data importance by computing the ED in the form of (21):

$$ {\text{ED}}(x_{i} ) = - \sum\limits_{{x_{i} \in R^{d} }} {\mu_{{\overline{{F_{{{\text{IR}},\mu_{F} }} }} }} (x_{i} )\log_{2} \mu_{{\underline{{F_{{{\text{IR}},\mu_{F} }} }} }} (x_{i} )} $$
(21)

Therefore, in this paper, ED combines both the lower and upper approximation memberships into a certainty index. ED gains a general perspective of a sample's role through the upper approximation membership; the lower approximation membership, which captures the restricted relation between a sample and the decision boundary, is then added to raise the level of samples that are certainly inside the decision region. Finally, ED is mapped to the range [0, 1].
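Combining the earlier sketches (indiscernibility, class_center, target_membership), a hedged draft of the ED map of (19)-(21) is given below; the Lukasiewicz implicator, the minimum t-norm, and the closing min-max normalization (the text's mapping to [0, 1]) are assumed choices rather than the paper's exact recipe.

```python
import numpy as np

def entropy_degree(X_class, eps=1e-12):
    """Per-sample certainty map, Eqs. (19)-(21), reusing the sketches above."""
    c = class_center(X_class)
    ts = np.array([target_membership(x, c) for x in X_class])
    ir = np.array([indiscernibility(x, c) for x in X_class])
    lower = np.minimum(1.0, 1.0 - ir + ts)      # Eq. (19) with the Lukasiewicz implicator
    upper = np.minimum(ir, ts)                  # Eq. (20) with the minimum t-norm
    ed = -upper * np.log2(lower + eps)          # Eq. (21), elementwise
    return (ed - ed.min()) / (ed.ptp() + eps)   # map to [0, 1] as stated in the text
```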

In addition to assigning a certainty value ED to each sample, a kernel function is used to map the data to a higher-dimensional space φ(.). The advantage of this mapping appears when an inner product occurs in the optimization formula, where it can be replaced via the kernel trick. WSVM-FRS discovers the boundary by solving the following optimization problem:

$$ \begin{aligned} & \min \;\frac{1}{2}\,\left\| W \right\|^{2} + C\sum\limits_{i = 1}^{N} {{\text{ED}}(x_{i} )\xi_{i} } \\ & s.t\;y_{i} (w^{\tau } \phi (x_{i} ) + b) \ge 1 - \xi_{i} ; \\ & \quad \xi_{i} \ge 0;\;i = 1, \ldots ,N. \\ \end{aligned} $$
(22)

This optimization turns into the differentiable form below by adding \(\alpha_{i} \ge 0\) and \(\mu_{i} \ge 0\) as nonnegative Lagrange multipliers:

$$ \begin{aligned} & L(w,b,\alpha ) = \min \frac{1}{2}\left\| w \right\|^{2} + C\sum\limits_{i = 1}^{N} {{\text{ED}}(x_{i} )\xi_{i} } - \sum\limits_{i = 1}^{N} {\alpha_{i} \left[ {y_{i} (w^{\tau } \phi (x_{i} ) + b) - 1 + \xi_{i} } \right]} - \sum\limits_{i = 1}^{N} {\xi_{i} } \mu_{i} \\ & s.t\;\alpha_{i} \ge 0;\;\mu_{i} \ge 0,i = 1, \ldots ,N. \\ \end{aligned} $$
(23)

Since the solution of (23) is reached by minimization over \(w,b,\xi_{i}\) and maximization over \(\alpha_{i} ,\mu_{i} ,\) the partial derivatives with respect to \(w,b,\xi_{i}\) are set to zero as follows:

$$ \frac{\partial L(w,b,\alpha )}{{\partial w}} = 0 \Rightarrow w - \sum\limits_{i = 1}^{N} {\alpha_{i} y_{i} \phi (x_{i} )} = 0 \Rightarrow w = \sum\limits_{i = 1}^{N} {\alpha_{i} y_{i} \phi (x_{i} )} $$
(24)
$$ \frac{\partial L(w,b,\alpha )}{{\partial b}} = 0 \Rightarrow \sum\limits_{i = 1}^{N} {\alpha_{i} y_{i} = 0} $$
(25)
$$ \begin{aligned} \frac{\partial L(w,b,\alpha )}{{\partial \xi_{i} }} & = 0 \Rightarrow C \times {\text{ED}}(x_{i} ) - \alpha_{i} - \mu_{i} = 0\mathop \Rightarrow \limits^{{\alpha_{i} \ge 0,\mu_{i} \ge 0}} C \times {\text{ED}}(x_{i} ) = \alpha_{i} + \mu_{i} \\ & \Rightarrow 0 \le \alpha_{i} \le C \times {\text{ED}}(x_{i} ),\;0 \le \mu_{i} \le C \times {\text{ED}}(x_{i} ). \\ \end{aligned} $$
(26)

By substituting (24) and (25) together with (26) into (23), the dual form of the decision boundary problem is constructed as (27):

$$ \begin{aligned} & \min \;\frac{1}{2}\sum\limits_{i} {\sum\limits_{j} {\alpha_{i} \alpha_{j} y_{i} y_{j} k(x_{i} ,x_{j} )} } - \sum\limits_{i} {\alpha_{i} } \\ & s.t\;\sum\limits_{i} {\alpha_{i} y_{i} } = 0;\;0 \le \alpha_{i} \le C \times {\text{ED}}(x_{i} ),\forall i. \\ \end{aligned} $$
(27)

The above optimization can be solved by well-known quadratic optimization techniques. The samples with positive \(\alpha_{i}\) in (27) are the support vectors (SVs); these special samples determine the boundary and the bias.

Therefore, if z is a new sample, its decision function takes the form of (28):

$$ f = (w,\phi (z)) + b = \sum\limits_{i = 1}^{N} {\alpha_{i} y_{i} K(x_{i} ,z)} + b = \sum\limits_{i \in sv} {\alpha_{i} y_{i} K(x_{i} ,z) + b} . $$
(28)
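In practice, the box constraint \(0 \le \alpha_{i} \le C \times {\text{ED}}(x_{i} )\) of (27) can be realized with a standard libsvm-style solver by passing per-sample weights, which rescale C for each instance. Below is a minimal sketch with scikit-learn, used here only as an illustrative stand-in for the authors' MATLAB/libsvm setup; the parameter values are placeholders.

```python
from sklearn.svm import SVC

def train_wsvm_frs(X, y, ed, C=1.0, gamma=0.5):
    """Per-sample weights multiply C inside libsvm, reproducing Eq. (27)."""
    clf = SVC(C=C, kernel="rbf", gamma=gamma, decision_function_shape="ovo")
    clf.fit(X, y, sample_weight=ed)   # effective penalty for sample i: C * ed[i]
    return clf
```

Here sample_weight multiplies the penalty C per training sample, so a noisy sample with a low ED contributes only a small slack penalty, which is exactly the effect intended by (22).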

In WSVM-FRS, the boundary is shaped by the most effective SVs, so the decision boundary becomes more accurate. In the proposed method, grid search is used to find the penalty and kernel parameter values. Overall, a step-by-step implementation flowchart of the proposed method is shown in Fig. 2. The next section demonstrates the superiority of WSVM-FRS over state-of-the-art methods through experiments conducted on real data sets.

Fig. 2

Flowchart of the proposed method

5 Results and discussion

Several experiments have been conducted in terms of accuracy and the area under the receiver operating characteristic (ROC) curve to demonstrate the superiority of the proposed method (WSVM-FRS) over alternative classification methods: probabilistic weighted least squares SVM (Lu et al. 2017), DS-RLSSVM (Zhou et al. 2016), Fuzzy-LSSVM (Sheng et al. 2015), RLS-SVM (Yang et al. 2014), PLR-WSVM (Tang et al. 2019), and K-SRLSSVCR (Ma et al. 2019). After the implementation details in Sect. 5.1, the results of the experiments are shown in Sect. 5.2.

5.1 Implementation details

In order to validate WSVM-FRS, experiments have been carried out on real-world data sets taken from the UCI repository (Lichman 2013). These data sets are appropriate because they concern classification under uncertainty in decision-based real-world applications, where accurate decisions are crucial, e.g., medical data, time series data, and letter data (Lichman 2013).

For all the experiments, tenfold cross-validation has been used: each data set was divided into ten mutually exclusive blocks, the proposed method was trained on nine of the ten blocks, and the remaining block was used as the testing set. Each block served once as the testing set, and the average of the ten tests is reported. The selected data sets and their related parameters are listed in Table 1, where #samples, #features, and #classes denote the number of data samples, attributes, and classes, respectively.

Table 1 Selected data sets of UCI data repository (Lichman 2013) in the experiments

Experiments have been carried out to evaluate the proposed method against six state-of-the-art weighted SVM methods: probabilistic weighted least squares SVM (Lu et al. 2017), DS-RLSSVM (Zhou et al. 2016), Fuzzy-LSSVM (Sheng et al. 2015), RLS-SVM (Yang et al. 2014), PLR-WSVM (Tang et al. 2019), and K-SRLSSVCR (Ma et al. 2019). In these experiments, grid search has been used for tuning the regularization parameters; a typical soft-margin SVM classifier with an RBF kernel has at least two hyperparameters that need to be tuned for good performance on unseen data. In the grid search scheme, the regularization constant C (penalty parameter) is tuned within the range \(\{ 0,2^{ - 1} ,2^{0} ,2^{1} ,2^{2} ,2^{3} , \ldots \}\), and the kernel parameter γ within the range \(\{ 2^{ - 1} ,2^{0} ,2^{1} ,2^{2} ,2^{3} ,2^{4} , \ldots \}\). The experiments have been executed on a computer with an Intel Core i3 2.4 GHz CPU and 8 GB DDR III memory, using MATLAB R2015b on Microsoft Windows 7. libsvm (Chang and Lin 2011) is employed to implement the base SVM classifier.

The kernel used in all experiments is a radial basis function:

$$ f(x,z) = \exp \left( { - \frac{{\left\| {x - z} \right\|^{2} }}{{2\sigma^{2} }}} \right). $$
(29)
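The tuning scheme above can be sketched as a tenfold grid search; the ranges below are truncated for brevity, C = 0 from the stated range is skipped because libsvm requires C > 0, and the mapping \(\gamma = 1/(2\sigma^{2} )\) links the library parameter to (29). This is an illustrative Python stand-in for the MATLAB setup, with X and y assumed to be a loaded UCI data set.

```python
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

param_grid = {
    "C": [2.0 ** k for k in range(-1, 6)],       # 2^-1 ... 2^5, truncated range
    "gamma": [2.0 ** k for k in range(-1, 6)],   # RBF parameter; gamma = 1 / (2 sigma^2)
}
search = GridSearchCV(SVC(kernel="rbf"), param_grid,
                      cv=StratifiedKFold(n_splits=10), scoring="accuracy")
# search.fit(X, y)   # X, y: one of the UCI data sets loaded beforehand
```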

5.2 Experimental results

The outcomes of SVM classification can be summarized in four groups: true positives (TP) are positive samples correctly identified; false positives (FP) are negative samples incorrectly classified as positive; true negatives (TN) are negative samples classified correctly; and false negatives (FN) are positive samples incorrectly classified as negative. These sample categories are summarized in Table 2.

Table 2 Confusion matrix

Accuracy is a popular index displaying the percentage of samples that are correctly classified. It can be computed using Eq. (30):

$$ {\text{Accuracy}} = \frac{{{\text{TP}} + {\text{TN}}}}{{({\text{TP}} + {\text{FN}}) + ({\text{FP}} + {\text{TN}})}}. $$
(30)

The accuracy results of the experiments are described in Table 3. In most cases, the classification accuracy of WSVM-FRS is better than that of the other methods.

Table 3 Result of average SVM classification accuracy

A statistical significance analysis was performed using the nonparametric Wilcoxon signed-rank test (Demsar 2006) to analyze the results and derive strong conclusions. This test determines whether the improvement of the proposed method (WSVM-FRS) is relevant. The last two rows of Tables 3, 4 and 5 present the results of the Wilcoxon test, with a significance level of 0.05. The entry "1" indicates that WSVM-FRS significantly improves over the corresponding competing method in terms of the respective measure.

Table 4 Result of average SVM classification precision
Table 5 Result of average SVM classification recall

Table 3 shows the average SVM classification accuracy of the competing methods, in percentage (%). Where the proposed WSVM-FRS method achieves a better classification accuracy than the other methods (Sheng et al. 2015; Zhou et al. 2016; Lu et al. 2017; Yang et al. 2014; Tang et al. 2019; Ma et al. 2019), the value is shown in bold. For every data set, the rank of each method is given in Table 3 together with the average rank, and the best rank for each data set is highlighted.

The proposed method has the best performance and the top average rank, considerably better than those of the other methods. WSVM-FRS achieves the best classification accuracy in six out of the 20 data sets, namely Sonar, Heart, E. coli, Diabetes, Yeast, and Letter. The proposed method ranks second or third in eight data sets and between fourth and sixth in the remaining six. Overall, these ranks distinguish WSVM-FRS from the other methods as the best performer.

For some data sets, other methods perform better in terms of classification accuracy. For example, on the Ionosphere, Transfusion, and Vowel data sets, DS-RLSSVM (Zhou et al. 2016) obtained better results. Although DS-RLSSVM has the second-best average rank, on individual data sets its rank varies between second and sixth. PLR-WSVM (Tang et al. 2019) has the third-best average rank, with the best results on the Glass and Wdbc data sets and ranks between second and seventh elsewhere. K-SRLSSVCR (Ma et al. 2019) has the fourth-best average rank; it wins on the Haberman and Segment data sets but earns varying ranks on the others. Probabilistic weighted LS-SVM (Lu et al. 2017) performs best on the Musk, Pendigits, and Satimage data sets, yet its average rank is only fifth, with a considerable gap from WSVM-FRS, and it shows the worst accuracy on four data sets. Fuzzy-LSSVM (Sheng et al. 2015) has the sixth-best average rank; it performs best only on the Iris data set and stands far from the proposed method in average rank. RLS-SVM (Yang et al. 2014) has the seventh and worst average rank, despite ranking first on the Liver and Vehicle data sets, and shows the biggest gap from WSVM-FRS in average rank.

The last two rows of Table 3 show the results of the Wilcoxon test comparing the classification accuracy of WSVM-FRS against the other competing methods at a significance level of 0.05; the entry "1" indicates that WSVM-FRS significantly improves over the corresponding method.

In pattern recognition and information retrieval, precision (also called positive predictive value) is the fraction of retrieved samples that are relevant, while recall (also known as sensitivity) is the fraction of relevant samples that are retrieved. Precision and recall are computed by (31) and (32), respectively. The precision results are listed in Table 4, and Table 5 shows the recall results.

$$ {\text{Precision}} = \frac{{{\text{TP}}}}{{{\text{TP}} + {\text{FP}}}} $$
(31)
$$ {\text{Recall}} = \frac{{{\text{TP}}}}{{{\text{TP}} + {\text{FN}}}}. $$
(32)

Table 4 shows the average SVM classification precision of the competing methods, in percentage (%). Where the proposed WSVM-FRS method achieves a better classification precision than the other methods (Sheng et al. 2015; Zhou et al. 2016; Lu et al. 2017; Yang et al. 2014; Tang et al. 2019; Ma et al. 2019), the value is shown in bold. For every data set, the rank of each method is given in Table 4 together with the average rank, and the best rank for each data set is highlighted.

The proposed method again has the best performance and the top average rank, considerably better than those of the other methods. WSVM-FRS achieves the best classification precision in six out of the 20 data sets, namely Sonar, Heart, E. coli, Diabetes, Yeast, and Letter. The proposed method ranks second or third in nine data sets and between fourth and sixth in the remaining five.

For some data sets, other methods perform better in terms of classification precision. For example, on the Transfusion, Musk, Pendigits, Satimage, Vowel, and Shuttle data sets, probabilistic weighted LS-SVM (Lu et al. 2017) obtained better results. Although probabilistic weighted LS-SVM has the second-best average rank, on individual data sets its rank varies between second and seventh. DS-RLSSVM (Zhou et al. 2016) performs best on the Ionosphere data set; it has the third-best average rank, with a considerable gap from WSVM-FRS, and does not perform acceptably on some data sets. PLR-WSVM (Tang et al. 2019) has the fourth-best average rank, with the best results on the Glass and Wdbc data sets and ranks between second and seventh elsewhere. K-SRLSSVCR (Ma et al. 2019) has the fifth-best average rank; it wins on the Haberman and Segment data sets but earns varying ranks on the others.

Similarly, Fuzzy-LSSVM (Sheng et al. 2015) has the sixth-best average rank; it performs best only on the Iris data set and stands far from the proposed method in average rank. RLS-SVM (Yang et al. 2014) has the seventh and worst average rank in precision, despite ranking first on the Liver and Vehicle data sets, and shows the biggest gap from WSVM-FRS in average rank.

The last two rows of Table 4 show the results of the Wilcoxon test comparing the classification precision of WSVM-FRS against the other competing methods at a significance level of 0.05; the entry "1" indicates that WSVM-FRS significantly improves over the corresponding method.

Table 5 shows the average SVM classification recall of the competing methods, in percentage (%). Where the proposed WSVM-FRS method achieves a better classification recall than the other methods (Sheng et al. 2015; Zhou et al. 2016; Lu et al. 2017; Yang et al. 2014; Tang et al. 2019; Ma et al. 2019), the value is shown in bold. For every data set, the rank of each method is given in Table 5 together with the average rank, and the best rank for each data set is highlighted.

The proposed method again has the best performance and the top average rank. WSVM-FRS achieves the best classification recall in six out of the 20 data sets, namely Sonar, Heart, E. coli, Diabetes, Yeast, and Letter. The proposed method ranks second or third in 11 data sets and between fourth and sixth in the remaining three.

For some data sets, other methods perform better in terms of classification recall. For example, on the Ionosphere, Transfusion, and Vowel data sets, DS-RLSSVM (Zhou et al. 2016) obtained better results. Although DS-RLSSVM has the second-best average rank, on individual data sets its rank varies between second and seventh. RLS-SVM (Yang et al. 2014) has the third-best average rank in recall, performing best on the Liver and Vehicle data sets. K-SRLSSVCR (Ma et al. 2019) has the fourth-best average rank; it wins on the Haberman and Segment data sets but earns varying ranks on the others. Fuzzy-LSSVM (Sheng et al. 2015) performs best on the Iris data set, yet its average rank lies considerably behind that of WSVM-FRS, and it does not perform acceptably on some data sets. Probabilistic weighted LS-SVM (Lu et al. 2017) has the fifth-best average rank; it performs best on the Musk, Pendigits, Satimage, and Shuttle data sets but stands far from the proposed method in average rank. Finally, PLR-WSVM (Tang et al. 2019) also ranks near the bottom of the average ranking, with the best results on the Glass and Wdbc data sets and ranks between second and seventh elsewhere.

The last two rows of Table 5 show the results of the Wilcoxon test comparing the classification recall of WSVM-FRS against the other competing methods at a significance level of 0.05; the entry "1" indicates that WSVM-FRS significantly improves over the corresponding method.

As the number of positive samples is smaller than the number of negative ones, the performance of SVM classification in segregating positive samples is especially important. Therefore, the area under the receiver operating characteristic (ROC) curve, known as AUC, is computed as another metric. The ROC curve is constructed by plotting the true positive rate (TPR) against the false positive rate (FPR); a larger AUC, computed as in (33), indicates a better ability to distinguish positive samples:

$$ {\text{AUC}} = (1 + {\text{TPR}} - {\text{FPR}})/2. $$
(33)

Based on Fig. 3, the AUC metric shows the particular power of WSVM-FRS in capturing the data characteristics of positive samples, which leads to better results for the proposed method than for the others. Overall, the experiments on data sets taken from the UCI repository demonstrate the superiority of WSVM-FRS over the state-of-the-art methods (Sheng et al. 2015; Zhou et al. 2016; Lu et al. 2017; Yang et al. 2014; Tang et al. 2019; Ma et al. 2019).

Fig. 3

Result of AUC metric

Overall, in terms of the accuracy, precision, and recall metrics, WSVM-FRS performs well on data sets with many classes. In several cases, the proposed method also satisfies all metrics on large data sets, and on high-dimensional data sets WSVM-FRS performs reasonably well.

5.2.1 Noise analysis

The presence of noise in training data has a strong negative impact on the performance of learning algorithms; methods should therefore be sufficiently resistant to it. In order to present a deeper discussion and show that the proposed method outperforms the competing methods (Sheng et al. 2015; Zhou et al. 2016; Lu et al. 2017; Yang et al. 2014; Tang et al. 2019; Ma et al. 2019), the most common type of artificial noise, uniform random addition (Zhu and Wu 2004), has been used to add both class noise and attribute noise. The noise level has been raised from 0% (original data sets) to 30%. To evaluate the impact of the noise level on accuracy, the proposed method and all the competing methods have been run on each data set. The results are shown in Fig. 4, where the x-axis indicates the noise level and the y-axis the classification accuracy of the trained classifiers.
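Since the text does not spell out the exact corruption mechanism, the following sketch of uniform random addition is an illustrative assumption: labels of a random subset are reassigned (class noise) and a random subset of attribute entries is replaced with uniform values (attribute noise).

```python
import numpy as np

def add_uniform_noise(X, y, level, seed=0):
    """Inject class and attribute noise at the given level (0.0 to 0.3 in the text)."""
    rng = np.random.default_rng(seed)
    Xn, yn = X.copy(), y.copy()
    flip = rng.random(len(y)) < level                  # class noise: relabel a subset
    yn[flip] = rng.choice(np.unique(y), size=int(flip.sum()))
    mask = rng.random(X.shape) < level                 # attribute noise: uniform values
    Xn[mask] = rng.uniform(X.min(), X.max(), size=int(mask.sum()))
    return Xn, yn
```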

Fig. 4

Experimental results of noise effect on classification accuracy

Looking at Fig. 4, as the noise level increases, the classification accuracy of all methods decreases dramatically. Furthermore, it is clear that the more noise is added to the data sets, the more resistant the proposed method is compared to the other methods, owing to the strong classifier built on the fuzzy rough set strategy. The results indicate the superiority of the proposed method over the others in dealing with noisy data.

5.2.2 Real-world data set analysis

In this section, two real-world data sets are considered to illustrate the performance of the proposed method (WSVM-FRS) in comparison with some state-of-the-art methods.

MNIST data set To continue the evaluation of the proposed method, the MNIST data set (LeCun et al. 2010) has been considered. MNIST is a data set of simple gray-scale handwritten digits, with a training set of 60,000 examples and a test set of 10,000 examples; it is a subset of a larger set available from NIST. The digits have been size-normalized and centered in fixed-size images. The MNIST data set is described in Table 6, and examples are shown in Fig. 5.

Table 6 Files contained in the MNIST dataset
Fig. 5

Demonstration of the MNIST data set

Experimental results on MNIST data set An experiment on the MNIST data set has been carried out to evaluate the proposed method against the six state-of-the-art weighted SVM methods: probabilistic weighted least squares SVM (Lu et al. 2017), DS-RLSSVM (Zhou et al. 2016), Fuzzy-LSSVM (Sheng et al. 2015), RLS-SVM (Yang et al. 2014), PLR-WSVM (Tang et al. 2019), and K-SRLSSVCR (Ma et al. 2019). The results are shown in Table 7, where three evaluation measures are considered: classification accuracy, recall, and precision.

Table 7 Result of the proposed method performance compared to other competitors regarding MNIST data set

Table 7 shows the average SVM classification accuracy, recall, and precision of the competing methods, in percentage (%). On the MNIST data set, the proposed WSVM-FRS method achieves a better classification performance than the other methods (Sheng et al. 2015; Zhou et al. 2016; Lu et al. 2017; Yang et al. 2014; Tang et al. 2019; Ma et al. 2019), shown in bold. The rank of each method and the average rank are given in Table 7, and the best rank is highlighted, confirming the best performance of the proposed method.

Fashion-MNIST data set The other real-world data set used in the evaluations is the Fashion-MNIST data set (Cohen et al. 2017). Fashion-MNIST is based on the assortment on Zalando's website. Every fashion product on Zalando has a set of pictures shot by professional photographers, demonstrating different aspects of the product, i.e., front and back looks, details, and looks with a model and in an outfit. The original pictures have a light-gray background (hexadecimal color #fdfdfd) and are stored in 762 × 1000 JPEG format. For efficiently serving different frontend components, the original pictures are resampled at multiple resolutions, e.g., large, medium, small, thumbnail, and tiny.

The front look thumbnail images of 70,000 unique products are used to build Fashion-MNIST. The products come from different gender groups: men, women, kids, and neutral. White-color products are not included in the data set, as they have low contrast with the background. The thumbnails (51 × 73) are then fed into the conversion pipeline below, which is visualized in Fig. 6 (a hedged code sketch follows the list):

1. Converting the input to a PNG image.

2. Trimming any edges that are close to the color of the corner pixels; "closeness" is defined as a distance within 5% of the maximum possible intensity in RGB space.

3. Resizing the longest edge of the image to 28 by subsampling the pixels, i.e., skipping some rows and columns.

4. Sharpening pixels using a Gaussian operator with a radius and standard deviation of 1.0, with increasing effect near outlines.

5. Extending the shortest edge to 28 and putting the image at the center of the canvas.

6. Negating the intensities of the image.

7. Converting the image to 8-bit grayscale pixels.
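A hedged Pillow sketch of this pipeline is given below; the trimming threshold, sharpening parameters, and centering logic approximate the described steps and are not the original conversion code.

```python
from PIL import Image, ImageChops, ImageFilter, ImageOps

def to_fashion_mnist(path):
    img = Image.open(path).convert("RGB")                    # step 1: decode input
    bg = Image.new("RGB", img.size, img.getpixel((0, 0)))
    diff = ImageChops.difference(img, bg)
    bbox = diff.point(lambda p: 255 if p > 255 * 0.05 else 0).getbbox()
    img = img.crop(bbox)                                     # step 2: trim near-corner colors
    img.thumbnail((28, 28))                                  # step 3: longest edge -> 28
    img = img.filter(ImageFilter.UnsharpMask(radius=1))      # step 4: sharpen
    canvas = Image.new("RGB", (28, 28), "white")
    canvas.paste(img, ((28 - img.width) // 2, (28 - img.height) // 2))  # step 5: center
    return ImageOps.invert(canvas).convert("L")              # steps 6-7: negate, grayscale
```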

Fig. 6

Diagram of the conversion process used to generate Fashion-MNIST dataset. Two examples from dress and sandals categories are depicted, respectively. Each column represents a step described previously

The data set is divided into a training and a test set, with the training set receiving 6000 randomly selected examples from each class. Images and labels are stored in the same file format as the MNIST data set, which is designed for storing vectors and multidimensional matrices. The resulting files are listed in Table 8. Examples are sorted by their labels before storing, which yields smaller label files after compression compared to MNIST and makes it easier to retrieve examples with a certain class label.

Table 8 Files contained in the Fashion-MNIST dataset

For the class labels, the silhouette code of the product has been used. The silhouette code is manually labeled by in-house fashion experts and reviewed by a separate team at Zalando; each product carries exactly one silhouette code. Table 9 summarizes all class labels in Fashion-MNIST with example images for each class.

Table 9 Class names and example images in Fashion-MNIST dataset

Experimental results on Fashion-MNIST data set An experiment on the Fashion-MNIST data set has been conducted to evaluate the proposed method against the six state-of-the-art weighted SVM methods: probabilistic weighted least squares SVM (Lu et al. 2017), DS-RLSSVM (Zhou et al. 2016), Fuzzy-LSSVM (Sheng et al. 2015), RLS-SVM (Yang et al. 2014), PLR-WSVM (Tang et al. 2019), and K-SRLSSVCR (Ma et al. 2019). The results are shown in Table 10, where three evaluation measures are considered: classification accuracy, recall, and precision.

Table 10 Result of the proposed method performance compared to other competitors regarding Fashion-MNIST data set

Table 10 shows the average SVM classification accuracy, recall, and precision of the competing methods, in percentage (%). On the Fashion-MNIST data set, the proposed WSVM-FRS method achieves a better classification performance than the other methods (Sheng et al. 2015; Zhou et al. 2016; Lu et al. 2017; Yang et al. 2014; Tang et al. 2019; Ma et al. 2019), shown in bold. The rank of each method and the average rank are given in Table 10, and the best rank is highlighted, confirming the best performance of the proposed method.

6 Conclusions and future works

In this paper, a novel method, namely WSVM-FRS, has been introduced to reduce the effect of uncertainty, specifically noise, in SVM training. The primary motivation is the sophisticated, uncertainty-laden information found in real-world applications: the method should retain training effectiveness on challenging and large-scale data sets without losing satisfactory speed.

WSVM-FRS is a novel weighted support vector machine that alleviates the noise sensitivity problem of the standard support vector machine for multiclass data classification. Keeping the basic idea, a weight coefficient called the entropy degree, built from the lower and upper approximation membership functions of fuzzy rough set theory, has been added to the penalty term of the Lagrangian formulation of the optimization problem. Consequently, noisy samples receive a low degree and important samples a high degree. The performance of WSVM-FRS has been examined on 20 data sets taken from the UCI repository as well as on real-world data sets, and the results have been compared to six other algorithms from the recent literature. The experimental results demonstrate that the proposed method achieves good classification accuracy, precision, and recall by handling uncertainty aspects, including noisy samples. The Wilcoxon test shows that the method is statistically different from its competitors in terms of the accuracy, precision, and recall metrics.

In the future, we would endeavor not only to enhance the effectiveness of WSVM-FRS in dealing with data streams, but also to introduce a more accurate kernel based on the ED concept for more challenging real-world data sets. Furthermore, we intend to explore other aspects of weighted SVM in order to reduce noise more effectively and quickly.