1 Introduction

Support Vector Machine (SVM) (Cortes and Vapnik 1995) utilizes the principle of structural risk minimization and solves a convex quadratic programming problem (QPP) to find the optimal hyperplane, making it an effective machine learning algorithm for pattern recognition problems. Due to its theoretical advantages and excellent generalization performance, SVM is widely used in many fields. However, traditional SVM has high computational complexity, which makes it difficult to process large and complex data rapidly. To alleviate this problem, Khemchandani and Chandra (2007) proposed the Twin Support Vector Machine (TSVM). Unlike conventional SVM, TSVM seeks two non-parallel classification hyperplanes, each of which lies close to one class and as far as possible from the other. Furthermore, the single large QPP in SVM is replaced by two smaller QPPs, so that the computational time of TSVM is approximately a quarter of that of traditional SVM. When dealing with large-scale classification problems (Xie et al. 2023a, b), TSVM exhibits shorter training times and lower training costs, which overcome the shortcomings of existing SVMs. Moreover, TSVM is also superior to some existing models in terms of classification performance (Tanveer et al. 2022b). Therefore, TSVM has been widely used in many fields, such as Alzheimer’s disease prediction (Ganaie et al. 2023; Sharma et al. 2022), EEG signal classification (Ganaie et al. 2022a; Hazarika et al. 2023), and text recognition (Francis and Sreenath 2022).

It is worth noting that both TSVM and SVM ignore the distribution of the data within each class, which makes them sensitive to noise (Liang and Zhang 2022). To address this issue, researchers have combined them with fuzzy set theory. Different fuzzy membership assignment strategies have been proposed to describe the influence of individual samples on the construction of the optimal hyperplane, thereby reducing the negative impact of noise and improving classification performance (Ganaie et al. 2022, 2020). For example, Yu et al. (2019) utilized a K-nearest-neighbor-based probability density estimation strategy to compute the relative density of each training instance and proposed a fuzzy support vector machine with relative density information. Borah and Gupta (2022) incorporated fuzzy membership values, computed from transformed class probability and class affinity, into the objective function of a least squares support vector machine formulation, yielding an affinity and transformed class probability-based fuzzy least squares support vector machine. Kung and Hao (2023) proposed a fuzzy least squares support vector machine with a fuzzy hyperplane; its two key characteristics are that it assigns a fuzzy membership degree to every data vector according to its importance, and that the parameters of the hyperplane, such as the elements of the normal vector and the bias term, are treated as fuzzified variables.

In order to reduce the impact of outliers, Richhariya and Tanveer (2018) proposed a new fuzzy membership function that accounts for both the importance of samples and the data imbalance ratio, and combined it with the least squares twin support vector machine to address class imbalance problems effectively. Chen and Wu (2018) employed fuzzy membership functions from fuzzy neural networks to weight the margin of each training sample; by design, the impact of samples with high uncertainty is mitigated, which improves the generalization ability of the model. Gupta et al. (2019) proposed a fuzzy membership assignment strategy based on information entropy and combined it with TSVM for class imbalance learning. Hao et al. (2021) evaluated which fuzzy hyperplane each sample lies closest to by defining a fuzzy partial ordering relation, and developed a novel fuzzy TSVM to merge the large volume of information in online news for predicting stock price trends. Ganaie et al. (2021) proposed a novel fuzzy least squares projection twin support vector machine, which seeks projections such that the samples of each class cluster around their class mean and assigns fuzzy weights to each sample to reduce the effect of outliers. Motivated by angle-based algorithms, Richhariya et al. (2021) proposed an efficient angle-based universum least squares twin support vector machine (AULSTSVM), which is capable of handling heteroscedastic noise in large-scale datasets. Richhariya et al. (2021a) proposed a fuzzy universum least squares twin support vector machine that assigns fuzzy membership to the universum data, aiming to provide appropriate data-distribution information to the classifier; this approach was applied to Alzheimer’s disease and breast cancer detection.

Recently, Rezvani et al. (2019) combined intuitionistic fuzzy sets with TSVM to address its sensitivity to noise, resulting in an extension of FTSVM known as the intuitionistic fuzzy twin support vector machine (IFTSVM). In IFTSVM, each training sample is assigned a membership degree and a non-membership degree, from which a score function characterizing the sample’s importance is constructed to reduce the impact of noise. On this basis, Rezvani and Wang (2021, 2022) used fuzzy Adaptive Resonance Theory and a weighting strategy, respectively, in conjunction with IFTSVM to tackle class imbalance learning, explicitly addressing large-scale class imbalance problems containing noise. Tanveer et al. (2022a) proposed a novel intuitionistic fuzzy weighted least squares twin support vector machine that exploits local neighborhood information among the data points and uses both membership and non-membership weights to reduce the effect of noise and outliers; it was applied to the diagnosis of schizophrenia. Ju et al. (2021) combined interval-valued fuzzy sets with TSVM to address multi-class problems: an interval-valued fuzzy membership is assigned to each sample, and the resulting interval-valued fuzzy twin support vector machine effectively reduces the influence of noise and improves classification performance. In conclusion, constructing an appropriate fuzzy membership assignment strategy is a crucial way to alleviate the sensitivity of TSVM to noise.

Centered kernel alignment (CKA) is a method that measures the degree of similarity between two kernels (or kernel matrices). Owing to its effectiveness and low computational complexity, it has been applied to improve the performance of machine learning algorithms. For example, Lu et al. (2014) employed CKA to unify the two tasks of clustering and multiple kernel learning into a single optimization framework, yielding a novel multiple kernel clustering method. Cárdenas et al. (2016) utilized CKA to assess the affinity between the kernel matrix of resonance imaging data and the label target matrix, and proposed an improved artificial neural network algorithm for the diagnosis of Alzheimer’s disease. Wang et al. (2020) combined CKA with SVM to propose a classification algorithm robust to noise, in which CKA is employed to calculate the dependence between a data point and its associated label. It is therefore worth investigating the use of CKA to address the sensitivity to noise.

In this paper, a fuzzy twin support vector machine based on CKA is proposed to address the sensitivity of TSVM to noise. The method uses a heuristic function derived from CKA to calculate the dependence between each data point and its corresponding label and then assigns a fuzzy membership to each sample point, yielding a fuzzy membership assignment strategy that effectively alleviates the sensitivity of TSVM to noise. To prevent weakly correlated or irrelevant features from dominating this calculation, a feature-weighted kernel function is constructed using the information gain and applied in the computation of the centered kernel alignment; strongly correlated features thus receive larger weights than weakly correlated ones, reflecting the classification importance of different features. Combining this strategy with TSVM yields a new fuzzy twin support vector machine (FTSVM-CKA). Moreover, to speed up the training of the model, we employ a coordinate descent strategy with shrinking by active set to reduce the computational complexity, which effectively improves the training speed while maintaining the classification performance. Experiments were conducted on an artificial data set and 15 UCI data sets to validate the performance of FTSVM-CKA. The results show that FTSVM-CKA can efficiently and rapidly solve binary classification problems with noise.

In summary, the main contributions of this paper are as follows:

(1) The idea of feature weighting is integrated into the centered kernel alignment method. A feature-weighted kernel function is constructed and applied to the calculation of the centered kernel alignment, which prevents the calculation from being dominated by weakly correlated or uncorrelated features.

(2) A fuzzy membership assignment strategy based on the centered kernel alignment method is given. This strategy can significantly reduce the negative impact of noise.

(3) By combining the CKA-based fuzzy membership assignment strategy with TSVM, this paper proposes the FTSVM-CKA model, which can effectively solve classification problems with noise.

(4) The computational complexity brought by high-dimensional input is addressed by the coordinate descent strategy with shrinking by active set, which effectively improves the training speed of the model.

(5) For the nonlinear case, the kernel trick is applied directly and hence the exact formulation is solved.

(6) Experimental results on benchmark datasets demonstrate the ability of the proposed FTSVM-CKA to reduce the negative impact of noise.

The remainder of this paper is organized as follows: Section 2 reviews some preliminaries. Section 3 describes the structure of the proposed FTSVM-CKA model in detail. The experimental results are reported in Sect. 4. Finally, conclusions and future work are presented in Sect. 5.

2 Related works

In this section, the model structure of TSVM is introduced, and then the concepts of centered kernel alignment and information gain are elaborated. Let \(S = \left\{ {({x_1},{y_1}),({x_2},{y_2}),\ldots ,({x_l},{y_l})} \right\} \) be the training sample set, where l is the total number of training samples, \({x_i} \in {R^d}\) and \({y_i} \in \left\{ { - 1, + 1} \right\} \), \(i = 1,2,\ldots ,l\), denote the ith training sample and its corresponding class label, respectively, and d is the feature dimension of the samples.

2.1 Twin support vector machine

Different from conventional SVM, TSVM aims to generate two non-parallel planes \(w_1^Tx + {b_1} = 0\) and \(w_2^Tx + {b_2} = 0\), each of which is closer to one of the two classes and as far away from the other as possible. The optimization problem of TSVM can be modeled as the following two smaller-scale QPPs:

$$\begin{aligned} \begin{array}{l} \mathop {\min }\limits _{{w_1},{b_1},{\xi _2}} \mathrm{{ }}\frac{1}{2}{(A{w_1} + {e_1}{b_1})^T}(A{w_1} + {e_1}{b_1}) + {C_1}{e_2}^T{\xi _2}\\ s.t.\mathrm{{ }} - (B{w_1} + {e_2}{b_1}) + {\xi _2} \ge {e_2},\mathrm{{ }}{\xi _2} \ge 0 \end{array} \end{aligned}$$
(1)

and

$$\begin{aligned} \begin{array}{l} \mathop {\min }\limits _{{w_2},{b_2},{\xi _1}} \mathrm{{ }}\frac{1}{2}{(B{w_2} + {e_2}{b_2})^T}(B{w_2} + {e_2}{b_2}) + {C_2}{e_1}^T{\xi _1}\\ s.t.\mathrm{{ }}(A{w_2} + {e_1}{b_2}) + {\xi _1} \ge {e_1},\mathrm{{ }}{\xi _1} \ge 0 \end{array} \end{aligned}$$
(2)

where A and B denote the matrices of samples belonging to the positive and negative classes, respectively, \({\xi _1}\) and \({\xi _2}\) are slack variables, \({e_1}\) and \({e_2}\) are vectors of ones of appropriate dimensions, and \({C_1}\) and \({C_2}\) are penalty parameters.

By solving the dual problems of Eq. (1) and Eq. (2), two optimal hyperplanes can be obtained. For any input sample \({x^ * }\), its classification decision function is as follows:

$$\begin{aligned} {y^ * } = \mathop {\arg \min }\limits _{i \in 1,2} \frac{{\left| {w_i^T{x^ * } + {b_i}} \right| }}{{\left\| {{w_i}} \right\| }} \end{aligned}$$
(3)
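For illustration, the decision rule of Eq. (3) amounts to the following short Python sketch (NumPy only; `w1`, `b1`, `w2`, `b2` are assumed to have been obtained already by solving the two dual problems):

```python
import numpy as np

def tsvm_predict(x, w1, b1, w2, b2):
    """Assign x to class +1 or -1 according to Eq. (3): the class whose
    hyperplane is nearer in perpendicular distance."""
    d1 = abs(w1 @ x + b1) / np.linalg.norm(w1)   # distance to the positive-class plane
    d2 = abs(w2 @ x + b2) / np.linalg.norm(w2)   # distance to the negative-class plane
    return +1 if d1 <= d2 else -1
```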

2.2 Centered kernel alignment

Centered kernel alignment (CKA) (Cortes et al. 2012) measures the degree of similarity between two kernels (or kernel matrices) and has been widely used for kernel learning and selection due to its effectiveness and low computational complexity.

For the data set \(S = \left\{ {({x_1},{y_1}),({x_2},{y_2}),\ldots ,({x_l},{y_l})} \right\} \), the kernel matrix K derived from a kernel function k is given by \({K_{i,j}} = k\left( {{x_i},{x_j}} \right) \). Given two kernel functions \({k_1}\) and \({k_2}\), let their corresponding kernel matrices be \({K_1}\) and \({K_2}\), respectively. The Frobenius inner product between \({K_1}\) and \({K_2}\) is expressed as follows:

$$\begin{aligned} {\left\langle {{K_1},{K_2}} \right\rangle _F} = \sum \limits _{i = 1}^l {\sum \limits _{j = 1}^l {{k_1}({x_i},{x_j}){k_2}({x_i},{x_j})} } \end{aligned}$$
(4)

Let \(e = {(1,1,\ldots ,1)^T} \in {R^l}\) and let \(I \in {R^{l \times l}}\) be the identity matrix; then the centering matrix H and the centered kernel matrix \(\overline{K} \) are calculated as follows:

$$\begin{aligned} H = I - \frac{{e{e^T}}}{l} \in {R^{l \times l}} \end{aligned}$$
(5)
$$\begin{aligned} \overline{K} = HKH \end{aligned}$$
(6)

The CKA of \({k_1}\) and \({k_2}\) on data set S is defined as

$$\begin{aligned} CKA({K_1},{K_2}) = \frac{{{{\left\langle {\overline{{K_1}} ,\overline{{K_2}} } \right\rangle }_F}}}{{\sqrt{{{\left\langle {\overline{{K_1}} ,\overline{{K_1}} } \right\rangle }_F}{{\left\langle {\overline{{K_2}} ,\overline{{K_2}} } \right\rangle }_F}} }} \end{aligned}$$
(7)
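For concreteness, Eqs. (4)–(7) translate into the following Python sketch, where `K1` and `K2` are precomputed kernel matrices of the same size (the function names are ours, chosen for illustration):

```python
import numpy as np

def center(K):
    """Center a kernel matrix: K_bar = H K H with H = I - e e^T / l (Eqs. (5)-(6))."""
    l = K.shape[0]
    H = np.eye(l) - np.ones((l, l)) / l
    return H @ K @ H

def cka(K1, K2):
    """Centered kernel alignment between two kernel matrices (Eq. (7))."""
    K1c, K2c = center(K1), center(K2)
    num = np.sum(K1c * K2c)   # Frobenius inner product, as in Eq. (4), of the centered matrices
    den = np.sqrt(np.sum(K1c * K1c) * np.sum(K2c * K2c))
    return num / den
```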

2.3 Information gain

Information gain (Han et al. 2022) is often used for feature correlation analysis.

Suppose the sample set S has m category labels \({C_i},i = 1,2,\ldots ,m\), and let \({S_i}\) denote the set of all samples in S with label \({C_i}\); then the information entropy of S is defined as follows:

$$\begin{aligned} Info\left( S \right) = - \sum \limits _{i = 1}^m {{p_i}{{\log }_2}\left( {{p_i}} \right) } \end{aligned}$$
(8)

where \({p_i} = \frac{{\left| {{S_i}} \right| }}{{\left| S \right| }}\) denotes the proportion of samples with label \({C_i}\) in the sample set S and \(| \cdot |\) denotes the cardinality of a set.

For a feature F, suppose it takes v distinct values \({f_i},i = 1,2,\ldots ,v\), so that the sample set S is correspondingly split into \({S_i},i = 1,2,\ldots ,v\), where \({S_i}\) contains all the samples in S whose feature F takes the value \({f_i}\). Then the information gain \(IG\left( {S,F} \right) \) is defined as follows:

$$\begin{aligned} IG\left( {S,F} \right) = Info\left( S \right) - \sum \limits _{i = 1}^v {\frac{{\left| {{S_i}} \right| }}{{\left| S \right| }} \cdot Info\left( {{S_i}} \right) } \end{aligned}$$
(9)
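Eqs. (8)–(9) can be computed per feature as in the sketch below, which assumes discrete-valued features (continuous features would first be discretized, e.g. by binning; the discretization scheme is not specified in the text):

```python
import numpy as np

def entropy(y):
    """Information entropy of a label vector (Eq. (8))."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(feature, y):
    """Information gain IG(S, F) of one discrete feature column (Eq. (9))."""
    gain = entropy(y)
    for value in np.unique(feature):
        mask = feature == value
        gain -= mask.mean() * entropy(y[mask])   # |S_i| / |S| * Info(S_i)
    return gain
```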

3 A novel fuzzy twin support vector machine based on centered kernel alignment

In this section, we first propose a fuzzy membership assignment strategy based on centered kernel alignment. Then we elaborate the model structure of FTSVM-CKA in the linear and nonlinear cases. Finally, a coordinate descent strategy with shrinking by active set is introduced.

3.1 A fuzzy membership assignment strategy based on centered kernel alignment

Firstly, a feature-weighted kernel function is constructed by using the information gain, and it is applied to the calculation of the centered kernel alignment. This gives more weight to the strongly correlated features than to the weakly correlated features, in order to describe the classification importance of different features. Secondly, the centered kernel alignment method is employed to derive a heuristic function that calculates the dependency between sample points and their corresponding labels. This function assigns fuzzy membership degrees to different sample points, effectively mitigating the detrimental effects of noise.

Let a feature-weighted matrix P derived from the information gain be represented as follows:

$$\begin{aligned} P = \left[ {\begin{array}{cccc} {w_1}&{}&{}&{}\\ {}&{w_2}&{}&{}\\ {}&{}&\ddots &{}\\ {}&{}&{}&{w_d} \end{array}} \right] \end{aligned}$$
(10)

where \({w_i},i = 1,2,\ldots ,d\), denotes the weight of the ith feature calculated by the information gain. The feature-weighted kernel function can then be defined as \({k_p}\left( {{x_i},{x_j}} \right) = k\left( {x_i^TP,x_j^TP} \right) \). Three typical kernels with feature weights are as follows:

(1) Linear kernel:

$$\begin{aligned} {k_p}\left( {{x_i},{x_j}} \right) = \left( {x_i^TP} \right) \cdot \left( {x_j^TP} \right) = x_i^TP{P^T}{x_j} \end{aligned}$$
(11)

(2) Polynomial kernel:

$$\begin{aligned} \begin{array}{l} {k_p}\left( {{x_i},{x_j}} \right) = {\left[ {\gamma \left( {x_i^TP} \right) \cdot \left( {x_j^TP} \right) + r} \right] ^d}\\ \mathrm{{ }} = {\left( {\gamma x_i^TP{P^T}{x_j} + r} \right) ^d},\gamma > 0 \end{array} \end{aligned}$$
(12)

(3) Gaussian kernel:

$$\begin{aligned} \begin{array}{l} {k_p}({x_i},{x_j}) = \exp \left( - \gamma {\left\| {x_i^TP - x_j^TP} \right\| ^2}\right) \\ = \exp ( - \gamma ({({x_i} - {x_j})^T}P{P^T}({x_i} - {x_j}))) \end{array} \end{aligned}$$
(13)
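As an illustration, the feature-weighted Gaussian kernel of Eq. (13) can be evaluated as follows, with the diagonal of P taken to be the raw per-feature information-gain values (any rescaling of these weights is an implementation choice left unspecified here):

```python
import numpy as np

def weighted_gaussian_kernel(X1, X2, ig_weights, gamma=1.0):
    """Feature-weighted Gaussian kernel of Eq. (13).
    X1: (n1, d), X2: (n2, d); ig_weights: length-d information-gain weights
    forming the diagonal of P; gamma is the kernel width parameter."""
    P = np.diag(ig_weights)
    Z1, Z2 = X1 @ P, X2 @ P                       # scale each feature by its weight
    sq = (np.sum(Z1**2, axis=1)[:, None] + np.sum(Z2**2, axis=1)[None, :]
          - 2.0 * Z1 @ Z2.T)                      # pairwise squared distances
    return np.exp(-gamma * sq)
```

The weights would, for example, be the `information_gain` values computed feature by feature as sketched in Sect. 2.3.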

For a binary classification problem, let \(K,G \in {R^{l \times l}}\) be kernel matrices defined by \({K_{i,j}} = k({x_i},{x_j})\) and \({G_{i,j}} = g({y_i},{y_j})\), respectively, where \(g({y_i},{y_j})\) is defined as follows:

$$\begin{aligned} g({y_i},{y_j}) = \left\{ \begin{array}{l} + 1\mathrm{{ , }}{y_i} = {y_j}\\ - 1\mathrm{{ , }}{y_i} \ne {y_j} \end{array} \right. \end{aligned}$$
(14)

where the similarities between samples from the same class are set to \(\mathrm{{ + }}1\), and those between samples from different classes are set to \( - 1\). This definition reflects the ideal pairwise similarities between samples. Let \(y = {({y_1},{y_2},\ldots ,{y_l})^T}\); then

$$\begin{aligned}&CKA(K,G)\nonumber \\ {}&= \frac{{{{\left\langle {\overline{K} ,\overline{G} } \right\rangle }_F}}}{{\sqrt{{{\left\langle {\overline{K} ,\overline{K} } \right\rangle }_F}{{\left\langle {\overline{G} ,\overline{G} } \right\rangle }_F}} }} = \frac{{{{\left\langle {\overline{K} ,G} \right\rangle }_F}}}{{\sqrt{{{\left\langle {\overline{K} ,K} \right\rangle }_F}{{\left\langle {\overline{G} ,G} \right\rangle }_F}} }}\nonumber \\&= \frac{{{{\left\langle {\overline{K} ,y{y^T}} \right\rangle }_F}}}{{\sqrt{{{\left\langle {\overline{K} ,K} \right\rangle }_F}{{\left\langle {\overline{G} ,G} \right\rangle }_F}} }} = \frac{{\sum \limits _{i = 1}^l {\sum \limits _{j = 1}^l {{y_i}{y_j}\overline{k} \left( {{x_i},{x_j}} \right) } } }}{{\sqrt{{{\left\langle {\overline{K} ,K} \right\rangle }_F}{{\left\langle {\overline{G} ,G} \right\rangle }_F}} }}\nonumber \\&= \frac{1}{{\sqrt{{{\left\langle {\overline{K} ,K} \right\rangle }_F}{{\left\langle {\overline{G} ,G} \right\rangle }_F}} }}\left[ {\sum \limits _{{y_i} = {y_j}} {\overline{k} ({x_i},{x_j})} - \sum \limits _{{y_i} \ne {y_j}} {\overline{k} ({x_i},{x_j})} } \right] \end{aligned}$$
(15)

where, \(\overline{k} ({x_i},{x_j}) = {\overline{K} _{i,j}}\) is the centered kernel function.

For a given data set \(S = \left\{ {({x_1},{y_1}),({x_2},{y_2}),\ldots ,({x_l},{y_l})} \right\} \) and kernel functions k and g, we get

$$\begin{aligned} \overline{K} = HKH,\mathrm{{ }}\overline{G} = HGH \end{aligned}$$
(16)

where \({K_{i,j}} = k({x_i},{x_j})\) and \({G_{i,j}} = g({y_i},{y_j})\), \(H = I - \frac{{e{e^T}}}{l} \in {R^{l \times l}}\). Then,

$$\begin{aligned} {\left\langle {\overline{K} ,K} \right\rangle _F} = \sum \limits _{i = 1}^l {\sum \limits _{j = 1}^l {\overline{k} } } \left( {{x_i},{x_j}} \right) k\left( {{x_i},{x_j}} \right) \end{aligned}$$
(17)

and

$$\begin{aligned} {\left\langle {\overline{G} ,G} \right\rangle _F} = \sum \limits _{i = 1}^l {\sum \limits _{j = 1}^l {\overline{g} } } \left( {{y_i},{y_j}} \right) g\left( {{y_i},{y_j}} \right) \end{aligned}$$
(18)

are obtained, where \(\overline{k} \) and \(\overline{g} \) are the centered kernel functions corresponding to \(\overline{K} \) and \(\overline{G} \), respectively. Since \(\frac{1}{{\sqrt{{{\left\langle {\overline{K},K} \right\rangle }_F}{{\left\langle {\overline{G},G} \right\rangle }_F}} }}\) is a constant for a given data set, the CKA contribution of a sample \({x_t}\) can be expressed, up to this constant, as follows:

$$\begin{aligned} {d_t} = CKA(K,G,{x_t}) = \sum \limits _{{y_t} = {y_i}} {\overline{k} ({x_t},{x_i})} - \sum \limits _{{y_t} \ne {y_i}} {\overline{k} ({x_t},{x_i})} \end{aligned}$$
(19)

Since \(\overline{k} ({x_t},{x_i})\) is the centered kernel function that measures the similarity between the points \(x_t\) and \(x_i\), the value of \(d_t\) is large when \(x_t\) is highly similar to the samples of its own class and dissimilar to the samples of the other class. In other words, a sample with a larger \(d_t\) value contributes more to the construction of the optimal classification hyperplane, whereas a sample with a smaller \(d_t\) value is more likely to be noise. Thus the fuzzy membership function based on CKA, which measures the importance of each sample point for classification, can be expressed as follows:

$$\begin{aligned} {s_t} = \frac{{{d_t} - {d_{\min }}}}{{{d_{\max }} - {d_{\min }}}} \end{aligned}$$
(20)

where \({d_{\max }}\) and \({d_{\min }}\) denote the largest and smallest CKA values among all sample points, respectively. Therefore, the larger the value of \({s_t}\), the greater the contribution of the sample \({x_t}\) to the construction of the optimal classification hyperplane; conversely, a small \({s_t}\) indicates that \({x_t}\) is likely to be noise. Different from existing fuzzy membership functions based on distance, relative density, or entropy, the proposed strategy utilizes the CKA method to derive a heuristic function for calculating the dependency between samples and their corresponding labels, and assigns fuzzy memberships to the samples accordingly. In addition, the proposed strategy incorporates the idea of feature weighting, which effectively reduces the influence of weakly correlated features. The input data set S is accordingly extended to \(S = \left\{ {({x_1},{y_1},{s_1}),({x_2},{y_2},{s_2}),\ldots ,({x_l},{y_l},{s_l})} \right\} \).
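Putting Eqs. (19)–(20) together, a minimal sketch of the proposed membership assignment, taking a precomputed (feature-weighted) kernel matrix and labels in {−1, +1} as input:

```python
import numpy as np

def cka_fuzzy_membership(K, y):
    """Assign fuzzy memberships s_t from a (feature-weighted) kernel matrix K
    and labels y in {-1, +1}, following Eqs. (19)-(20)."""
    l = K.shape[0]
    H = np.eye(l) - np.ones((l, l)) / l
    Kc = H @ K @ H                               # centered kernel matrix, Eq. (6)
    # d_t = sum_{y_i = y_t} k_bar(x_t, x_i) - sum_{y_i != y_t} k_bar(x_t, x_i)
    #     = y_t * sum_i y_i * k_bar(x_t, x_i)   for labels in {-1, +1}   (Eq. (19))
    d = y * (Kc @ y)
    return (d - d.min()) / (d.max() - d.min())   # min-max scaling, Eq. (20)
```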

3.2 Linear FTSVM-CKA

In the linear case, FTSVM-CKA finds the optimal classifier by solving the following two QPPs:

$$\begin{aligned} \begin{array}{l} \mathop {\min }\limits _{{w_1},{b_1},{\xi _2}} \mathrm{{ }}\frac{1}{2}{\left\| {A{w_1} + {e_1}{b_1}} \right\| ^2} + \frac{1}{2}{C_1}{\left\| {{w_1}} \right\| ^2} + {C_2}{S_2}^T{\xi _2}\\ s.t.\mathrm{{ }} - (B{w_1} + {e_2}{b_1}) + {\xi _2} \ge {e_2},\mathrm{{ }}{\xi _2} \ge 0 \end{array} \end{aligned}$$
(21)

and

$$\begin{aligned} \begin{array}{l} \mathop {\min }\limits _{{w_2},{b_2},{\xi _1}} \mathrm{{ }}\frac{1}{2}{\left\| {B{w_2} + {e_2}{b_2}} \right\| ^2} + \frac{1}{2}{C_3}{\left\| {{w_2}} \right\| ^2} + {C_4}{S_1}^T{\xi _1}\\ s.t.\mathrm{{ }}(A{w_2} + {e_1}{b_2}) + {\xi _1} \ge {e_1},\mathrm{{ }}{\xi _1} \ge 0 \end{array} \end{aligned}$$
(22)

where \({C_1}\), \({C_2}\), \({C_3}\) and \({C_4}\) are penalty parameters, \({\xi _1}\) and \({\xi _2}\) are slack variables, and \({e_1}\) and \({e_2}\) are vectors of ones of appropriate dimensions. \({S_1} \in {R^{{l_ + }}}\) and \({S_2} \in {R^{{l_ - }}}\) denote the fuzzy memberships of the positive and negative class samples, respectively, where \(l_+\) and \(l_-\) are the numbers of positive and negative samples.

We take the solution of problem (21) as an example. The Lagrangian of problem (21) is written as

$$\begin{aligned}&L({w_1},{b_1},{\xi _2},\alpha ,\beta ) \nonumber \\ {}&\quad = \frac{1}{2}{\left\| {A{w_1} + {e_1}{b_1}} \right\| ^2} + \frac{1}{2}{C_1}{\left\| {{w_1}} \right\| ^2}\nonumber \\&\qquad + {C_2}{S_2}^T{\xi _2} + {\alpha ^T}[(B{w_1} + {e_2}{b_1}) - {\xi _2} + {e_2}] - {\beta ^T}{\xi _2} \end{aligned}$$
(23)

where \(\alpha \) and \(\beta \) are the vectors of Lagrange multipliers. Applying the Karush-Kuhn-Tucker (KKT) conditions, we get

$$\begin{aligned}{} & {} \frac{{\partial L}}{{\partial {w_1}}} = {A^T}(A{w_1} + {e_1}{b_1}) + {C_1}{w_1} + {B^T}\alpha = 0\end{aligned}$$
(24)
$$\begin{aligned}{} & {} \frac{{\partial L}}{{\partial {b_1}}} = e_1^T(A{w_1} + {e_1}{b_1}) + e_2^T\alpha = 0\end{aligned}$$
(25)
$$\begin{aligned}{} & {} \frac{{\partial L}}{{\partial {\xi _2}}} = {C_2}{S_2} - \alpha - \beta = 0 \end{aligned}$$
(26)

Neglecting for the moment the regularization term \({C_1}{w_1}\) in Eq. (24) (it is restored below), from Eq. (24) and Eq. (25),

$$\begin{aligned} \left( {\begin{array}{*{20}{l}} {{A^T}}\\ {e_1^T} \end{array}} \right) \left( {\begin{array}{*{20}{l}} A&{{e_1}} \end{array}} \right) \left( {\begin{array}{*{20}{l}} {{w_1}}\\ {{b_1}} \end{array}} \right) + \left( {\begin{array}{*{20}{l}} {{B^T}}\\ {e_2^T} \end{array}} \right) \alpha = 0 \end{aligned}$$
(27)

can be obtained. Let \({H_1} = \left( {\begin{array}{*{20}{l}} A&{{e_1}} \end{array}} \right) \), \({G_2} = \left( {\begin{array}{*{20}{l}} B&{{e_2}} \end{array}} \right) \), \({u_1} = \left( {\begin{array}{*{20}{l}} {{w_1}}\\ {{b_1}} \end{array}} \right) \), \({u_2} = \left( {\begin{array}{*{20}{l}} {{w_2}}\\ {{b_2}} \end{array}} \right) \), then, \(H_1^T{H_1}{u_1} + G_2^T\alpha = 0\). Further, we can get

$$\begin{aligned} {u_1} = - {(H_1^T{H_1})^{ - 1}}G_2^T\alpha \end{aligned}$$
(28)

Since \(H_1^T{H_1}\) may be singular or ill-conditioned, and to restore the effect of the regularization term \(\frac{1}{2}{C_1}{\left\| {{w_1}} \right\| ^2}\) in Eq. (21), \({(H_1^T{H_1} + {C_1}I)^{ - 1}}\) is used in place of \({(H_1^T{H_1})^{ - 1}}\) in Eq. (28), where I is the identity matrix of appropriate dimension. Thus,

$$\begin{aligned} {u_1} = - {(H_1^T{H_1} + {C_1}I)^{ - 1}}G_2^T\alpha \end{aligned}$$
(29)

Similarly,

$$\begin{aligned} {u_2} = {(G_2^T{G_2} + {C_3}I)^{ - 1}}H_1^T\beta \end{aligned}$$
(30)

According to the KKT conditions, the dual problems of Eq. (21) and Eq. (22) are as follows:

$$\begin{aligned} \begin{array}{l} \mathop {\max }\limits _\alpha \mathrm{{ }}{e_2}^T\alpha - \frac{1}{2}{\alpha ^T}{G_2}{(H_1^T{H_1} + {C_1}I)^{ - 1}}G_2^T\alpha \\ s.t.\mathrm{{ }}0 \le \alpha \le {C_2}{S_2} \end{array} \end{aligned}$$
(31)

and

$$\begin{aligned} \begin{array}{l} \mathop {\max }\limits _\beta \mathrm{{ }}{e_1}^T\beta - \frac{1}{2}{\beta ^T}{H_1}{(G_2^T{G_2} + {C_3}I)^{ - 1}}H_1^T\beta \\ s.t.\mathrm{{ }}0 \le \beta \le {C_4}{S_1} \end{array} \end{aligned}$$
(32)

We get the optimal \(u_1^ * = \left( {\begin{array}{*{20}{c}} {w_1^ * }\\ {b_1^ * } \end{array}} \right) \) and \(u_2^ * = \left( {\begin{array}{*{20}{c}} {w_2^ * }\\ {b_2^ * } \end{array}} \right) \) by solving the two corresponding dual problems. For any input sample \({x^ * }\), its class label \({y^ * }\) can be determined as follows:

$$\begin{aligned} {y^ * } = \mathop {\arg \min }\limits _{i \in 1,2} \left\{ {\frac{{\left| {w{{_i^ * }^T}{x^ * } + b_i^ * } \right| }}{{\left\| {w_i^ * } \right\| }}} \right\} \end{aligned}$$
(33)
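The two dual problems (31)–(32) are box-constrained quadratic programs. The sketch below builds the matrices of Eqs. (29)–(30) and solves the duals with a simple projected-gradient loop; this solver is only an illustrative stand-in for the coordinate descent method of Sect. 3.4, and all function names are ours:

```python
import numpy as np

def solve_box_qp(Q, q, upper, iters=2000):
    """Minimize 0.5 a^T Q a - q^T a  subject to  0 <= a <= upper,
    by projected gradient descent (illustrative stand-in for Sect. 3.4)."""
    a = np.zeros_like(upper, dtype=float)
    step = 1.0 / (np.linalg.norm(Q, 2) + 1e-12)    # 1 / largest eigenvalue of Q
    for _ in range(iters):
        a = np.clip(a - step * (Q @ a - q), 0.0, upper)
    return a

def linear_ftsvm_cka_fit(A, B, S1, S2, C1, C2, C3, C4):
    """Fit the two linear FTSVM-CKA planes via Eqs. (29)-(32).
    A: positive samples (l+ x d), B: negative samples (l- x d),
    S1, S2: fuzzy memberships of the positive / negative samples."""
    e1 = np.ones((A.shape[0], 1))
    e2 = np.ones((B.shape[0], 1))
    H1 = np.hstack([A, e1])                         # H1 = [A  e1]
    G2 = np.hstack([B, e2])                         # G2 = [B  e2]
    dim = H1.shape[1]

    # first hyperplane: dual (31), then u1 = -(H1^T H1 + C1 I)^{-1} G2^T alpha, Eq. (29)
    R1 = np.linalg.solve(H1.T @ H1 + C1 * np.eye(dim), G2.T)
    alpha = solve_box_qp(G2 @ R1, np.ones(G2.shape[0]), C2 * S2)
    u1 = -R1 @ alpha

    # second hyperplane: dual (32), then u2 = (G2^T G2 + C3 I)^{-1} H1^T beta, Eq. (30)
    R2 = np.linalg.solve(G2.T @ G2 + C3 * np.eye(dim), H1.T)
    beta = solve_box_qp(H1 @ R2, np.ones(H1.shape[0]), C4 * S1)
    u2 = R2 @ beta

    w1, b1 = u1[:-1], u1[-1]
    w2, b2 = u2[:-1], u2[-1]
    return w1, b1, w2, b2
```

A new sample is then classified with the decision rule of Eq. (33), e.g. by the `tsvm_predict` sketch given after Eq. (3).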

3.3 Nonlinear FTSVM-CKA

In the nonlinear case, the kernel function \(k({x_1},{x_2}) = \left\langle {\phi ({x_1}),\phi ({x_2})} \right\rangle \) is introduced, where \(\phi \) is the mapping into a high-dimensional Hilbert feature space. The classification hyperplanes in the nonlinear case can thus be represented as \(k(x,{X^T}){w_1} + {b_1} = 0\) and \(k(x,{X^T}){w_2} + {b_2} = 0\), where \(X = \left[ {A;B} \right] \). The nonlinear FTSVM-CKA is formulated in the primal form as

$$\begin{aligned} \begin{array}{l} \mathop {\min }\limits _{{w_1},{b_1},{\xi _2}} \mathrm{{ }}\frac{1}{2}{\left\| {k(A,{X^T}){w_1} + {e_1}{b_1}} \right\| ^2} + \frac{1}{2}{C_1}{\left\| {{w_1}} \right\| ^2} + {C_2}{S_2}^T{\xi _2}\\ s.t.\mathrm{{ }} - (k(B,{X^T}){w_1} + {e_2}{b_1}) + {\xi _2} \ge {e_2},\mathrm{{ }}{\xi _2} \ge 0 \end{array} \end{aligned}$$
(34)

and

$$\begin{aligned} \begin{array}{l} \mathop {\min }\limits _{{w_2},{b_2},{\xi _1}} \mathrm{{ }}\frac{1}{2}{\left\| {k(B,{X^T}){w_2} + {e_2}{b_2}} \right\| ^2} + \frac{1}{2}{C_3}{\left\| {{w_2}} \right\| ^2} + {C_4}{S_1}^T{\xi _1}\\ s.t.\mathrm{{ }}(k(A,{X^T}){w_2} + {e_1}{b_2}) + {\xi _1} \ge {e_1},\mathrm{{ }}{\xi _1} \ge 0 \end{array} \end{aligned}$$
(35)

The Lagrangian function of Eq. (34) is written as

$$\begin{aligned} \begin{array}{l} L({w_1},{b_1},{\xi _2},\alpha ,\beta ) = \\ \frac{1}{2}{\left\| {k(A,{X^T}){w_1} + {e_1}{b_1}} \right\| ^2} + \frac{1}{2}{C_1}{\left\| {{w_1}} \right\| ^2} + {C_2}{S_2}^T{\xi _2}\\ + {\alpha ^T}[(k(B,{X^T}){w_1} + {e_2}{b_1}) - {\xi _2} + {e_2}] - {\beta ^T}{\xi _2} \end{array} \end{aligned}$$
(36)

Following the same procedure as in the linear case, we get

$$\begin{aligned} {u_1} = - {(H_1^{*T}H_1^* + {C_1}I)^{ - 1}}G_2^{*T}\alpha . \end{aligned}$$
(37)

and

$$\begin{aligned} {u_2} = {(G_2^{*T}G_2^* + {C_3}I)^{ - 1}}H_1^{*T}\beta . \end{aligned}$$
(38)

where \(H_1^ * = \left( {\begin{array}{*{20}{c}} {k(A,{X^T})}&{{e_1}} \end{array}} \right) \), \(G_2^ * = \left( {\begin{array}{*{20}{c}} {k(B,{X^T})}&{{e_2}} \end{array}} \right) \), \({u_1} = \left( {\begin{array}{*{20}{c}} {{w_1}}\\ {{b_1}} \end{array}} \right) \), \({u_2} = \left( {\begin{array}{*{20}{c}} {{w_2}}\\ {{b_2}} \end{array}} \right) \). Then, the dual problems of Eq. (34) and Eq. (35) are as follows:

$$\begin{aligned} \begin{array}{l} \mathop {\max }\limits _\alpha \mathrm{{ }}{e_2}^T\alpha - \frac{1}{2}{\alpha ^T}G_2^*{(H_1^{*T}H_1^* + {C_1}I)^{ - 1}}G_2^{*T}\alpha \\ s.t.\mathrm{{ }}0 \le \alpha \le {C_2}{S_2} \end{array} \end{aligned}$$
(39)

and

$$\begin{aligned} \begin{array}{l} \mathop {\max }\limits _\beta \mathrm{{ }}{e_1}^T\beta - \frac{1}{2}{\beta ^T}H_1^*{(G_2^{*T}G_2^* + {C_3}I)^{ - 1}}H_1^{*T}\beta \\ s.t.\mathrm{{ }}0 \le \beta \le {C_4}{S_1} \end{array} \end{aligned}$$
(40)

We get the optimal \(u_1^ * = \left( {\begin{array}{*{20}{c}} {w_1^ * }\\ {b_1^ * } \end{array}} \right) \) and \(u_2^ * = \left( {\begin{array}{*{20}{c}} {w_2^ * }\\ {b_2^ * } \end{array}} \right) \) by solving the dual problems. For any input sample \({x^ * }\), its classification decision function is as follows:

$$\begin{aligned} {y^ * } = \mathop {\arg \min }\limits _{i \in 1,2} \frac{{\left| {w_i^{*T}k({x^ * },{X^T}) + b_i^ * } \right| }}{{\sqrt{w_i^{*T}k(X,{X^T})w_i^ * } }} \end{aligned}$$
(41)
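Prediction in the nonlinear case therefore needs only kernel evaluations against the training matrix X = [A; B]. A sketch of the decision rule of Eq. (41) is given below; the `kernel` argument can be, for example, the feature-weighted Gaussian kernel sketched in Sect. 3.1, and `w1`, `w2` are assumed to come from the duals (39)–(40):

```python
import numpy as np

def kernel_tsvm_predict(x_new, X, w1, b1, w2, b2, kernel):
    """Nonlinear decision rule of Eq. (41). X = [A; B] stacks all training samples,
    w1 and w2 are the l-dimensional coefficient vectors, and kernel(X1, X2)
    returns the kernel matrix between two sample sets."""
    Kxx = kernel(X, X)                               # l x l training kernel matrix
    kx = kernel(x_new.reshape(1, -1), X).ravel()     # kernel row k(x*, X^T)
    d1 = abs(kx @ w1 + b1) / np.sqrt(w1 @ Kxx @ w1)
    d2 = abs(kx @ w2 + b2) / np.sqrt(w2 @ Kxx @ w2)
    return +1 if d1 <= d2 else -1
```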

3.4 The coordinate descent strategy with active set shrinking

To speed up the training, FTSVM-CKA employs a coordinate descent strategy with shrinking by active set, which handles the computational complexity brought by high-dimensional inputs (Gao et al. 2015). Since the dual problems involved in FTSVM-CKA can be solved in the same way, we take Eq. (31) as an example. Let \(R = {(H_1^T{H_1} + {C_1}I)^{ - 1}}G_2^T\) and \(\widetilde{R} = {G_2}R\); then Eq. (31) can be rewritten as the following problem:

$$\begin{aligned} \begin{array}{l} \mathop {\min }\limits _\alpha \mathrm{{ g}}(\alpha ) = - {e_2}^T\alpha + \frac{1}{2}{\alpha ^T}\widetilde{R}\alpha \\ s.t.\mathrm{{ }}0 \le \alpha \le {C_2}{S_2} \end{array} \end{aligned}$$
(42)

A coordinate descent strategy with shrinking by active set is adopted to solve Eq. (42); its pseudo-code is shown in Algorithm 1. \({g_{\nabla i}}(\alpha )\) is the projected gradient, defined as follows:

$$\begin{aligned} {g_{\nabla i}}(\alpha ) = \left\{ {\begin{array}{ll} {\min (0,{g_{\partial i}}(\alpha )),}&{\ \mathrm{if}\ {\alpha _i} = 0}\\ {{g_{\partial i}}(\alpha ),}&{\ \mathrm{if}\ 0< {\alpha _i} < {C_2}{s_i}}\\ {\max (0,{g_{\partial i}}(\alpha )),}&{\ \mathrm{if}\ {\alpha _i} = {C_2}{s_i}} \end{array}} \right. \end{aligned}$$
(43)

where \({g_{\partial i}}\) is the ith component of the gradient \({g_\partial }\). Refer to Chang and Lin (2011), Chang et al. (2008), and Shao and Deng (2012) for further details.
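A simplified sketch of this coordinate descent with active-set shrinking for problem (42) is given below; the shrinking rule used here, dropping variables that sit at a bound with zero projected gradient, is a simplification of the tolerance-based rule in Chang and Lin (2011), and the function name is ours:

```python
import numpy as np

def coordinate_descent_shrinking(R_tilde, upper, max_iter=200, tol=1e-6):
    """Minimize g(a) = -e^T a + 0.5 a^T R_tilde a  s.t.  0 <= a <= upper (Eq. (42))
    by coordinate descent, shrinking bounded variables whose projected
    gradient (Eq. (43)) vanishes."""
    l = R_tilde.shape[0]
    a = np.zeros(l)
    grad = -np.ones(l)                     # gradient R_tilde a - e at a = 0
    active = np.arange(l)
    for _ in range(max_iter):
        max_viol, keep = 0.0, []
        for i in np.random.permutation(active):
            # projected gradient of coordinate i, Eq. (43)
            if a[i] <= 0.0:
                pg = min(0.0, grad[i])
            elif a[i] >= upper[i]:
                pg = max(0.0, grad[i])
            else:
                pg = grad[i]
            if pg == 0.0 and (a[i] <= 0.0 or a[i] >= upper[i]):
                continue                   # shrink: variable is optimal at a bound
            keep.append(i)
            max_viol = max(max_viol, abs(pg))
            if R_tilde[i, i] > 0.0:
                new_ai = np.clip(a[i] - grad[i] / R_tilde[i, i], 0.0, upper[i])
                if new_ai != a[i]:
                    grad += (new_ai - a[i]) * R_tilde[:, i]   # rank-one gradient update
                    a[i] = new_ai
        if max_viol < tol:
            if active.size == l:
                break                      # converged on the full variable set
            active = np.arange(l)          # converged on the shrunk set: re-check all
        else:
            active = np.array(keep, dtype=int) if keep else np.arange(l)
    return a
```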

Algorithm 1 The coordinate descent strategy with active set shrinking

Fig. 1 The hyperplanes by TSVM at 0% noise rate (a) and 10% noise rate (b)

Fig. 2 The hyperplanes by FTSVM-CKA at 0% noise rate (a) and 10% noise rate (b)

Fig. 3 Accuracy of FTSVM-CKA and TSVM in linear (a) and nonlinear (b) cases

Table 1 Details of UCI data sets

4 Experimental results

In this paper, different experiments are designed on an artificial data set, i.e., the Ripley data set (Ripley 2007), and 15 real-world data sets from the UCI machine learning repository (Dua et al. 2017) to evaluate the performance of FTSVM-CKA. TSVM (Khemchandani and Chandra 2007), CDFTSVM (Gao et al. 2015), IFTSVM (Rezvani et al. 2019), AULSTSVM (Richhariya et al. 2021), CatBoost (Prokhorenkova et al. 2018), LightGBM (Ke et al. 2017), XGBoost (Chen et al. 2015), SVM (Cortes and Vapnik 1995), and Random Forest (RF) (Breiman 2001) are used as comparison algorithms. For the parameters \({C_i},i = 1,2,3,4\), we set \({C_1} = {C_3},{C_2} = {C_4}\) for FTSVM-CKA, as is done for CDFTSVM and IFTSVM; these parameters are selected from \(\left\{ {{{10}^i}|i = - 5, - 4,\ldots ,4,5} \right\} \). For AULSTSVM (Richhariya et al. 2021), the parameters are set as \({c_1} = {c_2}\), \({c_3} = {c_5} = {c_1} \cdot {c_4}\), \({c_4} = {c_6}\). In addition, the Gaussian kernel function, i.e., \(k({x_1},{x_2}) = \exp (\frac{{ - {{\left\| {{x_1} - {x_2}} \right\| }^2}}}{{{\sigma ^2}}})\), is used in this paper, and \(\sigma \) is selected from \(\left\{ {{2^i}|i = - 5, - 4,\ldots ,4,5} \right\} \). Ten-fold cross-validation is performed for all algorithms, and all samples are normalized. To simulate label noise, we randomly select a given proportion of samples and flip their labels; this proportion is called the noise rate. The experimental environment is as follows: Intel Core i5-11500 CPU, 8 GB RAM, Windows 10, MATLAB R2018b.
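The label-noise protocol described above can be reproduced in a few lines (the fixed random seed is our choice for illustration):

```python
import numpy as np

def add_label_noise(y, noise_rate, seed=0):
    """Flip the labels of a randomly chosen fraction of samples (the noise rate)."""
    rng = np.random.default_rng(seed)
    y_noisy = y.copy()
    n_flip = int(round(noise_rate * len(y)))
    idx = rng.choice(len(y), size=n_flip, replace=False)
    y_noisy[idx] = -y_noisy[idx]             # labels are in {-1, +1}
    return y_noisy
```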

4.1 Parameter effect

In this subsection, the effect of different values of \({C_i}\) and \(\sigma \) is examined on the Horse dataset to identify the optimal parameters, i.e., \({C_i}\) for the linear case and both \({C_i}\) and \(\sigma \) for the nonlinear case, that produce the best performance. First, \({C_i}\), which varies in \(\left\{ {1,2,\ldots ,10} \right\} \), is optimized for the linear case. FTSVM-CKA generates better outcomes when \({C_1} = 5\) and \({C_2} = 2\).

Similarly, for the nonlinear case, \({C_i}\) and \(\sigma \) are optimized over \(\left\{ {i \cdot \frac{1}{2}|i = 1,2,\ldots ,10} \right\} \). FTSVM-CKA with \({C_1} = 5\), \({C_2} = 2\) and \(\sigma = 1.5\) produces better outcomes. After obtaining the optimal parameter settings, the performance of the model is evaluated on the remaining testing parts.

4.2 Artificial data sets

The Ripley data set is a mixture of two Gaussian distributions; it comprises two categories, and each sample consists of two features. Figures 1 and 2 show the linear separating hyperplanes generated by TSVM and FTSVM-CKA at noise rates of 0% and 10%, respectively. Figure 1 reveals that the hyperplanes generated by TSVM exhibit noticeable variation across the different noise rates. From Fig. 2, one can observe that the disparity between the hyperplanes generated by FTSVM-CKA at the two noise rates is significantly smaller than that of TSVM.

Figure 3 illustrates the accuracy of FTSVM-CKA and TSVM under varying noise rates in the linear and nonlinear cases. It can be observed that the accuracy of FTSVM-CKA is higher than that of TSVM, and the classification performance of both algorithms shows a decreasing trend as the noise rate increases. From Fig. 3, we can also see that the accuracy of TSVM fluctuates significantly with increasing noise rate, suggesting its sensitivity to noise. It is worth noting that, compared with TSVM, the accuracy of FTSVM-CKA is less susceptible to noise and changes more gently. This indicates that FTSVM-CKA can effectively mitigate the sensitivity of TSVM to noise. In conclusion, the experimental results show that the proposed FTSVM-CKA suppresses the adverse effects of noise, owing to the fuzzy membership assignment strategy based on CKA.

Table 2 Experimental results on UCI data sets at 0% noise rate in the linear case
Table 3 Experimental results on UCI data sets at 5% noise rate in the linear case
Table 4 Experimental results on UCI data sets at 10% noise rate in the linear case
Table 5 Experimental results on UCI data sets at 0% noise rate in the nonlinear case
Table 6 Experimental results on UCI data sets at 5% noise rate in the nonlinear case
Table 7 Experimental results on UCI data sets at 10% noise rate in the nonlinear case
Table 8 Win-Tie-Loss accuracy comparison from Tables 2, 3 and 4
Table 9 Win-Tie-Loss accuracy comparison from Tables 5, 6 and 7
Table 10 The pairwise significant difference between the proposed FTSVM-CKA and existing algorithms in the linear case
Table 11 The pairwise significant difference between the proposed FTSVM-CKA and existing algorithms in the nonlinear case

4.3 UCI data sets

Table 1 shows the details of the 15 UCI datasets selected in this paper. In the experiments, the noise rate is set to 0, 5, and 10%. The average accuracy, the standard deviation (SD), and the computational time are reported to evaluate the experimental results.

We implement TSVM and TSVM-related methods, including CDFTSVM and IFTSVM. Tables 2, 3 and 4 present the experimental results of FTSVM-CKA, TSVM, IFTSVM, and CDFTSVM in the linear case, with noise rates of 0, 5, and 10%, respectively. Bold values in all tables indicate the best result. The results in Tables 2, 3 and 4 demonstrate that, out of the 13 UCI datasets mentioned earlier, the proposed FTSVM-CKA achieves the highest classification accuracy on 11, 10, and 10 datasets, respectively. The average accuracy ranks of FTSVM-CKA under the different noise rates are 1.15, 1.31 and 1.46, respectively, which are superior to those of the existing algorithms. This indicates that FTSVM-CKA outperforms the other three algorithms in terms of classification performance in the linear case. FTSVM-CKA utilizes the CKA-based fuzzy membership assignment strategy to mitigate the adverse impact of noise during classification, thereby significantly enhancing classification performance. In addition, the computation times of FTSVM-CKA and CDFTSVM are very close, and both are significantly shorter than those of TSVM and IFTSVM, showing that the proposed FTSVM-CKA trains faster than the other algorithms. This can be attributed to the fact that both FTSVM-CKA and CDFTSVM employ the coordinate descent strategy with shrinking by active set. However, the training time of FTSVM-CKA is slightly higher than that of CDFTSVM, mainly because CDFTSVM uses the simplest fuzzy membership calculation method.

Tables 5, 6 and 7 present the experimental results of FTSVM-CKA, TSVM, CDFTSVM, and IFTSVM in the nonlinear case, with noise rates of 0, 5, and 10%, respectively. Under the different noise rates, the proposed model achieves the best results on 12, 11 and 12 of the 13 datasets, respectively. In the nonlinear case, the average accuracy rank of FTSVM-CKA is 1.23 at each noise rate, which is better than those of the existing algorithms. It can be observed that the classification performance of FTSVM-CKA is better than that of the other algorithms in the nonlinear case. Similar to the linear case, FTSVM-CKA exhibits significantly shorter computational time than both TSVM and IFTSVM. The experimental results demonstrate that FTSVM-CKA outperforms the other algorithms in terms of both classification performance and training speed.

In order to compare the proposed FTSVM-CKA with the other algorithms in terms of classification performance, we utilize the Win-Tie-Loss (Xu et al. 2016) statistical analysis and record the number of datasets on which FTSVM-CKA outperforms, ties with, or performs worse than each competing algorithm in both the linear and nonlinear cases. The corresponding results are presented in Tables 8 and 9. One can observe that FTSVM-CKA gives better performance than the other methods on the majority of datasets. Furthermore, it is evident that FTSVM-CKA exhibits a clear advantage in the presence of noise.

Table 12 Experimental results with machine learning algorithms at 0% noise rate
Table 13 Experimental results with machine learning algorithms at 5% noise rate
Table 14 Experimental results with machine learning algorithms at 10% noise rate

We perform the Friedman test with a post-hoc test to assess the statistical significance of the generalization performance of the proposed FTSVM-CKA. The Friedman test uses the average ranks of the algorithms, which are recorded for all settings in Tables 2, 3, 4, 5, 6 and 7. Under the null hypothesis that all algorithms have equal ranks, the Friedman statistic is distributed as \(\chi _F^2\) with \(\kappa - 1\) degrees of freedom, where \(\kappa \) is the number of algorithms. The performance of two algorithms is significantly different if their average ranks differ by at least the critical difference defined by \(CD = {q_\alpha }\sqrt{\frac{{\kappa \left( {\kappa + 1} \right) }}{{6N}}} \), where N is the number of datasets and \({q_\alpha }\) is computed by using the Studentized range statistic. The critical difference for our case at the \(\alpha = 0.10\) significance level is \(CD = 2.241\sqrt{\frac{{5\left( {5 + 1} \right) }}{{6 \times 13}}} \approx 1.39\). Tables 10 and 11 show the pairwise significant differences between the algorithms in the linear and nonlinear cases, respectively. It can be found that the proposed FTSVM-CKA is significantly different from most of the algorithms.
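The quoted critical difference follows directly from the formula; as a quick check (with κ = 5 algorithms, N = 13 datasets, and the q_α = 2.241 used in the text):

```python
import math

kappa, N, q_alpha = 5, 13, 2.241            # algorithms, datasets, critical value from the text
CD = q_alpha * math.sqrt(kappa * (kappa + 1) / (6 * N))
print(round(CD, 2))                         # 1.39
```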

In addition, FTSVM-CKA is compared with several well-established machine learning algorithms, including CatBoost, LightGBM, XGBoost, SVM, and RF. The experiment is performed on 5 UCI datasets with 10-fold cross-validation, and the comparison algorithms use their default parameters. Tables 12, 13 and 14 show the classification accuracy at 0, 5, and 10% noise rates, respectively. FTSVM-CKA achieves the best classification results in all cases, which shows that its classification performance is significantly better than that of these classical and effective machine learning algorithms. It is worth noting that FTSVM-CKA is superior to traditional SVM in both generalization performance and training speed. One reason is that SVM cannot distinguish the importance of different samples for classification, which makes it sensitive to noise, whereas the proposed FTSVM-CKA utilizes the CKA method to derive a heuristic function for calculating the dependency between samples and their corresponding labels and assigns fuzzy memberships accordingly. This effectively identifies noise and reduces its negative impact on classification.

5 Conclusion

To address the sensitivity of traditional TSVM to noise, we proposed a novel and efficient fuzzy twin support vector machine based on centered kernel alignment, termed FTSVM-CKA. FTSVM-CKA utilizes the CKA method, which incorporates the idea of feature weighting, to assign fuzzy memberships to different samples. We conducted experiments on an artificial dataset and 15 UCI datasets, adding noise to the original data sets to verify the noise robustness of the proposed FTSVM-CKA. The experimental results demonstrate that FTSVM-CKA outperforms several existing learning models and exhibits excellent classification performance, and statistical tests confirm the significance of the proposed algorithm. Nevertheless, FTSVM-CKA does not take class-imbalanced data sets into account; our future work is to extend FTSVM-CKA to class imbalance learning.