
1 Introduction

The Support Vector Machine (SVM) was originally put forward to solve binary classification problems using a maximum-margin hyper-plane that separates the training samples. The classification hyper-plane is determined by the support vectors, which are then used to decide the class of a new test sample. Researchers have successfully applied SVM in many fields   [1, 2]. To solve multi-class classification problems in real applications, binary SVMs need to be extended or combined to complete the more complex classification task. There are many methods extending binary SVM to multi-class classification   [3,4,5], such as one-against-one, one-against-all, hierarchical SVM classifiers, and DAG SVM classifiers. But because up to \(k(k-1)/2\) quadratic programming (QP) optimization problems must be solved and combined for a k-class problem, the computation cost of these methods is relatively high.

For the multi-class classification problem, the sphere-structured SVM is one special solution   [6]. Based on one-class SVM, the hyper-sphere SVM classifier constructs, for each class, a minimum bounding hyper-sphere that encloses as many of that class's training samples as possible. The bounding hyper-sphere of each class is characterized by its center and its radius. Analogous to the margin maximization of binary SVM, this method maximizes the gap between different hyper-spheres by seeking the smallest radius. New test samples are classified according to the bounding hyper-spheres they fall into. Since this method solves the multi-class problem by computing all hyper-spheres directly, without combining multiple binary classifiers, its computation complexity is lower than that of all the combination methods mentioned above. The sphere-structured SVM has been studied extensively since its introduction. To improve its performance, Liu et al.   [7, 8] proposed a fuzzy hyper-sphere SVM and a multiple sub-hyper-spheres SVM for multi-class classification. Most of this research focused on a single decision rule, and few researchers adopted two or more decision rules.

As mentioned above, the training data of each class are enclosed within a bounding hyper-sphere. If the bounding hyper-spheres of all classes are independent and a new test sample falls inside exactly one hyper-sphere, its classification decision is easy to compute. But real application data are complicated, and so are the relationships between the bounding hyper-spheres of different classes. When the hyper-spheres of different classes intersect, it is difficult for one simple decision function to produce the right classification result, and the accuracy on test samples falling inside the intersection affects the final classification performance. Based on an analysis of the data distributions of such samples in the intersections, a hybrid decision strategy is put forward in this paper. Section 2 introduces the statistical analysis of the intersection data distribution, the mathematical description of our hyper-sphere SVM, and the implementation details of the proposed method. Section 3 discusses the experimental results and Sect. 4 gives the conclusions.

2 Our Method

2.1 Mathematical Description of Hyper-Sphere Support Vector Classifier

Similar to the mathematical description of the original binary SVM, the mathematical principles of the hyper-sphere SVM are as follows. Suppose there is a set of n-dimensional training samples from m classes; the task is to compute the minimum bounding hyper-sphere for each class, i.e. the smallest hyper-sphere that encloses all the training samples of that class. As in binary SVM, slack variables \(\xi _{ki}\) are introduced to permit isolated points, and a non-linear mapping function \(\phi \) transforms the training samples into a high-dimensional feature space to handle non-linearly separable problems. Seeking the minimum bounding hyper-sphere of class k, which contains \(l_{k}\) training samples, then amounts to solving the constrained quadratic optimization problem in Eq. (1).

$$\begin{aligned} \begin{aligned}&\underset{c_{k},R_{k}}{min}\, R_{k}^{2}+C_{k}\sum _{i=1}^{l_{k}}\xi _{ki}\\&s.t. \\&\left\| \phi \left( \textit{\textbf{x}}_{i} \right) -c_{k}\right\| ^{2}\le R_{k}^{2}+\xi _{ki}\\&\xi _{ki}\ge 0,i=1,\cdots ,l_{k} \end{aligned} \end{aligned}$$
(1)

For class k, its minimum bounding hyper-sphere \(S_{k}\) is characterized by its center \(c_{k}\) and radius \(R_{k}\); \(C_{k}\) is the penalty factor and \(\xi _{ki}\ge 0\) are the slack variables.

By introducing Lagrange multipliers \(\alpha _{i}\ge 0\) and \(\beta _{i}\ge 0\), the Lagrangian can be written as Eq. (2).

$$\begin{aligned} L(R_{k},c_{k},\xi _{ki},\alpha _{i},\beta _{i})=R_{k}^{2}+C_{k}\sum _{i=1}^{l_{k}}\xi _{ki}-\sum _{i=1}^{l_{k}}\alpha _{i}\left( R_{k}^{2}+\xi _{ki}-\left\| \phi (\textit{\textbf{x}}_{i})-c_{k} \right\| ^{2} \right) -\sum _{i=1}^{l_{k}}\beta _{i}\xi _{ki} \end{aligned}$$
(2)

By taking the partial derivatives of L with respect to \(R_{k}\), \(c_{k}\) and \(\xi _{ki}\), setting them to zero, and substituting the results back into Eq. (2), the original optimization problem becomes its dual optimization problem in Eq. (3).
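
The stationarity conditions obtained in this step are the standard ones for this Lagrangian:

$$\begin{aligned} \frac{\partial L}{\partial R_{k}}=0\Rightarrow \sum _{i=1}^{l_{k}}\alpha _{i}=1,\qquad \frac{\partial L}{\partial c_{k}}=0\Rightarrow c_{k}=\sum _{i=1}^{l_{k}}\alpha _{i}\phi (\textit{\textbf{x}}_{i}),\qquad \frac{\partial L}{\partial \xi _{ki}}=0\Rightarrow \alpha _{i}=C_{k}-\beta _{i} \end{aligned}$$

Together with \(\beta _{i}\ge 0\), the last condition yields the box constraint \(0\le \alpha _{i}\le C_{k}\) appearing in Eq. (3).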

$$\begin{aligned} \begin{aligned}&\underset{\alpha _{i}}{min}\, \sum _{i,j=1}^{l_{k}}\alpha _{i}\alpha _{j}K\left( \textit{\textbf{x}}_{i},\textit{\textbf{x}}_{j}\right) -\sum _{i=1}^{l_{k}}\alpha _{i}K\left( \textit{\textbf{x}}_{i},\textit{\textbf{x}}_{i} \right) \\&s.t.\\&\sum _{i=1}^{l_{k}}\alpha _{i}=1\\&0\le \alpha _{i}\le C_{k},i=1,\cdots ,l_{k} \end{aligned} \end{aligned}$$
(3)

In Eq. (3), the kernel trick is adopted to compute inner products in the feature space, that is, \(K(\textit{\textbf{x}}_{i},\textit{\textbf{x}}_{j})=\phi (\textit{\textbf{x}}_{i})\cdot \phi (\textit{\textbf{x}}_{j})\). Support vectors are the vectors \(\textit{\textbf{x}}_{i}\) with \(\alpha _{i}>0\). Solving Eq. (3) yields the Lagrange multipliers; the center is then given by Eq. (4) and the resulting decision function by Eq. (5). The radius \(R_{k}\) is obtained by equating \(f_{k}(\textit{\textbf{x}})\) to zero for any support vector on the boundary. The minimum bounding hyper-sphere \(S_{k}\) of class k is thus determined by its center and radius.

$$\begin{aligned} c_{k}=\sum _{i=1}^{l_{k}}\alpha _{i}\phi (\textit{\textbf{x}}_{i}),\qquad \left\| c_{k} \right\| ^{2}=\sum _{i,j=1}^{l_{k}}\alpha _{i}\alpha _{j}K(\textit{\textbf{x}}_{i},\textit{\textbf{x}}_{j}) \end{aligned}$$
(4)
$$\begin{aligned} f_{k}(\textit{\textbf{x}})=sgn\left( R_{k}^{2}-\sum _{i,j=1}^{l_{k}}\alpha _{i}\alpha _{j}K(\textit{\textbf{x}}_{i},\textit{\textbf{x}}_{j})+2\sum _{i=1}^{l_{k}}\alpha _{i}K(\textit{\textbf{x}}_{i},\textit{\textbf{x}})-K(\textit{\textbf{x}},\textit{\textbf{x}}) \right) \end{aligned}$$
(5)

Based on Eq. (5), a new point \(\textit{\textbf{x}}\) falls inside the hyper-sphere if \(f_{k}(\textit{\textbf{x}})>0\), falls outside the hyper-sphere if \(f_{k}(\textit{\textbf{x}})<0\), and lies on the hyper-sphere if \(f_{k}(\textit{\textbf{x}})=0\).
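
To make Eqs. (1)-(5) concrete, the following Python sketch fits one bounding hyper-sphere by solving the dual of Eq. (3) with a generic SLSQP solver and evaluates Eqs. (4) and (5). It is an illustrative implementation under an RBF kernel, not the authors' code; the names `rbf_kernel` and `HyperSphere` are our own.

```python
import numpy as np
from scipy.optimize import minimize


def rbf_kernel(A, B, sigma=1.25):
    """RBF kernel K(a, b) = exp(-||a - b||^2 / (2 * sigma^2))."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))


class HyperSphere:
    """Minimum bounding hyper-sphere of one class (Eqs. (1)-(5))."""

    def __init__(self, C=100.0, sigma=1.25):
        self.C, self.sigma = C, sigma

    def fit(self, X):
        self.X = np.asarray(X, dtype=float)
        K = rbf_kernel(self.X, self.X, self.sigma)
        diag = np.diag(K)
        n = len(self.X)
        # Dual of Eq. (3): min a'Ka - a'diag(K)  s.t.  sum(a) = 1, 0 <= a <= C.
        res = minimize(
            lambda a: a @ K @ a - a @ diag,
            np.full(n, 1.0 / n),
            jac=lambda a: 2.0 * K @ a - diag,
            method="SLSQP",
            bounds=[(0.0, self.C)] * n,
            constraints={"type": "eq", "fun": lambda a: a.sum() - 1.0},
        )
        self.alpha = res.x
        self.c_norm2 = self.alpha @ K @ self.alpha  # ||c_k||^2, Eq. (4)
        # Radius from a boundary support vector (0 < alpha_i < C), where Eq. (5) = 0.
        boundary = np.where((self.alpha > 1e-6) & (self.alpha < self.C - 1e-6))[0]
        sv = boundary[0] if len(boundary) else int(np.argmax(self.alpha))
        self.R2 = diag[sv] - 2.0 * self.alpha @ K[:, sv] + self.c_norm2
        return self

    def dist2(self, x):
        """Squared feature-space distance ||phi(x) - c_k||^2 (K(x,x) = 1 for RBF)."""
        kx = rbf_kernel(self.X, np.asarray(x, dtype=float)[None, :], self.sigma).ravel()
        return 1.0 - 2.0 * self.alpha @ kx + self.c_norm2

    def decision(self, x):
        """Eq. (5): > 0 inside the sphere, < 0 outside, = 0 on the boundary."""
        return np.sign(self.R2 - self.dist2(x))
```

A dedicated QP solver or an SMO-style routine would scale better than SLSQP; the generic solver is used here only to keep the sketch self-contained.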

2.2 Analysis of Data in the Intersection of Hyper-Spheres

Ideally, all hyper-spheres are independent and each test sample is correctly classified by one hyper-sphere decision function. But it can happen that two or more hyper-spheres intersect, so that one sample falls inside several hyper-spheres, or that a new test sample falls outside all hyper-spheres. How such data points are classified influences the accuracy of the resulting classifier. In Fig. 1, the minimum bounding spheres of class 1 and class 2 are \(S_{1}\) and \(S_{2}\); \(S_{1}^{{}'}\) denotes the part of class 1 falling inside sphere \(S_{2}\), and \(S_{2}^{{}'}\) the part of class 2 falling inside sphere \(S_{1}\). \(\textit{\textbf{x}}_{1}\) belongs to \(S_{1}^{{}'}\), \(\textit{\textbf{x}}_{2}\) belongs to \(S_{2}^{{}'}\), and both are support vectors. For example, \(\textit{\textbf{x}}_{2}\) is a support vector of class 2, so \(\textit{\textbf{x}}_{2}\) belongs to class 2. But based on the decision function, \(f_{2}(\textit{\textbf{x}}_{2})=0\) and \(f_{1}(\textit{\textbf{x}}_{2})>0\), so \(\textit{\textbf{x}}_{2}\) is assigned to class 1, which is obviously wrong. So when two or more hyper-spheres intersect, using only Eq. (5) as the decision rule may lead to wrong results.

Fig. 1. Illustration of two hyper-spheres intersecting.

To solve this problem, a sub-hyper-sphere support vector machine was put forward in   [9] to classify samples in the intersections. In that work, a same-class error-data hyper-sphere (enclosing error points that belong to the same class as the mother hyper-sphere) and a different-class error-data hyper-sphere (enclosing error points that belong to a different class from the mother hyper-sphere) are introduced, and the decision process is completed with them. In Fig. 1, S-sphere and D-sphere are the same-class and different-class error-data sub-hyper-spheres of class 1. If a new test sample lies in the intersection, the S-sphere and D-sphere are used as the classification rule to obtain the right class.

But by introducing multiple QP optimization problems again, this approach increases the computation complexity. To reduce the complexity introduced by extra QP problems, new decision rules are put forward in this paper. After the hyper-spheres of all classes are computed, there are three possible positions for a new test sample: inside exactly one hyper-sphere, inside an intersection, or outside all spheres. The decision for the inclusion case is straightforward. If a new test sample \(\textit{\textbf{x}}\) falls outside all spheres, Eq. (6) is adopted to decide its class, minimizing over \(j=1,\cdots ,m\).

$$\begin{aligned} \underset{j}{min}\left( \frac{d_{xj}^{2}}{R_{j}^{2}}-1 \right) ,\; where\, d_{xj}^{2}=\left\| \phi (\textit{\textbf{x}})-c_{j} \right\| ^{2}=K(\textit{\textbf{x}},\textit{\textbf{x}})-2\sum _{i=1}^{l_{j}}\alpha _{i}K(\textit{\textbf{x}},\textit{\textbf{x}}_{i})+\sum _{i,p=1}^{l_{j}}\alpha _{i}\alpha _{p}K(\textit{\textbf{x}}_{i},\textit{\textbf{x}}_{p}) \end{aligned}$$
(6)
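
Continuing the illustrative `HyperSphere` sketch from Sect. 2.1, the rule of Eq. (6) can be written in a few lines; `classify_outside` is our own helper name, not the authors'.

```python
def classify_outside(spheres, x):
    """Eq. (6): assign x to the class minimizing d_xj^2 / R_j^2 - 1, i.e.
    the sphere whose boundary x is relatively closest to in feature space."""
    return int(np.argmin([s.dist2(x) / s.R2 - 1.0 for s in spheres]))
```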

If a new test sample falls in an intersection, there are three cases according to the data distribution, as follows; a combined sketch is given after the list.

(1) In the first case, the intersection hyper-plane can separate the samples of the two classes directly. This hyper-plane is easily obtained by subtracting the equations of the two spheres.

(2) In the second case, the intersection hyper-plane cannot separate the samples of the two classes directly, so a binary optimal separating plane is used instead, for both the linear and the nonlinear case, as shown in Fig. 2.

Fig. 2. Case of a binary classification plane as separation plane for linear (left) and nonlinear (right) data.

(3) In the third case, only one class has data in the intersection. Here the exclusion method is adopted as the decision rule: a sample falling in the intersection is assigned to the class whose data occupy it.
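
The three intersection rules can be combined into one routine. The sketch below, again reusing the hypothetical `HyperSphere` helper, tries the intersection hyper-plane first (case 1), falls back to a binary classifier when the plane misclassifies intersection training points (case 2; scikit-learn's SVC stands in for the binary optimal plane, a solver choice the paper does not prescribe), and applies exclusion when only one class occupies the intersection (case 3).

```python
from sklearn.svm import SVC


def classify_intersection(s_i, s_j, X_i, X_j, x, sigma=1.25):
    """Hybrid rule for a test point x inside both spheres s_i and s_j."""
    in_i = [p for p in X_i if s_j.decision(p) > 0]  # class-i points inside S_j
    in_j = [p for p in X_j if s_i.decision(p) > 0]  # class-j points inside S_i
    if not in_j:  # case (3): only class i occupies the intersection
        return "i"
    if not in_i:  # case (3): only class j occupies the intersection
        return "j"
    # Case (1): subtracting the two sphere equations gives the hyper-plane
    # (d_i^2 - R_i^2) - (d_j^2 - R_j^2) = 0; the negative side -> class i.
    plane = lambda p: (s_i.dist2(p) - s_i.R2) - (s_j.dist2(p) - s_j.R2)
    X_int = np.vstack(in_i + in_j)
    y_int = np.array([0] * len(in_i) + [1] * len(in_j))
    pred = np.array([0 if plane(p) < 0 else 1 for p in X_int])
    if (pred == y_int).all():  # the plane separates the intersection data
        return "i" if plane(x) < 0 else "j"
    # Case (2): binary optimal plane trained on the intersection samples.
    clf = SVC(kernel="rbf", gamma=1.0 / (2.0 * sigma ** 2)).fit(X_int, y_int)
    return "i" if clf.predict(np.asarray(x)[None, :])[0] == 0 else "j"
```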

As can be seen from   [6, 9], the training time complexity of the sphere-structured SVM is O(\(n^{2}\)), and its testing time is determined by the decision process. For our new hybrid hyper-sphere SVM, the training time complexity is also O(\(n^{2}\)). In the testing phase, the time complexity is O(1) when only one class has data in the intersection, O(n) when a binary classification plane is used as the separation plane, and O(\(n^{2}\)) when a nonlinear classification plane is used. The average testing complexity is therefore lower than the O(\(n^{2}\)) of the sub-hyper-sphere SVM.

2.3 Implementation Details of the Proposed Method

The proposed hyper-sphere support vector classifier with hybrid decision strategy completes the classification process in three steps, which are listed as Algorithm 1; the testing process is described as Algorithm 2. Suppose there are m classes to be classified.

Algorithm 1. Training process of the proposed hybrid hyper-sphere classifier (given as a figure in the original).
Algorithm 2. Testing process of the proposed hybrid hyper-sphere classifier (given as a figure in the original).
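
Since Algorithms 1 and 2 appear only as figures in the original, the following sketch restates our reading of the overall flow, reusing the hypothetical helpers defined earlier (`HyperSphere`, `classify_outside`, `classify_intersection`); it is not the authors' exact pseudocode.

```python
def train(class_data, C=100.0, sigma=1.25):
    """Algorithm 1 (sketch): fit one minimum bounding hyper-sphere per class."""
    return [HyperSphere(C, sigma).fit(X) for X in class_data]


def test_one(spheres, class_data, x, sigma=1.25):
    """Algorithm 2 (sketch): hybrid decision for a single test sample x."""
    inside = [k for k, s in enumerate(spheres) if s.decision(x) > 0]
    if len(inside) == 1:  # inside exactly one sphere: direct decision
        return inside[0]
    if not inside:        # outside all spheres: Eq. (6)
        return classify_outside(spheres, x)
    # Intersection: apply the hybrid rules to the overlapping pair. Handling
    # three or more overlapping spheres pairwise is our own assumption.
    i, j = inside[0], inside[1]
    side = classify_intersection(spheres[i], spheres[j],
                                 class_data[i], class_data[j], x, sigma)
    return i if side == "i" else j
```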

3 Experimental Results and Analysis

In this section, experimental results are given to verify the efficiency of the new improved classification rule and to compare it with the hyper-sphere SVM and the sub-hyper-sphere SVM. The first experimental datasets, IRIS and glass, come from the UCI Repository of machine learning databases   [10]. The second dataset comes from a real application of action recognition in videos.

Table 1. Comparison results for IRIS classification.

For the three-class IRIS data set, the samples are randomly grouped, with one part of 15 samples used as the test set and the remaining 135 samples as the training set. For the glass data set, a subset with three classes is selected, and the same data pre-processing is adopted as in   [6]. For the hyper-sphere SVM and the sub-hyper-sphere SVM   [9], \( C=100,\sigma =1.25\) are used. For our new approach, an RBF kernel with parameter optimization search is used to get the best classification accuracy. Experimental results are shown in Table 1. Each experiment was run 10 times, and each indicator in Table 1 is the average over the runs. The results show that our new hybrid classification rule is effective and easy to compute for simple multi-class classification problems.

The second experimental data set comes from an action recognition system for videos collected from the Internet. The extracted features include foreground extraction, morphological operations, shape features, KLT tracking points, and so on. These features are input into the proposed hybrid hyper-sphere SVM classifier. The input feature vector has 20 dimensions, the number of categories is 6, and the data set contains 9000 samples. From these samples, data1 consists of 3000 randomly selected training samples and 600 test samples; data2 likewise contains 3000 training and 600 test samples; data3 contains 7800 training and 1200 test samples, constructed in the same way as data1. Each experiment was repeated 10 times and the average value was taken as the final result. The accuracy of our method is 86.15%, 90.11% and 92.15% on the three datasets, respectively. Of the other two methods, the fuzzy hyper-sphere SVM achieves the higher accuracy, with 85.04%, 89.19% and 90.67% on the three datasets, against 83.98%, 88.30% and 89.33% for the sub-hyper-sphere SVM. The performance of our new method is therefore better than that of the other two methods.

4 Conclusions

To improve the classification performance of the traditional hyper-sphere SVM, a hybrid decision strategy for the hyper-sphere support vector classifier is put forward in this paper. To achieve high classification performance on test samples while decreasing the computation complexity of the QP optimization problems, four decision rules are discussed and a detailed algorithm is given. Results on benchmark data and real application data show that our hybrid decision rule leads to better generalization accuracy than the existing methods while decreasing computation complexity and saving training time.