1 Introduction

Support Vector Machine (SVM) is a popular supervised machine learning algorithm for solving problems in classification and regression. An SVM takes a set of training samples (known observations belonging to a certain class/category) and yields a set of decision functions to determine which class a test sample (unknown observation) belongs to [1].

However, a plain SVM classifier natively solves only two-class problems. To solve a multi-class problem, multiple binary SVM classifiers are coupled or assembled into a multi-class SVM. Such a decomposition method plays an important role in the multi-class SVM's generalization performance and time consumption. One-Versus-One (1-v-1), One-Versus-Rest (1-v-r) and ECOC are three of the most commonly used ensemble methods [2, 3]. Empirical results show that 1-v-r is the fastest method for predicting unknown samples [4–6] while retaining good performance.

This paper investigates the effect of output (class) grouping, previously studied in neural networks [7–9], on multi-class SVM classifiers. Finally, we offer an even faster version of 1-v-r with the help of the output-grouping algorithm.

2 Background

Output grouping is a learning strategy applied in neural networks to improve accuracy and reduce training cost. The strategy is based on the observation of class correlation: in previous studies of neural networks, a strong correlation between two classes \( i, j \) implies high similarity, so \( i \) and \( j \) can be trained together for better precision [8].

2.1 One-Versus-Rest (1-v-r)

The idea of 1-v-r classification is to build N classifiers, one for each of the N classes. In the training process, all the training data are used for every one of the N classifiers. The classifier built for class i is trained with the samples that belong to class i against the rest of the samples in the training data set.

Once the model is built, an unknown sample goes through all N classifiers. Each classifier decides whether or not the sample belongs to its corresponding class. Ideally, one and only one classifier should respond 'yes', and its class is assigned to the sample; otherwise there is either a clash (more than one 'yes') or a total miss (no 'yes'). In that case the unknown sample is assigned to the class with the most training samples.
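The procedure above can be sketched in a few lines of scikit-learn code. This is a minimal illustration, not the implementation used in the paper; the function names, the linear kernel and the clash/miss fallback wiring are assumptions that follow the description above.

```python
import numpy as np
from collections import Counter
from sklearn.svm import SVC

def train_one_vs_rest(X, y):
    """Train one binary SVC per class: class c versus the rest."""
    classifiers = {}
    for c in np.unique(y):
        clf = SVC(kernel="linear")
        clf.fit(X, (y == c).astype(int))   # 1 = class c, 0 = the rest
        classifiers[c] = clf
    return classifiers

def predict_one_vs_rest(classifiers, X, y_train):
    """Pick the single class whose classifier answers 'yes'; on a clash
    or a total miss, fall back to the most populous training class."""
    fallback = Counter(y_train).most_common(1)[0][0]
    predictions = []
    for x in X:
        hits = [c for c, clf in classifiers.items()
                if clf.predict(x.reshape(1, -1))[0] == 1]
        predictions.append(hits[0] if len(hits) == 1 else fallback)
    return np.array(predictions)
```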

2.2 Output Grouping

For SVM classifiers, if samples from two classes are 'mixed' together then it is difficult for the SVM to draw a line or plane that distinguishes the two classes. In the case of 1-v-r, suppose a class \( i \) needs to be distinguished from the remaining classes (the 'rest'). If another class \( j \) among the rest is highly similar to \( i \), then it will also be difficult to tell \( i \) and the rest apart, even if the other classes among the rest (excluding class \( j \)) are easy to separate from \( i \). As a result, such interaction, correlation or similarity between classes \( i \) and \( j \) would contribute to a complex but less accurate model.

We tested this hypothesis on the UCI Iris data set. We duplicated all samples from class 3, labelled them as class 4, trained a linear SVM model using the 1-v-r method, ran a prediction on the training set and ran a 10-fold cross-validation with the classifier and the data set. We then merged samples from classes 3 and 4, labelled them as a new class (grouping), trained and validated an additional 3-versus-4 classifier, and merged the results from both classifiers (Table 1).

Table 1 Iris 1-v-r SVM classification performance

If the classifier cannot tell classes 3 and 4 apart, then the expected classification error between 3 and 4 should be 50 % (random guessing), and the cross-validated accuracy should therefore be around 71 % instead of 65.86 %. In other words, the strong similarity between classes 3 and 4 should not have affected the other classification tasks, but it has. With classes 3 and 4 grouped, we seem to eliminate such undesirable interference.
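A minimal scikit-learn sketch of this duplication experiment is shown below. Labels are 0-based here, so the paper's classes 3 and 4 appear as labels 2 and 3; the linear kernel matches the description above, while the variable names and cross-validation call are illustrative assumptions rather than the original script.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC
from sklearn.multiclass import OneVsRestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Duplicate every sample of the last class and relabel the copies as a new class.
X_dup = np.vstack([X, X[y == 2]])
y_dup = np.concatenate([y, np.full((y == 2).sum(), 3)])

# Flat 1-v-r linear SVM over the four labels (two of which are identical).
flat = OneVsRestClassifier(SVC(kernel="linear"))
print("flat 1-v-r CV accuracy:",
      cross_val_score(flat, X_dup, y_dup, cv=10).mean())

# Grouped variant: merge the two identical classes into one group label;
# a dedicated intra-group classifier would then split them (at chance here).
y_grp = np.where(y_dup == 3, 2, y_dup)
grouped = OneVsRestClassifier(SVC(kernel="linear"))
print("grouped 1-v-r CV accuracy:",
      cross_val_score(grouped, X_dup, y_grp, cv=10).mean())
```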

2.3 Output-Grouping-One-Versus-Rest (OG) Algorithm

We propose an algorithm, OG, which aggregates outputs (classes) into groups by their 'similarity', then treats the groups as new classes and performs 1-v-r learning.

The algorithm has six components: (1) similarity scorer, (2) class grouper, (3) 1-v-r trainer, (4) intra-group classifier trainer, (5) 1-v-r predictor and (6) intra-group predictor (Figs. 1 and 2; Table 2).

Fig. 1 Training components (1–4) of OG

Fig. 2 Prediction components (5 and 6) of OG

Table 2 Grouping algorithm of OG

OG attempts to group "hard-to-tell-apart" classes together so that it can reduce the model complexity, improve the accuracy and further increase the prediction speed. However, the speed-up comes at the cost of an additional round of training/prediction, grouping and result merging.
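The sketch below illustrates how the six components could fit together in scikit-learn. The actual grouping rule is the one given in Table 2; here the similarity scorer is approximated by the pairwise confusion of a preliminary 1-v-r model on held-out data, and the merge threshold `tau` is an illustrative assumption, not a value from the paper.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.multiclass import OneVsRestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

def fit_og(X, y, tau=0.2):
    classes = np.unique(y)
    # (1) Similarity scorer: confusion of a preliminary 1-v-r model on held-out data.
    X_fit, X_val, y_fit, y_val = train_test_split(X, y, test_size=0.33, stratify=y)
    prelim = OneVsRestClassifier(SVC(kernel="linear")).fit(X_fit, y_fit)
    cm = confusion_matrix(y_val, prelim.predict(X_val), labels=classes)
    cm = cm / cm.sum(axis=1, keepdims=True)             # row-normalised confusion
    # (2) Class grouper: merge class pairs with high mutual confusion.
    group_of = {c: i for i, c in enumerate(classes)}
    for i, a in enumerate(classes):
        for j in range(i + 1, len(classes)):
            if cm[i, j] + cm[j, i] > tau:
                group_of[classes[j]] = group_of[a]
    y_group = np.array([group_of[c] for c in y])
    # (3) 1-v-r trainer on the group labels.
    outer = OneVsRestClassifier(SVC(kernel="linear")).fit(X, y_group)
    # (4) Intra-group classifier trainer for groups holding more than one class.
    inner = {}
    for g in np.unique(y_group):
        mask = y_group == g
        if len(np.unique(y[mask])) > 1:
            inner[g] = SVC(kernel="linear").fit(X[mask], y[mask])
    return outer, inner, group_of

def predict_og(outer, inner, group_of, X):
    # (5) 1-v-r prediction on group labels, then (6) intra-group refinement.
    representative = {g: c for c, g in group_of.items()}
    g_pred = outer.predict(X)
    y_pred = np.array([representative[g] for g in g_pred])
    for g, clf in inner.items():
        mask = g_pred == g
        if mask.any():
            y_pred[mask] = clf.predict(X[mask])
    return y_pred
```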

3 Experiments

OG was evaluated on seven data sets: UCI Iris Plant, UCI Glass Identification, UCI Vowel, Statlog Handwritten Letters, Statlog Satimage, USPS and MNIST [11]. The values in each data set were scaled to [0, 1].

Each data set was used to train and test five multi-class SVM algorithms: OG, OGPG, 1-v-r, 1-v-1 and ECOC [11]. OG is the implementation of our proposed algorithm under the scikit-learn framework, while 1-v-r, 1-v-1 and ECOC are the existing implementations in scikit-learn [12]. All the algorithms use SVC [2] to train and test the SVM models.

To further assess the effectiveness of grouping, we introduce a variant of OG, OG with pre-computed groups (OGPG), which is supplied with pre-computed groups instead of deriving them.

The experiment for each method was run 50 times and the measurements were averaged. Each time, the model was trained on a randomly-split 2/3 of the data set and validated on the remaining 1/3 (Table 3).

Table 3 Experiment data sets, parameters and pre-computed groups (OGPG only)
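The evaluation protocol above can be summarised by the following sketch, where `make_estimator` is a placeholder for any of OG, OGPG, 1-v-r, 1-v-1 or ECOC; the timing and averaging details are assumptions consistent with the description, not the original benchmark script.

```python
import time
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def evaluate(make_estimator, X, y, runs=50):
    """Average accuracy, training time and prediction time over
    50 random 2/3 train / 1/3 test splits."""
    acc, t_train, t_pred = [], [], []
    for _ in range(runs):
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=1 / 3)
        clf = make_estimator()
        t0 = time.perf_counter()
        clf.fit(X_tr, y_tr)
        t1 = time.perf_counter()
        y_hat = clf.predict(X_te)
        t2 = time.perf_counter()
        acc.append(accuracy_score(y_te, y_hat))
        t_train.append(t1 - t0)
        t_pred.append(t2 - t1)
    return np.mean(acc), np.mean(t_train), np.mean(t_pred)
```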

4 Results

The accuracy is close for all algorithms, except for ECOC on some occasions, across all the data sets. OG and OGPG perform slightly better than the 1-v-r algorithm on smaller data sets (iris, glass and vowel). However, we cannot conclude which algorithm is the best in terms of accuracy (Table 4).

Table 4 Averaged accuracy, model size and time consumption of OG, 1-v-r, 1-v-1 and ECOC

On large data sets (satimage, usps and mnist), OG and OGPG slightly reduced the training time and significantly reduced the prediction time, by 0.8–29.2 % and 5.4–30.6 % respectively, compared with the 1-v-r method. OGPG is the fastest in terms of prediction, followed by OG.

On smaller data sets, OG and OGPG's worse speed performance was caused by the overhead of the additional grouping and merging, which dominates the time consumption of SVM training and prediction.

Even though OG performs an additional training/prediction pass on another 33–50 % of the training set to determine the grouping, its training time is still less than that of 1-v-r. The model size is also slightly decreased for all data sets except usps and mnist.

5 Discussions and Conclusions

To our disappointment, neither OG nor OGPG improved the accuracy. Our observations during the experiments suggest that the accuracy tends to saturate once the number of training samples is large enough or the SVM parameters are fairly well optimised.

We may conclude that OG and OGPG improve the speed performance of the 1-v-r method, even though they perform additional classification tasks.

Our findings show that output grouping is effective at reducing the training complexity of 1-v-r multi-class classification tasks using support vector machines. Moreover, output grouping can be used to boost the prediction speed of multi-class SVMs at no extra net cost. We are now investigating the cause of the saturated accuracy and working on combining input and output grouping methods [8, 9, 13–16] for better classification accuracy.