Abstract
Traditional plane-based clustering methods measure the within-cluster or between-cluster scatter by linear, quadratic or other unbounded functions, which are sensitive to samples far from the cluster center. This paper introduces ramp functions into plane-based clustering and proposes a ramp-based twin support vector clustering (RampTWSVC). RampTWSVC is robust to samples far from the cluster center, because its within-cluster and between-cluster scatters are measured by bounded ramp functions. Thus, it finds the intrinsic clusters more easily than other plane-based clustering methods. The nonconvex programming problem in RampTWSVC is solved efficiently by an alternating iteration algorithm, which is guaranteed to reach a local solution in a finite number of iterations. In addition, a nonlinear manifold clustering formulation is derived via the kernel trick. Experimental results on several benchmark datasets show the better performance of RampTWSVC compared with other plane-based clustering methods.
1 Introduction
Clustering, which discovers the relationships among data samples, has been applied to many real-world problems, e.g., marketing, text mining and web analysis [1,2,3,4,5,6,7]. In particular, partition-based clustering methods [1, 8, 9] are widely used in real applications for their simplicity, e.g., the classical kmeans [10]. Partition-based clustering supposes there are k clusters, each with a cluster center, and the data samples are assigned to these clusters according to their distances from the cluster centers. Kmeans supposes the cluster centers are points in the sample space, but the cluster centers may take other forms, e.g., planes in the “cross-planes” dataset [11,12,13]. As an extension of the point center, the plane center is able to discover more comprehensive structures in the sample space. Therefore, partition-based clustering with planes as cluster centers, called plane-based clustering, has been studied widely [14,15,16,17,18,19,20].
In plane-based clustering, starting from an initialization, the clustering result is obtained iteratively through alternating cluster assignment and cluster update steps. In the cluster assignment step, the samples are assigned to the clusters by their distances from the cluster center planes. In the cluster update step, the cluster center planes are the solutions of k optimization problems, where different plane-based methods pose different optimization problems. In general, the optimization problems balance two aspects: within-cluster compactness and between-cluster separation. The first proposed k-plane clustering (kPC) [14] measured the within-cluster scatter by a quadratic function, and proximal planes clustering (PPC) [15, 16] measured the within-cluster and between-cluster scatters by the same quadratic function. Subsequently, twin support vector clustering (TWSVC) [17] measured the between-cluster scatter by the hinge function [12, 21], while robust twin support vector clustering (RTWSVC) [18] and fast robust twin support vector k-plane clustering (FRTWSVC) [18] measured the scatters by the absolute value function. Figure 1 exhibits the metric functions used in these methods. Since these plane-based clustering methods measure the within-cluster and/or between-cluster scatter by unbounded functions, some samples may influence the cluster center planes significantly, e.g., the samples far from their corresponding cluster center.
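To make the contrast concrete, the scatter measures named above can be viewed as scalar functions of the deviation of a sample from a cluster center plane. The sketch below is an illustrative paraphrase of the shapes plotted in Fig. 1 (the function names are our own), showing why the quadratic and absolute costs are unbounded while the hinge cost is bounded:

```python
def quadratic(f):
    # kPC/PPC scatter: quadratic in the deviation f, unbounded.
    return f * f

def absolute(f):
    # RTWSVC/FRTWSVC scatter: absolute value of the deviation, still unbounded.
    return abs(f)

def hinge(f):
    # TWSVC between-cluster cost: nonzero only for deviations |f| in [0, 1),
    # hence bounded above by 1.
    return max(0.0, 1.0 - abs(f))
```

A sample with deviation 10 contributes 100 under the quadratic cost and 10 under the absolute cost, but 0 under the hinge cost, which is the sensitivity gap the ramp functions of this paper are designed to close on the within-cluster side as well.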
To further reduce the influence of the samples far from the cluster center, a better choice is to employ bounded functions instead of unbounded ones to measure the within-cluster and between-cluster scatters. Therefore, we introduce ramp functions [22] into plane-based clustering; ramp functions are bounded piecewise linear functions that have been applied successfully in semi-supervised and supervised learning [23,24,25]. In this paper, a ramp-based twin support vector clustering method, called RampTWSVC, is proposed by using ramp functions to measure the within-cluster and between-cluster scatters (see Fig. 1). RampTWSVC reduces the influence of the samples far from the cluster centers and thus obtains stable cluster center planes. The optimization problems in RampTWSVC are nonconvex programming problems that are more complicated than those of other plane-based clustering methods. These problems are recast as mixed-integer programming problems sharing the same form. Then, an iterative algorithm is proposed to solve the mixed-integer programming problem, and we prove that it terminates at a local solution in a finite number of iterations. In addition, RampTWSVC is extended to the nonlinear case via the kernel trick to cope with manifold clustering [26, 27]. Experimental results on benchmark datasets show the better performance of the proposed RampTWSVC compared with other plane-based clustering methods.
2 Review of plane-based clustering
In this paper, we consider m data samples \(\{x_1,\,x_2,\ldots ,x_m\}\) in the n-dimensional real vector space \({\mathbb {R}}^{n}\). Assume these m samples belong to k clusters with corresponding labels \(y\in \{1,\,2,\ldots ,k\}\), and they are represented by a matrix \(X=(x_1,x_2,\ldots ,x_m)\in {\mathbb {R}}^{n\times m}\). We further organize the samples from X with the current label i into a matrix \(X_i\) and those with the remaining labels into a matrix \({\hat{X}}_i\), with \(i=1,\,2,\ldots ,k\). For readers’ convenience, the symbols X, \(X_i\), and \({\hat{X}}_i\) will also refer to the corresponding sets, depending on the context in which they appear. For example, the symbol X can be understood as a matrix in \({\mathbb {R}}^{n\times m}\) or as a set containing m samples. The cluster center planes are defined as
$$w_i^\top x+b_i=0,\quad i=1,2,\ldots ,k,$$
(1)

where \(w_i\in {\mathbb {R}}^n\) is the weight vector and \(b_i\in {\mathbb {R}}\) is the bias.
The following plane-based clustering methods share the same clustering procedure. Starting from an initial assignment of the m samples into k clusters, the cluster center planes (1) are constructed from the current cluster assignment. Once the cluster center planes are obtained, the m samples are reassigned by

$$y=\mathop {\arg \min }\limits _{i}\;|w_i^\top x+b_i|,$$
(2)
where \(|\cdot |\) denotes the absolute value. The cluster center planes and the sample labels are updated alternately until some termination conditions are satisfied. In the following, we briefly describe the different constructions of the cluster center planes in kPC, PPC, TWSVC and RTWSVC.
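The reassignment rule above can be sketched as follows (the helper name and data layout are our own, not from the paper): each sample goes to the cluster whose center plane gives the smallest absolute deviation.

```python
def assign_clusters(X, planes):
    """Assign each sample to the nearest cluster center plane.

    X: list of samples, each a list of n floats.
    planes: list of (w, b) pairs, one per cluster, with w a list of n floats.
    Implements y = argmin_i |w_i^T x + b_i| for every sample in X.
    """
    def dev(x, w, b):
        # Absolute deviation of sample x from the plane w^T x + b = 0.
        return abs(sum(wj * xj for wj, xj in zip(w, x)) + b)

    return [min(range(len(planes)), key=lambda i: dev(x, *planes[i]))
            for x in X]
```

For example, with two axis-aligned planes, a sample lying close to the first plane is assigned label 0 and one close to the second plane label 1.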
2.1 kPC and PPC
kPC [14] wishes each cluster center plane to be close to the samples of its own cluster. Further, PPC [15] considers that it should also be far away from the samples of the other clusters. Therefore, the ith (\(i=1,\ldots ,k\)) cluster center plane in PPC is constructed by considering the problem

$$\min \limits _{w_i,b_i}\;||X_i^\top w_i+b_ie||^2-c\,||{\hat{X}}_i^\top w_i+b_ie||^2\quad \hbox {s.t.}\;||w_i||^2+b_i^2=1,$$
(3)
where \(||\cdot ||\) denotes the \(L_2\)-norm, e is a column vector of ones with an appropriate dimension and \(c>0\) is a user-set parameter, while the optimization problem in kPC does not have the second term in the objective of (3).
From the objective of (3), it is obvious that the samples from the within-cluster receive quadratic costs and the samples from the between-cluster receive quadratic rewards. Therefore, samples far from the cluster center have a great impact on the potential cluster center planes, and PPC may obtain a cluster center plane that is far from the current cluster.
2.2 TWSVC and RTWSVC
In contrast, TWSVC [17] reduces the rewards of the samples from different clusters by considering the problem with \(i=1,\ldots ,k\) as

$$\min \limits _{w_i,b_i}\;\frac{1}{2}||X_i^\top w_i+b_ie||^2+c\,e^\top \big (e-|{\hat{X}}_i^\top w_i+b_ie|\big )_+,$$
(4)
where \((\cdot )_+\) replaces the negative parts by zeros.
From the second part of (4), it is clear that a sample from the between-cluster has an impact on the cluster center plane only when its deviation lies in [0, 1). Thus, TWSVC is more robust than PPC [17]. However, the within-cluster issue remains because of the quadratic function in the first part of (4). Thus, RTWSVC [18] was proposed to decrease this influence by replacing the \(L_2\)-norm in (4) with the \(L_1\)-norm. Note from Fig. 1 that the within-cluster metric function of RTWSVC is still unbounded. In order to eliminate the influence of the samples far from the cluster center, it is reasonable to employ a bounded metric function for the within-cluster scatter, in the same spirit as the between-cluster metric function used in TWSVC.
3 RampTWSVC
Similar to the plane-based clustering methods in Sect. 2, our RampTWSVC starts with initial sample labels and then updates each cluster center plane and the sample labels alternately, until some termination conditions are satisfied. In the following, we consider obtaining one of the cluster center planes for the given samples with their labels.
3.1 Formulation
To obtain the ith (\(i=1,\ldots ,k\)) cluster center plane, our RampTWSVC considers the following problem

$$\min \limits _{w_i,b_i}\;\frac{1}{2}\big (||w_i||^2+b_i^2\big )+c_1\sum \limits _{x\in X_i}R_{1}(x)+c_2\sum \limits _{x\in {\hat{X}}_i}R_{2}(x),$$
(5)
where \(c_1,c_2>0\) are the trade-offs, and \(R_{1}(x)\) and \(R_{2}(x)\) are two bounded piecewise linear functions w.r.t. the deviation \(|f_i(x)|=|w_i^\top x+b_i|\) (see Fig. 1), given by

$$R_{1}(x)=\big (|f_i(x)|-\Delta \big )_+-\big (|f_i(x)|-1\big )_+$$
(6)

and

$$R_{2}(x)=\big (1-|f_i(x)|\big )_+-\big (-s-|f_i(x)|\big )_+,$$
(7)
where \(\Delta \in [0,1)\) and \(s\in (-1,0]\) are two constants that control the function form (for instance, \(\Delta =0.3\) and \(s=-0.2\) [23]). \(||w_i||\) and \(|b_i|\) are the regularization terms that control the problem complexity.
It is obvious that both \(R_1(x)\) (for the within-cluster \(X_i\)) and \(R_2(x)\) (for the between-cluster \({\hat{X}}_i\)) have upper and lower bounds. Thus, samples far from the cluster center plane cannot exert arbitrarily large influence on it. This property indicates that our RampTWSVC is more robust than PPC, TWSVC and RTWSVC.
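To make the boundedness concrete, here is one realization of bounded ramp penalties with the stated properties: the within-cluster cost is zero inside a \(\Delta\)-margin and saturates for large deviations, while the between-cluster cost is zero for deviations at least 1 and is capped near zero deviation. These specific piecewise forms and default constants are our illustrative assumption and may differ from the paper's exact definitions of \(R_1\) and \(R_2\):

```python
def plus(t):
    # The plus function (t)_+ : replace negative parts by zero.
    return max(t, 0.0)

def ramp_within(f, delta=0.3):
    # Bounded within-cluster cost: zero for |f| <= delta,
    # linear in between, capped at 1 - delta once |f| >= 1.
    return plus(abs(f) - delta) - plus(abs(f) - 1.0)

def ramp_between(f, s=-0.2):
    # Bounded between-cluster cost: zero for |f| >= 1,
    # linear in between, capped at 1 + s once |f| <= -s.
    return plus(1.0 - abs(f)) - plus(-s - abs(f))
```

Writing each ramp as a difference of two plus functions also exposes the concave part \(-(\cdot )_+\) that makes the overall problem nonconvex, which is exactly what the solution procedure in Sect. 3.2 has to handle.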
In the following, we extend our RampTWSVC to nonlinear manifold clustering, and the solutions to the linear and nonlinear RampTWSVC are elaborated in the next subsection.
The plane-based clustering methods can be extended to nonlinear manifold clustering easily via kernel tricks [13, 28]. For instance, by introducing a pre-defined kernel function \(K(\cdot ,X)\) [11, 17, 21], plane-based nonlinear clustering seeks k cluster center manifolds in the kernel-generated space as

$$g_i(x)=K(x,X)^\top w_i+b_i=0,\quad i=1,2,\ldots ,k,$$
(8)

where \(w_i\in {\mathbb {R}}^m\) and \(b_i\in {\mathbb {R}}\).
Then, the nonlinear RampTWSVC introduces the ramp functions into plane-based nonlinear clustering. By replacing \(f_i(x)\) with \(g_i(x)\) in (6) and (7), and substituting them into (5), one easily obtains the optimization problems that construct the cluster center manifolds (8). Once the k cluster centers (8) are obtained, a sample x is assigned to a cluster by

$$y=\mathop {\arg \min }\limits _{i}\;|g_i(x)|.$$
(9)
The procedure of the nonlinear case is the same as the linear one, so the details are omitted.
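Evaluating a cluster center manifold only requires the kernel row \(K(x,X)\) between a sample and the training samples. A minimal sketch with the Gaussian kernel used later in the experiments (the function names and data layout are our own):

```python
import math

def gaussian_kernel_row(x, X, mu):
    # K(x, X): Gaussian similarities exp(-mu * ||x - x_j||^2)
    # between sample x and every training sample x_j in X.
    return [math.exp(-mu * sum((a - b) ** 2 for a, b in zip(x, xj)))
            for xj in X]

def manifold_value(x, X, w, b, mu):
    # g_i(x) = K(x, X)^T w_i + b_i : signed value of x w.r.t. the
    # ith kernel-generated cluster center manifold.
    k = gaussian_kernel_row(x, X, mu)
    return sum(kj * wj for kj, wj in zip(k, w)) + b
```

The nonlinear assignment rule then mirrors the linear one, with \(|g_i(x)|\) in place of \(|f_i(x)|\).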
3.2 Solution
In this subsection, we study the solution to problem (5); the corresponding problem in nonlinear RampTWSVC is handled similarly. For convenience, let \(u_i=(w_i^\top ,b_i)^\top\), let \(Z_i\) be the matrix whose jth column \(z_j\) is \(x_j\) augmented with an additional feature 1 (where the corresponding \(x_j\) belongs to the ith cluster), and let \({\hat{Z}}_i\) be the matrix whose columns \({\hat{z}}_j\) are built in the same way from the \(x_j\) that do not belong to the ith cluster. After some algebra, problem (5) is recast to

$$\min \limits _{u_i}\;\frac{1}{2}||u_i||^2+c_1\sum \limits _{j}\Big [\big (|z_j^\top u_i|-\Delta \big )_+-\big (|z_j^\top u_i|-1\big )_+\Big ]+c_2\sum \limits _{j}\Big [\big (1-|{\hat{z}}_j^\top u_i|\big )_+-\big (-s-|{\hat{z}}_j^\top u_i|\big )_+\Big ].$$
(10)
It is easy to see that the above problem is a nonconvex programming problem because of the concave part \(-(\cdot )_+\). By introducing two auxiliary vectors \(p_1\in \{-1,0,1\}^{m_i}\) and \(p_2\in \{-1,0,1\}^{m-m_i}\) (where \(m_i\) is the sample number of the current ith cluster), the above problem is equivalent to the following mixed-integer programming problem
where \(p_1(j)\) and \(p_2(j)\) are the corresponding jth elements of \(p_1\) and \(p_2\), respectively.
Here, we propose an alternating iteration algorithm to solve the mixed-integer programming problem (11). Starting from an initialized \(u_i^{(0)}\), \(p_1^{(0)}\) and \(p_2^{(0)}\) are computed easily from the constraints of (11). For fixed \(p_1^{(t-1)}\) and \(p_2^{(t-1)}\) (\(t=1,2,\ldots\)), (11) becomes an unconstrained convex problem whose solution can be obtained by many algorithms (e.g., sequential minimal optimization (SMO) [29] or the fast Newton-Armijo algorithm [30]). Once \(u_i^{(t)}\) is obtained, \(p_1^{(t)}\) and \(p_2^{(t)}\) are updated again. The loop continues until the objective of (11) no longer decreases.
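The alternating scheme itself is independent of the concrete subproblem solvers. A generic skeleton might look like the following, where the `solve_u` and `update_p` callbacks are hypothetical stand-ins for the convex solver and the constraint-based integer update, respectively:

```python
def alternate(u0, update_p, solve_u, objective, max_iter=100, tol=1e-8):
    """Alternating iteration for the mixed-integer problem.

    Fix (p1, p2) -> solve the convex problem in u via solve_u(p);
    fix u -> update (p1, p2) via update_p(u);
    stop once the objective no longer decreases.
    """
    u = u0
    p = update_p(u)
    best = objective(u, p)
    for _ in range(max_iter):
        u = solve_u(p)          # convex subproblem with integers fixed
        p = update_p(u)         # integer update with u fixed
        val = objective(u, p)
        if val >= best - tol:   # objective stopped decreasing
            break
        best = val
    return u, p
```

Since the objective is nonincreasing and there are only finitely many choices of the integer variables, this loop terminates, which is exactly the argument formalized in Theorem 3.1 below.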
Theorem 3.1
The above alternating iteration algorithm terminates in a finite number of iterations at a local optimal point, where a local optimal point of the mixed-integer programming problem (11) is defined as a point \((u_i^*,p_1^*,p_2^*)\) such that \(u_i^*\) is the global solution to problem (11) with fixed \((p_1^*,p_2^*)\), and vice versa.
Proof
From the procedure of the alternating iteration algorithm, it is obvious that the global solutions to (11) with fixed \(u_i\) or fixed \((p_1,p_2)\) are obtained in each iteration. Since there are only finitely many ways to select \(p_1\) and \(p_2\), there exist two integers \(0<r_1<r_2\) such that \((p_1^{(r_1)},p_2^{(r_1)})=(p_1^{(r_2)},p_2^{(r_2)})\), and hence \(u_i^{(r_1)}=u_i^{(r_2)}\). That is, the objective values are equal in the \(r_1\)th and \(r_2\)th iterations. Since \(p_1^\top Z_i^\top u_i\le 0\) and \(p_2^\top {\hat{Z}}_i^\top u_i\le 0\) always hold, the objective value of (11) is nonincreasing over the iterations. Therefore, the objective is invariant after the \(r_1\)th iteration, and the algorithm terminates at the \(r_1\)th iteration.
Let us consider the point \((u_i^{(r_1)},p_1^{(r_1)},p_2^{(r_1)})\). From the above proof, we have \(G(u_i^{(r_1)},p_1^{(r_1)},p_2^{(r_1)})=G(u_i^{(r_1)},p_1^{(r_1+1)},p_2^{(r_1+1)})\), where \(G(\cdot )\) is the objective value of (11). If there is more than one global solution to (11) with fixed \(u_i\), we always select the same one for the same \(u_i\). Thus, we have \((p_1^{(r_1)},p_2^{(r_1)})=(p_1^{(r_1+1)},p_2^{(r_1+1)})\), which indicates \((u_i^{(r_1)},p_1^{(r_1)},p_2^{(r_1)})\) is a local optimal point. \(\square\)
4 Experimental results
In this section, we analyze the performance of our RampTWSVC compared with kmeans [10], kPC [14], PPC [15], TWSVC [17], RTWSVC [18] and FRTWSVC [18] on several benchmark datasets [31, 32]. All the methods were implemented in MATLAB 2017a on a PC with a dual-core 4.2 GHz Intel processor and 16 GB RAM. The parameter c in PPC, TWSVC, RTWSVC and FRTWSVC and the parameters \(c_1\), \(c_2\) in RampTWSVC were selected from \(\{2^i|i=-8,-7,\ldots ,-1\}\). For the nonlinear case, the Gaussian kernel \(K(x_1,x_2)=\exp \{-\mu ||x_1-x_2||^2\}\) [28] was used, and its parameter \(\mu\) was selected from \(\{2^i|i=-10,-9,\ldots ,1\}\). Random initialization was used for kmeans, and the nearest neighbor graph (NNG) initialization [17] was used for the other methods. In the experiments, we used the accuracy (AC) [17] and mutual information (MI) [33] metrics to measure the performance of these methods. We ran kmeans ten times on each dataset and recorded the mean value and standard deviation.
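The AC metric scores a clustering by matching predicted clusters to ground-truth labels under the best relabeling. A brute-force sketch (our own implementation, not from [17]; it enumerates all k! permutations and is therefore feasible only for small k — larger k would call for the Hungarian algorithm):

```python
from itertools import permutations

def clustering_accuracy(y_true, y_pred, k):
    # AC: fraction of samples whose predicted cluster matches the true
    # label under the best one-to-one relabeling of the k clusters.
    best = 0
    for perm in permutations(range(k)):
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == perm[p])
        best = max(best, hits)
    return best / len(y_true)
```

For instance, a prediction that swaps the two cluster names of a perfect two-cluster partition still scores AC = 1.0, since cluster labels are arbitrary.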
Table 1 shows the details of the benchmark datasets. Tables 2 and 3 exhibit the linear and nonlinear clustering results on the benchmark datasets, respectively. The highest metrics among these methods on each dataset are in bold. From Table 2, it can be seen that our linear RampTWSVC performs better than the other linear methods on seven datasets in terms of both AC and MI, and it is more accurate than the other methods on another six datasets. On the remaining nine datasets, our linear RampTWSVC is also competitive with the best method. From Table 3, it is obvious that our nonlinear RampTWSVC achieves much higher AC and MI than the other methods on most of the datasets. More precisely, Tables 4 and 5 report the ranks of each method on these datasets, derived from Tables 2 and 3, respectively, where the numbers represent the ranks of the seven methods; the same number implies the same performance and a smaller number implies higher performance. The mean ranks are reported in the last two columns of these tables. It is obvious that our RampTWSVC has the smallest mean ranks among these methods, which indicates that it performs better than the other methods on these datasets. It is worth noticing that there are several datasets on which other methods perform better than RampTWSVC. For example, kmeans obtains a much higher performance than the others on dataset (g) in Table 2, which implies the cluster centers in dataset (g) are better modeled by points than by planes. Among the plane-based methods, our RampTWSVC cuts off the influence of the samples far from the cluster centers, and the bounds are decided by the parameters. Hence, parameter selection is very important in our RampTWSVC.
To evaluate the influence of the parameters in our RampTWSVC, we report, as examples, the AC metric under different parameter settings on four datasets. Figure 2 exhibits the results of linear RampTWSVC with parameters \(c_1\) and \(c_2\). It can be seen that these two parameters play an important role in this method, so they should be selected carefully. Figure 3 exhibits the results of nonlinear RampTWSVC with parameters \(c_1\), \(c_2\) and \(\mu\), where \(c_1=c_2\) for simplicity. From Fig. 3, it can be seen that a desired result cannot be obtained if \(\mu\) is improper, which indicates that \(\mu\) is more important than the other two parameters for nonlinear RampTWSVC. This is because the kernel mapping plays a more important role in the nonlinear case.
5 Conclusions
A plane-based clustering method (RampTWSVC) has been proposed in this paper. RampTWSVC employs bounded ramp functions to reduce the influence of the samples far from the cluster centers, and it performs robustly. The nonconvex programming problems in RampTWSVC are recast as a series of mixed-integer programming problems and solved by the proposed alternating iteration algorithm, and local solutions of the nonconvex problems are guaranteed in theory. RampTWSVC has both linear and nonlinear formulations. Experimental results on several benchmark datasets have indicated that our RampTWSVC performs much better than other plane-based clustering methods on many of them. For practical convenience, the corresponding RampTWSVC MATLAB code is available at http://www.optimal-group.org/Resources/Code/RampTWSVC.html. Future work includes parameter tuning and efficient solver design for RampTWSVC.
References
Jain A, Murty M, Flynn P (1999) Data clustering: a review. ACM Comput Surv 31(3):264–323
Mehmood R, Zhang G, Bie R, Dawood H, Ahmad H (2016) Clustering by fast search and find of density peaks via heat diffusion. Neurocomputing 208:210–217
Zhou J, Chen L, Chen C, Zhang Y, Li H (2016) Fuzzy clustering with the entropy of attribute weights. Neurocomputing 198:125–134
Zhou Z (2017) A brief introduction to weakly supervised learning. Natl Sci Rev 5(1):44–53
Berry M (2004) Survey of text mining I: clustering, classification, and retrieval, vol 1. Springer, Berlin
Ilin R (2012) Unsupervised learning of categorical data with competing models. IEEE Trans Neural Netw Learn Syst 23(11):1726–1737
Zhang H, Chow T, Wu Q (2015) Organizing books and authors by multilayer SOM. IEEE Trans Neural Netw Learn Syst 27(12):2537–2550
Tan P, Steinbach M, Kumar V (2005) Introduction to data mining, 1st edn. Addison-Wesley Longman Publishing Co. Inc, Boston
Huang X, Ye Y, Zhang H (2014) Extensions of kmeans-type algorithms: a new clustering framework by integrating intracluster compactness and intercluster separation. IEEE Trans Neural Netw Learn Syst 25(8):1433–1446
Jain A, Dubes R (1988) Algorithms for clustering data. Prentice-Hall Inc, Upper Saddle River
Mangasarian O, Wild E (2006) Multisurface proximal support vector classification via generalized eigenvalues. IEEE Trans Pattern Anal Mach Intell 28(1):69–74
Khemchandani R, Chandra S (2007) Twin support vector machines for pattern classification. IEEE Trans Pattern Anal Mach Intell 29(5):905–910
Wang Z, Shao Y, Bai L, Li C, Liu L, Deng N (2018) Insensitive stochastic gradient twin support vector machines for large scale problems. Inf Sci 462:114–131
Bradley P, Mangasarian O (2000) k-plane clustering. J Global Optim 16(1):23–32
Shao Y, Bai L, Wang Z, Hua X, Deng N (2013) Proximal plane clustering via eigenvalues. Proc Comput Sci 17:41–47
Shao Y, Guo Y, Wang Z, Yang Z, Deng N (2017) k-proximal plane clustering. Int J Mach Learn Cybernet 8(5):1537–1554
Wang Z, Shao Y, Bai L, Deng N (2015) Twin support vector machine for clustering. IEEE Trans Neural Netw Learn Syst 26(10):2583–2588
Ye Q, Zhao H, Li Z, Yang X, Gao S, Yin T, Ye N (2018) L1-norm distance minimization-based fast robust twin support vector k-plane clustering. IEEE Trans Neural Netw Learn Syst 29(9):4494–4503
Khemchandani R, Pal A, Chandra S (2018) Fuzzy least squares twin support vector clustering. Neural Comput Appl 29(2):553–563
Bai L, Shao Y, Wang Z, Li C (2019) Clustering by twin support vector machine and least square twin support vector classifier with uniform output coding. Knowl-Based Syst 163:227–240
Shao Y, Zhang C, Wang X, Deng N (2011) Improvements on twin support vector machines. IEEE Trans Neural Netw 22(6):962–968
Collobert R, Sinz F, Weston J, Bottou L (2006) Large scale transductive SVMs. J Mach Learn Res 7:1687–1712
Cevikalp H (2017) Best fitting hyperplanes for classification. IEEE Trans Pattern Anal Mach Intell 39(6):1076–1088
Tian Y, Mirzabagheri M, Bamakan S, Wang H, Qu Q (2018) Ramp loss one-class support vector machine: a robust and effective approach to anomaly detection problems. Neurocomputing 310(8):223–235
Liu D, Shi Y, Tian Y, Huang X (2016) Ramp loss least squares support vector machine. J Comput Sci 14:61–68
Souvenir R, Pless R (2005) Manifold clustering. In: Tenth IEEE international conference on computer vision, ICCV 2005, vol 1, IEEE, pp 648–653
Cao W, Haralick R (2006) Nonlinear manifold clustering by dimensionality. In: 18th international conference on pattern recognition. ICPR 2006, vol 1. IEEE, pp 920–924
Khemchandani R, Chandra S (2009) Optimal kernel selection in twin support vector machines. Optim Lett 3:77–88
Platt J (1999) Fast training of support vector machines using sequential minimal optimization. Advances in kernel methods-support vector learning. MIT Press, Cambridge, pp 185–208
Wang Z, Shao Y, Wu T (2013) A GA-based model selection for smooth twin parametric-margin support vector machine. Pattern Recogn 46(8):2267–2277
Blake C, Merz C (1998) UCI Repository for Machine Learning Databases, http://www.ics.uci.edu/~mlearn/MLRepository.html
Li C, Shao Y, Yin W, Liu M (2019) Robust and sparse linear discriminant analysis via an alternating direction method of multipliers. IEEE Trans Neural Netw Learn Syst (in press)
Liu H, Fu Y (2015) Clustering with partition level side information. In: 2015 IEEE international conference on data mining (ICDM). IEEE, pp 877–882
Acknowledgements
This work is supported in part by National Natural Science Foundation of China (Nos. 61966024, 11501310, 61866010, 11871183 and 61703370), in part by Program for Young Talents of Science and Technology in Universities of Inner Mongolia Autonomous Region (No. NJYT-19-B01), in part by Natural Science Foundation of Hainan Province (No. 118QN181), in part by Scientific Research Foundation of Hainan University (No. kyqd(sk)1804) and in part by Zhejiang Provincial Natural Science Foundation of China (No. LQ17F030003).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Wang, Z., Chen, X., Shao, YH. et al. Ramp-based twin support vector clustering. Neural Comput & Applic 32, 9885–9896 (2020). https://doi.org/10.1007/s00521-019-04511-3