1 Introduction

Clustering problems arise in many areas, including machine learning, pattern recognition, image analysis and the information sciences (Sipe 2001; Fouche and Langit 2011; Bastanlar and Ozuysal 2014). With the growing interest in automatically understanding, processing and summarizing data, many application domains have employed clustering algorithms to identify patterns within a dataset. Clustering is the process of assigning data objects to a set of disjoint groups, called clusters, so that objects in each cluster are more similar to one another than to objects in different clusters. Commonly used clustering algorithms can be broadly classified as hard or fuzzy (Hathaway and Bezdek 1995). C-means is one of the most popular hard clustering algorithms; it partitions data objects into c clusters, where the number of clusters, c, is decided in advance according to the application. This model is inappropriate for real datasets in which there are no definite boundaries between clusters. The distance-based (DB) method can be used to determine the appropriate number of clusters in advance (Davies and Bouldin 1979).

Fuzzy c-means (FCM) is one of the most popular unsupervised fuzzy clustering algorithms (Wang 1983; Zhang and Chen 2004). Since FCM is easy to implement and has produced satisfactory results in many applications, it has become an important tool for cluster analysis (Zhou et al. 2014; Askari et al. 2017). Furthermore, owing to the increasing complexity of clustering environments and the lack of domain knowledge available for distance measurement in a fuzzy environment, the kernel-based fuzzy c-means (KFCM) algorithm replaces the Euclidean distance metric widely used in the previous literature with a kernel metric. The kernel function is applied to achieve a better mapping for nonlinearly separable datasets, and previous studies have shown that KFCM performs better than FCM. FCM considers only the degree of membership, as studied in much of the literature (Liu and Xu 2008; Zhang and Chen 2003; Atanassov and Rangasamy 1986). The intuitionistic fuzzy c-means (IFCM) algorithm considers not only the degree of membership but also the degree of non-membership of each point in the dataset, and it can achieve better performance than FCM (Chaira 2011); for example, IFCM was applied to medical images in Bezdek et al. (1984). It is therefore reasonable to replace the fuzzy measurement with an intuitionistic metric.

Since kernel intuitionistic fuzzy c-means (KIFCM) is an extension of both KFCM and IFCM, this study replaces the Euclidean distance with a kernel function and substitutes the intuitionistic fuzzy metric for the fuzzy measurement. KIFCM can describe the natural properties of clustering phenomena more delicately and comprehensively (Ghosh and Dubey 2013). In KIFCM, the goal is to minimize the sum of distances between data points and their respective parent cluster centroids, taking into account the similarity between elements and the corresponding cluster center in each cluster. KIFCM has been widely studied because of its good recognition performance and robustness (Papageorgiou and Iakovidis 2013).
Although the KIFCM algorithm performs well on fuzzy clustering problems, it still has deficiencies that motivate alternative approaches, such as initialization with randomly generated centers, a high probability of getting trapped in local minima, and low accuracy. Therefore, this study improves the KIFCM algorithm by employing a global search optimization algorithm.

Many global optimization algorithms, such as the genetic algorithm (GA) and particle swarm optimization (PSO), have been successfully employed in clustering applications (Whitley 1994; Hosseinabadi et al. 2019; Kuo et al. 2016; Filho et al. 2015; Deng et al. 2019). They aim to solve optimization problems without becoming trapped in local minima. The state transition algorithm (STA) is a novel intelligent global search algorithm suitable for many optimization problems owing to its versatility and simplicity (Li et al. 2012; Zhou et al. 2012, 2018; Huang et al. 2018). In this paper, STA and KIFCM are combined to improve accuracy and obtain better cluster centers. Five benchmark datasets are used to compare the proposed method with FCM, IFCM, KFCM and KIFCM. The experimental results indicate that the proposed hybrid method achieves better performance and higher accuracy than the other methods. The novelty and contributions of this paper are listed below:

  (i)

    This study applies the intuitionistic fuzzy set and a kernel function together to solve clustering problems. Using the intuitionistic fuzzy set partitions the data points more reasonably, while taking the kernel function as the distance metric distinguishes overlapping data points accurately.

  (ii)

    This study employs STA to avoid getting trapped in local optima. STA performs well in both wide exploration and deep exploitation, balancing the trade-off between global and local search.

  (iii)

    Combining STA with the promising KIFCM algorithm increases accuracy noticeably. Accuracy is the most important index for evaluating the performance of a clustering algorithm. To the best of our knowledge, this study is the first to employ STA together with KIFCM for clustering problems.

The remainder of this study is organized as follows. Section 2 presents the theoretical basis of the FCM algorithm. Section 3 presents the hybrid method. The experimental results are presented in Sect. 4. Finally, Sect. 5 concludes this study.

2 Theoretical basis of FCM algorithm

The basic theoretical knowledge necessary for this study is presented in this section, in two parts. The first part covers the distance-based (DB) c-means algorithm used to determine the number of clusters. The second part covers the KIFCM algorithm, which combines intuitionistic fuzzy sets and a kernel function with the conventional FCM.

2.1 Distance-based c-means algorithm

A conventional clustering algorithm partitions a set of n objects \(X=\{x_{1},x_{2},\ldots ,x_{n}\}\) in d-dimensional space \(R^{d}\) into c \((2\le c \le n)\) fuzzy clusters with cluster centers \(C=\{c_{1},c_{2},\ldots ,c_{c}\}\). Determining the number of clusters is a significant preliminary step before applying the proposed hybrid method to obtain the best cluster centers. The DB c-means algorithm divides the dataset into several candidate numbers of clusters and calculates the DB index for each; the best number of clusters is obtained by taking the ratio of intra-cluster to inter-cluster distances as the metric. If a dataset is divided into k clusters, the DB value is calculated as follows:

$$\begin{aligned} DB(k) = \frac{1}{k}\sum _{i=1}^{k}D_{i} \end{aligned}$$
(1)

where \(D_{i}\) is defined as below:

$$\begin{aligned} D_{i} = \max _{j \ne i}\left\{ \frac{\bar{d_{i}}+\bar{d_{j}}}{d_{ij}}\right\} \end{aligned}$$
(2)

here, \(d_{ij}\) represents the inter-distance, i.e., the distance between centroids i and j, while \(\bar{d_{i}}\) represents the intra-distance, i.e., the average distance between each point in cluster i and the centroid of cluster i.
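For concreteness, a minimal NumPy sketch of Eqs. (1)–(2) is given below. The assumption that `labels` and `centroids` come from a prior c-means run, and the array shapes, are illustrative.

```python
import numpy as np

def davies_bouldin(X, labels, centroids):
    """DB index of Eqs. (1)-(2): average over clusters of the worst
    (intra_i + intra_j) / inter_ij ratio."""
    k = len(centroids)
    # intra-distance: mean distance of each point to its own centroid
    intra = np.array([np.linalg.norm(X[labels == i] - centroids[i], axis=1).mean()
                      for i in range(k)])
    D = [max((intra[i] + intra[j]) / np.linalg.norm(centroids[i] - centroids[j])
             for j in range(k) if j != i)                     # Eq. (2)
         for i in range(k)]
    return sum(D) / k                                         # Eq. (1)
```

Evaluating this for k = 2, …, 10 and averaging over repeated random initializations reproduces the procedure behind Fig. 1.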

Figure 1 shows the result of evaluating the DB value for 2–10 clusters, applying the c-means algorithm with random initial centroids to the Old Faithful Geyser (OFG) dataset. The results may differ between trials, but the evaluation error is reduced by averaging over 20 repetitions. The result shows that the smallest DB value is obtained with 3 clusters.

Fig. 1 DB value for different clustering numbers with the OFG dataset

2.2 Intuitionistic and kernel FCM algorithm

This section introduces the background needed for this study: intuitionistic fuzzy logic and the kernel FCM algorithm.

2.2.1 Intuitionistic fuzzy logic

Fuzzy set theory, proposed by Zadeh, has been successfully applied in various fields (Zadeh 1999). The theory states that the degree of membership of an element in a fuzzy set is a single value between 0 and 1. In Atanassov's intuitionistic fuzzy set, however, both the degree of membership \(\mu (x)\) and the degree of non-membership \(\nu (x)\) are considered (Zhao et al. 2010). Intuitionistic fuzzy logic can describe the natural attributes of objective phenomena more delicately and comprehensively. In an intuitionistic fuzzy set A, an element x is characterized by a degree of membership \(\mu _{A}(x)\) and a degree of non-membership \(\nu _{A}(x)\), where \( \mu _{A}(x)\in [0,1] \), \(\nu _{A}(x)\in [0,1]\) and \( 0\le \mu _{A}(x) + \nu _{A}(x) \le 1 \). If \(\mu _{A}(x) = 1 - \nu _{A}(x)\) for all \(x\in A\), then A degrades into an ordinary fuzzy set. An intuitionistic fuzzy set also carries a hesitation degree \(\pi _{A}(x)\), defined as below:

$$\begin{aligned} \pi _{A}(x) = 1 - \mu _{A}(x) - \nu _{A}(x) \end{aligned}$$
(3)

Obviously, \(\pi _{A}(x)\in [0,1]\). Therefore, the degree of membership in the IFCM algorithm is calculated as below:

$$\begin{aligned} \mu _{ij}^{*} = \mu _{ij} + \pi _{ij} \end{aligned}$$
(4)

In this paper, Yager's intuitionistic fuzzy complement is adopted, which gives the hesitation degree below:

$$\begin{aligned} \pi _{ij} = 1 - \mu _{ij} - \left( 1 - \mu _{ij}^{\alpha }\right) ^{1/\alpha },\quad -1<\alpha <0 \end{aligned}$$
(5)

where \(\pi _{ij}\) is Yager's hesitation degree in the IFCM algorithm. From Yager's intuitionistic fuzzy complement, \(\nu _{ij}=(1-\mu _{ij}^{\alpha })^{1/\alpha }\), and \(\pi _{ij}=1-\mu _{ij}-\nu _{ij}\), we obtain Eq. (5). Hence, \(\mu _{ij}^{*}\) is given below:

$$\begin{aligned} \mu _{ij}^{*} = 1 - \left( 1 - \mu _{ij}^{\alpha }\right) ^{1/\alpha },\quad -1<\alpha <0 \end{aligned}$$
(6)

where \(\alpha \) is a parameter of the Yager-generating function. \(\alpha \) is set to \(-1/2\) in this paper because tests show better results when \(\alpha \) equals \(-1/2\). \(\mu _{ij}^{*}\) is called Yager's intuitionistic membership degree in the IFCM algorithm.
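As a small illustration, Eq. (6) can be transcribed directly. The clipping of \(\mu \) away from zero and of the output to [0, 1] are numerical safeguards added in this sketch, not part of the original formula.

```python
import numpy as np

def yager_membership(mu, alpha=-0.5):
    """Eq. (6): mu* = 1 - (1 - mu^alpha)^(1/alpha), with alpha = -1/2."""
    mu = np.clip(mu, 1e-12, 1.0)          # guard: mu^alpha is undefined at mu = 0
    mu_star = 1.0 - (1.0 - mu**alpha) ** (1.0 / alpha)
    return np.clip(mu_star, 0.0, 1.0)     # safeguard, not part of Eq. (6)
```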

2.2.2 Kernel FCM algorithm

Differing from conventional clustering algorithms, fuzzy clustering is described by a fuzzy matrix \(\mu \) with n rows and c columns, where n is the number of data objects and c is the number of clusters. The element \(\mu _{ij}\) in the ith row and jth column of \(\mu \) indicates the degree of membership of the ith object in the jth cluster. FCM aims to minimize the sum of distances between data points and their respective parent cluster centroids, defined as below:

$$\begin{aligned} \begin{array}{ll} \min &{} J = \sum \limits _{i=1}^{n} \sum \limits _{j=1}^{c}\left[ \mu _{ij}\right] ^{m}\times d_{ij}\left( x_{i},c_{j}\right) \\ \hbox {s.t.} &{} \sum \limits _{j=1}^{c}\mu _{ij}=1,\quad \forall i = 1,2,\ldots ,n \\ \quad &{} 0\le \mu _{ij}\le 1, \quad \forall i = 1,2,\ldots ,n;\ \forall j=1,2,\ldots ,c \end{array} \end{aligned}$$

where the fuzzy parameter m \((m>1)\) determines the amount of fuzziness of the resulting clusters. \(\mu _{ij}\) and \(c_{j}\) are calculated by Eqs. (7) and (8), respectively, and \(d_{ij}(x_{i},c_{j})=\Vert x_{i}-c_{j}\Vert \) is the Euclidean distance from object \(x_{i}\) to the cluster center \(c_{j}\).

$$\begin{aligned} \mu _{ij}= & {} \frac{1}{\sum _{k=1}^{c}\left( \frac{d_{ij}}{d_{ik}}\right) ^{\frac{2}{m-1}}} \end{aligned}$$
(7)
$$\begin{aligned} c_{j}= & {} \frac{\sum _{i=1}^{n}\left[ \mu _{ij}\right] ^{m}x_{i}}{\sum _{i=1}^{n}\left[ \mu _{ij}\right] ^{m}} \end{aligned}$$
(8)

Therefore, in the traditional FCM algorithm, the membership degrees and the cluster centers are obtained by alternating between Eqs. (7) and (8).
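To make the alternating updates concrete, a minimal FCM sketch built directly on Eqs. (7) and (8) might look as follows. The random initialization, tolerance and iteration cap are illustrative assumptions.

```python
import numpy as np

def fcm(X, c, m=2.0, max_iter=100, tol=1e-5, rng=None):
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    centers = X[rng.choice(n, size=c, replace=False)].astype(float)  # random init
    for _ in range(max_iter):
        # Euclidean distance from every point to every center, shape (n, c)
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)                     # avoid division by zero
        # Eq. (7): mu_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1))
        mu = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1))).sum(axis=2)
        # Eq. (8): c_j = sum_i mu_ij^m x_i / sum_i mu_ij^m
        w = mu ** m
        new_centers = (w.T @ X) / w.sum(axis=0)[:, None]
        if np.linalg.norm(new_centers - centers) < tol:
            centers = new_centers
            break
        centers = new_centers
    return centers, mu
```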

In the original FCM algorithm, the distance between two independent data points x and y is the Euclidean distance \(d(x,y) = \Vert x-y\Vert \), the simplest and most widely used distance measurement. In this paper, a kernel function is used as the distance measurement instead (Fan et al. 2009). Define a nonlinear map \(\varPhi :x \rightarrow \varPhi (x)\in F \), where \( x\in X \); X represents the data space, and F represents the transformed feature space of higher, possibly infinite, dimension. Frequently used kernel functions are listed as follows:

  • Gaussian function

    $$\begin{aligned} K(x,y) = \exp \left( -\Vert x-y\Vert ^{2}/\sigma ^{2}\right) , \end{aligned}$$
    (9)
  • Radial basis (RBF) functions

    $$\begin{aligned} \begin{array}{c} \displaystyle K(x,y) = \exp \left( -\sum _{i}|x_{i}^{a}-y_{i}^{a}|^{b}/\sigma ^{2}\right) \\ \displaystyle \quad \left( 0 < b \le 2\right) \end{array} \end{aligned}$$
    (10)
  • Hyper tangent function

$$\begin{aligned} K(x,y) = 1-\tanh \left( \Vert x-y\Vert ^{2}/\sigma ^{2}\right) , \end{aligned}$$
    (11)

where x and y represent two independent data points and \(\sigma \) represents the standard deviation, which is set in advance. Obviously, the RBF function reduces to the Gaussian function when \( a=1 \) and \( b=2 \). Taking a kernel function as the metric has an advantage over the conventional Euclidean distance in distinguishing overlapping data points.
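A direct transcription of the three kernels in Eqs. (9)–(11), with \(\sigma \) fixed in advance (here to \(\sqrt{2}\), the value used in Sect. 4.2):

```python
import numpy as np

SIGMA = np.sqrt(2)  # standard deviation, set in advance (see Sect. 4.2)

def gaussian(x, y, sigma=SIGMA):                 # Eq. (9)
    return np.exp(-np.sum((x - y) ** 2) / sigma**2)

def rbf(x, y, a=1.0, b=2.0, sigma=SIGMA):        # Eq. (10), 0 < b <= 2
    return np.exp(-np.sum(np.abs(x**a - y**a) ** b) / sigma**2)

def hyper_tangent(x, y, sigma=SIGMA):            # Eq. (11)
    # K(x, x) = 1 and K decreases toward 0 with distance, so the
    # kernel distance 2(1 - K) of Eq. (13) stays nonnegative
    return 1.0 - np.tanh(np.sum((x - y) ** 2) / sigma**2)
```

With a = 1 and b = 2, `rbf` reduces to `gaussian`, as noted above.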

Further improvement is achieved by adding a kernel function and an intuitionistic fuzzy set to the conventional FCM. The kernel intuitionistic fuzzy c-means (KIFCM) algorithm maps the data point \(x_{i}\) and centroid \(c_{j}\) into the kernel space as \({\varnothing }(x_{i})\) and \({\varnothing }(c_{j})\), respectively. The sum of distances between data points and their respective parent cluster centroids that KIFCM aims to minimize is defined as follows:

$$\begin{aligned} \begin{array}{ll} \min &{} J = \sum \limits _{i=1}^{n}\sum \limits _{j=1}^{c}\left[ \mu _{ij}\right] ^{m}\times \Vert {\varnothing }\left( x_{i}\right) -{\varnothing }\left( c_{j}\right) \Vert ^{2} \\ \hbox {s.t.} &{} \sum \limits _{j=1}^{c}\mu _{ij}=1,\quad \forall i = 1,2,\ldots ,n \\ \quad &{} 0\le \mu _{ij}\le 1 , \quad \forall i = 1,2,\ldots ,n;\ \forall j=1,2,\ldots ,c \end{array} \end{aligned}$$

where \(\Vert {\varnothing }(x_{i})-{\varnothing }(c_{j})\Vert ^{2}\) is the squared distance between \({\varnothing }(x_{i})\) and \({\varnothing }(c_{j})\) measured by the kernel function, calculated using Eq. (12):

$$\begin{aligned} \Vert {\varnothing }\left( x_{i}\right) -{\varnothing }\left( c_{j}\right) \Vert ^{2} = K\left( x_{i},x_{i}\right) + K\left( c_{j},c_{j}\right) - 2K\left( x_{i},c_{j}\right) \end{aligned}$$
(12)

where K(x, y) is the kernel function. This paper applies the hyper tangent function, defined in Eq. (11), as the kernel. The kernel metric taken as the distance measurement is then defined as Eq. (13):

$$\begin{aligned} d_{ij}\left( x_{i},c_{j}\right) ^{2} = \Vert \phi \left( x_{i}\right) - \phi \left( c_{j}\right) \Vert ^{2} = 2\left( 1-K\left( x_{i},c_{j}\right) \right) \end{aligned}$$
(13)

The formulations used to calculate the degree of membership \(\mu _{ij}\) and clustering center \(c_{j}\) are defined as Eqs. (14) and (15), respectively.

$$\begin{aligned} \mu _{ij}= & {} \frac{\left( 1/\left( 1-K\left( x_{i},c_{j}\right) \right) \right) ^\frac{1}{m-1}}{\sum _{k=1}^{c}\left( 1/\left( 1-\left( K\left( x_{i},c_{k}\right) \right) \right) \right) ^\frac{1}{m-1}} \end{aligned}$$
(14)
$$\begin{aligned} c_{j}= & {} \frac{\sum _{i=1}^{n}\left[ \mu _{ij}\right] ^{m}K\left( x_{i},c_{j}\right) x_{i}}{\sum _{i=1}^{n}\left[ \mu _{ij}\right] ^{m}K\left( x_{i},c_{j}\right) } \end{aligned}$$
(15)

2.2.3 Kernel intuitionistic FCM algorithm

This study substitutes the hyper tangent kernel for the Euclidean distance as the distance measurement. Based on the intuitionistic fuzzy set, the kernel intuitionistic FCM algorithm, which evolves conventional fuzzy logic into the intuitionistic fuzzy set, can be obtained (Lin 2014). The membership \(\mu _{ij}^{*}\) is calculated according to Eq. (16), and the new cluster center \(c_{j}^{*}\) by Eq. (17).

$$\begin{aligned} \mu _{ij}= & {} \frac{\left( 1/\left( 1-K\left( x_{i},c_{j}\right) \right) \right) ^\frac{1}{m-1}}{\sum _{k=1}^{c}\left( 1/\left( 1-\left( K\left( x_{i},c_{k}\right) \right) \right) \right) ^\frac{1}{m-1}}\nonumber \\ \mu _{ij}^{*}= & {} 1 - \left( 1 - \mu _{ij}^{-1/2}\right) ^{-2} \end{aligned}$$
(16)
$$\begin{aligned} c_{j}^{*}= & {} \frac{\sum _{i=1}^{n}\left[ \mu _{ij}^{*}\right] ^{m}K\left( x_{i},c_{j}\right) x_{i}}{\sum _{i=1}^{n}\left[ \mu _{ij}^{*}\right] ^{m}K\left( x_{i},c_{j}\right) } \end{aligned}$$
(17)

The intuitionistic cluster center is obtained by Eq. (17). The KIFCM algorithm is obtained by combining the kernel function and the intuitionistic fuzzy set with the FCM algorithm, and is stated in Algorithm 1.

Algorithm 1 The KIFCM algorithm

When the relative change in the centroid values becomes small, the KIFCM algorithm is terminated.
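Since Algorithm 1 is given as a figure, a minimal vectorized sketch of one possible implementation is shown below, combining the hyper tangent kernel (Eq. 11) with the updates of Eqs. (14), (16) and (17). The initialization, tolerance and the clipping of \(\mu ^{*}\) to [0, 1] are assumptions of this sketch.

```python
import numpy as np

def kernel_matrix(X, centers, sigma=np.sqrt(2)):
    """Hyper tangent kernel K(x_i, c_j) for all pairs, shape (n, c). Eq. (11)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return 1.0 - np.tanh(d2 / sigma**2)

def kifcm(X, c, m=2.0, alpha=-0.5, max_iter=100, tol=1e-5, init=None, rng=None):
    rng = np.random.default_rng(rng)
    if init is None:
        centers = X[rng.choice(len(X), size=c, replace=False)].astype(float)
    else:
        centers = np.asarray(init, dtype=float)
    for _ in range(max_iter):
        K = np.clip(kernel_matrix(X, centers), 1e-12, 1.0 - 1e-12)
        inv = (1.0 / (1.0 - K)) ** (1.0 / (m - 1))
        mu = inv / inv.sum(axis=1, keepdims=True)            # Eq. (14)
        mu_star = 1.0 - (1.0 - mu**alpha) ** (1.0 / alpha)   # Eq. (16)
        mu_star = np.clip(mu_star, 0.0, 1.0)                 # numerical safeguard
        w = (mu_star ** m) * K
        new_centers = (w.T @ X) / w.sum(axis=0)[:, None]     # Eq. (17)
        if np.linalg.norm(new_centers - centers) < tol:      # small relative change
            centers = new_centers
            break
        centers = new_centers
    return centers, mu_star
```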

3 Hybrid method with STA and KIFCM

The hybrid method proposed in this paper is described in this section. The first part covers STA, a global search optimization algorithm. The second part covers the STA-KIFCM algorithm, which is proposed for the first time in this paper and shows good performance on clustering problems.

3.1 State transition algorithm

STA is a novel intelligent global optimization algorithm proposed by Huang et al. (2019), inspired by an understanding of temporal and spatial correlation. STA has been widely used to solve global search problems (Zhou et al. 2018, 2019; Huang et al. 2019; Zhou et al. 2019a, b; Han et al. 2018). Its heuristic ideas enable it to search large and discontinuous search spaces without getting trapped in local optima. Based on the current state \(x_{k}\), the generation of a new state \(x_{k+1}\) in STA can be described in the unified form:

$$\begin{aligned} \left\{ \begin{array}{l} x_{k+1} = A_{k}x_{k} + B_{k}u_{k} \\ y_{k+1} = f\left( x_{k+1}\right) \end{array} \right. \end{aligned}$$
(18)

where \(x_{k}=[x_{k1},x_{k2},\ldots ,x_{kn}]^{{\mathrm{T}}}\) stands for an n-dimensional state, corresponding to a solution of the optimization problem; \(u_{k}\) is a function of \(x_{k}\) and historical states; \(A_{k}\) and \(B_{k}\) are state transition matrices, usually realized as state transformation operators; \(f(\cdot )\) is the objective (fitness) function, and \(y_{k+1}\) is its value at \(x_{k+1}\). Drawing on state-space transformations, four special state transition operators are designed to generate continuous solutions for an optimization problem. They can be described as below:

  • Rotation transformation

    $$\begin{aligned} x_{k+1} = x_{k} + \alpha \frac{1}{n\Vert x_{k}\Vert _{2}} R_{r}x_{k}, \end{aligned}$$
    (19)

    where \(\alpha \) is a positive constant, called the rotation factor; \(R_{r}\in {\mathbb {R}}^{n \times n}\) is a random matrix with its entries being uniformly distributed random variables on the interval \([-1,1]\), and \(\Vert \cdot \Vert _{2}\) is the 2-norm of a vector. The rotation transformation searches within a hypersphere of maximal radius \(\alpha \).

  • Translation transformation

    $$\begin{aligned} x_{k+1} = x_{k} + \beta R_{t}\frac{x_{k}-x_{k-1}}{\Vert x_{k}-x_{k-1}\Vert _{2}}, \end{aligned}$$
    (20)

    where \(\beta \) is a positive constant, called the translation factor; \(R_{t}\in {\mathbb {R}}\) is a uniformly distributed random variable on the interval [0, 1]. The translation transformation searches along the line from \(x_{k-1}\) to \(x_{k}\), starting at \(x_{k}\), with maximum length \(\beta \).

  • Expansion transformation

    $$\begin{aligned} x_{k+1} = x_{k} + \gamma R_{e}x_{k}, \end{aligned}$$
    (21)

    where \(\gamma \) is a positive constant, called the expansion factor; \(R_{e}\in {\mathbb {R}}^{n \times n}\) is a random diagonal matrix with entries obeying the Gaussian distribution. The expansion transformation can expand the entries of \(x_{k}\) to the range \((-\infty , +\infty )\), searching the whole space.

  • Axesion transformation

    $$\begin{aligned} x_{k+1} = x_{k} + \delta R_{a}x_{k}, \end{aligned}$$
    (22)

    where \(\delta \) is a positive constant, called the axesion factor; \(R_{a}\in {\mathbb {R}}^{n\times n}\) is a random diagonal matrix with entries obeying the Gaussian distribution and only one random position having a nonzero value. The axesion transformation aims to search along the axes, strengthening single-dimensional search.
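A sketch of the four operators in Eqs. (19)–(22), with states as NumPy vectors; the helper names are illustrative:

```python
import numpy as np

def rotation(x, alpha=1.0):
    # Eq. (19): search inside a hypersphere of maximal radius alpha
    n = len(x)
    Rr = np.random.uniform(-1, 1, size=(n, n))
    return x + alpha / (n * np.linalg.norm(x)) * Rr @ x

def translation(x, x_old, beta=1.0):
    # Eq. (20): search along the line from x_old to x, maximum length beta
    Rt = np.random.uniform(0, 1)
    return x + beta * Rt * (x - x_old) / np.linalg.norm(x - x_old)

def expansion(x, gamma=1.0):
    # Eq. (21): diagonal Gaussian scaling, searching the whole space
    Re = np.diag(np.random.randn(len(x)))
    return x + gamma * Re @ x

def axesion(x, delta=1.0):
    # Eq. (22): Gaussian perturbation along a single random axis
    Ra = np.zeros((len(x), len(x)))
    i = np.random.randint(len(x))
    Ra[i, i] = np.random.randn()
    return x + delta * Ra @ x
```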

With the state transformation operators, sampling technique and update strategy, the procedure of the state transition algorithm can be described in Algorithm 2.

Algorithm 2 The state transition algorithm

When a candidate improves on the current Best, the translation transformation is applied to update Best; this is the greedy update strategy. Generally, reaching the maximum number of iterations is taken as the terminating condition. In Algorithm 2, SE is called the search enforcement, representing the number of transformations performed by a certain operator. A changeable rotation factor can not only speed up the search for the optimal solution but also increase the accuracy of the solution. For more detailed steps of STA, the STA toolbox can be downloaded via the following link: https://www.mathworks.com/matlabcentral/fileexchange/52498-state-transition-algorithm
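Using the operator sketches above, a bare-bones loop consistent with this description might read as follows. The per-operator sampling, the improvement test, and the omission of the rotation-factor decay are simplifying assumptions of this sketch; the toolbox linked above gives the full algorithm.

```python
def sta_minimize(f, x0, se=30, max_iter=1000):
    """Minimize f by sampling SE candidates per operator, keeping the best."""
    best = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        for op in (expansion, rotation, axesion):
            new = min((op(best) for _ in range(se)), key=f)
            if f(new) < f(best):                 # greedy update of Best
                # translation searches the line between the old and new Best
                t = min((translation(new, best) for _ in range(se)), key=f)
                best = t if f(t) < f(new) else new
    return best
```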

Algorithm 3 The STA-KIFCM method

3.2 STA-KIFCM algorithm

The good performance of the KIFCM algorithm has been shown in many studies (Pena et al. 1999). However, the results of KIFCM are not stable, since KIFCM is highly dependent on the initial cluster centers: its search is a local procedure. Previously, many studies focused on designing novel cluster initializations (Babu and Murty 1993). STA performs well in generating heuristic, stochastic initial cluster centers (Zhou et al. 2016); its four operators generate diverse candidate cluster centers, after which KIFCM can obtain the final cluster centers rapidly and reasonably. The hybrid method can therefore generate the optimal cluster centers accurately: it not only avoids getting trapped in local optima but also improves accuracy efficiently. The procedure of the hybrid method is stated in Algorithm 3.

Several stopping rules are used. The termination condition in Step 3.4 is reaching the maximum number of iterations. Step 4.4 terminates when the relative change in the degree of membership becomes small. Step 5 stops when the objective function cannot be decreased any further. Figure 2 shows the overall flowchart of STA-KIFCM.
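A hypothetical wiring of the two stages, reusing `kernel_matrix`, `kifcm` and `sta_minimize` from the sketches above: STA searches for good initial centroids by minimizing the KIFCM objective, and KIFCM then refines them. The flattening of the cluster centers into one STA state vector and the iteration budget are assumptions of this sketch, not details of Algorithm 3.

```python
import numpy as np

def kifcm_objective(flat_centers, X, c, m=2.0):
    """KIFCM objective: sum of mu^m times the kernel distance of Eq. (13)."""
    centers = flat_centers.reshape(c, -1)
    K = np.clip(kernel_matrix(X, centers), 1e-12, 1.0 - 1e-12)
    inv = (1.0 / (1.0 - K)) ** (1.0 / (m - 1))
    mu = inv / inv.sum(axis=1, keepdims=True)
    return float(((mu ** m) * 2.0 * (1.0 - K)).sum())

def sta_kifcm(X, c, m=2.0, rng=None):
    rng = np.random.default_rng(rng)
    # initial state: c data points drawn at random, flattened for STA
    x0 = X[rng.choice(len(X), size=c, replace=False)].ravel()
    best = sta_minimize(lambda z: kifcm_objective(z, X, c, m), x0, max_iter=50)
    # refine the STA centroids with the KIFCM updates
    return kifcm(X, c, m=m, init=best.reshape(c, -1))
```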

4 Experiment results and discussion

This section reports experiments on five benchmark datasets to evaluate the performance of the fuzzy clustering methods. The comparison algorithms are introduced in Sect. 4.1, followed by the experimental setup, results and discussion in the rest of this section.

4.1 Comparison algorithms

In this paper, three evolutionary algorithms are adopted to generate the initial centroids for KIFCM. Two typical evolutionary algorithms, GA and PSO, are compared with STA in overcoming the drawback that KIFCM is highly dependent on its initial centroids. All parameters of these algorithms follow the recommended settings in the corresponding references (Kuo et al. 2018). The proposed STA-KIFCM is described in detail in Fig. 2. The function of GA and PSO is similar to that of STA in generating the initial centroids for KIFCM; the main differences between these evolutionary algorithms lie in how candidate solutions are generated and how the optimal solution is updated. Detailed descriptions of the differences between GA-KIFCM, PSO-KIFCM and STA-KIFCM are given as follows.

  (1)

    GA-KIFCM: candidate solutions are generated by the crossover and mutation operators, and the optimal solution is updated by the greedy rule (Whitley 1994; Hosseinabadi et al. 2019; Kuo et al. 2018).

  (2)

    PSO-KIFCM: the velocity and position updating formulas are adopted to generate candidate solutions, and updating of the optimal solution can be summarized as learning from the best particle (Filho et al. 2015; Deng et al. 2019; Kuo et al. 2018).

In addition, the frameworks of GA-KIFCM and PSO-KIFCM are similar to that of STA-KIFCM illustrated in Fig. 2; GA and PSO simply take the place of STA in GA-KIFCM and PSO-KIFCM, respectively.

Fig. 2 The flowchart of the STA-KIFCM method

4.2 Experimental setup

Table 3 shows the parameters of the PSO-KIFCM and GA-KIFCM methods, set as in Kuo et al. (2018). The parameters are set so that the number of objective function evaluations is the same for all methods. The preparation for the experiments is outlined below:

Table 1 Datasets characteristics
Table 2 The comparison of results obtained by each algorithm
  • Parameters of STA-KIFCM settings

    According to the previous literature, this study sets the weighting exponent m to 2 (Pedrycz and Rai 2008). The standard deviation \(\sigma \) in the kernel function is chosen as \(\sqrt{2}\). The parameters \(\beta \), \(\gamma \), \(\delta \) in STA are set to 1, and the parameter \(\alpha \) is decreased linearly from 1 to \(e^{-4}\) over the course of the iterations to facilitate global exploration in the early stages and exploitation in the later stages. The search enforcement (SE) in STA is set to 30, meaning that 30 candidate solutions are generated in every iteration.

  • Datasets description

    This part adopts five benchmark clustering datasets, which can be downloaded from the UCI machine learning repository with the exception of the Old Faithful Geyser (OFG) dataset, to evaluate the performance of the proposed algorithm. Many previous studies have used these datasets in various fields, including biology, oncology, education, medicine, chemistry and agriculture (Michielssen et al. 1992). Table 1 gives a brief overview of the datasets' characteristics.

    • Iris dataset, introduced by Fisher (1936), consists of three species of Iris flowers (Virginica, Versicolour and Setosa) and includes a total of 150 instances with 4 features (length and width of the sepals and petals).

    • Wine dataset consists of wines derived from three cultivars grown in the same region and includes a total of 178 instances with 13 features (alcohol, malic acid, ash, alcalinity of ash, magnesium, total phenols, flavanoids, nonflavanoid phenols, proanthocyanins, color intensity, hue, OD280/OD315 of diluted wines and proline).

    • Tae dataset consists of evaluations of teaching performance over three regular semesters and two summer semesters for 151 teaching assistant (TA) assignments at the Statistics Department of the University of Wisconsin-Madison. The scores were divided into three roughly equal-sized categories ("low", "medium" and "high") to form the class variable.

    • Seed dataset consists of kernels belonging to three different varieties of wheat (Kama, Rosa and Canadian), 70 elements each, randomly selected for the experiment, and includes seven geometric parameters of wheat kernels (area, perimeter, compactness, length of kernel, width of kernel, asymmetry coefficient and length of kernel groove).

    • Old Faithful Geyser (OFG) dataset describes the waiting time between eruptions and the duration of eruptions for the Old Faithful geyser in Yellowstone National Park, Wyoming, USA (Hutchinson et al. 1997). The data frame contains 272 observations on 2 numeric variables: eruption time in minutes and waiting time to the next eruption.

  • Performance indices

    The performance indices used to evaluate the effectiveness of the fuzzy clustering methods are reported in Table 2. They are listed as follows:

Table 3 Parameter settings for PSO-KIFCM and GA-KIFCM methods
Fig. 3 The clustering result of the OFG dataset

  (a)

    Inter-distance. The sum of distances between data points and their respective parent cluster centroids; smaller values are desirable and indicate greater compactness of the clustering. The inter-distance is formulated as follows:

    $$\begin{aligned} d=\sum _{i=1}^{k}\sum _{x_{j}\in S_{i}}d_{ji}(x_{j},c_{i})^{2} \end{aligned}$$
    (23)

    where k represents the number of cluster centers, \(S_{i}\) represents the ith cluster and \(x_{j}\) represents the jth point belonging to cluster \(S_{i}\). \(d_{ji}(x_{j},c_{i})^{2}\) represents the distance between point \(x_{j}\) and \(c_{i}\). In FCM and IFCM, the squared Euclidean distance \(d_{ji}(x_{j},c_{i})^{2}=\Vert x_{j}-c_{i}\Vert ^{2}\) is taken as the distance measurement, whereas the kernel metric \(d_{ji}(x_{j},c_{i})^{2}=\Vert \phi (x_{j})-\phi (c_{i})\Vert ^{2}\) is used in KFCM, KIFCM, GA-KIFCM, PSO-KIFCM and STA-KIFCM.

  (b)

    Time-consuming. Running time is commonly used to evaluate the rapidity of an algorithm; less time consumed with high validity is desirable.

  (c)

    Accuracy. Each data point in the tested datasets is labeled according to its membership value for each cluster; the final cluster label of a data point is the cluster for which it has the highest membership value. The same data point may be labeled differently in different repetitions. The accuracy of the algorithm's labeling is the ratio of correct labels to total labels. A sketch of indices (a) and (c) is given after this list.
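As referenced above, a sketch of indices (a) and (c). Mapping each cluster to the majority ground-truth class inside it is one common convention and an assumption of this sketch; `y_true` is assumed to hold integer class labels.

```python
import numpy as np

def inter_distance(X, labels, centers, kernel=None):
    """Eq. (23): sum of squared distances to each point's own centroid."""
    total = 0.0
    for i, ci in enumerate(centers):
        P = X[labels == i]
        if kernel is None:                       # Euclidean (FCM, IFCM)
            total += ((P - ci) ** 2).sum()
        else:                                    # kernel metric, Eq. (13)
            total += sum(2.0 * (1.0 - kernel(p, ci)) for p in P)
    return total

def accuracy(labels, y_true):
    """Ratio of correct labels; clusters mapped to their majority true class."""
    correct = 0
    for i in np.unique(labels):
        members = y_true[labels == i]            # true classes inside cluster i
        correct += (members == np.bincount(members).argmax()).sum()
    return correct / len(y_true)
```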

4.3 Results and discussion

The results of the experiments on the five benchmark datasets are reported in this part. The results of the proposed STA-KIFCM method compared with the other methods are shown in Table 2. To reduce error and contingency, each algorithm is executed 100 times. The comparison shows that low accuracy is achieved by the FCM, IFCM and KFCM algorithms on the OFG dataset; the reason is that many points in the OFG dataset overlap and lie close to each other. The original points of the OFG dataset are shown in Fig. 3. Although the FCM and IFCM algorithms cost less time, they achieve lower accuracy than the other algorithms. The KFCM and KIFCM algorithms cost almost the same time because their implementations differ little: the only difference between them is the formula for calculating the degree of membership. STA-KIFCM obtains high accuracy and short inter-distance in less time than PSO-KIFCM and GA-KIFCM. Figure 4 compares the accuracy of the methods. Although the performance of PSO-KIFCM and GA-KIFCM is similar to that of STA-KIFCM on the Iris, Wine and Tae datasets, STA-KIFCM has the highest accuracy on these three datasets; moreover, on the Seed and OFG datasets, STA-KIFCM not only attains the highest accuracy but does so by an obvious margin over GA-KIFCM and PSO-KIFCM. We can therefore conclude that STA-KIFCM performs better than PSO-KIFCM and GA-KIFCM on all tested datasets. In obtaining the optimal initial cluster centroids for KIFCM, STA achieves the highest accuracy with the lowest time consumption on the five datasets, showing that STA outperforms GA and PSO in both speed and solution quality when solving the optimization problem in clustering.

Figure 3, obtained by the STA-KIFCM method, displays the clustering result for the OFG dataset, which can be clustered into 3 clusters as verified in Sect. 2.1. Many data points in cluster 2 lie close to points in cluster 3, so it is hard to partition the data in the intersection between clusters 2 and 3 with the Euclidean distance measurement, and this intersection cannot easily be distinguished by visual inspection. The STA-KIFCM method, however, classifies these points and finds accurate cluster centers, making the boundary between clusters 2 and 3 easy to distinguish.

Fig. 4 The comparison of accuracy for different methods

5 Conclusions

In this paper, we integrate KIFCM with STA to overcome the shortcomings of the conventional FCM. Using the novel intelligent global search algorithm STA to provide better initial centroids improves the results of KIFCM. The parameters of STA control the trade-off between local and global search: if the data points are distributed over a large area, wide exploration is required, whereas if the data points are located within a relatively small area, deep exploitation is needed. STA prevents the KIFCM algorithm from getting trapped in local optima because it finds optimal initial cluster centroids for KIFCM. The proposed STA-KIFCM method is evaluated on five well-known clustering datasets and compared with PSO-KIFCM and GA-KIFCM. The experimental results show that STA-KIFCM is efficient and yields very encouraging results in terms of the quality of the cluster centers found, thanks to the four operators of STA. The proposed method has intrinsic advantages: clusters are more internally homogeneous and more distinct from each other, providing a better partitioning of the testing datasets. Based on these results, further study can focus on three main perspectives. The first is reducing the time complexity of the hybrid method. The second is applying the proposed clustering method to real-world problems in industrial processes. The third is replacing the kernel function with different similarity measurements to study the influence of the similarity measurement on KIFCM clustering results.