1 Introduction

Clustering problems arise in many areas, including machine learning, pattern recognition, image analysis and the information sciences (Sipe 2001; Fouche and Langit 2011; Bastanlar and Ozuysal 2014). With the growing interest in automatically understanding, processing and summarizing data, many application domains have employed clustering algorithms to identify patterns within a dataset. Clustering is the process of assigning data objects to a set of disjoint groups, called clusters, so that objects in each cluster are more similar to one another than to objects in different clusters. Commonly used clustering algorithms can be broadly classified as hard or fuzzy (Hathaway and Bezdek 1995). C-means is one of the most popular hard clustering algorithms; it partitions data objects into c clusters, where the number of clusters, c, is decided in advance according to the application. This model is inappropriate for real datasets in which there are no definite boundaries between clusters. The distance-based (DB) method can be used to determine the appropriate number of clusters in advance (Davies and Bouldin 1979).

Fuzzy c-means (FCM) is one of the most popular unsupervised fuzzy clustering algorithms (Wang 1983; Zhang and Chen 2004). Since FCM is easy to implement and has produced satisfactory results in many applications, it has become an important tool for cluster analysis (Zhou et al. 2014; Askari et al. 2017). Furthermore, owing to the increasing complexity of clustering environments and the lack of domain knowledge available for distance measurement in a fuzzy environment, the kernel-based fuzzy c-means (KFCM) algorithm replaces the Euclidean distance metric widely used in the previous literature with a kernel metric. The kernel function is applied to achieve a better mapping for nonlinearly separable datasets, and previous studies have shown that KFCM performs better than FCM. FCM considers only the degree of membership, as studied in much of the literature (Liu and Xu 2008; Zhang and Chen 2003; Atanassov and Rangasamy 1986). The intuitionistic fuzzy c-means (IFCM) algorithm considers not only the degree of membership but also the degree of non-membership of each point in the dataset, and it can achieve better performance than FCM (Chaira 2011); for example, IFCM was applied to medical images in Bezdek et al. (1984). It is therefore reasonable to replace the fuzzy measurement with an intuitionistic metric.

Since kernel intuitionistic fuzzy c-means (KIFCM) is an extension of both KFCM and IFCM, this study replaces the Euclidean distance with a kernel function and substitutes the intuitionistic fuzzy metric for the fuzzy measurement. KIFCM can describe the natural properties of clustering phenomena more delicately and comprehensively (Ghosh and Dubey 2013). In KIFCM, the goal is to minimize the sum of distances between data points and their respective parent cluster centroids, taking into account the similarity between elements and the corresponding cluster center in each cluster. KIFCM has been widely studied because of its good recognition performance and robustness (Papageorgiou and Iakovidis 2013).
Although the KIFCM algorithm performs well on fuzzy clustering problems, it still has deficiencies that motivate alternative approaches, such as initialization with randomly generated centers, a high probability of getting trapped in local minima, and low accuracy. Therefore, this study improves the KIFCM algorithm by employing a global search optimization algorithm.

Many global optimization algorithms, such as the genetic algorithm (GA) and particle swarm optimization (PSO), have been successfully employed in clustering applications (Whitley 1994; Hosseinabadi et al. 2019; Kuo et al. 2016; Filho et al. 2015; Deng et al. 2019). They aim to solve optimization problems without becoming trapped in local minima. The state transition algorithm (STA) is a novel intelligent global search algorithm suitable for many optimization problems owing to its versatility and simplicity (Li et al. 2012; Zhou et al. 2012, 2018; Huang et al. 2018). In this paper, STA and KIFCM are combined to improve accuracy and obtain better cluster centers. Five benchmark datasets are used to compare the proposed method with FCM, IFCM, KFCM and KIFCM. The experimental results indicate that the proposed hybrid method achieves better performance and higher accuracy than the other methods. The novelty and contributions of this paper are listed below:

  (i)

    This study applies the intuitionistic fuzzy set and a kernel function together to solve clustering problems. Using the intuitionistic fuzzy set partitions the data points more reasonably, while taking the kernel function as the distance metric distinguishes overlapping data points accurately.

  (ii)

    This study employs STA to avoid getting trapped in local optima. STA performs well in both wide exploration and deep exploitation, balancing the trade-off between global and local search.

  (iii)

    Combining STA with the promising KIFCM algorithm increases accuracy noticeably. Accuracy is the most important index for evaluating the performance of a clustering algorithm. To the best of our knowledge, this study is the first to employ STA together with KIFCM for clustering problems.

The remainder of this study is organized as follows. Section 2 presents the theoretical basis of the FCM algorithm. Section 3 presents the hybrid method. The experimental results are presented in Sect. 4. Finally, Sect. 5 concludes this study.

2 Theoretical basis of FCM algorithm

The basic theoretical knowledge necessary for this study is presented in this section, in two parts. The first part covers the distance-based (DB) c-means algorithm used to determine the number of clusters. The second part covers the KIFCM algorithm, which combines intuitionistic fuzzy sets and a kernel function with the conventional FCM.

2.1 Distance-based c-means algorithm

A conventional clustering algorithm partitions a set of n objects \(X=\{x_{1},x_{2},\ldots ,x_{n}\}\) in d-dimensional space \(R^{d}\) into c \((2\le c \le n)\) fuzzy clusters with cluster centers \(C=\{c_{1},c_{2},\ldots ,c_{c}\}\). Determining the number of clusters is a significant preliminary step before applying the proposed hybrid method to obtain the best cluster centers. The DB c-means algorithm divides the dataset into several candidate numbers of clusters and calculates the DB index for each; the best number of clusters is obtained by taking the ratio of intra-cluster to inter-cluster distances as the metric. If a dataset is divided into k clusters, the DB value is calculated as follows:

$$\begin{aligned} DB(k) = \frac{1}{k}\sum _{i=1}^{k}D_{i} \end{aligned}$$
(1)

where \(D_{i}\) is defined as below:

$$\begin{aligned} D_{i} = \max _{j \ne i}\left\{ \frac{\bar{d_{i}}+\bar{d_{j}}}{d_{ij}}\right\} \end{aligned}$$
(2)

here, \(d_{ij}\) represents the inter-distance, i.e., the distance between centroids i and j, while \(\bar{d_{i}}\) represents the intra-distance, i.e., the average distance between each point in cluster i and the centroid of cluster i.
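For concreteness, a minimal NumPy sketch of Eqs. (1)–(2) is given below. The assumption that `labels` and `centroids` come from a prior c-means run, and the array shapes, are illustrative.

```python
import numpy as np

def davies_bouldin(X, labels, centroids):
    """DB index of Eqs. (1)-(2): average over clusters of the worst
    (intra_i + intra_j) / inter_ij ratio."""
    k = len(centroids)
    # intra-distance: mean distance of each point to its own centroid
    intra = np.array([np.linalg.norm(X[labels == i] - centroids[i], axis=1).mean()
                      for i in range(k)])
    D = [max((intra[i] + intra[j]) / np.linalg.norm(centroids[i] - centroids[j])
             for j in range(k) if j != i)                     # Eq. (2)
         for i in range(k)]
    return sum(D) / k                                         # Eq. (1)
```

Evaluating this for k = 2, …, 10 and averaging over repeated random initializations reproduces the procedure behind Fig. 1.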

Figure 1 shows the result of evaluating the DB value for 2–10 clusters, applying the c-means algorithm with random initial centroids to the Old Faithful Geyser (OFG) dataset. The results may differ between trials, but the evaluation error is reduced by averaging over 20 repetitions. The result shows that the smallest DB value is obtained with 3 clusters.

Fig. 1 DB value for different clustering numbers with the OFG dataset

2.2 Intuitionistic and kernel FCM algorithm

This section introduces the background needed for this study: intuitionistic fuzzy logic and the kernel FCM algorithm.

2.2.1 Intuitionistic fuzzy logic

Fuzzy set theory, proposed by Zadeh, has been successfully applied in various fields (Zadeh 1999). The theory states that the degree of membership of an element in a fuzzy set is a single value between 0 and 1. In Atanassov's intuitionistic fuzzy set, however, both the degree of membership \(\mu (x)\) and the degree of non-membership \(\nu (x)\) are considered (Zhao et al. 2010). Intuitionistic fuzzy logic can describe the natural attributes of objective phenomena more delicately and comprehensively. In an intuitionistic fuzzy set A, an element x is characterized by a degree of membership \(\mu _{A}(x)\) and a degree of non-membership \(\nu _{A}(x)\), where \( \mu _{A}(x)\in [0,1] \), \(\nu _{A}(x)\in [0,1]\) and \( 0\le \mu _{A}(x) + \nu _{A}(x) \le 1 \). If \(\mu _{A}(x) = 1 - \nu _{A}(x)\) for all \(x\in A\), then A degrades into an ordinary fuzzy set. An intuitionistic fuzzy set also carries a hesitation degree \(\pi _{A}(x)\), defined as below:

$$\begin{aligned} \pi _{A}(x) = 1 - \mu _{A}(x) - \nu _{A}(x) \end{aligned}$$
(3)

Obviously, \(\pi _{A}(x)\in [0,1]\). Therefore, the degree of membership in the IFCM algorithm is calculated as below:

$$\begin{aligned} \mu _{ij}^{*} = \mu _{ij} + \pi _{ij} \end{aligned}$$
(4)

In this paper, Yager's intuitionistic fuzzy complement is adopted, which gives the hesitation degree below:

$$\begin{aligned} \pi _{ij} = 1 - \mu _{ij} - \left( 1 - \mu _{ij}^{\alpha }\right) ^{1/\alpha },\quad -1<\alpha <0 \end{aligned}$$
(5)

where \(\pi _{ij}\) is Yager's hesitation degree in the IFCM algorithm. From Yager's intuitionistic fuzzy complement, \(\nu _{ij}=(1-\mu _{ij}^{\alpha })^{1/\alpha }\), and \(\pi _{ij}=1-\mu _{ij}-\nu _{ij}\), we obtain Eq. (5). Hence, \(\mu _{ij}^{*}\) is given below:

$$\begin{aligned} \mu _{ij}^{*} = 1 - \left( 1 - \mu _{ij}^{\alpha }\right) ^{1/\alpha },\quad -1<\alpha <0 \end{aligned}$$
(6)

where \(\alpha \) is a parameter of the Yager-generating function. \(\alpha \) is set to \(-1/2\) in this paper because tests show better results when \(\alpha \) equals \(-1/2\). \(\mu _{ij}^{*}\) is called Yager's intuitionistic membership degree in the IFCM algorithm.
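As a small illustration, Eq. (6) can be transcribed directly. The clipping of \(\mu \) away from zero and of the output to [0, 1] are numerical safeguards added in this sketch, not part of the original formula.

```python
import numpy as np

def yager_membership(mu, alpha=-0.5):
    """Eq. (6): mu* = 1 - (1 - mu^alpha)^(1/alpha), with alpha = -1/2."""
    mu = np.clip(mu, 1e-12, 1.0)          # guard: mu^alpha is undefined at mu = 0
    mu_star = 1.0 - (1.0 - mu**alpha) ** (1.0 / alpha)
    return np.clip(mu_star, 0.0, 1.0)     # safeguard, not part of Eq. (6)
```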

2.2.2 Kernel FCM algorithm

Differing from conventional clustering algorithms, fuzzy clustering is described by a fuzzy matrix \(\mu \) with n rows and c columns, where n is the number of data objects and c is the number of clusters. The element \(\mu _{ij}\) in the ith row and jth column of \(\mu \) indicates the degree of membership of the ith object in the jth cluster. FCM aims to minimize the sum of distances between data points and their respective parent cluster centroids, defined as below:

$$\begin{aligned} \begin{array}{ll} \min &{} J = \sum \limits _{i=1}^{n} \sum \limits _{j=1}^{c}\left[ \mu _{ij}\right] ^{m}\times d_{ij}\left( x_{i},c_{j}\right) \\ \hbox {s.t.} &{} \sum \limits _{j=1}^{c}\mu _{ij}=1,\quad \forall i = 1,2,\ldots ,n \\ \quad &{} 0\le \mu _{ij}\le 1, \quad \forall i = 1,2,\ldots ,n;\ \forall j=1,2,\ldots ,c \end{array} \end{aligned}$$

where the fuzzy parameter m \((m>1)\) determines the amount of fuzziness of the resulting clusters. \(\mu _{ij}\) and \(c_{j}\) are calculated by Eqs. (7) and (8), respectively, and \(d_{ij}(x_{i},c_{j})=\Vert x_{i}-c_{j}\Vert \) is the Euclidean distance from object \(x_{i}\) to the cluster center \(c_{j}\).

$$\begin{aligned} \mu _{ij}= & {} \frac{1}{\sum _{k=1}^{c}\left( \frac{d_{ij}}{d_{ik}}\right) ^{\frac{2}{m-1}}} \end{aligned}$$
(7)
$$\begin{aligned} c_{j}= & {} \frac{\sum _{i=1}^{n}\left[ \mu _{ij}\right] ^{m}x_{i}}{\sum _{i=1}^{n}\left[ \mu _{ij}\right] ^{m}} \end{aligned}$$
(8)

Therefore, in the traditional FCM algorithm, the membership degrees and the cluster centers are obtained by alternating between Eqs. (7) and (8).
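To make the alternating updates concrete, a minimal FCM sketch built directly on Eqs. (7) and (8) might look as follows. The random initialization, tolerance and iteration cap are illustrative assumptions.

```python
import numpy as np

def fcm(X, c, m=2.0, max_iter=100, tol=1e-5, rng=None):
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    centers = X[rng.choice(n, size=c, replace=False)].astype(float)  # random init
    for _ in range(max_iter):
        # Euclidean distance from every point to every center, shape (n, c)
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)                     # avoid division by zero
        # Eq. (7): mu_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1))
        mu = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1))).sum(axis=2)
        # Eq. (8): c_j = sum_i mu_ij^m x_i / sum_i mu_ij^m
        w = mu ** m
        new_centers = (w.T @ X) / w.sum(axis=0)[:, None]
        if np.linalg.norm(new_centers - centers) < tol:
            centers = new_centers
            break
        centers = new_centers
    return centers, mu
```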

In the original FCM algorithm, the distance between two independent data points x and y is the Euclidean distance \(d(x,y) = \Vert x-y\Vert \), the simplest and most widely used distance measurement. In this paper, a kernel function is used as the distance measurement instead (Fan et al. 2009). Define a nonlinear map \(\varPhi :x \rightarrow \varPhi (x)\in F \), where \( x\in X \); X represents the data space, and F represents the transformed feature space of higher, possibly infinite, dimension. Frequently used kernel functions are listed as follows:

  • Gaussian function

    $$\begin{aligned} K(x,y) = \exp \left( -\Vert x-y\Vert ^{2}/\sigma ^{2}\right) , \end{aligned}$$
    (9)
  • Radial basis (RBF) functions

    $$\begin{aligned} \begin{array}{c} \displaystyle K(x,y) = \exp \left( -\sum _{i}|x_{i}^{a}-y_{i}^{a}|^{b}/\sigma ^{2}\right) \\ \displaystyle \quad \left( 0 < b \le 2\right) \end{array} \end{aligned}$$
    (10)
  • Hyper tangent function

$$\begin{aligned} K(x,y) = 1-\tanh \left( \Vert x-y\Vert ^{2}/\sigma ^{2}\right) , \end{aligned}$$
    (11)

where x and y represent two independent data points and \(\sigma \) represents the standard deviation, which is set in advance. Obviously, the RBF function reduces to the Gaussian function when \( a=1 \) and \( b=2 \). Taking a kernel function as the metric has an advantage over the conventional Euclidean distance in distinguishing overlapping data points.
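A direct transcription of the three kernels in Eqs. (9)–(11), with \(\sigma \) fixed in advance (here to \(\sqrt{2}\), the value used in Sect. 4.2):

```python
import numpy as np

SIGMA = np.sqrt(2)  # standard deviation, set in advance (see Sect. 4.2)

def gaussian(x, y, sigma=SIGMA):                 # Eq. (9)
    return np.exp(-np.sum((x - y) ** 2) / sigma**2)

def rbf(x, y, a=1.0, b=2.0, sigma=SIGMA):        # Eq. (10), 0 < b <= 2
    return np.exp(-np.sum(np.abs(x**a - y**a) ** b) / sigma**2)

def hyper_tangent(x, y, sigma=SIGMA):            # Eq. (11)
    # K(x, x) = 1 and K decreases toward 0 with distance, so the
    # kernel distance 2(1 - K) of Eq. (13) stays nonnegative
    return 1.0 - np.tanh(np.sum((x - y) ** 2) / sigma**2)
```

With a = 1 and b = 2, `rbf` reduces to `gaussian`, as noted above.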

Further improvement is achieved by adding a kernel function and an intuitionistic fuzzy set to the conventional FCM. The kernel intuitionistic fuzzy c-means (KIFCM) algorithm maps the data point \(x_{i}\) and centroid \(c_{j}\) into the kernel space as \({\varnothing }(x_{i})\) and \({\varnothing }(c_{j})\), respectively. The sum of distances between data points and their respective parent cluster centroids that KIFCM aims to minimize is defined as follows:

$$\begin{aligned} \begin{array}{ll} \min &{} J = \sum \limits _{i=1}^{n}\sum \limits _{j=1}^{c}\left[ \mu _{ij}\right] ^{m}\times \Vert {\varnothing }\left( x_{i}\right) -{\varnothing }\left( c_{j}\right) \Vert ^{2} \\ \hbox {s.t.} &{} \sum \limits _{j=1}^{c}\mu _{ij}=1,\quad \forall i = 1,2,\ldots ,n \\ \quad &{} 0\le \mu _{ij}\le 1 , \quad \forall i = 1,2,\ldots ,n;\ \forall j=1,2,\ldots ,c \end{array} \end{aligned}$$

where \(\Vert {\varnothing }(x_{i})-{\varnothing }(c_{j})\Vert ^{2}\) is the squared distance between \({\varnothing }(x_{i})\) and \({\varnothing }(c_{j})\) measured by the kernel function, calculated using Eq. (12):

$$\begin{aligned} \Vert {\varnothing }\left( x_{i}\right) -{\varnothing }\left( c_{j}\right) \Vert ^{2} = K\left( x_{i},x_{i}\right) + K\left( c_{j},c_{j}\right) - 2K\left( x_{i},c_{j}\right) \end{aligned}$$
(12)

where K(x, y) is the kernel function. This paper applies the hyper tangent function, defined in Eq. (11), as the kernel. The kernel metric taken as the distance measurement is then defined as Eq. (13):

$$\begin{aligned} d_{ij}\left( x_{i},c_{j}\right) ^{2} = \Vert \phi \left( x_{i}\right) - \phi \left( c_{j}\right) \Vert ^{2} = 2\left( 1-K\left( x_{i},c_{j}\right) \right) \end{aligned}$$
(13)

The formulations used to calculate the degree of membership \(\mu _{ij}\) and clustering center \(c_{j}\) are defined as Eqs. (14) and (15), respectively.

$$\begin{aligned} \mu _{ij}= & {} \frac{\left( 1/\left( 1-K\left( x_{i},c_{j}\right) \right) \right) ^\frac{1}{m-1}}{\sum _{k=1}^{c}\left( 1/\left( 1-\left( K\left( x_{i},c_{k}\right) \right) \right) \right) ^\frac{1}{m-1}} \end{aligned}$$
(14)
$$\begin{aligned} c_{j}= & {} \frac{\sum _{i=1}^{n}\left[ \mu _{ij}\right] ^{m}K\left( x_{i},c_{j}\right) x_{i}}{\sum _{i=1}^{n}\left[ \mu _{ij}\right] ^{m}K\left( x_{i},c_{j}\right) } \end{aligned}$$
(15)

2.2.3 Kernel intuitionistic FCM algorithm

This study substitutes the hyper tangent kernel for the Euclidean distance as the distance measurement. Based on the intuitionistic fuzzy set, the kernel intuitionistic FCM algorithm, which evolves conventional fuzzy logic into the intuitionistic fuzzy set, can be obtained (Lin 2014). The membership \(\mu _{ij}^{*}\) is calculated according to Eq. (16), and the new cluster center \(c_{j}^{*}\) by Eq. (17).

$$\begin{aligned} \mu _{ij}= & {} \frac{\left( 1/\left( 1-K\left( x_{i},c_{j}\right) \right) \right) ^\frac{1}{m-1}}{\sum _{k=1}^{c}\left( 1/\left( 1-\left( K\left( x_{i},c_{k}\right) \right) \right) \right) ^\frac{1}{m-1}}\nonumber \\ \mu _{ij}^{*}= & {} 1 - \left( 1 - \mu _{ij}^{-1/2}\right) ^{-2} \end{aligned}$$
(16)
$$\begin{aligned} c_{j}^{*}= & {} \frac{\sum _{i=1}^{n}\left[ \mu _{ij}^{*}\right] ^{m}K\left( x_{i},c_{j}\right) x_{i}}{\sum _{i=1}^{n}\left[ \mu _{ij}^{*}\right] ^{m}K\left( x_{i},c_{j}\right) } \end{aligned}$$
(17)

The intuitionistic cluster center is obtained by Eq. (17). The KIFCM algorithm is obtained by combining the kernel function and the intuitionistic fuzzy set with the FCM algorithm, and is stated in Algorithm 1.

Algorithm 1 The KIFCM algorithm

When the relative change in the centroid values becomes small, the KIFCM algorithm is terminated.
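Since Algorithm 1 is given as a figure, a minimal vectorized sketch of one possible implementation is shown below, combining the hyper tangent kernel (Eq. 11) with the updates of Eqs. (14), (16) and (17). The initialization, tolerance and the clipping of \(\mu ^{*}\) to [0, 1] are assumptions of this sketch.

```python
import numpy as np

def kernel_matrix(X, centers, sigma=np.sqrt(2)):
    """Hyper tangent kernel K(x_i, c_j) for all pairs, shape (n, c). Eq. (11)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return 1.0 - np.tanh(d2 / sigma**2)

def kifcm(X, c, m=2.0, alpha=-0.5, max_iter=100, tol=1e-5, init=None, rng=None):
    rng = np.random.default_rng(rng)
    if init is None:
        centers = X[rng.choice(len(X), size=c, replace=False)].astype(float)
    else:
        centers = np.asarray(init, dtype=float)
    for _ in range(max_iter):
        K = np.clip(kernel_matrix(X, centers), 1e-12, 1.0 - 1e-12)
        inv = (1.0 / (1.0 - K)) ** (1.0 / (m - 1))
        mu = inv / inv.sum(axis=1, keepdims=True)            # Eq. (14)
        mu_star = 1.0 - (1.0 - mu**alpha) ** (1.0 / alpha)   # Eq. (16)
        mu_star = np.clip(mu_star, 0.0, 1.0)                 # numerical safeguard
        w = (mu_star ** m) * K
        new_centers = (w.T @ X) / w.sum(axis=0)[:, None]     # Eq. (17)
        if np.linalg.norm(new_centers - centers) < tol:      # small relative change
            centers = new_centers
            break
        centers = new_centers
    return centers, mu_star
```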

3 Hybrid method with STA and KIFCM

The hybrid method proposed in this paper is described in this section. The first part covers STA, a global search optimization algorithm. The second part covers the STA-KIFCM algorithm, which is proposed for the first time in this paper and shows good performance on clustering problems.

3.1 State transition algorithm

STA is a novel intelligent global optimization algorithm proposed by Huang et al. (2019), inspired by an understanding of temporal and spatial correlation. STA has been widely used to solve global search problems (Zhou et al. 2018, 2019; Huang et al. 2019; Zhou et al. 2019a, b; Han et al. 2018). Its heuristic ideas enable it to search large and discontinuous search spaces without getting trapped in local optima. Based on the current state \(x_{k}\), the generation of a new state \(x_{k+1}\) in STA can be described in the unified form:

$$\begin{aligned} \left\{ \begin{array}{l} x_{k+1} = A_{k}x_{k} + B_{k}u_{k} \\ y_{k+1} = f\left( x_{k+1}\right) \end{array} \right. \end{aligned}$$
(18)

where \(x_{k}=[x_{k1},x_{k2},\ldots ,x_{kn}]^{{\mathrm{T}}}\) stands for an n-dimensional state, corresponding to a solution of the optimization problem; \(u_{k}\) is a function of \(x_{k}\) and historical states; \(A_{k}\) and \(B_{k}\) are state transition matrices, usually realized as state transformation operators; \(f(\cdot )\) is the objective (fitness) function, and \(y_{k+1}\) is its value at \(x_{k+1}\). Drawing on state-space transformations, four special state transition operators are designed to generate continuous solutions for an optimization problem. They can be described as below:

  • Rotation transformation

    $$\begin{aligned} x_{k+1} = x_{k} + \alpha \frac{1}{n\Vert x_{k}\Vert _{2}} R_{r}x_{k}, \end{aligned}$$
    (19)

    where \(\alpha \) is a positive constant, called the rotation factor; \(R_{r}\in {\mathbb {R}}^{n \times n}\) is a random matrix with its entries being uniformly distributed random variables on the interval \([-1,1]\), and \(\Vert \cdot \Vert _{2}\) is the 2-norm of a vector. The rotation transformation searches within a hypersphere of maximal radius \(\alpha \).

  • Translation transformation

    $$\begin{aligned} x_{k+1} = x_{k} + \beta R_{t}\frac{x_{k}-x_{k-1}}{\Vert x_{k}-x_{k-1}\Vert _{2}}, \end{aligned}$$
    (20)

    where \(\beta \) is a positive constant, called the translation factor; \(R_{t}\in {\mathbb {R}}\) is a uniformly distributed random variable on the interval [0, 1]. The translation transformation searches along the line from \(x_{k-1}\) to \(x_{k}\), starting at \(x_{k}\), with maximum length \(\beta \).

  • Expansion transformation

    $$\begin{aligned} x_{k+1} = x_{k} + \gamma R_{e}x_{k}, \end{aligned}$$
    (21)

    where \(\gamma \) is a positive constant, called the expansion factor; \(R_{e}\in {\mathbb {R}}^{n \times n}\) is a random diagonal matrix with entries obeying the Gaussian distribution. The expansion transformation can expand the entries of \(x_{k}\) to the range \((-\infty , +\infty )\), searching the whole space.

  • Axesion transformation

    $$\begin{aligned} x_{k+1} = x_{k} + \delta R_{a}x_{k}, \end{aligned}$$
    (22)

    where \(\delta \) is a positive constant, called the axesion factor; \(R_{a}\in {\mathbb {R}}^{n\times n}\) is a random diagonal matrix with entries obeying the Gaussian distribution and only one random position having a nonzero value. The axesion transformation aims to search along the axes, strengthening single-dimensional search.
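A sketch of the four operators in Eqs. (19)–(22), with states as NumPy vectors; the helper names are illustrative:

```python
import numpy as np

def rotation(x, alpha=1.0):
    # Eq. (19): search inside a hypersphere of maximal radius alpha
    n = len(x)
    Rr = np.random.uniform(-1, 1, size=(n, n))
    return x + alpha / (n * np.linalg.norm(x)) * Rr @ x

def translation(x, x_old, beta=1.0):
    # Eq. (20): search along the line from x_old to x, maximum length beta
    Rt = np.random.uniform(0, 1)
    return x + beta * Rt * (x - x_old) / np.linalg.norm(x - x_old)

def expansion(x, gamma=1.0):
    # Eq. (21): diagonal Gaussian scaling, searching the whole space
    Re = np.diag(np.random.randn(len(x)))
    return x + gamma * Re @ x

def axesion(x, delta=1.0):
    # Eq. (22): Gaussian perturbation along a single random axis
    Ra = np.zeros((len(x), len(x)))
    i = np.random.randint(len(x))
    Ra[i, i] = np.random.randn()
    return x + delta * Ra @ x
```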

With the state transformation operators, sampling technique and update strategy, the procedure of the state transition algorithm can be described in Algorithm 2.

Algorithm 2 The state transition algorithm

When a candidate improves on the current Best, the translation transformation is applied to update Best; this is the greedy update strategy. Generally, reaching the maximum number of iterations is taken as the terminating condition. In Algorithm 2, SE is called the search enforcement, representing the number of transformations performed by a certain operator. A changeable rotation factor can not only speed up the search for the optimal solution but also increase the accuracy of the solution. For more detailed steps of STA, the STA toolbox can be downloaded via the following link: https://www.mathworks.com/matlabcentral/fileexchange/52498-state-transition-algorithm
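Using the operator sketches above, a bare-bones loop consistent with this description might read as follows. The per-operator sampling, the improvement test, and the omission of the rotation-factor decay are simplifying assumptions of this sketch; the toolbox linked above gives the full algorithm.

```python
def sta_minimize(f, x0, se=30, max_iter=1000):
    """Minimize f by sampling SE candidates per operator, keeping the best."""
    best = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        for op in (expansion, rotation, axesion):
            new = min((op(best) for _ in range(se)), key=f)
            if f(new) < f(best):                 # greedy update of Best
                # translation searches the line between the old and new Best
                t = min((translation(new, best) for _ in range(se)), key=f)
                best = t if f(t) < f(new) else new
    return best
```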

Algorithm 3 The STA-KIFCM method

3.2 STA-KIFCM algorithm

The good performance of the KIFCM algorithm has been shown in many studies (Pena et al. 1999). However, the results of KIFCM are not stable, since KIFCM is highly dependent on the initial cluster centers: its search is a local procedure. Previously, many studies focused on designing novel cluster initializations (Babu and Murty 1993). STA performs well in generating heuristic, stochastic initial cluster centers (Zhou et al. 2016); its four operators generate diverse candidate cluster centers, after which KIFCM can obtain the final cluster centers rapidly and reasonably. The hybrid method can therefore generate the optimal cluster centers accurately: it not only avoids getting trapped in local optima but also improves accuracy efficiently. The procedure of the hybrid method is stated in Algorithm 3.

Several stopping rules are used. The termination condition in Step 3.4 is reaching the maximum number of iterations. Step 4.4 terminates when the relative change in the degree of membership becomes small. Step 5 stops when the objective function cannot be decreased any further. Figure 2 shows the overall flowchart of STA-KIFCM.
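A hypothetical wiring of the two stages, reusing `kernel_matrix`, `kifcm` and `sta_minimize` from the sketches above: STA searches for good initial centroids by minimizing the KIFCM objective, and KIFCM then refines them. The flattening of the cluster centers into one STA state vector and the iteration budget are assumptions of this sketch, not details of Algorithm 3.

```python
import numpy as np

def kifcm_objective(flat_centers, X, c, m=2.0):
    """KIFCM objective: sum of mu^m times the kernel distance of Eq. (13)."""
    centers = flat_centers.reshape(c, -1)
    K = np.clip(kernel_matrix(X, centers), 1e-12, 1.0 - 1e-12)
    inv = (1.0 / (1.0 - K)) ** (1.0 / (m - 1))
    mu = inv / inv.sum(axis=1, keepdims=True)
    return float(((mu ** m) * 2.0 * (1.0 - K)).sum())

def sta_kifcm(X, c, m=2.0, rng=None):
    rng = np.random.default_rng(rng)
    # initial state: c data points drawn at random, flattened for STA
    x0 = X[rng.choice(len(X), size=c, replace=False)].ravel()
    best = sta_minimize(lambda z: kifcm_objective(z, X, c, m), x0, max_iter=50)
    # refine the STA centroids with the KIFCM updates
    return kifcm(X, c, m=m, init=best.reshape(c, -1))
```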

4 Experiment results and discussion

This section reports experiments on five benchmark datasets to evaluate the performance of the fuzzy clustering methods. The comparison algorithms are introduced in Sect. 4.1, followed by the experimental setup, results and discussion in the rest of this section.

4.1 Comparison algorithms

In this paper, three evolutionary algorithms are adopted to generate the initial centroids for KIFCM. Two typical evolutionary algorithms, GA and PSO, are compared with STA in overcoming the drawback that KIFCM is highly dependent on its initial centroids. All parameters of these algorithms follow the recommended settings in the corresponding references (Kuo et al. 2018). The proposed STA-KIFCM is described in detail in Fig. 2. The function of GA and PSO is similar to that of STA in generating the initial centroids for KIFCM; the main differences between these evolutionary algorithms lie in how candidate solutions are generated and how the optimal solution is updated. Detailed descriptions of the differences between GA-KIFCM, PSO-KIFCM and STA-KIFCM are given as follows.

  (1)

    GA-KIFCM: candidate solutions are generated by the crossover and mutation operators, and the optimal solution is updated by the greedy rule (Whitley 1994; Hosseinabadi et al. 2019; Kuo et al. 2018).

  (2)

    PSO-KIFCM: the velocity and position updating formulas are adopted to generate candidate solutions, and updating of the optimal solution can be summarized as learning from the best particle (Filho et al. 2015; Deng et al. 2019; Kuo et al. 2018).

In addition, the frameworks of GA-KIFCM and PSO-KIFCM are similar to that of STA-KIFCM illustrated in Fig. 2; GA and PSO simply take the place of STA in GA-KIFCM and PSO-KIFCM, respectively.

Fig. 2 The flowchart of the STA-KIFCM method

4.2 Experimental setup

Table 3 shows the parameters of the PSO-KIFCM and GA-KIFCM methods, set as in Kuo et al. (2018). The parameters are set so that the number of objective function evaluations is the same for all methods. The preparation for the experiments is outlined below:

Table 1 Datasets characteristics
Table 2 The comparison of results obtained by each algorithm
  • Parameters of STA-KIFCM settings

    According to the previous literature, this study sets the weighting exponent m to 2 (Pedrycz and Rai 2008). The standard deviation \(\sigma \) in the kernel function is chosen as \(\sqrt{2}\). The parameters \(\beta \), \(\gamma \), \(\delta \) in STA are set to 1, and the parameter \(\alpha \) is decreased linearly from 1 to \(e^{-4}\) over the course of the iterations to facilitate global exploration in the early stages and exploitation in the later stages. The search enforcement (SE) in STA is set to 30, meaning that 30 candidate solutions are generated in every iteration.

  • Datasets description

    This part adopts five benchmark clustering datasets, which can be downloaded from the UCI machine learning repository with the exception of the Old Faithful Geyser (OFG) dataset, to evaluate the performance of the proposed algorithm. Many previous studies have used these datasets in various fields, including biology, oncology, education, medicine, chemistry and agriculture (Michielssen et al. 1992). Table 1 gives a brief overview of the datasets' characteristics.

    • Iris dataset, introduced by Fisher (1936), consists of three species of Iris flowers (Virginica, Versicolour and Setosa) and includes a total of 150 instances with 4 features (length and width of the sepals and petals).

    • Wine dataset consists of wines derived from three cultivars grown in the same region and includes a total of 178 instances with 13 features (alcohol, malic acid, ash, alcalinity of ash, magnesium, total phenols, flavanoids, nonflavanoid phenols, proanthocyanins, color intensity, hue, OD280/OD315 of diluted wines and proline).

    • Tae dataset consists of evaluations of teaching performance over three regular semesters and two summer semesters for 151 teaching assistant (TA) assignments at the Statistics Department of the University of Wisconsin-Madison. The scores were divided into three roughly equal-sized categories ("low", "medium" and "high") to form the class variable.

    • Seed dataset consists of kernels belonging to three different varieties of wheat (Kama, Rosa and Canadian), 70 elements each, randomly selected for the experiment, and includes seven geometric parameters of wheat kernels (area, perimeter, compactness, length of kernel, width of kernel, asymmetry coefficient and length of kernel groove).

    • Old Faithful Geyser (OFG) dataset describes the waiting time between eruptions and the duration of eruptions for the Old Faithful geyser in Yellowstone National Park, Wyoming, USA (Hutchinson et al. 1997). The data frame contains 272 observations on 2 numeric variables: eruption time in minutes and waiting time to the next eruption.

  • Performance indices

    The performance indices used to evaluate the effectiveness of the fuzzy clustering methods are reported in Table 2. They are listed as follows:

Table 3 Parameter settings for PSO-KIFCM and GA-KIFCM methods
Fig. 3 The clustering result of the OFG dataset

  (a)

    Inter-distance. The sum of distances between data points and their respective parent cluster centroids; smaller values are desirable and indicate greater compactness of the clustering. The inter-distance is formulated as follows:

    $$\begin{aligned} d=\sum _{i=1}^{k}\sum _{x_{j}\in S_{i}}d_{ji}(x_{j},c_{i})^{2} \end{aligned}$$
    (23)

    where k represents the number of cluster centers, \(S_{i}\) represents the ith cluster and \(x_{j}\) represents the jth point belonging to cluster \(S_{i}\). \(d_{ji}(x_{j},c_{i})^{2}\) represents the distance between point \(x_{j}\) and \(c_{i}\). In FCM and IFCM, the squared Euclidean distance \(d_{ji}(x_{j},c_{i})^{2}=\Vert x_{j}-c_{i}\Vert ^{2}\) is taken as the distance measurement, whereas the kernel metric \(d_{ji}(x_{j},c_{i})^{2}=\Vert \phi (x_{j})-\phi (c_{i})\Vert ^{2}\) is used in KFCM, KIFCM, GA-KIFCM, PSO-KIFCM and STA-KIFCM.

  (b)

    Time-consuming. Running time is commonly used to evaluate the rapidity of an algorithm; less time consumed with high validity is desirable.

  (c)

    Accuracy. Each data point in the tested datasets is labeled according to its membership value for each cluster; the final cluster label of a data point is the cluster for which it has the highest membership value. The same data point may be labeled differently in different repetitions. The accuracy of the algorithm's labeling is the ratio of correct labels to total labels. A sketch of indices (a) and (c) is given after this list.
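As referenced above, a sketch of indices (a) and (c). Mapping each cluster to the majority ground-truth class inside it is one common convention and an assumption of this sketch; `y_true` is assumed to hold integer class labels.

```python
import numpy as np

def inter_distance(X, labels, centers, kernel=None):
    """Eq. (23): sum of squared distances to each point's own centroid."""
    total = 0.0
    for i, ci in enumerate(centers):
        P = X[labels == i]
        if kernel is None:                       # Euclidean (FCM, IFCM)
            total += ((P - ci) ** 2).sum()
        else:                                    # kernel metric, Eq. (13)
            total += sum(2.0 * (1.0 - kernel(p, ci)) for p in P)
    return total

def accuracy(labels, y_true):
    """Ratio of correct labels; clusters mapped to their majority true class."""
    correct = 0
    for i in np.unique(labels):
        members = y_true[labels == i]            # true classes inside cluster i
        correct += (members == np.bincount(members).argmax()).sum()
    return correct / len(y_true)
```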

4.3 Results and discussion

The results of the experiments on the five benchmark datasets are reported in this part. The results of the proposed STA-KIFCM method compared with the other methods are shown in Table 2. To reduce error and contingency, each algorithm is executed 100 times. The comparison shows that low accuracy is achieved by the FCM, IFCM and KFCM algorithms on the OFG dataset; the reason is that many points in the OFG dataset overlap and lie close to each other. The original points of the OFG dataset are shown in Fig. 3. Although the FCM and IFCM algorithms cost less time, they achieve lower accuracy than the other algorithms. The KFCM and KIFCM algorithms cost almost the same time because their implementations differ little: the only difference between them is the formula for calculating the degree of membership. STA-KIFCM obtains high accuracy and short inter-distance in less time than PSO-KIFCM and GA-KIFCM. Figure 4 compares the accuracy of the methods. Although the performance of PSO-KIFCM and GA-KIFCM is similar to that of STA-KIFCM on the Iris, Wine and Tae datasets, STA-KIFCM has the highest accuracy on these three datasets; moreover, on the Seed and OFG datasets, STA-KIFCM not only attains the highest accuracy but does so by an obvious margin over GA-KIFCM and PSO-KIFCM. We can therefore conclude that STA-KIFCM performs better than PSO-KIFCM and GA-KIFCM on all tested datasets. In obtaining the optimal initial cluster centroids for KIFCM, STA achieves the highest accuracy with the lowest time consumption on the five datasets, showing that STA outperforms GA and PSO in both speed and solution quality when solving the optimization problem in clustering.

Figure 3, obtained by the STA-KIFCM method, displays the clustering result for the OFG dataset, which can be clustered into 3 clusters as verified in Sect. 2.1. Many data points in cluster 2 lie close to points in cluster 3, so it is hard to partition the data in the intersection between clusters 2 and 3 with the Euclidean distance measurement, and this intersection cannot easily be distinguished by visual inspection. The STA-KIFCM method, however, classifies these points and finds accurate cluster centers, making the boundary between clusters 2 and 3 easy to distinguish.

Fig. 4 The comparison of accuracy for different methods

5 Conclusions

In this paper, we integrate KIFCM with STA to overcome the shortcomings of the conventional FCM. Using the novel intelligent global search algorithm STA to provide better initial centroids improves the results of KIFCM. The parameters of STA control the trade-off between local and global search: if the data points are distributed over a large area, wide exploration is required, whereas if the data points are located within a relatively small area, deep exploitation is needed. STA prevents the KIFCM algorithm from getting trapped in local optima because it finds optimal initial cluster centroids for KIFCM. The proposed STA-KIFCM method is evaluated on five well-known clustering datasets and compared with PSO-KIFCM and GA-KIFCM. The experimental results show that STA-KIFCM is efficient and yields very encouraging results in terms of the quality of the cluster centers found, thanks to the four operators of STA. The proposed method has intrinsic advantages: clusters are more internally homogeneous and more distinct from each other, providing a better partitioning of the testing datasets. Based on these results, further study can focus on three main perspectives. The first is reducing the time complexity of the hybrid method. The second is applying the proposed clustering method to real-world problems in industrial processes. The third is replacing the kernel function with different similarity measurements to study the influence of the similarity measurement on KIFCM clustering results.