1 Introduction

Cluster analysis is an effective technique in data mining (Kushwaha et al. 2017; Kushwaha and Pant 2018) and machine learning that can be applied to many areas such as image processing, pattern recognition, signal processing, and other fields of engineering (Everitt et al. 2011). It is an unsupervised learning method that arranges data objects into multiple clusters or groups.

1.1 Clustering algorithms

From an optimization point of view, clustering can be formulated as a particular kind of NP-hard grouping problem (Nanda and Panda 2014). The goal of data clustering is to group the data objects into a number of clusters such that the following conditions are satisfied:

  • Homogeneity: Data objects within the same cluster are as similar to each other as possible.

  • Heterogeneity: Data objects belonging to different clusters are as dissimilar to each other as possible (Hruschka et al. 2009; Nanda and Panda 2014).

Clustering methods can be divided into two categories: partitional and hierarchical (Xu et al. 2005; Everitt et al. 2011). Partitional clustering algorithms divide the data into multiple groups or clusters based on similarity or dissimilarity among the data objects (Xu et al. 2005), where a distance-based similarity measure is used to compare data points. Among the various partitional clustering algorithms, k-means is the most popular. Unlike partitional algorithms, hierarchical algorithms generate a nested, tree-like structure (dendrogram) of clusters. Hierarchical clustering can further be divided into two approaches: top-down and bottom-up. ROCK (Guha et al. 2000) is one of the most widely used hierarchical clustering algorithms for categorical data. In terms of execution time, partitional clustering is faster than hierarchical clustering because of its lower algorithmic complexity (Han et al. 2012).

Partitional and hierarchical clustering can also be classified into hard and soft clustering techniques. In the former, each object belongs to exactly one cluster at a time, while in the latter each data object partially belongs to one or more clusters with different membership values in the range [0, 1]. The sum of the membership values for each data point must be one (Xu and Wunsch 2005).

A well-established soft clustering approach is fuzzy C-means (FCM), proposed by Bezdek et al. (1984). It is a widely used clustering algorithm known for its simplicity and applicability. FCM assigns every data point to multiple clusters by computing a membership matrix, which makes it particularly useful for datasets with overlapping clusters. FCM has obtained satisfactory results in many application areas, including pattern recognition (Jain and Law 2005). However, a major drawback of FCM is that its performance depends strongly on the choice of initial clusters, which increases the risk of it becoming trapped in a local minimum.

FCM is effective for spherical clusters but does not perform well for more general cluster shapes, for which the kernel-based fuzzy C-means (KFCM) algorithm proposed by Shen et al. (2006) is more useful. In KFCM, the Euclidean distance metric used in the earlier algorithms is replaced with a kernel metric; the kernel function is applied to achieve a better mapping for nonlinearly separable datasets.

However, for such kernel-based methods, a crucial step is the combination or selection of the best kernels among an extensive range of possibilities. This step is often heavily influenced by the prior knowledge about the data and by the patterns that we expect to discover (Shawe-Taylor and Cristianini 2004).

Researchers have shown that the application of meta-heuristics such as genetic algorithms (Bandyopadhyay and Maulik 2002), ant colony optimization (Shelokar et al. 2004) and evolutionary strategies (Babu and Murty 1994) can help reduce the initialization problem in clustering. The literature also indicates that combining fuzzy logic with meta-heuristics is an effective way of dealing with clustering problems. For example, Pang et al. (2004) proposed a fuzzy discrete particle swarm optimization (Fuzzy-PSO) for solving the travelling salesman problem, in which the position and velocity of the particles are redefined to represent the fuzzy relation between the data objects and the clusters. Izakian and Abraham (2011) proposed a hybrid fuzzy clustering algorithm (FCM–PSO) that combines FCM and Fuzzy-PSO and provides better results than either Fuzzy-PSO or FCM alone.

1.2 Motivation

Some of the major challenges in clustering algorithms are the ability to deal with overlapping clusters and sensitivity to the initial position of cluster centroids. The aim of this paper is to propose a novel fuzzy magnetic optimization clustering (Fuzzy-MOC) algorithm that addresses these problems. In the proposed algorithm, candidate cluster centroids are modelled as magnetic particles that move around the feature space under the influence of a magnetic field and combine if they come close enough to each other. The aim is to find the best position of each cluster centroid (cluster representative). To evaluate the performance of Fuzzy-MOC, experiments are conducted on synthetic as well as benchmark health datasets, and the results are compared with those obtained by three other state-of-the-art fuzzy clustering algorithms.

The remainder of this paper is organized as follows: Sect. 2 presents the background related to the proposed method. Section 3 provides the details of the proposed algorithm. Experimental results on different datasets are presented in Sect. 4. Finally, the paper concludes with Sect. 5.

2 Background

This section describes the basic fuzzy C-means algorithm and scale-free networks, and provides an introduction to the magnetic optimization algorithm.

2.1 Fuzzy C-means algorithm

In the most general form of the FCM algorithm, a data set of \({n_{data}}\) objects \(o=\{ {o_1},{o_2}, \ldots ,{o_{{n_{data}}}}\}\) is divided into \(k\) (\(1<k<{n_{data}}\)) fuzzy clusters, each represented by a centroid (cluster prototype) \({z_j}\). Each object is a \(Dim\)-dimensional vector \({o_i}=\{ {o_{i1}},{o_{i2}}, \ldots ,{o_{iDim}}\}\). A fuzzy membership matrix \(\mu\) with \({n_{data}}\) rows and \(k\) columns is constructed, where \({\mu _{ij}}\) indicates the degree of membership of object \(i\) in the \(j\)th cluster; the higher the value of \({\mu _{ij}}\), the more strongly object \(i\) belongs to cluster \(j\). The matrix \(\mu\) satisfies the following constraints:

$$0<\mathop \sum \limits_{{i=1}}^{{{n_{data}}}} {\mu _{ij}}<{n_{data}}\quad \forall j=1,2, \ldots ,k$$
(1)
$${\mu _{ij}} \in [0,1]\quad \forall i=1,2, \ldots ,{n_{data}};\quad \forall j=1,2, \ldots ,k$$
(2)
$$\mathop \sum \limits_{{j=1}}^{k} {\mu _{ij}}=1\quad \forall i=1,2, \ldots ,{n_{data}}$$
(3)

The goal of the FCM algorithm is to minimize the following objective function:

$${J_{FCM}}=\mathop \sum \limits_{{j=1}}^{k} \mathop \sum \limits_{{i=1}}^{{{n_{data}}}} \mu _{ij}^{m}{\left\| {{o_i} - {z_j}} \right\|^2}$$
(4)

where the cluster centres (cluster prototypes) are obtained using Eq. (5):

$${z_j}=\frac{{\mathop \sum \nolimits_{{i=1}}^{{{n_{data}}}} \mu _{ij}^{m}{o_i}}}{{\mathop \sum \nolimits_{{i=1}}^{{{n_{data}}}} \mu _{ij}^{m}}}$$
(5)

where \(m>1\) is the fuzziness exponent controlling the level of cluster fuzziness.

The membership degrees are updated using Eq. (6) under the constraint \(\mathop \sum \nolimits_{{j=1}}^{k} {\mu _{ij}}=1\):

$${\mu _{ij}}={\left[ {\mathop \sum \limits_{{a=1}}^{k} {{\left( {\frac{{{{\left\| {{o_i} - {z_j}} \right\|}^2}}}{{{{\left\| {{o_i} - {z_a}} \right\|}^2}}}} \right)}^{\frac{1}{{m - 1}}}}} \right]^{ - 1}}$$
(6)

Algorithm 1 provides the pseudocode of FCM.

(Algorithm 1: FCM pseudocode)
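For concreteness, the following is a minimal NumPy sketch of the FCM loop, implementing Eqs. (4)–(6) directly. The random membership initialization, the convergence tolerance, and the small constant guarding the distance division are our own assumptions rather than details taken from the original pseudocode.

```python
import numpy as np

def fcm(O, k, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Minimal FCM sketch. O: (n_data, Dim) data matrix, k: number of clusters,
    m: fuzziness exponent (> 1). Returns centroids Z, memberships mu, objective J."""
    rng = np.random.default_rng(seed)
    n_data = O.shape[0]
    mu = rng.random((n_data, k))
    mu /= mu.sum(axis=1, keepdims=True)              # rows sum to one (Eq. 3)
    for _ in range(max_iter):
        w = mu ** m
        Z = (w.T @ O) / w.sum(axis=0)[:, None]       # centroid update (Eq. 5)
        # Squared distances ||o_i - z_j||^2; small constant avoids division by zero.
        d2 = ((O[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2) + 1e-12
        # Membership update (Eq. 6), using the squared distances of Eq. (4).
        mu_new = 1.0 / ((d2[:, :, None] / d2[:, None, :]) ** (1.0 / (m - 1.0))).sum(axis=2)
        if np.abs(mu_new - mu).max() < tol:          # stop when memberships stabilize
            mu = mu_new
            break
        mu = mu_new
    J = ((mu ** m) * d2).sum()                       # objective value (Eq. 4)
    return Z, mu, J
```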

2.2 Magnetic optimization algorithm

The magnetic optimization algorithm (MOA), proposed by Tayarani and Akbarzadeh (2008), is based on the principles of magnetic theory, in which particles are attracted towards each other on the basis of their charge. In MOA, the potential solutions (referred to as magnetic particles) are scattered around the search space, and particles with higher fitness values are assumed to have larger mass and a stronger magnetic field. The magnetic particles interact in a lattice-like structure: each particle, including the worst one, applies an attractive force to its neighbouring particles based on its magnetic field, so that the optimization space is searched effectively. The magnitude of the magnetic force depends on the distance between the particles and on their magnetic fields. The force has a long-range effect: as the distance between two particles increases its effect decreases, approaching zero as the distance tends to infinity.

Consider \(N\) magnetic particles in a \(Dim\)-dimensional search space, in which the position of particle \(i\) is represented as follows:

$$X_{i}=\left( {x_{i}^{1},x_{i}^{2}, \ldots ,x_{i}^{{Dim}}} \right)\quad {\text{for}}\;i=1,2, \ldots ,N\quad {\text{and}}\quad itr=0$$
(7)

where \(i\) denotes the \(i\)th magnetic particle in the lattice structure \(S\). Following Tayarani and Akbarzadeh (2008), each particle evaluates the objective function and stores the value in its magnetic field \({B_i}\). The magnetic field of each particle is then normalized as follows:

$${B_i}=\frac{{{B_i} - Best}}{{{B_i} - Worst}}$$
(8)

where

$$Best=min ({B_i})$$
(9)
$$Worst=max ({B_i})$$
(10)

The mass value of the ith magnetic particle \({M_i}\) is calculated as follows:

$${M_i}=\alpha +\rho \times B_{i}^{{itr}}$$
(11)

where \(\alpha\) and \(\rho\) are two constants that control the movement of the particles.

The acceleration of the \(i\)th magnetic particle is calculated as:

$$A_{i}^{k}(itr+1)=\frac{{Force_{i}^{k}}}{{{M_i}}} \times Rand$$
(12)

where \(Rand\) is a uniform random number between 0 and 1. Each particle applies force only to its neighbours \({N_i}\). The force applied to particle \({X_i}\) by its neighbour \({X_u}\) (\(\forall {X_u} \in {N_i}\)) is calculated as follows:

$$Forc{e_i}=Forc{e_i}+ \frac{{dist \times {B_u}}}{{D({X_i}, {X_u})}}$$
(13)
$$dist={X_i} - {X_u}$$

where \(D({X_i},{X_u})\) is the distance between particle \({X_i}\) and its neighbour \({X_u}\):

$$D({X_i},{X_u})=\frac{1}{{Dim}}\mathop \sum \limits_{{k=1}}^{{Dim}} \left| {\frac{{x_{i}^{k} - x_{u}^{k}}}{{{u_k} - {l_k}}}} \right|$$
(14)

where \({l_k}\) and \({u_k}\) are the lower and upper bounds of the \(k\)th dimension of the particle.

Then, the velocity and position of the \(i\)th magnetic particle for the next iteration are calculated using Eqs. (15) and (16):

$$V_{i}^{k}(itr+1)=V_{i}^{k}(itr)+A_{i}^{k}(itr+1)$$
(15)
$$X_{i}^{k}(itr+1)=X_{i}^{k}(itr)+V_{i}^{k}(itr+1)$$
(16)
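Putting Eqs. (8)–(16) together, one MOA iteration can be sketched as follows. Two simplifications are our own: a fully connected neighbourhood stands in for the lattice of Tayarani and Akbarzadeh (2008), and the values of \(\alpha\), \(\rho\) and the division guards are illustrative.

```python
import numpy as np

def moa_step(X, V, fitness, lower, upper, alpha=0.5, rho=0.5, eps=1e-12, seed=None):
    """One MOA iteration. X, V: (N, Dim) positions and velocities; fitness maps a
    position vector to a scalar to be minimized; lower/upper: (Dim,) bounds."""
    rng = np.random.default_rng(seed)
    N, Dim = X.shape
    B = np.array([fitness(x) for x in X])          # magnetic fields from the objective
    best, worst = B.min(), B.max()                 # Eqs. (9) and (10)
    B = (B - best) / (B - worst - eps)             # normalization as written in Eq. (8); eps guards 0/0
    M = alpha + rho * B                            # mass values (Eq. 11)
    force = np.zeros_like(X)
    for i in range(N):                             # here every particle neighbours every other
        for u in range(N):
            if u == i:
                continue
            dist = X[i] - X[u]                     # displacement vector
            D = np.mean(np.abs(dist / (upper - lower)))   # normalized distance (Eq. 14)
            force[i] += dist * B[u] / (D + eps)    # accumulated force (Eq. 13)
    A = force / M[:, None] * rng.random((N, Dim))  # acceleration (Eq. 12)
    V = V + A                                      # velocity update (Eq. 15)
    X = X + V                                      # position update (Eq. 16)
    return X, V
```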

2.3 Scale-free networks

Scale-free networks are based on the observation that, despite their diverse applications, most networks appearing in nature follow universal organizing principles (Barabási et al. 2000). A scale-free (SF) network is characterized by a highly heterogeneous degree distribution that follows a power law, as shown in Fig. 1: a few nodes (hubs) have a large number of connections (links), while most nodes have only a few.

Fig. 1 Power law distribution of node linkages (Holme and Kim 2002)
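For illustration only (this construction is not specified in the paper), a scale-free topology of the kind Fuzzy-MOC later uses for its particle neighbourhoods (Sect. 3) can be generated by preferential attachment, e.g. with networkx's Barabási–Albert generator:

```python
import networkx as nx

# 50 particles; each new node attaches preferentially to 2 existing nodes.
G = nx.barabasi_albert_graph(n=50, m=2, seed=1)

degrees = sorted((d for _, d in G.degree()), reverse=True)
print(degrees[:5])    # a few highly connected hubs ...
print(degrees[-5:])   # ... and many nodes with only a few links

# Per-particle neighbour lists, as needed for neighbourhood-restricted forces.
neighbours = {i: list(G.neighbors(i)) for i in G.nodes}
```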

3 Proposed algorithm: Fuzzy-MOC

The most challenging problems in clustering are sensitivity to the initial centroid selection and the handling of overlapping clusters. To solve these problems, the Fuzzy-MOC clustering algorithm is proposed, in which magnetic particles move around the search space to find the best representatives of the cluster centroids. MOA is customized for data clustering by making suitable modifications to the algorithm.

In the proposed algorithm, the problem space is modelled as data points in a multi-dimensional space. A data point need not belong to a single cluster only but may belong to multiple clusters. Fuzzy-MOC tries to determine a set of candidate cluster centroids and thereby a near-optimal classification of the dataset at hand. The main idea is that data points are considered fixed entities while magnetic particles are movable entities. Each magnetic particle (candidate solution) encodes all the centroids of the dataset. The fixed data objects apply force directly to the magnetic particles, which causes the particles to move towards the global optimum. Instead of a cellular or lattice structure, Fuzzy-MOC employs a scale-free network within the population.

3.1 Solution encoding

An encoding scheme is needed to represent the centroids (cluster centres). Initially, all candidate solutions, represented as magnetic particles, are randomly generated for the clustering problem. One candidate solution (magnetic particle) represents all the centroids of the dataset. The randomly generated candidate solutions interact through a magnetic force determined by their mass values and magnetic fields. For clustering, each magnetic particle encodes \(k\) cluster centroids, each a \(Dim\)-dimensional vector, so the dimension of a particle is \(Dim \times k\). For instance, for three clusters in a dataset with four features, each individual particle is a vector of size (1 × 12). The solution representation of a magnetic particle is shown in Fig. 2.

Fig. 2 Particle encoding of a single particle
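As an illustration of this encoding, the sketch below builds one particle for a hypothetical dataset with \(k=3\) and \(Dim=4\); initializing the particle uniformly within the data bounds is our assumption, since the paper only states that the particles are generated randomly.

```python
import numpy as np

rng = np.random.default_rng(0)
k, Dim, n_data = 3, 4, 100
O = rng.random((n_data, Dim))                    # stand-in dataset

# One magnetic particle encodes all k centroids as a flat vector (Fig. 2),
# drawn uniformly within the per-feature data bounds.
lower, upper = O.min(axis=0), O.max(axis=0)
particle = rng.uniform(np.tile(lower, k), np.tile(upper, k))  # length k * Dim = 12

# Decode back to a (k, Dim) centroid matrix for evaluating the fitness (Eq. 4).
Z = particle.reshape(k, Dim)
```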

After a particle's position is updated, the constraints given in Eqs. (2) and (3) may be violated. To solve this problem, standardization is performed on the position matrix of each magnetic particle: negative values in the position matrix are set to zero, and if all the values in a row become zero, new random numbers between 0 and 1 are generated for that row (a sketch of this step follows the matrix below). After standardization, the new matrix is given as follows:

$${X_i}=\left[ {\begin{array}{*{20}{c}} {{\mu _{11}}/\mathop \sum \limits_{{j=1}}^{k} {\mu _{1j}}}& \cdots &{{\mu _{1k}}/\mathop \sum \limits_{{j=1}}^{k} {\mu _{1j}}} \\ \vdots & \ddots & \vdots \\ {{\mu _{n1}}/\mathop \sum \limits_{{j=1}}^{k} {\mu _{nj}}}& \cdots &{{\mu _{nk}}/\mathop \sum \limits_{{j=1}}^{k} {\mu _{nj}}} \end{array}} \right]$$
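A minimal sketch of this standardization step, assuming the position matrix is held as a NumPy array with one row per data object and one column per cluster:

```python
import numpy as np

def standardize(mu, seed=None):
    """Repair a position matrix so that Eqs. (2) and (3) hold again."""
    rng = np.random.default_rng(seed)
    mu = np.maximum(mu, 0.0)                    # negative entries are set to zero
    dead = mu.sum(axis=1) == 0                  # rows where every entry became zero
    mu[dead] = rng.random((dead.sum(), mu.shape[1]))  # refill with values in (0, 1)
    return mu / mu.sum(axis=1, keepdims=True)   # renormalize each row to sum to one
```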

To evaluate the performance of the proposed algorithm, the fitness value (objective function) is kept the same as that of FCM (see Eq. 4). According to Izakian and Abraham (2011), the running time of FCM is lower than that of heuristic algorithms because it executes fewer function evaluations, but it has the disadvantage of being vulnerable to local optima.

Algorithm 2 shows the pseudocode of the proposed algorithm.

(Algorithm 2: Fuzzy-MOC pseudocode)

3.2 Salient features of the proposed algorithm, Fuzzy-MOC

  • \(G(itr)\) takes the initial value \({G_0}\) and is reduced with time towards a final value \(G(max\_itr)\) to adjust the accuracy of the search.

  • The force is multiplied by a random number between 0 and 1 to give the algorithm a randomized character. This helps it escape from local optima, so the dependency on the initial cluster centroids is reduced.

  • \(eps=0.01\) is used in the acceleration equation to avoid division by zero. A sketch combining these three features is given below.
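In the sketch, the linear decay schedule for \(G\), its final value, and the placement of \(G(itr)\) as a multiplier on the force term are our assumptions (by analogy with gravitational-search-style algorithms), since the text does not fix them.

```python
import numpy as np

def G(itr, max_itr, G0=1.0, G_end=0.1):
    """Field coefficient decaying linearly from G0 to G(max_itr) = G_end."""
    return G0 + (G_end - G0) * itr / max_itr

def fuzzy_moc_acceleration(force, mass, itr, max_itr, eps=0.01, seed=None):
    """Modified acceleration: a random factor in [0, 1] keeps the search
    stochastic, and eps = 0.01 avoids division by zero."""
    rng = np.random.default_rng(seed)
    return G(itr, max_itr) * rng.random(force.shape) * force / (mass + eps)
```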

4 Experimental results

This section evaluates the performance of the proposed algorithm on several commonly used UCI (http://archive.ics.uci.edu/ml/) data sets. The proposed algorithm is evaluated in terms of the following performance metrics: F1, RI, purity and accuracy, and the results are compared with three other fuzzy clustering algorithms: FCM, Fuzzy-PSO, and KFCM. For the purpose of comparison, all four algorithms were executed 30 times each and their average values were recorded. The outputs of the proposed and the other clustering algorithms are summarized in Tables 3, 4, 5 and 6. All algorithms were implemented in MATLAB 7.0.0 on a computer with 8 GB RAM and an Intel Core i7 processor. The parameter settings of Fuzzy-PSO (Pang et al. 2004), FCM (Bezdek et al. 1984) and KFCM (Shen et al. 2006) are kept the same as in the original papers.

The parameter settings of all the algorithms are provided in Table 1.

Table 1 Parameter setting

4.1 Evaluation metric

The following performance measures are considered for evaluating the proposed algorithm against the other algorithms:

Accuracy: It is determined by comparing the clusters obtained by the algorithm with the clusters already available in the dataset (ground truth) (Sun and Guo 2014):

$$Accuracy=\frac{{\mathop \sum \nolimits_{{i=1}}^{n} \delta (ground\;truth\;value,\;map\;(C))}}{n}$$
(17)

The map function matches the true label (\(Truelabel\)) of object \(i\) to the cluster label \(C\) obtained by the clustering algorithm. The higher the accuracy value, the better the clustering result.
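The map function is typically realized as an optimal one-to-one assignment between predicted cluster labels and ground-truth classes. A sketch using the Hungarian method follows; the use of scipy's linear_sum_assignment is our choice, as the paper does not specify the matching procedure.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(true_labels, pred_labels):
    """Accuracy per Eq. (17): map clusters to classes so agreement is maximal."""
    true_labels = np.asarray(true_labels)
    pred_labels = np.asarray(pred_labels)
    true_ids, pred_ids = np.unique(true_labels), np.unique(pred_labels)
    # Contingency table: overlap of each predicted cluster with each true class.
    overlap = np.zeros((pred_ids.size, true_ids.size), dtype=int)
    for i, c in enumerate(pred_ids):
        for j, t in enumerate(true_ids):
            overlap[i, j] = np.sum((pred_labels == c) & (true_labels == t))
    row, col = linear_sum_assignment(-overlap)     # Hungarian method, maximizing overlap
    return overlap[row, col].sum() / len(true_labels)
```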

Rand Index (RI): The Rand index (see Arzeno and Vikalo 2015) provides a measure of overall clustering accuracy. It gives the percentage of instance pairs that are correctly classified as belonging to either the same cluster or different clusters. More specifically, let \({c_i}\) be the true label of instance \(i\) and \({\hat {c}_i}\) the cluster assigned to instance \(i\) by the clustering algorithm. Then,

$$RI=\frac{{\mathop \sum \nolimits_{{i>j}} {\mathbbm{1}}\left( {{\mathbbm{1}}({c_i}={c_j})={\mathbbm{1}}({{\hat {c}}_i}={{\hat {c}}_j})} \right)}}{{Total\;number\;of\;instance\;pairs}}$$
(18)

F1: It is the harmonic mean of precision and recall. Precision is the fraction of pairs correctly predicted in the same cluster among all pairs predicted in the same cluster, while recall is the fraction of pairs correctly predicted in the same cluster among all pairs actually in the same cluster. In general, larger values of F1 indicate better clustering; the value of F1 lies between 0 and 1. Mathematically, it is given as:

$$F1=\frac{{2 \times Precision \times Recall}}{{Precision+Recall}}$$
(19)

Purity: To compute purity, each cluster is assigned to the class that is most frequent in it, and the accuracy of this assignment is measured by counting the number of correctly assigned objects and dividing by \(n\). Higher values of purity indicate better clustering:

$$Purity({P_i})=\frac{1}{{{n_i}}}\mathop {\max }\limits_j \left( {{n_{ij}}} \right)$$
$$Purity=\mathop \sum \limits_{{i=1}}^{k} \frac{{{n_i}}}{n} \times Purity({P_i})$$
(20)

where \({P_i}\) is the \(i\)th cluster, \({n_i}\) is its size, and \({n_{ij}}\) is the number of objects of class \(j\) in cluster \(i\).
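The remaining measures follow directly from their definitions. The sketch below computes RI (Eq. 18) and F1 (Eq. 19) by explicit pair counting, and purity (Eq. 20) by majority voting within each cluster; it assumes integer class labels and keeps the O(n²) pair loop deliberately literal.

```python
import numpy as np
from itertools import combinations

def rand_index_and_f1(true_labels, pred_labels):
    """Pair-counting RI (Eq. 18) and F1 (Eq. 19)."""
    tp = fp = fn = tn = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = pred_labels[i] == pred_labels[j]
        tp += int(same_true and same_pred)          # correctly placed in one cluster
        fp += int(not same_true and same_pred)      # wrongly placed together
        fn += int(same_true and not same_pred)      # wrongly separated
        tn += int(not same_true and not same_pred)  # correctly separated
    ri = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return ri, f1

def purity(true_labels, pred_labels):
    """Purity (Eq. 20): each cluster votes for its most frequent class."""
    true_labels = np.asarray(true_labels)
    pred_labels = np.asarray(pred_labels)
    correct = sum(np.bincount(true_labels[pred_labels == c]).max()
                  for c in np.unique(pred_labels))
    return correct / len(true_labels)
```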

4.2 Datasets

The proposed algorithm is validated on 16 datasets, of which 14 are taken from the UCI database and one is the Jain dataset (Jain and Law 2005); Aggregation (Gionis and Mannila 2007), a synthetic dataset, is also considered for comparison. The characteristics of the datasets, in terms of number of classes, number of instances and number of features, are described in Table 2. The datasets used in this paper are Vowel, Breast Cancer Wisconsin Diagnostic (WDBC), Breast Cancer Wisconsin Original (BCW), BUPA (liver disorders), Heart, Dermatology, Contraceptive Method Choice (CMC), Balance, Crude oil, Ionosphere (IONO), Iris, Jain, Thyroid and Wine.

Table 2 Description of data sets

Besides these datasets, a high-dimensional health dataset, called Gene expression, is also used for evaluating the proposed algorithm. This collection of data is part of the RNA-Seq (HiSeq) PANCAN dataset; it is a random extraction of gene expressions of patients with different types of tumors. The dataset consists of 801 instances with 20,530 features and contains five classes: BRCA, KIRC, COAD, LUAD and PRAD.

4.3 Result and discussion

Figure 3 shows the original Aggregation dataset. For all of the clustering algorithms, the number of clusters is set to 7. The three-dimensional clustering results of the four algorithms are shown in panels a–d of Fig. 4. From this figure it can be seen that, although none of the algorithms gives perfect results, the clustering obtained by Fuzzy-MOC is better than that obtained by the other three algorithms.

Fig. 3 Aggregation dataset

Fig. 4 Clustering of the Aggregation dataset: a KFCM, b FCM, c Fuzzy-PSO, d Fuzzy-MOC (proposed algorithm)

Table 3 shows the average accuracy over the 30 simulation runs. Fuzzy-MOC achieves the highest accuracy on all datasets except Thyroid, Heart, Aggregation, Iris and IONO, in comparison with FCM, Fuzzy-PSO and KFCM. For the IONO dataset, the average accuracy of Fuzzy-MOC is greater than that of KFCM and equal to those of FCM and Fuzzy-PSO. For the Thyroid, Iris and Aggregation datasets, FCM yields higher accuracy than the other clustering algorithms, and KFCM achieves the highest accuracy on the Heart dataset. From the results, it can be observed that the performance of Fuzzy-MOC is more consistent than that of the other clustering algorithms with respect to average accuracy.

Table 3 Mean values of accuracy over 30 independent runs on 15 datasets

Table 4 shows the average F1 measure over the 30 simulation runs. For all datasets except Heart, Dermatology, Vowel, Thyroid, IONO and Wine, Fuzzy-MOC exhibits a significantly higher F1 value than FCM, Fuzzy-PSO and KFCM. For the Thyroid dataset, Fuzzy-PSO gives the best results. For the IONO dataset, FCM, Fuzzy-PSO and Fuzzy-MOC give similar results. FCM achieves the highest F1 on the Dermatology and Wine datasets, and KFCM achieves the highest F1 value on the Heart dataset.

Table 4 Mean values of F1 over 30 independent runs on 15 datasets

Table 5 shows the mean purity and RI obtained by the different clustering algorithms used in the present study. Fuzzy-MOC produces higher purity and RI values than the other three algorithms, especially on the BUPA, CMC, Dermatology, WDBC, Aggregation, Iris, IONO, Jain and Vowel datasets. For the Crude oil dataset, Fuzzy-MOC achieves a higher RI value while FCM gives the highest purity. FCM achieves higher purity and RI values for the Wine, Thyroid and Balance datasets, and KFCM gives better results on the Heart dataset. FCM, Fuzzy-PSO, KFCM and Fuzzy-MOC give similar purity values for the BCW dataset, while KFCM achieves a better RI value than the others.

Table 5 Mean values of RI and purity over 30 independent runs on 15 datasets

Results for the Gene expression dataset (the high-dimensional dataset) are provided in Table 6. From the table it can be seen that Fuzzy-MOC achieves higher RI and accuracy values than the other algorithms, while FCM performs better in terms of purity and KFCM in terms of F1. Thus, the proposed algorithm surpasses the other algorithms on two out of four performance measures for high-dimensional gene expression data.

Table 6 Gene expression dataset

5 Conclusion

Clustering algorithms have emerged as a powerful alternative meta-learning tool for a broad range of applications. This paper proposed Fuzzy-MOC, a new meta-heuristic approach based on the principles of magnetic field theory for efficient fuzzy clustering. Fuzzy-MOC is designed to minimize the initialization problem, a major drawback of most clustering algorithms; the objective is to determine the optimal cluster centroids. Fuzzy-MOC was evaluated empirically on 15 benchmark datasets and a high-dimensional gene expression dataset. Its efficiency was assessed using four performance metrics (F1, accuracy, purity and RI) and compared against three other fuzzy clustering algorithms. The experimental results indicate consistent performance of Fuzzy-MOC on most of the datasets, including the high-dimensional dataset considered in the present study.