1 Introduction

Data clustering is an unsupervised classification method whose aim is to partition the data set into clusters on the basis of similarities and dissimilarities among the data objects. It helps to uncover hidden relationships among the data objects and divides them into subsets that are meaningful in the context of a particular problem [2].

Clustering algorithms fall into two categories: hierarchical clustering and partitional clustering. The hierarchical approach produces a set of nested clusters arranged as a tree known as a dendrogram, in which each cluster node consists of child nodes [26]. The idea is to view the data at different levels of granularity. In contrast, partitional clustering methods assume that the number of clusters to be found is given and then search for the optimal partition based on an objective function. Hierarchical clustering can be viewed as a sequence of partitional clusterings, and any level of the hierarchy can be taken as a partitional clustering [8].

Data clustering relies on assumptions about the characteristics and structure of the data set and the relationships among its objects. The objective of a clustering algorithm is to maximize the inter-cluster distance and minimize the intra-cluster distance, as shown in Fig. 1 [3].

Fig. 1 Clustering analysis process

K-means clustering is one of the older partitioning methods and is commonly used to automatically divide a data set into k groups. The K-means algorithm generates a fast and efficient solution; its objective is to minimize the mean squared distance from each data point to its nearest centre. Although K-means is a very popular algorithm, it suffers from several drawbacks. It is highly dependent on the initial selection of cluster centres and requires the number of clusters to be known and fixed in advance. It also suffers from the dead-unit problem [21]. Because the K-means objective function is not convex, it may contain several local minima [9, 10]. To address such problems, evolutionary algorithms such as the genetic algorithm (GA) and particle swarm optimization (PSO) have been introduced. A genetic algorithm starts its search with a random set of solutions instead of just one solution. Once a population of solutions (a random set of binary strings) is created at random, each solution is evaluated in the context of the problem [13]. A termination criterion is then checked; if it is not satisfied, three main operators modify the population of solutions and a new (and hopefully better) population is created [17, 18]. Particle swarm optimization (PSO) is a population-based algorithm that relies on the cognitive and social behaviour of a swarm. PSO imitates the swarming behaviour of bird flocks, fish and bees as they search for food, modelled by the cornfield vector [1, 12]. Although evolutionary algorithms are efficient and generate good solutions, they suffer from high computational cost due to slow convergence; PSO in particular converges slowly [11]. To deal with the slow convergence of PSO, several hybridizations have been proposed in the literature.

A hybridization of the Nelder-Mead simplex method with PSO was proposed by Fan, Liang and Zahara [7], and a hybridization of K-means, the Nelder-Mead simplex method and PSO was proposed by Kao and Zahara [10]. The idea behind hybridization is to combine the merits of all the algorithms and to find accurate and appropriate cluster centres for clustering arbitrary data. In another approach, linearly decreasing weight particle swarm optimization (LDWPSO) introduced a linearly decreasing inertia factor to address the slow convergence of PSO. Yang et al. [4] proposed an algorithm named accelerated linearly decreasing weight particle swarm optimization (ALDWPSO) to improve the performance of PSO and to maintain the global and local search abilities of the swarm. This paper presents a hybridization scheme named Boundary Restricted Adaptive Particle Swarm Optimization (BR-APSO). The objective of the proposed BR-APSO is to find appropriate cluster centres, overcome the slow convergence problem and maintain an equilibrium between the global and local search capabilities of the swarm. Results on several artificial and real-life data sets show that BR-APSO gives more accurate results than K-means, PSO, K-PSO, K-NM-PSO, LDWPSO and ALDWPSO.

This paper is organized as follows: Sect. 2 describes the K-means algorithm, its working and its major drawbacks. Section 3 details the standard PSO and its hybridizations with other algorithms. Section 4 describes the BR-APSO approach, its development and its working. Section 5 discusses simulation and experimental results on some standard test data sets and draws inferences about cluster formation from the results obtained. Finally, Sect. 6 concludes the paper.

2 K-means Algorithm

The K-means algorithm was developed by J. A. Hartigan and M. A. Wong between 1975 and 1977. It works on the idea that a centre point can represent a cluster. The K-means algorithm is an iterative process in which data objects are moved between clusters until they reach the appropriate cluster [8, 17].

Given a set of observations (x1, x2, …, xn), where each observation is a d-dimensional real vector, the K-means algorithm aims to partition the n observations into c sets (c < n), Z = (z1, z2, …, zc), so as to minimize a measure of dispersion within the clusters. The standard K-means algorithm minimizes the within-cluster sum of squared distances according to Eq. (1) below.

$$ f_{1} = \arg \min \left( \sum\limits_{j = 1}^{c} \sum\limits_{X^{i} \in Z^{j}} \left\| X^{i} - \mu^{j} \right\|^{2} \right) $$
(1)

where \( \mu^{j} \) is the mean of \( Z^{j} \).

In the K-means algorithm, as shown in Fig. 2, c objects are selected from the given data set as initial centroids, and the remaining data objects are assigned to the c clusters with which they have maximum similarity. Once all the data objects are assigned, the mean squared distance from each data point to its cluster centre is recalculated. The process is repeated until the termination criterion is satisfied. Choosing proper initial centroids is the key step of the basic K-means procedure.
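
A minimal MATLAB sketch of this basic K-means loop is given below; the function name, variable names and the convergence tolerance are illustrative and are not taken from the original implementation.

function [centres, labels] = simpleKMeans(X, c, maxIter)
% Minimal K-means sketch: X is an n-by-d data matrix, c the number of
% clusters and maxIter an iteration limit.
n = size(X, 1);
idx = randperm(n);
centres = X(idx(1:c), :);                 % c random data objects as initial centroids
labels = zeros(n, 1);
for iter = 1:maxIter
    % assignment step: each object goes to its nearest centre (objective of Eq. (1))
    dists = zeros(n, c);
    for j = 1:c
        dists(:, j) = sum((X - repmat(centres(j, :), n, 1)).^2, 2);
    end
    [~, labels] = min(dists, [], 2);
    % update step: recompute each centre as the mean of its members
    newCentres = centres;
    for j = 1:c
        members = X(labels == j, :);
        if ~isempty(members)
            newCentres(j, :) = mean(members, 1);
        end
    end
    if max(abs(newCentres(:) - centres(:))) < 1e-10   % termination criterion
        centres = newCentres;
        break;
    end
    centres = newCentres;
end
end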

Fig. 2 K-means algorithm

The main advantage of the K-means algorithm is that it generates a fast and efficient solution [24]. Its drawbacks are that the effectiveness and quality of its results depend heavily on the objective function used to measure the distance between objects and on the selection of the initial cluster centres, and that the number of clusters must be known and fixed before the clustering process starts [16, 18]. The K-means algorithm is also very sensitive to outliers.

3 PSO and its hybridization

PSO is one of the popular evolutionary computation techniques and simulates the movement and flocking of birds. PSO performs a global search of the solution space and has been widely used to solve multi-objective optimization problems [1]. The approach is based on the cooperation of agents, called particles, organized into a swarm. In analogy with evolutionary computation methods, a swarm is similar to a population and a particle is similar to an individual. PSO is a stochastic optimization method based on swarm intelligence (SI) [23]. The fundamental idea is that each particle represents a potential solution, which it updates according to its own experience and that of its neighbours. In PSO, particles are the agents that represent individual solutions, while the swarm is the collection of particles that represents the solution space. The PSO algorithm searches in parallel using a group of particles, each approaching the optimum through its present velocity, its previous experience and the experience of its neighbours [5, 19]. PSO searches the problem domain by adjusting the trajectories of moving points in a multidimensional space. The motion of an individual particle towards the optimal solution is governed by its position and velocity, its own previous best performance and the best performance of its neighbours. In a D-dimensional search space, the ith particle of the swarm is represented by a position xi = (xi1, xi2, …, xiD) and a velocity vi = (vi1, vi2, …, viD). The particles explore the search space with velocities that are dynamically adjusted according to their own and their neighbours' performance. The standard PSO method updates the velocity and position of each particle according to the equations given below.

$$ \begin{aligned} v_{id}(t + 1) & = \omega \, v_{id}(t) + c_{1} \, rand() \, (p_{id} - x_{id}) \\ & \quad + c_{2} \, rand() \, (p_{gd} - x_{id}) \end{aligned} $$
(2)
$$ x_{id}(t + 1) = x_{id}(t) + v_{id}(t + 1) $$
(3)

where c1 and c2 are two positive acceleration constants, rand() is a uniformly distributed random number in [0, 1], pid and pgd are the best positions found so far by the ith particle and by the whole swarm, respectively, t is the iteration count and ω is an inertia weight that usually decreases linearly over the iterations. Equation (2) consists of three parts: the first represents the current velocity of the ith particle, i.e. its present state; the second is the cognition term, which reflects the particle's own experience; and the third is the social term, which reflects the sharing of information within the swarm. The inertia weight ω balances local and global search. El-abd and Kamel [6] proposed that the initial value of ω should be high and that it should decrease linearly to its minimum at the end. A large value at the beginning of the iterations helps the particles move fast and supports global search, while a small value at the end helps the particles perform a finer local search.
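
The following is a minimal MATLAB sketch of the velocity and position updates of Eqs. (2) and (3); the swarm size, acceleration constants, the constant inertia weight and the placeholder fitness function are illustrative assumptions rather than the settings used in this paper.

% Minimal sketch of the standard PSO updates of Eqs. (2) and (3) on a
% placeholder fitness function (sum of squares).
numParticles = 30; dim = 9; maxIter = 100;
c1 = 2; c2 = 2; w = 0.72;                 % constant inertia weight for this sketch
x = rand(numParticles, dim);              % particle positions
v = zeros(numParticles, dim);             % particle velocities
pFit = sum(x.^2, 2); p = x;               % personal bests and their fitness
[~, gIdx] = min(pFit); g = p(gIdx, :);    % global best
for t = 1:maxIter
    for i = 1:numParticles
        v(i, :) = w * v(i, :) ...
                + c1 * rand() * (p(i, :) - x(i, :)) ...
                + c2 * rand() * (g - x(i, :));         % Eq. (2)
        x(i, :) = x(i, :) + v(i, :);                   % Eq. (3)
        f = sum(x(i, :).^2);                           % placeholder fitness
        if f < pFit(i)
            p(i, :) = x(i, :); pFit(i) = f;            % update personal best
        end
    end
    [~, gIdx] = min(pFit); g = p(gIdx, :);             % update global best
end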

Tsai and Chiu [21] proposed generalized models and techniques for tuning these parameters. PSO also has some drawbacks. PSO finds the best result through the interaction of particles, but when the search space is large its convergence becomes very slow near the global optimum. It also yields poor-quality clustering results on large and complex data sets [19, 25].

The Nelder-Mead (NM) simplex method is a derivative-free direct search method used especially for unconstrained minimization problems such as nonlinear least squares, nonlinear simultaneous equations and other types of function minimization [14, 15]. It is used to find a local minimum of a function of several variables. The NM simplex method is based on four basic operations: reflection, expansion, contraction and shrinkage. With the help of these four operations the method iteratively improves the simplex and converges to the closest local optimum.
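
As an aside, base MATLAB provides the Nelder-Mead simplex search through fminsearch; a minimal usage example on a standard test function is given below (the function and starting point are illustrative).

% fminsearch implements the Nelder-Mead simplex search in base MATLAB.
% Example: find a local minimum of the Rosenbrock function from a chosen start point.
rosen = @(z) 100 * (z(2) - z(1)^2)^2 + (1 - z(1))^2;
z0 = [-1.2, 1];                          % illustrative starting point
[zOpt, fOpt] = fminsearch(rosen, z0);    % applies reflection/expansion/contraction/shrink steps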

The hybrid NM-PSO algorithm, shown in Fig. 3, combines the Nelder-Mead (NM) simplex search method with particle swarm optimization, using a population size of 3N + 1. The 3N + 1 randomly generated initial particles are sorted by fitness, and the top N + 1 particles are passed to the simplex method to improve the (N + 1)th particle [7]. The remaining 2N particles are adjusted by PSO. Adjusting these 2N particles involves selecting the gbest particle, selecting the neighbourhood best particles and updating the velocities. The gbest particle is determined from the sorted fitness values, and the neighbourhood best particles are determined by dividing the 2N particles into N neighbourhoods.

Fig. 3 Algorithm of NM-PSO

Kao et al. [10] proposed a hybridization of K-means, Nelder-Mead and PSO. The idea is to combine the merits of all three algorithms. In K-NM-PSO, 3N particles are randomly generated; the clustering process is started by K-means and NM-PSO is then carried out to completion.

Shi and Eberhart [20] proposed the linearly decreasing weight particle swarm optimization (LDWPSO) algorithm, which introduced the concept of a linearly decreasing inertia weight. Results show that LDWPSO is more efficient and accurate than the original PSO and is effective at balancing the global and local search abilities of the swarm. In LDWPSO, \( w \) is decreased linearly from 0.9 to 0.4 during the search process according to the following equation:

$$ w = (w_{\max } - w_{\min } ) \times \frac{{Iteration_{\max } - Iteration_{i} }}{{Iteration_{\max } }} + w_{\min } $$
(4)

In this equation \( w \) is the inertia weight, \( w_{\min } \) is 0.4, \( w_{\max } \) is 0.9, \( Iteration_{\max } \) is the maximum number of iterations and \( Iteration_{i} \) is the current iteration.
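
For illustration, the inertia-weight schedule of Eq. (4) can be computed in MATLAB as follows (the iteration limit is an assumed value):

% Linearly decreasing inertia weight of Eq. (4), from 0.9 down to 0.4.
wMax = 0.9; wMin = 0.4; iterMax = 100;    % illustrative iteration limit
iter = 1:iterMax;
w = (wMax - wMin) .* (iterMax - iter) ./ iterMax + wMin;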

Yang et al. [4] proposed a hybrid algorithm named accelerated linearly decreasing weight particle swarm optimization (ALDWPSO), shown in Fig. 4, which combines the linearly decreasing weight with an acceleration strategy to improve the performance of PSO. It involves four processes: particle encoding and initialization, the acceleration strategy, velocity and position updates, and fitness evaluation.

Fig. 4 Algorithm of ALDWPSO

4 Proposed algorithm

In standard PSO, at each iteration the particles of the swarm update their velocities based on their momentum, their personal best solutions and the group best solution. During this process, some particles tend to move outside the boundary of the search space while searching for the global solution; in Fig. 5 these particles are indicated by arrows. Such particles attract other particles and pass on personal best and global best information that is not accurate, since they lie outside the boundary. This inaccurate information degrades the quality of the clustering result and destroys the diversity of the clustering process. It also slows the convergence of the solution and can cause the algorithm to become stuck in a local minimum.

Fig. 5 Standard PSO clustering

Another problem with PSO is its slow convergence near the global solution [10]. Several algorithms and hybridizations are available in the literature, but they improve the global convergence ability of PSO only to a certain degree [24]. In the proposed boundary restricted adaptive particle swarm optimization (BR-APSO) algorithm, shown in Fig. 6, a boundary restriction strategy is applied to those particles that move outside the boundary of the search space. When the evaluation process is carried out by standard PSO, several particles move outside the boundary and affect the clustering result; Fig. 7 shows that after applying BR-APSO no particle remains outside.

Fig. 6 Algorithm of BR-APSO

Fig. 7 BR-APSO clustering

In this approach, the algorithm forcibly brings back those particles that move outside the search space during the evaluation process, and the process iterates until the termination criteria are satisfied. The inertia weight is maximum at the beginning of the process, and a low inertia weight is used as the process continues so that the swarm does not jump over the global solution. Previous work tuned the inertia weight by decreasing it linearly, but better solutions and swarm crowding near the global solution are observed when the inertia weight changes only slightly near the end, so the inertia weight adaption function should be non-linear. This motivates the choice of an exponential function to update the inertia weight in BR-APSO. The results show that the performance is significantly improved over other PSO variants because the method effectively balances the global and local search abilities of the swarm.
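
The following MATLAB fragment sketches these two ingredients as described above; the search-space bounds, the sample particle positions and, in particular, the exponential decay rate are illustrative assumptions, since the exact adaption function of BR-APSO is not reproduced here.

% (a) Boundary restriction: coordinates that leave the search space are pulled back inside.
xMin = 0; xMax = 1;                        % search-space bounds (illustrative)
x = [-0.3 0.4; 0.7 1.6];                   % two particles, each with one coordinate outside
xRestricted = min(max(x, xMin), xMax);     % both particles are brought back inside the bounds

% (b) Non-linear (exponential) inertia-weight schedule; the decay rate is assumed.
wMax = 0.9; wMin = 0.4; iterMax = 100; t = 1:iterMax;
w = wMin + (wMax - wMin) .* exp(-5 .* t ./ iterMax);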

5 Datasets and result discussion

Nine data sets have been used to validate the proposed algorithm: two artificial data sets (Art1 and Art2), Vowel, Iris, Crude oil, Contraceptive Method Choice, Wisconsin breast cancer, Glass and Wine. These data sets were selected to cover low-, medium- and high-dimensional problems. For example, a data set with three attributes grouped into three clusters gives a number of parameters equal to the product of the number of clusters and the number of attributes (features), i.e. N = k × d = 3 × 3 = 9, for finding the optimal cluster centres. The data sets are available at ftp://ftp.ics.uci.edu/pub/machine-learning-databases/.

  1.

    Artificial data set 1 (Art1): This is a 2-dimensional problem with 4 unique classes. The problem is interesting in that only one of the inputs is really relevant to the formation of the classes. Figure 8 shows the artificial data set 1. A total of 600 patterns were drawn from four independent bivariate normal distributions, where the classes were distributed according to Eq. (5) for i = 1, …, 4, with μ the mean vector and Σ the covariance matrix; m1 = −3, m2 = 0, m3 = 3 and m4 = 6. A generation sketch for both artificial data sets is given after this list.

    Fig. 8 Artificial dataset 1

    $$ N_{2}\left( \mu = \left( \begin{array}{c} m_{i} \\ 0 \end{array} \right),\; \Sigma = \left[ \begin{array}{cc} 0.50 & 0.05 \\ 0.05 & 0.50 \end{array} \right] \right) $$
    (5)
  2.

    Artificial data set 2 (Art2): This is a three-dimensional problem with 5 unique classes. The total number of data points is n = 250, and the features of each class are distributed according to Class 1 ~ Uniform(85, 100), Class 2 ~ Uniform(70, 85), Class 3 ~ Uniform(55, 70), Class 4 ~ Uniform(40, 55) and Class 5 ~ Uniform(25, 40). The data set is shown in Fig. 9.

    Fig. 9 Artificial dataset 2

  3.

    Vowel data set: This data set contains 871 Indian Telugu vowel sounds. It is a three-dimensional problem in which the three attributes correspond to the first, second and third vowel frequencies, with six overlapping classes: {d (72 objects), a (89 objects), i (172 objects), u (151 objects), e (207 objects), o (180 objects)}.

  4.

    Fisher Iris data set: This is perhaps the best-known database in the pattern recognition literature. The data set contains 150 instances of iris flowers in 3 classes, where each class refers to a type of iris plant (Iris setosa, Iris virginica and Iris versicolor). One class is linearly separable from the other two; the latter are not linearly separable from each other. Each class has 50 samples with four features.

  5.

    Crude oil data set: This data set has 56 data objects with five attributes: vanadium, iron, beryllium, saturated hydrocarbons and aromatic hydrocarbons. The crude oil samples were taken from three zones of sandstone.

  6.

    Contraceptive Method Choice (CMC) data set: This data set is a subset of the 1987 National Indonesia Contraceptive Prevalence Survey. The samples are married women who were either not pregnant or did not know whether they were pregnant at the time of the interview. The problem is to predict a woman's current contraceptive method choice on the basis of her demographic and economic characteristics.

  7.

    Wisconsin breast cancer data set: It contains 683 objects with two categories and nine attributes: clump thickness, cell size uniformity, cell shape uniformity, marginal adhesion, single epithelial cell size, bare nuclei, bland chromatin, normal nucleoli, and mitoses. Categories are malignant and benign.

  8.

    Glass data set: The data set was collected from 6 different types of glass: building windows float processed (70 objects), building windows non-float processed (76 objects), vehicle windows float processed (17 objects), containers (13 objects), tableware (9 objects), and headlamps (29 objects), where each data object has nine features: refractive index, sodium, magnesium, aluminum, silicon, potassium, calcium, barium, and iron.

  9.

    Wine dataset: These data are the results of a chemical analysis of wines grown in the same region of Italy but derived from three different cultivars. The analysis determined the quantities of 13 constituents (inputs) found in each of the three types of wine (classes). The data are a collection of 178 instances (data vectors). Hence, this is a classification problem with "well-behaved" class structures: 13 inputs, 3 classes and 178 data vectors.
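
The two artificial data sets described above can be generated, for example, with the following MATLAB sketch; equal class sizes are assumed, since the original class proportions are not stated.

% Art1: 600 two-dimensional points from four bivariate normals, Eq. (5).
m = [-3 0 3 6];                          % class means on the first coordinate
Sigma = [0.50 0.05; 0.05 0.50];
R = chol(Sigma);                         % randn * R then has covariance Sigma
Art1 = []; labels1 = [];
for i = 1:4
    pts = randn(150, 2) * R + repmat([m(i) 0], 150, 1);
    Art1 = [Art1; pts]; labels1 = [labels1; i * ones(150, 1)];
end

% Art2: 250 three-dimensional points, assumed 50 per class, with features
% uniform on the class-specific intervals listed above.
lo = [85 70 55 40 25]; hi = [100 85 70 55 40];
Art2 = []; labels2 = [];
for i = 1:5
    pts = lo(i) + (hi(i) - lo(i)) * rand(50, 3);
    Art2 = [Art2; pts]; labels2 = [labels2; i * ones(50, 1)];
end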

Kao et al. [10] summarized the details of these nine data sets to aid understanding of their characteristics and further analysis. The number of inputs, classes and data vectors of the different data sets are shown in Table 1.

Table 1 Characteristics of data set

In this section, the overall results of the proposed algorithm are discussed. The performance of the proposed algorithm is compared with that of K-means, PSO, NM-PSO, K-PSO, K-NM-PSO, LDWPSO and ALDWPSO, and the clustering quality is also compared. The criteria for evaluating the quality are:

Sum of intra-cluster distances: the sum of the distances between each data object and the centre of its cluster. The smaller this sum, the higher the quality of the clustering.

Error rate: the total number of misplaced data objects divided by the total number of data objects, according to Eq. (6):

$$ ER = \left( \sum\limits_{i = 1}^{n} \left( \text{if}\; A_{i} = B_{i} \;\text{then}\; 0 \;\text{else}\; 1 \right) / n \right) \times 100 $$
(6)

Here n is the total number of data objects, and Ai and Bi are the class labels of the ith data object before and after clustering, respectively.
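
A small MATLAB example of the error-rate computation of Eq. (6), using illustrative labels that reproduce the 2-out-of-5 case of Table 3, is given below.

A = [1 1 2 2 2];              % class labels before clustering (illustrative)
B = [1 1 2 1 1];              % cluster labels after clustering (illustrative)
n = numel(A);
ER = sum(A ~= B) / n * 100;   % 2 of 5 objects are misplaced, so ER = 40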

The program has been written in MATLAB. The results of the proposed algorithm are averages over 20 simulation runs.

Kao et al. [10] and Yang et al. [4] used 10 × N iterations per run for every data set in an N-dimensional problem. The proposed algorithm, in contrast, finds the global optimal solution in just 100 iterations instead of 10 × N. The experimental results also show that its convergence is very fast and that it is more efficient and effective than K-means, K-PSO, K-NM-PSO, LDWPSO and ALDWPSO.

The intra-cluster distances of all the algorithms for the nine data sets are summarized in Table 2. The reported values are the average intra-cluster distance and its standard deviation over 20 simulation runs. The results show that, for all the experimental data sets, BR-APSO gives better results, with small differences between the averages and smaller standard deviations, than the other seven algorithms. The proposed algorithm is more effective and faster at finding the global optimum than PSO, NM-PSO, K-PSO, K-NM-PSO, LDWPSO and ALDWPSO. Table 3 shows an example of the error rate calculation; in this example the data points (2, 6) and (1, 7) are misplaced and the error rate is 2/5.

Table 2 Comparison of intra cluster distance for K-means, PSO, NM-PSO, K-PSO, K-NM-PSO, LDWPSO, ALDWPSO and BR-APSO
Table 3 Error rate calculations

Table 4 reports the error rates of all the algorithms over 20 simulation runs, giving the mean, the standard deviation and the best solution. For all data sets, the proposed algorithm has a smaller mean and standard deviation than the other seven algorithms, including K-means, NM-PSO, K-NM-PSO, LDWPSO and ALDWPSO.

Table 4 Comparison of error rate for K-means, PSO, NM-PSO, K-PSO, K-NM-PSO, LDWPSO, ALDWPSO and BR-APSO

The convergence graphs of BR-APSO, NM-PSO, K-PSO, K-NM-PSO, K-means and PSO are shown in Figs. 10 and 11. Figure 10a shows the convergence behaviour of the algorithms for the Art1 data set. Among all the algorithms, K-means converges fastest but becomes stuck in a local optimum. PSO and NM-PSO take about 50 iterations to converge to the global optimum, K-PSO and K-NM-PSO take about 10 iterations, whereas the proposed BR-APSO takes about 7 iterations. Figure 10b shows that BR-APSO, NM-PSO, K-PSO and K-NM-PSO correctly cluster the Art1 data set into four clusters, while Fig. 10c, d show that PSO clusters the data set with a 25 % error rate and K-means with a 25.67 % error rate.

Fig. 10 a Convergence graph of BR-APSO, K-means, PSO, K-NM-PSO for Art 1; b BR-APSO, NM-PSO, K-PSO and K-NM-PSO with 0 % error rate; c PSO with 25 % error rate; d K-means with 25.67 % error rate

Fig. 11 a Convergence graph of BR-APSO, K-means, PSO, K-NM-PSO for Art 2; b BR-APSO, NM-PSO, K-PSO and K-NM-PSO with 0 % error rate; c PSO with 20 % error rate; d K-means with 40 % error rate

Figure 11a shows the convergence graphs of all the algorithms for the Art2 data set. K-means is fast but becomes stuck in a local minimum. PSO and NM-PSO take about 80 iterations to find the global optimum, whereas BR-APSO takes about 10 iterations. Figure 11b, c, d show the clustering results: BR-APSO, K-PSO, NM-PSO and K-NM-PSO cluster the data set correctly into five clusters, whereas PSO shows a 20 % error rate and K-means a 40 % error rate.

6 Conclusion

This paper has proposed a cluster analysis algorithm based on PSO, named BR-APSO. The performance of the algorithm was validated on nine data sets using the intra-cluster distance criterion in N-dimensional Euclidean space. K-means and the other algorithms suffer from sensitivity to the initial centre selection and from slow convergence near the global optimum. The experimental results on the nine data sets show that the proposed algorithm has better performance and a faster convergence rate than K-means, K-PSO, K-NM-PSO, LDWPSO and ALDWPSO, as well as the lowest error rate among these algorithms.