1 Introduction

Data mining techniques extract knowledge from large amounts of data. These techniques include classification, clustering, association rule mining, etc. Cluster analysis is an unsupervised technique that groups data without knowledge of class labels. Clustering is applied in many application areas such as biology, security, business intelligence and web search [1]. Clustering can be divided into two categories: hard and soft clustering. In hard clustering, an object can belong to only a single cluster; in soft clustering, the same object can belong to more than one cluster.

Clustering algorithms are classified into two categories: partitional and hierarchical. Partitional clustering algorithms form the clusters by partitioning the data objects into groups, whereas hierarchical clustering algorithms form the clusters by hierarchical decomposition of the data objects. The K-Means algorithm is a partitional clustering algorithm and is the most widely used owing to its simplicity and efficiency. It chooses the initial centroids randomly from the data objects and uses the Euclidean distance to measure the distance between each data object and its cluster centroid. The K-Means algorithm may converge to a local optimum because of this random selection of initial centroids.

A number of optimization algorithms have been developed to provide a global optimum solution. Optimization algorithms are categorized into heuristic and metaheuristic. Heuristic means ‘to find’ or ‘to discover by trial and error’, and ‘meta’ means ‘beyond’ or ‘higher level’ [2]. Some of the nature-inspired metaheuristic optimization algorithms are the Genetic Algorithm [3, 4], Ant Colony Optimization [5], Simulated Annealing (SA) [6], Particle Swarm Optimization [7, 8], Tabu Search [9, 10], Cat Swarm Optimization [11], Artificial Bee Colony [12,13,14], Cuckoo Search Algorithm [15, 16], Gravitational Search Algorithm [17], Firefly Algorithm [18], Bat Algorithm [19], Wolf Search Algorithm [20] and Krill Herd [21].

The Crow Search Algorithm (CSA) is a population-based metaheuristic optimization algorithm introduced by Alireza Askarzadeh [22]. The algorithm simulates the intelligent behaviour of crows, which are considered among the world’s most intelligent birds. It is based on how crows find the hidden storage positions of excess food. Finding a food source hidden by another crow is not an easy task, because if a crow notices that it is being followed, it tries to fool the follower by moving to another position.

To overcome the local optimum problem of K-Means, this paper proposes a new clustering algorithm, called CSAK Means, that hybridizes the Crow Search Algorithm with the K-Means clustering algorithm.

The organization of this paper is as follows. Section 2 describes related research in the literature. Section 3 describes the K-Means clustering algorithm and section 4 discusses the CSA. Section 5 describes the proposed CSAK Means clustering algorithm. The experimental analysis is discussed in section 6. Conclusions and future work are provided in section 7.

2 Related works

In this section, some optimization-algorithm-based approaches to clustering problems and hybridizations of optimization algorithms with K-Means are discussed.

An Ant Colony Optimization approach for the clustering problem was given in [23]. An SA approach for the clustering problem was proposed in [24]. A Particle Swarm Optimization approach was given in [25]. A Tabu Search approach was proposed in [26]. Artificial Bee Colony Optimization approaches were given in [27, 28]. A Cat Swarm Optimization algorithm for clustering was proposed in [29].

A Genetic Algorithm combined with K-Means was developed in [30]. A hybrid clustering algorithm based on K-Means and the ant colony algorithm was proposed in [31]. Cluster analysis with K-Means and SA was introduced in [32]. A K-Means clustering algorithm based on Particle Swarm Optimization was proposed in [33, 34]. A Tabu-Search-based K-Means was developed in [35]. An Artificial-Bee-Colony-based K-Means algorithm was proposed in [36]. A combination of the Gravitational Search Algorithm with K-Means was introduced in [37]. The Firefly Algorithm combined with K-Means was proposed in [38], and the Bat Algorithm combined with K-Means was proposed in [39]. The Wolf Search Algorithm, Cuckoo Search, Bat Algorithm, Firefly Algorithm and Ant Colony Optimization integrated with K-Means were introduced in [40].

These algorithms attempt to solve the local optimum problem of K-Means, but they suffer from low-quality results, low convergence speed, complicated operators, complex structures and parameter-setting issues.

3 K-Means Clustering Algorithm

K-Means is the most widely used and easiest-to-implement clustering algorithm. It partitions the data objects into a predefined number K of groups by assigning each data object to its closest centroid. The main objective of K-Means clustering is to minimize the total intra-cluster distance, i.e., the squared error function, which is calculated using Eq. (1):

$$\begin{aligned} \sum \limits _{j=1}^K\sum \limits _{i=1}^N \left\| x_i^{(j)}-c_j \right\| ^2. \end{aligned}$$
(1)

A dataset consists of N objects \(x_i\), \(i=1, 2, \ldots , N\), each with D features.

The K-Means clustering algorithm is described as follows:

  1. Input the number of clusters K.

  2. Randomly select the K initial centroids \(c_j\), \(j=1, 2, \ldots , K\), from the data objects.

  3. Find the distance between each of the K cluster centroids and the data objects using Eq. (2):

     $$\begin{aligned} dis(x_{i},c_{j})=\sqrt{\sum \limits _{l=1}^{D}(x_{il}-c_{jl})^2}. \end{aligned}$$
     (2)

  4. Find the minimum distance and assign each data object to its nearest cluster.

  5. Update the centroids using Eq. (3), i.e., calculate the mean of all data objects assigned to each cluster:

     $$\begin{aligned} c_j=\frac{1}{N_j}\sum \limits _{{x_i}\in {s_j}}x_i \end{aligned}$$
     (3)

     where \(s_j\) is the set of objects assigned to cluster j and \(N_j\) is its size.

The K-Means algorithm is terminated when one of the following conditions is satisfied: (i) the average change in the centroids falls below a threshold, (ii) the maximum number of iterations is reached or (iii) there is no change in the cluster membership of the objects.

The main features of K-Means clustering are as follows: (i) it is simple and easy to implement and (ii) it can handle large numbers of data objects efficiently. Its main issues are as follows: (i) it needs the number of clusters in advance, (ii) it handles numeric data only and (iii) it can produce local optimum solutions.
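For concreteness, a minimal NumPy sketch of the procedure above is given below. The experiments in section 6 were run in Matlab; this Python version is illustrative only, and its termination test checks only the change in cluster membership.

```python
import numpy as np

def kmeans(X, K, max_iter=100, rng=None):
    """Minimal K-Means: X is an (N, D) data matrix, K the number of clusters."""
    rng = np.random.default_rng(rng)
    # Step 2: choose K initial centroids randomly from the data objects.
    centroids = X[rng.choice(len(X), size=K, replace=False)]
    labels = np.full(len(X), -1)
    for _ in range(max_iter):
        # Steps 3-4: Euclidean distance to each centroid (Eq. 2),
        # then assign every object to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                                  # no change in membership
        labels = new_labels
        # Step 5: recompute each centroid as the mean of its members (Eq. 3).
        centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                              else centroids[j] for j in range(K)])
    sse = (dists.min(axis=1) ** 2).sum()           # squared error, Eq. (1)
    return labels, centroids, sse
```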

4 CSA

The principles of CSA are as follows: (i) crows live in groups, (ii) they memorize the positions of their food-hiding places, (iii) they follow each other to steal food and (iv) they protect their food sources.

The flock size is N in a D-dimensional environment, and the position of crow i in the search space at iteration iter is specified as \(x^{i,iter}\), \(i=1, 2, \ldots , N\); \({\textit{iter}}=1, 2, \ldots , {\textit{itermax}}\), where itermax is the maximum number of iterations. Each crow has a memory in which it remembers the position of its hiding place: at each iteration, the hiding place of crow i is specified by \(m^{i,iter}\), the best position it has obtained so far. Metaheuristic algorithms should provide a good balance between diversification and intensification; in CSA, this balance is controlled by the Awareness Probability (AP) parameter.

The CSA is described as follows (a minimal code sketch is given after the list):

  1. Initialize the parameters: flock size N, maximum number of iterations itermax, Flight Length FL and Awareness Probability AP.

  2. Initialize the positions of the crows randomly in the D-dimensional search space.

  3. Initialize the memory of each crow with its initial position.

  4. Evaluate the position of each crow.

  5. While iter < itermax:

    (a) for each crow i:

      i. randomly choose one of the crows to follow (say crow j);

      ii. if crow j does not know that crow i is following it, the new position of crow i is obtained using Eq. (4); if crow j does know that crow i is following it, it fools crow i and the new position of crow i is a random position:

        $$\begin{aligned} x^{i,iter+1}={\left\{ \begin{array}{ll} x^{i,iter} + r_i \times FL^{i,iter} \times (m^{j,iter}-x^{i,iter}), & r_j \geqslant AP^{j,iter} \\ \text{a random position}, & \text{otherwise} \end{array}\right. } \end{aligned}$$
        (4)

        where \(r_i\) and \(r_j\) are random numbers uniformly distributed in [0, 1];

      iii. check the feasibility of the new position; if the new position of the crow is feasible, its position is updated; otherwise, the crow stays in its current position;

      iv. evaluate the new position of each crow using the fitness function (Eq. (1) in this work);

      v. update the memory of each crow using Eq. (5):

        $$\begin{aligned} m^{i,iter+1}={\left\{ \begin{array}{ll} x^{i,iter+1}, & f(x^{i,iter+1}) \text{ is better than } f(m^{i,iter}) \\ m^{i,iter}, & \text{otherwise.} \end{array}\right. } \end{aligned}$$
        (5)

  6. End while.
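A minimal Python sketch of this loop for a generic minimization problem follows; the box bounds and parameter defaults here are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def crow_search(fitness, dim, n_crows=20, itermax=100, fl=2.0, ap=0.1,
                lo=0.0, hi=1.0, rng=None):
    """Minimal CSA sketch that minimizes `fitness` over the box [lo, hi]^dim."""
    rng = np.random.default_rng(rng)
    x = rng.uniform(lo, hi, (n_crows, dim))          # step 2: random positions
    mem = x.copy()                                   # step 3: memory = positions
    mem_fit = np.array([fitness(m) for m in mem])    # step 4: evaluate
    for _ in range(itermax):                         # step 5
        for i in range(n_crows):
            j = rng.integers(n_crows)                # (a)i: crow j to follow
            if rng.random() >= ap:                   # crow j unaware: Eq. (4)
                new = x[i] + rng.random() * fl * (mem[j] - x[i])
            else:                                    # crow j aware: random move
                new = rng.uniform(lo, hi, dim)
            if np.all((new >= lo) & (new <= hi)):    # (a)iii: feasibility check
                x[i] = new
            f = fitness(x[i])                        # (a)iv: evaluate
            if f < mem_fit[i]:                       # (a)v: memory update, Eq. (5)
                mem[i], mem_fit[i] = x[i].copy(), f
    best = mem_fit.argmin()
    return mem[best], mem_fit[best]
```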

5 Proposed algorithm

The K-Means clustering algorithm is easy to implement and handles large datasets efficiently; its main drawback is that it can produce local optimum solutions. To obtain a global optimum solution, K-Means can be combined with a global optimization algorithm. CSA is a metaheuristic global optimization algorithm, and in this section it is combined with the K-Means algorithm to form the proposed CSAK Means algorithm.

The proposed CSAK Means algorithm is described as follows (a code sketch is given after figure 1):

  1. Input the number of clusters K, the flock size N, the maximum number of iterations maxiter, the flight length FL and the awareness probability AP.

  2. Initialize the positions of the N crows and the memory M of the crows.

  3. For each crow, generate a matrix of size K×D with random numbers, where D is the number of features in the dataset; the maximum value of the random numbers is the total number of instances in the dataset.

  4. Encode the random numbers with the data objects. Each row then specifies the K cluster centres for the clustering algorithm. For example, if \(K=3\) and \(D=4\), a single row looks as shown in figure 1.

  5. Initialize the memory of each crow with its initial position, because initially the crows hide their food at their initial positions.

  6. Evaluate the fitness of the initial position of each crow using Eq. (1).

  7. Initialize the fitness of the memory of each crow with the fitness of its position.

  8. Update the positions of the crows:

    (a) while iteration \(\le \) maxiter:

      i. for each crow i:

        A. randomly choose one of the crows to follow (say crow j);

        B. if crow j does not know that crow i is following it, the new position of crow i is obtained using Eq. (4);

        C. if crow j does know that crow i is following it, the new position of crow i is a random position;

        D. check the feasibility of the new position; if the new position of the crow is feasible, its position is updated; otherwise, the crow stays in its current position;

      ii. evaluate the fitness of the new positions of the crows using Eq. (1);

      iii. update the memory of the crows using Eq. (5);

    (b) end while.

  9. Calculate the Euclidean distance from each data object to the centroids of the best solution obtained by CSA and assign each object to the nearest centroid.

Figure 1. Encoding.
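To make the encoding and the hybrid loop concrete, a NumPy sketch of the proposed procedure is given below. It follows the steps above, but the mapping of continuous CSA moves back to valid object indices (rounding and clipping) is our assumption, since the paper does not spell it out; parameter defaults are likewise illustrative.

```python
import numpy as np

def sse(X, centroids):
    """Fitness, Eq. (1): total squared distance to the nearest centroid."""
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return (d.min(axis=1) ** 2).sum()

def csak_means(X, K, n_crows=20, maxiter=100, fl=2.0, ap=0.1, rng=None):
    rng = np.random.default_rng(rng)
    N, _ = X.shape
    # Steps 2-4: each crow is a vector of K random object indices; decoding
    # replaces every index with that object's feature values (figure 1),
    # giving a K x D matrix of candidate cluster centres.
    decode = lambda p: X[p]
    pos = rng.integers(N, size=(n_crows, K))
    mem = pos.copy()                                       # step 5
    mem_fit = np.array([sse(X, decode(p)) for p in mem])   # steps 6-7
    for _ in range(maxiter):                               # step 8
        for i in range(n_crows):
            j = rng.integers(n_crows)                      # A: crow to follow
            if rng.random() >= ap:                         # B: Eq. (4) move
                new = pos[i] + rng.random() * fl * (mem[j] - pos[i])
            else:                                          # C: random position
                new = rng.integers(N, size=K)
            # D: keep the move feasible by rounding back to valid indices
            # (this mapping is an assumption, not specified in the paper).
            pos[i] = np.clip(np.rint(new), 0, N - 1).astype(int)
            f = sse(X, decode(pos[i]))
            if f < mem_fit[i]:                             # Eq. (5)
                mem[i], mem_fit[i] = pos[i].copy(), f
    # Step 9: assign every object to the nearest centroid of the best solution.
    best = decode(mem[mem_fit.argmin()])
    d = np.linalg.norm(X[:, None, :] - best[None, :, :], axis=2)
    return d.argmin(axis=1), best
```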

6 Experimental results

6.1 Datasets

To evaluate the performance of the proposed CSAK Means algorithm, six benchmark datasets, Iris, Wine, Glass, Breast Cancer, Contraceptive Method Choice (CMC) and Haberman’s Survival, are used. For each dataset, the number of instances and the number of classes are specified in table 1. These datasets were collected from the UCI Machine Learning Repository [41].

Table 1 Dataset details.

Iris: This dataset contains 150 samples of iris flowers from 3 different species: Setosa, Versicolour and Virginica, with 50 observations per species. The attributes of each sample are sepal length, sepal width, petal length and petal width.

Wine: This dataset contains the chemical analysis of wines grown in the same region but derived from three different cultivars. There are 13 quantities found in each of the three types of wines.

Glass: This dataset contains types of glass, motivated by criminological investigation: glass left at the scene of a crime can be used as evidence if it is correctly identified. There are 10 quantities found in each of the six types of glass.

Wisconsin Breast Cancer: This dataset contains samples used to identify the type of breast cancer. The type is identified using 9 quantities found in each of the two classes of breast cancer.

CMC: This dataset contains samples of married women who were either not pregnant or did not know it at the time of interview. The problem is to predict the current contraceptive method choice (no use, long-term methods or short-term methods) of a woman based on her demographic and socio-economic characteristics. There are 9 quantities found for each of the three choices.

Haberman’s Survival: This dataset contains the cases from a study that was conducted between 1958 and 1970 at the University of Chicago’s Billings Hospital on the survival of patients who had undergone surgery for breast cancer. There are 3 quantities found in each of the two survival statuses.

6.2 Measures

The performance of CSAK Means is evaluated with internal and external measures. The internal measure used is the Silhouette, and the external measures used are Purity, Normalized Mutual Information, Rand Index and FMeasure. The convergence time and the time taken for each iteration are also compared across the algorithms, and ANOVA tests for statistical significance are performed for all algorithms.


6.2a Purity: Purity is an external evaluation measure of the quality of a clustering. It is calculated as the number of correct predictions divided by the total number of data objects, using Eq. (6):

$$\begin{aligned} purity(C,T)=\dfrac{1}{N}\sum \limits _{i=1}^K \max _j|{c_i\cap t_j}| \end{aligned}$$
(6)

where N is the total number of objects, K is the number of clusters, \(c_i\) is the set of objects in cluster i of the predicted clustering C and \(t_j\) is the set of objects with actual class label j.
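A direct NumPy implementation of Eq. (6), assuming integer-coded class labels, might look as follows:

```python
import numpy as np

def purity(labels_true, labels_pred):
    """Eq. (6): count each cluster's best-matching class, divide by N."""
    correct = 0
    for c in np.unique(labels_pred):
        members = labels_true[labels_pred == c]    # true labels inside cluster c
        correct += np.bincount(members).max()      # max_j |c_i ∩ t_j|
    return correct / len(labels_true)
```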


6.2b Normalized Mutual Information: Normalized Mutual Information (NMI) is an external measure to validate the quality of a clustering. It is an information-theoretic measure of how much information the predicted clusters and the actual classes share, normalized by their entropies. It is calculated using Eq. (7):

$$\begin{aligned} NMI(X,Y)=\dfrac{2I(X;Y)}{H(X)+H(Y)}. \end{aligned}$$
(7)

X is the actual class label, Y is the label predicted by the algorithm, H is the entropy and I(X;Y) is the mutual information between X and Y.
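In practice this can be computed with scikit-learn; with arithmetic averaging, its NMI matches Eq. (7). The `labels_true` and `labels_pred` arrays below are assumed to hold the actual and predicted labels:

```python
from sklearn.metrics import normalized_mutual_info_score

# With arithmetic averaging, scikit-learn's NMI equals Eq. (7):
# 2 * I(X; Y) / (H(X) + H(Y)).
nmi = normalized_mutual_info_score(labels_true, labels_pred,
                                   average_method='arithmetic')
```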


6.2c Rand Index: Rand Index is an external measure of the similarity between the actual labels and the predicted labels. The measure takes a value between 0 and 1: 0 indicates that the two clusterings do not agree on any pair of points and 1 indicates that they are exactly the same. The Rand Index is calculated using Eq. (8):

$$\begin{aligned} Rand Index=\frac{TP+TN}{TP+FP+FN+TN}. \end{aligned}$$
(8)

TP (True Positive) is the number of pairs of objects of the same class placed in the same cluster. TN (True Negative) is the number of pairs of objects of different classes placed in different clusters. FP (False Positive) is the number of pairs of objects of different classes placed in the same cluster. FN (False Negative) is the number of pairs of objects of the same class placed in different clusters.
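A straightforward, if quadratic, pair-counting sketch of Eq. (8) follows; recent versions of scikit-learn also provide `metrics.rand_score`, which computes the same quantity:

```python
from itertools import combinations

def rand_index(labels_true, labels_pred):
    """Eq. (8): fraction of object pairs on which both labelings agree."""
    agree, total = 0, 0
    for a, b in combinations(range(len(labels_true)), 2):
        same_class = labels_true[a] == labels_true[b]
        same_cluster = labels_pred[a] == labels_pred[b]
        agree += (same_class == same_cluster)   # counts both TP and TN pairs
        total += 1
    return agree / total
```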


6.2d FMeasure: FMeasure is an external measure of the accuracy of the clustering results. It is the harmonic mean of precision and recall and can be computed using Eq. (9):

$$\begin{aligned} FMeasure=2\times \frac{precision\times recall}{precision+recall}. \end{aligned}$$
(9)

Precision is the number of correct positive predictions divided by the total number of positive predictions, i.e., TP divided by the sum of TP and FP; the best precision is 1, whereas the worst is 0. It is calculated using Eq. (10):

$$\begin{aligned} precision=\frac{TP}{TP+FP}. \end{aligned}$$
(10)

Recall is the number of correct positive predictions divided by the total number of actual positives; the best recall is 1.0, whereas the worst is 0.0. It is calculated using Eq. (11):

$$\begin{aligned} recall=\frac{TP}{TP+FN}. \end{aligned}$$
(11)
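Given the pair counts TP, FP and FN defined above for the Rand Index, Eqs. (9)–(11) reduce to a few lines:

```python
def f_measure(tp, fp, fn):
    """Eqs. (9)-(11) over the pair counts from the Rand-index contingency."""
    precision = tp / (tp + fp)                             # Eq. (10)
    recall = tp / (tp + fn)                                # Eq. (11)
    return 2 * precision * recall / (precision + recall)   # Eq. (9)
```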

6.2e Silhouette: The silhouette is an internal measure of how similar an object is to its own cluster compared with other clusters; it combines both cohesion and separation. It is calculated using Eq. (12):

$$\begin{aligned} sil(i)=\frac{b_i-a_i}{\max (a_i,b_i)} \end{aligned}$$
(12)

where \(a_i\) is the average dissimilarity of object i to all other objects within its own cluster and \(b_i\) is the lowest average dissimilarity of object i to the objects of any other cluster (its neighbouring cluster).
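The mean silhouette over all objects can be obtained from scikit-learn; `X` and `labels_pred` below are assumed from the earlier sketches:

```python
from sklearn.metrics import silhouette_score

# Mean of sil(i) over all objects (Eq. 12); X is the data matrix and
# labels_pred the cluster assignment produced by the algorithm.
score = silhouette_score(X, labels_pred, metric='euclidean')
```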


6.2f ANOVA: “Analysis of Variance” is a statistical test that determines whether there is any statistically significant difference between the means of two or more groups. A one-way ANOVA is used to find out whether the group means differ significantly from one another or whether the groups are relatively the same.

The one-way ANOVA table has six columns: (i) source of variability, (ii) sum of squares (ss) of each source, (iii) degrees of freedom (df) of each source, (iv) mean square (MS) for each source, (v) F-statistic, the ratio of the MSs and (vi) probability, the corresponding p-value of F.
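As a sketch, a one-way ANOVA over the per-run fitness values can be run with SciPy; the `runs_*` arrays are hypothetical placeholders for each algorithm's 10 run results:

```python
from scipy import stats

# One-way ANOVA over per-run fitness values; each runs_* array is assumed
# to hold the 10 fitness values obtained by one algorithm.
f_stat, p_value = stats.f_oneway(runs_kmeans, runs_kmeanspp,
                                 runs_genetic_km, runs_pso_km, runs_csak)
print(f_stat, p_value)   # a p-value below 0.05 rejects equal means
```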

6.3 Results

The algorithms are implemented in Matlab R2012a on an Intel i5 at 2.30 GHz with 4 GB RAM. The K-Means, K-Means++, Genetic K-Means, PSOK Means and CSAK Means algorithms are executed in 10 distinct runs with the parameters specified in table 2. The parameter values for the Particle Swarm Optimization algorithm are those suggested in [42], and the values for the CSA are those suggested in [22].

Table 2 Algorithm-specific parameters.

The fitness values of K-Means, K-Means++, Genetic K-Means, PSOK Means and CSAK Means for all datasets are shown in tables 3–8. The ANOVA statistical test results are shown in tables 9–14. Figures 2–7 compare the convergence behaviour of all algorithms on each dataset. The boxplots of the silhouette values are shown in figures 8–13.

Table 3 Fitness, measures and computation time values of Iris Dataset.
Table 4 Fitness, measures and computation time values of Wine Dataset.
Table 5 Fitness, measures and computation time values of Glass Dataset.
Table 6 Fitness, measures and computation time values of Cancer Dataset.
Table 7 Fitness, measures and computation time values of CMC Dataset.
Table 8 Fitness, measures and computation time values of Survival Dataset.
Table 9 ANOVA test results of Iris Dataset.
Table 10 ANOVA test results of Wine Dataset.
Table 11 ANOVA test results of Glass Dataset.
Table 12 ANOVA test results of Cancer Dataset.
Table 13 ANOVA test results of CMC Dataset.
Table 14 ANOVA test results of Survival Dataset.

6.4 Discussion

Table 3 shows the fitness, measures and computation time values for the Iris Dataset. For the Iris Dataset, CSAK Means provides the best solution, and its standard deviation is also smaller than those of the other algorithms. The internal and external index values of CSAK Means are better than those of the other algorithms. The convergence time and the time per iteration of CSAK Means are higher than those of the other algorithms.

Table 4 shows the fitness, measures and computation time values for the Wine Dataset. For the Wine Dataset, CSAK Means provides the best solution, and its standard deviation is also smaller than those of the other algorithms. The internal and external index values of CSAK Means are better than those of the other algorithms. The convergence time and the time per iteration of CSAK Means are, respectively, lower and higher than those of the other algorithms except PSOK Means.

Table 5 shows the fitness, measures and computation time values for the Glass Dataset. For the Glass Dataset, K-Means++ provides the best solution. The internal measure (silhouette) of Genetic K-Means is better than those of the other algorithms, while the external index values of CSAK Means are better than those of the other algorithms. The convergence time and the time per iteration of CSAK Means are, respectively, lower and higher than those of the other algorithms.

Table 6 shows the fitness, measures and computation time values for the Cancer Dataset. For the Cancer Dataset, CSAK Means provides the best solution. The internal and external index values of CSAK Means are better than those of the other algorithms. The convergence time and the time per iteration of CSAK Means are, respectively, lower and higher than those of the other algorithms.

Table 7 shows the fitness, measures and computation time values for the CMC Dataset. For the CMC Dataset, K-Means++ provides the best solution, but the worst, average and standard deviation values of CSAK Means are better than those of the other algorithms. The internal measure (silhouette) of CSAK Means is better than those of the other algorithms. The external index values of CSAK Means and Genetic K-Means are the same, and these values are better than those of the other algorithms. The convergence time of CSAK Means is higher than those of the other algorithms except PSOK Means, and its time per iteration is higher than those of all the other algorithms.

Table 8 shows the fitness, measures and computation time values for the Survival Dataset. For the Survival Dataset, CSAK Means provides the best solution. The internal and external measure values of CSAK Means are better than those of the other algorithms. The convergence time and the time per iteration of CSAK Means are higher than those of the other algorithms.

Tables 9–14 show the ANOVA test results. The purpose of the ANOVA test is to check whether there is any significant difference between the accuracies of the algorithms. The null hypothesis for an ANOVA is that there are no significant differences among the groups, and the alternative hypothesis is that there is a significant difference. Here, in all cases the p-value (the Prob>F column) is small enough that the null hypothesis is rejected and the alternative hypothesis is accepted; this implies that the accuracies of the algorithms are not all equal.

Figure 2. Fitness values of Iris Dataset.

Figure 3. Fitness values of Wine Dataset.

Figure 4. Fitness values of Glass Dataset.

Figure 5. Fitness values of Cancer Dataset.

Figure 6. Fitness values of CMC Dataset.

Figure 7. Fitness values of Survival Dataset.

Figure 8. Boxplot view of Iris Dataset.

Figure 9. Boxplot view of Wine Dataset.

Figure 10. Boxplot view of Glass Dataset.

Figure 11. Boxplot view of Cancer Dataset.

Figure 12. Boxplot view of CMC Dataset.

Figure 13. Boxplot view of Survival Dataset.

7 Conclusion and future work

In this paper, a hybridization of the CSA and the K-Means clustering algorithm is proposed, and this new algorithm is called CSAK Means. The results of the proposed algorithm are compared with those of the K-Means, K-Means++, Genetic K-Means and PSOK Means algorithms. To evaluate the CSAK Means algorithm, the fitness function used is the mean square error criterion. The afore-mentioned experimental results show that CSAK Means outperforms the K-Means, K-Means++, Genetic K-Means and PSOK Means algorithms. In the Genetic Algorithm, three operators, namely selection, crossover and mutation, need to be applied. PSO needs four parameters, namely inertia weight, individual learning factor, social learning factor and maximum velocity. CSA needs only two parameters, AP and FL. Each optimization algorithm has its own parameters, and it is tedious to fix the optimum values for each of them. In future, this work will be extended to determine the number of clusters dynamically.