1 Introduction

Cluster analysis has become an important technique in exploratory data analysis, pattern recognition, machine learning, image segmentation, neural computing, and other engineering fields [1]. Among clustering methods, the K-means algorithm is popular and has been successfully applied to many practical clustering problems [2, 3]. However, because the objective function of K-means is not convex, the algorithm may become trapped in local minima [4, 5]. Evolutionary algorithms such as the genetic algorithm (GA) [6] and particle swarm optimization (PSO) [7] have therefore been introduced to address this problem and have been widely applied to various clustering tasks [8–17].

In this paper, the recently proposed discrete heuristic optimization algorithm named the multivariant optimization algorithm (MOA) is adopted to solve clustering problems. In MOA, a search individual is called an atom. The main idea of MOA is to search the solution space through alternating global–local search iterations: global exploration atoms explore the whole solution space to locate potential areas, and then multiple local exploitation groups with different population sizes are allotted to these potential areas for different levels of local exploitation. The better atoms generated during the optimization process are recorded in a data structure made up of a queue and several stacks, whereas the worse ones are squeezed out through competition. For a clustering problem, an array (atom) recording all cluster centers can be regarded as a candidate solution, so we only need to search for the optimal solution in the whole solution space; the clustering problem is thus converted into an optimization problem, and MOA can be applied to search for the optimal solution containing the cluster centers of all classes.

To simplify the description, data clustering based on MOA, K-means, GA, and PSO is referred to as MOA-clustering, K-clustering, GA-clustering, and PSO-clustering respectively in this paper.

To evaluate the performance of MOA-clustering, comparative experiments with MOA-clustering, K-clustering, GA-clustering and PSO-clustering are conducted on six datasets. The experimental results demonstrate that MOA-clustering is able to locate the optimal solution, performs competitively with GA-clustering and PSO-clustering, and outperforms K-clustering in terms of accuracy and stability; it is therefore an effective and feasible method for achieving high accuracy and stability in clustering problems.

2 Multivariant optimization algorithm

With the development of computer technology, the memory capacity and speed of computers have improved rapidly; as a result, a novel discrete evolutionary optimization algorithm that makes full use of computer memory is proposed in this paper. The idea of the proposed algorithm is inspired by the characteristics of computer data structures, especially the ordered doubly-linked list. In the MOA search process, global search atoms explore the whole space to locate potential areas, local search atoms then exploit each potential local area in detail to improve the results, and the better atoms are recorded in a structure table. After sufficient global–local search iterations under the guidance of the structure table, multiple optimal solutions are recorded in the queue of the structure table.

The search process of MOA is carried out by search atoms under the guidance of a structure table. For a minimization problem, the structure table illustrated in Fig. 1 is designed according to the following rules (a code sketch of such a structure follows the list):

Fig. 1 Structure table of multivariant optimization algorithm

  1. Global search atoms are recorded in the queue.

  2. The fitness values of atoms recorded in the queue are increasing from the front to the rear.

  3. Each node in the queue has a stack pointer which points to a stack; the depth of the stacks is descending from the left to the right.

  4. Local search atoms generated in the neighborhood of the i-th global atom are recorded in the i-th stack. The fitness values of atoms recorded in each stack are increasing from the bottom to the top.
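
For illustration only, such a structure table can be realized with ordinary containers. The minimal Python sketch below assumes a minimization problem and a user-supplied `fitness` function mapping an atom to a scalar; the class and method names are ours, not the authors' implementation.

```python
class StructureTable:
    """Minimal sketch of the MOA structure table (Fig. 1): a queue of global
    atoms sorted by ascending fitness, where the i-th queue node owns a stack
    recording the local atoms generated around that global atom."""

    def __init__(self, queue_len, fitness):
        self.queue_len = queue_len
        self.fitness = fitness          # maps an atom to a scalar (minimized)
        self.queue = []                 # each node: [fitness, atom, stack]

    def stack_depth(self, i):
        # Stacks get shallower from the front of the queue to the rear,
        # e.g. depths queue_len-1, queue_len-2, ..., 0 for nodes 0, 1, ...
        return max(self.queue_len - 1 - i, 0)

    def try_insert_global(self, atom):
        # Rules 1-2: keep the queue sorted by fitness and of fixed length.
        self.queue.append([self.fitness(atom), atom, []])
        self.queue.sort(key=lambda node: node[0])
        del self.queue[self.queue_len:]

    def try_insert_local(self, i, atom):
        # Rule 4: local atoms around the i-th global atom go into stack i.
        node = self.queue[i]
        node[2].append((self.fitness(atom), atom))
        node[2].sort(key=lambda entry: entry[0])
        del node[2][self.stack_depth(i):]
        # Swap if the best local atom beats its global atom (Step 5 below).
        if node[2] and node[2][0][0] < node[0]:
            best_f, best_atom = node[2][0]
            node[2][0] = (node[0], node[1])
            node[0], node[1] = best_f, best_atom
```

Re-sorting the queue after such a local/global swap is omitted here for brevity.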

In the MOA algorithm, an atom stands for a candidate solution of the optimization problem. Atoms are of two types: global search atoms and local search atoms. Global search atoms are generated uniformly at random in the solution space. Local search atoms are generated in the neighborhood of a global atom recorded in a queue node, which is regarded as the center of a potential area. In a D-dimensional solution space, a global search atom, denoted atom_g, is generated according to Eq. (1):

$$atom_{g} = \left\{ {unifrnd(min_{1} ,max_{1} ), \ldots ,unifrnd(min_{D} ,max_{D} )} \right\}$$
(1)

where min_i and max_i are the lower and upper bounds of the i-th dimension of the solution space, determined by the minimum and maximum of the i-th dimension of the dataset. The function unifrnd(min_i, max_i) returns a random number uniformly distributed on the interval from min_i to max_i, so atom_g is a vector.
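
For example, Eq. (1) can be realized by drawing each component from a uniform distribution over the per-dimension bounds of the data; the sketch below is our own illustration (the helper name and the NumPy usage are assumptions, not part of the paper):

```python
import numpy as np

def generate_global_atom(lower, upper, rng=None):
    """Eq. (1): draw one atom uniformly at random inside the box
    [min_1, max_1] x ... x [min_D, max_D] spanned by the data."""
    rng = rng if rng is not None else np.random.default_rng()
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    return rng.uniform(lower, upper)        # vector atom_g of length D

# Example: bounds taken from a small hypothetical dataset, one row per point.
data = np.array([[5.1, 3.5], [4.9, 3.0], [6.2, 3.4]])
atom_g = generate_global_atom(data.min(axis=0), data.max(axis=0))
```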

Local search atoms, denoted atom_l, are generated in the neighborhood of their corresponding global atom with radius R according to Eq. (2):

$$atom_{l} = \begin{cases} atom_{g} + r \cdot R \cdot \dfrac{[h_{1}, \ldots, h_{D}]}{\sqrt{\sum_{i=1}^{D} h_{i}^{2}}} & \left(\sum_{i=1}^{D} h_{i}^{2} \ne 0\right) \\[10pt] atom_{g} & \left(\sum_{i=1}^{D} h_{i}^{2} = 0\right) \end{cases}$$
(2)

where atom_g and R are the center vector and radius of the neighborhood respectively, [h_1, …, h_D] is a vector of random numbers uniformly distributed on [−1, 1], and \([h_{1}, \ldots, h_{D}]/\sqrt{\sum_{i=1}^{D} h_{i}^{2}}\) is a unit vector, so \(R \cdot [h_{1}, \ldots, h_{D}]/\sqrt{\sum_{i=1}^{D} h_{i}^{2}}\) lies on the sphere of radius R, and r is a random number between 0 and 1. The radius R is defined as follows: let f(x) be defined on D and let there exist a global optimal point atom_g \(\in\) D. We say that f(x) has a limit as x tends to atom_g provided that there exists an optimum A \(\in\) R such that for every ε > 0, no matter how small, there exists R > 0 such that for every x \(\in\) D,

$$\begin{cases} 0 < |x - atom_{g}| < R \;\Rightarrow\; |f(x) - A| < \varepsilon \\[6pt] \lim\limits_{iterations \to \infty} P\big(|f(x) - A| < \varepsilon\big) = 1 \end{cases}$$

Further, if such an optimum A exists, we say that A is the limit of f(x) as x tends to atom_g. In addition, f(x) will converge to the optimum A with probability 1 when the number of iterations is large enough.
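
Correspondingly, Eq. (2) displaces atom_g by a random fraction r of the radius R along a uniformly chosen direction; a hedged sketch (again our own naming, building on the helper above):

```python
import numpy as np

def generate_local_atom(atom_g, R, rng=None):
    """Eq. (2): sample a point within the radius-R neighborhood of atom_g."""
    rng = rng if rng is not None else np.random.default_rng()
    h = rng.uniform(-1.0, 1.0, size=atom_g.shape)   # [h_1, ..., h_D]
    norm = np.sqrt(np.sum(h ** 2))
    if norm == 0.0:                                  # second branch of Eq. (2)
        return atom_g.copy()
    r = rng.uniform(0.0, 1.0)                        # random step length in [0, 1]
    return atom_g + r * R * h / norm
```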

The MOA algorithm searches the solution space by the following steps, which are illustrated in Fig. 2 (a code sketch of the overall search loop follows the steps):

Fig. 2 Flowchart of the MOA algorithm

Step 1: Set the initial parameters of the MOA algorithm: the length of the queue, the depth of each stack, the scope of the neighborhood, and the maximum number of iterations.

Step 2: Generate and evaluate global search atoms. At the beginning of each iteration, a number of new global search atoms are generated and their fitness values are evaluated.

Step 3: Update the queue. Compare the fitness value of each new global atom with the atoms in the queue; if a new atom is good enough to be recorded in the queue, a new queue node recording this atom is inserted into the queue following the same logic as an ordered doubly-linked list, and the rear node is deleted to keep the length of the queue fixed.

Step 4: Generate and evaluate local search atoms. For each stack, local search atoms, whose number equals the depth of the stack, are generated in the neighborhood of the corresponding global atom in the queue, and their fitness values are evaluated by the fitness function.

Step 5: Update each stack. Compare the fitness value of each new local atom with the atoms in the corresponding stack; if a new atom is good enough to be recorded in the stack, a new stack node recording this atom is inserted into the stack. If the number of nodes in the stack exceeds the depth of the stack, the redundant nodes are deleted. If the best atom in the i-th stack is better than the i-th global atom, the two are swapped.

Step 6: Check the termination criterion. If the termination criterion is satisfied, the algorithm stops; otherwise, return to Step 2.
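
Putting the steps together, the overall search loop might look as follows when built on the illustrative helpers sketched above; the number of global atoms generated per iteration is assumed equal to the queue length, which the paper does not state explicitly:

```python
import numpy as np

def moa_search(fitness, lower, upper, queue_len=10, R=0.5, max_iter=200, rng=None):
    """Minimal MOA loop: alternate global and local search under a structure table."""
    rng = rng if rng is not None else np.random.default_rng()
    table = StructureTable(queue_len, fitness)              # Step 1: initial parameters
    for _ in range(max_iter):                                # Step 6: fixed iteration budget
        for _ in range(queue_len):                           # Steps 2-3: global search
            table.try_insert_global(generate_global_atom(lower, upper, rng))
        for i in range(len(table.queue)):                    # Steps 4-5: local search
            atom_g = table.queue[i][1]
            for _ in range(table.stack_depth(i)):
                table.try_insert_local(i, generate_local_atom(atom_g, R, rng))
    best_fit, best_atom, _ = table.queue[0]                  # best solution sits at the queue front
    return best_atom, best_fit
```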

3 Application of multivariant optimization algorithm

3.1 Encoding the search atoms

From the above description of MOA, the main objects to be encoded are the search atoms, including global and local search atoms. In the context of clustering, a single atom represents all cluster centers. That is, in an n-dimensional space, the i-th atom is encoded as follows:

$$x^{i} = (x_{11}^{i}, x_{12}^{i}, \ldots, x_{1n}^{i},\; x_{21}^{i}, x_{22}^{i}, \ldots, x_{2n}^{i}, \ldots,\; x_{K1}^{i}, x_{K2}^{i}, \ldots, x_{Kn}^{i})$$

where x_{jm}^{i} refers to the m-th component of the j-th cluster center vector in the i-th atom and K is the number of clusters, so the length of each atom is K × n.

Figure 3 is an example of the encoding of a single atom at the time the search atoms are produced in MOA. Let n = 2 and K = 3, i.e., the search space is two-dimensional and the number of clusters is three. The vector of this atom represents three cluster centers [(x_1, y_1), (x_2, y_2), (x_3, y_3)].

Fig. 3 The encoding of a single atom

According to this encoding, every atom produced during the search stands for a candidate solution, so MOA can be used to search for the optimal solution in the solution space, treating clustering as an optimization problem.
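
As a concrete illustration, with K = 3 clusters in an n = 2 dimensional space an atom is simply the K × n = 6 center coordinates laid out end to end; the encode/decode helpers below are our own sketch, consistent with Fig. 3:

```python
import numpy as np

K, n = 3, 2                                   # three clusters in a 2-D space

def encode(centers):
    """Flatten a (K, n) array of cluster centers into one atom of length K*n."""
    return np.asarray(centers, dtype=float).reshape(K * n)

def decode(atom):
    """Recover the (K, n) cluster centers from an atom."""
    return np.asarray(atom, dtype=float).reshape(K, n)

# Example: the atom for centers (x1, y1), (x2, y2), (x3, y3).
atom = encode([[1.0, 2.0], [4.0, 0.5], [7.5, 3.0]])   # -> length-6 vector
```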

3.2 Designing the evaluation function

The design of the evaluation function is a key issue in applying MOA to clustering. Its main role is to evaluate whether an atom obtained during the search represents better cluster centers; if a new atom is better than an old one during the updating process, the old one is replaced by the new one.

In this paper, we design the evaluation function according to the following two criteria (an illustrative code sketch follows the list):

  1. The inner-cluster distance as defined in Eq. (3), i.e. the distance between data vectors and their corresponding cluster center within a cluster, where the objective is to minimize the inner-cluster distance.

    $$J_{1} = \sum_{j=1}^{K} \sum_{\forall x_{i} \in Z_{j}} \left\| x_{i} - z_{j} \right\|^{2}$$
    (3)

    where z_j is the j-th cluster center, K is the number of clusters, and x_i denotes a data point belonging to cluster Z_j.

  2. The inter-cluster distance as defined in Eq. (4), i.e. the distance between all cluster centers, where the objective is to maximize the distance between clusters.

    $$J_{2} = \sum_{i=1}^{K} \sum_{j=i+1}^{K} \left\| z_{i} - z_{j} \right\|^{2}$$
    (4)

    where z_i and z_j are the i-th and j-th cluster centers respectively, and K is the number of clusters.

    According to these two criteria, the evaluation function is designed as defined in Eq. (5), where the objective is to minimize the value of the evaluation function.

    $${\text{fitness}} = w_{1} \times J_{1} - w_{2} \times J_{2}$$
    (5)

    where w_1 and w_2 are the weight coefficients of J_1 and J_2 respectively, which determine the influence of J_1 and J_2 in the evaluation. If w_1 is larger than w_2, J_1 largely determines the result of the evaluation. In a series of experiments, the clustering results were relatively stable and better when w_1 = 0.8 and w_2 = 0.2, so these values are used in this paper.
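
A possible realization of the evaluation function of Eq. (5), with J1 and J2 computed as in Eqs. (3) and (4) and a hard nearest-center assignment as used in Sect. 3.3 (variable names are ours):

```python
import numpy as np

def clustering_fitness(atom, data, K, w1=0.8, w2=0.2):
    """Eq. (5): fitness = w1 * J1 - w2 * J2 (smaller is better)."""
    n = data.shape[1]
    centers = atom.reshape(K, n)
    # Assign every point to its nearest center (Euclidean distance, Eq. (6)).
    dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # J1: summed squared distance of points to their own center (Eq. (3)).
    J1 = np.sum(dists[np.arange(len(data)), labels] ** 2)
    # J2: summed squared distance between all pairs of centers (Eq. (4)).
    J2 = sum(np.sum((centers[i] - centers[j]) ** 2)
             for i in range(K) for j in range(i + 1, K))
    return w1 * J1 - w2 * J2
```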

3.3 MOA-clustering

After encoding the search atoms and designing the evaluation function, MOA-clustering is executed as follows (an illustrative code sketch follows the steps):

Step 1: Set the initial parameters of the MOA algorithm: the length of the queue, the depth of each stack, the scope of the neighborhood, and the maximum number of iterations.

Step 2: Generate and evaluate global search atoms. At the beginning of each iteration, a number of new global search atoms are generated randomly and encoded as described above and illustrated in Fig. 3. All data points x_i are assigned to their corresponding cluster centers z_j with the shortest distance between x_i and z_j according to the Euclidean distance defined in Eq. (6), and their fitness values are then evaluated according to Eq. (5):

$$D = \left\| x_{i} - z_{j} \right\|, \quad i = 1, 2, \ldots, N, \quad j = 1, 2, \ldots, K$$
(6)

where N is the number of data points in the dataset and K is the number of clusters.

Step 3: Update the queue. Following the principle that the minimum is the best, compare the fitness value of each new global atom with the atoms in the queue; if a new atom is better than the worst one in the queue, a new queue node recording this atom is inserted into the queue following the same logic as an ordered doubly-linked list, and the node recording the worst atom is deleted to keep the length of the queue fixed.

Step 4: Generate and evaluate local search atoms in the same way as in Step 2. For each stack, local search atoms, whose number equals the depth of the stack, are generated in the neighborhood of the corresponding global atom in the queue, and their fitness values are evaluated by the fitness function.

Step 5: Update each stack. Following the principle that the minimum is the best, compare the fitness value of each new local atom with the atoms in the corresponding stack; if a new atom is good enough to be recorded in the stack, a new stack node recording this atom is inserted into the stack. If the number of nodes in the stack exceeds the depth of the stack, the redundant nodes are deleted. If the best atom in the i-th stack is better than the i-th global atom, the two are swapped.

Step 6: Check the termination criterion. If the termination criterion is satisfied, the algorithm stops; otherwise, return to Step 2.

Step 7: Obtain the best atom from the structure table of MOA, which contains the optimal cluster center vector. All data points are then reassigned to their corresponding cluster centers according to Eq. (6), and finally the accuracy of the clustering result is calculated.
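
For illustration, these steps can be composed with the helpers sketched earlier; the driver below decodes the best atom into centers and reassigns every point by Eq. (6) (an illustrative composition, not the authors' implementation):

```python
import numpy as np

def moa_clustering(data, K, max_iter=200, R=0.5):
    """Steps 1-7 of Sect. 3.3 using the illustrative helpers sketched above."""
    fitness = lambda atom: clustering_fitness(atom, data, K)
    lower = np.tile(data.min(axis=0), K)      # per-dimension bounds, repeated for each center
    upper = np.tile(data.max(axis=0), K)
    best_atom, best_fit = moa_search(fitness, lower, upper,
                                     R=R, max_iter=max_iter)
    centers = best_atom.reshape(K, data.shape[1])
    # Eq. (6): assign each point to its nearest cluster center.
    dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    return centers, labels, best_fit
```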

4 Experiments

This section compares the clustering results of K-clustering, GA-clustering, PSO-clustering and MOA-clustering on six real-life datasets to verify the performance of MOA-clustering.

4.1 Datasets

Six experimental datasets, namely Haberman's Survival, Iris, Vertebral Column, Wisconsin Breast Cancer, Contraceptive Method Choice and Wine, are used to assess the performance of the respective clustering methods. All datasets are available at http://archive.ics.uci.edu/ml/index.html/, are listed in Table 1, and are described briefly as follows:

Table 1 Brief description of the six real-life datasets

  1. Haberman's Survival Dataset (n = 306, d = 3, k = 2) consists of 306 objects characterized by three features: age of patient at the time of operation, patient's year of operation, and number of positive axillary nodes detected. The dataset contains cases from a study conducted between 1958 and 1970 at the University of Chicago's Billings Hospital on the survival of patients who had undergone surgery for breast cancer. There are two categories in the data: patients who survived 5 years or longer (225 objects) and patients who died within 5 years (81 objects).

  2. Fisher's Iris Dataset (n = 150, d = 4, k = 3) consists of three different species of iris flowers: Iris setosa, Iris virginica and Iris versicolour. For each species, 50 samples were collected, each described by four features: sepal length, sepal width, petal length and petal width.

  3. Vertebral Column Dataset (n = 310, d = 6, k = 2) consists of 310 objects characterized by six biomechanical features: pelvic incidence, pelvic tilt, lumbar lordosis angle, sacral slope, pelvic radius and grade of spondylolisthesis. These features are used to classify orthopaedic patients into two classes: normal (100 objects) and abnormal (210 objects).

  4. Wisconsin Breast Cancer Dataset (n = 683, d = 9, k = 2) consists of 683 objects characterized by nine features: clump thickness, cell size uniformity, cell shape uniformity, marginal adhesion, single epithelial cell size, bare nuclei, bland chromatin, normal nucleoli and mitoses. There are two categories in the data: benign tumors (444 objects) and malignant tumors (239 objects).

  5. Contraceptive Method Choice Dataset (CMC) (n = 1,473, d = 9, k = 3) consists of 1,473 objects characterized by nine features: wife's age, wife's education, husband's education, number of children ever born, wife's religion, whether the wife is now working, husband's occupation, standard-of-living index and media exposure. The dataset is a subset of the 1987 National Indonesia Contraceptive Prevalence Survey. Three contraceptive-method classes appear in the data: no use (629 objects), long-term methods (333 objects) and short-term methods (511 objects).

  6. Wine Dataset (n = 178, d = 13, k = 3) consists of 178 objects characterized by 13 features: alcohol content, malic acid content, ash content, alkalinity of ash, magnesium concentration, total phenols, flavanoids, non-flavanoid phenols, proanthocyanins, color intensity, hue, OD280/OD315 of diluted wines, and proline. These features were obtained by chemical analysis of wines produced in the same region of Italy but derived from three different cultivars. The numbers of objects in the three categories are: class 1 (59 objects), class 2 (71 objects) and class 3 (48 objects).

4.2 Settings for clustering algorithms

To illustrate how the radius R and the number of iterations were selected for data clustering, experiments with a fixed number of iterations or a fixed radius R were carried out 10 times on three datasets (Iris, Cancer and Wine); the results are reported in Fig. 4 and Table 2, respectively. The parameters used in both experiments are as follows: the queue length of the upper-triangular structure table is 10 and the depth of the i-th stack is 10 − i, so the number of search atoms is 60. In the fixed-iteration experiments, the number of iterations is 200 while the radius R is varied from 0.1 to 10; in the fixed-radius experiments, R is set to 0.5, 3, 5, 7 and 9 with corresponding iteration counts of 200, 600, 800, 1,000 and 1,000 respectively. Figure 4 reveals that, with 200 iterations, the radius R can be chosen fairly arbitrarily, and the lower average clustering accuracy for larger radius R can be attributed to the limited number of iterations. From Table 2 it can be seen that, when the number of iterations increases along with the radius R, the average clustering accuracy is almost the same as that obtained with fewer iterations and a small radius R, and the standard deviation is also very small. This means that good and stable clustering accuracy can be obtained as long as the number of iterations is sufficient for the chosen, larger radius R. To balance clustering accuracy and computational cost in the experiments, the radius R is randomly selected as 0.5 from the interval [0.1, 1] and the number of iterations is set to 200.

Fig. 4 The changing curve of clustering accuracy with the change of local radius R on the Iris dataset (a), Cancer dataset (b) and Wine dataset (c) when the number of iterations is 200

Table 2 The average clustering accuracy and the standard deviation of accuracy for different combinations of radius R and number of iterations

The common control parameters of these algorithms are the population size (P) and the maximum number of generations (max_g). To compare K-clustering, GA-clustering, PSO-clustering and MOA-clustering fairly, all methods use the same common control parameter values, P = 60 and max_g = 200. Other control parameters of GA-clustering and PSO-clustering are given below.

For the genetic algorithm, we use the standard version with no elitism, a mutation probability of 0.05 and a crossover probability of 0.95 [8]. For PSO, we also use the standard version with an inertia weight of 0.7; the learning factors are set to 2 and no inertia correction is used [17]. A fixed population size of n = 60 is used in all simulations for all methods except K-clustering.

4.3 Experimental results and discussion

To evaluate the proposed algorithm, two criteria are used: the fitness as defined in Eq. (5) and the accuracy as defined in Eq. (7).

$$Accuracy = \left( \sum_{i=1}^{N} \begin{cases} 1, & A_{i} = A_{i}^{*} \\ 0, & A_{i} \ne A_{i}^{*} \end{cases} \right) \Big/ N \times 100\,\%$$
(7)

where N is the total number of data points, and A_i and A_i^* are the labels of the i-th data point before and after clustering, respectively.
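
Taken literally, Eq. (7) is the fraction of points whose assigned label matches the reference label; the small sketch below assumes the cluster indices have already been aligned with the reference labels, a step the paper does not detail:

```python
import numpy as np

def accuracy(true_labels, cluster_labels):
    """Eq. (7): percentage of points whose assigned label matches the reference."""
    true_labels = np.asarray(true_labels)
    cluster_labels = np.asarray(cluster_labels)
    return 100.0 * np.mean(true_labels == cluster_labels)
```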

The performance of MOA-clustering and the other clustering methods is compared using these two criteria. The fitness of all four algorithms on the six datasets is summarized in Table 3, including the best, average and worst fitness and the standard deviation of the fitness over 20 simulation runs. Figure 6 shows the standard deviation of the fitness as a bar chart. Table 4 summarizes the best, average and worst accuracy obtained by the four clustering algorithms on the six datasets over 20 simulation runs. Figure 7a–c shows the best, average and worst accuracy respectively as bar charts. The clustering results of the four algorithms on the Haberman's Survival dataset are shown in Fig. 5. Figure 5a shows the positions of the data points in 3-dimensional space before clustering, Fig. 5b shows the result of MOA-clustering, GA-clustering and PSO-clustering on this dataset with an accuracy of 51.96 %, and Fig. 5c presents the result of K-clustering with an accuracy of 24.18 %.

Table 3 The fitness of the clustering results obtained by the different clustering algorithms on the six datasets
Table 4 The accuracy of the clustering results obtained by the different clustering algorithms on the six datasets
Fig. 5 The results of the four algorithms on the Haberman's Survival dataset before (a) and after (b, c) clustering

According to the results in Table 3 and Fig. 6, K-clustering is very unstable: it has the smallest standard deviation on the Cancer dataset but the largest standard deviation on the Haberman's Survival, Iris and Wine datasets. PSO-clustering is not very stable either, because its standard deviation fluctuates strongly on the Vertebral Column and Cancer datasets. GA-clustering and the proposed MOA-clustering are relatively stable; moreover, MOA-clustering outperforms GA-clustering on the Vertebral Column, Iris, Cancer and CMC datasets in terms of the standard deviation of the fitness. On the whole, MOA-clustering is therefore more stable than the other algorithms.

Fig. 6 The standard deviation of fitness

The results in Table 4 and Fig. 7 clearly show that MOA-clustering is able to locate the optimal solution: the proposed method achieves the best accuracy on all datasets in comparison with the other algorithms, which means it can locate the best cluster centers. In terms of clustering accuracy, MOA-clustering clearly outperforms K-clustering on all datasets. Moreover, MOA-clustering performs competitively with GA-clustering and PSO-clustering on the Haberman's Survival and Wine datasets, and achieves even better accuracy than GA-clustering and PSO-clustering on the remaining datasets.

Fig. 7 The best accuracy (a), the average accuracy (b) and the worst accuracy (c)

To sum up, compared with K-means, GA and PSO in terms of the fitness and accuracy of the clustering results, MOA-clustering is capable of reaching high accuracy and stability in clustering problems.

5 Conclusion

This paper presents a new clustering method based on the multivariant optimization algorithm (MOA). Six real-life datasets are used to investigate the performance of MOA-clustering. The experimental results demonstrate that the proposed clustering algorithm is an effective and feasible method for achieving high accuracy and stability in clustering problems.