1 Introduction

Cluster analysis has become an important technique in exploratory data analysis, pattern recognition, machine learning, image segmentation, neural computing, and other engineering fields [1]. Among clustering methods, the K-means algorithm is popular and has been successfully applied to many practical clustering problems [2, 3]. However, because the objective function of K-means is not convex, the algorithm may become trapped in local minima [4, 5]. Evolutionary algorithms such as the genetic algorithm (GA) [6] and particle swarm optimization (PSO) [7] have therefore been introduced to address this problem and have been widely applied to various clustering tasks [8–17].

In this paper, the recently proposed discrete heuristic optimization algorithm named the multivariant optimization algorithm (MOA) is adopted to solve clustering problems. In MOA, a search individual is called an atom. The main idea of MOA is to search the solution space through alternating global–local search iterations: global exploration atoms explore the whole solution space to locate potential areas, and then multiple local exploitation groups with different population sizes are allotted to these potential areas for different levels of local exploitation. The better atoms generated during the optimization process are recorded in a data structure made up of a queue and several stacks, whereas the worse ones are squeezed out through competition. For a clustering problem, an array (atom) recording all cluster centers can be regarded as a candidate solution, so we only need to search for the optimal solution in the whole solution space; the clustering problem is thus converted into an optimization problem, and MOA can be applied to search for the optimal solution containing the cluster centers of all classes.

To simplify the description, data clustering based on MOA, K-means, GA, and PSO is referred to as MOA-clustering, K-clustering, GA-clustering, and PSO-clustering respectively in this paper.

To evaluate the performance of MOA-clustering, comparative experiments with MOA-clustering, K-clustering, GA-clustering and PSO-clustering are conducted on six datasets. The experimental results demonstrate that MOA-clustering is able to locate the optimal solution, performs competitively with GA-clustering and PSO-clustering, and outperforms K-clustering in terms of accuracy and stability; it is therefore an effective and feasible method for achieving high accuracy and stability in clustering problems.

2 Multivariant optimization algorithm

With the development of computer technology, the memory capacity and speed of computers have improved rapidly; as a result, a novel discrete evolutionary optimization algorithm that makes full use of computer memory is proposed in this paper. The idea of the proposed algorithm is inspired by the characteristics of computer data structures, especially the ordered doubly-linked list. In the MOA search process, global search atoms explore the whole space to locate potential areas, local search atoms then exploit each potential local area in detail to improve the results, and the better atoms are recorded in a structure table. After sufficient global–local search iterations under the guidance of the structure table, multiple optimal solutions are recorded in the queue of the structure table.

The search process of MOA is carried out by search atoms under the guidance of a structure table. For a minimization problem, the structure table illustrated in Fig. 1 is designed according to the following rules (a code sketch of such a structure follows the list):

Fig. 1 Structure table of multivariant optimization algorithm

  1. Global search atoms are recorded in the queue.

  2. The fitness values of atoms recorded in the queue are increasing from the front to the rear.

  3. Each node in the queue has a stack pointer which points to a stack; the depth of the stacks is descending from the left to the right.

  4. Local search atoms generated in the neighborhood of the i-th global atom are recorded in the i-th stack. The fitness values of atoms recorded in each stack are increasing from the bottom to the top.
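
For illustration only, such a structure table can be realized with ordinary containers. The minimal Python sketch below assumes a minimization problem and a user-supplied `fitness` function mapping an atom to a scalar; the class and method names are ours, not the authors' implementation.

```python
class StructureTable:
    """Minimal sketch of the MOA structure table (Fig. 1): a queue of global
    atoms sorted by ascending fitness, where the i-th queue node owns a stack
    recording the local atoms generated around that global atom."""

    def __init__(self, queue_len, fitness):
        self.queue_len = queue_len
        self.fitness = fitness          # maps an atom to a scalar (minimized)
        self.queue = []                 # each node: [fitness, atom, stack]

    def stack_depth(self, i):
        # Stacks get shallower from the front of the queue to the rear,
        # e.g. depths queue_len-1, queue_len-2, ..., 0 for nodes 0, 1, ...
        return max(self.queue_len - 1 - i, 0)

    def try_insert_global(self, atom):
        # Rules 1-2: keep the queue sorted by fitness and of fixed length.
        self.queue.append([self.fitness(atom), atom, []])
        self.queue.sort(key=lambda node: node[0])
        del self.queue[self.queue_len:]

    def try_insert_local(self, i, atom):
        # Rule 4: local atoms around the i-th global atom go into stack i.
        node = self.queue[i]
        node[2].append((self.fitness(atom), atom))
        node[2].sort(key=lambda entry: entry[0])
        del node[2][self.stack_depth(i):]
        # Swap if the best local atom beats its global atom (Step 5 below).
        if node[2] and node[2][0][0] < node[0]:
            best_f, best_atom = node[2][0]
            node[2][0] = (node[0], node[1])
            node[0], node[1] = best_f, best_atom
```

Re-sorting the queue after such a local/global swap is omitted here for brevity.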

In the MOA algorithm, an atom stands for a candidate solution of the optimization problem. Atoms are of two types: global search atoms and local search atoms. Global search atoms are generated uniformly at random in the solution space. Local search atoms are generated in the neighborhood of a global atom recorded in a queue node, which is regarded as the center of a potential area. In a D-dimensional solution space, a global search atom, denoted atom_g, is generated according to Eq. (1):

$$atom_{g} = \left\{ {unifrnd(min_{1} ,max_{1} ), \ldots ,unifrnd(min_{D} ,max_{D} )} \right\}$$
(1)

where min_i and max_i are the lower and upper bounds of the i-th dimension of the solution space, determined by the minimum and maximum of the i-th dimension of the dataset. The function unifrnd(min_i, max_i) returns a random number uniformly distributed on the interval from min_i to max_i, so atom_g is a vector.
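
For example, Eq. (1) can be realized by drawing each component from a uniform distribution over the per-dimension bounds of the data; the sketch below is our own illustration (the helper name and the NumPy usage are assumptions, not part of the paper):

```python
import numpy as np

def generate_global_atom(lower, upper, rng=None):
    """Eq. (1): draw one atom uniformly at random inside the box
    [min_1, max_1] x ... x [min_D, max_D] spanned by the data."""
    rng = rng if rng is not None else np.random.default_rng()
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    return rng.uniform(lower, upper)        # vector atom_g of length D

# Example: bounds taken from a small hypothetical dataset, one row per point.
data = np.array([[5.1, 3.5], [4.9, 3.0], [6.2, 3.4]])
atom_g = generate_global_atom(data.min(axis=0), data.max(axis=0))
```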

Local search atoms, denoted atom_l, are generated in the neighborhood of their corresponding global atom with radius R according to Eq. (2):

$$atom_{l} = \begin{cases} atom_{g} + r \cdot R \cdot \dfrac{[h_{1}, \ldots, h_{D}]}{\sqrt{\sum_{i=1}^{D} h_{i}^{2}}} & \left(\sum_{i=1}^{D} h_{i}^{2} \ne 0\right) \\[10pt] atom_{g} & \left(\sum_{i=1}^{D} h_{i}^{2} = 0\right) \end{cases}$$
(2)

where atom_g and R are the center vector and radius of the neighborhood respectively, [h_1, …, h_D] is a vector of random numbers uniformly distributed on [−1, 1], and \([h_{1}, \ldots, h_{D}]/\sqrt{\sum_{i=1}^{D} h_{i}^{2}}\) is a unit vector, so \(R \cdot [h_{1}, \ldots, h_{D}]/\sqrt{\sum_{i=1}^{D} h_{i}^{2}}\) lies on the sphere of radius R, and r is a random number between 0 and 1. The radius R is defined as follows: let f(x) be defined on D and let there exist a global optimal point atom_g \(\in\) D. We say that f(x) has a limit as x tends to atom_g provided that there exists an optimum A \(\in\) R such that for every ε > 0, no matter how small, there exists R > 0 such that for every x \(\in\) D,

$$\begin{cases} 0 < |x - atom_{g}| < R \;\Rightarrow\; |f(x) - A| < \varepsilon \\[6pt] \lim\limits_{iterations \to \infty} P\big(|f(x) - A| < \varepsilon\big) = 1 \end{cases}$$

Further, if such an optimum A exists, we say that A is the limit of f(x) as x tends to atom_g. In addition, f(x) will converge to the optimum A with probability 1 when the number of iterations is large enough.
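
Correspondingly, Eq. (2) displaces atom_g by a random fraction r of the radius R along a uniformly chosen direction; a hedged sketch (again our own naming, building on the helper above):

```python
import numpy as np

def generate_local_atom(atom_g, R, rng=None):
    """Eq. (2): sample a point within the radius-R neighborhood of atom_g."""
    rng = rng if rng is not None else np.random.default_rng()
    h = rng.uniform(-1.0, 1.0, size=atom_g.shape)   # [h_1, ..., h_D]
    norm = np.sqrt(np.sum(h ** 2))
    if norm == 0.0:                                  # second branch of Eq. (2)
        return atom_g.copy()
    r = rng.uniform(0.0, 1.0)                        # random step length in [0, 1]
    return atom_g + r * R * h / norm
```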

The MOA algorithm searches the solution space by the following steps, which are illustrated in Fig. 2 (a code sketch of the overall search loop follows the steps):

Fig. 2 Flowchart of the MOA algorithm

Step 1: Set the initial parameters of the MOA algorithm: the length of the queue, the depth of each stack, the scope of the neighborhood, and the maximum number of iterations.

Step 2: Generate and evaluate global search atoms. At the beginning of each iteration, a number of new global search atoms are generated and their fitness values are evaluated.

Step 3: Update the queue. Compare the fitness value of each new global atom with the atoms in the queue; if a new atom is good enough to be recorded in the queue, a new queue node recording this atom is inserted into the queue following the same logic as an ordered doubly-linked list, and the rear node is deleted to keep the length of the queue fixed.

Step 4: Generate and evaluate local search atoms. For each stack, local search atoms, whose number equals the depth of the stack, are generated in the neighborhood of the corresponding global atom in the queue, and their fitness values are evaluated by the fitness function.

Step 5: Update each stack. Compare the fitness value of each new local atom with the atoms in the corresponding stack; if a new atom is good enough to be recorded in the stack, a new stack node recording this atom is inserted into the stack. If the number of nodes in the stack exceeds the depth of the stack, the redundant nodes are deleted. If the best atom in the i-th stack is better than the i-th global atom, the two are swapped.

Step 6: Check the termination criterion. If the termination criterion is satisfied, the algorithm stops; otherwise, return to Step 2.
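
Putting the steps together, the overall search loop might look as follows when built on the illustrative helpers sketched above; the number of global atoms generated per iteration is assumed equal to the queue length, which the paper does not state explicitly:

```python
import numpy as np

def moa_search(fitness, lower, upper, queue_len=10, R=0.5, max_iter=200, rng=None):
    """Minimal MOA loop: alternate global and local search under a structure table."""
    rng = rng if rng is not None else np.random.default_rng()
    table = StructureTable(queue_len, fitness)              # Step 1: initial parameters
    for _ in range(max_iter):                                # Step 6: fixed iteration budget
        for _ in range(queue_len):                           # Steps 2-3: global search
            table.try_insert_global(generate_global_atom(lower, upper, rng))
        for i in range(len(table.queue)):                    # Steps 4-5: local search
            atom_g = table.queue[i][1]
            for _ in range(table.stack_depth(i)):
                table.try_insert_local(i, generate_local_atom(atom_g, R, rng))
    best_fit, best_atom, _ = table.queue[0]                  # best solution sits at the queue front
    return best_atom, best_fit
```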

3 Application of multivariant optimization algorithm

3.1 Encoding the search atoms

From the above description of MOA, the main objects to be encoded are the search atoms, including global and local search atoms. In the context of clustering, a single atom represents all cluster centers. That is, in an n-dimensional space, the i-th atom is encoded as follows:

$$x^{i} = (x_{11}^{i}, x_{12}^{i}, \ldots, x_{1n}^{i},\; x_{21}^{i}, x_{22}^{i}, \ldots, x_{2n}^{i}, \ldots,\; x_{K1}^{i}, x_{K2}^{i}, \ldots, x_{Kn}^{i})$$

where x_{jm}^{i} refers to the m-th component of the j-th cluster center vector in the i-th atom and K is the number of clusters, so the length of each atom is K × n.

Figure 3 is an example of the encoding of a single atom at the time the search atoms are produced in MOA. Let n = 2 and K = 3, i.e., the search space is two-dimensional and the number of clusters is three. The vector of this atom represents three cluster centers [(x_1, y_1), (x_2, y_2), (x_3, y_3)].

Fig. 3 The encoding of a single atom

According to this encoding, every atom produced during the search stands for a candidate solution, so MOA can be used to search for the optimal solution in the solution space, treating clustering as an optimization problem.
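
As a concrete illustration, with K = 3 clusters in an n = 2 dimensional space an atom is simply the K × n = 6 center coordinates laid out end to end; the encode/decode helpers below are our own sketch, consistent with Fig. 3:

```python
import numpy as np

K, n = 3, 2                                   # three clusters in a 2-D space

def encode(centers):
    """Flatten a (K, n) array of cluster centers into one atom of length K*n."""
    return np.asarray(centers, dtype=float).reshape(K * n)

def decode(atom):
    """Recover the (K, n) cluster centers from an atom."""
    return np.asarray(atom, dtype=float).reshape(K, n)

# Example: the atom for centers (x1, y1), (x2, y2), (x3, y3).
atom = encode([[1.0, 2.0], [4.0, 0.5], [7.5, 3.0]])   # -> length-6 vector
```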

3.2 Designing the evaluation function

The design of the evaluation function is a key issue in applying MOA to clustering. Its main role is to evaluate whether an atom obtained during the search represents better cluster centers; if a new atom is better than an old one during the updating process, the old one is replaced by the new one.

In this paper, we design the evaluation function according to the following two criteria (an illustrative code sketch follows the list):

  1. The inner-cluster distance as defined in Eq. (3), i.e. the distance between data vectors and their corresponding cluster center within a cluster, where the objective is to minimize the inner-cluster distance.

    $$J_{1} = \sum_{j=1}^{K} \sum_{\forall x_{i} \in Z_{j}} \left\| x_{i} - z_{j} \right\|^{2}$$
    (3)

    where z_j is the j-th cluster center, K is the number of clusters, and x_i denotes a data point belonging to cluster Z_j.

  2. The inter-cluster distance as defined in Eq. (4), i.e. the distance between all cluster centers, where the objective is to maximize the distance between clusters.

    $$J_{2} = \sum_{i=1}^{K} \sum_{j=i+1}^{K} \left\| z_{i} - z_{j} \right\|^{2}$$
    (4)

    where z_i and z_j are the i-th and j-th cluster centers respectively, and K is the number of clusters.

    According to these two criteria, the evaluation function is designed as defined in Eq. (5), where the objective is to minimize the value of the evaluation function.

    $${\text{fitness}} = w_{1} \times J_{1} - w_{2} \times J_{2}$$
    (5)

    where w_1 and w_2 are the weight coefficients of J_1 and J_2 respectively, which determine the influence of J_1 and J_2 in the evaluation. If w_1 is larger than w_2, J_1 largely determines the result of the evaluation. In a series of experiments, the clustering results were relatively stable and better when w_1 = 0.8 and w_2 = 0.2, so these values are used in this paper.
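
A possible realization of the evaluation function of Eq. (5), with J1 and J2 computed as in Eqs. (3) and (4) and a hard nearest-center assignment as used in Sect. 3.3 (variable names are ours):

```python
import numpy as np

def clustering_fitness(atom, data, K, w1=0.8, w2=0.2):
    """Eq. (5): fitness = w1 * J1 - w2 * J2 (smaller is better)."""
    n = data.shape[1]
    centers = atom.reshape(K, n)
    # Assign every point to its nearest center (Euclidean distance, Eq. (6)).
    dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # J1: summed squared distance of points to their own center (Eq. (3)).
    J1 = np.sum(dists[np.arange(len(data)), labels] ** 2)
    # J2: summed squared distance between all pairs of centers (Eq. (4)).
    J2 = sum(np.sum((centers[i] - centers[j]) ** 2)
             for i in range(K) for j in range(i + 1, K))
    return w1 * J1 - w2 * J2
```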

3.3 MOA-clustering

After encoding the search atoms and designing the evaluation function, MOA-clustering is executed as follows (an illustrative code sketch follows the steps):

Step 1: Set the initial parameters of the MOA algorithm: the length of the queue, the depth of each stack, the scope of the neighborhood, and the maximum number of iterations.

Step 2: Generate and evaluate global search atoms. At the beginning of each iteration, a number of new global search atoms are generated randomly and encoded as described above and illustrated in Fig. 3. All data points x_i are assigned to their corresponding cluster centers z_j with the shortest distance between x_i and z_j according to the Euclidean distance defined in Eq. (6), and their fitness values are then evaluated according to Eq. (5):

$$D = \left\| x_{i} - z_{j} \right\|, \quad i = 1, 2, \ldots, N, \quad j = 1, 2, \ldots, K$$
(6)

where N is the number of data points in the dataset and K is the number of clusters.

Step 3: Update the queue. Following the principle that the minimum is the best, compare the fitness value of each new global atom with the atoms in the queue; if a new atom is better than the worst one in the queue, a new queue node recording this atom is inserted into the queue following the same logic as an ordered doubly-linked list, and the node recording the worst atom is deleted to keep the length of the queue fixed.

Step 4: Generate and evaluate local search atoms in the same way as in Step 2. For each stack, local search atoms, whose number equals the depth of the stack, are generated in the neighborhood of the corresponding global atom in the queue, and their fitness values are evaluated by the fitness function.

Step 5: Update each stack. Following the principle that the minimum is the best, compare the fitness value of each new local atom with the atoms in the corresponding stack; if a new atom is good enough to be recorded in the stack, a new stack node recording this atom is inserted into the stack. If the number of nodes in the stack exceeds the depth of the stack, the redundant nodes are deleted. If the best atom in the i-th stack is better than the i-th global atom, the two are swapped.

Step 6: Check the termination criterion. If the termination criterion is satisfied, the algorithm stops; otherwise, return to Step 2.

Step 7: Obtain the best atom from the structure table of MOA, which contains the optimal cluster center vector. All data points are then reassigned to their corresponding cluster centers according to Eq. (6), and finally the accuracy of the clustering result is calculated.
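
For illustration, these steps can be composed with the helpers sketched earlier; the driver below decodes the best atom into centers and reassigns every point by Eq. (6) (an illustrative composition, not the authors' implementation):

```python
import numpy as np

def moa_clustering(data, K, max_iter=200, R=0.5):
    """Steps 1-7 of Sect. 3.3 using the illustrative helpers sketched above."""
    fitness = lambda atom: clustering_fitness(atom, data, K)
    lower = np.tile(data.min(axis=0), K)      # per-dimension bounds, repeated for each center
    upper = np.tile(data.max(axis=0), K)
    best_atom, best_fit = moa_search(fitness, lower, upper,
                                     R=R, max_iter=max_iter)
    centers = best_atom.reshape(K, data.shape[1])
    # Eq. (6): assign each point to its nearest cluster center.
    dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    return centers, labels, best_fit
```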

4 Experiments

This section compares the clustering results of K-clustering, GA-clustering, PSO-clustering and MOA-clustering on six real-life datasets to verify the performance of MOA-clustering.

4.1 Datasets

Six experimental datasets, namely Haberman's Survival, Iris, Vertebral Column, Wisconsin Breast Cancer, Contraceptive Method Choice and Wine, are used to assess the performance of the respective clustering methods. All datasets are available at http://archive.ics.uci.edu/ml/index.html/, are listed in Table 1, and are described briefly as follows:

Table 1 Brief description of the six real-life datasets

  1. Haberman's Survival Dataset (n = 306, d = 3, k = 2) consists of 306 objects characterized by three features: age of patient at the time of operation, patient's year of operation, and number of positive axillary nodes detected. The dataset contains cases from a study conducted between 1958 and 1970 at the University of Chicago's Billings Hospital on the survival of patients who had undergone surgery for breast cancer. There are two categories in the data: patients who survived 5 years or longer (225 objects) and patients who died within 5 years (81 objects).

  2. Fisher's Iris Dataset (n = 150, d = 4, k = 3) consists of three different species of iris flowers: Iris setosa, Iris virginica and Iris versicolour. For each species, 50 samples were collected, each described by four features: sepal length, sepal width, petal length and petal width.

  3. Vertebral Column Dataset (n = 310, d = 6, k = 2) consists of 310 objects characterized by six biomechanical features: pelvic incidence, pelvic tilt, lumbar lordosis angle, sacral slope, pelvic radius and grade of spondylolisthesis. These features are used to classify orthopaedic patients into two classes: normal (100 objects) and abnormal (210 objects).

  4. Wisconsin Breast Cancer Dataset (n = 683, d = 9, k = 2) consists of 683 objects characterized by nine features: clump thickness, cell size uniformity, cell shape uniformity, marginal adhesion, single epithelial cell size, bare nuclei, bland chromatin, normal nucleoli and mitoses. There are two categories in the data: benign tumors (444 objects) and malignant tumors (239 objects).

  5. Contraceptive Method Choice Dataset (CMC) (n = 1,473, d = 9, k = 3) consists of 1,473 objects characterized by nine features: wife's age, wife's education, husband's education, number of children ever born, wife's religion, whether the wife is now working, husband's occupation, standard-of-living index and media exposure. The dataset is a subset of the 1987 National Indonesia Contraceptive Prevalence Survey. Three contraceptive-method classes appear in the data: no use (629 objects), long-term methods (333 objects) and short-term methods (511 objects).

  6. Wine Dataset (n = 178, d = 13, k = 3) consists of 178 objects characterized by 13 features: alcohol content, malic acid content, ash content, alkalinity of ash, magnesium concentration, total phenols, flavanoids, non-flavanoid phenols, proanthocyanins, color intensity, hue, OD280/OD315 of diluted wines, and proline. These features were obtained by chemical analysis of wines produced in the same region of Italy but derived from three different cultivars. The numbers of objects in the three categories are: class 1 (59 objects), class 2 (71 objects) and class 3 (48 objects).

4.2 Settings for clustering algorithms

To illustrate how the radius R and the number of iterations were selected for data clustering, experiments with a fixed number of iterations or a fixed radius R were carried out 10 times on three datasets (Iris, Cancer and Wine); the results are reported in Fig. 4 and Table 2, respectively. The parameters used in both experiments are as follows: the queue length of the upper-triangular structure table is 10 and the depth of the i-th stack is 10 − i, so the number of search atoms is 60. In the fixed-iteration experiments, the number of iterations is 200 while the radius R is varied from 0.1 to 10; in the fixed-radius experiments, R is set to 0.5, 3, 5, 7 and 9 with corresponding iteration counts of 200, 600, 800, 1,000 and 1,000 respectively. Figure 4 reveals that, with 200 iterations, the radius R can be chosen fairly arbitrarily, and the lower average clustering accuracy for larger radius R can be attributed to the limited number of iterations. From Table 2 it can be seen that, when the number of iterations increases along with the radius R, the average clustering accuracy is almost the same as that obtained with fewer iterations and a small radius R, and the standard deviation is also very small. This means that good and stable clustering accuracy can be obtained as long as the number of iterations is sufficient for the chosen, larger radius R. To balance clustering accuracy and computational cost in the experiments, the radius R is randomly selected as 0.5 from the interval [0.1, 1] and the number of iterations is set to 200.

Fig. 4 The changing curve of clustering accuracy with the change of local radius R on the Iris dataset (a), Cancer dataset (b) and Wine dataset (c) when the number of iterations is 200

Table 2 The average clustering accuracy and the standard deviation of accuracy for different combinations of radius R and number of iterations

The common control parameters of these algorithms are the population size (P) and the maximum number of generations (max_g). To compare K-clustering, GA-clustering, PSO-clustering and MOA-clustering fairly, all methods use the same common control parameter values, P = 60 and max_g = 200. Other control parameters of GA-clustering and PSO-clustering are given below.

For the genetic algorithm, we use the standard version with no elitism, a mutation probability of 0.05 and a crossover probability of 0.95 [8]. For PSO, we also use the standard version with an inertia weight of 0.7; the learning factors are set to 2 and no inertia correction is used [17]. A fixed population size of n = 60 is used in all simulations for all methods except K-clustering.

4.3 Experimental results and discussion

To evaluate the proposed algorithm, two criteria are used: the fitness as defined in Eq. (5) and the accuracy as defined in Eq. (7).

$$Accuracy = \left( \sum_{i=1}^{N} \begin{cases} 1, & A_{i} = A_{i}^{*} \\ 0, & A_{i} \ne A_{i}^{*} \end{cases} \right) \Big/ N \times 100\,\%$$
(7)

where N is the total number of data points, and A_i and A_i^* are the labels of the i-th data point before and after clustering, respectively.
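
Taken literally, Eq. (7) is the fraction of points whose assigned label matches the reference label; the small sketch below assumes the cluster indices have already been aligned with the reference labels, a step the paper does not detail:

```python
import numpy as np

def accuracy(true_labels, cluster_labels):
    """Eq. (7): percentage of points whose assigned label matches the reference."""
    true_labels = np.asarray(true_labels)
    cluster_labels = np.asarray(cluster_labels)
    return 100.0 * np.mean(true_labels == cluster_labels)
```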

The performance of MOA-clustering and the other clustering methods is compared using these two criteria. The fitness of all four algorithms on the six datasets is summarized in Table 3, including the best, average and worst fitness and the standard deviation of the fitness over 20 simulation runs. Figure 6 shows the standard deviation of the fitness as a bar chart. Table 4 summarizes the best, average and worst accuracy obtained by the four clustering algorithms on the six datasets over 20 simulation runs. Figure 7a–c shows the best, average and worst accuracy respectively as bar charts. The clustering results of the four algorithms on the Haberman's Survival dataset are shown in Fig. 5. Figure 5a shows the positions of the data points in 3-dimensional space before clustering, Fig. 5b shows the result of MOA-clustering, GA-clustering and PSO-clustering on this dataset with an accuracy of 51.96 %, and Fig. 5c presents the result of K-clustering with an accuracy of 24.18 %.

Table 3 The fitness of the clustering results obtained by the different clustering algorithms on the six datasets
Table 4 The accuracy of the clustering results obtained by the different clustering algorithms on the six datasets
Fig. 5 The results of the four algorithms on the Haberman's Survival dataset before (a) and after (b, c) clustering

According to the results in Table 3 and Fig. 6, K-clustering is very unstable: it has the smallest standard deviation on the Cancer dataset but the largest standard deviation on the Haberman's Survival, Iris and Wine datasets. PSO-clustering is not very stable either, because its standard deviation fluctuates strongly on the Vertebral Column and Cancer datasets. GA-clustering and the proposed MOA-clustering are relatively stable; moreover, MOA-clustering outperforms GA-clustering on the Vertebral Column, Iris, Cancer and CMC datasets in terms of the standard deviation of the fitness. On the whole, MOA-clustering is therefore more stable than the other algorithms.

Fig. 6 The standard deviation of fitness

The results in Table 4 and Fig. 7 clearly show that MOA-clustering is able to locate the optimal solution: the proposed method achieves the best accuracy on all datasets in comparison with the other algorithms, which means it can locate the best cluster centers. In terms of clustering accuracy, MOA-clustering clearly outperforms K-clustering on all datasets. Moreover, MOA-clustering performs competitively with GA-clustering and PSO-clustering on the Haberman's Survival and Wine datasets, and achieves even better accuracy than GA-clustering and PSO-clustering on the remaining datasets.

Fig. 7 The best accuracy (a), the average accuracy (b) and the worst accuracy (c)

To sum up, compared with K-means, GA and PSO in terms of the fitness and accuracy of the clustering results, MOA-clustering is capable of reaching high accuracy and stability in clustering problems.

5 Conclusion

This paper presents a new clustering method based on the multivariant optimization algorithm (MOA). Six real-life datasets are used to investigate the performance of MOA-clustering. The experimental results demonstrate that the proposed clustering algorithm is an effective and feasible method for achieving high accuracy and stability in clustering problems.