1 Introduction

Cluster analysis has been widely applied in many areas. Its goal is to partition a set of patterns into disjoint and homogeneous clusters [23]. Basically, a clustering algorithm groups highly similar data together and thus minimizes the similarity between clusters. It reveals how the data are distributed as well as their features. By reducing complexity in this way, a clustering algorithm also allows us to analyze the commercial implications hidden in the clustered data, which can benefit business decision making.

Many clustering methods have already been proposed in the literature. Among them, metaheuristics, which are heuristic and stochastic search procedures based on the mechanics of natural selection, genetics, and evolution, are the most widely applied, since they are designed to escape local optima in the search for a global solution [29]. Therefore, this study considers a kind of AIS, the artificial immune network (aiNet), for cluster analysis. The proposed clustering algorithm is an integration of aiNet and the K-means algorithm (aiNetK). To validate the proposed aiNetK algorithm, four benchmark data sets, Iris, Wine, Glass, and Breast Cancer, were employed, and the results are compared with those of AIS-based and PSO-based algorithms.

The remainder of this study is organized as follows. Section 2 briefly presents the necessary background, and Sect. 3 describes the proposed aiNetK algorithm. Section 4 discusses the experimental results on four benchmark data sets. Finally, concluding remarks are made in Sect. 5.

2 Background

This section briefly surveys the literature related to the current study, covering cluster analysis and artificial immune systems.

2.1 Cluster analysis

Cluster analysis, an important technique in data mining, partitions data into a certain number of clusters. A cluster is described in terms of internal homogeneity and external differentiation, i.e., patterns in the same cluster should be similar to each other, while patterns in different clusters should not [38]. Basically, clustering algorithms can be divided into two categories: hierarchical and partitional clustering [14]. Hierarchical clustering algorithms organize data into a hierarchical structure according to a proximity matrix, and their results are usually depicted by a binary tree or dendrogram. Hierarchical clustering algorithms can be further classified into agglomerative and divisive methods. Many hierarchical clustering algorithms have been presented, including CURE [10], BIRCH [42], ROCK [11], CHAMELEON [16], DIANA [17], AUTOCLUST [7], and AMOEBA [6].

Unlike hierarchical clustering, which produces successive levels of clusters by iterative fusions or divisions, partitional clustering assigns a set of objects to clusters with no hierarchical structure. These algorithms aim to minimize a certain criterion, such as a square error function, and can therefore be treated as optimization problems. Partitional clustering aims to optimize the cluster centroids as well as the number of clusters [13]. Partitional clustering algorithms are mainly classified into supervised and unsupervised algorithms; the main difference is that supervised clustering algorithms require the number of clusters to be pre-determined, while unsupervised ones do not. Recently, semi-supervised clustering algorithms have also been developed [43, 44]. Since the algorithm proposed in this study belongs to the supervised category, it is discussed in more detail.

The most widely used supervised clustering algorithm is the K-means algorithm [8]. Later, Bezdek [2] developed a fuzzy version of the K-means algorithm, called the fuzzy C-means algorithm; unlike K-means, an object can belong to all clusters with a certain degree of membership. Other modifications of the fuzzy C-means algorithm are the possibilistic c-means clustering algorithm [19], fuzzy c-shells [3], hierarchical unsupervised fuzzy clustering [9, 25], fuzzy possibilistic c-means [32, 37], and semi-supervised fuzzy clustering algorithms [28, 39].

In addition, metaheuristics have been employed to improve the performance of K-means. Clustering can be regarded as a category of optimization problems solvable by metaheuristic algorithms, such as GA, PSO, or ant colony optimization (ACO). Many techniques follow this approach, including the genetically guided algorithm [12], genetic K-means [18, 26], Tabu search [1], ant colony clustering [20], and the particle swarm K-means optimization (PSKO) algorithm [21].

2.2 Artificial immune network

In 1974, Jerne proposed the first mathematical model of the immune network, which initiated subsequent research and discussion. Liu et al. [24] described the artificial immune system (AIS) as a novel intelligent algorithmic method inspired by the biological immune system. The main principle of AIS is to imitate the biological immune system, from whose protection mechanisms AIS acquires its learning capability.

Opt-aiNet derives from aiNet, which uses clonal selection and affinity maturation from immune theory, together with immune network theory, to establish a network model; it is an important application model of the artificial immune system. Its advantages include noise tolerance, unsupervised learning, and self-organization, with applications in data processing, optimization, and fault diagnosis. Timmis and Edmonds [36] described the optimal artificial immune network model as an adaptive process based on immune principles for performing search and optimization. The procedures in the model include fitness evaluation of antibodies against the objective function, clonal expansion, antibody maturation, and suppression. Pasti and Castro [31] noted several characteristics of opt-aiNet. The population size grows or shrinks over the iterations. Cloning produces similar antibodies by copying members of the population, and the number of copies of each member is the same. The variation of cloned antibodies is related to the fitness value of the original members: a better fitness value implies smaller variation. Excessively similar antibodies, i.e., those whose affinity (distance) to another antibody is smaller than a preset threshold \(\sigma _s \), are suppressed. The average fitness value is evaluated to explore the search space; if it deviates from that of the last generation, exploration continues, and the suppression phase is performed once no deviation occurs. New cells are added after suppression to prevent the solution from falling into a local optimum.

Bezerra et al. [4] proposed the adaptive radius immune algorithm (ARIA), which retains most of the density information of the compressed data; simulation experiments showed that its clustering performance is better than that of the aiNet clustering algorithm. Tang and Vemuri [34] used the aiNet model in the more complex field of document clustering; based on the immune network and mutation principles, the aiNet algorithm produced good clustering results and outperformed hierarchical clustering and K-means. Sotiropoulos et al. [33] applied aiNet to cluster analysis of online shopping customers, comparing it with hierarchical, fuzzy c-means, and spectral analysis methods; the experiments demonstrated that aiNet yields better results. Hart and Ross [15] combined AIS with sparse distributed memories (SDM) to propose a model for processing large amounts of dynamic data; the model can track a dynamic data set and memorize past clustering results. Nasraoui et al. [30] showed that an AIS algorithm can be used for web mining of dynamic click-stream data, clustered continuously, as in student course grouping. Younsi and Wang [40] applied the clonal selection principle of the basic artificial immune system to data clustering, proposed a simpler algorithm, and used four simulated data sets and two reference data sets to test it; the experimental results indicated that the data sets were clustered correctly. Building on anti-spam technology, Yue et al. [41] proposed a new e-mail service system that uses an artificial immune clustering method to filter all messages. Li et al. [22] proposed a self-adaptive multiple-clone clustering algorithm based on the biological immune and clonal selection principles; simulation experiments showed that the proposed algorithm is a rational and effective clustering method and yields better results than K-means. Based on the immune control and immune selection mechanisms of the biological immune system, Ma et al. [27] proposed a new data mining method, the Immune Dominance Clonal Multi-objective Clustering (IDCMC) algorithm, in which the parent antibody population is divided into three sub-populations handled with different measurement methods and evolutionary selection strategies; simulation experiments showed that this algorithm has better clustering performance than GA and K-means. Chiu et al. [5] showed that a hybrid of an artificial immune system and an ant algorithm can be used for air-conditioner market segmentation in 3C shopping; the experiments proved that the hybrid method yields better clustering results than the SOM clustering method.

3 Methodology

This section will present the proposed method for cluster analysis. It consists of two main components: (1) data encoding and (2) aiNetK algorithm.

3.1 Data encoding

Each antibody represents a clustering solution consisting of cluster centroids \(Z_{i}\), where \(Z_{i}\) denotes the \(i\)th cluster centroid. Thus, an antibody with four cluster centroids can be represented as shown in Fig. 1.

Fig. 1 Real-coding method for aiNetK algorithm
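To make the real-coded scheme concrete, the following sketch (Python with NumPy; all names are ours, not from the original paper) builds one antibody holding four cluster centroids for two-dimensional data:

```python
import numpy as np

# One antibody = K cluster centroids Z_1..Z_K, each a d-dimensional vector.
# Here K = 4 clusters in d = 2 feature dimensions, mirroring Fig. 1.
K, d = 4, 2
antibody = np.array([[1.0, 2.0],   # Z_1
                     [4.0, 0.5],   # Z_2
                     [3.2, 3.3],   # Z_3
                     [0.1, 1.8]])  # Z_4
assert antibody.shape == (K, d)
```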

3.2 aiNetK algorithm

The proposed clustering algorithm is an integration of aiNet and K-means algorithm. Figure 2 illustrates the algorithm flowchart. The detailed procedure of aiNetK algorithm is as follows.

Fig. 2 aiNetK algorithm flowchart

Step 1: Set up parameters

In the aiNetK algorithm, it is necessary to determine several parameters: the number of generations, the number of memory cells \(M\), the number of remaining cells \(R\), the number of clones \(N_{c}\), the error threshold between two generations, and the self-suppression threshold \({\sigma }_s \).

Step 2: Generate initial population randomly

At the beginning, an initial population is generated randomly. The initial population consists of \(P\) antibodies comprising \(M\) memory cells and \(R\) remaining cells. The population \(P\) moves toward the optimal solution through the following steps. In the proposed aiNetK algorithm, the number of clusters, \(K\), must be pre-specified. Since every antibody consists of \(K\) centroid vectors, it can be represented as:

$$\begin{aligned} P_{id} =\left( {X_{i1} ,\ldots ,X_{ij} ,\ldots ,X_{iK} } \right) , \end{aligned}$$
(1)

where \(P_{id} \) represents the \(i\)th antibody, \(i\) is the antibody index, and \(d\) is the cluster index.
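As a hedged illustration of this initialization, the sketch below stores the population as a NumPy array of shape \((P, K, d)\) and draws centroids uniformly within the feature bounds of the data (the sampling scheme is our assumption; the paper only states that initialization is random):

```python
import numpy as np

def init_population(data, P, K, rng=np.random.default_rng(0)):
    """Randomly generate P antibodies, each a set of K centroid vectors."""
    lo, hi = data.min(axis=0), data.max(axis=0)  # feature-wise bounds
    # Uniform sampling inside the bounding box of the data set.
    return rng.uniform(lo, hi, size=(P, K, data.shape[1]))
```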

Step 3: Calculate fitness value for population

This study uses Euclidean distance to calculate the fitness value of each antibody. The procedure is as follows:

(i) Calculate the distance of each data point \(X_{i}\) to each cluster centroid \(C_{ij}\). The distance of \(X_{i}\) to \(C_{ij}\) is:

    $$\begin{aligned} d(X_i ,C_{ij} )=\left\| {X_i -C_{ij} } \right\| , \end{aligned}$$
    (2)

    where \(X_{i}\) is the \(i\)th data point vector and \(C_{ij}\) is the centroid of the \(j\)th cluster in the \(i\)th antibody.

(ii) Form \(K\) clusters by assigning each data point to its closest centroid for every antibody.

(iii) Calculate the fitness value \(\text{Ab}_i\) for every antibody, represented as:

    $$\begin{aligned} \text{Ab}_i&= \frac{1}{1+D_i } ,\quad 0\le \text{Ab}_i \le 1,\quad {i}\in {P};\end{aligned}$$
    (3)
    $$\begin{aligned} D_i&= \text{SED}_i =\sum _{j=1}^K \sum _{\forall X_i \in n_{ij} } \left\| {X_i -C_{ij} } \right\| , \end{aligned}$$
    (4)

    where \(D_i \) is the sum of Euclidean distances (SED) of the \(i\)th antibody between \(X_{i}\) and \(C_{ij}\), and \(n_{ij}\) denotes the set of data points assigned to the \(j\)th cluster of the \(i\)th antibody.
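Sub-steps (i)-(iii) can be condensed into a few vectorized lines (a sketch under the same array conventions as the earlier snippets; not the authors' code):

```python
import numpy as np

def fitness(population, data):
    """Return Ab_i = 1/(1 + SED_i) for every antibody (Eqs. 2-4)."""
    # dists[i, n, j] = ||X_n - C_ij||: point n vs. cluster j of antibody i
    dists = np.linalg.norm(
        data[None, :, None, :] - population[:, None, :, :], axis=-1)
    nearest = dists.min(axis=2)        # step (ii): closest centroid per point
    sed = nearest.sum(axis=1)          # D_i = SED_i, Eq. (4)
    return 1.0 / (1.0 + sed), sed      # Ab_i, Eq. (3)
```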

Step 4: Clone the population to produce \(N_{c}\) clones

Every antibody in the population is cloned, i.e., copied, \(N_{c}\) times to generate a new population with \(P\times N_c \) antibodies.
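In the array representation used above, this step is a single repeat (sketch):

```python
# population: shape (P, K, d)  ->  clones: shape (P * Nc, K, d);
# rows i*Nc .. i*Nc + Nc - 1 are identical copies of antibody i.
clones = np.repeat(population, Nc, axis=0)
```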

Step 5: Retain the original population and maturate affinity of remaining antibodies

In this step, the original population is retained first, and then the affinity of the remaining antibodies is maturated, to prevent the mutated antibodies from being worse than their pre-mutation counterparts. The mutation steps are as follows:

(i) Calculate the mutation rate. The mutation rate is related to the fitness value: antibodies with higher affinity have a smaller mutation rate, as given by Eq. (5):

    $$\begin{aligned} \text{Mutation rate}=\alpha _{i} =\left( \frac{1}{\rho }\right) e^{-\left( {\text{Ab}^{*}} \right) _i }, \end{aligned}$$
    (5)

    where \(\left( {\text{Ab}^{*}} \right) _i =\text{Ab}_i /\text{Ab}_\mathrm{max}\) is the normalized affinity value, \(\rho \) is a constant, and \(i\in \left( {P\times N_c} \right) -P\), i.e., mutation applies only to the clones, not to the retained originals.

(ii) Maturate the affinity as shown in Eq. (6):

    $$\begin{aligned} c^{{\prime }}=c+\alpha _i \times N\left( {0,1} \right) , \end{aligned}$$
    (6)

    where \({c}^{{\prime }}\) is the antibody string after affinity maturation, \({c}=({{c}}_1 ,{{c}}_2 ,\ldots ,{{c}}_{{x}} )\) is the cloned antibody string, \(\alpha _i\) is the mutation rate, and \(N\left( {0,1} \right) \) is the standard normal distribution with mean 0 and standard deviation 1.
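Combining Eqs. (5) and (6), one possible implementation reads as follows (our sketch; the choice of \(\rho\) and the handling of the retained originals follow the definitions above):

```python
import numpy as np

def maturate(clones, ab, rho=5.0, rng=np.random.default_rng(0)):
    """Affinity maturation, Eqs. (5)-(6).

    clones: array (P*Nc, K, d); ab: their fitness values, shape (P*Nc,).
    The retained original copies should be excluded by the caller, since
    mutation applies only to i in (P*Nc) - P.
    """
    ab_norm = ab / ab.max()                    # (Ab*)_i, normalized affinity
    alpha = (1.0 / rho) * np.exp(-ab_norm)     # Eq. (5): fitter => smaller rate
    noise = rng.standard_normal(clones.shape)  # N(0, 1) per coordinate
    return clones + alpha[:, None, None] * noise   # Eq. (6)
```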

Step 6: Perform one-iteration K-means and obtain new antibody cluster

The calculation steps of the one-iteration K-means are as follows:

(i) Calculate the Euclidean distance from each data point \(X\) in each antibody to the centroid of each cluster, as shown in Eq. (7):

    $$\begin{aligned} d(X_i ,C_{ij} )=\left\| {X_i -C_{ij} } \right\| . \end{aligned}$$
    (7)
(ii) In each antibody, each data point \(X\) is assigned to its nearest cluster centroid and belongs to that cluster.

(iii) Calculate the new cluster centroid vector in each antibody to obtain the new antibody cluster, as shown in Eq. (8):

    $$\begin{aligned} C_{ij}^\mathrm{new} =\frac{1}{n_{ij}}\sum _{\forall X_i \in n_{ij} } X_i , \end{aligned}$$
    (8)

    where \(C_{ij}^\mathrm{new} \) is the new centroid vector of the \(j\)th cluster in the \(i\)th antibody, and \(n_{ij} \) is the total number of data points of the \(j\)th cluster in the \(i\)th antibody.
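A minimal sketch of this single K-means pass, using the same conventions as the earlier snippets:

```python
import numpy as np

def kmeans_one_iter(population, data):
    """One K-means update per antibody (Eqs. 7-8)."""
    P, K, d = population.shape
    new_pop = population.copy()
    for i in range(P):
        # Eq. (7): distance from every point to every centroid of antibody i
        dists = np.linalg.norm(
            data[:, None, :] - population[i][None, :, :], axis=-1)
        labels = dists.argmin(axis=1)      # step (ii): nearest centroid
        for j in range(K):
            members = data[labels == j]
            if len(members):               # Eq. (8): mean of assigned points
                new_pop[i, j] = members.mean(axis=0)
    return new_pop
```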

Step 7: Recalculate the fitness value of new antibodies

The cluster centroids in the antibodies may change after affinity maturation and the one-iteration K-means. Therefore, the fitness value of each new antibody should be calculated before selecting the optimal antibody. The calculation of the fitness value is the same as in Step 3.

Step 8: Select the optimal antibody in each group to replace original population

In this step, there are \(P\) groups of new antibodies, and each group has \(N_{c}\) clones. To obtain the optimal antibody of each group, the antibodies in each group are ranked by fitness value. After ranking, the first antibody (with the highest affinity) in each group is selected to replace the original population \(P\). Note that the optimal antibody selected from one group may still be worse than antibodies in other groups.
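In code, the selection reduces to a group-wise argmax over the clone fitness values (sketch; `fitness` as defined in the Step 3 snippet):

```python
ab, _ = fitness(clones, data)               # fitness of all P * Nc clones
best = ab.reshape(P, Nc).argmax(axis=1)     # index of best clone per group
population = clones.reshape(P, Nc, K, d)[np.arange(P), best]
```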

Step 9: Determine whether the average error differs between two generations

Check whether the average errors of two consecutive generations differ. If the absolute difference is greater than the error threshold, there is still a difference between the two generations and the population has not been fully searched; it is necessary to return to Step 4. The threshold value must be pre-determined; this study employs Taguchi parameter design to obtain it in the subsequent section. The calculation of the population error is shown in Eq. (9):

$$\begin{aligned} \text{Average population error}=\sum _{i=1}^{P} \frac{\text{SED}_i }{n}, \end{aligned}$$
(9)

where \(n\) is the number of antibodies in the population of the current generation.
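A sketch of the convergence check of Eq. (9), assuming `sed` holds the SED values of the current population as returned by the Step 3 snippet:

```python
avg_error = sed.sum() / len(sed)            # Eq. (9), with n = len(sed)
# If the change versus the previous generation exceeds the threshold,
# the search has not settled yet: return to Step 4 (cloning).
converged = abs(avg_error - prev_avg_error) <= error_threshold
```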

Step 10: Autogenous suppression

Autogenous suppression expands the search scope by deleting excessively similar antibodies, thereby preventing convergence to a local optimum. This study sorts the optimal antibodies selected in the previous step in descending order and then calculates the Euclidean distance between pairs of antibodies, as shown in Eq. (10). If the distance is smaller than the autogenous suppression threshold \(\sigma _{s}\), the antibody with lower affinity is deleted, and the better antibody is retained as a memory cell; the number of remaining cells \(R\) is then set to \(d\,\%\) of the number of memory cells \(M\). Thus, with autogenous suppression, the numbers of memory cells and remaining cells differ across iterations. The threshold \(\sigma _{s}\) and the percentage \(d\,\%\) are also parameters requiring determination; this study employs Taguchi parameter design to obtain their values in the subsequent sections.

$$\begin{aligned} d(C_i ,C_j )=\left\| {C_i -C_j } \right\| \end{aligned}$$
(10)
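One way to realize this suppression step (a sketch; the greedy keep-the-fitter-first strategy is our reading of the descending-order arrangement described above):

```python
import numpy as np

def suppress(population, ab, sigma_s):
    """Delete the weaker of any antibody pair closer than sigma_s (Eq. 10)."""
    order = np.argsort(-ab)                 # antibodies in descending fitness
    keep = []
    for idx in order:
        # Survives only if not too close to an already-kept, fitter antibody.
        if all(np.linalg.norm(population[idx] - population[k]) >= sigma_s
               for k in keep):
            keep.append(idx)
    keep = np.array(keep)
    return population[keep], ab[keep]
```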

Step 11: Determine whether the number of iterations is satisfied

If the preset number of iterations is reached, then stop; otherwise, go back to Step 3.

4 Computational experiences

This section applies some benchmark data sets to assess the proposed algorithm, aiNetK, in comparison with particle swarm optimization and artificial immune system-related algorithms. A detailed discussion is presented as follows.

4.1 Data sources

This section adopts four data sets provided by the Web site of the Department of Information and Computer Science, University of California (http://archive.ics.uci.edu/ml/datasets.html). Table 1 lists the corresponding information for each data set, including the data set name, number of data points, number of features, and number of clusters.

Table 1 Benchmark data sets

4.2 Taguchi experimental design

Metaheuristic performance is affected by the combination of parameter values. A full factorial analysis is very time consuming, since a very large number of cases has to be studied when more than three parameters are considered. To reduce the required number of simulations, the Taguchi method based on orthogonal parametric arrays can be used [35]. Orthogonal arrays identify only the main effects, not the interactions between the parameters. This allows an efficient screening of a large number of parameters to determine which have a larger impact on performance.

For designing the aiNet algorithm, it is necessary to determine the related parameters: the population size (\(N\)), the number of clones (\(N_{c}\)), the percentage (\(d\)) of new cells, the suppression threshold, and \(\beta \). These five parameters are used as the design factors of the Taguchi experiment, each with three levels. This study refers to parameter values in the past literature; Table 2 shows the parameter settings from the literature and the optimal parameter combination obtained from several tests in this study. Since the value of \(N_{c}\) is the same across the literature, this study slightly adjusts \(N_{c}\) to obtain a better parameter combination. The Iris data set is used as the object of the experimental combination analysis. The orthogonal array \({L}_{27} (3^{5})\) is used for the experiment, and the software package Minitab is used for the Taguchi design. The SED is applied as the criterion; a smaller SED indicates a better clustering result, so the experiment is of the smaller-the-better type. The S/N (signal-to-noise) ratio is used to evaluate the stability of system quality in the Taguchi design of experiments: a higher S/N ratio indicates less noise and better parameter quality. To sum up, the parameter settings of the proposed aiNetK algorithm and the PSO, AIS, aiNet, PSKO, and AISK algorithms are shown in Table 3.
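For reference, the smaller-the-better S/N ratio is \(\text{S/N} = -10\log _{10}\bigl (\frac{1}{n}\sum _{i=1}^{n} y_i^2\bigr )\); a short sketch (illustrative values only, not from the experiment):

```python
import numpy as np

def sn_smaller_the_better(y):
    """Taguchi S/N ratio for a smaller-the-better response (here, SED)."""
    y = np.asarray(y, dtype=float)
    return -10.0 * np.log10(np.mean(y ** 2))

# Example: SED values from replicated runs of one parameter combination.
print(sn_smaller_the_better([97.3, 98.1, 97.9]))   # higher is better
```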

Table 2 Factor levels
Table 3 Correlation parameters of the algorithms

4.3 Experimental results and testing

4.3.1 Algorithm convergence

This study presents the convergence of the different algorithms on the four benchmark data sets in Figs. 3, 4, 5 and 6. As shown in the SED convergence plots, aiNet evaluates the average fitness error and suppresses excessively similar antibodies; thus, its convergence on the four data sets is quicker than that of AIS and PSO. With the fast-converging local search of the K-means step, the proposed aiNetK algorithm converges faster and to better values than the other algorithms on all four data sets.

Fig. 3 SED convergence map of Iris data set using different algorithms

Fig. 4 SED convergence map of Wine data set using different algorithms

Fig. 5 SED convergence map of Glass data set using different algorithms

Fig. 6 SED convergence map of Breast Cancer data set using different algorithms

4.3.2 Algorithm clustering results

(1) Mean value and standard deviation of SED

This study runs each algorithm on each benchmark data set 30 times and takes the average value as the test result, as shown in Table 4. As seen, on most benchmark data sets, both the SED and the standard deviation obtained by the aiNetK algorithm are the lowest.

Table 4 Clustering results of algorithms
(2) Computational time

Regarding computational time, since aiNet evaluates the average fitness error, the time needed by the proposed aiNetK is longer than that of the other algorithms. The computational time for each algorithm is listed in Table 5.

Table 5 Computational time (seconds) of each algorithm in each benchmark data set
(3) Required number of iterations for convergence

To compare the convergence speed of the algorithms, the average SED of each benchmark data set is used as a threshold value. Each algorithm is stopped and its iteration number recorded when the threshold is reached; conversely, if an algorithm does not reach the threshold within 2,000 iterations, convergence is regarded as failed. This study runs each algorithm 10 times on each benchmark data set, and the average number of iterations to convergence is shown in Table 6. As shown, PSO and AIS compute quickly but have weaker search capability, so on most data sets they cannot reach the threshold value. The aiNetK needs longer computational time but has better search capability, reaching the convergence threshold in the fewest iterations.

Table 6 Average convergence times of each algorithm in each benchmark data set
(4) Accuracy

The accuracy of the clustering result of each algorithm on the four benchmark data sets is shown in Table 7. It indicates that the accuracy of the proposed aiNetK is the highest among all algorithms on all four data sets.

Table 7 Accuracy of clustering result of each algorithm in each benchmark data set

4.3.3 Statistical test

At a confidence level of 95 % (\(\alpha =0.05\)), it is tested whether the difference between aiNetK and each of the other five algorithms is significant. Let \(\mu _{\mathrm{aiNetK}} \), \(\mu _{\mathrm{PSO}} \), \(\mu _{\mathrm{PSKO}} \), \(\mu _{\mathrm{AIS}} \), \(\mu _{\mathrm{AISK}} \), and \(\mu _{\mathrm{aiNet}} \) denote the average SED values of aiNetK, PSO, PSKO, AIS, AISK, and aiNet, respectively. Pairwise comparisons are then conducted.

The test statistic (\(Z_0 )\) is as follows:

$$\begin{aligned} Z_0 =\frac{\left( {\overline{X_1 } -\overline{X_2 } } \right) -\left( {\mu _1 -\mu _2 } \right) }{\sqrt{\frac{S_1^2 }{n_1 }+\frac{S_2^2 }{n_2 }}}. \end{aligned}$$
(11)

A one-tailed (left-tail) test was conducted. If the test statistic \(Z_0 <-Z_{\alpha } \), then \({H}_0 \) is rejected. Table 8 shows the test results of aiNetK against the other algorithms on the four data sets.
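Eq. (11) and the rejection rule can be scripted directly (sketch; the sample statistics would be the 30-run means and standard deviations reported above):

```python
import math

def z_test(x1_bar, x2_bar, s1, s2, n1=30, n2=30):
    """Two-sample Z statistic of Eq. (11); reject H0 if Z0 < -Z_alpha."""
    z0 = (x1_bar - x2_bar) / math.sqrt(s1**2 / n1 + s2**2 / n2)
    z_alpha = 1.645                      # Z_{0.05} for the left-tail test
    return z0, z0 < -z_alpha             # (statistic, reject H0?)
```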

Table 8 Test of aiNetK against the other algorithms in the four data sets

In the statistical tests, aiNetK is significantly better than the other five algorithms on the benchmark data sets, except versus the PSKO algorithm on the Wine data set. The average computational time of aiNetK is the longest, since aiNetK integrates two algorithms and its structure is more complex than that of the other algorithms. However, aiNetK converges in fewer iterations and reaches the lowest average SED value. The convergence speed, convergence result, and clustering accuracy of aiNetK are all better than those of the other algorithms.

5 Conclusions

This study has proposed a novel cluster analysis technique, the aiNetK algorithm, which integrates the aiNet and K-means algorithms to improve clustering performance. The computational results on four benchmark data sets reveal that it attains the lowest SED value compared with the other clustering algorithms. In the future, the current idea can be extended to an automatic clustering algorithm that does not require the number of clusters in advance. Additionally, other metaheuristics, such as the bee colony optimization algorithm, can also be applied.