1 Introduction

Image segmentation is one of the most difficult and challenging problems in image processing. It denotes a process by which an image is partitioned into non-overlapping regions such that each region is homogeneous and the union of any two adjacent regions is heterogeneous. The extent of homogeneity of a segmented region can be measured using some image property [e.g. pixel intensity (Dunn 1974)]. Most segmentation algorithms are tailored to a concrete problem; there is no universal segmentation method for all images.

Clustering can be defined as the optimal partitioning of a given set of n data points into c subgroups, such that data points belonging to the same group are as similar to each other as possible, whereas data points from two different groups differ as much as possible. Image segmentation can be treated as a clustering problem in which the features describing a pixel correspond to a pattern, and each image region corresponds to a cluster. Therefore, many clustering algorithms have been widely used to solve the segmentation problem [e.g. K-means (Tou and Gonzalez 1974), FCM, ISODATA (Ball and Hall 1967) and Snob (Wallace and Boulton 1968)]. In recent decades, fuzzy segmentation methods, especially the fuzzy c-means (FCM) algorithm (Bezdek 1981), have been widely used in image segmentation (Udupa and Samarasekera 1996; Yamany et al. 1999). This success is chiefly attributable to the introduction of fuzziness in the belongingness of each image pixel, which enables clustering methods to retain more information from the original image than crisp or hard segmentation methods (Pham and Prince 1999). However, most unsupervised fuzzy clustering algorithms assume prior knowledge of the number of classes, c, while in many practical situations the appropriate number of classes is unknown or impossible to determine even approximately. Finding an optimal number of clusters in a large dataset is usually a challenging task.

Because the fuzzy partitions obtained using the FCM algorithm depend on the choice of c, it is necessary to validate each of the fuzzy c-partitions once they are found (Pal and Bezdek 1995). This validation is carried out by a cluster validity index, which evaluates each of the fuzzy c-partitions and determines the optimal partition or the optimal number of clusters c from them. Many validation criteria have been proposed for evaluating fuzzy partitions (Kim et al. 2004), such as Bezdek's partition coefficient \((V_{\rm pc})\) and partition entropy \((V_{\rm pe}),\) Xie–Beni's \((V_{\rm xb}),\) Fukuyama–Sugeno's \((V_{\rm FS}),\) Kwon's \((V_{\rm K}),\) Boudraa's \((V_{\rm CWB})\) and Dae-Won Kim's \((V_{\rm OS}).\) Researchers have nested ISODATA and genetic algorithms to obtain the optimal partition according to various validation criteria. However, an exhaustive search over partition numbers requires the maximum number of clusters c to be pre-defined. Such an exhaustive search is feasible when the number of data points is small, but becomes impractical when the dataset is large. Moreover, there is at present no generic theory or implementation method for determining the maximum number of clusters c. In Kim et al. (2004), a new cluster validity index is proposed that determines the optimal partition and optimal number of clusters for fuzzy partitions obtained from the fuzzy c-means algorithm; the index exploits an overlap measure and a separation measure between clusters. In Hall and Kanade (2005), the authors use the fuzzy cluster validity metric proposed by Xie and Beni as the criterion for evaluating a partition produced by swarm-based clustering. In Shelokar et al. (2004), Hall and Kanade (2007) and Huang et al. (2006), an ant colony algorithm is used to obtain the initial cluster centers.

In this article, an automatic fuzzy clustering algorithm (AFCM) is proposed that can determine the number of clusters automatically. To achieve better segmentation quality, we further present an algorithm based on AFCM, called the automatic modified fuzzy c-means cluster segmentation algorithm (AMFCM). AMFCM incorporates spatial information into the membership function for clustering: the spatial function is the weighted summation of the membership function in the neighborhood of each pixel under consideration. Experimental results show that the AMFCM algorithm is effective and outperforms both the standard FCM algorithm and the AFCM algorithm.

The rest of the paper is organized as follows. In Sect. 2, we formally describe the problem of fuzzy clustering, briefly outline the standard FCM algorithm, and review fuzzy clustering validity measures. The proposed algorithm is described in Sect. 3. Experimental results along with statistical tests of significance are presented in Sect. 4. In Sect. 5, we conclude this paper.

2 Fuzzy c-means clustering algorithm

Let \(X = \left\{{{x_1},{x_2}, \ldots ,{x_n}} \right\}\) be a set of n patterns or data points, each having d features. These patterns can also be represented by a profile data matrix \({X_{n \times d}}\) having n d-dimensional row vectors. The ith row vector \({\mathop X\limits^{\rightarrow} }_i\) characterizes the ith object from the set X and each element \({X_{ij}}\) in \({\mathop X\limits^{\rightarrow} }_i\) corresponds to the jth real value feature \(j = 1,2, \ldots,d\) of the ith pattern \(i = 1,2, \ldots,n.\) Given such a \({X_{n \times d}},\) a clustering algorithm tries to find a partition \(X = \left\{ {{X_1},{X_2}, \ldots ,{X_c}} \right\}\) such that the similarity of the patterns in the same cluster \({X_i}\) is maximum and patterns from different clusters differ as far as possible. The partitions should maintain the following properties:

  1. \({X_1} \cup {X_2} \cup \cdots \cup {X_c} = X\)

  2. \({X_i} \cap {X_j} = \phi ,\; 1 \le i \ne j \le c\)

  3. \({X_i} \ne \phi \quad \forall i \in \left\{ {1,2, \ldots ,c} \right\}\)

In the case of fuzzy clustering, a pattern may belong to all the classes with a certain fuzzy membership grade for each class. In this case, we need to evolve an appropriate partition matrix \(U = {[{u_{ij}}]_{c \times n}},\) where \({u_{ij}} \in [0,1],\) such that \({u_{ij}}\) denotes the grade of membership of the jth element to the ith cluster.

2.1 The standard FCM algorithm

The standard FCM algorithm is an iterative optimization that minimizes the cost function defined as follows:

$$ {J_m}(U,V) = \sum\limits_{i = 1}^c {\sum\limits_{j = 1}^n {u_{ij}^md_{ij}^2} } $$
(1)

subject to

$$ \begin{aligned} &\sum\limits_{i = 1}^c {{u_{ij}} = 1,} \quad 1 \le j \le n\\ &0 < \sum\limits_{j = 1}^n {{u_{ij}} < n, }\quad 1 \le i \le c \\ &\sum\limits_{i = 1}^c {\sum\limits_{j = 1}^n {{u_{ij}} = n}} \end{aligned} $$
(2)

where \(m \in (1,\infty)\) controls the fuzziness of the resulting partition (\(m = 2\) is used in this study), \({u_{ij}}\) represents the membership of pixel \({x_j}\) in the ith cluster, and \({d_{ij}} = \left\| {{x_j} - {v_i}} \right\|\) is the distance between the pixel \({x_j}\) and the cluster center \({v_i}.\) In image clustering, the most commonly used feature is the gray-level value, or intensity, of an image pixel. Thus the FCM cost function is minimized when high membership values are assigned to pixels whose intensities are close to the centroid of their cluster, and low membership values are assigned when a pixel is far from the centroid. The membership function represents the probability that a pixel belongs to a specific cluster. In the FCM algorithm, this probability depends solely on the distance between the pixel and each individual cluster center in the feature domain. The membership function and cluster centers are updated as follows:

$$ {u_{ij}} = \left[\sum\limits_{k = 1}^c {\left({\frac{d_{ij}^2} {d_{kj}^2}}\right)}^{\frac{1} {m - 1}}\right]^{ - 1} $$
(3)
$$ {v_i} = {\frac{\sum\nolimits_{j = 1}^n {u_{ij}^m{x_j}}} {\sum\nolimits_{j = 1}^n {u_{ij}^m} }} $$
(4)

Starting with an initial guess for each cluster center, FCM converges to a solution for \({v_i}\) representing a local minimum or a saddle point of the cost function. Convergence can be detected by comparing the changes in the membership function or the cluster centers at two successive iteration steps. After convergence, defuzzification is applied to assign each pixel to the cluster for which its membership is maximal. The standard FCM algorithm degenerates to the hard c-means algorithm when \({u_{ij}} \in \left\{ {0,1} \right\}\) and \(m = 1.\) In this case, the cost function is defined as follows:

$$ {J_m}(U,V) = \sum\limits_{i = 1}^c {\sum\limits_{j = 1}^n {u_{ij}^{}d_{ij}^2}} $$
(5)

subject to

$$ u_{ij}=\left\{ \begin{array}{ll} 1 & d_{ij}=\min\limits_{1\leq k\leq c}\{d_{kj}\}\\ 0 & {\rm otherwise}\\ \end{array} \right. $$
(6)
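As an illustration, the alternating updates (3) and (4) can be sketched in a few lines of code. This is not the authors' implementation (the paper uses MATLAB); it is a minimal sketch for 1-D gray-level data, where a small constant clamps the squared distances to avoid division by zero and the optional `v0` argument (our addition) allows deterministic initialization.

```python
import numpy as np

def fcm(x, c, m=2.0, eps=1e-5, max_iter=100, v0=None):
    """Minimal FCM for 1-D data x of shape (n,), following Eqs. (3)-(4)."""
    n = x.size
    if v0 is None:
        # default: pick c distinct gray levels as initial centers
        v0 = np.random.default_rng(0).choice(np.unique(x), size=c, replace=False)
    v = v0.astype(float).copy()
    u = np.full((c, n), 1.0 / c)
    for _ in range(max_iter):
        d2 = np.maximum((x[None, :] - v[:, None]) ** 2, 1e-12)  # d_ij^2
        # Eq. (3): u_ij = [ sum_k (d_ij^2 / d_kj^2)^(1/(m-1)) ]^(-1)
        u_new = 1.0 / ((d2[:, None, :] / d2[None, :, :]) ** (1.0 / (m - 1))).sum(axis=1)
        # Eq. (4): v_i = sum_j u_ij^m x_j / sum_j u_ij^m
        w = u_new ** m
        v = (w @ x) / w.sum(axis=1)
        converged = np.abs(u_new - u).max() < eps
        u = u_new
        if converged:
            break
    return u, v
```

Defuzzification then amounts to `u.argmax(axis=0)`, assigning each pixel to the cluster of maximal membership.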

2.2 Cluster validity functions

To obtain a quantitative comparison, two types of cluster validity functions, based on the fuzzy partition and on the feature structure, are often used to evaluate the performance of different clustering methods. Representative functions based on the fuzzy partition are the partition coefficient \(V_{\rm pc}\) (Bezdek 1974) and partition entropy \(V_{\rm pe}\) (Bezdek 1975). They are defined as follows:

$$ {V_{\rm pc}} = {\frac{1} {n}}\sum\limits_{i = 1}^c {\sum\limits_{j = 1}^n {u_{ij}^2} } $$
(7)

and

$$ {V_{\rm pe}} = {\frac{- 1} {n}}\sum\limits_{i = 1}^c {\sum\limits_{j = 1}^n {({u_{ij}}\log {u_{ij}})} } $$
(8)

The idea behind these validity functions is that a partition with less fuzziness indicates better performance. As a result, the best clustering is achieved when \(V_{\rm pc}\) is maximal or \(V_{\rm pe}\) is minimal.

Other validity functions are based on the feature structure; for example, Xie–Beni's \(V_{\rm xb}\) (Xie and Beni 1991) is defined as

$$ {V_{\rm xb}} = {\frac{\sum\nolimits_{i = 1}^c {\sum\nolimits_{j = 1}^n {{u_{ij}}^2{{\left\| {{x_j} - {v_i}} \right\|}^2}} } } {n\left({{\min }_{i \ne k}}\left\{ {{\left\| {{v_k} - {v_i}} \right\|}^2} \right\}\right)}} $$
(9)

A good clustering result yields samples that are compact within each cluster and well separated between different clusters. Minimizing \(V_{\rm xb}\) is therefore expected to lead to a good clustering.
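For concreteness, the three indices (7)–(9) can be computed directly from a fuzzy partition. The sketch below is our own illustration, assuming 1-D features and the natural logarithm in \(V_{\rm pe}\) (the base is not specified in the text).

```python
import numpy as np

def validity_indices(u, x, v):
    """Compute V_pc (Eq. 7), V_pe (Eq. 8) and V_xb (Eq. 9) for a
    fuzzy partition u of shape (c, n), data x (n,) and centers v (c,)."""
    c, n = u.shape
    v_pc = (u ** 2).sum() / n
    # mask zero memberships so that 0 * log(0) contributes 0
    uu = np.where(u > 0, u, 1.0)
    v_pe = -(u * np.log(uu)).sum() / n
    d2 = (x[None, :] - v[:, None]) ** 2            # ||x_j - v_i||^2
    sep = min((v[i] - v[k]) ** 2                    # min_{i != k} ||v_k - v_i||^2
              for i in range(c) for k in range(c) if i != k)
    v_xb = ((u ** 2) * d2).sum() / (n * sep)
    return v_pc, v_pe, v_xb
```

A completely crisp partition with pixels sitting exactly on their centers gives \(V_{\rm pc} = 1,\) \(V_{\rm pe} = 0\) and \(V_{\rm xb} = 0,\) the ideal values of all three indices.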

3 Automatic modified FCM algorithm

3.1 Automatic FCM algorithm

In the standard FCM algorithm, the number of clusters is usually determined manually and the membership values are initialized randomly. In this paper, we propose an improvement of the standard FCM algorithm that can automatically determine the optimal cluster number, called the automatic fuzzy c-means (AFCM) algorithm. The proposed algorithm can be summarized as follows. Firstly, we use the hard c-means algorithm to find two initial centroids that are close to the real cluster centers. Secondly, FCM is applied to compute the membership matrix, which is then iterated. We calculate the pixel proportion of each cluster \(({\rm pop}_{i})\) by summing the pixels in the cluster and dividing by the total number of image pixels. The optimal number of clusters is reached when the \({\rm pop}_{i}\) values of at most two clusters are less than \(\delta.\) The parameter \(\delta\) represents the smallest admissible percentage for a cluster: a cluster whose area is below 10% of the image is discarded, because such a small area is unlikely to be meaningful. Hence, the value of \(\delta\) is set to 0.1.

The proposed algorithm, denoted the AFCM algorithm, can be summarized in the following steps:

Step 1:

Initialize two cluster centers chosen at random from the original dataset. Set \(m = 2\) and choose \(\varepsilon,\) a small positive constant.

Step 2:

Calculate initial centroids by hard c-means algorithm by using (5).

Step 3:

Repeat

  1. Update the centroids \({V^{(l)}}\) using Eq. (4).

  2. Compute the membership matrix \({U^{(l)}}\) using Eq. (3).

Until \(\left\| {{U^{(l)}} - {U^{(l + 1)}}} \right\| < \varepsilon.\)

Step 4:

Check the automatic optimal cluster numbers:

  1. Calculate the \({\rm pop}_{i}\) values: \({\rm pop}_i = \left| {{c_i}} \right|/n,\) where \(\left| {{c_i}} \right|\) is the size of cluster \(i,\; i = 1,2, \ldots ,c,\) and n is the total number of image pixels.

  2. Stop when the optimal number of clusters has been reached; otherwise go to step 5.

Step 5:

Increase the number of clusters, \(c = c + 1,\) set the initial value of the new centroid \({v_c},\) and go to step 3.

When the algorithm terminates, we obtain the optimal membership matrix U, from which the final segmented image is produced.
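The steps above can be sketched compactly. The sketch below is our own simplified reading, not the authors' code: it seeds each run by spreading initial centers over the data range (the paper instead seeds \(c = 2\) with hard c-means), and it interprets the stopping rule as "stop growing c as soon as some cluster's proportion falls below \(\delta,\) and keep the previous partition" — both are assumptions.

```python
import numpy as np

def fcm(x, v0, m=2.0, eps=1e-5, max_iter=100):
    """Standard FCM updates (Eqs. 3-4) for 1-D data x (n,), centers v0 (c,)."""
    v = v0.astype(float).copy()
    c, n = v.size, x.size
    u = np.full((c, n), 1.0 / c)
    for _ in range(max_iter):
        d2 = np.maximum((x[None, :] - v[:, None]) ** 2, 1e-12)
        u_new = 1.0 / ((d2[:, None, :] / d2[None, :, :]) ** (1.0 / (m - 1))).sum(axis=1)
        w = u_new ** m
        v = (w @ x) / w.sum(axis=1)
        converged = np.abs(u_new - u).max() < eps
        u = u_new
        if converged:
            break
    return u, v

def afcm(x, delta=0.1, c_max=10):
    """AFCM sketch: grow c from 2 until some cluster's pixel proportion
    pop_i drops below delta, then return the previous partition."""
    best = None
    for c in range(2, c_max + 1):
        v0 = np.linspace(x.min(), x.max(), c)   # assumed initialization
        u, v = fcm(x, v0)
        labels = u.argmax(axis=0)               # defuzzify
        pop = np.bincount(labels, minlength=c) / x.size   # pop_i = |c_i| / n
        if pop.min() < delta:                   # a cluster is too small: stop
            return best if best is not None else (u, v, c)
        best = (u, v, c)
    return best
```

On data with two well-separated gray levels, this sketch keeps \(c = 2\): the third center attracts no pixels after defuzzification, its proportion falls below \(\delta,\) and the two-cluster partition is returned.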

3.2 Automatic modified FCM algorithm

3.2.1 The spatial feature of pixel

One of the important characteristics of an image is that neighboring pixels are highly correlated. In other words, neighboring pixels possess similar feature values, so they should receive close membership values under the amended membership function, and the probability that they belong to the same cluster is high. This spatial relationship is important in clustering, but it is not utilized in the standard FCM algorithm, which treats all samples as isolated points. The standard FCM algorithm is therefore sensitive to noise. To exploit the spatial information, a spatial function is defined as

$$ {h_{ij}} = {\frac{\sum\nolimits_{t \in {\Upomega _j}} {{u_{it}}{\beta _t}} } {\sum\nolimits_{k = 1}^c {\sum\nolimits_{t \in {\Upomega _j}} {{u_{kt}}{\beta _t}} } }} $$
(10)

where \({\Upomega _j}\) represents a square window centered on pixel \({x_j}\) in the spatial domain; a \(3 \times 3\) window is used in this paper. Just like the membership function, the spatial function \({h_{ij}}\) represents the probability that pixel \({x_j}\) belongs to the ith cluster. The spatial function is largest when all of the neighborhood pixels belong to the ith cluster, and smallest when none of them do. In the spatial function, c is the number of desired clusters, \({\beta _t}\) is the contribution factor of the neighbor \({x_t},\) and the weight \({u_{it}}\) is the membership of the pattern \({x_t}\) in the ith cluster. The factor \({\beta _t}\) represents the contribution of the neighbor \({x_t}\)'s membership to the overall spatial dissimilarity. Generally, the nearer the neighbor, the stronger the interaction and the bigger the contribution. Consequently, for each neighbor \({x_t}\,(t \in {\Upomega _j}),\) \({\beta _t}\) is a non-increasing function of the distance between sites j and t, defined as follows:

$$ {\beta _t} = {\frac{1} {1 + \exp (\theta \left\| {j - t} \right\|)}} $$
(11)

where the coefficient \(\theta\) determines how fast the interaction between two sites diminishes as the distance between them increases. A small \(\theta\) leads to similar contributions from different neighbors, whereas a large \(\theta\) makes the spatial dissimilarity depend mainly on the closest neighbors. In this paper, \(\theta\) is empirically set to 0.6.
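Equations (10) and (11) can be evaluated in one pass over the image. The sketch below is our own illustration: it assumes the \(3 \times 3\) window includes the center pixel and, for simplicity, uses wrap-around handling at the image borders, neither of which the paper specifies.

```python
import numpy as np

def spatial_function(u, shape, theta=0.6):
    """Compute h_ij of Eq. (10) with weights beta_t of Eq. (11)
    over a 3x3 window, for memberships u of shape (c, n) and an
    image of the given (rows, cols) shape."""
    rows, cols = shape
    c = u.shape[0]
    U = u.reshape(c, rows, cols)
    num = np.zeros_like(U)
    # offsets of the 3x3 neighborhood (center included, an assumption)
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            dist = np.hypot(dr, dc)                       # ||j - t||
            beta = 1.0 / (1.0 + np.exp(theta * dist))     # Eq. (11)
            # wrap-around shift gathers the neighbor's membership map
            shifted = np.roll(np.roll(U, dr, axis=1), dc, axis=2)
            num += beta * shifted
    h = num / num.sum(axis=0, keepdims=True)              # Eq. (10)
    return h.reshape(c, -1)
```

Note that the denominator of Eq. (10) sums the same weighted neighborhood sums over all clusters, so \(\sum_i h_{ij} = 1\) for every pixel j by construction.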

3.2.2 Automatic modified FCM algorithm

With the spatial feature of pixel, the modified membership of the current pixel is defined as follows:

$$ u_{ij}^* = {\frac{{u_{ij}}{h_{ij}}} {\sum\nolimits_{k = 1}^c {{u_{kj}}{h_{kj}}} }} $$
(12)

the new cluster center is represented as follows:

$$ v_i^* = {\frac{\sum\nolimits_{j = 1}^n {{{(u_{ij}^*)}^m}{x_j} }} {\sum\nolimits_{j = 1}^n {{(u_{ij}^*)}^m} }} $$
(13)

Then the proposed automatic modified FCM algorithm can be summarized in the following steps:

Step 1:

Get the number of clusters by AFCM algorithm.

Step 2:

Calculate the spatial function \({h_{ij}}\) and \({\beta _t}\) by using (10) and (11).

Step 3:

Update centroid \({V^{(l)}}\) by using (13).

Step 4:

Update the membership matrix \({U^{(l)}}\) by using Eq. 12.

Step 5:

If \(\left\| {{U^{(l)}} - {U^{(l + 1)}}} \right\| < \varepsilon\) then stop; otherwise, jump to step 2.
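The core of steps 3 and 4 above reduces to two array operations. The following is a minimal sketch of Eqs. (12) and (13), not the authors' code, assuming 1-D gray-level data:

```python
import numpy as np

def modified_membership(u, h, x, m=2.0):
    """Eq. (12): combine memberships u (c, n) with the spatial
    function h (c, n); Eq. (13): recompute the centers from the
    modified memberships and data x (n,)."""
    w = u * h
    u_star = w / w.sum(axis=0, keepdims=True)   # Eq. (12)
    um = u_star ** m
    v_star = (um @ x) / um.sum(axis=1)          # Eq. (13)
    return u_star, v_star
```

When the spatial function is uniform (every neighborhood agrees equally with every cluster), Eq. (12) leaves the memberships unchanged, which matches the intuition that the spatial term only intervenes where the neighborhood evidence is informative.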

4 Experiment results and analysis

In this section, we test the three algorithms, FCM, AFCM and AMFCM, on three images: the cameraman image with \(256 \times 256\) pixels, a simulated brain image with \(256 \times 256\) pixels, and the Lena image with \(512 \times 512\) pixels. They are shown, respectively, in Figs. 1a, 2a and 3a. For all cases, unless otherwise stated, the fuzzy weighting exponent \(m = 2,\) \(\delta = 0.1\) and \(\varepsilon = 10^{-5}.\) All the algorithms are coded in MATLAB 7.4. For each example, after obtaining the appropriate cluster number from the AFCM algorithm, we use that cluster number for the FCM method. Figures 1b, 2b and 3b show the segmentation results of the FCM algorithm for the three images; Figs. 1c, 2c and 3c show those of the AFCM algorithm; and Figs. 1d, 2d and 3d show those of the AMFCM algorithm. Table 1 tabulates the experimental results of the three algorithms for the three images.

Fig. 1
figure 1

Comparison of segmented results on cameraman image with 256 × 256 pixels. a primary image, b segmented result of FCM algorithm, c segmented result of AFCM algorithm, d segmented result of AMFCM algorithm

Fig. 2
figure 2

Comparison of segmented results on simulative brain image with 256 × 256 pixels. a primary image, b segmented result of FCM algorithm, c segmented result of AFCM algorithm, d segmented result of AMFCM algorithm

Fig. 3
figure 3

Comparison of segmented results on lena image with 512 × 512 pixels. a primary image, b segmented result of FCM algorithm, c segmented result of AFCM algorithm, d segmented result of AMFCM algorithm

Table 1 The experimental results of the three algorithms for three images

Table 1 shows the validity functions used to evaluate the performance of the three algorithms on the three images. As can be seen, the AFCM algorithm yields the smallest \(V_{\rm pc}\) and the greatest \(V_{\rm pe}\): AFCM can automatically estimate the appropriate number of clusters, but its segmentation quality is the worst of the three algorithms. Visually, Figs. 1d, 2d and 3d are of the best quality, which indicates that the AMFCM algorithm performs best. Its \(V_{\rm pc}\) is greater, and its \(V_{\rm pe}\) smaller, than those of the conventional FCM and AFCM algorithms. In most cases, the validity functions based on the fuzzy partition were best for the AMFCM algorithm, which indicates that AMFCM is an effective method that outperforms the standard FCM and AFCM algorithms.

5 Conclusions

The FCM algorithm is one of the most popular methods for image segmentation. However, the cluster number must be known in advance when the standard FCM algorithm is used. This paper has presented a new AFCM algorithm whose important feature is the ability to find the optimal number of clusters automatically. However, the segmentation quality of AFCM is poor, so we proposed the AMFCM algorithm, based on AFCM, which not only achieves better segmentation quality but also finds the optimal number of clusters automatically. Experimental results show that the proposed AMFCM algorithm outperforms the standard FCM and AFCM algorithms: it can automatically estimate the appropriate number of clusters, and it also yields better segmentation quality than FCM.