1 Introduction

Image segmentation is a fundamental problem in image processing, but it also represents a difficult research problem. Image segmentation is widely used in the fields of pattern recognition, computer vision, machine learning and medical image processing. Based on whether the processing image contains colour information, we can divide the image segmentation techniques into the grey image segmentation techniques and the colour image segmentation techniques. The family of grey image segmentation techniques is widely applied in processing text images, industrial images and medical images and includes the edge detection algorithm [2], the clustering algorithm [3], the thresholding segmentation algorithm [6, 7, 13, 18], and the partial differential equation based segmentation algorithm [5]. The family of colour image segmentation techniques is mainly used to process, for example, natural images and videos [16]. Common methods in colour image segmentation techniques include the interactive segmentation algorithm based on graph theory [4, 10, 11] and the clustering algorithm [8, 14, 19]. Recently, Sarkar etc. proposed a multilevel colour image segmentation algorithm based on minimum cross entropy and differential evolution and obtained efficient performance [12].

The interactive segmentation algorithm based on graph theory belongs to the class of supervised methods and is used mainly in cases requiring human interaction, such as editing photos. The clustering algorithm is an unsupervised method and can be used in colour image clustering and the video processing; the classical algorithm implementing this strategy is the FCM method [8, 9, 14, 19]. Because the FCM clustering algorithm is not restricted by the dimensionality of the sample data, it can process the one-dimensional grey-scale images, the three-dimensional colour images and the high-dimensional data. Together with the robustness of clustering performance, the FCM clustering algorithm has been extensively researched and has many applications [3, 8, 9, 14, 19]. Krinidis et al. proposed a robust fuzzy local information c-means clustering algorithm (FLICM) [3] in 2010, with a comparative analysis of the existing clustering algorithms, including the traditional FCM algorithm, the constrained FCM algorithm, the enhanced FCM algorithm and the fast generalised FCM algorithm. Additionally, the FLICM algorithm exhibits notably strong anti-noise performance. As the key parameter that affects the FCM clustering performance are the cluster number and the selection of the initial cluster centroids, which are set by users of the above clustering algorithms, the clustering performance shows no substantial improvement. To improve the FCM clustering performance, Mok et al. presented a robust adaptive clustering analysis method for automatic identification of clusters [9]. However, it is difficult to apply the method to image clustering with a large sample number. Ma et al., in 2006, proposed a cost-function approach to rival penalised competitive learning (DSRPCL) [8]. This method can automatically identify the clusters through a penalised competitive mechanism, but a large number of initial clusters are still needed to be set by the users, and in addition, the initial cluster centroids are obtained by a randomised approach. Yu et al. put forward an ant colony–fuzzy C-means hybrid algorithm (AFHA) [19]. First, the ant colony optimisation is used to obtain the initial cluster centroids; second, the traditional FCM algorithm is applied to conduct colour image clusters. However, this algorithm performs slowly in the ant colony optimisation step, which results in low efficiency [14]. On this basis, Tan et al. proposed a colour image segmentation using histogram thresholding-fuzzy C-means hybrid approach (HTFCM) [14]. This approach uses the histogram information to effectively improve the clustering efficiency and performance, but whether the obtained initial cluster centroids are reasonable, is basically influenced by three control parameters. In the cluster initialisation process, an unreasonable hard division method is employed in the algorithm, which results in much misclassification. Moreover, the pixels are taken as the sample data, leading to poor efficiency due to the large number of pixels.

Based on the above analysis, this paper proposes a histogram-based colour image fuzzy clustering algorithm (HCIFCM). This algorithm takes full advantage of the important piece of information that optimal segmentation threshold is located in the valley, and it can automatically determine the number of clusters and the initial cluster centroids accurately. To improve the clustering efficiency, the WaterShed algorithm [17] is used to perform image pre-segmentation. Making use of regions obtained through the pre-segmentation and not the original pixels as the clustering samples has improved the clustering efficiency greatly. To verify the effectiveness of the algorithm, many testing experiments are performed on the Berkeley image database [1], and a comparison between the DSRPCL algorithm [8] and HTFCM algorithm [14] is made in terms of the running time and the PRI metric. The experiments demonstrate that the clustering performance of our algorithm is more desirable.

The structure of this paper is organized as follows. In Section 2, the HTFCM algorithm is reviewed. In Section 3, the novel HCIFCM algorithm is proposed for addressing the problem of low efficiency due to computational complexity and poor clustering performance of the HTFCM algorithm. Section 4 gives the experimental results on some colour natural images. Finally, the conclusion is drawn in Section 5.

2 Motivation

In 2011, Tan et al. presented a colour image segmentation algorithm using histogram thresholding-fuzzy C-means hybrid approach (HTFCM algorithm). The basic steps of this algorithm are as follows.

  • Step.1 First, apply a peak search method to find all the peak sets (P R , P G and P B ) of the statistical RGB histogram in a colour image. The corresponding number of the peaks is denoted by n R  = |P R |, n G  = |P G | and n B  = |P B |, respectively. The basic principle of the peak search method is to retain the candidate peak whose number of pixels is greater than the given threshold H.

  • Step.2 Merge the peaks obtained in Step.1 to form candidate cluster centroids (i R , i G , i B ), where, i R  ∈ P R , i G  ∈ P G and i B  ∈ P B . Subsequently, assign every image pixel to the nearest cluster centroid and count the number of pixels in each cluster. Eliminate the cluster centroid, for which cluster, the number of pixels assigned is less than a given threshold V. The pixels in each eliminated cluster centroid are reassigned to the nearest cluster centroid. Finally, update each cluster centroid.

  • Step.3 Calculate the Euclidean distance D for any two cluster centroids. Merge the nearest cluster and update the cluster centroid if the minimum distance D min < dc, where dc is the given threshold. Stop the iteration unless D min < dc is not satisfied. Then, the initial cluster centroids are obtained.

  • Step.4 All the pixels in the image will be taken as the sample data. To obtain the cluster results, we implement the FCM algorithm for the sample data, using the initial cluster centroids gained in the previous step.

From the basic steps of the HTFCM algorithm, we can see that whether the obtained initial cluster centroids are reasonable is mainly affected by three control parameters, H, V and dc. For example, in Step1, setting too small a threshold H will result in excessive candidate peaks being reserved, which increases the difficulty of subsequent processing; while if it is set too large, many useful peaks will be lost, which makes the final obtained initial cluster centroids unreasonable. The latter two parameters are critical, as well. These parameters are often selected based on a subjective experience that will directly affect the reasonableness of the initial cluster centroids. In addition, there are many other problems. For instance, after the peak is searched in Step1, the division method adopted by the HTFCM algorithm, which is to divide each pixel into the nearest cluster, is not reasonable. The division method takes the mean of two peaks as the segmentation threshold, while the optimal threshold should be in the valley between the two peaks. However, the mean is not in the valley, or it deviates far away in general. Therefore, the HTFCM algorithm may cause much misclassification. Moreover, all the pixels are taken as the sample data in the FCM clustering process, causing poor efficiency.

3 Method description

To address the problems above, we presented a histogram-based colour image fuzzy clustering algorithm (HCIFCM). The detailed process is presented in Fig. 1.

Fig. 1
figure 1

HCIFCM algorithm process

3.1 Histogram preprocessing

In a colour image, the value range of grayscale i is [0, 255], and the component histograms of RGB - the three primary colours - are denoted by R(i), G(i) and B(i), for each component, respectively. Smoothing is made for each component histogram as follows,

$$ {H}_S(i)=\frac{1}{2k+1}{\displaystyle \sum_{x=i-k}^{i+k}S(x)} $$
(1)

where k is a positive integer, and S(x) is a component histogram, S ∈ {R, G, B}. For example, H R (i) denotes the smoothed R component histogram. The parameter k should not be too large; otherwise, it will cause excessive smoothness in the histogram [14]. Commonly, k can be set from 2 to 5, which will not only make the histogram achieve smoothness but also ensure that the image detail information is not lost too much [14]. Because k is small, many local extrema may still be exhibited in the smoothed histogram; therefore, we need to eliminate these to further improve the histogram waveform’s smoothness and reduce the influence of the local extrema. This paper employs the following equation to search for all the local extrema in the histogram [14].

$$ \left\{\begin{array}{l} PMax=\left\{i\Big|{H}_S(i)>{H}_S\left(i-1\right)\wedge {H}_S(i)>{H}_S\left(i+1\right)\right\}\hfill \\ {} VMin\kern1em =\left\{i\Big|{H}_S(i)<{H}_S\left(i-1\right)\wedge {H}_S(i)<{H}_S\left(i+1\right)\right\}\hfill \end{array}\right. $$
(2)

where PMax is the maximum set and VMin is the minimum set. To eliminate the extrema, the greater value between the two neighbours is used to replace the maximum, while the lower value is used to replace the minimum. That is,

$$ \left\{\begin{array}{l}\begin{array}{ll}{H}_S(i)= \max \left\{{H}_S\left(i-1\right),{H}_S\left(i+1\right)\right\},\hfill & if\kern0.5em i\in PMax\hfill \end{array}\hfill \\ {}\begin{array}{ll}{H}_S(i)= \min \left\{{H}_S\left(i-1\right),{H}_S\left(i+1\right)\right\},\hfill & if\kern0.5em i\in VMin\hfill \end{array}\hfill \end{array}\right. $$
(3)

After the above pre-processing, the histogram waveform will be smoother, so that the peak and the valley can be positioned accurately.

3.2 Histogram multilevel division and mergering

In this study, the fast location method based on the grayscale wave transformation [18] was employed to obtain all the peaks and valleys in the whole histogram waveform. When the difference between a putative peak and its neighbouring valley is larger than the wave threshold T, we can firmly determine a peak or a valley. Without pre-processing for the histogram, different grayscale images need to be given different wave thresholds to achieve optimal segmentation results. The value of T ranges from 10 to 40. According to the experiment, we find that for certain images in other application fields, the value of T is larger or smaller. The reason is that, in this method, the grayscale wave change, from a row, a column or a straight line direction at any angle, is taken as a waveform that has no statistical properties and is easily disturbed by noise or other factors. This results in the sensitivity to threshold T.

However, the application of the fast location method in reference [18] to this paper is more reasonable and effective. On the one hand, the object being processed in this study is the image histogram with special statistical properties and little external noise or other factor influences. On the other hand, the histogram waveform is smoother after pre-processing. The above two properties can alleviate the sensitivity to the threshold T. Through experiments, we find that the peak and the valley can be located accurately as long as we set a small fixed wave threshold.

The valley obtained by the fast location method is denoted by \( {V}_S(i)=\left\{{i}_1,{i}_2,\cdots, {i}_{\iota },\cdots, {i}_{n_S}\right\} \), where n S is the total number of valleys and n S  + 1 is the total number of peaks. If n S  = 0, that is V S (i) = ∅, the histogram contains only one peak, namely the unimodal distribution; otherwise, the histogram is called multimodal. We take the obtained valley as the threshold for dividing the histogram into n S  + 1 categories. In this step, all the wave thresholds are denoted by T 1. The detailed division process is as follows.

$$ \left\{\begin{array}{ll}{L}_S(i)=0,\hfill & if\kern0.5em {n}_S=0\hfill \\ {}{L}_S(i)=0,\hfill & if\kern0.5em {n}_S>0\wedge i<{i}_{\iota}\wedge \iota =1\hfill \\ {}{L}_S(i)={n}_S,\hfill & if\kern0.5em {n}_S>0\wedge i>{i}_{\iota}\wedge \iota ={n}_S\hfill \\ {}{L}_S(i)=\iota -1,\hfill & if\kern0.5em {n}_S>0\wedge i>{i}_{\iota -1}\wedge i<{i}_{\iota}\wedge \iota >1\hfill \end{array}\right. $$
(4)

where L S (i) is a division for each component histogram.

According to each component histogram above, the number of peaks is obtained, denoted by n R , n G and n B respectively. Next, we obtain the classification numbers by the histogram multilevel division method. These are C R , C G and C B , where C R  = n R  + 1, C G  = n G  + 1 and C B  = n B  + 1. And the number of all possible classification patterns is L (L = C R  × C G  × C B ). After dividing each component histogram, a merge procedure is needed to obtain a single partition of the pixels in the whole image. For any pixel in the colour image, we denote its value as (r, g, b), and the classification number which this pixel belongs to is defined as

$$ L\left(r,g,b\right)={L}_R(r)\times {C}_G\times {C}_B+{L}_G(g)\times {C}_B+{L}_B(b) $$
(5)

3.3 Initializing clustering centroids

After finishing the division for all the pixels, we count the number of the pixels contained in each class. Let H(i) denote the number of pixels in Class i, where i ∈ [0, L − 1], then a new histogram H(i) is obtained. For each H(i), use the histogram multilevel division method to obtain a division (i), in which process, the fluctuation threshold T 2 is adopted. The length of the new histogram H(i) commonly satisfies L < < 255. Due to the certain number of pixels in the same image, the height of the new histogram is much higher than the original histogram, which results in the greater drop between the peak and the valley. Thus we can find that the ratio of two fluctuation thresholds is inversely proportional to the ratio of the size about the two corresponding histograms, that is \( \frac{T_1}{T_2}\propto 1/\left(\frac{256}{L}\right) \). Therefore, the fluctuation threshold T 2 can be set as follows.

$$ {T}_2=\beta \frac{256}{L}\cdotp {T}_1 $$
(6)

where β is the control parameter.

If the division (i) includes C classes, the image is initially divided into C classes. And the final classification number ϒ(r, g, b) for each pixel is given by,

$$ \varUpsilon \left(r,g,b\right)=\Re \left(L\left(r,g,b\right)\right) $$
(7)

Next, we calculate the colour mean \( \left({\overline{r}}_i,{\overline{g}}_i,{\overline{b}}_i\right) \) of each class and take it as the initial cluster centroid, where i ∈ [0, C − 1].

3.4 Initializing sample data

In paper [14], the pixels are taken as the sample data for FCM clustering and the large number of pixels causes a very time consuming clustering process. However, replacing the pixels with the pre-segmentation regions as the new sample is an admissible solution. WaterShed is one of the typical pre-segmentation methods, as is the Lazy Snapping [4] method presented by Li et al. in the process of the interactive colour image segmentation, and the IRMLGC algorithm [10, 11] proposed by Peng. The latter two methods use the pre-segmentation regions as the nodes, to model the image and complete further segmentation. However, the WaterShed algorithm is based on the gradient information of the grayscale image, irrespective of the rich information on colour contained in the colour image. Therefore, the colour information is used in this study to calculate the gradient value to alleviate the over-segmentation and to make the pre-segmentation more accurate. The area of the segmented region is usually small, so we adopt the colour mean as a new sample set to depict each region, which significantly reduces the sample number, and then improves the FCM clustering efficiency.

3.5 FCM algorithm

In the FCM algorithm, the key is that we set the cluster number and the cluster centroids appropriately, using a common method - the random cluster centroids initialisation. However, it may not be able to achieve the ideal clustering result, even using a large number of iterations, especially for the image clustering. Because the sample size is very large, unreasonable initialisation would make the clustering process more time-consuming; in addition, the generation of clustering results has certain randomness. However, once the initial cluster centroids are more accurate, the FCM algorithm will be much effective. To improve the clustering performance of the FCM algorithm effectively, in our paper, the histogram information was used to initialise the cluster centroids.

3.6 Examples analysis

The implementation process of this algorithm is illustrated by the following example. Figure 2(a) is a Sperm image, and Fig. 3(a), (c) and (e) are the RGB component original histograms, respectively. From Fig. 3(c), it is obvious that, on the main peak, there are many local extrema, which will bring considerable interference in accurately searching the real peaks and valleys. Figure 3(d), (b) and (f) are the results of the RGB component original histograms after pre-processing. From the comparison between Fig. 3(c) and (d), we can see that the histogram waveform is very smooth after pre-processing, which provides an advantage for positioning the peaks and valleys accurately. The valleys searched by the fast location method are marked by the pink points.

Fig. 2
figure 2

Sperm image and its cluster result. a Original sperm image. b Initial cluster result. c FCM cluster result

Fig. 3
figure 3

Histograms. a R component original histogram. b R component histogram after pre-processing. c G component original histogram. d G component histogram after pre-processing. e B component original histogram. f B component histogram after pre-processing. g The merged histogram

In Fig. 3(b), the histogram is a unimodal distribution; in Fig. 3(d), the histogram, with three valleys, is divided into four categories; while in Fig. 3(f), there is one valley, so the histogram is divided into two categories. We denote the above categories as C R  = 1, C G  = 4 and C B  = 2, respectively. On merging the categories of the three components, eight (L = C R  × C G  × C B  = 8) possible clustering centroids are obtained, and the merged histogram is shown in Fig. 3(g). Three valleys are found by using the fast location method shown as the blue points in Fig. 3(g). Next, the histogram is divided into four categories and four initial cluster centroids are finally obtained. The initial clustering results are more accurate and are obtained from the initial cluster centroids directly; these are illustrated in Fig. 2(b). Figure 2(c) is the clustering results obtained by FCM methods; here, we can see that every small object maintains complete contour, and even the dark edge regions show accurate clustering with a clear distinction from the background.

4 Experimental results and analysis

The testing environment selected in our simulation experiments is as follows: Intel Xeon 3.10 GHz, 4G memory, VS2008 and VC++ programming environment. The experiment includes two parts. In Experiment I, qualitative analysis and comparison is made between our algorithm and the HTFCM algorithm; while in Experiment II, we make quantitative analysis and comparison on the BSD300 image database between our algorithm, the HTFCM algorithm and the DSRPCL algorithm.

4.1 Experimental results I

For Experiment I, the selected testing images are shown in Fig. 4; these are the Sperm image, Date image and the Teddy image with sizes 422×552, 800×600 and 284×398, respectively. Figure 5 shows the cluster results of the testing images in Fig. 4 by HTFCM algorithm and HCIFCM algorithm. Table 1 depicts the performance comparison between the HTFCM algorithm and the HCIFCM algorithm from seven aspects, such as clustering number, number of iterations, time consumed for initialising cluster centroids, time consumed of FCM, total time consumed and ratio of total time consumed. From the table, we can see that compared with the HTFCM algorithm, the clustering time of the HCIFCM algorithm has greatly reduced, and the clustering results are more optimistic with respect to the visual effects. As to why our algorithm has a better performance on running time, there are two reasons: on the one hand, for the clustering centroids initialisation, the main processing object is the histograms, and not all the pixels, which accelerates the process; on the other hand, the WaterShed algorithm is used for the pre-segmentation, and instead of the pixels, the segmentation regions are taken as the sample data, which greatly reduces the sample number and improves the performance of the FCM algorithm. The proposed method is superior to the HTFCM method in the aspects of efficiency, clustering numbers and iterations due to the accurate initial cluster centroids and robust histogram based multilevel thresholding framework.

Fig. 4
figure 4

Testing images for Experiment I. a Sperm. b Date. c Teddy

Fig. 5
figure 5

Experimental results I. a Cluster results by HTFCM algorithm. b Cluster results by HCIFCM algorithm

Table 1 Comparison of experimental results I

4.2 Experimental results II

For Experiment II, we test 300 images derived from Berkeley database (BSD300) and part of the original images are shown in Fig. 6.

Fig. 6
figure 6

Testing images for Experiment II. a #238011. b #299091. c #353013. d #55067. e #118035. f #241004

In the colour image segmentation, PRI (Probability Rand Index) is a widely used assessment metric [11, 15, 16] for comparing with manually labelled images, such as Ground Truth, to obtain the statistical ratio of correct segmented pixels. The PRI metric is given by,

$$ PRI\left({S}_{test},\left\{{S}_K\right\}\right)=\frac{1}{\left({}_2^N\right)}{\displaystyle \sum_{i,j,i<j}\left[I\left({l}_i^{S_{test}}={l}_j^{S_{test}}\right)p\left({\widehat{l}}_i={\widehat{l}}_j\right)+I\left({l}_i^{S_{test}}\ne {l}_j^{S_{test}}\right)p\left({\widehat{l}}_i\ne {\widehat{l}}_j\right)\right]} $$
(8)

where N is the sample number and l i is the clustering number of the pixel i. I( ) denotes a binary function, I(1) = 1 and I(0) = 0. S test is the segmentation result and {S K } is the set of manually labelled images. \( p\left({\widehat{l}}_i={\widehat{l}}_j\right)=\frac{1}{K}{\displaystyle \sum_{k=1}^KI\left({l}_i^{S_k}={l}_j^{S_k}\right)} \) and \( p\left({\widehat{l}}_i={\widehat{l}}_j\right)+p\left({\widehat{l}}_i\ne {\widehat{l}}_j\right)=1 \).

According to the authors’ suggestions from paper [8], in this experiment, we set P = 0.2, and the learning rate α of the DSRPCL1 algorithm and that of the DSRPCL2 algorithm to 0.1 and 0.5, respectively; simultaneously, according to the suggestions from paper [14], we set H = 20, V = 0.008N and dc = 28 in the HTFCM algorithm, and in our algorithm, we set T 1 = 10 and β = 30, both of which are obtained from experience. In addition, the initial clustering number k of the DSRPCL1 algorithm and the DSRPCL2 algorithm are both set to 30.

Figure 7 shows the clustering results of the three testing images (#238011, #299091 and #353013) in Fig. 6, which are respectively obtained by the HTFCM algorithm, the DSRPCL1 algorithm, the DSRPCL2 algorithm and the HCIFCM algorithm proposed by this paper. Compared with other algorithms, for the moon in image #238011, the clustering result of HCIFCM algorithm is more accurate, whether in the edge retention or integrity; for the pyramid in the image #299091, the misclassification of this method is much less, and additionally, the clustering result preserves the sky hierarchy, which is similar to the original image; for the leaves in image #353013, the clustering result is smoother, and the integrity is considerably better.

Fig. 7
figure 7

Comparison of the experimental results. a DSRPCL1. b DSRPCL2. c HTFCM. d HCIFCM

Table 2 illustrates the performance comparison between the DSRPCL1 algorithm, the DSRPCL2 algorithm, the HTFCM algorithm and the HCIFCM algorithm from six aspects, including the clustering number, the PRI metric, the time consumed for initialising cluster centroids, the time consumed of FCM clustering, the total time consumed and the ratio between the total time consumed. From the table, we can see that the PRI value of the HCIFCM algorithm is much higher and that the clustering results are more accurate. In terms of the time consumed for initialising the cluster centroids, our algorithm is quicker and more stable due to the statistical histograms which are taken as the processing object, instead of the image pixels as in the HTFCM algorithm. In terms of the total time consumed, our algorithm is superior to the other three algorithms owing to the superpixel representation. Table 3 gives the average performance comparison on the BSD300 image database. Obviously, our algorithm has more advantages in the overall clustering performance.

Table 2 Comparison of the clustering performance
Table 3 Mean comparison of clustering performance on BSD300 images

5 Conclusions

In this work, we proposed a histogram-based colour image fuzzy clustering algorithm. This algorithm uses the histogram information in the colour image to initialise the cluster centroids. Next, the FCM clustering algorithm was applied to achieve the colour image segmentation. Compared with the DSRPCL algorithm and the HTFCM algorithm, our algorithm is more effective not only in the execution efficiency but also in the clustering performance. In the stage of cluster centroids initialisation, each component histogram, and not the pixels, is the main processing object. For the FCM clustering, the WaterShed algorithm is used for image pre-segmentation, and the segmented regions are taken as the new sample data, ultimately improving the efficiency of the algorithm greatly. In addition, a more efficient fast peak-valley location scheme is used to locate the valleys accurately, and the valleys are taken as the segmentation thresholds to segment the histogram, after which more reasonable initial cluster centroids can be obtained. Whether the initial cluster centroids obtained by the HTFCM algorithm are reasonable is basically affected by three control parameters, and there is also a degree of irrationality in the division process. For our algorithm, there are fewer parameters, and it is considerably easier to achieve parameter control. However, the parameter selection problem still restricts the clustering performance of our algorithm. Therefore, a non-parametric and automatic colour image clustering algorithm needs to be researched further.