1 Introduction

Image segmentation is the process of dividing an image into non-overlapping, consistent regions that are homogeneous with respect to some characteristics. It plays an important role in image processing and pattern recognition, and its results are useful in applications such as object recognition, medical image processing, face recognition, content-based image retrieval and other computer vision tasks. Over the past decades, many image segmentation methods have been proposed, which can be categorized into four groups: threshold-based, clustering-based, edge-based and region-based methods [18]. Among them, clustering, which creates groups of objects based on the similarity of relevant features, has been widely used for image segmentation. The Fuzzy C-means (FCM) algorithm has been a popular image segmentation method due to its simplicity of implementation and clustering validity. To resolve its sensitivity to outliers and noise, many improved variants of FCM have been proposed in the literature; those integrating local spatial information have been studied most widely [1, 4, 9]. However, FCM-type algorithms are usually sensitive to the selection of the initial cluster centers. The K-harmonic means (KHM) algorithm [23] is an effective way to cope with this problem and has also been applied to image segmentation [15]. Many improved algorithms based on KHM have been proposed. The most popular ones combine KHM with swarm intelligence algorithms, exploiting both the global search ability of heuristic optimization and the local search ability of KHM: KHM has been combined with Particle Swarm Optimization (PSO) [21], Ant Colony Optimization (ACO) [13], Variable Neighborhood Search (VNS) [2, 5], Candidate Group Search (CGS) [10], the Firefly Algorithm (FA) [24] and Simple Swarm Optimization (SSO) [22]. However, these swarm-intelligence-based clustering algorithms are usually time-consuming, and the performance improvements are insignificant on many data sets.

The above-mentioned KHM and its improved versions assume that all features make equal contributions during clustering, which can be a limitation in practice, because some features carry more clustering-relevant information than others. Recently, feature weighting strategies have gained much attention and have been used to improve various clustering algorithms. The W-K-means algorithm [11] uses feature weights to measure the importance of features and introduces a parameter β to further control their impact, but an appropriate value of β is hard to determine for different datasets without prior knowledge. The E-W-K-means algorithm [12] was then proposed to overcome this drawback of W-K-means by using the between-cluster information as the denominator of the dissimilarity measure. Xing [20] proposed an improved FCM method called IFWFCM, which is based on a feature-weighted distance whose feature weights are adaptively updated during the clustering process. However, no comparable work exists for the KHM algorithm.

In this paper, we propose novel KHM-type clustering algorithms that use adaptive feature weighting to identify the importance of different features. Several elements, including the Lab color space, local homogeneity and texture of the color image, are extracted to constitute a more comprehensive feature vector. Moreover, a feature group weighting strategy is introduced into the new clustering algorithms for color image segmentation. Experiments on benchmark datasets from the UCI Repository and on the Berkeley segmentation dataset [3] demonstrate that the proposed feature weighting methods outperform the others.

The rest of the paper is organized as follows: Section 2 presents the KHM algorithm and the feature weighting method. Section 3 describes the proposed methods. Section 4 analyzes the experimental results on benchmark datasets and color images. Section 5 concludes the paper.

2 Preliminaries

2.1 K-harmonic means clustering

The KHM algorithm, like K-means and FCM, is a center-based partitional clustering algorithm. Its most distinctive feature is that it uses the harmonic mean (HM) of the distances from each data point to all centers as a component of its objective function, which makes KHM less sensitive to the initial centers and lets it achieve better clustering performance. Assume that a dataset \(X = \{\boldsymbol{x}_1, \boldsymbol{x}_2, \ldots, \boldsymbol{x}_n\}\), where \(\boldsymbol{x}_i = (x_{i1}, \ldots, x_{id})\), containing n data points each with d features, is partitioned into k clusters with centers denoted \(\boldsymbol{c}_j\;(j = 1, 2, \ldots, k)\). The objective function of KHM is expressed as follows.

$$ J_{KHM}(X,C) = \sum\limits_{i = 1}^{n} \frac{k}{\sum\nolimits_{j = 1}^{k} \frac{1}{d_{ij}^{p}}} $$
(1)

where \(d_{ij}\) is the Euclidean distance between data point \(\boldsymbol{x}_i\) and cluster center \(\boldsymbol{c}_j\), and p is an input parameter that affects the clustering performance, typically set as p ≥ 2 [22].

During the clustering process, the objective function is minimized until it becomes steady. Each \(\boldsymbol{x}_i\) is then assigned to the cluster j with the largest membership value. The new cluster centers are calculated according to (2).

$$ \boldsymbol{c}_{j}^{new} = \frac{\sum\nolimits_{i = 1}^{n} m_{KHM}(\boldsymbol{c}_{j}/\boldsymbol{x}_{i}) \, w_{KHM}(\boldsymbol{x}_{i}) \, \boldsymbol{x}_{i}}{\sum\nolimits_{i = 1}^{n} m_{KHM}(\boldsymbol{c}_{j}/\boldsymbol{x}_{i}) \, w_{KHM}(\boldsymbol{x}_{i})} $$
(2)

where \(m_{KHM}(\boldsymbol{c}_j/\boldsymbol{x}_i)\) is the membership function.

$$ m_{KHM}(\boldsymbol{c}_{j}/\boldsymbol{x}_{i}) = \frac{d_{ij}^{-p-2}}{\sum\nolimits_{j = 1}^{k} d_{ij}^{-p-2}} $$
(3)

and \(w_{KHM}(\boldsymbol{x}_i)\) is the weighting function.

$$ w_{KHM}(\boldsymbol{x}_{i}) = \frac{\sum\nolimits_{j = 1}^{k} d_{ij}^{-p-2}}{\left( \sum\nolimits_{j = 1}^{k} d_{ij}^{-p} \right)^{2}} $$
(4)

In [16, 19], a fuzzy version of KHM similar to FCM, called fuzzy K-harmonic means (FKHM), was studied; a fuzzy membership \(u_{ij}\;(i = 1, 2, \ldots, n;\; j = 1, 2, \ldots, k)\) is added to the dissimilarity computation and replaces \(m_{KHM}(\boldsymbol{c}_j/\boldsymbol{x}_i)\) when assigning the data points. The objective function of FKHM is expressed as follows.

$$ J_{FKHM}(X,C) = \sum\limits_{i = 1}^{n} \frac{k}{\sum\nolimits_{j = 1}^{k} \frac{1}{u_{ij}^{m} d_{ij}^{p}}} $$
(5)

where m is the fuzzy parameter that is set as 2 here.
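For illustration, one KHM iteration of (2)–(4) can be sketched in a few lines of NumPy; this is a minimal sketch under our own naming (khm_step, X, C), not the original implementation.

```python
import numpy as np

def khm_step(X, C, p=3, eps=1e-10):
    """One KHM iteration: recompute the centers via (2)-(4).
    X: (n, d) data matrix; C: (k, d) current centers."""
    # Pairwise Euclidean distances d_ij, kept away from zero
    d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)
    d = np.maximum(d, eps)
    # Membership m_KHM(c_j / x_i), Eq. (3)
    m = d ** (-p - 2)
    m /= m.sum(axis=1, keepdims=True)
    # Weighting w_KHM(x_i), Eq. (4)
    w = (d ** (-p - 2)).sum(axis=1) / (d ** (-p)).sum(axis=1) ** 2
    # Center update, Eq. (2)
    coeff = m * w[:, None]                 # m_KHM * w_KHM per (i, j)
    return (coeff.T @ X) / coeff.sum(axis=0)[:, None]
```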

2.2 Feature weighting clustering algorithm

Certain features of a dataset may be more relevant than others, and feature weighting clustering methods have therefore been proposed in recent years. To overcome the problem that the elements of a feature weight vector cannot be adaptively updated during the clustering process, an improved FCM method called IFWFCM was proposed in [20]; its update equations are derived from a revised objective function based on the weighted distance. The objective function of IFWFCM is shown as (6).

$$ F(\boldsymbol{U},\boldsymbol{V},\boldsymbol{w};\boldsymbol{D}) = \sum\limits_{j = 1}^{k} \sum\limits_{i = 1}^{n} u_{ij}^{m} \sum\limits_{q = 1}^{d} \left[ w_{q}(x_{iq} - c_{jq}) \right]^{2} $$
(6)

where \({u_{ij}} \in [0,1],\sum \limits _{j = 1}^{k} {{u_{ij}} = 1},\;{w_{q}} \in [0,1],\sum \limits _{q = 1}^{d} {{w_{q}} = 1}\), and m is the fuzzy parameter that is usually set as 2.

3 The proposed algorithms

3.1 Feature weighted K-harmonic means

As described above, all features make equal contributions in the traditional KHM algorithm, which may limit the clustering performance: the importance of certain features to the evolving cluster structure cannot be appropriately manifested, whereas adaptively updated feature weights can more accurately reflect the relevance of each feature during clustering. Hence, a feature weighting strategy like that of W-K-means and IFWFCM can be introduced into KHM to alleviate this problem. W-K-means requires the input parameter β to be set in advance, and an appropriate value of β is hard to determine for different datasets. Since KHM already includes the parameter p, a feature-weighted Euclidean distance \(d_{ij}^{(\boldsymbol{w})}\) with no extra parameter can be used to calculate the dissimilarity between each data point and each center. Hence the feature weighted K-harmonic means (WKHM) algorithm is proposed, and its objective function is defined as (7).

$$ J_{WKHM}(X,C) = \sum\limits_{i = 1}^{n} \frac{k}{\sum\nolimits_{j = 1}^{k} \frac{1}{\left[d_{ij}^{(\boldsymbol{w})}\right]^{p}}} $$
(7)

where \(d_{ij}^{(\boldsymbol{w})} = \left\| \operatorname{diag}(\boldsymbol{w})(\boldsymbol{x}_{i} - \boldsymbol{c}_{j}) \right\| = \sqrt{\sum\nolimits_{q = 1}^{d} w_{q}^{2}(x_{iq} - c_{jq})^{2}}\) with the feature weight vector \(\boldsymbol{w} = (w_1, w_2, \ldots, w_d)^{T}\) and \(\operatorname{diag}(\boldsymbol{w}) = \begin{bmatrix} w_1 & 0 & \cdots & 0 \\ 0 & w_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & w_d \end{bmatrix}\), \(w_q \in [0,1]\), \(\sum\nolimits_{q = 1}^{d} w_q = 1\).
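In code, \(d_{ij}^{(\boldsymbol{w})}\) amounts to scaling each squared feature difference by the squared weight before taking the Euclidean norm; a minimal NumPy sketch (our helper name weighted_dist) is:

```python
import numpy as np

def weighted_dist(X, C, w, eps=1e-10):
    """Feature-weighted Euclidean distances d_ij^(w).
    X: (n, d) data; C: (k, d) centers; w: (d,) weights with sum(w) == 1.
    Returns an (n, k) distance matrix, kept away from zero."""
    diff = X[:, None, :] - C[None, :, :]                  # (n, k, d)
    return np.maximum(np.sqrt(((w ** 2) * diff ** 2).sum(axis=2)), eps)
```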

The clustering procedure of WKHM is carried out by minimizing the objective function, so it can be regarded as an optimization problem. To handle the constraint on the feature weights, we use the Lagrange multiplier technique to obtain an unconstrained optimization problem and formulate the function L expressed as (8), where λ is the Lagrange multiplier.

$$ L = \sum\limits_{i = 1}^{n} \frac{k}{\sum\nolimits_{j = 1}^{k} \frac{1}{\left[d_{ij}^{(\boldsymbol{w})}\right]^{p}}} - \lambda \left( \sum\limits_{q = 1}^{d} w_{q} - 1 \right) $$
(8)

For each iteration, the update equations of the cluster centers \(\boldsymbol{c}_j\;(j = 1, 2, \ldots, k)\) and the feature weights \(w_q\;(q = 1, 2, \ldots, d)\) are obtained by setting the derivatives of L with respect to each of them to zero. The detailed theoretical derivations are provided in Appendices A and B.

The update equation of \(w_q\;(q = 1, 2, \ldots, d)\) is shown as (9); its detailed derivation is given in Appendix A.

$$ w_{q}^{new} = {\left( {\sum\limits_{l = 1}^{d} {\frac{{\sum\nolimits_{i = 1}^{n} {{D_{iq}}}} }{{\sum\nolimits_{i = 1}^{n} {{D_{il}}}} }}} \right)^{- 1}} $$
(9)

where \({D_{iq}} = \frac {{\sum \nolimits _{j = 1}^{k} {\left ({{{\left [d_{ij}^{({\boldsymbol {w}})}\right ]}^{- p - 2}} \cdot {{({x_{iq}} - {c_{jq}})}^{2}}} \right )}} }{{{{\left (\sum \nolimits _{j = 1}^{k} {{{\left [d_{ij}^{({\boldsymbol {w}})}\right ]}^{- p}}} \right )}^{2}}}}\).

The overall procedure of WKHM is summarized as Algorithm 1. Note that in the initialization step, all elements of the feature weight vector w are set to the same value (i.e., \(w_q = 1/d\)).

Algorithm 1 The WKHM algorithm
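Since Algorithm 1 appears only as an image in the original layout, the following sketch reflects our reading of the text: initialize centers and uniform weights, then alternate the center update (whose coefficient \(d_{ij}^{-p-2}/(\sum_j d_{ij}^{-p})^2\) follows from combining (2)–(4) with weighted distances) and the weight update (9) until the objective (7) stabilizes. It reuses weighted_dist from the sketch above and should be treated as an assumption, not the authors' exact code.

```python
import numpy as np

def wkhm(X, k, p=3, max_iter=100, tol=1e-6, seed=None):
    """Sketch of Algorithm 1 (WKHM): alternate center and weight updates."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    C = X[rng.choice(n, size=k, replace=False)].copy()  # initial centers
    w = np.full(d, 1.0 / d)                             # equal initial weights
    prev = np.inf
    for _ in range(max_iter):
        dist = weighted_dist(X, C, w)
        # Center update: coefficient m_KHM * w_KHM = d^{-p-2} / (sum_j d^{-p})^2
        a = dist ** (-p - 2) / (dist ** (-p)).sum(1, keepdims=True) ** 2
        C = (a.T @ X) / a.sum(0)[:, None]
        dist = weighted_dist(X, C, w)
        # Weight update, Eq. (9): w_q is proportional to 1 / sum_i D_iq
        D = (dist ** (-p - 2))[:, :, None] * (X[:, None, :] - C[None, :, :]) ** 2
        S = (D.sum(1) / (dist ** (-p)).sum(1, keepdims=True) ** 2).sum(0)
        w = (1.0 / S) / (1.0 / S).sum()
        obj = (k / (dist ** (-p)).sum(1)).sum()         # objective (7)
        if abs(prev - obj) < tol:                       # early stop when steady
            break
        prev = obj
    return C, w
```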

3.2 Feature weighted fuzzy K-harmonic means

In this section, \(d_{ij}^{(\boldsymbol {w})}\) is also introduced into FKHM to form the feature weighted fuzzy K-harmonic means (WFKHM) algorithm, whose objective function is defined as (10).

$$ J_{WFKHM}(X,C) = \sum\limits_{i = 1}^{n} \frac{k}{\sum\nolimits_{j = 1}^{k} \frac{1}{u_{ij}^{m}\left[d_{ij}^{(\boldsymbol{w})}\right]^{p}}} $$
(10)

where fuzzy membership \({u_{ij}} \in [0,1],\;\sum \limits _{j = 1}^{k} {{u_{ij}} = 1}\) and feature weight \({w_{q}} \in [0,1],\sum \limits _{q = 1}^{d} {{w_{q}} = 1}\), m is the fuzzy parameter that is set as 2 here.

Similar to WKHM, the update formulas for the fuzzy membership, the cluster centers and the feature weights, obtained with the Lagrange multiplier technique, are expressed as (11)–(13), respectively. Since these quantities are derived for WFKHM by the same principle as for WKHM, we omit the detailed analysis here.

$$ {u_{ij}} = {\left[ {\sum\limits_{s = 1}^{k} {{{\left( {\frac{{{{\left[d_{ij}^{({\boldsymbol{w}})}\right]}^{p}}}}{{{{\left[d_{is}^{({\boldsymbol{w}})}\right]}^{p}}}}} \right)}^{\frac{1}{{m + 1}}}}}} \right]^{- 1}} $$
(11)
$$ \boldsymbol{c}_{j}^{new} = \frac{\sum\nolimits_{i = 1}^{n} \dfrac{u_{ij}^{-m}\left[d_{ij}^{(\boldsymbol{w})}\right]^{-p-2}}{\left( \sum\nolimits_{j = 1}^{k} u_{ij}^{-m}\left[d_{ij}^{(\boldsymbol{w})}\right]^{-p} \right)^{2}} \, \boldsymbol{x}_{i}}{\sum\nolimits_{i = 1}^{n} \dfrac{u_{ij}^{-m}\left[d_{ij}^{(\boldsymbol{w})}\right]^{-p-2}}{\left( \sum\nolimits_{j = 1}^{k} u_{ij}^{-m}\left[d_{ij}^{(\boldsymbol{w})}\right]^{-p} \right)^{2}}} $$
(12)
$$ w_{q}^{new} = {\left( {\sum\limits_{l = 1}^{d} {\frac{{\sum\nolimits_{i = 1}^{n} {{D_{iq}}}} }{{\sum\nolimits_{i = 1}^{n} {{D_{il}}}} }}} \right)^{- 1}} $$
(13)

where \({D_{iq}}=\frac {{\sum \nolimits _{j=1}^{k} {\left ({{u_{ij}}^{-m}{{\left [d_{ij}^{({\boldsymbol {w}})}\right ]}^{-p-2}}\cdot {{({x_{iq}}-{c_{jq}})}^{2}}} \right )}} }{{{{\left (\sum \nolimits _{j=1}^{k} {\left ({u_{ij}}^{- m}{{\left [d_{ij}^{({\boldsymbol {w}})}\right ]}^{-p}}\right )}\right )}^{2}}}}\).

The overall procedure of WFKHM is similar to that of WKHM, which is shown as Algorithm 2.

Algorithm 2 The WFKHM algorithm
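As one concrete piece of WFKHM, the membership update (11) can be sketched as follows (our helper name, operating on a precomputed matrix of \(d_{ij}^{(\boldsymbol{w})}\)); each row of the result sums to 1.

```python
import numpy as np

def wfkhm_membership(dist, p=3, m=2):
    """Fuzzy membership update of Eq. (11).
    dist: (n, k) matrix of weighted distances d_ij^(w)."""
    # ratio[i, j, s] = (d_ij / d_is)^p
    ratio = dist[:, :, None] ** p / dist[:, None, :] ** p
    return 1.0 / (ratio ** (1.0 / (m + 1))).sum(axis=2)
```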

3.3 Feature extraction of color image

To deal with color image segmentation, the IFWFCM algorithm [20] has confirmed the effectiveness of the adaptive feature weighting strategy. However, according to the experimental results, the overall segmentation performance of IFWFCM on color images is not superior to that of FCM, and is even worse in some cases. Moreover, IFWFCM only uses the Lab color space, so the feature weighting strategy cannot be fully exploited. In this study, a feature vector comprising color space, local homogeneity and texture, with eight elements in total, is used. Among the various color spaces in the literature, such as RGB, Lab and HSV, the Lab space is chosen to describe the pixels in our method due to its popularity and effectiveness. The local homogeneity, which quantifies the uniformity of a region in the image, is calculated in the HSV color space [7, 14, 17]. Two efficient methods, Singular Value Decomposition (SVD) and QR factorization, are applied in the RGB color space to compute the texture components of the image, where the spacing parameter is \(N_s = 8\), the window size is 32, and the angles by which the window is rotated are 𝜃 = [0, 22.5, 45, 67.5]. Then the homogeneity feature (HF) with 3 elements, the 3 components of the Lab color space and the texture feature (TF) with 2 elements are integrated into the 8-dimensional feature vector \(\{HF_{hue}, HF_{sat}, HF_{val}, L, a, b, TF_{SVD}, TF_{QR}\}\), on which the segmentation is carried out.
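The texture computation is described only briefly, so the sketch below is a loose interpretation under stated assumptions: each grayscale window is summarized by a single scalar (the largest singular value for \(TF_{SVD}\); the magnitude of the leading diagonal entry of R for \(TF_{QR}\)), the window rotation over 𝜃 is omitted, and the helper name window_map is ours.

```python
import numpy as np

def window_map(img, stat, win=32, step=8):
    """Slide a win x win window over a grayscale image with the given
    spacing and assign each step x step cell the summary stat(window)."""
    H, W = img.shape
    out = np.zeros_like(img, dtype=float)
    for i in range(0, H, step):
        for j in range(0, W, step):
            patch = img[i:min(i + win, H), j:min(j + win, W)]
            out[i:i + step, j:j + step] = stat(patch)
    return out

# Assumed texture summaries: largest singular value and leading |R| entry
tf_svd = lambda img: window_map(img, lambda P: np.linalg.svd(P, compute_uv=False)[0])
tf_qr = lambda img: window_map(img, lambda P: abs(np.linalg.qr(P)[1][0, 0]))
```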

Here we mainly introduce the local homogeneity, which exploits the local information in the image. Homogeneity is defined as a composition of standard deviation and discontinuity. For an image of size M × N, let \(P_{ij}\) denote the pixel at location (i, j); its normalized standard deviation is calculated as follows.

$$ V_{ij} = \frac{v_{ij}}{v_{\max}} \quad \text{and} \quad v_{ij} = \sqrt{\frac{1}{d^{2}} \sum\limits_{p = i - (d-1)/2}^{i + (d-1)/2} \; \sum\limits_{q = j - (d-1)/2}^{j + (d-1)/2} (I_{pq} - \mu_{ij})^{2}} $$
(14)

where \(v_{\max} = \max\{v_{ij}\}\), \(0 \le i, p \le M - 1\), \(0 \le j, q \le N - 1\), and \(I_{pq}\) is the color component of the pixel at location (p, q).

\(w_{ij}\) is a d × d window (for example 5 × 5) centered at (i, j) used for the computation of the deviation. \(\mu_{ij}\) is the mean color component of the pixel \(P_{ij}\) within the window \(w_{ij}\), calculated as follows.

$$ \mu_{ij} = \frac{1}{d^{2}} \sum\limits_{p = i - (d-1)/2}^{i + (d-1)/2} \; \sum\limits_{q = j - (d-1)/2}^{j + (d-1)/2} I_{pq} $$
(15)

The discontinuity of the color component at each location (i, j) is measured by the magnitude \(e_{ij}\) of the gradient, obtained with the Sobel operator within a t × t (for example 3 × 3) window; the normalized magnitude of the gradient is calculated as follows.

$$ E_{ij} = \frac{e_{ij}}{e_{\max}} \quad \text{and} \quad e_{ij} = \sqrt{G_{x}^{2} + G_{y}^{2}} $$
(16)

where \(e_{\max} = \max\{e_{ij}\}\), and \(G_x\) and \(G_y\) are the gradients of the color component in the x and y directions, respectively.

Therefore, the local homogeneity of a pixel \(P_{ij}\) in an image is calculated as (17).

$$ {H_{ij}} = 1 - {E_{ij}} \times {V_{ij}} $$
(17)

The value of the local homogeneity at each location lies in the interval [0,1]: the more uniform the local region around a pixel is, the larger the homogeneity value of that pixel.
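Equations (14)–(17) can be computed per color component with running-mean filters for \(\mu_{ij}\) and \(v_{ij}\) and the Sobel operator for \(e_{ij}\); applying this to the hue, saturation and value channels yields \(HF_{hue}\), \(HF_{sat}\) and \(HF_{val}\). A minimal sketch (our helper name homogeneity, assuming \(v_{\max}, e_{\max} > 0\)):

```python
import numpy as np
from scipy.ndimage import uniform_filter, sobel

def homogeneity(channel, d=5):
    """Local homogeneity H_ij = 1 - E_ij * V_ij of Eqs. (14)-(17)
    for one color component (a 2-D array), using a d x d window."""
    mu = uniform_filter(channel, size=d)                   # Eq. (15)
    var = uniform_filter(channel ** 2, size=d) - mu ** 2
    v = np.sqrt(np.maximum(var, 0.0))                      # windowed std dev
    V = v / v.max()                                        # Eq. (14)
    e = np.hypot(sobel(channel, axis=0), sobel(channel, axis=1))
    E = e / e.max()                                        # Eq. (16)
    return 1.0 - E * V                                     # Eq. (17)
```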

3.4 Feature group weighting based clustering approaches

During color image segmentation, in order to distinguish the different types of features more effectively, a feature group weighting strategy is introduced into WKHM and WFKHM, yielding the feature group weighted K-harmonic means (GWKHM) and the feature group weighted fuzzy K-harmonic means (GWFKHM). Note that in [6, 8] the feature groups are weighted per cluster, which may not be applicable to our study, so we adopt a global feature group weighting approach here. The local homogeneity, Lab color space and texture features are represented as three groups, \(G(1) = \{HF\}\), \(G(2) = \{F_{Lab}\}\), \(G(3) = \{TF\}\), and a weight \(v_t\;(t = 1, 2, 3)\) is assigned to the corresponding feature group. The feature group weighted Euclidean distance \(d_{ij}^{(g\boldsymbol{w})}\) is calculated as (18).

$$ \begin{aligned} d_{ij}^{(g\boldsymbol{w})} &= \sqrt{v_{1}\sum\limits_{q \in HF} w_{q}^{2}(x_{iq}-c_{jq})^{2} + v_{2}\sum\limits_{q \in F_{Lab}} w_{q}^{2}(x_{iq}-c_{jq})^{2} + v_{3}\sum\limits_{q \in TF} w_{q}^{2}(x_{iq}-c_{jq})^{2}} \\ &= \sqrt{\sum\limits_{t = 1}^{T} v_{t} \sum\limits_{q \in G(t)} w_{q}^{2}(x_{iq} - c_{jq})^{2}} \end{aligned} $$
(18)

where \(\sum \limits _{t=1}^{T} {{v_{t}}}=1, T=3\) and \(\sum \limits _{q \in G(t)} {{w_{q}} = 1},\;{w_{q}} \in [0,1],\;1 \le t \le T\).
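As a quick illustration of (18) (our variable names), the group-weighted distance only adds a per-group scale \(v_t\) on top of the squared feature weights:

```python
import numpy as np

def group_weighted_dist(X, C, w, v, groups, eps=1e-10):
    """d_ij^(gw) of Eq. (18). groups lists the feature indices of each
    group t; w holds within-group weights (each group summing to 1)
    and v the T group weights (summing to 1)."""
    scale = np.empty_like(w)
    for t, idx in enumerate(groups):
        scale[idx] = v[t] * w[idx] ** 2        # v_t * w_q^2 per feature
    diff = X[:, None, :] - C[None, :, :]
    return np.maximum(np.sqrt((scale * diff ** 2).sum(axis=2)), eps)

# Example grouping for the 8-D vector {HF_hue, HF_sat, HF_val, L, a, b, TF_svd, TF_qr}
groups = [np.arange(0, 3), np.arange(3, 6), np.arange(6, 8)]
```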

For the algorithm GWKHM, the Lagrange function \(L_2\), formulated as (19), is similar to that of WKHM, where \(\lambda_t\;(t = 1, 2, 3)\) are the Lagrange multipliers.

$$ {L_{2}} = \sum\limits_{i = 1}^{n} {\frac{k}{{\sum\nolimits_{j = 1}^{k} {\frac{1}{{{{\left[d_{ij}^{(g{\boldsymbol{w}})}\right]}^{p}}}}}} }} - \sum\limits_{t = 1}^{T} {{\lambda_{t}}\left( \sum\limits_{q \in G(t)} {{w_{q}}}-1\right)} $$
(19)

The update equation of feature weights of G(1) is expressed as (20). The detailed theoretical derivations of (20) are shown in Appendix B.

$$ w_{q}^{new}={\left( {\sum\limits_{l \in G(1)} {\frac{{\sum\nolimits_{i=1}^{n} {{D_{iq}}}}}{{\sum\nolimits_{i=1}^{n} {{D_{il}}}}}}} \right)^{-1}} $$
(20)

where \({D_{iq}}=\frac {{\sum \nolimits _{j = 1}^{k} {\left ({{{\left [d_{ij}^{(g{\boldsymbol {w}})}\right ]}^{-p-2}} \cdot {{({x_{iq}}-{c_{jq}})}^{2}}} \right )}} }{{{{\left (\sum \nolimits _{j = 1}^{k} {{{\left [d_{ij}^{(g{\boldsymbol {w}})}\right ]}^{-p}}} \right )}^{2}}}}\)

Similar to the case of \(q \in G(1)\), the update formulas of the feature weights in the cases \(q \in G(2)\) and \(q \in G(3)\) are respectively \(w_{q}^{new}={\left({\sum\limits_{l \in G(2)}{\frac{\sum\nolimits_{i=1}^{n} D_{iq}}{\sum\nolimits_{i=1}^{n} D_{il}}}}\right)^{-1}}\) and \(w_{q}^{new}={\left({\sum\limits_{l \in G(3)}{\frac{\sum\nolimits_{i=1}^{n} D_{iq}}{\sum\nolimits_{i=1}^{n} D_{il}}}}\right)^{-1}}\), where \(D_{iq}\) is the same as that in (20).

For the algorithm GWFKHM, we can derive the update equations by the same principle described above; the computations of the fuzzy membership and the cluster centers take the same forms as (11) and (12), but note that \(d_{ij}^{(\boldsymbol{w})}\) should be replaced by \(d_{ij}^{(g\boldsymbol{w})}\). The update equation of the feature weights is shown as (21) for the cases \(q \in G(1)\), \(q \in G(2)\) and \(q \in G(3)\).

$$ w_{q}^{new} = {\left( {\sum\limits_{l \in G(t)} {\frac{{\sum\nolimits_{i = 1}^{n} {{D_{iq}}}} }{{\sum\nolimits_{i = 1}^{n} {{D_{il}}}} }}} \right)^{- 1}},\;1 \le t \le T $$
(21)

where \({D_{iq}} = \frac{\sum\nolimits_{j = 1}^{k} \left( u_{ij}^{-m} \left[d_{ij}^{(g\boldsymbol{w})}\right]^{-p-2} (x_{iq} - c_{jq})^{2} \right)}{\left( \sum\nolimits_{j = 1}^{k} u_{ij}^{-m} \left[d_{ij}^{(g\boldsymbol{w})}\right]^{-p} \right)^{2}}\)

The overall procedures of GWKHM and GWFKHM are respectively identical to those of WKHM and WFKHM; the major difference lies in the update equations of the feature weights in the different groups. That is, in Step 4 of Algorithm 1, (9) is replaced by (20) with \(q \in G(1)\), \(q \in G(2)\), \(q \in G(3)\), while in Step 5 of Algorithm 2, (13) is replaced by (21).

Moreover, if the distance \(d_{ij}^{(g\boldsymbol{w})}\) is utilized, the update equation of the feature weights of IFWFCM becomes (22), and the resulting method is named GIFWFCM.

$$ w_{q}^{new}={\left( {\sum\limits_{l \in G(t)} {\frac{{\sum\nolimits_{i=1}^{n} {\sum\nolimits_{j = 1}^{k} {u_{ij}^{m}{{({x_{iq}}-{v_{jq}})}^{2}}}}}}{{\sum\nolimits_{i = 1}^{n} {\sum\nolimits_{j=1}^{k} {u_{ij}^{m}{{({x_{il}}-{v_{jl}})}^{2}}}}}}}} \right)^{- 1}},\;1 \le t \le T $$
(22)

3.5 Computational complexity

The time complexity of WKHM and of WFKHM is \(O(2TnKD_1)\), where T is the number of iterations, n is the number of data points, K is the number of clusters, and \(D_1 = 3\) is the number of features in the Lab color space. Likewise, the time complexity of GWKHM and of GWFKHM is \(O(2TnKD_2)\), where \(D_2 = 8\) is the number of extracted features. In addition, during the preprocessing phase, computing the local homogeneity costs \(O(2MND_3)\), where M × N is the size of the image and \(D_3 = 3\) is the number of local homogeneity features, and computing the texture elements costs \(O(4(M/N_s)(N/N_s))\), where \(N_s = 8\) as described in Section 3.3.

4 Experiment and analysis

4.1 The experiment on benchmark datasets

In this section, the performance of the two proposed methods, WKHM and WFKHM, is evaluated on several benchmark datasets and compared with K-means, KHM and FKHM. The five algorithms are applied to eight common real-life datasets from the UCI Machine Learning Repository: Iris, Wine, Sonar, Ionosphere, WDBC, Australian, Vehicle and Satellite. The characteristics of the datasets are shown in Table 1, where K is the number of clusters, d is the number of dimensions and n is the number of data points.

Table 1 The characteristics of the experimental datasets

The experiments are conducted on a computer with an Intel i7-4770 CPU at 3.40 GHz and 16 GB RAM, using MATLAB 2014a. To evaluate the performance of all algorithms, two well-known external cluster validity indexes (CVIs) are adopted: accuracy (Acc) and the Rand index (RI). Both take values in the interval [0,1], and larger values indicate better clustering results. The parameters of the five algorithms are set as follows: the total iteration number is Maxiter = 100, and the fuzzy membership index in FKHM and WFKHM is m = 2. Note that if the value of the objective function no longer changes appreciably, the algorithm stops early, with an iteration count smaller than Maxiter. The parameter p is set as 3. Each algorithm is run 20 times independently, and the performances are compared in terms of the means and standard deviations of Acc and RI, shown in Table 2. For each dataset, the best value of each evaluation metric is highlighted in bold type.
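For reference, the Rand index counts the point pairs on which the predicted and true partitions agree; a simple O(n²) sketch (our implementation, not tied to the paper's code) is:

```python
from itertools import combinations

def rand_index(labels_true, labels_pred):
    """Rand index: fraction of point pairs on which the two partitions
    agree (both same-cluster or both different-cluster)."""
    pairs = list(combinations(range(len(labels_true)), 2))
    agree = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agree / len(pairs)
```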

Table 2 The results of Acc and RI of the five algorithms on the eight datasets (mean ± standard deviation)

It can be seen from Table 2 that the two feature weighting approaches show clear superiority over K-means, standard KHM and FKHM on most datasets, which verifies the effectiveness of the adaptive feature weighting strategy. Except for the Wine and Ionosphere datasets, the improvements in Acc on Iris, Sonar, WDBC, Australian, Vehicle and Satellite are 6.72%, 7.92%, 22.70%, 41.86%, 0.22% and 7.88%, respectively. For the proposed algorithms, the variances of Acc and RI are 0 in most cases, which demonstrates their insensitivity to initialization and shows that they can overcome the initialization problem of K-means.

4.2 The experiment on color images

To verify the performance of the proposed feature-weighted methods for color image segmentation, we carried out experiments on the well-known Berkeley Segmentation Dataset. Human segmentation results are provided for evaluating the segmentation performance, but it should be noted that they only allow a rough analysis, since completely true segmentation results are not available. As the quantitative measure of the segmentation results, the misclassification ratio Error is adopted here, defined as (23).

$$ Error = \frac{{\sum\nolimits_{i = 1}^{K} {{e_{i}}}} }{{M \times N}} \times 100\% $$
(23)

where K is the number of color regions and \(e_i\) is the color error of the i-th region.
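Assuming each \(e_i\) is available as the number of misclassified pixels of region i (our reading), (23) is a one-liner:

```python
def error_rate(e, M, N):
    """Misclassification ratio Error of Eq. (23), in percent.
    e: per-region error counts; M, N: image dimensions."""
    return 100.0 * sum(e) / (M * N)
```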

4.2.1 Experiments in the Lab color space

In this section, the segmentation performance of FCM, KHM, FKHM, IFWFCM [20], WKHM and WFKHM in the Lab color space is presented. The parameters of the clustering algorithms are the same as those in Section 4.1. Table 3 shows the Error results of the different methods on the color images. To compare the algorithms visually, a portion of the experiments is shown in Figs. 1, 2 and 3.

Table 3 Error rates of different algorithms for color images with the Lab features
Fig. 1 The segmentation results according to different algorithms on image 299091

Fig. 2 The segmentation results according to different algorithms on image 163062

Fig. 3 The segmentation results according to different algorithms on image 124084

As can be seen from Table 3, the feature weighting clustering algorithms always achieve smaller Error values than the other methods. It should also be noted that the proposed methods are not sensitive to initialization, leading to more stable results than IFWFCM. For example, WKHM outperforms the other methods on images 135069, 238011, 113016, 80099, 24063, 108073 and 134052. However, the segmentation results for images 299091, 163062 and 124084 are not desirable, as observed in Figs. 1–3. This may be because the Lab color space alone is not sufficient to describe these images, so the feature group weighting approaches with more types of features (i.e., Lab color space, local homogeneity and texture) can be more effective; their segmentation performance is evaluated in the next subsection.

4.2.2 Experiments for feature group weighting based algorithms with the extracted features

In this section, the segmentation results of FCM, KHM, FKHM, GIFWFCM, GWKHM and GWFKHM with the distance \(d_{ij}^{(g\boldsymbol{w})}\) described in Section 3.4 are presented, and the performance is again evaluated in terms of the misclassification ratio Error. The parameters of the clustering algorithms are the same as those in Section 4.1. We use an exhaustive grid search to find the optimal feature group weights: each group weight ranges from 0.1 to 0.8 with a step size of 0.1. The experimental results show that the weight of the Lab feature group is larger than those of the other two groups for most images.

The Error results of the different methods with 8 features are shown in Table 4, and a portion of the experiments is shown in Figs. 4–12. For images 3096, 135069, 238011, 299091, 113016, 124084, 80099, 113044, 24063, 108073 and 134052, the proposed algorithm GWKHM or GWFKHM outperforms the others, obtaining the best Error values of 0.6664%, 0.4152%, 11.1580%, 6.3154%, 3.1807%, 3.5699%, 0.3258%, 4.1062%, 13.6612%, 34.2122% and 31.6177%, respectively. The three feature group weighting based methods achieve smaller Error values than the standard ones in most cases. Note that the Error values of the standard clustering algorithms are smaller than those of the proposed feature group weighting based methods for image 163062; their segmentation results, shown in Fig. 9c, d and e, are closer to the human segmentation. But we can observe from Fig. 9f, g and h that the segmentation results of the algorithms using feature group weighting represent the detailed information better. Comparing Tables 3 and 4, the results of the latter are better than those of the former in most cases; hence we can conclude that the extracted 8-dimensional feature vector is superior to the Lab color space alone.

Table 4 Error rates of different algorithms for color images with the extracted features
Fig. 4 The segmentation results according to different algorithms on image 3096

Fig. 5 The segmentation results according to different algorithms on image 135069

Fig. 6 The segmentation results according to different algorithms on image 238011

Fig. 7 The segmentation results according to different algorithms on image 299091

Fig. 8 The segmentation results according to different algorithms on image 113016

Fig. 9 The segmentation results according to different algorithms on image 163062

Fig. 10 The segmentation results according to different algorithms on image 124084

Fig. 11 The segmentation results according to different algorithms on image 113044

Fig. 12 The segmentation results according to different algorithms on image 24063

5 Conclusions

In this paper, we proposed novel KHM-type clustering algorithms with feature weighting for color image segmentation. To compute the dissimilarity between pixels and cluster centers effectively, the Lab color space, local homogeneity and texture of the image are extracted to construct the feature vector. These features are divided into three groups during the segmentation process, and a weight is assigned to each group to measure its importance for clustering. Experimental results on both benchmark datasets and color images show the superiority of the proposed methods, and the feature group weighting based clustering algorithms achieve better image segmentation performance than related works. Incorporating more sophisticated features into the proposed model for more accurate segmentation will be studied in future work.