1 Introduction

Image segmentation, the grouping of similar visual objects, has received extensive attention over the last decades. It plays an important role in computer vision and graphics applications such as object localization and recognition, data compression, tracking, and image retrieval.

Image segmentation techniques can be divided into image-domain-based, physics-based, and feature-space-based techniques [18]. All of these techniques, with different variations, have been used extensively, but each has its own pros and cons. Image-domain-based techniques use both color features and the spatial relationships among pixels to carry out the segmentation task; their main difficulty is the selection of the seed region. Physics-based techniques use the reflection properties of materials to carry out color image segmentation. They perform well when the reflectance of the underlying objects is known; hence, these techniques are application specific. Feature-space-based techniques use only color features and ignore spatial relationships; therefore, the segmented regions are usually fragmented [17]. Contour-based approaches start from an initial boundary shape in the form of spline curves, which are iteratively evolved to minimize some energy function [6, 10, 22, 30]. Their difficulty is the manual initialization of the spline curves to the boundaries of an object of interest.

The Markov Random Field (MRF) is another statistical model that uses spatial connectivity among neighboring pixels [2]. Many approaches are based on Markov Random Fields, such as Graph Cut [12–14, 25]. These algorithms solve the two-class problem, i.e., separating the background and foreground objects, and need human interaction to manually specify the seed regions. Similarly, background and foreground segmentation is treated with watershed segmentation [3, 29]. The watershed approach suffers from over-segmentation, which can be handled by morphological operations; moreover, the segmentation accuracy is highly sensitive to the morphological structuring element. Recently, [16] proposed a superpixel segmentation algorithm that uses the entropy rate of a random walk on graphs. This algorithm performs best in the case of superpixel segmentation, but for general segmentation it produces over-segmented results.

Image segmentation can be treated as a clustering problem: the features of each pixel correspond to a pattern, and a combination of pixels (i.e., a segment) corresponds to a cluster. Keeping this analogy in mind, many clustering algorithms have been used for image segmentation [1, 5, 7]. Among the clustering algorithms adopted for color image segmentation, the most famous is k-means [7]. Fuzzy c-means (FCM) [1] has also been used extensively for image segmentation due to its simplicity and ease of implementation. However, its application faces two problems: (1) how many clusters should be made, and (2) how to select initial centroids that are properly distributed in the image. The number of cluster centers and their initialization have a considerable impact on the final results. Tan and Isa [27] used histogram thresholding to initialize the cluster centers to the dominating colors and used compactness as an objective function to optimize the cluster centers. This approach performs poorly for complex color images because cluster compactness alone is not sufficient to validate segmentation accuracy. Level sets and spatial fuzzy clustering have been combined for medical image segmentation [15]; this algorithm performs well for two-class problems but is unable to segment general color images of complex scenes. Another color image segmentation technique, based on merging regions of maximal similarity, is proposed in [23], where the Bhattacharyya distance between the histograms of the regions is used as a cluster validity index. The main weakness of this approach is the human intervention needed to mark the seed regions.

In this paper, we propose a novel differential evolution algorithm based on spatial fuzzy c-means (sFCM) [5] to make compact and well-separated clusters. A combination of fuzzy separation and fuzzy overlap is used as a cluster validity index to determine the optimal number of clusters automatically. Initialization greatly affects convergence and diversity: purely random initial solutions are more likely to visit, or even revisit, unproductive regions of the search space [24], whereas opposite numbers decrease the chances of visiting unproductive regions again and again. For this purpose, we take the best individuals from the union of purely random solutions and their opposites as the initial population. An external archive is maintained to store dropped solutions in order to give poor solutions a chance in the evolution process; a poor solution is not always poor and may give the best results after a slight perturbation. To represent each cluster center, we use the HSV color space, relying on the hue, saturation, and value properties of the color. This color space allows us to specify colors in a way that is close to the human experience of color.

The rest of the paper is organized as follows. The basic concept of the classical differential evolution algorithm is presented in Sect. 2. Section 3 explains the proposed algorithm, modified adaptive differential evolution (MoADE), in detail. Experimental results are described in Sect. 4, followed by the conclusion in Sect. 5.

2 Classical differential evolution algorithm

Differential evolution (DE) is a relatively recent population-based evolutionary heuristic used to optimize problems over continuous domains. In DE, a decision variable or solution is represented by a vector of real numbers. An individual vector, denoted by a boldface letter \(\varvec{{x}}\), can be represented in the following way:

$$\begin{aligned} \varvec{{x}}=[x^{(1)}, x^{(2)}, x^{(3)}, \ldots , x^{(D)}] \end{aligned}$$
(1)

where the superscripts \((1,2,3,\ldots ,D)\) index the dimensions of the problem. In classical differential evolution, the initial population \(\{\varvec{{x}}_{i,0}=(x_{i,0}^{(1)}, x_{i,0}^{(2)}, x_{i,0}^{(3)}, \ldots , x_{i,0}^{(D)}) \mid i=1,2,3,\ldots ,N_{p}\}\) is generated randomly according to a uniform distribution \(x_{low}^{j}\le x_{i,0}^{j}\le x_{up}^{j}\) for \(j=1,2,\ldots ,D\), where \(D\) is the dimension of the problem and \(N_{p}\) is the population size. After initialization, DE enters a loop of evolutionary operations: mutation, crossover, and selection.

Mutation: To mutate a particular target vector \(\varvec{{x}}\) at iteration \(t\), two vectors are randomly selected from the current population and their scaled difference is added to the target. In other words, the \(j\)th component of the mutant (donor) vector \(\varvec{{\vartheta }}\) at iteration \(t\) is generated as follows:

$$\begin{aligned} \vartheta _{t}^{j}=x_{t}^{j}+F\cdot (x_{r1,t}^{j}-x_{r2,t}^{j}) \end{aligned}$$
(2)

where the indices \(r1\) and \(r2\) are distinct integers uniformly chosen from the set \( \{1,2,\ldots ,N_{p}\}\), and \(F\) is the mutation factor, which lies in the interval \((0,1+)\).

Crossover: The crossover operation is performed to increase the diversity of the perturbed vector. A binomial crossover operation forms the final trial/offspring vector \(\varvec{{\upsilon }}_{t}=(\upsilon _{t}^{1}, \upsilon _{t}^{2},\ldots , \upsilon _{t}^{D})\):

$$\begin{aligned} \upsilon _{t}^{j}= \left\{ \begin{array}{ll} \vartheta _{t}^{j} \quad \mathrm{if} \quad rand (0,1)\le CR \quad or \quad j = j_{rand}, \\ x_{t}^{j}, \quad \mathrm{otherwise} \end{array} \right. \end{aligned}$$
(3)

where \(rand(0,1)\) is a uniform random number in the interval \([0,1]\), independently generated for each \(j\) and each \(i\), and \(j_{rand} = randint(1, D)\) is an integer randomly chosen from \(1\) to \(D\). The crossover probability \(CR\in [0, 1]\) roughly corresponds to the average fraction of vector components that are inherited from the mutation vector.

Selection: This step decides whether a particular trial individual \(\varvec{{\upsilon }}\) or the parent \(\varvec{{x}}\) at generation \(t\) should survive to the next generation. Selection is performed on the basis of the fitness values \(f(\cdot )\) of the trial and parent vectors. For example, for a minimization problem, the selection is given by:

$$\begin{aligned} \varvec{{x}}_{t+1}= \left\{ \begin{array}{ll} \varvec{{{\upsilon }}}_{t} \,\,\,\, \mathrm{if} \,\, f(\varvec{{{\upsilon }}}_{t})< f(\varvec{{{x}}}_{t}) \\ \varvec{{{x}}}_{t}, \,\,\mathrm{otherwise} \end{array} \right. \end{aligned}$$
(4)

The operation in Eq. (4) is called a successful update if the trial vector \(\varvec{{\upsilon }}\) is better than the parent \(\varvec{{x}}\), i.e., if the improvement or evolution progress \(\Delta = f(\varvec{{x}}) - f(\varvec{{\upsilon }})\) is positive. The evolutionary process continues for a fixed number of iterations, and the best vector in the last generation is taken as the solution.
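To make the loop concrete, the following sketch implements the classical DE of Eqs. (1)–(4) for a generic minimization problem. It is a minimal illustration, not the paper's implementation: the function and parameter names are ours, and the defaults \(N_p=50\), \(F=0.5\), \(CR=0.9\) are common settings rather than values taken from the text.

```python
import numpy as np

def classic_de(f, bounds, n_pop=50, F=0.5, CR=0.9, iters=200, seed=0):
    """Minimize f over a box with classical DE (Eqs. 1-4).

    bounds: (D, 2) array of [low, up] limits per dimension.
    """
    rng = np.random.default_rng(seed)
    low, up = bounds[:, 0], bounds[:, 1]
    D = len(bounds)
    pop = low + rng.random((n_pop, D)) * (up - low)   # uniform initialization
    fit = np.array([f(x) for x in pop])

    for _ in range(iters):
        for i in range(n_pop):
            r1, r2 = rng.choice([k for k in range(n_pop) if k != i],
                                size=2, replace=False)
            donor = pop[i] + F * (pop[r1] - pop[r2])  # mutation, Eq. (2)
            jrand = rng.integers(D)
            mask = rng.random(D) <= CR
            mask[jrand] = True                        # keep at least one donor gene
            trial = np.clip(np.where(mask, donor, pop[i]), low, up)  # Eq. (3)
            ft = f(trial)
            if ft < fit[i]:                           # greedy selection, Eq. (4)
                pop[i], fit[i] = trial, ft
    return pop[fit.argmin()], fit.min()
```

For example, `classic_de(lambda x: np.sum(x**2), np.array([[-5.0, 5.0]] * 10))` minimizes a 10-dimensional sphere function.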

3 Modified adaptive differential evolution

In classical differential evolution, the crossover and mutation parameters are fixed for all trial vectors at all generations. In adaptive DE algorithms, each individual \(i\) is associated with its own crossover probability \(CR_{i}\) and mutation factor \(F_{i}\). In this paper, an adaptive differential evolution (ADE [32]) is applied to the domain of color image segmentation by incorporating a new objective function and opposition-based learning. The new algorithm is called MoADE. The pseudocode in Algorithm 1 explains MoADE in detail.

Algorithm 1 Pseudocode of the proposed MoADE algorithm

The DE algorithm needs to be initialized before the start of the iterative procedure. In practice, initialization is done either by the user, using some sort of domain knowledge, or by a random procedure. Pure random initialization has proved to be inappropriate, since it can force clustering algorithms to converge to local optima: there is a higher probability of visiting, or even revisiting, unproductive regions of the search space. Opposite numbers have a higher chance of producing good results than purely random ones. Here, the initial population takes the best individuals from the union of purely random solutions and their opposites [24] in order to avoid unproductive samples or solutions. In Algorithm 1, steps \(2{-}8\) show the implementation of opposition-based initialization. The opposite of each component of each individual is calculated as:

$$\begin{aligned} \bar{x}_{i}^{j}=a^{j}+b^{j}-x_{i}^{j} \quad i=1, 2, \ldots , N_{p}; j=1, 2, \ldots , D \end{aligned}$$
(5)

where \(x_{i}^{j}\) and \(\bar{x}_{i}^{j}\) denote the \(j\)th component/dimension of the \(i\)th vector and its opposite, respectively, and \(a^{j}\) and \(b^{j}\) are the interval bounds of the \(j\)th dimension of the search space. The jumping condition \(randn(0,1)<1/(3+e^{-1/g})\) encourages high exploration in the initial stages and high exploitation in the final stages. If this condition holds for a particular individual \(\varvec{{x}}_{i}\), then its opposite is calculated as:

$$\begin{aligned} \varvec{{\bar{x}}}_{i,g}=\hbox {MIN}^{d}+\hbox {MAX}^{d}-\varvec{{x}}_{i,g} \end{aligned}$$
(6)

where \(\hbox {MIN}^{d}\) and \(\hbox {MAX}^{d}\) are the minimum and maximum values of a particular dimension in the current generation \(g\). The interval \([\hbox {MIN}^{d},\hbox {MAX}^{d}]\) keeps the jump within the already shrunken search space. If other boundary intervals were used instead of \([\hbox {MIN}^{d},\hbox {MAX}^{d}]\), we would jump outside the already shrunken search space and lose the knowledge of the current reduced space (the converged population).
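A minimal sketch of both opposition-based steps follows, assuming NumPy vector conventions; the function names are illustrative. The first routine builds the initial population from the union of random solutions and their opposites (Eq. 5); the second applies generation jumping within the population's current bounding box (Eq. 6).

```python
import numpy as np

def opposition_init(f, low, up, n_pop, rng):
    """Opposition-based initialization (Eq. 5): evaluate N_p random
    vectors and their opposites, keep the N_p fittest of the union."""
    D = len(low)
    pop = low + rng.random((n_pop, D)) * (up - low)
    opp = low + up - pop                        # component-wise opposite, Eq. (5)
    union = np.vstack([pop, opp])
    fit = np.array([f(x) for x in union])
    keep = np.argsort(fit)[:n_pop]
    return union[keep], fit[keep]

def generation_jump(pop, g, rng):
    """Generation jumping: with probability 1/(3 + e^(-1/g)), reflect an
    individual inside [MIN^d, MAX^d] of the current generation (Eq. 6),
    so the jump stays within the already shrunken search space."""
    lo, hi = pop.min(axis=0), pop.max(axis=0)   # MIN^d and MAX^d
    out = pop.copy()
    for i in range(len(pop)):
        if rng.random() < 1.0 / (3.0 + np.exp(-1.0 / g)):  # g >= 1
            out[i] = lo + hi - pop[i]                      # Eq. (6)
    return out
```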

Sometimes inferior solutions, with little change, provide additional information about the progress direction compared to the current population [32]. For this purpose, the mutation strategy DE/current-to-pbest/1 with archive [32] is adopted. This mutation strategy uses a set of archived inferior solutions, denoted by A, and the current population, denoted by P, to generate a mutation vector as given by Eq. (7).

$$\begin{aligned} \varvec{{\vartheta }}_{i,g}= \varvec{{x}}_{i,g}+F_{i}(\varvec{{x}}_{best,g} - \varvec{{x}}_{i,g})+F_{i}(\varvec{{x}}_{r1,g}- \varvec{{x}}_{r2,g}) \end{aligned}$$
(7)

where \(\varvec{{x}}_{best,g}\) is randomly chosen as one of the top \(p\,\% \) individuals in the current population and \(F_{i}\) is the mutation factor associated with \(\varvec{{x}}_{i}\), re-generated at each generation by the adaptation process introduced later in Eq. (12). The vectors \(\varvec{{x}}_{i,g}\) and \(\varvec{{x}}_{r1,g}\) are selected from P, while \(\varvec{{x}}_{r2,g}\) is randomly selected from \(\mathbf P \cup \mathbf A \).

Initially, the archive is empty. After each generation, the parent solutions that fail in the selection process are added to it. The archive has a fixed size to avoid significant computational overhead: if the archive size exceeds a certain threshold, say \(N_{p}\), then some solutions are randomly removed to keep the archive size at \(N_{p}\).

The archive provides information about the progress direction and improves the population diversity. A larger \(F\) increases the population diversity, while a smaller \(F\) speeds up convergence but tends toward local minima. The proposed algorithm, with its archive and opposition-based learning, is able to search the space effectively in order to escape local optima.
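The following sketch shows the archive-assisted mutation of Eq. (7) and the size-capped archive; the helper names and the default \(p\) are illustrative assumptions, and the distinctness checks between \(i\), \(r1\), and \(r2\) are simplified for brevity.

```python
import numpy as np

def mutate_current_to_pbest(pop, fit, archive, i, F_i, rng, p=0.05):
    """DE/current-to-pbest/1 with archive (Eq. 7).

    pop: current population P (N_p x D); archive: list A of dropped parents.
    """
    n = len(pop)
    top = max(1, int(round(p * n)))
    x_best = pop[rng.choice(np.argsort(fit)[:top])]   # one of the top p%
    r1 = rng.choice([k for k in range(n) if k != i])  # r1 drawn from P
    union = np.vstack([pop] + ([np.asarray(archive)] if archive else []))
    r2 = rng.integers(len(union))                     # r2 drawn from P u A
    return pop[i] + F_i * (x_best - pop[i]) + F_i * (pop[r1] - union[r2])

def trim_archive(archive, n_pop, rng):
    """Randomly drop entries so the archive never exceeds N_p."""
    while len(archive) > n_pop:
        archive.pop(rng.integers(len(archive)))
    return archive
```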

3.1 Parameters adaptation

Similar to [32], the crossover probability \(CR_{i}\) of each individual \(\varvec{{{x}}}_i\) is independently generated according to a normal distribution of mean \(\mu _{CR}\) and standard deviation 0.1.

$$\begin{aligned} CR_{i} = randn_{i}(\mu _{CR}, 0.1) \end{aligned}$$
(8)

The set of all successful crossover probabilities \(CR_{i}\) at generation \(g\) is denoted by \(S_{CR}\). The mean \(\mu _{CR}\) is initialized to 0.5 and then updated at the end of each generation as:

$$\begin{aligned} \mu _{CR} = (1 - c)\cdot \mu _{CR} + c\cdot mean_{W}(S_{CR}) \end{aligned}$$
(9)

where \(c\) is a positive constant between 0 and 1 and \(mean_{W}(\cdot )\) is the weighted average of all successful crossover probabilities at generation \(g\). The \(mean_{W}(\cdot )\) can be calculated as:

$$\begin{aligned} mean_{W}(S_{CR})=\sum _{s_{j} \in S_{CR}}w(\mu ,s_{j})\, s_{j} \end{aligned}$$
(10)

The weights \(w(\mu ,s_{j})\) are then computed as follows:

$$\begin{aligned} w(\mu ,s_{j})=\frac{1}{Z_{j}}e^{-\Vert \mu -s_{j}\Vert _{2}^{2}} \end{aligned}$$
(11)

where \(Z_{j}\) is a normalization constant ensuring that \(\sum _{s_{j} \in S_{CR}}w(\mu ,s_{j})=1\). Compared to the arithmetic mean, the weighted average gives more weight to frequently used successful crossover probabilities and thus maintains an optimal crossover range.

Similar to [32], at each generation \(g\), the mutation factor \(F_{i}\) of each individual \(\varvec{{{x}}}_{i}\) is independently generated according to a Cauchy distribution with location parameter \(\mu _{F}\) and scale parameter 0.1.

$$\begin{aligned} F_{i} = randc_{i}(\mu _{F}, 0.1) \end{aligned}$$
(12)

When the mutation factors are highly concentrated around a certain value, greedy mutation strategies (such as DE/best and DE/current-to-pbest) tend toward premature convergence. The Cauchy distribution helps to diversify the mutation factors and thus avoids premature convergence more effectively than a normal distribution. At a particular generation \(g\), the set of all successful mutation factors is represented by \(S_{F}\). The location parameter \(\mu _{F}\) of the Cauchy distribution is initialized to 0.5 and then updated at the end of each generation as:

$$\begin{aligned} \mu _{F} = (1 - c)\cdot \mu _{F} + c\cdot mean_{L}(S_{F}) \end{aligned}$$
(13)

where \(mean_{L}(\cdot )\) is the Lehmer mean

$$\begin{aligned} mean_{L}(S_{F})=\frac{\sum _{F\in S_{F}}F^{2}}{\sum _{F\in S_{F}}F} \end{aligned}$$
(14)

The Lehmer mean gives more weight to larger successful mutation factors and is therefore helpful in propagating larger mutation factors, which improves the progress rate.
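A compact sketch of the adaptation of Eqs. (8)–(14) is given below. Truncating \(CR_i\) to [0, 1] and resampling non-positive \(F_i\) follow the usual JADE convention and are assumptions here, as is writing the weight of Eq. (11) for scalar arguments.

```python
import numpy as np

def sample_cr(mu_cr, rng):
    """CR_i ~ N(mu_CR, 0.1), truncated to [0, 1] (Eq. 8)."""
    return float(np.clip(rng.normal(mu_cr, 0.1), 0.0, 1.0))

def sample_f(mu_f, rng):
    """F_i ~ Cauchy(mu_F, 0.1) via inverse CDF (Eq. 12);
    resample while F <= 0 and clip at 1."""
    f = 0.0
    while f <= 0.0:
        f = mu_f + 0.1 * np.tan(np.pi * (rng.random() - 0.5))
    return min(f, 1.0)

def update_mu_cr(mu_cr, s_cr, c=0.1):
    """Eq. (9) with the weighted mean of Eqs. (10)-(11); the weights
    favour successful CR values near the current mean and sum to one."""
    s = np.asarray(s_cr, dtype=float)
    if s.size == 0:
        return mu_cr
    w = np.exp(-(mu_cr - s) ** 2)
    w /= w.sum()
    return (1.0 - c) * mu_cr + c * float(w @ s)

def update_mu_f(mu_f, s_f, c=0.1):
    """Eq. (13) with the Lehmer mean of Eq. (14), which is biased
    toward larger successful mutation factors."""
    s = np.asarray(s_f, dtype=float)
    if s.size == 0:
        return mu_f
    return (1.0 - c) * mu_f + c * float((s ** 2).sum() / s.sum())
```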

3.2 Objective function formulation

The objective function, or cluster validity index (CVI), plays an important role in clustering and segmentation. It checks whether the provided clustering \((U;V )\) of \(X\) reflects the original structure of the data. Clustering algorithms need \(c\), the number of clusters, to be known in advance; due to the unsupervised nature of clustering, the user does not have prior knowledge of \(c\). A reliable CVI should take into account both compactness and separation for the \(c\)-partition \(U\). If only the former is considered, the best partition consists of as many clusters as there are points; if only the latter is considered, the best partition consists of a single cluster. For accurate and reliable segmentation, it is important to combine both measures in the CVI. However, separation measures generally rely on distances between centroids of different clusters, which are insufficient to represent the geometrical structure of the data.

In this paper, we use aggregation functions for cluster validity that map a collection of values \(\varvec{{u}}\) in [0,1] to a value in [0,1], formally a vector \(\varvec{{u}}=(u_{1};\ldots ;u_{c})^{t}\longmapsto A(\varvec{{u}})\). Triangular norms (t-norms) and conorms (t-conorms) are frequently used aggregation operators; they implement conjunctive and disjunctive operators in the field of fuzzy logic and fuzzy set theory. A t-norm \(\top \) and a t-conorm \(\bot \) are commutative, associative, and monotone functions having 1 and 0, respectively, as neutral elements [11].

If \({\varvec{{v}}_{1}}\), \({\varvec{{v}}_{2}}\),..., \({\varvec{{v}}_{c}}\) denote the cluster centers, then the memberships \({{u}_{ij}}\), \(i = 1, 2,\ldots , c\) and \(j = 1, 2,\ldots , n\), can be computed as:

$$\begin{aligned} u_{ij}=\frac{1}{\sum _{k=1}^{c}\left( \frac{D(\varvec{{v}}_{i},\varvec{{x}}_{j})}{D(\varvec{{v}}_{k},\varvec{{x}}_{j})}\right) ^{\frac{1}{m-1}}}, \quad \mathrm{for}\ 1\le i\le c,\ 1 \le j\le n \end{aligned}$$
(15)

where \(D({\varvec{{v}}_{i}},{\varvec{{x}}_{j}})\) is the Euclidean distance between point \({\varvec{{x}}_{j}}\) and cluster center \({\varvec{{v}}_{i}}\), and \(m\) is the weighting coefficient. While computing \({{u}_{ij}}\) using Eq. (15), if \(D({\varvec{{v}}_{k}},{\varvec{{x}}_{j}})\) is equal to zero for some \(k\), then \({{u}_{ij}}\) is set to zero for all \(i=1,2,\ldots ,c,\,i\ne k\), and \({{u}_{kj}}\) is set equal to one.

In images, neighboring pixels are highly correlated [5]: they possess similar features, and there is a high probability that they belong to the same cluster. A spatial function, defined in Eq. (16), exploits this spatial information [5].

$$\begin{aligned} {{h}_{ij}}=\sum _{k\in NB({\varvec{{{x}}}_{j}})}{{{u}_{ik}}} \end{aligned}$$
(16)

where \(NB({\varvec{{x}}_{j}})\) represents a square window centered on pixel \({\varvec{{x}}_{j}}\) in the spatial domain. The spatial function \({{h}_{ij}}\) represents the probability that pixel \({\varvec{{x}}_{j}}\) belongs to the \(i\)th cluster. The spatial information can be incorporated into the membership function as:

$$\begin{aligned} {u}^{\prime }_{ij}=\frac{{{u}^{p}_{ij}}{{h}^{q}_{ij}}}{\sum _{k=1}^{c}{{{u}^{p}_{kj}}{{h}^{q}_{kj}}}} \end{aligned}$$
(17)

where \(p\) and \(q\) are parameters to control the relative importance of both functions. The centers can be updated using (18).

$$\begin{aligned} {\varvec{{{v}}}_{i}}=\frac{\sum _{j=1}^{n}{{{({{u}^{\prime }_{ij}})}^{m}}{\varvec{{{x}}}_{j}}}}{\sum _{j=1}^{n}{{{({{u}^{\prime }_{ij}})}^{m}}}},\,1\le i\le c \end{aligned}$$
(18)
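For concreteness, the sketch below performs one sFCM membership and center update (Eqs. 15–18) on an image array. It assumes SciPy's `uniform_filter` for the window aggregation of Eq. (16); the filter returns the window mean rather than the sum, but the constant factor cancels in the normalization of Eq. (17). The window size and the exponents \(p, q\) are illustrative choices.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def sfcm_step(img, centers, m=2.0, p=1, q=1, win=5):
    """One sFCM update (Eqs. 15-18) on an H x W x 3 feature image."""
    h, w, _ = img.shape
    x = img.reshape(-1, 3)                                  # n pixels
    d = np.linalg.norm(x[None, :, :] - centers[:, None, :], axis=2)  # c x n
    d = np.fmax(d, 1e-12)          # approximates the crisp zero-distance rule
    u = d ** (-1.0 / (m - 1.0))
    u /= u.sum(axis=0, keepdims=True)                       # Eq. (15)
    # spatial function h_ij: memberships aggregated over a square window
    hs = np.stack([uniform_filter(ui.reshape(h, w), size=win).ravel()
                   for ui in u])                            # Eq. (16)
    u2 = (u ** p) * (hs ** q)
    u2 /= u2.sum(axis=0, keepdims=True)                     # Eq. (17)
    um = u2 ** m
    centers = (um @ x) / um.sum(axis=1, keepdims=True)      # Eq. (18)
    return u2, centers
```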

Equations (15)–(18) give the fuzzy clustering (\(U;V\)) of the data \(X\), where \(u_{ij}\) defines the similarity of a data point \(\varvec{{x}}_{j}\) to a center \(\varvec{{v}}_{i}\). Using the membership vector \(\varvec{{u}}^{\prime }_{j} =(u^{\prime }_{1j};\ldots ;u^{\prime }_{cj})^{t}\) (the \(j\)th column of \(U\)), the standard t-conorm (max operator) is commonly used to assign a particular data point to the best-matching group. However, the data point \(\varvec{{x}}_{j}\) may satisfy more than one group description (i.e., lower values interact with the greatest), so exclusive partitioning (only the t-conorm) is not sufficient. The aggregate value of \(\varvec{{u}}^{\prime }_{j}\) can assess how much a particular point \(\varvec{{x}}_{j}\) belongs to several other clusters. We use the fuzzy OR operator (fOR-\(\ell \) for short) defined in [20] to evaluate the degrees of similarity at a given order \(\ell \). Let \(\rho \) be the power set of \(C = \{1,2,\ldots ,c\}\) and \(\rho _{\ell } =\{A \in \rho \mid |A|=\ell \}\), where \(|A|\) denotes the cardinality of the subset \(A\); then the fOR-\(\ell \) associates a single value \(\overset{\ell }{\bot }(\varvec{{u}}^{\prime }_{j})\in [0, 1]\) with \(\varvec{{u}}^{\prime }_{j}\), as given by the following equation:

$$\begin{aligned} \overset{\ell }{\bot }(\varvec{{u}}^{\prime }_{j})=\underset{i=1,2,\ldots ,c}{\overset{\ell }{\bot }} u^{\prime }_{ij}=\underset{A\in \rho _{\ell -1}}{\top }\left( \underset{i\in C\backslash A }{\bot } u^{\prime }_{ij}\right) \end{aligned}$$
(19)

It can be viewed as a generalization of the \(\ell \)th highest value, \(\ell \in C\): with the standard t-norms, \(\overset{\ell }{\bot }(\varvec{{u}}^{\prime }_{j})\) is the \(\ell \)th highest element of \(\varvec{{u}}^{\prime }_{j}\) [20].

In this paper, we use two measures, fuzzy overlap and fuzzy separation [4], to relate a point \(\varvec{{x}}_{j}\) to the clusters and overcome the drawbacks of compactness and separation measures. The fuzzy overlap measure evaluates the degree of overlap of a specified number \(\ell \) of fuzzy clusters, and the fuzzy separation measure indicates the degree of overlap of the most probable cluster, i.e., the one corresponding to the highest membership degree, with respect to the \(c-1\) other clusters; a low value of the latter measure denotes a large separation of the most probable cluster from the others. For each point \(\varvec{{x}}_{j}\) of \(X\), described by its membership vector \(\varvec{{u}}^{\prime }_{j}\), the overlap measure between \(\ell \) fuzzy clusters is obtained by Eq. (20). A combination of \(\ell \)-order overlap degrees for \(\varvec{{x}}_{j}\) is obtained by successively computing \(\overset{\ell }{\bot }(\varvec{{u}}^{\prime }_{j})\) for different values of \(\ell \). To determine the overall degree of overlap for a given point \(\varvec{{x}}_{j}\), the order that induces the highest overlap is found. The most satisfied order(s) can be quantified by the fuzzy disjunction of the \(\ell \)-order overlap measures \((\ell =2,\ldots ,c)\), so the overall overlap measure of a point \(\varvec{{x}}_{j}\) is given by:

$$\begin{aligned} O_{\bot }(\varvec{{u}}^{\prime }_{j};c)=\underset{{\ell }=2,c}{\overset{1}{\bot }}\left( \underset{i=1,c}{\overset{\ell }{\bot }}u^{\prime }_{ij}\right) \end{aligned}$$
(20)

Inter-cluster separation plays an important role in cluster validity. For this purpose, the fuzzy separation [4] of each point \(\varvec{{x}}_{j}\), with membership vector \(\varvec{{u}}^{\prime }_{j}\), is computed. It is the overlap measure within one fuzzy cluster, i.e., its separation from the other fuzzy clusters, since \(\varvec{{u}}^{\prime }_{j}\) sums to one. This aggregation corresponds to the fuzzy disjunction of membership degrees for a given point \(\varvec{{x}}_{j}\), which selects the most probable cluster. The fuzzy separation of \(\varvec{{x}}_{j}\) with respect to the \(c\) clusters is given as:

$$\begin{aligned} S_{\bot }(\varvec{{u}}^{\prime }_{j};c)=\overset{1}{\bot }\left( \underbrace{\underset{i=1,c}{\overset{1}{\bot }}u^{\prime }_{ij},\ldots \underset{i=1,c}{\overset{1}{\bot }}u^{\prime }_{ij}}_{c-1\ times}\right) \end{aligned}$$
(21)

The highest membership degree of a point \(\varvec{{x}}_{j}\) corresponds to a small overlap degree \(O_{\bot }(\varvec{{u}}^{\prime }_{j}; c)\) and a large separation \(S_{\bot }(\varvec{{u}}^{\prime }_{j}; c)\): a point \(\varvec{{x}}_{j}\) with a small \(O_{\bot }(\varvec{{u}}^{\prime }_{j}; c)\) and a large \(S_{\bot }(\varvec{{u}}^{\prime }_{j}; c)\) belongs to a well-separated, non-overlapping cluster. The \(\ell \)-order overlap and separation index (OSI), taking values in [0, 1], is then defined as the average value of the ratio of both measures:

$$\begin{aligned} OSI_{\bot }(c)=\frac{1}{n}\sum _{j=1}^{n}\frac{O_{\bot }(\varvec{{u}}^{\prime }_{j};c)}{S_{\bot }(\varvec{{u}}^{\prime }_{j};c)} \end{aligned}$$
(22)

MoADE is run for \(c\in [c_\mathrm{min}, c_\mathrm{max}]\), and the value of \(c\) that minimizes Eq. (22) is selected. Here, \(c_\mathrm{min}\) and \(c_\mathrm{max}\) are set to 2 and 20, respectively. Small regions of fewer than 100 pixels are merged into their neighboring segments on the basis of similarity to avoid over-segmentation.
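Under the standard max/min pair, for which \(\overset{\ell }{\bot }\) reduces to the \(\ell \)th highest membership (as noted above), the overlap of Eq. (20) becomes the second-highest membership of a point and the separation of Eq. (21) its highest. The sketch below computes the OSI of Eq. (22) under this assumption only; the other norm pairs studied in [4] would yield different aggregations.

```python
import numpy as np

def osi(u, eps=1e-12):
    """Overlap-separation index (Eq. 22) under standard max/min operators.

    u: c x n matrix of spatially weighted memberships u'.
    """
    s = np.sort(u, axis=0)[::-1]        # memberships, descending per point
    overlap = s[1]                      # O(u'_j; c), Eq. (20) under max/min
    separation = np.fmax(s[0], eps)     # S(u'_j; c), Eq. (21) under max/min
    return float(np.mean(overlap / separation))

# model selection: run MoADE for each c in [2, 20], compute the OSI of the
# resulting memberships, and keep the c with the smallest index value
```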

MoADE explores the underlying search space efficiently using opposition-based learning. Besides diversifying the search process, MoADE also speeds up convergence considerably. Figure 1 shows the results of ADE [32] and the proposed MoADE after 100 iterations of optimizing the new objective function; the first row shows the results of ADE, and the second row the results of MoADE. It is clear from the figure that ADE does not detect the body of the boy, while MoADE performs better in this case. Similarly, for the pyramid image (second column of Fig. 1), ADE fails to detect the black gate-shaped and white building-shaped objects, whereas the proposed MoADE detects the underlying objects accurately. Note that ADE uses a random initialization procedure and is unable to uniformly initialize the search space to the productive regions; once the search space is initialized to unproductive regions, the algorithm converges to local optima. On the other hand, the proposed MoADE uses the opposition-based learning strategy to initialize and evolve the population: the best individuals from the union of the current population and its opposite are selected for the new generation in order to reach the global best solution.

Fig. 1

Results of ADE [32] and the proposed MoADE after 100 iterations. The first row contains the results of ADE, while the second row contains the results of MoADE

4 Experimental results

Experiments have been carried out on images from the Berkeley segmentation database (BSD) [19], which contains 300 natural images of size (\(321\times 481\)) or (\(481\times 321\)). The results of the proposed algorithm are compared with three state-of-the-art image segmentation algorithms: segmentation via lossy data compression (LDC) [31], efficient graph-based image segmentation (EG) [8], and normalized cuts (NC) [26].

The region number in NC is set to 20, which is the average number of segments marked by the human subjects in each image. For the EG algorithm, the Gaussian smoothing parameter \(\sigma =0.8\), the threshold value \(k=300\), and a minimal region size of 50 pixels are set, as described in [8]. The source codes of all algorithms were downloaded from their respective web sites.

4.1 Qualitative comparisons

In this section, the segmentation results of the MoADE, NC, EG, and LDC approaches are evaluated visually. Eight images, shrub (\(481\times 321\)), church1 (\(481\times 321\)), eagle1 (\(481\times 321\)), eagle2 (\(481\times 321\)), texture (\(321\times 481\)), building (\(481\times 321\)), lion (\(481\times 321\)), and church2 (\(481\times 321\)), are selected from the Berkeley segmentation database for qualitative comparison. Figure 2 shows the results of the proposed MoADE, NC, EG, and LDC. For the shrub image, the result of MoADE closely matches human perception, i.e., the shrub and non-shrub areas are well separated, whereas NC divides the uniform background into unnecessary segments; the performance of EG and LDC is poor for this image because they over-segment it. In the case of the eagle1 image, the proposed algorithm outperforms the other state-of-the-art algorithms. LDC performs well for eagle1 compared to NC, which divides the uniform background into multiple segments, and EG, which fails to detect the actual boundaries of the object. Note that the proposed MoADE also outperforms the other techniques on the church1 image. For the eagle2 image, the proposed technique performs better by accurately separating the tree and the eagle from the background region, while the performance of the other three methods is not good enough. In accordance with human perception, the proposed MoADE correctly segments the texture image by separating the two different textures. For the same image, LDC performs well compared to NC, which divides the same texture into different segments, and EG, which over-segments and fails to detect the texture boundaries correctly. It is clear from the figure that the proposed algorithm also outperforms the other state-of-the-art methods on the building, lion, and church2 images.

Fig. 2

Segmentation results on some of the images from the Berkeley segmentation dataset and benchmark. The first, second, third, and fourth columns contain the MoADE, NC, EG, and LDC segmented images, respectively

NC tends to partition an image into regions of similar sizes, resulting in region boundaries that differ from the real edges. EG gives strongly over-segmented results. LDC usually obtains better visual results than NC and EG, but it often fails to find real edges and creates strongly over-segmented regions. It is easy to judge from the results that the proposed method performs better than the other state-of-the-art algorithms by generating boundaries that match the real edges of the objects.

4.2 Quantitative comparisons

Quantitative comparisons are also important for evaluating the performance of an algorithm objectively. Region differencing [13] measures the extent to which one segmentation can be viewed as a refinement of another. Boundary matching [9] compares the results of an algorithm with those of human subjects by finding the average displacement error of the boundary pixels. However, region differencing and boundary matching are not sufficient to evaluate segmentation results objectively [28]: if each pixel is considered a separate segment, both measures attain their best scores. Thus, over-segmented results, which are far from the human perception of image understanding, may be ranked as good.

In this paper, we have used the probabilistic Rand index (PRI) [28] and the variation of information (VoI) [21] to evaluate the segmentation results objectively. Consider a set of ground truths \({S}_{1}, {S}_{2}, \ldots , {S}_{K}\), labeled by \(K\) persons, for an image consisting of \(N\) pixels, and let \(S_{test}\) be the segmentation result to be compared with the ground truths. The PRI value is then defined as:

$$\begin{aligned} \hbox {PRI}(S_{test},\{S_{k}\})=\frac{1}{{N\atopwithdelims ()2}}\sum _{p<q}[\bar{p}_{pq}^{c_{pq}}(1-\bar{p}_{pq})^{1-c_{pq}}] \end{aligned}$$
(23)

where \((p,q)\) is a pixel pair in the image, \(c_{pq}=T(l_{p}^{S_{test}}=l_{q}^{S_{test}})\) denotes the event of the pixel pair \(p\) and \(q\) having the same label in the test result \(S_{test}\), and \(\bar{p}_{pq}=\frac{1}{K}\sum _{k=1}^{K}T(l_{p}^{S_{k}}=l_{q}^{S_{k}})\) is regarded as the probability of \(p\) and \(q\) having the same label. The value of PRI lies in the range [0,1]; the larger, the better. The variation of information (VoI) is defined as:

$$\begin{aligned} \hbox {VoI}(S_{test},\{S_{k}\})=\frac{1}{K}\sum _{k}[H(S_{test})+H(S_{k})-2I(S_{test},S_{k})] \end{aligned}$$
(24)

where \(H\) and \(I\) denote the entropy and the mutual information, respectively; their detailed definitions can be found in [21]. VoI is an information-based measure that determines how much information one segmentation gives about another. The value of VoI falls in \([0, \infty )\); the smaller, the better.
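For reference, the following sketch computes both scores directly from flattened integer label maps. The quadratic pixel-pair loop in `pri` is written for clarity; for full-size BSD images a contingency-table formulation is used in practice.

```python
import numpy as np

def pri(test, gts):
    """Probabilistic Rand index (Eq. 23); labels are integer arrays."""
    test = np.asarray(test).ravel()
    flat = [np.asarray(g).ravel() for g in gts]
    n, total = test.size, 0.0
    for p in range(n):
        for q in range(p + 1, n):                   # all pixel pairs
            p_bar = np.mean([g[p] == g[q] for g in flat])
            total += p_bar if test[p] == test[q] else 1.0 - p_bar
    return total / (n * (n - 1) / 2.0)

def voi(test, gts):
    """Mean variation of information (Eq. 24); non-negative integer labels."""
    test = np.asarray(test).ravel()
    flat = [np.asarray(g).ravel() for g in gts]

    def entropy(a):
        prob = np.bincount(a) / a.size
        prob = prob[prob > 0]
        return -np.sum(prob * np.log2(prob))

    def mutual_info(a, b):
        joint = np.zeros((a.max() + 1, b.max() + 1))
        np.add.at(joint, (a, b), 1.0)               # joint label histogram
        joint /= a.size
        pa, pb = joint.sum(axis=1), joint.sum(axis=0)
        nz = joint > 0
        return np.sum(joint[nz] * np.log2(joint[nz] / np.outer(pa, pb)[nz]))

    return float(np.mean([entropy(test) + entropy(g)
                          - 2.0 * mutual_info(test, g) for g in flat]))
```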

Table 1 shows the average values of PRI and VoI for the four algorithms on the 300 images of the Berkeley segmentation database. In this table, the second column shows the average PRI and VoI values between different human subjects, which are the best scores. It can be observed from the table that the proposed algorithm performs better than the other three state-of-the-art algorithms in terms of both PRI and VoI scores. The average PRI value of EG is close to that of the proposed algorithm, but its average VoI value is much larger. Similarly, the average VoI of LDC is close to that of the proposed MoADE and better than those of the other two algorithms, but its PRI value is worse. Figures 3 and 4 graphically present the PRI and VoI scores, respectively, for the four algorithms on the 300 BSD images. It is clear from the curves in Figs. 3 and 4 that the proposed algorithm performs better than the other state-of-the-art methods.

Table 1 Average values of PRI and VoI for the four algorithms on 300 BSD images
Fig. 3

PRI values achieved on individual images by the four algorithms. The values are plotted in increasing order

Fig. 4

VoI values achieved on individual images by the four algorithms. The values are plotted in increasing order

5 Conclusion

In this paper, we presented a novel image segmentation algorithm. It combines spatial fuzzy c-means (sFCM), which exploits similarity in the local neighborhood among pixels, with differential evolution (DE), which diversifies the search to find the optimal solution. The ratio of fuzzy overlap to fuzzy separation is used as a cluster validity index to find the optimal number of clusters automatically. The results of the proposed algorithm were compared with three state-of-the-art image segmentation algorithms; the qualitative and quantitative results demonstrate that the proposed algorithm performs better than these methods. In the future, we plan to construct a unified matrix from different feature spaces, i.e., color and texture, in such a way that the important information of each feature space is preserved; the segmentation will then be performed on this joint or combined search space.