
1 Introduction

Image segmentation is the process of decomposing an image into a set of regions which are visually distinct and uniform with respect to some feature. The distinguishing feature may be colour or gray-level information used to create histograms, or pixel information indicating edges, boundaries or texture [10]. A wide range of image segmentation techniques is available in the literature [20, 24]. Among them, one approach of particular interest is thresholding, because of its efficient performance and theoretical simplicity. A comprehensive survey of image thresholding techniques can be found in [26].

Thresholding techniques can be classified as bilevel or multilevel, depending on the number of segments into which the original image is decomposed. In bilevel thresholding, each image pixel is assigned to one of two brightness regions, object or background, according to whether its intensity (gray level or colour) exceeds a specified threshold \(T\). In multilevel thresholding, pixels are classified into more than two classes, not just foreground and background. Therefore, more than one threshold must be determined to segment the image into brightness regions that may correspond to one background and several objects.

Among the multitude of image thresholding techniques, entropy-based approaches have drawn considerable attention in recent times. The principle of entropy, a well-known concept from information theory introduced by Shannon [6], is to use uncertainty as a measure of the information contained in a source. Maximum information is achieved when no a priori knowledge is available, in which case uncertainty is maximal.

Basically, entropy thresholding considers an image histogram as a probability distribution, and determines an optimal threshold value that yields the maximum entropy. More specifically, the best entropy thresholded image is the one that preserves as much information as possible that is contained in the original unthresholded image in terms of Shannon’s entropy [4].
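As an illustrative sketch of this idea (not tied to any particular method in this chapter), the Shannon entropy of a normalized histogram can be computed as follows; the two toy histograms are hypothetical:

```python
import numpy as np

def shannon_entropy(hist):
    """Shannon entropy (in nats) of a normalized histogram."""
    p = hist[hist > 0]               # skip empty bins: 0 * log(0) -> 0
    return float(-np.sum(p * np.log(p)))

# Hypothetical 4-bin histograms: the uniform one carries maximum uncertainty.
uniform = np.full(4, 0.25)
peaked = np.array([0.97, 0.01, 0.01, 0.01])
assert shannon_entropy(uniform) > shannon_entropy(peaked)
```

A strongly peaked histogram is highly predictable and therefore carries little information, which is why the uniform histogram attains the larger entropy.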

The popular criterion for image thresholding based on the maximum entropy principle was first applied by Pun [22, 23] and later improved in [12]. The concept was subsequently generalized to Rényi's entropy [25] and Tsallis's entropy [33].

However, due to the possible multivalued levels of brightness in a gray-tone image, or inherent vagueness and imprecision embedded in images, the result of image thresholding is not always satisfactory. This uncertainty can be adequately analyzed through the use of fuzzy set theory [1]. This theory, proposed by Zadeh [35], is a mathematical tool to analyze vagueness and uncertainty inherent in making decisions. It has proved its efficiency and usefulness in many applications, including image processing problems. In fact, some fuzzy logic-based thresholding techniques have been proposed in the literature, where fuzzy theory is employed to select an optimal threshold by maximizing the fuzzy entropy [30, 32, 36].

Cheng et al. [5] introduced the concept of fuzzy c-partition into the maximum entropy principle to select the threshold values for gray-level images. This method was first applied for bilevel thresholding and then extended to multilevel thresholding. Tobias and Serra [32] proposed an approach for histogram thresholding which was not based on the minimization of a threshold-dependent criterion function. The histogram threshold was determined according to the similarity between gray levels, assessed through implementation of a fuzzy measure.

In [36], Zhao et al. proposed a new technique for three-level thresholding by exploiting the relationship between the fuzzy c-partition and the probability partition. In their technique, the maximum entropy principle was used to measure the compatibility between the fuzzy partition and the probability partition. Zhao et al. used the simplest, monotonic functions to approximate the memberships of the bright, dark and medium fuzzy sets (defined according to pixel intensity levels) and derived a necessary condition for the entropy function to reach a maximum. Building on the idea of Zhao et al., Tao et al. [31] designed a new three-level thresholding method for image segmentation. The authors defined a new concept of fuzzy entropy through probability analysis, fuzzy partition and entropy theory. The image is first partitioned into three parts, namely dark, gray and white, whose membership functions are described by a Z-function, a \(\varPi \)-function and an S-function, respectively. Later, Tao et al. [30] developed another system which examined the performance of their previous approach for the segmentation of infrared objects. This approach used the ant colony optimization (ACO) method to effectively obtain the optimal combination of the free parameters of the fuzzy sets. The experimental results showed that the implementation of the proposed fuzzy entropy principle by ACO yielded better search performance than the genetic algorithm (GA) used in [31].

The present work aims at developing a new three-level thresholding algorithm, called DBBO-Fuzzy, based on the hybridization of biogeography-based optimization (BBO), a very recently proposed population based optimization technique, and the differential evolution (DE) algorithm, a very powerful stochastic optimizer. The approach presented in [31] is adopted as the basic support for this work.

A hybrid DE with BBO, namely DE/BBO, has been proposed in [11] where a hybrid migration operator is defined. Another combination of BBO and DE is proposed in [3], where the population is updated by applying, alternately from an iteration of the algorithm to the next, the BBO and DE updating methods. The proposed DBBO-Fuzzy algorithm incorporates the mutation procedure inherited from DE algorithm to replace the existing mutation procedure in BBO. A selection operator is also introduced in order to favor a given number of individuals for the next generation. In addition, the algorithm incorporates the features of elitism in order to prevent the best solutions from getting corrupted. The proposed algorithm is tested on a popular set of well-known benchmark images, and experimental results show that the proposed approach is reliable and efficient.

First of all, it is necessary to present some basic definitions. Section 3.2 provides the preliminaries of fuzzy set theory and the fuzzy entropy formulation, following the terminology used in [35]. The three-level thresholding problem is then formulated and the assumptions made in this paper are introduced in Sect. 3.3. Section 3.4 briefly describes the conventional BBO algorithm. The proposed algorithm is presented in Sect. 3.5, and the effectiveness of the proposed algorithm, along with a comparison with BBO- and DE-based algorithms, is demonstrated for a set of benchmark images in Sect. 3.6. Finally, Sect. 3.7 presents our conclusions.

2 Fuzzy Set Theory

Fuzzy set theory is a generalization of classical set theory, designed to express uncertainty and imprecision in available knowledge. The theory dates back to 1965, when Lotfi Zadeh, a professor at Berkeley, published his seminal paper on the theory of fuzzy sets and the associated logic, namely fuzzy logic [35].

Essentially, the fuzziness, a feature of imperfect information, results from the lack of crisp distinction between the elements belonging and not belonging to a set.

Definition 3.1

Let \(X\) be a universe of discourse with a generic element denoted by \(x_i\): \(X =\left\{ x_1, x_2, \ldots , x_n\right\} \).

A fuzzy set \(A\) in a space \(X\) is formally defined as:

$$\begin{aligned} A = \left\{ \left(x_i, \mu _A\left(x_i\right)\right) \mid x_i \in X\right\} \end{aligned}$$
(3.1)

in which \(\mu _A: X \rightarrow [0, 1]\) is the membership function or characteristic function. This function assigns to each element \(x_i\) in the set a membership grade \(\mu _A\left(x_i\right) \in [0, 1]\). As opposed to classical sets, where each element has membership grade \(0\) if it is completely outside the set or \(1\) if it is completely inside, the theory of fuzzy sets is based on the idea that one may be uncertain about whether an element is in or out of the set. Thus, the nearer the value of \(\mu _A(x_i)\) to unity, the higher the grade of membership of \(x_i\) in \(A\). The fuzziest, and most difficult, case is when \(\mu _A(x_i)=0.5\) [15].
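As a minimal illustration, consider a hypothetical fuzzy set "dark" over pixel intensities; the function `mu_dark` and its linear shape are assumptions made only for this example:

```python
# Hypothetical fuzzy set "dark" over intensities 0..255, with a simple
# linear membership function (for illustration only).
def mu_dark(x):
    return max(0.0, min(1.0, 1.0 - x / 255.0))

assert mu_dark(0) == 1.0          # full membership: certainly dark
assert mu_dark(255) == 0.0        # no membership: certainly not dark
assert 0.0 < mu_dark(128) < 1.0   # partial membership: the fuzzy case
```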

The fuzzy set theory approach has found interesting applications in automatic control, decision making, pattern recognition, psychology, economics, medical diagnosis, image processing and other fields. A number of aspects of digital image processing have been treated by this theory, such as image quality assessment, edge detection, image segmentation, etc.

2.1 Fuzzy Entropy

A frequently used measure of fuzziness is the fuzzy entropy, inspired by the Shannon entropy of random variables [27] and introduced for the first time by De Luca and Termini [8]. The authors established the following four axioms for fuzzy entropy:

Definition 3.2

Let \(E\) be a set-to-point map \(E: F(X)\rightarrow \left[0, 1\right]\). Hence \(E\) is a fuzzy set defined on fuzzy sets and \(F(X)\) is the family of all fuzzy sets in \(X\). \(E\) is an entropy measure if it satisfies the four De Luca and Termini axioms:

  1. \(E\left(A\right) = 0\) iff \(A \in 2^X\) (\(A\) nonfuzzy).

  2. \(E\left(A\right) = 1\) (the maximum value) iff \(\mu _{A}\left(x\right) = 0.5\), \(\forall x \in X\).

  3. \(E\left(A\right) \le E\left(B\right)\) if \(A\) is less fuzzy than \(B\), i.e., if \(\mu _{A}\left(x\right) \le \mu _{B}\left(x\right)\) when \(\mu _{B}\left(x\right) \le 0.5\), and \(\mu _{A}\left(x\right) \ge \mu _{B}\left(x\right)\) when \(\mu _{B}\left(x\right) \ge 0.5\).

  4. \(E\left(A\right) = E\left(A^c\right)\), where \(A^c\) is the complementary set of \(A\).

A number of studies related to the measures of fuzzy entropy and their applications have been conducted by Kaufmann [13], Bhandari and Pal [2], Kaufmann [14], Pal and Pal [19], and Fan and Xie [9].
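The axioms can be checked numerically against the classical De Luca–Termini measure, \(E(A) = -\frac{1}{n \ln 2}\sum _i \left[\mu _A(x_i)\ln \mu _A(x_i) + (1-\mu _A(x_i))\ln (1-\mu _A(x_i))\right]\). The sketch below, with hypothetical membership vectors, verifies axioms 1, 2 and 4:

```python
import numpy as np

def fuzzy_entropy(mu):
    """Normalized De Luca-Termini fuzzy entropy of a membership vector."""
    mu = np.asarray(mu, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        h = -(mu * np.log(mu) + (1 - mu) * np.log(1 - mu))
    h = np.nan_to_num(h)               # convention: 0 * log(0) = 0
    return float(h.sum() / (len(mu) * np.log(2)))

crisp = [0.0, 1.0, 1.0, 0.0]           # a nonfuzzy (classical) set
half = [0.5, 0.5, 0.5, 0.5]            # maximal fuzziness
a = [0.2, 0.7, 0.9, 0.4]               # an arbitrary fuzzy set
assert fuzzy_entropy(crisp) == 0.0                         # axiom 1
assert abs(fuzzy_entropy(half) - 1.0) < 1e-12              # axiom 2
a_c = [1.0 - x for x in a]
assert abs(fuzzy_entropy(a) - fuzzy_entropy(a_c)) < 1e-12  # axiom 4
```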

3 Problem Formulation

3.1 Model of an Image

A digital gray-tone image refers to a two-dimensional light intensity function defined over a spatial coordinate system. Let \(G=\left\{ 0, 1, \ldots , L-1\right\} \) be the set of intensity values, and \(D =\left\{ \left(x,y\right): 0 \le x \le M-1, 0 \le y \le N-1 \right\} \) be the spatial coordinates of the pixels for an \(M \times N\) image. The digital image defines a mapping \(I:D\longrightarrow G\), where \(0\le I(x,y) \le L-1\) gives the intensity (brightness) of the image at the spatial coordinates \((x, y) \in D\) with \(L=256\) gray levels for an \(8\)-bit image [30, 31].

Let \(D_k = \left\{ (x, y) \mid I (x, y) = k, (x, y) \in D\right\} \), \(k \in G\). The histogram of an image, defined as \(H = \left\{ h_0, h_1, \ldots , h_{L-1}\right\} \), gives the frequency of occurrence of each gray level in the image and is obtained directly from observation of the image under consideration. Accordingly, the normalized frequency \(h_k\) of the \(k\)th gray level is defined as follows:

$$\begin{aligned} h_k=\frac{n_k}{N*M},\quad k=0, 1, \ldots , L-1 \end{aligned}$$
(3.2)

where \(n_k\) denotes the total number of pixels in \(D_k\), and \(N*M\) denotes the total number of pixels in the image. It is clear that

$$\begin{aligned} 0 \le h_k \le 1 \quad \text{ and} \quad \sum _{k=0}^{L-1} h_k = 1 \end{aligned}$$
(3.3)
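A small sketch of Eqs. (3.2)–(3.3), using a hypothetical \(4\times 4\) image with \(L=4\) gray levels:

```python
import numpy as np

# Hypothetical 4x4 image with L = 4 gray levels, illustrating Eq. (3.2).
img = np.array([[0, 0, 1, 2],
                [0, 3, 1, 2],
                [2, 2, 1, 0],
                [3, 3, 0, 1]])
L = 4
N, M = img.shape
h = np.bincount(img.ravel(), minlength=L) / (N * M)   # h_k = n_k / (N*M)
assert h.sum() == 1.0                                 # Eq. (3.3)
```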

A probability partition (PP) of the image domain \(D\) is defined as

$$\varPi _L=\left\{ D_0, D_1, \ldots , D_{L-1}\right\} $$

which is characterized by a probabilistic distribution [30, 31]:

$$\begin{aligned} p_k \equiv P\left(D_k\right) = h_k, \quad k = 0,1,\ldots ,L-1, \end{aligned}$$
(3.4)

Equation (3.4) presents the relationship between the histogram \(H\) and the probability partition \(\varPi \), where \(p_k\) is the probability measure of the occurrence of gray level \(k\).

3.2 Three-Level Thresholding

The segmentation problem is to determine the sets \(D_k \subset D\) \((k=0,\ldots ,L-1)\) whose union is the entire image \(D\). Thus, the sets that constitute the segmentation must satisfy:

$$\begin{aligned} \bigcup _{k=0}^{L-1}D_k = D \quad \text{ and} \quad D_i \cap D_j = \phi \quad (i \ne j) \end{aligned}$$
(3.5)

where \(\phi \) denotes an empty set. Ideally, a segmentation method finds those sets that correspond to distinct anatomical structures or regions of interest in the image.

In the case of three-level thresholding of an image, the aim is to separate its domain \(D\) into three parts, \(E_d\), \(E_m\) and \(E_b\), where \(E_d\) is composed of ‘dark’ pixels corresponding to the smaller gray levels, \(E_b\) is composed of those ‘bright’ pixels corresponding to the larger gray levels, and \(E_m\) is composed of pixels with medium gray levels.

The problem is to find the unknown probabilistic fuzzy 3-partition of \(D\), \(\varPi _3 = \left\{ E_d, E_m, E_b \right\} \), which is characterized by the probability distributions [30, 31]:

$$\begin{aligned} p_d = P(E_d), \quad p_m = P(E_m), \quad p_b = P(E_b) \end{aligned}$$
(3.6)

The three fuzzy partitions \(E_d\), \(E_m\) and \(E_b\) are characterized by three membership functions \(\mu _d\), \(\mu _m\) and \(\mu _b\), respectively. The membership function \(\mu _d\) of dark pixels corresponds to a \(Z\)-function, the membership function \(\mu _m\) of medium pixels is a \(\varPi \)-function, and the membership function \(\mu _b\) of bright pixels of the image is an \(S\)-function [31] (see Fig. 3.1). Six free parameters \(\left\{ a_1, b_1, c_1, a_2, b_2, c_2\right\} \) control the shapes of the three membership functions and satisfy the conditions \(0<a_1 \le b_1 \le c_1 \le a_2 \le b_2 \le c_2 < 255\) for an image with \(256\) gray levels.

Fig. 3.1 Membership function graph

The three-level thresholding involves a determination of the optimal thresholds \(T_1\) and \(T_2\) such that the classification of a pixel \(I\left(x, y\right)\) is achieved as follows:

$$\begin{aligned} D_{kd}&=\left\{ \left(x, y\right) : I\left(x, y\right) \le T_1, \left(x, y\right) \in D_k \right\} \nonumber \\ D_{km}&= \left\{ \left(x, y\right) : T_1 < I\left(x, y\right) \le T_2, \left(x, y\right) \in D_k \right\} \nonumber \\ D_{kb}&= \left\{ \left(x, y\right) : I\left(x, y\right) > T_2, \left(x, y\right) \in D_k \right\} \end{aligned}$$
(3.7)

Once the parameters \(a_1\), \(b_1\), \(c_1\), \(a_2\), \(b_2\) and \(c_2\) are selected, \(\varPi _k = \left\{ D_{kd}, D_{km}, D_{kb}\right\} \) is the PP of \(D_k\) with the probabilistic distribution:

$$\begin{aligned} p_{kd}&= P\left(D_{kd}\right) = p_k . p_{d|k} \nonumber \\ p_{km}&= P\left(D_{km}\right) = p_k . p_{m|k}\nonumber \\ p_{kb}&= P\left(D_{kb}\right) = p_k . p_{b|k} \end{aligned}$$
(3.8)

In Eq. (3.8), \(p_{d|k}\) is the conditional probability of a pixel that is classified into the class “d” (dark) under the condition that the pixel belongs to \(D_k\). Similarly, \(p_{m|k}\) and \(p_{b|k}\) are the conditional probabilities of a pixel belonging to classes “m” (medium) and “b” (bright), respectively.

By the law of total probability, we therefore have:

$$\begin{aligned} p_{d}&= \sum _{k=0}^{255} p_k . p_{d|k} = \sum _{k=0}^{255} p_k . \mu _{d}\left(k\right) \nonumber \\ p_{m}&= \sum _{k=0}^{255} p_k . p_{m|k} = \sum _{k=0}^{255} p_k . \mu _{m}\left(k\right) \nonumber \\ p_{b}&= \sum _{k=0}^{255} p_k . p_{b|k} = \sum _{k=0}^{255} p_k . \mu _{b}\left(k\right) \end{aligned}$$
(3.9)

Based on Eq. (3.9), it is clear that the three-level thresholding problem is reduced to finding suitable membership functions \(\mu _{d}\left(k\right)\), \(\mu _{m}\left(k\right)\) and \(\mu _{b}\left(k\right)\) of a pixel with an arbitrary intensity level \(k\). These membership functions represent the conditional probability that a pixel is classified into the dark, medium and bright regions, respectively, with respect to the variable \(k \in G\) (i.e., \(p_{d|k}=\mu _{d}\left(k\right)\), \(p_{m|k}=\mu _{m}\left(k\right)\) and \(p_{b|k}=\mu _{b}\left(k\right)\)). It is obvious that \(\mu _{d}\left(k\right) + \mu _{m}\left(k\right) + \mu _{b}\left(k\right) = 1\), \(k= 0, 1, \ldots , 255\). The three membership functions are shown in Eqs. (3.10)–(3.12):

$$\begin{aligned} \mu _d(k) = \left\{ \begin{array}{ll} 1&k \le a_1\\ 1-\frac{\left(k-a_1\right)^2}{\left(c_1-a_1\right)*\left(b_1-a_1\right)}&a_1 < k \le b_1\\ \frac{\left(k-c_1\right)^2}{\left(c_1-a_1\right)*\left(c_1-b_1\right)}&b_1 < k \le c_1\\ 0&k> c_1 \end{array}\right. \end{aligned}$$
(3.10)
$$\begin{aligned} \mu _m(k) = \left\{ \begin{array}{ll} 0&k \le a_1\\ \frac{\left(k-a_1\right)^2}{\left(c_1-a_1\right)*\left(b_1-a_1\right)}&a_1 < k \le b_1\\ 1-\frac{\left(k-c_1\right)^2}{\left(c_1-a_1\right)*\left(c_1-b_1\right)}&b_1 < k \le c_1\\ 1&c_1 < k \le a_2\\ 1-\frac{\left(k-a_2\right)^2}{\left(c_2-a_2\right)*\left(b_2-a_2\right)}&a_2 < k \le b_2\\ \frac{\left(k-c_2\right)^2}{\left(c_2-a_2\right)*\left(c_2-b_2\right)}&b_2 < k \le c_2\\ 0&k> c_2 \end{array}\right. \end{aligned}$$
(3.11)
$$\begin{aligned} \mu _b(k) = \left\{ \begin{array}{ll} 0&k \le a_2\\ \frac{\left(k-a_2\right)^2}{\left(c_2-a_2\right)*\left(b_2-a_2\right)}&a_2 < k \le b_2\\ 1-\frac{\left(k-c_2\right)^2}{\left(c_2-a_2\right)*\left(c_2-b_2\right)}&b_2 < k \le c_2\\ 1&k> c_2 \end{array}\right. \end{aligned}$$
(3.12)
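The following sketch implements Eqs. (3.10)–(3.12) directly (the parameter values are hypothetical) and checks the partition-of-unity property \(\mu _d(k) + \mu _m(k) + \mu _b(k) = 1\) at every gray level:

```python
def mu_d(k, a1, b1, c1):
    """Z-function of Eq. (3.10)."""
    if k <= a1:
        return 1.0
    if k <= b1:
        return 1.0 - (k - a1) ** 2 / ((c1 - a1) * (b1 - a1))
    if k <= c1:
        return (k - c1) ** 2 / ((c1 - a1) * (c1 - b1))
    return 0.0

def mu_m(k, a1, b1, c1, a2, b2, c2):
    """Pi-function of Eq. (3.11)."""
    if k <= a1:
        return 0.0
    if k <= b1:
        return (k - a1) ** 2 / ((c1 - a1) * (b1 - a1))
    if k <= c1:
        return 1.0 - (k - c1) ** 2 / ((c1 - a1) * (c1 - b1))
    if k <= a2:
        return 1.0
    if k <= b2:
        return 1.0 - (k - a2) ** 2 / ((c2 - a2) * (b2 - a2))
    if k <= c2:
        return (k - c2) ** 2 / ((c2 - a2) * (c2 - b2))
    return 0.0

def mu_b(k, a2, b2, c2):
    """S-function of Eq. (3.12)."""
    if k <= a2:
        return 0.0
    if k <= b2:
        return (k - a2) ** 2 / ((c2 - a2) * (b2 - a2))
    if k <= c2:
        return 1.0 - (k - c2) ** 2 / ((c2 - a2) * (c2 - b2))
    return 1.0

params = (20, 60, 100, 140, 180, 220)   # hypothetical a1, b1, c1, a2, b2, c2
a1, b1, c1, a2, b2, c2 = params
for k in range(256):                    # partition of unity at every level
    total = mu_d(k, a1, b1, c1) + mu_m(k, *params) + mu_b(k, a2, b2, c2)
    assert abs(total - 1.0) < 1e-9
```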

The free parameters of the three membership functions can be determined by maximizing the fuzzy partition entropy [30, 31]. The total fuzzy entropy function of partition \(\varPi _3\) is defined as:

$$\begin{aligned} H\left(a_1, b_1, c_1, a_2, b_2, c_2\right) = H_d + H_m + H_b \end{aligned}$$
(3.13)

where,

$$\begin{aligned} H_{d}&= -\sum _{k=0}^{255} \frac{p_k * \mu _d\left(k\right)}{p_d} * \ln \left(\frac{p_k * \mu _d\left(k\right)}{p_d}\right) \nonumber \\ H_{m}&= -\sum _{k=0}^{255} \frac{p_k * \mu _m\left(k\right)}{p_m} * \ln \left(\frac{p_k * \mu _m\left(k\right)}{p_m}\right)\nonumber \\ H_{b}&= -\sum _{k=0}^{255} \frac{p_k * \mu _b\left(k\right)}{p_b} * \ln \left(\frac{p_k * \mu _b\left(k\right)}{p_b}\right) \end{aligned}$$
(3.14)

The best-selected set of \(\left\{ a_1, b_1, c_1, a_2, b_2, c_2\right\} \) is the one that corresponds to maximum entropy \(H\).
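A vectorized sketch of the objective, Eqs. (3.13)–(3.14), under the membership definitions of Eqs. (3.10)–(3.12); the flat histogram and the parameter set used below are hypothetical:

```python
import numpy as np

def memberships(a1, b1, c1, a2, b2, c2, L=256):
    """Vectorized Z-, Pi- and S-memberships of Eqs. (3.10)-(3.12)."""
    k = np.arange(L, dtype=float)

    def s_curve(a, b, c):   # S-function: rises from 0 (k <= a) to 1 (k > c)
        y = np.zeros(L)
        m = (k > a) & (k <= b)
        y[m] = (k[m] - a) ** 2 / ((c - a) * (b - a))
        m = (k > b) & (k <= c)
        y[m] = 1.0 - (k[m] - c) ** 2 / ((c - a) * (c - b))
        y[k > c] = 1.0
        return y

    mu_bright = s_curve(a2, b2, c2)
    mu_dark = 1.0 - s_curve(a1, b1, c1)    # the Z-function mirrors the S
    mu_medium = 1.0 - mu_dark - mu_bright  # memberships sum to one
    return mu_dark, mu_medium, mu_bright

def fuzzy_partition_entropy(hist, params):
    """Total fuzzy entropy H = H_d + H_m + H_b of Eqs. (3.13)-(3.14)."""
    H = 0.0
    for mu in memberships(*params):
        q = hist * mu
        p = q.sum()               # p_d, p_m or p_b as in Eq. (3.9)
        q = q[q > 0] / p
        H -= float(np.sum(q * np.log(q)))
    return H

hist = np.full(256, 1.0 / 256)            # hypothetical flat histogram
H = fuzzy_partition_entropy(hist, (20, 60, 100, 140, 180, 220))
assert H > 0.0
```

An optimizer would then search the six-dimensional parameter space for the combination maximizing `fuzzy_partition_entropy`.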

The optimal thresholds \(T_1\) and \(T_2\) that segment the image into three gray levels are obtained as the intersections of the membership function curves, i.e.,

$$\begin{aligned} \mu _d\left(T_1\right) = \mu _m\left(T_1\right) = 0.5 \end{aligned}$$
(3.15)
$$\begin{aligned} \mu _m\left(T_2\right) = \mu _b\left(T_2\right) = 0.5 \end{aligned}$$
(3.16)

Based on Eqs. (3.10)–(3.12), it can be written [31]:

$$\begin{aligned} T_1 = \left\{ \begin{array}{ll} a_1+ \sqrt{\left(c_1-a_1\right)*\left(b_1-a_1\right)/2}&\quad \text{ if}\; \left(a_1+c_1\right)/2 \le b_1 \le c_1\\ c_1- \sqrt{\left(c_1-a_1\right)*\left(c_1-b_1\right)/2}&\quad \text{ if}\; a_1 \le b_1 <\left(a_1+c_1\right)/2 \end{array}\right. \end{aligned}$$
(3.17)
$$\begin{aligned} T_2 = \left\{ \begin{array}{ll} a_2+ \sqrt{\left(c_2-a_2\right)*\left(b_2-a_2\right)/2}&\quad \text{ if}\; \left(a_2+c_2\right)/2 \le b_2 \le c_2\\ c_2- \sqrt{\left(c_2-a_2\right)*\left(c_2-b_2\right)/2}&\quad \text{ if}\; a_2 \le b_2 <\left(a_2+c_2\right)/2 \end{array}\right. \end{aligned}$$
(3.18)
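Eqs. (3.17)–(3.18) can be sketched as a single helper; the parameter triples below are hypothetical. Note that with \(b_1 = (a_1+c_1)/2\) the threshold reduces to \(b_1\) itself, as the example shows:

```python
import math

def crossover_threshold(a, b, c):
    """Threshold where an S/Z membership pair crosses 0.5, Eqs. (3.17)-(3.18)."""
    if b >= (a + c) / 2:
        return a + math.sqrt((c - a) * (b - a) / 2)
    return c - math.sqrt((c - a) * (c - b) / 2)

T1 = crossover_threshold(20, 60, 100)    # hypothetical (a1, b1, c1)
T2 = crossover_threshold(140, 180, 220)  # hypothetical (a2, b2, c2)
assert T1 < T2                           # two thresholds, three classes
```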

As mentioned before, to find the optimal combination of all the fuzzy parameters, we propose to use a new optimization technique based on the hybridization of BBO and DE algorithms.

4 Biogeography-Based Optimization Algorithm (BBO)

The theory of biogeography grew out of the early works of Wallace [34] and Darwin [7] and, more recently, the work of MacArthur and Wilson [17]. Some of the key questions that this branch of biology attempts to answer are: How do organisms reach their current habitats? Have they always occupied their current distribution patterns? Why does an ecosystem have a particular number of species? The patterns of distribution of species across geographical areas can usually be explained through a combination of historical factors, such as speciation, extinction and migration.

The biogeography-based optimization (BBO) algorithm, developed by Dan Simon [28], is strongly influenced by the equilibrium theory of island biogeography [17]. The basic premise of this theory is that the rate of change in the number of species on an island depends critically on the balance between the immigration of new species onto the island and the emigration of species from the island.

The BBO algorithm operates upon a population of individuals called islands (or habitats). Each habitat represents a possible solution for the problem at hand. The fitness of each habitat is determined by its habitat suitability index (HSI), a metric which determines the goodness of a candidate solution, and each habitat feature is called a suitability index variable (SIV). Good solutions have a large number of species, representing habitats with a higher HSI than the poor solutions.

The migration pattern is determined by the immigration rate (\(\lambda \)), at which new species immigrate to the habitat, and the emigration rate (\(\mu \)), at which established species emigrate from it. These parameters are functions of the number of species in a habitat.

But how might immigration and emigration work on a habitat? We make two sets of assumptions regarding these processes:

Immigration:

The rate of immigration (\(\lambda \)) declines with the number of species (\(S\)) present on the habitat. The maximum immigration rate (\(I\)) occurs when the habitat is empty, and the rate decreases as more species are added. Once all the potential colonists are on the habitat, \(S = S_{max}\) (the maximum number of species the habitat can support) and the immigration rate is zero. In general, the immigration rate when there are \(S\) species in the habitat is given by:

$$\begin{aligned} \lambda _S = I \left(1-\frac{S}{S_{max}}\right) \end{aligned}$$
(3.19)
Emigration:

The rate of emigration (\(\mu \)) for a habitat increases with the number of species (\(S\)). The maximum emigration rate (\(E\)) occurs when all possible species are present on the habitat (i.e., when \(S=S_{max}\)); the rate is zero when no species are present. In general, the emigration rate when there are \(S\) species in the habitat is given by:

$$\begin{aligned} \mu _S = E\left(\frac{S}{S_{max}}\right) \end{aligned}$$
(3.20)
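A minimal sketch of the linear migration model of Eqs. (3.19)–(3.20), with \(I = E = 1\) assumed for illustration:

```python
def rates(S, S_max, I=1.0, E=1.0):
    """Immigration and emigration rates, Eqs. (3.19)-(3.20)."""
    lam = I * (1 - S / S_max)   # declines as the habitat fills up
    mu = E * (S / S_max)        # grows as the habitat fills up
    return lam, mu

S_max = 50
assert rates(0, S_max) == (1.0, 0.0)       # empty habitat: immigration only
assert rates(S_max, S_max) == (0.0, 1.0)   # full habitat: emigration only
lam, mu = rates(25, S_max)
assert lam == mu                           # equilibrium S* (here I == E)
```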

Figure 3.2 graphically represents the relationships between the number of species, emigration rate \(\mu \) and immigration rate \(\lambda \). Over a period of time, the counteracting forces of emigration and immigration result in an equilibrium level of species richness. The equilibrium value \(S^*\) is the point at which the rate of arrival of species \(\lambda \) is exactly matched by the rate of emigration \(\mu \). We have assumed here that \(\mu \) and \(\lambda \) vary following linear relationships, but different mathematical models of biogeography that include more complex variations have also been proposed [17].

Fig. 3.2 The relationship between the fitness of habitats (number of species), emigration rate \(\mu \) and immigration rate \(\lambda \)

We now consider the probability \(P_s\) that the habitat contains exactly \(S\) species. The number of species changes from time \(t\) to time \((t+\varDelta t)\) as follows [28]:

$$\begin{aligned} P_s(t+\varDelta t) = P_s(t)(1-\lambda _s\varDelta t - \mu _s \varDelta t) + P_{s-1}(t)\lambda _{s-1}\varDelta t + P_{s+1}(t)\mu _{s+1} \varDelta t \end{aligned}$$
(3.21)

This states that the number of species on the habitat in one time step is based on the total number of current species on the habitat, the new immigrants and the number of species that leave the habitat during this time period. We assume here that \(\varDelta t\) is small enough so that the probability of more than one immigration or emigration can be ignored. In order to have \(S\) species at time \((t+\varDelta t)\), one of the following conditions must hold:

  • There were \(S\) species at time \(t\), and no immigration or emigration occurred between \(t\) and \((t+\varDelta t)\);

  • One species immigrated onto a habitat already occupied by \(S-1\) species at time \(t\);

  • One species emigrated from a habitat occupied by \(S+1\) species at time \(t\).

The limit of Eq. (3.21) as \(\varDelta t \rightarrow 0\) is given by Eq. (3.22).

$$\begin{aligned} \dot{P}_s = \left\{ \begin{array}{ll} -(\lambda _s + \mu _s)P_s + \mu _{s+1}P_{s+1}&\quad \text{ if}\; S=0\\ -(\lambda _s + \mu _s)P_s + \lambda _{s-1}P_{s-1}+\mu _{s+1}P_{s+1}&\quad \text{ if}\;1\le S \le S_{max}-1\\ -(\lambda _s + \mu _s)P_s + \lambda _{s-1}P_{s-1}&\quad \text{ if}\; S = S_{max} \end{array} \right. \end{aligned}$$
(3.22)

Equation (3.22) can be arranged into a single matrix form:

$$\begin{aligned} \left[\begin{array}{l} \dot{P}_{0}\\ \dot{P}_{1}\\ \vdots \\ \vdots \\ \dot{P}_{n}\\ \end{array}\right]&= \left[\begin{array}{lllll} -\left(\lambda _{0} + \mu _{0}\right)&\mu _{1}&0&\ldots&0\\ \lambda _{0}&-\left(\lambda _{1} + \mu _{1}\right)&\mu _{2}&\ldots&\vdots \\ \vdots&\ddots&\ddots&\ddots&\vdots \\ \vdots&\ddots&\lambda _{n-2}&-\left(\lambda _{n-1} + \mu _{n-1}\right)&\mu _{n} \\ 0&\ldots&0&\lambda _{n-1}&-\left(\lambda _{n} + \mu _{n}\right)\\ \end{array}\right] \nonumber \\&\qquad \left[\begin{array}{l} P_{0}\\ P_{1}\\ \vdots \\ \vdots \\ P_{n}\\ \end{array}\right] \end{aligned}$$
(3.23)

For notational brevity, we simply write \(n=S_{max}\).
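The tridiagonal system matrix of Eq. (3.23) can be sketched as follows (assuming the linear rates of Eqs. (3.19)–(3.20) with \(I = E = 1\)). A useful sanity check is that its columns sum to zero, so the total probability \(\sum _S P_S\) is conserved:

```python
import numpy as np

def bbo_rate_matrix(n, I=1.0, E=1.0):
    """Tridiagonal matrix of Eq. (3.23), so that P_dot = A @ P (n = S_max)."""
    S = np.arange(n + 1)
    lam = I * (1 - S / n)                # Eq. (3.19)
    mu = E * (S / n)                     # Eq. (3.20)
    A = np.diag(-(lam + mu))             # main diagonal
    A += np.diag(mu[1:], k=1)            # superdiagonal: mu_1 ... mu_n
    A += np.diag(lam[:-1], k=-1)         # subdiagonal: lambda_0 ... lambda_{n-1}
    return A

A = bbo_rate_matrix(4)
assert np.allclose(A.sum(axis=0), 0.0)   # probability mass is conserved
```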

The BBO algorithm can be described overall by Algorithm 3.1. The two basic operators which govern the working of BBO are the migration, described in Algorithm 3.2, and the mutation, described in Algorithm 3.3, where \(rand(0, 1)\) is a uniformly distributed random number in the interval \([0, 1]\); \(X_{ij}\) is the \(j\)th SIV of the solution \(X_i\).

The species count probability \(P_S\) indicates the a priori likelihood that a given solution \(S\) exists as a solution for the given problem. In this context, it should be noted that solutions with very high HSI and solutions with very low HSI are both equally improbable, while medium-HSI solutions are relatively probable. If a given solution has a low probability, it is likely to mutate to some other solution. Conversely, a solution with high probability is less likely to mutate. The mutation rate \(m(S)\) is inversely proportional to the solution probability:

$$\begin{aligned} m(S) = m_{max}\left(1-\frac{P_S}{P_{max}}\right) \end{aligned}$$
(3.24)

where \(m_{max}\) is a user-defined parameter, and \(P_{max} = \max \limits _{S} P_S\), \(S = 1\), ..., \(S_{max}\).

Migration is used to modify existing habitats by mixing features within the population. Mutation is used to enhance diversity of the population, thereby preventing the search from stagnating. If a habitat \(S\) is selected to execute the mutation operation, then a chosen variable (\(SIV\)) is randomly modified based on its associated probability \(P_S\). At the same time, the concept of elitism (i.e., copying some of the fittest individuals for the next generation) is also applied.
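A sketch of the probability-based mutation rate of Eq. (3.24), with a hypothetical species-count distribution and a hypothetical \(m_{max}\):

```python
import numpy as np

def mutation_rates(P, m_max=0.01):
    """Mutation rate m(S) of Eq. (3.24) for each species-count probability."""
    P = np.asarray(P, dtype=float)
    return m_max * (1.0 - P / P.max())

# Hypothetical P_S: medium species counts (medium HSI) are the most probable.
P_S = np.array([0.05, 0.20, 0.50, 0.20, 0.05])
m = mutation_rates(P_S)
assert m[2] == 0.0                 # the most probable solution never mutates
assert m[0] == m[4] == m.max()     # improbable solutions mutate the most
```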

Algorithm 3.1 The BBO algorithm; Algorithm 3.2 Migration; Algorithm 3.3 Mutation

5 Description of the Proposed DBBO-Fuzzy Algorithm

Motivated by the exploration capabilities of the differential evolution (DE) algorithm [18], a hybrid method combining the exploitation of BBO with the exploration of DE is proposed in this paper. The purpose of this hybridization is to benefit from the advantages of each algorithm and to compensate for each algorithm’s weaknesses.

In order to find the global solution in a better manner than the BBO algorithm, the proposed algorithm, named the DBBO-Fuzzy algorithm, replaces the BBO-based mutation by a DE mutation. In addition, a selection operation is introduced in order to favor a given number of individuals for the next generation.

The proposed algorithm can be summarized as follows:


1. Initialization: The algorithm starts with an initial population of \(NP\) search variable vectors (or habitats), where the problem dimension \(D\) is the number of fuzzy parameters. For the three-level thresholding problem, six parameters \(\left\{ a_1, b_1, c_1, a_2, b_2, c_2\right\} \) are used (i.e., \(D=6\)). Since the habitats are likely to get modified over different generations, the \(i\)th habitat of the population at the current generation \(g\) is denoted as:

$$\begin{aligned} \mathbf{X}_{i,g} = (X_{i,1,g}, X_{i,2,g}, \ldots , X_{i,j,g}, \ldots , X_{i,D,g}) \end{aligned}$$
(3.25)

where \(i=1, \ldots , NP\), \(j = 1, \ldots , D\) and \(X_{i,j}\) is the \(j\)th SIV of the habitat \(\mathbf{X}_i\). Each decision variable \(X_{i,j,g}\) is randomly initialized within its corresponding lower bound (\(L_j\)) and upper bound (\(U_j\)), so as to cover the entire search space uniformly:

$$\begin{aligned} X_{i,j, 0} =L_j + rand(0, 1) \times (U_j - L_j) \end{aligned}$$
(3.26)
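A sketch of the initialization step, Eq. (3.26). The bounds below, and the sorting step used to enforce the ordering constraint, are assumptions made for this example rather than steps prescribed here:

```python
import numpy as np

rng = np.random.default_rng(0)
NP, D = 10, 6                       # population size; D = 6 fuzzy parameters
L_b = np.zeros(D)                   # assumed lower bounds
U_b = np.full(D, 255.0)             # assumed upper bounds

# Eq. (3.26): uniform random initialization inside the box [L_b, U_b].
X = L_b + rng.random((NP, D)) * (U_b - L_b)
# One simple way to enforce a1 <= b1 <= c1 <= a2 <= b2 <= c2 (an assumption
# of this sketch): sort the SIVs of each habitat.
X = np.sort(X, axis=1)
assert np.all((X >= L_b) & (X <= U_b))
```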

2. Evaluation of the objective function: The objective function values of the habitats are evaluated using the fuzzy entropy function given by Eq. (3.13). The objective function has six parameters (SIV) \(a_1\), \(b_1\), \(c_1\), \(a_2\), \(b_2\), \(c_2\), which satisfy the conditions \(0<a_1 \le b_1 \le c_1 \le a_2 \le b_2 \le c_2 < 255\).


3. Migration: The migration operator produces a new vector \(\mathbf{M}_{i,g}\) as follows:

$$\begin{aligned} M_{i,j,g} = \left\{ \begin{array}{ll} X_{k,j,g}&\text{ if}\; rand(0, 1)< \lambda _i\\ X_{i,j,g}&\text{ otherwise} \end{array} \right. \end{aligned}$$
(3.27)

where \(i=1, 2, \ldots , NP\), \(j = 1, \ldots , D\) and \(X_{k,j,g}\) is the \(j\)th decision variable of a randomly selected habitat \(\mathbf{X}_{k,g}\). \(\mathbf{X}_{k,g}\) is selected with a probability based on \(\mu _k\).
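A sketch of the migration operator of Eq. (3.27). The roulette-wheel selection of the source habitat based on \(\mu _k\) follows the description above, while the population and the rate values are hypothetical:

```python
import numpy as np

def migrate(X, lam, mu, rng):
    """BBO migration, Eq. (3.27): immigrating SIVs replace existing ones."""
    NP, D = X.shape
    M = X.copy()
    for i in range(NP):
        for j in range(D):
            if rng.random() < lam[i]:
                # Roulette-wheel choice of the emigrating habitat k via mu_k.
                k = rng.choice(NP, p=mu / mu.sum())
                M[i, j] = X[k, j]
    return M

rng = np.random.default_rng(1)
X = rng.random((5, 6)) * 255            # hypothetical population
lam = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
mu = 1.0 - lam                          # fitter habitats emigrate more
M = migrate(X, lam, mu, rng)
assert M.shape == X.shape
```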


4. DE Mutation: DBBO-Fuzzy incorporates the mutation procedure inherited from the DE algorithm [21, 29] to replace the existing mutation procedure in BBO. The mutation is performed by calculating weighted vector differences between other randomly selected habitats of the same population. A differentiation constant \(F\) is used to control the amplification of the differential variation. Next, five different mutation schemes are outlined, inspired by the suggestions of Price et al. [21]. The general convention used to name the different DE schemes is \(DE/x/y\), where \(DE\) stands for differential evolution, \(x\) denotes the type of vector to be perturbed (whether it is randomly selected or the best vector in the population with respect to fitness value) and \(y\) is the number of difference vectors considered for the perturbation of \(x\). The mutation operation constructs, for each habitat \(\mathbf{M}_{i, g}\), a mutant habitat \(\mathbf{V}_{i,g}\) according to one of the following mutation schemes:

  • DE/rand/1: This mutation scheme uses a randomly selected habitat \(\mathbf{M}_{r_1, g}\), and only one weighted difference vector \(F.(\mathbf{M}_{r_2, g}-\mathbf{M}_{r_3, g})\) is used to perturb it.

    $$\begin{aligned} \mathbf{V}_{i, g}= \mathbf{M}_{r_1, g}+ F.(\mathbf{M}_{r_2, g}-\mathbf{M}_{r_3,g}) \end{aligned}$$
    (3.28)
  • DE/current to best/1: Here the mutant habitat is created using any two randomly selected habitats of the population as well as the best habitat in the current generation.

    $$\begin{aligned} \mathbf{V}_{i, g}= \mathbf{M}_{i, g} + F.(\mathbf{M}_{best, g}-\mathbf{M}_{i, g})+ F.(\mathbf{M}_{r_1,g}-\mathbf{M}_{r_2, g}) \end{aligned}$$
    (3.29)
  • DE/best/1: Here the habitat to be perturbed is the best habitat of the current population and the perturbation is caused by using a single difference vector.

    $$\begin{aligned} \mathbf{V}_{i, g}= \mathbf{M}_{best, g}+ F.(\mathbf{M}_{r_1, g}-\mathbf{M}_{r_2,g}) \end{aligned}$$
    (3.30)
  • DE/rand/2: To create \(\mathbf{V}_{i, g}\) for each \(i\)th habitat, a total of five other distinct habitats (say the \(r_1\)th, \(r_2\)th, \(r_3\)th, \(r_4\)th, and \(r_5\)th habitats) are chosen at random from the current population.

    $$\begin{aligned} \mathbf{V}_{i, g}= \mathbf{M}_{r_1, g}+ F.(\mathbf{M}_{r_2, g}-\mathbf{M}_{r_3, g})+ F.(\mathbf{M}_{r_4, g}-\mathbf{M}_{r_5, g}) \end{aligned}$$
    (3.31)
  • DE/best/2: In this mutation scheme, the mutant habitat is formed using two difference vectors, as shown below:

    $$\begin{aligned} \mathbf{V}_{i, g}= \mathbf{M}_{best, g}+ F.(\mathbf{M}_{r_1, g}-\mathbf{M}_{r_2, g})+ F.(\mathbf{M}_{r_3,g}-\mathbf{M}_{r_4, g}) \end{aligned}$$
    (3.32)

where the indices \(r_1, r_2, r_3, r_4, r_5\) are randomly chosen over the interval \([1, NP]\) and are mutually distinct as well as different from the running index \(i\). \(F\) is a real constant scaling factor within the range \([0, 2]\), usually chosen to be less than \(1\). \(\mathbf{M}_{best,g}\) is the habitat with the best fitness value in the population at generation \(g\).
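The five schemes of Eqs. (3.28)–(3.32) can be sketched in a single routine; the function name, its defaults and the index handling are illustrative assumptions:

```python
import numpy as np

def de_mutate(M, F=0.5, scheme="rand/1", best=None, rng=np.random.default_rng()):
    """Build one mutant habitat per row of the (NP, D) matrix M using
    the DE schemes of Eqs. (3.28)-(3.32). Requires NP >= 6 so that
    five distinct indices different from i can be drawn."""
    NP, _ = M.shape
    V = np.empty_like(M)
    b = M[best] if best is not None else M[0]   # best habitat of the generation
    for i in range(NP):
        # five mutually distinct indices, all different from the running index i
        r1, r2, r3, r4, r5 = rng.choice([k for k in range(NP) if k != i],
                                        size=5, replace=False)
        if scheme == "rand/1":                   # Eq. (3.28)
            V[i] = M[r1] + F * (M[r2] - M[r3])
        elif scheme == "current-to-best/1":      # Eq. (3.29)
            V[i] = M[i] + F * (b - M[i]) + F * (M[r1] - M[r2])
        elif scheme == "best/1":                 # Eq. (3.30)
            V[i] = b + F * (M[r1] - M[r2])
        elif scheme == "rand/2":                 # Eq. (3.31)
            V[i] = M[r1] + F * (M[r2] - M[r3]) + F * (M[r4] - M[r5])
        elif scheme == "best/2":                 # Eq. (3.32)
            V[i] = b + F * (M[r1] - M[r2]) + F * (M[r3] - M[r4])
    return V
```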

5. Selection: The values of the objective function are calculated for the updated habitats. The selection operation selects either a habitat \(\mathbf{X}_{i,g}\) or its newly updated counterpart \(\mathbf{V}_{i,g}\) to survive as a member of the next generation, according to the fitness value. For the following generation \(g+1\), the new habitats \(\mathbf{X}_{i,g+1}\) are selected according to the following selection rule:

    $$\begin{aligned} \mathbf{X}_{i, g+1} = \left\{ \begin{array}{ll} \mathbf{V}_{i,g}&\text{ if}\; f(\mathbf{V}_{i,g}) > f(\mathbf{X}_{i,g})\\ \mathbf{X}_{i,g}&\text{ otherwise} \end{array} \right. \end{aligned}$$
    (3.33)

    Thus \(\mathbf{V}_{i,g}\) replaces \(\mathbf{X}_{i,g}\) in the population only if it achieves a better objective value. This concept is similar to what happens in nature for longer-living species, where offspring and parents are alive concurrently and have to compete.
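Since the objective here, the total fuzzy entropy, is maximized, the greedy selection step can be sketched as follows; the vectorized form and the name of the objective callable are assumptions:

```python
import numpy as np

def select(X, V, f):
    """Greedy DE-style selection: for each habitat keep whichever of
    X_i and its mutant V_i scores better on the objective f, which is
    maximized. Assumes f accepts a single habitat vector."""
    fX = np.array([f(x) for x in X])
    fV = np.array([f(v) for v in V])
    keep_mutant = fV > fX                       # mutant survives only if strictly better
    return np.where(keep_mutant[:, None], V, X)
```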

6. Boundary constraints: If the variable value \(X_{i,j,g}\) violates the boundary constraints, the corresponding violating variable value is randomly generated within the boundary constraints as follows:

    $$\begin{aligned} X_{i,j,g} =L_j + rand(0, 1) \times (U_j - L_j) \end{aligned}$$
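This repair rule can be sketched as follows, assuming per-dimension lower and upper bound vectors `L` and `U`:

```python
import numpy as np

def repair(X, L, U, rng=np.random.default_rng()):
    """Re-sample every variable that left the box [L_j, U_j] as
    X_ij = L_j + rand(0,1) * (U_j - L_j); in-bound variables are
    left untouched."""
    bad = (X < L) | (X > U)                     # mask of violating variables
    X = X.copy()
    X[bad] = (L + rng.random(X.shape) * (U - L))[bad]
    return X
```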
7. Stopping criteria: If the stopping criteria are met, the vector represented by the best habitat contains the optimal combination of fuzzy parameter values that maximize the total fuzzy entropy function of partition \(\varPi _3\). Otherwise, the procedure is repeated from step 3.

6 Experimental Settings and Results

6.1 Test Images

The performance of the proposed DBBO-Fuzzy algorithm is compared with that of the basic BBO algorithm [28], referred to here as BBO-Fuzzy, and with that of the DE-Fuzzy algorithm, which proceeds exactly as the original algorithm presented in [21].

The performance of these competing three-level thresholding algorithms is tested on a set of 12 benchmark images, each with 256 gray levels. These are commonly known as Lena, Peppers, Cameraman, Airplane, Lake, Walking bridge, Mandrill, Barbara, Boat, Elaine, GoldHill and Fingerprint. All the images are of size \(512 \times 512\) pixels. The original images considered are shown in Fig. 3.3.

Fig. 3.3
figure 3

Original test images: a Lena, b Peppers, c Cameraman, d Airplane, e Lake, f Walking bridge, g Mandrill, h Barbara, i Boat, j Elaine, k GoldHill, l Fingerprint

6.2 Test Design

In all experiments, the same parameter values are used for each of the three aforementioned algorithms to make a fair comparison. A population of \(20\) individuals is used; these are evolved during \(20\) generations. For each test image, ten independent runs are carried out.

For the DE-Fuzzy algorithm, \(F = 0.5\) and \(CR = 0.9\) are chosen, as recommended in [29]. For fair performance comparison, the same mutation scheme, DE/rand/1, is adopted for both the DE-Fuzzy and the DBBO-Fuzzy algorithms. For the BBO-Fuzzy algorithm, the same parameter settings as in [28] are used. In Table 3.1, the parameter setup used in the experiments conducted is summarized.

Table 3.1 DBBO-fuzzy parameters

6.3 Results and Discussions

The three-level thresholding algorithms are employed to determine the optimal thresholds \(T_1\) and \(T_2\) that segment a given image into three gray levels while preserving the original information as much as possible after creation of this partition.

The objective is to maximize the total fuzzy entropy \(H\left( b_1, c_1, a_2, b_2, c_2\right) \), given in Eq. (3.13), i.e., to find the optimal combination of the fuzzy parameters \(\left( a_1, b_1, c_1, a_2, b_2, c_2\right) \) that maximizes the fuzzy entropy. A higher objective function value indicates a better segmentation.

The results are also compared on the basis of the uniformity factor, the most commonly used measure for quantitatively judging segmentation quality. This uniformity measure is defined as [16]:

$$\begin{aligned} u = 1-2 \times c \times \frac{\sum _{j=0}^{c}\sum _{i \in R_j} \left(f_i-\mu _j\right)^2}{N \times \left(f_{max}-f_{min}\right)^2} \end{aligned}$$
(3.34)

where \(c\) is the number of thresholds, \(R_j\) is the \(j\)th segmented region, \(f_i\) is the gray level of pixel \(i\), \(\mu _j\) is the mean gray level of the pixels in the \(j\)th region, \(N\) is the total number of pixels in the given image, and \(f_{max}\) and \(f_{min}\) are the maximum and minimum gray levels of the pixels in the given image, respectively.

The value of this uniformity measure lies within the interval \([0,1]\). The higher the value of \(u\), the better the quality of the thresholded image.
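A minimal sketch of Eq. (3.34), under the assumption that the \(c+1\) regions are the gray-level intervals delimited by the sorted thresholds:

```python
import numpy as np

def uniformity(image, thresholds):
    """Uniformity factor of Eq. (3.34) for a gray-level image and a
    sorted list of c thresholds. Regions are the intervals delimited
    by the thresholds; N is the total number of pixels."""
    f = np.asarray(image, dtype=float).ravel()
    c = len(thresholds)
    labels = np.digitize(f, thresholds)   # region index 0..c for every pixel
    # within-region sum of squared deviations from the region mean
    ssd = sum(((f[labels == j] - f[labels == j].mean()) ** 2).sum()
              for j in range(c + 1) if np.any(labels == j))
    N = f.size
    return 1.0 - 2.0 * c * ssd / (N * (f.max() - f.min()) ** 2)
```

An image whose regions are perfectly constant attains \(u = 1\); within-region variance pulls \(u\) toward \(0\).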

Table 3.2 Comparison of DBBO-fuzzy, BBO-fuzzy and DE-fuzzy, where boldface indicates the best performing algorithm
Table 3.3 Optimal threshold and uniformity factor values obtained by DBBO-fuzzy, BBO-fuzzy and DE-fuzzy, where boldface indicates the best performing algorithm

Table 3.2 shows the average objective values (i.e., total fuzzy entropy) achieved by each algorithm for each image under test. The values in boldface indicate the best-performing algorithm among the competing algorithms. It is observed from the obtained results that the proposed DBBO-Fuzzy method obtains higher fuzzy entropy values than both BBO-Fuzzy and DE-Fuzzy on all test images.

In order to determine whether the differences between the DBBO-Fuzzy algorithm and the BBO-Fuzzy and DE-Fuzzy algorithms are statistically significant, a two-tailed t-test was conducted with \(df = 10 + 10 - 2 = 18\) degrees of freedom at the \(\alpha = 0.05\) level of significance (i.e., at the \(95\,\%\) confidence level). The average objective values and the standard deviations obtained by each algorithm over ten independent runs were used to calculate the t-values. The absolute value of the computed t is found to be larger than the critical value for all test images. This suggests that, with \(95\,\%\) confidence, the difference between the DBBO-Fuzzy algorithm and each of the two competing algorithms is statistically significant. Therefore, it is evident that the hybridization of the BBO algorithm with DE has a noticeable effect on performance. In these tests, 1 versus 2 means DBBO-Fuzzy algorithm versus BBO-Fuzzy algorithm, and 1 versus 3 means DBBO-Fuzzy algorithm versus DE-Fuzzy algorithm.
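The test above can be reproduced from the reported means and standard deviations. The pooled-variance t statistic and the two-tailed critical value \(2.101\) for \(df = 18\) at \(\alpha = 0.05\) are standard; the helper names below are illustrative:

```python
import math

def t_value(m1, s1, m2, s2, n=10):
    """Two-sample t statistic from the means and standard deviations
    of two algorithms over n independent runs each, using the pooled
    variance with df = 2n - 2."""
    sp2 = ((n - 1) * s1**2 + (n - 1) * s2**2) / (2 * n - 2)  # pooled variance
    return (m1 - m2) / math.sqrt(sp2 * (2.0 / n))

T_CRIT = 2.101   # two-tailed critical value for df = 18, alpha = 0.05

def significant(m1, s1, m2, s2, n=10):
    """True if the difference is significant at the 95% confidence level."""
    return abs(t_value(m1, s1, m2, s2, n)) > T_CRIT
```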

Table 3.3 shows the optimal thresholds obtained and the uniformity factor values attained using DBBO-Fuzzy, BBO-Fuzzy and DE-Fuzzy methods. One can observe from the results reported in Table 3.3 that the solution quality of DBBO-Fuzzy is superior to BBO-Fuzzy and DE-Fuzzy in \(11\) out of \(12\) images. The only exception is the Cameraman image, where DBBO-Fuzzy produces a slightly worse uniformity factor than DE-Fuzzy. From this point of view also, the DBBO-Fuzzy algorithm stands out as the clear winner.

Table 3.4 presents representative optimal parameter sets \(\left\{ a_1, b_1, c_1, a_2, b_2, c_2\right\} \) obtained by employing DBBO-Fuzzy, BBO-Fuzzy and DE-Fuzzy algorithms.

For a visual interpretation of the three-level thresholding results, the thresholded images obtained by applying the DBBO-Fuzzy algorithm are presented in Fig. 3.4. After determination of the thresholds for each image, the gray levels of all pixels in a given region are changed to the average gray level of all the pixels belonging to that region.
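This region-averaging step can be sketched as follows, assuming the three regions for thresholds \(T_1 < T_2\) are \([0, T_1]\), \((T_1, T_2]\) and \((T_2, 255]\):

```python
import numpy as np

def render_segmented(image, t1, t2):
    """Replace every pixel with the mean gray level of its region,
    for a three-level thresholding with thresholds t1 < t2."""
    img = np.asarray(image, dtype=float)
    out = img.copy()
    for mask in (img <= t1, (img > t1) & (img <= t2), img > t2):
        if mask.any():                 # skip empty regions
            out[mask] = img[mask].mean()
    return out
```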

Table 3.4 Representative optimal parameter sets \(\left(a_1, b_1, c_1, a_2, b_2, c_2\right)\)
Fig. 3.4
figure 4

The three-level thresholded images using DBBO-Fuzzy: a Lena, b Peppers, c Cameraman, d Airplane, e Lake, f Walking bridge, g Mandrill, h Barbara, i Boat, j Elaine, k GoldHill, l Fingerprint

Figures 3.5, 3.6 and 3.7 show some sample images under consideration and the resultant segmented images obtained after employing the DBBO-Fuzzy, BBO-Fuzzy and DE-Fuzzy algorithms. From the pictorial representations of the segmented images, it is clear that DBBO-Fuzzy algorithm emerges as the best performer.

Fig. 3.5
figure 5

The three-level thresholded images of peppers: a original image; b thresholded image using DBBO-Fuzzy; c thresholded image using BBO-Fuzzy; d thresholded image using DE-Fuzzy

Fig. 3.6
figure 6

The three-level thresholded images of boat: a original image; b thresholded image using DBBO-fuzzy; c thresholded image using BBO-fuzzy; d thresholded image using DE-fuzzy

Fig. 3.7
figure 7

The three-level thresholded images of goldhill: a original image; b thresholded image using DBBO-fuzzy; c thresholded image using BBO-fuzzy; d thresholded image using DE-fuzzy

6.3.1 Effect of the Mutation Strategy

To make a detailed, in-depth study of the DBBO-Fuzzy algorithm, it was tested employing different mutation schemes, described in Sect. 3.5, in order to investigate the effects of the mutation strategy on its performance. The results are reported in Table 3.5. To visualize the best performing scheme, the best values are given in boldface.

A closer look at Table 3.5 reveals that out of the \(12\) test images, the DBBO-Fuzzy algorithm with DE/best/1 mutation strategy emerged as the best candidate algorithm, since it could achieve the highest values of the fuzzy entropy in eight cases (i.e., Lena, Peppers, Cameraman, Airplane, Lake, Mandrill, Elaine and Fingerprint). The DBBO-Fuzzy algorithm with DE/rand/2 proved to be the winner in only two cases (i.e., Walking bridge and Boat). The DBBO-Fuzzy algorithm with DE/rand/1 and DE/current-to-best/1 mutation schemes perform best in only one case each (i.e., GoldHill and Barbara, respectively).

In terms of the best uniformity measure value, DBBO-Fuzzy with DE/rand/1 produced the highest values in \(6\) test images out of \(12\) images (i.e., Lena, Cameraman, Walking bridge, Mandrill, Boat and Fingerprint). For the remaining six images (i.e., Peppers, Airplane, Lake, Barbara, Elaine and GoldHill), the DBBO-Fuzzy algorithm with DE/current-to-best/1 mutation scheme achieved the highest uniformity measure values.

All the variations of the DBBO-Fuzzy algorithm were compared to a default DBBO-Fuzzy algorithm with DE/rand/1, and differences were reported as significant if a two-tailed t-test produced a t-value larger than the critical value. The significance level \(\alpha \) is set at \(0.05\).

Significant differences exist when comparing DBBO-Fuzzy with the DE/rand/1 mutation strategy against the variant using the DE/current-to-best/1 scheme for two test images (Airplane and Lake). The improvement in the mean objective function value obtained when using the DE/best/1 mutation strategy is more significant in the case of the Airplane image. For DBBO-Fuzzy with DE/rand/2 mutation, the test provides a statistically significant difference for the Lake image. In eight out of \(12\) cases, DBBO-Fuzzy with DE/best/2 mutation proves to be significantly different from the variant using DE/rand/1 mutation.

However, in general, it can be noted that the results using different mutation schemes do not significantly affect the performance of the proposed algorithm.

Table 3.5 Comparison of DBBO-fuzzy, BBO-fuzzy and DE-fuzzy, where the best values are given in boldface
Fig. 3.8
figure 8

Effect of varying the population size on the performance of DBBO-fuzzy: a Lena, b Peppers, c Cameraman, d Airplane, e Lake, f Walking bridge

Fig. 3.9
figure 9

Effect of varying the elitism parameter on the performance of DBBO-fuzzy: a Lena, b Peppers, c Cameraman, d Airplane, e Lake, f Walking bridge, g Mandrill, h Barbara, i Boat, j Elaine, k GoldHill, l Fingerprint

6.3.2 Effect of the Population Size

Simulations have also been carried out for different values of the population size \(NP\); the performance of the DBBO-Fuzzy algorithm for these variations is shown in Fig. 3.8.

In general, it can be inferred that the DBBO-Fuzzy algorithm with a population size of \(NP = 20\) outperforms the other variants, since it achieves the highest value of fuzzy entropy in \(11\) cases out of \(12\). In only one case does DBBO-Fuzzy with \(NP=30\) prove to be the best in terms of fuzzy entropy value. With \(NP=20\), the DBBO-Fuzzy variant achieves the highest value of the uniformity factor in eight cases. On the other hand, with \(NP = 10\), the DBBO-Fuzzy algorithm achieves the highest value of the uniformity factor in two cases (Airplane and GoldHill), and in one case each with \(NP=50\) (Cameraman) and \(NP=30\) (Elaine).

These results suggest that blindly increasing the population size may not have a relevant positive effect on the performance of the DBBO-Fuzzy algorithm.

6.3.3 Effect of the Elitism Parameter

The performance of the proposed algorithm is also evaluated in detail by varying the elitism parameter \(n_{elit}\). In general, one can observe from Fig. 3.9 that the average objective function values tend to increase with the number of elites, although the uniformity measure does not follow the same trend in all test images. In nine cases out of \(12\), DBBO-Fuzzy with \(n_{elit}=8\) produced the highest values of the fuzzy entropy. In two cases the variant with \(n_{elit}=6\), and in one case the variant with \(n_{elit}=4\), achieved the highest fuzzy entropy values. On the other hand, DBBO-Fuzzy without elitism (\(n_{elit}=0\)) never achieved the highest fuzzy entropy value, but it produced good uniformity factor values in three cases out of \(12\) (i.e., Airplane, Mandrill and Boat).

Finally, an important general remark: when there is a conflict between a higher fuzzy entropy value and a higher uniformity factor value, priority should be given to the solution with the higher uniformity factor, as it more directly reflects the quality of the output segmented image.

7 Conclusion

Image segmentation is the process of partitioning an image space into several homogeneous regions. Thresholding is one of the most widely used techniques in image segmentation because of its fast and easy application. However, thresholding often fails to produce satisfactory segmentation results due to grayness and spatial ambiguity in images. Fuzzy set theory provides an adequate mathematical tool for modelling this inherent vagueness. A new three-level thresholding algorithm based on the hybridization of the BBO and DE algorithms, called the DBBO-Fuzzy algorithm, has been described in this chapter. DBBO-Fuzzy uses the DE mutation strategy to improve the global search capability and to escape from local optima. The experimental results show that the proposed algorithm outperforms both the BBO and DE algorithms and achieves a high quality of the thresholded images.

The future work will mainly focus on employing a multiobjective approach for such image segmentation problems. Instead of considering a single objective function, a biobjective model could be adopted for the three-level thresholding problem, in which one seeks to optimize simultaneously the total fuzzy entropy and the uniformity factor.