Keywords

1 Introduction

In recent years, the intelligent systems that depend on machine learning and pattern recognition are widely used in numerous fields. These include the application of face and voice recognition, objects identification, computer vision, and so on. Nevertheless, the researchers are still working to improve the accuracy of these systems, especially when they are used in real-time environments. When these systems acquire their data from images, they should use image processing techniques to prepare and process the images to be able to identify and recognize the objects on them. Image segmentation is an essential phase in this stage. It works for splitting an image into segments with similar features (i.e., color, contrast, brightness, texture, and gray level) based on a predefined criterion [1]. Image segmentation has been applied in several applications such as medical diagnosis [2], satellite image [3], and optical character recognition [4]. However, it could be a complex process if the images are corrupted by noises from environments or equipment. There are many methods for applying image segmentation, such as edge detection [5], region extraction [6], histogram thresholding, and clustering algorithms [7]; as well as, threshold segmentation [8], it is one of the popular methods for performing this task to locate the best threshold value [9, 10]; it can be divided into two types: bi-level which can be used to produce two groups of objects and multilevel that used to segment complex images and separate pixels into multiple homogeneous classes (regions) based on intensity [1, 11]. Bi-level thresholding method can produce adequate outcomes in cases where the image includes two levels only, however, if it has been used with multilevel the computational time will be often high [12]. On the other hand, the results of bi-level thresholding are not suitable to real application images; so, there is a wide requirement to use multilevel thresholding [11]. There are two methods to determine the thresholds, namely, a global and local level. In a local level, thresholds are determined for each portion of the image; on the other hand, at a global level, one threshold is taken to the whole image [13]. So, by using the image histogram, the global thresholding can be determined. Several thresholding methods explore for the thresholds by optimizing some fitness functions that are defined from images and they handle the determined thresholds as parameters. So, the determination of optimal thresholds in multilevel thresholding is an NP-hard problem [14]. Many methods analyze the image histogram to determine the optimal thresholds, by either minimizing or maximizing a fitness function with consideration of the values of threshold.

When the number of thresholds is small, classical methods are acceptable; but if there are several threshold numbers, it is a best practice to perform a swarm intelligence (SI) technique to optimize this task, such as, genetic algorithm (GA), particle swarm optimization (PSO), firefly optimization (FFO), and bat algorithm.

Jie et al. (2013) [15] introduced a multi-threshold segmentation method that utilized k-means and firefly optimization algorithm (FA). The results showed that the proposed method obtained a low run-time and higher performance than the classical fast FCM and PSO-FFCM models. In the same effort, Chaojie et al. (2013) [16] proposed a method based on FA that outperformed GA algorithm.

Vishwakarma et al. (2014) [17] compared their proposed model that based on FA with the classical K-means clustering algorithm and the model achieved the best results. Sarkar (2011) [18] presented a technique based on differential evolution for multilevel thresholding using minimum cross entropy thresholding (MCET). It was applied to some of the real images and the results showed high efficiency than PSO and GA. Moreover, Fayad et al. [19] proposed a segmentation model based on ACO algorithm. It achieved good results and small errors in comparison to the ground truth. On the other hand, Abd ElAziz et al. [20] introduced a hybrid model that combined SSO and FA (FASSO) for image segmentation. It showed faster convergence and lower preprocessing time. The PSO and its edition [21,22,23,24,25,26] are implemented in image segmentation to locate the multilevel thresholding. Moreover, there are several swarm techniques that applied for segmentation including honey bee mating optimization (HBMO) [27], harmony search (HS) algorithm [28], cuckoo search (CS) [29], and artificial bee colony (ABC) [30, 31]. However, most of these techniques are either trapped on local optima or predefined control parameters such as GA, PSO, CS, and HS algorithms.

In this chapter, we present a new multilevel thresholding method for image segmentation method. The multilevel thresholding is considered as multi-objective optimization problem, in which the popular two image segmentation functions namely, Otsu’s and entropy are used as the fitness function which optimized by the whale optimization algorithm. The properties of these two functions are used to improve the accuracy of image segmentation via multilevel thresholding. The characteristics of the WOA are the ability of fast convergence. The rest of this chapter is organized as follows: Sect. 2 presents the materials and methods. Section 3 introduces the proposed method. Section 4 illustrates the experiments and discussions. The conclusion and future work are given in Sect. 5.

2 Materials and Methods

2.1 Problem Formulation

In this section, the multilevel thresholding problem definition is introduced, by considering an gray level image I contains \(K+1\) groups. Therefore, the \(t_k, k=1,\ldots ,K\) thresholds are needed to split I to subgroups \(C_k\) as in the following equation:

$$\begin{aligned} C_{0}&=\{I(i, j) \in \ I \mid 0 \le I\left( i, j\right) \le t_{1}- 1\} \nonumber \\ C_{1}&=\{I(i, j) \in \ I \mid t_{1} \le I\left( i, j\right) \le t_{2}- 1\}, \nonumber \\&\;\;\;\; \ldots \\ C_{K}&=\{I(i, j) \in \ I \mid t_{k} \le I\left( i, j\right) \le L- 1\},\nonumber \end{aligned}$$
(1)

where I(ij) is (ij)th pixel value and L is the gray levels of \(I \in [0, L-1]\).

The aim of the multilevel thresholding is to find the threshold values construct these groups \(C_k\), which can be determined by maximizing the following equation:

$$\begin{aligned} t_1^*,t_2^*,\ldots ,t_K^*=\max _{t_1,\ldots ,t_K} F({t_1,\ldots ,t_K}), \end{aligned}$$
(2)

where \(F({t_1,\ldots ,t_K})\) may be Kapur’s entropy or the Otsu’s function.

  • Otsu’s function:

This function is defined mathematically as

$$\begin{aligned} F_{Ots}= & {} \sum _{i=0}^K A_i(\eta _i-\eta _1)^2, \end{aligned}$$
(3)
$$\begin{aligned} A_i= & {} \sum _{j=t_i}^{t_{i+1}-1}P_j,\end{aligned}$$
(4)
$$\begin{aligned} \eta _i= & {} \sum _{j=t_i}^{t_{i+1}-1} i\frac{P_j}{A_j},\;\; where\;\; P_i= h_i/N, \end{aligned}$$
(5)

where \(\eta _1\) is the mean intensity of I with \(t_0 = 0\) and \(t_{K + 1} = L\). The \(h_i\) and \(P_i\) are the frequency and the probability of the ith gray level, respectively.

  • Kapur’s Entropy:

The Kapur’s entropy function determines the optimal threshold values through maximizing the overall entropy [32] that is defined as:

$$\begin{aligned} F_{Kap}=\sum _{i=0}^K(-\sum _{i=t_i}^{t_{i+1}-1}\frac{P_j}{A_j} ln(\frac{P_j}{A_j} ) ). \end{aligned}$$
(6)

2.2 Whale Optimization Algorithm (WOA)

The whale optimization algorithm (WOA) is a new meta-heuristic technique that mimics the Humpback whales [33]. In this technique, the optimization begins by producing a random population of whales. These whales search for the prey’s (optimum) location, then attach (optimize) them by one of these methods encircling or bubble-net.

In the encircling method [33] the Humpback whales improve their location based on the best location as follows:

$$\begin{aligned} \mathbf D =|\mathbf C \odot \mathbf X ^*(t) -\mathbf X(t) | \end{aligned}$$
(7)
$$\begin{aligned} \mathbf X (t+1)=|\mathbf X ^*(t) -\mathbf A \odot \mathbf D |, \end{aligned}$$
(8)

where \(\mathbf D \) describes the distance between the position vector of both the prey \(\mathbf X(t) ^*\) and a whale \(\mathbf X(t) \), and t denotes the current iteration number. \(\mathbf A \) and \(\mathbf C \) are coefficient vectors, and defined as follows:

$$\begin{aligned} \mathbf A =2\mathbf a \odot \mathbf r -\mathbf a \end{aligned}$$
(9)
$$\begin{aligned} \mathbf C =2\mathbf r , \end{aligned}$$
(10)

where r is a random vector \(\in [0, 1]\), and the value of \(\mathbf a \) is linearly decreased from 2 to 0 as iterations proceed.

Whereas the bubble-net method can be performed by two approaches. The first is the shrinking encircling that given by reducing the value of \(\mathbf a \) in equation (9), also, \(\mathbf A \) is reduced. The last is the spiral updating position. This method is applied to mimic the helix-shaped movement of Humpback whales around prey:

$$\begin{aligned} \mathbf X (t+1)=\mathbf D ' \odot e^{bl} \odot cos (2\pi l)+\mathbf X ^*(t), \end{aligned}$$
(11)

where \(\mathbf D ' =|\mathbf X ^*(t)-\mathbf X (t)|\) is the distance between the whale and prey, b is a constant for determining the shape of the logarithmic spiral, \(\odot \) is an element-by-element multiplication, and l is a random value in \([-1, 1]\).

The whales can swim around the victim through a shrinking circle and along a spiral-shaped path concurrently:

$$\begin{aligned} \mathbf{{X}}(t+1)=\left\{ \begin{array}{cl} \mathbf{{X}}^{*}(t)-\mathbf{{A}}\odot \mathbf {D} &{} if \;\; p\ge 0.5 \\ \mathbf{{D}}'\odot e^{bl} \odot cos (2\pi l)+\mathbf {X}^{*} (t) &{} if \;\; p< 0 .5 \end{array}\right. \end{aligned}$$
(12)

where \(p\in [0, 1]\) is a random value which describes the probability of choosing either the shrinking encircling method or the spiral model to adjust the position of whales.

In exploration phase, the Humpback whales search randomly for prey. The position of a whale is adjusted by determining a random search agent rather than the best search agent as follows:

$$\begin{aligned} \mathbf D =|\mathbf C \odot \mathbf X _{rand}-\mathbf X(t) | \end{aligned}$$
(13)
$$\begin{aligned} \mathbf X (t+1)=|\mathbf X _{rand}-\mathbf A \odot \mathbf D |, \end{aligned}$$
(14)

where \(\mathbf X _{rand}\) is a random position determined from the current population. Algorithm 1 illustrates the whole structure of the WOA.

3 The Proposed Method

In this section the proposed method for determining the multilevel thresholding values is introduced. In the first the fitness function is defined, based on the combination of the Otsu’s and Kapur entropy functions, as

$$\begin{aligned} Fit=\alpha F_{Ots}+\beta F_{Kap}, \end{aligned}$$
(15)

where \(\alpha \) and \(\beta \) are random values in the range [0, 1] and the parameters represent the balance between the two fitness functions.

The input to the proposed method is the image histogram, the number of whales N and the dimension of each whale position is the threshold level dim. The WOA starting by generating a random population of N solutions in the search domain [0, L] (here \(L=265\)), for each position the fitness function \(Fit_i\) is computed using equation (15). Then the fitness function \(F_{best}\) and its corresponding best whale position \(x_{best}\) are determined. Based on each value of decrease the parameter a from 2 to 0, the values of two parameters A and C are computed, then the position of each whale is updated based on the value of the parameter p as illustrated in Sect. 2.2. The previous steps are repeated until the stop criteria are satisfied, and the proposed method is shown in Algorithm 1.

figure a

4 Experiments and Discussion

In this section, the experimental environment for the proposed method is introduced. The image description is illustrated in the first, then the setting of the parameters for each algorithm and the measurements used to evaluate the quality of segmentation image is discussed.

4.1 Benchmark Images

The proposed methods used in this chapter are tested on four common grayscale images from the database of Berkeley University [34]. These images are called TestE1, TestE2, TestE3, and TestE7 as illustrated in Fig. 1.

Fig. 1
figure 1

Samples of the tested images, from left TestE1, TestE2, TestE3, and TestE7

4.2 Experimental Settings

The proposed method results are compared with four algorithms, namely, WOA, SSO, FA, and FASSO; these algorithms are previously proposed for multilevel image segmentation and introduced good results. To make the comparison process fair, the population size is 25, the dimension of each agent is the number of thresholds (m) and the same stopping criteria (maximum number of iterations is 100, with a total of 35 runs per algorithm). The parameters of each algorithm used in this paper are illustrated in Table 1.

The experiments were computed on using the following threshold numbers: 2, 3, 4, and 5. All of the methods are programmed in “Matlab 2014” and implemented on “Windows 64bit” environment on a computer having “Intel Core2Duo (1.66 GHz)” processor and 2 GB memory.

Table 1 The parameters setting of each algorithm

4.3 Segmented Image Quality Metrics

The accuracy of the segmented image is evaluated based on fitness function, time, peak signal-to-noise ratio (PSNR), and the structural similarity index (SSIM), where PSNR is defined as

$$\begin{aligned} PSNR=20log_{10}(\frac{255}{RMSE}),\;\; RMSE=\sqrt{\frac{\sum _{i=1}^N \sum _{j=1}^M (I(i,j)-\hat{I}(i,j))^2}{N.M}}, \end{aligned}$$
(16)

where I and \(\hat{I}\) are original and segmented images of size \(M \times N\), respectively. The high value of PSNR refers to the high performance of segmentation algorithm.

The SSIM is defined as

$$\begin{aligned} SSIM(I,\hat{I})=\frac{(2\mu _I \mu _{\hat{I}} +c_1)(2\sigma _{I,\hat{I}} +c_2)}{(\mu _I^2 +\mu _{\hat{I}}^2 +c_1)(\sigma _{I}^1+\sigma _{\hat{I}}^2 +c_2)}, \end{aligned}$$
(17)

where \(\mu _I\) (\(\mu _{\hat{I}}\)) and \(\sigma _{I}\) (\(\sigma _{\hat{I}}\)) are the mean intensity and the standard deviation of the image I (\(\hat{I}\)), respectively. The \(\sigma _{I,\hat{I}}\) is the covariance of I and \(\hat{I}\) and \(c_1=6.5025\) and \(c_2=58.52252\) are two constants [35]. The highest value of SSIM and PSNR indicates better performance (Figs. 2, 3, 4 and 5).

Fig. 2
figure 2

The average of results of measures overall the testing images

Fig. 3
figure 3

The result of segmentation TestE1 image using (from left to right) SSO, FASSO, FA, WOAMOP, and WOA

Fig. 4
figure 4

The result of segmentation TestE2 image using (from left to right) SSO, FASSO, FA, WOAMOP, and WOA

Fig. 5
figure 5

The result of segmentation TestE3 image using (from left to right) SSO, FASSO, FA, WOAMOP, and WOA

Table 2 The average of fitness function values and CPU process time of different segmentation techniques
Table 3 Selected thresholds of techniques
Table 4 The average PSNR and SSIM of different segmentation techniques

4.4 The Results and Discussions

The results of comparison between the proposed algorithm and other algorithms are illustrated in Tables 2, 3, 4 and Fig. 6.

In Table 2, the average results of fitness values and time(s) are computed at thresholds 2, 3, and 4. From this table and Fig. 6d we can conclude that, based on the fitness function (as a measure), in general the WOAMOP is the better algorithm than the SSO is in the second rank followed by the FASSO, FA, and WOA. However, at threshold level equal to the FA and SSO give results better than that obtained by WOAMOP, also at level three of segmentation, the FASSO is outperformed WOAMOP (very small difference). At the high-level thresholding (4 and 5) the WOAMOP is better than all other algorithms followed by SSO in the second rank. Also from this table and Fig. 6c the best algorithm based on the time elapsed is the proposed algorithm followed by FA (however, this very small difference).

Table 4 and Fig. 6a–b show the SSIM and PSNR values. From this table and 6b we can observe that, at \(K=2, 3,4\), and 5 the WOAMOP is better than all other algorithms (however, at \(k=2\) the difference between the algorithm is small). Also, the FA is in the second rank followed by SSO, FASSO, and WOA.

From all previous discussion we can conclude that the proposed method gives better performance based on the quality measures that used (PSNR, SSIM, time, and fitness function).

Fig. 6
figure 6

The result of segmentation TestE7 image using (from left to right) SSO, FASSO, FA, WOAMOP, and WOA

5 Conclusion and Future Work

Image recognition applications use image processing methods to prepare and process the images to be able to identify and recognize the objects on them. So, image segmentation techniques is an essential preprocessing step in several applications; it divides an image into segments with similar features based on a predefined criterion. In this chapter, a new multi-objective whale optimization algorithm (WOAMOP) was proposed for multi-thresholding image segmentation. The proposed method used the hybrid between the Kapur’s entropy and the Otsu’s function as a fitness function. The WOAMOP applied to determine the best solution (threshold values) and then used this thresholding values to divide the image. The experiment results of the proposed method were compared with four algorithms, namely, original WOA, FA, SSO, and FASSO. The WOAMOP achieved better results than all algorithms, and also it provides a faster convergence with relatively lower processing time. In future, the WOAMOP can be applied to other complex image segmentation problems such as color images.