1 Introduction

Agricultural products play a key role in many economies worldwide. However, plant diseases are common and occur naturally, and they are likely to adversely affect both the quantity and quality of agricultural products. The detection of plant diseases is therefore crucial [25].

The main causes of plant disease are viruses, bacteria and fungi. These different causes produce different visual symptoms, mainly on the plant leaves. Fortunately, these symptoms can be identified by the naked eye, so the conventional method for detecting plant diseases is visual inspection. Nonetheless, this method is not only time-consuming but also costly, since a well-trained team of experts is required to conduct this type of examination [25].

Automatic methods based on image processing have been proposed in the literature as alternatives that overcome the disadvantages of the traditional approach. In essence, an image of a diseased leaf consists of a background, a normal part and a diseased part with visual symptoms, such as a change in color [30]. Image segmentation is therefore a key step in plant disease detection: the diseased part of the leaf can be extracted and examined to identify the disease and, if required, its severity. Indeed, segmentation is a crucial, unavoidable step in image processing, and subsequent steps, such as feature extraction and pattern recognition, depend heavily on it [8].

In general, image segmentation divides an image into two or more parts according to pixel intensity, where pixels with similar intensity levels or similar properties belong to the same region. Although a wide range of image segmentation techniques exists in the literature, they are generally divided into five main categories: clustering-based, matching-based, region-based, edge-based and thresholding-based segmentation [8]. Thresholding is considered one of the most common segmentation techniques due to its robustness, simplicity and efficiency [20]. Thresholding-based segmentation methods fall into two broad types: bi-level and multilevel thresholding [20]. In bi-level thresholding, we seek a single threshold value that separates the image into background and object. In multilevel thresholding, we seek two or more threshold values that separate the image into distinct meaningful regions or objects. Consequently, multilevel thresholding demands more computation than bi-level thresholding, and the computational complexity increases exponentially with the number of thresholds [20]. In plant disease detection, multilevel thresholding is required rather than binary thresholding, since an image of a diseased leaf contains at least three distinct objects: the background, the normal part and the diseased part.

Optimization is an important branch of mathematics that aims at finding the optimal solution of a given problem under given constraints [10]. Thresholding is an optimization problem in which we seek the threshold values that optimize some objective function. In terms of the objective function, thresholding can be separated into two types: entropy-based thresholding and between-class variance-based thresholding. Kapur's entropy and Otsu's between-class variance are the most widely used objective functions [21]. In addition to these two, Renyi entropy and Tsallis entropy are other candidate objective functions for thresholding segmentation [19].

Deterministic optimization methods can be used for multilevel thresholding problems, but a major drawback is their computation time, particularly when the number of threshold values is high [20]. For instance, dynamic programming, one of the well-known deterministic optimization algorithms, was used in [17] to solve the thresholding segmentation problem with Otsu's objective function. However, like other deterministic optimization algorithms, its main limitation is the computation time, which increases exponentially with the number of threshold values [19]. In [19], a modified version of Otsu's between-class variance was used as the objective function for the deterministic algorithm of [17]; the modified objective function shares the same solution as the original one.

Metaheuristic algorithms are powerful optimization algorithms that are often used as efficient alternatives to traditional optimization techniques, overcoming their limitations [20]. Indeed, metaheuristic algorithms achieve better results than traditional optimization algorithms [6]. The literature contains innumerable metaheuristic algorithms, and new ones are constantly being developed [15]. Which of them is best depends on the application itself; an algorithm may show good results in one application but poor results in others [20]. Each algorithm also has its limitations: some have good exploitation ability while others have good exploration ability [6]. Hybrid optimization algorithms have been proposed in the literature to overcome the drawbacks of individual algorithms and combine their strengths.

In [21], nine metaheuristic algorithms were surveyed as multilevel segmentation algorithms. These nine bio-inspired algorithms were ABC, CS, the bat algorithm, social spider optimization, the firefly algorithm, the gray wolf algorithm, moth flame optimization, the whale optimization algorithm and particle swarm optimization. They were compared on four factors: the value of the objective function (both Otsu and Kapur were used), the peak signal-to-noise ratio (PSNR) [12], the structural similarity index (SSIM) [26] and CPU time.

In [1], five metaheuristic methods were used to solve the multilevel segmentation problem: particle swarm optimization, the culture algorithm, the genetic algorithm, the artificial tree algorithm and CS. They were compared in terms of computation time, the value of the objective function (the Levine and Nazif intra-class uniformity) and PSNR.

In [6], several stochastic algorithms, including ABC, the firefly algorithm, differential evolution, social spider optimization, particle swarm optimization, the whale optimization algorithm, moth flame optimization, gray wolf optimization, multiverse optimization, selfish herd optimization and harmony search, were applied and compared on the thresholding problem. Time complexity, fitness values, PSNR and SSIM were the performance factors used to assess the algorithms. In addition, Wilcoxon's rank-sum statistical test was carried out to analyze the results statistically.

In [4], a modified version of the particle swarm optimization algorithm was used for multilevel thresholding with the recursive minimum cross-entropy as the objective function. The proposed algorithm was compared with the original CS, particle swarm optimization, the genetic algorithm, the firefly algorithm and a modified ABC. The comparison was drawn in terms of the following performance factors: objective function values, computation time, misclassification error, the complex wavelet structural similarity index, the feature similarity index (FSIM) and PSNR.

In [7], a hybrid of ABC and the sine–cosine algorithm was implemented for multilevel segmentation with Otsu's function as the objective function. Its performance was compared with the original ABC and sine–cosine algorithms according to fitness values, CPU time, SSIM and PSNR. Additionally, [13] applied a novel metaheuristic algorithm, known as black widow optimization, to multilevel thresholding segmentation. Both Otsu and Kapur were used as objective functions, and the results were compared against those of six other metaheuristic optimization algorithms according to SSIM, PSNR, FSIM and the fitness function.

All of the studies mentioned above segmented gray-level images using multilevel thresholding. In contrast, the study reported in [11] used a metaheuristic algorithm called the efficient krill herd algorithm to segment color images via multilevel thresholding; Tsallis entropy, Kapur and Otsu were all used as objective functions. Similarly, color satellite images were segmented in [24] using multilevel thresholding. Masi entropy [18] was used as the objective function and compared with other well-known entropy-based objective functions, including Kapur, Tsallis and Renyi entropy; it showed better results for color satellite images than the other three entropy-based criteria. Performance was assessed based on PSNR, FSIM, SSIM, mean square error (MSE) and misclassification error.

The last study used satellite images, while the other studies mentioned above used benchmark images. The following studies address segmentation specifically for plant disease detection. In [25], color images of five different leaf diseases were segmented using a genetic algorithm, with the Euclidean distance between pixels and their corresponding cluster centers as the fitness function. The five types of leaf disease were frog eye leaf spot, early scorch, fungal disease, sun-burn disease and bacterial leaf spot. In [2], K-means clustering [9] was applied to segment color images of plant leaves into four clusters; different numbers of clusters were tested, and the best results were achieved with four. In [14], color images of diseased leaves were segmented using K-means clustering, with an additional segmentation step, based on Otsu's method, added to mask the mostly green-colored pixels.

In [27], a two-step segmentation of color images of diseased leaves was carried out, using Otsu's method as the first step and the Sobel operator as the second. In fact, Otsu's method has commonly been used in the literature to segment images of diseased plants after some preprocessing of the images. In [30], segmentation of color images of diseased leaves was implemented via super-pixel clustering followed by K-means clustering: super-pixel clustering was applied first, after which K-means clustering was performed on the super-pixel-segmented images.

The contribution of this study can be summarized as follows. First, we introduce a novel modified version of TLABC. Then, we apply it to multilevel thresholding color image segmentation and compare our results with four other closely related metaheuristic algorithms: ABC, TLBO, TLABC and CS. The proposed version of TLABC introduces two modifications to the standard TLABC algorithm: the use of Levy flight equations in the scout phase, and the addition of a fertile-area indicator to the framework strategy. These two modifications substantially improve the performance of the algorithm; the next section provides more details and explanations.

The rest of this paper is organized as follows: Section 2 explains the methodology and algorithms used in this work, Sect. 3 sheds light on the different metrics used to assess the performance of the algorithms, Sect. 4 presents the results and findings of this study with relevant discussion, and finally, Sect. 5 concludes the paper.

2 Methodology

Color image thresholding is treated as an optimization problem, and in this research metaheuristic algorithms are used to solve it. The objective function is Otsu's function, under which the optimal thresholds separate the image pixels into classes with minimal intra-class variance and maximal inter-class variance. That is, Otsu's method is a maximization problem: it searches for the thresholds that maximize the inter-class variance and, equivalently, minimize the intra-class variance. Metaheuristic algorithms depend on an iterative process to find the optimal solution. Each metaheuristic algorithm has its own way of generating and replacing solutions during the search, leading to different outputs. In this section, we introduce the ABC, CS, TLBO, TLABC and TLABC-with-Levy-flight algorithms in detail, the last being the new modified version of TLABC proposed in this work.

2.1 Thresholding problem

The image thresholding problem searches for the optimal m levels such that the original image is segmented into \((m+1)\) subimages. When m equals 1, it is called bi-level thresholding; when m is greater than one, it becomes multilevel thresholding. Thus, m-level thresholding is an optimization problem seeking the optimal vector \([t_1, t_2, \ldots, t_m]\) that optimizes some objective function (in our case, maximizing the Otsu function). That is, the original image I (with intensity \(f(x,y)\) at location \((x,y)\)) is segmented into \((m+1)\) subimages as follows:

$$\begin{aligned} \begin{array}{l} {I_0} = \{ f(x,y) \in I \mid 0 \le f(x,y) \le {t_1} - 1 \} \\ {I_1} = \{ f(x,y) \in I \mid {t_1} \le f(x,y) \le {t_2} - 1 \} \\ \qquad \vdots \\ {I_i} = \{ f(x,y) \in I \mid {t_i} \le f(x,y) \le {t_{i+1}} - 1 \} \\ \qquad \vdots \\ {I_m} = \{ f(x,y) \in I \mid {t_m} \le f(x,y) \le 255 \} \end{array} \end{aligned}$$
(1)

While Eq. (1) nominally applies to gray-level images, it applies equally to the red, green and blue channels of an RGB color image, where each channel is treated as an individual image.
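To make the objective concrete, the following is a minimal Python sketch (not the authors' MATLAB code) of evaluating Otsu's between-class variance for a candidate threshold vector on the 256-bin histogram of one channel; the function name and structure are our own.

```python
import numpy as np

def otsu_objective(hist, thresholds):
    """Between-class variance (Otsu) of the classes induced by
    'thresholds' on a 256-bin histogram; higher is better."""
    p = hist.astype(np.float64) / hist.sum()        # normalized histogram
    levels = np.arange(256)
    mu_total = np.dot(levels, p)                    # global mean intensity
    bounds = [0] + sorted(int(t) for t in thresholds) + [256]
    var_between = 0.0
    for lo, hi in zip(bounds[:-1], bounds[1:]):     # classes per Eq. (1)
        w = p[lo:hi].sum()                          # class probability
        if w > 0:
            mu = np.dot(levels[lo:hi], p[lo:hi]) / w  # class mean
            var_between += w * (mu - mu_total) ** 2
    return var_between

# Example: evaluate a 3-threshold candidate on one channel's histogram
# hist, _ = np.histogram(channel, bins=256, range=(0, 256))
# score = otsu_objective(hist, [60, 120, 180])
```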

2.2 ABC algorithm

Among swarm intelligence optimization algorithms, ABC is considered one of the most effective, owing to major advantages such as high exploration, simple implementation and few control parameters [16]. To further improve its performance and overcome limitations such as poor exploitation and slow convergence, many modified versions of the algorithm have been proposed in the literature; a recent modified version, along with other existing ones, was compared in [16].

The ABC algorithm divides the bee foraging task into three phases: employed and onlooker bees for exploitation, and scout bees for exploration. These three tasks lead the bees to converge toward the best food source by sharing information with each other. The colony is divided into two equal subsets, the employed bees and the onlookers, while a scout bee carries out a random search. An employed bee becomes a scout as soon as its food source is exhausted. Each task is described in detail as follows.

Initialization The initial population is randomly generated using Eq. (2).

$$\begin{aligned} x_i^j=x_{min}^j+rand(0,1)(x_{max}^j-x_{min}^j) \end{aligned}$$
(2)

Employed Bee Task Equation (3) updates the current solution, where \(k \in [1 , SN]\) and \(j \in [1 , D]\) are randomly chosen indices with \(k \ne i\), SN is the number of food sources and D is the dimension of the solution (in this paper, D is the number of levels). Each employed bee generates a candidate solution using Eq. (3), evaluates its fitness, and replaces the existing food source with the new one if the latter has a higher fitness value.

$$\begin{aligned} v_i^j=x_i^j+rand(0,1)(x_i^j-x_k^j) \end{aligned}$$
(3)

Onlooker Bee Task This task starts when the employed bee phase finishes. The onlooker bees in the hive compute a selection probability for each food source from the information shared by the employed bees; the probability is computed using Eq. (4).

$$\begin{aligned} P_i=\frac{Fit(i)}{\displaystyle \sum \limits _{i=1}^N Fit(i)} \end{aligned}$$
(4)

Scout Bee Task A food source is assumed to be exhausted if it has not improved for a threshold number of iterations. Its employed bee then becomes a scout, and a new food source is created randomly inside the search space using Eq. (2).
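For illustration, the following is a compact Python sketch of the full ABC loop built from the phase descriptions above; the parameter names and greedy-selection details are our assumptions, and classical ABC draws the perturbation factor of Eq. (3) from [-1, 1] rather than [0, 1].

```python
import numpy as np

def abc_minimize(f, lb, ub, sn=20, limit=50, iters=200, seed=None):
    """Bare-bones ABC sketch (minimization). f maps a D-vector to a cost,
    lb/ub are per-dimension bounds, sn is the number of food sources and
    'limit' is the abandonment threshold of the scout phase."""
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    d = lb.size
    x = lb + rng.random((sn, d)) * (ub - lb)             # Eq. (2): initialization
    cost = np.array([f(xi) for xi in x])
    trials = np.zeros(sn, dtype=int)

    def try_move(i):
        k = rng.choice([s for s in range(sn) if s != i])  # random partner, k != i
        j = rng.integers(d)                               # random dimension
        v = x[i].copy()
        v[j] += rng.random() * (x[i, j] - x[k, j])        # Eq. (3), rand(0,1) as in the text
        v = np.clip(v, lb, ub)
        cv = f(v)
        if cv < cost[i]:                                  # greedy selection
            x[i], cost[i], trials[i] = v, cv, 0
        else:
            trials[i] += 1

    for _ in range(iters):
        for i in range(sn):                               # employed bee phase
            try_move(i)
        # standard ABC fitness mapping, valid for costs of either sign
        fit = np.where(cost >= 0, 1.0 / (1.0 + cost), 1.0 + np.abs(cost))
        prob = fit / fit.sum()                            # Eq. (4): selection probability
        for i in rng.choice(sn, size=sn, p=prob):         # onlooker bee phase
            try_move(i)
        worst = trials.argmax()                           # scout bee phase
        if trials[worst] > limit:
            x[worst] = lb + rng.random(d) * (ub - lb)     # random restart, Eq. (2)
            cost[worst], trials[worst] = f(x[worst]), 0
    best = cost.argmin()
    return x[best], cost[best]
```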

2.3 CS algorithm

Due to its high efficiency in solving various types of optimization problems and real-world applications, CS has attracted great attention [28]. CS is defined by three idealized rules: (1) each cuckoo lays only one egg at a time in a randomly selected nest; (2) only the best nests are carried over to the next generations; (3) the number of available host nests is fixed, and a host bird discovers the egg laid by a cuckoo with probability \(pa \in [0, 1]\). New solutions are generated via Levy flights according to Eq. (5).

$$\begin{aligned} x_{ij}^{'}(t+1)=x_{ij}(t)+stepsize(t)\oplus Levy \end{aligned}$$
(5)
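Equation (5) relies on a heavy-tailed Levy step. A common way to sample it in CS implementations, though not spelled out in the text above, is Mantegna's algorithm; the following Python sketch reflects that assumption.

```python
import math
import numpy as np

def levy_step(dim, beta=1.5, rng=None):
    """Sample one Levy-flight step via Mantegna's algorithm;
    beta is the stability index (1.5 is the usual CS choice)."""
    rng = rng or np.random.default_rng()
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    sigma = (num / den) ** (1 / beta)
    u = rng.normal(0.0, sigma, dim)      # numerator draws
    v = rng.normal(0.0, 1.0, dim)        # denominator draws
    return u / np.abs(v) ** (1 / beta)   # heavy-tailed step lengths

# Eq. (5) in practice: x_new = x + alpha * levy_step(d) * (x - x_best),
# where alpha is a small step-size scale (e.g., 0.01) -- an implementation
# convention, not prescribed by the text above.
```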

2.4 TLBO algorithm

The TLBO algorithm mimics a teacher's effect on students. It consists of two stages: the teacher phase and the learner phase [22].

Teacher phase The best solution in the population is designated the teacher, since the teacher is considered the most knowledgeable person in the class. The teacher provides learners with information to raise the mean result of the class, using Eq. (6). This phase is responsible for the exploration process.

$$\begin{aligned} x_{i,d}^{new}=x_{i,d}^{old}+rand_2(x_{teacher,d}-T_F*x_{mean,d}) \end{aligned}$$
(6)

Learner phase Learners increase their knowledge by communicating with each other: whenever one student has more knowledge than another, the latter learns from the former, as expressed in Eq. (7). This phase is responsible for the exploitation process.

$$\begin{aligned} {u_s} = \left\{ \begin{array}{l} {x_s} + rand \cdot ({x_s} - {x_j}),\;if\;f({x_s}) \le f({x_j})\\ {x_s} + rand \cdot ({x_j} - {x_s}),\;if\;f({x_j}) < f({x_s}) \end{array} \right. \end{aligned}$$
(7)
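As a hedged illustration, one TLBO iteration implementing Eqs. (6) and (7) for a minimization problem might look as follows in Python; drawing the teaching factor \(T_F\) as 1 or 2 follows the standard TLBO formulation and is an assumption here.

```python
import numpy as np

def tlbo_iteration(x, cost, f, lb, ub, rng):
    """One TLBO iteration (minimization) over population x with costs."""
    n, d = x.shape
    teacher = x[cost.argmin()]                 # best solution acts as teacher
    mean = x.mean(axis=0)
    for i in range(n):                         # teacher phase, Eq. (6)
        tf = rng.integers(1, 3)                # teaching factor, 1 or 2
        v = np.clip(x[i] + rng.random(d) * (teacher - tf * mean), lb, ub)
        cv = f(v)
        if cv < cost[i]:
            x[i], cost[i] = v, cv
    for i in range(n):                         # learner phase, Eq. (7)
        j = rng.choice([s for s in range(n) if s != i])
        sign = 1.0 if cost[i] <= cost[j] else -1.0
        v = np.clip(x[i] + sign * rng.random(d) * (x[i] - x[j]), lb, ub)
        cv = f(v)
        if cv < cost[i]:
            x[i], cost[i] = v, cv
    return x, cost
```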

2.5 TLABC algorithm

TLABC is a hybridization of the TLBO and ABC algorithms that combines the advantages of both (the exploration of ABC and the exploitation of TLBO). It employs three hybrid search phases, as follows [5].

Teaching-Based Employed Bee Phase Here, each employed bee searches for a new food source using a hybrid of the TLBO teaching strategy and the mutation operator of differential evolution, which greatly diversifies the search behavior and improves the search ability of TLABC.

Learning-Based Onlooker Bee Phase In this stage, an onlooker bee chooses a food source to exploit according to the selection probability and then searches for new food sources using a hybrid search strategy, as given in Eq. (8).

$$\begin{aligned} {u_{i,d}} = \left\{ \begin{array}{l} x_{i,d}^{old} + ran{d_2}({x_{teacher,d}} - {T_F} \cdot {x_{mean,d}}),\;if\;ran{d_1} < 0.5\\ {x_{r1,d}} + F \cdot ({x_{r2,d}} - {x_{r3,d}}),\;otherwise \end{array} \right. \end{aligned}$$
(8)

Generalized Oppositional Scout Bee Phase In this stage, if a food source cannot be improved for a specified period, it is considered exhausted and is abandoned. A random candidate solution and its generalized oppositional solution, given by Eq. (9), are then generated, and the better of the two replaces the old exhausted food source.

$$\begin{aligned} x_{i,j}^{GO} = k \cdot ({a_j} + {b_j}) - {x_{i,j}} \end{aligned}$$
(9)
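A minimal Python sketch of the generalized oppositional solution of Eq. (9), assuming \(a_j\) and \(b_j\) are the per-dimension lower and upper bounds of the current population and k is a random scalar in [0, 1] (a common convention; the function name is ours):

```python
import numpy as np

def generalized_opposition(x, a, b, rng=None):
    """Generalized oppositional solution of x per Eq. (9): k*(a + b) - x,
    with a/b the per-dimension population bounds and k ~ U(0, 1)."""
    rng = rng or np.random.default_rng()
    k = rng.random()
    return k * (a + b) - x
```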

2.6 The proposed modification of TLABC algorithm (MTLABC)

TLABC achieves a good balance between exploration and exploitation, as described above. Yet the modification of TLABC proposed in this paper significantly improves its efficiency. TLABC has been modified in previous work, where the framework of TLABC was maintained while the search equations of the onlooker and employed phases were changed [23]. The modification proposed here is, in essence, a low-level integrative hybridization between TLABC and the Levy flight strategy. In fact, the good exploration ability of the ABC algorithm has attracted a considerable number of researchers to hybridize it with many other swarm algorithms, as mentioned previously; some keep both the search equations and the ABC framework, while others keep just the framework. Since the TLABC hybridization kept only the framework, in this paper we also modify the search equation used in the scout phase.

This modification enhances the exploitation of TLABC in the generalized oppositional scout bee phase: the solution created randomly using Eq. (2) is replaced by a solution created with the Levy flight update of Eq. (5) around the fertile area of the search, rather than around the old exhausted food source. The scout phase of the original ABC and TLABC generates solutions randomly to strengthen exploration. However, the exploration of ABC is already high, so we trade some exploration for exploitation by generating solutions near the best solution instead of randomly, thereby improving exploitation and achieving a better balance between the two.

The second modification is the addition of a fertile-area indicator. An indicator is attached to each solution and is incremented whenever that solution improves; the solution whose indicator is highest is thus taken to lie in the fertile area. The pseudo-code for these changes is shown in Algorithm (1), and the parameter settings of the MTLABC algorithm are listed in Table 1.

Algorithm 1 Pseudo-code of the proposed modification
Table 1 Parameter settings of the MTLABC algorithm
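Pending the exact details of Algorithm 1, the modified scout phase might be sketched as follows in Python; the 0.01 step scale and the variable names are our assumptions, and levy_step refers to the Mantegna sampler sketched in Sect. 2.3.

```python
import numpy as np

def mtlabc_scout(x, cost, trials, fertile, f, lb, ub, limit, rng):
    """Hedged sketch of the modified scout phase. Instead of a random
    restart (Eq. (2)), an exhausted food source is regenerated by a Levy
    flight (Eq. (5)) around the solution whose fertile-area indicator is
    highest. 'fertile' counts, per solution, how many times it improved
    (incremented elsewhere in the main loop)."""
    worst = int(trials.argmax())
    if trials[worst] > limit:
        best = int(fertile.argmax())                  # highest indicator = fertile area
        step = 0.01 * levy_step(x.shape[1], rng=rng)  # assumed step scale
        v = np.clip(x[best] + step * (x[worst] - x[best]), lb, ub)
        x[worst], cost[worst], trials[worst] = v, f(v), 0
    return x, cost, trials
```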

3 Image quality assessment (IQA) metrics

Loss of information, together with significant degradation in image quality, occurs at different stages of an image's life cycle, including acquisition, transmission, restoration, processing and compression [12, 26]. Therefore, various methods, known as image quality assessment (IQA) methods, have been developed in the literature to assess the quality of an image after it undergoes some type of distortion. In general, IQA methods divide into two main approaches: subjective and objective. In subjective methods, the image quality is evaluated by a human; however, this is costly, time-consuming and subjective (the outcome can differ according to the person evaluating the image). Objective approaches, on the other hand, assess the image according to quantitative measures, so the assessment can be done automatically using algorithms and metrics. Some of the most widely used objective IQA metrics are the PSNR, SSIM, FSIM and color feature similarity (FSIMc) indices [29].

3.1 PSNR index

PSNR, which is a function of the MSE, is considered the simplest IQA metric. It is straightforward to compute, mathematically convenient for optimization and has a clear physical meaning. The MSE measures the mean-squared error between the intensities of the reference image pixels (in our case, the image before segmentation) and the distorted image pixels (the image after segmentation). The PSNR index is calculated according to Eq. (10).

$$\begin{aligned} PSNR({I_R},{I_S}) = 10{\log _{10}}\left( {\frac{{{{255}^2}}}{{MSE({I_R},{I_S})}}} \right) \end{aligned}$$
(10)

where \(I_R\) and \(I_S\) are the original image and the segmented image, respectively. The MSE is obtained using Eq. (11).

$$\begin{aligned} MSE({I_R},{I_S}) = \frac{1}{{m*n}}\sum \limits _{i = 1}^m {\sum \limits _{j = 1}^n {{{[{I_R}(i,j) - {I_S}(i,j)]}^2}} } \end{aligned}$$
(11)

where the size of the image is \(m \times n\); the MSE averages the squared differences between the original and segmented images.

The PSNR index measures the quality of the segmented image relative to the original one in decibels. As inferred from Eq. (10), when the MSE approaches zero, the PSNR goes to infinity; consequently, high PSNR values indicate a high-quality segmented image and vice versa [12]. Although the MSE and PSNR are easy to compute, both fail to capture the structural content of the image; in fact, two images with different levels of degradation can have the same MSE (PSNR) value.
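A direct Python sketch of Eqs. (10) and (11) for 8-bit images (the function name is ours):

```python
import numpy as np

def psnr(ref, seg):
    """PSNR in dB between a reference and a segmented 8-bit image,
    per Eqs. (10) and (11)."""
    ref = ref.astype(np.float64)
    seg = seg.astype(np.float64)
    mse = np.mean((ref - seg) ** 2)    # Eq. (11)
    if mse == 0:
        return float('inf')            # identical images
    return 10 * np.log10(255.0 ** 2 / mse)  # Eq. (10)
```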

3.2 SSIM index

As mentioned before, the MSE and PSNR fail to discriminate the structural content of an image. Natural images, however, are highly structured: neighboring pixels are strongly dependent, and these dependencies carry the structure of the objects in the image. The SSIM index therefore measures the difference between the structural content of the original image and the distorted image (here, the segmented image), imitating how the human visual system (HVS) assesses image quality based on image structure. The SSIM index models image distortion as three distinct factors: luminance distortion, contrast distortion and correlation (structure) distortion. It is then defined as follows:

$$\begin{aligned} SSIM({I_R},{I_S}) = L({I_R},{I_S}).C({I_R},{I_S}).S({I_R},{I_S}) \end{aligned}$$
(12)
$$\begin{aligned} L({I_R},{I_S}) = \frac{{2 \cdot {\mu _{{I_R}}}{\mu _{{I_S}}} + {c_1}}}{{\mu _{{I_R}}^2 + \mu _{{I_S}}^2 + {c_1}}} \end{aligned}$$
(13)
$$\begin{aligned} C({I_R},{I_S}) = \frac{{2 \cdot {\sigma _{{I_R}}}{\sigma _{{I_S}}} + {c_2}}}{{\sigma _{{I_R}}^2 + \sigma _{{I_S}}^2 + {c_2}}} \end{aligned}$$
(14)
$$\begin{aligned} S({I_R},{I_S}) = \frac{{{\sigma _{{I_R}{I_S}}} + {c_3}}}{{{\sigma _{{I_R}}}{\sigma _{{I_S}}} + {c_3}}} \end{aligned}$$
(15)

where \(I_R\) and \(I_S\) are the original (reference) image and the segmented image, respectively. \(L(I_R,I_S)\) is the luminance function, which compares the luminance means of the original and segmented images; it is maximal and equal to 1 when \({\mu _{{I_R}}} = {\mu _{{I_S}}}\). \(C(I_R,I_S)\) is the contrast function, which measures the contrast similarity between the two images; it is maximal and equal to 1 when their standard deviations are equal, \({\sigma _{{I_R}}} = {\sigma _{{I_S}}}\). \(S(I_R,I_S)\) is the correlation or structural function, which measures the correlation between the two images, where \({\sigma _{{I_R}{I_S}}}\) is the covariance between the original and segmented images. The higher the SSIM index, the higher the quality of the segmented image. The three constants in the above equations ensure stability by keeping the denominators nonzero.
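As a simplified sketch, the SSIM of Eqs. (12)-(15) can be computed globally over whole images as follows; note that practical SSIM implementations compute these statistics over local windows and average the resulting map, and the constants (with the common choice \(c_3 = c_2/2\), which collapses the three factors into two terms) follow convention rather than this paper.

```python
import numpy as np

def ssim_global(ref, seg, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    """Single-window SSIM per Eqs. (12)-(15), with c3 = c2/2 so that
    L*C*S reduces to the familiar two-term formula."""
    ref = ref.astype(np.float64).ravel()
    seg = seg.astype(np.float64).ravel()
    mu_r, mu_s = ref.mean(), seg.mean()
    var_r, var_s = ref.var(), seg.var()
    cov = ((ref - mu_r) * (seg - mu_s)).mean()   # covariance sigma_{RS}
    return ((2 * mu_r * mu_s + c1) * (2 * cov + c2)) / \
           ((mu_r ** 2 + mu_s ** 2 + c1) * (var_r + var_s + c2))
```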

3.3 FSIM index

FSIM is another IQA metric; it evaluates image quality based on low-level features, just as the HVS does. These low-level features are the phase congruency (PC) and the gradient magnitude (GM) of the image. The primary feature of the FSIM index is the PC, a dimensionless indicator of the significance of local structure. Although the PC is a reasonable model of how the HVS identifies image features, it is contrast-invariant, whereas the HVS perception of image quality also depends on contrast information. The GM is therefore the secondary feature of the FSIM index: it takes local contrast into account and plays a complementary role to the PC in characterizing the local structure of the image [29]. Computing the FSIM index consists of two steps:

Step 1: The local similarity map between the two images (the original and segmented images) is calculated. The PC and GM are the two components determining this map. The similarity between the original and segmented images in terms of their PC maps is calculated as in Eq. (16).

$$\begin{aligned} {S_{PC}} = \frac{{2 \cdot P{C_1} \cdot P{C_2} + {T_1}}}{{PC_1^2 + PC_2^2 + {T_1}}} \end{aligned}$$
(16)

where \(PC_1\) and \(PC_2\) are the PC maps of the original and segmented images, and \(T_1\) is a positive constant that increases stability and keeps the denominator nonzero (the value of \(T_1\) depends on the dynamic range of the PC). The values of \(S_{PC}\) lie in the range (0, 1]: values near zero indicate that the PC maps of the two images differ, and values near one indicate that they are similar. Likewise, the similarity between the original and segmented images in terms of their GM is calculated according to Eq. (17).

$$\begin{aligned} {S_G} = \frac{{2 \cdot {G_1} \cdot {G_2} + {T_2}}}{{G_1^2 + G_2^2 + {T_2}}} \end{aligned}$$
(17)

where \(G_1\) and \(G_2\) are the GM maps of the original and segmented images, and \(T_2\), like \(T_1\), is a positive constant chosen according to the dynamic range of the GM.

Step 2: The two components, the PC and GM, are combined to come up with a single value according to Eq. (18).

$$\begin{aligned} {S_L} = {({S_{PC}})^\alpha } \cdot {({S_G})^\beta } \end{aligned}$$
(18)

where \(S_L\) is the combined similarity between the original and segmented images, and \(\alpha\) and \(\beta\) are two parameters weighing the relative importance of the PC and GM. For simplicity, both \(\alpha\) and \(\beta\) are normally set to 1. After obtaining \(S_L\), the overall similarity between the original and segmented images is:

$$\begin{aligned} FSIM = \frac{{\sum \nolimits _{x \in \Omega } {{S_L}(x) \cdot P{C_m}(x)} }}{{\sum \nolimits _{x \in \Omega } {P{C_m}(x)} }} \end{aligned}$$
(19)

where \(\Omega\) is the whole spatial domain of the image. \(S_L\) at each location x is weighted by \(PC_m(x)=max(PC_1(x),PC_2(x))\): according to the HVS, different locations contribute differently to the similarity between two images, and the PC value measures the significance of each location. Like SSIM and PSNR, the higher the FSIM index, the better the quality of the segmented image.

3.4 FSIMc index

The FSIM index was developed for grayscale images or the luminance channel of color images. It can easily be extended to color images by incorporating chrominance information; this extension is known as FSIMc. First, the RGB color image is transformed into the YIQ color space, where Y carries the luminance information and I and Q carry the chrominance information. The transformation is given by Eq. (20).

$$\begin{aligned} \left[ \begin{array}{c} Y\\ I\\ Q \end{array} \right] = \left[ \begin{array}{ccc} 0.299 & 0.587 & 0.114\\ 0.596 & -0.274 & -0.322\\ 0.211 & -0.523 & 0.312 \end{array} \right] \left[ \begin{array}{c} R\\ G\\ B \end{array} \right] \end{aligned}$$
(20)
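A small Python sketch of the transformation in Eq. (20):

```python
import numpy as np

# Eq. (20): RGB -> YIQ transform matrix
RGB2YIQ = np.array([[0.299,  0.587,  0.114],
                    [0.596, -0.274, -0.322],
                    [0.211, -0.523,  0.312]])

def rgb_to_yiq(img):
    """Transform an H x W x 3 RGB image to YIQ; the Y channel carries
    luminance, while I and Q carry chrominance."""
    return img.astype(np.float64) @ RGB2YIQ.T
```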

We can compare the similarity between the chromatic channels of the original and segmented images as follows:

$$\begin{aligned} {S_I}(x) = \frac{{2 \cdot {I_1}(x) \cdot {I_2}(x) + {T_3}}}{{I_1^2(x) + I_2^2(x) + {T_3}}} \end{aligned}$$
(21)
$$\begin{aligned} {S_Q}(x) = \frac{{2 \cdot {Q_1}(x) \cdot {Q_2}(x) + {T_4}}}{{Q_1^2(x) + Q_2^2(x) + {T_4}}} \end{aligned}$$
(22)

where \(I_1\) and \(Q_1\) are the chromatic channels of the original image, and \(I_2\) and \(Q_2\) are those of the segmented image. \(T_3\) and \(T_4\) are, like \(T_1\) and \(T_2\), positive constants; since the two chromatic components I and Q have roughly the same dynamic range, \(T_3\) and \(T_4\) can be set equal. The two similarity components, \(S_I\) and \(S_Q\), are then combined into a single value according to Eq. (23).

$$\begin{aligned} {S_C}(x) = {S_I}(x) \cdot {S_Q}(x) \end{aligned}$$
(23)

Lastly, the FSIMc can be defined as

$$\begin{aligned} FSIMc = \frac{{\sum \nolimits _{x \in \Omega } {{S_L}(x)} \cdot {{[{S_C}(x)]}^\lambda } \cdot P{C_m}(x)}}{{\sum \nolimits _{x \in \Omega } {P{C_m}(x)} }} \end{aligned}$$
(24)

where \(\lambda\) is a positive parameter weighing the significance of the two chromatic components.
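Assuming the phase-congruency and gradient-magnitude maps have already been computed (PC itself requires a log-Gabor filter bank, which is beyond this sketch), Eqs. (16)-(24) combine as follows in Python; the constant values shown are common FSIM settings and are assumptions here, not taken from this paper.

```python
import numpy as np

def fsimc(pc1, pc2, g1, g2, i1, i2, q1, q2,
          t1=0.85, t2=160, t3=200, t4=200, lam=0.03):
    """Combine precomputed phase-congruency (pc*), gradient-magnitude
    (g*) and chrominance (i*, q*) maps into the FSIMc score of Eq. (24)."""
    s_pc = (2 * pc1 * pc2 + t1) / (pc1 ** 2 + pc2 ** 2 + t1)   # Eq. (16)
    s_g = (2 * g1 * g2 + t2) / (g1 ** 2 + g2 ** 2 + t2)        # Eq. (17)
    s_l = s_pc * s_g                                           # Eq. (18), alpha = beta = 1
    s_i = (2 * i1 * i2 + t3) / (i1 ** 2 + i2 ** 2 + t3)        # Eq. (21)
    s_q = (2 * q1 * q2 + t4) / (q1 ** 2 + q2 ** 2 + t4)        # Eq. (22)
    s_c = s_i * s_q                                            # Eq. (23)
    pc_m = np.maximum(pc1, pc2)                                # per-pixel weight
    # abs() guards against negative chromatic similarities before the
    # fractional power, a common implementation safeguard
    return (s_l * np.abs(s_c) ** lam * pc_m).sum() / pc_m.sum()  # Eq. (24)
```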

4 Experimental results

In this section, we present the results of applying five metaheuristic optimization algorithms to the multilevel segmentation problem: ABC, CS, TLBO, TLABC and MTLABC. The first four are well-known metaheuristic optimization algorithms, while the last is the modified version of TLABC proposed in this paper. Eight color images of plant diseases were selected randomly from the New Plant Diseases Dataset available at [3] and used as the benchmark for comparing the five algorithms. Figure 1 shows these eight color images together with their histograms. As shown in Fig. 1, the images have different levels of complexity: one is unimodal (with one peak), some are bimodal (with two peaks) and the others are multimodal. This variety in the number of histogram peaks makes finding the optimal threshold values a difficult task for the metaheuristic algorithms. Each of the five algorithms was run 50 times, and the averages over these 50 runs were used for comparison.

Fig. 1 Eight images and their histograms

To compare the performance of the five optimization algorithms, five performance criteria were used: the objective function value, CPU time, PSNR, SSIM and FSIMc. The experiments were conducted in MATLAB R2018b on a laptop with an Intel Core i7-2670QM CPU (2.20 GHz) and 6 GB of RAM.

4.1 Objective function

As mentioned before, Otsu's function was used as the objective function of the metaheuristic optimization algorithms with different numbers of thresholds. The objective function was computed for each of the three channels of the RGB color images individually. Table 5 shows the values of the objective function for the red channel of the eight images using different numbers of thresholds (k = 5, 7, 9, 11, 13, 15, 17 and 20, where k is the number of thresholds). Likewise, Tables 6 and 7 show the values of the objective function for the green and blue channels. For each image, the best results at every threshold number are boldfaced. In Table 5, MTLABC achieved the best results in 43 cases out of 64, while CS obtained the best results in 21 out of 64. In Table 6, MTLABC obtained the best results in 48 out of 64, while CS achieved the best results in 16 out of 64. In Table 7, MTLABC achieved the best results in 54 out of 64, while CS achieved the best results in 10 out of 64. It is clearly seen that CS was usually the best when the number of thresholds was small, while MTLABC was the best when the number of thresholds was large. Figure 2 compares MTLABC and CS, which were consistently the best two algorithms in terms of the objective function. At each level, there are eight color images with red, green and blue channels; since we treat these three channels as individual images, there are 24 images (24 experiments) at each level. From Fig. 2, we see that CS is the best in most of the 24 experiments at low levels (k = 5 and 7), while MTLABC is the best in most, and often all, of the 24 experiments at high levels (k = 9, 11, 13, 15, 17 and 20). Therefore, MTLABC performs well in high dimensions, and as the dimension (number of levels) increases, MTLABC outperforms the other four algorithms more significantly.

Fig. 2 Number of experiments at each level in which MTLABC or CS achieved the best objective function value (at each level there are eight images, each with red, green and blue channels treated as individual images, so there are 24 experiments per level)

Figure 3 displays the convergence curves of the five algorithms for the red channel of the first image at all levels. The convergence curves show how the average value of the objective function changes with the number of iterations; each algorithm was run 50 times, and the averages over the runs are reported. According to Fig. 3, MTLABC achieves the best (fastest) convergence among the five optimization algorithms, while CS and ABC exhibit the slowest convergence. As a result, the modifications we added to TLABC improve the convergence of the algorithm, not to mention its accuracy.

Fig. 3 Convergence curves of TLBO, TLABC, ABC, CS and MTLABC applied to the first image (red channel) with different numbers of levels (k = 5, 7, 9, 11, 13, 15, 17, 20)

4.2 CPU time

As mentioned before, each of the five algorithms was run 50 times; the mean CPU time in seconds, together with the standard deviation over the 50 runs at level 5, is shown as an example in Table 2. As shown in Table 2, ABC has the lowest mean CPU time for all eight images, since ABC involves fewer computations than the other algorithms. On the other hand, the computation time of MTLABC, the modified version of TLABC, is comparable to that of TLABC, even though the results and convergence of MTLABC are better.

Table 2 Mean CPU time in seconds at level 5 (standard deviation)

4.3 PSNR

As stated previously, a high PSNR value indicates a high-quality segmented image and, accordingly, a well-performing optimization algorithm. Table 8 shows the PSNR values for the five algorithms at the different numbers of thresholds, with the best results boldfaced. As shown in Table 8, MTLABC performed best, in terms of the PSNR index, in 59 cases out of 64, while CS was best in the remaining five. Note also that MTLABC was always the best when the number of thresholds was high (k = 11, 13, 15, 17 and 20).

4.4 SSIM

As with PSNR, a high SSIM index indicates a well-performing optimization algorithm. Table 9 shows the SSIM values for the five algorithms at the different numbers of thresholds, with the best results boldfaced. MTLABC achieved the highest SSIM values in 61 cases out of 64, while CS achieved the best performance in the remaining three. As with the PSNR index, MTLABC was always the best when the number of thresholds was high (k = 9, 11, 13, 15, 17 and 20).

4.5 FSIM

The color extension of FSIM was used as another performance metric; as with the PSNR and SSIM indices, well-performing optimization algorithms show high FSIMc values. Table 10 shows the FSIMc values for the five algorithms at the different numbers of thresholds. MTLABC performed best in 60 cases out of 64, while CS achieved the best performance in the remaining four. As with the other metrics, MTLABC was always the best when the number of thresholds was high (k = 11, 13, 15, 17 and 20).

4.6 Statistical analysis

Because metaheuristic optimization algorithms are stochastic, statistical tests are needed to establish statistically significant differences between the algorithms used in this study.

In this paper, the Friedman test and the Wilcoxon signed-rank test, which are two widely used nonparametric hypothesis tests, were used to test whether or not a significant difference between the used optimization algorithms exists. In particular, we want to show that MTLABC, which is the proposed optimization algorithm in this paper, is statistically different from the other optimization algorithms proposed in the literature and used in this paper.

In the Friedman test, the null hypothesis, H0, is that the medians of the optimization algorithms are equal, while the alternative hypothesis, H1, is that their medians differ. Therefore, rejecting the null hypothesis in favor of the alternative demonstrates that the optimization algorithms are different from each other. The significance level of the test, \(\alpha\), is defined as the probability of rejecting the null hypothesis when it is in fact true; as the significance level goes down, it becomes harder to reject the null hypothesis. The smallest significance level at which the null hypothesis can still be rejected is known as the p value, which indicates the level of confidence when rejecting the null hypothesis: the smaller the p value, the more confidently we reject it. Whenever the p value for a sample is less than the significance level of the test, the null hypothesis is rejected.

We applied the Friedman test to the three channels of the RGB color images individually; in other words, the red, green and blue channels of each image were treated as three different images. Table 3 shows the p values along with the Chi-square \(\chi^2\) values of the Friedman test at different threshold numbers. The 0.95 quantile (the critical value at significance level 0.05) of the \(\chi^2\)-distribution with \(5-1=4\) degrees of freedom is 9.488, and all the \(\chi^2\) values in Table 3 are considerably greater than this critical value. Hence the null hypothesis is confidently rejected, and the optimization algorithms are shown to differ; likewise, the very small p values mean that we can reject the null hypothesis with high confidence. We conclude that the five population-based metaheuristic algorithms are not the same according to the Friedman test, which indicates that the proposed MTLABC might differ from the other four; a follow-up pairwise test is needed to confirm this.

Table 3 P values and Chi-square values of Friedman test of the red, green and blue channels at different levels
Table 4 P values of the Wilcoxon signed-rank test of the red, green and blue channels

The Wilcoxon signed-rank test was used as the follow-up pairwise test to statistically validate the results and demonstrate a significant difference between the proposed MTLABC algorithm and the optimization algorithms from which it is derived. The test compared MTLABC pairwise with each of the other four optimization algorithms used in this paper. If the p value of this test is less than 0.05, then a statistically significant difference exists between the two compared algorithms; otherwise, the test fails to prove any significant difference. Table 4 shows the p values of comparing MTLABC against TLABC, TLBO, ABC and CS using the Wilcoxon signed-rank test. The p values are much smaller than 0.05, demonstrating the significant difference between MTLABC and the other algorithms.
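As an illustration of this testing procedure (with stand-in random data, not the paper's results), the two tests can be run as follows using SciPy:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical stand-in scores: 24 experiments per algorithm
scores = {name: rng.normal(loc, 0.5, 24)
          for name, loc in [('ABC', 0.0), ('TLBO', 0.1), ('CS', 0.3),
                            ('TLABC', 0.2), ('MTLABC', 0.6)]}

# Friedman test across all five algorithms (null: equal medians)
chi2, p = stats.friedmanchisquare(*scores.values())
print(f'Friedman: chi2 = {chi2:.2f}, p = {p:.2e}')

# Follow-up pairwise Wilcoxon signed-rank tests: MTLABC vs. each rival
for name in ('ABC', 'TLBO', 'CS', 'TLABC'):
    w, p = stats.wilcoxon(scores['MTLABC'], scores[name])
    print(f'MTLABC vs {name}: p = {p:.2e}')
```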

Table 5 Averages of the Otsu objective function (red)
Table 6 Averages of the Otsu objective function (green)
Table 7 Averages of the Otsu objective function (blue)
Table 8 PSNR values of ABC, TLBO, CS, TLABC and MTLABC
Table 9 SSIM values of ABC, TLBO, CS, TLABC and MTLABC
Table 10 FSIMc values of ABC, TLBO, CS, TLABC and MTLABC

5 Discussion and conclusion

Thresholding is a widely used approach to image segmentation owing to its robustness, simplicity and efficiency. In essence, thresholding is an optimization problem: we seek the threshold values, with respect to some objective function, that efficiently separate the distinct objects in an image. Otsu's function is one of the most commonly used objective functions for this problem. As with any optimization problem, multiple solution methods exist, and the best option is the one that suits the problem at hand. In general, there are two broad types of optimization algorithms, deterministic and stochastic; the latter is used whenever the problem is complicated, as in our case, and deterministic algorithms fail or take an unreasonable amount of time. In this work, we used five different metaheuristic optimization algorithms to solve the multilevel thresholding segmentation problem with Otsu's function as the objective function. These five stochastic algorithms are ABC, CS, TLBO, TLABC and the modified version of TLABC proposed in this paper. The benchmark consists of eight color images of plant leaves exhibiting some kind of plant disease. Normally, a plant disease image comprises three to five distinct parts.

Nevertheless, we tried multiple numbers of levels and reported the results of the five algorithms at each level. The objective function values and three commonly used image quality assessment metrics were used to evaluate and compare the five stochastic algorithms. Our proposed algorithm showed the best results for most images and most levels in terms of these four performance measures. In fact, MTLABC was the best for all eight images when the number of levels was high, indicating that MTLABC works considerably well in high dimensions; CS ranked second after MTLABC. Two nonparametric statistical tests were used to analyze the results statistically; they proved that MTLABC, the algorithm proposed in this work, is statistically different from the other four stochastic algorithms: ABC, TLBO, TLABC and CS. The p values obtained in both tests are less than \(10^{-4}\).

In future work, the data set can be extended to more than eight images, and further stages of image processing, such as feature extraction and classification, can be carried out. The new hybrid stochastic algorithm proposed in this work will also be applied to other applications and compared with existing state-of-the-art stochastic algorithms.