1 Introduction

Image segmentation is the process of dividing an image into several distinct regions in order to present the objects of interest. It partitions the pixels of an image into classes according to features such as grayscale, color, and texture [23].

Image segmentation is often used as the preprocessing stage of higher-level processing such as image analysis, object recognition, and computer vision. Different types of methods exist for image segmentation. Thresholding is one of the main methods, and it searches for the optimal threshold values. The thresholding technique consists of bi-level thresholding and multilevel thresholding [1, 25]. Bi-level thresholding separates the pixels in the image into two classes, whereas multilevel thresholding separates the pixels into several classes [8]. Image processing has many aspects, such as image enhancement, image homogenization, and image segmentation, and pattern recognition is sometimes included in image processing. The central concern of machine learning is generalization: separating two or more items according to their characteristics [16, 17]. Machine learning is a branch of artificial intelligence [18, 19], and its main research question is how to improve the performance of specific algorithms by learning from experience [14]. In image segmentation, it is often possible to label images manually; however, it is difficult to write a complete set of rules for automatic processing. Sometimes a complete set of algorithms exists, but it has too many parameters, and manually tuning them to the correct values is tedious. Instead, we can extract a certain number of features, manually label a batch of results, and then use machine learning to derive a set of automatic decision criteria [15]. Machine learning is therefore effective for developing such software.

Many computer scientists and scholars have studied image segmentation for many years and have proposed innovations that have been explored in the literature. The automatic threshold selection method based on maximizing the between-class variance, proposed by Otsu in 1979, attracted great attention, and both its theoretical study and its practical application have made significant breakthroughs [22]. Pun and Kapur proposed methods that select the threshold by using the maximum entropy to estimate the rationality of the classification, in 1980 [24] and 1985 [9], respectively. Yen proposed selecting the threshold using the principle of maximum correlation instead of the commonly used principle of maximum entropy [31]. In 1999, Yin designed an improved genetic algorithm with embedded learning strategies to enhance its multi-threshold search capability [32]; this method greatly reduced the computational cost of multilevel thresholding and produced good segmentation results. In 2004, Lai studied Gaussian smoothing in detail, proposed a genetic algorithm based on it, and applied the algorithm to image segmentation [13]. In 2008, Maitra introduced a cooperative learning operator and a comprehensive learning operator into particle swarm optimization (PSO), which enhanced the image segmentation ability of the algorithm: the comprehensive learning operator reduced the risk of premature convergence, and the cooperative learning operator addressed the curse of dimensionality [20]. In 2012, a differential evolution (DE) algorithm based on the Gaussian distribution function was applied to multi-threshold segmentation; the computational complexity of the algorithm was reduced, but it remained high [3].
In 2014, Bhandari proposed a method called ELR-CS to solve multi-threshold image segmentation and compared it in a large number of experiments with the cuckoo search (CS) algorithm and the wind-driven optimization (WDO) algorithm, which indicated that it had good segmentation performance [2]. In 2015, Wang applied the flower pollination algorithm (FPA), based on a modified randomized location, to multi-threshold medical image segmentation [28]. In 2017, Khairuzzaman used the gray wolf algorithm to optimize Otsu's objective function and Kapur's method to solve multi-threshold image segmentation. The experimental results demonstrated that the proposed gray wolf optimization (GWO) was more stable and obtained higher-quality solutions than the PSO and BFO algorithms [12]. Aziz applied the whale optimization algorithm (WOA) and moth-flame optimization (MFO) to multilevel thresholding image segmentation, using the two algorithms with Otsu's method and Kapur's method [5], respectively. The WOA and MFO algorithms were better than the other compared algorithms for almost all the test images, and the WOA was superior to the MFO under the peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) tests.

Recently, entropy-based methods for multilevel thresholding image segmentation have been quite popular, in particular, Otsu's criterion [12], cross entropy, Tsallis entropy [26], and Kapur's entropy [12]. Kapur's entropy is a nonparametric threshold technique that maximizes entropy to measure the homogeneity of the classes. Among these thresholding methods, Kapur's entropy has attracted the attention of researchers and has been shown to be superior to other thresholding methods. However, to select the optimal threshold, an exhaustive search using Kapur's entropy requires much more execution time as the number of thresholds increases. To overcome this problem, researchers have used swarm intelligence algorithms inspired by nature. The most popular algorithms are the genetic algorithm (GA) [6], PSO [11], DE [27], the artificial bee colony (ABC) [10], and the firefly algorithm (FA) [30]. Researchers have also combined swarm optimization with thresholding methods to solve multilevel thresholding image segmentation. Akay used PSO and ABC for multilevel thresholding to maximize Kapur's entropy [14]. In [4], Sathya et al. used a modified PSO with minimum cross entropy to search for the optimal threshold. These studies have demonstrated the power of combining swarm algorithms with thresholding methods to manage image segmentation.

Determining the optimal thresholds for image segmentation has gained increasing attention in recent years. Thresholding is a valuable foundational technology for digital image processing and machine vision, and it is often used as a preprocessing stage for applications such as pattern recognition. However, traditional multilevel thresholding methods are computationally expensive because they exhaustively search for the optimal thresholds that optimize the objective function. In this paper, we propose using an algorithm inspired by the orientation of moths toward moonlight, called the moth swarm algorithm (MSA) [21], to solve multilevel thresholding for image segmentation, thereby overcoming some deficiencies of other algorithms. The proposed method selects the optimal set of thresholds using Kapur's entropy function. The simulation results show that the MSA obtained better results than the WOA [5], bat algorithm (BA) [29], GWO [12], and FPA [28] in terms of the PSNR [2], SSIM [2], computational time, and Kapur's entropy fitness function. The performance of a higher-level processing system depends on the accuracy of the segmentation technique used. The MSA is an evolutionary method inspired by the phototropism and transverse orientation of moths. It is simple and universal, has strong robustness, is suitable for parallel processing, and has a wide application range. Unlike other evolutionary algorithms, the MSA exhibits interesting search capabilities while maintaining a low computational overhead. Applying the MSA to image segmentation is thus a new approach to the problem.

The remainder of this paper is structured as follows: In Section 2, we introduce the multi-threshold and Kapur entropy. In Section 3, we briefly present the concept of the MSA. In Section 4, we present the proposed MSA-based multilevel thresholding method and its pseudocode. In Section 5, we present the experimental results and discussions. In the final section, we conclude the study and suggest some directions for future studies.

2 Multilevel thresholding

Bi-level thresholding divides an image into two parts: the object and the background. Bi-level thresholding is effective if the image is simple, that is, it contains only a single object; however, if the image is complex and contains many objects, bi-level thresholding may fail to provide adequate performance [12]. As a result, multilevel thresholding is often used instead to segment complex images. To obtain a good segmentation result, it is essential to choose the proper values of the thresholds. Optimal threshold selection methods search for thresholds by optimizing an objective function, and entropy-based objective functions have been found to be efficient and feasible. Kapur's entropy is one of the most popular techniques for optimal thresholding. The concept of Kapur's entropy is briefly introduced in the following subsection.

2.1 Concept of Kapur’s entropy

Kapur et al. assumed that the object and the background follow two separate probability distributions. Maximizing the total entropy of the partitioned image is therefore the key step in obtaining the best threshold levels: when the optimal thresholds for segmenting the classes are assigned appropriately, the entropy is maximized. Thus, the main purpose of this paper is to search for the optimal thresholds (the best fitness value) that yield the maximum entropy, using Kapur's entropy technique and the MSA.

Let there be K gray levels in a given image, in the range {0, 1, 2, …, (K − 1)}. Let N be the total number of pixels in the image and ni denote the number of pixels at gray level i. Then pi = ni/N represents the probability of the occurrence of gray level i in the image. Kapur's entropy assumes that an image is completely represented by its corresponding gray-level histogram. Consider that there exist m thresholds [t1, t2, …, tm] to be chosen that divide the image into m + 1 classes: C0, C1, C2, …, Cm. Kapur's entropy is then obtained using

$$ J\left({t}_1,{t}_2,\dots, {t}_m\right)={H}_0+{H}_1+{H}_2+\dots +{H}_m $$
(1)

where

$$ {\displaystyle \begin{array}{ll}{H}_0=-\sum \limits_{i=0}^{t_1-1}\left({p}_i/{\omega}_0\right)\ln \left({p}_i/{\omega}_0\right),\kern0.5em {\omega}_0=\sum \limits_{i=0}^{t_1-1}{p}_i;& {H}_1=-\sum \limits_{i={t}_1}^{t_2-1}\left({p}_i/{\omega}_1\right)\ln \left({p}_i/{\omega}_1\right),\kern0.5em {\omega}_1=\sum \limits_{i={t}_1}^{t_2-1}{p}_i;\\ {}{H}_2=-\sum \limits_{i={t}_2}^{t_3-1}\left({p}_i/{\omega}_2\right)\ln \left({p}_i/{\omega}_2\right),\kern0.5em {\omega}_2=\sum \limits_{i={t}_2}^{t_3-1}{p}_i;& {H}_m=-\sum \limits_{i={t}_m}^{K-1}\left({p}_i/{\omega}_m\right)\ln \left({p}_i/{\omega}_m\right),\kern0.5em {\omega}_m=\sum \limits_{i={t}_m}^{K-1}{p}_i;\end{array}} $$

where H0, H1, …, Hm represent Kapur's entropies and ω0, ω1, ω2, …, ωm denote the class probabilities of the segmented classes C0, C1, C2, …, Cm, respectively. The purpose of the present study is to use the MSA to maximize Kapur's objective function, which is defined in Eq. (1).
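As a concrete illustration, Kapur's objective in Eq. (1) can be computed directly from a gray-level histogram. The following is a minimal NumPy sketch; the function name and interface are ours, not from the paper:

```python
import numpy as np

def kapur_entropy(hist, thresholds):
    """Kapur's objective (Eq. 1): the sum of the class entropies H_0..H_m.

    hist: gray-level histogram (pixel counts per level).
    thresholds: sorted thresholds [t1, ..., tm]; class k covers levels
    [t_k, t_{k+1} - 1] with t_0 = 0 and t_{m+1} = K.
    """
    hist = np.asarray(hist, dtype=float)
    p = hist / hist.sum()                      # p_i = n_i / N
    bounds = [0] + list(thresholds) + [len(hist)]
    total = 0.0
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        w = p[lo:hi].sum()                     # class probability omega_k
        if w > 0:                              # empty classes contribute 0
            q = p[lo:hi][p[lo:hi] > 0] / w
            total -= (q * np.log(q)).sum()     # adds H_k
    return total
```

For example, a uniform 4-level histogram split at t1 = 2 gives two classes, each with probability 0.5 and entropy ln 2, so the objective is 2 ln 2.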

3 Moth swarm algorithm

The MSA is a meta-heuristic proposed by Al-Attar Ali Mohamed [21] that is inspired by the special behavior of moths. A brief mathematical model of the MSA is provided in the following subsections.

3.1 Basic concepts

In the basic MSA, the position of the light source represents a possible solution to the problem to be optimized and the luminescence intensity of the light source represents the fitness of this solution. These assumptions are used to approximate the characteristics of the proposed algorithm. In the MSA, the entire moth population is divided into three groups of moths, which are defined as follows:

  • Pathfinders: This group of moths (of size np) can discover new areas of the optimization space according to the first-in, last-out principle. Pathfinders identify the best positions of the light sources to guide the movement of the main groups.

  • Prospectors: This group of moths flies along a stochastic spiral path in the neighborhood of the light sources that have been marked by the pathfinders.

  • Onlookers: This group of moths flies directly to the best global solution (moonlight) that has been determined by the prospectors.

3.2 Mathematical model

For each iteration, each moth xi is evaluated on the optimization problem to determine the luminescence intensity f(xi) of its corresponding light source. The pathfinders' positions have the best fitness in the swarm, and they provide guidance for the next update iteration. The prospectors and onlookers are thus the second- and third-best groups in the swarm. The MSA is executed as follows:

3.2.1 Initialization

At the beginning of the flight, a set of moths that are the candidate solution are randomly generated as follows:

$$ {x}_{i,j}=\mathit{\operatorname{rand}}\left[0,1\right]\cdot \left({x}_j^{\mathrm{max}}-{x}_j^{\mathrm{min}}\right)+{x}_j^{\mathrm{min}}\kern0.5em \forall \kern0.5em i\in \left\{1,2\dots, n\right\};\kern0.5em j\in \left\{1,2,\dots d\right\} $$
(2)

where \( {x}_j^{\mathrm{max}} \) represents the upper limit and \( {x}_j^{\mathrm{min}} \) represents the lower limit.

After initialization, the moths are divided into types according to their calculated fitness: the best moths are defined as the light sources (pathfinders), the second-best are defined as prospectors, and the worst are defined as onlookers.
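These two steps — random initialization per Eq. (2) and fitness-based grouping — can be sketched as follows (NumPy sketch with illustrative names; minimization is assumed, so lower fitness is better):

```python
import numpy as np

def init_moths(n, d, xmin, xmax, seed=None):
    """Eq. (2): x_ij = rand[0,1] * (xmax_j - xmin_j) + xmin_j."""
    rng = np.random.default_rng(seed)
    xmin = np.asarray(xmin, dtype=float)
    xmax = np.asarray(xmax, dtype=float)
    return rng.random((n, d)) * (xmax - xmin) + xmin

def group_moths(fitness, n_p, n_f):
    """Sort by fitness (ascending): the best n_p moths become pathfinders,
    the next n_f prospectors, and the rest onlookers."""
    order = np.argsort(fitness)
    return order[:n_p], order[n_p:n_p + n_f], order[n_p + n_f:]
```

The returned index arrays can then be used to address the corresponding rows of the population matrix.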

3.2.2 Reconnaissance stage

In the MSA, the exploration quality of the swarm may decrease over the course of the iterations. The moths can stagnate in an area that appears good and easily fall into a local optimum. To eliminate premature convergence and enhance the diversity of the solutions, pathfinder moths search for less-crowded places to guide the other groups. Pathfinder moths update their positions by interacting with each other (crossover operations) and by flying long distances (Lévy mutation); that is, they use adaptive crossover with Lévy mutation, which is presented in the following four steps.

Diversity index for crossover points

To improve the diversity of the solutions, a new strategy is used to select the crossover points. First, for iteration t, normalized dispersal degree \( {\sigma}_j^t \) of the individuals in the jth dimension can be calculated as follows:

$$ {\sigma}_j^t=\frac{\sqrt{\frac{1}{n_p}{\sum}_{i=1}^{n_p}{\left({x}_{ij}^t-\overline{x_j^t}\right)}^2}}{\overline{x_j^t}} $$
(3)

where \( \overline{x_j^t}=\frac{1}{n_p}{\sum}_{i=1}^{n_p}{x}_{ij}^t \) and np represents the number of pathfinder moths. The variation coefficient μt measures the relative dispersion and is calculated as follows:

$$ {\mu}^t=\frac{1}{d}{\sum}_{j=1}^d{\sigma}_j^t $$
(4)

Any dimension of the pathfinder moths that has a low degree of dispersal is accepted into the group cp of crossover points, defined as follows:

$$ j\in {c}_p\kern3em if{\sigma}_j^t\le {\mu}^t $$
(5)

Thus, with the new strategy, the group of crossover points can change dynamically over the course of the iterations.
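Eqs. (3)–(5) can be sketched as follows (illustrative NumPy code; the guard against division by a zero mean is our addition):

```python
import numpy as np

def crossover_points(pathfinders):
    """Eqs. (3)-(5): dimensions whose normalized dispersal degree sigma_j
    falls at or below the variation coefficient mu join the group c_p."""
    pathfinders = np.asarray(pathfinders, dtype=float)
    mean = pathfinders.mean(axis=0)
    denom = np.where(mean != 0, np.abs(mean), 1.0)   # guard (our addition)
    # sigma_j: std. deviation of dimension j divided by its mean (Eq. 3)
    sigma = np.sqrt(((pathfinders - mean) ** 2).mean(axis=0)) / denom
    mu = sigma.mean()                                # Eq. (4)
    return np.flatnonzero(sigma <= mu), mu           # Eq. (5)
```

A dimension on which all pathfinders agree has zero dispersal and is therefore always selected as a crossover point.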

Lévy flights

A Lévy flight is a class of random walk based on a power-law (α-stable) distribution, which can travel large distances using steps of many different sizes. Mantegna's algorithm [7] is used to emulate the α-stable distribution by generating random samples Li that have the same behavior as Lévy flights, defined as follows:

$$ {L}_i\sim step\oplus Levy\left(\alpha \right)\sim 0.01\frac{u}{{\left|y\right|}^{1/\alpha }} $$
(6)

where step denotes the scaling size corresponding to the scale of the problem of interest, ⊕ represents the entrywise (dot) product, and \( u=N\left(0,{\sigma}_u^2\right) \) and \( y=N\left(0,{\sigma}_y^2\right) \) are both normal stochastic distributions with \( {\sigma}_u={\left[\frac{\Gamma \left(1+\alpha \right)\times \sin \left(\pi \times \alpha /2\right)}{\Gamma \left(\left(1+\alpha \right)/2\right)\times \alpha \times {2}^{\left(\alpha -1\right)/2}}\right]}^{1/\alpha } \) and σy = 1.
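Mantegna's sampling scheme can be sketched as follows (an illustrative implementation; the parameter defaults are ours):

```python
import math
import numpy as np

def levy_steps(size, alpha=1.5, step=0.01, seed=None):
    """Eq. (6): L ~ step * u / |y|^(1/alpha) via Mantegna's algorithm,
    with u ~ N(0, sigma_u^2) and y ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    sigma_u = (math.gamma(1 + alpha) * math.sin(math.pi * alpha / 2)
               / (math.gamma((1 + alpha) / 2) * alpha
                  * 2 ** ((alpha - 1) / 2))) ** (1 / alpha)
    u = rng.normal(0.0, sigma_u, size)
    y = rng.normal(0.0, 1.0, size)
    return step * u / np.abs(y) ** (1 / alpha)
```

Most samples are small, but the heavy tail occasionally produces very long jumps, which is exactly what the pathfinders exploit for exploration.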

Difference vectors Lévy mutation

For the nc crossover points, the MSA constructs the sub-trail vector \( \overrightarrow{v_p}=\left[{v}_{p1},{v}_{p2},\dots, {v}_{pn_c}\right] \) by disturbing the constituents selected from the host vector \( \overrightarrow{x_{r^1}}=\left[{x}_{r^11},{x}_{r^12},\dots, {x}_{r^1{n}_c}\right] \) with the corresponding constituents of the donor vectors \( \overrightarrow{x_{r^2}},\dots, \overrightarrow{x_{r^5}} \). The mutation mechanism synthesizes the sub-trail vector as follows:

$$ \overrightarrow{v_p^t}=\overrightarrow{x_{r^1}^t}+{L}_{p1}^t\cdot \left(\overrightarrow{x_{r^2}^t}-\overrightarrow{x_{r^3}^t}\right)+{L}_{p2}^t\cdot \left(\overrightarrow{x_{r^4}^t}-\overrightarrow{x_{r^5}^t}\right)\kern0.5em \forall \kern0.5em {r}^1\ne {r}^2\ne {r}^3\ne {r}^4\ne {r}^5\ne p\in \left\{1,2,\dots, {n}_p\right\} $$
(7)

In Eq. (7), \( {L}_{p1}^t \) and \( {L}_{p2}^t \) are two independent variables used as the mutation scaling factors; both are generated by the power-law Lévy flights using (Lp~random(nc) ⊕ Levy(α)). The indices (r1, r2, r3, r4, r5, p) are mutually exclusive and are selected from the pathfinder solutions.

Adaptive crossover operation based on population diversity

To obtain the completed trail solution, each pathfinder solution, also called a host vector, updates its position using the crossover operation by integrating the mutated variables of the sub-trail vectors (low degree of dispersal) with the related variables of the host vector. The main trail solutions are defined as follows:

$$ {V}_{pj}^t=\left\{\begin{array}{l}{v}_{pj}^t\kern2em if\kern0.5em j\in {c}_p\\ {}{x}_{pj}^t\kern2em if\kern0.5em j\notin {c}_p\end{array}\right. $$
(8)

Note that μt is the variation coefficient that is used to control the rate of crossover.
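Together, the Lévy mutation of Eq. (7) and the crossover of Eq. (8) produce a trail solution. A minimal sketch (the scaling factors are passed in rather than drawn from Lévy flights, for brevity; the function name is ours):

```python
import numpy as np

def trail_solution(pathfinders, p, cp, L1, L2, seed=None):
    """Eq. (7): mutate with two scaled difference vectors from five donors;
    Eq. (8): take mutated variables on the crossover dims cp, host elsewhere."""
    rng = np.random.default_rng(seed)
    n_p = len(pathfinders)
    donors = [i for i in range(n_p) if i != p]          # each r^k != p
    r1, r2, r3, r4, r5 = rng.choice(donors, 5, replace=False)
    v = (pathfinders[r1]
         + L1 * (pathfinders[r2] - pathfinders[r3])
         + L2 * (pathfinders[r4] - pathfinders[r5]))
    trail = pathfinders[p].copy()
    trail[cp] = v[cp]                                   # crossover (Eq. 8)
    return trail
```

With an empty crossover group the host vector is returned unchanged, which matches the second branch of Eq. (8).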

Roulette wheel selection

After completing all the preceding steps, the fitness of the completed trail solution is calculated and compared with the related host solution. The better fitness solutions are chosen to survive for the next iteration, which is calculated as follows:

$$ \overrightarrow{x_p^{t+1}}=\left\{\begin{array}{l}\overrightarrow{x_p^t}\kern4.5em if\kern0.5em f\left(\overrightarrow{V_p^t}\right)\ge f\left(\overrightarrow{x_p^t}\right)\\ {}\overrightarrow{V_p^t}\kern4.5em if\kern0.5em f\left(\overrightarrow{V_p^t}\right)<f\left(\overrightarrow{x_p^t}\right)\end{array}\right. $$
(9)

where Pp is the probability of selecting light source p according to its luminescence intensity fitp, which is modeled as follows:

$$ {P}_p=\frac{fit_p}{\sum_{p=1}^{n_p}{fit}_p} $$
(10)

The luminescence intensity is computed from the objective function value fp for minimization problems and defined as follows:

$$ {fit}_p=\left\{\begin{array}{l}\frac{1}{1+{f}_p}\kern5.5em for\kern0.5em {f}_p\ge 0\\ {}1+\left|{f}_p\right|\kern5.5em for\kern0.5em {f}_p<0\end{array}\right. $$
(11)
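The selection of Eq. (9) and the fitness mapping of Eqs. (10)–(11) can be sketched as follows (a minimal sketch for a scalar minimization objective; the helper name is ours, and 1 + |f| is used for negative objective values, the standard fitness mapping):

```python
import numpy as np

def survivor_selection(hosts, trails, f):
    """Eq. (9): a trail solution replaces its host only if it is strictly
    better (minimization); Eqs. (10)-(11): luminescence and probabilities."""
    survivors = [t if f(t) < f(h) else h for h, t in zip(hosts, trails)]
    fit = np.array([1.0 / (1.0 + f(x)) if f(x) >= 0 else 1.0 + abs(f(x))
                    for x in survivors])                # Eq. (11)
    return survivors, fit / fit.sum()                   # Eq. (10)
```

The returned probabilities drive the roulette-wheel choice of light sources that the prospectors use in the next stage.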

3.2.3 Transverse orientation

The prospector moths form the group with the second-best luminescence intensity. The number of prospector moths nf decreases as iteration t proceeds toward the maximum number of iterations T, which is modeled as follows:

$$ {n}_f= round\left(\left(n-{n}_p\right)\times \left(1-\frac{t}{T}\right)\right) $$
(12)

After the pathfinder moths complete their search, they share information about the luminescence intensity with the prospectors, which attempt to update their positions to locate new light sources. Each prospector moth xi flies along a logarithmic spiral path to perform a deep search around the related artificial light source xp, which is selected with probability Pp from Eq. (10). The new position of the ith prospector moth is modeled as follows:

$$ {x}_i^{t+1}=\left|{x}_i^t-{x}_p^t\right|\cdot {e}^{\theta}\cdot \cos 2\pi \theta +{x}_p^t\kern3em \forall \kern0.5em \mathrm{p}\in \kern0.5em \left\{1,2,\dots, {\mathrm{n}}_{\mathrm{p}}\right\};\kern0.5em i\in \left\{{n}_p+1,{n}_p+2,\dots, {n}_f\right\} $$
(13)

where θ is a random number in the range [r, 1] that defines the shape of the logarithmic spiral, and r = −1 − t/T. In the MSA model, each moth dynamically changes its type. Therefore, if a prospector finds a solution that is more luminescent than an existing light source, it becomes a pathfinder moth; that is, the new light sources and the moonlight are determined at the end of this stage.
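The spiral flight of Eq. (13) can be sketched as follows (illustrative NumPy code; the function name is ours):

```python
import numpy as np

def spiral_flight(x_i, x_p, t, T, seed=None):
    """Eq. (13): deep search on a logarithmic spiral around light source x_p,
    with theta ~ U[r, 1] and r = -1 - t/T."""
    rng = np.random.default_rng(seed)
    r = -1.0 - t / T
    theta = rng.uniform(r, 1.0)
    return np.abs(x_i - x_p) * np.exp(theta) * np.cos(2 * np.pi * theta) + x_p
```

When a prospector already sits on the light source, |x_i − x_p| = 0 and the update simply returns x_p; as t grows, the lower bound r decreases, tightening the spiral around the source.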

3.2.4 Celestial navigation

During the process of optimization, as the number of prospectors decreases, the number of onlookers increases (no = n − nf − np), which accelerates the convergence of the MSA toward the global solution. The moths with the lowest fitness values are the onlooker moths. In this stage, the onlookers are divided into the following two parts.

Gaussian walks

In this stage, the onlookers are forced to search more promising areas of the search space. The first part, of size nG = round(no/2), flies with Gaussian-distributed steps \( q\sim N\left(\mu, {\sigma}_G^2\right) \) whose density is given by

$$ f(q)=\frac{1}{\sqrt{2\pi }{\sigma}_G}\exp \left(-\frac{{\left(q-\mu \right)}^2}{2{\sigma}_G^2}\right)\kern4em -\infty <q<\infty $$
(14)

The new onlookers in this subgroup \( {x}_i^{t+1} \) fly with the set of Gaussian walks as follows:

$$ {x}_i^{t+1}={x}_i^t+{\varepsilon}_1+\left[{\varepsilon}_2\times {best}_g^t-{\varepsilon}_3\times {x}_i^t\right]\kern0.5em \forall \kern0.5em i\in \left\{1,2,\dots, {n}_G\right\} $$
(15)
$$ {\varepsilon}_1\sim random\left( size(d)\right)\oplus N\left({best}_g^t,\frac{\log t}{t}\times \left({x}_i^t-{best}_g^t\right)\right) $$
(16)

where ε1 denotes a random sample drawn from the Gaussian distribution in Eq. (16), scaled to the size of this group, \( {best}_g^t \) denotes the global best solution, and both ε2 and ε3 are random numbers in the range [0, 1].
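Eqs. (15)–(16) can be sketched as follows (illustrative code; clamping t and the scale floor are implementation details of ours to keep the Gaussian scale positive):

```python
import numpy as np

def gaussian_walk(x_i, best_g, t, seed=None):
    """Eq. (15): x + eps1 + [eps2 * best_g - eps3 * x], where eps1 is a
    Gaussian sample around best_g whose scale shrinks as t grows (Eq. 16)."""
    rng = np.random.default_rng(seed)
    t = max(t, 2)                                   # keep log(t)/t positive
    scale = np.abs(np.log(t) / t * (x_i - best_g))
    eps1 = rng.normal(best_g, np.maximum(scale, 1e-12))
    eps2, eps3 = rng.random(), rng.random()
    return x_i + eps1 + eps2 * best_g - eps3 * x_i
```

The log(t)/t factor makes the walk wide in early iterations and increasingly local as the search converges.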

Associative learning mechanism with immediate memory

In this stage, the remaining onlookers, of size nA = no − nG, drift toward the moonlight using associative learning operators with an immediate memory, which simulates the real behavior of moths in nature. The immediate memory is initialized from the continuous Gaussian distribution on the intervals \( {x}_i^t-{x}_i^{\mathrm{min}} \) and \( {x}_i^{\mathrm{max}}-{x}_i^t \). The mathematical model of this stage is defined as follows:

$$ {x}_i^{t+1}={x}_i^t+0.001\cdot G\left[{x}_i^t-{x}_i^{\mathrm{min}},{x}_i^{\mathrm{max}}-{x}_i^t\right]+\left(1-g/G\right)\cdot {r}_1\cdot \left({best}_p^t-{x}_i^t\right)+\left(2g/G\right)\cdot {r}_2\cdot \left({best}_g^t-{x}_i^t\right) $$
(17)

where i ∈ {1, 2, …, nA}, and 2g/G and 1 − g/G denote the social factor and cognitive factor, respectively. Both r1 and r2 are random numbers in the range [0, 1]. bestp denotes a light source stochastically selected from the new pathfinder group according to the probability value of its related solution.
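Eq. (17) can be sketched as follows (the immediate-memory sample G[·,·] is approximated with a uniform draw on the stated intervals, an assumption of ours; all names are illustrative):

```python
import numpy as np

def associative_update(x_i, best_p, best_g, xmin, xmax, g, G, seed=None):
    """Eq. (17): drift toward the moonlight with an immediate-memory term,
    a cognitive term (1 - g/G) toward best_p, and a social term (2g/G)
    toward best_g."""
    rng = np.random.default_rng(seed)
    mem = rng.uniform(x_i - xmin, xmax - x_i)       # stand-in memory sample
    r1, r2 = rng.random(), rng.random()
    return (x_i + 0.001 * mem
            + (1 - g / G) * r1 * (best_p - x_i)
            + (2 * g / G) * r2 * (best_g - x_i))
```

Early in the run (small g) the cognitive pull toward a pathfinder dominates; late in the run the social pull toward the global best takes over.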

4 Methodology

4.1 Proposed MSA-based multilevel thresholding method

Moths represent the search agents and their positions represent the thresholds to be optimized. Therefore, depending on the number of thresholds, the moths move in one-dimensional, two-dimensional, three-dimensional, or hyper-dimensional space by changing their position vectors. The positions of the moths are first initialized randomly. Then the fitness of all the moths is determined using Eq. (1). The positions of the moths are updated if better positions are determined. This process is repeated until the maximum number of iterations is completed. The best position of the moths provides the desired thresholds. The pseudocode for the proposed multilevel thresholding method is presented in the following subsection.
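The workflow just described can be sketched end to end. The following is a greatly simplified, self-contained driver — random initialization per Eq. (2), a spiral-style perturbation around the current best in the spirit of Eq. (13), and greedy survivor selection — not the full three-group MSA; all names and defaults are ours:

```python
import numpy as np

def kapur(hist, thresholds):
    """Kapur's objective (Eq. 1) for integer thresholds over a histogram."""
    p = np.asarray(hist, dtype=float)
    p = p / p.sum()
    bounds = [0] + sorted(int(t) for t in thresholds) + [len(p)]
    total = 0.0
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        w = p[lo:hi].sum()
        if w > 0:
            q = p[lo:hi][p[lo:hi] > 0] / w
            total -= (q * np.log(q)).sum()
    return total

def search_thresholds(hist, m, n=25, T=100, seed=0):
    """Maximize Kapur's entropy over m thresholds with n agents, T iterations."""
    rng = np.random.default_rng(seed)
    K = len(hist)
    pop = rng.uniform(1, K - 1, (n, m))                 # Eq. (2)
    fit = np.array([kapur(hist, x) for x in pop])
    best = pop[fit.argmax()].copy()
    for t in range(T):
        theta = rng.uniform(-1 - t / T, 1, (n, m))      # spiral parameter
        cand = (np.abs(pop - best) * np.exp(theta)
                * np.cos(2 * np.pi * theta) + best)
        cand = np.clip(cand, 1, K - 1)
        cand_fit = np.array([kapur(hist, x) for x in cand])
        better = cand_fit > fit                         # greedy selection
        pop[better], fit[better] = cand[better], cand_fit[better]
        best = pop[fit.argmax()].copy()
    return np.sort(best.astype(int)), fit.max()
```

On a real image, `hist` would be the 256-bin gray-level histogram and `m` the desired number of thresholds; the full MSA replaces the single update rule here with the pathfinder, prospector, and onlooker stages of Section 3.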

4.2 Pseudocode for MSA-based multilevel thresholding

figure e

4.3 Flowchart of MSA-based multilevel thresholding (see Fig. 1)

Fig. 1

Flowchart of the MSA

5 Experiments and discussion

The experimental setup for the proposed algorithm is briefly introduced in this section. First, we introduce the test images. Additionally, we present the parameter settings of each algorithm. Finally, we present the descriptions of the segmentation validation metrics.

5.1 Test images

In our study, the experiments were conducted on eight images carefully selected from the Berkeley University segmentation database, which are shown in Fig. 2.

Fig. 2

Original images

5.2 Segmented image quality metrics

Five measures were used to evaluate the performance of the segmented images, as follows:

(1) The fitness function value from Eq. (1): the larger the objective function value, the more information the segmented image contains. (2) The execution time of the MSA and the other algorithms: the average execution time was used to compare the computational complexity of the multi-threshold approaches, where less time indicates a faster algorithm. (3) The PSNR, which measures the difference between the segmented image and the reference image based on the intensity values: the larger the PSNR value, the less distortion. Because the visual acuity of the human eye is not absolute, an image with a higher PSNR may occasionally appear worse than one with a lower PSNR. (4) The SSIM, which measures the similarity of two images: when two images are identical, the SSIM equals one. (5) A statistical analysis using the Wilcoxon rank-sum test at a 5% significance level, which demonstrates whether there is a meaningful difference among the five algorithms: a p-value of less than 0.05 indicates a statistically significant difference.

5.3 Experiment settings

The results obtained using MSA-based multilevel thresholding for image segmentation were compared with the algorithms using the WOA [5], BA [29], GWO [12], and FPA [28]. All the algorithms were run 30 times for each test image to ensure the credibility of the statistics. The parameter settings for all the algorithms are presented as follows. All the algorithms were run on a computer with an AMD Athlon (tm) II X4 640 processor and 4 GB of RAM using MATLAB R2012a.

BA setting

The maximum pulse intensity A was 0.9, the maximum pulse frequency r was 0.5, the pulse attenuation coefficient alpha was 0.95, and the pulse frequency increase factor gamma was 0.05. The population size was 25 and the maximum iteration number was 100.

WOA setting

The probability p was 0.5; the number of iterations and the population size were 100 and 25, respectively.

FPA setting

The population number was 25, number of iterations was 100, and probability of p was 0.8.

GWO setting

\( \overrightarrow{\alpha} \) was linearly decreased from two to zero. The population and number of iterations were the same as those for the FPA.

5.4 Segmented image quality measurements

To evaluate the segmented image quality, we used four measures: the PSNR, SSIM, computational time, and fitness function value using Eq. (1). The PSNR often served as a quality measurement between the segmented images and reference images. It provided the similarity of an image against a reference image based on the MSE of each pixel [1]. The PSNR is defined as follows:

$$ PSNR\kern1em \left( in\kern0.5em dB\right)=20{\log}_{10}\left(\frac{255}{RMSE}\right) $$

where RMSE is the root mean-squared error defined as

$$ RMSE=\sqrt{\frac{1}{MN}{\sum}_{i=1}^M{\sum}_{j=1}^N{\left[I\left(i,j\right)- Seg\left(i,j\right)\right]}^2} $$

where both M and N denote the sizes of the image, I denotes the original image, and Seg denotes the segmented images. The higher the PSNR value, the better the results of the segmented images.

The SSIM is often used to compute the similarity of the original image and segmented image. The mathematical model of the SSIM is defined as follows:

$$ SSIM\kern1em \left(I, Seg\right)=\frac{\left(2{\mu}_I{\mu}_{Seg}+{c}_1\right)\left(2{\sigma}_{I, Seg}+{c}_2\right)}{\left({\mu}_I^2+{\mu}_{Seg}^2+{c}_1\right)\left({\sigma}_I^2+{\sigma}_{Seg}^2+{c}_2\right)} $$

where μI represents the mean intensity of image I and μSeg represents the mean intensity of image Seg. σI and σSeg are the standard deviation of image I and image Seg, respectively. σI, Seg is a coefficient that denotes the covariance between image I and image Seg. c1 and c2 are constants. The higher the value of the SSIM, the better the result of the segmented image.
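Both measures can be sketched as follows (the SSIM here is computed globally over the whole image, matching the single-window formula above; production implementations typically use local windows, and the constants c1 and c2 are the usual 8-bit defaults):

```python
import numpy as np

def psnr(img, seg):
    """PSNR in dB from the RMSE between the original and segmented image."""
    diff = img.astype(float) - seg.astype(float)
    rmse = np.sqrt(np.mean(diff ** 2))
    return float('inf') if rmse == 0 else 20 * np.log10(255.0 / rmse)

def ssim(img, seg, c1=6.5025, c2=58.5225):
    """Global SSIM per the formula above; c1 = (0.01*255)^2, c2 = (0.03*255)^2."""
    x, y = img.astype(float), seg.astype(float)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return (((2 * mx * my + c1) * (2 * cov + c2))
            / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2)))
```

For identical images the RMSE is zero (PSNR is infinite) and the SSIM evaluates exactly to one.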

5.5 Experimental results

In this subsection, we present the segmentation results obtained using the proposed algorithm and four state-of-the-art algorithms based on Kapur's entropy in Tables 1, 2, 3, 4, 5, and 6. Table 1 lists the number of thresholds (K = 2, 3, 4, 5, 6) and the best fitness value obtained by each algorithm; for convenience, we also provide the ranking of the fitness values. Table 1 shows clearly that the MSA outperformed all the other algorithms for all the test images when K = 2 and K = 3. When K = 4, the MSA failed to obtain the best fitness value for all the images except Man and Zebra; nevertheless, overall it still outperformed the other algorithms. For K = 5, on the image Man, the result obtained using the WOA was slightly better than that of the MSA, whereas the MSA obtained a higher value than the other three algorithms; on the image Scene, the MSA ranked third after the WOA and BA; for the remaining images, the MSA outperformed all the other algorithms. When K = 6, the WOA and GWO obtained better results than the MSA for the image Starfish; however, for the other seven images, the MSA always achieved the highest value among the five algorithms. From Table 1, we conclude that the MSA outperformed the other four algorithms for almost all the test images under various numbers of thresholds.

Table 1 Comparison of the best fitness values for all the algorithms
Table 2 Best threshold values obtained from the algorithms for all test images
Table 3 Average execution time for WOA, GWO, FPA, BA, and MSA
Table 4 PSNR metrics for the WOA, GWO, FPA, BA, and MSA
Table 5 SSIM metrics for the WOA, GWO, FPA, BA, and MSA
Table 6 p-values of the Wilcoxon test over 30 runs (p-values > 0.05 are highlighted in bold)

Tables 2 and 3 report the best threshold values obtained from the algorithms for all the test images and the corresponding execution times, respectively. The results in Table 3 demonstrate that the average execution time of the MSA was less than that of the other algorithms for most of the test images under different thresholds.

Table 4 reports the PSNR values under different thresholds obtained by the five algorithms. As shown in Table 4, the MSA obtained the highest results in the majority of the test cases. There were 40 PSNR values in total for each algorithm, and the MSA obtained the best results for half of all the PSNR values when compared with the other four algorithms. Additionally, the MSA ranked second in 11 cases for the PSNR values. This demonstrates the superior ability of the MSA.

Table 5 illustrates the SSIM standard under various thresholds for the WOA, GWO, FPA, BA, and MSA. If the value of the SSIM was high, then the similarity between the segmented image and original image was high; that is, the SSIM is a method that measures the quality of the segmented image and original image. The results reported in Table 5 indicate that the MSA always obtained the highest SSIM value or the second-best result for the 40 test cases. Hence, we conclude that the MSA-based method provided better quality segmentation compared with the WOA, GWO, FPA, and BA-based methods.

To further verify the performance of the swarm algorithms, we also conducted a statistical test: the Wilcoxon rank-sum test was performed at a 5% significance level in our experiment [10, 30]. Generally, a p-value < 0.05 can be considered sufficient evidence against the null hypothesis. The objective function values of the MSA method were compared with those of the other four algorithms. Table 6 provides the p-values of the Wilcoxon rank-sum test comparing the MSA with the other algorithms over all the simulation cases. In Table 6, N/A means “Not Applicable,” indicating that the statistical test could not be executed because all the runs of the algorithms found the optimum successfully. It can be clearly observed from Table 6 that the MSA-based method provided better results than the BA, GWO, FPA, and WOA-based methods for almost all the test cases. The results demonstrate that there were significant differences between the MSA and the other four algorithms, and confirm that the MSA performed much better than the other algorithms.
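For illustration, the two-sided Wilcoxon rank-sum test can be sketched with its normal approximation: pool the two samples of fitness values, rank them (averaging tied ranks), and compare the rank sum of one sample against its expectation under the null. In practice one would call `scipy.stats.ranksums`; the dependency-free version below omits the tie-variance correction for brevity:

```python
import math

def ranksum_pvalue(a, b):
    """Two-sided Wilcoxon rank-sum test via the normal approximation
    (tie correction omitted for brevity). A p-value below 0.05, the 5%
    significance level used in the experiments, rejects the null
    hypothesis that the two samples share the same distribution."""
    n1, n2 = len(a), len(b)
    pooled = sorted((v, i) for i, v in enumerate(a + b))
    ranks = [0.0] * (n1 + n2)
    i = 0
    while i < len(pooled):                        # average ranks over ties
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg = (i + j + 1) / 2.0                   # mean of ranks i+1 .. j
        for k in range(i, j):
            ranks[pooled[k][1]] = avg
        i = j
    w = sum(ranks[:n1])                           # rank sum of sample a
    mean = n1 * (n1 + n2 + 1) / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (w - mean) / sd
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))
```

In our setting, `a` and `b` would be the 30 per-run objective values of the MSA and one competitor for a given image and threshold number.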

The segmentation results of the MSA and the other algorithms are presented in Figs. 3, 4, 5, 6, 7, 8, 9, and 10. According to these figures, the MSA produced good segmentation results for different images under various thresholds. Additionally, these figures demonstrate that the segmented images improved as the number of thresholds increased.
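The thresholded images in the figures are produced by quantizing each pixel into one of the K + 1 classes defined by the best thresholds and mapping each class to a representative gray level for display. A minimal sketch, with a hypothetical helper name and an evenly spread display palette as our assumption:

```python
def apply_thresholds(pixels, thresholds, levels=None):
    """Quantize flat grayscale pixels into K+1 classes using the best
    thresholds found by the optimizer; each class is mapped to a
    representative gray level so the result can be displayed."""
    ts = sorted(thresholds)
    if levels is None:
        # spread K+1 display levels evenly over 0..255
        levels = [round(i * 255 / len(ts)) for i in range(len(ts) + 1)]
    out = []
    for v in pixels:
        k = sum(v > t for t in ts)                # class index of pixel v
        out.append(levels[k])
    return out
```

With more thresholds, more gray levels survive quantization, which is consistent with the observation that higher K preserves more detail.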

Fig. 3
figure 3

Thresholded images of man obtained by all the algorithms

Fig. 4
figure 4

Thresholded images of airplane obtained by all the algorithms

Fig. 5
figure 5

Thresholded images of pepper obtained by all the algorithms

Fig. 6
figure 6

Thresholded images of baboon obtained by all the algorithms

Fig. 7
figure 7

Thresholded images of scene obtained by all the algorithms

Fig. 8
figure 8

Thresholded images of starfish obtained by all the algorithms

Fig. 9
figure 9

Thresholded images of zebra obtained by all the algorithms

Fig. 10
figure 10

Thresholded images of butterfly obtained by all the algorithms

The comparison between the proposed algorithm and the other algorithms is shown in Tables 1, 2, 3, 4, 5, and 6 and Figs. 3, 4, 5, 6, 7, 8, 9, and 10. Tables 1 and 2 report the fitness values and best threshold values, respectively, of the algorithms over 30 runs. The results in Table 1 indicate that all the algorithms performed nearly equally for K = 2, 3, 4, 5, 6; however, the MSA consistently ranked in the top three for the fitness values. When K = 3, the WOA and GWO had nearly the same fitness values. Table 3 reports the execution times of the experiments. According to these results, the MSA was the fastest algorithm in most cases, and the proposed algorithm obtained the best fitness value in most of the experiments, even in the cases where it was not the fastest. The results demonstrate that the proposed algorithm had a better ability to switch between the exploration and exploitation phases than the other algorithms, and it had low complexity and high performance. The MSA has fewer control parameters than the other algorithms, which makes it more suitable for other optimization problems, because tuning the control parameters of an algorithm is often more complex than the problem itself.

The PSNR and SSIM values are reported in Tables 4 and 5, respectively. These values indicate that the proposed method achieved better results in most cases. However, when K = 2, 4, 6 for the image Zebra, the SSIM and PSNR values were lower than those of the other algorithms, because each image constitutes a different optimization problem and the randomness of the swarm approaches causes the results to vary in some cases. Figures 3, 4, 5, 6, 7, 8, 9, and 10 show the segmentation results of the proposed algorithm and the other algorithms at different threshold levels. From these figures, we conclude that images segmented at higher threshold levels retain more details.


In this section, we used different approaches to measure the segmentation quality of the MSA-based method and compared it with the other algorithms. In terms of the objective fitness value, PSNR, SSIM, and execution time, the MSA-based method provided the best results for most test cases. Additionally, the Wilcoxon rank-sum test demonstrated the good performance of the MSA-based method. This confirms the superior ability of the MSA for image segmentation.

6 Conclusions and future work

The objective of thresholding-based image segmentation is to obtain good-quality segmentation results without consuming a great deal of time. In this paper, we used the MSA-based Kapur’s entropy method to solve image segmentation problems under different thresholds. The experimental results of the MSA-based method not only clearly demonstrated the efficiency and feasibility of this method in solving multilevel thresholding but also proved our proposed method’s clear superiority over four state-of-the-art algorithms, the WOA, BA, GWO, and FPA, in terms of the PSNR, SSIM, and execution time. In future work, the MSA can be applied to image segmentation with higher threshold values and tested on many more images. Additionally, we aim to propose a simpler but more efficient MSA for image segmentation and apply it to engineering applications.