1 Introduction

COVID-19 has been an unprecedented human pandemic in recent decades [1, 2]. The first case of COVID-19 was reported in Wuhan, China, in December 2019, but there is no conclusive evidence regarding its origin [3]. Owing to its high transmissibility, 600 million infections have accumulated worldwide to date, with irreversible effects on the global economy and human lives [4]. Cough, fever, sore throat, headache, and physical fatigue are all symptoms of COVID-19 [5]. In addition, COVID-19 can cause fibrosis of the lung tissue and lead to various complications that seriously threaten the health of patients [6]. Therefore, healthcare providers need the means to detect this disease in time to prevent and interrupt the spread of the epidemic. The most common assay is the Polymerase Chain Reaction (PCR) test, which is frequently used by medical institutions all over the world [7]. However, this test is time-consuming, costly, and, most importantly, it has the potential to produce false-positive results. Considering that COVID-19 causes fibrosis in patients' lung tissues, chest CT has become a more reliable means of detection [8]. CT images clearly reflect the internal anatomy of body tissues as well as the nature of lesions, which greatly aids clinical treatment. Reasonable segmentation of CT images can further improve the efficiency with which medical personnel diagnose the disease. Therefore, image segmentation has become a common technique in medical imaging [9].

Segmentation is a key step in image processing, and it plays an indispensable role in fields such as remote sensing [10], feature selection [11], computer vision [12], medical imaging [13], and cryptography [14]. The main goal of image segmentation is to find suitable thresholds to partition an image into multiple regions with homogeneous features based on texture, color, brightness, or contrast. There are numerous approaches to the segmentation problem, such as edge-based [15], region-based [16], threshold-based [17], and feature-clustering-based [18] methods. However, a large body of literature indicates that threshold-based segmentation is the preferred solution owing to its simplicity and efficiency [19,20,21]. Threshold segmentation falls into two broad categories: bi-level thresholding and multilevel thresholding. Bi-level thresholding is the simplest segmentation method and is now well established; it segments an image into two classes by determining a single optimal threshold. Multilevel thresholding, on the other hand, segments an image into several parts by maximizing or minimizing an objective function over a given number of thresholds. The most common multilevel thresholding criteria include fuzzy entropy [9], Kapur's entropy [22], Tsallis entropy [23], and the Otsu method [24]. Although multilevel thresholding is theoretically an extension of bi-level thresholding, it is constrained by a practical problem: as the number of thresholds increases and the image size grows, exhausting every possible combination of thresholds becomes computationally impractical. Therefore, more and more scholars use meta-heuristics instead of exhaustive computation to reduce the time cost of complex image processing problems.
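To illustrate why exhaustive multilevel thresholding quickly becomes impractical, the short Python snippet below counts the number of candidate threshold combinations for an 8-bit image; the figures are purely illustrative and not taken from the experiments in this paper.

```python
from math import comb

# Number of threshold combinations an exhaustive search must evaluate
# for an 8-bit image (256 grey levels, 255 interior cut points).
for k in (1, 2, 4, 8, 12):
    print(f"{k:2d} thresholds -> {comb(255, k):,} candidate combinations")
```

Already at four thresholds there are more than 170 million candidates, which is why meta-heuristic search is preferred over brute force.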

In recent decades, meta-heuristics have made a splash in industry with their simple and efficient performance, especially on a variety of challenging optimization problems. Many researchers have worked to improve the performance of meta-heuristics, and a large number of meta-heuristics have emerged during this period. The most common meta-heuristics for multilevel threshold image segmentation problems include: Particle Swarm Optimization (PSO) [25], Differential Evolution (DE) [26], Grey Wolf Optimizer (GWO) [27], Monarch Butterfly Optimization (MBO) [28], Whale Optimization Algorithm (WOA) [29], Multi-Verse Optimizer (MVO) [30], Harris Hawk Optimization (HHO) [31], Black Widow Optimization Algorithm (BWOA) [32], Slime Mould Algorithm (SMA) [33], Hunger Games Search (HGS) [34], RUNge Kutta Optimizer (RUN) [35], Marine Predators Algorithm (MPA) [36], Weighted Mean of Vectors (INFO) [37], and Rime Optimization Algorithm (RIME) [38], etc. In addition, the recently proposed Golden Jackal Optimization (GJO) [39] has also shown superior performance. Rezaie et al. [40] used the GJO algorithm to solve the model parameter estimation problem for fuel cells, and the experimental results showed that the proposed method outperformed other classical algorithms in optimal model estimation. Houssein et al. [41] introduced an opposition-based learning mechanism into GJO, yielding a variant named IGJO; the experimental results show that IGJO outperforms WOA, SOS, SSA, HHO, GTO, and MPA in multilevel threshold segmentation. Zhang et al. [42] proposed an enhanced GJO algorithm and used it to solve the adaptive infinite impulse response system identification problem, and the experimental results showed that the enhanced GJO was able to obtain higher computational accuracy on this problem. From the above literature, we can observe that, due to its flexibility and versatility, GJO has received much attention from scholars in various fields since it was proposed and has been used to solve various complex engineering problems. However, these studies also reveal the disadvantages of GJO. First, its convergence is slow on high-dimensional complex problems. Second, GJO is easily influenced by the initial population. Finally, like most meta-heuristics, it tends to fall into local optima.

The following is a review of some recent outstanding contributions in the field of image segmentation using meta-heuristics. Huo, Sun, and Ren [43] used an improved Bloch Quantum Artificial Bee Colony Algorithm for multilevel thresholding of images and verified the comprehensive performance of the algorithm on grayscale images. In [44], Xing used Gaussian mutation, Lévy flight, and opposition-based learning to improve the Emperor Penguin Optimization, and the improved algorithm achieved better multilevel thresholding segmentation of color images. Upadhyay et al. [45] used the Crow Search Algorithm to optimize an objective function based on Kapur's entropy and experimented with threshold numbers of 2, 4, 8, 16, and 32, obtaining superior results. Elaziz et al. [46] proposed a hybrid algorithm that combines the Whale Optimization Algorithm and the Volleyball Premier League Algorithm to determine the optimal thresholds; the experimental results show that the proposed algorithm outperforms other methods on several performance metrics such as the Peak Signal to Noise Ratio and the Structural Similarity Index. Elaziz et al. [47] also improved the Marine Predators Algorithm using quantum theory, which greatly enhances the global search capability of the algorithm and yields better results in the multilevel threshold segmentation task. Houssein et al. [12] proposed an image segmentation method based on the Black Widow Optimization Algorithm, with Kapur's entropy and the Otsu method used as the objective functions, respectively; the experimental results show that the proposed algorithm is more reliable than the classical algorithms. Meanwhile, the application of meta-heuristics to image segmentation is investigated in more depth by Houssein et al. in the following works. In [48], a local escaping operator is combined with the Tunicate Swarm Algorithm to propose a method for global optimization and image segmentation. In [49], an opposition-based learning mechanism is introduced into the Marine Predators Algorithm and the proposed algorithm is applied to the multilevel threshold segmentation problem; experimental results show that it outperforms the other compared algorithms. In [50], an effective multilevel thresholding method based on an improved Equilibrium Optimizer is proposed for the segmentation of medical images, and the experimental results show that it solves this problem effectively. In [51], the Salp Swarm Algorithm is combined with the Marine Predators Algorithm to determine the optimal thresholds for the multilevel thresholding problem. In [13], the Chimp Optimization Algorithm is enhanced with opposition-based learning and Lévy flight, and the improved algorithm outperforms other algorithms in multilevel threshold segmentation. Liu et al. [52] proposed a novel Ant Colony Optimization for COVID-19 image segmentation, and the experimental results show that the algorithm can further improve the diagnosis of COVID-19. Bhandari [53] proposed a multilevel thresholding method based on Beta Differential Evolution with two objective functions, which retrieves the best thresholds accurately and efficiently. Wu et al. [54] proposed an improved Teaching–Learning-Based Optimization Algorithm for the multilevel thresholding problem.
The experimental results show that the algorithm can segment high-resolution X-ray images well. He and Huang [55] proposed a Krill Herd Algorithm for multilevel thresholding of color images using the Otsu method, Kapur's entropy, and Tsallis entropy as objective functions; the experimental results show that the proposed algorithm is more accurate and stable with Kapur's entropy. Ren et al. [56] improved Differential Evolution (DE) and proposed an algorithm called MDE, which improves the convergence accuracy and the ability to escape local optima to some extent; segmentation experiments on breast cancer and skin cancer pathology images illustrate that the proposed method provides an efficient segmentation procedure for pathological medical images. Hosny et al. [57] combined the Coronavirus Optimization Algorithm (COVIDOA) and the Harris Hawks Optimization Algorithm (HHOA) to solve the segmentation problem; the hybrid compensates for the shortcomings of COVIDOA and HHOA to some extent, its improved performance was demonstrated on five test problems from the IEEE CEC 2019 benchmark suite, and the quality of its segmented images was better than that of the other methods in the segmentation experiments. Zhu et al. [58] proposed an improved WOA with a Lévy operator and a chaotic random mutation strategy to strengthen the algorithm's ability to jump out of local optima and explore the search space; compared with other WOA variants, the proposed method performed excellently on both benchmark test sets and image segmentation experiments. Emam et al. [59] proposed an improved RSA by integrating the Reptile Search Algorithm (RSA) and the RUN algorithm, introducing the ESQ mechanism of RUN into RSA to improve the convergence speed and the ability to escape local optima; experiments on the CEC2020 benchmark test set and brain magnetic resonance image segmentation prove that the proposed algorithm has strong optimization ability. Han et al. [60] used an improved MVO to maximize Kapur's entropy, and the experimental results prove that the proposed method is highly competitive with other meta-heuristics on benchmark functions and in image segmentation experiments. Xing et al. [61] improved WOA by introducing quasi-opposition-based learning and a Gaussian barebone mechanism, and the improved method beat other algorithms in image segmentation experiments. From the above literature, we can see that meta-heuristics have been widely used in image segmentation. However, many shortcomings remain: most meta-heuristics converge slowly and easily fall into local optima. Therefore, meta-heuristics need to be improved further so that they can handle such problems more efficiently.

In recent years, many researchers have combined machine learning methods with meta-heuristics to improve the performance of meta-heuristics. Among them, combining methods such as the Support Vector Machine (SVM) [62], Self-Organizing Maps (SOM) [63], and Reinforcement Learning (RL) [64] with meta-heuristics has achieved remarkable success and provided a new direction for the development of meta-heuristics. In addition, RL has become increasingly popular among researchers in recent studies [65]. RL enables an agent to learn autonomously by interacting with the environment, and combining it with meta-heuristics often leads to better performance. For example, Qu et al. [66] combined the Grey Wolf Optimizer with RL and achieved notable success on the UAV 3D path-planning problem. In [67], the authors used Policy Iteration (PI) and the Grey Wolf Optimizer to train Neural Networks (NNs); experimental results show that the proposed method produces better solutions. Sadeg et al. [68] proposed a Reinforcement-Learning-based Bee Swarm Optimization for feature selection; experimental results show that the proposed algorithm yields satisfactory results on large instances. Chen et al. [69] introduced a method for the flexible job-shop scheduling problem that combines RL with the Genetic Algorithm for the first time and achieved satisfactory results on this problem.

As mentioned above, meta-heuristics are now widely used in the field of image segmentation, and the combination of RL and meta-heuristics has been recognized by a wide range of researchers. However, as far as we know, there is a lack of research that combines RL with meta-heuristics to solve image segmentation. Therefore, in view of the fact that the Golden Jackal Optimization algorithm is prone to falling into local optima, in this study we propose a reinforcement-learning-based improved Golden Jackal Optimization to advance research on medical image segmentation. The main contributions of this paper are as follows:

  (1) QLGJO, an enhanced version of GJO based on reinforcement learning (Q-Learning), is proposed to advance research on the segmentation of COVID-19 CT images.

  (2) Three mutation strategies are proposed to improve the exploration performance of GJO. In addition, a new update strategy is introduced into the original algorithm to further balance exploration and exploitation.

  (3) Performance comparison experiments between QLGJO and other advanced meta-heuristics were conducted on the IEEE CEC2022 benchmark. The experimental data reveal that the performance of QLGJO is better than that of the others.

  (4) The Otsu method (maximum interclass variance method) is used as the objective function, and it is optimized using QLGJO.

  (5) Peak Signal to Noise Ratio (\(PSNR\)) [70], Structural Similarity Index (\(SSIM\)) [71], and Feature Similarity Index (\(FSIM\)) [72] are used as metrics for the segmentation experiments to verify the effectiveness of the different algorithms.

  (6) The performance of the proposed method was evaluated against six different meta-heuristics at threshold levels of 8, 12, 16, and 20. Experimental results show that the proposed method has clear advantages and can be further extended to other classes of medical imaging diagnostics.

The rest of this paper is organized as follows: Sect. 2 contains the materials and methods. Section 3 presents the QLGJO algorithm. Section 4 introduces, discusses, and analyzes the performance of the proposed method. Finally, Sect. 5 summarizes the study and provides suggestions for future work.

2 Preliminaries

In this section, we introduce the main framework of the GJO algorithm and some basic concepts of reinforcement learning. In addition, the objective functions used in this study and the COVID-19 dataset are also described.

2.1 Golden Jackal Optimization

GJO was proposed by Chopra and Ansari [39] in 2022 as a meta-heuristic based on swarm intelligence. Compared with other meta-heuristics, GJO provides a fresh strategy to solve optimization problems. The GJO is inspired by the collaborative hunting behavior of golden jackals, which hunt in pairs. Generally, the male jackal leads the female jackal in hunting, first finding and approaching the prey, then surrounding and chasing the prey, and finally hunting the prey. GJO simulates the hunting process of the golden jackal in two phases: first, the search phase, which involves finding and tracking the prey. Next is the exploitation phase, in which the golden jackal surrounds the prey and hunts it. The mathematical description of GJO will be discussed in the following subsections.

2.1.1 Initialization

As mentioned above, GJO is a meta-heuristic based on swarm intelligence. Therefore, the initialization process of GJO is similar to most meta-heuristics. Equation (1) depicts the initialization process of the GJO.

$$ \begin{gathered} \overrightarrow {{Pos_{k} }} \; = \;\overrightarrow {{LB}} \; + \;\overrightarrow {{rand}} \circ \left( {\overrightarrow {{UB}} - \overrightarrow {{LB}} } \right),\;k\; = \;1,\;2, \ldots ,\;n \hfill \\ Pos\; = \;\left[ {\overrightarrow {{Pos_{1} }} ,\;\overrightarrow {{Pos_{2} }} , \ldots ,\;\overrightarrow {{Pos_{n} }} } \right]^{T} \hfill \\ \end{gathered} $$
(1)

where \(Pos\) represents the positions of all individuals in the population, \(\overrightarrow{{Pos_{k}}}\) stands for the position of the \(k\)th individual, \(n\) is the population size, and \(\overrightarrow{LB}\) and \(\overrightarrow{UB}\) denote the lower and upper boundaries of the search space, respectively. \(\overrightarrow{rand}\) is a random vector whose elements are drawn between 0 and 1. Moreover, the symbol \(\circ\) indicates the Hadamard product [73].
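As an illustration, the following minimal NumPy sketch implements Eq. (1); the function name and the array layout (one row per individual) are our own choices and not part of the original GJO description.

```python
import numpy as np

def initialize_population(n, dim, lb, ub, seed=None):
    """Eq. (1): Pos_k = LB + rand o (UB - LB), one row per golden jackal."""
    rng = np.random.default_rng(seed)
    lb = np.asarray(lb, dtype=float)
    ub = np.asarray(ub, dtype=float)
    # rand is drawn element-wise in [0, 1); 'o' is the Hadamard (element-wise) product
    return lb + rng.random((n, dim)) * (ub - lb)

# e.g. 30 individuals searching for 8 thresholds in the grey-level range [0, 255]
pop = initialize_population(n=30, dim=8, lb=0, ub=255)
```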

2.1.2 Exploration

In the exploration phase, male golden jackals lead female golden jackals to find and track prey. The positions of the male and female golden jackals are updated by Eq. (2) and Eq. (3), respectively. The position update scheme for individual \(k\) is shown in Eq. (7).

$$ \overrightarrow{{Pos_{1}}}(t) = \overrightarrow{{Pos_{m}}}(t) - \overrightarrow{E} \circ \left| {\overrightarrow{{Pos_{m}}}(t) - \overrightarrow{RL} \circ \overrightarrow{{Pos_{k}}}(t)} \right| $$
(2)
$$ \overrightarrow{{Pos_{2}}}(t) = \overrightarrow{{Pos_{fm}}}(t) - \overrightarrow{E} \circ \left| {\overrightarrow{{Pos_{fm}}}(t) - \overrightarrow{RL} \circ \overrightarrow{{Pos_{k}}}(t)} \right| $$
(3)

where \(t\) is the current iteration and \(\overrightarrow{{Pos_{k}}}(t)\) stands for the position of the \(k\)th individual at the \(t\)th iteration. \(\overrightarrow{{Pos_{m}}}(t)\) and \(\overrightarrow{{Pos_{fm}}}(t)\) denote the male and female jackals of the current iteration, which are the best and second-best individuals of the population, respectively. \(\overrightarrow{RL}\) is a random vector based on Lévy flight [74], which is discussed in detail in [36]. \(\overrightarrow{E}\) is the evading energy, which is calculated by Eq. (4), Eq. (5), and Eq. (6).

$$ \overrightarrow{E} = E_{1} \cdot \overrightarrow{{E_{0}}} $$
(4)
$$ E_{1} = 1.5 \times (1 - t/T) $$
(5)
$$ \overrightarrow{{E_{0}}} = 2 \cdot \overrightarrow{rand} - 1 $$
(6)

where \(E_{1}\) stands for the decreasing energy of the individual, \(\overrightarrow{{E_{0}}}\) denotes the initial value of the decreasing energy, and \(T\) is the maximum number of iterations.

$$ \overrightarrow{{Pos_{k}}}(t + 1) = \frac{\overrightarrow{{Pos_{1}}}(t) + \overrightarrow{{Pos_{2}}}(t)}{2} $$
(7)

2.1.3 Exploitation

In the exploitation phase, the golden jackal attacks the prey tracked in the exploration phase, and when the evading energy of the prey decays to 0, the golden jackal will hunt the prey. The mathematical model of this phase is similar to that of the exploration phase, which is shown below.

$$ \overrightarrow{{Pos_{1}}}(t) = \overrightarrow{{Pos_{m}}}(t) - \overrightarrow{E} \circ \left| {\overrightarrow{RL} \circ \overrightarrow{{Pos_{m}}}(t) - \overrightarrow{{Pos_{k}}}(t)} \right| $$
(8)
$$ \overrightarrow{{Pos_{2}}}(t) = \overrightarrow{{Pos_{fm}}}(t) - \overrightarrow{E} \circ \left| {\overrightarrow{RL} \circ \overrightarrow{{Pos_{fm}}}(t) - \overrightarrow{{Pos_{k}}}(t)} \right| $$
(9)

Please note that all the definitions in this section have the same meanings as those in the previous section and are therefore not repeated. The pseudo-code of GJO is given in Algorithm 1.

Algorithm 1: The pseudo-code of GJO
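For reference, a minimal NumPy sketch of one GJO iteration is given below. It assumes the population is stored as an (n × dim) array and that the fitness function is to be minimized; treating the exploration/exploitation switch as a single test on the mean of \(|\overrightarrow{E}|\) is a simplification of the per-dimension behaviour, and the Lévy-flight constant is an illustrative choice.

```python
import numpy as np
from math import gamma, sin, pi

def levy(dim, beta=1.5, rng=None):
    """Levy-flight step (Mantegna's method), a stand-in for the RL vector of Eqs. (2)-(3)."""
    rng = np.random.default_rng(rng)
    sigma = (gamma(1 + beta) * sin(pi * beta / 2) /
             (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0, sigma, dim)
    v = rng.normal(0, 1, dim)
    return 0.05 * u / np.abs(v) ** (1 / beta)

def gjo_step(pop, fitness, t, T, rng=None):
    """One GJO iteration following Eqs. (2)-(9); here fitness is minimized."""
    rng = np.random.default_rng(rng)
    scores = np.array([fitness(x) for x in pop])
    order = np.argsort(scores)
    male, female = pop[order[0]], pop[order[1]]       # best and second-best jackals
    E1 = 1.5 * (1 - t / T)                            # Eq. (5)
    new_pop = np.empty_like(pop)
    for k, pos in enumerate(pop):
        E = E1 * (2 * rng.random(pos.size) - 1)       # Eqs. (4) and (6)
        RL = levy(pos.size, rng=rng)
        if np.mean(np.abs(E)) > 1:                    # exploration, Eqs. (2)-(3)
            p1 = male - E * np.abs(male - RL * pos)
            p2 = female - E * np.abs(female - RL * pos)
        else:                                         # exploitation, Eqs. (8)-(9)
            p1 = male - E * np.abs(RL * male - pos)
            p2 = female - E * np.abs(RL * female - pos)
        new_pop[k] = (p1 + p2) / 2                    # Eq. (7)
    return new_pop
```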

2.2 Reinforcement Learning

Reinforcement Learning (RL), also referred to as reactive learning, evaluative learning, or augmented learning, is one of the paradigms and methodologies of machine learning [75]. RL is a learning mechanism that learns how to map states to actions in order to maximize the obtained reward. As research continues to advance, RL can currently be divided into two main categories: policy-based methods and value-based methods. Policy-based methods do not use a value function, whereas value-based methods do. A typical representative of the value-based methods is the Q-Learning algorithm. This algorithm uses a matrix called the Q-table to record the Q-values for the different states (\(S\)). The Q-table is randomly initialized, and before each iteration the algorithm selects, among all actions (\(A\)) available in state (\(S\)), the action with the largest Q-value so as to obtain the best reward. In addition, Eq. (10) shows the Bellman equation [76], which is the update formula for the Q-table.

$$ Q_{t + 1} (s_{t} ,a_{t} ) \leftarrow Q_{t} (s_{t} ,a_{t} ) + \lambda [r_{t + 1} + \gamma \max (Q_{t} (s_{t + 1} ,a)) - Q_{t} (s_{t} ,a_{t} )] $$
(10)

where \({s}_{t}\) and \({s}_{t+1}\) represent the current state and the next state, respectively. \(\lambda \) stands for the learning rate and \(\gamma \) is the discount factor, both of which lie between 0 and 1. \({r}_{t+1}\) denotes the reward or penalty that the agent receives for the current action. \({Q}_{t}({s}_{t},{a}_{t})\) indicates the Q-value of the selected action in the current state. \(\mathrm{max}({Q}_{t}({s}_{t+1},a))\) represents the maximum Q-value over all actions in the next state. Finally, \({Q}_{t+1}({s}_{t},{a}_{t})\) is the updated Q-value. Therefore, three components are required to solve a problem with Q-Learning: the reward table (R-table), the Q-table, and the Bellman equation. In addition, the pseudo-code of Q-Learning is shown in Algorithm 2.

Algorithm 2: The pseudo-code of Q-Learning
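The following sketch shows the tabular update of Eq. (10) on the \(3\times 3\) Q-table used later in QLGJO; the state and action encodings (0 = exploration, 1 = exploitation, 2 = hybrid) are our own illustrative convention rather than part of the original definition.

```python
import numpy as np

def q_update(Q, s, a, reward, s_next, lam=0.5, gamma=0.9):
    """Bellman update of Eq. (10) for a tabular Q-learning agent."""
    td_target = reward + gamma * np.max(Q[s_next])   # r_{t+1} + gamma * max_a Q(s_{t+1}, a)
    Q[s, a] += lam * (td_target - Q[s, a])
    return Q

# A 3x3 Q-table: rows = states, columns = actions
# (0 = exploration, 1 = exploitation, 2 = hybrid).
Q = np.zeros((3, 3))
Q = q_update(Q, s=0, a=2, reward=1, s_next=2)        # rewarded move into the hybrid state
```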

2.3 Thresholding Methods

2.3.1 Otsu’s Methods

The Otsu method was proposed in 1979 [77]; it segments images by maximizing the variance between classes. In other words, the Otsu method is a nonparametric segmentation method that divides an image into different regions based on the intensity of the pixels. Assume that \(L\) is the number of pixel intensity levels of an image of size \(M\times N\).

$$ n = n_{0} + n_{1} + ... + n_{L - 1} $$
(11)
$$ Ph_{i} = \frac{{n_{i} }}{n}, \, \sum\limits_{i = 0}^{L - 1} {Ph_{i} } = 1 $$
(12)

where \(n\) indicates the total number of image pixels, \({n}_{i}\) is the number of pixels with intensity level \(i\), and the probability distribution over all intensity levels is represented by \({Ph}_{i}\).

Assume there is a threshold \(th\) between 0 and \(L-1\); then the image can be divided into two classes according to \(th\). The first class, \({C}_{1}\), contains all pixels with intensity levels in \(\left[0, th\right]\), while \({C}_{2}\) contains the remaining pixels.

$$ \omega_{1} (th) = \sum\limits_{i = 0}^{th} {Ph_{i} } , \, \omega_{2} (th) = \sum\limits_{i = th + 1}^{L - 1} {Ph_{i} } = 1 - \omega_{1} (th) $$
(13)

where \({\omega }_{1}\left(th\right)\) and \({\omega }_{2}\left(th\right)\) represent the cumulative probability distributions for \({C}_{1}\) and \({C}_{2}\), respectively.

$$ \mu_{1} (th) = \sum\limits_{i = 0}^{th} {iP(i|C_{1} )} = \sum\limits_{i = 0}^{th} {\frac{{iP(C_{1} |i)P(i)}}{{P(C_{1} )}}} = \frac{1}{{\omega_{1} (th)}}\sum\limits_{i = 0}^{th} {iPh_{i} } $$
(14)
$$ \mu_{2} (th) = \sum\limits_{i = th + 1}^{L - 1} {iP(i|C_{2} )} = \sum\limits_{i = th + 1}^{L - 1} {\frac{{iP(C_{2} |i)P(i)}}{{P(C_{2} )}}} = \frac{1}{{\omega_{2} (th)}}\sum\limits_{i = th + 1}^{L - 1} {iPh_{i} } $$
(15)
$$ \mu_{th} = \sum\limits_{i = 0}^{th} {iPh_{i} } , \, \mu_{T} = \sum\limits_{i = 0}^{L - 1} {iPh_{i} } $$
(16)

where \({\mu }_{1}\left(th\right)\) and \({\mu }_{2}\left(th\right)\) indicate the mean intensity levels of \({C}_{1}\) and \({C}_{2}\), respectively. \({\mu }_{th}\) denotes the mean intensity level from \(0\) to \(th\), and \({\mu }_{T}\) represents the mean intensity level of the whole image. From these definitions, Eq. (17) follows directly. Hence, the between-class variance to be maximized can be expressed by Eq. (18).

$$ \begin{gathered} \omega_{1} (th) + \omega_{2} (th) = 1 \hfill \\ \omega_{1} (th) \cdot \mu_{1} (th) + \omega_{2} (th) \cdot \mu_{2} (th) = \mu_{T} \hfill \\ \end{gathered} $$
(17)
$$ \begin{gathered} \sigma_{B}^{2} = \frac{{(\mu_{T} \omega_{1} (th) - \mu_{th} )^{2} }}{{\omega_{1} (th)(1 - \omega_{1} (th))}} \\ = \omega_{1} (th)(\mu_{1} (th) - \mu_{T} )^{2} + \omega_{2} (th)(\mu_{2} (th) - \mu_{T} )^{2} \\ = \omega_{1} (th)\omega_{2} (th)(\mu_{1} (th) - \mu_{2} (th))^{2} \\ \end{gathered} $$
(18)
$$ \sigma_{B}^{2} (th^{*}) = \mathop {\max }\limits_{0 \le th \le L - 1} \sigma_{B}^{2} (th) $$
(19)

Therefore, according to Eq. (18), we are able to compute a \({th}^{*}\) that maximizes \({\sigma }_{B}^{2}\), which is expressed by Eq. (19). In conclusion, the Otsu method can be regarded as a maximization problem, which means that it can be further optimized using meta-heuristics.
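As an illustration, the sketch below evaluates the Otsu objective for an arbitrary number of thresholds by summing \(\omega_{j}(\mu_{j}-\mu_{T})^{2}\) over all classes, which reduces to Eq. (18) in the bi-level case; the function and variable names are ours and the histogram is assumed to have 256 bins.

```python
import numpy as np

def otsu_objective(hist, thresholds):
    """Between-class variance for the classes defined by the sorted thresholds."""
    p = hist / hist.sum()                       # Eq. (12): probability distribution
    levels = np.arange(p.size)
    mu_T = np.sum(levels * p)                   # global mean intensity, Eq. (16)
    edges = [0] + [t + 1 for t in sorted(thresholds)] + [p.size]
    variance = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        w = p[lo:hi].sum()                      # class probability, Eq. (13)
        if w > 0:
            mu = np.sum(levels[lo:hi] * p[lo:hi]) / w   # class mean, Eqs. (14)-(15)
            variance += w * (mu - mu_T) ** 2
    return variance                             # maximized by the optimizer

# usage sketch: hist = np.bincount(ct_image.ravel(), minlength=256)
```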

2.3.2 Kapur’s Entropy

Kapur's entropy was proposed by Kapur in 1985 [78]; it segments the image based on the probability distribution of the image histogram. Again considering the bi-level thresholding problem, the objective function of Kapur's entropy is defined as shown below:

$$ \max F_{Kapur} (th) = H_{1} + H_{2} $$
(20)

where \({H}_{1}\) and \({H}_{2}\) represent the Kapur's entropy of the pixel intensities in \(\left[0, th\right]\) and \(\left[th+1, L-1\right]\), respectively, which are computed as follows:

$$ H_{1} = - \sum\limits_{i = 0}^{th} {\frac{{Ph_{i} }}{{\omega_{1} (th)}}\ln \left( {\frac{{Ph_{i} }}{{\omega_{1} (th)}}} \right)} , \, H_{2} = - \sum\limits_{i = th + 1}^{L - 1} {\frac{{Ph_{i} }}{{\omega_{2} (th)}}\ln \left( {\frac{{Ph_{i} }}{{\omega_{2} (th)}}} \right)} $$
(21)

where \(\mathrm{ln}\) is the natural logarithm, and the rest definitions in this formula have the same meanings as those mentioned in the previous section.
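A corresponding sketch of the multilevel form of Eqs. (20)–(21), summing the class entropies over all regions defined by the thresholds, is given below; again, the naming is illustrative.

```python
import numpy as np

def kapur_objective(hist, thresholds, eps=1e-12):
    """Sum of class entropies H_j for the classes defined by the sorted thresholds."""
    p = hist / hist.sum()
    edges = [0] + [t + 1 for t in sorted(thresholds)] + [p.size]
    total = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        w = p[lo:hi].sum()                      # omega_j
        if w > eps:
            q = p[lo:hi] / w                    # Ph_i / omega_j
            q = q[q > eps]
            total += -np.sum(q * np.log(q))     # class entropy H_j
    return total                                # maximized by the optimizer
```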

2.4 The COVID-19 Dataset

As mentioned in the previous section, this study was conducted to promote scientific research on COVID-19. Therefore, we evaluated the performance of the proposed algorithm using chest CT images of COVID-19. The CT images were obtained from the dataset of [79], which contains 349 CT images with clinical findings of COVID-19 from 216 patients. In this study, the proposed algorithm and the other comparison algorithms were evaluated on 12 randomly selected images from this dataset in order to test and compare the performance of each algorithm.
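A possible way to reproduce this selection step is sketched below; the directory layout, file format, and the use of OpenCV are assumptions about how the public dataset [79] is stored locally, not part of the original experimental protocol.

```python
import random
from pathlib import Path

import cv2  # opencv-python, assumed to be installed
import numpy as np

# Hypothetical layout: the CT dataset of [79] extracted into ./COVID-CT/
image_paths = sorted(Path("COVID-CT").glob("*.png"))
selected = random.sample(image_paths, 12)             # 12 randomly selected test images

for path in selected:
    img = cv2.imread(str(path), cv2.IMREAD_GRAYSCALE)
    hist = np.bincount(img.ravel(), minlength=256)     # grey-level histogram (cf. Figs. 3-5)
```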

3 The Proposed Algorithm

Global exploration capability and local exploitation capability intrinsically affect the performance of meta-heuristics. However, the two capabilities conflict with each other for the majority of meta-heuristics: a meta-heuristic cannot perform local exploitation while performing global exploration. Therefore, how to balance exploration and exploitation has become a key factor in improving the performance of meta-heuristics. At this stage, meta-heuristics usually use a function of the current and maximum number of iterations to control exploration and exploitation. For example, the evading energy (\(\overrightarrow{E}\)) in GJO is used to control exploration and exploitation: when \(\left| \overrightarrow{E} \right| > 1\), the algorithm performs exploration, and otherwise it performs exploitation. Hence, the search process of GJO is unified, which means that all individuals switch from global search to local exploitation at the same time. This behavior implies a strong possibility that the algorithm will be trapped in a local optimum. Therefore, an independent search process for each individual can improve performance more effectively, and reinforcement learning can conveniently achieve this goal.

3.1 The Q-Learning Strategy

In this study, we consider each individual in the population as an agent of RL, while the search space is regarded as the environment; the state (\(s\)) represents the current position-update scheme of the individual, and the action (\(a\)) stands for the change of state (\(s\)). Each individual has three operations for updating its own position: exploration, exploitation, and hybrid mode. Individuals adaptively choose the update strategy based on their learning experience. If the fitness of an individual improves after the update operation, it receives positive feedback; otherwise, it receives a penalty. The Q-table is designed as a \(3\times 3\) matrix, where the rows and columns denote states and actions, respectively. In addition, it is important to note that each individual has a separate Q-table in order to ensure the independence of the learning process. Figure 1 illustrates the update process of an individual in Q-Learning mode. The individual is currently in the "Exploration" state, and by comparing the action feedback of the "Exploration" state, we can anticipate that the individual will receive the greatest reward when the next state is "Hybrid". Therefore, the individual will switch to the "Hybrid" state. In addition, the values of the Q-table are updated by Eq. (10).

Fig. 1 The update process of the Q-table

From the analysis of Fig. 1 combined with Eq. (10), we can see that the learning rate \(\lambda \) strongly influences the Q-Learning algorithm. A higher learning rate causes individuals to forget the experience they have already gained, whereas a lower learning rate prevents individuals from learning from the environment to change their behavior. Therefore, the learning rate should be dynamically adjusted from a higher value to a lower value during the iterations, which can effectively increase the learning ability of the individual. In this study, the adjustment formula for the learning rate is as follows:

$$ \lambda = \frac{{\lambda_{initial} + \lambda_{final} }}{2} - \frac{{\lambda_{initial} - \lambda_{final} }}{2}\cos \left( {\pi \left( {1 - \frac{t}{T}} \right)} \right) $$
(22)

where \({\lambda }_{initial}\) and \({\lambda }_{final}\) stand for the initial and final values of \(\lambda \), which have been set to 0.9 and 0.1, respectively. \(t\) and \(T\) represent the current and maximum iteration numbers, respectively. Moreover, the reward parameter \(r\) is determined by the fitness: it is set to 1 if the fitness improves and to -1 otherwise.
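The schedule of Eq. (22) and the reward rule can be written compactly as follows; the `maximize` flag is an assumption we add so the same helper covers both minimization (CEC2022) and maximization (Otsu) settings.

```python
from math import cos, pi

def learning_rate(t, T, lam_initial=0.9, lam_final=0.1):
    """Cosine schedule of Eq. (22): decays from lam_initial at t=0 to lam_final at t=T."""
    return ((lam_initial + lam_final) / 2
            - (lam_initial - lam_final) / 2 * cos(pi * (1 - t / T)))

def reward(old_fitness, new_fitness, maximize=True):
    """+1 if the fitness improved after the chosen action, otherwise -1."""
    improved = new_fitness > old_fitness if maximize else new_fitness < old_fitness
    return 1 if improved else -1
```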

3.2 The New Update Mode and Mutation Strategy

In the original GJO algorithm, the exploration operation is performed at the beginning of the iterations, as described by Eq. (2) and Eq. (3). The exploitation operation, on the other hand, is performed after the exploration operation, as described by Eq. (8) and Eq. (9). In this paper, RL is introduced to coordinate the selection between exploration and exploitation while inheriting the original exploration and exploitation mechanisms. However, in order to further balance exploration and exploitation, this study proposes a new mode, called the hybrid mode. In the hybrid mode, the population is divided into two classes: one class keeps exploring while the other class starts exploiting. In addition, to further enhance the diversity of the population in the hybrid mode, a mutation mechanism is designed to update the positions of individuals. The specific update strategy of the hybrid mode is described by Eq. (23), Eq. (24), and Eq. (25).

$$ \overrightarrow{{Pos_{1}}}(t) = \begin{cases} \overrightarrow{{Pos_{m}}}(t) - \overrightarrow{E} \circ \left| {\overrightarrow{{Pos_{m}}}(t) - \overrightarrow{RL} \circ \overrightarrow{{Pos_{k}}}(t)} \right|, & r < 0.5 \\ \overrightarrow{{Pos_{m}}}(t) - \overrightarrow{E} \circ \left| {\overrightarrow{RL} \circ \overrightarrow{{Pos_{m}}}(t) - \overrightarrow{{Pos_{k}}}(t)} \right|, & r \ge 0.5 \end{cases} $$
(23)
$$ \overrightarrow{{Pos_{2}}}(t) = \begin{cases} \overrightarrow{{Pos_{fm}}}(t) - \overrightarrow{E} \circ \left| {\overrightarrow{{Pos_{fm}}}(t) - \overrightarrow{RL} \circ \overrightarrow{{Pos_{k}}}(t)} \right|, & r < 0.5 \\ \overrightarrow{{Pos_{fm}}}(t) - \overrightarrow{E} \circ \left| {\overrightarrow{RL} \circ \overrightarrow{{Pos_{fm}}}(t) - \overrightarrow{{Pos_{k}}}(t)} \right|, & r \ge 0.5 \end{cases} $$
(24)
$$ \begin{gathered} \overrightarrow{{Pos_{\alpha}}}(t) = \frac{\overrightarrow{{Pos_{1}}}(t) + \overrightarrow{{Pos_{2}}}(t)}{2} \hfill \\ \overrightarrow{{Pos_{\beta}}}(t) = \overrightarrow{{Pos_{r1}}}(t) + \overrightarrow{rand} \circ \left( \overrightarrow{{Pos_{r2}}}(t) - \overrightarrow{{Pos_{r3}}}(t) \right) \hfill \\ \overrightarrow{{Pos_{k}}}(t + 1) = \begin{cases} \overrightarrow{{Pos_{\alpha}}}(t), & \text{if } \mathrm{fitness}(\overrightarrow{{Pos_{\alpha}}}(t)) \text{ is better than } \mathrm{fitness}(\overrightarrow{{Pos_{\beta}}}(t)) \\ \overrightarrow{{Pos_{\beta}}}(t), & \text{otherwise} \end{cases} \hfill \\ \end{gathered} $$
(25)

where \(r\) is a random number between 0 and 1. \(\overrightarrow{{Pos_{\alpha}}}(t)\) represents the updated position of the \(k\)th individual at the \(t\)th iteration, and \(\overrightarrow{{Pos_{\beta}}}(t)\) denotes its mutated position. \(\overrightarrow{rand}\) stands for a random vector between 0 and 1. \(\overrightarrow{{Pos_{r1}}}(t)\), \(\overrightarrow{{Pos_{r2}}}(t)\), and \(\overrightarrow{{Pos_{r3}}}(t)\) are three randomly selected individuals.

For the exploration and exploitation strategies, we present two further mutation strategies to improve the performance of the algorithm. Equation (26) is applied after exploration to assist the population in exploring more of the search space. On the other hand, Eq. (27) is used to mutate individuals after exploitation to improve the diversity of the population and prevent individuals from being trapped in local optima.

$$ \begin{gathered} \overrightarrow{{Pos_{\alpha}}}(t) = \frac{\overrightarrow{{Pos_{1}}}(t) + \overrightarrow{{Pos_{2}}}(t)}{2} \hfill \\ \overrightarrow{{Pos_{\beta}}}(t) = \overrightarrow{{Pos_{r1}}}(t) + \overrightarrow{rand} \circ \left( \overrightarrow{{Pos_{r2}}}(t) - \overrightarrow{{Pos_{r3}}}(t) \right) + \overrightarrow{rand} \circ \left( \overrightarrow{{Pos_{r4}}}(t) - \overrightarrow{{Pos_{r5}}}(t) \right) \hfill \\ \overrightarrow{{Pos_{k}}}(t + 1) = \begin{cases} \overrightarrow{{Pos_{\alpha}}}(t), & \text{if } \mathrm{fitness}(\overrightarrow{{Pos_{\alpha}}}(t)) \text{ is better than } \mathrm{fitness}(\overrightarrow{{Pos_{\beta}}}(t)) \\ \overrightarrow{{Pos_{\beta}}}(t), & \text{otherwise} \end{cases} \hfill \\ \end{gathered} $$
(26)

where \(\overrightarrow{{Pos_{r4}}}(t)\) and \(\overrightarrow{{Pos_{r5}}}(t)\) are two additional randomly selected individuals. The meanings of the remaining variables are consistent with those in Eq. (25).

$$ \begin{gathered} \overrightarrow{{Pos_{\alpha}}}(t) = \frac{\overrightarrow{{Pos_{1}}}(t) + \overrightarrow{{Pos_{2}}}(t)}{2} \hfill \\ \overrightarrow{{Pos_{\beta}}}(t) = \overrightarrow{{Pos_{k}}}(t) + \overrightarrow{rand} \circ \left( \overrightarrow{{Pos_{r2}}}(t) - \overrightarrow{{Pos_{r3}}}(t) \right) \hfill \\ \overrightarrow{{Pos_{k}}}(t + 1) = \begin{cases} \overrightarrow{{Pos_{\alpha}}}(t), & \text{if } \mathrm{fitness}(\overrightarrow{{Pos_{\alpha}}}(t)) \text{ is better than } \mathrm{fitness}(\overrightarrow{{Pos_{\beta}}}(t)) \\ \overrightarrow{{Pos_{\beta}}}(t), & \text{otherwise} \end{cases} \hfill \\ \end{gathered} $$
(27)

It is worth noting that Eq. (27) is similar to Eq. (25), and all variables have the same meanings. However, the variable \(\overrightarrow{{Pos_{k}}}(t)\) is used instead of \(\overrightarrow{{Pos_{r1}}}(t)\). This makes the mutation of an individual depend more on its current position, which allows the population to increase its diversity while maintaining the original convergence behavior. Furthermore, the pseudo-code and flowchart of the proposed method are given in Algorithm 3 and Fig. 2, respectively.

Algorithm 3: The pseudo-code of QLGJO
Fig. 2 Flowchart of the proposed method
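The greedy mutation-and-selection step shared by Eqs. (25)–(27) can be sketched as follows; the mode labels, the assumption that the fitness is being maximized, and the way the random partners are drawn are illustrative choices rather than the exact implementation.

```python
import numpy as np

def mutate_and_select(pos_1, pos_2, pop, k, fitness, mode, rng=None):
    """Keep the better of the averaged position Pos_alpha and a DE-style mutant Pos_beta."""
    rng = np.random.default_rng(rng)
    pos_alpha = (pos_1 + pos_2) / 2                       # Eq. (7)-style average
    r = rng.choice(len(pop), size=5, replace=False)       # random partners r1..r5
    rand = rng.random(pos_alpha.size)
    if mode == "hybrid":                                  # Eq. (25)
        pos_beta = pop[r[0]] + rand * (pop[r[1]] - pop[r[2]])
    elif mode == "exploration":                           # Eq. (26), two difference terms
        pos_beta = (pop[r[0]] + rand * (pop[r[1]] - pop[r[2]])
                    + rng.random(pos_alpha.size) * (pop[r[3]] - pop[r[4]]))
    else:                                                 # Eq. (27), anchored on Pos_k
        pos_beta = pop[k] + rand * (pop[r[1]] - pop[r[2]])
    # greedy selection: assumes a maximization objective such as the Otsu criterion
    return pos_alpha if fitness(pos_alpha) >= fitness(pos_beta) else pos_beta
```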

3.3 Computational Complexity

3.3.1 Time Complexity

Further analysis of the previous subsections reveals that the time complexity of the QLGJO algorithm is mainly influenced by three components: the calculation of the fitness, the update of the golden jackal population together with the mutation of individuals, and the update of the reinforcement learning component (the Q-table). The calculation of the fitness costs \(O\left(T\times N\times {O}_{Otsu}\right)\) time, where \(T\) indicates the maximum number of iterations, \(N\) denotes the size of the population, and \({O}_{Otsu}\) is the cost of the Otsu method. The population needs \(O(T\times N\times M)\) time to be updated, where \(M\) represents the dimension of the objective function, which is generally the same as the number of thresholds in the Otsu method. In addition, the mutation strategy also costs \(O(T\times N\times M)\) time. Finally, the Q-table takes \(O(T\times N)\) time to be updated. Therefore, the total time complexity of QLGJO is \(O(T\times N\times ({O}_{Otsu}+M))\) after simplification.

3.3.2 Space Complexity

According to Algorithm 3, the proposed algorithm additionally maintains a Q-table for each individual. The space complexity of the population is \(O(M\times N)\) and the space complexity of the Q-tables is \(O\left(9\times N\right)\to O(N)\). Therefore, the space complexity of the algorithm is the sum of the space occupied by the population and the space required by the Q-tables, which simplifies to \(O(M\times N)\).

4 Experimental Results and Analysis

In this section, we demonstrate the performance of the proposed algorithm through two different sets of experiments. First, we use the IEEE CEC2022 benchmark functions to test the performance of the proposed method. In this experiment, we selected six advanced meta-heuristics for comparison: the original algorithm of the proposed method, GJO; the first medical image segmentation variant of GJO, IGJO; one of the latest meta-heuristics, INFO; a commonly used meta-heuristic of recent years, MVO; the best classical algorithm, DE; and the most popular algorithm, PSO. We then further evaluate the practical performance of QLGJO by presenting the experimental results of the COVID-19 image segmentation. As mentioned in Sect. 2, we use 12 randomly chosen images from Yang et al. [79] as experimental data to compare the performance of the proposed algorithm with the other meta-heuristics. The twelve images are named Patient 3, Patient 4, Patient 5, Patient 6, Patient 7, Patient 9, Patient 13, Patient 24, Patient 30, Patient 37, Patient 80, and Patient 121. Figures 3, 4, and 5 show the test images and their histogram information. In the comparison experiments, the algorithms involved in the comparison were kept the same as in the previous experiment, and the Otsu method was used as the objective function for the segmentation. Moreover, the Peak Signal to Noise Ratio (\(PSNR\)) [70], Structural Similarity Index (\(SSIM\)) [71], and Feature Similarity Index (\(FSIM\)) [72] were used as evaluation metrics to assess the performance of all algorithms.

Fig. 3 COVID-19 CT test images and their histograms

Fig. 4 COVID-19 CT test images and their histograms

Fig. 5 COVID-19 CT test images and their histograms

Fig. 6 The segmented result for Patient 3

4.1 Environment Settings

To ensure the fairness of the experiments, all algorithms were run 21 times independently in the same environment. For the IEEE CEC2022 benchmark functions, the number of iterations was set to 5000 and the population size was fixed at 120. For the segmentation experiments, the number of iterations was set to 200 and the population size was fixed at 60. The parameters of all algorithms were kept at their default values to ensure that they were in a relatively optimal state, following the suggestion of Arcuri et al. [80]. In addition, the specific configuration of the running environment is given in Table 1, and the parameter settings of each algorithm are listed in Table 2.

Table 1 Runtime Environment
Table 2 The parameter settings of all algorithms

4.2 Experiment on IEEE CEC2022

The CEC2022 benchmark suite is the most recent test function set and contains 12 different functions, of which F1 is a unimodal function, F2-F5 are multimodal functions, F6-F8 are hybrid functions, and F9-F12 are composition functions. Table 3 gives the details of the CEC2022 benchmark functions. In this experiment, we analyze the performance of the proposed method through quantitative and qualitative indicators. The quantitative indicators include the mean, median, and standard deviation (std) [81] obtained on each benchmark function, while the qualitative analysis is based on boxplots and convergence curves. In addition, the comprehensive performance of all algorithms was ranked by the Friedman mean rank test [82].

Table 3 The CEC2022 benchmark test function

Table 4 gives a general overview of the performance of each algorithm on the CEC2022 benchmark functions; the optimal values are highlighted in the table. From the data in this table, we can see that, in terms of accuracy, QLGJO achieves an impressive improvement on all functions compared to the original GJO, IGJO, and PSO. Compared to the latest meta-heuristic INFO, QLGJO achieves a clear advantage on F2, F4, F5, F6, F7, F8, F10, F11, and F12, and shows similar performance on F1, F3, and F9. Compared to the commonly used newer meta-heuristic MVO, QLGJO also succeeds on F2, F3, F4, F6, F7, F8, F9, and F11, is surpassed by MVO only on F10, and shows similar performance on F1, F5, and F12. Finally, compared with the best classical algorithm DE, QLGJO shows some advantages on F2, F4, F6, and F12 and is comparable to DE on the rest of the benchmark functions. On the other hand, as far as running time is concerned, the proposed method is somewhat lacking and fails to obtain the best score on any of the benchmark functions; however, compared to INFO, the proposed method still has some advantages. Finally, according to the Friedman mean rank test, we can further conclude that QLGJO ranks first in overall performance among all the compared algorithms, followed by DE; MVO ranks third, while INFO, IGJO, GJO, and PSO rank fourth, fifth, sixth, and seventh, respectively.

Table 4 The experimental results of all algorithms on the CEC2022 benchmark functions

Figure 7 shows the boxplots of all algorithms on the CEC2022 benchmark functions. Through the boxplots, we can intuitively grasp the distribution of the results of each algorithm over multiple runs: the maximum value is marked at the highest point and the minimum value at the lowest point. Therefore, in general, the more stable the performance of an algorithm, the shorter its box. By analyzing Fig. 7, we can see that the proposed method has the shortest boxes overall. Therefore, we can tentatively conclude that QLGJO has the most stable performance among all the compared algorithms.

Fig. 7 The boxplot for CEC2022

Figure 8 records the convergence curves of each algorithm. It should be pointed out that, in order to compare the accuracy of the algorithms more intuitively, the convergence curves of all algorithms plot the difference between the obtained value and the optimal value of the benchmark function. Therefore, the closer a curve is to 0, the higher the accuracy of the algorithm. In Fig. 8, the proposed algorithm reaches the lowest point on all functions except F7 and F10. In other words, in general terms, QLGJO has the highest accuracy among all the compared algorithms. However, it should also be noted that although the convergence speed of QLGJO is substantially improved compared to the original algorithm, it still needs further enhancement compared to INFO and DE. In summary, the experimental analysis in this subsection gives a preliminary impression of the performance of QLGJO, and the proposed method is extremely competitive in numerical experiments compared with existing advanced algorithms. Therefore, in the next subsection, we will further validate the effectiveness of the proposed method on a real-world problem through the COVID-19 image segmentation experiments.

Fig. 8
figure 8

The convergence curves for CEC2022

4.3 COVID-19 CT Image Segmentation Experiment

4.3.1 Performance Metrics

As mentioned above, three evaluation metrics, \(PSNR\), \(SSIM\), and \(FSIM\), are used in this experiment to evaluate the performance of the proposed algorithm. This subsection gives a brief review of these three metrics.

\(PSNR\) is a common metric in digital image processing and a key performance measure in multilevel threshold segmentation [83]. It quantifies the difference between the original image and the segmented image and is calculated using Eq. (28).

$$ PSNR = 20\log_{10} \left( {\frac{255}{{RMSE}}} \right) $$
(28)
$$ RMSE = \sqrt {\frac{{\sum\limits_{i = 1}^{M} {\sum\limits_{j = 1}^{N} {(Img_{org} (i,j) - Img_{seg} (i,j))^{2} } } }}{M \times N}} $$
(29)

where \(RMSE\) is the root mean square error calculated by Eq. (29) [84], \({Img}_{org}\) and \({Img}_{seg}\) represent the original image and the segmented image, respectively, and \(M\) and \(N\) denote the size of the image. The closer the value of \(PSNR\) is to \(0\), the greater the difference between the two images.
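As a concrete reference, a minimal NumPy implementation of Eqs. (28)-(29) might look as follows; the function name and the assumption of 8-bit grayscale inputs are ours, not taken from the paper.

```python
import numpy as np

def psnr(img_org: np.ndarray, img_seg: np.ndarray) -> float:
    """PSNR between the original and segmented image, following Eqs. (28)-(29).

    Both inputs are assumed to be 8-bit grayscale images of the same size.
    """
    diff = img_org.astype(np.float64) - img_seg.astype(np.float64)
    rmse = np.sqrt(np.mean(diff ** 2))      # Eq. (29)
    if rmse == 0:                           # identical images
        return float("inf")
    return 20.0 * np.log10(255.0 / rmse)    # Eq. (28)
```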

The structural similarity index (\(SSIM\)) is another common metric used to compare the similarity of two images [85]. \(SSIM\) lies between 0 and 1 and, as with \(PSNR\), a larger value indicates a smaller difference between the two images: the greater the difference, the closer \(SSIM\) is to 0. It is calculated as follows:

$$ SSIM(Img_{org} ,Img_{seg} ) = \frac{{(2\mu_{org} \mu_{seg} + c_{1} )(2\sigma_{org,seg} + c_{2} )}}{{(\mu_{org}^{2} + \mu_{seg}^{2} + c_{1} )(\sigma_{org}^{2} + \sigma_{seg}^{2} + c_{2} )}} $$
(30)

where \({\mu }_{org}\) and \({\mu }_{seg}\) denote the mean intensities of the original and segmented images, respectively, and \({\sigma }_{org}\) and \({\sigma }_{seg}\) represent the standard deviations of \({Img}_{org}\) and \({Img}_{seg}\), respectively. The covariance of \({Img}_{org}\) and \({Img}_{seg}\) is denoted by \({\sigma }_{org,seg}\). Finally, \({c}_{1}\) and \({c}_{2}\) are two constants.
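A minimal sketch of Eq. (30) using global image statistics is given below. Note that common library implementations (e.g. scikit-image) compute SSIM over a sliding window and average the local results; the default values of c1 and c2 follow the usual convention and are our assumptions, not values stated in the paper.

```python
import numpy as np

def ssim_global(img_org: np.ndarray, img_seg: np.ndarray,
                c1: float = (0.01 * 255) ** 2,
                c2: float = (0.03 * 255) ** 2) -> float:
    """Global SSIM as written in Eq. (30), computed over the whole image."""
    x = img_org.astype(np.float64).ravel()
    y = img_seg.astype(np.float64).ravel()
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()                  # sigma^2 terms
    cov_xy = np.mean((x - mu_x) * (y - mu_y))        # sigma_{org,seg}
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
```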

The feature similarity index (\(FSIM\)) is a newer metric used to compare the feature-level difference between two images [86]. \(FSIM\) combines Phase Congruency (\(PC\)) [87] and Gradient Magnitude (\(GM\)) [88]; a higher \(FSIM\) value indicates a better thresholding result. It can be described as follows:

$$ FSIM = \frac{{\sum\limits_{x \in \Omega } {S_{L} (x)PC_{m} (x)} }}{{\sum\limits_{x \in \Omega } {PC_{m} (x)} }} $$
(31)
$$ S_{L} (x) = [S_{PC} (x)]^{\alpha } [S_{G} (x)]^{\beta } $$
(32)
$$ S_{PC} (x) = \frac{{2PC_{org} (x)PC_{seg} (x) + T_{1} }}{{PC_{org}^{2} (x) + PC_{seg}^{2} (x) + T_{1} }} $$
(33)
$$ S_{G} (x) = \frac{{2G_{org} (x)G_{seg} (x) + T_{2} }}{{G_{org}^{2} (x) + G_{seg}^{2} (x) + T_{2} }} $$
(34)

where the phase congruency maps of \({Img}_{org}\) and \({Img}_{seg}\) are denoted by \(P{C}_{org}\) and \(P{C}_{seg}\), and the gradient magnitudes of \({Img}_{org}\) and \({Img}_{seg}\) are represented by \({G}_{org}\) and \({G}_{seg}\), respectively. \(PC_{m}(x) = \max (PC_{org}(x),PC_{seg}(x))\) acts as a weighting term, \({T}_{1}\) and \({T}_{2}\) are two positive constants, and \(\alpha\) and \(\beta\) are two weighting constants.
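The sketch below combines Eqs. (31)-(34), assuming the phase-congruency maps have already been computed by an external routine; the choice of the Sobel operator for the gradient, the default constants, and the function name are our assumptions.

```python
import numpy as np
from scipy import ndimage

def fsim_from_pc(img_org, img_seg, pc_org, pc_seg,
                 t1: float = 0.85, t2: float = 160.0,
                 alpha: float = 1.0, beta: float = 1.0) -> float:
    """FSIM built from precomputed phase-congruency maps, following Eqs. (31)-(34)."""
    def gradient_magnitude(img):
        img = img.astype(np.float64)
        gx = ndimage.sobel(img, axis=1)
        gy = ndimage.sobel(img, axis=0)
        return np.hypot(gx, gy)

    g_org, g_seg = gradient_magnitude(img_org), gradient_magnitude(img_seg)

    s_pc = (2 * pc_org * pc_seg + t1) / (pc_org ** 2 + pc_seg ** 2 + t1)   # Eq. (33)
    s_g = (2 * g_org * g_seg + t2) / (g_org ** 2 + g_seg ** 2 + t2)        # Eq. (34)
    s_l = (s_pc ** alpha) * (s_g ** beta)                                  # Eq. (32)

    pc_m = np.maximum(pc_org, pc_seg)                                      # weighting term
    return float(np.sum(s_l * pc_m) / np.sum(pc_m))                        # Eq. (31)
```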

4.3.2 Experimental Results and Analysis

In this subsection, we discuss and analyze the results of the proposed algorithm for multilevel thresholding segmentation of COVID-19 CT images. The Otsu method described in Sect. 2.3 is used as the objective function, and the selected images are segmented with threshold levels of 8, 12, 16, and 20. Figures 4, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, and 19 show the segmented images produced by the proposed algorithm for all tested images at the different threshold levels. Tables 5, 6, and 7 show the mean and std of all algorithms on the \(PSNR\), \(SSIM\), and \(FSIM\) metrics, respectively. In addition, the Friedman mean rank test is again used to rank the comprehensive performance of all algorithms. Note that an accurate and effective multilevel thresholding technique should have high mean values of \(PSNR\), \(SSIM\), and \(FSIM\) and an std as low as possible; therefore, the maximum mean values and the minimum std values are highlighted in the tables.
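For reference, a generic multilevel Otsu objective (the between-class variance the optimizer maximizes) can be written from the image histogram as in the sketch below; this is a sketch of such a fitness function under our own naming conventions, not the authors' exact implementation.

```python
import numpy as np

def otsu_objective(gray_img: np.ndarray, thresholds) -> float:
    """Between-class variance for a set of thresholds (to be maximized)."""
    hist, _ = np.histogram(gray_img, bins=256, range=(0, 256))
    prob = hist / hist.sum()
    levels = np.arange(256)
    mu_total = np.sum(levels * prob)                  # overall mean intensity

    # Class boundaries: [0, t1), [t1, t2), ..., [tk, 256)
    ths = np.concatenate(([0], np.sort(np.asarray(thresholds)).astype(int), [256]))
    variance = 0.0
    for lo, hi in zip(ths[:-1], ths[1:]):
        w = prob[lo:hi].sum()                          # class probability
        if w <= 0:
            continue
        mu = np.sum(levels[lo:hi] * prob[lo:hi]) / w   # class mean
        variance += w * (mu - mu_total) ** 2
    return variance
```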

Fig. 9
figure 9

The Segmented result for Patient 4

Fig. 10
figure 10

The Segmented result for Patient 5

Fig. 11
figure 11

The Segmented result for Patient 6

Fig. 12
figure 12

The Segmented result for Patient 7

Fig. 13
figure 13

The Segmented result for Patient 9

Fig. 14
figure 14

The Segmented result for Patient 13

Fig. 15
figure 15

The Segmented result for Patient 24

Fig. 16
figure 16

The Segmented result for Patient 30

Fig. 17
figure 17

The Segmented result for Patient 37

Fig. 18
figure 18

The Segmented result for Patient 80

Fig. 19
figure 19

The Segmented result for Patient 121

Table 5 Comparison of \(PSNR\)
Table 6 Comparison of \(SSIM\)
Table 7 Comparison of \(FSIM\)
Fig. 20
figure 20

The average time slots achieved by the different algorithms for each image segmentation experiment

Fig. 21
figure 21

Comparison of radar plots for GJO, IGJO, and QLGJO

Table 5 shows the \(PSNR\) values for each segmented image. The proposed method achieves the best results except in the Patient 37 experiment with a threshold level of 8, where it is surpassed by INFO. According to the Friedman mean rank test, the algorithms rank as follows: QLGJO first, MVO second, INFO third, DE fourth, IGJO fifth, and GJO and PSO sixth and seventh, respectively.

Table 6 records the \(SSIM\) values of the segmentation results. In the Patient 30 experiment with a threshold level of 8, DE achieves the best result and the proposed algorithm is also slightly inferior to MVO. In the Patient 37 experiment with a threshold level of 8, DE again achieves the best result but only slightly outperforms the proposed algorithm. According to the Friedman mean rank test, the top three algorithms are QLGJO, MVO, and IGJO.

Table 7 lists the mean \(FSIM\) values of all segmented images. The original GJO, IGJO, and MVO do not obtain satisfactory results, while INFO, DE, and PSO show a clear improvement on the \(FSIM\) metric. In contrast, the proposed QLGJO outperforms the other algorithms in the majority of the experiments and again ranks first according to the Friedman mean rank test.

Furthermore, Fig. 20 shows the average time slots (i.e., the share of total runtime) taken by the different algorithms in each image segmentation experiment. Each row in the figure represents the percentage of time consumed by each algorithm in the same segmentation experiment. The share of runtime occupied by the proposed algorithm does not vary significantly across the experiments, which indicates that the overhead of QLGJO is not tied to a specific image or threshold level. It should also be noted that the proposed algorithm has a slightly higher time cost than the original algorithm, owing to the additional cost of introducing reinforcement learning. Overall, the proposed algorithm does not perform particularly well in terms of time complexity, but it completes the specified task in a reasonable amount of time.

In addition, Table 8 presents the results of the Wilcoxon rank-sum test applied to the fitness values obtained with the Otsu method. The Wilcoxon rank-sum test is used to verify whether there is a significant difference between two algorithms: when the p-value is less than 0.05, the difference is considered significant, i.e., the proposed algorithm offers a significant improvement; otherwise, its performance is similar to or worse than that of the compared algorithm. To present the results more clearly, the symbols “+” and “\(-\)” denote p-values less than 0.05 and greater than 0.05, respectively. From Table 8, we observe that the fitness values of the proposed algorithm are significantly different from those of the original GJO, IGJO, INFO, DE, and PSO. Although QLGJO does not completely beat MVO, it shows no significant improvement over it only for Patient 7 with a threshold level of 20 and Patient 80 with threshold levels of 16 and 20. Therefore, based on the Wilcoxon rank-sum test, we can conclude that the proposed QLGJO algorithm performs better in multilevel threshold segmentation using the Otsu method.
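For illustration, the per-setting significance test could be run with SciPy's rank-sum implementation as sketched below; the fitness samples here are placeholder data, not results from the paper.

```python
import numpy as np
from scipy.stats import ranksums

rng = np.random.default_rng(0)
# Illustrative fitness samples from 30 independent runs of two algorithms
# on one image/threshold setting (placeholder data).
fitness_qlgjo = rng.normal(loc=2650.0, scale=0.5, size=30)
fitness_other = rng.normal(loc=2648.5, scale=1.0, size=30)

stat, p_value = ranksums(fitness_qlgjo, fitness_other)
mark = "+" if p_value < 0.05 else "-"   # "+": significant difference, "-": not significant
print(f"p = {p_value:.4g} -> {mark}")
```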

Table 8 Comparison of the Wilcoxon signed-rank test for Otsu method

Further inspection of the data in Tables 5, 6, and 7 shows that, although QLGJO obtains the best mean values, it still has some deficiencies in terms of std: the proposed method does not obtain an std close to 0. We therefore compare it further with the original GJO and IGJO to analyze the reasons for this relatively poor std performance. To examine the performance differences between the different versions of the GJO algorithm, radar plots [89] of the original GJO, IGJO, and the proposed QLGJO are drawn separately in Fig. 21. Note that the Otsu method used in this paper is a maximization problem, so the closer an algorithm's curve is to the outer layer, the stronger its performance. From the plots, the original GJO performs better only on Patient 3 and Patient 9. IGJO improves on the original GJO by mitigating its tendency to fall into local optima in most experiments. However, these improvements remain limited compared with QLGJO, whose mechanisms subsume those of both earlier algorithms, indicating that the performance of the proposed algorithm surpasses the other GJO variants.

The radar plots of the three GJO versions do not reveal a direct reason for the reduced stability of QLGJO, so we further analyze the population diversity of the different GJO versions. Figures 22–69 show the convergence plots of the population diversity analysis, in which the abscissa indicates the number of iterations and the ordinate represents the mean Euclidean distance between population individuals. These figures objectively reflect the distribution of the population's individuals and report the diversity characteristics of the population. They show that the reinforcement learning strategy helps the population retain its diversity regardless of the problem. Combined with reinforcement learning, the proposed QLGJO converges to a stable state earlier than both the original GJO and the improved GJO, and its diversity curve roughly follows the distribution \(f\left( t \right) = 1/t\). It is also worth noting that the populations of both the original GJO and IGJO eventually converge and cluster together, preventing them from escaping local optima, whereas the population of QLGJO still maintains a certain degree of diversity, which gives the algorithm the potential to step away from local optima and further improve solution quality in the late iterations. By analyzing the proposed method from these perspectives, we conclude that the factor affecting the stability of QLGJO is that the RL strategy leads to a certain oscillation effect in the later iterations. Overall, combining the results on the CEC2022 benchmark functions and the COVID-19 image segmentation experiments, the proposed method obtains satisfactory results in terms of both convergence accuracy and convergence speed. Consequently, QLGJO can be considered one of the most competitive GJO variants at present.
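The diversity measure plotted in these figures, the mean pairwise Euclidean distance of the population, can be computed as in the following sketch; the function name and the usage comment are ours.

```python
import numpy as np
from scipy.spatial.distance import pdist

def mean_pairwise_distance(population: np.ndarray) -> float:
    """Population diversity: mean Euclidean distance between all pairs of
    individuals, where each row of `population` is one candidate solution."""
    return float(pdist(population, metric="euclidean").mean())

# Typically recorded once per iteration inside the optimizer's main loop, e.g.:
# diversity_curve.append(mean_pairwise_distance(positions))
```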

Fig. 22
figure 22

Patient 3 with \(th\) 8

Fig. 23
figure 23

Patient 3 with \(th\) 12

Fig. 24
figure 24

Patient 3 with \(th\) 16

Fig. 25
figure 25

Patient 3 with \(th\) 20

Fig. 26
figure 26

Patient 4 with \(th\) 8

Fig. 27
figure 27

Patient 4 with \(th\) 12

Fig. 28
figure 28

Patient 4 with \(th\) 16

Fig. 29
figure 29

Patient 4 with \(th\) 20

Fig. 30
figure 30

Patient 5 with \(th\) 8

Fig. 31
figure 31

Patient 5 with \(th\) 12

Fig. 32
figure 32

Patient 5 with \(th\) 16

Fig. 33
figure 33

Patient 5 with \(th\) 20

Fig. 34
figure 34

Patient 6 with \(th\) 8

Fig. 35
figure 35

Patient 6 with \(th\) 12

Fig. 36
figure 36

Patient 6 with \(th\) 16

Fig. 37
figure 37

Patient 6 with \(th\) 20

Fig. 38
figure 38

Patient 7 with \(th\) 8

Fig. 39
figure 39

Patient 7 with \(th\) 12

Fig. 40
figure 40

Patient 7 with \(th\) 16

Fig. 41
figure 41

Patient 7 with \(th\) 20

Fig. 42
figure 42

Patient 9 with \(th\) 8

Fig. 43
figure 43

Patient 9 with \(th\) 12

Fig. 44
figure 44

Patient 9 with \(th\) 16

Fig. 45
figure 45

Patient 9 with \(th\) 20

Fig. 46
figure 46

Patient 13 with \(th\) 8

Fig. 47
figure 47

Patient 13 with \(th\) 12

Fig. 48
figure 48

Patient 13 with \(th\) 16

Fig. 49
figure 49

Patient 13 with \(th\) 20

Fig. 50
figure 50

Patient 24 with \(th\) 8

Fig. 51
figure 51

Patient 24 with \(th\) 12

Fig. 52
figure 52

Patient 24 with \(th\) 16

Fig. 53
figure 53

Patient 24 with \(th\) 20

Fig. 54
figure 54

Patient 30 with \(th\) 8

Fig. 55
figure 55

Patient 30 with \(th\) 12

Fig. 56
figure 56

Patient 30 with \(th\) 16

Fig. 57
figure 57

Patient 30 with \(th\) 20

Fig. 58
figure 58

Patient 37 with \(th\) 8

Fig. 59
figure 59

Patient 37 with \(th\) 12

Fig. 60
figure 60

Patient 37 with \(th\) 16

Fig. 61
figure 61

Patient 37 with \(th\) 20

Fig. 62
figure 62

Patient 80 with \(th\) 8

Fig. 63
figure 63

Patient 80 with \(th\) 12

Fig. 64
figure 64

Patient 80 with \(th\) 16

Fig. 65
figure 65

Patient 80 with \(th\) 20

Fig. 66
figure 66

Patient 121 with \(th\) 8

Fig. 67
figure 67

Patient 121 with \(th\) 12

Fig. 68
figure 68

Patient 121 with \(th\) 16

Fig. 69
figure 69

Patient 121 with \(th\) 20

5 Conclusions

With the global prevalence of COVID-19, the entire scientific community has been working on ways to mitigate its impact on society. Early screening and treatment of patients can effectively cut off the transmission of COVID-19. Through the efforts of many medical experts, CT images have been shown to be effective in identifying suspected patients already infected with the novel coronavirus, and multilevel threshold segmentation of CT images can reduce the difficulty of subsequent processing and save precious time for patients and healthcare workers. We therefore proposed a reinforcement learning-based GJO algorithm, QLGJO, to solve the COVID-19 CT image segmentation problem. In this study, the Otsu method is used as the objective function to determine the optimal thresholds for COVID-19 CT images, and the original GJO algorithm is improved in three ways. First, reinforcement learning is introduced into the GJO algorithm for the first time to balance exploration and exploitation. Second, a new iterative phase is added to accelerate convergence. Finally, three new mutation mechanisms are introduced to help the algorithm avoid local optima. The proposed algorithm is compared with six advanced meta-heuristics (GJO, IGJO, INFO, MVO, DE, and PSO). The CEC2022 benchmark functions are first used to verify the performance of QLGJO; then the peak signal-to-noise ratio, structural similarity index, and feature similarity index are used as evaluation metrics to measure the performance differences between the algorithms in the segmentation experiments. The experimental results demonstrate that the proposed algorithm produces the most satisfactory results among the compared algorithms and achieves efficient COVID-19 CT image segmentation.

In future work, we intend to introduce more objective functions into the proposed algorithm, such as Kapur's entropy, Tsallis entropy, and fuzzy entropy. We will also extend the proposed algorithm to a wider range of fields, including feature selection, image classification, and drone path planning. Furthermore, it would be a valuable contribution to reduce the additional overhead introduced by reinforcement learning and to enhance the stability of the algorithm in the later iterations, as mentioned in Sect. 4.3.2.