1 Introduction

Cancer remains one of the leading causes of death worldwide, and a fully comprehensive understanding of its onset and elimination is still lacking. The need for preventive measures, medical research, and more effective treatment strategies is therefore persistent. Cancer is anticipated to be the most important obstacle to increasing life expectancy in the current century: an estimated 18.1 million new cancer cases and 9.6 million cancer deaths occurred in 2018. A detailed status report is given by Bray et al. (2018).

Patients usually receive the cancer treatments that promise the greatest effectiveness and the fewest side effects for their cancer type and stage. Surgery, radiation therapy, chemotherapy, targeted therapy, hormone therapy, and immunotherapy are considered the most important types of treatment. Radiotherapy, chemotherapy, and targeted therapy have made limited progress in recent years (Vrána et al. 2018). Immunotherapy, by contrast, has been the focus of efforts to investigate more successful treatment strategies (Kirschner and Panetta 1998; Rosenberg 2014; Tran et al. 2017; Gopalakrishnan et al. 2018; Wilson et al. 2018; Arabameri et al. 2018), and the use of the immune system to treat cancer patients has gained prominence as a successful treatment plan.

Cancer immunotherapy aims to engage the immune system's ability to recognize and destroy cancerous cells (Evans et al. 2018). It exploits the fact that cancer cells often carry molecules on their surface—the tumour antigens—that can be detected by the immune system. Tumours are antigenic tissues owing to their many genetic mutations, but this antigenicity does not generally translate into effective immunogenicity. Tumour immunology identifies such antigens and develops strategies to improve anti-tumour immunity. Recent clinical successes, in the form of several approaches to cancer immunotherapy, support the concept that therapeutic use of the immune system can have a substantial effect on cancer patients (Khalil et al. 2015).

Adoptive cell therapy (ACT) is a type of cancer immunotherapy that refers to the transfer of activated immune cells to a patient. T cells (a type of lymphocyte, one of the sub-types of white blood cells) are taken from the patient and cultured in the laboratory with the goal of improving immune functionality, then returned to the patient. For instance, cytotoxic T cells—an example of effector cells—become activated and capable of responding to cancer cells. Effector cells are any of various types of relatively short-lived activated cells that defend the body in an immune response. ACT may be accompanied by the administration of Interleukin-2 (IL-2). IL-2 is a type of cytokine (cytokines are molecules important for basic cellular activities such as development, repair, and immunity) that controls the activities of lymphocytes; in particular, it plays essential roles in the immune system through direct effects on T cells (Kirschner and Panetta 1998; Rosenberg 2014; Rosenberg and Restifo 2015; Spranger et al. 2017).

In parallel with scientific research on cancer immunotherapy, which has led to important medical advances, the theoretical study of tumour–immune dynamics has provided valuable insights into the cancer–immune interaction. The roles of the key components of immunity in cancer dynamics, cancer characteristics, and various factors such as immune profiles have been assessed theoretically, and tumour dynamics (i.e., long-term tumour recurrence and short-term oscillations) have been comprehensively described through mathematical modelling (Kirschner and Panetta 1998; Castiglione and Piccoli 2006; Banerjee and Sarkar 2008; Altrock et al. 2015; Cappuccio et al. 2006; d’Onofrio 2008; Eftimie et al. 2011; Soto-Ortiz and Finley 2016; Kosinsky et al. 2018; Valentinuzzi et al. 2018).

Improvements in the efficiency of treatment strategies, however, can be obtained by considering an appropriate optimality criterion, where the consumption of immunostimulants and cancer-killing agents is a major concern. The optimality conditions may be derived by employing: (i) the calculus of variations and Pontryagin’s minimum principle (PMP), which leads to a nonlinear two-point boundary-value problem (TPBVP) (Pontryagin et al. 1962); or (ii) dynamic programming, which requires solving the Hamilton–Jacobi–Bellman equation (Bellman and Kalaba 1966).

Optimal control theory covers a wide range of methods of obtaining the optimum value of a performance measure. Specifically, these methods may be categorized under three macro-headings:

  (i) indirect methods, in which the calculus of variations is used to determine the optimality conditions; the indirect multiple-shooting and the indirect collocation methods fall into this category;

  (ii) direct methods, in which the optimal control problem is translated into a nonlinear optimization problem by making an appropriate approximation of the state and/or control; direct collocation methods are the most notable approaches in this category; and,

  (iii) pseudospectral methods for optimal control (Ross and Karpenko 2012).

Ghaffari and Naserifar (2010) utilize the well-known cancer-immunotherapy model introduced by Kirschner and Panetta (1998). They adopt the forward-backward sweep method (FBSM) for solving the optimal control problem. In this paper, the genetic algorithm (GA) and particle swarm optimization (PSO) are used to provide optimal treatment strategies, with the aim of comparing the results with those obtained by Ghaffari and Naserifar (2010) and demonstrating the capability of these metaheuristics.

In the literature, several different optimal control problems can be found, which may be categorized according to the objective function, the model used for describing the immune–cancer dynamics, and the method of solving the problem. A detailed review of optimization problems in cancer immunotherapy is given in Sect. 2. The Kirschner–Panetta model, used in this paper, is briefly described in Sect. 3. Section 4 is devoted to the formulation of the single-objective optimal control problem. The GA and the PSO are described in Sect. 5. Section 6 compares the methods used here with the method adopted by Ghaffari and Naserifar (2010). Section 7 is devoted to formulating the problem as a bi-objective optimization problem. Finally, a brief summary of the results is given in Sect. 8, and the pros and cons of the methods are highlighted.

2 Related literature and contribution of the paper

The literature on optimal control problems in cancer immunotherapy is vast; however, the main body of work focuses on indirect methods (Castiglione and Piccoli 2006, 2007; Burden et al. 2004; Piccoli and Castiglione 2006; Cappuccio et al. 2007; Pillis et al. 2008; Ghaffari and Naserifar 2010; Ledzewicz et al. 2013; Elmouki and Saadi 2016; Ravindran et al. 2017). Very few works devote their attention to metaheuristics and, in addition, they usually introduce a single approach without making comparisons between different methods.

The PMP is the topic of the works by Burden et al. (2004) and Ghaffari and Naserifar (2010). Both papers utilize the Kirschner–Panetta model (see the original paper by Kirschner and Panetta 1998) to produce optimal treatment protocols, where the objective is to simultaneously control the volume of cancer cells during the therapy and minimize the total amount of effector cells used during the treatment. Specifically, Ghaffari and Naserifar (2010) add a payoff term to the objective functional originally proposed by Burden et al. (2004), in order to minimize the cancer cells at the end of the treatment. Their optimal solutions turn out to be virtually bang-bang controls, i.e. the optimal controls (the amount of drug as a function of time) switch between the upper and lower bounds and are almost never strictly in between.

The distinguishing feature of the contributions made by Castiglione and Piccoli (2006), Piccoli and Castiglione (2006), Castiglione and Piccoli (2007) and Cappuccio et al. (2007) is that the drug administration is considered as a set of instantaneous injections during a particular period of treatment. In fact, the impulsive control function is defined as the sum of a finite number of Dirac delta functions representing N injections (of an amount \( u_i \) of drug at time \( t_i \)). They employ the so-called needle variations (originally used by Pontryagin and his colleagues to prove the PMP) and establish propositions for computing the derivatives of the objective function with respect to its variables \( t_i \) and \( u_i \). The optimal control problem is thus translated into a minimization problem that can be solved by any numerical scheme, such as the steepest descent method. Castiglione and Piccoli (2006) minimize the final value of the tumour mass; Piccoli and Castiglione (2006) add another term in order to minimize the time during which the cancer cells exceed a fixed threshold; and Cappuccio et al. (2007) extend this approach to simultaneously minimize the final tumour mass, the integral cost of the tumour exceeding a fixed maximum, and the total amount of drug.

A direct collocation method is employed by Pillis and Radunskaya (2003), whose aim is to minimize the tumour mass during and at the end of treatment. Minelli et al. (2011) formulate two optimal control problems (continuous and discrete), where the goal is to minimize both the cancer cells at the final time and the total amount of medicine. The authors use a direct transcription and collocation method for the solution of the continuous problem, and a direct transcription and multiple-shooting method for the discrete control problem.

Houy and Grand (2019) take the cancer-immune model introduced by Soto-Ortiz and Finley (2016). Their objective is to design an optimal course of immune cell injections in order to eradicate the tumour over a 4000-day period. The authors shift the focus onto a heuristic approach, i.e. the Monte Carlo tree search algorithm—a method for finding optimal decisions in a given domain by taking random samples in the decision space (see the article by Browne et al. 2012).

Finally, the GA is the approach adopted by Lollini et al. (2006), Pennisi et al. (2009), Kiran and Lakshminarayanan (2013) and Qomlaqi et al. (2017) in dealing with cancer therapy. Lollini et al. (2006) use the immune–cancer model proposed by Celada and Seiden (1992); their goal is to attain the minimum number of drug injections that completely eliminates the tumour over a 400-day period of treatment. Kiran and Lakshminarayanan (2013) formulate a multi-objective optimization problem: they integrate chemotherapy with immunotherapy to determine what schedule (time and dosage of injections) and which type of therapy could be more successful in controlling tumour evolution. The authors use a version of the GA called the non-dominated sorting genetic algorithm II (NSGA-II), a method devised by Deb et al. (2002) for multi-objective optimization problems. NSGA-II is also used by Batmani and Khaloozadeh (2013) to derive an optimal treatment strategy while coping with drug resistance and side effects.

In contrast to the works presented in the literature, which are mainly based on a single method, the results obtained in this paper exploit the characteristics of two metaheuristics, in order to provide a comparison with the results obtained using the FBSM and to challenge the validity of the proposed solutions.

Regarding the obtained solutions, the main contributions made in this paper are as follows:

  (i) in order to employ the GA effectively, the optimal control problem is translated into a discrete binary-valued problem;

  (ii) single-point, two-point, and uniform crossover are used simultaneously, so that exploration and exploitation operate in parallel;

  (iii) unlike the simple GA, which suffers from so-called premature convergence, the GA used in this paper is specifically designed to preserve the population diversity (Pandey et al. 2014);

  (iv) unlike the original PSO, introduced by Kennedy and Eberhart (1995), constriction coefficients (Clerc and Kennedy 2002) are employed and properly defined in order to induce the swarm to balance exploration against exploitation and converge on optimum solutions;

  (v) the optimal therapeutic protocols are enhanced by formulating the problem in a bi-objective form.

3 The Kirschner–Panetta model

In this section, the fundamental elements of the Kirschner–Panetta model, originally introduced by Kirschner and Panetta (1998), are detailed:

$$\begin{aligned}&\frac{\mathrm {d}x}{\mathrm {d}t} = cy - \mu _2 x + \frac{p_1 x z}{g_1 + z} + s_1 , \end{aligned}$$
(1a)
$$\begin{aligned}&\frac{\mathrm {d}y}{\mathrm {d}t} = r_2 y \left( 1 - b y \right) - \frac{a x y}{g_2 + y}, \end{aligned}$$
(1b)
$$\begin{aligned}&\frac{\mathrm {d}z}{\mathrm {d}t} = \frac{p_2 x y}{g_3 + y} - \mu _3 z + s_2 , \end{aligned}$$
(1c)

with initial conditions:

$$\begin{aligned}&x\left( 0\right) = x_0, \, y\left( 0\right) = y_0, \, z\left( 0\right) = z_0 . \end{aligned}$$
(1d)
Table 1 Parameters in (1) and their units/descriptions; IU: International Unit (standard measure of IL-2), L: Litre
Fig. 1 Solutions to the nondimensionalized Kirschner–Panetta model for different values of c. The tumour carrying capacity is scaled to \(10^5 \, \left( \mathrm {cell}\right) \). a \( c= 1.00 \times 10^{-2} \, \left( \mathrm {day^{-1}}\right) \); b \( c= 2.97 \times 10^{-2} \, \left( \mathrm {day^{-1}}\right) \); c \( c= 4.00 \times 10^{-2} \, \left( \mathrm {day^{-1}}\right) \)

Fig. 2 Graph of \(y\left( t\right) \) for ACT therapy (i.e. \(s_1 \ne 0\), \(s_2 = 0\)). a \(c= 2.5 \times 10^{-2} \, \left( \mathrm {day}^{-1}\right) \), \(s_1=4.0 \times 10^2 \, \left( \mathrm {cell/day}\right) \), \(x_0=1.0 \times 10^4 \, \left( \mathrm {cell}\right) , y_0=1.0 \times 10^4 \, \left( \mathrm {cell}\right) , z_0=1.0 \times 10^4 \, \left( \mathrm {IU.L}^{-1}\right) \); b \(c= 8.0 \times 10^{-5} \, \left( \mathrm {day}^{-1}\right) \), \(s_1=5.5 \times 10^2 \, \left( \mathrm {cell/day}\right) \), \(x_0=1.0 \times 10^4 \, \left( \mathrm {cell}\right) , y_0=1.0 \times 10^2 \, \left( \mathrm {cell}\right) , z_0=1.0 \times 10^4 \, \left( \mathrm {IU.L}^{-1}\right) \); c \(c= 8.0 \times 10^{-5} \, \left( \mathrm {day}^{-1}\right) \), \(s_1=5.5 \times 10^2 \, \left( \mathrm {cell/day}\right) \), \(x_0=1.0 \times 10^4 \, \left( \mathrm {cell}\right) , y_0=1.0 \times 10^5 \, \left( \mathrm {cell}\right) , z_0=1.0 \times 10^4 \, \left( \mathrm {IU.L}^{-1}\right) \); d \(c= 2.5 \times 10^{-2} \, \left( \mathrm {day}^{-1}\right) \), \(s_1=5.5 \times 10^2 \, \left( \mathrm {cell/day}\right) \), \(x_0=1.0 \times 10^4 \, \left( \mathrm {cell}\right) , y_0=1.0 \times 10^4 \, \left( \mathrm {cell}\right) , z_0=1.0 \times 10^4 \, \left( \mathrm {IU.L}^{-1}\right) \)

Table 1 provides a brief description of the parameters in (1). The model describes the immune–tumour dynamics through three populations: (i) x denotes the effector cells (effectors); (ii) y denotes the cancer cells; and (iii) z represents the IL-2 concentration. The model incorporates the most important features of tumour–immune dynamics, including the role of IL-2, and is able to reproduce both tumour regression for a highly antigenic tumour (i.e. larger values of c; see Table 1) and uncontrolled tumour growth for a tumour of low antigenicity.

Furthermore, according to (1):

  1. Equation (1a) describes the rate of change of the effector cells over time. Specifically,

     (i) effectors grow in the presence of tumour cells (the first term on the right-hand side of (1a)), where the parameter c represents the antigenicity of the tumour; larger values of c denote well-recognized antigens;

     (ii) the third term on the right-hand side of (1a), which is of Michaelis–Menten form, represents effector cell proliferation stimulated by IL-2;

     (iii) \(s_1\) denotes the administration of effector cells as an external source of medicine;

     (iv) effectors decay naturally, with an average lifespan of \( 1/ \mu _2 \, \left( \mathrm {day}\right) \);

  2. the rate of change of the tumour cells is governed by (1b):

     (i) the first term on the right-hand side of (1b) shows that, in the absence of any immune response, the growth of the cancer cells follows the sigmoid solution of the logistic differential equation \( {\dot{y}} = r_2 y \left( 1 - b y \right) \), where the tumour’s carrying capacity equals \( 1/b = 10^{9} \, \left( \mathrm {cell}\right) \); and,

     (ii) the second term represents the interaction of the cancer cells with the effectors, through which the tumour growth is brought under control;

  3. Equation (1c) governs the IL-2 dynamics:

     (i) the first term on the right-hand side of (1c) shows the natural production of IL-2 due to the interaction between effectors and tumour;

     (ii) the second term represents the decay of IL-2, with an average lifetime of \( 1/ \mu _3 \, \left( \mathrm {day}\right) \); and finally,

     (iii) \( s_2 \) denotes the dose of IL-2 administered externally per day.

The Kirschner–Panetta model is capable of exploring three different treatment possibilities: (i) ACT therapy, i.e. \( s_1 > 0 \), and \( s_2 = 0 \); (ii) IL-2 therapy (\( s_1 = 0 \), and \( s_2 > 0 \)); and (iii) combined treatment, i.e. \( s_1 > 0 \), and \( s_2 > 0 \).

In the untreated case (\( s_1 = s_2 =0 \)), it is impossible for the immune system to eliminate the cancer cells. This case is illustrated in Fig. 1, where (depending on the value of the parameter c) long-term tumour recurrence and short-term oscillations can be observed. Specifically: (i) for \( 8.6 \times 10^{-5} \le c \le 3.25 \times 10^{-2} \) the solutions to (1) are stable limit cycles whose period and amplitude depend on the value of c (see Fig. 1a, b); and (ii) for \( c \ge 3.25 \times 10^{-2} \) the oscillations become smaller and the solutions settle quickly on a stable steady state.
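To make these dynamics concrete, the following minimal Python sketch integrates the untreated system (1). The parameter values used here are those commonly reported for the Kirschner–Panetta model and are consistent with the critical doses \( s_{1,cr} = 540 \) and \( s_{2,cr} \approx 63.5 \times 10^6 \) quoted below (cf. Table 1); the initial conditions are illustrative.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Kirschner-Panetta parameters (cf. Table 1)
mu2, p1, g1 = 0.03, 0.1245, 2.0e7        # effector decay; IL-2-driven proliferation
r2, b, a, g2 = 0.18, 1.0e-9, 1.0, 1.0e5  # tumour growth, 1/capacity, kill rate
p2, g3, mu3 = 5.0, 1.0e3, 10.0           # IL-2 production and decay
c = 2.97e-2                              # antigenicity (as in Fig. 1b)
s1, s2 = 0.0, 0.0                        # untreated case

def kp_rhs(t, w):
    x, y, z = w                          # effectors, tumour cells, IL-2
    dx = c * y - mu2 * x + p1 * x * z / (g1 + z) + s1
    dy = r2 * y * (1.0 - b * y) - a * x * y / (g2 + y)
    dz = p2 * x * y / (g3 + y) - mu3 * z + s2
    return [dx, dy, dz]

sol = solve_ivp(kp_rhs, (0.0, 600.0), [1.0e4, 1.0e4, 1.0e4],
                method="LSODA", max_step=0.5)
print("tumour cells at t = 600 days:", sol.y[1, -1])
```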

ACT therapy may lead to totally different outcomes. Roughly speaking,

  (i) an injection of effector cells below the critical value \( s_{1,cr} = \left( r_2 g_2 \mu _2 \right) / a = 540 \, \left( \mathrm {cell . day^{-1}}\right) \) fails to completely eradicate the cancer cells; this case is shown in Fig. 2a;

  (ii) for lower values of c, injections above the critical value (i.e. \( s_1 \ge s_{1,cr} \)) lead to a region of bistability: depending on the initial conditions, the system tends either to a stable tumour-free steady state or to tumour survival (see Fig. 2b, c);

  (iii) for larger values of c and \( s_1 \ge s_{1,cr} \), the system always tends to a stable (tumour-free) steady state; Fig. 2d illustrates this last case.

Treatment using only IL-2 does not significantly alter the dynamics of the system; thus, IL-2 therapy alone cannot eliminate the cancer cells. Although a dose of IL-2 above the critical value \( s_{2,cr} = \left( \mu _2 \mu _3 g_1 \right) / \left( p_1 - \mu _2 \right) \approx 63.5 \times 10^6 \, \left( \mathrm {IU . L^{-1} . day^{-1}}\right) \) may theoretically lead to the elimination of cancer cells, it would, in turn, cause an uncontrollable increase in effectors and damaging side effects.

Finally, a treatment strategy based on the combined administration of effectors and IL-2 boosts the immune system’s chances of eliminating the tumour. Again, the dosage of IL-2 must be restricted to \( s_{2,cr} \) in order to prevent harmful side effects. In comparison with ACT therapy, even tumours with smaller values of c can be eradicated, using a lower amount of injected effectors. The explanation is that the administration of IL-2 (at any value below \(s_{2,cr}\)) reduces the amount of injected effectors required to eliminate the cancer cells, i.e. the tumour-free steady state is stable while \(s_2 < s_{2,cr}\) and \(s_1 > \frac{\left( \mu _3 g_1 +s_2\right) - \frac{p_1s_2}{\mu _2}}{\left( \mu _3 g_1 +s_2\right) }\times s_{1,cr} \). Figure 3 demonstrates how an administration of IL-2 in combination with effectors provokes a better response.
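As a quick numerical check of the critical doses quoted above, both thresholds follow directly from the parameter values (the same assumed values as in the earlier sketch):

```python
# Critical ACT dose (cell/day) and critical IL-2 dose (IU/L/day)
mu2, mu3, p1, g1 = 0.03, 10.0, 0.1245, 2.0e7
r2, g2, a = 0.18, 1.0e5, 1.0

s1_cr = r2 * g2 * mu2 / a            # = 540 cell/day
s2_cr = mu2 * mu3 * g1 / (p1 - mu2)  # ~ 63.5e6 IU/(L day)
print(s1_cr, s2_cr)
```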

Fig. 3 A comparison of \( y\left( t\right) \) in: (i) ACT therapy (\( s_1 = 500 \, \left( \mathrm {cell/day}\right) \) and \( s_2 = 0 \)) and (ii) combined therapy (\( s_1 = 500 \, \left( \mathrm {cell/day}\right) \) and \( s_2 = 4.0 \times 10^6 \left( \mathrm {IU.L^{-1}}\right) /\mathrm {day}\)); in both cases: \( c = 1.8 \times 10^{-2} \, \left( \mathrm {day^{-1}}\right) \), \( x_0 = y_0 = 1.0 \times 10^{4} \left( \mathrm {cell}\right) \), and \( z_0 = 1.0 \times 10^{4} \, \left( \mathrm {IU.L^{-1}}\right) \)

4 Description of the optimal control problem

First, a few basic concepts of the optimal control theory, restricted to systems described by ordinary differential equations, are recalled. Then, the optimal control problem of cancer immunotherapy is formulated by referring back to Kirschner–Panetta model.

4.1 A review of the fundamental concepts

Given the system:

$$\begin{aligned} {\left\{ \begin{array}{ll} \dot{{\mathbf {x}}} = {\mathbf {f}}\left( {\mathbf {x}}\left( t\right) ,{\mathbf {u}}\left( t\right) ,t\right) , \\ {\mathbf {x}}\left( 0\right) = {\mathbf {x}}_0 , \end{array}\right. } \end{aligned}$$
(2)

a measure (in Bolza form) of quantitatively evaluating the system’s performance may be defined as:

$$\begin{aligned} J\left( {\mathbf {u}}\right) = h \left( {\mathbf {x}} \left( t_f\right) \right) + \int _{0}^{t_f}{g \left( {\mathbf {x}},{\mathbf {u}},t \right) \mathrm {d}t}, \end{aligned}$$
(3)

where \( {\mathbf {x}}^{T}=\left( x_1,x_2, \dots , x_n\right) \in X \subset {\mathbb {R}}^n\) is the state vector, \( {\mathbf {u}}^{T}=\left( u_1,u_2, \dots , u_m\right) \in U \subset {\mathbb {R}}^m\) is the control input vector (X and U denote the set of admissible state trajectories and the set of admissible controls respectively), \( {\mathbf {f}}^{T} = \left( f_1,f_2, \dots , f_n\right) \) is a continuously differentiable vector-valued function (in all its arguments), h is a differentiable scalar function, and finally g is a continuously differentiable function in its arguments.

The first term on the right-hand side of (3) indicates a function value on the boundary called the payoff term, while the second term, the integral functional, represents the performance of the system over the entire time interval: \( [0,t_f] \). The system follows some state trajectory when a control input is applied, and performance measure \( J \left( {\mathbf {u}}\right) \) assigns a unique real number to the state trajectory. Thus an optimal control problem may be described as: find an admissible control \({\mathbf {u}}^{*} \in U\) (and the corresponding admissible trajectory \({\mathbf {x}}^{*} \in X\)) that minimizes the performance measure \( J \left( {\mathbf {u}}\right) \):

$$\begin{aligned}&\underset{{\mathbf {u}}}{\min } \, J\left( {\mathbf {u}}\right) , \end{aligned}$$
(4a)

subject to:

$$\begin{aligned}&\dot{{\mathbf {x}}} = {\mathbf {f}}\left( {\mathbf {x}}\left( t\right) ,{\mathbf {u}}\left( t\right) ,t\right) , \end{aligned}$$
(4b)
$$\begin{aligned}&{\mathbf {x}}\left( 0\right) = {\mathbf {x}}_0, \end{aligned}$$
(4c)
$$\begin{aligned}&{\mathbf {u}}\left( t\right) \in U, \text {for all } t \in \left[ 0,t_f\right] . \end{aligned}$$
(4d)

The constraint in (4b) can be transferred to the objective functional, \( J\left( {\mathbf {u}}\right) \), by introducing the Lagrange multiplier \( \varvec{\lambda }^{T} \left( t\right) = \left( \lambda _1 \left( t\right) , \lambda _2 \left( t\right) , \dots , \lambda _n \left( t\right) \right) \) (also called the co-state vector):

$$\begin{aligned} J\left( {\mathbf {u}}\right) = h \left( {\mathbf {x}} \left( t_f\right) \right) + \int _{0}^{t_f} {\left( g \left( {\mathbf {x}},{\mathbf {u}},t \right) +\varvec{\lambda }^{T} \left( t\right) \left( {\mathbf {f}} \left( {\mathbf {x}},{\mathbf {u}},t\right) - \dot{{\mathbf {x}}} \right) \right) \mathrm {d}t}. \end{aligned}$$

After defining the Hamiltonian function

$$\begin{aligned} H \left( {\mathbf {x}},{\mathbf {u}},\varvec{\lambda },t\right) = g \left( {\mathbf {x}},{\mathbf {u}},t \right) + \varvec{\lambda }^{T} \left( t\right) {\mathbf {f}} \left( {\mathbf {x}},{\mathbf {u}},t\right) , \end{aligned}$$
(5)

a simple version of the PMP gives the necessary conditions for \( {\mathbf {u}}\left( t\right) \) to be potentially an optimal control, where the terminal time \( t_f \) and the terminal state \({\mathbf {x}}\left( t_f\right) \) are considered to be fixed and free (not restricted) respectively: let \({\mathbf {u}}^{*} : \left[ 0,t_f \right] \rightarrow U \subset {\mathbb {R}}^m\) be an optimal control, and let \({\mathbf {x}}^{*} : \left[ 0,t_f \right] \rightarrow X \subset {\mathbb {R}}^n \) (and \( \varvec{\lambda }^{*} \)) be the corresponding optimal state (and co-state) trajectory. Then the optimality set (i.e., \({\mathbf {u}}^{*}\), \({\mathbf {x}}^{*}\), and \( \varvec{\lambda }^{*} \)) satisfies simultaneously the set of the following conditions:

  (i) the state equation:

    $$\begin{aligned} \frac{\mathrm {d}{\mathbf {x}}^{*}}{\mathrm {d}t} = \frac{\partial H }{\partial \varvec{\lambda }} \left( {\mathbf {x}}^{*}, {\mathbf {u}}^{*} , \varvec{\lambda }^{*}, t \right) , \quad \forall t \in \left[ 0,t_f\right] , \end{aligned}$$
    (6)

    with initial conditions on the state vector:

    $$\begin{aligned} {\mathbf {x}}^{*} \left( 0\right) = {\mathbf {x}}_0 ; \end{aligned}$$
    (7)

  (ii) the co-state equation:

    $$\begin{aligned} \frac{\mathrm {d}\varvec{\lambda }^{*}}{\mathrm {d}t} = - \frac{\partial H}{\partial {\mathbf {x}}} \left( {\mathbf {x}}^{*}, {\mathbf {u}}^{*} , \varvec{\lambda }^{*}, t \right) , \quad \forall t \in \left[ 0,t_f\right] , \end{aligned}$$
    (8)

    with terminal conditions on the co-state vector (transversality conditions):

    $$\begin{aligned} \varvec{\lambda }^{*} \left( t_f\right) = \frac{\mathrm {d} h}{\mathrm {d}{\mathbf {x}}} {\left( {\mathbf {x}}^{*}\left( t_f\right) \right) } ; \end{aligned}$$
    (9)

  (iii) the optimality condition (i.e. the optimal control \( {\mathbf {u}}^{*} \) minimizes the Hamiltonian function):

    $$\begin{aligned} H \left( {\mathbf {x}}^{*}, {\mathbf {u}}^{*} , \varvec{\lambda }^{*}, t \right) \le H \left( {\mathbf {x}}^{*}, {\mathbf {u}} , \varvec{\lambda }^{*}, t \right) , \quad \forall t \in \left[ 0,t_f\right] , \, \forall {\mathbf {u}} \in U . \end{aligned}$$
    (10)

Concerning the minimization of the Hamiltonian function, the inequality in (10) reduces straightforwardly to \( \frac{\partial H}{\partial {\mathbf {u}}} = 0\), provided that the optimal control \( {\mathbf {u}}^* \) lies strictly within the set of admissible controls for all time in the interval \( \left[ 0,t_f\right] \) (i.e. not on the boundary). In this case, the boundary does not affect the solution. However, \( \frac{\partial H}{\partial {\mathbf {u}}}\) need not vanish when the optimal control lies on the boundary during a subinterval \( \left[ t_1,t_2\right] \) of \( \left[ 0,t_f\right] \). A special case arises when U, the set of admissible controls, consists of scalar piecewise continuous functions \( u\left( t\right) \) such that \( a \le u \left( t\right) \le b\) (with \(a,b \in {\mathbb {R}}\)). In this case, it can be shown that \( \frac{\partial H}{\partial u} \) is non-negative (respectively non-positive) if the optimal control \( u^* \left( t\right) \) lies on the lower boundary a (respectively the upper boundary b). A heuristic proof is given by Lenhart and Workman (2007, section 8). In summary:

$$\begin{aligned} {\left\{ \begin{array}{ll} u^* \left( t\right) = a &{} \text {implies} \quad \frac{\partial H}{\partial u} \ge 0 \; \text {at time} \, t,\\ a<u^* \left( t\right) <b &{} \text {implies} \quad \frac{\partial H}{\partial u} = 0 \; \text {at time} \, t,\\ u^* \left( t\right) = b &{} \text {implies} \quad \frac{\partial H}{\partial u} \le 0 \; \text {at time} \, t. \end{array}\right. } \end{aligned}$$
(11)

4.2 Optimal control applied to Kirschner–Panetta model

In this section, the optimal control problem of cancer treatment is formulated by adopting the PMP approach. Ghaffari and Naserifar (2010) consider the problem for ACT therapy, i.e. effectors are the only external source of treatment (\( s_1 >0 \) and \( s_2 = 0 \)). The performance of the Kirschner–Panetta model is controlled via the following objective functional:

$$\begin{aligned} J\left( {\mathbf {u}}\right) = A y \left( t_f\right) + \int _{0}^{t_f}{ \left( y\left( t\right) - x\left( t\right) - z\left( t\right) + \frac{1}{2} B {\left( u \left( t\right) \right) }^2 \right) \mathrm {d}t}. \end{aligned}$$
(12)

The control input, \( u\left( t\right) \), is the percentage of a fixed amount of \( s_1 \) and is therefore bounded by \( 0 \le u\left( t\right) \le 1 \). The positive real parameters A and B are weight factors. The payoff term, \( A y \left( t_f\right) \), refers to the minimization of the cancer cells at the final time. The integrand of the second term on the right-hand side of (12) consists of: (i) \( y \left( t\right) \), to minimize the cancer cells during the treatment; (ii) \( - x \left( t\right) - z \left( t\right) \), to keep the effector cells and IL-2 at a high level; and (iii) the quadratic term \(\frac{1}{2} B {\left( u \left( t\right) \right) }^2\), to minimize the total amount of injected effector cells during the therapy, where B weights the importance of minimizing the total amount of effector cells used relative to that of the cancer cells.

For a fixed final time and a free final state, the problem is considered as follows:

$$\begin{aligned}&\underset{0 \le u \le 1}{\min } \, A y \left( t_f\right) + \int _{0}^{t_f}{ \left( y\left( t\right) - x\left( t\right) - z\left( t\right) + \frac{1}{2} B {\left( u \left( t\right) \right) }^2 \right) \mathrm {d}t} , \end{aligned}$$
(13a)

subject to:

$$\begin{aligned}&{\left\{ \begin{array}{ll} \displaystyle \frac{\mathrm {d}x}{\mathrm {d}t} = cy - \mu _2 x + \displaystyle \frac{p_1 x z}{g_1 + z} + u\left( t\right) s_1 , &{} \\ \displaystyle \frac{\mathrm {d}y}{\mathrm {d}t} = r_2 y \left( 1 - b y \right) - \displaystyle \frac{a x y}{g_2 + y} , &{} \\ \displaystyle \frac{\mathrm {d}z}{\mathrm {d}t} = \displaystyle \frac{p_2 x y}{g_3 + y} - \mu _3 z , &{} \end{array}\right. } \end{aligned}$$
(13b)
$$\begin{aligned}&x\left( 0\right) = 1 , y\left( 0\right) = 1 , z\left( 0\right) = 1 \end{aligned}$$
(13c)

The Hamiltonian function is obtained by referring to (5), (13a), and (13b):

$$\begin{aligned} H \left( x,y,z,\lambda _1,\lambda _2,\lambda _3,u\right) =&\; y\left( t\right) - x\left( t\right) - z\left( t\right) + \frac{1}{2} B {\left( u \left( t\right) \right) }^2 \nonumber \\&+ \lambda _1 \left( cy - \mu _2 x + \frac{p_1 x z}{g_1 + z} + u\left( t\right) s_1 \right) \nonumber \\&+ \lambda _2 \left( r_2 y \left( 1 - b y \right) - \frac{a x y}{g_2 + y} \right) \nonumber \\&+ \lambda _3 \left( \frac{p_2 x y}{g_3 + y} - \mu _3 z \right) . \end{aligned}$$
(14)

This leads to the final formulation of the problem by using (6)–(9), (11), and (14):

$$\begin{aligned}&\frac{\mathrm {d}x}{\mathrm {d}t} = cy - \mu _2 x + \frac{p_1 x z}{g_1 + z} + u\left( t\right) s_1 , \end{aligned}$$
(15a)
$$\begin{aligned}&\frac{\mathrm {d}y}{\mathrm {d}t} = r_2 y \left( 1 - b y \right) - \frac{a x y}{g_2 + y} , \end{aligned}$$
(15b)
$$\begin{aligned}&\frac{\mathrm {d}z}{\mathrm {d}t} = \frac{p_2 x y}{g_3 + y} - \mu _3 z , \end{aligned}$$
(15c)
$$\begin{aligned}&\frac{\mathrm {d}\lambda _1}{\mathrm {d}t} = 1 + \left( \mu _2 - \frac{p_1 z}{g_1 + z} \right) \lambda _1 + \frac{a y \lambda _2}{g_2 + y} - \frac{p_2 y \lambda _3}{g_3 + y} , \end{aligned}$$
(15d)
$$\begin{aligned}&\frac{\mathrm {d}\lambda _2}{\mathrm {d}t} = -1 - c \lambda _1 - \left( r_2 - 2 r_2 b y - \frac{a g_2 x}{\left( g_2 +y\right) ^2} \right) \lambda _2 -\frac{p_2 g_3 x}{\left( g_3 +y \right) ^2} \lambda _3 , \end{aligned}$$
(15e)
$$\begin{aligned}&\frac{\mathrm {d}\lambda _3}{\mathrm {d}t} = 1 - \frac{p_1 g_1 x}{\left( g_1 + z\right) ^2} \lambda _1 + \mu _3 \lambda _3 , \end{aligned}$$
(15f)

with boundary conditions:

$$\begin{aligned}&x\left( 0\right) = 1, \, y\left( 0\right) = 1, \, z\left( 0\right) = 1, \, \lambda _1 \left( t_f\right) = 0, \, \lambda _2 \left( t_f\right) = A , \, \lambda _3 \left( t_f\right) = 0 , \end{aligned}$$
(15g)

and the control is characterized by

$$\begin{aligned}&u\left( t\right) = {\left\{ \begin{array}{ll} 0 &{} \text {if} \, \lambda _1 \ge 0 , \\ - \frac{s_1}{B} \lambda _1 &{} \text {if} \, - \frac{B}{s_1}< \lambda _1 < 0 , \\ 1 &{} \text {if} \, \lambda _1 \le - \frac{B}{s_1}. \end{array}\right. } \end{aligned}$$
(15h)
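For completeness, the characterization (15h) follows directly from (11) and (14): differentiating the Hamiltonian with respect to the control gives

$$\begin{aligned} \frac{\partial H}{\partial u} = B \, u\left( t\right) + s_1 \lambda _1 , \end{aligned}$$

so the interior condition \( \partial H / \partial u = 0 \) yields \( u\left( t\right) = - \left( s_1 / B \right) \lambda _1 \), while the boundary cases of (11) assign \( u = 0 \) to \( \lambda _1 \ge 0 \) and \( u = 1 \) to \( \lambda _1 \le - B/s_1 \).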

The two-point boundary-value problem obtained above may be solved by any of the indirect methods, such as indirect shooting or collocation. However, Ghaffari and Naserifar (2010) adopt the FBSM. Although the method is fully described in the book by Lenhart and Workman (2007), a brief description of the FBSM is provided in Sect. 6 for convenience.

In the following sections, the GA and the PSO are briefly described and then adopted to solve the problem. Solving this type of problem generally results in bang-bang solutions, i.e. the optimal control switches between the upper bound \( u = 1 \) and the lower bound \( u = 0 \) (Ghaffari and Naserifar 2010). Thus the inputs to the proposed metaheuristics can be limited to binary-valued controls (i.e. \( u\left( t\right) \in \left\{ 0,1\right\} \) for all \( t \in \left[ 0,t_f\right] \)), which represent the total number (and order) of the days with treatment.

Although Ghaffari and Naserifar (2010) state that the initial conditions, \( x \left( 0\right) \), \( y \left( 0\right) \), and \( z \left( 0\right) \), are normalized to 1.0, the model is not actually scaled, and the parameter values given in Table 1 are used in the optimal control problem. Those values are therefore also used here, in order to make a meaningful comparison between the results; in other words, the problem is solved for the non-scaled model with initial conditions equal to 1.0. Nonetheless, the scaled system is used in the formulation of the bi-objective optimization problem (see Sect. 7.4).

5 Metaheuristic approaches

Metaheuristics offer viable alternatives to more traditional algorithms for finding optimal (or near-optimal) solutions to complicated optimization problems when classic, exact methods fail to carry out the task in a reasonable amount of time. A metaheuristic is an algorithmic framework composed of a set of strategies that guide the search process, with the goal of efficiently exploring the search space for quasi-optimal (or even optimal) solutions. The practical significance of metaheuristics has been recognized in diverse branches of science and engineering, such as computational biology, machine learning and data mining, electronics and telecommunications, system modelling, and control. The GA and the PSO are particularly well known for their ability to fit a vast range of applications, including multimodal and multi-objective optimization problems. In the following sections, the GA and the PSO are introduced and applied to the highly nonlinear problem described in (13), so that their performance can be compared with that of the method used by Ghaffari and Naserifar (2010).

5.1 Genetic algorithm

A GA employs a set of mechanisms inspired by biological evolution, together with genetics-based operators (i.e. crossover and mutation), and is generally used to generate high-quality solutions to optimization problems. The algorithm was initially introduced by Holland (1975) and further developed by Goldberg (1989). Over the years, different versions of the algorithm have been presented; however, they are all based on the main concepts of the original algorithm. A simple GA begins by creating a random initial population of n solutions (often expressed as binary-valued strings) and then creates the next generations (new populations) by performing the following stages:

  1. compute the cost function (i.e. the objective function, if the problem is a minimization) for each member (individual) of the current population and assign a cost value to each one;

  2. pass some of the best individuals of the current generation (i.e. individuals with lower cost values) directly to the next generation (the so-called elitist selection strategy). These individuals are called elites and are \( n_\textit{elite} \) in number. The elitist strategy guarantees that the quality of solutions will not decrease over the sequence of generations;

  3. select individuals, called parents, to produce new individuals called offspring. Individuals with better cost values are more likely to be selected in each selection process. The individuals are selected by sampling with replacement, so that an individual may be selected several times;

  4. combine the entries of a pair of selected parents by using crossover rules to produce two new offspring, which are then passed to the next generation. The crossover fraction, \( f_c \), specifies the number of crossover offspring, \( n_c \), moved to the next generation: \(n_c = {\texttt {round}} \left( f_c \times \left( n - n_\textit{elite}\right) \right) \);

  5. change some entries of a single selected parent by using mutation rules in order to create new offspring called mutants. The remaining part of the next generation, \( n_m \), is composed of mutants, i.e. \( n_m = n - \left( n_\textit{elite} + n_c\right) \);

  6. stop when a stopping criterion is met.

In a simple GA, the search process converges when the genetic nature of the GA operators leads to a uniform generation with (almost) identical individuals. Thus, the operators are no longer capable of producing new offspring superior to their parents. Unfortunately, the population uniformity may occur before finding the true optimum—a common problem with GAs called premature convergence. Roughly speaking, the population reaches a sub-optimal position while more successful offspring cannot be produced. It is widely recognized that the intuitive cause of premature convergence is the loss of the diversity of the population, and therefore the use of methods for maintaining population diversity could be a potential strategy to reduce the chance of premature convergence. Crowding methods (replacing similar parents with new offspring) and fitness sharing (penalizing the individuals in densely populated areas), which are well-known approaches to multimodal optimization problems, give promising results in the maintenance of population diversity (see the survey article provided by Pandey et al. 2014).

In this paper, in order to maintain a good spread of individuals, a direct approach to diversity preservation is adopted, based on an estimate of the density of the individuals in the population. To this end, the distance between an individual and its surrounding individuals is assessed using the concept of crowding distance (inspired by the NSGA-II; see the original work by Deb et al. 2002).

The proposed GA The algorithm is first initialized by creating a random population of n individuals, then \( n_c \) crossover offspring and \( n_m \) mutants are produced by choosing any of the crossover patterns and mutation rules respectively. Let N indicate the total of all these parents, crossover offspring, and mutants, i.e. \( N = n + n_c + n_m \) and let \(J_i\) denote the cost value of the i-th individual where \(i = 1, 2, \dots , N\). The members of the population are first sorted according to their cost values, so that the first and last individuals at the top and bottom of the list correspond to the best (lowest) cost, \( J_{min} \), and the worst cost, \( J_\textit{max} \), respectively. The first n/2 of the best individuals are directly moved to the next generation while the remaining individuals are selected based on crowding distance. The crowding distance of each individual is considered an index of the interval covered by that individual in the objective space (range of the cost function). Thus, the i-th individual’s crowding distance, \( d_i \), is defined as the normalized cost difference between its two nearest individuals on either side. The crowding distances of a population with N individuals are calculated as follows:

$$\begin{aligned} d_i = {\left\{ \begin{array}{ll} \displaystyle \frac{J_{i+1}-J_{i-1}}{J_\textit{max}-J_{min}} &{} \text {if} \, i \in \left\{ 2,\dots ,N-1\right\} , \\ \infty &{} \text {if} \, i \in \left\{ 1,N\right\} . \end{array}\right. } \end{aligned}$$
(16)

Individuals with larger crowding distances are superior to those with smaller values, since they contribute more to a diverse population. The individuals are therefore arranged in descending order of crowding distance, and the first n/2 of them are selected to form the remaining part of the next generation. The crowding distances of the first and last individuals are set to infinity so that they are transferred to the next generation. A pseudo-code of the proposed GA is given in Algorithm 1, where an I-iteration run is considered the stopping criterion.

[Algorithm 1: pseudo-code of the proposed GA]
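To illustrate the selection step described above, the following minimal Python sketch implements the crowding-distance computation (16) and the survivor selection. It operates on a population already sorted by cost; excluding the directly-passed best n/2 individuals from the crowding-based pool is one reasonable reading of the scheme, introduced here to avoid duplicate selections, rather than a detail stated in the text.

```python
import numpy as np

def select_survivors(sorted_costs, n):
    """Return indices (into the cost-sorted population) of the n survivors.

    The best n/2 individuals pass directly; the remaining n/2 slots are
    filled in descending order of crowding distance, Eq. (16).
    """
    J = np.asarray(sorted_costs, dtype=float)
    N = len(J)
    d = np.empty(N)
    d[0] = d[-1] = np.inf                 # boundary individuals always kept
    span = J[-1] - J[0]
    span = span if span > 0.0 else 1.0    # guard against a uniform population
    d[1:-1] = (J[2:] - J[:-2]) / span     # normalized gap between neighbours
    elite = list(range(n // 2))           # best n/2 move on directly
    rest = sorted(range(n // 2, N), key=lambda i: -d[i])
    return elite + rest[: n - len(elite)]

# Example: 8 candidates competing for n = 4 slots; note that the worst
# individual (infinite crowding distance) survives, as stated in remark 3.
print(select_survivors([1.0, 1.1, 1.2, 2.0, 2.1, 2.2, 3.0, 9.0], n=4))
```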

Concerning the proposed algorithm, some remarkable points are in order:

  1. the algorithm does not require any user-defined parameter for diversity preservation;

  2. instead of defining a genotypic (bit-wise) diversity measure in the individual space, the distance is evaluated in the objective space;

  3. the worst solution, whose crowding distance equates to infinity, is transferred to the next generation to help preserve the population diversity of subsequent generations.

Implementation of the proposed GA for optimal medication regimen The proposed GA, described in Algorithm 1, is implemented in the MATLAB computing environment to evaluate the cost in (13a) and formulate optimal ACT strategies. All the parameters of the Kirschner–Panetta model are reported in Table 1. Specifically, the following two cases are considered in order to provide a valid (and fair) comparison with the results obtained by Ghaffari and Naserifar (2010):

  1. \( s_1 = 550 \, \left( \mathrm {cell . day^{-1}}\right) \), \( c = 0.025 \, \left( \mathrm {day^{-1}}\right) \), \( A = 10^3 \), \( B = 1 \);

  2. \( s_1 = 550 \, \left( \mathrm {cell . day^{-1}}\right) \), \( c = 0.040 \, \left( \mathrm {day^{-1}}\right) \), \( A = 10^3 \), \( B = 10^4\).

In both cases the treatment strategies are developed over a 350-day period, i.e. \( t_f = 350 \, \left( \mathrm {day}\right) \). Each individual represents an input control (a position in the search space), \( u = \left( u_1,u_2,\dots ,u_{350}\right) \), a 350-tuple row vector with binary-valued entries, i.e. \( u_i \in \left\{ 0,1\right\} \) for \( i = 1,2,\dots ,350 \), so that there are 350 binary decision variables. The cost function takes each individual as an input, converts it into a piecewise continuous function, and calculates the corresponding cost value, J; a sketch of this evaluation is given below. The crowding distance is calculated for the full population. At the initialization stage, the full-treatment input (\( u_i = 1 \) for \( i = 1,2,\dots ,350 \)) and the null-treatment input (\( u_i = 0 \) for \( i = 1,2,\dots ,350 \)) are included in the initial population in order to improve the initial diversity and the performance of the algorithm. The population size is \( n=200 \), the numbers of crossover offspring and mutants are \( n_c = 0.7 \times n \) and \( n_m = 0.3 \times n \) respectively, and the algorithm runs for 100 iterations.
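The cost evaluation sketched below interprets each binary individual as a day-by-day on/off schedule for \( u\left( t\right) \), integrates the controlled system (13b), and accumulates the functional (13a). The structure is a plausible reconstruction in Python, not the authors' MATLAB code; parameter values are as in the earlier sketches, with the case-1 weights.

```python
import numpy as np
from scipy.integrate import solve_ivp, trapezoid

# Case-1 setting: weights and external effector source
A, B = 1.0e3, 1.0
s1, c = 550.0, 0.025
mu2, p1, g1 = 0.03, 0.1245, 2.0e7
r2, b, a, g2 = 0.18, 1.0e-9, 1.0, 1.0e5
p2, g3, mu3 = 5.0, 1.0e3, 10.0

def cost(u_days):
    """Cost (13a) of a binary schedule u_days[i] in {0, 1} for day i."""
    t_f = len(u_days)                       # 350 days
    def rhs(t, w):
        x, y, z = w
        u = u_days[min(int(t), t_f - 1)]    # piecewise-constant daily control
        dx = c*y - mu2*x + p1*x*z/(g1 + z) + u*s1
        dy = r2*y*(1.0 - b*y) - a*x*y/(g2 + y)
        dz = p2*x*y/(g3 + y) - mu3*z
        return [dx, dy, dz]
    ts = np.arange(0.0, t_f + 1.0)          # daily grid for the quadrature
    sol = solve_ivp(rhs, (0.0, float(t_f)), [1.0, 1.0, 1.0],
                    t_eval=ts, method="LSODA", max_step=0.5)
    x, y, z = sol.y
    u = np.append(np.asarray(u_days, dtype=float), u_days[-1])
    integrand = y - x - z + 0.5 * B * u**2
    return A * y[-1] + trapezoid(integrand, ts)

print(cost(np.ones(350, dtype=int)))        # cost of the full-treatment input
```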

5.2 Particle swarm optimization

The PSO is a nature-inspired metaheuristic that arises from interactions and information flow between the individuals (or particles) of a population. The population possesses the ability to arrange its particles in a purposeful manner (self-organization). The PSO begins by creating a random population of n particles, where each particle takes a position in the search space and the corresponding value of the cost function is evaluated. The search space is often (and is, in this paper) the d-dimensional real space, \( {\mathbb {R}}^{d} \), so that a particle’s position denotes a d-tuple vector \( {{\mathbf {x}}} \) in (a subset of) \( {\mathbb {R}}^{d} \). During subsequent iterations, the i-th particle changes its previous position, \( {{\mathbf {x}}}_i \left( k\right) \) at the k-th iteration, to the new position, \( {{\mathbf {x}}}_i \left( k+1\right) \), by moving in its new direction (called the velocity), \( {{\mathbf {v}}}_i \left( k+1\right) \), i.e.

$$\begin{aligned} {{\mathbf {x}}}_i \left( k+1\right) = {{\mathbf {x}}}_i \left( k\right) + {{\mathbf {v}}}_i \left( k+1\right) . \end{aligned}$$
(17)

How the particle moves to the new position is dependent on three factors, i.e. in each step (iteration), the particle tends to move:

  1. in its previous direction, \( {{\mathbf {v}}}_i \left( k\right) \);

  2. towards its best previously experienced position, \( {{\mathbf {b}}}_{i} \left( k\right) \), up to the k-th iteration;

  3. towards the best global position among all particles, \( {{\mathbf {g}}} \left( k\right) \), up to the k-th iteration.

Thus, the j-th projection (component) of the i-th particle’s new velocity is defined as:

$$\begin{aligned} v_{ij} \left( k+1\right) = v_{ij} \left( k\right) + c_1 \, r_1 \, \left( b_{ij} \left( k\right) - x_{ij} \left( k\right) \right) + c_2 \, r_2 \, \left( g_j \left( k\right) - x_{ij} \left( k\right) \right) , \end{aligned}$$
(18)

where \( j=1,2,\dots ,d \), and \( b_{ij} \), \( x_{ij} \), and \( g_{j} \) denote the j-th component of the vectors \( {{\mathbf {b}}}_{i}\), \( {{\mathbf {x}}}_{i}\), and \( {{\mathbf {g}}}\) respectively. Obviously, in the initialization stage the velocity of each particle is equal to zero, i.e. \( v_{ij} \left( 0\right) := 0 \) for all \( i \in \left\{ 1,2,\dots ,n\right\} \) and \( j \in \left\{ 1,2,\dots ,d\right\} \). The parameters \( c_1 \) and \( c_2 \), called acceleration coefficients (both of them equal to 2.0 in original version), denote the rates at which the particle deviates from its neighbourhood and moves to the best global position ever found. The coefficients \( r_1 \) and \( r_2 \) are both random scalars uniformly distributed in the interval \( \left[ 0,1\right] \), in order to include randomness in the movement of the particles.

Since the appearance of its original version, the algorithm has undergone marked changes and, in consequence, its overall performance has greatly improved. Bonyadi and Michalewicz (2017) conduct a thorough review of research on the PSO and compile a list of articles incorporating considerable modifications into the original algorithm, with a primary focus on single-objective optimization problems.

The so-called swarm explosion is a major problem with the basic PSO. The explosion refers to the particles’ velocities and positions racing towards infinity without control, as a result of the random weighting parameters in (18). An early, partial solution to the explosion issue is to restrict \( v_{ij} \) to the closed interval \( \left[ -V_\textit{max},V_\textit{max}\right] \), where the positive real number \( V_\textit{max} \) is bounded by \( 0.1 \times x_\textit{max} \le V_\textit{max} \le 1.0 \times x_\textit{max} \), and \( x_\textit{max} \) denotes the maximum absolute value of the bounds defined for the elements of the position vector. Another improvement was made by Shi and Eberhart (1998) to help balance the exploration and exploitation aspects of the search process. The first term on the right-hand side of (18) provides exploration of the search space; conversely, the other terms cause the swarm to be attracted to the best positions found so far (greater exploitation of local search data). Consequently, an inertia weight, w, is introduced to balance exploration against exploitation:

$$\begin{aligned} v_{ij} \left( k+1\right) = w \, v_{ij} \left( k\right) + c_1 \, r_1 \, \left( b_{ij} \left( k\right) - x_{ij} \left( k\right) \right) + c_2 \, r_2 \, \left( g_j \left( k\right) - x_{ij} \left( k\right) \right) , \end{aligned}$$
(19)

with a typical inertia range of 0.9–1.2. However, further experiments, conducted on inertia weight, show that a linear reduction in w from 0.9 to 0.4 during a run improves the performance (Eberhart and Shi 2000).

Clerc and Kennedy (2002) provide a mathematical methodology (based on an eigenvalue analysis of the search procedure) to assess stability and convergence of the PSO. The research indicates that the inclusion of appropriately defined constriction coefficient, \( \chi \), results in the prevention of explosion and an increase in the convergence rate:

$$\begin{aligned} v_{ij} \left( k+1\right)&= \chi \left[ v_{ij} \left( k\right) + c_1 \, r_1 \, \left( b_{ij} \left( k\right) - x_{ij} \left( k\right) \right) + c_2 \, r_2 \, \left( g_j \left( k\right) - x_{ij} \left( k\right) \right) \right] , \nonumber \\ \chi&= \frac{2}{\left| 2-\phi -\sqrt{\phi ^2-4\phi } \right| }, \qquad \phi = c_1 + c_2. \end{aligned}$$
(20)

The parameter \( \phi \) must be greater than 4.0 to prevent the explosion of particles; however, as \( \phi \) increases, the diversity of the swarm decreases. Typically, \( \phi \) is set to 4.1 (with \( c_1 = c_2 = \phi /2 \)) in order to guarantee convergence while avoiding premature convergence. This method is used here to set the PSO parameters, and the particles’ velocities (i.e. the random perturbations of their positions) are evaluated by using (20).
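A minimal Python sketch of the constriction-factor PSO (17)–(20) follows. Treating each particle as a real vector in \([0,1]^{350}\) and rounding it to a binary daily schedule before evaluating the cost is an assumption made here for illustration (the paper encodes particles as piecewise continuous functions alternating between 0 and 1); the `cost` function is the one sketched in Sect. 5.1.

```python
import numpy as np
# assumes cost(...) from the Sect. 5.1 sketch is in scope

c1 = c2 = 2.05
phi = c1 + c2                                          # 4.1 > 4 prevents explosion
chi = 2.0 / abs(2.0 - phi - np.sqrt(phi**2 - 4*phi))   # constriction factor ~ 0.729

rng = np.random.default_rng(0)
n, d, iters = 200, 350, 100
x = rng.random((n, d))                    # positions in [0,1]^d
v = np.zeros((n, d))                      # initial velocities are zero
pbest = x.copy()
pbest_J = np.array([cost(np.round(p)) for p in pbest])
g = pbest[np.argmin(pbest_J)]             # global best position

for k in range(iters):
    q1, q2 = rng.random((n, d)), rng.random((n, d))    # r1, r2 in (18)
    v = chi * (v + c1*q1*(pbest - x) + c2*q2*(g - x))  # velocity update, Eq. (20)
    x = np.clip(x + v, 0.0, 1.0)                       # position update, Eq. (17)
    J = np.array([cost(np.round(p)) for p in x])       # binary-schedule cost
    better = J < pbest_J
    pbest[better], pbest_J[better] = x[better], J[better]
    g = pbest[np.argmin(pbest_J)]
print("best cost:", pbest_J.min())
```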

Implementation of the PSO for optimal medication regimen The algorithm is implemented in the MATLAB computing environment for the two cases previously described in Sect. 5.1, where \( \phi = 4.10 \), \( c_1 =2.05 \), \( c_2 =2.05 \), and the constriction factor is therefore equal to 0.729. Each particle represents a piecewise continuous function alternating between 0 and 1. The PSO is initialized with a random population of 200 particles and runs for 100 iterations. Again, the full-treatment input (\( u\left( t\right) = 1 \) for all \( t \in \left[ 0,350\right] \)) and the null-treatment input (\( u\left( t\right) = 0 \) for all \( t \in \left[ 0,350\right] \)) are included in the initial population.

6 Comparison between the GA, the PSO, and the FBSM

Ghaffari and Naserifar (2010), in their approach to the minimization of the objective functional (12), focus on the PMP, which leads to the boundary-value problem (15)—the necessary conditions that the control, state, and co-state variables must satisfy. This boundary-value problem can be solved by multiple-shooting, indirect shooting, or indirect collocation methods. Ghaffari and Naserifar (2010) solve the problem using the FBSM (Lenhart and Workman 2007). The method takes advantage of the fact that the values of the co-state variables are not required to solve the state equations. For convenience, the outline of the method is provided with regard to problem (15), followed by a minimal code sketch:

  1. make an initial guess for the control input, u, over the interval \( \left[ 0,t_f\right] \);

  2. use the initial conditions (\( x\left( 0\right) =1 \), \( y\left( 0\right) =1 \), and \( z\left( 0\right) =1 \)) and the control, u, to solve the state equations (15a)–(15c) by any numerical method, such as the fourth-order Runge–Kutta method. Store the values of the state variables, x, y, and z;

  3. use the transversality conditions (\( \lambda _1 \left( t_f\right) = 0 \), \(\lambda _2 \left( t_f\right) = A\), and \(\lambda _3 \left( t_f\right) = 0 \)) and the stored values of the state variables to solve the co-state equations (15d)–(15f) backward in time, i.e. by integrating from the final time, \( t=t_f \), to the initial time, \( t=0 \). Store the values of the co-state variables, \(\lambda _1\), \(\lambda _2\), and \(\lambda _3\);

  4. use the co-state variable \( \lambda _1 \) to update the control, u, by referring to (15h). Store the updated values of the control;

  5. return to step 2 unless a stopping criterion is met; in that case, return the current values of the control, states, and co-states as the solution.
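A minimal Python sketch of this loop is given below. The right-hand-side functions `f_state` and `f_costate` (implementing (15a)–(15c) and (15d)–(15f)) and the `rk4` helper are illustrative; damping the control update by averaging the old and new controls is a common stabilization of the sweep (Lenhart and Workman 2007), not an extra feature of the original paper.

```python
import numpy as np

def rk4(f, y0, ts):
    """Fourth-order Runge-Kutta on the grid ts (ts may be decreasing,
    as in the backward co-state sweep); f must return an array-like."""
    ys = [np.asarray(y0, dtype=float)]
    for t0, t1 in zip(ts[:-1], ts[1:]):
        h, y = t1 - t0, ys[-1]
        k1 = np.asarray(f(t0, y))
        k2 = np.asarray(f(t0 + h/2, y + h/2*k1))
        k3 = np.asarray(f(t0 + h/2, y + h/2*k2))
        k4 = np.asarray(f(t1, y + h*k3))
        ys.append(y + h/6*(k1 + 2*k2 + 2*k3 + k4))
    return np.array(ys)

def fbsm(f_state, f_costate, t, x0, lam_tf, s1, B, max_iter=200, tol=1e-3):
    """Forward-backward sweep for the TPBVP (15); u is stored on the grid t."""
    u = np.zeros(len(t))                                   # step 1: initial guess
    w = lam = None
    for _ in range(max_iter):
        # step 2: states forward in time under the current control
        w = rk4(lambda s, ww: f_state(s, ww, np.interp(s, t, u)), x0, t)
        # step 3: co-states backward in time (reversed grid), using stored states
        w_of = lambda s: np.array([np.interp(s, t, w[:, i]) for i in range(3)])
        lam = rk4(lambda s, ll: f_costate(s, ll, w_of(s)), lam_tf, t[::-1])[::-1]
        # step 4: update the control from (15h)
        u_new = np.clip(-(s1 / B) * lam[:, 0], 0.0, 1.0)
        if np.max(np.abs(u_new - u)) < tol:                # step 5: convergence
            break
        u = 0.5 * (u + u_new)                              # damped update
    return u, w, lam
```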

Although the method can be easily programmed, it suffers from a major problem inherent in indirect methods: these methods are very sensitive to the initial guess of the unknown conditions or variables (Betts 1998). As stated by Lenhart and Workman (2007, page 50), the FBSM may run into convergence difficulties, and the initial guess then requires adjusting. While the FBSM may require many runs to find an appropriate initial guess, the GA and the PSO are capable of finding acceptable solutions in a single run.

Figure 4 shows (for the first case with \( c = 0.025 \, \mathrm {(day^{-1})} \)) the best controls and corresponding tumour state trajectories obtained by using the GA, the PSO, and the FBSM used by Ghaffari and Naserifar (2010).

Fig. 4 The optimal control, \( u\left( t\right) \), and corresponding tumour state trajectory, \( y\left( t\right) \), obtained by using the proposed GA, the PSO, and the FBSM for \( c= 0.025 \, (\mathrm {day}^{-1}) \) and \( B = 1.0 \); a the optimal controls; b the tumour state trajectories

Figure 5 shows the optimal solutions and corresponding tumour state trajectories for the second case with \( c = 0.040 \, \mathrm {(day^{-1})} \). For convenience, a brief summary of the obtained results is given in Table 2.

Minimum value of the objective functional (12): in case 1, the minimum value of the objective functional obtained by the GA and the PSO equates to \( - 6.546 \times 10^6 \) and \( - 6.553 \times 10^6 \) respectively, whereas the minimum cost value obtained by the FBSM is \( - 6.203 \times 10^6 \). Similarly, in case 2, the minimum values obtained by the GA and the PSO are \( - 5.314 \times 10^6 \) and \( - 5.313 \times 10^6 \) respectively, against \( - 4.982 \times 10^6 \) for the FBSM. In both cases, the minimum values of the cost function (12) obtained by the GA and the PSO are lower than those obtained by Ghaffari and Naserifar (2010), which demonstrates the effectiveness of the proposed metaheuristics (compared to the FBSM) in dealing with the highly nonlinear single-objective problem (13).

Fig. 5 The optimal control, \( u\left( t\right) \), and corresponding tumour state trajectory, \( y\left( t\right) \), obtained by using the proposed GA, the PSO, and the FBSM for \( c= 0.040 \, (\mathrm {day}^{-1}) \) and \( B = 1.0 \times 10^4 \); a the optimal controls; b the tumour state trajectories

Table 2 Minimum values of the cost function (12) and percentage of total days with treatment for case 1: \( c=0.025 \, \left( \mathrm {day^{-1}}\right) \), \( B=1.0 \), and case 2: \( c=0.040 \, \left( \mathrm {day^{-1}}\right) \), \( B=1.0 \times 10^4 \), obtained by using the GA, the PSO, and the FBSM used by Ghaffari and Naserifar (2010)

Final value and time integral of cancer cells: one major item considered in the objective functional (12) is the minimization of the final state of the tumour cells, denoted by \( y_\textit{end} \) in Table 2. In case 1, the metaheuristics show higher effectiveness than the indirect method, whereas in case 2 the obtained values of \( y_\textit{end} \) are rather similar, with the FBSM performing slightly better, so that the GA and the PSO have no particular advantage over the FBSM in that case.

In case 1, the percentage of the total days with treatment is \( 92 \% \) for the GA, \( 93 \% \) for the PSO, and \( 91 \% \) for the FBSM. This means that the total drug usage of the indirect method is less than that of the metaheuristics. However, Fig. 4b shows that, for this method, the total amount of tumour cells during the treatment is much greater than that obtained by using the metaheuristics. For the second case, in contrast, the percentage of the total days with treatment is \( 71 \% \) for the GA, \( 72 \% \) for the PSO, and \( 75 \% \) for the FBSM, which means that the total amount of effector cells administered by the metaheuristics is less than that of the FBSM.

7 Bi-objective optimization problem

The goal of this section is to formulate the problem in a bi-objective form (Deb 2014). In this case, instead of a single solution, a set of optimal solutions is obtained, which provides better insight into the problem. First, a short description of multi-objective problems is provided; then the problem is stated and solved by using the NSGA-II, devised by Deb et al. (2002).

7.1 A brief overview of multi-objective optimization problems

A multi-objective optimization problem (MOP) is a problem involving more than one objective function. If \( x ={ \left( x_1,x_2, \dots ,x_d\right) } \in X \) denotes the decision vector, where X is the set of feasible decision vectors, then the goal is to simultaneously minimize a set of functions \( f_1\left( x\right) \), \( f_2\left( x\right) \), \( \dots \), \( f_k\left( x\right) \), which can be considered components of an objective vector \( f: \, X \rightarrow {\mathbb {R}}^{k} \):

$$\begin{aligned} \underset{x \in X}{\min } \; f(x) = \left( f_1\left( x\right) , f_2\left( x\right) , \dots , f_k\left( x\right) \right) . \end{aligned}$$
(21)

There are two Euclidean spaces to be considered in MOPs:

  1. the d-dimensional space of the decision vector x (called the decision space);

  2. the k-dimensional space of the objective functions (the objective space), in which each coordinate axis corresponds to a component of the objective vector f.

Fig. 6

A schematic diagram of objective space

The most important point about MOPs is that decreasing one objective typically (though not necessarily) leads to increasing some of the other objectives. For example, minimizing the total amount of injected effector cells (as one objective) cannot be achieved without causing a simultaneous increase in the final amount of cancer cells (as another objective). In other words, minimization of the total injected drug conflicts with minimization of the cancer cells. This example is illustrated by a schematic diagram in Fig. 6. The horizontal axis shows the first objective function (the total drug) and the vertical axis corresponds to the second objective, the tumour cells. Some typical decision vectors (i.e. \( \mathbf {A}, \dots , \mathbf {E} \)) are mapped onto the objective space. Obviously, none of the decision vectors \( \mathbf {A} \), \( \mathbf {B} \), and \( \mathbf {C} \) is superior to the others. For instance, vector \( \mathbf {A} \) gives a better value of the first objective than vector \( \mathbf {C} \), i.e. \( f_1(\mathbf {A}\,) < f_1(\mathbf {C}\,)\), but vector \( \mathbf {C} \) gives a lower amount of tumour in comparison to \( \mathbf {A} \), i.e. \( f_2(\mathbf {C}\,) < f_2(\mathbf {A}\,)\). This is why there is no unique optimal solution to an MOP, and therefore the notion of “optimum” changes. The decision vector \( \mathbf {D} \), however, is not a good candidate, because \( f_1(\mathbf {A}\,) < f_1(\mathbf {D}\,)\) and \( f_2(\mathbf {A}\,) < f_2(\mathbf {D}\,)\) (in the standard terminology, “\( \mathbf {A} \) dominates \( \mathbf {D} \)”). Similarly, the decision vector \( \mathbf {E} \) cannot be a good choice since it is dominated by \( \mathbf {B} \).

A decision vector \( x \in X \) is said to dominate another vector \( y \in X \) (denoted by \( x \preceq y \), or \( x \; \mathrm {dom} \, y\)) if and only if \( \forall i \in \left\{ 1, \dots , k \right\} \): \( f_i\left( x\right) \le f_i\left( y\right) \) and \( \exists j \in \left\{ 1, \dots , k \right\} \): \( f_j\left( x\right) < f_j\left( y\right) \), where k is the number of objective functions (see (21)). If there is no \( y \in X \) that dominates x, then x is called a non-dominated solution. A non-dominated solution is an optimal solution, called a Pareto optimal solution, and vice versa: if a decision vector is a Pareto optimal solution, then it is a non-dominated solution. In Fig. 6, vectors \( \mathbf {A} \), \( \mathbf {B} \), and \( \mathbf {C} \) are obviously Pareto optimal solutions. The Pareto optimal set, denoted by \( X_p \), is defined as the set of all non-dominated solutions. Its image in the objective space, \( f \left( X_p\right) \), is called the Pareto front (\( f(\mathbf {A}\,) \), \( f (\mathbf {B}\,) \), \( f (\mathbf {C}\,) \), and possibly the dashed curve in Fig. 6). Obviously, if a decision vector is found that dominates (for example) the Pareto optimal solution \( \mathbf {A} \), then \( \mathbf {A} \) is no longer a Pareto optimal solution and the Pareto front (the dashed curve in Fig. 6) changes.
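
As a concrete illustration of this definition, the following MATLAB sketch (our own, with illustrative identifiers; it is not taken from any of the cited implementations) checks dominance between two objective vectors and extracts the non-dominated members of a population, assuming each row of F is an objective vector and all objectives are minimized:

function d = dominates(fx, fy)
% DOMINATES is true if fx is no worse than fy in every objective
% and strictly better in at least one (minimization assumed).
d = all(fx <= fy) && any(fx < fy);
end

function idx = nondominatedIndices(F)
% NONDOMINATEDINDICES returns the row indices of the non-dominated
% objective vectors in F (one row per individual).
N = size(F, 1);
keep = true(N, 1);
for i = 1:N
    for j = 1:N
        if j ~= i && all(F(j,:) <= F(i,:)) && any(F(j,:) < F(i,:))
            keep(i) = false;   % individual i is dominated by j
            break;
        end
    end
end
idx = find(keep);
end

Applied to the five objective vectors of Fig. 6, such a routine would return exactly \( \mathbf {A} \), \( \mathbf {B} \), and \( \mathbf {C} \).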

7.2 Main approaches to MOPs

The (ideal) goal in MOPs is to identify the true Pareto optimal set, which may consist of infinitely many Pareto optimal solutions, but in practice this may be impossible. In general, a large and complex search space (and a highly nonlinear system) makes it difficult for traditional methods (for instance, multiple objective linear programming) to find the true Pareto optimal set. In this regard, producing a finite set of Pareto optimal solutions as a representative of the Pareto optimal set is in order.

A broad category of approaches to MOPs (the “a priori” methods) comprises strategies in which the original MOP is converted into a single-objective problem. Here, a decision maker (an expert in the problem field) must be asked for preference information, on the basis of which the (single) best solution is found. The weighted sum method, the \(\epsilon \)-constraint method, and goal programming fall into this category.
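
For instance, the weighted sum method scalarizes the MOP (21) as

$$\begin{aligned} \underset{x \in X}{\min } \; \sum _{i=1}^{k} w_i \, f_i\left( x\right) , \qquad w_i \ge 0, \quad \sum _{i=1}^{k} w_i = 1, \end{aligned}$$

where the weights \( w_i \) encode the decision maker's preference for each objective.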

By contrast, “a posteriori” methods tackle the problem as it is (without converting it into a single-objective problem) and produce a representative finite set of Pareto solutions, among which an expert in the field chooses the appropriate solution. Normal boundary intersection and normal constraint belong to the “a posteriori” category. A detailed list of MOP methods can be found in the article by Marler and Arora (2004); other classification methodologies can be found in the book by Coello et al. (2007).

A redeeming feature of “a priori” methods is that common approaches to single-objective problems can be applied. However, an objective assessment of weights, goals, and constraints is usually difficult. Even worse, difficulties may arise from a relative lack of background knowledge. In practice, the optimizer uses various sets of weights or constraints to produce a set of Pareto solutions; however, this strategy does not always lead to a diverse set of solutions. The weighted sum method is, in addition, very sensitive to the shape of the objective domain (solutions in non-convex regions generally cannot be discovered). Although producing solutions in non-convex regions is within the capabilities of the \( \epsilon \)-constraint method (formulated below), prior knowledge of the problem is required to choose a suitable range of constraints. In classical “a posteriori” methods, similarly, the algorithm has to be run repeatedly, with each run yielding a single Pareto optimal solution.
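
For reference, the \( \epsilon \)-constraint method keeps one objective and converts the others into constraints:

$$\begin{aligned} \underset{x \in X}{\min } \; f_j\left( x\right) \quad \text {subject to} \quad f_i\left( x\right) \le \epsilon _i, \quad i \in \left\{ 1, \dots , k \right\} , \; i \ne j, \end{aligned}$$

and an approximation of the Pareto front is traced by solving this problem for different values of the bounds \( \epsilon _i \), which is precisely where the prior knowledge mentioned above is needed.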

Evolutionary algorithms, on the other hand, are continuously being improved and have attracted a lot of attention due to their efficiency and ease of implementation; being population-based, they can approximate the whole Pareto front in a single run. Coello et al. (2007) provide a thorough and comprehensive review of MOPs and multi-objective evolutionary algorithms (such as the niched Pareto GA, the non-dominated sorting GA, and the strength Pareto evolutionary algorithm). In this work, the proposed problem is solved by using the NSGA-II (Deb et al. 2002); a brief description of the algorithm is therefore provided here for convenience.

7.3 A short overview of NSGA-II

As in a simple GA, a population of n individuals is produced first. At each iteration, \( n_c \) crossover offspring and \( n_m \) mutants are produced, and N denotes the total number of parents, crossover offspring, and mutants, i.e. \( N = n + n_c + n_m \). The vector of cost functions, \( \left( J_1, \dots ,J_k \right) \in {\mathbb {R}}^k\), is evaluated for each individual.

Fig. 7

Selection procedure in NSGA-II

All the individuals must be compared to one another in order to determine which individuals are dominated by which others. A so-called non-domination rank is assigned to each individual. The rank of any individual that is not dominated at all is equal to 1; all these individuals belong to the first set of Pareto solutions, denoted by \( {\mathcal {F}}_1 \). Individuals that are dominated only by members of \( {\mathcal {F}}_1 \) receive rank 2 and belong to the second set of Pareto solutions, \( {\mathcal {F}}_2 \), and so on. A simple way to compute these fronts is sketched below. In order to illustrate how the individuals are transferred to the next generation, a population of \( n=50 \) individuals is considered, and the total number of crossover offspring, mutants, and parents at each generation is taken to be, for instance, \( N = 100 \).
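
A straightforward (quadratic-time) way to build these fronts is to repeatedly peel off the currently non-dominated individuals, as in the following MATLAB sketch (our own illustration; NSGA-II's fast non-dominated sort is more efficient but yields the same ranking):

function fronts = nondominatedSort(F)
% NONDOMINATEDSORT partitions the rows of F into fronts{1}, fronts{2}, ...
% by repeatedly extracting the non-dominated individuals among those
% not yet ranked (minimization assumed).
N = size(F, 1);
ranks = zeros(N, 1);           % 0 = not yet ranked
fronts = {};
r = 0;
while any(ranks == 0)
    r = r + 1;
    current = [];
    candidates = find(ranks == 0)';
    for i = candidates
        dominated = false;
        for j = candidates
            if j ~= i && all(F(j,:) <= F(i,:)) && any(F(j,:) < F(i,:))
                dominated = true;
                break;
            end
        end
        if ~dominated
            current = [current, i];   %#ok<AGROW>
        end
    end
    ranks(current) = r;
    fronts{r} = current;              %#ok<AGROW>
end
end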

Figure 7 shows a representative iteration in which 50 individuals must be selected from 100 members for the next generation. The first two sets of Pareto solutions, \( {\mathcal {F}}_1 \) and \( {\mathcal {F}}_2 \), with a total of 35 individuals, are selected outright for the next generation. The remaining 15 individuals must be selected from the 25 individuals in \( {\mathcal {F}}_3 \). However, the members of \( {\mathcal {F}}_3 \) have no particular superiority over one another in terms of non-domination rank. Thus, the members of \( {\mathcal {F}}_3 \) are sorted in descending order of crowding distance (see Sect. 5) and the best 15 individuals at the top of the list are selected for the next generation.

For each individual, the distance value corresponding to each objective is calculated by using (16); the overall crowding distance of that individual is then the sum of these distance values. In summary, the first selection priority is non-domination rank, and the remaining individuals are chosen by crowding distance in order to preserve the diversity of the Pareto solutions. At each iteration, the algorithm improves the quality of the solutions, and finally the first set of Pareto solutions, \( {\mathcal {F}}_1 \), converges on the Pareto optimal solutions. A sketch of this selection step is given below.
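
A minimal MATLAB sketch of this selection step (our own illustration, with illustrative identifiers; it assumes the fronts have already been computed by non-dominated sorting, and that the crowding distance matches (16)) fills the next generation front by front and splits the last, partially fitting front by crowding distance:

function selected = nsga2Select(fronts, F, n)
% NSGA2SELECT picks n survivors. fronts{r} holds the indices of the
% individuals with non-domination rank r; F(i,:) is the objective
% vector of individual i (minimization assumed).
selected = [];
r = 1;
while r <= numel(fronts) && numel(selected) + numel(fronts{r}) <= n
    selected = [selected, fronts{r}];   % whole front fits: take it all
    r = r + 1;
end
remaining = n - numel(selected);
if remaining > 0
    cd = crowdingDistance(F(fronts{r}, :));
    [~, order] = sort(cd, 'descend');   % most isolated individuals first
    selected = [selected, fronts{r}(order(1:remaining))];
end
end

function cd = crowdingDistance(F)
% Standard NSGA-II crowding distance (assumed here to match (16)):
% boundary points get Inf; interior points accumulate the normalized
% gap between their neighbours along each objective.
[N, k] = size(F);
cd = zeros(N, 1);
for m = 1:k
    [vals, order] = sort(F(:, m));
    cd(order(1)) = Inf;
    cd(order(N)) = Inf;
    span = vals(N) - vals(1);
    if span > 0
        for i = 2:N-1
            cd(order(i)) = cd(order(i)) + (vals(i+1) - vals(i-1)) / span;
        end
    end
end
end

In the situation of Fig. 7, the while loop would accept \( {\mathcal {F}}_1 \) and \( {\mathcal {F}}_2 \) (35 individuals) and the crowding-distance split would supply the remaining 15 individuals from \( {\mathcal {F}}_3 \).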

7.4 Problem statement and results

In Sect. 4, a single-objective problem was considered in the form of the weighted sum of: (i) the total use of effector cells; (ii) the state variables; and (iii) the final value of cancer cells. As stated before, this formulation has a major disadvantage: it requires various sets of weight values in order to produce different optimal treatment plans, which restricts the decision maker to a very limited set of strategies. To cope with this restriction, the performance of the cancer–immune system can be optimized by formulating a multi-objective problem.

In this regard, the aforementioned problem is now formulated in a bi-objective form. In addition, since the administration of IL-2 plays a major role in improving the immune system performance, the external source of IL-2 will be considered here as the second control input.

The first objective, \( J_1 \), is defined as:

$$\begin{aligned} J_1 = \frac{1}{2 \, t_f} \int _{0}^{t_f} {\left[ u_1\left( t\right) + u_2\left( t\right) \right] } \mathrm {d}t, \end{aligned}$$
(22)

to minimize the total drugs used during the treatment, where the piecewise continuous control inputs, \( u_1 \) and \( u_2 \), denote the administration of effector cells and IL-2, respectively. The second objective is defined as follows:

$$\begin{aligned} J_2 = \phi \, y \left( t_f\right) + \left( 1-\phi \right) \frac{1}{t_f} \int _{0}^{t_f}{y\left( t\right) } \mathrm {d}t, \quad \phi \in \left[ 0,1\right] , \end{aligned}$$
(23)

to minimize the cancer cells during the treatment period and, in addition, to minimize the final state of cancer cells. The coefficient \( \phi \) indicates the importance of minimizing the final state relative to minimizing the running cost of the state trajectory during the treatment period. This parameter is arbitrarily set to 0.3 and is of no clinical significance. In conclusion, the bi-objective problem is described as follows:

$$\begin{aligned}&\min \left( J_1,J_2\right) , \end{aligned}$$

subject to:

$$\begin{aligned}&\frac{\mathrm {d}x}{\mathrm {d}t} = cy - \mu _2 x + \frac{p_1 x z}{g_1 + z} + u_1\left( t\right) s_1, \nonumber \\&\frac{\mathrm {d}y}{\mathrm {d}t} = r_2 y \left( 1 - b y \right) - \frac{a x y}{g_2 + y}, \nonumber \\&\frac{\mathrm {d}z}{\mathrm {d}t} = \frac{p_2 x y}{g_3 + y} - \mu _3 z + u_2\left( t\right) s_2, \nonumber \\&x\left( 0\right) = {10^4} \left( \mathrm {cell}\right) , \, y\left( 0\right) = {10^4} \left( \mathrm {cell}\right) , \, z\left( 0\right) = {10^4} \left( \mathrm {IU . L^{-1}}\right) , \nonumber \\&\forall t \in \left[ 0,t_f\right] : \, u_1\left( t\right) ,u_2\left( t\right) \in \left\{ 0,1\right\} . \end{aligned}$$
(24)

A 350-day treatment period is considered, i.e. \(t_f = 350 \, \left( \mathrm {day}\right) \). The daily dosage of IL-2 is set to \( s_2 = 3.0 \times 10^6 \, \left( \mathrm {IU.L^{-1}.day^{-1}}\right) \), which is less than the critical amount of IL-2. The daily dosage of effector cells is set to the minimum allowed amount, \( s_1 = 506.9 \, \left( \mathrm {cell.day^{-1}}\right) \) (see Sect. 3 for the critical values of \( s_1 \) and \( s_2 \) in combined therapy).

The model is normalized such that all the initial conditions, \( x\left( 0\right) \), \( y\left( 0\right) \), and \( z\left( 0\right) \), are equal to 1.0. The consequent changes in the values of the parameters are shown in Table 3. Kirschner and Panetta (1998) give detailed information on how to normalize the model.
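
As a sketch of this normalization (under the assumption that each state is simply divided by its initial value, \( x_0 = y_0 = z_0 = 10^4 \)), the substitutions

$$\begin{aligned} {\bar{x}} = \frac{x}{x_0}, \quad {\bar{y}} = \frac{y}{y_0}, \quad {\bar{z}} = \frac{z}{z_0} \end{aligned}$$

leave the structure of (24) unchanged while rescaling the parameters that compare or multiply state variables, e.g. \( {\bar{b}} = b \, y_0 \), \( {\bar{g}}_2 = g_2 / y_0 \), and \( {\bar{s}}_1 = s_1 / x_0 \); the resulting values are those reported in Table 3.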

Table 3 Values of the parameters in the normalized Kirschner–Panetta model, used in problem (24)
Fig. 8

Pareto front obtained by NSGA-II for the parameter values given in Table 3

Fig. 9

Tumour state trajectories corresponding to the \(50\%\) and \(100\%\) drug usage

The problem is coded in the MATLAB computing environment. Each individual contains two decision vectors corresponding to the control inputs, \( u_1 \left( t\right) \) and \( u_2 \left( t\right) \). Each decision vector produced by the NSGA-II is defined as a 350-tuple binary-valued vector and is then decoded into a meaningful control input for the evaluation of the objective functions. The NSGA-II is run over 100 iterations with a population of 200 individuals.
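
As an illustration of this decoding and evaluation step, the following MATLAB sketch (our own reconstruction, not the authors' code; the field names of the parameter structure pars are illustrative and are assumed to hold the normalized values of Table 3) maps two 350-bit vectors to day-wise bang-bang controls, integrates the normalized model, and returns the objective pair \( \left( J_1, J_2\right) \):

function J = evaluateIndividual(bits1, bits2, pars, phi, tf)
% EVALUATEINDIVIDUAL computes [J1, J2] of (22)-(23) for one individual.
% bits1, bits2 : binary vectors of length tf (one entry per day).
u1 = @(t) bits1(min(floor(t) + 1, tf));   % day-wise on/off dosing
u2 = @(t) bits2(min(floor(t) + 1, tf));
rhs = @(t, s) [ pars.c*s(2) - pars.mu2*s(1) ...
                  + pars.p1*s(1)*s(3)/(pars.g1 + s(3)) + u1(t)*pars.s1;
                pars.r2*s(2)*(1 - pars.b*s(2)) ...
                  - pars.a*s(1)*s(2)/(pars.g2 + s(2));
                pars.p2*s(1)*s(2)/(pars.g3 + s(2)) ...
                  - pars.mu3*s(3) + u2(t)*pars.s2 ];
% ode45 is adequate for a sketch; a per-day integration would handle
% the control switching points more carefully.
[t, s] = ode45(rhs, 0:tf, [1; 1; 1]);     % normalized initial states
y = s(:, 2);                              % tumour state trajectory
J1 = (sum(bits1) + sum(bits2)) / (2*tf);  % discrete form of (22)
J2 = phi*y(end) + (1 - phi)*trapz(t, y)/tf;   % discrete form of (23)
J = [J1, J2];
end

The NSGA-II then calls such a routine once per individual per generation; because the controls are binary, \( J_1 \) reduces to the fraction of treatment days, which is exactly the quantity plotted on the horizontal axis of Fig. 8.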

The Pareto front is illustrated in Fig. 8. The horizontal axis corresponds to the first objective, \( J_1 \), and shows the percentage of the total days with treatment. The vertical axis shows the second objective in (23). Each point of the Pareto front corresponds to a non-dominated solution. The Pareto front enables the decision maker to survey all possible trade-offs and choose a specific treatment strategy.

As an example, Fig. 9 shows the tumour trajectories corresponding to \( 50\% \) and \( 100\% \) drug usage and demonstrates how the combined therapy enhances the immune system in comparison with ACT alone. In the case of \( 50\% \) drug usage, the tumour is completely eliminated after 200 days, and the treatment period is reduced to 100 days for \( 100\% \) drug usage.

8 Conclusion

The main goal of this work is to examine the performance of an indirect method of solving optimal control problems, the forward-backward sweep method (FBSM), in the context of biological systems. For this purpose, the work by Ghaffari and Naserifar (2010) is revisited; they used this very method for developing cancer treatment protocols based on the Kirschner–Panetta model of cancer immunotherapy. The genetic algorithm (GA) and particle swarm optimization (PSO) are used alongside the aforementioned method in order to provide an objective assessment of their relative performance.

The FBSM suffers from the difficulties that are inherent in indirect methods, i.e. the need for an initial guess of the control (or the co-state variables, or unknown conditions) and, more importantly, the sensitivity of the method to that initial guess. In other words, a serious problem arises if the method does not converge to a solution, in which case the initial guess requires adjusting over and over again. This is a very time-consuming process. The situation becomes even more difficult when the method is used in the context of biological problems, since biological systems are normally difficult to describe in terms of fixed parameters. For example, in the Kirschner–Panetta model, the parameter c is considered a measure of the tumour antigenicity and therefore cannot have a fixed value for different types of tumours and patients (see Table 1). These different parameter values not only increase the running costs but could also lead to different optimal results.

In the first step, the problem is formulated in a single-objective form. The metaheuristics appear to be rather more successful than the FBSM in minimizing the cost function (12), and their performance therefore remains competitive with classic methods of this kind. As an important point, it must be mentioned that solutions to problems of this type are typically bang-bang controls. This means that the individuals in the proposed metaheuristics can be easily defined as binary-valued vectors and then converted into piecewise continuous functions as control inputs. Otherwise, it would be very hard (perhaps impossible) to utilize these metaheuristics for problems whose inputs are not bang-bang controls. This can be considered a major disadvantage of the proposed metaheuristics.

In the second step, the optimal control problem is formulated in a bi-objective form, where IL-2 is also used to enhance the immune system. Formulating the problem in a multi-objective form provides the decision maker with a wide variety of optimal solutions. The bi-objective problem appears to be superior to the single-objective one, since the decision maker is given the opportunity to observe a set of non-dominated optimal solutions and select the most appropriate treatment strategy.