1 Introduction

Optimal process control with control constraints is a challenging task related to many real-life problems [1], such as the stabilization of mechanical plants [2, 3], quantum mechanics [4] and industrial processes [5]. The general form of these problems was addressed in several studies, e.g. [6,7,8,9]. The complexity of the published results reflects the complexity of the addressed problems. More specifically, for linear quadratic (LQ) problems, the imposition of constraints fundamentally impacts their solvability [10]. This study deals with a type of continuous-time constrained LQ regulator (CLQR) problem, i.e. an optimal control problem defined by a quadratic performance index and a set of admissible processes that satisfy a linear state equation and some predefined constraints.

Different formulations of CLQR problems can be found in the literature. For example, a finite horizon with state and control force bounds was studied and an iterative algorithm was suggested for the control synthesis [5]; a finite horizon LQR with log-barrier state constraints was reformulated as an unconstrained dynamic game [10]. The new formulation and the properties that were developed can be useful for solving the original CLQR problem.

Constraints on the control force direction can be formulated as a bilinear mapping with control signal bounds [2, 3, 11]. Such a formulation turns the linear state equation into a bilinear one. Bilinear models are known to be simple and at the same time effective nonlinear dynamic models that appear in many practical modern control problems [4]. Despite their nonlinearity, their characteristics are close to those of linear systems and therefore facilitate the use of some techniques and procedures from linear systems theory [12].

Optimal control of bilinear models without control constraints has been addressed by many researchers [13,14,15,16,17]. Tools such as Pontryagin’s minimum principle [13, 14] or Lagrange multipliers [16] were used to derive suitable necessary optimality conditions. Iterative algorithms were formulated by using successive approximations [13, 14] or Adomian decomposition [15]. In [2] a constrained finite time optimal control problem with a bilinear model and a biquadratic performance index was considered. The necessary optimality conditions were formulated but not solved. Instead, a clipped optimal control and a numerical method for suboptimal control were given. Optimal control of a bilinear model with control signal bounds was treated in [11]. A stochastic Hamilton–Jacobi–Bellman equation was used for the formulation of the optimal control by means of a boundary value problem. The problem was not solved; however, it was used as a theoretical basis for a corresponding clipped optimal control.

The present paper introduces a method of optimal controller synthesis for a CLQR problem defined for a single control input with bilinear inequality constraints and control bounds. In this study a constrained optimal control problem is formulated and fully solved. As a first step, the CLQR problem is reformulated as an equivalent constrained bilinear biquadratic optimal control problem (CBBR). The methodology used for solving the CBBR problem is Krotov’s method. For the convenience of readers who are not familiar with Krotov’s theory, the necessary parts of the theory are given in Sect. 3. Next, Krotov’s method is used to derive an algorithm for the optimal control synthesis in a feedback form. As will be described hereinafter, the main novelty of the present study is the formulation of a sequence of improving functions that suits the addressed problem and allows Krotov’s method to be used for its solution. The required computational steps are arranged as an algorithm, and proof outlines for the convergence and optimality of the solution are given. The efficiency of the suggested method is illustrated by a numerical example.

2 The optimal control problem

Definition 1

Let \({\mathbf {x}}:\mathbb {R}\rightarrow \mathbb {R}^n\) be a state vector function and \(w:\mathbb {R}\rightarrow \mathbb {R}\) be a control signal. The pair \(({\mathbf {x}},w)\) is said to be an admissible process if it satisfies the linear time invariant state equation:

$$\begin{aligned} {\dot{{\mathbf {x}}}}(t) = {{\mathbf {A}}{{\mathbf {x}}}}(t) + {\mathbf {b}}{w}(t); \quad {\mathbf {x}}(0), \forall t\in (0,t_f) \end{aligned}$$
(1)

where \({\mathbf {x}}(0)\) is an initial state vector and w satisfies:

  1. C1:

    \(w(t){\mathbf {c}}{\mathbf {x}}(t)\le 0\),

  2. C2:

    \({\mathbf {c}}{\mathbf {x}}(t)= 0\rightarrow w(t)=0\),

  3. C3:

    \(|w(t)|\le w_{max}\),

for all \(t\in [0,t_f]\); \({\mathbf {A}}\in \mathbb {R}^{n\times n}\); \({\mathbf {b}}\in \mathbb {R}^n\); \({\mathbf {c}}^T\in \mathbb {R}^n\) and \(w_{max}>0\).

The set of admissible control signals will be denoted by \(\mathcal {W}({\mathbf {x}})\). Its \({\mathbf {x}}\) dependency is clear from the above definition. Such constraints arise in semi-active control design problems, where the control forces have physical constraints on their direction [18] and amplitude. For example, when a control force is applied to a mechanical plant through a controllable friction-based actuator, e.g. a magnetorheological actuator [19], the direction of the control force is opposed to the actuator motion, represented by \({\mathbf {c}}{\mathbf {x}}(t)\). When there is no motion, the actuator cannot generate force at all. These physical constraints are represented by C1 and C2. Additionally, some design considerations, such as actuator limitations or unwanted local plant effects, set bounds on the force magnitude. These bounds are represented by C3. Later in this article we will refer to constraints C1 and C2 as semi-active constraints. It should also be noted that by imposing C2, the problem discussed in [18] is avoided.

A CLQR optimal process is one that minimizes the following quadratic performance index.

Problem 1

(CLQR) The constrained linear quadratic regulator (CLQR) problem is a search for an admissible process \(({\mathbf {x}}^*,w^*)\) that minimizes the quadratic performance index:

$$\begin{aligned} J({\mathbf {x}},w)&= \frac{1}{2}\int \limits _0^{t_f} {\mathbf {x}}(t)^T{\mathbf {Q}}{\mathbf {x}}(t) + w(t)^2 r \mathrm {d}\,t \end{aligned}$$
(2)

for a given \({\mathbf {x}}(0)\), \(0\preceq {\mathbf {Q}}\in \mathbb {R}^{n\times n}\) and \(r>0\).

As can be found in other works (e.g. [2, 3]), instead of treating the CLQR directly, an equivalent bilinear/biquadratic formulation can be used, as follows.

Let the mapping \(\hat{w}:\mathbb {R}\times \mathbb {R}^n \rightarrow \mathbb {R}\) be

$$\begin{aligned} \hat{w}(t,{\mathbf {x}}(t)) = -u(t){\mathbf {c}}{\mathbf {x}}(t) \end{aligned}$$
(3)

where u is a control signal that satisfies

$$\begin{aligned} u(t) \left\{ \begin{array}{ll} =0,&{}\quad {\mathbf {c}}{\mathbf {x}}(t)=0\\ \in \left[ 0,\frac{w_{max}}{|{\mathbf {c}}{\mathbf {x}}(t)|}\right] , &{}\quad \text {otherwise} \end{array}\right. \end{aligned}$$
(4)

Therefore:

$$\begin{aligned} \hat{w}(t,{\mathbf {x}}(t)){\mathbf {c}}{\mathbf {x}}(t)&= -u(t)({\mathbf {c}}{\mathbf {x}}(t))^2\le 0 \end{aligned}$$
(5)

and

$$\begin{aligned} |\hat{w}(t,{\mathbf {x}}(t))|&= |u(t){\mathbf {c}}{\mathbf {x}}(t)| = u(t)|{\mathbf {c}}{\mathbf {x}}(t)| \le \left( \frac{w_{max}}{|{\mathbf {c}}{\mathbf {x}}(t)|}\right) |{\mathbf {c}}{\mathbf {x}}(t)| = w_{max} \end{aligned}$$
(6)

Hence, by letting \(w(t)=\hat{w}(t,{\mathbf {x}}(t))\), we have \(w\in \mathcal {W}({\mathbf {x}})\) for any \({\mathbf {x}}\). Substitution of \(\hat{w}\) in (1) and (2) leads to the following optimal control problem.
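The admissibility argument (5)–(6) is easy to check numerically. The following sketch is our own illustration (the sampling ranges and names are assumptions, not part of the original formulation): it draws control signals u satisfying (4) and verifies that the mapped force \(\hat{w}=-u\,{\mathbf {c}}{\mathbf {x}}\) satisfies C1 and C3.

```python
import numpy as np

def w_hat(u, cx):
    """Bilinear mapping (3): w = -u * (c x)."""
    return -u * cx

rng = np.random.default_rng(0)
w_max = 2.0

for _ in range(1000):
    cx = rng.uniform(-5.0, 5.0)
    if cx == 0.0:
        u = 0.0                                  # (4): u = 0 when c x = 0
    else:
        u = rng.uniform(0.0, w_max / abs(cx))    # (4): u in [0, w_max/|c x|]
    w = w_hat(u, cx)
    assert w * cx <= 0.0                         # C1: w (c x) <= 0
    assert abs(w) <= w_max + 1e-12               # C3: |w| <= w_max
```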

Definition 2

Let \({\mathbf {x}}:\mathbb {R}\rightarrow \mathbb {R}^n\) be a state vector function and \(u:\mathbb {R}\rightarrow \mathbb {R}\) be a control signal. The pair \(({\mathbf {x}},u)\) is said to be an admissible process if it satisfies the bilinear state equation [12]:

$$\begin{aligned} {\dot{{\mathbf {x}}}}(t) = [{\mathbf {A}}- u(t){\mathbf {b}}{\mathbf {c}}]{\mathbf {x}}(t); \quad {\mathbf {x}}(0), \forall t\in (0,t_f) \end{aligned}$$
(7)

and u satisfies (4) for all \(t\in [0,t_f]\).

The set of admissible control signals is denoted by \(\mathscr {U}({\mathbf {x}})\). A CBBR optimal process is one that minimizes the following biquadratic performance index.

Problem 2

(CBBR) The constrained bilinear biquadratic regulator (CBBR) problem is the search for an admissible process \(({\mathbf {x}}^*,u^*)\) that minimizes the biquadratic performance index:

$$\begin{aligned} J({\mathbf {x}},u)&= \frac{1}{2}\int \limits _0^{t_f} {\mathbf {x}}(t)^T{\mathbf {Q}}{\mathbf {x}}(t) + r u(t)^2({\mathbf {c}}{\mathbf {x}}(t))^2 \mathrm {d}\,t \end{aligned}$$
(8)

where \(0\preceq {\mathbf {Q}}\in \mathbb {R}^{n\times n}\) and \(r>0\).

The methodology that is used for solving the CBBR problem is Krotov’s method.

3 Krotov’s sufficient conditions

A popular approach for obtaining a suitable solution to many optimal control problems is to use first order variational calculus or Pontryagin’s minimum principle. However, for many problems these theorems provide merely necessary conditions [20]. Starting in the 1960s, new results on sufficient conditions for the global optimum of optimal control problems began to be published by Krotov [21]. A brief description of this approach with the main theorems is given hereinafter. The theorems are taken from the published works of Krotov; however, the formulation was adapted to the needs and nature of the problems relevant to this study. Their proofs can be found in [22].

Let \(\mathscr {U}\) be a set of admissible control signals and \(\mathscr {X}\) a linear space of state vector functions. An admissible process is the pair \(({\mathbf {x}},{\mathbf {u}})\), where \({\mathbf {u}}\in \mathscr {U}\), \({\mathbf {x}}\in \mathscr {X}\) and they both satisfy the state equation

$$\begin{aligned} {\dot{{\mathbf {x}}}}(t) = {\mathbf {f}}({\mathbf {x}}(t),{\mathbf {u}}(t),t); \quad {\mathbf {x}}(0), \forall t\in (0,t_f) \end{aligned}$$
(9)

Let \(q:\mathbb {R}\times \mathbb {R}^n\rightarrow \mathbb {R}\) be a smooth function of states and time. The following theorem states that each q is associated with some equivalent formulation of the performance index.

Theorem 1

Let a constrained control optimization problem be defined by the state equation, performance index and a set of admissible control signals:

$$\begin{aligned} {\dot{{\mathbf {x}}}}(t)&= {\mathbf {f}}({\mathbf {x}}(t),{\mathbf {u}}(t),t);\quad {\mathbf {x}}(0),\forall t\in (0,t_f) \end{aligned}$$
(10)
$$\begin{aligned} J({\mathbf {x}},{\mathbf {u}})&= l_f({\mathbf {x}}(t_f)) + \int \limits _0^{t_f} l(t,{\mathbf {x}}(t),{\mathbf {u}}(t)) \mathrm {d}\,t \nonumber \\&{\mathbf {u}}\in \mathscr {U}\end{aligned}$$
(11)

where \({\mathbf {x}}(0)\) is known.

Let q be a given smooth function and \(({\mathbf {x}},{\mathbf {u}})\) be an admissible process. \(J_{eq}:\mathscr {X}\times \mathscr {U}\rightarrow \mathbb {R}\) is an equivalent representation of J which corresponds to q and is defined by:

$$\begin{aligned} J({\mathbf {x}},{\mathbf {u}}) \equiv J_{eq}({\mathbf {x}},{\mathbf {u}})&= s_f({\mathbf {x}}(t_f)) + q(0,{\mathbf {x}}(0)) + \int \limits _0^{t_f} s(t,{\mathbf {x}}(t),{\mathbf {u}}(t)) \mathrm {d}\,t \end{aligned}$$
(12)

where

$$\begin{aligned} s(t,{\varvec{\xi }},{\varvec{\nu }}) \triangleq \,&\, q_t(t,{\varvec{\xi }}) + q_{\mathbf {x}}(t,{\varvec{\xi }}) {\mathbf {f}}(t,{\varvec{\xi }},{\varvec{\nu }}) + l(t,{\varvec{\xi }},{\varvec{\nu }}) \end{aligned}$$
(13)
$$\begin{aligned} s_f({\varvec{\xi }}) \triangleq \,&\, l_f({\varvec{\xi }}) - q(t_f,{\varvec{\xi }}) \end{aligned}$$
(14)

Here \({\varvec{\xi }}\in \mathbb {R}^n\) and \({\varvec{\nu }}\in \mathbb {R}^{n_u}\).

In the published work of Krotov (such as [4, 21, 22]), s and q are defined with opposite signs which turn some of the minimization problems into maximization ones. In the present study, a formulation which leaves the problem as the customary one, i.e. a minimization problem, was chosen. Though, there is no essential difference between these two formulations.
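Theorem 1 can be illustrated numerically. The sketch below is our own toy example (the scalar plant, the constant p and all names are assumptions): it propagates a scalar system under an arbitrary control and checks that J and \(J_{eq}\) agree for a quadratic q, up to discretization error.

```python
import numpy as np

def trapez(f, dt):
    """Composite trapezoidal rule on a uniform grid."""
    return dt * (0.5 * f[0] + f[1:-1].sum() + 0.5 * f[-1])

# toy scalar problem: x' = a x + b u, l = 0.5 (Q x^2 + R u^2), l_f = 0
a, b, Q, R = -0.5, 1.0, 2.0, 0.1
p = 0.7                                # q(t, x) = 0.5 * p * x^2 (any smooth q works)
t_f, N = 2.0, 20_000
dt = t_f / N
t = np.linspace(0.0, t_f, N + 1)
u = np.sin(t)                          # an arbitrary control signal

x = np.empty(N + 1)
x[0] = 1.0
for k in range(N):                     # forward Euler state propagation
    x[k + 1] = x[k] + dt * (a * x[k] + b * u[k])

l = 0.5 * (Q * x**2 + R * u**2)
J = trapez(l, dt)                      # performance index (11) with l_f = 0

# s = q_t + q_x f + l, with q_t = 0 and q_x = p x ;  s_f = l_f - q(t_f, .)
s = p * x * (a * x + b * u) + l
J_eq = -0.5 * p * x[-1]**2 + 0.5 * p * x[0]**2 + trapez(s, dt)

assert abs(J - J_eq) < 1e-3            # equivalence (12), up to discretization error
```

The identity holds exactly in continuous time because the added terms telescope to \(q(t_f,\cdot )-q(0,\cdot )\); the numerical gap shrinks as the grid is refined.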

The following theorem provides a sufficient condition for the global optimality of a given admissible process \(({\mathbf {x}}^*,{\mathbf {u}}^*)\) by means of \(J_{eq}\). In what follows, \(\mathscr {X}(t)\) is the set \(\{{\mathbf {x}}(t)\,|\,{\mathbf {x}}\in \mathscr {X}\}\).

Theorem 2

Let s and \(s_f\) be associated with some q and let \(({\mathbf {x}}^*\in \mathscr {X},{\mathbf {u}}^*\in \mathscr {U})\) be an admissible process. If:

$$\begin{aligned} s(t,{\mathbf {x}}^*(t),{\mathbf {u}}^*(t))&= \min _{\begin{array}{c} {\varvec{\xi }}\in \mathscr {X}(t)\\ {\varvec{\nu }}\in \mathscr {U}(t) \end{array}} s(t,{\varvec{\xi }},{\varvec{\nu }}) \; \forall t\in [0,t_f) \nonumber \\ s_f({\mathbf {x}}^*(t_f))&= \min _{{\varvec{\xi }}\in \mathscr {X}(t_f)} s_f({\varvec{\xi }}) \end{aligned}$$
(15)

then \(({\mathbf {x}}^*,{\mathbf {u}}^*)\) is an optimal process.

Remarks

  • It is customary to refer to a q that satisfies (15) and allows the computation of \(({\mathbf {x}}^*,{\mathbf {u}}^*)\) as a Krotov function or solving function.

  • An optimum, derived by this theorem, is global since the minimization problem, defined in (15), is global [22].

  • Note that in order for the equivalence \(J({\mathbf {x}},{\mathbf {u}}) = J_{eq}({\mathbf {x}},{\mathbf {u}})\) to hold, \({\mathbf {x}}\) and \({\mathbf {u}}\) must satisfy the state equation (10).

  • Since q is not unique, s, \(s_f\) and \(J_{eq}\) are non-unique too.

  • The formulations above are defined with the assumption of a smooth q. This assumption can be weakened into piecewise smooth [21], i.e. smooth over t and \({\mathbf {x}}(t)\) except for some set of t’s with finite time difference between them.

This approach provides not only a sufficient condition for global optimality but it also lays the foundation for novel algorithms, aimed at the solution of optimal control problems [21]. One of these algorithms is known as Krotov’s method.

3.1 Krotov’s method–successive global improvements of control

The goal of Krotov’s method is the numerical solution of optimal control problems. It was used successfully for solving optimal control problems in quantum mechanics [4], as well as oscillation damping of a simple beam [23].

According to this method, the key to the solution is formulating a sequence of functions with special properties. These functions will be referred to as improving functions. If such a sequence can be found, it allows the computation of a global optimum for the given optimal control problem.

As a first step, an optimizing sequence is defined.

Definition 3

Let \(\{({\mathbf {x}}_k,{\mathbf {u}}_k)\}\) be a sequence of admissible processes. Such a sequence is said to be an optimizing sequence if

$$\begin{aligned} J({\mathbf {x}}_k,{\mathbf {u}}_k)\ge J({\mathbf {x}}_{k+1},{\mathbf {u}}_{k+1}) \end{aligned}$$
(16)

for all \(k=1,2,\ldots \) and:

$$\begin{aligned} \lim _{k\rightarrow \infty } J({\mathbf {x}}_k,{\mathbf {u}}_k) = \inf _{{\mathbf {x}}\in \mathscr {X},{\mathbf {u}}\in \mathscr {U}} J({\mathbf {x}},{\mathbf {u}}) \end{aligned}$$
(17)

If an optimizing sequence can be found, it allows the computation of an admissible process, which is ’arbitrarily close’ to the optimal one by means of J. Krotov’s method is aimed at the computation of such a sequence. It does so by successive improvements of admissible processes. The concept underlying these improvements is the sufficient condition given in the following theorem. Recall that each smooth function q is related to some s and \(s_f\), defined in Theorem 1. The notation \({\hat{{\mathbf {u}}}}\) is used to denote control feedback, i.e. a mapping \({\hat{{\mathbf {u}}}}:\mathbb {R}\times \mathbb {R}^n\rightarrow \mathbb {R}^{n_u}\).

Theorem 3

Let a given admissible process be \(({\mathbf {x}}_k,{\mathbf {u}}_k)\) and let \(q_k\) be a smooth function. If \(({\mathbf {x}}_k,{\mathbf {u}}_k)\) is the solution to the following maximization problem:

$$\begin{aligned} s_k(t,{\mathbf {x}}_k(t),{\mathbf {u}}_k(t))&= \max _{{\varvec{\xi }}\in \mathscr {X}(t)} s_k(t,{\varvec{\xi }},{\mathbf {u}}_k(t))\nonumber \\ s_{f,k}({\mathbf {x}}_{k}(t_f))&= \max _{{\varvec{\xi }}\in \mathscr {X}(t_f)} s_{f,k}({\varvec{\xi }}) \end{aligned}$$
(18)

and if \(\hat{\mathbf {u}}_{k+1}\) is a control feedback which satisfies

$$\begin{aligned} \hat{\mathbf {u}}_{k+1}(t,{\varvec{\xi }}) = \arg \min _{{\varvec{\nu }}\in \mathscr {U}(t)} s_k(t,{\varvec{\xi }},{\varvec{\nu }}) ;\quad \forall t\in [0,t_f] \end{aligned}$$
(19)

then \({\mathbf {x}}_{k+1}\) which solves:

$$\begin{aligned} {\dot{{\mathbf {x}}}}_{k+1}(t) = {\mathbf {f}}{\big (}t,{\mathbf {x}}_{k+1}(t),{\hat{{\mathbf {u}}}}_{k+1}(t,{\mathbf {x}}_{k+1}(t)){\big )} ;\quad {\mathbf {x}}_{k+1}(0) = {\mathbf {x}}(0), \forall t\in (0,t_f) \end{aligned}$$
(20)

and the control signal \({\mathbf {u}}_{k+1}(t)=\hat{\mathbf {u}}_{k+1}(t,{\mathbf {x}}_{k+1}(t))\) satisfy (16).

It follows from this theorem that if for each \(({\mathbf {x}}_k,{\mathbf {u}}_k)\) one can find \(q_k\) such that (18) holds, and the feedback defined by (19), then it is possible to find an improved process \(({\mathbf {x}}_{k+1},{\mathbf {u}}_{k+1})\). Such a \(q_k\) will be referred to as an improving function. Solving this problem over and over yields an optimizing sequence and hence leads to the solution of the optimization problem.

The required steps are summarized in the following algorithm. The algorithm initialization requires computation of some initial admissible process \({{\mathbf {x}}_0\in \mathscr {X},{\mathbf {u}}_0\in \mathscr {U}}\). The iterations are done for \(k=\{0,1,2,\ldots \}\), where each iteration consists of three stages:

  1.

    Find \(q_k(t,{\mathbf {x}})\) that solves

    $$\begin{aligned} s_k(t,{\mathbf {x}}_k(t),{\mathbf {u}}_k(t))&= \max _{{\varvec{\xi }}\in \mathscr {X}(t)} s_k(t,{\varvec{\xi }},{\mathbf {u}}_k(t)) \\ s_{f,k}({\mathbf {x}}_k(t_f))&= \max _{{\varvec{\xi }}\in \mathscr {X}(t_f)} s_{f,k}({\varvec{\xi }}) \end{aligned}$$

    for a given \(({\mathbf {x}}_k,{\mathbf {u}}_k)\).

  2.

    Find an optimal feedback

    $$\begin{aligned} {\hat{{\mathbf {u}}}}_{k+1}(t,{\varvec{\xi }}) = \arg \min _{{\varvec{\nu }}\in \mathscr {U}(t)} s_k(t,{\varvec{\xi }},{\varvec{\nu }}) \end{aligned}$$
  3.

    Propagate into the next improved state and control processes, by solving

    $$\begin{aligned} {\dot{{\mathbf {x}}}}_{k+1}(t)&= {\mathbf {f}}{\big (}t,{\mathbf {x}}_{k+1}(t),{\hat{{\mathbf {u}}}}_{k+1}(t,{\mathbf {x}}_{k+1}(t)){\big )} \end{aligned}$$

    and setting:

    $$\begin{aligned} {\mathbf {u}}_{k+1}(t)&= {\hat{{\mathbf {u}}}}_{k+1}(t,{\mathbf {x}}_{k+1}(t)) \end{aligned}$$

Remarks

  • This algorithm produces an optimizing sequence, i.e. a sequence of trajectories which converges monotonically to an optimum which satisfies (15).

  • This method has a significant advantage over algorithms based on small variations, since the latter are constrained to small process variations. That is troublesome, as: (1) it leads to a slow convergence rate, and (2) for some optimal control problems small variations are impossible [21].

  • Like in Lyapunov’s method for stability, the use of Krotov’s method is not straightforward. It requires formulation of a suitable sequence–\(\{q_k\}\). However, the search for these functions is a significant challenge. As of this writing, there is no known unified approach for their formulation and they usually differ from one optimal control problem to another. A form, which can be used as an improving function, was indeed suggested in [21], though it is defined by means of some unknown matrix function \({\varvec{\sigma }}:\mathbb {R}\rightarrow \mathbb {R}^{n\times n}\) that should be found. The essential non-uniqueness of the improving function is a key characteristic of this approach. This vagueness is an advantage and at the same time a disadvantage. On the one hand, it poses an additional challenge to the control design, but on the other hand, it provides a solution method with a high level of flexibility.

  • Additionally, sometimes the form of \(q_k\) hints at the form of a corresponding Lyapunov function. When dealing with nonlinear plants this is an important contribution, seeing that the stability of such systems is always questionable.

In the present study, a suitable sequence of improving functions was found for the CBBR problem.

4 Main results

The following two theorems provide the improving function and the control law, which enable the use of Krotov’s method for the CBBR problem. In what follows, \(\mathscr {U}(t,{\mathbf {x}})\) denotes the intersection of \(\mathscr {U}({\mathbf {x}})\) at some t. It refers to the control signal values which are admissible for a given \({\mathbf {x}}(t)\).

Theorem 4

Let

$$\begin{aligned} q(t,{\mathbf {x}}(t)) = 0.5{\mathbf {x}}(t)^T{\mathbf {P}}(t){\mathbf {x}}(t);\quad {\mathbf {P}}(t_f) = {\mathbf {0}}\end{aligned}$$

where \({\mathbf {P}}:[0,t_f]\rightarrow \mathbb {R}^{n\times n}\) is a continuous, piecewise smooth and symmetric matrix function, and let \(v(t,{\mathbf {x}}(t)) \triangleq \frac{{\mathbf {x}}(t)^T{\mathbf {P}}(t){\mathbf {b}}}{r}\).

Then the control law, \(\hat{u}\), that minimizes \(s(t,{\mathbf {x}}(t),u(t))\), is

$$\begin{aligned} \hat{u}(t,{\mathbf {x}}(t))&= \arg \min _{u(t)\in \mathscr {U}(t,{\mathbf {x}})} s(t,{\mathbf {x}}(t),u(t)) \nonumber \\&=\left\{ \begin{array}{ll} 0, &{}\quad v(t,{\mathbf {x}}(t))\mathrm {sign}({\mathbf {c}}{\mathbf {x}}(t))\le 0\\ \dfrac{w_{max}}{|{\mathbf {c}}{\mathbf {x}}(t)|},&{}\quad v(t,{\mathbf {x}}(t))\mathrm {sign}({\mathbf {c}}{\mathbf {x}}(t)) \ge w_{max}\\ \dfrac{v(t,{\mathbf {x}}(t))}{{\mathbf {c}}{\mathbf {x}}(t)}, &{}\quad \text {otherwise} \end{array}\right. \end{aligned}$$
(21)

where \(\mathrm {sign}:\mathbb {R}\rightarrow \{-1,0,1\}\) is the customary sign function.

Proof

The partial derivatives of q are:

$$\begin{aligned} q_t(t,{\mathbf {x}}(t))&= 0.5{\mathbf {x}}(t)^T\dot{\mathbf {P}}(t){\mathbf {x}}(t);&q_{\mathbf {x}}(t,{\mathbf {x}}(t))&= {\mathbf {x}}(t)^T{\mathbf {P}}(t) \end{aligned}$$
(22)

Substitution in (13) and (14) yields:

$$\begin{aligned} s_f({\mathbf {x}}(t_f))&= 0 \nonumber \\ s(t,{\mathbf {x}}(t),u(t))&= 0.5{\mathbf {x}}(t)^T \left[ \dot{\mathbf {P}}(t) + {\mathbf {P}}(t)\left( {\mathbf {A}}-u(t){\mathbf {b}}{\mathbf {c}}\right) \right. \nonumber \\&\left. \quad +\, \left( {\mathbf {A}}-u(t){\mathbf {b}}{\mathbf {c}}\right) ^T{\mathbf {P}}(t) + {\mathbf {Q}}+ ru(t)^2{\mathbf {c}}^T{\mathbf {c}}\right] {\mathbf {x}}(t) \nonumber \\&=0.5{\mathbf {x}}(t)^T \left[ \dot{\mathbf {P}}(t) + {\mathbf {P}}(t){\mathbf {A}}+{\mathbf {A}}^T{\mathbf {P}}(t)+ {\mathbf {Q}}\right] {\mathbf {x}}(t) \nonumber \\&\quad +\, 0.5 r u(t)^2 ({\mathbf {c}}{\mathbf {x}}(t))^2 - u(t){\mathbf {x}}(t)^T{\mathbf {P}}(t){\mathbf {b}}{\mathbf {c}}{\mathbf {x}}(t) \end{aligned}$$
(23)

Completing the square leads to:

$$\begin{aligned} s(t,{\mathbf {x}}(t),u(t))&=0.5{\mathbf {x}}(t)^T \left[ \dot{\mathbf {P}}(t) + {\mathbf {P}}(t){\mathbf {A}}+{\mathbf {A}}^T{\mathbf {P}}(t)+ {\mathbf {Q}}\right] {\mathbf {x}}(t)\\&\quad +\, 0.5 r\left( u(t) {\mathbf {c}}{\mathbf {x}}(t) - \frac{{\mathbf {x}}(t)^T{\mathbf {P}}(t){\mathbf {b}}}{r} \right) ^2-0.5\frac{({\mathbf {x}}(t)^T{\mathbf {P}}(t){\mathbf {b}})^2}{r} \\&=f_2(t,{\mathbf {x}}(t)) +\, 0.5 r \left( u (t) {\mathbf {c}}{\mathbf {x}}(t) - v(t,{\mathbf {x}}(t)) \right) ^2 \end{aligned}$$

where \(v(t,{\mathbf {x}}(t))\) was defined in the theorem and \(f_2\) contains all the terms which are independent of u(t). It follows that the minimum of \(s(t,{\mathbf {x}}(t),u(t))\) over \(\mathscr {U}(t,{\mathbf {x}})\) depends merely on the quadratic term. Hence, the minimizing admissible \(u(t)\in \mathscr {U}(t,{\mathbf {x}})\) is calculated as follows:

  (a)

    When \({\mathbf {c}}{\mathbf {x}}(t)=0\), u(t) vanishes from the performance index and the state equation, hence its value has no effect and it can be set to \(u(t)=0\).

  (b)

    When \({\mathbf {c}}{\mathbf {x}}(t)\ne 0\) and \(\frac{v(t,{\mathbf {x}}(t))}{{\mathbf {c}}{\mathbf {x}}(t)}\le 0\), the admissible minimum is attained at \(u(t)=0\). However, since:

    $$\begin{aligned} \frac{v(t,{\mathbf {x}}(t))}{{\mathbf {c}}{\mathbf {x}}(t)}&= \frac{v(t,{\mathbf {x}}(t))}{|{\mathbf {c}}{\mathbf {x}}(t)|}\frac{|{\mathbf {c}}{\mathbf {x}}(t)|}{{\mathbf {c}}{\mathbf {x}}(t)} = \frac{v(t,{\mathbf {x}}(t))}{|{\mathbf {c}}{\mathbf {x}}(t)|}\mathrm {sign}({\mathbf {c}}{\mathbf {x}}(t)) \end{aligned}$$

    and since \(1/|{\mathbf {c}}{\mathbf {x}}(t)|>0\), the inequality \(\frac{v(t,{\mathbf {x}}(t))}{{\mathbf {c}}{\mathbf {x}}(t)}\le 0\) can be replaced by \(v(t,{\mathbf {x}}(t))\mathrm {sign}({\mathbf {c}}{\mathbf {x}}(t))\le 0\).

  (c)

    When \({\mathbf {c}}{\mathbf {x}}(t)\ne 0\) and \(\frac{v(t,{\mathbf {x}}(t))}{{\mathbf {c}}{\mathbf {x}}(t)}\ge \frac{w_{max}}{|{\mathbf {c}}{\mathbf {x}}(t)|}\), the admissible minimum is attained at \(u(t)=\frac{w_{max}}{|{\mathbf {c}}{\mathbf {x}}(t)|}\). However, since:

    $$\begin{aligned} \frac{v(t,{\mathbf {x}}(t))}{{\mathbf {c}}{\mathbf {x}}(t)}\ge \frac{w_{max}}{|{\mathbf {c}}{\mathbf {x}}(t)|}&\;\rightarrow \; v(t,{\mathbf {x}}(t))\frac{|{\mathbf {c}}{\mathbf {x}}(t)|}{{\mathbf {c}}{\mathbf {x}}(t)}\ge w_{max}\\&\;\rightarrow \; v(t,{\mathbf {x}}(t))\mathrm {sign}({\mathbf {c}}{\mathbf {x}}(t)) \ge w_{max} \end{aligned}$$

    it is possible to replace \(\frac{v(t,{\mathbf {x}}(t))}{{\mathbf {c}}{\mathbf {x}}(t)}\ge \frac{w_{max}}{|{\mathbf {c}}{\mathbf {x}}(t)|}\) by \(v(t,{\mathbf {x}}(t))\mathrm {sign}({\mathbf {c}}{\mathbf {x}}(t))\ge w_{max}\).

  (d)

    When \({\mathbf {c}}{\mathbf {x}}(t)\ne 0\) and \(0<\frac{v(t,{\mathbf {x}}(t))}{{\mathbf {c}}{\mathbf {x}}(t)}<\frac{w_{max}}{|{\mathbf {c}}{\mathbf {x}}(t)|}\), the admissible minimum is attained at \(u(t)=\frac{v(t,{\mathbf {x}}(t))}{{\mathbf {c}}{\mathbf {x}}(t)}\).

The admissible minimizing feedback \(\hat{u}(t,{\mathbf {x}}(t))\) that corresponds to these cases is given by (21). \(\square \)
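The control law (21) admits a direct implementation. The sketch below is ours (array shapes and names are assumptions): it evaluates \(\hat{u}\) for a given state and \({\mathbf {P}}(t)\).

```python
import numpy as np

def u_hat(x, P, b, c, r, w_max):
    """Feedback law (21); x, b, c are 1-D arrays, P is symmetric."""
    cx = float(c @ x)
    v = float(x @ P @ b) / r      # v(t, x) = x' P b / r
    s = v * np.sign(cx)           # sign(0) = 0 also covers the c x = 0 case
    if s <= 0.0:
        return 0.0                # first branch of (21)
    if s >= w_max:
        return w_max / abs(cx)    # second branch (saturation)
    return v / cx                 # interior (unconstrained) minimizer
```

In the interior case the law reduces to the unconstrained minimizer \(v/({\mathbf {c}}{\mathbf {x}})\); otherwise it is clipped to the admissible set (4).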

Theorem 5

Let \(({\mathbf {x}}_k,u_k)\) be a given process. If \({\mathbf {P}}_k\) satisfies:

$$\begin{aligned} \dot{\mathbf {P}}_k(t)&= - {\mathbf {P}}_k(t)\left( {\mathbf {A}}-u_k(t){\mathbf {b}}{\mathbf {c}}\right) - \left( {\mathbf {A}}-u_k(t){\mathbf {b}}{\mathbf {c}}\right) ^T{\mathbf {P}}_k(t) \nonumber \\&\quad -\, {\mathbf {Q}}- r u_k(t)^2 {\mathbf {c}}^T{\mathbf {c}} \end{aligned}$$
(24)

\(\forall t\in (0,t_f)\) and for \({\mathbf {P}}_k(t_f) = {\mathbf {0}}\), then \(s_k\) given in Theorem 4 satisfies

$$\begin{aligned} s_k(t,{\mathbf {x}}_k(t),u_k(t))&= \max _{{\mathbf {x}}(t)\in \mathscr {X}(t)} s_k(t,{\mathbf {x}}(t),u_k(t)) \end{aligned}$$
(25)
$$\begin{aligned} s_{f,k}({\mathbf {x}}_k(t_f))&= \max _{{\mathbf {x}}(t_f)\in \mathscr {X}(t_f)} s_{f,k}({\mathbf {x}}(t_f)) \end{aligned}$$
(26)

Proof

Substituting \(\dot{\mathbf {P}}_k(t)\), which satisfies (24), into (23) yields:

$$\begin{aligned} s_k(t,{\mathbf {x}}(t),u_k(t)) =&0.5{\mathbf {x}}(t)^T [{\mathbf {0}}]{\mathbf {x}}(t) = 0 \end{aligned}$$
(27)

Since \(s_k(t,{\mathbf {x}}(t),u_k(t)) = s_k(t,{\mathbf {x}}_k(t),u_k(t))\) it is obvious that

$$\begin{aligned} s_k(t,{\mathbf {x}}(t),u_k(t)) \le s_k(t,{\mathbf {x}}_k(t),u_k(t)) \end{aligned}$$

for all \({\mathbf {x}}(t)\). As \({\mathbf {P}}(t_f)={\mathbf {0}}\) and \(l_f=0\), it follows that

$$\begin{aligned} s_{f,k}({\mathbf {x}}(t_f)) = l_f({\mathbf {x}}(t_f))-\frac{1}{2} {\mathbf {x}}(t_f)^T{\mathbf {P}}_k(t_f){\mathbf {x}}(t_f) =0 \end{aligned}$$

for all \({\mathbf {x}}(t_f)\in \mathscr {X}(t_f)\). Hence \(s_{f,k}({\mathbf {x}}_k(t_f)) \ge s_{f,k}({\mathbf {x}}(t_f))\) for all \({\mathbf {x}}(t_f)\in \mathscr {X}(t_f)\). \(\square \)
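In practice, (24) is a matrix differential Lyapunov equation integrated backward from \({\mathbf {P}}_k(t_f)={\mathbf {0}}\). A minimal sketch follows (explicit Euler on a uniform grid; the step count and names are our assumptions, and a dedicated ODE solver may be preferable in practice):

```python
import numpy as np

def lyapunov_backward(A, b, c, Q, r, u, t_f, N):
    """Integrate (24) backward in time from P(t_f) = 0.

    u is sampled on the same uniform grid t_0, ..., t_N.
    Returns the list [P(t_0), ..., P(t_N)].
    """
    dt = t_f / N
    n = A.shape[0]
    P = np.zeros((n, n))                 # terminal condition P(t_f) = 0
    Ps = [P]
    cc = np.outer(c, c)
    for k in range(N, 0, -1):
        uk = u[k]
        Ak = A - uk * np.outer(b, c)     # closed-loop matrix A - u_k b c
        dP = -(P @ Ak + Ak.T @ P) - Q - r * uk**2 * cc
        P = P - dt * dP                  # Euler step from t_k back to t_{k-1}
        P = 0.5 * (P + P.T)              # guard symmetry against roundoff
        Ps.append(P)
    return Ps[::-1]
```

The symmetrization step only counters the slow loss of symmetry caused by roundoff; it does not alter the continuous-time solution.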

Putting together the steps described at the end of Sect. 3 and these two theorems allows computation of two sequences, \(\{q_k\}\) and \(\{({\mathbf {x}}_k,u_k)\}\), such that the second one is an optimizing sequence. As J is nonnegative, it has an infimum and \(\{({\mathbf {x}}_k,u_k)\}\) converges to an optimum.

The method suggested here is quite different from those suggested in [6,7,8,9]. The latter solve the constrained optimal control problem by the sequential solution of constrained parametric optimization problems. Each problem in this sequence is sub-optimal by means of the original problem, but the sequence converges to the solution of the original one. Such an approach allows known parametric optimization programs to be used in the control synthesis.

Here, however, a different approach is used. First, the addressed problem has a more specific structure than the general form used in [6,7,8,9], which allowed the authors to exploit its properties to achieve simpler and stronger results. Second, the use of Krotov’s method allows the control synthesis to be done without approaches such as constraint transcription [8], since \(\hat{u}\) is assured to yield an admissible control force, as was proven in Theorem 4. It leads to an algorithm whose convergence is monotonic, which is stronger than regular convergence. It should be noted, though, that even though monotonicity is guaranteed theoretically, in practice it might be affected by numerical computation errors.

The corresponding algorithm is summarized in Fig. 1. Its output is an arbitrarily close approximation of \({\mathbf {P}}^*\), which defines the optimal control law Eq. (21). It should be noted that the use of the absolute value in step (4) of the iteration stage is theoretically unnecessary. In practice, however, it is needed since numerical computation errors might cause the algorithm to lose J’s monotonicity as it approaches the optimum, as discussed above.

Fig. 1

CBBR–algorithm for successive control improvement

5 Numerical example

Consider the free vibrating mass-spring system shown in Fig. 2. It is composed of three bodies connected by three identical springs. The mass of each body is \(10^5\,\mathrm {kg}\) and the stiffness of each spring is \(3.6\times 10^6\,\mathrm {N/m}\). The control force is applied to the first body through a single semi-active actuator with \(w_{max}=4\times 10^5\,\mathrm {N}\). The state space equation is [24]:

$$\begin{aligned} \frac{\mathrm {d}\,}{\mathrm {d}\,t}\begin{bmatrix} z_1(t) \\ z_2(t) \\ z_3(t) \\ \dot{z}_1(t) \\ \dot{z}_2(t) \\ \dot{z}_3(t) \end{bmatrix} =&\begin{bmatrix} 0&\quad 0&\quad 0&\quad 1&\quad 0&\quad 0\\ 0&\quad 0&\quad 0&\quad 0&\quad 1&\quad 0\\ 0&\quad 0&\quad 0&\quad 0&\quad 0&\quad 1\\ -\,72&\quad 36&\quad 0&\quad 0&\quad 0&\quad 0\\ 36&\quad -\,72&\quad 36&\quad 0&\quad 0&\quad 0\\ 0&\quad 36&\quad -\,36&\quad 0&\quad 0&\quad 0 \end{bmatrix} \begin{bmatrix} z_1(t) \\ z_2(t) \\ z_3(t) \\ \dot{z}_1(t) \\ \dot{z}_2(t) \\ \dot{z}_3(t) \end{bmatrix} + \begin{bmatrix} 0 \\ 0 \\ 0 \\ 10^{-5} \\ 0 \\ 0\end{bmatrix} w(t) \end{aligned}$$
(28)

where \(z_2(0)=0.6\) and \(z_3(0)=-0.6\). As \(z_d=z_1\), the constraints take the form (C1) \(w(t)\dot{z}_1(t)\le 0\); (C2) \(\dot{z}_1(t) =0 \rightarrow w(t)=0\); and (C3) \(|w(t)|\le 4\times 10^5\), for all \(t\in [0,8]\). The performance index is:

$$\begin{aligned} J({\mathbf {x}},w)&= \frac{1}{2}\int \limits _0^{t_f} z_1(t)^2 + (z_2(t)-z_1(t))^2 + (z_3(t)-z_2(t))^2 + 10^{-13} w(t)^2 \mathrm {d}\,t \end{aligned}$$

All computations were carried out using original routines written in MATLAB.
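For reference, the model data of (28) can be assembled from the chain topology. The construction below is our own sketch (variable names are assumed) and checks a few entries against (28):

```python
import numpy as np

m, k = 1e5, 3.6e6                        # body mass [kg], spring stiffness [N/m]
n = 3
# stiffness matrix of the 3-mass chain (the last mass has one spring)
K = k * np.array([[ 2.0, -1.0,  0.0],
                  [-1.0,  2.0, -1.0],
                  [ 0.0, -1.0,  1.0]])
A = np.block([[np.zeros((n, n)), np.eye(n)],
              [-K / m,           np.zeros((n, n))]])
b = np.zeros(2 * n)
b[3] = 1.0 / m                           # force enters the first body's velocity
c = np.zeros(2 * n)
c[3] = 1.0                               # c x = z1_dot, the actuator velocity

assert A[3, 0] == -72.0 and A[3, 1] == 36.0   # matches (28)
assert A[5, 1] == 36.0 and A[5, 2] == -36.0
```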

Fig. 2

The evaluated model

The CBBR control design was carried out using the algorithm presented in Fig. 1. Nineteen iterations were carried out. Performance index values per iteration are given in Table 1 and illustrated in Fig. 3. It can be seen that, from a practical viewpoint, convergence occurred after two iterations, with the major improvement achieved in the first one. For iterations 1–5, the changes in J are monotonic. However, starting from the 6th iteration, the changes in J lose monotonicity, though their magnitudes keep decreasing, except for the 8th iteration, where an increase occurred. The magnitude of the changes and the properties of Krotov’s method imply that this non-monotonicity is a numerical issue.

Table 1 Performance index values for each process. Here \(\varDelta J_i = J_i-J_{i-1}\)
Fig. 3

Performance index values for each iteration

Fig. 4

Displacements in \(z_3\)

The displacements in \(z_3\) for the uncontrolled plant and the CBBR controlled plant are presented in Fig. 4. While the uncontrolled response does not decay, this figure implies that the controlled response settles after approximately 8 s. The control force is depicted in Fig. 5. It shows that w has some zero intervals. These intervals occurred due to constraint C1, which permits only control forces that resist the actuator velocity. Additionally, during most of the control duration, the signal is very similar to signals derived by the bang-bang control approach. This is because for the first 5 s the response intensity required a large control force, which is not allowed due to the bounds on w; hence saturation was reached.

Fig. 5

Control force w

Figure 6 illustrates the correspondence between the actuator’s control force and velocity; that is, each pair \((\dot{z}_d(t),w(t))\) is represented by a point on the graph. There are no points in the 1st and 3rd quadrants of the \(\dot{z}-w\) plane. This implies that the force is always opposed to the actuator velocity. It can also be seen that the control force magnitude does not exceed \(\pm w_{max}\). Hence, constraints C1 and C3 from Definition 1 are satisfied. The control signal, u, is given in Fig. 7. Its discontinuous form is clearly evident from the plot.

Fig. 6

Control force w versus actuator velocity \(\dot{z}_d\)

Fig. 7

CBBR control signal

6 Conclusions

In this study a constrained optimal control problem was formulated and fully solved. Namely, an optimal controller was synthesized for a CLQR problem defined for a single control input with semi-active constraints and control bounds.

As a first step, the CLQR problem was reformulated as an equivalent constrained bilinear biquadratic optimal control problem (CBBR). The CBBR problem was solved by Krotov’s method. For the convenience of the readers, the corresponding parts of Krotov’s theory were given. A sequence of improving functions, suitable for the CBBR problem, was constructed and the corresponding successive algorithm was derived. The formulated optimal controller is in a feedback form.

The main novelty in this study is the formulation of the sequence of improving functions that suits the addressed problem and allows Krotov’s method to be used for its solution. It enabled the solution of the CBBR and CLQR problems. The required computational steps were arranged as an algorithm, and proof outlines for the convergence and optimality of the solution were given. The efficiency of the suggested method was demonstrated by a numerical example.