1 Introduction

Embedded optimization, according to the definition in [27], is the practice of solving optimization problems autonomously and with limited resources. The topic of this article is embedded optimal control, an important class of methods within embedded optimization, which is concerned with repeatedly computing optimal decisions to control a dynamic system as its state changes. Many algorithms for embedded optimal control exist, and quite a few have been successfully applied to real-time and embedded applications such as robotic trajectory optimization [83], autonomous driving [62] and drones [46]. For an overview of embedded optimization, we refer the reader to [27]. One of the most popular approaches in embedded optimal control nowadays is model predictive control (MPC) [43, 65, 81]. It is based on predicting the future behavior of a system and using this information to optimize the action at the current time step. In linear MPC (also called linear-quadratic MPC (LQMPC)), the constraints, including the dynamic model, are affine and the objective is quadratic. In nonlinear MPC (NMPC), some or all of the constraints and the objective are nonlinear functions. A related problem is that of moving horizon estimation (MHE), which estimates states and parameters online. We will present a unified notation for NMPC and MHE problems in Sect. 2.

Historically, MPC was primarily applied to systems with long timescales, most notably in chemical processing [74], because a computationally costly optimization problem has to be solved at every time step. More recent algorithmic developments [9, 21, 61, 71] and increasingly powerful embedded hardware render MPC real-time feasible for applications with shorter timescales, such as autonomous driving, robotics, and avionics, as in some of the references cited above. Efficient implementations of embedded optimal control methods have played a major role in bringing MPC to real-time applications, giving rise to software packages such as MPT [47] for explicit MPC, qpOASES [28], an active-set solver for quadratic programming (QP), FORCES [24, 96], an interior-point solver for quadratically constrained quadratic programs (QCQP) and nonlinear programs with optimal control structure, and the ACADO Code Generation tool [49] for tailored sequential quadratic programming (SQP) based NMPC solvers. Other examples of nonlinear embedded optimization packages are VIATOC [50], GRAMPC [26] and FalcOpt [88], to which we compare acados in Sect. 5. A non-exhaustive list of embedded optimization software packages can be found in Table 1.

Ultimately, embedded software should run on a dedicated hardware platform. In this paper, we focus solely on central processing units (CPUs), as the algorithms presented typically do not profit as much from the parallelization capabilities of massively parallel hardware platforms such as graphical processing units (GPUs), tensor processing units (TPUs) or field programmable gate arrays (FPGAs). Note that we do not commit to a specific CPU architecture: acados has been shown to work with the x86, x86_64, ARMv7A, ARMv8A and PowerPC architectures.

One challenge in developing software for embedded optimal control lies in the trade-off between flexibility, memory usage and speed. Many of the software packages mentioned in Table 1 are based on automatic code generation, one reason being to obtain self-contained, efficient linear algebra routines. Often, however, the problem sizes and the choice of algorithms are then fixed for one specific optimal control instance, inducing a loss of flexibility. In some cases, a compiler or a human could generate faster or more memory-efficient code. For linear algebra operations, for example, the recently developed high-performance linear algebra package BLASFEO [35] outperforms both code-generated triple-loop linear algebra routines and state-of-the-art BLAS and LAPACK implementations for the moderate matrix sizes typical of embedded optimization applications. Since the linear algebra operations typically account for most of the computational effort of the algorithms considered, we can rely on standard compiler optimizations for the code surrounding the linear algebra routines without noticeably sacrificing performance (see Sect. 5). By basing acados on BLASFEO, we exchange self-containedness for performance and flexibility, which we believe is the better trade-off.

Table 1 Software packages for embedded optimization

Another challenge for embedded optimal control software is related to the software development process itself. In order not to sacrifice execution speed and/or memory footprint, embedded optimal control software often uses global data and suffers from tight coupling between algorithmic components. This can lead to a codebase that is difficult to understand, maintain and extend. In contrast to some other embedded optimal control software packages, we choose to avoid these pitfalls by not unnecessarily sacrificing the maintainability and readability of the codebase for a small gain in performance and/or a reduction in memory footprint. We pursue this goal by organizing the code in a modular fashion, with formal interfaces between the different algorithmic components, as described in Sect. 4. This allows the solvers, routines, and libraries needed for the embedded control algorithm to be interchanged in a straightforward way.

A final aspect of embedded optimal control software that affects flexibility, memory and runtime is the choice of modeling language and the corresponding automatic differentiation tool. Several modeling languages exist, e.g., Mathematica, SymPy or the MATLAB Symbolic Toolbox. Many of these languages use expression trees to represent mathematical functions, which potentially leads to large code size, high memory usage and slow evaluation of higher-order derivatives for non-trivial models. In contrast, the CasADi [4] modeling language is based on expression graphs. This often leads to shorter instruction sequences and to smaller, typically faster code, which makes it more suitable for embedded applications. Moreover, it is free and open-source software. For these reasons, we favor CasADi for modeling nonlinear functions and differential-algebraic equations. Additionally, acados supports the use of hand-written or code-generated dynamic models as C source files.

In summary, the contribution of this paper is a new software package for embedded optimal control, called acados. It offers the following main features:

  • Efficient optimal control algorithms implemented in C,

  • Modular architecture enabling rapid prototyping of solution algorithms,

  • Interfaces to Python and Matlab,

  • High-performance linear algebra based on BLASFEO [35],

  • Compatible with CasADi expressions [4],

  • Deployable on a variety of embedded devices,

  • Publicly available as permissively licensed free and open-source software.

The remainder of the paper is organized as follows: in Sect. 2, we review some important elements of nonlinear embedded optimization algorithms relevant to acados. We discuss recent advances in embedded optimization algorithms that motivate the development of acados in Sect. 3. The software package acados itself is introduced in Sect. 4. Various numerical experiments including hardware-in-the-loop simulations as well as comparisons to other embedded optimization packages are presented in Sect. 5, and the paper is concluded in Sect. 6.

2 Algorithmic ingredients for embedded nonlinear optimal control

In this section, we first introduce the problems that typically have to be solved in NMPC and MHE. We present a general formulation that can accommodate both multiple-shooting discretized NMPC and MHE problems. Subsequently, we describe the algorithmic ingredients of SQP-type methods, as implemented in acados, for this general problem.

2.1 Nonlinear optimal control

Let us first consider the continuous-time nonlinear optimal control problem (OCP) of the form

$$\begin{aligned} \min _{x(\cdot ),\, z(\cdot ),\, u(\cdot )} \quad&\int _{0}^{T} \ell (x(t), z(t), u(t)) \, \mathrm {d}t + M(x(T))&\qquad \text {(1a)} \\ \mathrm {s.t.} \quad&x(0) = {\overline{x}}_0,&\qquad \text {(1b)} \\&f({\dot{x}}(t), x(t), z(t), u(t)) = 0, \quad t \in [0, T],&\qquad \text {(1c)} \\&g(x(t), z(t), u(t)) \le 0, \quad t \in [0, T].&\qquad \text {(1d)} \end{aligned}$$

In this notation, \(x: \mathbb {R}\rightarrow \mathbb {R}^{n_x}\) denotes the differential states, \(z: \mathbb {R}\rightarrow \mathbb {R}^{n_z}\) the algebraic variables and \(u: \mathbb {R}\rightarrow \mathbb {R}^{n_u}\) the control inputs. Furthermore, we use \(\ell : \mathbb {R}^{n_x} \times \mathbb {R}^{n_z} \times \mathbb {R}^{n_u} \rightarrow \mathbb {R}\) for the Lagrange term or running cost, and \(M: \mathbb {R}^{n_x} \rightarrow \mathbb {R}\) for the Mayer term or terminal cost. The dynamics are modeled by a set of implicit differential-algebraic equations (DAE) with \(f: \mathbb {R}^{n_x} \times \mathbb {R}^{n_x} \times \mathbb {R}^{n_z} \times \mathbb {R}^{n_u} \rightarrow \mathbb {R}^{n_x} \times \mathbb {R}^{n_z}\). In the remainder of the paper, we assume the implicit DAE to be of index 1, i.e., that the Jacobian \(\partial f / \partial ({\dot{x}}, z)\) is invertible. The nonlinear path constraints are given by \(g: \mathbb {R}^{n_x} \times \mathbb {R}^{n_z} \times \mathbb {R}^{n_u} \rightarrow \mathbb {R}^{n_g}\), and the initial value of the states is \({\overline{x}}_0 \in \mathbb {R}^{n_x}\). We consider the horizon length T to be fixed.

2.2 Multiple shooting discretization

In acados, we discretize nonlinear optimal control problems with a multiple shooting approach [13]. We introduce a time grid \([t_0, t_1, \ldots , t_N]\) with \(t_k < t_{k+1}, k=0,\ldots ,N-1\), discrete state variables \(x_0, \ldots , x_N\), algebraic variables \(z_0, \ldots , z_{N-1}\) and controls \(u_0, \ldots , u_{N-1}\). For the control trajectory, we choose a piecewise constant control parametrization. On each time interval \( \left[ t_k, t_{k+1} \right) \), we can then write the result of the numerical simulation routine as

$$\begin{aligned} \begin{bmatrix} x_{k+1} \\ z_k \end{bmatrix} = \phi _k(x_k, u_k) ,\quad k=0,\ldots ,N-1, \end{aligned}$$

where the separate components will be denoted by \(\phi _k^x(x_k, u_k)\) and \(\phi _k^z(x_k, u_k)\).
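To make the simulation map concrete, the following minimal sketch implements \(\phi _k\) for the special case of an explicit ODE without algebraic variables, using a single RK4 step per shooting interval; the toy dynamics are an illustrative assumption and not part of acados:

```python
import numpy as np

def f_expl(x, u):
    # toy explicit ODE right-hand side (a controlled pendulum), for illustration
    return np.array([x[1], u[0] - np.sin(x[0])])

def phi(x, u, h=0.2, steps=1):
    """Simulation map phi_k(x_k, u_k): integrate over one shooting interval
    with a piecewise-constant control u, using explicit RK4 steps of size h."""
    for _ in range(steps):
        k1 = f_expl(x, u)
        k2 = f_expl(x + h / 2 * k1, u)
        k3 = f_expl(x + h / 2 * k2, u)
        k4 = f_expl(x + h * k3, u)
        x = x + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return x

x_next = phi(np.array([0.5, 0.0]), np.array([0.0]))
```

For implicit DAEs, \(\phi _k\) is instead defined through an implicit integration scheme, as discussed in Sect. 2.5.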

The multiple shooting approach often leads to better convergence behavior than single shooting, where simulation and optimization are performed sequentially, as shown in [10]. The resulting nonlinear programming (NLP) formulation reads:

$$\begin{aligned} \min _{\substack{x_0, \ldots , x_N, \\ u_0, \ldots , u_{N-1}}} \quad&\sum _{k=0}^{N-1} \ell _k(x_k, z_k, u_k) + M(x_N) \\ \mathrm {s.t.} \quad&x_0 = {\overline{x}}_0, \\&x_{k+1} = \phi _k^x(x_k, u_k), \quad k = 0, \ldots , N-1, \\&z_k = \phi _k^z(x_k, u_k), \quad k = 0, \ldots , N-1, \\&g(x_k, z_k, u_k) \le 0, \quad k = 0, \ldots , N-1, \end{aligned}$$
(2)

where \(\ell _k\) denotes the running cost accumulated over the interval \( \left[ t_k, t_{k+1} \right) \).

The optimal control formulation above is not the most general one that acados can handle: among others, moving horizon estimation (MHE) problems, constraint relaxation via slack variables, and equality constraints are supported as well. The next section presents the more general class of optimization problems handled by acados.

2.3 General nonlinear optimal control structured optimization problem

In order to facilitate many different OCP formulations occurring in practice, acados uses the following general formulation of nonlinear optimal control structured optimization problems:

$$\begin{aligned} \min _{\substack{x_0, \ldots , x_N, \\ u_0, \ldots , u_{N-1}, \\ s_0, \ldots , s_N}} \quad&\sum _{k=0}^{N-1} \big ( l_k(x_k, z_k, u_k) + \rho _k(s_k) \big ) + M(x_N) + \rho _N(s_N)&\qquad \text {(3a)} \\ \mathrm {s.t.} \quad&x_{k+1} = \phi _k^x(x_k, u_k), \quad k = 0, \ldots , N-1,&\qquad \text {(3b)} \\&g_k(x_k, z_k, u_k) \le J_{\mathrm {s},k} \, s_k, \quad k = 0, \ldots , N-1,&\qquad \text {(3c)} \\&g_N(x_N) \le J_{\mathrm {s},N} \, s_N,&\qquad \text {(3d)} \\&s_k \ge 0, \quad k = 0, \ldots , N.&\qquad \text {(3e)} \end{aligned}$$

Here, the initial state constraint from (1b) is contained in the inequality constraint (3c) for \( k = 0 \), and the matrices \( J_{\mathrm {s},k} \) select which constraint components are softened by the slack variables \( s_k \).

Note that this general formulation also includes slack variables \( s_k \) which could alternatively be formulated as control inputs \( u_k \). However, the slack variables \( s_k \) do not enter the system dynamics (3b) and can only enter the cost linearly and quadratically, i.e. via the functions \( \rho _k(\cdot ) \) of the form

$$\begin{aligned} \rho _k(s_k) = \sum _{i=1}^{n_{s_k}} \alpha _k^i s_k^i + \beta _k^i {s_k^i}^2, \end{aligned}$$

with \( \alpha _k^i\in \mathbb {R}\), \( \beta _k^i> 0 \). These properties motivate the separation between slack and control variables and allow for an efficient treatment in the solution methods implemented in acados. The slack variables can be used, among others, to formulate soft constraints or to model piecewise quadratic, possibly asymmetric cost functions. Soft constraints are often useful in practice, for example to deal with constraint violations due to model-plant mismatch or disturbances that would otherwise render (2) infeasible.
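As a simple illustration of this mechanism, a scalar hard constraint \(g(x, u) \le 0\) can be softened with a single slack variable:

$$\begin{aligned} g(x, u) \le s, \qquad s \ge 0, \end{aligned}$$

with the term \(\alpha s + \beta s^2\) added to the cost. For sufficiently large \(\alpha \), this acts as an exact penalty: whenever the hard constraint can be satisfied, the optimal slack is \(s = 0\).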

Moreover, acados is able to handle the most common types of cost and constraint functions in a tailored fashion. Regarding the cost functions \( l_k(\cdot ) \), \( M(\cdot ) \), acados is capable of exploiting the structure of the widely used linear and nonlinear least-squares functions, while also being able to handle general nonlinear functions.

Within the constraint functions \( g_k(\cdot ) \), acados is able to exploit the most common constraint types, namely simple bounds on \( x_k \) and \( u_k \) and constraints that are linear in \( x_k \) and \( u_k \), such that only truly nonlinear constraints have to be formulated and treated as such. One can thus write \( g_k(\cdot ) \) as

$$\begin{aligned} g_k(x_k,z_k,u_k) = \begin{bmatrix} J_{\text {bx}{,k}} x_k \\ J_{\text {bu},{k}} u_k \\ C_{\mathrm {x},k} x_k + C_{\mathrm {u},k} u_k\\ g_\text {nonl}(x_k,z_k,u_k) \end{bmatrix}, \end{aligned}$$
(4)

where \( J_{\text {bx},k} \) and \( J_{\text {bu},k} \) are selection matrices that determine which components of \( x_k \) and \( u_k \) have simple bounds. Additionally, it is common to impose both upper and lower bounds on a component of \( g_k \); acados treats such two-sided constraints more efficiently.
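For example, with \(n_x = 3\) and simple bounds on the first and third state components only, the selection matrix and the resulting bound expression read

$$\begin{aligned} J_{\text {bx},k} = \begin{bmatrix} 1 &{} 0 &{} 0 \\ 0 &{} 0 &{} 1 \end{bmatrix}, \qquad J_{\text {bx},k} \, x_k = \begin{bmatrix} x_{k,1} \\ x_{k,3} \end{bmatrix}. \end{aligned}$$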

The above NLP (3) could be solved by any general-purpose NLP solver, such as IPOPT [94]. The current scope of acados, however, encompasses efficient embedded optimal control methods for solving such structured NLPs, since these are better suited to a real-time and/or embedded setting [22]. Sequential quadratic programming (SQP) is an example of an embeddable optimization method and is discussed in the following section.

2.4 Sequential quadratic programming and real-time iterations

We briefly present the structure of an SQP algorithm as implemented in acados. At the very least, an embedded SQP algorithm should feature:

  • Numerical integration of the continuous-time dynamics,

  • Generation of first-order and possibly second-order sensitivities of objective and constraints,

  • A procedure for approximating the Hessian matrix,

  • An efficient QP solver (typically developed separately).

Note that globalization strategies such as line search or trust regions are considered out of scope for this paper: globalization is not always recommended in a real-time setting, since runtime bounds cannot be established in general [21] and the initializations are typically close to the exact solution [41]. Extending acados with globalization strategies such as merit functions [59] and filter-based line search methods [93], for use in a fully converged setting, is a subject of future work.

In summary, the SQP algorithm in acados looks as follows:

$$\begin{aligned} w^{[i+1]}&= w^{[i]} + \varDelta w^\mathrm {QP}, \qquad&i=0,1,\ldots , \\ \pi ^{[i+1]}&= \pi ^\mathrm {QP}, \qquad&i=0,1,\ldots , \\ \lambda ^{[i+1]}&= \lambda ^\mathrm {QP}, \qquad&i=0,1,\ldots , \\ \end{aligned}$$

where \(w^{[i]}= [x_0^{[i]\top }, u_0^{[i]\top }, \ldots , u_{N-1}^{[i]\top }, x_N^{[i]\top }]^\top \) is the primal iterate at SQP iteration i, and \(\pi ^{[i]}\) and \(\lambda ^{[i]}\) are the dual iterates, readily available from the QP solution. Note that the algebraic variables have been eliminated from the OCP, but numerical approximations of their values are available from the numerical integration routine. The step \(\varDelta w^\mathrm {QP}\) is computed by solving the following QP:

$$\begin{aligned} \min _{\substack{\varDelta x_0, \ldots , \varDelta x_N, \\ \varDelta u_0, \ldots , \varDelta u_{N-1}, \\ s_0, \ldots , s_N}} \quad&\sum _{k=0}^{N-1} \left( \tfrac{1}{2} \varDelta w_k^\top H_k \varDelta w_k + h_k^\top \varDelta w_k + \rho _k(s_k) \right) + \tfrac{1}{2} \varDelta x_N^\top H_N \varDelta x_N + h_N^\top \varDelta x_N + \rho _N(s_N) \\ \mathrm {s.t.} \quad&\varDelta x_{k+1} = A_k \varDelta x_k + B_k \varDelta u_k + {\bar{\phi }}^x_k - x_{k+1}, \quad k = 0, \ldots , N-1, \\&{\bar{g}}_k + G_{x,k} \varDelta x_k + G_{u,k} \varDelta u_k \le J_{\mathrm {s},k} \, s_k, \quad k = 0, \ldots , N-1, \\&{\bar{g}}_N + G_{x,N} \varDelta x_N \le J_{\mathrm {s},N} \, s_N, \\&s_k \ge 0, \quad k = 0, \ldots , N. \end{aligned}$$
(5)

Above, we used the shorthands \( {\bar{\phi }}^x_k, \, {\bar{g}}_k,\, k=0,\ldots , N-1\), and \({\bar{g}}_N\) to denote the function evaluations \(\phi ^x_k(x_k, u_k)\), \(g_k(x_k, z_k, u_k)\) and \(g_N(x_N)\) at the current iterate, respectively. Furthermore, \(\varDelta w_k := (\varDelta x_k, \varDelta u_k)\), the matrices \(A_k, B_k\) are the sensitivities of \(\phi ^x_k\) with respect to \(x_k\) and \(u_k\) (cf. (6)), \(G_{x,k}, G_{u,k}\) are the Jacobians of the constraint functions, \(H_k\) are the Hessian blocks discussed in Sect. 2.6, and \(h_k\) are the objective gradients. We formulate the NLP in such a way that the slack variables appear directly as tailored optimization variables. We note that some, but not all, QP solvers can deal with slack variables directly. For those that do not, slack variables are reformulated as extra input variables.

In the remainder of this section, we discuss each of the ingredients of an efficient SQP solver, described above, separately. Generation of sensitivities using numerical integration will be treated in Sect. 2.5, Hessian approximation in Sect. 2.6, structure-exploiting QP solvers in Sect. 2.7 and real-time considerations in Sect. 2.8.

2.5 Numerical simulation and sensitivities

An important part of the implementation of direct shooting methods for optimal control consists of reliably and efficiently computing numerical simulation and sensitivity results for the nonlinear system of differential-algebraic equations that represents a dynamic model for our particular system of interest.

Within the family of single-step integration methods, one typically distinguishes between explicit and implicit schemes [44]. Well-known examples of explicit integration schemes are the explicit Runge-Kutta (RK) formulas, such as the explicit Euler method and the RK method of order 4. Explicit integration schemes are easy to implement, since they rely on a direct combination of explicit evaluations of the right-hand side of the system dynamics. In contrast, implicit integration schemes define the numerical simulation result implicitly through a nonlinear system of equations, which generally needs to be solved numerically with an iterative procedure such as a Newton-type method. Implicit formulas are nonetheless very popular in practice because of their improved numerical stability properties and higher order of accuracy. Especially in the case of stiff dynamical systems and implicit or differential-algebraic equations, an implicit integration scheme should often be used [45].

When using these numerical integration schemes within direct multiple shooting, one additionally needs a computationally efficient and reliable way of computing first (and possibly second) order derivatives of the simulation results with respect to the state and control input values:

$$\begin{aligned} \dfrac{\partial \phi ^x_k(x_k, u_k)}{\partial x_k}, \quad \dfrac{\partial \phi ^x_k(x_k, u_k)}{\partial u_k}, \quad \sum _{i=1}^{n_x}\pi _{k,i} \dfrac{\partial ^2 \phi ^x_{k,i}(x_k, u_k)}{\partial (x_k, u_k)^2}, \end{aligned}$$
(6)

where \(\pi _k \in \mathbb {R}^{n_x}\) is called the seed vector; to compute the exact Hessian of the Lagrangian, the Lagrange multipliers are used as seeds. Sensitivity propagation for direct optimal control methods is typically based on a discretize-then-differentiate type of approach such as internal numerical differentiation (IND) [11]. For the class of explicit integration methods, this concept leads to a forward or backward sensitivity propagation based on algorithmic differentiation (AD) techniques [40]. In the case of an implicit integration scheme, the IND approach results either in iterative differentiation techniques or in a direct computation of the sensitivities based on the implicit function theorem [1]. In addition, forward-backward propagation schemes can be derived to compute the symmetric Hessian contributions [78].
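To make the implicit case concrete, the following sketch applies the implicit function theorem to a single implicit Euler step of a toy ODE; the dynamics and the first-order scheme are illustrative assumptions (acados uses higher-order IRK schemes, cf. Sect. 3.4):

```python
import numpy as np

def f(x, u):
    # toy ODE right-hand side (a controlled pendulum), for illustration
    return np.array([x[1], u - np.sin(x[0])])

def f_jac_x(x, u):
    # Jacobian of f with respect to x
    return np.array([[0.0, 1.0], [-np.cos(x[0]), 0.0]])

def implicit_euler_with_sens(x, u, h, newton_iters=10):
    """One implicit Euler step x_next = x + h * f(x_next, u), solved with
    Newton's method; the forward sensitivity d x_next / d x_k follows from
    the implicit function theorem on r(x_next, x) = x_next - x - h f(x_next, u)."""
    x_next = x.copy()
    for _ in range(newton_iters):
        r = x_next - x - h * f(x_next, u)
        dr_dxnext = np.eye(2) - h * f_jac_x(x_next, u)
        x_next = x_next - np.linalg.solve(dr_dxnext, r)
    # IFT: (dr/dx_next) S_x + (dr/dx) = 0 with dr/dx = -I, hence:
    S_x = np.linalg.solve(np.eye(2) - h * f_jac_x(x_next, u), np.eye(2))
    return x_next, S_x
```

No iterative differentiation is needed: in practice, the sensitivity solve can reuse the factorization of the final Newton matrix.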

Recent work in [76, 77] proposed an algorithmic approach to embed implicit integration schemes with sensitivity analysis in Newton-type optimization for direct optimal control without the need for any iterative procedure, based on the concepts of numerical condensing and expansion in a lifted Newton-type optimization method [2].

2.6 Convex Hessian approximation methods

Gauss-Newton Hessian approximation. In the case of a least-squares objective in (3), e.g. when tracking a reference, we have

$$\begin{aligned} l_k(x_k, u_k)&= \Vert r_k(x_k, u_k)\Vert _2^2, \quad k=0,\ldots ,N-1 \\ M(x_N)&= \Vert r_N(x_N)\Vert _2^2, \end{aligned}$$

with \(r_k: \mathbb {R}^{n_x} \times \mathbb {R}^{n_u} \rightarrow \mathbb {R}^{n_{\mathrm {r}_k}}\) and \(r_N: \mathbb {R}^{n_x} \rightarrow \mathbb {R}^{n_{\mathrm {r}_N}}\). Notice that this kind of residual function is a common case in embedded optimization.

The Gauss-Newton Hessian approximation amounts to linearizing "between the norm signs", i.e., replacing the residuals \(r_k\) by their first-order Taylor expansions. Since no second-order sensitivities are necessary, the Gauss-Newton Hessian approximation offers a competitive alternative to SQP with exact Hessians, although it converges only linearly. We remark that for a quadratic objective function in (3), the same quadratic objective arises in (5), and no additional computations are needed. For more details on Gauss-Newton methods in the context of NMPC, we refer the reader to [42].
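Writing \(w_k = (x_k, u_k)\) and \(J_k := \partial r_k / \partial w_k\), the exact Hessian of the stage cost and its Gauss-Newton approximation are related by

$$\begin{aligned} \nabla ^2_{w_k} \Vert r_k(w_k)\Vert _2^2 = 2 J_k^\top J_k + 2 \sum _{i=1}^{n_{\mathrm {r}_k}} r_{k,i}(w_k) \, \nabla ^2 r_{k,i}(w_k) \approx 2 J_k^\top J_k, \end{aligned}$$

where the neglected term is small whenever the residuals are small or nearly affine at the solution.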

Sequential Convex Quadratic Programming (SCQP).

A generalization of SQP with a Gauss-Newton Hessian approximation is sequential convex quadratic programming [90]. In a sense, a Gauss-Newton SQP algorithm neglects any curvature present in the inequality constraints by linearizing them. In practice, however, convex-over-nonlinear objectives and/or constraints of the form \(\varphi (c(x, u))\), with a convex function \(\varphi (\cdot )\) and a nonlinear function \(c(\cdot )\), arise often. Examples include ellipsoidal terminal constraints that ensure stability of an NMPC scheme, the friction ellipse in automotive applications, and tunnel-following for robotic manipulators.

In SCQP, we still linearize the inequalities, but add the convex contributions of the inequality constraints, weighted with the corresponding Lagrange multipliers, to the Hessian approximation. In doing so, the SCQP Hessian is guaranteed to be positive semi-definite. For problems that feature convex-over-nonlinear constraints, this Hessian contribution offers better convergence guarantees than a Gauss-Newton Hessian [90].
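Concretely, for a single convex-over-nonlinear constraint \(\varphi (c(w)) \le 0\) with multiplier \(\mu \ge 0\), the Hessian contribution retained by SCQP has the form (cf. [90])

$$\begin{aligned} \mu \, \left( \frac{\partial c}{\partial w}(w) \right) ^\top \nabla ^2 \varphi (c(w)) \, \frac{\partial c}{\partial w}(w), \end{aligned}$$

which is positive semi-definite by the convexity of \(\varphi \); the remaining curvature of \(c(\cdot )\) is neglected, as in Gauss-Newton.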

Structure-preserving convexification with minimal regularization. In the last two paragraphs, we introduced two Hessian approximations that are convex by construction. The exact Hessian of the Lagrangian, however, might be indefinite. When this happens, the step \(\varDelta w^\mathrm {QP}\) cannot be guaranteed to be a descent direction. Furthermore, many QP solver codes expect a positive (semi-)definite Hessian, even if the second-order sufficient conditions for optimality are met. The aim of regularization is to obtain an approximation \({\widetilde{H}} = \mathrm {blkdiag}({\widetilde{H}}_0,\ldots ,{\widetilde{H}}_N)\) with each \({\widetilde{H}}_k \succ 0\). We briefly discuss three different methods here and compare their convergence in an SQP-type setting in Sect. 5.

Let \(V_k D_k V_k^{-1}\) be the eigenvalue decomposition of \(H_k\), for \(k=0,\ldots ,N\). Two simple ways of regularizing the Hessian blocks are

$$\begin{aligned} {\widetilde{H}}_k&= \mathrm {project}(H_k,\epsilon ) := V_k \, \big [\text {maxdiag}(\epsilon , D_k) \big ] \, V_k^{-1}, \end{aligned}$$
(7a)
$$\begin{aligned} {\widetilde{H}}_k&= \mathrm {mirror}(H_k,\epsilon ) := V_k \, \big [\text {maxdiag}(\epsilon , \mathrm {abs}(D_k)) \big ] \, V_k^{-1}, \end{aligned}$$
(7b)

with \(\epsilon \) small, \( \mathrm {abs}(\cdot )\) defined elementwise and \( \text {maxdiag}(\cdot , \cdot ) \) selecting the maximum values on the diagonal while not changing the off-diagonal elements.
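The two rules translate directly into code; a minimal numpy sketch:

```python
import numpy as np

def project(H, eps=1e-4):
    """Regularization (7a): clip the eigenvalues of symmetric H from below at eps."""
    d, V = np.linalg.eigh(H)   # H = V diag(d) V^T with orthonormal V
    return V @ np.diag(np.maximum(d, eps)) @ V.T

def mirror(H, eps=1e-4):
    """Regularization (7b): flip negative eigenvalues, then clip at eps."""
    d, V = np.linalg.eigh(H)
    return V @ np.diag(np.maximum(np.abs(d), eps)) @ V.T
```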

Another approach is to exploit the optimal control structure of (5). One method of this kind, called convexification, has been proposed in [92]. In contrast to the more naive regularization methods stated above, it first exploits as much convexity as possible through the optimal control structure before regularizing the remaining negative directions. The complexity of the regularization method is linear in the horizon length, and under some conditions, the SQP iterates converge quadratically to a local solution. An efficient implementation of this Hessian regularization method is included in acados. A numerical example in Sect. 5 shows the superior convergence behavior of convexification with respect to the mirror and project regularization techniques.

Further convex Hessian approximations. Other Hessian approximations exist in the literature, such as BFGS and its limited-memory variant (see [69]). They might be added to acados in the future.

2.7 Structure-exploiting embedded QP solvers

There exist different solution strategies for QP (5), which we briefly describe in this section. We note that linear-quadratic optimal control problems can be solved efficiently with the same QP solution strategies. As such, acados, conceived as a modular software package, can also be used to solve the linear-quadratic QPs arising in linear-quadratic MPC.

In the following subsections, we give an overview of sparsity exploitation for optimal control structured QPs.

2.7.1 Sparse approach

QP (5) can be solved directly by a general-purpose sparse QP solver, e.g., CVXGEN [67] or OOQP [38], both primal-dual interior-point solvers, or OSQP [87], a first-order method. First-order methods are mainly based on either the fast gradient method or the alternating direction method of multipliers (ADMM). Their cheap, simple iterations make first-order methods viable candidates under strict real-time requirements; however, they might suffer from slow convergence rates. For an overview of first-order methods in the context of embedded optimal control, see [27, 56].

2.7.2 Structured approach

QP (5) is solved by exploiting its multi-stage structure, while dense linear algebra is used for the stage-wise computations. Examples are the approach from [80, 86] and solvers like FORCES [24, 96], HPMPC [37] and HPIPM [34], which are all interior-point solvers.

2.7.3 Condensing approach

An alternative to the previous approaches is the so-called condensing approach [13]. By eliminating the state variables by means of the dynamic equality constraints in (5), we obtain a smaller QP with only the control inputs and possibly the initial state as optimization variables. Any general-purpose dense QP solver can then be used to solve the smaller QP, e.g. an active-set solver like qpOASES [28] or an interior-point method like the dense variant in HPIPM. Since qpOASES is able to reuse information from one problem to the next (warm-starting), it is particularly well suited for (N)MPC. Condensing is shown to be of quadratic complexity in the horizon length [33].
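To illustrate the elimination, consider linear(ized) dynamics \(x_{k+1} = A_k x_k + B_k u_k + b_k\) with \(N = 2\). Substituting recursively gives

$$\begin{aligned} x_1&= A_0 x_0 + B_0 u_0 + b_0, \\ x_2&= A_1 A_0 x_0 + A_1 B_0 u_0 + B_1 u_1 + A_1 b_0 + b_1, \end{aligned}$$

so the condensed QP is posed in the variables \((x_0, u_0, u_1)\) only, with dense cost and constraint matrices that absorb the propagated dynamics.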

2.7.4 Partial condensing

A mix between the two previous approaches (structured and condensing) for solving (5) can be obtained by not eliminating all state variables, but only within blocks of \(N / N_2\) stages (we assume for simplicity that N is an integer multiple of \(N_2\)), where \(N_2\) is the 'new' horizon length of the partially condensed problem. Through this additional degree of freedom, partial condensing enables us to find a better trade-off between horizon length and number of optimization variables for a given problem. For more details on partial condensing, the reader is referred to [6].

2.7.5 Further condensing strategies

There are further condensing strategies in the literature, which might be added to acados in the future. A method called complementary condensing was proposed in [54], where the KKT system is solved in the space of the multipliers corresponding to the equality constraints. This approach is favorable if the problem has more control inputs than states.

2.7.6 Dual-Newton strategy

The dual Newton strategy is an algorithm based on dual decomposition, tailored to linear-quadratic OCPs in the form of (5), with an open-source implementation in the software qpDUNES [30].

A main advantage of the dual Newton strategy, as with most active-set methods, is warm-starting. Contrary to qpOASES, qpDUNES can perform multiple active-set changes per iteration. However, unlike qpOASES, a prematurely terminated run of the algorithm does not return a meaningful solution, since the iterate is neither feasible nor optimal for a neighboring problem. As recently observed in [57], the convergence of qpDUNES can benefit significantly from a partial condensing preprocessing step.

2.8 Real-time iterations

In a real-time control setting, we solve instances of NLP (3) in sequence and under stringent timing constraints. Since the environment changes continuously anyway, it is often sufficient to solve the problem approximately: a high-accuracy solution that arrives past the deadline is of no use.

One such online method is the real-time iteration (RTI) scheme [21]. It solves an inequality-constrained QP in each iteration. The resulting generalized predictor is better suited for predictions across active-set changes than, e.g., a tangential predictor obtained from an interior-point method. For a brief overview, we refer the reader to [22].

In each RTI, one full iteration of an SQP-type scheme is performed, including the generation of sensitivities with respect to all variables. Additional approximations can save computation time by not updating all sensitivities in each RTI [98]. Such approximations exist on different levels: from only updating the initial state constraint with the current estimate of the system state, via updating the right-hand sides of the (in)equality constraints, to the full RTI. By interleaving different approximation levels at different sample times, we obtain the multi-level iteration scheme introduced in [12].

One important aspect of RTI is initialization. Subsequent RTIs can profit from warm-starting the algorithm with a (possibly shifted) solution of the previous problem. Recently, an improved and cheap initialization strategy, the Advanced-Step Real-Time Iteration (AS-RTI), was proposed in [70]; acados was used for its implementation.

3 Algorithm implementations in acados

In this section, we focus on the algorithm implementations in acados. Since acados builds on other software to handle basic linear algebra operations (BLASFEO [35]) and QPs (HPIPM [34]), these will be presented first. Afterwards, a short description of integrators and SQP-type optimization solvers will be given.

3.1 Linear algebra: BLASFEO

At the heart of every embedded optimization routine lies either an implementation of a small set of linear algebra routines (e.g. matrix-matrix multiplication, Cholesky decomposition) or a call to a specialized linear algebra library (e.g. BLAS and LAPACK). Generally, BLAS implementations focus on performance for large dense matrices, as used in high-performance computing and data science. Considerably less investigated are BLAS and LAPACK implementations for small dense matrices.

Often, the linear algebra code in embedded optimization packages, for example in the ACADO Code Generation tool or FORCES Pro, is code-generated. For very small matrix sizes (e.g. \(4 \times 4\)), this technique outperforms optimized linear algebra libraries. Furthermore, code generation has the advantage that the code can be kept 'library-free'. However, for larger matrix sizes (e.g. in the range \(10 \times 10\) to \(100 \times 100\), typical in MPC applications), code-generated linear algebra routines underperform with respect to optimized libraries.

BLASFEO [35] is a linear algebra package that targets computations for small matrices. It offers highly optimized linear algebra routines (e.g. dgemm, dsyrk, dpotrf), tailored for the matrix sizes typically encountered in embedded optimization. These routines exploit architecture-specific vector instructions for floating point operations (e.g. AVX), and focus on performance for matrices fitting in cache.

Furthermore, BLASFEO defines a packed matrix format (called panel-major) which optimizes cache usage, guaranteeing close to peak performance for matrices with sizes up to a couple of hundred. All high-performance BLASFEO routines use this panel-major matrix format, and there is a rich set of auxiliary routines to operate on this format, as well as to convert from/to column- or row-major formats. In this sense, BLASFEO provides a complete linear algebra framework, which can be used to implement many fast optimization algorithms.

Except for trivially small matrices, BLASFEO enables a considerable speed-up (up to \(10\times \) for some matrix sizes) in the matrix computations compared to code-generated linear algebra kernels. For all small matrix sizes up to, say, \(300 \times 300\), BLASFEO also offers a considerable speedup compared to state-of-the-art BLAS implementations like OpenBLAS. The use of BLASFEO is one of the reasons why acados performs better than ACADO on medium-scale problems, as we will see in Sect. 5.

3.2 Quadratic programming: HPIPM

In the implementation of SQP-type algorithms, QP sub-problems need to be solved efficiently at each iteration. The QP sub-problem solution is typically one of the two most expensive steps in SQP schemes, the other being the simulation and sensitivity computations of dynamical systems.

HPIPM [34] is a library defining three QP types (dense QP, OCP QP and tree-structured OCP QP, all handling soft constraints via tailored slack variables) and a rich set of routines to create, manage and solve these QPs. All QP solvers are Mehrotra-type primal-dual interior-point methods, implemented using the BLASFEO linear algebra framework.

Furthermore, HPIPM provides a set of efficiently implemented routines to convert between the different QP types, e.g. condensing routines convert an OCP QP into a dense QP, partial condensing routines convert an OCP QP into another OCP QP with shorter horizon length. Additional expansion routines recover the original QP solution from the (partially) condensed one.

In acados, the QP framework is based on HPIPM, in the sense that HPIPM provides both the dense QP and OCP QP definitions, as well as (partial) condensing algorithms to convert them and interior-point methods to solve them. Numerous other QP solvers are then interfaced to acados to alternatively solve the same types of QP problems. At the time of writing, additional interfaces exist to HPMPC, qpDUNES, qpOASES, and OSQP.

3.3 Algorithmic differentiation

The main difference between software packages for linear-quadratic MPC and for nonlinear MPC is that NMPC packages need a modeling language for the nonlinear functions. We support CasADi, a graph-based source-transformation algorithmic differentiation tool [4]. In our workflow, a user typically specifies the continuous-time dynamic models and nonlinear constraint functions with CasADi in a high-level language such as Matlab or Python. In combination with CasADi's code generator, we obtain fast, embeddable code; see Sect. 4.3.

3.4 Numerical simulation

acados features different kinds of numerical simulation routines. Implementations of explicit and implicit Runge-Kutta integrators are available, both of which support the optional propagation of first-order forward and adjoint sensitivities, as well as second-order sensitivities. The explicit integrators can be used with explicit ODE models and support different Butcher tableaus, including Euler's method and RK4. The implicit integrators can be used with an index-1 differential-algebraic equation (DAE) or implicit ODE model and use the Gauss-Legendre Butcher tableaus. A novel implementation of lifted collocation integrators [77] is part of acados, as well as a recently proposed structure-exploiting IRK algorithm, the so-called GNSF-IRK scheme [32], discussed next.

The concept of GNSF-IRK is to rigorously exploit the linear dependencies within the dynamic system. It extends the ideas of the linear input and linear output subsystems that have been implemented within the ACADO Code Generation tool [75] and uses a more flexible structured dynamic system formulation that can also handle index-1 DAEs. A main challenge for structure-exploiting integrators is to appropriately reformulate the dynamic system of interest into the desired structured form. acados features an automatic transcription method for the GNSF structure [32], implemented as a Matlab function for CasADi models.

A last important feature of acados is that integrators can vary from stage to stage, with e.g. different state and control dimensions, different integration step lengths, or different integration schemes.

We note that all integrator modules, except for the ERK integrator, are based on hardware tailored linear algebra routines in BLASFEO to speed up the LU factorizations and the corresponding triangular system solutions, as discussed in [36].
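To give a feel for the integrator interface, the following sketch sets up an IRK integrator with forward sensitivity propagation through the Python interface (Sect. 4.3); the toy model is an illustrative assumption, and the field names follow the acados_template package and may differ slightly between acados versions:

```python
import numpy as np
import casadi as ca
from acados_template import AcadosModel, AcadosSim, AcadosSimSolver

# toy pendulum model, for illustration only
x = ca.SX.sym('x', 2)
u = ca.SX.sym('u')
xdot = ca.SX.sym('xdot', 2)
model = AcadosModel()
model.name = 'pendulum'
model.x, model.u, model.xdot = x, u, xdot
model.f_expl_expr = ca.vertcat(x[1], u - ca.sin(x[0]))
model.f_impl_expr = xdot - model.f_expl_expr

sim = AcadosSim()
sim.model = model
sim.solver_options.T = 0.05                 # simulation interval [s]
sim.solver_options.integrator_type = 'IRK'  # Gauss-Legendre IRK
sim.solver_options.sens_forw = True         # propagate forward sensitivities

integrator = AcadosSimSolver(sim)
integrator.set('x', np.array([0.1, 0.0]))
integrator.set('u', np.array([0.0]))
integrator.solve()
x_next = integrator.get('x')        # end state phi(x, u)
S_forw = integrator.get('S_forw')   # forward sensitivities d phi / d (x, u)
```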

3.5 SQP-type methods

For nonlinear programming, acados offers different SQP-type methods. A full-step SQP method is available with different algorithmic options. As Hessian approximations, we offer Gauss-Newton Hessians, SCQP, and exact Hessians with regularization/convexification as discussed in Sect. 2.6, and we allow for user-defined Hessian approximations. For use in an online setting, e.g. in NMPC, a specialized RTI routine is available.

Both the SCQP algorithm and the convexification method of Sect. 2.6 are novel features that are, to the authors' knowledge, not present in any other NMPC software package.

4 The acados software package

acados implements some of the optimization methods mentioned in the previous sections. It is meant to be user-friendly at a high level and efficient at a low level. To balance these properties, we developed a core library, written in C, which exposes its functionality to the Python and Matlab interfaces. In this section, we first discuss the functionality of this core module and then describe the internal and external interfaces that are crucial for usability.

4.1 The acados core library

The embedded optimization algorithms discussed in Sect. 2 are implemented in acados in a modular fashion. For example, there is a clear interface between an NLP solver and an integrator: the integrator expects a linearization point \(w^{[i]}\) and returns the end state of a simulated trajectory, and optionally first- and second-order sensitivities.

Similar diagrams can be drawn for all other algorithmic components, including (partial) condensing, QP solvers, function evaluations, etc. Each of these algorithmic components is modeled within acados as a separate module. Some modules can be used standalone or in combination with others. For instance, depending on the choice of algorithm, an NLP solver will make use of some or all of the other modules. Table 2 gives an overview of all modules currently present in acados, together with the implemented algorithmic variants.

Table 2 Overview of the software modules present in acados

It is an important design choice that all modules share an identical signature. That way, all modules look alike to users of acados, and it is straightforward for developers to extend acados with another module. The signature is as follows (in C syntax):

```c
int <module>_<solver>(<module>_config *config, <module>_dims *dims,
                      <module>_in *in, <module>_out *out,
                      <module>_opts *opts, <module>_memory *memory,
                      <module>_workspace *workspace);
```

Here, <module> stands for the name of the module at hand, for example ocp_qp for QP problems with optimal control structure or sim for integration problems, and <solver> is a placeholder for a function implementing the specific solver for problems corresponding to this module, e.g., ocp_qp_hpipm (interface to HPIPM solver) or sim_erk (explicit Runge-Kutta method), etc. Each module returns an int which denotes a solver-specific error status – zero means successful completion by convention. All input arguments are pointers. Each of the arguments comes with a set of helper functions, called ..._calculate_size, computing the size (in bytes) of the struct pointed to, as well as a set of functions, called ..._assign, to initialize a block of memory.

Some modules comprise other modules. For example, an SQP solver for optimal control problems might need an integrator, which is itself a proper acados module. In this context, we call the integrator a submodule. Each of the arguments above, dims, in, etc., has fields corresponding to submodules. As an example, the relation between an NLP solver and its submodules is depicted in Fig. 1. We remark that the calculation of the memory size of a module with submodules is done recursively, i.e., calling the calculate_size function on the top module returns the required memory size of the top module and all of its submodules, submodules of submodules, etc. By design, this allows users to allocate all memory outside of acados.

Fig. 1 Example of the relation between modules and submodules in acados for a specific case of a possible SQP algorithm

The core library of acados contains mostly what has been described in this section: a collection of modules, each with corresponding data types and variants of solvers, as well as helper functions for memory management. Using the core library directly can be cumbersome and error-prone, as many details need to be taken into account: it is designed to be efficient and flexible. To cater to the specific needs of the end user, we offer different interfaces to the core of acados, which are described next.

4.2 The C interface

The C interface is responsible for encapsulating the low-level constructs of the acados core.

Choosing Solvers. Working with the core library, all functions are specific to one variant of a module. When solving a QP with, say, qpOASES, the code refers to, e.g., dense_qp_qpoases_memory, dense_qp_qpoases_opts, etc. We provide an abstraction layer to facilitate switching solvers easily. To this end, we define a 'plan' for each module: a struct whose fields represent the choice of a particular combination of solvers. For example, the plan for an SQP-type method with Gauss-Newton Hessian approximation, for a problem discretized with an ERK integrator and using HPIPM as the underlying QP solver, reads, schematically, as

```c
ocp_nlp_plan plan = {
    .nlp_solver = SQP_GN,                        /* SQP with Gauss-Newton Hessian */
    .ocp_qp_solver_plan = {
        .qp_solver = PARTIAL_CONDENSING_HPIPM,   /* partial condensing + HPIPM */
    },
    .sim_solver_plan = {{.sim_solver = ERK}, /* ..., one entry per shooting interval */},
};
```

Here, the arrays should be of the correct length (omitted for brevity, with a slight abuse of notation). As a general rule, solvers that make use of other modules should include them in their plan.

Passing options. For each module, we manipulate a specific options struct using functions that take a textual representation of the option as a string. The string encodes both the module that the option belongs to and the name of the option.

Memory management. Allocating memory 'manually' as described above can quickly become cumbersome. For this reason, we provide a few routines that automate this process. To this end, each module from the core library is mirrored in the C interface by an additional function with the signature

```c
int <module>_solve(<module>_solver *solver, <module>_in *in, <module>_out *out);
```

The argument solver is a pointer to a C structure that encapsulates all data needed besides the input and output. In this way, we reduce the amount of boilerplate code.

Convenience routines. The C interface additionally offers helper routines, so-called ‘setters’ and ‘getters’, that wrap the handling of the low-level structs of the acados core.

4.3 High-level interfaces

Software for NMPC is often written in scripting languages. Therefore, we offer interfaces to two popular languages for scientific computing: Python and Matlab, where the interface to Matlab is largely compatible with its free and open-source alternative Octave. To this end, we created a small domain-specific language within each of these frameworks, built on top of the C interface of acados.

In order to formulate OCP (3) through the acados modules (cost, constraints and dynamics), the main challenge is to pass the generally nonlinear functions and their derivatives to these modules. The Python and Matlab interfaces of acados use CasADi as a modeling language, i.e. to formulate all generally nonlinear parts of the OCP. The acados high-level interfaces use CasADi's code generation and algorithmic differentiation to generate the C functions needed by each acados module. A major benefit of using CasADi as a modeling language is that the solution behavior of acados can easily be compared with the solutions of the numerous optimization tools interfaced with CasADi.
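For illustration, a sketch of this workflow for a small tracking OCP through the Python interface follows; the toy pendulum model is an assumption made for illustration, and the field names follow the acados_template package (they may differ slightly between acados versions):

```python
import numpy as np
import scipy.linalg
import casadi as ca
from acados_template import AcadosModel, AcadosOcp, AcadosOcpSolver

# toy pendulum model, for illustration only
x = ca.SX.sym('x', 2)
u = ca.SX.sym('u')
xdot = ca.SX.sym('xdot', 2)
model = AcadosModel()
model.name = 'pendulum'
model.x, model.u, model.xdot = x, u, xdot
model.f_expl_expr = ca.vertcat(x[1], u - ca.sin(x[0]))
model.f_impl_expr = xdot - model.f_expl_expr

ocp = AcadosOcp()
ocp.model = model
ocp.dims.N = 40                # number of shooting intervals
ocp.solver_options.tf = 2.0    # horizon length T [s]

# linear least-squares tracking cost with output y = (x, u)
Q, R = np.diag([1e1, 1e-1]), np.diag([1e-2])
ocp.cost.cost_type = 'LINEAR_LS'
ocp.cost.cost_type_e = 'LINEAR_LS'
ocp.cost.Vx = np.vstack([np.eye(2), np.zeros((1, 2))])
ocp.cost.Vu = np.vstack([np.zeros((2, 1)), np.eye(1)])
ocp.cost.W = scipy.linalg.block_diag(Q, R)
ocp.cost.yref = np.zeros(3)
ocp.cost.Vx_e = np.eye(2)
ocp.cost.W_e = Q
ocp.cost.yref_e = np.zeros(2)

# constraints: initial state and control bounds
ocp.constraints.x0 = np.array([0.5, 0.0])
ocp.constraints.idxbu = np.array([0])
ocp.constraints.lbu = np.array([-1.0])
ocp.constraints.ubu = np.array([1.0])

# solver choices, mirroring the modules of Sect. 3
ocp.solver_options.qp_solver = 'PARTIAL_CONDENSING_HPIPM'
ocp.solver_options.integrator_type = 'ERK'
ocp.solver_options.hessian_approx = 'GAUSS_NEWTON'
ocp.solver_options.nlp_solver_type = 'SQP_RTI'

solver = AcadosOcpSolver(ocp, json_file='pendulum_ocp.json')
status = solver.solve()
u0 = solver.get(0, 'u')   # first control, to be applied to the plant
```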

Once the OCP to be solved is described through the domain-specific language implemented by the high-level interfaces, a human-readable, self-contained C project that makes use of templated code can be generated. The generated project contains all the C code necessary for the function and derivative evaluations generated through CasADi, as well as the C code necessary to set up the NLP solver using the acados C interface. Moreover, a Matlab S-Function and a build system for its compilation are generated. Note that this kind of code generation is inherently different from the one in ACADO, since the templated code only uses the functions exposed by the C interface of acados. In contrast, ACADO-generated solvers are standalone C projects that are extremely problem-specific and do not rely on a common library.

With the workflow described above, it is possible to obtain a self-contained, high-performance solver that can be easily deployed on embedded hardware starting from a description of the OCP in a high-level language.

We remark that model equations and other nonlinear functions are called from acados in a completely language-agnostic way: acados is at no point aware of which modeling tool is being used. One benefit is that this facilitates self-written models (in C/C++), which are also completely compatible with acados. Using them is, however, more involved, since in the case of CasADi functions, memory allocation and matrix format conversions are handled automatically by the CasADi function wrapper in acados.

5 Numerical results

This section consists of a few numerical experiments with acados and comparisons to other embedded optimization software packages. We discuss performance on the nonlinear chain-of-masses problem, we present one open-loop example with different Hessian approximations, and show one closed-loop engine control experiment on an embedded platform.

5.1 Case study 1: Chain of masses

As a benchmark problem, we take the chain-of-masses problem as presented in [95]. The control objective is to stabilize a chain of masses with nonlinear interactions between them. For a full description of the system, we refer to the Appendix. The system is useful as a benchmark in the sense that the problem is simple enough to understand intuitively, yet complicated enough to obtain non-trivial results from a range of different solvers. Also, by increasing the number of masses, one can easily compare behavior for different numbers of states without changing much code.

5.1.1 Closed-loop experiments

In closed loop, an MPC controller repeatedly solves OCP (10). The first control \(u_0\) is applied to the dynamic system under control, and a new initial state \({\overline{x}}_0\) is obtained. Here, we simulate the system using a more accurate integrator than the one in OCP (10), namely the Dormand-Prince method as implemented in the Matlab routine ode45.

We introduce one disturbance into the closed-loop system, similarly to [95]: at the beginning of the simulation, we start from a horizontal configuration of the chain of masses. Around the midpoint of the simulation, we override the closed-loop control with a constant \(u_\mathrm {d} = [-1, 1, 1]^\top \). After one second of simulation time, the controller takes over again.

We compare the following solvers with each other for this particular closed-loop setup:

  • IPOPT [94]. A solver that does not specifically target embedded devices; we use it as a baseline to compare against.

  • FalcOpt [88]. A projected gradient descent method tailored to NMPC.

  • VIATOC [50]. A gradient projection method for MPC that only allows linear inequality constraints.

  • ACADO Code Generation tool [49]. Generates SQP-based solvers.

  • GRAMPC [26]. An embedded Augmented Lagrangian-based solver.

  • acados. The framework presented in the current paper.

The tuning parameters for the different solvers are listed in Table 3.

Table 3 Solver options for the different solvers in Case Study 1

In order to compare the quality of the closed-loop solutions, we use the notion of distance-from-reference (DR), which is an approximation of the integrated cost along closed-loop trajectories:

$$\begin{aligned} \mathrm {DR}_{(\cdot ), n} = \sum _{i=0}^{n} \begin{bmatrix}x_{(\cdot ),i}-x_\mathrm {ref} \\ u_{(\cdot ),i}-u_\mathrm {ref}\end{bmatrix}^\top \begin{bmatrix} Q &{} 0\\ 0&{} R \end{bmatrix} \begin{bmatrix}x_{(\cdot ),i}-x_\mathrm {ref} \\ u_{(\cdot ),i}-u_\mathrm {ref}\end{bmatrix}. \end{aligned}$$

To compare the different solvers, we use the relative cumulative sub-optimality (RCSO) with respect to a fully converged solution, in this case the IPOPT solution, which reads as

$$\begin{aligned} \mathrm {RCSO}_{(\cdot ),n} = \left| \frac{\mathrm {DR}_{(\cdot ),n} - \mathrm {DR}_{\texttt {ipopt},n}}{\mathrm {DR}_{\texttt {ipopt},n}}\right| , \end{aligned}$$

where \(n = 0,\ldots , 300\) denotes the time step in our simulation. We show a comparison in Table 4. The results for ACADO and acados are exactly the same, as they implement the same real-time algorithm, and both are very close to the reference solution from IPOPT. The solvers GRAMPC, VIATOC and FalcOpt, being based on first-order methods, are further away from the IPOPT solution. These findings are consistent with previously published work by other authors, see [26].
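In code, the two measures amount to a direct transcription of the formulas above (helper names are hypothetical):

```python
import numpy as np

def distance_from_reference(X, U, x_ref, u_ref, Q, R):
    """DR: cumulative weighted squared deviation along a closed-loop trajectory,
    with X of shape (n+1, nx) and U of shape (n+1, nu)."""
    dx, du = X - x_ref, U - u_ref
    return np.sum((dx @ Q) * dx) + np.sum((du @ R) * du)

def rcso(dr_solver, dr_ipopt):
    """Relative cumulative sub-optimality w.r.t. the converged IPOPT solution."""
    return abs((dr_solver - dr_ipopt) / dr_ipopt)
```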

Table 4 Relative suboptimality at the end of the simulation of the hanging chain with \(M=5\) and \(N=40\)

We now examine the computational performance along the closed-loop trajectories in Fig. 2. GRAMPC, ACADO, VIATOC and acados produce consistent timings throughout the entire experiment, even when the disturbance occurs. This is a beneficial property for embedded solvers, which often face a fixed time deadline as part of a larger control application. GRAMPC and acados produce solutions at almost the same speed, both approximately a factor of 2 faster than ACADO, which is in turn a factor of 2-3 faster than VIATOC. Near the equilibrium, FalcOpt takes the shortest computation time, as it needs to perform only a few gradient steps per iteration. IPOPT is included as a baseline for comparison with non-embedded solvers. The timings are summarized in Table 5.

Fig. 2 Computational time for each iteration of the closed-loop simulation, averaged over 10 runs

Table 5 Computation times for the closed-loop experiments on a chain of masses (cf. Fig. 2)

Of course, any solver can trade off sub-optimality for computation time. To get the full picture, we plot both measures against each other in Fig. 3: the relative cumulative sub-optimality over the entire length of the experiment versus the worst-case computation time. In this comparison, acados and GRAMPC are on the Pareto-optimal front: acados is a factor of 1000 less suboptimal than GRAMPC, at a higher worst-case computational cost; in terms of median computation times, acados is faster (see Table 5).

Fig. 3 Trade-off between sub-optimality (Table 4) and computation time (Fig. 2). acados and GRAMPC lie on the Pareto-optimal front

5.2 Case study 2: Hessian regularization

In Sect. 2.6, we briefly mentioned the impact of Hessian regularization on SQP methods. In this case study, we compare the convergence of exact-Hessian based SQP with the three different Hessian regularizations on a simple control problem, namely a cart-pole swing-up, which is described in the Appendix. Note that in this case study, as opposed to the others, we do not perform closed-loop experiments but solve one optimal control problem until convergence.

5.2.1 Exact-Hessian based SQP

We solve OCP (12) with SQP, where we use the exact Hessian of the Lagrangian. In the notation of (5):

$$\begin{aligned} H_k&= \begin{bmatrix} Q &{} 0\\ 0&{} R \end{bmatrix} + \sum _{i=1}^{n_x} \pi _{k,i} \nabla ^2_{(x,u)} \phi ^x_i(x_k, u_k), \quad k=0,\ldots ,N-1, \\ H_N&= Q, \end{aligned}$$

where \(\pi _{k,i}\) are the Lagrange multipliers associated with the dynamic equality constraints.

In some cases, the non-convexity of the dynamic equations gives rise to an indefinite Hessian matrix. We apply the \(\mathrm {project}(\cdot )\) and \(\mathrm {mirror}(\cdot )\) regularizations, as well as the convexification method previously mentioned in Sect. 2.6. All are implemented as modules in the acados framework.

We compare the convergence of the SQP iterates obtained with the three different regularization methods. For each SQP variant, we start the iterations from an initialization with zeros for all states, except for a linearly decreasing initialization of the angle, from \(\pi \) to zero. The result can be seen in Fig. 4. The structure-exploiting convexification converges almost twice as fast as the projection regularization, which is in turn much faster than the mirroring regularization. Intuitively, this makes sense: mirroring 'blocks' directions associated with large negative eigenvalues by introducing large positive eigenvalues in those directions, which prevents the solver from taking larger steps. In turn, the structure-exploiting regularization is faster than merely projecting the eigenvalues onto the positive definite cone, because it redistributes convexity among all stages and thus needs less regularization overall.

Fig. 4 Convergence comparison of exact-Hessian based SQP with three different regularization strategies

Table 6 Exact-Hessian based SQP: computation times

It must be said that the convexification method is quite a bit more involved than the other two regularization schemes. However, by using the optimized linear algebra routines of BLASFEO, we implemented the convexification method such that it is only slightly more expensive per iteration than the basic regularization methods (see Table 6), while being much less computationally expensive overall due to the smaller number of iterations. Thus, the Hessian convexification method allows us to perform exact-Hessian based NMPC online, with better performance than state-of-the-art methods.

5.3 Case study 3: Hardware-in-the-loop experiments for an engine control application

As a last case study, we discuss the performance of acados on an embedded platform, namely the dSPACE MicroAutoBox II [25], an industrial computing platform used in the automotive industry. It features a 900 MHz PowerPC processor (IBM PPC 750GL) with 16 MB of main memory. The control application we focus on is engine control, with the engine model as presented in [3], which we briefly reproduce in the Appendix.

The control objective is to track a boost pressure signal, where the boost pressure is given by \(y_p(x) := \varPi _{c,\mathrm {lp}} \cdot \varPi _{c,\mathrm {hp}}\) (see Appendix). To this end, we solve an OCP arising from a multiple shooting formulation with the Gauss-Legendre method of order 6, a sampling time of \(0.05 \, \mathrm {s}\) and \(N=20\) shooting intervals. The DAE simulation functions are denoted by \(\phi \). Let \(r(x, u) = [y_p(x); x; u]\) and \(r_N(x) = [y_p(x); x]\). The OCP then reads as

$$\begin{aligned} \min _{\substack{x_0, \ldots , x_N, \\ u_0, \ldots , u_{N-1}}} \quad&\sum _{k=0}^{N-1} \Vert r(x_k, u_k) - {\bar{r}}_k \Vert _W^2 + \Vert r_N(x_N) - {\bar{r}}_N \Vert _{W_N}^2 \\ \mathrm {s.t.} \quad&x_0 = {\overline{x}}_0, \\&x_{k+1} = \phi (x_k, u_k), \quad k = 0, \ldots , N-1, \\&{\underline{x}} \le x_k \le {\overline{x}}, \quad k = 0, \ldots , N, \\&{\underline{u}} \le u_k \le {\overline{u}}, \quad k = 0, \ldots , N-1, \end{aligned}$$
(8)

Constraints on \(\varPi _{c,\mathrm {lp}}\) and \(\varPi _{t,\mathrm {hp}}\) are included to prevent damage to the compressor. The exact values of the weight matrices and the reference vectors can be found in the Appendix.

Fig. 5 Closed-loop simulation of the engine control task with steps in the reference boost pressure. Simulations are carried out on the dSPACE MicroAutoBox II platform at a clock speed of \(900\,\mathrm {MHz}\)

We repeatedly solve OCP (8) approximately by performing real-time iterations, with HPIPM as the underlying QP solver. The results of running the controller in closed loop on the dSPACE MicroAutoBox II can be seen in Fig. 5. Control bounds and state bounds become active at some point in the simulation for the high-pressure stage. The reference is tracked closely and without the oscillations that have been observed when linear-quadratic MPC is used [3]. As for the computation times, it is interesting to note that spikes occur wherever a reference jump occurs or a constraint becomes (in)active, while the computation times close to the solution (i.e. at the beginning of the simulation) drop to almost zero. Overall, the maximum computation time remains under \(10\,\mathrm {ms}\), five times shorter than the sampling time of the system (\(50\,\mathrm {ms}\)).

6 Conclusion and outlook

In this article, we presented acados, a new software package for embedded optimization. It is free and open-source software that facilitates rapid testing and deployment of (N)MPC algorithms on embedded hardware platforms. For ease of use, we offer interfaces with higher-level languages such as Matlab and Python.

Among the many features that state-of-the-art NMPC algorithms require, two features new to this package and, to our knowledge, not present in any other are the convexification procedure of Sect. 2.6, which allows the use of exact-Hessian based SQP methods in real time, and the SCQP Hessian approximation. Additionally, the structure-exploiting GNSF-IRK integrator has the potential to speed up the simulation and sensitivity propagation tasks within an NMPC scheme. Furthermore, acados features partial condensing, different state and control dimensions per multiple shooting stage, the use of BLASFEO as a linear algebra backend, and facilities for using CasADi as a modeling language.

Numerical experiments on the dSPACE MicroAutoboxII industrial computer show that the software is embeddable, with computation times in the millisecond range for a non-trivial NMPC problem. Comparisons with other embedded optimization packages show that it is fast. acados is already being used in various control and estimation applications, such as reluctance synchronous machines [97], real-time estimation of muscle forces [7], wind turbine fatigue [64], quadrotors [14], and autonomous car racing [55].

acados is an ongoing endeavor. Future work includes extending interoperability with Simulink for easier deployment on embedded systems, as well as adding support for nonlinear interior-point methods and other SQP-type methods such as multi-level iterations.

7 Appendix A—case study details

8 Case study 1: Chain of masses

8.1 System description


The control objective in this example is to stabilize the motion of a chain of \(M=5\) balls with mass m connected by springs to an equilibrium position. The mass on one end of the chain is fixed at (0, 0, 0). The mass on the other end can be freely moved.

Let \(p_i \in \mathbb {R}^3\) be the position of mass i, for \(i=1,\ldots ,M\). The model equations can then be derived as follows. From Hooke's law, we know that (see Fig. 6)

$$\begin{aligned} F_{i,i+1} = D\left( 1-\frac{L}{\Vert p_{i+1} - p_{i}\Vert }\right) (p_{i+1}-p_i), \end{aligned}$$

with each spring having spring constant D and rest length L.

Fig. 6 Forces in the springs between the masses in the example of Case study 1. Replicated from [95]

This allows us to write the equations of motion for the middle balls, which read as

$$\begin{aligned} \ddot{p}_i = \frac{1}{m} (F_{i, i+1} - F_{i-1, i}) + g_z, \quad i=2,\ldots ,M-1, \end{aligned}$$

with \(g_z\) the gravitational acceleration vector. The control input \( u\in \mathbb {R}^3 \) directly controls the velocity of the free mass:

$$\begin{aligned} {\dot{p}}_M = u. \end{aligned}$$

We now introduce a state space formulation with states

$$\begin{aligned} x^\top = [p_2^\top , p_3^\top , \ldots , p_{M-1}^\top , p_M^\top , v_2^\top , v_3^\top , \ldots , v_{M-1}^\top ]\in \mathbb {R}^{n_x} \end{aligned}$$

with \(n_x=3 \cdot (2\cdot (M-2)+1)\), which results in the following ODE:

$$\begin{aligned} {\dot{x}} = f(x, u) = \begin{bmatrix} v_2 \\ \vdots \\ v_{M-1} \\ u \\ \frac{1}{m} (F_{2, 3} - F_{1, 2}) + g_z \\ \vdots \\ \frac{1}{m} (F_{M-1, M} - F_{M-2, M-1}) + g_z \\ \end{bmatrix}. \end{aligned}$$
(9)

We remark that the only nonlinearity is introduced by the calculation of the forces. The steady state \((x_\mathrm {ss}, u_\mathrm {ss})\) of the system can be found by solving \(f(x_\mathrm {ss}, u_\mathrm {ss}) = 0\) for any given position \(p_{M,\mathrm {ss}}\) of the free mass. We choose \(p_{M,\mathrm {ss}} = [7.5, 0, 0]^\top \) for the experiments.
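As a concrete sketch, ODE (9) and the steady-state computation can be implemented as follows. The parameter values for m, D, and L are placeholders standing in for those of Table 7, and scipy's fsolve is one possible root-finding choice, not the method prescribed by the case study.

```python
import numpy as np
from scipy.optimize import fsolve

M = 5                                   # number of balls
m, D, L = 0.45, 1.0, 0.55               # placeholder values; see Table 7
g = np.array([0.0, 0.0, -9.81])         # gravitational acceleration g_z

def F(p_a, p_b):
    """Spring force between positions p_a and p_b (Hooke's law above)."""
    d = p_b - p_a
    return D * (1.0 - L / np.linalg.norm(d)) * d

def f(x, u):
    """ODE (9): x stacks p_2..p_M and v_2..v_{M-1}; u moves the free mass."""
    p = np.vstack([np.zeros(3), x[:3 * (M - 1)].reshape(M - 1, 3)])  # p_1 = 0
    v = x[3 * (M - 1):].reshape(M - 2, 3)
    a = [(F(p[i], p[i + 1]) - F(p[i - 1], p[i])) / m + g
         for i in range(1, M - 1)]      # accelerations of the middle balls
    return np.concatenate([v.ravel(), u, np.concatenate(a)])

def force_balance(p_mid):
    """Residual of f = 0 on the middle balls, with p_M fixed at [7.5, 0, 0]."""
    p = np.vstack([np.zeros(3), p_mid.reshape(M - 2, 3), [7.5, 0.0, 0.0]])
    return np.concatenate(
        [F(p[i], p[i + 1]) - F(p[i - 1], p[i]) + m * g
         for i in range(1, M - 1)])

# Straight-line initial guess between the two chain ends:
guess = np.array([[7.5 * i / (M - 1), 0.0, 0.0]
                  for i in range(1, M - 1)]).ravel()
p_ss = fsolve(force_balance, guess)     # steady-state positions p_2..p_{M-1}
```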

8.2 Optimal control problem formulation

In order to stabilize the motion of the chain of masses to the steady state, we propose the following optimal control problem, obtained by performing a multiple shooting discretization of ODE (9):

$$\begin{aligned} \underset{\begin{array}{c} x_0, \ldots , x_N, \\ u_0, \ldots , u_{N-1} \end{array}}{\mathrm {minimize}} \quad&\sum _{k=0}^{N-1} \left( (x_k - x_\mathrm {ref})^\top Q \, (x_k - x_\mathrm {ref}) + (u_k - u_\mathrm {ref})^\top R \, (u_k - u_\mathrm {ref}) \right) \\&\quad + (x_N - x_\mathrm {ref})^\top P \, (x_N - x_\mathrm {ref}) \\ \mathrm {subject~to} \quad&x_0 = {\overline{x}}_0, \\&x_{k+1} = \phi (x_k, u_k), \quad k=0,\ldots ,N{-}1, \end{aligned}$$
(10)

where the initial state \({\overline{x}}_0\) is the current estimate of the state vector and \(\phi : \mathbb {R}^{n_x} \times \mathbb {R}^{3} \rightarrow \mathbb {R}^{n_x}\) is obtained by performing a single RK4 step of length \(0.2\,\mathrm {s}\) on ODE (9). Furthermore, we choose a horizon length of \(N=40\), the weighting matrices

$$\begin{aligned} Q&= \mathrm {diag}(\underbrace{0,\ldots , 0,}_{p_2,\ldots ,p_{M-1}} \underbrace{2.5, 2.5, 2.5,}_{p_M} \underbrace{25, \ldots , 25}_{v_2,\ldots ,v_{M-1}}), \\ P&= \mathrm {diag}(\underbrace{0,\ldots , 0,}_{p_2,\ldots ,p_{M-1}} \underbrace{10, 10, 10,}_{p_M} \underbrace{0, \ldots , 0}_{v_2,\ldots ,v_{M-1}}), \\ R&= \mathrm {diag}(0.1, 0.1, 0.1), \end{aligned}$$

the reference control input \( u_\mathrm {ref} = [0, 0, 0]^\top \) and the state reference

$$\begin{aligned} x_\mathrm {ref}&= [\underbrace{0,\ldots ,0}_{p_2,\ldots ,p_{M-1}}, \underbrace{7.5, 0, 0}_{p_M}, \underbrace{0, \ldots , 0}_{v_2,\ldots ,v_{M-1}}]^\top . \end{aligned}$$

The design parameters are chosen as in [26] and are summarized in Table 7. Note that we did not introduce path constraints or state bounds, since these are not supported by all of the solvers that we compare against below.
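A minimal sketch of the discrete dynamics \(\phi \) (a single RK4 step) and of the weighting matrices above, reusing `f` and `M` from the previous snippet:

```python
import numpy as np

def phi(x, u, h=0.2):
    """One explicit RK4 step of length h on ODE (9), i.e. phi in OCP (10)."""
    k1 = f(x, u)
    k2 = f(x + 0.5 * h * k1, u)
    k3 = f(x + 0.5 * h * k2, u)
    k4 = f(x + h * k3, u)
    return x + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

n_mid = 3 * (M - 2)          # entries for p_2..p_{M-1} (and for v_2..v_{M-1})
Q = np.diag([0.0] * n_mid + [2.5] * 3 + [25.0] * n_mid)
P = np.diag([0.0] * n_mid + [10.0] * 3 + [0.0] * n_mid)
R = np.diag([0.1] * 3)
x_ref = np.concatenate([np.zeros(n_mid), [7.5, 0.0, 0.0], np.zeros(n_mid)])
u_ref = np.zeros(3)
```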

Table 7 Design parameters for the chain of masses case study

9 Case study 2: Hessian regularization

We control a mass on a rod (a pendulum), balanced on a horizontally moving cart; see Fig. 7. The goal is to swing the pendulum up from its stable equilibrium position, i.e. hanging down.

Fig. 7 Schematic illustrating the inverted pendulum on top of a cart

The dynamics of the cart-pendulum system are described by the following ordinary differential equation, where \(p\) and v are the horizontal displacement and velocity, respectively, \(\theta \) is the angle with the vertical, and \(\omega \) the corresponding angular velocity:

$$\begin{aligned} \dot{p}&= v, \end{aligned}$$
(11a)
$$\begin{aligned} {\dot{\theta }}&= \omega , \end{aligned}$$
(11b)
$$\begin{aligned} \dot{v}&= \frac{-ml\sin (\theta ){\dot{\theta }}^2 + mg\cos (\theta )\sin (\theta ) + F}{M + m - m(\cos (\theta ))^2}, \end{aligned}$$
(11c)
$$\begin{aligned} {\dot{\omega }}&= \frac{ -ml\cos (\theta )\sin (\theta ){\dot{\theta }}^2 + F\cos (\theta ) + (M+m)g\sin (\theta )}{l(M + m - m(\cos (\theta ))^2)}. \end{aligned}$$
(11d)

We collect the states in the state vector \(x:=[p,\theta ,v,\omega ]^\top \); the control u is the horizontal force F. Transcribing the continuous-time OCP with multiple shooting gives rise to the following OCP:

$$\begin{aligned} \underset{\begin{array}{c} x_0, \ldots , x_N, \\ u_0, \ldots , u_{N-1} \end{array}}{\mathrm {minimize}} \quad \sum _{k=0}^{N-1}&\begin{bmatrix} x_k \\ u_k \end{bmatrix}^\top \begin{bmatrix} Q & 0 \\ 0 & R \end{bmatrix} \begin{bmatrix} x_k \\ u_k \end{bmatrix} + x_N^\top Q \, x_N \end{aligned}$$
(12a)
$$\begin{aligned} \mathrm {subject~to} \quad x_0&= {\overline{x}}_0, \end{aligned}$$
(12b)
$$\begin{aligned} x_{k+1}&= \phi ^x_k(x_k,u_k),&k=0,\ldots ,N{-}1, \end{aligned}$$
(12c)
$$\begin{aligned} -80&\le u_k \le 80,&k=0,\ldots ,N{-}1, \end{aligned}$$
(12d)

where \(\phi ^x_k\) is an RK4 integrator, simulating (11) over one shooting interval. The weight matrices are chosen as

$$\begin{aligned} Q=\mathrm {diag}([1000,1000,0.01,0.01]), \quad R=0.01. \end{aligned}$$

Because our aim is to swing up the pendulum, we select strong weights on the position and angle, while the remaining states and the control are assigned weak penalties in order to avoid overly abrupt swing-up maneuvers and to favor smooth trajectories. Note that the weighting matrices Q and R are tuning parameters chosen by the control engineer in the design process to obtain the desired behavior; different choices are therefore equally valid. The initial value is \({\overline{x}}_0 = [0,\pi ,0,0]^\top \). We choose \(N=100\) shooting intervals of length \(0.01\,\mathrm {s}\).
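For reference, the right-hand side of ODE (11) in code form; the model parameter values shown are placeholders, not the ones used in the experiments.

```python
import numpy as np

M_cart, m, l, g = 1.0, 0.1, 0.8, 9.81   # placeholder model parameters

def pendulum_ode(x, F):
    """Right-hand side of ODE (11); x = [p, theta, v, omega], input F."""
    _, theta, v, omega = x
    s, c = np.sin(theta), np.cos(theta)
    denom = M_cart + m - m * c**2
    v_dot = (-m * l * s * omega**2 + m * g * c * s + F) / denom
    omega_dot = (-m * l * c * s * omega**2 + F * c
                 + (M_cart + m) * g * s) / (l * denom)
    return np.array([v, omega, v_dot, omega_dot])
```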

10 Case study 3: Hardware-in-the-loop experiments for an engine control application

Two-stage turbocharged gasoline engines are being investigated as a way to overcome the drawbacks of conventional (single-stage) turbocharging. Their main advantage is a better trade-off between short transient times after load changes and a high specific power. However, the two-stage architecture puts higher demands on the engine controller, due to the nonlinear dynamics, the cross-couplings between the inputs, and the need to respect constraints. NMPC has been proposed as a viable control strategy [3].

In Fig. 8, a sketch of the two-stage turbocharged engine is depicted. The high-pressure (HP) stage can realize fast transients, while the low-pressure (LP) stage delivers a higher specific power, but with slower dynamics. The control challenge lies in accurately tracking the boost pressure \(p_\mathrm {boost}\), given the highly nonlinear coupling between the two stages.

Fig. 8 Schematic of the two-stage turbocharging concept. Source: [3]

For brevity, we directly present the engine model of [3] and refer the interested reader to that work for its derivation. We model the engine with a set of semi-explicit DAEs. The differential states are \(\varPi _{c,\mathrm {lp}}\) and \(\varPi _{c,\mathrm {hp}}\), the pressure ratios between the input and output of the compressor in the low-pressure and high-pressure stage, respectively. The algebraic states are \(\varPi _{t,\mathrm {lp}}\) and \(\varPi _{t,\mathrm {hp}}\), the corresponding pressure ratios across the turbines. The inputs are the pulse-width-modulated wastegate actuation signals \(u_\mathrm {wg,lp}, u_\mathrm {wg,hp}\), which take values between \(0\%\) (fully open) and \(100\%\) (fully closed).

The resulting DAE system reads as

$$\begin{aligned} \dot{\varPi }_{c,\mathrm {lp}}&= c_1 (\varPi _{t,\mathrm {lp}}^{1.5} - \varPi _{t,\mathrm {lp}}^{1.25}) \sqrt{\varPi _{t,\mathrm {lp}}^{-1.5} - \varPi _{t,\mathrm {lp}}^{-1.75}} - c_2 n_\mathrm {eng} \varPi _{c,\mathrm {hp}} (\varPi _{c,\mathrm {lp}}^{1.29} - \varPi _{c,\mathrm {lp}}), \end{aligned}$$
(13a)
$$\begin{aligned} 0&= \varPi _{c,\mathrm {lp}} \varPi _{c,\mathrm {hp}} - \frac{c_3}{n_\mathrm {eng}} \sqrt{\varPi _{t,\mathrm {lp}}^{0.5} - \varPi _{t,\mathrm {lp}}^{0.25}} \left( \sqrt{\varPi _{t,\mathrm {lp}}} + c_4 \, \eta (\varPi _{c,\mathrm {lp}} \cdot \varPi _{c,\mathrm {hp}}, u_\mathrm {wg,lp}) \right) , \end{aligned}$$
(13b)
$$\begin{aligned} \dot{\varPi }_{c,\mathrm {hp}}&= c_5 (\varPi _{t,\mathrm {hp}}^{1.5} - \varPi _{t,\mathrm {hp}}^{1.25}) \sqrt{\varPi _{t,\mathrm {hp}}^{-1.5} - \varPi _{t,\mathrm {hp}}^{-1.75}} - c_6 n_\mathrm {eng} \varPi _{c,\mathrm {lp}} (\varPi _{c,\mathrm {hp}}^{1.29} - \varPi _{c,\mathrm {hp}}), \end{aligned}$$
(13c)
$$\begin{aligned} 0&= \varPi _{c,\mathrm {lp}} \varPi _{c,\mathrm {hp}} - \frac{c_7}{n_\mathrm {eng}} \sqrt{\varPi _{t,\mathrm {hp}}^{0.5} - \varPi _{t,\mathrm {hp}}^{0.25}} \left( \sqrt{\varPi _{t,\mathrm {hp}}} + c_8 (1-u_\mathrm {wg,hp}/100) \right) , \end{aligned}$$
(13d)

where \(n_\mathrm {eng} = 2000 \, \mathrm {min}^{-1}\) is the engine speed, which we model as a measured disturbance that is constant in this case. Furthermore, \(\eta : \mathbb {R}\times \mathbb {R}\rightarrow \mathbb {R}\) is defined by

$$\begin{aligned} \eta (u, v)&= \gamma _1(u) \cdot \gamma _2(v), \end{aligned}$$

with \(\gamma _i: \mathbb {R}\rightarrow \mathbb {R}\):

$$\begin{aligned} \gamma _i(u)&= b_{1,i} + b_{2,i}\left( 1 + e^\frac{-u+b_{3,i}}{b_{4,i}}\right) ^{-1}. \end{aligned}$$

The values of all model parameters can be found in Table 8.

Table 8 Parameter values for the two-stage turbocharged engine model
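A sketch of the DAE (13) in CasADi symbolics, as one might prepare it for an implicit integrator. The parameters c, b1, and b2 are left symbolic, standing in for the values of Table 8; the variable names are illustrative.

```python
import casadi as ca

Pi_c_lp, Pi_c_hp = ca.SX.sym('Pi_c_lp'), ca.SX.sym('Pi_c_hp')   # differential states
Pi_t_lp, Pi_t_hp = ca.SX.sym('Pi_t_lp'), ca.SX.sym('Pi_t_hp')   # algebraic states
u_wg_lp, u_wg_hp = ca.SX.sym('u_wg_lp'), ca.SX.sym('u_wg_hp')   # wastegate signals
c = ca.SX.sym('c', 8)                     # model constants c_1..c_8 (c[0] = c_1, ...)
b1, b2 = ca.SX.sym('b1', 4), ca.SX.sym('b2', 4)  # coefficients of gamma_1, gamma_2
n_eng = 2000.0                            # engine speed [1/min], measured disturbance

def gamma(u, b):
    """Sigmoid-type map gamma_i with coefficients b = (b_1, b_2, b_3, b_4)."""
    return b[0] + b[1] / (1 + ca.exp((-u + b[2]) / b[3]))

eta = gamma(Pi_c_lp * Pi_c_hp, b1) * gamma(u_wg_lp, b2)   # eta(u, v) above

# Low-pressure stage, (13a) and (13b):
f_lp = (c[0] * (Pi_t_lp**1.5 - Pi_t_lp**1.25) * ca.sqrt(Pi_t_lp**-1.5 - Pi_t_lp**-1.75)
        - c[1] * n_eng * Pi_c_hp * (Pi_c_lp**1.29 - Pi_c_lp))
g_lp = (Pi_c_lp * Pi_c_hp - c[2] / n_eng * ca.sqrt(Pi_t_lp**0.5 - Pi_t_lp**0.25)
        * (ca.sqrt(Pi_t_lp) + c[3] * eta))

# High-pressure stage, (13c) and (13d):
f_hp = (c[4] * (Pi_t_hp**1.5 - Pi_t_hp**1.25) * ca.sqrt(Pi_t_hp**-1.5 - Pi_t_hp**-1.75)
        - c[5] * n_eng * Pi_c_lp * (Pi_c_hp**1.29 - Pi_c_hp))
g_hp = (Pi_c_lp * Pi_c_hp - c[6] / n_eng * ca.sqrt(Pi_t_hp**0.5 - Pi_t_hp**0.25)
        * (ca.sqrt(Pi_t_hp) + c[7] * (1 - u_wg_hp / 100)))
```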

In order to obtain smooth control behavior, we include the time derivatives of the controls in the optimization formulation: we introduce \({\dot{u}}_\mathrm {wg,lp} = d_{u, \mathrm {lp}}\) and \({\dot{u}}_\mathrm {wg,hp} = d_{u, \mathrm {hp}}\), treat the original controls as additional differential states, and collect the rates in

$$\begin{aligned} d = \begin{bmatrix} d_{u, \mathrm {lp}} \\ d_{u, \mathrm {hp}} \end{bmatrix}. \end{aligned}$$

We then define the vector of differential states, algebraic states and controls, respectively, as follows:

$$\begin{aligned} x = \begin{bmatrix} \varPi _{c,\mathrm {lp}} \\ \varPi _{c,\mathrm {hp}} \\ u_\mathrm {wg,lp} \\ u_\mathrm {wg,hp} \end{bmatrix}, \quad z = \begin{bmatrix} \varPi _{t,\mathrm {lp}} \\ \varPi _{t,\mathrm {hp}} \end{bmatrix}, \quad u = \begin{bmatrix} d_{u, \mathrm {lp}} \\ d_{u, \mathrm {hp}} \end{bmatrix}. \end{aligned}$$
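Continuing the CasADi sketch above, this augmented model, with the wastegate signals as states and their rates as controls, can be assembled as follows (again using the illustrative symbol names from the previous snippet):

```python
# Continuation of the CasADi sketch above: the wastegate rates are the controls.
d_lp, d_hp = ca.SX.sym('d_lp'), ca.SX.sym('d_hp')
x = ca.vertcat(Pi_c_lp, Pi_c_hp, u_wg_lp, u_wg_hp)   # differential states
z = ca.vertcat(Pi_t_lp, Pi_t_hp)                     # algebraic states
u = ca.vertcat(d_lp, d_hp)                           # controls (rates)
x_dot = ca.vertcat(f_lp, f_hp, d_lp, d_hp)           # each u_wg integrates its rate
alg = ca.vertcat(g_lp, g_hp)                         # algebraic equations: alg = 0
```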

The values of the parameters in OCP (8) are

$$\begin{aligned} y_{r,k}&= [y_{pr,k}; 1.14; 1.54; 50; 50; 0; 0], \\ W&= \mathrm {diag}([10^3, 10^{-3}, 10^{-3}, 10^{-3}, 10^{-3}, 10^{-4}, 10^{-4}]), \\ y_{r,N}&= [y_{pr,N}; 1.14; 1.54; 50; 50], \\ W_N&= \mathrm {diag}([10^3, 10^{-3}, 10^{-3}, 10^{-3}, 10^{-3}]) \end{aligned}$$

with \(y_{pr,k}, k=0,\ldots ,N\) varied as in Fig. 5.