1 Introduction

We consider the nonlinear programming problem to minimize an objective function subject to inequality constraints,

$$\begin{aligned} x\in {I\!R}^{n}: \quad \begin{array}{l} \min \;\; f(x)\\ g(x) \le 0, \end{array} \end{aligned}$$
(1)

where \(x\) is an \(n\)-dimensional parameter vector. It is assumed that the functions \(f(x)\) and \(g(x) = (g_1(x),\ldots ,g_m(x))^T\) are twice continuously differentiable on \({I\!R}^{n}\). To illustrate the underlying mathematical algorithm, we omit equality constraints and upper and lower bounds on the variables; this simplifies the notation.

The basic idea is to mix a sequential quadratic programming (SQP) method and an interior point method (IPM) for nonlinear programming. In an outer loop, a sequence of quadratic programming subproblems is constructed by approximating the Lagrangian function

$$\begin{aligned} L(x,u):=f(x) + u^Tg(x) \end{aligned}$$
(2)

quadratically and by linearizing the constraints. The resulting quadratic programming subproblem (QP)

$$\begin{aligned} d\in {I\!R}^{n}: \quad \begin{array}{l} \min \;\; \frac{1}{2}d^{T}H(x_k,u_k)d+\nabla f(x_{k})^{T}d\\ g(x_{k})+\nabla g(x_{k})d \le 0 \end{array} \end{aligned}$$
(3)

is then solved by an interior point solver. A pair \((x_{k},u_{k})\) denotes the current iterate in the primal-dual space, \(H(x_{k},u_{k})\) the Hessian of the Lagrangian function (2), i.e., \(H(x_{k},u_{k})=\nabla _{xx} L(x_k,u_k)\), or a suitable approximation, and \(\nabla g(x_{k})\) is the Jacobian matrix of the vector of constraints. We call \(x_k \in I\!R^n\) the primal variable and \(u_k \in I\!R^m\) the dual variable or multiplier vector, respectively. The index \(k\) is an iteration index and stands for the \(k\)-th step of the optimization algorithm, \(k=0,1,2,\ldots \).

Sequential quadratic programming methods are well known, and numerous modifications and extensions of SQP methods have been published. Review papers are given by Boggs and Tolle [2] and Gould and Toint [12]. Most optimization textbooks have chapters on SQP methods, see, for example, Fletcher [9], Gill, Murray and Wright [10], and Sun and Yuan [25]. A method for solving large-scale quadratic programs is introduced in Cafieri et al. [4].

An alternative approach is the interior point method (IPM) developed in the 1990s, see, e.g., Griva et al. [14]. There are numerous alternative algorithms and implementations available, differing especially in their stabilization approaches, by which convergence towards a stationary point can be guaranteed; see, e.g., Byrd, Gilbert, and Nocedal [3], D’Apuzzo et al. [7], or the review paper of Gondzio [11]. The underlying strategy consists of replacing the constrained optimization problem (1) by a simpler one without inequality constraints,

$$\begin{aligned} x\in {I\!R}^{n}, s \in I\!R^m:\begin{array}{l} \min \;\; f(x) - \mu \sum _{j=1}^m \log \left( s_j\right) \\ g(x) + s = 0.\\ \end{array} \end{aligned}$$
(4)

Here, \(s=(s_1,\ldots ,s_m)^T>0\) denotes a vector of \(m\) slack variables, whose positivity has to be guaranteed separately. The smaller the so-called barrier term \(\mu \) is, the closer the solutions of the two problems are. It is essential to understand that we now have \(n+m\) primal variables \(x\) and \(s\), and in addition \(m\) dual variables \(u\in I\!R^m\), the multipliers of the equality constraints of (4). Since, however, these multiplier approximations are also used to approximate the multipliers of the inequality constraints of (1), we also require that \(u>0\) throughout the algorithm.
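For later reference, the KKT conditions of the barrier problem (4) read

$$\begin{aligned} \nabla f(x) + \nabla g(x)^Tu = 0, \qquad g(x) + s = 0, \qquad Su = \mu e, \end{aligned}$$

where \(S\) denotes the diagonal matrix containing \(s\) along the diagonal and \(e=(1,\ldots ,1)^T \in I\!R^m\); the third equation results from the stationarity of the Lagrangian of (4) with respect to \(s\), i.e., \(u_j=\mu /s_j\) for \(j=1,\ldots ,m\).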

By applying Newton’s method to the KKT conditions of (4), we construct a sequence of systems of linear equations, see, for example, Byrd, Gilbert, and Nocedal [3]. Thus, we solve a so-called primal-dual system of linear equations

$$\begin{aligned} \left( \begin{array}{ccc} H_{k} & \nabla g(x_{k})^{T} & 0\\ \nabla g(x_{k}) & -B_{k} & I\\ 0 & S_{k} & U_{k}\end{array}\right) \left( \begin{array}{c} d^{x}_{k}\\ d^{u}_{k}\\ d^{s}_{k}\end{array}\right) +\left( \begin{array}{c} \nabla f(x_{k})+\nabla g(x_{k})^{T}u_{k}\\ g(x_{k})+s_{k}-C_{k}u_{k}\\ S_{k}u_{k} - \mu _{k}e\end{array}\right) =0. \end{aligned}$$
(5)

Here, \(k\) denotes the current iteration index, and \(x_{k}\), \(s_{k}\), and \(u_{k}\) are the primal and dual iterates. \(S_{k}\) and \(U_{k}\) are positive diagonal matrices containing the vectors \(s_{k}\) and \(u_{k}\) along the diagonal. \(B_{k}, C_{k} \in I\!R^{m\times m}\) are positive diagonal regularization matrices, and \(I\) denotes the identity matrix. Moreover, \(e=(1,\ldots ,1)^T \in I\!R^m\) denotes the vector of ones. The barrier term \(\mu _{k}\), introduced in (4), is internally adapted and now depends on the iteration index \(k\).
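For illustration only (NLPIP itself is implemented in Fortran), the system (5) can be assembled and solved with dense linear algebra along the following lines; the callables and the scalar stand-ins `b_reg`, `c_reg` for the regularization matrices \(B_k\), \(C_k\) are hypothetical.

```python
import numpy as np

def primal_dual_step(x, s, u, mu, f_grad, g, g_jac, hess, b_reg=1e-8, c_reg=1e-8):
    """Assemble and solve the primal-dual system (5) for (dx, du, ds).

    f_grad(x): gradient of f, shape (n,)
    g(x):      constraint values, shape (m,)
    g_jac(x):  Jacobian of g, shape (m, n)
    hess(x,u): Hessian of the Lagrangian or an approximation, shape (n, n)
    b_reg, c_reg: scalar stand-ins for the diagonal matrices B_k, C_k
    """
    n, m = x.size, s.size
    H, J = hess(x, u), g_jac(x)
    S, U = np.diag(s), np.diag(u)

    # Coefficient matrix of (5), with B_k = b_reg*I and C_k = c_reg*I.
    K = np.block([
        [H,                J.T,                np.zeros((n, m))],
        [J,                -b_reg * np.eye(m), np.eye(m)       ],
        [np.zeros((m, n)), S,                  U               ],
    ])
    # Negative residual vector of (5).
    rhs = -np.concatenate([
        f_grad(x) + J.T @ u,
        g(x) + s - c_reg * u,
        s * u - mu * np.ones(m),
    ])
    d = np.linalg.solve(K, rhs)
    return d[:n], d[n:n + m], d[n + m:]   # dx, du, ds
```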

A step length \(\alpha _k>0\) along \(d_k=(d^x_k,d^s_k,d^u_k)\) is determined to achieve sufficient decrease of a merit function and to get the next iterate

$$\begin{aligned} \left( \begin{array}{c} x_{k+1} \\ s_{k+1} \\ u_{k+1} \end{array}\right) = \left( \begin{array}{c} x_{k} \\ s_{k} \\ u_{k} \end{array}\right) + \alpha _k \left( \begin{array}{c} d^x_{k} \\ d^s_{k} \\ d^u_{k} \end{array}\right) \!, \end{aligned}$$
(6)

where \(0 < \alpha _k \le 1\) and where \(s_{k+1}>0\) and \(u_{k+1}>0\) must be guaranteed by recursive reduction of \(\alpha _k\). The matrix \(H_k\) in (5) is either the Hessian matrix of the Lagrangian function \(L(x_k,u_k)\) or a corresponding quasi-Newton matrix updated in each step. Convergence is measured by evaluating the norm of

$$\begin{aligned} F\left( x_k,s_k,u_k\right) := \left( \begin{array}{c} \nabla f(x_k) + \nabla g(x_k)^Tu_k \\ g(x_k) + s_k \\ S_ku_k \end{array}\right) \!. \end{aligned}$$
(7)

In Sect. 2 we give a brief overview of the algorithm, and some numerical results are summarized in Sect. 3.

2 The combined SQP–IPM algorithm

In our situation, we proceed from an SQP algorithm, where the quadratic programming subproblem (3) is solved by an interior-point method for a fixed \(k\), i.e., we replace (3) by

$$\begin{aligned} \begin{array}{l}d^x\in {I\!R}^{n},\\ d^s \in I\!R^m\end{array} :\begin{array}{l} \min \;\; \frac{1}{2}{d^x}^{T}H(x_k,u_k)d^x + \nabla f\left( x_{k}\right) ^{T}d^x - \mu _k \sum _{j=1}^m \log \left( s_j^k+d^s_j\right) \\ g(x_{k})+\nabla g(x_{k})d^x + s_k + d^s = 0,\end{array} \end{aligned}$$
(8)

where \(d^x\) is the primal variable and \(s_k+d^s\) is considered as the slack variable to facilitate the subsequent notation. \(d^u\) now denotes the implicitly defined dual variable of (8). Moreover, we use the notation \(s_k=(s^k_1,\ldots ,s^k_m)^T\) and \(d^s=(d^s_1,\ldots ,d^s_m)^T\).

The approximate primal and dual solution returned by an IPM solver depends on an internal iteration index \(l\) and is denoted by \(d^x_{k,l}\), \(d^s_{k,l}\), and the multiplier approximation \(d^u_{k,l}\), respectively. They are supposed to converge towards a solution of the quadratic program (3). Note that we do not change the Hessian matrix \(H(x_k,u_k)\) during the inner iteration.

Thus, the overall algorithm consists of two nested loops identified by two iteration indices \(k\) and \(l\). By \(x_{k}\), \(s_{k}\), and \(u_{k}\) we denote the outer iterates of primal, slack, and dual variables, respectively, \(k=0,1,2,\ldots \). \(x_0\) is a user-provided starting point and \(u_0>0\), \(s_0>0\) are usually set internally by the algorithm. The multiplier and slack variables must satisfy \(u_k>0\) and \(s_k>0\) in all subsequent steps. Correspondingly, \(d^{x}_{k,l}\), \(d^{s}_{k,l}\), and \(d^{u}_{k,l}\) are the iterates of the inner cycle with \(s_k+d^{s}_{k,l}>0\) and \(d^{u}_{k,l}>0\), \(l=0,1,2,\ldots \). To get an SQP method, the inner loop continues until termination at an optimal solution subject to a small tolerance. The outer loop requires an additional line search along the direction obtained by the inner loop, to converge towards a stationary point.

On the other hand, a user may terminate the inner loop at any time, e.g., by setting a small value for the maximum number of iterations. Thus, it is not required to solve the quadratic programming problem exactly. This is useful, for example, when solving very large optimization problems with relatively fast function and gradient evaluations, where the time-consuming linear algebra operations of the internal QP solver should be avoided.

Applying again Newton’s method to the KKT optimality conditions of (8), we get a primal-dual system of linear equations in each iteration of the inner loop, formulated now in the primal and the dual space analogously to (5),

$$\begin{aligned}&\left( \begin{array}{ccc} H_{k} & \nabla g(x_{k})^{T} & 0\\ \nabla g(x_{k}) & -B_{k,l} & I\\ 0 & D^s_{k,l} & D^u_{k,l}\end{array}\right) \left( \begin{array}{c} \Delta d^{x}_{k,l}\\ \Delta d^{u}_{k,l}\\ \Delta d^{s}_{k,l}\end{array}\right) \nonumber \\&\qquad + \left( \begin{array}{c} H_{k}d^{x}_{k,l}+\nabla f(x_{k})+\nabla g(x_{k})^{T}d^u_{k,l}\\ g(x_{k}) + \nabla g(x_{k})d^{x}_{k,l} + s_k+d^s_{k,l} - C_{k,l}d^u_{k,l}\\ D^s_{k,l}d^u_{k,l} - \mu _{k,l}e\end{array}\right) = 0. \end{aligned}$$
(9)

Here, \(k\) denotes the outer iteration index, \(l\) an inner iteration index, and \(x_{k}\), \(s_{k}\), and \(u_{k}\) are the outer iterates. \(D^s_{k,l}\) and \(D^u_{k,l}\) are positive diagonal matrices containing the vectors \(s_k+d^s_{k,l}\) and \(d^u_{k,l}\) along the diagonal. \(B_{k,l}, C_{k,l} \in I\!R^{m\times m}\) are positive diagonal regularization matrices. Their choice depends on the merit function used to obtain a descent direction. For the \(l_2\)-merit function (16) and the flexible penalty function (18), for example, we adapt the regularization of Chen and Goldfarb [5] to our SQP–IPM algorithm,

$$\begin{aligned} B_{k,l}=C_{k,l}=\frac{\left\| g(x_{k})+\nabla g(x_{k})d_{k,l}^{x}+s_{k}+d_{k,l}^{s}\right\| _{2}}{r_{k}}I. \end{aligned}$$
(10)

\(r_k\) is a penalty parameter updated in the outer cycle to guarantee descent of a merit function. The corresponding update formulae depend on the merit function chosen and can be found in the references.
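In code, (10) amounts to a single scalar multiplying the identity matrix; a minimal sketch with hypothetical argument names:

```python
import numpy as np

def regularization_scalar(g, J, dx, s, ds, r):
    """Scalar defining B_{k,l} = C_{k,l} in (10): the Euclidean norm of the
    linearized constraint residual divided by the penalty parameter r_k."""
    return np.linalg.norm(g + J @ dx + s + ds) / r
```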

The barrier term \(\mu _{k,l}\) introduced in (9) is internally adapted and depends on the iteration indices \(k\) and \(l\). Convergence is obtained if the right-hand side of (9) is below a tolerance \(\varepsilon _k>0\).

After solving (9), we get new iterates

$$\begin{aligned} d^{x}_{k,l+1}&= d^{x}_{k,l}+\alpha _{k,l}\Delta d^{x}_{k,l},\nonumber \\ d^{s}_{k,l+1}&= d^{s}_{k,l}+\alpha _{k,l}\Delta d^{s}_{k,l},\\ d^{u}_{k,l+1}&= d^{u}_{k,l}+\alpha _{k,l}\Delta d^{u}_{k,l},\nonumber \end{aligned}$$
(11)

where \(\alpha _{k,l}\in (0,1]\) is a step length parameter. To simplify the analysis, we do not distinguish between a primal and a dual step length. By applying the fraction-to-the-boundary rule, we obtain an \(\alpha _{k,l}\) such that the inner iterates satisfy

$$\begin{aligned} \begin{array}{l} d^u_{k,l+1} \ge (1-\tau )d^u_{k,l},\\ d^s_{k,l+1} \ge (1-\tau )d^s_{k,l}, \end{array} \end{aligned}$$
(12)

with, e.g., \(\tau =0.995\).
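A sketch of this rule, applied to any componentwise positive vector \(v\) with step \(\Delta v\) (here \(d^u_{k,l}\) and correspondingly the slack iterates); the function name is illustrative:

```python
import numpy as np

def fraction_to_boundary(v, dv, tau=0.995):
    """Largest alpha in (0, 1] such that v + alpha*dv >= (1 - tau)*v
    holds componentwise, assuming v > 0; cf. rule (12)."""
    shrink = dv < 0.0                      # only decreasing components matter
    if not np.any(shrink):
        return 1.0
    return float(min(1.0, np.min(-tau * v[shrink] / dv[shrink])))
```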

However, the step length might still be too long and is reduced further to guarantee descent of a merit function

$$\begin{aligned} \tilde{\Phi }_{\mu ,r}\left( x,s,u,d^{x},d^{s},d^{u}\right) \!, \end{aligned}$$
(13)

where \(\mu \) is a barrier parameter and \(r\) is a penalty parameter which must be carefully chosen to guarantee a sufficient descent property, i.e., at least

$$\begin{aligned} \begin{array}{l} \tilde{\Phi }_{\mu _{k,l},r_k}\left( x_{k},s_{k},u_{k},d^{x}_{k,l+1}, d^{s}_{k,l+1}, d^{u}_{k,l+1}\right) \\ \quad \le \tilde{\Phi }_{\mu _{k,l},r_k}\left( x_{k},s_{k},u_{k},d^{x}_{k,l},d^{s}_{k,l},d^{u}_{k,l}\right) \!. \end{array} \end{aligned}$$
(14)

The merit function \(\tilde{\Phi }_{\mu ,r}\) is closely related to the merit function one has to apply in the outer cycle. Here, the step length parameter \(\alpha _k\) is adapted such that a sufficient descent property subject to a merit function \(\Phi _{\mu ,r}(x,s,u)\) is obtained, i.e., that we are able to find a penalty parameter \(r_{k}\) satisfying (14) and

$$\begin{aligned} \begin{array}{l} \Phi _{\mu _{k},r_{k}}\left( x_{k}+\alpha _k d^{x}_{k},s_{k}+\alpha _k d^{s}_{k},u_{k}+\alpha _k d^{u}_{k}\right) \\ \quad \le \Phi _{\mu _{k},r_{k}}\left( x_{k},s_{k},u_{k}\right) + \nu \;\alpha _k\nabla _{d_k}\Phi _{\mu _{k},r_{k}}\left( x_{k},s_{k},u_{k}\right) \end{array} \end{aligned}$$
(15)

where \(0<\alpha _k\le 1\) is a sufficiently small stepsize and \(\nu > 0\) is a given constant. Note that the directional derivative on the right-hand side of the inequality is always negative. \(\nabla _{d_k}\) denotes the directional derivative along \((d^{x}_k,d^s_k,d^u_k)\).

To give an example, we consider the \(l_2\)-merit function

$$\begin{aligned} \Phi _{\mu ,r}(x,s,u) := f(x) - \mu \sum _{i=1}^{m}\log s_{i} + r\left\| g(x)+s\right\| _{2} \end{aligned}$$
(16)

with \(\mu ,r\in I\!R\), see, e.g., Chen and Goldfarb [5], and neglect iteration indices for a moment. By replacing \(f(x)\) by \(\frac{1}{2}d^{T}Hd+\nabla f(x)^{T}d\) and \(g(x)\) by \(g(x)+\nabla g(x)d\), and by using \(s+d^s > 0\) as the slack variable for (3), we obtain

$$\begin{aligned} \tilde{\Phi }_{\mu ,r}\left( x,s,u,d^x,d^s,d^u\right)&= \nabla f(x)^Td^{x}+\frac{1}{2}{d^{x}}^THd^{x} - \mu \sum _{j=1}^{m} \log \left( s_{j}+d^s_j\right) \nonumber \\&+\, r \left\| g(x)+\nabla g(x)d^{x}+s+d^{s}\right\| _{2}, \end{aligned}$$
(17)

i.e., its counterpart used for solving the quadratic programming subproblem. Another possible merit function is the so-called flexible penalty function of Curtis and Nocedal [6],

$$\begin{aligned} \Phi _{\mu ,r}(x,s,u) = f(x)-\mu \sum _{i=1}^{m}\log s_{i}+\frac{\rho _{1}+\rho _{2}}{2}\rho _{3} + \min \left\{ \begin{array}{c} \rho _{1}(\Vert g(x)+s\Vert _{2}-\rho _{3})\\ \rho _{2}(\Vert g(x)+s\Vert _{2}-\rho _{3}) \end{array} \right\} , \end{aligned}$$
(18)

with \(r=(\rho _1,\rho _2,\rho _3)\), \(\rho _{1} \le \rho _{2}\), and \(\rho _{3}= \Vert g(x)+s\Vert _{2}\).

For \(\rho _{1}=\rho _{2}=r\), (18) and (16) are equivalent. Similarly to (17), the counterpart for solving the quadratic programming subproblem is derived. Note that neither merit function depends on the multiplier vector \(u \in I\!R^m\), in contrast to, e.g., the augmented Lagrangian merit function used by Schittkowski [21, 22], which has also been implemented.
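A literal transcription of (16) and its QP counterpart (17) into, e.g., Python may help to see how the acceptance tests (14) and (15) are evaluated in practice; all callables are hypothetical placeholders, and the flexible penalty function (18) is handled analogously:

```python
import numpy as np

def phi_l2(x, s, mu, r, f, g):
    """l2 merit function (16); requires s > 0 componentwise."""
    return f(x) - mu * np.sum(np.log(s)) + r * np.linalg.norm(g(x) + s)

def phi_l2_qp(x, s, dx, ds, mu, r, f_grad, g, g_jac, H):
    """QP counterpart (17), evaluated at an inner iterate (dx, ds);
    requires s + ds > 0 componentwise."""
    quad = f_grad(x) @ dx + 0.5 * dx @ H @ dx
    barrier = -mu * np.sum(np.log(s + ds))
    infeas = np.linalg.norm(g(x) + g_jac(x) @ dx + s + ds)
    return quad + barrier + r * infeas
```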

The algorithm is summarized now as follows:

Algorithm 1

Choose starting values \(x_{0}\), \(u_{0}\), and \(s_{0}\) with \(u_{0}>0\) and \(s_{0}>0\), \(\mu _{0,0}>0\), \(r_0>0\) and some internal constants. For \(k:=0,1,2,\ldots \)

  1. Evaluate function and gradient values at \(x_k\), i.e., \(f(x_k)\), \(g(x_k)\), \(\nabla f(x_k)\), and \(\nabla g(x_k)\).

  2. Check stopping criteria based on the KKT conditions (7). If satisfied, then return.

  3. Compute one or more penalty parameters \(r_k\) depending on the merit function under consideration.

  4. Choose starting values \(d^{x}_{k,0}\), \(d^{s}_{k,0}>0\), and \(d^{u}_{k,0}>0\).

  5. For \(l:=0,1,2,\ldots ,l_{max}\) do:

     (a) Determine a barrier parameter \(\mu _{k,l}\) and suitable scaling matrices \(B_{k,l}\) and \(C_{k,l}\), e.g., by (10).

     (b) Solve the primal-dual system of linear equations (9) and determine a step length parameter \(\alpha _{k,l} \in (0,1]\) which satisfies (12) and (14).

     (c) Compute new internal iterates \(d^{x}_{k,l+1}\), \(d^{s}_{k,l+1}\), and \(d^{u}_{k,l+1}\) by (11).

     (d) If the termination criteria for the QP (8) are satisfied, i.e., either the norm of the right-hand side of (9) is sufficiently small or \(l=l_{max}\), let \(\mu _k := \mu _{k,l}\), \(d^x_k := d^{x}_{k,l+1}\), \(d^s_k := d^{s}_{k,l+1}\), and \(d^u_k := d^{u}_{k,l+1}\), and break the for-loop.

  6. Find a step length \(\alpha _k\) such that the sufficient decrease property (15) is satisfied, e.g., by successive reduction of \(\alpha _k=1\). If necessary, compute new function values.

  7. Set \(x_{k+1}:=x_{k}+\alpha _k d^{x}_{k}\), \(s_{k+1}:=s_{k}+\alpha _k d^{s}_{k}\), and \(u_{k+1}:=u_{k}+\alpha _k d^{u}_{k}\), and go to step 1.

Here, \(l_{max}>0\) is a given maximum number of iterations of the inner cycle. In principle, \(l_{max}=1\) leads to an IPM, and a very large \(l_{max}\), say 100, leads to an SQP method. The larger the number of variables \(n\) is, the smaller \(l_{max}\) should be chosen. If, on the other hand, function evaluations are very costly or the problem functions are highly nonlinear, it is recommended to use a higher number of sub-iterations.

The primal and dual stepsize parameters are always greater than zero and less than or equal to one. Note that the feasibility condition (12) and the sufficient descent properties (14) and (15) are always satisfied for a sufficiently small stepsize due to the specific choice of the merit function, the barrier and penalty parameters, and especially the structure of the primal-dual system (9). This can be achieved, e.g., by successive reduction of \(\alpha _k\) until the corresponding inequalities are satisfied.
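The interplay of the two nested loops may become clearer from a compact sketch of Algorithm 1. This outline is not the NLPIP implementation: the penalty update in step 3 is an ad hoc placeholder, and the inner QP solver (steps 4–5), the merit function, and its directional derivative are passed in as hypothetical callables.

```python
import numpy as np

def sqp_ipm_outline(x0, s0, u0, mu0, oracle, qp_inner, merit, merit_slope,
                    l_max=10, eps=1e-8, k_max=500, nu=1e-4):
    """Structural sketch of Algorithm 1 (not the actual NLPIP code).

    oracle(x) -> (f, g, f_grad, g_jac)   # step 1: values and gradients
    qp_inner(...) -> (dx, ds, du, mu)    # steps 4-5: at most l_max Newton
                                         # steps on the inner system (9)
    merit, merit_slope                   # Phi and its directional
                                         # derivative, used in test (15)
    """
    x, s, u, mu = x0, s0, u0, mu0
    for k in range(k_max):
        f, g, f_grad, g_jac = oracle(x)                         # step 1
        kkt = np.concatenate([f_grad + g_jac.T @ u, g + s, s * u])
        if np.linalg.norm(kkt) <= eps:                          # step 2, cf. (7)
            return x, s, u
        r = max(1.0, 2.0 * np.linalg.norm(u))                   # step 3 (placeholder)
        dx, ds, du, mu = qp_inner(x, s, u, mu, f_grad, g, g_jac, r, l_max)
        alpha = 1.0                                             # step 6: line search
        while np.any(s + alpha * ds <= 0) or np.any(u + alpha * du <= 0):
            alpha *= 0.5                                        # keep s, u > 0, cf. (6)
        slope = merit_slope(x, s, u, dx, ds, du, mu, r)         # negative by construction
        while alpha > 1e-12 and (merit(x + alpha * dx, s + alpha * ds,
                                       u + alpha * du, mu, r)
                                 > merit(x, s, u, mu, r) + nu * alpha * slope):
            alpha *= 0.5                                        # cf. (15)
        x, s, u = x + alpha * dx, s + alpha * ds, u + alpha * du  # step 7
    return x, s, u
```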

The size of the primal-dual system (9) can be reduced by eliminating \(\Delta d_{k,l}^{s}\) to get a smaller reduced KKT system

$$\begin{aligned}&\left( \begin{array}{cc} H_k & \nabla g(x_k)^{T}\\ \nabla g(x_k) & -D^s_{k,l}{\left( D^u_{k,l}\right) }^{-1}-B_{k,l}\end{array}\right) \left( \begin{array}{c} \Delta d_{k,l}^{x}\\ \Delta d_{k,l}^{u}\end{array}\right) \nonumber \\&\quad +\left( \begin{array}{c} H_k d^{x}_{k,l} + \nabla f(x_k)+\nabla g(x_k)^{T}d^u_{k,l}\\ g(x_k)+\nabla g(x_k)d^{x}_{k,l} + \mu _{k,l} {\left( D^u_{k,l}\right) }^{-1}e - C_{k,l}d^u_{k,l}\end{array}\right) = 0 \end{aligned}$$
(19)

for determining \(\Delta d_{k,l}^{x}\) and \(\Delta d_{k,l}^{u}\). There are several strategies for updating the barrier parameter \(\mu _{k,l}\), e.g., the Mehrotra predictor-corrector method originally developed for linear programming; see Nocedal et al. [18] for the nonlinear programming formulae. In our tests, however, we leave the barrier parameter constant in the inner iterations, i.e., \(\mu _{k,l+1}=\mu _{k,l}\) for all \(k,l\in I\!N\). In the outer iterations, we set \(\mu _{k+1,0}=0.1\mu _{k,0}\) whenever the error of the KKT conditions is less than \(5\mu _{k,0}\).
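In code, this outer update rule is a one-liner; a sketch:

```python
def update_outer_barrier(mu, kkt_error):
    """Set mu_{k+1,0} = 0.1 * mu_{k,0} whenever the KKT error is below
    5 * mu_{k,0}; otherwise the barrier parameter is left unchanged."""
    return 0.1 * mu if kkt_error < 5.0 * mu else mu
```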

The matrix \(H_k\) in (9) or (19), respectively, could be the Hessian matrix of the Lagrangian function (2), if available. However, to satisfy the sufficient descent properties discussed before and to allow an efficient solution of the system of linear equations (9), \(H_k=\nabla _{xx} L(x_k,u_k)\) is modified by adding positive values to all entries along the diagonal, until \(H_k\) is positive definite.

Alternatively, it is possible to replace the Hessian matrix of the Lagrangian function by a quasi-Newton matrix. Since, however, standard update methods lead to fill-in, we apply a limited-memory BFGS update, see, e.g., Liu and Nocedal [8] or Waltz et al. [27]. The idea is to store only the last \(p\) pairs of vectors \(\nabla _{x}L(x_{k+1-i},u_{k-i})-\nabla _{x}L(x_{k-i},u_{k-i})\) and \(x_{k+1-i}-x_{k-i}\) for \(i=0,\ldots ,p-1\) with \(0<p\ll n\). These pairs of vectors are used to implicitly construct the matrix at \(x_{k+1}\) and \(u_{k+1}\). Instead of storing \(O(n^{2})\) double precision numbers for a full update, one has to keep only \(O(pn)\) numbers in memory.

To illustrate limited-memory BFGS updates in short, we omit the iteration index \(k\) for simplicity. The matrix then has the form \(H=\xi I+NMN^{T}\), where \(\xi >0\) is a scaling factor, \(N\) is an \(n\times 2p\) matrix, and \(M\) is a \(2p\times 2p\) matrix. \(M\) and \(N\) are directly computed from the \(p\) stored pairs of vectors and \(\xi \). To solve the linear system of equations (19) efficiently for different right-hand sides, we write the inverse of the coefficient matrix in (19) in a more tractable form

$$\begin{aligned} \left[ A+BC^{T}\right] ^{-1}&= \left[ \left( \begin{array}{cc} \xi I & \nabla g(x)^{T}\\ \nabla g(x) & -SU^{-1}\end{array}\right) +\left( \begin{array}{c} N\\ 0\end{array}\right) \left( \begin{array}{cc} MN^{T}&0 \end{array}\right) \right] ^{-1}\nonumber \\&= A^{-1}-A^{-1}B\left( \underbrace{I+C^{T}A^{-1}B}_{\in {I\!R}^{2p\times 2p}}\right) ^{-1}C^{T}A^{-1} \end{aligned}$$
(20)

by the Sherman-Morrison-Woodbury formula with

$$\begin{aligned} A:= \left( \begin{array}{cc} \xi I & \nabla g(x)^{T}\\ \nabla g(x) & -SU^{-1}\end{array}\right) ,\quad B:=\left( \begin{array}{c} N\\ 0\end{array}\right) ,\quad C^{T}:=\left( \begin{array}{cc} MN^{T}&0\end{array}\right) . \end{aligned}$$

Instead of solving (19) directly, we only have to solve systems \(Az=b\) with several different right-hand sides. The matrix \(I+C^{T}A^{-1}B\) is only of size \(2p\times 2p\) and can be inverted at negligible cost.
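Both ingredients are easily sketched. The first function below forms \(N\) and \(M\) from the stored pairs via the compact representation of Byrd, Nocedal, and Schnabel, which is one common choice (the exact variant used is not spelled out above); the second applies (20), where `A_solve` stands for a reusable factorization of \(A\) and is assumed to accept matrix right-hand sides.

```python
import numpy as np

def compact_lbfgs(S, Y, xi):
    """Compact limited-memory BFGS form H = xi*I + N M N^T, where the
    columns of S and Y hold the last p iterate and gradient differences."""
    StY = S.T @ Y
    L = np.tril(StY, k=-1)                       # strictly lower triangle of S^T Y
    D = np.diag(np.diag(StY))                    # diag(s_i^T y_i)
    N = np.hstack([xi * S, Y])                   # n x 2p
    W = np.block([[xi * (S.T @ S), L], [L.T, -D]])
    return N, -np.linalg.inv(W)                  # N, M

def smw_solve(A_solve, N, M, m, rhs):
    """Solve [A + B C^T] z = rhs via (20), with B = (N; 0), C^T = (M N^T  0)."""
    n, two_p = N.shape
    B = np.vstack([N, np.zeros((m, two_p))])     # (n+m) x 2p
    Ct = np.hstack([M @ N.T, np.zeros((two_p, m))])
    Ainv_rhs, Ainv_B = A_solve(rhs), A_solve(B)  # reuse one factorization of A
    small = np.eye(two_p) + Ct @ Ainv_B          # the 2p x 2p matrix in (20)
    return Ainv_rhs - Ainv_B @ np.linalg.solve(small, Ct @ Ainv_rhs)
```

In this way, a single factorization of \(A\) serves all \(2p+1\) right-hand sides, and only the small \(2p\times 2p\) system has to be solved anew.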

3 Numerical results

The combined SQP–IPM algorithm outlined in the previous section has been implemented in the form of a Fortran subroutine called NLPIP, see Sachsenberg and Schittkowski [19]. The \(l_2\)-merit function (16) is applied together with the regularization matrices (10). We solve all test problems with the same set of tolerances and parameters, which are internally stored as default values. The number of recursive limited-memory quasi-Newton updates is \(p=7\) unless stated otherwise. The KKT system (5), or any derived system of linear equations, is solved by LAPACK [1] in case of the small test problems and by PARDISO otherwise, see, e.g., Schenk and Gärtner [20]. The Fortran codes are compiled by the Intel Visual Fortran Compiler, Version 11.0, 64 bit, under Windows 7, and executed on an Intel(R) Core(TM) i7-2720QM CPU, 2.2 GHz, with 8 GB RAM.

3.1 Elliptic optimal control problems with control and state constraints

Maurer and Mittelmann [16, 17] published numerical results to compare some large-scale optimization codes on a certain class of test problems obtained by discretizing semi-linear elliptic optimal control problems with control and state constraints. The two-dimensional elliptic equations are discretized by a scalable rectangular grid of size \(N=100\) in x- and y-direction, where the following abbreviations are used in Table 1:

Table 1 Test results for semilinear elliptic control problems

problem: test problem identifier,

\(n\): number of optimization variables,

\(m_e\): number of equality constraints,

\(n_{func}\): number of function evaluations,

\(n_{grad}\): number of gradient evaluations,

\(f(x^\star )\): objective function value at the termination point \(x^\star \),

time: total CPU time in seconds.

Problems EX1 to EX8 correspond to examples 5.1–5.8 of Maurer and Mittelmann [16], and problems EX9–EX13 to examples 1–5 of Maurer and Mittelmann [17]. All optimal objective function values shown in Table 1 coincide with those presented by the authors of the papers mentioned, apart from some differences caused by different discretization accuracies.

One function evaluation consists of the computation of one objective function value and all constraint function values. Derivatives are available in analytical form, and the termination accuracy is set to \(10^{-8}\). The number of internal iterations is set to \(l_{max}=1\). We observe rapid convergence within a quite low number of iterations, which is not affected by doubling the number of variables for EX9 to EX13.

3.2 Small and dense HS-problems

Moreover, we evaluate the performance of NLPIP on the set of 306 small-scale, but highly nonlinear test problems of Hock and Schittkowski [15, 23], and compare the results to NLPQLP, a dense implementation of an SQP method, see Schittkowski [21, 22, 24]. The latter reference contains a detailed description of the test environment.

Two-sided differences are used for approximating derivatives, and the termination accuracy is set to \(10^{-5}\). In Table 2 we present some results for \(p=7\) and \(p=70\). \(n_{succ}\) is the number of successful solutions satisfying the KKT conditions subject to a predetermined tolerance of \(10^{-8}\) and approaching the known optimal solution subject to a relative error of one percent. \(n_{func}\) is the number of function evaluations, and \(n_{grad}\) is the number of derivative evaluations, in both cases counting the objective function and all constraint functions simultaneously. Since many of the test problems are highly nonlinear, the number of internal iterations is set to \(l_{max}=100\). In other words, we apply an SQP method in this case.

Table 2 Average test results for 306 Hock-Schittkowski problems

Called with a larger number of BFGS updates, NLPIP needs about the same number of iterations and function evaluations as NLPQLP. Since the quadratic programming subproblem is solved iteratively, average computation times are higher than those of NLPQLP, especially due to the large number of limited-memory updates.

3.3 CUTEr collection

A large number of test problems for nonlinear programming has been collected and programmed by Gould, Orban, and Toint [13]. The library is widely used for developing and testing optimization programs and consists of small- and large-scale problems. Derivatives are available in analytical form. For our purposes, we select 35 test problems with 5000 or more variables. The problems are identified by their internal names. With five exceptions, they possess only equality constraints; the exceptions are CAR2 (\(m=4996\)), CORKSCRW (\(m=35{,}000\)), COSHFUN (\(m=2000\)), OPTMASS (\(m=50{,}005\)), and SVANBERG (\(m=50{,}000\)).

Numerical results are listed in Table 3 for termination accuracy \(10^{-8}\) and \(l_{max}=1\), i.e., only one inner iteration is allowed for solving the quadratic subproblems, and we thus apply an interior point method. In two cases, the maximum number of iterations (500) is reached; \(g^-(x^\star )\) shows the constraint violation at termination. During two other test runs, the code stops after more than 20 feasible iterates without a reduction of the objective function value.

Table 3 Test results for CUTEr-problems