1 Introduction

In this paper, we consider an equality constrained nonlinear optimization problem \(\mathcal P_{\text {NLP}}\) of the form

$$ \begin{array}{@{}rcl@{}} \min\limits_{\boldsymbol{x}} && f (\boldsymbol{x}) \\ \text{subject to} && g_{j} (\boldsymbol{x}) = 0 , \qquad j = 1, 2, \ldots, m, \\ &&l_{i} \leq x_{i} \leq u_{i}, \qquad i = 1,2, \ldots, n, \end{array} $$
(1)

where \(f(\boldsymbol{x})\) is a real-valued scalar objective function and the \(g_{j}(\boldsymbol{x})\), \(j = 1, 2, \ldots, m\), are \(m\) equality constraint functions, which depend on the \(n\) real (design) variables \(\boldsymbol{x} = \{x_{1}, x_{2}, \ldots, x_{n}\}^{T} \in \mathcal{X} \subset \mathcal{R}^{n}\), with \(l_{i}\) and \(u_{i}\), respectively, the lower and upper bounds on variable \(x_{i}\). The functions \(f(\boldsymbol{x})\) and \(g_{j}(\boldsymbol{x})\) are assumed to be (at least) once continuously differentiable.

Although many, or possibly even most, interesting problems in structural optimization are normally formulated using inequality constraints only, it is convenient to herein use only equality constraint functions, for reasons that will become clear in the sections to follow. Accordingly, we will assume that any inequality constraints present may be reformulated as equality constraints with the aid of so-called slack variables \(s_{j}\). In this approach, an inequality constraint \(g_{j}(\boldsymbol{x}) \leq 0\) is rewritten as

$$ g_{j} (\boldsymbol{x}) + s_{j} = 0, $$

subject to \(s_{j} \geq 0\). Similarly, a separable inequality constraint \(g_{ji}(x_{i}) \leq 0\) can be rewritten as

$$ g_{ji} (x_{i}) + s_{ji} = 0, $$

subject to \(s_{ji} \geq 0\). This simple technique is not only well known but also often used, even in successful commercial codes; e.g., see Nocedal and Wright (2006) and Conn et al. (1992). For the sake of brevity, we will in the following not mention the presence of the slack variables \(s_{j}\) and \(s_{ji}\) for inequality constraints; their use is implied.
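
As a simple illustration (not taken from the test set), a normalized stress constraint of the form \(\sigma(\boldsymbol{x})/\bar\sigma - 1 \leq 0\) would become

$$ \frac{\sigma(\boldsymbol{x})}{\bar\sigma} - 1 + s = 0, \qquad s \geq 0, $$

so that the equality-only form (1) is recovered at the cost of one additional bounded variable per inequality constraint.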

Arguably, the state of the art in structural optimization is to use a sequential approximate optimization (SAO) algorithm to solve problem \(\mathcal P_{\text {NLP}}\). SAO relies on the iterative solution of a sequence of approximate optimization problems \(\mathcal P_{\mathrm {P}}[k]\), \(k = 0, 1, 2, \ldots\). In turn, the approximate optimization problems, or subproblems, are normally based on approximation functions \(\tilde g_{j} (\boldsymbol {x})\) which have a relatively simple structure, albeit that they may all be nonlinear. Then, at some iterate \(\boldsymbol{x}^{k}\), we obtain the continuous primal approximate subproblem \(\mathcal P_{\mathrm {P}}[k]\), written as

$$ \begin{array}{@{}rcl@{}} \min\limits_{\boldsymbol{x}} && \tilde f (\boldsymbol{x}) \\ \text{subject to} && \tilde g_{j} (\boldsymbol{x}) = 0 , \qquad j = 1, 2, \ldots, m , \\ && l_{i} \leq x_{i} \leq u_{i}, \quad i = 1,2, \ldots, n . \end{array} $$
(2)

This primal problem contains n unknowns, m equality constraints, and 2n side or bound constraints (not counting any slack or relaxation variables that may have been introduced). The side constraints are normally handled in an efficient way which does not require additional computational effort.

In SAO, a most notable feature of primal approximate subproblem \(\mathcal P_{\mathrm {P}}[k]\) is that the approximation functions \(\tilde f (\boldsymbol {x})\) and \(\tilde g_{j} (\boldsymbol {x})\) are separable. Possibly surprising at first, this is routinely done since the evaluation and storage of second-order information in structural optimization is considered prohibitively expensive on the computational devices available to us today. Instead, so-called intermediate or intervening variables are relied upon which, when substituted into a linear Taylor series expansion, reveal behavior that is representative of the underlying physics of nonlinear optimization problem \(\mathcal P_{\text {NLP}}\) (assuming that the physics is understood in the first place). The approximations then become linear in the intervening variables used.

In structural optimization, for example, the reciprocal intervening variables \(z_{i}=x_{i}^{-1}\) are important and often used, since they capture the inverse relationship between stress and area well. Invariably, the intervening variables used are themselves separable; when substituted into a linear or first-order Taylor series expansion, this in turn results in separable approximations and hence separable approximate subproblems \(\mathcal P_{\mathrm {P}}[k]\).
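
For reference, substituting \(z_{i} = x_{i}^{-1}\) into a first-order Taylor series expansion about \(\boldsymbol{x}^{k}\) gives the familiar reciprocal approximation

$$ \tilde f (\boldsymbol{x}) = f(\boldsymbol{x}^{k}) + \sum\limits_{i=1}^{n} \left( \frac{\partial f}{\partial x_{i}}\right)^{k} \frac{x_{i}^{k}}{x_{i}} \left( x_{i} - x_{i}^{k}\right), $$

which is nonlinear in the \(x_{i}\), but linear in the intervening variables \(z_{i}\), and separable.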

Examples of popular algorithms that use separable approximations include the convex linearization algorithm (CONLIN) of Fleury and Braibant (1986) and its generalization, the method of moving asymptotes (MMA) of Svanberg (1987b, 1995). Groenwold and Etman (2011) have proposed the use of separable diagonal quadratic approximations, which rely on the so-called “approximated approximations” approach to capture reciprocal-like behavior (Groenwold et al. 2010). In this approach, the quadratic approximation to the reciprocal approximation itself is constructed, or even the quadratic approximation to the CONLIN and MMA approximations already mentioned.

Given the dominance of separable approximations, it is convenient to rewrite subproblems (2) as

$$ \begin{array}{@{}rcl@{}} \min\limits_{\boldsymbol{x}} && \sum\limits_{i=1}^{n} \tilde f_{i} (x_{i})\\ \text{subject to} && \sum\limits_{i=1}^{n} \tilde g_{ji} (x_{i}) = 0 , \qquad j = 1, 2, \ldots, m , \\ && l_{i} \leq x_{i} \leq u_{i}, \qquad\quad i = 1,2, \ldots, n . \end{array} $$
(3)

It is prudent to mention here that the SAO subproblems used in structural optimization are often solved in the dual space, which is unproblematic from a theoretical point of view if the approximations are convex and separable (although solution of the subproblems may still be demanding). The use of pure dual methods is particularly popular when only inequality constraints are present.

Of course, primal separability does not imply that the associated dual problem often favored in structural optimization will be separable; in general, this is indeed not the case. Hence, in pure dual methods, we have the disappointing situation that even though the primal approximations are all separable, the maximization over the dual variables is not separable. (Notwithstanding, the primal-dual relationships may of course benefit from separability if the approximations are simple enough.)

From a loss of separability point of view, augmented Lagrangian (AL) methods or sequential unconstrained minimization techniques (SUMT) do not help. Consider for example the popular augmented Lagrangian statement due to Rockafellar (1973):

$$ \tilde L_{\rho} (\boldsymbol{x} , \boldsymbol{\mu}) = \tilde f (\boldsymbol{x}) + \sum\limits_{j=1}^{m} \mu_j \tilde g_j (\boldsymbol{x}) + \frac{\rho}{2} \sum\limits_{j=1}^{m} \left( \tilde g_j (\boldsymbol{x}) \right)^2 , $$
(4)

where the \(\mu_{j}\) represent the Lagrange multipliers, \(\tilde g_{j} (\boldsymbol {x})\) the separable constraint approximation functions, and \(\rho\) a nonzero (positive) penalty parameter (it is in fact common practice to use \(m\) penalty parameters \(\rho_{j}\)). For subproblem \(k\), the optimal pair \((\boldsymbol{x}^{k}, \boldsymbol{\mu}^{k})\) is found by iteratively minimizing (4) w.r.t. \(\boldsymbol{x}\) with \(\rho\) and \(\boldsymbol{\mu}\) held fixed, and then updating the multipliers \(\boldsymbol{\mu}\) and the penalty parameter \(\rho\), the latter often conditionally, until convergence occurs on the subproblem level. The multipliers are updated using the famous Hestenes-Powell formula; e.g., see Hestenes (1969) and Powell (1969). The minimization of (4) w.r.t. \(\boldsymbol{x}\) may be done using any suitable minimizer that is able to accommodate the bound constraints \(\boldsymbol{l}, \boldsymbol{u}\). Unfortunately, a disappointing if obvious feature of (4) is that the separable nature of primal approximate subproblem \(\mathcal P_{\mathrm {P}}[k]\) is not preserved, due to the effects of the squared terms. Penalty-based SUMT suffer from the same drawback.
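
As a minimal sketch of this classical loop (not the SALA scheme of Section 3), assuming generic callables `f` and `g` and SciPy's bound-constrained L-BFGS-B as the inner minimizer, the structure is roughly:

```python
import numpy as np
from scipy.optimize import minimize

def classical_al(f, g, x0, bounds, rho=10.0, outer_iters=20):
    """Rough sketch of the augmented Lagrangian (4) with the
    Hestenes-Powell multiplier update; f: R^n -> R, g: R^n -> R^m,
    bounds: sequence of (l_i, u_i) pairs."""
    x = np.asarray(x0, dtype=float)
    mu = np.zeros(len(g(x)))
    for _ in range(outer_iters):
        def L(x):                       # the statement (4), rho and mu fixed
            gv = g(x)
            return f(x) + mu @ gv + 0.5 * rho * gv @ gv
        x = minimize(L, x, method="L-BFGS-B", bounds=bounds).x
        mu = mu + rho * g(x)            # Hestenes-Powell update
    return x, mu
```

Note that the squared terms in `L` couple the variables \(x_{i}\) even when \(\tilde f\) and the \(\tilde g_{j}\) are separable, which is precisely the drawback discussed above.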

Enter the so-called separable augmented Lagrangian algorithm, or SALA, popularized by Hamdi et al. (1997), Hamdi and Mahey (2000), Hamdi (2005a, 2005b, 2005c), Boyd et al. (2010), and many others. The SALA framework preserves the separable nature of the subproblems: the minimizations over the primal variables \(x_{i}\) result in \(n\) uncoupled searches, which is an embarrassingly parallel operation, while the \(m\) dual variable updates are also uncoupled. What is more, the separable nature raises the question of whether it is possible to find the primal minimizers in closed form. For subproblems that are simple enough, this is indeed the case; this includes the important class of separable quadratic programs (QPs).

Although the SALA paradigm has, to the author's knowledge, not been applied in structural optimization, such algorithms are quite general and are receiving some attention in the mathematical programming community. A SALA may be considered an extension of proximal decomposition methods and derives from the class of splitting algorithms of Douglas and Rachford (1956) and Lions and Mercier (1984); such methods are often known as alternating directions–type methods. An immediate word of warning though: although the SALA framework seems very attractive, given the combination of uncoupled updates of both the primal and dual variables with the dominance of separable approximations in optimal structural design, algorithm SALA suffers from sensitivity to a subproblem scaling parameter. To address this, we will herein use an update strategy for this parameter inspired by recent efforts of Lenoir and Mahey (2007) and Boyd et al. (2010), combined with a function and constraint scaling strategy inspired by the ALGENCAN solver (Birgin and Martínez 2007).

What is more, to benefit fully from the separable nature of algorithm SALA, suitable hardware like general-purpose graphics processing units (GPGPUs) should be exploited, but we will not do so herein. Without this option, it is not clear if algorithm SALA will be competitive with the classical dual methods currently so popular in optimal structural design. Nevertheless, application of algorithm SALA to problems in optimal structural design is interesting in its own right and, in addition, seems to have educational value. Our paper makes a few salient contributions: algorithm SALA is free from any line search, and the primal and dual updates are embarrassingly parallel. Indeed, the primal and dual variable updates are available in closed form. The importance of closed-form primal and dual updates cannot be overstated in the parallel context, where search methods can be difficult to implement on a massively parallel scale. Furthermore, the algorithm can easily exploit the use of intervening variables, which are popular and proven in structural optimization.

Our paper is arranged as follows: In Section 2, we present some diagonal quadratic approximations that are necessary for the separability of SALA. In Section 3, we outline the alternating directions–type method that we rely upon. We again emphasize that we rely on (strictly) convex approximations in doing so. We then proceed with numerical experiments in Section 4, where we compare SALA with the well-known ALGENCAN (Birgin and Martínez 2007) and the Gould et al. (2003) LSQP solvers, followed by conclusions and recommendations in Section 5.

2 Some diagonal quadratic approximations

The approximate subproblems used are based on an incomplete series expansion (ISE) (Groenwold and Etman 2008). We construct approximations \(\tilde f_{j}^{k}(\boldsymbol{x})\) to the objective function \(f_{0}(\boldsymbol{x})\) and all the constraint functions \(f_{j}(\boldsymbol{x})\), \(j = 1, 2, \ldots, m\), at the point \(\boldsymbol{x}^{k}\), such that

$$ \tilde {f_{j}^{k}} (\boldsymbol{x}) = {f_{j}^{k}} + \boldsymbol{\nabla}^{T} {f_{j}^{k}} \boldsymbol{s} + \frac{1}{2} \boldsymbol{s}^{T} {\boldsymbol{C}^{k}_{j}} \boldsymbol{s} , $$
(5)

with \(\boldsymbol{s} = (\boldsymbol{x} - \boldsymbol{x}^{k})\) and each \({\boldsymbol {C}_{j}^{k}}\) an appropriate approximate diagonal Hessian matrix, for \(j = 0, 1, 2, \ldots, m\). We will occasionally use the abbreviated notation

$$ {f_{j}^{k}} = f_{j} (\boldsymbol{x}^{k}). $$

For the sake of clarity, we rewrite (5) in terms of its individual components as

$$ \tilde {f_{j}^{k}} (\boldsymbol{x}) = {f_{j}^{k}} + \sum\limits_{i=1}^{n} \left( \frac{\partial f_{j} }{\partial x_{i}}\right)^{k} (x_{i} - {x_{i}^{k}}) + \frac{1}{2} \sum\limits_{i=1}^{n} c_{2i_{j}}^{k} (x_{i} - {x_{i}^{k}})^{2}, $$
(6)

with \(c_{2i_{j}}^{k}\) the approximate second-order diagonal Hessian terms or curvatures. We will herein consider two very simple instances of (6), in which the curvatures are chosen as follows:

  1. Such that the approximate function value at the previous iterate \(\tilde f^{k-1}\) is equal to the real function value \(f^{k-1}\) at the previous iterate, being a spherical quadratic approximation (denoted SPH-QDR)

  2. Such that the approximation becomes the quadratic approximation to the reciprocal approximation (denoted T2:R), being closely related to the very popular MMA (Svanberg 1987a) approximations

While many other possibilities exist, we outline only the above approximations in the Appendix. For the sake of brevity, the reader is referred to Groenwold et al. (2010) and the references therein for details about some other possibilities.
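
For orientation only (the forms actually used, including any safeguarding or bounding of the curvatures, are those given in the Appendix and the cited references), the two conditions above lead to curvatures of the well-known forms

$$ c_{2i_{j}}^{k} = \frac{2 \left[ f_{j}(\boldsymbol{x}^{k-1}) - f_{j}(\boldsymbol{x}^{k}) - \boldsymbol{\nabla}^{T} {f_{j}^{k}} \left( \boldsymbol{x}^{k-1} - \boldsymbol{x}^{k}\right) \right]}{\| \boldsymbol{x}^{k-1} - \boldsymbol{x}^{k} \|^{2}} \qquad \text{(SPH-QDR, identical for all } i), $$

and

$$ c_{2i_{j}}^{k} = -\frac{2}{x_{i}^{k}} \left( \frac{\partial f_{j}}{\partial x_{i}} \right)^{k} \qquad \text{(T2:R)}. $$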

Since the approximations (6) are (diagonal) quadratic, primal problem \(\mathcal P_{\text {NLP}}\) may trivially be transformed into a sequence of quadratic approximate programs \(\mathcal P_{\text {PQ}}[k]\), written as

  • Quadratic approximate program \(\mathcal P_{\text {PQ}}[k]\)

    $$ \begin{array}{@{}rcl@{}} \min\limits_{\boldsymbol{x}} && \bar {f_{0}^{k}} (\boldsymbol{x}) = {f_{0}^{k}} + \boldsymbol{\nabla}^{T} {f_{0}^{k}} \boldsymbol{s} + \frac{1}{2} \boldsymbol{s}^{T} \boldsymbol{Q}^{k} \boldsymbol{s} \\ \text{subject to} && \bar {f_{j}^{k}} (\boldsymbol{x}) = {f_{j}^{k}} + \boldsymbol{\nabla}^{T} {f_{j}^{k}} \boldsymbol{s} = 0 , \qquad\qquad j =1,2, \dots, m, \\ &&\check x_{i} \leq x_{i} \leq \hat x_{i}, \qquad\qquad\qquad\qquad\qquad i = 1,2, \ldots, n, \end{array} $$
    (7)

    with \(\boldsymbol{s} = (\boldsymbol{x} - \boldsymbol{x}^{k})\) and \(\boldsymbol{Q}^{k}\) the Hessian matrix of the approximate Lagrangian \(\mathcal L^{k}\). For details, the reader is referred to Etman et al. (2009, 2012). Since the Lagrange multipliers \(\boldsymbol{\mu}^{k}\) and \(\boldsymbol{\lambda}^{k}\) at the solution of subproblem \(\mathcal P_{\text {PQ}}[k]\) are not yet known, the multiplier estimates obtained at the solution of the previous subproblem are used to construct the quadratic program. Hence,

    $$ Q_{ii}^{k} = c_{2i_0}^{k} + \sum\limits_{j \in \mathcal{E}} \mu_j^{k} c_{2i_j}^{k} + \sum\limits_{j\in \mathcal{I}} \lambda_j^{k} c_{2i_j}^{k}, $$
    (8)

    and \(Q_{id}^{k} = 0 \ \forall \ i \ne d\), with \(i, d = 1, 2, \ldots, n\). For clarity, both the equality and inequality terms are given in (8), because we do not use slack variables for the inequalities in the Gould et al. (2003) solver LSQP, as this would put this specific solver at a disadvantage.

We require the Hessian \(\boldsymbol{Q}^{k}\) of the approximate Lagrangian \(\mathcal L^{k}\) to be positive (semi-)definite. Since \(\boldsymbol{Q}^{k}\) is diagonal, positive definiteness simply requires the individual diagonal elements \(Q_{ii}^{k}\) to be positive. As \({\mu _{j}^{k}}\), \(c_{2i_{0}}\) and \(c_{2i_{j}}\) are in principle unrestricted in sign, we enforce

$$ Q_{ii}^{k} \leftarrow \max\left( \epsilon , Q_{ii}^{k} \right), $$
(9)

with \(\epsilon > 0\) prescribed and “small.”
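
As an illustration only (array names are hypothetical), the assembly of the diagonal entries of \(\boldsymbol{Q}^{k}\) per (8), together with the safeguard (9), amounts to a few lines:

```python
import numpy as np

def assemble_diag_Q(c2_obj, c2_con, multipliers, eps=1e-8):
    """Diagonal of the approximate Lagrangian Hessian, cf. (8)-(9).
    c2_obj: (n,) objective curvatures c_{2i_0}; c2_con: (m, n) constraint
    curvatures c_{2i_j}; multipliers: (m,) multiplier estimates (equality
    and inequality multipliers concatenated, for brevity)."""
    Q_diag = c2_obj + multipliers @ c2_con   # Q_ii per (8)
    return np.maximum(Q_diag, eps)           # enforce positivity, (9)
```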

3 Alternating directions–type methods

We proceed with a brief outline of the salient features of algorithm SALA; for details, the reader is referred to the cited literature in the mathematical programming community. Here, we closely follow the presentation of Lenoir and Mahey (2007). In essence, algorithm SALA reformulates separable problem (3) with the help of the allocation subspace

$$ {\mathcal A} = \left\{ y_{ji} \in {\mathcal R}^{m n} / \sum\limits_{i=1}^{n} y_{ji} = 0, j=1,2,\ldots,m \right\}. $$
(10)

Then, the allocation of the resource variables \(y_{ji} = g_{ji}(x_{i})\), for \(i = 1, 2, \ldots, n\) and \(j = 1, 2, \ldots, m\), results in an embedded formulation of problem (3) with a distributed coupling, written as

$$ \begin{array}{llll} \min\limits_{\boldsymbol{x}} \ & \sum\limits_{i=1}^{n} f_{i} (x_{i}) \\ \text{subject to} \ \ & y_{ji} = g_{ji} (x_{i}) , \\ & y_{ji} \in {\mathcal A}, \\ & l_{i} \leq x_{i} \leq u_{i}, \end{array} $$
(11)

for i = 1,2,…,n and j = 1,2,…,m.

The approximate augmented Lagrangian \(\tilde{\mathcal L}_{\rho}\) with penalty parameter \(\rho > 0\), obtained by associating the multipliers \(\mu_{ji}\) with the allocation constraints \(y_{ji} = \tilde g_{ji} (x_{i})\), decomposes into the sum

$$ \tilde {\mathcal L}_{\rho} (\boldsymbol{x},\boldsymbol{y},\boldsymbol{\mu}) = \sum\limits_{i=1}^{n} \tilde {\mathcal L}_{\rho,i} (x_i,y_{ji},\mu_{ji}), $$
(12)

with

$$ \begin{array}{@{}rcl@{}} \tilde {\mathcal L}_{\rho,i} (x_i,y_{ji},\mu_{ji}) &=& \tilde f_{i}(x_i) + \sum\limits_{j=1}^{m} \mu_{ji} (\tilde g_{ji}(x_i) - y_{ji} )\\ &&+ \frac{\rho}{2} \sum\limits_{j=1}^{m} (\tilde g_{ji}(x_i) - y_{ji} )^2. \end{array} $$
(13)

The stationary point of \(\tilde {\mathcal L}_{\rho , i}(x_{i},y_{ji},\mu _{ji})\) is obtained via successive minimizations over \(x_{i}\) and \(y_{ji}\) in a Gauss–Seidel fashion, so as to exploit the separability of (11). The minimizations over the \(x_{i}\) yield the \(n\) independent subproblems

$$ \min\limits_{x_{i}} \tilde {\mathcal L}_{\rho,i} (x_{i},y_{ji}^{k},\mu_{ji}^{k}), $$

which can be done in parallel. Since the \(\mu_{ji}\) are in \({\mathcal A^{\perp }}\), with \({\mathcal A}\) and \({\mathcal A^{\perp }}\) mutually orthogonal, we obtain the updates for the \(y_{ji}\) as

$$ y_{ji}^{l+1} = \tilde g_{ji}(x_i^{l+1}) - \frac{1}{n} \tilde g_j(\boldsymbol{x}^{l+1}), $$
(14)

again, see Lenoir and Mahey (2007) and Boyd et al. (2010) for details and proofs.

Subspace \({\mathcal A^{\perp }}\) has the explicit formulation

$$ {\mathcal A^{\perp}} = \left\{ \mu_{ji} \in {\mathcal R}^{m n} / \mu_{j1} = \mu_{j2} = {\ldots} = \mu_{jn}, j = 1,2, \ldots ,m \right\}. $$
(15)

So, at every iteration \(l\), knowledge of the \(\mu_{ji}\) reduces to knowledge of their common components \(\nu_{j} = \mu_{j1} = \mu_{j2} = \ldots = \mu_{jn}\), \(j = 1, 2, \ldots, m\), and the update step for the \(\nu_{j}\) becomes

$$ \nu_{j}^{l+1} = {\nu_{j}^{l}} + \frac{\rho}{n} \tilde g_j(\boldsymbol{x}^{l+1}), $$
(16)

again, see the cited literature for details. The complete resulting algorithm SALA, without the trivial objective function and constraint scaling given in (21), is listed in Algorithm 1. Note that not only are the \(n\) uncoupled minimizations over the \(x_{i}\) embarrassingly parallel, but updating the \(y_{ji}\) and \(\nu_{j}\) is embarrassingly parallel as well.

We impose subproblem convergence criteria on the step sizes made by both the primal and the dual variables, as well as a limit on the maximum number of subproblem evaluations.

Algorithm 1 (the complete SALA listing is not reproduced here).

3.1 Closed-form expressions for QP-like problems

If the functions \(\tilde g_{ji}\) are simple enough, the \(n\) one-dimensional minimizations over the \(x_{i}\) may even be done in closed form. For a separable QP-like subproblem, this is indeed the case. Let

$$ \tilde f (x_i) = a + b_i (x_i-x_i^k) + \frac{1}{2}c_i (x_i-x_i^k)^2, $$
(17)

with \(\boldsymbol{b}\) and \(\boldsymbol{c}\) given \(n\)-vectors, and

$$ \tilde g_{j} (x_i) = d_{ji} + e_{ji} (x_i-x_i^k), $$
(18)

again with \(\boldsymbol{d}_{j}\) and \(\boldsymbol{e}_{j}\) given \(n\)-vectors. In order to satisfy (3), we have chosen to distribute each constraint value \(g_{j}(\boldsymbol{x}^{k})\) equally among its \(n\) separable functions, such that

$$ d_{ji} = g_j (\boldsymbol{x}^{k})/n. $$
(19)

We have now resorted to summation convention, in which a sum over repeated indices in a term is implied; i.e., we sum over \(i\) in (17) and (18) above. Here, the \(c_{i}\) represent curvature information of the objective and the constraint functions (see Etman et al. (2009, 2012)), and we assume strictly convex primal approximate subproblems \(\mathcal P_{\mathrm {P}}[k]\). Then, the stationary conditions of (13) w.r.t. \(x_{i}\) result in

$$ x_i^{l+1} = x_i^{k} - \left( c_i + \rho e_{ji}^2 \right)^{-1} \left( b_i + \nu_j e_{ji} - \rho y_{ji} e_{ji} + \rho d_{ji} e_{ji} \right), $$
(20)

and we sum over \(j\). Note that the denominator cannot vanish, since \(c_{i} > 0\) and \(\rho > 0\) (although \(e_{ji} = 0\) is possible in sparse problems). Line 4 in Algorithm 1 may thus be replaced by the very simple closed-form expression (20). The relation between (6), (17), and (18) is clear; also see Etman et al. (2009, 2012).
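
To make the inner loop concrete, the following is a minimal sketch under the assumptions stated above (strictly convex QP-like data, equality constraints only). The function and variable names are illustrative and not the authors' implementation; the bound handling by simple clipping of the closed-form minimizer (which is exact for a separable strictly convex quadratic), the initialization, and the stopping test are added assumptions, whereas Algorithm 1 also bounds the multipliers and limits the number of subproblem evaluations.

```python
import numpy as np

def sala_subproblem(b, c, d, e, xk, lower, upper,
                    rho=1.0, tol=1e-6, max_iter=50000):
    """Sketch of the SALA inner loop for the separable QP-like subproblem
    (17)-(18), using the closed-form updates (20), (14) and (16).
    b, c: (n,) objective data; d, e: (m, n) constraint data, with
    d[j, :] = g_j(x^k)/n as in (19); xk, lower, upper: (n,) arrays."""
    m, n = e.shape
    x, y, nu = xk.copy(), np.zeros((m, n)), np.zeros(m)
    for _ in range(max_iter):
        x_old, nu_old = x.copy(), nu.copy()
        # primal update (20): one independent (parallel) solve per x_i,
        # clipped onto the box [lower, upper]
        num = b + nu @ e + rho * np.sum((d - y) * e, axis=0)
        den = c + rho * np.sum(e * e, axis=0)
        x = np.clip(xk - num / den, lower, upper)
        # separable constraint values (18) and their sums over i
        g = d + e * (x - xk)          # g[j, i] = g_ji(x_i)
        gj = g.sum(axis=1)            # g_j(x) = sum_i g_ji(x_i)
        # resource allocation update (14) and dual update (16)
        y = g - gj[:, None] / n
        nu = nu + (rho / n) * gj
        if (np.linalg.norm(x - x_old) < tol and
                np.linalg.norm(nu - nu_old) < tol):
            break
    return x, nu
```

The primal update is a single vectorized expression over all \(i\), and the \(y\) and \(\nu\) updates are likewise uncoupled over \(i\) and \(j\), which is what makes the scheme embarrassingly parallel.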

As mentioned, suitable separable approximations are briefly presented in Section 2. For the sake of clarity, we here again mention that we use a spherical quadratic approximation (denoted SPH-QDR) and a quadratic approximation to the reciprocal approximation (denoted T2:R), the latter being closely related to the very popular MMA (Svanberg 1987a) approximations. For some examples which are truly separable, we also use exact Hessian information.

3.2 Objective and constraint function scaling

The SALA algorithm is highly sensitive to the scaling of the objective function, the constraints, and the subproblem. We therefore scale both the objective and the constraint functions, such that

$$ \begin{array}{llll} \min\limits_{\boldsymbol{x}} \ & w_{f} f (\boldsymbol{x}) \qquad & & \\ \text{subject to} \ \ & w_{c_{j}} g_{j} (\boldsymbol{x}) = 0 , \qquad & j &= 1, 2, \ldots, m, \\ &l_{i} \leq x_{i} \leq u_{i}, \qquad & i &= 1,2, \ldots, n, \end{array} $$
(21)

where the scaling factors \(w_{f}\) and \(w_{c_{j}}\) are given by

$$ \begin{array}{llll} w_{f} &= 1/\max (\| \boldsymbol{\nabla} f (\boldsymbol{x}^{k-1}) \|_{\infty}, 1 ), \\ w_{c_{j}} &= 1/\max (\| \boldsymbol{\nabla} g_{j} (\boldsymbol{x}^{k-1} ) \|_{\infty}, 1 ), \qquad j = 1, 2, \ldots, m, \end{array} $$

with \(\boldsymbol{x}^{k-1}\) denoting the primal values at which subproblem \(k\) is constructed.
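
As an illustration (array names hypothetical), the scaling factors reduce to a few lines:

```python
import numpy as np

def scaling_factors(grad_f, grad_g):
    """Objective and constraint scaling, cf. (21).
    grad_f: (n,) gradient of f at x^{k-1}; grad_g: (m, n) rows of
    constraint gradients at x^{k-1}."""
    w_f = 1.0 / max(np.linalg.norm(grad_f, np.inf), 1.0)
    w_c = 1.0 / np.maximum(np.linalg.norm(grad_g, np.inf, axis=1), 1.0)
    return w_f, w_c
```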

3.3 Subproblem scaling

The projected subgradient step given in (16) depends on the step length \(\rho\), with this parameter behaving more like a scaling parameter in the SALA framework than like the penalty parameter in classical augmented Lagrangian statements. For example, \(\rho\) penalizes the primal coupling constraints, and greater values will accelerate the primal sequence. However, \(\rho^{-1}\) penalizes the dual sequence, so that a compromise value is expected to be optimal; e.g., see Lenoir and Mahey (2007) and Boyd et al. (2010), who elaborate on optimal scaling strategies in some detail. It is possible to use \(n\) scaling parameters \(\rho_{i}\), \(i = 1, 2, \ldots, n\), one for each of the separable subproblems, but this only leads to increased computational costs (Lenoir and Mahey 2007). Furthermore, if the objective and constraint functions are scaled, the \(n\) separable problems have similar orders of magnitude, which further reduces the need for individual subproblem scaling parameters. A single-parameter update is therefore the preferred choice of subproblem scaling for our numerical testing.

We obtained reasonable numerical results when using a combination of the strategies inspired by Lenoir and Mahey (2007) and Boyd et al. (2010).

Following Lenoir and Mahey (2007), let

$$ \beta = \frac{ \| \boldsymbol{\nu}^l - \boldsymbol{\nu}^{l-1} \| } { \|\boldsymbol{y}^l - \boldsymbol{y}^{l-1} \| }. $$
(22)

Then, we update using

$$ \rho \leftarrow \rho^{(1-\alpha)} \beta^{\alpha}, $$
(23)

or

$$ \rho \leftarrow {(1-\alpha)} \rho + \alpha \beta , $$
(24)

with \(\alpha\) small, say \(10^{-2}\), to prevent oscillatory behavior. We accept that Lenoir and Mahey (2007) prove that

$$ \sum\limits_{l=1}^{\infty} \alpha_l = S < \infty $$
(25)

is a requirement for theoretical convergence; for numerical purposes, however, only finitely many inner iterations \(l\) are performed, so that a small constant value of \(\alpha\) still achieves practical convergence.

We now introduce the update strategy inspired by Boyd et al. (2010), coupled with that from (24). The update then becomes

$$ \rho \leftarrow \left\{\begin{array}{lllll} \sqrt{n} \rho & \text{if } \| \boldsymbol{x}^l - \boldsymbol{x}^{l-1} \| > n \| \boldsymbol{\nu}^l - \boldsymbol{\nu}^{l-1} \| \\ \rho / \sqrt{n} & \text{if } \| \boldsymbol{\nu}^l - \boldsymbol{\nu}^{l-1} \| > n \| \boldsymbol{x}^l - \boldsymbol{x}^{l-1} \| \\ (23) \text{ or } (24) & \text{otherwise.} \end{array}\right. $$
(26)

The update strategy proposed in (23) and (24) takes advantage of second-order problem information (Lenoir and Mahey 2007), which in our numerical testing led to higher levels of accuracy compared with a strategy that purely focused on keeping the primal and dual residual norms within a factor of each other. The downside, however, is the possibility of \(\rho\) quickly jumping to unsuitably high values should the resource allocation vector and dual residual norms be of different orders of magnitude. Furthermore, it is difficult to reduce the value of the scaling parameter because of the low values of \(\alpha\) that have to be chosen in order to ensure convergence. We combine the two different strategies in the hope of ensuring relatively equal convergence of both the primal and dual sequences, with the advantage of greater levels of accuracy.
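
As an illustrative sketch (the guard against a zero denominator in (22) is an added assumption), the combined update may be written as:

```python
import numpy as np

def update_rho(rho, x, x_old, nu, nu_old, y, y_old, n, alpha=1e-2):
    """Combined scaling-parameter update, cf. (22), (24) and (26)."""
    dx = np.linalg.norm(x - x_old)
    dnu = np.linalg.norm(nu - nu_old)
    if dx > n * dnu:                       # primal step dominates
        return np.sqrt(n) * rho
    if dnu > n * dx:                       # dual step dominates
        return rho / np.sqrt(n)
    dy = np.linalg.norm(y - y_old)
    if dy == 0.0:                          # guard: beta in (22) undefined
        return rho
    beta = dnu / dy                        # (22)
    return (1.0 - alpha) * rho + alpha * beta   # (24)
```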

4 Numerical experiments

We use the QP form given in (6) in Section 2 to construct an approximate augmented Lagrangian for both the ALGENCAN and SALA solvers. For SALA, the augmented Lagrangian is then decomposed further into the \(n\) separable Lagrangian problems given in (13). We reiterate that the approximations used for the Hessian are either the spherical diagonal quadratic approximation (SPH-QDR) or the quadratic approximation to the reciprocal approximation (T2:R). We also compare our algorithms with the state-of-the-art Gould et al. (2003) solver LSQP. The latter solver is tailor-made for linear or diagonal quadratic subproblems.

All numerical results for SALA use the closed-form update of the primal variables given in (20). The step length \(\rho\) is updated using (24) and (26) with \(\alpha = 10^{-2}\). The maximum number of subproblem evaluations allowed is \(l_{\max} = 5 \times 10^{4}\), with a subproblem convergence tolerance of \(\epsilon = 10^{-6}\). For each subproblem \(k\), we initialize at \(l = 0\) as follows: for values of \(k > 0\), \(\rho^{0} = 1\), \(y_{ji}^{0} = y_{ji}^{k-1}\), and \({\nu _{j}^{0}} = \nu _{j}^{k-1}\); else, at \(k = 0\), \(\boldsymbol{y} = \boldsymbol{\nu} = \boldsymbol{0}\). A maximum of \(k = 500\) outer iterations was allowed, after which the algorithm was stopped, and we indicate failure to converge by “—” in the numerical results to follow. The absolute values of the Lagrange multipliers were bounded to not exceed \(10^{6}\), selected rather arbitrarily, to prevent numerical instabilities.

Default settings are used for the ALGENCAN solver, except that we use epsfeas = epsopt = \(10^{-7}\), where epsfeas and epsopt are parameters in ALGENCAN. For Galahad's LSQP solver, default settings were used. We now introduce the absolute maximum constraint violation \(h\), the norm of the KKT conditions \(\mathcal K\), and the number of function evaluations \(N_{f}\) required for termination. For all the solvers, problem execution was terminated when all three of the following conditions were met: the Euclidean or 2-norm \(\| \boldsymbol{x}^{k} - \boldsymbol{x}^{k-1} \|_{2} \le 10^{-4}\), \(h \le 10^{-4}\), and \(\mathcal K \le 10^{-3}\). We chose convergence tolerances of modest accuracy, as ADMM is usually slow to converge to high accuracy (Boyd et al. 2010). The test problems used are tabulated in Table 1, and we present the results in Table 2.

The results are not overly surprising; sometimes ALGENCAN performs slightly better than SALA, and vice versa. This happens notwithstanding the fact that the two algorithms solve a sequence of identical subproblems, reminiscent of the no-free-lunch (NFL) theorems of optimization (Wolpert and Macready 1997). The main reason for this is in all probability that both algorithms are known to be sensitive to scaling, albeit that augmented Lagrangian methods are probably less so than ADMM-type methods. Arguably, Galahad's LSQP solver sets the tone. (Note that \(\mathcal K\) is smaller for LSQP than for the other solvers, notwithstanding that the same stopping criteria were used.)

The results for Svanberg's snake problem are worthy of special attention: for this problem, none of the algorithms converged. We have purposely included these results to highlight some limitations of the methods studied. It is pertinent to point out that this problem is very “difficult”: the feasible region of the snake problem is a very thin sliver in \(n\)-dimensional space. It is also worth noting that arguably the most popular algorithms in structural optimization, namely CONLIN and MMA, also have difficulties solving some of the test problems. CONLIN, for example, fails when applied to the snake problem, whereas MMA fails for Fleury's weight minimization problem, all in the spirit of NFL.

5 Conclusions and recommendations

We have proposed an iterative separable augmented Lagrangian algorithm (SALA) for optimal structural design. The algorithm solves a sequence of separable quadratic-like programs, able to capture reciprocal- and exponential-like behavior, which is desirable in structural optimization. A salient feature of the algorithm is that the primal and dual variables are all updated using closed-form expressions.

For further reading, the reader is referred to the monograph by Boyd et al. (2010) and some of the references mentioned therein, for instance Bertsekas. The first five chapters of the monograph by Boyd et al. give an overview of the main ADMM concepts and methods, and they also treat the special case of the QP and separable objective and constraints. Since algorithms like SALA are known to be very sensitive to scaling, we have proposed a scaling method inspired by the well-known ALGENCAN algorithm. Numerical results for SALA and ALGENCAN suggest that the algorithms perform quite similarly. Having said this, “optimal” scaling of the algorithm (and related algorithms) remains an open issue.

Indeed, the sensitivity of SALA to scaling and to different parameter settings may well be its biggest drawback. Having said this, even “classical” AL statements are well known to be prone to this, and the same applies to penalty methods. To take this argument even further, “older” SQP formulations that depended on a merit acceptance function exhibited the same difficulties.

The scaling issue posed by SALA is of such complexity that Lenoir and Mahey (2007) dedicated an entire paper to it. Interested readers may find an in-depth analysis of the effect of different scaling strategies therein. As mentioned before, scaling is an open-ended problem that will require further investigation but is beyond the scope of this paper.

Although we do not exploit this feature herein, the primal and dual updates in SALA are both embarrassingly parallel, which makes the algorithm suitable for implementation on massively parallel computational devices, including general-purpose graphics processing units (GPGPUs). For some problems, a relatively slow ADMM iteration process leads to increased computational effort on the subproblem level when compared with more traditional SQP methods. We noted this in our numerical testing, as ALGENCAN and LSQP sometimes required less CPU time than a single-threaded SALA implementation. However, the work per SALA iteration can in theory be divided over at least \(n\) processors, meaning that the advantage over traditional SQP methods should scale with increased problem size. (Problem dimensionality will of course also have to be high enough to offset the overhead computational costs associated with parallel computing.) We hope to demonstrate the possible advantages in the near future.

Finally, while we have used only two approximations herein, many different methods for obtaining the approximate diagonal higher-order curvatures may be used. Possibilities include quasi-Cauchy updates (Duysinx et al. 1995) and quadratic approximations based on suitable intervening variables (Groenwold et al. 2010; Groenwold and Etman 2010). The latter possibility includes the quadratic approximations to the reciprocal and exponential approximations, and even the quadratic approximations to the CONLIN and MMA approximations.

6 Replication of results

Our SALA algorithm is part of an optimization suite that is proprietary for the time being; we can therefore not share code or code fragments in a meaningful way.