Abstract
Compact and efficient Matlab implementations of compliance topology optimization (TO) for 2D and 3D continua are given, consisting of 99 and 125 lines respectively. On discretizations ranging from 3 ⋅ 104 to 4.8 ⋅ 105 elements, the 2D version, named top99neo, shows speedups from 2.55 to 5.5 times compared to the well-known top88 code of Andreassen et al. (Struct Multidiscip Optim 43(1):1–16, 2011). The 3D version, named top3D125, is the most compact and efficient Matlab implementation for 3D TO to date, showing a speedup of 1.9 times compared to the code of Amir et al. (Struct Multidiscip Optim 49(5):815–829, 2014), on a discretization with 2.2 ⋅ 105 elements. For both codes, improvements are due to much more efficient procedures for the assembly and implementation of filters and shortcuts in the design update step. The use of an acceleration strategy, yielding major cuts in the overall computational time, is also discussed, stressing its easy integration within the basic codes.
Avoid common mistakes on your manuscript.
1 Introduction
The celebrated top99 Matlab code developed by Sigmund (2001) has certainly promoted the spreading of topology optimization among engineers and researchers, and the speedups carried by its heir, top88 (Andreassen et al. 2011), substantially increased the scale of examples that can be solved on a laptop.
On these footprints, several other codes have followed, involving extension to 3D problems (Liu and Tovar 2014; Amir et al. 2014), material design (Andreassen and Andreasen 2014; Xia and Breitkopf 2015), level–set parametrizations (Wang 2007; Challis 2010), use of advanced discretization techniques (Talischi et al. 2012; Suresh 2010; Sanders et al. 2018), or integration of TO within some finite element frameworks.
With the evolution of TO and its application to more and more challenging problems, implementations in top88 may have become outdated. Also, Matlab has improved in the last decade. Hence, we believe it is time to present a new “exemplary” code collecting shortcuts and speedups, allowing to tackle medium-/large-scale TO problems efficiently on a laptop. Preconditioned iterative solvers, applied for example in Amir and Sigmund (2011), Amir et al. (2014), Ferrari et al. (2018) and Ferrari and Sigmund (2020), allow the solution of the state equation with nearly optimal efficiency (Saad 1992). Thus, the computational bottleneck has been shifted on other operations, such as the matrix assembly or the repeated application of filters. Efficiency improvements for these operations were touched upon by Andreassen et al. (2011), however, without giving a quantitative analysis about time and memory savings.
Here, we provide compact Matlab codes for minimum compliance topology optimization of 2D and 3D continua which show a substantial speedup compared to the top88 code. We include several extensions by default, such as specification of passive domains, a volume-preserving density projection (Guest et al. 2004; Wang et al. 2011) and continuation strategies for the penalization and projection parameters in a very compact, yet sharp, implementation. Coincidentally, the new 2D TO implementation consists of 99 lines of code and is thus named top99neo. We also show how to include an acceleration technique recently investigated for TO by Li et al. (2020), with a few extra lines of code and potentially carrying major speedups. Changes needed for the extension to 3D problems are remarkably small, making the corresponding code (top3D125) the most compact and efficient Matlab implementation for 3D compliance TO to date.
Our primary goal is not to present innovative new research. Rather, we aim at sharing some shortcuts and speedups that we have noticed through time, to the benefit of the research community. Improvements introduced by the present codes will be much useful also on more advanced problems, such as buckling optimization, which will be dealt with in an upcoming work.
The paper is organized as follows. In Section 2, we recall the setting of TO for minimum compliance. Section 3 is devoted to describe the overall structure of the 2D code, focusing on differences with respect to top88. Sections 3.1–3.5 give insights about the main speedups and show performance improvements with respect to top88. The very few changes needed for the 3D code are listed in Section 4, where an example is presented and the efficiency is compared to the previous code from Amir et al. (2014). Some final remarks are given in Section 5. Appendix A gives some details about the redesigns step that are useful for better understanding a method proposed in Section 3.2 and the Matlab codes are listed in Appendices B and C.
2 Problem formulation and solution scheme
We consider a 2D/3D discretization Ωh consisting of m equi-sized quadrilateral elements Ωe. Hereafter we denote by n the global number of Degrees of Freedom (DOFs) in the discretization and by d the number of (local) DOFs of each element.
Let x = {xe}e= 1:m ∈ [0,1]m be partitioned between \(\mathbf {x}_{\mathcal {A}}\) and \(\mathbf {x}_{\mathcal {P}}\), the sets of active (design) variables and passive elements, respectively. The latter may be further split in the sets of passive solid \(\mathcal {P}_{1}\) (xe = 1) and void \(\mathcal {P}_{0}\) (xe = 0) elements, of cardinalities \(m_{\mathcal {P}_{1}}\) and \(m_{\mathcal {P}_{0}}\), respectively (see Fig. 1a).
The set of physical variables \(\hat {\mathbf {x}}_{\mathcal {A}} = {\mathscr{H}}(\tilde {\mathbf {x}})\) are defined by the relaxed Heaviside projection (Wang et al. 2011)
with threshhold η and sharpness factor β, where \(\tilde {\mathbf {x}} = H\mathbf {x}\) is the filtered field, obtained by the linear operator
where \(\mathcal {N}_{e} = \{ i \mid \text {dist}({\Omega }_{i}, {\Omega }_{e} ) \leq r_{\min \limits } \}\) and \(h_{e,i} = \max \limits (0, r_{\min \limits } - \text {dist}({\Omega }_{i}, {\Omega }_{e} ) )\).
Given a load vector \(\mathbf {f}\in \mathbb {R}^{n}\) and the volume fraction f ∈ (0,1), we consider the optimization problem
for the minimization of compliance \(c\left (\hat {\mathbf {x}}\right ) = \mathbf {u}^{T}\mathbf {f}\) with an upper bound on the overall volume
Problem (3) is solved with a nested iterative loop. At each iteration, the displacement u is computed by solving the equilibrium problem
where the stiffness matrix \(K = K(\hat {\mathbf {x}})\) depends on the physical variables through a SIMP interpolation (Bendsøe and Sigmund 1999) of the Young modulus
with E0 and \(E_{\min \limits }\) the moduli of solid and void (\(E_{\min \limits } \ll E_{0}\)). The gradients of compliance and structural volume with respect to \(\hat {\mathbf {x}}\) read (χe = 1 if \(e\in \mathcal {A}\) and 0 otherwise and 1m is the identity vector of dimension m)
and the sensitivities with respect to the design variables are recovered as
where ⊙ represents the element-wise multiplication and
The active design variables \(e\in \mathcal {A}\) are then updated by the optimality criterion rule (Sigmund 2001)
where \(\delta _{-} = \max \limits (0, x_{k, e} - \mu )\), \(\delta _{+} = \min \limits (1, x_{k, e} + \mu )\), for the fixed move limit μ ∈ (0,1) and
depends on the element sensitivities.
In (11), \(\tilde {\lambda }_{k}\) is the approximation to the current Lagrange multiplier \(\lambda ^{\ast }_{k}\) associated with the volume constraint. This is obtained by imposing \(V(\hat {\mathbf {x}}_{k+1}(\tilde {\lambda })) - f|{\Omega }_{h}| \approx 0\), e.g., by bisection on an interval \({\Lambda }^{(0)}_{k} \supset \lambda ^{\ast }_{k}\).
3 Matlab implementation and speedups
The Matlab routine for 2D problems (see Appendix B) is called with the following arguments
where nelx and nely define the physical dimensions and the mesh resolution, volfrac is the allowed volume fraction on the overall domain (i.e., \(\mathcal {A}\cup \mathcal {P}\)), penal the penalization used in (6), and rmin the filter radius for (2). The parameter ft is used to select the filtering scheme: density filtering alone if ft= 1, whereas ft= 2 or ft= 3 also allows the projection (1), with eta and beta as parameters. ftBC specifies the filter boundary conditions ('N' for zero- Neumann or 'D' for zero-Dirichlect), move is the move limit used in the OC update and maxit sets the maximum number of redesign steps.
The routine is organized in a set of operations which are performed only once and the loop for the TO iterative redesign. The initializing operations are grouped as follows
and below we give details only about parameters and instructions not found in the top88 code.
To apply continuation on the generic parameter “par,” a data structure is defined
such that the continuation starts when loop=istart and the parameter is increased by deltaPar each isteps, up to the value maxPar. This is implemented in Lines 6 and 7 for the penalization parameter p and the projection factor β, respectively. The update is then performed, by the instruction (see Line 92)
making use of compact logical operations. Continuation can be switched off, e.g., by setting maxPar<=par, or istart>=maxit.
The blocks defining the discretization (PRE.2)) contain some changes compared to top88. The number of elements (nEl), DOFs (nDof), and the set of node numbers (nodeNrs) are defined explicitly, to ease and shorten some following instructions. The setup of indices iK and jK, used for the sparse assembly, is performed in Lines 15–21 and follows the concept detailed in Section 3.1. The coefficients of the lower diagonal part of the elemental stiffness matrix are defined in vectorized form, such that \(\texttt {Ke} = \mathcal {V}(K^{(s)}_{e})\) (see Lines 22–26). Ke is used for the assembly strategy described in Section 3.1. However, in Lines 27–29, we also recover the complete elemental matrix (Ke0), used to perform the double product \(\mathbf {u}^{T}_{e}K_{e}\mathbf {u}_{e}\) when computing the compliance sensitivity (7). Although this could also be written in terms of the matrix \(K^{(s)}_{e}\) only, this option would increase the number of matrix/vector multiplications.
In PRE.3), the user can specify the set of restrained (fixed) and loaded (lcDof) DOFs and passive regions (\(\mathcal {P}_{1} \leftrightarrow \texttt {pasS}\) and \(\mathcal {P}_{0} \leftrightarrow \texttt {pasV}\)) for the given configuration. Supports and loads are defined as in the top88 code, whereas passive domains may be specified targeting a set of column and rows from the array elNrs. Independently of the particular example, Lines 34–36 define the vector of applied loads, the set of free DOFs, and the sets of active \(\mathcal {A} \leftrightarrow \texttt {act}\) design variables.
In order to make the code more compact and readable, operations which are repeatedly performed within the TO optimization loop are defined through inline functions in PRE.4) (Lines 38–43). The filter operator is built in PRE.5) making use of the built-in Matlab function imfilter, which represents a much more efficient alternative to the explicit construction of the neighboring array. A similar approach was already outlined by Andreassen et al. (2011), pointing to the Matlab function conv2, which is however not completely equivalent to the original operator, as it only allows zero-Dirichlet boundary conditions for the convolution operator. Here, we choose imfilter, which is essentially as efficient as conv2, but gives the flexibility to specify zero-Dirichlet (default option), or zero-Neumann boundary conditions.
Some final initializations and allocations are performed in PRE.6). The design variables are initialized with the modified volume fraction, accounting for the passive domains (Line 52–53) and the constant volume sensitivity (7) is computed in Line 51.
Within the redesign loop, the following five blocks of operations are repeatedly performed
In block RL.1), the physical field is obtained, applying the density filter and, if selected, also the projection. If ft= 3, the special value of the threshold eta giving a volume-preserving projection is computed, as discussed in Section 3.2.
The stiffness interpolation and its derivative (sK, dsK) are defined, and the stiffness matrix is assembled (see Lines 73–76). Ideally, one could also get rid of Lines 73–74 and directly define sK in Line 75 and dsK within Line 79. However, we decide to keep these operations apart, enhancing the readability of the code and to ease the specification of different interpolation schemes. Equation (5) is solved on Line 77 using the Matlab function decomposition, which can work with only half of the stiffness matrix (see Section 3.1). The sensitivity of compliance is computed, and the backfiltering operations (8) are performed in RL.3).
The update (10), with the nested application of the bisection process for finding \(\tilde {\lambda }_{k}\), is implemented in RL.4) (Lines 86–91), and we remark that lm represents \(\sqrt {\lambda }\).
Some information about the process is printed and the current design is plotted in RL.5) (Lines 94–97). On small discretizations, repeated plotting operations absorb a significant fraction of the CPU time (e.g., 15% for m = 4800). Therefore, one might just plot the final design, moving Lines 96–97 outside the redesign loop.
The tests in the following have been run on a laptop equipped with an Intel(R) Core(TM) i7-5500U@2.40-GHz CPU, 15 GB of RAM, and Matlab 2018b running in serial mode under Ubuntu 18.04 (but a similar performance is expected in Windows setups). We will often refer to the half MBB beam example (see Fig. 2) for numerical testing. Unless stated otherwise, we choose Ωh = 300 × 100, f = 0.5, and \(r_{\min \limits } = 8.75\) (Sigmund 2007). The load, having total magnitude |q| = 1 is applied to the first node. No passive domains are introduced for this example; therefore, pasS=[];, pasV=[]; and we set E1 = 1, E0 = 10− 9, and ν = 0.3 in all the tests.
3.1 Speedup of the assembly operation
In top88, the assembly of the global stiffness matrix is performed by the built-in Matlab function sparse
where \(\texttt {sK}\in \mathbb {R}^{m*d^{2}\times 1}\) collects the coefficients of all the elemental matrices in a column-wise vectorized form (i.e., \(\mathcal {V}(K_{e})\)) and iK and jK are the sets of indices mapping each sK(i) to the global location K(iK(i),jK(i)).
These two sets are set up through the operations
where C[m×d] is the connectivity matrix and “⊗” is the Kronecker product (Horn and Johnson 2012). The size of the array \(\mathcal {I} = [ \texttt {iK}, \texttt {jK} ] \in \mathbb {N}^{m*d^{2}\times 2}\) grows very quickly with the number of elements m, especially for 3D discretizations (see Table 1), and even though its elements are integers, the sparse function requires them to be specified as double precision numbers. The corresponding memory burden slows down the assembly process and restricts the size of problems workable on a laptop.
The efficiency of the assembly can be substantially improved by
-
1.
Acknowledging the symmetry of both Ke and K
-
2.
Using an assembly routine working with iK and jK specified as integers
To understand how to take advantage of the symmetry of matrices, we refer to Fig. 1b and to the connectivity matrix C. Each coefficient \(C_{ej}\in \mathbb {N}\) addresses the global DOF targeted by the j th local DOF of element e. Therefore, (12) explicitly reads
where \(\mathbf {c}_{e} = \left \{ c_{e1}, c_{e2}, \ldots , c_{ed} \right \}\) is the row corresponding to element e.
If we only consider the coefficients of the (lower) symmetric part of the elemental matrix \(K^{(s)}_{e}\) and their locations into the global one K(s), the set of indices can be reduced to
and the overall indexing array becomes \(\mathcal {I}_{r} = [\texttt {iK},\texttt {jK}] \in \mathbb {N}^{\tilde {d}*m\times 2}\) where \(\tilde {d} = {\sum }^{d}_{j=1} {\sum }_{i\leq j} i\). The entries of the indexing array and the memory usage are reduced by approx. 45% (see Table 1).
The set of indices (14) can be constructed by the following instructions (see Lines 15–21)
which can be adapted to any isoparametric 2D/3D element just by changing accordingly the number d of elemental DOFs. In the attached scripts, based on 4-noded bilinear Q4 and 8-noded trilinear H8 elements, we set d = 8 and d = 24, respectively. The last instruction sorts the indices as iKr(i) > jKr(i), such that K(s) contains only sub-diagonal terms.
The syntax K=sparse(iK,jK,sK) now returns the lower triangular matrix K(s) and we remark that the full operator can be recovered by
which costs as much as the averaging operation \(\frac {1}{2}(K+K^{T})\), performed in top88 to get rid of roundoff errors. However, the Matlab built-in Cholesky solver and the corresponding decomposition routine can use just K(s), if called with the option 'lower'.
Point 2 gives the most dramatic improvement, and can be accomplished by using routines developed by independent researchers. The sparse2 function, from Suite Sparse (Davis 2019), was already pointed out by Andreassen et al. (2011) as a better alternative to the built-in Matlab sparse; however, no quantitative comparisons were performed. According to the CHOLMOD reference manual (Davis 2009), sparse2 works exactly as sparse, but allowing the indices iK and jK to be specified as integers (accomplished by defining this type for the connectivity matrix, see Lines 11 and 13).
Here we suggest the “fsparse” routine, developed by Engblom and Lukarski (2016). Besides working with integers iK and jK, the function enhances the efficiency of the sparse assembly by a better sorting of the operations. From our experience on a single core process, fsparse gives a speedup of 170–250% compared to sparse2, and is also highly parallelizable (Engblom and Lukarski 2016). Defining the sets ik and jk as int32 type, we can drastically cut the memory requirements, still representing n ≈ 2.1 ⋅ 109 numbers, far beyond the size of problems one can tackle in Matlab.
In order to use fsparse, one needs to download the “stenglib” libraryFootnote 1 and follow the installation instructions in the README.md file. The packages of the library can be installed by running the “makeall.m” file. As fsparse is contained within the folder “Fast,” one may only select this folder when running makeall.m.
We test the efficiency of the assembly approaches on 2D and 3D uniform discretizations with m2 and m3 elements, respectively. Figure 3 shows time scalings for the different strategies: “F” corresponds to the assembly in top88, “H” takes advantage of the matrix symmetry only and “H, fsparse” correponds to the use of the fsparse routine (Engblom and Lukarski 2016) also. All the approaches exhibit a linear scaling of CPU time w.r.t the DOFs number. However, half the CPU time can be cut just by assembling K(s) (strategy H, sparse). Therefore, we definitely recommend this to users who aim to solve medium-size (105 to 106 DOFs) structural TO problems on a laptop. However, the most substantial savings follow from using fsparse (Engblom and Lukarski 2016) and by coupling these two strategies (H, fsparse) speedups of 10 for the 2D and 15 for 3D setting can be achieved. It is worth to highlight that a 3D stiffness matrix of the size of ≈ 9 ⋅ 105 can be assembled in less than a second and even one of size 6.2 ⋅ 106 can be assembled on a laptop in less than 10s. For this last case, the sole storage of the arrays iK, jK, and sK would cause a memory overflow, ruling out the “F” approach.
3.2 Speedup of the OC update
The cost of the redesign step \(\mathbf {x}_{k+1} = \mathcal {U}(\mathbf {x}_{k})\) is proportional to the number of bisections (nbs) required for computing the approximation \(\tilde {\lambda }_{k} \approx \lambda ^{\ast }_{k}\). The following estimate (Quarteroni et al. 2000)
is a lower bound to this number for a given accuracy \(\tau > |\lambda ^{\ast }_{k} - \tilde {\lambda }_{k}|\) and it is clear that nbs would decrease if Λ(0), the initial guess for the interval bracketing \(\lambda ^{\ast }_{k}\), could be shrunk. Moreover, the volume constraint should be imposed on the physical field (\(\tilde {\mathbf {x}}\) or \(\hat {\mathbf {x}}\)) and, in the original top88 implementation, this requires a filter application at each bisection step, which may become expensive.
The efficiency of the redesign step can be improved by a two-step strategy
-
1.
Using volume-preserving filtering schemes
-
2.
Estimating the interval \({\Lambda }^{(0)}_{k}\) bracketing the current Lagrange multiplier \(\lambda ^{\ast }_{k}\)
Concerning point 1, the density filter is naturally volume-preserving (i.e., \(V(\mathbf {x}_{k}) = V(\tilde {\mathbf {x}}_{k})\)) (Bourdin 2001; Bruns and Tortorelli 2001). Therefore, the volume constraint can be enforced on V (xk) as long as the density filter alone is considered (ft= 1). The relaxed Heaviside projection (1), on the other hand, is not volume-preserving for any η; thus, it would require one filter-and-projection application at each bisection step. However, (1) can also be made volume-preserving by computing, for each \(\tilde {\mathbf {x}}_{k}\), the threshhold \(\eta ^{\ast }_{k}\) such that (Xu et al. 2010; Li and Khandelwal 2015)
This can be done, e.g., by the Newton method, starting from the last computed \(\eta ^{\ast }_{k-1}\) and provided the derivative of (1) with respect to η
Existence of η∗∈ [0,1] for all \(\tilde {\mathbf {x}} \in [0, 1]^{m}\) follows from the fact that \(g(\eta ) = V(\hat {\mathbf {x}}(\eta ))-V(\tilde {\mathbf {x}})\) is continuous on [0,1] and g(0)g(1) ≤ 0; uniqueness follows from the fact that \(\frac {\partial g}{\partial \eta } < 0\) for all η ∈ (0,1).
Numerical tests on the MBB beam show that generally \(\eta ^{\ast }_{k}\in \left [ 0.4, 0.52\right ]\), the larger variability occurring for low volume fractions (see Fig. 4a). We also observe that \(\eta ^{\ast }_{k}\) takes values slightly above 0.5 when \(r_{\min \limits }\) is increased or β is raised. Convergence to \(\eta ^{\ast }_{k}\) is generally attained in 1–2 Newton iterations (see Fig. 4a).
The procedure for computing \(\eta ^{\ast }_{k}\) from (17), with tolerance 𝜖 = 10− 6 and initial guess η0 = eta, provided by the user, is implemented in Lines 63–67, that are executed if the routine top99neo is called with the parameter ft= 3. Otherwise, if ft= 2, the input threshhold eta is kept fixed. In case of the latter, the volume constraint should be consistently applied on \(V(\hat {\mathbf {x}})\); otherwise, some violation or over-shooting of the constraint will happen. In particular, if the volume constraint is imposed on x and η is kept fixed, one has \(V(\hat {\mathbf {x}}) > f |{\Omega }_{h}|\), if η < 0.5, and \(V(\hat {\mathbf {x}}) < f |{\Omega }_{h}|\), if η > 0.5.
Even tough we usually observed small differences, these may result in local optima or bad designs, especially for low volume fractions or high β values. Therefore, accounting for this more general situation Lines 87–91 should be replaced by the following
However, there could be other situations when one cannot rely on volume-preserving filters (e.g., when imposing length scale through robust design). Therefore, a more general strategy to reduce the cost of the OC update is to cut the number of bisection steps.
To this end, the selection of the initial bracketing interval \({\Lambda }^{(0)}_{k}\) may build upon the upper bound estimate for \(\lambda ^{\ast }_{k}\) (Hestenes 1969; Arora et al. 1991)
More details on the derivation of (19) are given in Appendix A. The behavior of the estimate (19) is shown in Fig. 4b for the MBB example. The overall number of bisections (nbs) in order to compute \(\lambda ^{\ast }_{k}\) meeting the tolerance τ = 10− 8 when considering \({\Lambda }^{(0)}_{k} = [0, \lambda ^{\ast }_{k}]\) is cut by about 50%, compared with the one required by starting from Λ(0) = [0,109] as in top88. Moreover, if no projection is applied, (19) could be used together with (10) to perform an explicit Primal-Dual iteration to compute \((\mathbf {x}_{k+1}, \lambda ^{\ast }_{k})\) and this would reduce the number of steps even more (see green curve in Fig. 4b).
However, in the basic versions of the codes, given in Appendices B and Appendix, we consider the bisection process and (19) is used to bracket the search interval, as this procedure is more general.
3.3 Acceleration of the OC iteration
The update rule (10) resembles a fixed-point (FP) iteration \(\mathbf {x}_{k+1} = \mathcal {U}(\mathbf {x}_{k})\), generating a sequence {xk} converging to a point such that \(\mathbf {r} = \mathcal {U}(\mathbf {x}^{\ast })-\mathbf {x}^{\ast } = \mathbf {0}\).
Several methods are available to speedup the convergence of such a sequence (Brezinski and Chehab 1998; Ramiere and Helfer 2015), somehow belonging to the family of quasi-Newton methods (Eyert 1996). The acceleration proposed by Anderson (1965), for instance, is nowadays experiencing a renewed interest (Fang and Saad 2009; Pratapa et al. 2016; Peng et al. 2018) and has recently been applied to TO by Li et al. (2020).
Anderson acceleration takes into account the residuals ri, their differences Δri = ri+ 1 −ri and the differences of the updates Δxi = xi+ 1 −xi for the last mr iterations (i.e. i = k − mr,…, k − 1), and obtains the new element of the vector sequence as
where ζ ∈ [0,1] is a damping coefficient and
The coefficients \(\gamma ^{(k)}_{i}\) minimize the following
The rationale behind the method is to compute a rank-mr update of the inverse Jacobian matrix \(J^{-1}_{k}\) of the nonlinear system rk = 0. This has been shown to be equivalent to a multi-secant Broyden method (Eyert 1996; Fang and Saad 2009) starting from \(J^{-1}_{0} = -\zeta I\).
The update rule (20) is usually applied only once each q steps. Thus, we can write more generally xk+ 1 = xk + zk, where (Pratapa et al. 2016)
(α ∈ (0,1)) obtaining the so-called periodic Anderson extrapolation (PAE) (Pratapa et al. 2016; Li et al. 2020).
The implementation can be obtained, e.g., by adding the following few lines after the OC step (Line 91)
where the part solving (22) and the update has been put in a separate routine for better efficiency.
In the above, we use the “∖” for solving the least squares problem (22); however, strategies based on a QR (or SVD) decomposition may be preferred in terms of numerical stability. We refer to Fang and Saad (2009) for a deeper discussion on this point.
In order to assess the effect of different filtering schemes and the introduction of parameter continuation, Anderson acceleration is tested on the MBB example considering the following options
-
T1
Density filter alone, p = 3;
-
T2
Density-and-projection filter, with η∗ computed from (17) and β = 2
-
T3
As T2, but with continuation on both β and p, defined by the parameters betaCnt={250,16,25,2} and penalCnt={50,3,25,0.25}
-
T4
As T2, but for the discretization Ωh = 600 × 200
For all the cases, the TO loop stops when \(\|\mathbf {r}_{k}\|_{2}/\sqrt {m} < 10^{-6}\), where the residual is defined with respect to the physical variables (i.e., \(\mathbf {r}_{k} = \tilde {\mathbf {x}}_{k} - \tilde {\mathbf {x}}_{k-1}\) for T1 and \(\mathbf {r}_{k} = \hat {\mathbf {x}}_{k} - \hat {\mathbf {x}}_{k-1}\) for T2–T4). The acceleration is applied each q = 4 steps, considering the last mr = 4 residuals, starting from iteration q0 = 20 for T1–T2 and from q0 = 500 for T3–T4, when both continuations have finished. We set α = 0.9 for the non-accelerated steps. The choice mr = 4 is based on the observation that convergence improvements increase very slowly for mr > 3 (Anderson 1965; Eyert 1996). However, a deeper discussion about the influence of all parameters on the convergence is outside the scope of the present work and we refer to Li et al. (2020) or, in a more general context, to Walker and Ni (2011) for this.
Results are collected in Table 2 and Figs. 5 and 6, showing the evolution of the norm of the residual, the flatness of the normalized compliance Δck/c0 = (ck − ck− 1)/c0 and the non-discreteness measure \(m_{\text {ND}}= 100 \cdot 4\mathbf {x}^{T}(1-\mathbf {x})/m\). We observe how Anderson acceleration substantially reduces the number of iterations needed to fulfill the stopping criterion, at the price of just a moderate increase in compliance (0.2–3%). Moreover, starting the acceleration just a few iterations later (e.g., it = 50 or it = 100 for T1) gives much lower compliance values (c = 254.3 and c = 252.9, respectively) and for T3 and T4 when the acceleration is started as the design has stabilized, compliance differences are negligible.
From Fig. 5 it is easy to notice the trend of PAE of producing a design with some more bars. This may even give slightly stiffer structures, such as for case T3, where the non accelerated approach removes some bars after it = 2000, whereas stopping at the design of T3–PAE gives a stiffer structure.
A comment is about the convergence criterion used, which is different from the one in top88 (maximum absolute change of the design variables (\(\| \mathbf {x}_{k+1} - \mathbf {x}_{k} \|_{\infty }\)). Here, we consider it more appropriate to check the residual with respect to the physical design field, and the 2-norm seems to give a more global measure, less affected by local oscillations.
3.4 Performance comparison to top88
We compare the performance of top99neo to the previous top88 code. In the following, we will refer to “top88” as the original code provided by Andreassen et al. (2011) and to “top88U” as its updated version making use of the sparse2 function (Davis 2009) for the assembly, with iK and jK specified as integers, and the filter implemented by using conv2.
The codes are tested by running 100 iterations for the MBB beam example (see Fig. 2), for the discretizations 300 × 100, 600 × 200, and 1200 × 400, a volume fraction f = 0.5 and considering mesh independent filters of radii \(r_{\min \limits } = 4\), 8, and 16, respectively. For top88 and top88U, we only consider density filtering, whereas for the new top99neo, we also consider the Heaviside projection, with the η∗ computed as described in Section 3.2. It will be apparent that the cost of this last operation is negligible.
Timings are collected in Table 3 where tit is the average cost per iteration, tA and tS are the overall time spent by the assembly and solver, respectively, and tU is the overall time spent for updating the design variables. For top88 and top88U, the latter consists of the OC updating and the filtering operations performed when applying the bisection on the volume constraint. For top99neo, this term accounts for the cost of the OC updating, that for estimating the Lagrange multiplier λ∗ as discussed in Section 3.2 and the filter and projection (Lines 59–70). tP collects all the preliminary operations, such as the set up of the discretization, and filter, repeated only once, before the TO loop starts.
From tit, we clearly see that top99neo enhances the performance of the original top88 by 2.66, 3.85, and 5.5 times on the three discretizations, respectively. Furthermore, timings of top88 on the largest discretization (1200 × 400), relate to a smaller filter size (\(r_{\min \limits } = 12\)), because of memory issues; thus, the speedup is even underestimated in this case. Comparing to top88U version, the improvements are less pronounced (i.e., 1.55, 1.57, and 1.78 times) but still substantial. The computational cost of the new assembly strategy is very low, even comparing to the top88U version, and its weight on the overall computational cost is basically constant. Also, from Table 3, it is clear that the design variables update weighs a lot on the overall CPU time, for both top88 and top88U. On the contrary, this becomes very cheap in the new top99neo thanks to the strategies discussed in Section 3.2; tU takes about 4–5% of the overall CPU time.
Computational savings would become even higher when adopting the larger filter size \(r_{\min \limits } = 8.75\) for the mesh 300 × 100, and scaling to \(r_{\min \limits } = 17.5\) and \(r_{\min \limits } = 35\) on the two finer discretizations. For these cases, speedups with respect to top88 amount to 4.45 and 10.35 on the first two meshes, whereas for the larger one, the setup of the filter in top88 causes a memory overflow. Speedups with respect to top88U amount to 1.55, 2.55 and 3.6 times respectively.
3.5 Frame reinforcement problem
Let us go back to the example of Fig. 1a, adding the specification of passive domains and a different loading condition.
We may think of a practical application like a reinforcement problem for the solid frame, with thickness t = L/50 (\(\mathcal {P}_{1}\)), subjected to two simultaneous loads. A vertical, uniformly distributed load with density q = − 2 and a horizontal height-proportional load, with density b = ±y/L. Some structural material has to be optimally placed within the active design domain \(\mathcal {A}\) in order to minimize the compliance, while keeping the void space (\(\mathcal {P}_{0}\)), which may represent a service opening.
To describe this configuration, we only need to replace Lines 31–33 with the following
where lDofv and lDofh target the DOFs subjected to vertical and horizontal forces, respectively. Then, the load (Line 34) is replaced with
Figure 7 shows the two optimized design corresponding to the two orientations of the horizontal load b, after 100 redesign steps. The routine top99neo has been called with the following arguments nely=nelx= 900, volfrac= 0.2, penal= 3, rmin= 8, ft= 3, eta= 0.5, beta= 2 and no continuation is applied. The cost per iteration is about 10.8 s and, considering the fairly large discretization of 1.62 ⋅ 106 DOFs, is very reasonable.
4 Extension to 3D
The implementation described in Section 3 is remarkably easy to be extended to 3D problems (see Section Appendix).
Notable modifications are the definition of \(K^{(s)}_{e}\) for the 8-node hexahedron (Lines 24–47) and the solution of the equilibrium (5), now performed by
which in this context has been observed to be faster than the decomposition routine. Then, apart from the plotting instructions, all the operations are the same as in the 2D code and only 12 lines need minor modifications, basically to account for the extra space dimension (see tags “#3D#” in Section Appendix).
We test the 3D implementation on the cantilever example shown in Fig. 8a, for the same data considered in Amir et al. (2014). The discretization is set to Ωh = 48 × 24 × 24, the volume fraction is f = 0.12, and the filter radius \(r_{\min \limits } = \sqrt {3}\). We also consider the volume-preserving Heaviside projection, (ft= 3). Figure 8b and c show the designs obtained after 100 redesign steps, for the two different filter boundary conditions. The design in (b), identical to the one in Amir et al. (2014), corresponds to zero-Neumann boundary conditions (i.e., the option “symmetric” was used in imfilter). The design in (c) on the other hand, corresponds to zero-Dirichlect boundary conditions for the filter operator and is clearly a worse local minimum.
The overall CPU time spent over 100 iterations is 1741 s and about 96% of this is due to the solution of the state equation. Only 1.2% of the CPU time is taken by matrix assemblies and 0.4% by filtering and the design update processes.
Upon replacing the direct solver in top3D125 with the same multigrid preconditioned CG solver of Amir et al. (2014), we can compare the efficiency of the two codes. We refer to Table 4 for the CPU timings, considering the discretizations Ωh = 48 × 24 × 24 (l = 3 multigrid levels) and Ωh = 96 × 48 × 48 (l = 4 multigrid levels). top3D125 shows speedups of about 1.8 and 1.9, respectively, and most of the time is cut on the matrix assembly. In the code of Amir et al. (2014), this operation takes about 50% of the overall time (and notably has the same weight as the state equation solve) whereas in top3D125 this weight is cut to 7 − 10%. Also, the time spent for the OC update is reduced, even though the code of Amir et al. (2014) already implemented a strategy for avoiding filtering at each bisection step.
5 Concluding remarks
We have presented new Matlab implementations of compliance topology optimization for 2D and 3D domains. Compared to the previous top88 code (Andreassen et al. 2011) and available 3D codes (e.g., by Liu and Tovar 2014 or Amir et al. 2014), the new codes show remarkable speedups.
Improvements are mainly due to the following:
-
1.
The matrix assembly is made much more efficient by defining mesh-related quantities as integers (Matlab int32) and assembling just one half of the matrix.
-
2.
The number of OC iterations is drastically cut by looking at the explicit expression of the Lagrange multiplier for the problem at hand.
-
3.
Filter implementation and volume-preserving density projection allow to speed up the redesign step.
The new codes are computationally well balanced and as the problem size increases the majority of the time (85 to 90% for 2D and even 96% for 3D discretizations) is spent on the solution of the equilibrium system. This is precisely what we aimed at, as this step can be dealt with efficiently by preconditioned iterative solvers (Amir et al. 2014; Ferrari et al. 2018; Ferrari and Sigmund 2020). We also discussed Anderson acceleration, that has recently been applied to TO also by Li et al. (2020), to accelerate the convergence of the overall optimization loop.
We point out that even if we specifically addressed volume constrained compliance minimization and density-based TO the methods above can be applied also to level-set and other TO approaches. Point 1 can be extended to all problems governed by symmetric matrices. Points 2 and 3 can also be extended to other problems, to some extent, and Anderson acceleration is also usable in a more general setting (e.g., within MMA).
Therefore, we believe that this contribution should be helpful to all researchers and practitioners who aim at tackling TO problems on laptops, and set a solid framework for the efficient implementation of more advanced procedures.
References
Amir O, Sigmund O (2011) On reducing computational effort in topology optimization: how far can we go? Struct Multidiscip Optim 44(1):25–29. https://doi.org/10.1007/s00158-010-0586-7
Amir O, Aage N, Lazarov BS (2014) On multigrid–CG for efficient topology optimization. Struct Multidiscip Optim 49(5):815–829. https://doi.org/10.1007/s00158-013-1015-5
Anderson DG (1965) Iterative procedures for nonlinear integral equations. J Assoc Comput Mach 12(4):547–560
Andreassen E, Andreasen CS (2014) How to determine composite material properties using numerical homogenization. Comput Mater Sci 83:488–495
Andreassen E, Clausen A, Schevenels M, Lazarov BS, Sigmund O (2011) Efficient topology optimization in matlab using 88 lines of code. Struct Multidiscip Optim 43 (1):1–16. https://doi.org/10.1007/s00158-010-0594-7
Arora JS, Chahande AI, Paeng JK (1991) Multiplier methods for engineering optimization. Int J Numer Methods Eng 32(7):1485–1525
Bendsøe MP, Sigmund O (1999) Material interpolation schemes in topology optimization. Arch Appl Mech 69(9):635–654. https://doi.org/10.1007/s004190050248
Bourdin B (2001) Filters in topology optimization. Int J Numer Methods Eng 50(9):2143–2158. https://doi.org/10.1002/nme.116
Brezinski C, Chehab JP (1998) Nonlinear hybrid procedures and fixed point iterations. Numer Funct Anal Optim 19(5–6):465–487. https://doi.org/10.1080/01630569808816839
Bruns TE, Tortorelli DA (2001) Topology optimization of non-linear elastic structures and compliant mechanisms. Comput Methods Appl Mech Eng 190(26):3443–3459. https://doi.org/10.1016/S0045-7825(00)00278-4. http://www.sciencedirect.com/science/article/pii/S0045782500002784
Challis VJ (2010) A discrete level-set topology optimization code written in matlab. Struct Multidiscip Optim 41(3):453–464. https://doi.org/10.1007/s00158-009-0430-0
Christensen P, Klarbring A (2008) An introduction to structural optimization. Solid mechanics and its applications. Springer, Netherlands
Davis TA (2009) User guide for CHOLMOD: a sparse Cholesky factorization and modification package
Davis T (2019) Suitesparse: a suite of sparse matrix software. http://faculty.cse.tamu.edu/davis/suitesparse.html
Engblom S, Lukarski D (2016) Fast matlab compatible sparse assembly on multicore computers. Parallel Comput 56:1–17
Eyert V (1996) A comparative study on methods for convergence acceleration of iterative vector sequences. J Comput Phys 124(2):271–285. https://doi.org/10.1006/jcph.1996.0059
Fang HR, Saad Y (2009) Two classes of multisecant methods for nonlinear acceleration. Numer Linear Algebra Appl 16(3):197–221. https://doi.org/10.1002/nla.617
Ferrari F, Sigmund O (2020) Towards solving large-scale topology optimization problems with buckling constraints at the cost of linear analyses. Comput Methods Appl Mech Eng 363:112,911. https://doi.org/10.1016/j.cma.2020.112911
Ferrari F, Lazarov BS, Sigmund O (2018) Eigenvalue topology optimization via efficient multilevel solution of the frequency response. Int J Numer Methods Eng 115(7):872–892
Guest JK, Prévost JH, Belytschko T (2004) Achieving minimum length scale in topology optimization using nodal design variables and projection functions. Int J Numer Methods Eng 61(2):238–254. https://doi.org/10.1002/nme.1064
Hestenes MR (1969) Multiplier and gradient methods. J Optim Theory Appl 4(5):303–320. https://doi.org/10.1007/BF00927673
Horn RA, Johnson CR (2012) Matrix analysis, 2nd edn. Cambridge University Press, New York
Li L, Khandelwal K (2015) Volume preserving projection filters and continuation methods in topology optimization. Engineering Stru 85:144–161
Li W, Suryanarayana P, Paulino G (2020) Accelerated fixed–point formulation of topology optimization: application to compliance minimization problems. Mech Rese Commun 103:103,469
Liu K, Tovar A (2014) An efficient 3d topology optimization code written in matlab. Struct Multidiscip Optim 50(6):1175–1196. https://doi.org/10.1007/s00158-014-1107-x
Peng Y, Deng B, Zhang J, Geng F, Qui W, Liu L (2018) Anderson acceleration for geometry optimization and physics simulation. ACM Trans Graph 37(4):42:1–42:14
Pratapa PP, Suryanarayana P, Pask JE (2016) Anderson acceleration of the jacobi iterative method: An efficient alternative to krylov methods for large, sparse linear systems. J Comput Phys 306:43–54. https://doi.org/10.1016/j.jcp.2015.11.018
Quarteroni A, Sacco R, Saleri F (2000) Numerical mathematics. Texts in applied mathematics. Springer
Ramiere I, Helfer T (2015) Iterative residual–based vector methods to accelerate fixed point iterations. Comput Math Appl 70:2210–2226
Saad Y (1992) Numerical methods for large eigenvalue problems. Manchester University Press
Sanders ED, Pereira A, Aguiló MA, Paulino GH (2018) Polymat: an efficient Matlab code for multi–material topology optimization. Struct Multidiscip Optim 58:2727–2759
Sigmund O (2001) A 99 line topology optimization code written in Matlab. Struct Multidiscip Optim 21(2):120–127. https://doi.org/10.1007/s001580050176
Sigmund O (2007) Morphology–based black and white filters for topology optimization. Struct Multidiscip Optim 33(4):401–424
Suresh K (2010) A 199–line Matlab code for Pareto–optimal tracing in topology optimization. Struct Multidiscip Optim 42(5):665–679
Talischi C, Paulino GH, Pereira A, Menezes IF (2012) Polytop: a matlab implementation of a general topology optimization framework using unstructured polygonal finite element meshes. Struct Multidiscip Optim 45(3):329–357. https://doi.org/10.1007/s00158-011-0696-x
Walker HF, Ni P (2011) Anderson acceleration for fixed point iterations. SIAM J Numer Anal 49(4):1715–1735
Wang MY (2007) Structural topology optimization using level set method. In: Computational methods in engineering & science. Springer, Berlin, pp 310–310
Wang F, Lazarov B, Sigmund O (2011) On projection methods, convergence and robust formulations in topology optimization. Struct Multidiscip Optim 43(6):767–784
Xia L, Breitkopf P (2015) Design of materials using topology optimization and energy-based homogenization approach in matlab. Struct Multidiscip Optim 52(6):1229–1241. https://doi.org/10.1007/s00158-015-1294-0
Xu S, Cai Y, Cheng G (2010) Volume preserving nonlinear density filter based on Heaviside functions. Struct Multidiscip Optim 41:495–505
Acknowledgments
The project is supported by the Villum Fonden through the Villum Investigator Project “InnoTop.” The authors are grateful to members of the TopOpt group for their useful testing of the code.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interests
The authors declare that they have no conflict of interest.
Additional information
Responsible Editor: Palaniappan Ramu
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Replication of results
Matlab codes are listed in the Appendix and available at www.topopt.dtu.dk. The stenglib package, containing the fsparse function, is avaialble for download at https://github.com/stefanengblom/stenglib.
Appendices
Appendix A: Elaboration on the OC update
Let us consider (3) at a given design point xk assuming the reciprocal and linear approximation for the compliance and volume functions, respectively (Christensen and Klarbring 2008)
We set up the Lagrangian associated with (24)
and seek the pair \((\mathbf {x}_{k+1},\lambda ^{\ast }_{k})\in \mathbb {R}^{m}\times \mathbb {R}_{+}\) solving the subproblem
where \(\mathcal {C} = \{ \mathbf {x} \in \mathbb {R}^{m} \mid \delta _{-}\leq x_{e} \leq \delta _{+}, e = 1 \ldots , m \}\) and ψ(λ) is the dual function. Equation (25) is solved by primal-dual (PD) iterations, as x and λ are interlaced. Replacing ξ = xk and using subscripts (j) to denote inner PD iterations, we have
-
1.
Fixed λ = λ(j), the inner minimization in (25) gives
$$ {\xi^{2}_{e}}\partial_{e}c(\boldsymbol{\xi})x^{-2}_{e} + \lambda \partial_{e}V(\boldsymbol{\xi}) = 0 \Longrightarrow x_{e} = \xi_{e} \left( -\frac{\partial_{e}c(\boldsymbol{\xi})} {\lambda\partial_{e}V(\boldsymbol{\xi})} \right) ^{\frac{1}{2}} $$due to separability of the approximation. Let us denote the rightmost expression \(x_{e} = \mathcal {F}_{(j)e}(\lambda )\), and taking into account the box constraints in \(\mathcal {C}\), we have
$$ \mathcal{U}(x_{e}) = \begin{cases} x_{(j+1),e} = \delta_{-} & \text{if} e \in \mathcal{L} = \{e\mid x_{(j+1),e} \leq \delta_{-} \} \\ x_{(j+1),e} = \delta_{+} & \text{if} e \in \mathcal{U} = \{e\mid x_{(j+1),e} \geq \delta_{+} \} \\ x_{(j+1),e} = \mathcal{F}_{(j),e} & \text{if} e \in \mathcal{M} = \{e\mid\delta_{-} < x_{(j+1),e} < \delta_{+} \} \\ \end{cases} $$(26)where \(\mathcal {C} = {\mathscr{L}} + \mathcal {U} + {\mathscr{M}}\). The above is equivalent to (10).
-
2.
We then evaluate the dual function for x(j+ 1) given by (26), and the stationarity (∂λψ = 0) gives
$$ \sum\limits_{e=1}^{m} \partial_{e} V(\boldsymbol{\xi}) (\chi_{\mathcal{U}}\delta_{+} + \chi_{\mathcal{L}}\delta_{-} + \mathcal{F}_{(j),e}(\lambda)\chi_{\mathcal{M}} ) - f|{\Omega}_{h}| = 0 $$where χ[⋅] is the characteristic function of a set. In this simple case, the above can be solved for λ(j+ 1), the Lagrange multiplier enforcing the volume constraint for the updated density x(j+ 1), and after some simplifications, we obtain
$$ \lambda_{(j+1)} = \left( \frac{{\sum}_{e\in\mathcal{M}}x_{(j+1)e}(\partial_{e}c(\boldsymbol{\xi})/\partial_{e}V(\boldsymbol{\xi}))^{1/2}} {f|{\Omega}_{h}|/\partial_{e}V(\boldsymbol{\xi}) - |\mathcal{L}|\delta_{-} - |\mathcal{U}|\delta_{+}}\right)^{2} $$(27)where |⋅| denotes the number of elements in a set.
Equations (26) and (27) can be iteratively used to compute the new solution \((\mathbf {x}_{k+1},\lambda ^{\ast }_{k})\), as implemented in the code here below (again, note that lm here represents \(\sqrt {\lambda }\))
and, for the MBB beam example, this performs as shown by the green curves in Fig. 4b.
However, a closed form expression such as (27) cannot be obtained for more involved constraint expressions and therefore a root finding strategy must be employed to approximate the Lagrange multiplier. The application of (27) to the current, feasible design point (x(j+ 1) = xk) reduces to
since \(|{\mathscr{M}}| = |{\Omega }_{h}| = m\), \(|{\mathscr{L}}| = |\mathcal {U}| = 0\) and we made use of (7). We immediately verify that (28) is identical to (19).
Appendix B: The 2D code for compliance minimization
Appendix C: 3D code for compliance minimization
Rights and permissions
About this article
Cite this article
Ferrari, F., Sigmund, O. A new generation 99 line Matlab code for compliance topology optimization and its extension to 3D. Struct Multidisc Optim 62, 2211–2228 (2020). https://doi.org/10.1007/s00158-020-02629-w
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00158-020-02629-w