1 Introduction

Topology optimization (TO) is a mathematical method that seeks the optimal material distribution within a given domain subject to a set of constraints. Since the pioneering work of Bendsøe and Kikuchi (1988), TO has become one of the most important methods in structural optimization, and a series of impressive TO methods have emerged over the past 30 years, such as the homogenization method (Bendsøe and Kikuchi 1988), the solid isotropic material with penalization (SIMP) approach (Bendsøe 1989; Sigmund 2001), the evolutionary approach (Xie and Steven 1993), the level set method (Allaire et al. 2004; Deng et al. 2016; Mei and Wang 2004), and the moving morphable component (MMC) method (Guo et al. 2014; Zhang et al. 2018). TO has been successfully applied in various academic fields, such as solid mechanics (Bendsøe 1983; Lindgaard and Dahl 2013; Madah and Amir 2017), fluid mechanics (Gersborg-Hansen et al. 2005; Norgaard et al. 2016), heat transfer (Dirker and Meyer 2013; Gersborg-Hansen et al. 2006), acoustics (Christiansen et al. 2015; Desai et al. 2018), and electromagnetics (Zhou et al. 2010), as well as in engineering fields, such as aircraft design (Zhu et al. 2016), additive manufacturing (Langelaar 2016; Langelaar 2017), composite materials (Cheng and Chen 2015), and photonics design (Li et al. 2016a; Li et al. 2016b).

In actual engineering problems, the computational scale of TO is usually large (Aage et al. 2017; Alexandersen et al. 2016; Evgrafov et al. 2008) and will keep growing as problems become more complex and detailed. Pursuing higher computational efficiency is therefore a permanent objective of TO. In general, there are two ways to improve computational efficiency: algorithm improvement (Amir and Sigmund 2011; Kim et al. 2012) and parallel computing (Aage and Lazarov 2013; Martínez-Frutos and Herrero-Pérez 2017; Wang et al. 2015b; Xia et al. 2017; Zegard and Paulino 2013). The former is capable of fundamentally accelerating TO and is therefore the more worthwhile subject of study. Several aspects can be addressed to improve the efficiency of TO; three representative ones are convergence acceleration, solution acceleration, and design variable reduction.

To accelerate convergence, Lin and Chou (1999) presented a two-stage approach for TO in which the optimal result of the first stage, built from large elements, is used as the initial topology of the second stage with smaller elements; however, only two element sizes are considered in this approach. Kim and Yoon (2000) proposed a multiresolution multiscale topology optimization (MTOP) in which the design optimization is performed progressively from low to high resolution. Nguyen et al. (2010) used a multiresolution topology optimization (MTOP) scheme to obtain high-resolution designs at relatively low computational cost (i.e., a coarse mesh is used for the finite element analysis (FEA)), which was implemented in the TopOpt app (Aage et al. 2013). Recently, Lieu and Lee (2017a, 2017b) proposed an MTOP using isogeometric analysis (IGA) and extended it to multimaterial TO. Although the above MTOP approaches can greatly reduce the degrees of freedom (DOFs) in the FEA or IGA by using a coarse mesh with fewer elements, the accuracy of the FEA or IGA is sacrificed by that coarse mesh. Regarding mesh refinement strategies, Stainko (2006) presented a multilevel scheme that adaptively refines only the interface between solid and void; more research on mesh refinement can be found in (Lambe and Czekanski 2018; Panesar et al. 2017; Wang et al. 2013). These approaches refine the mesh at the solid-void interface, which increases the complexity of TO. Moreover, when the volume fraction is small, there are still many small-density and void elements to be updated; they occupy most of the computational resources but have little effect on the accuracy of the computation and the design.

In TO, the finite element (FE) equations must be solved in each iteration, so a fast solution method for the FE equations can definitely accelerate the TO. There are two types of equation solvers: direct methods (Davis 2006) and iterative methods (Saad 2003). For large-scale FE-based TO problems, iterative methods outperform direct methods in computational time and memory requirements (Benzi 2002). The key to speeding up an iterative method is choosing a suitable method and preconditioner. Augarde et al. (2006) chose a preconditioner based on element displacement components to improve convergence. Wang et al. (2007) proposed a preconditioned Krylov subspace method with recycling, which gives a fast iterative solver for large three-dimensional TO. Many iterative methods have been introduced into TO problems, one of which is the preconditioned conjugate-gradient (PCG) method (Papadrakakis et al. 1996). Amir et al. (2009) used a standard PCG solver to terminate the iterative solution of the nested equations early. Furthermore, Amir et al. (2014) combined PCG and multigrid into multigrid preconditioned conjugate gradients (MGCG) to accelerate the solution of the linear system in TO. Liu and Tovar (2014) used the PCG solver in MATLAB to write an efficient 3D TO code. More PCG applications can be found in (Amir 2015; Amir and Sigmund 2011). Most improvements to PCG rely on the preconditioner, while other acceleration factors have received less attention.

For design variable reduction, Guest and Genut (2010) found that reducing some design variables was expected to require fewer iterations than using the full design variable field. Kim et al. (2012) proposed a reducible design variable method (RDVM), which removes quickly converging design variables from the iterations to save computational expense. Yoo and Lee (2018) proposed a variable grouping method that reduces variables according to the histogram of the product of sensitivity and density variables. Research on design variable reduction remains scarce, and concise, effective methods are lacking.

Although much research on acceleration algorithms exists, there is still considerable room for further improvement. Furthermore, most acceleration methods focus on a single aspect; acceleration in multiple aspects simultaneously is seldom seen. In this paper, we propose a triple acceleration method for TO composed of three parts: multilevel mesh, initial-value-based PCG, and a local-update strategy. It accelerates TO in three aspects: convergence acceleration, solution acceleration, and design variable reduction. In the remainder of this paper, Sect. 2 introduces the basic theory of TO and the mathematical model of PCG. Section 3 gives a detailed description of the proposed acceleration method. Section 4 presents the algorithm implementation. Section 5 discusses the acceleration effect of the proposed method on benchmark examples for both 2D and 3D problems, and conclusions are drawn in Sect. 6.

2 Theoretical basis

The theoretical bases of this paper are SIMP-based TO and PCG, which are summarized in the following sections. For a more detailed discussion of SIMP-based TO and PCG, we refer readers to (Amir et al. 2009; Andreassen et al. 2011; Barrett 1995; Liu and Tovar 2014; Sigmund 2001).

2.1 SIMP-based TO

SIMP is a simple and efficient TO method. Since Sigmund (2001) published the 99-line MATLAB code for TO, SIMP has become increasingly widespread (Andreassen et al. 2011). In SIMP, the design domain is divided into finite elements, and the element density xe determines the element's Young's modulus Ee:

$$ {E}_e\left({x}_e\right)={E}_{min}+{x}_e^p\left({E}_0-{E}_{min}\right),{x}_e\in \left[0,1\right] $$
(1)

where E0 is the Young's modulus of a fully solid element; Emin is a very small Young's modulus assigned to void elements to prevent the stiffness matrix from becoming singular, and p is a penalization factor (typically p = 3).
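As a minimal illustration (our own sketch, not the paper's code), Eq. (1) can be evaluated in MATLAB for a vector of element densities; the numerical values are illustrative defaults:

% Minimal illustration of Eq. (1); E0, Emin, and p follow the text,
% and the density values are illustrative only.
E0 = 1; Emin = 1e-9; p = 3;
x  = [0; 0.25; 0.5; 1];          % example element densities in [0,1]
Ee = Emin + x.^p*(E0 - Emin);    % element Young's moduli, Eq. (1)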

The mathematical formulation of the compliance optimization problem is shown as follows:

$$ \begin{array}{rl}\mathrm{Find} & \boldsymbol{x}={\left[{x}_1,{x}_2,\dots ,{x}_e,\dots ,{x}_N\right]}^{\mathrm{T}},\\ \operatorname{Min} & c\left(\boldsymbol{x}\right)={\boldsymbol{U}}^{\mathrm{T}}\boldsymbol{K}\boldsymbol{U}=\sum \limits_{e=1}^N{E}_e\left({x}_e\right){\boldsymbol{u}}_e^{\mathrm{T}}{\boldsymbol{k}}_e{\boldsymbol{u}}_e,\\ \mathrm{s.t.} & \left\{\begin{array}{l}\frac{V\left(\boldsymbol{x}\right)}{V_0}= VF\\ \boldsymbol{K}\boldsymbol{U}=\boldsymbol{F}\\ 0\le {x}_e\le 1\end{array}\right.\end{array} $$
(2)

where x is the design variable vector (i.e., the element densities); N is the number of design variables; c is the compliance; U is the global displacement vector; F is the global force vector; K is the global stiffness matrix; ke is the element stiffness matrix for unit Young's modulus; ue is the element displacement vector; V(x) and V0 are the material volume and the design domain volume, and VF is the prescribed volume fraction.

To solve the optimization problem, the following heuristic updating scheme, known as the optimality criteria (OC) method (Wang et al. 2017), is used:

$$ {x}_e^{\mathrm{new}}=\left\{\begin{array}{c}\max \left({x}_{\mathrm{min}},{x}_e-m\right),\mathrm{if}\ {x}_e{B}_e^{\eta}\le \max \left({x}_{\mathrm{min}},{x}_e-m\right)\\ {}{x}_e{B}_e^{\eta },\mathrm{if}\ \max \left({x}_{\mathrm{min}},{x}_e-m\right)\le {x}_e{B}_e^{\eta}\le \min \left(1,{x}_e+m\right)\\ {}\min \left(1,{x}_e+m\right),\mathrm{if}\ \min \left(1,{x}_e+m\right)\le {x}_e{B}_e^{\eta}\end{array}\right., $$
(3)

where m is a positive move limit; η is a numerical damping coefficient, and Be is obtained from the optimality condition as:

$$ {B}_e=\frac{-\frac{\partial c}{\partial {x}_e}}{\lambda \frac{\partial V}{\partial {x}_e}} $$
(4)

where λ is the Lagrangian multiplier, which can be obtained by a bisection algorithm.

The sensitivities of the objective function c and of the material volume V with respect to the element density xe are:

$$ \frac{\partial c}{\partial {x}_e}=-p{\left({x}_e\right)}^{p-1}{\boldsymbol{u}}_e^{\mathrm{T}}\ {\boldsymbol{k}}_e{\boldsymbol{u}}_e $$
(5)
$$ \frac{\partial V}{\partial {x}_e}=1 $$
(6)

where Eq. (6) holds under the condition that each element has unit volume.
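For concreteness, a minimal MATLAB sketch of the OC update of Eqs. (3)-(6), written in the spirit of the Top88 code, is given below; the function name, density lower bound, and bisection tolerance are our illustrative choices rather than the paper's exact implementation:

function xnew = oc_update(x, dc, dv, VF, m, eta)
% OC update, Eqs. (3)-(6): x element densities, dc = dc/dx (Eq. (5)),
% dv = dV/dx = 1 (Eq. (6)), VF volume fraction, m move limit, eta damping.
l1 = 0; l2 = 1e9; xmin = 0;                 % bisection bounds for lambda
while (l2 - l1)/(l1 + l2) > 1e-3
    lmid = 0.5*(l1 + l2);                   % candidate Lagrangian multiplier
    Be   = -dc./(lmid*dv);                  % Eq. (4)
    xnew = max(max(xmin, x - m), ...        % Eq. (3): clamp x*Be^eta to
           min(min(1, x + m), x.*Be.^eta)); % the move limits and bounds
    if mean(xnew(:)) > VF                   % unit element volumes assumed
        l1 = lmid;                          % too much material: raise lambda
    else
        l2 = lmid;
    end
end
end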

To avoid well-known drawbacks of TO, including checkerboard patterns, local minima, and mesh dependency, we use the density filter in this work; more details can be found in the work of Bruns and Tortorelli (2001).

2.2 Preconditioned conjugate-gradient method

PCG, an improvement of the conjugate-gradient (CG) method, is an iterative method for solving systems of linear algebraic equations, in which a preconditioner is used to accelerate convergence. Consider applying PCG to the equations

$$ \boldsymbol{Ax}=\boldsymbol{b} $$
(7)

in which the residual at iteration i is:

$$ {\boldsymbol{r}}_i=\boldsymbol{b}-\boldsymbol{A}{\boldsymbol{x}}_i $$
(8)

where the coefficient matrix A must be symmetric and positive-definite (and, in FE applications, is typically sparse).

Compared with CG, PCG introduces a preconditioning matrix M to accelerate the solution:

$$ {\boldsymbol{M}}^{-1}\boldsymbol{Ax}={\boldsymbol{M}}^{-1}\boldsymbol{b} $$
(9)

and the recursion formula for the residual is:

$$ {\boldsymbol{r}}_i={\boldsymbol{r}}_{i-1}-{\alpha}_{i-1}\boldsymbol{A}{\boldsymbol{p}}_{i-1} $$
(10)

where the search direction vector pi and the multiplier αi are expressed as:

$$ {\boldsymbol{p}}_i={\boldsymbol{z}}_i+{\beta}_i{\boldsymbol{p}}_{i-1} $$
(11)
$$ {\alpha}_i=\frac{{\boldsymbol{r}}_i^T{\boldsymbol{z}}_i}{{\boldsymbol{p}}_i^T\boldsymbol{A}{\boldsymbol{p}}_i} $$
(12)

and

$$ {\boldsymbol{z}}_i={\boldsymbol{M}}^{-1}{\boldsymbol{r}}_i $$
(13)
$$ {\beta}_i=\frac{{\boldsymbol{r}}_i^T{\boldsymbol{z}}_i}{{\boldsymbol{r}}_{i-1}^T{\boldsymbol{z}}_{i-1}}. $$
(14)

In each iteration, the vector xi is updated along the search direction pi with the multiplier αi:

$$ {\boldsymbol{x}}_{i+1}={\boldsymbol{x}}_i+{\alpha}_i{\boldsymbol{p}}_i. $$
(15)

When the residual vector ri drops below a predefined small level, the iteration stops. Compared with direct methods for solving linear algebraic equations, PCG is better suited to large-scale problems and is widely used as an iterative solver for FE solutions (Amir et al. 2009).
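For reference, a minimal MATLAB sketch of PCG following Eqs. (8)-(15) is given below; it is a textbook form under the stated assumptions on A, not the exact solver used in this paper:

function x = pcg_sketch(A, b, M, x0, tol, maxit)
% PCG for Ax = b with preconditioner M and initial guess x0.
x  = x0;
r  = b - A*x;                        % initial residual, Eq. (8)
z  = M\r;                            % preconditioned residual, Eq. (13)
p  = z;
rz = r'*z;
for i = 1:maxit
    Ap    = A*p;
    alpha = rz/(p'*Ap);              % Eq. (12)
    x     = x + alpha*p;             % Eq. (15)
    r     = r - alpha*Ap;            % Eq. (10)
    if norm(r) <= tol*norm(b), break; end
    z     = M\r;                     % Eq. (13)
    rznew = r'*z;
    beta  = rznew/rz;                % Eq. (14)
    p     = z + beta*p;              % Eq. (11)
    rz    = rznew;
end
end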

3 Triple acceleration algorithm model

To accelerate the TO, three acceleration schemes, namely multilevel mesh, initial-value-based PCG, and local update, are proposed in this section, each addressing a different aspect.

3.1 Multilevel mesh

To reduce the computational cost, a multilevel mesh acceleration algorithm is proposed: the TO is first solved on a coarser mesh, and the optimized result is then mapped to a finer mesh as the initial design. In SIMP TO, each element density is a design variable, so the number of elements strongly influences the solution process, i.e., more elements cost more solving time. On the other hand, if the mesh is too coarse, it is hard to capture a precise boundary and the optimal density distribution. The multilevel mesh algorithm offers a way to balance coarse and fine meshes.

The multilevel mesh method divides the TO process into several levels whose mesh size varies from coarse to fine. First, the TO is performed on the coarse mesh, which has fewer elements, so its optimal result is obtained quickly. Then the mesh is refined, and the density field of the coarse mesh is mapped to the refined mesh as its initial density field. Through this density mapping, the TO on the refined mesh converges much faster, so the number of iterations there is reduced; this saves computational time because each iteration on the refined mesh is much more expensive. This procedure can be repeated over multiple levels according to the practical requirements.

The core of the multilevel mesh method is the density field inheritance between adjacent levels. In this work, we use a bisection method to refine the mesh: each element is refined into four (2D) or eight (3D) subelements, and Fig. 1 shows the density field inheritance in 2D and 3D.

Fig. 1 Density field inheritance

The flowchart of multilevel mesh algorithm is shown in Fig. 2.

Fig. 2 The flowchart of the multilevel mesh algorithm

Although two levels and coincident regular grids are taken as examples in Figs. 1 and 2, the proposed multilevel approach generalizes to any number of mesh levels and to non-coincident regular grids (an interpolation method is then required to transfer the density fields between the non-coincident grids). A minimal code sketch of the density inheritance is given below.
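The sketch assumes coincident regular grids, with densities stored as a nely-by-nelx matrix (2D) or a nely-by-nelx-by-nelz array (3D); the variable names are ours:

xFine = kron(xCoarse, ones(2));      % 2D: each element passes its density
                                     % to its four subelements (Fig. 1)
% xFine = repelem(xCoarse, 2, 2, 2); % 3D: one element -> eight subelements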

3.2 Initial-value-based PCG

To solve the FE problem in TO, PCG is applied in this work to compute the displacement vector U in the FE equations KU = F, and a preconditioner M is used to accelerate the solution of the equations.

The convergence rate of PCG is given by (Amir et al. 2014)

$$ {\left|\left|\boldsymbol{U}-{\boldsymbol{U}}_i\right|\right|}_{{\boldsymbol{K}}^{-1}}\le {\left|\left|\boldsymbol{U}-{\boldsymbol{U}}_0\right|\right|}_{{\boldsymbol{K}}^{-1}}{\left(\frac{\sqrt{\kappa }-1}{\sqrt{\kappa }+1}\right)}^i $$
(16)

where i is the iteration number and κ is the condition number of the matrix K. When κ ≫ 1, convergence is very slow. Therefore, a preconditioning matrix M is applied such that κ(M−1K) ≪ κ(K). An effective preconditioner M brings κ(M−1K) close to 1, which accelerates convergence considerably.

Many methods have been proposed to obtain a good preconditioner M, such as incomplete Cholesky factorization, diagonal scaling, and factorized sparse approximate inverses (FSAI); more details can be found in (Barrett 1995). In this paper, we choose the incomplete Cholesky factorization as the preconditioner, so M−1 is factorized as follows:

$$ {\boldsymbol{M}}^{-1}=\boldsymbol{L}{\boldsymbol{L}}^T $$
(17)

where L is a lower triangular matrix. The FE equations KU = F are then rewritten as

$$ \widehat{\boldsymbol{K}}\widehat{\boldsymbol{U}}=\widehat{\boldsymbol{F}} $$
(18)

where

$$ {\displaystyle \begin{array}{c}\widehat{\boldsymbol{K}}={\boldsymbol{L}}^T\boldsymbol{KL},\\ {}\widehat{\boldsymbol{U}}={\boldsymbol{L}}^{-1}\boldsymbol{U},\\ {}\widehat{\boldsymbol{F}}={\boldsymbol{L}}^T\boldsymbol{F}\end{array}} $$
(19)

By using the incomplete Cholesky factorization, we obtain \( \kappa \left(\widehat{\boldsymbol{K}}\right)\ll \kappa \left(\boldsymbol{K}\right) \). The actual acceleration effect is reported in Sect. 5.1.2.

An initial value can further accelerate the PCG solution. In TO, the optimized results of two adjacent iterations are similar, especially in the later iterations. Therefore, we use the result of the previous iteration, named Upre, as the initial value. In the current iteration, Upre is close to the unknown vector U, which helps PCG reach the solution sooner and saves solution time; a minimal sketch using MATLAB's built-in solvers is shown below.
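The sketch uses MATLAB's built-in ichol and pcg functions; the tolerance and iteration limit are illustrative, and Upre is assumed to hold the solution of the previous TO iteration (zeros in the first iteration):

L = ichol(K);                        % incomplete Cholesky factor of sparse SPD K
[U, flag] = pcg(K, F, 1e-8, 1000, L, L', Upre);  % warm start from Upre
Upre = U;                            % store for the next TO iteration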

To integrate with the multilevel mesh, displacement (U) inheritance should also be applied between adjacent levels. U can be calculated by the following interpolation strategy:

$$ {U}_i^n=\sum \limits_{j=1}^k{N}_j{U}_j^{n-1} $$
(20)

where \( {U}_i^n \) is the new displacement at the ith node of the refined mesh; n is the mesh level index; i is the nodal ID of the refined mesh; Nj is the shape function value of the coarser mesh evaluated at the ith node of the refined mesh; j is the nodal ID of the coarser mesh (the sum runs over the k nodes of the containing coarse element), and \( {U}_j^{n-1} \) is the displacement field of the coarser mesh. With this interpolation, PCG is warm-started across adjacent levels, further accelerating the TO; an interpolation sketch is given below.
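As an illustration, for coincident regular 2D grids the interpolation of Eq. (20) reduces to bilinear (Q4 shape function) interpolation of each displacement component over the coarse node grid; a sketch under this assumption, with our own variable names, is:

[nyc, nxc] = size(Uc);                        % Uc: one displacement component
[Xc, Yc]   = meshgrid(1:nxc, 1:nyc);          % coarse node coordinates
[Xf, Yf]   = meshgrid(1:0.5:nxc, 1:0.5:nyc);  % refined node coordinates
Uf = interp2(Xc, Yc, Uc, Xf, Yf, 'linear');   % Eq. (20) via bilinear shapes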

3.3 Local-update strategy

The last and most important acceleration proposed in this work is the local-update strategy, which reduces the computational scale of the TO iterations. In conventional TO, all design variables are updated in every iteration. According to Guest and Genut (2010), removing some design variables does not significantly degrade the solutions but saves significant computational cost, and is expected to require fewer iterations than using all design variables. Enlightened by this research, we propose a local-update strategy that yields a high-efficiency TO method.

The most important part of the local-update strategy is identifying the updated range. To guarantee the stability and accuracy of the TO, the updated range should cover the solid region and its boundary; hence, the key of the local-update strategy is finding the boundary and the solid region.

A serious drawback of traditional element-based TO is that an exact boundary is difficult to obtain. To identify the boundary precisely, the nodal density is used in place of the element density. For a node shared by four elements (2D) or eight elements (3D), as in Fig. 3, its density ρn is calculated by

$$ {\rho}_n=\frac{\sum \limits_{j=1}^N{x}_{ej}}{N} $$
(21)

where N is the number of surrounding elements (four in 2D and eight in 3D; at corners and boundaries, N is only 1 or 2 in 2D and 1, 2, or 4 in 3D), and j denotes the jth surrounding element. Note that the nodal density is also a relative density, i.e., ρn ∈ [0, 1].

Fig. 3 The nodal density mapping relation

To map element densities to nodal densities, we need to define the mapping relation. Take a coarse mesh with 12 elements as an example, where the design domain is rectangular and discretized with square elements. Both elements and nodes are numbered column-wise from left to right, as shown in Fig. 4.

Fig. 4 The 2D design domain with 12 elements

Each node and element has a unique ID, so the mapping relations between the nodes and elements of the 12-element mesh are formulated as

$$ \mathrm{ele}\longrightarrow \mathrm{node}=\left[\begin{array}{cccc}1&2&5&6\\ 2&3&6&7\\ 3&4&7&8\\ 5&6&9&10\\ \vdots &\vdots &\vdots &\vdots \\ 15&16&19&20\end{array}\right]\begin{array}{ll}\longleftarrow &\mathrm{element}\ 1\\ \longleftarrow &\mathrm{element}\ 2\\ \longleftarrow &\mathrm{element}\ 3\\ \longleftarrow &\mathrm{element}\ 4\\ &\vdots \\ \longleftarrow &\mathrm{element}\ 12\end{array} $$
$$ \mathrm{node}\longrightarrow \mathrm{ele}=\left[\begin{array}{cccc}1&0&0&0\\ 1&2&0&0\\ 2&3&0&0\\ 3&0&0&0\\ 1&4&0&0\\ 1&2&4&5\\ 2&3&5&6\\ \vdots &\vdots &\vdots &\vdots \\ 12&0&0&0\end{array}\right]\begin{array}{ll}\longleftarrow &\mathrm{node}\ 1\\ \longleftarrow &\mathrm{node}\ 2\\ \longleftarrow &\mathrm{node}\ 3\\ \longleftarrow &\mathrm{node}\ 4\\ \longleftarrow &\mathrm{node}\ 5\\ \longleftarrow &\mathrm{node}\ 6\\ \longleftarrow &\mathrm{node}\ 7\\ &\vdots \\ \longleftarrow &\mathrm{node}\ 20\end{array} $$
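With this mapping, Eq. (21) can be evaluated directly; a minimal MATLAB sketch (node2ele is the node → ele matrix above, with 0 marking empty slots, and x is the element density vector; variable names ours):

nNode = size(node2ele, 1);
rhoN  = zeros(nNode, 1);
for n = 1:nNode
    eIDs    = node2ele(n, node2ele(n,:) > 0);  % surrounding element IDs
    rhoN(n) = mean(x(eIDs));                   % nodal density, Eq. (21)
end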

After obtaining the nodal densities, it is easy to define a criterion that divides the design domain into two subdomains by a threshold density value: (1) the subdomain where the nodal density is larger than or equal to the threshold and (2) the subdomain where it is smaller. Only the elements inside the former subdomain are updated. From the position coordinates of the nodes, we can extract the boundaries that separate the subdomains. Figure 5 shows one iterative result of the TO of a cantilever beam with a 40 × 20 mesh, in which the nodes at approximately the threshold density form a closed area (inside the blue boundary); only the elements inside this area are updated. The mathematical description is as follows:

$$ \forall {\rho}_n>{\rho}_c,n\longrightarrow e\in {\Omega}_e $$
(22)

where ρc is the threshold value of the nodal density; Ωe is the set of updated elements; n is the node ID, and e is the element ID. n ⟶ e denotes the elements mapped from the nodes whose nodal densities exceed ρc; these elements are assigned to Ωe. The algorithm implementation is explained in Algorithm 1 (see Table 1), and a code sketch follows the table.

Fig. 5 The updated range in the cantilever beam (40 × 20)

Table 1 Algorithm 1
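A minimal MATLAB sketch of the idea behind Algorithm 1 (Eq. (22)); ele2node is the ele → node map above, rhoN the nodal densities, and the variable names are our illustrative choices:

updNode = find(rhoN >= rhoC);                  % nodes reaching the threshold
mask    = any(ismember(ele2node, updNode), 2); % elements touching such nodes
OmegaE  = find(mask);                          % updated element set, Eq. (22)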

The smaller the threshold value, the larger the updated range: if the threshold value is 0, all elements are updated. Conversely, the larger the threshold value, the smaller the updated range: if the threshold value is 1, no element is updated.

Another important parameter is the diffused layer number, which moves the boundary outward. Its aim is to improve optimization accuracy by including more design variables; a positive side effect is a decrease in the isolated gray elements (see Sect. 5.2) outside the updated range. When the diffused layer number is 0, only the elements inside the blue boundary are updated. If the diffused layer number is 1, the boundary moves one layer toward the outside area, as shown in Fig. 6. A larger diffused layer number includes more elements in the updated range. The mathematical description of moving one layer is as follows:

$$ e\in {\Omega}_e\to n,{\rho}_n=1 $$
(23)
$$ \forall {\rho}_n=1,n\longrightarrow e\in {\Omega}_e $$
(24)

where Ωe is the set of updated elements, n is the node ID, and e is the element ID, as before. Equation (23) states that the nodal densities inside the updated range are assigned the value 1; Eq. (24) then assigns to Ωe the elements mapped from nodes whose nodal density equals 1. Repeating this process moves the boundary by Dnum layers. The diffused-layer algorithm is listed in Algorithm 2 (see Table 2), with a code sketch after the table.

Fig. 6 The boundary moves one layer toward the outside area (diffused layer number is 1)

Table 2 Algorithm 2
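A minimal MATLAB sketch of Algorithm 2 (Eqs. (23)-(24)), under the same assumptions and naming as the sketch of Algorithm 1; each pass moves the boundary outward by one layer:

for layer = 1:Dnum
    rhoTmp  = zeros(nNode, 1);
    nodesIn = unique(ele2node(OmegaE, :));   % nodes of updated elements
    rhoTmp(nodesIn) = 1;                     % Eq. (23)
    mask   = any(rhoTmp(ele2node) == 1, 2);  % Eq. (24): elements touching them
    OmegaE = find(mask);                     % updated set grown by one layer
end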

Moreover, the choice of Dnum is influenced by the nodal density threshold ρc, since Dnum and ρc together determine the updated range; if ρc is too large, Dnum should be increased to ensure a suitable updated range.

Furthermore, two improvements are proposed to raise the convergence rate of the local-update strategy. In traditional TO, the optimization always generates an approximate structure within few iterations but then takes a long history to converge. Once the structure stabilizes around a certain shape, the local-update strategy can be adopted to accelerate the TO; hence, the moment at which the local-update strategy starts plays an important role. Studying the TO shows that the smaller the change value (the relative difference of the design variables between adjacent iterations), the smaller the structural change. Therefore, we define a startup parameter ST, the threshold of the change value, for starting the local-update strategy. When the change value falls below ST, the TO has usually generated an approximate structure, after which the local-update strategy is applied to achieve the final result. Combined with the multilevel mesh, the approximate structure is easily obtained on the coarser mesh level, so the local-update strategy can start early and further reduce the number of iterations. Section 5.2.1 explains how to choose ST in detail.

When the volume fraction is small, the small-density elements occupy most of the computational resources, yet they have little effect on the accuracy of the computation and the design. They also aggravate the gray-element problem, which makes it difficult for the TO to obtain a black-and-white result. Therefore, the densities of small-density elements are directly set to zero under certain conditions. Specifically, a small-density threshold is defined, and any element density below this threshold is set to zero. This density processing has two significant benefits: (1) it accelerates the convergence rate, and (2) it decreases the gray elements and increases the fully solid elements. To ensure the optimal result meets the volume constraint, this processing is performed before the element density update. Note that unstable iterations occur if the small-density threshold is larger than the volume fraction (VF). A minimal sketch of these two improvements is given below, and the flowchart of the local-update strategy is shown in Fig. 7.
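The sketch uses illustrative variable names (localUpdate flags whether the local-update strategy has started, change is the change value of the text, ST the startup parameter, rhoSmall the small-density threshold) and is our reading of the two improvements, not the paper's exact code:

if ~localUpdate && change < ST
    localUpdate = true;        % structure is approximately stable: start
end                            % the local-update strategy
if localUpdate
    x(x < rhoSmall) = 0;       % reset small densities before the update
                               % (keep rhoSmall < VF to avoid instability)
    % ... then apply the OC update of Eq. (3) only to elements in OmegaE
end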

Fig. 7 The flowchart of the local-update strategy

4 Algorithm implementation

The triple acceleration method (TAM) is implemented in MATLAB (MathWorks, Natick, MA, USA) in this work to solve the minimum compliance problem.

Figure 8 illustrates the flowchart of the TO using TAM (TO-TAM). In each mesh level, the initial-value-based PCG algorithm solves KU = F for U, and the local-update strategy is applied. The convergence criterion is the relative difference of the design variables between adjacent iterations; to accelerate convergence, the coarser mesh levels use a looser convergence criterion.

Fig. 8 The flowchart of the TO using TAM

5 Numerical examples

Three benchmark examples of minimum compliance design are examined in this section to demonstrate the acceleration efficiency of the triple acceleration method. All examples are run on a desktop computer with an Intel Core i7-6700K CPU at 4.00 GHz, 16.0 GB RAM, and MATLAB as the software environment.

In Sect. 5.1, we examine a classical MBB beam to compare the computational efficiency of the three parts of TAM against the conventional TO (Andreassen et al. 2011) and to study the acceleration efficiency of TAM on different meshes. In Sect. 5.2, we discuss how to choose the local-update strategy parameters (ST and Dnum) using the Michell type structure. Finally, Sect. 5.3 studies a larger-scale 3D cantilever beam to show the advantages of TAM in 3D cases.

5.1 MBB beam

Figure 9 shows the half MBB beam, a benchmark problem of TO (Andreassen et al. 2011; Sigmund 2001; Sigmund 2007). Due to symmetry, only half of the MBB beam is modeled, with symmetry boundary conditions. This case study compares the computational efficiency of the three parts of TAM with the conventional TO and investigates the acceleration efficiency of TAM on different meshes.

Fig. 9 The design domain and boundary conditions of the half MBB beam

5.1.1 Acceleration using multilevel mesh

The design domain is discretized with 80 × 40 linear quadrilateral elements, and the volume fraction (VF) is set to 0.5. Figure 10 shows the optimization result of the conventional TO using the classical and efficient 88-line TO code Top88 (Andreassen et al. 2011).

Fig. 10 The optimization result of the conventional TO

The 80 × 40 half MBB beam is easily divided into three levels: the initial mesh is 20 × 10, the middle mesh is 40 × 20, and the final mesh is 80 × 40. The optimization results of each level are shown in Fig. 11.

Fig. 11 The optimization results of each level. a 20 × 10. b 40 × 20. c 80 × 40

Table 3 gives the efficiency comparison between Top88 and the TO with the multilevel mesh algorithm. The iteration numbers for the levels from coarse to fine are 48, 25, and 18, i.e., 48 iterations on mesh level 1, 25 on mesh level 2, and 18 on mesh level 3 (the most refined level). As expected, the computational time of the TO using the multilevel mesh is less than that of Top88, demonstrating the acceleration advantage of the multilevel mesh. Thanks to the density field inheritance, an approximate result is obtained rapidly on the coarse mesh, so the optimization on the refined mesh skips the lengthy early iterations and converges fast: the final level needs only 18 iterations. This result demonstrates the great potential of the multilevel mesh scheme for TO acceleration.

Table 3 Efficiency comparison between Top88 and TO with multilevel mesh

5.1.2 Acceleration using initial-value-based PCG

The preconditioned conjugate-gradient method (PCG) is an iterative solver usually used for large sparse linear systems. In this section, the MBB beam is therefore discretized into 100 × 50 elements to discuss the acceleration effect of the preconditioner and of the initial value Upre. Figure 12 shows the final structures of the TOs using PCG under different conditions.

Fig. 12 The optimization results of TOs using PCG under different conditions. a No preconditioner and Upre. b Only preconditioner. c Both preconditioner and Upre

Table 4 shows the compliance values, iteration numbers, average PCG iteration numbers, and computational times for the corresponding TOs in Fig. 12; it demonstrates that the preconditioner is crucial for PCG, reducing computational time by over 90%. The efficiency is improved further by using Upre, which gives the lowest average PCG iteration count and the minimum computational time. Therefore, both the preconditioner and the initial value Upre have a positive effect on TO acceleration.

Table 4 Efficiency comparison of PCG under different conditions

5.1.3 Acceleration using local-update strategy

In the local-update strategy, the startup parameter ST is 0.04, the nodal density threshold ρc is 0.5, and the diffused layer number Dnum is 8. Element densities below 0.4 are set to zero during the iterations. The design domain is discretized with 80 × 40 square elements, and the volume fraction is 0.5. The final structure is presented in Fig. 13.

Fig. 13 The optimization results. a Top88. b Local-update strategy

Table 5 shows the efficiency comparison between the TO with the local-update strategy and Top88. Compared with Top88, the TO with the local-update strategy requires nearly half the iterations and less computational time, demonstrating that the local-update strategy is an excellent accelerating approach. Since the densities of small-density elements are reset to zero, the final result contains fewer gray elements and more fully solid elements, giving a clearer black-and-white structure. Furthermore, the compliance of the local-update TO can be smaller. Note that the small-density reset is performed just before the density update, so the volume fraction remains accurate throughout the optimization; the optimal result thus meets the volume constraint with a lower compliance, making the structure more accurate. More details about the local-update strategy are given in Sect. 5.2.

Table 5 Efficiency comparison between Top88 and TO with local-update strategy

5.1.4 Acceleration using triple acceleration method

To further assess the acceleration efficiency of the proposed method, we model the MBB beam with meshes of 40 × 20, 80 × 40, and 160 × 80 linear quadrilateral elements, respectively, with the volume fraction (VF) set to 0.5. The comparison between the TO with the triple acceleration method (TO-TAM) and Top88 on the different meshes is shown in Fig. 14.

Fig. 14 The optimization results of Top88 and TO-TAM for different meshes. a Top88 with 40 × 20. b TO-TAM with 40 × 20. c Top88 with 80 × 40. d TO-TAM with 80 × 40. e Top88 with 160 × 80. f TO-TAM with 160 × 80

Figure 15 presents the convergence histories of Top88 and TO-TAM on the different meshes, and the efficiency comparison is given in Table 6. Since the initial element density equals VF, the volume fraction during optimization always equals 0.5. In Fig. 15, the objective function (compliance) generally decreases with the iteration number in all cases, with only a small oscillation when the mesh level changes in TO-TAM. Owing to the multilevel mesh, the per-iteration time of the coarser mesh levels is much smaller than that of the finest level, while the per-iteration time of the finest level is close to that of Top88 at the same mesh scale. Since the approximate structure is inherited from the coarser levels, the local-update strategy can start quickly and converge fast, further decreasing the number of iterations. Moreover, the initial-value-based PCG accelerates the solution of the FE equations in each iteration. In Table 6, the speedup ranges from 1.55 to 2.02, meaning TO-TAM reduces computational time by 35-51% compared with Top88, demonstrating its higher efficiency; the acceleration is better for larger meshes. Since the densities of small-density elements are reset to zero in the local-update strategy, TO-TAM obtains smaller compliance values than Top88, which further demonstrates its accuracy.

Fig. 15 The convergence history of Top88 and TO-TAM for different meshes. a Top88 with 40 × 20. b TO-TAM with 40 × 20. c Top88 with 80 × 40. d TO-TAM with 80 × 40. e Top88 with 160 × 80. f TO-TAM with 160 × 80

Table 6 Efficiency comparison between Top88 and TO-TAM

5.2 Michell type structure

In this section, we discuss how to choose the parameters of the local-update strategy, using the Michell type structure (Luo et al. 2009; Wang et al. 2017) as an example. The design domain and boundary conditions are presented in Fig. 16: a fixed constraint at the bottom-left corner, a roller constraint at the bottom-right corner, and a mid-span point load.

Fig. 16 The design domain and boundary conditions of the Michell type structure

5.2.1 Startup parameter

Studying the conventional TO with the classical Top88 code (Andreassen et al. 2011) reveals a connection between the structure and the change value: the smaller the change value, the smaller the structural change. When the structure stabilizes around a certain shape, the local-update strategy is applied. For the Michell type structure, we test various values of ST, the threshold of the change value; the local-update strategy is activated once the change value drops below ST. The final structures of the local-update strategy under different ST are presented in Fig. 17, where "no ST" means the TO starts directly with the local-update strategy. The Michell type structure is discretized into 80 × 40 elements, the nodal density threshold ρc is 0.5, and the diffused layer number Dnum is 8. Element densities below 0.4 are set to zero during the iterations.

Fig. 17 The intermediate and final optimized structures: (a) Top88; the rest are local-update TO with different ST: (b) ST = 0.04, (c) ST = 0.08, (d) ST = 0.16, (e) no ST

From Table 7, the startup parameter ST is best chosen near 0.08 for this benchmark problem. If ST is too large, the final structure has isolated gray elements; if ST is too small, the acceleration efficiency is low. In addition, Table 7 shows that starting the TO without defining ST performs badly, producing many isolated gray elements; the reason is that a TO starting directly with the local-update strategy is not based on an approximate structure, unlike a TO with a startup parameter. In general, a suitable ST (around 0.1) yields a fast and accurate TO result.

Table 7 Efficiency comparison between Top88 and local-update TO with different ST

5.2.2 Diffused layer number

Another parameter to be discussed is the diffused layer number Dnum. One problem of the local-update strategy is isolated gray elements: in the local-update iterations, elements outside the updated range never change and thus remain isolated gray elements. The diffused layer is applied to solve this problem. For the 80 × 40 Michell type structure, the final structures of the TOs with different diffused layer numbers Dnum are shown in Fig. 18. The startup parameter ST is set to 0.12; the other parameters are the same as in Sect. 5.2.1.

Fig. 18 The final optimized structures: left, Top88; the rest are local-update TO with different Dnum

Table 8 lists the compliance values, iteration numbers, and computational times for the TOs illustrated in Fig. 18. As expected, a small Dnum leads to isolated gray elements, and a large Dnum leads to more computational time. As Dnum increases, the compliance decreases, indicating improved accuracy. For this benchmark problem, Dnum should be larger than 5. Note that the optimal Dnum may differ between cases; e.g., Dnum can be set to 2 for the 3D example in Sect. 5.3.

Table 8 Efficiency comparison between Top88 and local-update TO with different Dnum

In this section, we have found that both the startup parameter ST and the diffused layer number Dnum strongly affect the computational cost and accuracy of the local-update strategy; for different TO problems, the parameters should be chosen according to the actual conditions.

5.3 3D cantilever beam

Figure 19 shows an example of a 3D cantilever beam (Liu and Tovar 2014), fully constrained at one end and loaded by a unit distributed vertical load applied downward on the lower free edge. The beam is discretized into 60 × 30 × 4 linear hexahedral elements, and the prescribed volume fraction VF is set to 0.3. For the TAM, the 3D cantilever beam is divided into two levels: the coarser mesh is 30 × 15 × 2 and the final mesh is 60 × 30 × 4. The startup parameter ST is 0.12, the convergence criterion is 0.01, and the nodal density threshold ρc is 0.5. Since the width is only 4 elements, Dnum is suitably set to 2. Element densities below 0.5 are set to zero. The final structure optimized by TO-TAM is presented in Fig. 20b, together with that of the conventional TO (using the classical top3d code by Liu and Tovar (2014)) in Fig. 20a.

Fig. 19 The design domain and boundary conditions of the 3D cantilever beam

Fig. 20 The final structure optimized by a top3d and b TO-TAM

Table 9 presents a clear comparison between TO-TAM and top3d. The iteration numbers of TO-TAM for the mesh levels from coarse to fine are 45 and 32, respectively, and TO-TAM is much faster than top3d (it costs only about 1/5 of the time of top3d). The compliance of TO-TAM is also lower than that of top3d, which indicates that the optimized result of TAM is more accurate. This shows that TAM also achieves highly efficient 3D TO, demonstrating its advantages for 3D cases.

Table 9 Efficiency comparison between top3d and TO-TAM

To further assess the capability of TAM for 3D cases, we test 3D TO-TAM on three progressively refined cases with meshes of 60 × 30 × 4, 80 × 40 × 6, and 120 × 60 × 8 linear hexahedral elements, using 2, 2, and 3 mesh levels, respectively.

The optimized results are shown in Fig. 21, and the comparison across the cases is given in Table 10. As expected, the computational time and the maximum memory allocation increase with the mesh scale. For a large-scale problem, the mesh can be divided into more than two levels as long as the TO on the initial mesh level can obtain the approximate structure; e.g., three levels are used for the 120 × 60 × 8 case. All three cases successfully obtain consistent optimized results (see Fig. 21), and the iteration numbers for the mesh levels from coarse to fine are 45 and 32 (60 × 30 × 4 case); 53 and 45 (80 × 40 × 6 case); and 45, 32, and 37 (120 × 60 × 8 case), respectively, demonstrating the robust convergence of TO-TAM.

Fig. 21 The results of 3D TO-TAM for different meshes: a 60 × 30 × 4, b 80 × 40 × 6, and c 120 × 60 × 8

Table 10 Comparison of 3D TO-TAM for different meshes

6 Conclusions

This paper has proposed a triple acceleration method (TAM) for topology optimization (TO) consisting of three parts: multilevel mesh, initial-value-based PCG, and a local-update strategy. Three benchmark examples have been tested to verify the proposed method. The first example compares the acceleration efficiency of the three parts of TAM with the classical TO code Top88, showing that all three parts accelerate the TO; testing TAM at different mesh scales shows that the TO using TAM reduces computational time by 35-51% compared with Top88, with better results for large-scale problems. The second example illustrates the importance of the parameter choices in the local-update strategy. The third example demonstrates that TAM also has excellent performance in accelerating 3D TO problems, performing even better than in 2D and reducing computational time by 80%. All examples demonstrate that the proposed TAM provides excellent acceleration for TO. Although the current work takes SIMP as an example, TAM will be extended to other types of TO methods, such as level-set-based TO and MMC-based TO (Wang and Benson 2016a; Xie et al. 2018), and combined with advanced methods such as isogeometric analysis (Wang and Benson 2016b; Wang et al. 2015a) and GPU parallel computing (Martínez-Frutos and Herrero-Pérez 2017; Xia et al. 2017) in the future.