1 Introduction

Topology optimization, first introduced by Bendsoe (1989), has matured over the last few decades (Xia and Breitkopf 2014, 2017) and has had a significant influence on design optimization research.

The classical topology optimization problem consists of optimizing the material distribution in two or three dimensions so as to minimize the structural compliance, i.e., finding the density distribution over a voxel grid for a chosen volume fraction under a prescribed set of external loads and boundary conditions. Density-based methods are today the most widely used by engineers, along with level-set methods (Zhou et al. 2019), topological derivative procedures (Allaire et al. 2004; Norato et al. 2007), and phase field techniques (Xia et al. 2018), among others (Ferro et al. 2019). A comprehensive review of developments in topology optimization after 2000 may be found in Deaton and Grandhi (2014).

With the modern-day mastery of additive manufacturing techniques, topology optimization is increasingly being applied in the design of engineered materials for aerospace applications (Meng et al. 2019b). However, it remains surprisingly far from mainstream adoption among structural engineers, despite the decades of research devoted to the subject. One of the key challenges in topology optimization has been dealing with large-scale or high-dimensional design problems that can involve millions or even billions of degrees of freedom (Aage et al. 2017). During each iteration of the optimization process, we need to solve the equilibrium equations for the computation-intensive numerical/finite element (FE) model characterizing the discretized structure. This central and still unresolved issue of prohibitively high computational effort casts an ever-present pall on its large-scale application to industrial design.

High-performance computing approaches have been proposed in the literature to address this problem and are expectedly successful (Mahdavi et al. 2006; Aage and Lazarov 2013; Aage et al. 2015, 2017), but most, if not all, require an increase in computing resources to realize their full potential in reducing the computational time.

Reanalysis methods have been used in topology optimization since the seminal paper of Kirsch and Papalambros (2001), where they proposed a unified approach for structural reanalysis in topology optimization. Wang et al. (2007) and Amir et al. (2009) proposed methods based on the use of Krylov subspaces. In a different paper, Amir et al. (2010) proposed the construction of a reduced basis using the combined approximations method. Reanalysis methods were also used in Amir et al. (2012), He and Jiang (2012), and Kirsch and Bogomolni (2004). Yoon (2010) used eigenmodes and Ritz vectors for the reduced basis in topology optimization for vibration response. Gogu (2015) extended the approach of Kirsch and Papalambros (2001) and used Gram–Schmidt orthonormalization to construct a reduced basis on-the-fly based on the violation of an error residual. A survey of the available literature reveals a recent resurgence of interest in reanalysis in topology optimization (Zheng et al. 2017; Sun et al. 2018; Senne et al. 2019).

Reduced-order modeling (ROM), in particular supervised manifold learning, has become a popular approach in a variety of fields today, including computational mechanics and structural optimization (Dutta et al. 2018). The basic premise of projection-based ROM (Amsallem et al. 2015) involves mapping the higher dimensional physics onto a lower dimensional space through an appropriate reduced basis calculated using various methods depending on the nature of the problem at hand. While the field is still in its infancy (given the magnitude of potential improvements), the results obtained thus far have been more than promising. Principal components analysis (PCA) (Amsallem et al. 2015; Pearson 1901), proper generalized decomposition (PGD) (Chinesta et al. 2011), hyper-reduction (Ryckelynck et al. 2006), and reduced basis methods (Hoang et al. 2016) are among the most prominent schools of this field today. Of these, PCA, also called proper orthogonal decomposition or POD (Berkooz et al. 1993; Xiao et al. 2009; Dulong et al. 2007; Raghavan and Breitkopf 2013; Raghavan et al. 2013a; Raghavan et al. 2013b; Meng et al. 2018; Madra et al. 2018; Xiao et al. 2018; Meng et al. 2019a), is an a posteriori statistical method that learns the covariance structure of complex multivariate data.

With the very recent exceptions of Alaimo et al. (2018), Ferro et al. (2019), and Choi et al. (2019), to the knowledge of the authors, virtually no work has been done on coupling topology optimization with POD. The work of Ferro et al. (2019) involves applying POD to the density map and yields a very efficient numerical scheme which loses precision depending on the number of modes. Since their ROM was not computed “on-the-fly,” i.e., with constant monitoring against the full-field model, the optimized topology density map they obtained may depend on the basis size. In addition, Choi et al. (2019) presented a novel approach to ROM-supplemented topology optimization using inexact linear solutions obtained by incremental SVD during the initial stages of the optimization (when the accuracy requirements are not as strict), and Krylov subspace methods with ROM recycling closer to convergence, where greater accuracy is expected.

In this work, inspired by Kirsch and Papalambros (2001) and Gogu (2015), we improve the computational efficiency by mapping the displacement field quantities of the large-scale problem to a low-dimensional space through an appropriate basis, which we calculate using POD. To render the method more accessible on a workstation, we use an iterative solver for the full-field solution. The method of moving asymptotes (MMA) is used for the optimization as an alternative to the classical optimality criteria (OC) method, together with a dedicated version of the sensitivity analysis.

The remainder of the paper is organized in the following manner: in Section 2, the theoretical formulation is formally presented beginning with classical topology optimization, followed by the reduced-order basis construction and sensitivity analysis. In Section 3, we summarize the algorithm for on-the-fly basis construction using POD. Section 4 details the numerical investigations using benchmark 3D compliance minimization problems followed by a discussion. The paper closes with concluding comments and recommendations for future work.

Extension to non self-adjoint problems is discussed in the Appendix.

2 Theoretical formulation

The mathematical formulation of the discrete material distribution problem may be expressed as follows:

$$ \begin{array}{@{}rcl@{}} \min\limits_{\rho} c(\rho) = F^{T}U = U^{T} K U \\ \sum\limits_{e=1}^{N} v_{e}\rho_{e} = v_{\text{frac}}V < V \\ 0 \leq \rho_{e} \leq 1,\quad e=1,\dots,N \\ K U = F \end{array} $$
(1)

where c is the compliance of the structure, ρ is the vector of design variables consisting of the individual element (e) densities ρe, F is the external forces vector, U is the FE displacement vector, K is the global stiffness matrix of the structure, ve the volume of an element e, and V the maximum prescribed volume for the entire structure. The number of elements in the 2D/3D grid is N.

Using a modified solid isotropic material with penalization (SIMP) model (Amir et al. 2010), the Young's modulus of an element can be expressed as a function of its density ρe:

$$ \begin{array}{@{}rcl@{}} E_{e}(\rho_{e})=E_{\min }+{\rho_{e}^{p}} (E_{\text{nominal}}-E_{\min }) \end{array} $$
(2)
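For illustration, a minimal NumPy sketch of this interpolation is given below; the function name is hypothetical, and the default parameter values simply mirror the benchmark settings used later in Section 3.

```python
import numpy as np

def simp_modulus(rho, E_nominal=1.0, E_min=1e-9, p=3.0):
    """Modified SIMP interpolation of Eq. (2): E_e = E_min + rho_e^p (E_nominal - E_min)."""
    rho = np.asarray(rho, dtype=float)
    return E_min + rho**p * (E_nominal - E_min)
```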

For topology optimization of large-scale structures, the bulk of the computational cost expectedly stems from the requirement to compute the numerical solution of the equilibrium equations at each iteration:

$$ \mathrm{K U} = F $$
(3)

Computing this full-field solution for large-scale topology optimization problems involves solving a very large system of equations that can consist of up to millions or billions (Aage et al. 2017) of degrees of freedom. To improve the scalability of the approach and eventually allow for implementation on parallel computing systems (not treated in this particular paper), the FEA for the full-field solution is performed using a preconditioned conjugate gradient (PCG) solver, similar to Mahdavi et al. (2006), except that an incomplete Cholesky decomposition of K is used as the preconditioner.

The authors must point out that PCG with incomplete Cholesky preconditioning is no longer the state-of-the-art solver, and computation times using multigrid preconditioning (Tatebe 1993), the current gold standard according to the literature, may well differ from those listed in this work.

The basic operations are given in Algorithm 1, which is a standard procedure that may be found in any textbook on numerical methods. However, the iterative solution is still computationally expensive, not only because it involves a large number of degrees of freedom, but also because of the preconditioning phase required by the poorly conditioned matrix K (large contrast between the nearly void \(E_{\min}\) and solid \(E_{\text{nominal}}\) moduli). To alleviate this issue, we propose a ROM procedure in the following subsections.

Algorithm 1 (preconditioned conjugate gradient solution of the equilibrium equations)
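For readers who prefer code to pseudocode, a rough Python/SciPy sketch of such a full-field solve is given below. SciPy does not ship an incomplete Cholesky factorization, so this sketch substitutes an incomplete LU (spilu) preconditioner, which plays an analogous role for the symmetric positive definite K considered here; the function name and tolerances are illustrative assumptions, not the authors' implementation.

```python
import scipy.sparse.linalg as spla

def solve_full_field(K, f, maxiter=10000):
    """Approximate solution of K u = f with a preconditioned conjugate gradient solver.

    K is a sparse, symmetric positive definite stiffness matrix. The paper uses an
    incomplete Cholesky preconditioner; an incomplete LU factorization stands in for
    it here, since SciPy provides spilu but no incomplete Cholesky out of the box.
    """
    ilu = spla.spilu(K.tocsc(), drop_tol=1e-5, fill_factor=10)
    M = spla.LinearOperator(K.shape, ilu.solve)   # preconditioner as a linear operator
    u, info = spla.cg(K, f, M=M, maxiter=maxiter)
    if info != 0:
        raise RuntimeError(f"PCG did not converge (info={info})")
    return u
```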

2.1 Projection-based reduced-order modeling

To reduce the computational effort during an iteration of the optimizer loop, we map the displacement field quantity (i.e., U) of the above large-scale problem (3) to a low-dimensional space through an appropriate orthonormal basis Φ (i.e., \({\Phi}^{T}{\Phi} = I\)) calculated on-the-fly using solution snapshots from the previous iterations.

The basis \({\Phi }=[\phi _{1} {\dots } \phi _{N_{b}}]\) is obtained using an effective set of Nb “snapshots” of the displacement field \(\boldsymbol U_{\text {temp}}=[U_{1},U_{2}, {\dots } ,U_{N_{b}}]\), each obtained by solving (3) during the main optimization, centered around the mean snapshot \(\bar u = ({\sum }_{k=1}^{N_{b}} U_{k})/N_{b}\). (Later on, we will show that Φ may be calculated by singular value decomposition (SVD) of the centered snapshot matrix built from Utemp.)

The problem projected onto the reduced basis transforms into the reduced system:

$$ \begin{array}{@{}rcl@{}} {\Phi}^{T} K U_{\text{rb}} ={\Phi}^{T} F \end{array} $$
(4)

where Urb is the approximate solution to the higher dimensional displacement vector, obtained as a linear combination of the basis vectors weighted by the projection coefficients α (plus the mean snapshot):

$$ \begin{array}{@{}rcl@{}} U_{\text{rb}} = {\Phi} \alpha + \bar u \end{array} $$
(5)

Equation (4) thus becomes:

$$ \begin{array}{@{}rcl@{}} {\Phi}^{T} K ({\Phi} \alpha + \bar u) = {\Phi}^{T} F \end{array} $$
(6)

The main consequence is that any of the displacement vector snapshots Ui may be expressed as a finite basis linear combination:

$$ \begin{array}{@{}rcl@{}} U_{i} \approx U^{i}_{\text{rb}} = \bar u + {\sum}_{k=1}^{N_{b}} {\alpha^{i}_{k}} \phi_{k} = \bar u + {\Phi} \alpha^{i} \end{array} $$
(7)

where the αi depend on the choice of the basis Φ. The error residual is given by the following:

$$ \epsilon_{\text{rb}}^{2} = \frac{\parallel K U_{\text{rb}} - F \parallel^{2}}{\parallel F \parallel^{2}}= \frac{\parallel K({\Phi} \alpha + \bar u) - F \parallel^{2}}{\parallel F \parallel^{2}} $$
(8)

corresponding to the relative error between the internal forces stemming from the approximate reduced basis solution and the actually applied forces. If the approximate solution Urb were exact, the residual would be zero because the exact solution would satisfy the equilibrium equations KU = F.
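As a minimal sketch (with illustrative NumPy-based names, assuming a dense or sparse stiffness matrix K), the reduced solve of (4)-(6) and the residual of (8) could be evaluated as follows:

```python
import numpy as np

def reduced_solve(K, f, Phi, u_bar):
    """Galerkin projection of K u = f onto the basis Phi, Eqs. (4)-(8).

    Phi is the (N x Nb) orthonormal basis and u_bar the mean snapshot. Returns the
    approximate displacement U_rb, the coefficients alpha, and the relative residual.
    """
    K_rb = Phi.T @ (K @ Phi)                  # reduced stiffness (Nb x Nb)
    f_rb = Phi.T @ (f - K @ u_bar)            # reduced right-hand side, from Eq. (6)
    alpha = np.linalg.solve(K_rb, f_rb)       # projection coefficients
    u_rb = Phi @ alpha + u_bar                # Eq. (5)
    eps_rb = np.linalg.norm(K @ u_rb - f) / np.linalg.norm(f)   # Eq. (8)
    return u_rb, alpha, eps_rb
```

Since K_rb is only Nb × Nb, this reduced solve is negligible next to a full-field PCG solve.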

The goal then is to use Urb in place of U for the optimization, depending on the error threshold 𝜖rb. If this error exceeds the threshold, we run the full-field FE model, i.e., Equation (3), at that particular loop iteration to get a fresh displacement vector that is then used to refine the basis. Note that in order to retain generality as far as possible, we hold off on presenting the exact method of calculating the basis until the end of this section, since much of this section is relevant regardless of the choice of Φ. The exact basis update scheme is described in the next subsection.

2.2 Sensitivity analysis

When the ROM, i.e., Urb is used in place of the FE solution, the original objective function (compliance) may be expressed as follows:

$$ \begin{array}{@{}rcl@{}} c(\rho_{e}) = U_{\text{rb}}^{T} K(\rho_{e}) U_{\text{rb}} = ({\Phi} \alpha + \bar u)^{T} K(\rho_{e}) ({\Phi} \alpha + \bar u) \end{array} $$
(9)

The use of this expression, however, entails the verification of some additional constraints. The first constraint represents the Galerkin-projected, i.e., reduced, system of equations (replacing the original FE system):

$$ \begin{array}{@{}rcl@{}} K_{\text{rb}} U_{\text{rb}} = F_{\text{rb}} \text{ or } \\ {\Phi}^{T} K U_{\text{rb}} = {\Phi}^{T} K ({\Phi} \alpha + \bar u) = {\Phi}^{T} F \end{array} $$
(10)

The second constraint applies to the snapshots \(U_{1} {\dots } U_{N_{b}}\) used to generate the orthogonal basis vectors, each of which (by definition) was obtained by solving the full equilibrium equations during the particular iteration at which it was added to the set of snapshots:

$$ \begin{array}{@{}rcl@{}} K_{i} U_{i}=F \text{ where } i=1,2 {\dots} N_{b} \end{array} $$
(11)

where Ki is simply the stiffness matrix for which the snapshot vector Ui was obtained. In the completely general case, the sensitivity of the compliance calculated using the ROM is potentially different from the sensitivity for the original problem.

Following Kirsch and Papalambros (2001) and Gogu (2015), the conventional way to calculate the modified sensitivity is by using the adjoint equation, using Lagrange multipliers \(\mu _{i},\lambda _{i},i=1 {\dots } N_{b}\) for the two constraints in (10) and (11).

The modified objective function may then be represented as follows:

$$ \begin{array}{@{}rcl@{}} c(\rho_{e})= ({\Phi} \alpha + \bar u)^{T} K ({\Phi} \alpha + \bar u) - 2 \mu^{T}[{\Phi}^{T} K ({\Phi} \alpha + \bar u) - {\Phi}^{T} F] \\ - \sum\limits_{i=1}^{N_{b}} {\lambda_{i}^{T}} (K_{i} U_{i}-F) \end{array} $$
(12)

This expression may be simplified as follows:

$$ \begin{array}{@{}rcl@{}} c(\rho_{e}) = [ \alpha^{T} {\Phi}^{T} K {\Phi} \alpha - 2 \mu^{T} ({\Phi}^{T} K {\Phi} \alpha - {\Phi}^{T} F ) ] + [ 2 \bar u^{T} K {\Phi} \alpha + \bar u^{T} K \bar u \\ - 2 \mu^{T} {\Phi}^{T} K \bar u ] - [ \sum\limits_{i=1}^{N_{b}} {\lambda_{i}^{T}} (K_{i} U_{i}-F) ] = c_{1}(\rho_{e}) + c_{2}(\rho_{e}) + c_{3}(\rho_{e}) \end{array} $$
(13)

where c1,c2, and c3 are the terms within the square brackets.

Each of the three terms may then be individually evaluated as follows:

$$ \begin{array}{@{}rcl@{}} \frac{\partial c_{1}}{\partial \rho_{e}} = 2 (\alpha-2 \mu)^{T} \frac{\partial {\Phi}^{T}}{\partial \rho_{e}} K {\Phi} \alpha + 2 (\alpha-\mu)^{T} {\Phi}^{T} K {\Phi} \frac{\partial \alpha}{\partial \rho_{e}} \\ + (\alpha-2\mu)^{T} {\Phi}^{T} \frac{\partial K}{\partial \rho_{e}} {\Phi} \alpha + 2 \mu^{T} \frac{\partial {\Phi}^{T}}{\partial \rho_{e}} F \end{array} $$
(14)
$$ \begin{array}{@{}rcl@{}} \frac{\partial c_{2}}{\partial \rho_{e}} = 2 (\alpha-\mu)^{T} {\Phi}^{T} K \frac{\partial \bar u}{\partial \rho_{e}} + 2 (\alpha-\mu)^{T} {\Phi}^{T} \frac{\partial K}{\partial \rho_{e}} \bar u + \bar u^{T} \frac{\partial K}{\partial \rho_{e}} \bar u \\ + 2 (\alpha-\mu)^{T} \frac{\partial {\Phi}^{T}}{\partial \rho_{e}} K \bar u + 2 \frac{\partial \alpha^{T}}{\partial \rho_{e}} {\Phi}^{T} K \bar u + 2 \frac{\partial \bar u^{T}}{\partial \rho_{e}} K \bar u \\ = 2 [(\alpha-\mu)^{T} {\Phi}^{T} + \bar u ] K \frac{\partial \bar u}{\partial \rho_{e}} + [2(\alpha-\mu)^{T} {\Phi}^{T} + \bar u ] \frac{\partial K}{\partial \rho_{e}} \bar u \\ + 2 \frac{\partial \alpha^{T}}{\partial \rho_{e}} {\Phi}^{T} K \bar u + 2 (\alpha-\mu)^{T} \frac{\partial {\Phi}^{T}}{\partial \rho_{e}} K \bar u \end{array} $$
(15)

and the last term:

$$ \begin{array}{@{}rcl@{}} \frac{\partial c_{3}}{\partial \rho_{e}} = - \sum\limits_{i=1}^{N_{b}} {\lambda_{i}^{T}} \frac{\partial K_{i}}{\partial \rho_{e}} U_{i} - \sum\limits_{i=1}^{N_{b}} {\lambda_{i}^{T}} K_{i} \frac{\partial U_{i}}{\partial \rho_{e}} \end{array} $$
(16)

In order to solve the adjoint equation, we remember that we are free to choose the Lagrange multipliers as we see fit. A useful substitution is \(\mu =(\alpha + {\Phi }^{T} \bar u)\) giving the following:

$$ \begin{array}{@{}rcl@{}} \frac{\partial c_{1}}{\partial \rho_{e}} = - 2 (\alpha+2 {\Phi}^{T} \bar u)^{T} \frac{\partial {\Phi}^{T}}{\partial \rho_{e}} K {\Phi} \alpha - 2 \bar u^{T} K {\Phi} \frac{\partial \alpha}{\partial \rho_{e}} \\ - (\alpha+2 {\Phi}^{T} \bar u)^{T} {\Phi}^{T} \frac{\partial K}{\partial \rho_{e}} {\Phi} \alpha + 2 (\alpha+{\Phi}^{T} \bar u)^{T} \frac{\partial {\Phi}^{T}}{\partial \rho_{e}} F \end{array} $$
(17)

and

$$ \begin{array}{@{}rcl@{}} \frac{\partial c_{2}}{\partial \rho_{e}} = 2 \frac{\partial \alpha^{T}}{\partial \rho_{e}} {\Phi}^{T} K \bar u - 2 \bar u^{T} {\Phi} \frac{\partial {\Phi}^{T}}{\partial \rho_{e}} K \bar u - \bar u^{T} \frac{\partial K}{\partial \rho_{e}} \bar u \end{array} $$
(18)

From the above, we end up with the following:

$$ \begin{array}{@{}rcl@{}} \frac{\partial c}{\partial \rho_{e}} = - 2 (\alpha+2 {\Phi}^{T} \bar u)^{T} \frac{\partial {\Phi}^{T}}{\partial \rho_{e}} K {\Phi} \alpha - (\alpha+2 {\Phi}^{T} \bar u)^{T} {\Phi}^{T} \frac{\partial K}{\partial \rho_{e}} {\Phi} \alpha \\ + 2 (\alpha+{\Phi}^{T} \bar u)^{T} \frac{\partial {\Phi}^{T}}{\partial \rho_{e}} F - 2 \bar u^{T} {\Phi} \frac{\partial {\Phi}^{T}}{\partial \rho_{e}} K \bar u \\ - \bar u^{T} \frac{\partial K}{\partial \rho_{e}} \bar u - \left[ \sum\limits_{i=1}^{N_{b}} {\lambda_{i}^{T}} \frac{\partial K_{i}}{\partial \rho_{e}} U_{i} + \sum\limits_{i=1}^{N_{b}} {\lambda_{i}^{T}} K_{i} \frac{\partial U_{i}}{\partial \rho_{e}} \right] \end{array} $$
(19)

which may further be simplified to the following:

$$ \begin{array}{@{}rcl@{}} \frac{\partial c}{\partial \rho_{e}} = -U_{\text{rb}}^{T} \frac{\partial K}{\partial \rho_{e}} U_{\text{rb}} + 2 U_{rb}^{T} {\Phi} \frac{\partial {\Phi}^{T}}{\partial \rho_{e}} (F-\mathrm{K U}_{\text{rb}}) \\ - \left[ \sum\limits_{i=1}^{N_{b}} {\lambda_{i}^{T}} \frac{\partial K_{i}}{\partial \rho_{e}} U_{i} + \sum\limits_{i=1}^{N_{b}} {\lambda_{i}^{T}} K_{i} \frac{\partial U_{i}}{\partial \rho_{e}} \right] \\ = -U_{\text{rb}}^{T} \frac{\partial K}{\partial \rho_{e}} U_{\text{rb}} + 2 U_{\text{rb}}^{T} {\Phi} \frac{\partial {\Phi}^{T}}{\partial \rho_{e}} {\Delta} F - \left[ \sum\limits_{i=1}^{N_{b}} {\lambda_{i}^{T}} \frac{\partial K_{i}}{\partial \rho_{e}} U_{i} + \sum\limits_{i=1}^{N_{b}} {\lambda_{i}^{T}} K_{i} \frac{\partial U_{i}}{\partial \rho_{e}} \right] \end{array} $$
(20)

The above equation is a generalized version of the expression obtained by Gogu (2015), in the context of an orthonormal basis Φ and including the effect of the mean snapshot \(\bar u\), and is valid for any reduced approach in the Galerkin family. (Note that if the mean \(\bar u\) were assumed to be zero, i.e., centered snapshots, the second set of terms within parentheses would vanish, yielding exactly the same expression as in Gogu 2015.)

To go further and obtain a final expression, we present the update strategy in the next subsection.

2.3 On-the-fly reduced basis construction and update strategy

In the last equation of the previous subsection, we still need to determine \(\lambda _{1} {\dots} \lambda _{N_{b}}\) and \(\frac {\partial {\Phi }}{\partial \rho _{e}}\) so as to obtain \(\frac {\partial c}{\partial \rho _{e}}\), and these will depend on the particular update strategy, which is explained in detail in this subsection.

After i ≥ Nb iterations of a classical topology optimization procedure, we have already calculated Nb displacement vectors (\(U_{1} {\dots } U_{N_{b}}\)) by the usual process of solving the full equilibrium equations in (3). As hinted earlier, the subspace generated by these Nb previously calculated vectors can be used to construct a reduced basis Φ with which to estimate the displacement vector at the next iteration (i + 1).

This means that the corresponding (approximate) displacement vector is obtained using the ROM in (5), which calculates the reduced state variables at the current iteration (i + 1) (and, thus, an approximation of U) by solving the equilibrium equations projected on the subspace generated by the Nb displacement vector snapshots.

At iteration (i + 2), a new approximation of the displacement vector can still be calculated using the ROM with the same subspace generated by the first Nb displacement vectors. This process may be applied until the approximate solution using the ROM is no longer sufficiently accurate, based, for example, on a threshold on the value of the residual 𝜖rb in (8), at which point, we use (3) to get a fresh snapshot vector to replace the oldest stored vector, and thus refine the basis Φ.

So whenever the ROM is used, we have Nb basis vectors that are only updated as and when the residual exceeds our pre-specified tolerance, by re-running (3) and replacing the oldest stored snapshot vector. When the residual is below the tolerance, we use Urb instead.

This means that we do not use a continuously evolving basis Φ in this work past the first Nb iterations (which are used to determine the initial basis); rather, the basis is only updated using a fresh FE solution to modify Utemp when the error residual 𝜖rb in (8) is unacceptably high. If the residual is within the tolerance, we reuse the existing Φ.

The basic approach is given in Algorithm 2.

Algorithm 2 (basic approach for the on-the-fly ROM-coupled optimization)
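For illustration, the decision logic behind Algorithm 2 might be sketched as follows in Python, reusing the hypothetical helpers solve_full_field and reduced_solve sketched earlier and build_basis_pca sketched in Section 2.4; this is a simplified outline under those assumptions, not the authors' implementation.

```python
def displacement_update(K, f, snapshots, Phi, u_bar, eps_tol, Nb):
    """One optimizer iteration's displacement computation (sketch of Algorithm 2)."""
    if Phi is not None:
        u_rb, _, eps_rb = reduced_solve(K, f, Phi, u_bar)
        if eps_rb <= eps_tol:
            return u_rb, snapshots, Phi, u_bar     # ROM accepted, basis unchanged
    # Residual too large (or basis not yet built): fresh full-field solve
    u_full = solve_full_field(K, f)
    snapshots.append(u_full)
    if len(snapshots) > Nb:
        snapshots.pop(0)                           # replace the oldest snapshot
    Phi, u_bar = build_basis_pca(snapshots)        # refresh the basis (Section 2.4)
    return u_full, snapshots, Phi, u_bar
```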

In addition, when the reduced basis Φ is used to obtain Urb, \(K_{1} {\dots } K_{N_{b}}\) and \(U_{1} \dots U_{N_{b}}\) are not continuous functions of the current density ρe (having been obtained during previous basis-changing iterations). This in turn applies to the basis Φ (obtained from the snapshots Ui). As a result, most of the terms in the previously obtained expression vanish.

This ultimately means that we recover the classical expression for the sensitivity for our particular approach, i.e.,

$$ \begin{array}{@{}rcl@{}} \frac{\partial c}{\partial \rho_{e}} = -U_{rb}^{T} \frac{\partial K}{\partial \rho_{e}} U_{\text{rb}} \end{array} $$
(21)
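For the SIMP model of Eq. (2), ∂K/∂ρe involves only the stiffness of element e, so (21) reduces to the usual element-wise loop. A hedged sketch follows; the arrays edof, k0, and rho are assumptions about the surrounding FE code (element connectivity, unit-modulus element stiffness, and element densities).

```python
import numpy as np

def compliance_sensitivity(u_rb, edof, k0, rho, E_nominal=1.0, E_min=1e-9, p=3.0):
    """Element-wise compliance sensitivity of Eq. (21) with the SIMP model of Eq. (2).

    edof[e] lists the global dofs of element e; k0 is the unit-modulus element
    stiffness matrix (identical for all elements on a regular voxel grid).
    """
    dc = np.zeros(len(edof))
    for e, dofs in enumerate(edof):
        ue = u_rb[dofs]
        dE = p * rho[e]**(p - 1) * (E_nominal - E_min)   # dE_e / d rho_e
        dc[e] = -dE * (ue @ (k0 @ ue))                   # -U_rb^T (dK/d rho_e) U_rb
    return dc
```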

In the next subsection, we complete this section by describing the procedure of constructing Φ from the FE solutions \(U_{1} {\dots } U_{N_{b}}\) using PCA.

2.4 Construction of ROM (Φ and Urb) using PCA

As explained earlier, we map the displacement field quantity of the above large-scale problem (i.e., U) to a low-dimensional space through an appropriate orthonormal basis Φ. The higher dimensional data may then be reconstructed by linear combination of the projection coefficients α using (5), thus leading to the reconstruction error in (8). The PCA approach in this paper uses singular value decomposition to calculate Φ using the matrix of the M displacement vector snapshots to minimize this reconstruction error.

The basic idea behind the “economical” singular value decomposition (SVD) of a real matrix DN×M, where N > M, is to express it as follows:

$$ \begin{array}{@{}rcl@{}} \mathbf{D} = \boldsymbol {\Psi} \boldsymbol {\Sigma} {\boldsymbol V}^{T} \end{array} $$
(22)

where ΨN×M and VM×M are both unitary/orthogonal matrices and ΣM×M is a diagonal matrix whose diagonal entries are the singular values si (i.e., Σij = siδij). It can easily be shown that Ψ is the matrix of eigenvectors of the square covariance matrix Cv = DDT, while the squares of the diagonal entries of Σ are its eigenvalues.

Constructing the centered snapshot matrix D using M stored FE solutions centered around the mean snapshot \(\bar u\):

$$ \begin{array}{@{}rcl@{}} \boldsymbol{D} = [ U_{1} - \bar u {\dots} U_{M}-\bar u] \end{array} $$
(23)

gives the reduced basis Φ composed of the first Nb columns of Ψ, where the number of modes Nb is selected according to the energy criterion:

$$ \epsilon_{\text{PCA}} = 1 - \frac{{\sum}_{i=1}^{N_{b}}s_{i}}{{\sum}_{j=1}^{M} s_{j}} $$
(24)

Note that since the number of stored snapshots M is in any case small compared to the number of degrees of freedom in the full-field model, we can retain all the modes without truncation, i.e., Nb = M.
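A minimal NumPy sketch of this construction is given below; the function name and the zero-tolerance default are illustrative, and with energy_tol = 0 all M modes are retained, as done here.

```python
import numpy as np

def build_basis_pca(snapshots, energy_tol=0.0):
    """PCA/POD basis from the stored full-field solutions, Eqs. (22)-(24)."""
    U = np.column_stack(snapshots)                      # N x M snapshot matrix
    u_bar = U.mean(axis=1)                              # mean snapshot
    D = U - u_bar[:, None]                              # centered snapshots, Eq. (23)
    Psi, s, _ = np.linalg.svd(D, full_matrices=False)   # economy SVD, Eq. (22)
    energy = np.cumsum(s) / np.sum(s)                   # retained "energy", Eq. (24)
    Nb = min(int(np.searchsorted(energy, 1.0 - energy_tol)) + 1, len(s))
    return Psi[:, :Nb], u_bar
```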

Algorithm 2 is then completed with details about the construction of Φ, and therefore Urb, as shown below in Algorithm 3.

Algorithm 3 (construction of Φ and Urb using PCA)

3 Benchmark tests

To demonstrate the effectiveness of the approach presented in this paper, we first compare the PCA-based approach with a ROM based on Gram–Schmidt orthonormalization (Gogu 2015) for a 2D benchmark compliance minimization problem. Next, we use two benchmark 3D tests and minimize the structural compliance with the classical SIMP (solid isotropic material with penalization) assumption. The elastic parameters are: nominal and minimum (dimensionless) Young's moduli Enominal = 1 and \(E_{\min}=10^{-9}\), and Poisson's ratio ν = 0.3. A penalty factor p = 3 and a density filter radius of 1.5 have been applied in both cases.

As an alternative to the frequently used optimality criteria approach (Saxena and Ananthasuresh 2000; Yin and Yang 2001; Sigmund 2001), we have used the method of moving asymptotes (Svanberg 1987, 2002) for the optimization loop in this work. This method is based on a separable convex approximation of the objective function and is well suited to topology optimization due to its ease of use. The method has already been demonstrated to work very well on a vast variety of topology optimization problems (Bendsoe and Sigmund 2004; Aage and Lazarov 2013), and lends itself to increased scalability due to the separable nature of the convex approximation.

3.1 2D case: ROM comparison between Gram–Schmidt and PCA

As mentioned in the introductory section, a ROM approach for topology optimization using Gram–Schmidt orthonormalization was proposed in Gogu (2015). To compare it with our proposed approach, i.e., the PCA-based on-the-fly reduced-order model, we use the same classical 2D Messerschmitt–Bolkow–Blohm (MBB) benchmark problem (Fig. 1) to assess computational effort, time, and accuracy.

Fig. 1 2D Messerschmitt–Bolkow–Blohm (MBB) benchmark problem

The problem parameters have been set as follows: 150×50 and 600×200 (voxel) FE mesh/grid, nominal and minimum (dimensionless) Young's moduli Enominal = 1 and \(E_{\min} = 10^{-9}\), Poisson's ratio 0.3, a maximum allowable volume fraction νf of 0.5, a penalization factor p = 3, and a density filter radius of 1.5, with the optimization iterations stopped when the density variation within any of the elements is less than 1%.

In order to ensure the convergence of each result in every test, we set a large value for the maximum number of iterations: here we set 6000, to be on the safe side. For both ROMs, the number of modes Nb is set to 4 and the residual threshold 𝜖rb to 0.01. All these parameters are fixed, allowing us to study the effect of the filter size \(r_{\min}\) on both the convergence speed and the accuracy of the objective function. The optimized topology and corresponding computing results are summarized in the following discussion.

Figure 2 gives the optimal topologies obtained using the reference routine (i.e., without any ROM), the PCA-based ROM, and the Gram–Schmidt approach, on an FE mesh of grid resolution 150 × 50 (resolution given in voxels). From the figure, we can see that the three topologies are visually indistinguishable, which means both ROMs yield almost identical design results to those obtained with the reference full-order model in the 2D case.

Fig. 2 Optimal topologies generated using Gram–Schmidt with \(r_{\min} =\) (a) 1.5, (b) 2.0, (c) 2.5, PCA with \(r_{\min} =\) (d) 1.5, (e) 2.0, (f) 2.5, and the reference routine with \(r_{\min} =\) (g) 1.5, (h) 2.0, (i) 2.5, for the 150 × 50 2D grid

The corresponding results are summarized in Table 1 and Fig. 3.

Table 1 Comparison of performance for 150 × 50 2D grid resolution
Fig. 3 Residual comparison between Gram–Schmidt and PCA with (a) 𝜖rb = 0.1, Nb = 4; (b) 𝜖rb = 0.1, Nb = 10; (c) 𝜖rb = 0.1, Nb = 40; (d) 𝜖rb = 0.05, Nb = 4; (e) 𝜖rb = 0.05, Nb = 10; and (f) 𝜖rb = 0.01, Nb = 4

One can see from Table 1 that various minuscule features (like a tiny hole that appears in the “optimal” topology) vanish as the filter size is increased slightly from 1.5 to 3 for each computation method in each column. However, the boundary of the optimal topology for all models becomes smoother but fuzzier as the filter size is increased, which may create the illusion of the hole getting smaller or even disappearing. We may also conclude from the table that less optimization time is needed when a larger filter size (within an adequate range) is used for any method (reference, PCA, and Gram–Schmidt), but that larger filter sizes lead to a poorer optimal compliance. It is noteworthy that with a filter size \(r_{\min}=\) 3, the performance shows a downtick, which indicates that there is an optimal filter size.

Moreover, by comparing the PCA and Gram–Schmidt routines, we find that the PCA method requires less optimization time and remarkably fewer full solutions (but more iterations) than the Gram–Schmidt method for the same filter size. This validates the PCA ROM as more efficient than the Gram–Schmidt one at each iteration step. As far as the accuracy of the final objective function is concerned, the PCA and Gram–Schmidt methods are broadly similar, although on closer inspection the former has slightly higher precision than the latter. To explain the advantage of the PCA approach over Gram–Schmidt in accuracy, it is instructive to analyze the evolution of the residual throughout the iteration process. From Fig. 3, we can see that the PCA method has a clearly lower residual than the Gram–Schmidt method when solving for intermediate displacement vectors during the entire optimization process.

Under the same control precision on the design density (1% here), the PCA approach always converges earlier and with higher accuracy than the Gram–Schmidt method for a given \(r_{\min}\), a clear improvement in both efficiency and accuracy in this 2D case. We may therefore conclude that the PCA method outperforms the Gram–Schmidt method, at least for this particular 2D benchmark problem.

It is important to note that none of this is counter-intuitive, since the Gram–Schmidt construction is essentially an approximation to POD: the modes are obtained directly from the snapshots by orthonormalization rather than by finding the optimal modes through the SVD.

3.2 3D case 1: simply supported beam

This test case is a 3D variant of the MBB benchmark problem (Fig. 4)—a simply supported beam under flexion in 3D.

Fig. 4 First 3D test case and boundary conditions

The optimization iterations have been stopped when the density variation within any of the elements is less than 1% (or when 100 iterations have been completed).

We focus on the influence of varying the ROM error threshold 𝜖rb and the number of snapshots Nb used to construct the basis Φ, as well as the scalability of the approach with grid resolution.

3.2.1 Scalability of performance with grid resolution

Four different grids were considered here, in increasing order of resolution: a coarse 96×24×64 grid, a finer 108×27×72 grid, an even finer 132×33×88 grid, and a high-resolution 156×39×104 grid. The 3D topology results are shown in Fig. 5a–d.

Fig. 5 Optimized 3D topologies for the MBB beam problem obtained using PCA, for four different grids with increasing resolution: (a) 96×24×64, (b) 108×27×72, (c) 132×33×88, and (d) 156×39×104

Figure 6a and b compare the traditional (without ROM) topology optimization performance with the PCA-coupled approach over 100 iterations, and the scalability of the savings, respectively. The break point represents the transition beyond which more calls have been made to the ROM than to the full-field model. It is immediately evident that the number of function calls to the full-field FEM drops off and stabilizes, while the number of calls to the significantly less computationally intensive PCA routine increases gradually after the first Nb iterations and progressively stabilizes. This leads to a dramatic reduction in computational time and effort, as seen from the CPU times required for each case.

Fig. 6 (a) Typical comparison of computational effort between the traditional and PCA approaches on the 132×33×88 grid; (b) scalability of ROM performance with grid size using four different grid resolutions: comparison of computational effort with and without the ROM

It is thus clear that coupling the ROM using the on-the-fly calculated PCA basis significantly improves the computational efficiency of the overall optimization routine. This improvement scales up with the grid resolution. Next, we will attempt to identify some “best practices” for choosing appropriate Nb and 𝜖rb.

3.2.2 Performance of ROM with varying Nb and 𝜖rb

For this parametric study, we have used all the snapshots without truncation of the basis (Nb = M). In the first part, we vary Nb (number of modes/snapshots) from 2 to 20, so as to compare the number of calls to the ROM with calls to the full-field solution, as earlier. The threshold is fixed at 𝜖rb = 0.1. The results are shown in Fig. 7a, b for two different grid resolutions and Fig. 8.

Fig. 7 96×24×64 and 132×33×88 grids: comparison of PCA computational effort for different Nb (no truncation) and 𝜖rb = 0.05

Fig. 8 132×33×88 grid: semilog plot comparison of PCA computational effort for different Nb (no truncation) and 𝜖rb = 0.05

These results are summarized in Table 2. It is interesting that there is no monotonic relationship between Nb and the number of full-field calls, with 10 modes being the ideal basis size for this particular problem.

Table 2 Performance comparison for various Nb over 100 iterations for the first 3D test case (using a 132× 33 ×88 grid)

Figures 9 and 10 show the influence of varying the error threshold 𝜖rb from 0.01 to 0.2 on the performance of the ROM-coupled topology optimization, when the number of snapshots/modes Nb is fixed at 5. As expected, the number of full-field calls increases monotonically as 𝜖rb is reduced.

Fig. 9 132×33×88 grid: comparison of PCA computational effort for Nb = 5 modes and varying 𝜖rb (no truncation)

Fig. 10 132×33×88 grid: semilog plot comparison of PCA computational effort for different 𝜖rb (no truncation) and Nb = 5, with the lower panel zoomed in to the marked region

The above results are summarized in Table 3.

Table 3 Performance comparison for various 𝜖rb over 100 iterations for the first 3D test case (using a 132× 33 ×88 grid)

While calls to the ROM/full-field model are a crucial performance indicator, it is important to distinguish between a reduction in full-field calls and a reduction in CPU time. If a full-field call is followed by a single ROM call before we require another full-field call, we have gained nothing from the ROM. The CPU time reduction is therefore the final litmus test for the ROM.

Summing up, the error threshold determines the position of the “break/transition point” where the optimizer makes more calls to the ROM compared to the full-field FE solution, since raising 𝜖rb increases the admissibility of the ROM solution Urb, thus increasing the number of calls to the ROM while reducing the calls to the full-field FEM. However, there is a tradeoff since increasing the threshold beyond a certain point reduces the precision of the solution, thus potentially reducing the performance of the procedure. For this particular problem, 0.05 appears to be a reasonably good choice.

One would expect increasing Nb to improve the ROM, but this is not necessarily the case. By increasing Nb, we increase the amount of information in the ROM but also the number of less relevant modes, leading to a loss of efficacy. The number of modes to be retained for this particular problem appears to be around 10, where computational efficiency and precision are simultaneously maximized. Retaining too few (or too many) modes will reduce the performance of the ROM, at least for this case.

3.3 3D case 2: MBB beam

We next consider another classical 3D benchmark topology optimization test case: the original Messerschmitt–Bolkow–Blohm (MBB) problem in 3D. The boundary conditions of the beam are given in Fig. 11. Just as in the previous test case, we study the effect of Nb, 𝜖rb, and grid resolution (for scalability). The elastic parameters are the same as before, i.e., the nominal and minimum Young's moduli and Poisson's ratio. vfrac is chosen as 0.1, the penalization factor is 3, and the density filter radius is 0.5.

Fig. 11 Boundary conditions for the second test case: 3D MBB beam problem

In addition, three different maximum allowable volume fractions vfrac have been considered: 0.1, 0.2, and 0.3.

As in the previous test case, we will focus on the influence of different vfrac (\(({\sum}_{e=1}^{N} v_{e}\rho_{e})/V\)), the ROM error threshold 𝜖rb, and the number of snapshots Nb used to construct the basis Φ, as well as the scalability with grid resolution.

3.3.1 Performance and scalability of ROM

Three different grid resolutions (in voxels) were considered in this work: a fairly coarse 12×12×72 grid, a finer 24×24×144 grid, and a very fine 48×48×288 grid.

The volume fraction is vfrac = 0.1 here.

Figure 12 shows the optimized topologies generated by the TopOpt–PCA algorithm, which are, as expected, visually indistinguishable from those obtained without using the ROM. The results are shown in Fig. 13.

Fig. 12 Comparison of optimized 3D topologies for the MBB beam problem obtained using PCA, for three different grids with increasing resolution: (a) coarse 12×12×72 grid, (b) finer 24×24×144 grid, and (c) very fine 48×48×288 grid

Fig. 13 Comparison of the number of function calls to the FE solver vs. PCA using Nb = 4 modes (10 total snapshots) and 𝜖rb = 0.1, for three grids with increasing resolution

3.3.2 Performance of ROM with varying Nb and 𝜖rb

In the first part, we have used all the snapshots without truncation of the basis, and varied Nb (number of modes/snapshots) from 2 to 20, and compared the number of calls to the ROM with calls to the full-field solution, with the threshold 𝜖rb = 0.1 (fixed). In the second part, we show the influence of varying the error threshold 𝜖rb from 0.02 to 0.1 on the performance of the ROM-coupled topology optimization routine. The number of snapshots/modes Nb here is fixed at 8.

The results are shown in Figs. 14, 15, and 16, with a summary given in Tables 4 and 5.

Fig. 14 Performance of PCA on a 24×24×144 grid, varying (a) Nb (without truncation, 𝜖rb = 0.01) and (b) 𝜖rb (using Nb = 8 modes)

Fig. 15 24×24×144 grid: semilog plot comparison of PCA computational effort for different Nb (no truncation) and 𝜖rb = 0.01

Fig. 16 24×24×144 grid: semilog plot comparison of PCA computational effort for different 𝜖rb and Nb = 8 modes

Table 4 Performance comparison for various Nb for 500 iterations for the second 3D test case (using a 24× 24 ×144 grid)
Table 5 Performance comparison for various 𝜖rb for 500 iterations for the second 3D test case (using a 24× 24 ×144 grid)

From the above results, it is clear that Nb and 𝜖rb are vital parameters, which are unfortunately problem and grid resolution dependent.

3.3.3 Effect of material volume fraction

Finally, we consider three different vfrac = 0.1, 0.2, and 0.3 in order to study the evolution of the computational savings with increasing material volume fraction. The results are shown in Fig. 17 below:

Fig. 17 ROM (PCA using Nb = 8 modes without truncation and 𝜖rb = 0.1) performance for three different volume fractions (vfrac) 0.1, 0.2, and 0.3 on a 24×24×144 grid

The corresponding optimized topologies are shown in Fig. 18. It is interesting to note that the material volume fraction has a striking influence on the ROM performance: as we increase the material volume fraction, the proportion of calls to the ROM increases. In Bendsoe and Sigmund (2004), it is noted that for low vfrac (i.e., below 10%), the convergence of the topology optimization routine becomes more tedious due to oscillations. The benefit of using the ROM lies in being able to avoid unnecessary full-field calculations by extracting the most relevant modes of the displacement field.

Fig. 18 3D topologies for the three volume fractions (a) 0.1, (b) 0.2, and (c) 0.3 on a 24×24×144 grid

3.4 Discussion

The PCA algorithm significantly enhances the performance of the topology optimization routine, with a substantial reduction in computational effort and CPU time in all test cases investigated. We note that the improvement in performance scales up with the grid resolution. It is also clear that the reduction in computational effort improves as we increase the volume fraction, though this may simply be because the higher volume fraction problem would be expected to converge faster.

A conceivably less obvious advantage of the “on-the-fly” ROM applied to the displacement vector, with constant monitoring of precision using the full-field model as a stand-by, is that it very likely renders the optimized density map independent of the basis size Nb. It stands to reason that this benefit would be lost if 𝜖rb were inflated to an unreasonable level.

4 Perspectives: extension of approach to non self-adjoint problems

We have, in this paper, focused on developing an ROM approach for self-adjoint problems, with a primary focus on the popular compliance minimization problem. Consider now a typical compliant mechanism design problem (shown in Fig. 19). The input end A is subjected to a horizontal concentrated load Fin = 100 towards the right. Our objective is to maximize the displacement uout of the output point B.

Fig. 19 Displacement-inverter topology optimization problem

Here, we consider the simplest possible type of compliant mechanism, in which the output displacement (Uout) at a given node or set of nodes is extracted using the sparse selection vector \(\tilde L\).

The optimization problem may then be posed as follows:

$$ \begin{array}{@{}rcl@{}} \max\limits_{\rho} U_{\text{out}} (\rho) = {\tilde L}^{T} U = {\tilde L}^{T} ({\Phi} \alpha + \bar u) \\ \text{such that } {\Phi}^{T} K ({\Phi} \alpha + \bar u) = {\Phi}^{T} F \\ \sum\limits_{e=1}^{N} v_{e}\rho_{e} = v_{\text{frac}}V < V \\ \rho_{e} \in [0,1],\quad e=1,\dots,N \end{array} $$
(25)

Following Sections 2.2 and 2.3:

$$ \begin{array}{@{}rcl@{}} \mathcal{L}(\rho,\mu,\lambda)&&= {\tilde L}^{T} ({\Phi} \alpha + \bar u) - \mu^{T}{\Phi}^{T} [K ({\Phi} \alpha + \bar u) - F] \\&&\quad- \sum\limits_{i=1}^{N_{b}} {\lambda_{i}^{T}} (K_{i} U_{i}-F) \end{array} $$
(26)

Using the same reasoning as in Section 2.3 for the on-the-fly update strategy, the basis Φ is not a continuously evolving function of ρe, so \(\frac {\partial {\Phi }}{\partial \rho }\) and \(\frac {\partial \bar u}{\partial \rho }\), as well as the last two terms in the derivative, vanish, giving the following:

$$ \begin{array}{@{}rcl@{}} \frac{\partial \mathcal{L}}{\partial \rho_{e}} = ({\tilde L}^{T} {\Phi} -\mu^{T}{\Phi}^{T} K {\Phi}) \frac{\partial \alpha}{\partial \rho_{e}} - \mu^{T} {\Phi}^{T} \frac{\partial K}{\partial \rho_{e}} U_{\text{rb}} \end{array} $$
(27)

We choose μ such that:

$$ \begin{array}{@{}rcl@{}} ({\Phi}^{T} K {\Phi}) \mu = K_{\text{rb}} \mu = {\Phi}^{T} \tilde L \end{array} $$
(28)

where Krb is the reduced stiffness matrix from (10), allowing for inexpensive inversion and giving the following simple expression for the reduced sensitivity:

$$ \begin{array}{@{}rcl@{}} \frac{\partial \mathcal{L}}{\partial \rho_{e}} = - \mu^{T} {\Phi}^{T} \frac{\partial K}{\partial \rho_{e}} U_{\text{rb}} \end{array} $$
(29)

The solution of (28) is then used to calculate the reduced sensitivity from (29). Since the system in (28) has reduced dimensionality compared to (3), we now effectively solve a single reduced-order system with two load cases (right-hand sides).
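To illustrate, the two reduced solves and the sensitivity of (29) might be sketched as follows, again using the hypothetical edof/k0/rho arrays from the FE code and the SIMP model of Eq. (2); this is an assumption-laden sketch rather than the authors' implementation.

```python
import numpy as np

def inverter_reduced_sensitivity(K, f, L_tilde, Phi, u_bar, edof, k0, rho,
                                 E_nominal=1.0, E_min=1e-9, p=3.0):
    """Reduced state and adjoint solves, and the sensitivity of Eqs. (28)-(29)."""
    K_rb = Phi.T @ (K @ Phi)                              # shared reduced stiffness
    alpha = np.linalg.solve(K_rb, Phi.T @ (f - K @ u_bar))
    u_rb = Phi @ alpha + u_bar                            # reduced state
    mu = np.linalg.solve(K_rb, Phi.T @ L_tilde)           # reduced adjoint, Eq. (28)
    lam = Phi @ mu                                        # back-projected adjoint
    dL = np.zeros(len(edof))
    for e, dofs in enumerate(edof):
        dE = p * rho[e]**(p - 1) * (E_nominal - E_min)
        dL[e] = -dE * (lam[dofs] @ (k0 @ u_rb[dofs]))     # Eq. (29)
    return dL
```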

We now apply both the above modified “on-the-fly” POD ROM and the Gram–Schmidt orthogonalization (Gogu 2015), and investigate the influence of the type of ROM, Nb, and 𝜖rb on the results obtained for a displacement inverter. vfrac is set to 0.3 and the MMA algorithm is used for the optimization. The material elastic modulus is 1, the minimum (void) elastic modulus is 10−9, and Poisson's ratio is 0.3. The SIMP penalty factor is 3 and the filter radius is 1.5 (using sensitivity filtering). The optimization terminates when the maximum elemental density variation is below 0.1% or 400 iterations have been completed.

In the design domain shown, the upper and lower ends on the left are simply supported, and the middle nodes of the left and right boundaries are the input (load) and output (displacement) ends, respectively. The structure is discretized by 100×100 square elements of unit volume. Linear springs simulate the structural stiffness at the input and output ends (kin = kout = 1). Figure 20 shows the optimal topologies of the reference model, POD, and Gram–Schmidt orthogonalization (simply referred to as G-S) ROMs for Nb = 5, 𝜖rb = 0.01.

Fig. 20 Optimal topologies obtained (a) without ROM, (b) with the G-S ROM, and (c) with the POD ROM

By visual inspection, the optimal topology obtained using the POD model is almost identical to those of the G-S model and the reference model; it exhibits vertical symmetry and satisfies the mechanical as well as the manufacturing requirements, indicating that the proposed method can meet the demands of high-accuracy design.

Table 6 compares the results of the two ROMs (POD and G-S) as 𝜖rb and Nb are varied. From Table 6, we see that for the same ROM parameters (Nb and 𝜖rb), there are significantly more calls to the POD ROM than to the G-S one, particularly for smaller values of 𝜖rb, and the ROM is used far more frequently than the full-field solution. We again note that the CPU time is not necessarily proportional to the number of full FE calls, since oversampling could potentially increase the cost of updating the reduced basis to the point where the reduction in full FE calls no longer makes up for the time gap. The top speed-up for the POD ROM is 1.47 (compared with 1.37 for the G-S), the time saving is about 32% (against 27% for the G-S), and the optimization efficiency is higher for low Nb and high 𝜖rb.

Table 6 Performance comparison of G-S ROM and POD ROM (with reference)

5 Conclusions

In this paper, we have presented an approach for efficient large-scale topology optimization based on coupling topology optimization with reduced-order modeling by principal components analysis, using on-the-fly construction of the reduced basis from a database of previously calculated solutions of the FE equilibrium equations.

Topology optimization coupled with an on-the-fly PCA-calculated basis is seen to significantly outperform the classical approach. It is important to note that we avoid storage of the “temporary” stiffness matrices and basis vectors during the “basis-changing” iterations, which means that the storage requirement is significantly reduced compared to previous methods. The PCA approach showed a significant reduction in computational effort over the traditional full-field solution approach, and the improvement in performance scales well with the size of the problem.

While we have focused on the compliance minimization problem, the current method should be applicable to virtually any self-adjoint topology optimization problem, regardless of the particular physics involved.

Another obvious area of immediate work is using high-performance computing and non-intrusive asynchronous PCA to obtain additional improvement in the computational time and effort needed.

Finally, a formal extension of the approach to general non self-adjoint problems is a key area of future research.