1 Introduction

Both analyses and design sensitivity analyses are crucial in gradient-based optimization. Analyses predict the performance of proposed designs, while design sensitivity analyses quantify how that performance changes with respect to design changes. Because the optimization is iterative and relies on accurate gradient values, efficient and accurate sensitivity analyses are essential. The finite difference method requires one re-analysis per design variable to compute the sensitivities of the performance functions, so it is extremely inefficient, especially when the primal analysis is time-consuming. On the other hand, analytical direct differentiation and adjoint sensitivity analyses are very efficient. Unfortunately, this efficiency requires the analytical evaluation of various derivatives, which may be difficult to compute because it requires detailed knowledge of the analysis program. Indeed, analytical sensitivities require the differentiation of specific element formulations and material models with respect to a variety of design variables (Cheng and Olhoff 1993; Kiendl et al. 2014). To alleviate these implementation issues, the semi-analytical method approximates these derivatives with finite differences, so little knowledge of the analysis program is required. However, care must be exercised, as the accuracy of the semi-analytical method depends on the finite difference perturbation size. For thorough reviews of sensitivity analysis and the semi-analytical method, see Haftka and Adelman (1989), Tortorelli and Michaleris (1994), Gunzburger (2003), van Keulen et al. (2005), and Haftka and Gürdal (2012).

Much work has been focused on the semi-analytical method for linear static structural problems (Gallagher and Zienkiewicz 1973; Botkin 1982; Camarda and Adelman 1984; Esping 1984; Cheng and Liu 1987; Barthelemy et al. 1988; Pedersen et al. 1989; Barthelemy and Haftka 1990; Haftka and Adelman 1989; Fenyes and Lust 1991; Olhoff and Rasmussen 1991; Bestle and Seybold 1992), especially its application to shape sensitivity analysis.

Our response functions are integrals over the domain. In shape sensitivity analysis, the design variables include geometric parameters that define this domain. Thus, analytical shape sensitivity analyses require the use of the material derivative from continuum mechanics and such computations can be onerous. For this reason, the semi-analytical method of shape sensitivity analyses may be preferable for its ease of implementation; moreover, it is fully reliable for most problems in which the structural displacement field entails small rigid-body rotations relative to deformations of the finite elements (Olhoff et al. 1993). However, large errors attributed to rigid body rotations of the finite elements have been found in shape sensitivities computed with the semi-analytical method (Barthelemy et al. 1988; Cheng et al. 1989; Pedersen et al. 1989; Fenyes and Lust 1991; Olhoff and Rasmussen 1991; Cheng and Olhoff 1993).

Different approaches have been suggested to improve the accuracy of the semi-analytical method. For example, accuracy improves when the second-order accurate central difference scheme is used instead of first-order accurate forward differences (Barthelemy et al. 1988; Cheng et al. 1989; Haftka and Adelman 1989; Pedersen et al. 1989; Fenyes and Lust 1991). This requires additional computation and, unfortunately, does not completely eliminate the errors caused by large rigid body motions in shape sensitivity analysis. To circumvent this, the natural approach retains consistency conditions for rigid body modes and their derivatives (Mlejnek 1992). Alternatively, the refined semi-analytical approach incorporates the analytical derivatives of the element rigid body modes to alleviate inaccuracies (Van Keulen and De Boer 1998). The so-called exact semi-analytical method uses specific characteristics of the element stiffness matrices to compute correction factors that eliminate the truncation error (Olhoff et al. 1993). An improved semi-analytical method obtains better accuracy by using the von Neumann series (Oral 1996).

Kiendl et al. (2014) use isogeometric finite elements, in which non-uniform rational B-splines (NURBS) parameterize both the finite element response and the domain geometry. A multilevel approach allows a coarser, i.e., smoother, design parameterization than that of the finite element response. The semi-analytical method is combined with a sensitivity weighting scheme to compute the design updates for their example optimization problems.

Semi-analytical methods have been applied to nonlinear static structures. Haftka (1993) and Mróz and Haftka (1994) use the method to compute sensitivities of limit loads and show that it is equivalent to the overall finite difference method when a single Newton iteration is used. A more thorough formulation of the refined semi-analytical method for linear, linearized buckling, geometrically nonlinear, and limit point analyses was presented by de Boer and van Keulen (2000). The exact semi-analytical method has also been extended to geometric nonlinearities (Wang et al. 2015). Curiously, this formulation uses the secant stiffness matrix and incorporates correction terms to eliminate truncation errors.

The refined semi-analytical approach was also extended to obtain second-order derivatives (de Boer et al. 2002). The higher-order semi-analytical derivatives studied by Bernard et al. (1993) use cubic polynomials to develop surrogate models of the mass and stiffness matrices so that higher-order derivatives can be easily computed.

Sensitivity analysis for transient problems has been extensively studied (Adelman and Haftka 1986; Haug 1987; Haftka and Gürdal 2012). These studies include nonlinearities (Ray et al. 1978; Michaleris et al. 1994; Kreissl et al. 2011; Deng et al. 2011) and shape sensitivities (Meric 1988; Tortorelli et al. 1991). The semi-analytical method has been applied to linear transient structural problems using reduced-order modal models (Camarda and Adelman 1984; Greene and Haftka 1991; Hooijkamp and van Keulen 2018). As such, these methods are restricted to linear systems.

The semi-analytical method has been applied to transient heat conduction problems (Gu and Grandhi 1998), including nonlinear behaviour (Gu et al. 2002) and nonlinear problems coupled with structural dynamics (Chen et al. 2003). It is unclear how their use of the Precise Time Integration scheme (Zhong and Williams 1994), which is limited to time-varying linear systems, affects the accuracy of their nonlinear analyses and the subsequent sensitivity analyses.

Semi-analytical sensitivity analysis via direct differentiation has been applied to dynamic systems with large rotations (Brüls and Eberhard 2008), and to flexible multibody systems (Tromme et al. 2015). In the latter, to ease the computation, the pseudo load is approximated using the perturbation of the residual. This approximation is easy to implement, since simulation codes usually have a function to compute the residual (Tromme et al. 2015). The goal of this paper is to study this formulation and extend it to the adjoint method.

In the following, we study the semi-analytical method to facilitate sensitivity analyses for transient nonlinear systems. We treat transient problems as generally as possible. To do this, we use both an implicit-explicit time integration algorithm and the popular Newmark time-stepping method. Additionally, we use a general formulation, so the methods can be applied to any type of transient problem (e.g., thermal, structural, multibody). We systematically develop direct and adjoint sensitivity analysis approaches. Furthermore, for transient and dynamic problems, we study both the differentiate-then-discretize and the discretize-then-differentiate adjoint approaches. The semi-analytical adjoint approaches require restrictive assumptions. In particular, we show that the differentiate-then-discretize adjoint method exhibits consistency error and requires some terms to be constant in order to reuse the tangent stiffness matrix from the primal analysis. We also show that, for nonlinear transient and nonlinear dynamic systems, the semi-analytical differentiate-then-discretize adjoint method is limited to systems with symmetric stiffness and damping matrices. Fortunately, we show that with implicit time integration, the discretize-then-differentiate adjoint method can accommodate asymmetric stiffness matrices.

The major contributions of this paper are (1) an overview of analytical sensitivity analysis for nonlinear transient problems, (2) the development of novel efficient semi-analytical formulations, (3) the identification of restrictions for semi-analytical adjoint methods, and (4) a discussion of the consistency and accuracy of the methods.

This paper gives a general overview of the finite difference method (Section 2.2) and of analytical and semi-analytical sensitivity analyses for nonlinear steady-state (Section 2), transient (Section 3), and dynamic (Section 4) systems. Numerical examples are provided in Sections 3.6 and 4.6, wherein the accuracy of the methods is discussed. To quantify the accuracy, we introduce the relative percentage error between the sensitivities obtained by finite differences, \(\delta F_{f}\), and the analytical sensitivities, \(\delta F\), as

$$ e_{f}= \left| \frac{\delta F_{f}-\delta F}{\delta F} \right| 100\% . $$
(1)

Similarly, we compute the relative error of the semi-analytical sensitivities \(\delta F_{s}\) with respect to the analytical sensitivities \(\delta F\) as

$$ e_{s}=\left| \frac{\delta F_{s}-\delta F}{\delta F} \right| 100\% . $$
(2)

2 Steady-state nonlinear problems

After finite element discretization, the steady-state nonlinear problem is expressed in terms of the residual function R via the equation

$$ \textbf{R}(\textbf{U})=\textbf{0} , $$
(3)

where U is the response vector, e.g., displacement. This nonlinear problem is solved with the iterative Newton-Raphson method. If the current iterate \(\textbf{U}_{j}\) is not a solution, i.e., \(\textbf{R}(\textbf{U}_{j})\neq\textbf{0}\), then the next iterate \(\textbf{U}_{j+1}=\textbf{U}_{j}+\Delta\textbf{U}_{j}\) is computed by equating the first-order Taylor series expansion of \(\textbf{R}(\textbf{U}_{j+1})\) about \(\textbf{U}_{j}\) to zero, i.e.,

$$ \textbf{R}(\textbf{U}_{j + 1})=\textbf{R}(\textbf{U}_{j}+{\Delta} \textbf{U}_{j})\approx \textbf{R}(\textbf{U}_{j})+{\textbf{K}} {\Delta} \textbf{U}_{j}=\textbf{0} , $$
(4)

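where \(\textbf{K}=\partial\textbf{R}/\partial\textbf{U}\), evaluated at \(\textbf{U}_{j}\), is the tangent matrix. The incremental response update \(\Delta\textbf{U}_{j}\) is obtained by solving the linear equation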

$$ \textbf{K}(\textbf{U}_{j}){\Delta} \textbf{U}_{j}=-\textbf{R}(\textbf{U}_{j}) , $$
(5)

whereafter the next iterate

$$ \textbf{U}_{j + 1}=\textbf{U}_{j}+{\Delta} \textbf{U}_{j} , $$
(6)

is computed. The steps of evaluating the residual R and updating the response U are repeated until the solution converges to within a user-specified tolerance, i.e., until \(|\textbf{R}(\textbf{U})|\le\epsilon_{R}\).
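The Newton-Raphson loop of (4)-(6) can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions: the two-degree-of-freedom residual \(\textbf{R}(\textbf{U})=\textbf{A}\textbf{U}+c\,\textbf{U}^{3}-\textbf{f}\) (cube taken componentwise) and the names `newton_raphson`, `residual`, and `tangent` are hypothetical, and `tol` plays the role of \(\epsilon_{R}\).

```python
import numpy as np

def newton_raphson(residual, tangent, U, tol=1e-10, max_iter=50):
    """Solve R(U) = 0 with the Newton-Raphson iteration of (4)-(6).

    residual(U) returns R(U); tangent(U) returns K(U) = dR/dU.
    Iterates until the residual norm is at most tol (the tolerance eps_R).
    """
    U = np.array(U, dtype=float)
    for _ in range(max_iter):
        R = residual(U)
        if np.linalg.norm(R) <= tol:
            return U
        dU = np.linalg.solve(tangent(U), -R)  # K(U_j) dU_j = -R(U_j), cf. (5)
        U = U + dU                            # U_{j+1} = U_j + dU_j, cf. (6)
    raise RuntimeError("Newton-Raphson did not converge")

# Hypothetical 2-DOF model problem: R(U) = A U + c U**3 - f (cubes componentwise)
A = np.array([[2.0, -1.0], [-1.0, 2.0]])
c = 0.5
f = np.array([1.0, 1.0])

def residual(U):
    return A @ U + c * U**3 - f

def tangent(U):
    return A + np.diag(3.0 * c * U**2)

U = newton_raphson(residual, tangent, np.zeros(2))
print(U, np.linalg.norm(residual(U)))
```

A finite element code would assemble K from element contributions and factor it once per iteration; `np.linalg.solve` stands in for that factor-and-solve step here.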

2.1 Sensitivity analysis of steady-state nonlinear systems

For the sensitivity analysis, we treat the residual R and the response U as functions of the vector of \(n_{d}\) design variables \(\textbf{d}=[d_{1},d_{2},\dots,d_{n_{d}}]^{\top}\), i.e., we now express (3) as

$$ \textbf{R}(\textbf{U}(\textbf{d}),\textbf{d})=\textbf{0} . $$
(7)

After completing the primal analysis of (3), we can evaluate any number of response functions F. For our purposes, the response function depends on the response U(d) to the problem in (7) whereby we express

$$ F(\textbf{d})=G(\textbf{U}(\textbf{d}),\textbf{d}) . $$
(8)

Using the chain rule, the derivative of the response functional of (8) with respect to each di is

$$ \frac{\mathrm{D}F}{\mathrm{D}d_{i}}=\frac{\mathrm{D}G}{\mathrm{D}d_{i}}=\frac{\partial G}{\partial \textbf{U}} \frac{\partial \textbf{U}}{\partial d_{i}}+\frac{\partial G}{\partial d_{i}} , $$
(9)

where \(\partial\textbf{U}/\partial d_{i}\) is implicitly defined through (7).

2.2 Finite difference method

The forward finite difference method approximates the derivatives of a response function F using a truncated Taylor series expansion

$$ \frac{\mathrm{D}F(\textbf{d})}{\mathrm{D}d_{i}}\approx\frac{F(\textbf{d}+\epsilon \textbf{e}_i)-F(\textbf{d})}{\epsilon} , $$
(10)

where \(\textbf{e}_i=[0,\dots,0,1,0,\dots,0]\) is the unit vector whose ith component is 1, and \(\epsilon\) is the perturbation. The approximation \(\epsilon\,\mathrm{D}F(\textbf{d})/\mathrm{D}d_{i}\approx F(\textbf{d}+\epsilon\textbf{e}_i)-F(\textbf{d})\) exhibits truncation error \(o(\epsilon)\), where o is a function defined such that \(o(\epsilon)\) tends to zero faster than \(\epsilon\), i.e., \(\lim _{\epsilon \to 0} o(\epsilon )/\epsilon = 0\). To reduce the truncation error \(o(\epsilon)\), it is desirable to choose a small \(\epsilon\); however, numerical round-off error will erode the accuracy of the approximation if \(\epsilon\) is too small.

Since the response function depends on the response U(d), the approximation of (10) is expressed by

$$ \frac{\mathrm{D}F(\textbf{d})}{\mathrm{D}d_{i}}\approx\frac{G(\textbf{U}(\textbf{d}+\epsilon \textbf{e}_i),\textbf{d}+\epsilon \textbf{e}_i)-G(\textbf{U}(\textbf{d}),\textbf{d})}{\epsilon} . $$
(11)

As seen above, the response \(\textbf{U}(\textbf{d}+\epsilon\textbf{e}_i)\) must be calculated for each design variable \(d_{i}\). This is easily done by modifying the finite element model, but it is computationally inefficient because it requires \(n_{d}\) additional simulations. Note that second-order accurate approximations, which are accurate to \(o(\epsilon^{2})\), can be obtained by central differences, but these require two re-analyses, for \(\textbf{U}(\textbf{d}\pm\epsilon\textbf{e}_i)\), which is even more costly. In summary, the finite difference method is easy to implement, computationally inefficient, and subject to truncation and round-off errors.
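The trade-off between truncation and round-off error can be observed numerically. The sketch below assumes a hypothetical closed-form response \(F(\textbf{d})=e^{d_{1}}\sin d_{2}\) in place of a finite element re-analysis (indices start at 0 in the code) and compares forward and central differences using the relative percentage error of (1). The helper names are illustrative only.

```python
import numpy as np

def forward_diff(F, d, i, eps):
    """Forward difference estimate of DF/Dd_i, cf. (10): one extra evaluation of F."""
    e = np.zeros_like(d)
    e[i] = 1.0
    return (F(d + eps * e) - F(d)) / eps

def central_diff(F, d, i, eps):
    """Central difference estimate of DF/Dd_i: two extra evaluations of F."""
    e = np.zeros_like(d)
    e[i] = 1.0
    return (F(d + eps * e) - F(d - eps * e)) / (2.0 * eps)

def rel_error_percent(approx, exact):
    """Relative percentage error in the sense of (1)."""
    return abs((approx - exact) / exact) * 100.0

def F(d):
    """Hypothetical closed-form response standing in for a re-analysis."""
    return np.exp(d[0]) * np.sin(d[1])

d = np.array([0.3, 1.1])
exact = np.exp(d[0]) * np.sin(d[1])     # DF/Dd_0 of the hypothetical F

errors = {}
for eps in [1e-1, 1e-4, 1e-8, 1e-12]:
    errors[eps] = (rel_error_percent(forward_diff(F, d, 0, eps), exact),
                   rel_error_percent(central_diff(F, d, 0, eps), exact))
    print(f"eps={eps:.0e}  forward: {errors[eps][0]:.1e}%  central: {errors[eps][1]:.1e}%")
```

For moderate \(\epsilon\), the forward difference error shrinks roughly in proportion to \(\epsilon\) and the central difference error in proportion to \(\epsilon^{2}\); for very small \(\epsilon\), round-off takes over and both estimates degrade.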

2.3 Direct differentiation for steady-state nonlinear systems

In the direct differentiation approach, the implicit derivative \(\partial\textbf{U}/\partial d_{i}\), i.e., the pseudo response, is obtained by differentiating (7) with respect to \(d_{i}\), which after some rearranging defines the so-called pseudo problem

$$ \textbf{K} \frac{\partial \textbf{U}}{\partial d_{i}}=-\frac{\partial \textbf{R}}{\partial d_{i}} , $$
(12)

where \(-\partial\textbf{R}/\partial d_{i}\) is the pseudo load. Notice that the tangent operator K from the primal analysis appears in the pseudo problem; moreover, it is already factored if a direct solver is used in the primal analysis. Thus, evaluating the implicit derivative \(\partial\textbf{U}/\partial d_{i}\) only requires forming the pseudo load vector \(-\partial\textbf{R}/\partial d_{i}\) and a back substitution. Once \(\partial\textbf{U}/\partial d_{i}\) is obtained, (9) is evaluated to obtain the sensitivities of any number of functions F. The direct method is therefore computationally efficient: it solves one pseudo problem per design variable with the previously factored tangent matrix, regardless of the number of response functions. In addition, the computed sensitivities are numerically exact.
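A minimal sketch of the direct method, (12) followed by (9), assuming a hypothetical two-degree-of-freedom residual \(\textbf{R}(\textbf{U},\textbf{d})=\textbf{A}\textbf{U}+d_{1}^{2}\textbf{U}^{3}-d_{2}\textbf{f}\) with a non-symmetric A and the response \(G=\textbf{U}\cdot\textbf{U}\); `d[0]` and `d[1]` in the code correspond to \(d_{1}\) and \(d_{2}\), and all function names are illustrative. The pseudo loads of both design variables are stacked as columns so that one solve with K handles them.

```python
import numpy as np

# Hypothetical 2-DOF model with a non-symmetric A (cubes taken componentwise):
#   R(U, d) = A U + d[0]**2 U**3 - d[1] f,   G(U, d) = U . U
A = np.array([[2.0, -1.0], [-0.5, 2.0]])
f = np.array([1.0, 1.0])

def residual(U, d):
    return A @ U + d[0]**2 * U**3 - d[1] * f

def tangent(U, d):                      # K = dR/dU
    return A + np.diag(3.0 * d[0]**2 * U**2)

def dR_dd(U, d):                        # column i holds dR/dd_i
    return np.column_stack([2.0 * d[0] * U**3, -f])

def G(U, d):
    return U @ U

def solve_primal(d, tol=1e-12, max_iter=50):
    """Newton-Raphson solution of R(U, d) = 0 starting from U = 0."""
    U = np.zeros(2)
    for _ in range(max_iter):
        R = residual(U, d)
        if np.linalg.norm(R) <= tol:
            return U
        U = U + np.linalg.solve(tangent(U, d), -R)
    raise RuntimeError("Newton-Raphson did not converge")

def direct_sensitivity(d):
    U = solve_primal(d)
    K = tangent(U, d)
    # One pseudo problem K dU/dd_i = -dR/dd_i per design variable, cf. (12).
    # A production code would reuse the factorization of K from the primal
    # analysis; np.linalg.solve refactors it here for brevity.
    dU_dd = np.linalg.solve(K, -dR_dd(U, d))
    dG_dU = 2.0 * U
    dG_dd = np.zeros_like(d)            # G has no explicit design dependence
    return dG_dU @ dU_dd + dG_dd        # chain rule, cf. (9)

d = np.array([0.5, 1.0])
print(direct_sensitivity(d))
```

Checking such a result against a central finite difference re-analysis, as in Section 2.2, is a cheap way to validate a pseudo load implementation.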

In the semi-analytical formulation, the derivatives \(\partial\textbf{R}/\partial d_{i}\) and \(\mathrm{D}G/\mathrm{D}d_{i}\) of (12) and (9) are approximated to within \(o(\epsilon)\) via finite differences

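$$ \frac{\partial \textbf{R}(\textbf{U}(\textbf{d}),\textbf{d})}{\partial d_{i}} \approx \frac{1}{\epsilon}\left( \textbf{R}(\textbf{U}(\textbf{d}),\textbf{d}+\epsilon \textbf{e}_i)-\textbf{R}(\textbf{U}(\textbf{d}),\textbf{d}) \right) = \frac{1}{\epsilon} \textbf{R}(\textbf{U}(\textbf{d}),\textbf{d}+\epsilon \textbf{e}_i) , $$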
$$ \frac{\mathrm{D} G(\textbf{U}(\textbf{d}),\textbf{d})}{\mathrm{D} d_{i}} \approx \frac{1}{\epsilon}\left( G\left( \textbf{U}(\textbf{d})+\epsilon \frac{\partial \textbf{U}(\textbf{d})}{\partial d_{i}},\textbf{d}+\epsilon \textbf{e}_i\right) - G(\textbf{U}(\textbf{d}),\textbf{d}) \right) . $$
(13)

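In the finite difference approximation of \(\partial\textbf{R}/\partial d_{i}\), we assume the residual \(\textbf{R}(\textbf{U}(\textbf{d}),\textbf{d})=\textbf{0}\); however, we solve the primal analysis only until the solution converges to a user-defined tolerance, i.e., \(|\textbf{R}(\textbf{U}(\textbf{d}),\textbf{d})|\le\epsilon_{R}\). This tolerance introduces a source of error in addition to the truncation and round-off errors.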

Since the function G is known, the derivatives \(\partial G/\partial d_{i}\) and \(\partial G/\partial\textbf{U}\) in the sensitivity \(\mathrm{D}F/\mathrm{D}d_{i}\) can be computed exactly as in (9) or approximated as in (13). We assume the former.

The finite difference approximations of \(\partial\textbf{R}/\partial d_{i}\) and of \(\mathrm{D}G/\mathrm{D}d_{i}\) in (13) are easy to implement because they only require generating \(\textbf{d}+\epsilon\textbf{e}_i\) and evaluating the perturbed residual \(\textbf{R}(\textbf{U}(\textbf{d}),\textbf{d}+\epsilon\textbf{e}_i)\) and response function \(G(\textbf{U}(\textbf{d})+\epsilon\,\partial\textbf{U}/\partial d_{i},\textbf{d}+\epsilon\textbf{e}_i)\), which are readily computed by the subroutines used to compute \(\textbf{R}(\textbf{U}(\textbf{d}),\textbf{d})\) and \(G(\textbf{U}(\textbf{d}),\textbf{d})\). Thus, the semi-analytical method shares the simplicity of the finite difference method and the efficiency of the analytical methods. Note, however, that the tolerance \(\epsilon_{R}\), truncation, and round-off errors may pollute the results. In most cases, a design perturbation does not affect all of the element internal force vectors, so we only need to evaluate the elemental residuals \(\textbf{R}(\textbf{U}(\textbf{d}),\textbf{d}+\epsilon\textbf{e}_i)\) of the affected elements. An extreme case occurs in topology optimization, where each volume fraction design variable affects only a single element. Less extreme cases occur in shape optimization, where each dimensional change may affect only a subset of the elements.
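The semi-analytical variant replaces the analytical pseudo load with one perturbed residual evaluation per design variable. The sketch below assumes a hypothetical non-symmetric two-degree-of-freedom residual \(\textbf{R}=\textbf{A}\textbf{U}+d_{1}^{2}\textbf{U}^{3}-d_{2}\textbf{f}\) with \(G=\textbf{U}\cdot\textbf{U}\), and codes \(\partial G/\partial\textbf{U}\) analytically. The unperturbed residual is not subtracted, mirroring the assumption \(\textbf{R}(\textbf{U}(\textbf{d}),\textbf{d})=\textbf{0}\), so any residual left by the tolerance is amplified by \(1/\epsilon\). The function names and `eps` are illustrative choices.

```python
import numpy as np

# Hypothetical 2-DOF model with a non-symmetric A (cubes taken componentwise):
#   R(U, d) = A U + d[0]**2 U**3 - d[1] f,   G(U, d) = U . U
A = np.array([[2.0, -1.0], [-0.5, 2.0]])
f = np.array([1.0, 1.0])

def residual(U, d):
    return A @ U + d[0]**2 * U**3 - d[1] * f

def tangent(U, d):                      # K = dR/dU, available from the primal analysis
    return A + np.diag(3.0 * d[0]**2 * U**2)

def solve_primal(d, tol=1e-12, max_iter=50):
    U = np.zeros(2)
    for _ in range(max_iter):
        R = residual(U, d)
        if np.linalg.norm(R) <= tol:
            return U
        U = U + np.linalg.solve(tangent(U, d), -R)
    raise RuntimeError("Newton-Raphson did not converge")

def semi_analytical_direct(d, eps=1e-7):
    U = solve_primal(d)
    K = tangent(U, d)
    sens = np.zeros_like(d)
    for i in range(d.size):
        e = np.zeros_like(d)
        e[i] = 1.0
        # Pseudo load from one perturbed residual evaluation. R(U(d), d) = 0 is
        # assumed, so the unperturbed residual (|R| <= tol) is not subtracted.
        dR_ddi = residual(U, d + eps * e) / eps
        dU_ddi = np.linalg.solve(K, -dR_ddi)
        sens[i] = 2.0 * U @ dU_ddi      # dG/dU coded analytically; dG/dd_i = 0
    return sens

d = np.array([0.5, 1.0])
print(semi_analytical_direct(d))
```

Only `residual` is called with perturbed designs, which is what makes the approach attractive when a code exposes a residual routine but not its design derivatives.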

2.4 Adjoint method for steady-state nonlinear systems

In the adjoint method, the derivative \(\partial\textbf{U}/\partial d_{i}\) is annihilated. This formulation uses the identity

$$ \frac{\mathrm{D}{F}}{\mathrm{D}d_{i}}=\frac{\partial G}{\partial \textbf{U}} \frac{\partial \textbf{U}}{\partial d_{i}}+\frac{\partial G}{\partial d_{i}}+{\boldsymbol{\Lambda}}^{\top}\left( \textbf{K} \frac{\partial \textbf{U}}{\partial d_{i}}+\frac{\partial \textbf{R}}{\partial d_{i}}\right) , $$
(14)

which follows from (9) and (12): the term in parentheses vanishes by (12). In the above, \(\boldsymbol{\Lambda}\) is an arbitrary adjoint vector. Rearranging (14) yields

$$ \frac{\mathrm{D}{F}}{\mathrm{D}d_{i}}=\left( \frac{\partial G}{\partial \textbf{U}}+{\boldsymbol{\Lambda}}^{\top} \textbf{K} \right) \frac{\partial \textbf{U}}{\partial d_{i}}+\frac{\partial G}{\partial d_{i}}+{\boldsymbol{\Lambda}}^{\top}\frac{\partial \textbf{R}}{\partial d_{i}} , $$
(15)

from which we identify the adjoint problem that we solve for the heretofore arbitrary Λ, i.e.,

$$ \textbf{K}^{\top} \boldsymbol{\Lambda}=-\frac{\partial G}{\partial\textbf{U}}^{\top} . $$
(16)

In this way, the term containing \(\partial\textbf{U}/\partial d_{i}\) is annihilated from (15), reducing the sensitivity to

$$ \frac{\mathrm{D}{F}}{\mathrm{D}d_{i}}=\frac{\partial G}{\partial d_{i}}+\boldsymbol{\Lambda}^{\top}\frac{\partial \textbf{R}}{\partial d_{i}} . $$
(17)

The adjoint method requires the solution of one adjoint problem (cf. (16)) for each response function F, regardless of the number of design variables. Like the direct method, the adjoint problem uses the tangent matrix from the primal analysis (here its transpose), so it is also computationally efficient and numerically exact. Furthermore, if a direct solver is used in the primal analysis, the tangent stiffness matrix is already factored, and the same factors can be used to solve the transposed system.
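The adjoint counterpart can be sketched for a hypothetical non-symmetric two-degree-of-freedom model \(\textbf{R}=\textbf{A}\textbf{U}+d_{1}^{2}\textbf{U}^{3}-d_{2}\textbf{f}\) with \(G=\textbf{U}\cdot\textbf{U}\): one solve with \(\textbf{K}^{\top}\) per response function, cf. (16), after which each design variable costs only an inner product, cf. (17). Because the assumed A is non-symmetric, using K instead of its transpose would give wrong sensitivities. The direct result is computed alongside for comparison; all names are illustrative.

```python
import numpy as np

# Hypothetical 2-DOF model with a non-symmetric A, so K^T differs from K:
#   R(U, d) = A U + d[0]**2 U**3 - d[1] f,   G(U, d) = U . U
A = np.array([[2.0, -1.0], [-0.5, 2.0]])
f = np.array([1.0, 1.0])

def residual(U, d):
    return A @ U + d[0]**2 * U**3 - d[1] * f

def tangent(U, d):                      # K = dR/dU
    return A + np.diag(3.0 * d[0]**2 * U**2)

def dR_dd(U, d):                        # column i holds dR/dd_i
    return np.column_stack([2.0 * d[0] * U**3, -f])

def solve_primal(d, tol=1e-12, max_iter=50):
    U = np.zeros(2)
    for _ in range(max_iter):
        R = residual(U, d)
        if np.linalg.norm(R) <= tol:
            return U
        U = U + np.linalg.solve(tangent(U, d), -R)
    raise RuntimeError("Newton-Raphson did not converge")

def adjoint_sensitivity(d):
    U = solve_primal(d)
    K = tangent(U, d)
    dG_dU = 2.0 * U
    # One adjoint problem per response function, cf. (16): K^T Lambda = -(dG/dU)^T
    Lam = np.linalg.solve(K.T, -dG_dU)
    dG_dd = np.zeros_like(d)
    # Each design variable then costs one inner product, cf. (17)
    return dG_dd + Lam @ dR_dd(U, d)

def direct_sensitivity(d):              # for comparison, cf. (12) and (9)
    U = solve_primal(d)
    dU_dd = np.linalg.solve(tangent(U, d), -dR_dd(U, d))
    return 2.0 * U @ dU_dd

d = np.array([0.5, 1.0])
print(adjoint_sensitivity(d), direct_sensitivity(d))
```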

In the semi-analytical formulation, the derivative \(\partial\textbf{R}/\partial d_{i}\) is approximated via finite differences, as in the direct method of Section 2.3, and (17) is used to obtain the sensitivities. As previously mentioned, the derivative \(\partial G/\partial\textbf{U}\) is obtained analytically using our knowledge of the function G.
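Combining (17) with a perturbed residual gives the semi-analytical adjoint sensitivity. The sketch assumes a hypothetical non-symmetric two-degree-of-freedom residual \(\textbf{R}=\textbf{A}\textbf{U}+d_{1}^{2}\textbf{U}^{3}-d_{2}\textbf{f}\) with \(G=\textbf{U}\cdot\textbf{U}\); `eps` is an assumed perturbation size, and the unperturbed residual is taken to be zero.

```python
import numpy as np

# Hypothetical 2-DOF model with a non-symmetric A (cubes taken componentwise):
#   R(U, d) = A U + d[0]**2 U**3 - d[1] f,   G(U, d) = U . U
A = np.array([[2.0, -1.0], [-0.5, 2.0]])
f = np.array([1.0, 1.0])

def residual(U, d):
    return A @ U + d[0]**2 * U**3 - d[1] * f

def tangent(U, d):                      # K = dR/dU from the primal analysis
    return A + np.diag(3.0 * d[0]**2 * U**2)

def solve_primal(d, tol=1e-12, max_iter=50):
    U = np.zeros(2)
    for _ in range(max_iter):
        R = residual(U, d)
        if np.linalg.norm(R) <= tol:
            return U
        U = U + np.linalg.solve(tangent(U, d), -R)
    raise RuntimeError("Newton-Raphson did not converge")

def semi_analytical_adjoint(d, eps=1e-7):
    U = solve_primal(d)
    K = tangent(U, d)
    Lam = np.linalg.solve(K.T, -2.0 * U)    # adjoint problem (16); dG/dU analytic
    sens = np.zeros_like(d)
    for i in range(d.size):
        e = np.zeros_like(d)
        e[i] = 1.0
        # dR/dd_i from one perturbed residual, assuming R(U(d), d) = 0;
        # dG/dd_i = 0 for this G, so (17) reduces to Lambda . dR/dd_i
        sens[i] = Lam @ residual(U, d + eps * e) / eps
    return sens

d = np.array([0.5, 1.0])
print(semi_analytical_adjoint(d))
```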

3 Transient nonlinear problems

A first-order transient problem is expressed in residual form as

$$\begin{array}{@{}rcl@{}} \textbf{R}(\textbf{U}(t,\textbf{d}),\dot{\textbf{U}}(t,\textbf{d}), \textbf{d}) &=&\textbf{0} , \end{array} $$
(19a)
$$\begin{array}{@{}rcl@{}} \textbf{U}(0) &=& \textbf{U}^{0} , \end{array} $$
(19b)

where we note the design dependencies as in (7), \(t\in[0,t_{f}]\) denotes time, and \(t_{f}\) is the terminal analysis time. The response function for this system is expressed as

$$ F(\textbf{d})= {\int}_{0}^{t_{f}} G({\textbf{U}}(t,\textbf{d}),\dot{\textbf{U}}(t,\textbf{d}),\textbf{d}) \mathrm{d}t . $$
(20)

Our goal is to compute the sensitivity efficiently, accurately, and easily, i.e., we want to compute

$$ \frac{\mathrm{D} F}{\mathrm{D} d_{i}}= {\int}_{0}^{t_{f}} \left( \frac{\partial G}{\partial\textbf{U}} \frac{\partial \textbf{U}}{\partial d_i} + \frac{\partial G}{\partial \dot{\textbf{U}}} \frac{\partial \dot{\textbf{U}}}{\partial d_i} + \frac{\partial G}{\partial d_{i}} \right) \mathrm{d}t . $$
(21)

For the sensitivity analysis, we can implement the direct method whereby we differentiate (19a) and (19b) to define the pseudo problem

$$\begin{array}{@{}rcl@{}} \frac{\partial \textbf{R}}{\partial\dot{\textbf{U}}} \frac{\partial \dot{\textbf{U}}}{\partial d_i} + \frac{\partial \textbf{R}}{\partial{\textbf{U}}} \frac{\partial \textbf{U}}{\partial d_i} & =& - \frac{\partial \textbf{R}}{\partial d_{i}} , \end{array} $$
(22a)
$$\begin{array}{@{}rcl@{}} \frac{\partial \textbf{U}(0)}{\partial d_i} &=& \frac{\partial \textbf{U}^{0}}{\partial d_i} , \end{array} $$
(22b)

which we solve for \(\partial\textbf{U}/\partial d_{i}\) and \(\partial \dot {\textbf {U}}/\partial d_{i}\), and then we evaluate (21). Alternatively, we can implement the adjoint approach, whereby we use (22a) to write (21) as

$$\begin{array}{@{}rcl@{}} \frac{\mathrm{D} F}{\mathrm{D} d_{i}} &=& {\int}_{0}^{t_{f}} \left( \frac{\partial G}{\partial\textbf{U}} \frac{\partial \textbf{U}}{\partial d_i} + \frac{\partial G}{\partial \dot{\textbf{U}}} \frac{\partial \dot{\textbf{U}}}{\partial d_i} + \frac{\partial G}{\partial d_{i}} \right) \mathrm{d}t \\ &&+ {\int}_{0}^{t_{f}} \boldsymbol{ \lambda}^{\top} \left( \frac{\partial \textbf{R}}{\partial{\textbf{U}}} \frac{\partial \textbf{U}}{\partial d_i} + \frac{\partial \textbf{R}}{\partial\dot{\textbf{U}}} \frac{\partial \dot{\textbf{U}}}{\partial d_i} + \frac{\partial \textbf{R}}{\partial d_{i}}\right) \mathrm{d}t , \end{array} $$
(23)

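where again \(\boldsymbol{\lambda}\) is an arbitrary adjoint vector; the added integral vanishes by (22a). Integrating the terms containing \(\partial\dot{\textbf{U}}/\partial d_{i}\) by parts and rearranging (23) yields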

$$\begin{array}{@{}rcl@{}} \frac{\mathrm{D} F}{\mathrm{D} d_{i}}&=& \left.{\int}_{0}^{t_{f}} \left( \frac{\partial G}{\partial d_{i}} + {\boldsymbol{ \lambda}^{\top}} \frac{\partial \textbf{R}}{\partial d_{i}} \right) \mathrm{d}t - \left( \frac{\partial G}{\partial \dot{\textbf{U}}} + \boldsymbol{ \lambda}^{\top} \frac{\partial \textbf{R}}{\partial\dot{\textbf{U}}}\right) \frac{\partial \textbf{U}}{\partial d_i} \right\rvert_{t = 0}\\ &&+ {\int}_{0}^{t_{f}} \frac{\partial \textbf{U}}{\partial d_i}^{\top} \left( \frac{\partial {G}}{\partial \textbf{U}}^{\top} -\frac{\mathrm{d}}{\mathrm{d}{t}} \left( \frac{\partial {G}^{\top}}{\partial \dot{\textbf{U}}}\right) + \frac{\partial \textbf{R}^{\top}}{\partial \textbf{U}} \boldsymbol{ \lambda}\right.\\ &&-\left.\frac{\text{d}}{\text{d}t} \left( \frac{\partial \textbf{R}^{\top}}{\partial \dot{\textbf{U}}} \boldsymbol{ \lambda}\right) \right) \mathrm{d}t {\left.+ \frac{\partial \textbf{U}}{\partial d_i}^{\top} \left( \frac{\partial G}{\partial \dot{\textbf{U}}}^{\top} + \frac{\partial \textbf{R}}{\partial \dot{\textbf{U}}}^{\top} \boldsymbol{ \lambda}\right)\right\rvert_{t=t_{f}}} . \end{array} $$
(24)

Next, a time mapping is introduced, i.e., we define Λ such that

$$ \boldsymbol{\Lambda}(t_{f}-t)=\boldsymbol{\lambda}(t) , $$
(25)

and hence

$$ -\dot{\boldsymbol{\Lambda}}(t_{f}-t)=\dot{\boldsymbol{\lambda}}(t) , $$
(26)

substituting the above into (24) renders

$$\begin{array}{@{}rcl@{}} \frac{\mathrm{D} F}{\mathrm{D} d_{i}}&=&\left. {\int}_{0}^{t_{f}} \left( \frac{\partial G}{\partial d_{i}} + {\boldsymbol{\Lambda}^{\top}}\frac{\partial \textbf{R}}{\partial d_{i}} \right) \mathrm{d}t - \left( \frac{\partial G}{\partial \dot{\textbf{U}}} + \boldsymbol{\Lambda}^{\top} \frac{\partial \textbf{R}}{\partial\dot{\textbf{U}}}\right) \frac{\partial \textbf{U}}{\partial d_i} \right\rvert_{t = 0}\\ &&+ {\int}_{0}^{t_{f}} \frac{\partial \textbf{U}}{\partial d_i}^{\top} \left( {\frac{\partial G}{\partial\textbf{U}}^{\top}} -\frac{\text{d}}{\text{d}t} \left( \frac{\partial {G}^{\top}}{\partial \dot{\textbf{U}}}\right) + \frac{\partial \textbf{R}^{\top}}{\partial \textbf{U}} \boldsymbol{\Lambda} -\frac{\text{d}}{\text{d}t} \left( \frac{\partial \textbf{R}^{\top}}{\partial \dot{\textbf{U}}}\right) \boldsymbol{\Lambda} \right. \\ &&+\left.\left. \frac{\partial \textbf{R}^{\top}}{\partial \dot{\textbf{U}}} \dot{\boldsymbol{\Lambda}} \right) \mathrm{d}t+ \frac{\partial \textbf{U}}{\partial d_i}^{\top} \left( \frac{\partial G}{\partial \dot{\textbf{U}}}^{\top} + \frac{\partial \textbf{R}}{\partial \dot{\textbf{U}}}^{\top} \boldsymbol{\Lambda}\right)\right\rvert_{t=t_{f}} . \end{array} $$
(27)

where all quantities are evaluated at time t except Λ, which is evaluated at \(t_{f}-t\). We can annihilate the terms containing the implicitly defined derivative \(\partial\textbf{U}/\partial d_{i}\) by requiring Λ to solve

$$\begin{array}{@{}rcl@{}} &&\frac{\partial \textbf{R}^{\top}}{\partial \dot{\textbf{U}}} \dot{ \boldsymbol{\Lambda} } + \left( \frac{\partial \textbf{R}}{\partial \textbf{U}}^{\top} -\frac{\text{d}}{\text{d}t} \left( \frac{\partial \textbf{R}^{\top}}{\partial \dot{\textbf{U}}}\right)\right) \boldsymbol{\Lambda} \\ &=& - \frac{\partial {G}}{\partial \textbf{U}}^{\top} +\frac{\text{d}}{\text{d}t} \left( \frac{\partial {G}^{\top}}{\partial \dot{\textbf{U}}}\right) , \end{array} $$
(28a)
$$ \left.\frac{\partial \textbf{R}}{\partial \dot{\textbf{U}}}^{\top}\right\rvert_{t=t_{f}} \boldsymbol{\Lambda} (0) = \left.-\frac{\partial G}{\partial \dot{\textbf{U}}}^{\top} \right\rvert_{t=t_{f}} . $$
(28b)

Using this Λ, DF/Ddi reduces to the known quantity

$$\begin{array}{@{}rcl@{}} \frac{\mathrm{D} F}{\mathrm{D} d_{i}}&=& {\int}_{0}^{t_{f}} \left( \frac{\partial G}{\partial d_{i}} + \boldsymbol{\Lambda}^{\top} \frac{\partial \textbf{R}}{\partial d_{i}} \right) \mathrm{d}t \\ &&-\left. \left( \frac{\partial G}{\partial \dot{\textbf{U}}} + \boldsymbol{\Lambda}^{\top} \frac{\partial \textbf{R}}{\partial\dot{\textbf{U}}}\right) \frac{\partial \textbf{U}}{\partial d_i} \right\rvert_{t = 0} . \end{array} $$
(29)

where again all quantities are evaluated at time t except Λ, which is evaluated at \(t_{f}-t\).

3.1 Discretization

To solve the above, we discretize in time using an explicit/implicit parameter 0 ≤ α ≤ 1 so that

$$ \textbf{U}^{n}=\textbf{U}^{n-1} + \left( \alpha \dot{\textbf{U}}^{n} + (1-\alpha)\dot{\textbf{U}}^{n-1}\right){\Delta} t , $$
(30)

where \(\textbf{U}^{n}=\textbf{U}(t_{n})\) and \(\dot {\textbf {U}}^{n}=\dot {\textbf {U}}(t_{n})\) (Footnote 1). We then solve (19a) at the discrete times \(t_{n}\). Finally, the integrals in (20) and (21) are evaluated as

$$ F=\sum\limits_{n = 0}^{N} \mu_{n} G^{n}(\textbf{U}^{n},\dot{\textbf{U}}^{n}, \textbf{d}) , $$
(31)
$$ \frac{\mathrm{D} F}{\mathrm{D} d_{i}}=\sum\limits_{n = 0}^{N} \mu_{n} \left( \frac{\partial G^{n}}{\partial \textbf{U}} \frac{\partial \textbf{U}^{n}}{\partial d_i} + \frac{\partial G^{n}}{\partial \dot{\textbf{U}}} \frac{\partial \dot{\textbf{U}}^{n}}{\partial d_{i}} + \frac{\partial G^{n}} {\partial d_{i}}\right) , $$
(32)

where, e.g., \(G^{n}=G(\textbf {U}^{n}, \dot {\textbf {U}}^{n}, \textbf {d})\), and the coefficients \(\mu_{n}\) depend on the quadrature rule, e.g., for the trapezoidal rule \(2\mu_{0}=\mu_{1}=\mu_{2}=\dots=\mu_{N-1}=2\mu_{N}=\Delta t\).
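The time discretization (30) and the trapezoidal weights of (31) can be checked on a problem with a known solution. The sketch below assumes the hypothetical linear scalar residual \(R=\dot{U}+aU\) with \(U(0)=1\) and \(G=U^{2}\), for which \(F=(1-e^{-2at_{f}})/(2a)\). Because this R is linear, the equation for \(\dot{U}^{n}\) is solved in closed form instead of by Newton's method; the function names are illustrative.

```python
import numpy as np

def trapezoid_weights(N, dt):
    """Weights mu_n of (31) for the trapezoidal rule: 2 mu_0 = mu_1 = ... = 2 mu_N = dt."""
    mu = np.full(N + 1, dt)
    mu[0] = mu[-1] = 0.5 * dt
    return mu

def integrate_linear(a, U0, tf, N, alpha):
    """March the hypothetical scalar model R = Udot + a U = 0 with scheme (30)."""
    dt = tf / N
    U = np.empty(N + 1)
    Ud = np.empty(N + 1)
    U[0] = U0
    Ud[0] = -a * U0                     # R^0 = 0 solved for Udot^0
    for n in range(1, N + 1):
        pred = U[n - 1] + (1.0 - alpha) * dt * Ud[n - 1]   # known part of (30)
        # Insert U^n = pred + alpha dt Udot^n into Udot^n + a U^n = 0, solve for Udot^n
        Ud[n] = -a * pred / (1.0 + alpha * dt * a)
        U[n] = pred + alpha * dt * Ud[n]
    return U, Ud, dt

a, tf, N = 2.0, 1.0, 2000
F_exact = (1.0 - np.exp(-2.0 * a * tf)) / (2.0 * a)
for alpha in (0.0, 0.5, 1.0):
    U, Ud, dt = integrate_linear(a, 1.0, tf, N, alpha)
    F = trapezoid_weights(N, dt) @ U**2   # F = sum_n mu_n G^n with G = U^2, cf. (31)
    print(f"alpha={alpha}: F={F:.8f}, error={abs(F - F_exact):.1e}")
```

The choices \(\alpha=0\), \(0.5\), and \(1\) give the forward Euler, trapezoidal (Crank-Nicolson), and backward Euler schemes, respectively.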

3.2 Primal analysis

The initial condition \(\textbf{U}^{0}\) is given, but \(\dot {\textbf {U}}^{0}\) is needed in (30) to obtain \(\textbf{U}^{1}\). To this end, we use (19a), i.e., we use the Newton-Raphson method to solve

$$ \textbf{R}^{0}(\textbf{U}^{0},\dot{\textbf{U}}^{0},\textbf{d}) =\textbf{0} , $$
(33)

for \(\dot {\textbf {U}}^{0}\). The procedure is akin to the one we use to evaluate U in Section 2. Here \(\textbf {K}^{0}={\partial \textbf {R}^{0}}/{\partial \dot {\textbf {U}}}\) is the tangent matrix. Having \(\textbf{U}^{0}\) and \(\dot {\textbf {U}}^{0}\), we compute the first term in (31), i.e., \(F= \mu _{0} G^{0}(\textbf {U}^{0},\dot {\textbf {U}}^{0}, \textbf {d})\).

Now we commence the time marching. At each time step \(t_{n}\), we insert \(\textbf{U}^{n}\) from (30) into (19a) and solve the resulting equation for \(\dot {\textbf {U}}^{n}\). Again, we use Newton's method for this solution, cf. Section 2, where, since \(\partial\textbf{U}^{n}/\partial\dot{\textbf{U}}^{n}=\alpha\Delta t\,\textbf{I}\) by (30), the tangent stiffness matrix is \(\textbf {K}^{n}={\partial \textbf {R}^{n}}/{\partial \dot {\textbf {U}}}+\alpha {\Delta } t {\partial \textbf {R}^{n}}/{\partial \textbf {U}}\). After convergence, \(\textbf{U}^{n}\) is updated as per (30) and F is updated as per (31), i.e.,

$$ F \leftarrow F+ \mu_{n} G^{n}(\textbf{U}^{n},\dot{\textbf{U}}^{n}, \textbf{d}) , $$
(34)

where the symbol ← represents the update assignment.

The time is then incremented and the process repeats until the terminal time \(t_{f}\). A flowchart describing these computations is provided in Fig. 1, wherein multiple functions F are evaluated for \(n=1,2,\dots,N\).

Fig. 1 Primal analysis flowchart
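The primal analysis of Section 3.2 can be sketched for a hypothetical first-order model \(\textbf{R}=\textbf{M}\dot{\textbf{U}}+\textbf{A}\textbf{U}+d_{1}\textbf{U}^{3}-d_{2}\textbf{f}\) (for instance, a lumped nonlinear heat conduction problem) with \(G=\textbf{U}\cdot\textbf{U}\). Newton's method is applied to \(\dot{\textbf{U}}^{n}\) with the tangent \(\textbf{K}^{n}\) defined above, and F is accumulated with trapezoidal weights. The model, names, and tolerance are illustrative assumptions.

```python
import numpy as np

# Hypothetical first-order model (cubes taken componentwise):
#   R(U, Udot, d) = M Udot + A U + d[0] U**3 - d[1] f,   G = U . U
M = np.eye(2)
A = np.array([[2.0, -1.0], [-1.0, 2.0]])
f = np.array([1.0, 1.0])

def residual(U, Ud, d):
    return M @ Ud + A @ U + d[0] * U**3 - d[1] * f

def dR_dU(U, Ud, d):
    return A + np.diag(3.0 * d[0] * U**2)

def dR_dUd(U, Ud, d):
    return M

def G(U, Ud, d):
    return U @ U

def newton(res, jac, x, tol=1e-12, max_iter=50):
    """Newton-Raphson solution of res(x) = 0, stopping at |res| <= tol (eps_R)."""
    for _ in range(max_iter):
        r = res(x)
        if np.linalg.norm(r) <= tol:
            return x
        x = x - np.linalg.solve(jac(x), r)
    raise RuntimeError("Newton-Raphson did not converge")

def primal(d, U0, tf, N, alpha=0.5):
    dt = tf / N
    mu = np.full(N + 1, dt)
    mu[0] = mu[-1] = 0.5 * dt                  # trapezoidal weights of (31)
    U = np.zeros((N + 1, U0.size))
    Ud = np.zeros((N + 1, U0.size))
    U[0] = U0
    # Solve R^0(U^0, Udot^0, d) = 0 for Udot^0 with K^0 = dR/dUdot, cf. (33)
    Ud[0] = newton(lambda v: residual(U0, v, d), lambda v: dR_dUd(U0, v, d),
                   np.zeros_like(U0))
    F = mu[0] * G(U[0], Ud[0], d)
    for n in range(1, N + 1):
        pred = U[n - 1] + (1.0 - alpha) * dt * Ud[n - 1]   # known part of (30)

        def res_n(v):                          # R^n with U^n expressed through (30)
            return residual(pred + alpha * dt * v, v, d)

        def jac_n(v):                          # K^n = dR/dUdot + alpha dt dR/dU
            Un = pred + alpha * dt * v
            return dR_dUd(Un, v, d) + alpha * dt * dR_dU(Un, v, d)

        Ud[n] = newton(res_n, jac_n, Ud[n - 1])        # previous rate as initial guess
        U[n] = pred + alpha * dt * Ud[n]               # update U^n, cf. (30)
        F += mu[n] * G(U[n], Ud[n], d)                 # update F, cf. (34)
    return U, Ud, F

d = np.array([0.5, 1.0])
U, Ud, F = primal(d, np.zeros(2), tf=10.0, N=1000)
print(U[-1], F)
```

Over a long horizon, this model settles to the steady-state solution of \(\textbf{A}\textbf{U}+d_{1}\textbf{U}^{3}=d_{2}\textbf{f}\), which gives a simple sanity check.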

3.3 Direct differentiation

For the direct differentiation, we discretize \(\partial\textbf{U}/\partial d_{i}\) like U, i.e.,

$$ \frac{\partial \textbf{U}^{n}}{\partial d_i}= \frac{\partial \textbf{U}^{n-1}}{\partial d_i} + \left( \alpha \frac{\partial \dot{\textbf{U}}^{n}}{\partial d_i} + (1-\alpha) \frac{\partial \dot{\textbf{U}}^{n-1}}{\partial d_i}\right){\Delta} t . $$
(35)

Note that the initial condition \(\partial\textbf{U}^{0}/\partial d_{i}\) is known, but \({\partial \dot {\textbf {U}}^{0}}/{\partial d_{i}}\) is not. So before commencing, we must obtain \({\partial \dot {\textbf {U}}^{0}}/{\partial d_{i}}\), as we did \(\dot {\textbf {U}}^{0}\). To this end, we differentiate (33) to obtain the linear equation

$$ \textbf{K}^{0} \frac{\partial \dot{\textbf{U}}^{0}}{\partial d_i} = -\left( \frac{\partial {\textbf{R}}^{0}}{\partial \textbf{U}} \frac{\partial \textbf{U}^{0}}{\partial d_i} + \frac{\partial {\textbf{R}}^{0}}{\partial d_{i}} \right) , $$
(36)

which we solve for \({\partial \dot {\textbf {U}}^{0}}/{\partial d_{i}}\). Having \(\partial\textbf{U}^{0}/\partial d_{i}\) and \({\partial \dot {\textbf {U}}^{0}}/{\partial d_{i}}\), we initialize \(\mathrm{D}F/\mathrm{D}d_{i}\) as per (32), i.e.,

$$ \frac{\mathrm{D} F}{\mathrm{D} d_{i}} = \mu_{0} \left( \frac{\partial G^{0}}{\partial \textbf{U}} \frac{\partial \textbf{U}^{0}}{\partial d_i} + \frac{\partial G^{0}}{\partial \dot{\textbf{U}}}\frac{\partial \dot{\textbf{U}}^{0}}{\partial d_i} + \frac{\partial G^{0}}{\partial d_{i}} \right) . $$
(37)

Now we march in time, evaluating \(\partial\textbf{U}^{n}/\partial d_{i}\) and \({\partial \dot {\textbf {U}}^{n}}/{\partial d_{i}}\) as we did to compute \(\textbf{U}^{n}\) and \(\dot {\textbf {U}}^{n}\). Equations (22a) and (35) render the linear equation

$$\begin{array}{@{}rcl@{}} \textbf{K}^{n}(\textbf{U}^{n},\dot{\textbf{U}}^{n}, \textbf{d}) \frac{\partial \dot{\textbf{U}}^{n}}{\partial d_i} &=&-\left( \frac{\partial \textbf{R}^{n}}{\partial \textbf{U}} \left( \frac{\partial \textbf{U}^{n-1}}{\partial d_i}\right.\right.\\ &&+\left.\left.(1-\alpha){\Delta} t\frac{\partial \dot{\textbf{U}}^{n-1}}{\partial d_i}\right)+ \frac{\partial \textbf{R}^{n}}{\partial d_{i}} \right) , \end{array} $$
(38)

which we solve for \({\partial \dot {\textbf {U}}^{n}}/{\partial d_{i}}\). Next, we update \(\partial\textbf{U}^{n}/\partial d_{i}\) as per (35) and \(\mathrm{D}F/\mathrm{D}d_{i}\) as per (32),

$$ \frac{\mathrm{D} F}{\mathrm{D} d_{i}} \leftarrow \frac{\mathrm{D} F}{\mathrm{D} d_{i}}+ \mu_{n} \left( \frac{\partial G^{n}}{\partial \textbf{U}} \frac{\partial \textbf{U}^{n}}{\partial d_i} + \frac{\partial G^{n}}{\partial \dot{\textbf{U}}} \frac{\partial \dot{\textbf{U}}^{n}}{\partial d_{i}} + \frac{\partial G^{n}}{\partial d_{i}} \right) . $$
(39)

We continue marching in this manner for all \(t_{n}\). Insofar as our sensitivity analysis algorithm is concerned, we insert nodes A and B from Fig. 2 into the primal analysis flowchart of Fig. 1.

Fig. 2 Direct differentiation nodes
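Nodes A and B amount to a few extra lines inside the primal time loop. The sketch below assumes a hypothetical first-order model \(\textbf{R}=\textbf{M}\dot{\textbf{U}}+\textbf{A}\textbf{U}+d_{1}\textbf{U}^{3}-d_{2}\textbf{f}\) and the response \(G=\textbf{U}\cdot\textbf{U}+0.1\,\dot{\textbf{U}}\cdot\dot{\textbf{U}}\), chosen so that the \(\partial G/\partial\dot{\textbf{U}}\) term of (32) is exercised. It marches (36)-(39) for both design variables at once by stacking the pseudo responses as columns. Because the pseudo problems differentiate the discrete equations, the result should match a central finite difference of the discrete F up to solver tolerance and round-off. Names and the model are illustrative.

```python
import numpy as np

# Hypothetical first-order model and response (cubes taken componentwise):
#   R = M Udot + A U + d[0] U**3 - d[1] f,   G = U . U + 0.1 Udot . Udot
M = np.eye(2)
A = np.array([[2.0, -1.0], [-1.0, 2.0]])
f = np.array([1.0, 1.0])

def residual(U, Ud, d):
    return M @ Ud + A @ U + d[0] * U**3 - d[1] * f

def dR_dU(U, Ud, d):
    return A + np.diag(3.0 * d[0] * U**2)

def dR_dUd(U, Ud, d):
    return M

def dR_dd(U, Ud, d):                      # column i holds dR/dd_i
    return np.column_stack([U**3, -f])

def G(U, Ud, d):
    return U @ U + 0.1 * Ud @ Ud

def dG(U, Ud, dU, dUd):                   # dG/dU dU/dd + dG/dUdot dUdot/dd (dG/dd = 0)
    return 2.0 * U @ dU + 0.2 * Ud @ dUd

def newton(res, jac, x, tol=1e-12, max_iter=50):
    for _ in range(max_iter):
        r = res(x)
        if np.linalg.norm(r) <= tol:
            return x
        x = x - np.linalg.solve(jac(x), r)
    raise RuntimeError("Newton-Raphson did not converge")

def primal_and_direct(d, U0, tf, N, alpha=0.5):
    """March the primal problem together with the pseudo problems (36) and (38)."""
    dt = tf / N
    mu = np.full(N + 1, dt)
    mu[0] = mu[-1] = 0.5 * dt
    U = U0.copy()
    Ud = newton(lambda v: residual(U, v, d), lambda v: dR_dUd(U, v, d), np.zeros_like(U0))
    dU = np.zeros((U0.size, d.size))      # U^0 is design independent here: dU^0/dd = 0
    # Node A at n = 0: solve (36) for dUdot^0/dd, then start F and DF/Dd, cf. (37)
    dUd = np.linalg.solve(dR_dUd(U, Ud, d), -(dR_dU(U, Ud, d) @ dU + dR_dd(U, Ud, d)))
    F = mu[0] * G(U, Ud, d)
    dF = mu[0] * dG(U, Ud, dU, dUd)
    for n in range(1, N + 1):
        pred = U + (1.0 - alpha) * dt * Ud

        def res_n(v):
            return residual(pred + alpha * dt * v, v, d)

        def jac_n(v):                     # K^n = dR/dUdot + alpha dt dR/dU
            Un = pred + alpha * dt * v
            return dR_dUd(Un, v, d) + alpha * dt * dR_dU(Un, v, d)

        Ud = newton(res_n, jac_n, Ud)
        U = pred + alpha * dt * Ud                                    # (30)
        F += mu[n] * G(U, Ud, d)                                      # (34)
        # Node B: pseudo problem (38) with the converged tangent K^n, then (35), (39)
        Kn = jac_n(Ud)
        dpred = dU + (1.0 - alpha) * dt * dUd
        dUd = np.linalg.solve(Kn, -(dR_dU(U, Ud, d) @ dpred + dR_dd(U, Ud, d)))
        dU = dpred + alpha * dt * dUd
        dF += mu[n] * dG(U, Ud, dU, dUd)
    return F, dF

d = np.array([0.5, 1.0])
F, dF = primal_and_direct(d, np.zeros(2), tf=2.0, N=200)
print(F, dF)
```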

For the semi-analytical method we use the approximations

$$ \frac{\partial {\textbf{R}}^{0}}{\partial \textbf{U}} \frac{\partial \textbf{U}^{0}}{\partial d_i} + \frac{\partial {\textbf{R}}^{0}}{\partial d_{i}} \approx \frac{1}{\epsilon} \textbf{R}^{0} \left( \textbf{U}^{0}+\epsilon \frac{\partial \textbf{U}^{0}}{\partial d_i}, \dot{\textbf{U}}^{0}, \textbf{d}+\epsilon \textbf{e}_i \right) , $$
(40)
$$\begin{array}{@{}rcl@{}} \frac{\partial \textbf{R}^{n}}{\partial \textbf{U}} \left( \frac{\partial \textbf{U}^{n-1}}{\partial d_i}+(1-\alpha){\Delta} t\frac{\partial \dot{\textbf{U}}^{n-1}}{\partial d_i}\right)+ \frac{\partial \textbf{R}^{n}}{\partial d_{i}} \approx \\ \quad \frac{1}{\epsilon} \textbf{R}^{n} \left( \textbf{U}^{n}+\epsilon\left( \frac{\partial \textbf{U}^{n-1}}{\partial d_i}+(1-\alpha){\Delta} t\frac{\partial \dot{\textbf{U}}^{n-1}}{\partial d_i}\right), \right.\\ \left. \vphantom{\frac{1}{\epsilon} \textbf{R}^{n} \left( \textbf{U}^{n}+\epsilon\left( \frac{\partial \textbf{U}^{n-1}}{\partial d_i}+(1-\alpha){\Delta} t\frac{\partial \dot{\textbf{U}}^{n-1}}{\partial d_i}\right), \right)} \dot{\textbf{U}}^{n}, \textbf{d}+\epsilon \textbf{e}_i \right) , \end{array} $$
(41)
$$\begin{array}{@{}rcl@{}} &&\frac{\partial G^{n}}{\partial \textbf{U}} \frac{\partial \textbf{U}^{n}}{\partial d_i} + \frac{\partial G^{n}}{\partial \dot{\textbf{U}}}\frac{\partial \dot{\textbf{U}}^{n}}{\partial d_{i}} + \frac{\partial G^{n}}{\partial d_{i}} \\ \quad &\approx&\frac{1}{\epsilon} \left( G^{n} \left( \textbf{U}^{n}+\epsilon \frac{\partial \textbf{U}^{n}}{\partial d_i} , \dot{\textbf{U}}^{n}+\epsilon \frac{\partial \dot{\textbf{U}}^{n}}{\partial d_i} , \textbf{d}+\epsilon \textbf{e}_i \right) \right.\\ &&-\left.G^{n}(\textbf{U}^{n},\dot{\textbf{U}}^{n}, \textbf{d})\right) , \end{array} $$
(42)

in (36), (37), (38), and (39). Again, we assume the user can code \(\partial G^{n}/\partial \textbf {U}\), \(\partial G^{n}/ \partial \dot {\textbf {U}}\), and \(\partial G^{n}/\partial \textbf {d}\), so we do not use (42). As mentioned before, semi-analytical sensitivities carry errors due to the Newton-Raphson tolerance \(\epsilon_{R}\), truncation, and round-off.
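The semi-analytical load of (41) needs only residual evaluations. The sketch below assumes a hypothetical nonlinear scalar residual \(R = \dot{U} + dU + dU^{3}\), a state at which \(R \approx 0\), and a stand-in direction v for the bracketed combination of sensitivities in (41); it compares the residual quotient with the analytical load \(\partial R/\partial U\, v + \partial R/\partial d\) for several ε. The relative error shrinks roughly in proportion to ε, which is the truncation error mentioned above; very small ε would eventually expose round-off.

```python
def residual(U, Ud, d):
    """Hypothetical nonlinear scalar residual R(U, Udot, d)."""
    return Ud + d * U + d * U**3


def analytic_rhs(U, d, v):
    """dR/dU * v + dR/dd, the bracketed load of (41) for a direction v."""
    return (d + 3 * d * U**2) * v + (U + U**3)


def semi_analytic_rhs(U, Ud, d, v, eps):
    """(41): R(U + eps*v, Udot, d + eps)/eps, relying on R(U, Udot, d) ~ 0."""
    return residual(U + eps * v, Ud, d + eps) / eps


U, d, v = 0.7, 1.3, 0.4
Ud = -(d * U + d * U**3)      # chosen so that R(U, Ud, d) = 0 up to round-off
exact = analytic_rhs(U, d, v)
for eps in (1e-2, 1e-4, 1e-6):
    approx = semi_analytic_rhs(U, Ud, d, v, eps)
    print(f"eps={eps:.0e}  relative error={abs(approx - exact) / abs(exact):.1e}")
```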

3.4 Adjoint method using differentiate-then-discretize

In the differentiate-then-discretize approach, one obtains the adjoint problem (cf. (28a) and (28b)) and the sensitivity (cf. (29)) at the continuous time level. Now we use numerical time integration to compute

$$\begin{array}{@{}rcl@{}} \frac{\mathrm{D} F}{\mathrm{D} d_{i}}&=& \sum\limits_{n = 0}^{N}\mu_{N-n} \left( \frac{\partial G^{N-n}}{\partial d_{i}} + \boldsymbol{\Lambda}^{n\top} \frac{\partial \textbf{R}^{N-n}}{\partial d_{i}} \right) \\ &&- \left( \frac{\partial G^{0}}{\partial \dot{\textbf{U}}} + {\boldsymbol{\Lambda}^{N}}^{\top} \frac{\partial {\textbf{R}}^{0}}{\partial \dot{\textbf{U}}}\right) \frac{\partial \textbf{U}^{0}}{\partial d_i} . \end{array} $$
(43)

Before we evaluate the above, we must solve the adjoint problem of (28a) and (28b). To do this, we discretize the adjoint variable Λ like U, i.e.,

$$ \boldsymbol{\Lambda}^{n}=\boldsymbol{\Lambda}^{n-1} + \left( \alpha \dot{\boldsymbol{\Lambda}}^{n} + (1-\alpha) \dot{\boldsymbol{\Lambda}}^{n-1}\right){\Delta} t . $$
(44)

To reuse \(\textbf {K}^{n}\) as in the direct method, we restrict our adjoint discussion to residuals R such that

$$ \frac{\text{d}}{\text{d}t}\left( \frac{\partial \textbf{R}}{{\partial \dot{\textbf{U}}}}\right) =\textbf{0} . $$
(45)

Since \(\partial \textbf {R}/ \partial \dot {\textbf {U}}\) is typically interpreted as a mass matrix, (45) requires a constant mass matrix, which is fairly common.

Referring to (28b), we initially solve the adjoint problem

$$ {\frac{\partial \textbf{R}^{N}}{\partial \dot{\textbf{U}}}}^{\top} \boldsymbol{\Lambda}^{0} = -\frac{\partial G^{N}}{\partial \dot{\textbf{U}}}^{\top} , $$
(46)

for \(\boldsymbol {\Lambda }^{0}\) and then solve (28a) with \(\boldsymbol {\Lambda }^{0}\) to evaluate \(\dot {\boldsymbol {\Lambda }}^{0}\), i.e.,

$$\begin{array}{@{}rcl@{}} {\frac{\partial \textbf{R}^{N}}{\partial \dot{\textbf{U}}}}^{\top} \dot{\boldsymbol{\Lambda}}^{0}=- \frac{\partial \textbf{R}^{N}}{\partial \textbf{U}}^{\top} \boldsymbol{\Lambda}^{0} - \frac{\partial {G}^{N}}{\partial \textbf{U}}^{\top} \\+ \left( \frac{\partial^{2} {G}^{N}}{\partial \dot{\textbf{U}} \partial \textbf{U}} \dot{\textbf{U}}^{N} \right)^{\top} + \left( \frac{\partial^{2} {G}^{N}}{\partial \dot{\textbf{U}}^{2}} \ddot{\textbf{U}}^{N} \right)^{\top} . \end{array} $$
(47)

Note that (46) and (47) do not use the tangent stiffness matrix from the primal problem. Next, we compute

$$ \frac{\mathrm{D} F}{\mathrm{D} d_{i}} = \mu_{N} \left( \frac{\partial G^{N}}{\partial d_{i}} + \boldsymbol{\Lambda}^{0\top} \frac{\partial \textbf{R}^{N}}{\partial d_{i}} \right) , $$
(48)

cf. (29), (31), and (32). Time marching now commences for the remaining time steps, i.e., for n = 1,2,...,N − 1 we evaluate \(\dot {\boldsymbol {\Lambda }}^{n}\) by solving

$$\begin{array}{@{}rcl@{}} &&{\textbf{K}^{N-n}}^{\top} \dot{\boldsymbol{\Lambda}}^{n} =- \frac{\partial {G}^{N-n}}{\partial \textbf{U}}^{\top} \\ &&+ \left( \frac{\partial^{2} {G}^{N-n}}{\partial \dot{\textbf{U}} \partial \textbf{U}} \dot{\textbf{U}}^{N-n} \right)^{\top}+ \left( \frac{\partial^{2} {G}^{N-n}}{\partial \dot{\textbf{U}}^{2}} \ddot{\textbf{U}}^{N-n} \right)^{\top} \\ &&-\frac{\partial \textbf{R}^{N-n}}{\partial \textbf{U}}^{\top} \left( \boldsymbol{\Lambda}^{n-1} + (1-\alpha) {\Delta} t \dot{\boldsymbol{\Lambda}}^{n-1}\right) , \end{array} $$
(49)

where \(\textbf {K}^{N-n}\) is the tangent stiffness matrix of the primal problem. Then, we compute \(\boldsymbol {\Lambda }^{n}\) as per (44) and update

$$ \frac{\mathrm{D} F}{\mathrm{D} d_{i}} \leftarrow \frac{\mathrm{D} F}{\mathrm{D} d_{i}} + \mu_{N-n} \left( \frac{\partial G^{N-n}}{\partial d_{i}} + \boldsymbol{\Lambda}^{n\top} \frac{\partial \textbf{R}^{N-n}}{\partial d_{i}} \right) . $$
(50)

Finally, we solve

$$\begin{array}{@{}rcl@{}} &&\left( \frac{\partial {\textbf{R}}^{0}}{\partial \dot{\textbf{U}}} +\alpha{\Delta} t \frac{\partial {\textbf{R}}^{0}}{\partial \textbf{U}}\right)^{\top} \dot{\boldsymbol{\Lambda}}^{N} =- \frac{\partial {G}^{0}}{\partial \textbf{U}}^{\top} \\ &&+ \left( \frac{\partial^{2} {G}^{0}}{\partial \dot{\textbf{U}} \partial \textbf{U}} \dot{\textbf{U}}^{0} \right)^{\top}+ \left( \frac{\partial^{2} {G}^{0}}{\partial \dot{\textbf{U}}^{2}} \ddot{\textbf{U}}^{0} \right)^{\top} \\ &&-\frac{\partial {\textbf{R}}^{0}}{\partial \textbf{U}}^{\top} \left( \boldsymbol{\Lambda}^{N-1} + (1-\alpha) {\Delta} t \dot{\boldsymbol{\Lambda}}^{N-1}\right) , \end{array} $$
(51)

for \(\dot {\boldsymbol {\Lambda }}^{N}\), we evaluate \(\boldsymbol {\Lambda }^{N}\) with (44) and update

$$\begin{array}{@{}rcl@{}} \frac{\mathrm{D} F}{\mathrm{D} d_{i}} \leftarrow \frac{\mathrm{D} F}{\mathrm{D} d_{i}} &+& \mu_{0} \frac{\partial G^{0}}{\partial d_{i}} - \frac{\partial G^{0}}{\partial \dot{\textbf{U}}}\frac{\partial \textbf{U}^{0}}{\partial d_i} \\ &+& {\boldsymbol{\Lambda}^{N}}^{\top}\left( \mu_{0} \frac{\partial {\textbf{R}}^{0}}{\partial d_{i}} - \frac{\partial {\textbf{R}}^{0}}{\partial \dot{\textbf{U}}}\frac{\partial \textbf{U}^{0}}{\partial d_i} \right) . \end{array} $$
(52)

As with (47), (51) does not use the tangent stiffness matrix from the primal analysis.

The second derivatives \(\ddot {\textbf {U}}^{n}\) in (47), (49), and (51) can be computed from the known first derivatives \(\dots \dot {\textbf {U}}^{n-1},\dot {\textbf {U}}^{n},\dot {\textbf {U}}^{n + 1},\dots \) and Δt, using a second-order forward difference for \(\ddot {\textbf {U}}^{0}\), backward differences for \(\ddot {\textbf {U}}^{N}\), and central differences for all other \(\ddot {\textbf {U}}^{n}\) (cf. Figure 3). The adjoint sensitivity analysis is executed after the primal analysis is concluded; thus, we describe this algorithm by inserting node C of Fig. 3 into the flowchart of Fig. 1.
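A sketch of these difference formulas for one degree of freedom follows (apply it componentwise for a vector U). The text states only that backward differences are used at \(t_{N}\); the sketch assumes the second-order one-sided formula that mirrors the forward difference at \(t_{0}\), so all three formulas are exact for a quadratic \(\dot{U}(t)\).

```python
def second_derivatives(Ud, dt):
    """Approximate Uddot^n from nodal Udot^n on a uniform time grid (one DOF)."""
    N = len(Ud) - 1
    if N < 2:
        raise ValueError("at least three time levels are needed")
    Udd = [0.0] * (N + 1)
    # second-order forward difference at t_0
    Udd[0] = (-3 * Ud[0] + 4 * Ud[1] - Ud[2]) / (2 * dt)
    # central differences at interior time levels
    for n in range(1, N):
        Udd[n] = (Ud[n + 1] - Ud[n - 1]) / (2 * dt)
    # second-order backward difference at t_N (assumed order)
    Udd[N] = (3 * Ud[N] - 4 * Ud[N - 1] + Ud[N - 2]) / (2 * dt)
    return Udd


# Usage: Udot(t) = t**2 + 3t on 11 levels, so Uddot(t) = 2t + 3 is recovered
dt = 0.1
Ud = [(n * dt) ** 2 + 3 * n * dt for n in range(11)]
Udd = second_derivatives(Ud, dt)
```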

Fig. 3 Adjoint differentiate-then-discretize node

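For the semi-analytical method, we impose the further restriction that \(\partial \textbf {R}/\partial \textbf {U}\) is symmetric: a finite difference of R in the direction \(\boldsymbol {\Lambda }\) approximates \((\partial \textbf {R}/\partial \textbf {U})\boldsymbol {\Lambda }\), which equals the required transposed product \((\partial \textbf {R}/\partial \textbf {U})^{\top }\boldsymbol {\Lambda }\) only when the matrix is symmetric. With this restriction, the term in the adjoint load of (47) can be approximated as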

$$ \frac{\partial \textbf{R}^{N}}{\partial \textbf{U}}^{\top} \boldsymbol{\Lambda}^{0} \approx \frac{1}{\epsilon} \textbf{R}^{N} \left( \textbf{U}^{N}+\epsilon\boldsymbol{\Lambda}^{0} , \dot{\textbf{U}}{}^{N} , \textbf{d}\right) , $$
(53)

and the term in the adjoint load of (49) can be approximated as

$$\begin{array}{@{}rcl@{}} &&\frac{\partial \textbf{R}^{N-n}}{\partial \textbf{U}}^{\top} \left( \boldsymbol{\Lambda}^{n-1} + (1-\alpha) {\Delta} t \dot{\boldsymbol{\Lambda}}^{n-1}\right) \approx\\ &&~~\frac{1}{\epsilon} \textbf{R}^{N-n} \left( \textbf{U}^{N-n}+\epsilon \left( \boldsymbol{\Lambda}^{n-1} + (1-\alpha) {\Delta} t \dot{\boldsymbol{\Lambda}}^{n-1}\right), \right.\\ &&~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\left. \vphantom{\frac{1}{\epsilon} \textbf{R}^{N-n} \left( \textbf{U}^{N-n}+\epsilon \left( \boldsymbol{\Lambda}^{n-1} + (1-\alpha) {\Delta} t \dot{\boldsymbol{\Lambda}}^{n-1}\right), \right)} \dot{\textbf{U}}{}^{N-n} , \textbf{d}\right) . \end{array} $$
(54)

Regarding \(\mathrm {D} F/\mathrm {D} d_{i}\) in (52), we use the approximation

$$\begin{array}{@{}rcl@{}} &&\mu_{0} \frac{\partial {\textbf{R}}^{0}}{\partial d_{i}} - \frac{\partial {\textbf{R}}^{0}}{\partial \dot{\textbf{U}}}\frac{\partial \textbf{U}^{0}}{\partial d_i} \\ &\approx&\frac{1}{\epsilon} \textbf{R}^{0} \left( \textbf{U}^{0}, \dot{\textbf{U}}{}^{0}-\epsilon \frac{\partial \textbf{U}^{0}}{\partial d_i} , \textbf{d} +\mu_{0}\epsilon \textbf{e}_i \right) . \end{array} $$
(55)

Finally, the derivative \(\partial \textbf {R}^{n}/\partial d_{i}\) in (48) and (50) is approximated as

$$ \frac{\partial \textbf{R}^{n}}{\partial d_{i}} \approx \frac{1}{\epsilon} \textbf{R} \left( \textbf{U}^{n}, \dot{\textbf{U}}{}^{n} , \textbf{d}+\epsilon \textbf{e}_i\right) . $$
(56)

Of course, the semi-analytical approximations exhibit the previously discussed errors.

Again, we assume the user can code \(\partial G/\partial \textbf {U}\), etc., as these would be time-consuming to compute by finite differences.
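The symmetry restriction can be checked numerically. The quotient \(\textbf{R}(\textbf{U}+\epsilon\boldsymbol{\Lambda})/\epsilon\) approximates \((\partial\textbf{R}/\partial\textbf{U})\boldsymbol{\Lambda}\), whereas the adjoint load needs \((\partial\textbf{R}/\partial\textbf{U})^{\top}\boldsymbol{\Lambda}\). The sketch below uses a hypothetical two-degree-of-freedom residual whose tangent becomes asymmetric when the illustrative parameter xi is nonzero; the load f is chosen so that \(\textbf{R}(\textbf{U}) = \textbf{0}\) at the evaluation point, as at a converged state.

```python
def internal(U, xi):
    """Hypothetical 2-DOF internal force; the xi*u1*u2 term makes dR/dU asymmetric."""
    u1, u2 = U
    return [2.0 * u1 - u2 + xi * u1 * u2, -u1 + 2.0 * u2 + u2**3]


def jacobian(U, xi):
    """Analytical dR/dU of internal()."""
    u1, u2 = U
    return [[2.0 + xi * u2, -1.0 + xi * u1],
            [-1.0, 2.0 + 3.0 * u2**2]]


def transpose_product(J, lam):
    """(dR/dU)^T lam, the term the adjoint load actually needs."""
    return [J[0][0] * lam[0] + J[1][0] * lam[1],
            J[0][1] * lam[0] + J[1][1] * lam[1]]


def semi_analytic_product(U, lam, xi, f, eps):
    """(53)-style quotient R(U + eps*lam)/eps, which approximates (dR/dU) lam."""
    Up = [U[0] + eps * lam[0], U[1] + eps * lam[1]]
    return [(r - fi) / eps for r, fi in zip(internal(Up, xi), f)]


def mismatch(xi, eps=1e-6):
    """Largest difference between the semi-analytical and the exact transposed product."""
    U = [0.8, 0.3]
    f = internal(U, xi)           # load chosen so that R(U) = internal(U) - f = 0
    lam = [1.0, -0.5]
    exact = transpose_product(jacobian(U, xi), lam)
    approx = semi_analytic_product(U, lam, xi, f, eps)
    return max(abs(a - b) for a, b in zip(exact, approx))


print(mismatch(0.0), mismatch(0.5))
```

With xi = 0 the mismatch is at the level of the O(ε) truncation error, while with xi = 0.5 it equals the skew part of the tangent applied to \(\boldsymbol{\Lambda}\) and does not shrink with ε.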

3.5 Adjoint method using discretize-then-differentiate

In this second option of the adjoint method, we use (22a) and (35) to equivalently write (32) as

$$\begin{array}{@{}rcl@{}} \frac{\mathrm{D} F}{\mathrm{D} d_{i}}&=&\sum\limits_{n = 0}^{N} \mu_{n} \left( \frac{\partial G^{n}}{\partial \textbf{U}} \frac{\partial \textbf{U}^{n}}{\partial d_i} + \frac{\partial G^{n}}{\partial \dot{\textbf{U}}}\frac{\partial \dot{\textbf{U}}^{n}}{\partial d_{i}} + \frac{\partial G^{n}}{\partial d_{i}} \right) \\ &&+\sum\limits_{n = 0}^{N}{\boldsymbol{\Lambda}^{n}}^{\top} \left( \frac{\partial \textbf{R}^{N-n}}{\partial \dot{\textbf{U}}}\frac{\partial \dot{\textbf{U}}^{N-n}}{\partial d_i} \right. \\ &&+\left.\frac{\partial \textbf{R}^{N-n}}{\partial \textbf{U}} \frac{\partial \textbf{U}^{N-n}}{\partial d_i} + \frac{\partial \textbf{R}}{\partial d_{i}}^{N-n} \right) \\ &&+\sum\limits_{n = 0}^{N-1}{\boldsymbol{\Phi}^{n}}^{\top}\left( \frac{\partial \textbf{U}^{N-n}}{\partial d_i}-\frac{\partial \textbf{U}^{N-n-1}}{\partial d_i}\right. \\ &&-\left. \left( \alpha \frac{\partial \dot{\textbf{U}}^{N-n}}{\partial d_i} + (1-\alpha) \frac{\partial \dot{\textbf{U}}^{N-n-1}}{\partial d_i}\right){\Delta} t \right) ,\\ \end{array} $$
(57)

where \(\boldsymbol {\Lambda }^{n}\) and \(\boldsymbol {\Phi }^{n}\) are arbitrary adjoint vectors. Rearrangement subsequently yields

$$\begin{array}{@{}rcl@{}} \frac{\mathrm{D} F}{\mathrm{D} d_{i}}&=&\sum\limits_{n = 0}^{N} \left( \mu_{N-n} \frac{\partial G^{N-n}}{\partial d_{i}} + {\boldsymbol{\Lambda}^{n}}^{\top} \frac{\partial \textbf{R}}{\partial d_{i}}^{N-n} \right) \\ &&+ \left( \mu_{0} \frac{\partial G^{0}}{\partial \textbf{U}} +{\boldsymbol{\Lambda}^{N}}^{\top} \frac{\partial {\textbf{R}}^{0}}{\partial \textbf{U}} -{\boldsymbol{\Phi}^{N-1}}^{\top} \right) \frac{\partial \textbf{U}^{0}}{\partial d_i} \\ &&+ \left( \mu_{0} \frac{\partial G^{0}}{\partial \dot{\textbf{U}}} +{\boldsymbol{\Lambda}^{N}}^{\top} \frac{\partial {\textbf{R}}^{0}}{\partial \dot{\textbf{U}}} - (1-\alpha){\Delta} t {\boldsymbol{\Phi}^{N-1}}^{\top} \right) \frac{\partial \dot{\textbf{U}}^{0}}{\partial d_i} \\ &&+\sum\limits_{n = 1}^{N-1} \left( \mu_{N-n} \frac{\partial G^{N-n}}{\partial \textbf{U}} +{\boldsymbol{\Lambda}^{n}}^{\top} \frac{\partial \textbf{R}}{\partial \textbf{U}}^{N-n} -{\boldsymbol{\Phi}^{n-1}}^{\top} +{\boldsymbol{\Phi}^{n}}^{\top} \right) \frac{\partial \textbf{U}^{N-n}}{\partial d_i} \\ &&+ \sum\limits_{n = 1}^{N-1}\left( \mu_{N-n} \frac{\partial G^{N-n}}{\partial \dot{\textbf{U}}} +{\boldsymbol{\Lambda}^{n}}^{\top} \frac{\partial \textbf{R}}{\partial \dot{\textbf{U}}}^{N-n} - (1-\alpha){\Delta} t {\boldsymbol{\Phi}^{n-1}}^{\top}- \alpha {\Delta} t {\boldsymbol{\Phi}^{n}}^{\top} \right) \frac{\partial \dot{\textbf{U}}^{N-n}}{\partial d_i} \\ && +\left( \mu_{N} \frac{\partial G^{N}}{\partial \textbf{U}} +{\boldsymbol{\Lambda}^{0}}^{\top} \frac{\partial \textbf{R}}{\partial \textbf{U}}^{N} +{\boldsymbol{\Phi}^{0}}^{\top} \right) \frac{\partial \textbf{U}^{N}}{\partial d_i} \\ &&+\left( \mu_{N} \frac{\partial G^{N}}{\partial \dot{\textbf{U}}} +{\boldsymbol{\Lambda}^{0}}^{\top} \frac{\partial \textbf{R}}{\partial \dot{\textbf{U}}}^{N} - \alpha {\Delta} t {\boldsymbol{\Phi}^{0}}^{\top} \right) \frac{\partial \dot{\textbf{U}}^{N}}{\partial d_i} . \end{array} $$
(58)

To annihilate the implicitly defined derivatives \(\partial \textbf {U}^{N}/\partial d_{i}\) and \(\partial \dot {\textbf {U}}^{N} / \partial d_{i}\), we first solve the adjoint problem

$$ {\textbf{K}^{N}}^{\top} {\boldsymbol{\Lambda}^{0}}=-\mu_{N} \alpha {\Delta} t \frac{\partial G^{N}}{\partial \textbf{U}}^{\top} -\mu_{N} \frac{\partial G^{N}}{\partial \dot{\textbf{U}}}^{\top} , $$
(59)

for \(\boldsymbol {\Lambda }^{0}\) and evaluate \(\boldsymbol {\Phi }^{0}\) from either of the following expressions

$$\begin{array}{@{}rcl@{}} {\boldsymbol{\Phi}^{0}} &=& -\mu_{N} \frac{\partial G^{N}}{\partial \textbf{U}}^{\top} - {\frac{\partial \textbf{R}}{\partial \textbf{U}}^{N}}^{\top} {\boldsymbol{\Lambda}^{0}} \end{array} $$
(60)
$$\begin{array}{@{}rcl@{}} &=&\frac{1}{\alpha {\Delta} t}\left( \mu_{N} \frac{\partial G^{N}}{\partial \dot{\textbf{U}}}^{\top} + {\frac{\partial \textbf{R}}{\partial \dot{\textbf{U}}}^{N}}^{\top} {\boldsymbol{\Lambda}^{0}} \right) . \end{array} $$
(61)

Note that for an explicit method, i.e., α = 0, we must use (60) to evaluate \(\boldsymbol {\Phi }^{0}\). We next evaluate

$$ \frac{\mathrm{D} F}{\mathrm{D} d_{i}}= \mu_{N} \frac{\partial G^{N}}{\partial d_{i}} + {\boldsymbol{\Lambda}^{0}}^{\top} \frac{\partial \textbf{R}}{\partial d_{i}}^{N} . $$
(62)

To annihilate \(\partial \textbf {U}^{N-n}/\partial d_{i}\) and \(\partial \dot {\textbf {U}}^{N-n} / \partial d_{i}\) for n = 1,2,...,N − 1, we march in time computing \(\boldsymbol {\Lambda }^{n}\) from

$$\begin{array}{@{}rcl@{}} {\textbf{K}^{N-n}}^{\top} {\boldsymbol{\Lambda}^{n}}&=&-\mu_{N-n} \alpha {\Delta} t \frac{\partial G^{N-n}}{\partial \textbf{U}}^{\top} \\&&-\mu_{N-n} \frac{\partial G^{N-n}}{\partial \dot{\textbf{U}}}^{\top} + {\Delta} t {\boldsymbol{\Phi}^{n-1}} , \end{array} $$
(63)

and updating \(\boldsymbol {\Phi }^{n}\) from either of the following equations

$$\begin{array}{@{}rcl@{}} {\boldsymbol{\Phi}^{n}} &= &{\boldsymbol{\Phi}^{n-1}} -\mu_{N-n} \frac{\partial G^{N-n}}{\partial \textbf{U}}^{\top} - {\frac{\partial \textbf{R}}{\partial \textbf{U}}^{N-n}}^{\top} {\boldsymbol{\Lambda}^{n}} \end{array} $$
(64)
$$\begin{array}{@{}rcl@{}} &= &-\frac{1-\alpha}{\alpha} {\boldsymbol{\Phi}^{n-1}}\\ &&+\frac{1}{\alpha {\Delta} t}\left( \mu_{N-n} \frac{\partial G^{N-n}}{\partial \dot{\textbf{U}}}^{\top} + {\frac{\partial \textbf{R}}{\partial \dot{\textbf{U}}}^{N-n}}^{\top} {\boldsymbol{\Lambda}^{n}} \right) . \end{array} $$
(65)

Again, (65) is restricted to the α ≠ 0 case. Because Φ can be updated in two ways, we define option 1 as using (60) and (64), and option 2 as using (61) and (65). After each of these time steps, we update

$$ \frac{\mathrm{D} F}{\mathrm{D} d_{i}} \leftarrow \frac{\mathrm{D} F}{\mathrm{D} d_{i}} + \mu_{N-n} \frac{\partial G^{N-n}}{\partial d_{i}} + {\boldsymbol{\Lambda}^{n}}^{\top} \frac{\partial \textbf{R}}{\partial d_{i}}^{N-n} . $$
(66)

Finally, to annihilate \(\partial \dot {\textbf {U}}^{0} / \partial d_{i}\), we solve the linear problem

$$ {\textbf{K}^{0}}^{\top} {\boldsymbol{\Lambda}^{N}} =-\mu_{0} \frac{\partial G^{0}}{\partial \dot{\textbf{U}}}^{\top}+ (1-\alpha){\Delta} t {\boldsymbol{\Phi}^{N-1}} , $$
(67)

for \(\boldsymbol {\Lambda }^{N}\) and update

$$\begin{array}{@{}rcl@{}} \frac{\mathrm{D} F}{\mathrm{D} d_{i}} &\leftarrow& \frac{\mathrm{D} F}{\mathrm{D} d_{i}} + \mu_{0} \frac{\partial G^{0}}{\partial d_{i}} + \mu_{0} \frac{\partial G^{0}}{\partial \textbf{U}}\frac{\partial \textbf{U}^{0}}{\partial d_i} -{\boldsymbol{\Phi}^{N-1}}^{\top}\frac{\partial \textbf{U}^{0}}{\partial d_i} \\ &&+ {\boldsymbol{\Lambda}^{N}}^{\top} \left( \frac{\partial {\textbf{R}}^{0}}{\partial d_{i}} + \frac{\partial {\textbf{R}}^{0}}{\partial \textbf{U}} \frac{\partial \textbf{U}^{0}}{\partial d_i}\right) . \end{array} $$
(68)

All of the computations in (59)–(68) are performed after the primal analysis is terminated; thus, we insert node C from Fig. 4 into the flowchart of Fig. 1.

Fig. 4 Adjoint discretize-then-differentiate node

The sensitivities from the differentiate-then-discretize and discretize-then-differentiate adjoint approaches differ because the discretization and differentiation steps do not commute. As seen shortly, the latter approach yields more accurate results: it differentiates the discretized response exactly, whereas the former carries a time-discretization consistency error. However, for a large number of time steps, this error shrinks and the two methods converge.
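A small experiment illustrates why the two adjoint approaches differ. The sketch below assumes a hypothetical scalar problem \(R = \dot{U} + dU\) with \(U(0) = 1\) and \(G = U^{2}\) (so \(\partial G/\partial \dot{U} = 0\) and no \(\ddot{U}\) terms are needed). It computes the exact derivative of the discrete response with the direct method, which the discretize-then-differentiate adjoint reproduces, and compares it with the differentiate-then-discretize adjoint of (46) to (52). The printed relative difference should shrink as N grows.

```python
def primal(d, alpha, N, tf=1.0, u0=1.0):
    """Generalized trapezoidal march for the hypothetical residual R = Udot + d*U."""
    dt = tf / N
    K = 1.0 + alpha * dt * d
    U, Ud = [u0], [-d * u0]
    for n in range(1, N + 1):
        Ud.append(-d * (U[-1] + (1 - alpha) * dt * Ud[-1]) / K)
        U.append(U[-1] + (alpha * Ud[-1] + (1 - alpha) * Ud[-2]) * dt)
    return U


def weights(N, tf):
    dt = tf / N
    return [dt / 2 if n in (0, N) else dt for n in range(N + 1)]


def discretize_then_differentiate(d, alpha, N, tf=1.0):
    """Exact derivative of the discrete F = sum mu_n (U^n)**2, via the direct method."""
    U, mu, dt = primal(d, alpha, N, tf), weights(N, tf), tf / N
    K = 1.0 + alpha * dt * d
    s, sd, dF = 0.0, -U[0], 0.0        # dU^0/dd = 0, so the n = 0 term vanishes
    for n in range(1, N + 1):
        sd_new = -(d * (s + (1 - alpha) * dt * sd) + U[n]) / K
        s += (alpha * sd_new + (1 - alpha) * sd) * dt
        sd = sd_new
        dF += mu[n] * 2 * U[n] * s
    return dF


def differentiate_then_discretize(d, alpha, N, tf=1.0):
    """Continuous adjoint (46)-(52) marched with (44); G = U**2, so dG/dUdot = 0."""
    U, mu, dt = primal(d, alpha, N, tf), weights(N, tf), tf / N
    R_U, R_Ud = d, 1.0
    K = R_Ud + alpha * dt * R_U
    lam = 0.0                                  # (46) with dG/dUdot = 0
    lamd = (-R_U * lam - 2 * U[N]) / R_Ud      # (47)
    dF = mu[N] * lam * U[N]                    # (48), dR/dd = U
    for n in range(1, N + 1):
        m = N - n
        # (49) for n < N and (51) for n = N; both have K on the left for this problem
        lamd_new = (-2 * U[m] - R_U * (lam + (1 - alpha) * dt * lamd)) / K
        lam += (alpha * lamd_new + (1 - alpha) * lamd) * dt   # (44), uses the old lamd
        lamd = lamd_new
        dF += mu[m] * lam * U[m]               # (50); (52) at n = N since dU^0/dd = 0
    return dF


for N in (10, 100, 1000):
    ref = discretize_then_differentiate(1.0, 0.5, N)
    print(N, abs(differentiate_then_discretize(1.0, 0.5, N) - ref) / abs(ref))
```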

For the semi-analytical method with option 1, we again require \(\partial \textbf {R}^{n}/\partial \textbf {U}\) to be symmetric, and we approximate the adjoint load terms of (60) and (64) as

$$ {\frac{\partial \textbf{R}}{\partial \textbf{U}}^{N-n}}^{\top} {\boldsymbol{\Lambda}^{n}} \approx \frac{1}{\epsilon} \textbf{R} \left( \textbf{U}^{N-n}+\epsilon \boldsymbol{\Lambda}^{n}, \dot{\textbf{U}}{}^{N-n}, \textbf{d}\right) . $$
(69)

Fortunately, option 2 approximates \(\boldsymbol {\Phi }^{n}\) without (69), so it remains available when \(\partial \textbf {R}^{n}/\partial \textbf {U}\) is asymmetric. It requires instead that \(\partial \textbf {R} / \partial \dot {\textbf {U}}\) be symmetric, which is common, and that α ≠ 0. In this case, the adjoint load terms of (61) and (65) are approximated as

$$ {\frac{\partial \textbf{R}}{\partial \dot{\textbf{U}}}^{N-n}}^{\top} \boldsymbol{\Lambda}^{n} \approx \frac{1}{\epsilon} \textbf{R} \left( \textbf{U}^{N-n}, \dot{\textbf{U}}{}^{N-n}+{\epsilon} \boldsymbol{\Lambda}^{n} , \textbf{d}\right) , $$
(70)

and in \(\mathrm {D} F/\mathrm {D} d_{i}\) of (68) we approximate the sum

$$ \frac{\partial {\textbf{R}}^{0}}{\partial d_{i}} + \frac{\partial {\textbf{R}}^{0}}{\partial \textbf{U}} \frac{\partial \textbf{U}}{\partial d_i}^{0} \approx \frac{1}{\epsilon} \textbf{R} \left( \textbf{U}^{0} +\epsilon \frac{\partial \textbf{U}}{\partial d_i}^{0}, \dot{\textbf{U}}{}^{0} , \textbf{d} +\epsilon \textbf{e}_i\right) . $$
(71)

The derivatives \(\partial \textbf {R}^{n}/\partial d_{i}\) in (62), (66), and (68) are approximated via finite differences using (56). Again, these semi-analytical approximations are susceptible to the previously discussed errors.

3.6 Transient example

Consider the transient heat conduction problem of a straight one-dimensional fin with constant cross-sectional area (Kramer and Stockman 1963) expressed in non-dimensional form as

$$\begin{array}{@{}rcl@{}} \dot{\theta} -\frac{\mathrm{d}}{\mathrm{d}x}\left[ k(\theta) \frac{\mathrm{d}\theta}{\mathrm{d}x}\right]+M^{2}\theta^{p + 1} = 0, & \enskip \text{in} \enskip 0<x<1 , \\ \frac{\mathrm{d}\theta}{\mathrm{d}x} = 0, &\enskip\text{at} \enskip x = 0, t>0 ,\\ \theta = 1, &\enskip\text{at} \enskip x = 1, t>0 ,\\ \theta = 1, &\enskip\text{at} \enskip t = 0 , \end{array} $$
(72)

where x, t, and 𝜃 are the non-dimensional position, time, and temperature, respectively. k(𝜃) = 1 + ξ𝜃 is the non-dimensional thermal conductivity, ξ and M = 1 are fin parameters, and the exponent p = 1/3 models the removal of heat by turbulent natural convection along the fin. The bar is discretized by 5 equal-length linear finite elements, and the time domain [0,1] is discretized into N equal time steps. The Newton-Raphson convergence tolerance \(\epsilon_{R}\) requires \(|\textbf {R}| < 10^{-14}\). The various sensitivity methods are illustrated for the following response function

$$ F={{\int}_{0}^{2}} {{\int}_{0}^{1}} \left( \zeta\theta^{2}(x,t) + (1-\zeta)\dot{\theta}^{2}(x,t)\right) \mathrm{d}x \mathrm{d}t $$
(73)

where the integral is approximated by the trapezoidal rule in time and element-wise 2-point Gaussian quadrature in space. We use ζ = 0.5 and compute the sensitivities with respect to the parameter d = M. The perturbation \(\epsilon = 10^{-6}\) is used in the finite difference and semi-analytical approaches, unless otherwise stated.
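The quadrature of (73) can be sketched as follows, assuming nodal temperatures and temperature rates are stored at every time level (including the node at x = 1) on the 5-element mesh. The function names are illustrative, and the time interval covered is simply N Δt for the stored levels \(t_{0}, \dots, t_{N}\).

```python
import math

# 2-point Gauss-Legendre rule on [-1, 1]: points -+1/sqrt(3), unit weights
GAUSS = ((-1.0 / math.sqrt(3.0), 1.0), (1.0 / math.sqrt(3.0), 1.0))


def spatial_integral(theta, theta_dot, zeta, n_el=5):
    """Integral over 0 < x < 1 of zeta*theta**2 + (1 - zeta)*theta_dot**2 (linear elements)."""
    h = 1.0 / n_el
    total = 0.0
    for e in range(n_el):
        for s, w in GAUSS:
            Na, Nb = (1 - s) / 2, (1 + s) / 2            # linear shape functions
            th = Na * theta[e] + Nb * theta[e + 1]
            thd = Na * theta_dot[e] + Nb * theta_dot[e + 1]
            total += w * h / 2 * (zeta * th**2 + (1 - zeta) * thd**2)   # dx = (h/2) ds
    return total


def response(thetas, theta_dots, dt, zeta=0.5):
    """Trapezoidal rule in time over the stored levels t_0, ..., t_N."""
    N = len(thetas) - 1
    F = 0.0
    for n in range(N + 1):
        mu = dt / 2 if n in (0, N) else dt
        F += mu * spatial_integral(thetas[n], theta_dots[n], zeta)
    return F
```

Because 2-point Gaussian quadrature is exact for cubics, the spatial integral is exact for piecewise-linear fields; only the time quadrature introduces error.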

3.6.1 Symmetric \(\partial \mathbf {R} / \partial \mathbf {U}\) and \(\partial \mathbf {R} / \partial \dot {\mathbf {U}}\)

We first consider the linear thermal conductivity case, i.e., ξ = 0, for which \(\partial \textbf {R}/\partial \textbf {U}\) and \(\partial \textbf {R} / \partial \dot {\textbf {U}}\) are symmetric. The various methods yield similar results, cf. Table 1. For N = 100 and α = 0, the explicit integration scheme is not stable. Also, for α = 0, we cannot use the semi-analytical adjoint discretize-then-differentiate option 2, cf. (61) and (65).

Table 1 Sensitivities for the symmetric problem

To examine the consistency of the methods, we show the error \(e_{f}\), cf. (1), for different perturbation sizes 𝜖 for the N = 1000 and α = 0.5 case, cf. Figure 5. As the perturbation 𝜖 decreases, the sensitivities obtained by finite differences converge to those obtained analytically. However, the finite difference sensitivities deteriorate for small perturbations due to round-off error.

Fig. 5 Relative percentage error of the sensitivities obtained by the analytical methods with respect to finite differences for the symmetric problem for α = 0.5 and N = 1000

We also show the error \(e_{f}\) for different time discretizations N for the \(\epsilon = 10^{-6}\) and α = 0.5 case, cf. Figure 6. The errors of the sensitivities obtained by the different methods show no dependency on N, with the exception of the adjoint differentiate-then-discretize scheme. As expected, this sensitivity has a consistency error that decreases as the number of time steps increases (Gunzburger 2003; Jensen et al. 2014).

Fig. 6 Relative percentage error of the sensitivities obtained by the analytical methods with respect to finite differences for the symmetric problem for α = 0.5 and \(\epsilon = 10^{-6}\)

To examine the accuracy of the semi-analytical sensitivities, we compare them to their respective analytical sensitivities via the error \(e_{s}\) of (2) for different perturbation sizes 𝜖 and the N = 1000 and α = 0.5 case, cf. Figure 7. As expected, the error decreases with the perturbation size until round-off error pollutes the computations.

In Fig. 8, we show the error \(e_{s}\) for different time discretizations N using \(\epsilon = 10^{-6}\) and α = 0.5. The errors of the semi-analytical sensitivities show no dependency on N because the semi-analytical approximations are independent of the time discretization, i.e., the error is solely due to the perturbation size 𝜖.

Fig. 7 Relative percentage error of the semi-analytical sensitivities of the symmetric problem for α = 0.5 and N = 1000

Fig. 8 Relative percentage error of the semi-analytical sensitivities of the symmetric problem for α = 0.5 and \(\epsilon = 10^{-6}\)

3.6.2 Asymmetric \(\partial \mathbf {R} / \partial \mathbf {U}\)

We consider the nonlinear thermal conductivity case ξ = 0.5, for which only \(\partial \textbf {R} / \partial \dot {\textbf {U}}\) is symmetric and \(\partial \textbf {R}/\partial \textbf {U}\) is not. The various methods yield similar results, cf. Table 2, with the exception of the semi-analytical adjoint differentiate-then-discretize and semi-analytical adjoint discretize-then-differentiate option 1 schemes, which exhibit errors of approximately 0.1% with respect to their analytical counterparts. We attribute this error to the asymmetric \(\partial \textbf {R}/\partial \textbf {U}\). Again, for N = 100, the explicit α = 0 scheme is not stable.
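The loss of symmetry for ξ ≠ 0 can be seen directly from the residual. The Python sketch below assembles a weak-form residual of (72) on 5 linear elements; the text does not say how R is integrated, so a consistent capacity term and 2-point Gaussian quadrature for every term are assumptions. The tangent is approximated by forward differences, and the largest \(|J_{ij} - J_{ji}|\) over the free nodes is reported. For ξ = 0 it sits at finite-difference noise level; for ξ = 0.5 the conductivity term contributes \(\xi N_{j}\,(\mathrm{d}\theta/\mathrm{d}x)\,\mathrm{d}N_{i}/\mathrm{d}x\) to \(J_{ij}\), which is not symmetric in i and j.

```python
import math

# 2-point Gauss-Legendre rule on [-1, 1]
GAUSS = ((-1.0 / math.sqrt(3.0), 1.0), (1.0 / math.sqrt(3.0), 1.0))


def fin_residual(theta, theta_dot, M=1.0, xi=0.0, p=1.0 / 3.0, n_el=5):
    """Weak-form residual of (72), one entry per node (the x = 1 entry is not a free DOF)."""
    h = 1.0 / n_el
    R = [0.0] * (n_el + 1)
    for e in range(n_el):
        a, b = e, e + 1
        grad = (theta[b] - theta[a]) / h                  # d(theta)/dx, constant per element
        for s, w in GAUSS:
            Na, Nb = (1 - s) / 2, (1 + s) / 2
            th = Na * theta[a] + Nb * theta[b]
            thd = Na * theta_dot[a] + Nb * theta_dot[b]
            k = 1.0 + xi * th                             # k(theta) = 1 + xi*theta
            jw = w * h / 2                                # Gauss weight times dx/ds
            for node, N, dN in ((a, Na, -1.0 / h), (b, Nb, 1.0 / h)):
                R[node] += jw * (thd * N + k * grad * dN + M**2 * th ** (p + 1) * N)
    return R


def tangent_fd(theta, theta_dot, xi, eps=1e-7):
    """Forward-difference approximation of dR/dU; column j perturbs node j."""
    base = fin_residual(theta, theta_dot, xi=xi)
    n = len(theta)
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):
        tp = list(theta)
        tp[j] += eps
        Rp = fin_residual(tp, theta_dot, xi=xi)
        for i in range(n):
            J[i][j] = (Rp[i] - base[i]) / eps
    return J


def asymmetry(J, free=5):
    """Largest |J_ij - J_ji| over the free (non-Dirichlet) nodes."""
    return max(abs(J[i][j] - J[j][i]) for i in range(free) for j in range(free))


theta = [0.6 + 0.08 * i for i in range(6)]    # arbitrary positive, non-uniform state
theta_dot = [0.0] * 6
print(asymmetry(tangent_fd(theta, theta_dot, xi=0.0)),
      asymmetry(tangent_fd(theta, theta_dot, xi=0.5)))
```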

Table 2 Sensitivities for the asymmetric problem

To examine the consistency of the methods, we show the error \(e_{f}\) for different perturbation sizes and time steps in Figs. 9 and 10. As in the previous example, the adjoint differentiate-then-discretize method has a consistency error that decreases as the number of time steps increases.

Fig. 9 Relative percentage error of the sensitivities obtained by the analytical methods with respect to finite differences for the asymmetric problem for α = 0.5 and N = 1000

Fig. 10 Relative percentage error of the sensitivities obtained by the analytical methods with respect to finite differences for the asymmetric problem for α = 0.5 and \(\epsilon = 10^{-6}\)

Now we examine the accuracy of the semi-analytical sensitivities by computing the error \(e_{s}\) for different perturbation sizes 𝜖 with N = 1000 and α = 0.5, cf. Figure 11. Since \(\partial \textbf {R}/\partial \textbf {U}\) is not symmetric, the approximations (53), (54), and (69) do not hold, resulting in appreciable error in both the semi-analytical adjoint differentiate-then-discretize and the semi-analytical adjoint discretize-then-differentiate option 1 schemes. The other semi-analytical methods do not exhibit this error. Again, as the perturbation size decreases, the error lessens until round-off error pollutes the computations.

Fig. 11 Relative percentage error of the semi-analytical sensitivities of the asymmetric problem for α = 0.5 and N = 1000

In Fig. 12, we show the error \(e_{s}\) for different time discretizations N using \(\epsilon = 10^{-6}\) and α = 0.5. The error of the semi-analytical adjoint differentiate-then-discretize and semi-analytical adjoint discretize-then-differentiate option 1 schemes is evident.

Fig. 12 Relative percentage error of the semi-analytical sensitivities of the asymmetric problem for α = 0.5 and \(\epsilon = 10^{-6}\)

4 Nonlinear dynamic problems

A nonlinear dynamic problem can be expressed through a residual as

$$\begin{array}{@{}rcl@{}} \textbf{R}(\textbf{U}(t, \textbf{d}),\dot{\textbf{U}}(t, \textbf{d}),\ddot{\textbf{U}}(t, \textbf{d}),\textbf{d}) &=&\textbf{0} , \end{array} $$
(74a)
$$\begin{array}{@{}rcl@{}} \dot{\textbf{U}}(0) &=& \dot{\textbf{U}}^{0} , \end{array} $$
(74b)
$$\begin{array}{@{}rcl@{}} \textbf{U}(0) &=& \textbf{U}^{0} , \end{array} $$
(74c)

where we note the design dependencies as in (7). The response function for this system is again expressed by (20) and its sensitivity computed by (21).

For the sensitivity analysis, we implement the direct method by differentiating (74a), (74b), and (74c)

$$\begin{array}{@{}rcl@{}} \frac{\partial \textbf{R}}{\partial \ddot{\textbf{U}}} \frac{\partial \ddot{\textbf{U}}}{\partial d_i} +\frac{\partial \textbf{R}}{\partial\dot{\textbf{U}}} \frac{\partial \dot{\textbf{U}}}{\partial d_i} + \frac{\partial \textbf{R}}{\partial{\textbf{U}}} \frac{\partial \textbf{U}}{\partial d_i} & =& - \frac{\partial {\textbf{R}}}{\partial d_i} , \end{array} $$
(75a)
$$\begin{array}{@{}rcl@{}} \frac{\partial \dot{\textbf{U}}(0)}{\partial d_i} &=& \frac{\partial \dot{\textbf{U}}^{0}}{\partial d_i} , \end{array} $$
(75b)
$$\begin{array}{@{}rcl@{}} \frac{\partial \textbf{U}(0)}{\partial d_i} &=& \frac{\partial \textbf{U}^{0}}{\partial d_i} , \end{array} $$
(75c)

and solve the resulting pseudo-problem for \(\partial \ddot {\textbf {U}}/\partial d_{i}\), \(\partial \dot {\textbf {U}}/\partial d_{i}\), and \(\partial \textbf {U}/\partial d_{i}\), whereupon we evaluate (21).
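As a minimal sketch of the direct method for (74) and (75), the code below integrates a hypothetical undamped oscillator \(\ddot{U} + dU = 0\) with \(U(0)=1\), \(\dot{U}(0)=0\) together with its sensitivity equation \(\ddot{s} + ds = -U\), where \(s = \partial U/\partial d\). Classical fourth-order Runge-Kutta is our choice of integrator, not one prescribed in the text. The result is compared with the analytical derivative \(-t\sin(\sqrt{d}\,t)/(2\sqrt{d})\) of \(\cos(\sqrt{d}\,t)\).

```python
import math


def rhs(y, d):
    """Primal Uddot + d*U = 0 plus its sensitivity equation (75a), in first-order form.

    Here dR/dUddot = 1, dR/dUdot = 0, dR/dU = d, and dR/dd = U.
    """
    U, V, s, sv = y
    return [V, -d * U, sv, -d * s - U]


def integrate(d, tf=1.0, steps=1000):
    """Classical RK4 on the augmented state (U, Udot, dU/dd, dUdot/dd)."""
    y = [1.0, 0.0, 0.0, 0.0]   # U(0) = 1, Udot(0) = 0; (75b), (75c) give zero initial sensitivities
    h = tf / steps
    for _ in range(steps):
        k1 = rhs(y, d)
        k2 = rhs([yi + h / 2 * ki for yi, ki in zip(y, k1)], d)
        k3 = rhs([yi + h / 2 * ki for yi, ki in zip(y, k2)], d)
        k4 = rhs([yi + h * ki for yi, ki in zip(y, k3)], d)
        y = [yi + h / 6 * (a + 2 * b + 2 * c + e)
             for yi, a, b, c, e in zip(y, k1, k2, k3, k4)]
    return y


d, tf = 4.0, 1.0
U_tf, _, dU_dd, _ = integrate(d, tf)
omega = math.sqrt(d)
exact = -tf * math.sin(omega * tf) / (2 * omega)   # d/dd of cos(sqrt(d)*t) at t = tf
print(dU_dd, exact)
```

The sensitivity equations share the primal coefficients, so they can be marched alongside the primal at little extra cost; the response sensitivity (21) then follows by quadrature of the stored \(\partial \textbf{U}/\partial d_{i}\) and \(\partial \dot{\textbf{U}}/\partial d_{i}\).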

Alternatively, we can implement the adjoint method, whereby we add to (21) the time integral of an adjoint vector \(\boldsymbol {\lambda }\) times the (vanishing) residual of (75a) to obtain the equivalent sensitivity

$$\begin{array}{@{}rcl@{}} \frac{\mathrm{D} F}{\mathrm{D} d_{i}}& = & {\int}_{0}^{t_{f}} \left( \frac{\partial G}{\partial\textbf{U}} \frac{\partial \textbf{U}}{\partial d_i} + \frac{\partial G}{\partial \dot{\textbf{U}}} \frac{\partial \dot{\textbf{U}}}{\partial d_i} + \frac{\partial G}{\partial d_{i}} \right) \mathrm{d}t \\ && + {\int}_{0}^{t_{f}} {\boldsymbol{ \lambda}}^{\top} \left( \frac{\partial \textbf{R}}{\partial \ddot{\textbf{U}}} \frac{\partial \ddot{\textbf{U}}}{\partial d_i} +\frac{\partial \textbf{R}}{\partial\dot{\textbf{U}}} \frac{\partial \dot{\textbf{U}}}{\partial d_i} + \frac{\partial \textbf{R}}{\partial{\textbf{U}}} \frac{\partial \textbf{U}}{\partial d_i} + \frac{\partial \textbf{R}}{\partial d_{i}} \right) \mathrm{d}t .\\ \end{array} $$
(76)

where, again, λ is an arbitrary adjoint vector. Integrating by parts and rearranging (76) yields

$$\begin{array}{@{}rcl@{}} \frac{\mathrm{D} F}{\mathrm{D} d_{i}}\!\!&=&\!\! {\int}_{0}^{t_{f}} \left( \frac{\partial G}{\partial d_{i}} + {\boldsymbol{ \lambda}^{\top}} \frac{\partial \textbf{R}}{\partial d_{i}} \right) \mathrm{d}t \\ &&-\left. \frac{\partial \textbf{U}^{0}}{\partial d_i}^{\top} \left( \frac{\partial G}{\partial \dot{\textbf{U}}}^{\top} + \frac{\partial \textbf{R}}{\partial \dot{\textbf{U}}}^{\top} \boldsymbol{ \lambda} -\frac{\text{d}}{\text{d}t}\left( \frac{\partial {\textbf{R}}}{\partial \ddot{\textbf{U}}}^{\top}\boldsymbol{ \lambda}\right)\right)\right\rvert_{t = 0} \\ &&-\!\left. \frac{\partial \dot{\textbf{U}}^{0}}{\partial d_i}^{\top} \left( \frac{\partial \textbf{R}}{\partial \ddot{\textbf{U}}}^{\top} \boldsymbol{ \lambda}\right)\right\rvert_{t = 0} + {\int}_{0}^{t_{f}} \frac{\partial \textbf{U}}{\partial d_i}^{\top} \left( \frac{\partial {G}}{\partial \textbf{U}}^{\top} \! - \frac{\text{d}}{\text{d}t}\left( \frac{\partial {G}^{\top}}{\partial \dot{\textbf{U}}}\right) \right. 
\\ &&\left.\vphantom{+ {\int}_{0}^{t_{f}} \frac{\partial \textbf{U}}{\partial d_i}^{\top} \left( \frac{\partial {G}}{\partial \textbf{U}}^{\top} \right)} +\frac{\partial \textbf{R}^{\top}}{\partial \textbf{U}} \boldsymbol{ \lambda} - \frac{\text{d}}{\text{d}t}\left( \frac{\partial \textbf{R}^{\top}}{\partial \dot{\textbf{U}}} \boldsymbol{ \lambda}\right) +\frac{\text{d}^{2}}{\text{d}t^{2}} \left( {\frac{\partial \textbf{R}^{\top}}{\partial{\ddot{\textbf{U}}}} \boldsymbol{ \lambda}}\right) \right) \mathrm{d}t \\&&+\left.\frac{\partial \textbf{U}}{\partial d_i}^{\top} \left( \frac{\partial G}{\partial \dot{\textbf{U}}}^{\top} + \frac{\partial \textbf{R}}{\partial \dot{\textbf{U}}}^{\top} \boldsymbol{ \lambda}-\frac{\text{d}}{\text{d}t} \left( \frac{\partial {\textbf{R}}}{\partial \ddot{\textbf{U}}}^{\top}\boldsymbol{ \lambda}\right)\right)\right\rvert_{t=t_{f}} \\&&+\left.\frac{\partial \dot{\textbf{U}}}{\partial d_i}^{\top} \left( \frac{\partial \textbf{R}}{\partial \ddot{\textbf{U}}}^{\top} \boldsymbol{\lambda}\right)\right\rvert_{t=t_{f}} .\\ \end{array} $$
(77)

Next, we introduce the time mapping of (25) and substitute it into (77) to obtain

$$\begin{array}{@{}rcl@{}} \frac{\mathrm{D} F}{\mathrm{D} d_{i}}&=& {\int}_{0}^{t_{f}} \left( \frac{\partial G}{\partial d_{i}} + {\boldsymbol{\Lambda}^{\top}} \frac{\partial \textbf{R}}{\partial d_{i}} \right) \mathrm{d}t \\ &&-\left. \frac{\partial \textbf{U}^{0}}{\partial d_i}^{\top} \left( \frac{\partial G}{\partial \dot{\textbf{U}}}^{\top} + \left( \frac{\partial \textbf{R}}{\partial \dot{\textbf{U}}}^{\top} -\frac{\text{d}}{\text{d}t}\left( \frac{\partial {\textbf{R}}}{\partial \ddot{\textbf{U}}}^{\top}\right)\right) \boldsymbol{\Lambda} + \frac{\partial {\textbf{R}}}{\partial \ddot{\textbf{U}}}^{\top} \dot{\boldsymbol{\Lambda}} \right)\right\rvert_{t = 0} \\&&-\left. \frac{\partial \dot{\textbf{U}}^{0}}{\partial d_i}^{\top} \left( \frac{\partial \textbf{R}}{\partial \ddot{\textbf{U}}}^{\top} \boldsymbol{\Lambda} \right)\right\rvert_{t = 0} + {\int}_{0}^{t_{f}} \frac{\partial \textbf{U}}{\partial d_i}^{\top} \left( \frac{\partial {G}}{\partial \textbf{U}}^{\top} - \frac{\text{d}}{\text{d}t}\left( \frac{\partial {G}^{\top}}{\partial \dot{\textbf{U}}}\right) \right. \\&&+\left. \left( \frac{\partial \textbf{R}}{\partial \textbf{U}}^{\top} - \frac{\text{d}}{\text{d}t}\left( \frac{\partial \textbf{R}^{\top}}{\partial \dot{\textbf{U}}}\right) +\frac{\text{d}^{2}}{\text{d}t^{2}} \left( \frac{\partial {\textbf{R}}}{\partial \ddot{\textbf{U}}}^{\top}\right) \right) \boldsymbol{\Lambda} \right. 
\\ &&+\left.\left( \frac{\partial {\textbf{R}}}{\partial \dot{\textbf{U}}}^{\top} -2\frac{\text{d}}{\text{d}t}\left( \frac{\partial {\textbf{R}}}{\partial \ddot{\textbf{U}}}^{\top}\right)\right)\dot{\boldsymbol{\Lambda}} +\frac{\partial {\textbf{R}}}{\partial \ddot{\textbf{U}}}^{\top} \ddot{\boldsymbol{\Lambda}} \right) \mathrm{d}t \\&&+\left.\frac{\partial \textbf{U}}{\partial d_i}^{\top} \left( \frac{\partial G}{\partial \dot{\textbf{U}}}^{\top} + \left( \frac{\partial \textbf{R}}{\partial \dot{\textbf{U}}}^{\top} -\frac{\text{d}}{\text{d}t}\left( \frac{\partial {\textbf{R}}}{\partial \ddot{\textbf{U}}}^{\top}\right) \right) \boldsymbol{\Lambda} + \frac{\partial {\textbf{R}}}{\partial \ddot{\textbf{U}}}^{\top} \dot{\boldsymbol{\Lambda}} \right)\right\rvert_{t=t_{f}} \\&&+\left. \frac{\partial \dot{\textbf{U}}}{\partial d_i}^{\top} \left( \frac{\partial \textbf{R}}{\partial \ddot{\textbf{U}}}^{\top} \boldsymbol{\Lambda} \right)\right\rvert_{t=t_{f}} . \end{array} $$
(78)

where all quantities are evaluated at time t except for Λ, which is evaluated at \(t_{f}-t\). We annihilate the terms containing the implicitly defined derivative \(\partial \textbf {U}/\partial d_{i}\) by requiring Λ to solve

$$\begin{array}{@{}rcl@{}} &&\frac{\partial {\textbf{R}}}{\partial \ddot{\textbf{U}}}^{\top} \ddot{\boldsymbol{\Lambda}} +\left( \frac{\partial {\textbf{R}}}{\partial\dot{\textbf{U}}}^{\top} -2\frac{\text{d}}{\text{d}t}\left( \frac{\partial {\textbf{R}}}{\partial \ddot{\textbf{U}}}^{\top}\right)\right)\dot{\boldsymbol{\Lambda}} \\ &&+\left( \frac{\partial \textbf{R}}{\partial \textbf{U}}^{\top} - \frac{\text{d}}{\text{d}t}\left( \frac{\partial \textbf{R}^{\top}}{\partial \dot{\textbf{U}}}\right) +\frac{\text{d}^{2}}{\text{d}t^{2}}\left( \frac{\partial {\textbf{R}}}{\partial \ddot{\textbf{U}}}^{\top}\right)\right) \boldsymbol{\Lambda} \\ &=& - \frac{\partial {G}}{\partial \textbf{U}}^{\top} + \frac{\text{d}}{\text{d}t}\left( \frac{\partial {G}^{\top}}{\partial \dot{\textbf{U}}}\right) , \end{array} $$
(79a)
$$\begin{array}{@{}rcl@{}} \left.\frac{\partial {\textbf{R}}}{\partial \ddot{\textbf{U}}}^{\top} \right\rvert_{t=t_{f}} \dot{\boldsymbol{\Lambda}} (0) &=&\left. -\frac{\partial G}{\partial \dot{\textbf{U}}}^{\top} \right\rvert_{t=t_{f}} , \end{array} $$
(79b)
$$\begin{array}{@{}rcl@{}} \boldsymbol{\Lambda}(0) &=& \textbf{0} . \end{array} $$
(79c)

Using this Λ, the sensitivity reduces to

$$\begin{array}{@{}rcl@{}} \frac{\mathrm{D} F}{\mathrm{D} d_{i}}&=& {\int}_{0}^{t_{f}} \left( \frac{\partial G}{\partial d_{i}} + {\boldsymbol{\Lambda}^{\top}} \frac{\partial \textbf{R}}{\partial d_{i}} \right) \mathrm{d}t \\ &&- \frac{\partial \textbf{U}^{0}}{\partial d_i}^{\top} \left( \frac{\partial G}{\partial \dot{\textbf{U}}}^{\top} + \left( \frac{\partial \textbf{R}}{\partial \dot{\textbf{U}}}^{\top} -\frac{\text{d}}{\text{d}t}\left( \frac{\partial {\textbf{R}}}{\partial \ddot{\textbf{U}}}^{\top}\right)\right) \boldsymbol{\Lambda} \right. \\ &&+\left.\left. \frac{\partial {\textbf{R}}}{\partial \ddot{\textbf{U}}}^{\top} \dot{\boldsymbol{\Lambda}} \right)\right\rvert_{t = 0} -\left. \frac{\partial \dot{\textbf{U}}^{0}}{\partial d_i}^{\top} \left( \frac{\partial \textbf{R}}{\partial \ddot{\textbf{U}}}^{\top} \boldsymbol{\Lambda} \right)\right\rvert_{t = 0} . \end{array} $$
(80)

where, again, all quantities are evaluated at time \(t\) except for \(\boldsymbol{\Lambda}\), which is evaluated at \(t_f - t\).

4.1 Discretization

To solve the above, we discretize in time using the Newmark method so that

$$\begin{array}{@{}rcl@{}} \dot{\textbf{U}}^{n} =\dot{\textbf{U}}^{n-1}+(1-\gamma){\Delta} t \ddot{\textbf{U}}^{n-1}+\gamma{\Delta} t \ddot{\textbf{U}}^{n} , \end{array} $$
(81)
$$\begin{array}{@{}rcl@{}} \textbf{U}^{n} &=& \textbf{U}^{n-1}+{\Delta} t\dot{\textbf{U}}^{n-1}\\ &&+\left( \frac{1}{2}-\beta\right){\Delta} t^{2} \ddot{\textbf{U}}^{n-1}+\beta{\Delta} t^{2} \ddot{\textbf{U}}^{n} , \end{array} $$
(82)

where \(\textbf{U}^{n} = \textbf{U}(t_{n})\), \(\dot {\textbf {U}}^{n}=\dot {\textbf {U}}(t_{n})\), and \(\ddot {\textbf {U}}^{n}=\ddot {\textbf {U}}(t_{n})\). To simplify the ensuing developments, we define the coefficients \(a = (1-\gamma)\Delta t\), \(b = \gamma\Delta t\), \(c = \Delta t\), \(d = (1/2-\beta)\Delta t^{2}\), and \(e = \beta\Delta t^{2}\).
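The coefficients and the updates (81) and (82) translate directly into code. The following is a minimal Python sketch; the function names are our own and are not taken from the paper. It works for scalars and NumPy arrays alike. Because \(d + e = \Delta t^{2}/2\) for any \(\beta\), a constant acceleration is integrated exactly, and the usage lines rely on that.

```python
def newmark_coefficients(gamma, beta, dt):
    """Coefficients a, b, c, d, e defined after (82)."""
    a = (1.0 - gamma) * dt
    b = gamma * dt
    c = dt
    d = (0.5 - beta) * dt**2
    e = beta * dt**2
    return a, b, c, d, e


def newmark_update(U_prev, V_prev, A_prev, A_new, coeffs):
    """Return (U^n, Udot^n) from (82) and (81), given the accelerations."""
    a, b, c, d, e = coeffs
    V_new = V_prev + a * A_prev + b * A_new
    U_new = U_prev + c * V_prev + d * A_prev + e * A_new
    return U_new, V_new


# Usage: constant unit acceleration from rest, one step of size 0.1.
coeffs = newmark_coefficients(gamma=0.5, beta=0.25, dt=0.1)
U1, V1 = newmark_update(0.0, 0.0, 1.0, 1.0, coeffs)   # exact: U = t^2/2, V = t
```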

4.2 Primal analysis

In the primal analysis, we are given the initial conditions \(\textbf{U}^{0}\) and \(\dot {\textbf {U}}^{0}\), so we first use (74a) and solve

$$ \textbf{R}^{0}(\textbf{U}^{0},\dot{\textbf{U}}^{0},\ddot{\textbf{U}}^{0},\textbf{d}) =\textbf{0} , $$
(83)

for \(\ddot {\textbf {U}}^{0}\) by Newton-Raphson. The updates \({\Delta }\ddot {\textbf {U}}^{0}\) for \(\ddot {\textbf {U}}^{0}\) are obtained by solving

$$ \textbf{K}^{0}(\textbf{U}^{0},\dot{\textbf{U}}^{0},\ddot{\textbf{U}}^{0}, \textbf{d}){\Delta}\ddot{\textbf{U}}^{0}=-\textbf{R}^{0}(\textbf{U}^{0},\dot{\textbf{U}}^{0},\ddot{\textbf{U}}^{0}, \textbf{d}) , $$
(84)

where \( \textbf {K}^{0}={\partial \textbf {R}^{0}}/{\partial \ddot {\textbf {U}}}\) is the tangent matrix. We continue updating until convergence.

Having \(\textbf{U}^{0}\), \(\dot {\textbf {U}}^{0}\), and \(\ddot {\textbf {U}}^{0}\), we compute the first term in (31), i.e., \(F= \mu _{0} G^{0}(\textbf {U}^{0},\dot {\textbf {U}}^{0}, \textbf {d})\).

Now we commence the time marching. At each time step \(t_{n}\), we replace \(\dot {\textbf {U}}^{n}\) and \(\textbf{U}^{n}\) with the right-hand sides (RHS) of (81) and (82), solve (74a) for \(\ddot {\textbf {U}}^{n}\), and then evaluate \(\dot {\textbf {U}}^{n}\) and \(\textbf{U}^{n}\) from (81) and (82). Newton’s method is also used for these solves, whereupon we calculate the update \({\Delta }\ddot {\textbf {U}}^{n}\) from the linear equation

$$ \textbf{K}^{n}(\textbf{U}^{n},\dot{\textbf{U}}^{n},\ddot{\textbf{U}}^{n}, \textbf{d}){\Delta}\ddot{\textbf{U}}^{n}=-\textbf{R}^{n}(\textbf{U}^{n},\dot{\textbf{U}}^{n},\ddot{\textbf{U}}^{n}, \textbf{d}) , $$
(85)

where \( \textbf {K}^{n}={\partial \textbf {R}^{n}}/{\partial \ddot {\textbf {U}}}+b {\partial \textbf {R}^{n}}/{\partial \dot {\textbf {U}}} +e {\partial \textbf {R}^{n}}/{\partial \textbf {U}}\) is the tangent stiffness matrix. After convergence, we update F as per (34). A flowchart of these computations appears in Fig. 13.
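The primal procedure of (83) to (85) can be sketched generically. In the sketch below, `R` and `jac` are hypothetical user callbacks that return the residual and the three partial derivatives \(\partial\textbf{R}/\partial\textbf{U}\), \(\partial\textbf{R}/\partial\dot{\textbf{U}}\), and \(\partial\textbf{R}/\partial\ddot{\textbf{U}}\) as dense matrices. The tangent is \(\textbf{K}^{0}\) at \(t_0\) and \(\textbf{K}^{n}\) afterwards. It is an illustration under these assumptions, not the authors' code.

```python
import numpy as np


def primal_newmark(R, jac, U0, V0, p, dt, N, gamma=0.5, beta=0.25,
                   tol=1e-12, max_iter=50):
    """Newmark/Newton-Raphson primal analysis following (83)-(85).

    R(U, V, A, p)   -> residual vector R(U, Udot, Uddot, d)
    jac(U, V, A, p) -> (dR/dU, dR/dUdot, dR/dUddot) as dense matrices
    Returns the histories U, V (= Udot), A (= Uddot), each of shape (N + 1, ndof).
    """
    a, b, c, d, e = ((1 - gamma) * dt, gamma * dt, dt,
                     (0.5 - beta) * dt**2, beta * dt**2)
    ndof = len(U0)
    U, V, A = np.zeros((N + 1, ndof)), np.zeros((N + 1, ndof)), np.zeros((N + 1, ndof))
    U[0], V[0] = U0, V0
    for n in range(N + 1):
        A_n = A[n - 1].copy() if n > 0 else np.zeros(ndof)   # Newton initial guess
        for _ in range(max_iter):
            if n > 0:   # (81)-(82): U^n and Udot^n follow from the unknown A^n
                V[n] = V[n - 1] + a * A[n - 1] + b * A_n
                U[n] = U[n - 1] + c * V[n - 1] + d * A[n - 1] + e * A_n
            r = R(U[n], V[n], A_n, p)
            if np.linalg.norm(r) < tol:
                break
            RU, RV, RA = jac(U[n], V[n], A_n, p)
            K = RA if n == 0 else RA + b * RV + e * RU   # K^0 in (84), K^n in (85)
            A_n = A_n - np.linalg.solve(K, r)
        else:
            raise RuntimeError(f"Newton-Raphson did not converge at step {n}")
        A[n] = A_n
    return U, V, A


# Usage: undamped linear oscillator u'' + p u = 0 with p = 1, u(0) = 1, u'(0) = 0.
def R_lin(U, V, A, p):
    return A + p * U


def jac_lin(U, V, A, p):
    return p * np.eye(1), np.zeros((1, 1)), np.eye(1)


U, V, A = primal_newmark(R_lin, jac_lin, np.array([1.0]), np.array([0.0]), 1.0, 0.1, 100)
```

With \(\gamma = 1/2\) and \(\beta = 1/4\) (average acceleration), a linear undamped oscillator conserves \(u^2 + \dot u^2\) exactly in exact arithmetic. That property makes a convenient sanity check for the time stepping.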

Fig. 13

Primal analysis flowchart for dynamic problem

4.3 Direct differentiation

For the direct differentiation sensitivity analysis, we discretize \(\partial \textbf{U}/\partial d_{i}\) like \(\textbf{U}\), i.e.,

$$\begin{array}{@{}rcl@{}} \frac{\partial \dot{\textbf{U}}^{n}}{\partial d_i} &=&\frac{\partial \dot{\textbf{U}}^{n-1}}{\partial d_i}+a \frac{\partial \ddot{\textbf{U}}^{n-1}}{\partial d_{i}}+b \frac{\partial \ddot{\textbf{U}}^{n}}{\partial d_{i}} , \end{array} $$
(86)
$$\begin{array}{@{}rcl@{}} \frac{\partial \textbf{U}^{n}}{\partial d_i} &=& \frac{\partial \textbf{U}^{n-1}}{\partial d_i}+c \frac{\partial \dot{\textbf{U}}^{n-1}}{\partial d_i}+d \frac{\partial \ddot{\textbf{U}}^{n-1}}{\partial d_{i}}+e \frac{\partial \ddot{\textbf{U}}^{n}}{\partial d_{i}} . \end{array} $$
(87)

Note that the initial conditions \(\partial \textbf{U}^{0}/\partial d_{i}\) and \(\partial \dot {\textbf {U}}^{0} /\partial d_{i}\) are known, but \(\partial \ddot {\textbf {U}}^{0} /\partial d_{i}\) is not. So, before marching in time, we must obtain \(\partial \ddot {\textbf {U}}^{0} /\partial d_{i}\) as we did \({\ddot {\textbf {U}}}^{0}\). To this end, we differentiate (83) to obtain the linear equation

$$ \textbf{K}^{0} \frac{\partial \ddot{\textbf{U}}^{0}}{\partial d_{i}} = -\left( \frac{\partial {\textbf{R}}^{0}}{\partial \textbf{U}}\frac{\partial \textbf{U}^{0}}{\partial d_i} +\frac{\partial {\textbf{R}}^{0}}{\partial \dot{\textbf{U}}}\frac{\partial \dot{\textbf{U}}^{0}}{\partial d_i} + \frac{\partial {\textbf{R}}^{0}}{\partial d_{i}} \right) , $$
(88)

which we solve for \(\partial \ddot {\textbf {U}}^{0} /\partial d_{i}\). Having \(\partial \textbf{U}^{0}/\partial d_{i}\) and \(\partial \dot {\textbf {U}}^{0} /\partial d_{i}\), we update \(\mathrm{D}F/\mathrm{D}d_{i}\) as per (37). Now we march in time, evaluating \(\partial \textbf{U}^{n}/\partial d_{i}\), \(\partial \dot {\textbf {U}}^{n} /\partial d_{i}\), and \(\partial \ddot {\textbf {U}}^{n} /\partial d_{i}\) as we did to compute \(\textbf{U}^{n}\), \(\dot {\textbf {U}}^{n}\), and \(\ddot {\textbf {U}}^{n}\). From (75a), (86), and (87), we formulate the linear equation

$$\begin{array}{@{}rcl@{}} \textbf{K}^{n} \frac{\partial \ddot{\textbf{U}}^{n}}{\partial d_{i}} &=&-\frac{\partial \textbf{R}^{n}}{\partial \textbf{U}} \left( \frac{\partial \textbf{U}^{n-1}}{\partial d_i} +c \frac{\partial \dot{\textbf{U}}^{n-1}}{\partial d_i} +d \frac{\partial \ddot{\textbf{U}}^{n-1}}{\partial d_{i}} \right)\\ &&-\frac{\partial {\textbf{R}}^{n}}{\partial \dot{\textbf{U}}} \left( \frac{\partial \dot{\textbf{U}}^{n-1}}{\partial d_i} +a \frac{\partial \ddot{\textbf{U}}^{n-1}}{\partial d_{i}}\right) - \frac{\partial \textbf{R}^{n}}{\partial d_{i}} . \end{array} $$
(89)

We solve (89) for \(\partial \ddot {\textbf {U}}^{n} /\partial d_{i}\) and update \(\partial \dot {\textbf {U}}^{n} /\partial d_{i}\) and \(\partial \textbf{U}^{n}/\partial d_{i}\) via (86) and (87), and \(\mathrm{D}F/\mathrm{D}d_{i}\) via (39). We continue marching in this manner for all \(t_{n}\). Insofar as our sensitivity analysis algorithm is concerned, we insert nodes A and B from Fig. 14 into the primal analysis flowchart of Fig. 13.
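To make the marching concrete, here is a minimal scalar sketch on a test problem of our own choosing rather than the paper's example: a single Duffing oscillator \(m\ddot u + k_c\dot u + u + k_d u^3 = 0\) with design variable \(k_d\) and response \(F = \int (u^2 + \dot u^2)\,\mathrm{d}t\), integrated with trapezoidal weights \(\mu_n\). We assume the initial conditions do not depend on \(k_d\), so (88) reduces to \(m\,\partial\ddot u^0/\partial k_d = -(u^0)^3\). The direct sensitivity is checked against central differences of the discrete \(F\).

```python
import numpy as np


def duffing_dd(kd, N, T=10.0, m=1.0, kc=0.1, u0=1.0, v0=0.0,
               gamma=0.5, beta=0.25, tol=1e-13):
    """(F, dF/dkd) for m u'' + kc u' + u + kd u^3 = 0 and F = int(u^2 + u'^2) dt,
    with direct differentiation marched alongside the Newmark primal analysis."""
    dt = T / N
    a, b, c, d, e = ((1 - gamma) * dt, gamma * dt, dt,
                     (0.5 - beta) * dt**2, beta * dt**2)
    mu = np.full(N + 1, dt)
    mu[0] = mu[-1] = 0.5 * dt                # trapezoidal weights mu_n

    # t = 0: R^0 is linear in the acceleration, so (83) is solved in one step.
    U, V = u0, v0
    A = -(kc * V + U + kd * U**3) / m
    dU, dV = 0.0, 0.0                        # initial conditions independent of kd
    dA = -U**3 / m                           # (88) with dR/dkd = U^3
    F = mu[0] * (U**2 + V**2)
    dF = mu[0] * (2 * U * dU + 2 * V * dV)

    for n in range(1, N + 1):
        Up, Vp, Ap = U, V, A
        for _ in range(50):                  # Newton-Raphson on A^n, cf. (85)
            V = Vp + a * Ap + b * A
            U = Up + c * Vp + d * Ap + e * A
            R = m * A + kc * V + U + kd * U**3
            if abs(R) < tol:
                break
            A -= R / (m + b * kc + e * (1 + 3 * kd * U**2))
        else:
            raise RuntimeError("Newton-Raphson did not converge")

        # Direct differentiation: pseudo-load and solve of (89), then (86)-(87).
        R_U, R_V, R_kd = 1 + 3 * kd * U**2, kc, U**3
        dA_new = (-R_U * (dU + c * dV + d * dA)
                  - R_V * (dV + a * dA) - R_kd) / (m + b * R_V + e * R_U)
        dU = dU + c * dV + d * dA + e * dA_new   # (87), uses the old dV and dA
        dV = dV + a * dA + b * dA_new            # (86)
        dA = dA_new

        F += mu[n] * (U**2 + V**2)
        dF += mu[n] * (2 * U * dU + 2 * V * dV)
    return F, dF


F, dF = duffing_dd(1.0, 200)
h = 1e-4
fd = (duffing_dd(1.0 + h, 200)[0] - duffing_dd(1.0 - h, 200)[0]) / (2 * h)
print(dF, fd)
```

The tangent used in (89) is the same one the primal Newton iterations use, so a production code would reuse its factorization.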

Fig. 14

Direct differentiation nodes for dynamic problem

For the semi-analytical method, we use the approximations

$$\begin{array}{@{}rcl@{}} &&\frac{\partial {\textbf{R}}^{0}}{\partial \textbf{U}}\frac{\partial \textbf{U}^{0}}{\partial d_i} +\frac{\partial {\textbf{R}}^{0}}{\partial \dot{\textbf{U}}}\frac{\partial \dot{\textbf{U}}^{0}}{\partial d_i} + \frac{\partial {\textbf{R}}^{0}}{\partial d_{i}} \approx \\ &&\frac{1}{\epsilon} \textbf{R}^{0} \left( \textbf{U}^{0} + \epsilon\frac{\partial \textbf{U}^{0}}{\partial d_i}, \dot{\textbf{U}}^{0} + \epsilon\frac{\partial \dot{\textbf{U}}^{0}}{\partial d_i}, \ddot{\textbf{U}}^{0}, \textbf{d} + \epsilon \textbf{e}_i \right) , \end{array} $$
(90)
$$\begin{array}{@{}rcl@{}} &&\frac{\partial \textbf{R}^{n}}{\partial \textbf{U}} \left( \frac{\partial \textbf{U}^{n-1}}{\partial d_i} +c \frac{\partial \dot{\textbf{U}}^{n-1}}{\partial d_i} +d \frac{\partial \ddot{\textbf{U}}^{n-1}}{\partial d_{i}} \right) \\&&+\frac{\partial {\textbf{R}}^{n}}{\partial \dot{\mathbf{U}}} \left( \frac{\partial \dot{\textbf{U}}^{n-1}}{\partial d_i} +a \frac{\partial \ddot{\textbf{U}}^{n-1}}{\partial d_{i}}\right) + \frac{\partial \textbf{R}^{n}}{\partial d_{i}} \\ &\approx&\frac{1}{\epsilon} \textbf{R}^{n} \left( \textbf{U}^{n}+\epsilon\left( \frac{\partial \textbf{U}^{n-1}}{\partial d_i} +c \frac{\partial \dot{\textbf{U}}^{n-1}}{\partial d_i} +d \frac{\partial \ddot{\textbf{U}}^{n-1}}{\partial d_{i}} \right), \right. \\ &&\left. \vphantom{\frac{1}{\epsilon} \textbf{R}^{n} \left( \textbf{U}^{n}+\epsilon\left( \frac{\partial \textbf{U}^{n-1}}{\partial d_i} +c \frac{\partial \dot{\textbf{U}}^{n-1}}{\partial d_i} +d \frac{\partial \ddot{\textbf{U}}^{n-1}}{\partial d_{i}} \right), \right)} \dot{\textbf{U}}^{n}+\epsilon\left( \frac{\partial \dot{\textbf{U}}^{n-1}}{\partial d_i} +a \frac{\partial \ddot{\textbf{U}}^{n-1}}{\partial d_{i}}\right), \ddot{\textbf{U}}^{n}, \textbf{d}+\epsilon \textbf{e}_i \right) ,\\ \end{array} $$
(91)

which we use in (88) and (89). Again, we assume the user can code \(\partial G^{n}/\partial \textbf{U}\), \(\partial G^{n}/ \partial \dot {\textbf {U}}\), and \(\partial G^{n}/\partial d_{i}\).
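As a scalar illustration of (91), the sketch below uses a hypothetical Duffing-type residual, not the paper's model. A converged state is constructed so that \(\textbf{R} = \textbf{0}\) holds exactly. The variables `pU` and `pV` are made-up stand-ins for the bracketed predictor terms in (91). A single residual evaluation at the perturbed state reproduces the analytical pseudo-load to \(O(\epsilon)\).

```python
def residual(U, V, A, kd, m=1.0, kc=0.1):
    """Hypothetical scalar residual R = m*A + kc*V + U + kd*U**3."""
    return m * A + kc * V + U + kd * U**3


kd, U, V = 1.0, 0.7, -0.3
A = -(0.1 * V + U + kd * U**3)       # choose A so that R(U, V, A, kd) = 0
pU, pV = 0.2, -0.5                   # stand-ins for the bracketed terms in (91)

# Analytical pseudo-load: dR/dU * pU + dR/dV * pV + dR/dkd.
exact = (1 + 3 * kd * U**2) * pU + 0.1 * pV + U**3

# Semi-analytical counterpart (91): one residual evaluation at a perturbed state.
eps = 1e-6
semi = residual(U + eps * pU, V + eps * pV, A, kd + eps) / eps
print(exact, semi)
```

The unperturbed residual is omitted from the difference quotient because it vanishes at a converged state. Any residual the Newton solve leaves behind therefore enters (91) divided by \(\epsilon\), which is one reason the primal tolerance must be tight.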

4.4 Adjoint method using differentiate-then-discretize

In the adjoint differentiate-then-discretize approach, we discretize the adjoint problem (79a), (79b), and (79c) and the sensitivity expression (80). The sensitivity (80) is evaluated as

$$\begin{array}{@{}rcl@{}} \frac{\mathrm{D} F}{\mathrm{D} d_{i}}&=& \sum\limits_{n = 0}^{N}\mu_{N-n} \left( \frac{\partial G^{N-n}}{\partial d_{i}} + {{\boldsymbol{\Lambda}^{n}}^{\top}} \frac{\partial \textbf{R}^{N-n}}{\partial d_{i}} \right) \\ &&- \frac{\partial \textbf{U}^{0}}{\partial d_i}^{\top} \left( \frac{\partial G^{0}}{\partial \dot{\textbf{U}}}^{\top} + \left( \frac{\partial {\textbf{R}}^{0}}{\partial \dot{\textbf{U}}}^{\top} -\frac{\text{d}}{\text{d}t}\left( \frac{\partial {\textbf{R}}^{0}}{\partial \ddot{\textbf{U}}}^{\top}\right)\right) {\boldsymbol{\Lambda}^{N}}\right. \\ &&+\left. \frac{\partial {\textbf{R}}^{0}}{\partial \ddot{\textbf{U}}}^{\top} {\dot{\boldsymbol{\Lambda}}^{N}}\right) - \frac{\partial \dot{\textbf{U}}^{0}}{\partial d_i}^{\top} \left( \frac{\partial {\textbf{R}}^{0}}{\partial \ddot{\textbf{U}}}^{\top} {\boldsymbol{\Lambda}^{N}}\right) . \end{array} $$
(92)

To obtain \(\boldsymbol{\Lambda}^{n}\), we solve (79a), (79b), and (79c) as we did for \(\textbf{U}\), i.e., we introduce the Newmark time-stepping scheme

$$\begin{array}{@{}rcl@{}} \dot{\boldsymbol{\Lambda}}^{n} &=&\dot{\boldsymbol{\Lambda}}^{n-1}+a \ddot{\boldsymbol{\Lambda}}^{n-1}+b \ddot{\boldsymbol{\Lambda}}^{n} , \end{array} $$
(93)
$$\begin{array}{@{}rcl@{}} \boldsymbol{\Lambda}^{n} &=& \boldsymbol{\Lambda}^{n-1}+c \dot{\boldsymbol{\Lambda}}^{n-1}+d \ddot{\boldsymbol{\Lambda}}^{n-1}+e \ddot{\boldsymbol{\Lambda}}^{n} . \end{array} $$
(94)

To reuse \(\textbf{K}^{n}\) as in the direct method, we restrict \(\textbf{R}\) such that

$$\begin{array}{@{}rcl@{}} \frac{\text{d}}{\text{d}t}\left( \frac{\partial \textbf{R}}{\partial\dot{\textbf{U}}}\right) &=\textbf{0} , \end{array} $$
(95)
$$\begin{array}{@{}rcl@{}} \frac{\text{d}}{\text{d}t}\left( \frac{\partial \textbf{R}}{\partial \ddot{\textbf{U}}}\right) &=\textbf{0} . \end{array} $$
(96)

That is, \({\partial \textbf {R}}/{\partial \dot {\textbf {U}}}\) and \({\partial \textbf {R}}/{\partial \ddot {\textbf {U}}}\), which are typically interpreted as the damping and mass matrices, respectively, are constant.

Noting that \(\boldsymbol{\Lambda}^{0} = \textbf{0}\) from (79c), we start the algorithm by solving (79b), i.e.,

$$ \frac{\partial \textbf{R}^{N}}{\partial\ddot{\textbf{U}}}^{\top} \dot{\boldsymbol{\Lambda}}^{0}=-\frac{\partial G^{N}}{\partial \dot{\textbf{U}}}^{\top} , $$
(97)

for \(\dot {\boldsymbol {\Lambda }}^{0}\). Next, we obtain \(\ddot {\boldsymbol {\Lambda }}^{0}\) from (79a), i.e.,

$$\begin{array}{@{}rcl@{}} \frac{\partial \textbf{R}^{N}}{\partial\ddot{\textbf{U}}}^{\top} \ddot{\boldsymbol{\Lambda}}^{0} &=& -\frac{\partial \textbf{R}^{N}}{\partial \dot{\textbf{U}}}^{\top} \dot{\boldsymbol{\Lambda}}^{0} - \frac{\partial G^{N}}{\partial \textbf{U}}^{\top} \\&&+\left( \frac{\partial^{2} G^{N}}{\partial \dot{\textbf{U}}\partial \textbf{U}} \dot{\textbf{U}}^{N}\right)^{\top} + \left( \frac{\partial^{2} G^{N}}{\partial \dot{\textbf{U}}^{2}} \ddot{\textbf{U}}^{N}\right)^{\top} . \end{array} $$
(98)

Notice that (97) and (98) do not use the tangent stiffness matrix of the primal analysis. Next, we initialize \(\mathrm{D}F/\mathrm{D}d_{i}\) from (48).

The time marching now commences for the remaining time steps, i.e., for \(n = 1, 2, \ldots, N-1\) we solve

$$\begin{array}{@{}rcl@{}} {\textbf{K}^{N-n}}^{\top} \ddot{\boldsymbol{\Lambda}}^{n} &=&- \frac{\partial G^{N-n}}{\partial \textbf{U}}^{\top} +\left( \frac{\partial^{2} G^{N-n}}{\partial \dot{\textbf{U}}\partial \textbf{U}} \dot{\textbf{U}}^{N-n}\right)^{\top} \\&&+ \left( \frac{\partial^{2} G^{N-n}}{\partial \dot{\textbf{U}}^{2}} \ddot{\textbf{U}}^{N-n} \right)^{\top} \\&&- \frac{\partial \textbf{R}^{N-n}}{\partial \textbf{U}}^{\top} \left( \boldsymbol{\Lambda}^{n-1} +c \dot{\boldsymbol{\Lambda}}^{n-1} +d \ddot{\boldsymbol{\Lambda}}^{n-1} \right) \\&&-\frac{\partial \textbf{R}^{N-n}}{\partial \dot{\textbf{U}}}^{\top} \left( \dot{\boldsymbol{\Lambda}}^{n-1} +a \ddot{\boldsymbol{\Lambda}}^{n-1}\right) , \\ \end{array} $$
(99)

for \(\ddot {\boldsymbol {\Lambda }}^{n}\). Then, we update \(\boldsymbol{\Lambda}^{n}\) and \(\dot {\boldsymbol {\Lambda }}^{n}\) with (93) and (94), and \(\mathrm{D}F/\mathrm{D}d_{i}\) with (50).

Finally, we solve

$$\begin{array}{@{}rcl@{}} &&\left( \frac{\partial {\textbf{R}}^{0}}{\partial {\ddot{\textbf{U}}}}+b \frac{\partial {\textbf{R}}^{0}}{\partial {\dot{\textbf{U}}}} +e \frac{\partial {\textbf{R}}^{0}}{\partial \textbf{U}}\right)^{\top} \ddot{\boldsymbol{\Lambda}}^{N} =- \frac{\partial G^{0}}{\partial \textbf{U}}^{\top} \\ &&+\left( \frac{\partial^{2} G^{0}}{\partial \dot{\textbf{U}}\partial \textbf{U}} \dot{\textbf{U}}^{0}\right)^{\top} + \left( \frac{\partial^{2} G^{0}}{\partial \dot{\textbf{U}}^{2}} \ddot{\textbf{U}}^{0}\right)^{\top} \\&&- \frac{\partial {\textbf{R}}^{0}}{\partial \textbf{U}}^{\top} \left( \boldsymbol{\Lambda}^{N-1} +c \dot{\boldsymbol{\Lambda}}^{N-1} +d \ddot{\boldsymbol{\Lambda}}^{N-1} \right) \\&&-\frac{\partial {\textbf{R}}^{0}}{\partial \dot{\textbf{U}}}^{\top} \left( \dot{\boldsymbol{\Lambda}}^{N-1} +a \ddot{\boldsymbol{\Lambda}}^{N-1}\right) , \\ \end{array} $$
(100)

for \(\ddot {\boldsymbol {\Lambda }}^{N}\), then obtain \(\dot {\boldsymbol {\Lambda }}^{N}\) and \(\boldsymbol{\Lambda}^{N}\) from (93) and (94), and update

$$\begin{array}{@{}rcl@{}} \frac{\mathrm{D} F}{\mathrm{D} d_{i}} &\leftarrow& \frac{\mathrm{D} F}{\mathrm{D} d_{i}} +\mu_{0} \frac{\partial G^{0}}{\partial d_{i}} + \mu_{0} {\boldsymbol{\Lambda}^{N}}^{\top} \frac{\partial {\textbf{R}}^{0}}{\partial d_{i}} -\frac{\partial G^{0}}{\partial \dot{\textbf{U}}}\frac{\partial \textbf{U}^{0}}{\partial d_i} \\&&- {\boldsymbol{\Lambda}^{N}}^{\top} \left( \frac{\partial {\textbf{R}}^{0}}{\partial \dot{\textbf{U}}}\frac{\partial \textbf{U}^{0}}{\partial d_i} + \frac{\partial {\textbf{R}}^{0}}{\partial \ddot{\textbf{U}}} \frac{\partial \dot{\textbf{U}}^{0}}{\partial d_i}\right) - \dot{\boldsymbol{\Lambda}}^{N\top} \frac{\partial {\textbf{R}}^{0}}{\partial \ddot{\textbf{U}}}\frac{\partial \textbf{U}^{0}}{\partial d_i} .\\ \end{array} $$
(101)

Again, we note that (100) does not use the tangent stiffness matrix from the primal problem. This algorithm is described by inserting node C from Fig. 15 into the flowchart of Fig. 13.
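The consistency error of this approach can be observed on a small test problem of our own choosing: the same hypothetical scalar Duffing oscillator \(m\ddot u + k_c\dot u + u + k_d u^3 = 0\), with \(F = \int (u^2 + \dot u^2)\,\mathrm{d}t\) and design-independent initial conditions. The sketch marches (97) to (100) in reversed time using the primal history. Here \(G_u = 2u\), \(\partial G/\partial\dot u = 2\dot u\), and its time derivative is \(2\ddot u\). It then compares the result with central differences of the discrete \(F\), which represent the consistent discrete sensitivity. The relative gap should shrink as \(N\) grows.

```python
import numpy as np


def duffing_primal(kd, N, T=10.0, m=1.0, kc=0.1, u0=1.0, v0=0.0,
                   gamma=0.5, beta=0.25, tol=1e-13):
    """Newmark/Newton history of m u'' + kc u' + u + kd u^3 = 0."""
    dt = T / N
    a, b, c, d, e = ((1 - gamma) * dt, gamma * dt, dt,
                     (0.5 - beta) * dt**2, beta * dt**2)
    U, V, A = np.zeros(N + 1), np.zeros(N + 1), np.zeros(N + 1)
    U[0], V[0] = u0, v0
    A[0] = -(kc * v0 + u0 + kd * u0**3) / m
    for n in range(1, N + 1):
        A[n] = A[n - 1]
        for _ in range(50):
            V[n] = V[n - 1] + a * A[n - 1] + b * A[n]
            U[n] = U[n - 1] + c * V[n - 1] + d * A[n - 1] + e * A[n]
            R = m * A[n] + kc * V[n] + U[n] + kd * U[n]**3
            if abs(R) < tol:
                break
            A[n] -= R / (m + b * kc + e * (1 + 3 * kd * U[n]**2))
        else:
            raise RuntimeError("Newton-Raphson did not converge")
    return U, V, A, dt, (a, b, c, d, e)


def trapezoid_weights(N, dt):
    mu = np.full(N + 1, dt)
    mu[0] = mu[-1] = 0.5 * dt
    return mu


def response(kd, N):
    """F = int_0^T (u^2 + u'^2) dt by the trapezoidal rule."""
    U, V, _, dt, _ = duffing_primal(kd, N)
    return trapezoid_weights(N, dt) @ (U**2 + V**2)


def adjoint_diff_then_disc(kd, N, m=1.0, kc=0.1):
    """Newmark applied to the continuous adjoint, marched in reversed time."""
    U, V, A, dt, (a, b, c, d, e) = duffing_primal(kd, N, m=m, kc=kc)
    L, Ld, Ldd = np.zeros(N + 1), np.zeros(N + 1), np.zeros(N + 1)
    Ld[0] = -2 * V[N] / m                              # (97), with L^0 = 0 (79c)
    Ldd[0] = (-kc * Ld[0] - 2 * U[N] + 2 * A[N]) / m   # (98)
    for n in range(1, N + 1):
        j = N - n                                      # primal step paired with n
        RU = 1 + 3 * kd * U[j]**2
        rhs = (-2 * U[j] + 2 * A[j]
               - RU * (L[n - 1] + c * Ld[n - 1] + d * Ldd[n - 1])
               - kc * (Ld[n - 1] + a * Ldd[n - 1]))
        Ldd[n] = rhs / (m + b * kc + e * RU)           # (99), and (100) for n = N
        Ld[n] = Ld[n - 1] + a * Ldd[n - 1] + b * Ldd[n]                 # (93)
        L[n] = L[n - 1] + c * Ld[n - 1] + d * Ldd[n - 1] + e * Ldd[n]  # (94)
    mu = trapezoid_weights(N, dt)
    # (92) with dR/dkd = u^3, dG/dkd = 0 and design-independent initial conditions.
    return np.sum(mu[::-1] * L * U[::-1]**3)


def consistency_error(N, kd=1.0, h=1e-4):
    """Relative gap to central differences of the discrete F."""
    fd = (response(kd + h, N) - response(kd - h, N)) / (2 * h)
    return abs(adjoint_diff_then_disc(kd, N) - fd) / abs(fd)


for N in (100, 1000):
    print(N, consistency_error(N))
```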

Fig. 15

Adjoint differentiate-then-discretize node for dynamic problem

For the semi-analytical method, we impose the further restriction that \(\partial \textbf{R}/\partial \textbf{U}\) and \(\partial \textbf {R} / \partial \dot {\textbf {U}}\) are symmetric. In this way, the term in the adjoint load of (98) can be approximated as

$$ \frac{\partial \textbf{R}^{N}}{\partial \dot{\textbf{U}}}^{\top} \dot{\boldsymbol{\Lambda}}^{0} \approx \frac{1}{\epsilon} \textbf{R}\left( \textbf{U}^{N}, \dot{\textbf{U}}^{N} +\epsilon \dot{\boldsymbol{\Lambda}}^{0}, \ddot{\textbf{U}}^{N}, \textbf{d} \right) , $$
(102)

and the terms in the adjoint load of (99) and (100) can be approximated as

$$\begin{array}{@{}rcl@{}} &&\frac{\partial \textbf{R}^{N-n}}{\partial \textbf{U}}^{\top} \left( \boldsymbol{\Lambda}^{n-1} +c \dot{\boldsymbol{\Lambda}}^{n-1} +d \ddot{\boldsymbol{\Lambda}}^{n-1} \right) \\&&+\frac{\partial \textbf{R}^{N-n}}{\partial \dot{\textbf{U}}}^{\top} \left( \dot{\boldsymbol{\Lambda}}^{n-1} +a \ddot{\boldsymbol{\Lambda}}^{n-1}\right) \\ &\approx&\frac{1}{\epsilon} \textbf{R}\left( \textbf{U}^{N-n}+\epsilon\left( \boldsymbol{\Lambda}^{n-1} +c \dot{\boldsymbol{\Lambda}}^{n-1} +d \ddot{\boldsymbol{\Lambda}}^{n-1} \right), \right. \\ &&\left.\vphantom{\frac{1}{\epsilon} \textbf{R}\left( \textbf{U}^{N-n}+\epsilon\left( \boldsymbol{\Lambda}^{n-1} +c \dot{\boldsymbol{\Lambda}}^{n-1} +d \ddot{\boldsymbol{\Lambda}}^{n-1} \right), \right)} \dot{\textbf{U}}^{N-n} +\epsilon \left( \dot{\boldsymbol{\Lambda}}^{n-1} +a \ddot{\boldsymbol{\Lambda}}^{n-1}\right), \ddot{\textbf{U}}^{N-n}, \textbf{d} \right) . \end{array} $$
(103)
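The role of the symmetry restriction can be seen numerically. A residual perturbation \(\textbf{R}(\textbf{x} + \epsilon\textbf{v})/\epsilon\) approximates the Jacobian times \(\textbf{v}\), because \(\textbf{R}(\textbf{x}) \approx \textbf{0}\) at a converged state. It equals the transposed product in (103) only because \(\partial\textbf{R}/\partial\textbf{U}\) and \(\partial\textbf{R}/\partial\dot{\textbf{U}}\) are symmetric. The sketch below assumes a hypothetical two-degree-of-freedom chain (wall, \(m_1\), \(m_2\), unit masses) with cubic springs and linear dampers. The vectors `w` and `z` are made-up stand-ins for the bracketed adjoint combinations in (103).

```python
import numpy as np

KC, KD = 0.1, 1.0    # hypothetical damper and cubic-spring coefficients


def residual(U, V, A):
    """R = A + f(U, V) for unit masses in an assumed wall-m1-m2 chain."""
    x = np.array([U[0], U[1] - U[0]])        # spring elongations
    xd = np.array([V[0], V[1] - V[0]])       # damper elongation rates
    s = x + KD * x**3 + KC * xd              # element forces
    return A + np.array([s[0] - s[1], s[1]])


def jacobians(U):
    """Analytical dR/dU and dR/dUdot; both are symmetric for this chain."""
    x = np.array([U[0], U[1] - U[0]])
    k = 1 + 3 * KD * x**2
    RU = np.array([[k[0] + k[1], -k[1]], [-k[1], k[1]]])
    RV = KC * np.array([[2.0, -1.0], [-1.0, 1.0]])
    return RU, RV


U, V = np.array([0.3, 0.8]), np.array([0.1, -0.2])
A = -residual(U, V, np.zeros(2))             # converged state: R(U, V, A) = 0
w = np.array([0.5, -0.4])    # stands for Lambda + c Lambda' + d Lambda'' in (103)
z = np.array([0.2, 0.3])     # stands for Lambda' + a Lambda'' in (103)

RU, RV = jacobians(U)
exact = RU.T @ w + RV.T @ z                  # adjoint-load terms of (103)
eps = 1e-6
semi = residual(U + eps * w, V + eps * z, A) / eps   # right-hand side of (103)
print(exact, semi)
```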

Regarding \(\mathrm{D}F/\mathrm{D}d_{i}\) of (48), (50), and (101), we can use the approximations

$$\begin{array}{@{}rcl@{}} &&\frac{\partial {\textbf{R}}^{0}}{\partial \dot{\textbf{U}}} \frac{\partial \textbf{U}^{0}}{\partial d_i} + \frac{\partial {\textbf{R}}^{0}}{\partial \ddot{\textbf{U}}} \frac{\partial \dot{\textbf{U}}^{0}}{\partial d_i} \approx \\ &&\frac{1}{\epsilon} \textbf{R}\left( \textbf{U}^{0}, \dot{\textbf{U}}^{0}+\epsilon\frac{\partial \textbf{U}^{0}}{\partial d_i}, \ddot{\textbf{U}}^{0}+\epsilon\frac{\partial \dot{\textbf{U}}^{0}}{\partial d_i}, \textbf{d} \right) , \end{array} $$
(104)
$$\begin{array}{@{}rcl@{}} \frac{\partial {\textbf{R}}^{0}}{\partial {\ddot{\textbf{U}}}}\frac{\partial \textbf{U}^{0}}{\partial d_i} &\approx \frac{1}{\epsilon} \textbf{R}\left( \textbf{U}^{0}, \dot{\textbf{U}}^{0}, \ddot{\textbf{U}}^{0}+\epsilon\frac{\partial \textbf{U}^{0}}{\partial d_i}, \textbf{d}\right), \end{array} $$
(105)
$$\begin{array}{@{}rcl@{}} \frac{\partial \textbf{R}^{n}}{{\partial d_{i}}} &\approx \frac{1}{\epsilon} \textbf{R} \left( \textbf{U}^{n}, \dot{\textbf{U}}{}^{n}, \ddot{\textbf{U}}^{n} , \textbf{d}+\epsilon \textbf{e}_i\right) . \end{array} $$
(106)

4.5 Adjoint method using discretize-then-differentiate

In this adjoint discretize-then-differentiate method, we first discretize the primal analysis and response function in time and then we differentiate for the sensitivity analysis. Thus, we incorporate (75a), (86), and (87) into (32) to obtain the equivalent sensitivity

$$\begin{array}{@{}rcl@{}} \delta F&=&\sum\limits_{n = 0}^{N} \mu_{n} \left( \frac{\partial G^{n}}{\partial \textbf{U}} \frac{\partial \textbf{U}^{n}}{\partial d_i} + \frac{\partial G^{n}}{\partial \dot{\textbf{U}}}\frac{\partial \dot{\textbf{U}}^{n}}{\partial d_{i}} + \frac{\partial G^{n}}{\partial d_{i}}\right) \\ && +\sum\limits_{n = 0}^{N}\boldsymbol{\Lambda}^{n\top} \left( \frac{\partial \textbf{R}^{N-n}}{\partial \ddot{\textbf{U}}}\frac{\partial \ddot{\textbf{U}}^{N-n}}{\partial d_{i}}+\frac{\partial \textbf{R}^{N-n}}{\partial \dot{\textbf{U}}} \frac{\partial \dot{\textbf{U}}^{N-n}}{\partial d_i} \right. \\ && +\left. \frac{\partial \textbf{R}^{N-n}}{\partial \textbf{U}} \frac{\partial \textbf{U}^{N-n}}{\partial d_i} + \frac{\partial \textbf{R}^{N-n}}{\partial d_{i}}\right) \\ && +\sum\limits_{n = 0}^{N-1}{\boldsymbol{\Phi}^{n}}^{\top} \left( \frac{\partial \dot{\textbf{U}}^{N-n}}{\partial d_i} -\frac{\partial \dot{\textbf{U}}^{N-n-1}}{\partial d_i} \right. \\ && -\left.a \frac{\partial \ddot{\textbf{U}}^{N-n-1}}{\partial d_i} -b \frac{\partial \ddot{\textbf{U}}^{N-n}}{\partial d_i} \right) \\ && +\sum\limits_{n = 0}^{N-1}{\boldsymbol{\Psi}^{n}}^{\top}\left( \frac{\partial \textbf{U}^{N-n}}{\partial d_i} - \frac{\partial \textbf{U}^{N-n-1}}{\partial d_i}-c \frac{\partial \dot{\textbf{U}}^{N-n-1}}{\partial d_i} \right. \\ &&-\left.d \frac{\partial \ddot{\textbf{U}}^{N-n-1}}{\partial d_i} -e \frac{\partial \ddot{\textbf{U}}^{N-n}}{\partial d_i}\right) , \end{array} $$
(107)

where \(\boldsymbol{\Lambda}^{n}\), \(\boldsymbol{\Phi}^{n}\), and \(\boldsymbol{\Psi}^{n}\) are arbitrary adjoint vectors. Rearranging the above yields

$$\begin{array}{@{}rcl@{}} \frac{\mathrm{D} F}{\mathrm{D} d_{i}}&=& {\sum}_{n = 0}^{N} \left( \mu_{N-n} \frac{\partial G^{N-n}}{\partial d_{i}} + \boldsymbol{\Lambda}^{n\top} \frac{\partial \textbf{R}^{N-n}}{\partial d_{i}} \right) \\ &&+\left( \mu_{0} \frac{\partial G^{0}}{\partial \textbf{U}} +{\boldsymbol{\Lambda}^{N}}^{\top}\frac{\partial {\textbf{R}}^{0}}{\partial \textbf{U}} -{\boldsymbol{\Psi}^{N-1}}^{\top} \right) \frac{\partial \textbf{U}^{0}}{\partial d_i} \\ &&+ \left( \mu_{0} \frac{\partial G^{0}}{\partial \dot{\textbf{U}}} +{\boldsymbol{\Lambda}^{N}}^{\top} \frac{\partial {\textbf{R}}^{0}}{\partial \dot{\textbf{U}}} -{\boldsymbol{\Phi}^{N-1}}^{\top} -c {\boldsymbol{\Psi}^{N-1}}^{\top} \right) \frac{\partial \dot{\textbf{U}}^{0}}{\partial d_i} \\ &&+\left( {\boldsymbol{\Lambda}^{N}}^{\top} \frac{\partial {\textbf{R}}^{0}}{\partial \ddot{\textbf{U}}} -a {\boldsymbol{\Phi}^{N-1}}^{\top} -d {\boldsymbol{\Psi}^{N-1}}^{\top}\right) \frac{\partial \ddot{\textbf{U}}^{0}}{\partial d_{i}} \\ &&+\sum\limits_{n = 1}^{N-1} \left( \mu_{N-n} \frac{\partial G^{N-n}}{\partial \textbf{U}} +{{\boldsymbol{\Lambda}}^{n}}^{\top}\frac{\partial \textbf{R}^{N-n}}{\partial \textbf{U}} \right. \\ &&+\left.{\boldsymbol{\Psi}^{n}}^{\top} -{\boldsymbol{\Psi}^{n-1}}^{\top} \right) \frac{\partial \textbf{U}^{N-n}}{\partial d_i}\\ &&+ \sum\limits_{n = 1}^{N-1}\left( \mu_{N-n} \frac{\partial G^{N-n}}{\partial \dot{\textbf{U}}} +{\boldsymbol{\Lambda}^{n}}^{\top} \frac{\partial \textbf{R}^{N-n}}{\partial \dot{\textbf{U}}} \right. 
\\ &&+\left.{\boldsymbol{\Phi}^{n}}^{\top} -{\boldsymbol{\Phi}^{n-1}}^{\top} -c {\boldsymbol{\Psi}^{n-1}}^{\top}\right) \frac{\partial \dot{\textbf{U}}^{N-n}}{\partial d_i}\\ &&+ \sum\limits_{n = 1}^{N-1}\left( {\boldsymbol{\Lambda}^{n}}^{\top} \frac{\partial \textbf{R}^{N-n}}{\partial \ddot{\textbf{U}}} -b {\boldsymbol{\Phi}^{n}}^{\top} -a {\boldsymbol{\Phi}^{n-1}}^{\top} \right.\\ &&-\left.e {\boldsymbol{\Psi}^{n}}^{\top} -d {\boldsymbol{\Psi}^{n-1}}^{\top} \right) \frac{\partial\ddot{\textbf{U}}^{N-n}}{\partial d_{i}}\\ &&+\left( \mu_{N} \frac{\partial G^{N}}{\partial \textbf{U}} +{\boldsymbol{\Lambda}^{0}}^{\top}\frac{\partial \textbf{R}^{N}}{\partial\textbf{U}} +{\boldsymbol{\Psi}^{0}}^{\top} \right) \frac{\partial \textbf{U}^{N}}{\partial d_i}\\ &&+\left( \mu_{N} \frac{\partial G^{N}}{\partial \dot{\textbf{U}}} +{\boldsymbol{\Lambda}^{0}}^{\top} \frac{\partial \textbf{R}^{N}}{\partial \dot{\textbf{U}}} + {\boldsymbol{\Phi}^{0}}^{\top} \right) \frac{\partial \dot{\textbf{U}}^{N}}{\partial d_i} \\ &&+\left( {\boldsymbol{\Lambda}^{0}}^{\top} \frac{\partial \textbf{R}^{N}}{\partial\ddot{\textbf{U}}} - b {\boldsymbol{\Phi}^{0}}^{\top} - e {\boldsymbol{\Psi}^{0}}^{\top}\right) \frac{\partial\ddot{\textbf{U}}^{N}}{\partial d_{i}} .\\ \end{array} $$
(108)

To annihilate \(\partial \ddot {\textbf {U}}^{N} / \partial d_{i}\), \(\partial \dot {\textbf {U}}^{N} / \partial d_{i}\), and \(\partial \textbf{U}^{N}/\partial d_{i}\), we first solve the adjoint problem

$$ {\textbf{K}^{N}}^{\top} \boldsymbol{\Lambda}^{0}=-b \mu_{N} \frac{\partial G^{N}}{\partial \dot{\textbf{U}}}^{\top} -e \mu_{N} \frac{\partial G^{N}}{\partial \textbf{U}}^{\top} , $$
(109)

for \(\boldsymbol{\Lambda}^{0}\), then we evaluate \(\boldsymbol{\Phi}^{0}\) from

$$ {\boldsymbol{\Phi}^{0}} = -\mu_{N} \frac{\partial G^{N}}{\partial \dot{\textbf{U}}}^{\top} -\frac{\partial \textbf{R}^{N}}{\partial \dot{\textbf{U}}}^{\top} {\boldsymbol{\Lambda}^{0}} , $$
(110)

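and \(\boldsymbol{\Psi}^{0}\) from either of the following options, which coincide in exact arithmetic when \(\boldsymbol{\Lambda}^{0}\) solves (109):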

$$\begin{array}{@{}rcl@{}} {\boldsymbol{\Psi}^{0}} &=&-\mu_{N} \frac{\partial G^{N}}{\partial \textbf{U}}^{\top} -\frac{\partial \textbf{R}^{N}}{\partial \textbf{U}}^{\top} \boldsymbol{\Lambda}^{0} , \end{array} $$
(111)
$$\begin{array}{@{}rcl@{}} &=& \frac{1}{e } \left( \frac{\partial \textbf{R}^{N}}{\partial\ddot{\textbf{U}}}^{\top} {\boldsymbol{\Lambda}^{0}} -b {\boldsymbol{\Phi}^{0}}\right) , \end{array} $$
(112)

where (112) requires \(\beta \neq 0\). We next initialize the sensitivity from (62).

To annihilate \(\partial \ddot {\textbf {U}}^{N-n} / \partial d_{i}\), \(\partial \dot {\textbf {U}}^{N-n} / \partial d_{i}\), and \(\partial \textbf{U}^{N-n}/\partial d_{i}\), we march in time for \(n = 1, 2, \ldots, N-1\) by solving

$$\begin{array}{@{}rcl@{}} {\textbf{K}^{N-n}}^{\top} \boldsymbol{\Lambda}^{n}&=&-b \mu_{N-n} \frac{\partial G^{N-n}}{\partial \dot{\textbf{U}}}^{\top} -e \mu_{N-n} \frac{\partial G^{N-n}}{\partial \textbf{U}}^{\top} \\&&+ {\Delta} t {\boldsymbol{\Phi}^{n-1}} +\left( \gamma+\frac{1}{2}\right){\Delta} t^{2} {\boldsymbol{\Psi}^{n-1}} , \end{array} $$
(113)

for \(\boldsymbol{\Lambda}^{n}\), updating \(\boldsymbol{\Phi}^{n}\) from

$$ {\boldsymbol{\Phi}^{n}} = {\boldsymbol{\Phi}^{n-1}} +c {\boldsymbol{\Psi}^{n-1}} -\mu_{N-n} \frac{\partial G^{N-n}}{\partial \dot{\textbf{U}}}^{\top} -\frac{\partial \textbf{R}^{N-n}}{\partial \dot{\textbf{U}}}^{\top} {\boldsymbol{\Lambda}^{n}} , $$
(114)

computing \(\boldsymbol{\Psi}^{n}\) by either option

$$\begin{array}{@{}rcl@{}} {\boldsymbol{\Psi}^{n}} &=& {\boldsymbol{\Psi}^{n-1}} -\mu_{N-n} \frac{\partial G^{N-n}}{\partial\textbf{U}}^{\top} -\frac{\partial \textbf{R}^{N-n}}{\partial \textbf{U}}^{\top}{{\boldsymbol{\Lambda}}^{n}} , \end{array} $$
(115)
$$\begin{array}{@{}rcl@{}} &=& \frac{1}{e } \left( -d {\boldsymbol{\Psi}^{n-1}} +\frac{\partial \textbf{R}^{N-n}}{\partial \ddot{\textbf{U}}}^{\top} \boldsymbol{\Lambda}^{n} -b {\boldsymbol{\Phi}^{n}} -a {\boldsymbol{\Phi}^{n-1}} \right) \\ \end{array} $$
(116)

and updating \(\mathrm{D}F/\mathrm{D}d_{i}\) from (66).

Finally, to annihilate \(\partial \ddot {\textbf {U}}^{0} / \partial d_{i}\), we solve

$$ {\textbf{K}^{0}}^{\top} {\boldsymbol{\Lambda}^{N}} =a {\boldsymbol{\Phi}^{N-1}} +d {\boldsymbol{\Psi}^{N-1}} , $$
(117)

for \(\boldsymbol{\Lambda}^{N}\), and we update

$$\begin{array}{@{}rcl@{}} \frac{\mathrm{D} F}{\mathrm{D} d_{i}} &\leftarrow& \frac{\mathrm{D} F}{\mathrm{D} d_{i}} + \mu_{0} \frac{\partial G^{0}}{\partial d_{i}} + {{\boldsymbol{\Lambda}}^{N}}^{\top} \frac{\partial {\textbf{R}}^{0}}{\partial d_{i}} \\ &&+\left( \mu_{0} \frac{\partial G^{0}}{\partial \textbf{U}} -{\boldsymbol{\Psi}^{N-1}}^{\top} \right) \frac{\partial \textbf{U}^{0}}{\partial d_i} \\ &&+ \left( \mu_{0} \frac{\partial G^{0}}{\partial \dot{\textbf{U}}} -{\boldsymbol{\Phi}^{N-1}}^{\top} -c {\boldsymbol{\Psi}^{N-1}}^{\top} \right) \frac{\partial \dot{\textbf{U}}^{0}}{\partial d_i} \\ &&+{\boldsymbol{\Lambda}^{N}}^{\top}\left( \frac{\partial {\textbf{R}}^{0}}{\partial \textbf{U}}\frac{\partial \textbf{U}^{0}}{\partial d_i} + \frac{\partial {\textbf{R}}^{0}}{\partial \dot{\textbf{U}}}\frac{\partial \dot{\textbf{U}}^{0}}{\partial d_i} \right) . \end{array} $$
(118)

This algorithm is obtained by inserting node C from Fig. 16 into the primal analysis flowchart of Fig. 13.

Fig. 16

Adjoint discretize-then-differentiate node for dynamic problem
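Because the discretize-then-differentiate adjoint is the exact derivative of the discrete response, it should agree with central finite differences up to round-off. The sketch below checks this on the same hypothetical scalar Duffing oscillator used above, which is our own test problem, with design-independent initial conditions. It marches (109) to (117) with the primal index paired as \(N - n\). On the right-hand side of (113), the weights and partials are those of primal step \(N-n\), as the stationarity of (108) requires. The sketch also records the largest gap between the two \(\boldsymbol{\Psi}^{n}\) options (115) and (116), which should agree to round-off.

```python
import numpy as np


def duffing_primal(kd, N, T=10.0, m=1.0, kc=0.1, u0=1.0, v0=0.0,
                   gamma=0.5, beta=0.25, tol=1e-13):
    """Newmark/Newton history of m u'' + kc u' + u + kd u^3 = 0."""
    dt = T / N
    a, b, c, d, e = ((1 - gamma) * dt, gamma * dt, dt,
                     (0.5 - beta) * dt**2, beta * dt**2)
    U, V, A = np.zeros(N + 1), np.zeros(N + 1), np.zeros(N + 1)
    U[0], V[0] = u0, v0
    A[0] = -(kc * v0 + u0 + kd * u0**3) / m
    for n in range(1, N + 1):
        A[n] = A[n - 1]
        for _ in range(50):
            V[n] = V[n - 1] + a * A[n - 1] + b * A[n]
            U[n] = U[n - 1] + c * V[n - 1] + d * A[n - 1] + e * A[n]
            R = m * A[n] + kc * V[n] + U[n] + kd * U[n]**3
            if abs(R) < tol:
                break
            A[n] -= R / (m + b * kc + e * (1 + 3 * kd * U[n]**2))
        else:
            raise RuntimeError("Newton-Raphson did not converge")
    return U, V, A, dt, (a, b, c, d, e)


def response(kd, N):
    U, V, _, dt, _ = duffing_primal(kd, N)
    g = U**2 + V**2
    return dt * (g.sum() - 0.5 * (g[0] + g[-1]))      # trapezoidal rule


def adjoint_disc_then_diff(kd, N, m=1.0, kc=0.1):
    """dF/dkd from (109)-(118), plus the largest gap between (115) and (116)."""
    U, V, A, dt, (a, b, c, d, e) = duffing_primal(kd, N, m=m, kc=kc)
    mu = np.full(N + 1, dt)
    mu[0] = mu[-1] = 0.5 * dt
    Lam, Phi, Psi = np.zeros(N + 1), np.zeros(N), np.zeros(N)
    gap = 0.0
    for n in range(N):                       # n = 0 pairs with primal step N
        j = N - n
        RU, GU, GV = 1 + 3 * kd * U[j]**2, 2 * U[j], 2 * V[j]
        Phi_p, Psi_p = (Phi[n - 1], Psi[n - 1]) if n > 0 else (0.0, 0.0)
        # (109)/(113); a + b = dt and b c + d + e = (gamma + 1/2) dt^2.
        rhs = (-b * mu[j] * GV - e * mu[j] * GU
               + (a + b) * Phi_p + (b * c + d + e) * Psi_p)
        Lam[n] = rhs / (m + b * kc + e * RU)                   # K^T Lambda = rhs
        Phi[n] = Phi_p + c * Psi_p - mu[j] * GV - kc * Lam[n]  # (110)/(114)
        Psi[n] = Psi_p - mu[j] * GU - RU * Lam[n]              # (111)/(115)
        Psi_alt = (-d * Psi_p + m * Lam[n] - b * Phi[n] - a * Phi_p) / e  # (112)/(116)
        gap = max(gap, abs(Psi_alt - Psi[n]))
    Lam[N] = (a * Phi[N - 1] + d * Psi[N - 1]) / m            # (117): K^0 = m
    # (108)/(118): dG/dkd = 0, dR/dkd = u^3, initial conditions independent of kd.
    return np.sum(Lam * U[::-1]**3), gap


dF, gap = adjoint_disc_then_diff(1.0, 200)
h = 1e-4
fd = (response(1.0 + h, 200) - response(1.0 - h, 200)) / (2 * h)
print(dF, fd, gap)
```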

For the semi-analytical implementation, we require \(\partial \textbf {R}^{n} / \partial \dot {\textbf {U}}\) to be symmetric. The adjoint load terms of (110) and (114) are thus approximated as

$$ \frac{\partial \textbf{R}^{N-n}}{\partial \dot{\textbf{U}}}^{\top} {\boldsymbol{\Lambda}^{n}} \approx \frac{1}{\epsilon} \textbf{R} \left( \textbf{U}^{N-n}, \dot{\textbf{U}}^{N-n}+\epsilon {\boldsymbol{\Lambda}}^{n}, \ddot{\textbf{U}}^{N-n}, \textbf{d}\right) . $$
(119)

The first \(\boldsymbol{\Psi}^{n}\) option is restricted to symmetric \(\partial \textbf{R}/\partial \textbf{U}\), whereby (111) and (115) are approximated as

$$ \frac{\partial \textbf{R}^{N-n}}{\partial \textbf{U}}^{\top}{{\boldsymbol{\Lambda}}^{n}} \approx \frac{1}{\epsilon} \textbf{R} \left( \textbf{U}^{N-n}+\epsilon \boldsymbol{\Lambda}^{n}, \dot{\textbf{U}}^{N-n}, \ddot{\textbf{U}}^{N-n}, \textbf{d}\right) . $$
(120)

The second \(\boldsymbol{\Psi}^{n}\) option instead requires the more common restriction that \(\partial \textbf {R} / \partial \ddot {\textbf {U}}\) is symmetric and \(\beta \neq 0\), whence the terms in (112) and (116) are approximated as

$$ \frac{\partial \textbf{R}^{N-n}}{\partial \ddot{\textbf{U}}}^{\top} \boldsymbol{\Lambda}^{n} \approx \frac{1}{\epsilon} \textbf{R} \left( \textbf{U}^{N-n}, \dot{\textbf{U}}^{N-n}, \ddot{\textbf{U}}^{N-n}+\epsilon \boldsymbol{\Lambda}^{n}, \textbf{d}\right) . $$
(121)

Finally, to compute \(\mathrm{D}F/\mathrm{D}d_{i}\) in (118), we use the approximation

$$\begin{array}{@{}rcl@{}} &&\frac{\partial {\textbf{R}}^{0}}{\partial \textbf{U}}\frac{\partial \textbf{U}^{0}}{\partial d_i} + \frac{\partial {\textbf{R}}^{0}}{\partial \dot{\textbf{U}}}\frac{\partial \dot{\textbf{U}}^{0}}{\partial d_i} \\ &\approx&\frac{1}{\epsilon} \textbf{R} \left( \textbf{U}^{0} +\epsilon \frac{\partial \textbf{U}^{0}}{\partial d_i}, \dot{\textbf{U}}^{0} +\epsilon {\frac{\partial \dot{\textbf{U}}^{0}}{\partial d_i}}, \ddot{\textbf{U}}^{0}, \textbf{d}\right) . \end{array} $$
(122)

The derivative \(\partial \textbf{R}^{n}/\partial d_{i}\) in (62), (66), and (118) is approximated by (106).

4.6 Dynamic example

Consider two identical masses \(m_1 = m_2 = 1\) that are free to slide over a frictionless horizontal surface. The masses are connected by identical nonlinear springs and identical linear dampers, as seen in Fig. 17. The internal force generated by each spring is \(f_e = x + k_d x^3\), where \(x\) is the relative displacement of the nodes connected by the spring and the parameter \(k_d = 1\) is our design variable. The dampers generate the force \(f_{c}=k_{c} \dot {x}\), where \(k_c = 0.1\). No external force acts on the two mass-spring-damper system, but it is subjected to the initial conditions \(x_1(0) = 0\), \(x_2(0) = 1\), \(\dot {x_{1}}(0)= 0\), and \(\dot {x_{2}}(0)= 0\). The time domain is \(t \in [0, 10]\), the Newton-Raphson tolerance is \(\epsilon_R < 10^{-15}\), and the Newmark-beta parameters are \(\gamma = 1/2\) and \(\beta = 1/4\).

Fig. 17

Two mass-spring-damper system

To illustrate the various sensitivity analyses, we consider the response function

$$ F={\int}_{0}^{10} \left( {x_{1}^{2}}+{x_{2}^{2}}+ \dot{x}_{1}^{2}+ \dot{x}_{2}^{2} \right) \mathrm{d}t , $$
(123)

where the numerical integration is done by the trapezoidal rule. Table 3 shows the sensitivity values computed by the different methods using the perturbation size \(\epsilon = 10^{-6}\). Because the discretized response function converges only as the number of time steps increases, the sensitivities for N = 100 differ from those for N = 1000. For N = 100, the sensitivities obtained by the adjoint differentiate-then-discretize method do not coincide with the others due to the consistency error (Gunzburger 2003; Jensen et al. 2014). However, this consistency error practically vanishes for N = 1000.
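The convergence of the discretized response with \(N\) can be reproduced qualitatively with a short sketch. Fig. 17 is not reproduced here, so we assume the chain topology wall, spring and damper, \(m_1\), spring and damper, \(m_2\). The Newton tolerance is relaxed to \(10^{-13}\) for robustness. The values printed are therefore not those of Table 3. The check only confirms that successive refinements of \(N\) change \(F\) by less and less.

```python
import numpy as np

KC = 0.1                                          # damper coefficient k_c
C = KC * np.array([[2.0, -1.0], [-1.0, 1.0]])     # dR/dUdot (assumed topology)


def internal_force(U, V, kd):
    """Spring (x + kd x^3) plus damper (kc x') forces for the assumed chain
    wall - m1 - m2, assembled into nodal forces; also returns elongations."""
    x = np.array([U[0], U[1] - U[0]])
    s = x + kd * x**3 + KC * np.array([V[0], V[1] - V[0]])
    return np.array([s[0] - s[1], s[1]]), x


def response(kd, N, T=10.0, gamma=0.5, beta=0.25, tol=1e-13):
    """F of (123) from a Newmark/Newton run with unit masses, trapezoidal rule."""
    dt = T / N
    a, b, c, d, e = ((1 - gamma) * dt, gamma * dt, dt,
                     (0.5 - beta) * dt**2, beta * dt**2)
    U, V = np.array([0.0, 1.0]), np.zeros(2)      # x1(0)=0, x2(0)=1, at rest
    A = -internal_force(U, V, kd)[0]              # (83) is linear in A for M = I
    g = [U @ U + V @ V]
    for n in range(N):
        Up, Vp, Ap = U, V, A
        for _ in range(50):
            V = Vp + a * Ap + b * A
            U = Up + c * Vp + d * Ap + e * A
            f, x = internal_force(U, V, kd)
            R = A + f
            if np.linalg.norm(R) < tol:
                break
            k = 1 + 3 * kd * x**2                 # element tangent stiffnesses
            K = np.array([[k[0] + k[1], -k[1]], [-k[1], k[1]]])
            A = A - np.linalg.solve(np.eye(2) + b * C + e * K, R)   # (85)
        else:
            raise RuntimeError("Newton-Raphson did not converge")
        g.append(U @ U + V @ V)
    g = np.array(g)
    return dt * (g.sum() - 0.5 * (g[0] + g[-1]))


F = {N: response(1.0, N) for N in (100, 200, 400, 800)}
print(F)
```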

Table 3 Sensitivities for the two mass-spring-damper problem with \(\epsilon = 10^{-6}\).

To examine the consistency of the methods, we show \(e_f\) for the N = 1000 case and different perturbation sizes, cf. Fig. 18. As expected, the finite differences show truncation and round-off errors for large and small perturbations, respectively, and the adjoint differentiate-then-discretize method shows a consistency error. Figure 19 illustrates the error \(e_f\) for \(\epsilon = 10^{-6}\) and different numbers of time steps; the consistency error of the adjoint differentiate-then-discretize method decreases as the number of time steps increases.
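The truncation and round-off trade-off behind Figs. 18 and 20 can be seen with any one-sided difference. The sketch below uses a function with a known derivative and our own percentage-error helper, which is analogous to, but not the same as, the paper's \(e_f\) and \(e_s\) defined earlier. Large steps are dominated by truncation, tiny steps by round-off, and intermediate steps do best.

```python
import math


def forward_difference(f, x, eps):
    """One-sided finite difference (f(x + eps) - f(x)) / eps."""
    return (f(x + eps) - f(x)) / eps


def percent_error(approx, exact):
    return 100.0 * abs(approx - exact) / abs(exact)


# f = exp has the known derivative f'(1) = e, so the error can be measured exactly.
errors = {eps: percent_error(forward_difference(math.exp, 1.0, eps), math.e)
          for eps in (1e-1, 1e-4, 1e-8, 1e-12, 1e-14)}
for eps, err in errors.items():
    print(f"eps = {eps:.0e}: error = {err:.2e} %")
```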

Fig. 18

Relative percentage error of the sensitivities obtained by the analytical methods with respect to finite differences for the mass-spring-damper problem, N = 1000

Fig. 19

Relative percentage error of the sensitivities obtained by the analytical methods with respect to finite differences for the mass-spring-damper problem with \(\epsilon = 10^{-6}\)

To examine the accuracy of the semi-analytical sensitivities, we compute the error \(e_s\) for N = 1000; cf. Fig. 20. Again, as expected, the semi-analytical sensitivities exhibit truncation and round-off errors for large and small perturbation sizes, respectively.
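A semi-analytical direct sensitivity for this example can be sketched as follows. Several choices here are assumptions of the illustration, not necessarily the scheme used in the paper: the layout is the hypothetical wall, \(m_1\), \(m_2\) chain; the internal-force part of the pseudo-load (the tangent matrix times the sensitivity of the Newmark displacement predictor, plus \(\partial \textbf{f}/\partial k_d\)) is replaced by a single forward difference of the primal internal-force routine along the direction (predictor sensitivity, 1); and \(e_s\) is taken as the relative percentage difference from the fully analytical direct sensitivity. The names `pseudo_load` and `sensitivity` are hypothetical.

```python
import numpy as np

# Hypothetical layout: wall | spring/damper | m1 | spring/damper | m2, k_c = 0.1
M = np.eye(2)
C = 0.1 * np.array([[2.0, -1.0], [-1.0, 1.0]])


def f_int(u, kd):
    x = np.array([u[0], u[1] - u[0]])
    fe = x + kd * x**3
    return np.array([fe[0] - fe[1], fe[1]])


def k_tan(u, kd):
    x = np.array([u[0], u[1] - u[0]])
    k = 1.0 + 3.0 * kd * x**2
    return np.array([[k[0] + k[1], -k[1]], [-k[1], k[1]]])


def pseudo_load(u, du_hat, kd, eps):
    """K_t du_hat + df/dkd: exact if eps is None, else one forward difference of f_int."""
    if eps is None:
        x = np.array([u[0], u[1] - u[0]])
        return k_tan(u, kd) @ du_hat + np.array([x[0]**3 - x[1]**3, x[1]**3])
    return (f_int(u + eps * du_hat, kd + eps) - f_int(u, kd)) / eps


def sensitivity(eps=None, kd=1.0, T=10.0, N=1000, gamma=0.5, beta=0.25):
    """Direct discretize-then-differentiate dF/dkd, optionally semi-analytical."""
    dt = T / N
    u, v = np.array([0.0, 1.0]), np.zeros(2)
    a = np.linalg.solve(M, -(C @ v + f_int(u, kd)))
    du, dv = np.zeros(2), np.zeros(2)
    da = np.linalg.solve(M, -pseudo_load(u, du, kd, eps))
    dg = [2.0 * (u @ du + v @ dv)]
    for _ in range(N):
        u_hat = u + dt * v + dt**2 * (0.5 - beta) * a
        v_hat = v + dt * (1.0 - gamma) * a
        du_hat = du + dt * dv + dt**2 * (0.5 - beta) * da
        dv_hat = dv + dt * (1.0 - gamma) * da
        for _ in range(50):                      # primal Newton-Raphson for a_{n+1}
            un, vn = u_hat + beta * dt**2 * a, v_hat + gamma * dt * a
            res = M @ a + C @ vn + f_int(un, kd)
            jac = M + gamma * dt * C + beta * dt**2 * k_tan(un, kd)
            step = np.linalg.solve(jac, -res)
            a = a + step
            if np.linalg.norm(step) < 1e-14:
                break
        u, v = u_hat + beta * dt**2 * a, v_hat + gamma * dt * a
        # Exact tangent at the converged state on the left; only the pseudo-load is approximated
        jac = M + gamma * dt * C + beta * dt**2 * k_tan(u, kd)
        da = np.linalg.solve(jac, -(C @ dv_hat + pseudo_load(u, du_hat, kd, eps)))
        du, dv = du_hat + beta * dt**2 * da, dv_hat + gamma * dt * da
        dg.append(2.0 * (u @ du + v @ dv))
    dg = np.asarray(dg)
    return dt * (dg.sum() - 0.5 * (dg[0] + dg[-1]))


reference = sensitivity()                        # fully analytical direct sensitivity
for eps in (1e-1, 1e-4, 1e-6, 1e-9, 1e-12):
    e_s = 100.0 * abs(sensitivity(eps) - reference) / abs(reference)
    print(f"eps = {eps:.0e}: e_s = {e_s:.2e} %")
```

In this sketch the truncation error stems from the nonlinearity of the springs, while for very small perturbations the subtraction in `pseudo_load` loses significant digits, which reproduces the two regimes described above.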

Fig. 20

Relative percentage error of the semi-analytical sensitivities for the mass-spring-damper problem with N = 1000

Figure 21 shows that, for \(\epsilon = 10^{-6}\), the error \(e_s\) is nearly independent of the time step size.

Fig. 21

Relative percentage error of the semi-analytical sensitivities for the mass-spring-damper problem with \(\epsilon = 10^{-6}\)

5 Conclusions

Analytical sensitivity analyses require detailed knowledge of the analysis program, and their implementation can be error-prone and time-consuming. Fortunately, these drawbacks may be reduced by adopting the semi-analytical method, in which terms in the pseudo-loads or adjoint loads, and also in the sensitivity expressions, are approximated by finite differences. In this way, these complicated terms can be computed with the subroutines used to solve the primal problem while the efficiency of the analytical methods is maintained. That said, the accuracy of the semi-analytical sensitivities is susceptible to truncation and round-off errors, and to additional errors if the convergence tolerance of the primal analysis is not sufficiently small.

In transient and dynamic problems, adopting the semi-analytical approach affects both the restrictive assumptions that must be made and the accuracy. In particular, the expressions for the adjoint differentiate-then-discretize and discretize-then-differentiate approaches differ because the differentiation and discretization steps do not commute. The differentiate-then-discretize approach requires some terms, e.g., the mass matrix, to be constant in order to reuse the tangent stiffness matrices from the primal analysis; even so, the first and last tangent stiffness matrices are not reused. This is not the case for the direct and adjoint discretize-then-differentiate methods, in which the tangent stiffness matrix is reused at all time steps. Furthermore, the adjoint differentiate-then-discretize approach yields a consistency error, although this error decreases with the time step size.

In most cases, the semi-analytical adjoint approaches for nonlinear transient and nonlinear dynamic systems require symmetry of \(\partial \textbf {R}^{n}/\partial \textbf {U}\), \({\partial \textbf {R}^{n}}/{\partial \dot {\textbf {U}}}\), and/or \(\partial \textbf {R}^{n}/ \partial \ddot {\textbf {U}}\). This may be problematic, as \(\partial \textbf {R}^{n}/\partial \textbf {U}\) is usually asymmetric in nonlinear problems. Fortunately, provided an explicit method is not used, the semi-analytical discretize-then-differentiate adjoint method can accommodate an asymmetric \(\partial \textbf {R}^{n}/\partial \textbf {U}\). These restrictions are summarized in Tables 4 and 5. The example problems illustrate the efficiency of, and the errors associated with, the various methods for nonlinear transient and nonlinear dynamic problems.

Table 4 Restrictions for semi-analytical adjoint methods for transient problems
Table 5 Restrictions for semi-analytical adjoint methods for dynamic problems