1 Introduction

In structural design involving steel, a plasticity model is often required to determine the plastic strain, or permanent deformation, which occurs in the structure when it undergoes a stress that exceeds a value known as the yield strength (Han and Reddy 2013). As the time-dependent force evolves, if the yield strength remains constant everywhere in the structure, the resulting phenomenon is called perfect plasticity; otherwise, it is called plasticity with hardening. Using a hardening law, one can determine the shift in the yield strength and measure how ductile the material is. Plasticity modeling has developed significantly since the 1960s. At the heart of the model lie Hill’s principle and the equivalent Drucker-Illyushin principle (Marigo 2000). The plasticity model is often simplified by assuming that the evolution of the force is slow. This assumption results in a quasi-static plasticity model, which has been extensively studied theoretically. The model, when written in its variational formulation, is an inequality which can be expressed either in a dual form or in a primal form. One advantage of the primal form is that it can be shown to be a well-posed problem (Han et al. 1997). While the primal formulation illuminates the theoretical properties of the solution to the plasticity problem, it is not easily amenable to numerical resolution. Therefore, one instead resorts to the radial return algorithm (Simo and Taylor 1986; Simo and Hughes 2006), which discretizes the governing equations of hardening-based plasticity using an implicit Euler scheme.

Shape and topology optimization is a powerful tool to determine an optimal design satisfying several design constraints. The optimization algorithm necessitates a method to describe the shape, which can either be approximated by a continuous density function or represented exactly by a level-set function. Most research in shape and topology optimization is based on density methods (Bendsoe and Sigmund 2013). The idea of capturing fronts and interfaces using a level-set function was introduced by Osher and Sethian (Osher and Fedkiw 2006) and integrated into the shape and topology optimization framework in Allaire et al. (2004) and Wang et al. (2003). Most shape and topology optimization algorithms are developed for linearized elasticity problems, while less attention is given to nonlinear ones. Nonlinearities can arise due to material properties (plasticity, damage), contact boundary conditions, hyper-elasticity (Klarbring and Strömberg 2013), large displacement (large strain Wallin et al. 2015; Buhl et al. 2000; finite strain Wallin et al. 2016) and structural buckling (Lindgaard and Dahl 2013). Topology optimization using density approaches or SIMP (Solid Isotropic Material with Penalization) was applied to elasto-plastic problems (Maute et al. 1998; Wallin et al. 2016; Kato et al. 2015; Bogomolny and Amir 2012; Li et al. 2017), visco-elastic problems (James and Waisman 2015) and visco-elasto-plastic problems (Nakshatrala and Tortorelli 2016). A common feature in all the previous works is the determination of a design gradient by differentiating the space and time-discretized schemes of the plasticity models, which are approximated using a fictitious material density. Material properties such as Young’s modulus and the hardening coefficients are modified using this material density raised to a certain exponent. This exponent is different for every mechanical property and must be chosen in an ad hoc manner to ensure numerical stability. The optimized shapes obtained have intermediate densities undergoing a plastic flow, which might actually be artificial. In the level-set framework, since the material properties are not approximated using material densities, such artificial plastic zones are avoided.

The level-set method for topology optimization was applied to a simplified version of perfect plasticity in Maury et al. (2018). There, the first time step of time-discretized perfect plasticity, also known as Hencky’s model, was approximated using Perzyna penalization and the resulting approximation was shown to be well-posed. The model did not take into account hardening laws, the time dependence or the irreversibility of the plasticity problem. As soon as one incorporates the irreversibility of the plastic flow and hardening laws, one ends up with a variational inequality whose theoretical and numerical treatment is complex.

In this article, we apply the level-set method to quasi-static plasticity with linear kinematic and linear isotropic hardening. Unlike previous works, the quasi-static plasticity problem is considered in its primal form and the shape derivative is determined for the continuous problem. The primal form, being non-smooth, is not differentiable. Nevertheless, we construct an approximate problem that is differentiable using a penalization and regularization technique. We show that the approximate problem is well-posed and that the corresponding solution converges to the actual solution. Then, we compute the shape derivative for minimizing an objective function with the approximate problem as a constraint. As usual, this shape derivative involves an adjoint problem. Thanks to the proposed penalization and regularization, the time-discretized version of this adjoint problem is proved to be well-posed. It is well known that the original primal problem is not differentiable in the usual sense but admits only a so-called conical derivative (see Mignot 1976; Sokolowski and Zolésio 1992; Maury et al. 2018). Similarly, there is no rigorous notion of adjoint for this primal problem. Nevertheless, to make a comparison with our regularized adjoint, we present a formal approach, relying on strong assumptions (not always realistic), which allows us to give a shape derivative and an adjoint problem for the primal problem. In some sense, this “formal” or “naive” shape derivative and adjoint equation of the original problem should be the limits of our regularized shape derivative and adjoint equations when the penalization and regularization coefficients go to zero. However, we do not perform such a limit analysis, which of course would require strong assumptions; rather, we content ourselves with pursuing a pedagogical goal in Sect. 3.

The efficiency of the shape derivative (obtained with our penalization and regularization process) is assessed by optimizing some numerical examples in 2D and 3D. The plasticity problem is numerically solved using the radial return algorithm. One salient feature of our approach is that the geometry is captured at each iteration of the optimization process by a body-fitted mesh, which nevertheless allows for topology changes. This is possible thanks to the open-source library MMG (Dapogny et al. 2014). This library meshes a moving domain defined by a level-set on a mesh of the initial domain and its exterior, allowing for possible topology changes. Unlike XFEM (Duysinx et al. 2006), where the mesh elements can become heavily distorted, our capturing of the geometry using MMG ensures a much better mesh and thus an accurate calculation of the shape derivative. We present numerical case studies of a cantilever and a wedge in 2D and in 3D. The two geometries are loaded with a uniaxial force that increases monotonically in time. We compare the shapes optimized for plasticity with the ones optimized for linear elasticity. For the 2D wedge, we also consider a force that changes its direction with time. While optimizing the shape for such a force, we show that the dependence of the optimized topology on the forcing history is significant.

The remainder of this paper is organized as follows. In Sect. 2, we present the governing laws of plasticity and recall the primal formulation of plasticity with linear kinematic and linear isotropic hardening. To replace the variational inequality by a smoother nonlinear variational formulation, which is amenable to optimization, the primal formulation is penalized and regularized. We prove the well-posedness of the resulting approximate model and the convergence of the approximate solution to the actual solution. In Sect. 3, we determine the shape derivative for the proposed penalized-regularized formulation using Céa’s well-known technique (Céa 1986). For comparison purposes, we formally derive the adjoint problem and the shape derivative of the primal problem. In Sect. 4, we discretize the approximate problem and its adjoint in space-time, discuss its numerical resolution with the open-source software FreeFEM++ (Hecht 2012), and then briefly recall the level-set method and our optimization algorithm. Finally, Sect. 5 presents several test cases in 2D and 3D.

2 Variational formulation of plasticity

We first present the laws governing plasticity with linear kinematic and isotropic hardening. Using these laws, we derive the primal variational formulation. This formulation is further subject to penalization and then to regularization to make it differentiable. This section closes with some statements about the well-posedness and the convergence of the solution of the penalized-regularized plasticity model towards the actual solution.

2.1 Governing equations

Throughout this article, we adopt the convention of representing non-scalar mathematical entities by bold symbols. We consider a structure represented by a smooth bounded open set \(\Omega \subset \mathbb {R}^d,\ d=2\) or 3 and a bounded time interval [0, T]. Let \(\mathcal {M}^d_s\) denote the set of symmetric \(d\times d\) matrices and \(\mathbb {I}\) represent the fourth-order identity tensor of dimension d. The structure, having a boundary \(\partial \Omega = \Gamma _N \cup \Gamma _D \cup \Gamma ,\) is fixed on \(\Gamma _D\) and loaded on \(\Gamma _N\) as shown in Fig. 1.

Fig. 1 Boundary conditions on the structure \(\Omega\)

Plasticity is modeled as a quasi-static process, as we now describe (see Han and Reddy 2013 for more details). Let \(\varvec{u}:\Omega \times [0,T]\rightarrow \mathbb {R}^d\) denote the displacement field, \({\varvec{\sigma }}:\Omega \times [0,T]\rightarrow \mathcal {M}^d_s\) denote the stress tensor, and \(\varvec{n}\) denote the outward normal to \(\partial \Omega .\) The structure, when subjected to a time-dependent body force \(\varvec{f}:\Omega \times [0,T]\rightarrow \mathbb {R}^d\) and a surface force \(\varvec{g}:\Gamma _N\times [0, T]\rightarrow \mathbb {R}^d,\) satisfies the equilibrium equations:

$$\begin{aligned} \hbox {div}({\varvec{\sigma }}) + \varvec{f}&= \varvec{0} \qquad \hbox {in }\Omega \times (0,T], \end{aligned}$$
(1a)
$$\begin{aligned} {\varvec{\sigma }}\cdot \varvec{n}&= \varvec{g} \qquad \hbox {on } \Gamma _N \times (0,T],\end{aligned}$$
(1b)
$$\begin{aligned} {\varvec{\sigma }}\cdot \varvec{n}&= \varvec{0} \qquad \hbox {on } \Gamma \times (0,T],\end{aligned}$$
(1c)
$$\begin{aligned} \varvec{u}&= \varvec{0} \qquad \hbox {on }\Gamma _D\times (0,T]. \end{aligned}$$
(1d)

The total strain tensor of the structure \(\varvec{\varepsilon }:\Omega \times [0,T]\rightarrow \mathcal {M}^d_s,\) expressed in terms of \(\varvec{u},\) \(\varvec{\varepsilon }= \varvec{\varepsilon }(\varvec{u}) = (\nabla \varvec{u}+(\nabla \varvec{u})^T)/2\) can be decomposed as

$$\varvec{\varepsilon } = \varvec{\varepsilon }_e + \varvec{\varepsilon }_p,$$

where \(\varvec{\varepsilon }_e\) denotes the elastic strain and \(\varvec{\varepsilon }_p\) the plastic strain. Plasticity occurs when the magnitude of \(\varvec{\sigma }\) exceeds the yield strength, a material parameter determined experimentally. Hardening occurs when the plastic flow is followed by a change in yield strength. The hardening is modeled by a stress-like hardening tensor \(\varvec{q}:\Omega \times [0,T]\rightarrow \mathcal {M}^d_s,\) a scalar force \(g:\Omega \times [0,T]\rightarrow \mathbb {R},\) and the corresponding internal variables \(\varvec{r}:\Omega \times [0,T]\rightarrow \mathcal {M}^d_s\) and \(\gamma :\Omega \times [0,T]\rightarrow \mathbb {R},\) respectively. To define the structure’s elastic limit, we consider the von Mises yield criterion (Simo and Hughes 2006)

$$\begin{aligned} {f}(\varvec{\sigma }, \varvec{q}, g) = |{\varvec{\sigma }}^D - \varvec{q}^D| + \sqrt{\frac{2}{3}}\left( g - {\sigma }_Y\right) \le 0, \end{aligned}$$
(2)

where the superscript D denotes the deviatoric part of a tensor and \(\sigma _Y\in \mathbb {R}^+\) is the yield strength. This criterion defines the elastic domain

$$\begin{aligned} \mathbb {E} = \{ (\varvec{\sigma }, \varvec{q}, g) : f(\varvec{\sigma }, \varvec{q}, g) \le 0\}, \end{aligned}$$

which is convex because f is a convex function of \((\varvec{\sigma }, \varvec{q}, g)\). The structure is made of an isotropic material, with Hooke’s tensor given by

$$\begin{aligned} \mathbb {C} = \lambda \varvec{1}\otimes \varvec{1} + 2\mu \mathbb {I}, \end{aligned}$$

where \(\lambda ,\mu\) are Lamé constants. We place ourselves in the framework of associated plasticity, namely, the plastic flow rate is proportional to the normal of the elastic domain. We first state the second law of thermodynamics

$$\begin{aligned} \varvec{\sigma }:\dot{\varvec{\varepsilon }} - \dot{\varvec{\psi }} \ge 0, \end{aligned}$$
(3)

where the overdot denotes differentiation with respect to time and \(\varvec{\psi }\) is the Helmholtz free energy, given by the sum

$$\begin{aligned} \varvec{\psi }= \varvec{\psi } (\varvec{\varepsilon }_e, \varvec{r}, \gamma ) = \hat{\varvec{\psi }}_e(\varvec{\varepsilon }_e) + \hat{\varvec{\psi }}_p(\varvec{r}, \gamma ) , \end{aligned}$$

where the elastic and plastic energies are respectively defined as

$$\begin{aligned} \hat{\varvec{\psi }}_e(\varvec{\varepsilon }_e) = \frac{1}{2}\mathbb {C}\varvec{\varepsilon }_e:\varvec{\varepsilon }_e\quad \hbox {and}\quad \hat{\varvec{\psi }}_p(\varvec{r},\gamma ) = \frac{1}{2} \mathbb {H}\varvec{r}:\varvec{r} + \frac{1}{2}E_{\text {iso}}\gamma ^2 , \end{aligned}$$

where \(\mathbb {H}\) is the hardening tensor and \(E_{\text {iso}}\ge 0\) is a material parameter. In addition, the stress is assumed to be a function of the elastic strain only, \(\varvec{\sigma } = {\varvec{\sigma }}(\varvec{\varepsilon }_e)\). Using these definitions, the second law (3) is rewritten as

$$\begin{aligned} \left( {\varvec{\sigma }} - \frac{\partial \hat{\varvec{\psi }}_e}{\partial \varvec{\varepsilon }_e } \right) :\dot{\varvec{\varepsilon }} + {\varvec{\sigma }}:\dot{\varvec{\varepsilon }}_p - \frac{\partial \hat{\varvec{\psi }}_p}{\partial \varvec{\varepsilon }_p}:\dot{\varvec{\varepsilon }}_p - \frac{\partial \hat{\varvec{\psi }}_p}{\partial \varvec{r}}:\dot{\varvec{r}} - \frac{\partial \hat{\varvec{\psi }}_p}{\partial \gamma }\dot{\gamma } \ge 0. \end{aligned}$$
(4)

Using Coleman-Noll arguments (Coleman and Gurtin 1967), we deduce

$$\begin{aligned} {\varvec{\sigma }} = \frac{\partial \hat{\varvec{\psi }}_e}{\partial \varvec{\varepsilon }_e } =\mathbb {C}\varvec{\varepsilon }_e = \mathbb {C}(\varvec{\varepsilon }(\varvec{u})-\varvec{\varepsilon }_p). \end{aligned}$$
(5)
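As an illustration of the yield criterion (2) and the constitutive law (5), the following NumPy sketch evaluates both at a single material point. The function names (`deviator`, `yield_function`, `hooke_stress`) and all numerical values are assumptions made for this example, and the deviator is taken with respect to the dimension d of the tensor passed in; this is not the implementation used in this article.

```python
import numpy as np

def deviator(t):
    """Deviatoric part t^D = t - (tr t / d) 1 of a d x d tensor."""
    d = t.shape[0]
    return t - np.trace(t) / d * np.eye(d)

def yield_function(sigma, q, g, sigma_y):
    """Von Mises yield function (2): |sigma^D - q^D| + sqrt(2/3) (g - sigma_Y)."""
    # np.linalg.norm of a 2D array is the Frobenius norm by default.
    shifted = deviator(sigma) - deviator(q)
    return np.linalg.norm(shifted) + np.sqrt(2.0 / 3.0) * (g - sigma_y)

def hooke_stress(eps, eps_p, lam, mu):
    """Stress (5): sigma = C (eps - eps_p), with C = lam 1 (x) 1 + 2 mu I."""
    eps_e = eps - eps_p
    d = eps_e.shape[0]
    return lam * np.trace(eps_e) * np.eye(d) + 2.0 * mu * eps_e

# Hypothetical 2D strain state and material constants, for illustration only.
eps = np.array([[1.0e-3, 0.0], [0.0, -1.0e-3]])
sigma = hooke_stress(eps, np.zeros((2, 2)), lam=1.0, mu=1.0)
is_elastic = yield_function(sigma, np.zeros((2, 2)), g=0.0, sigma_y=1.0e-2) <= 0.0
print("state inside the elastic domain:", is_elastic)
```

A state belongs to the elastic domain \(\mathbb {E}\) exactly when `yield_function` returns a non-positive value; a purely hydrostatic stress has zero deviator and therefore never triggers yielding in this model.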

Now, the power dissipation function \(\mathcal {D}\) is introduced as the difference between the external power and the rate of change of Helmholtz free energy

$$\begin{aligned} \mathcal {D} = {\varvec{\sigma }}:\dot{\varvec{\varepsilon }}_p + \varvec{q}:\dot{\varvec{r}} + g\dot{\gamma } \end{aligned}$$

where

$$\begin{aligned} \varvec{q} = -\frac{\partial \hat{\varvec{\psi }}_p}{\partial \varvec{r}} = -\mathbb {H}\varvec{r} \quad \hbox {and}\quad g = -\frac{\partial \hat{\varvec{\psi }}_p}{\partial \gamma } = - E_{\text {iso}}\gamma . \end{aligned}$$
(6)

Substituting \(\mathcal {D}\) in (4), we get

$$\begin{aligned} \mathcal {D}\ge 0. \end{aligned}$$

This is exactly Hill’s principle (or the second law of thermodynamics) and is equivalent to the Drucker-Illyushin principle of maximum work, which states that for any stress state \(({\varvec{\sigma }}, \varvec{q}, g)\) in \(\mathbb {E},\) the plastic flow variables \((\dot{\varvec{\varepsilon }}_p, \dot{\varvec{r}},\dot{\gamma })\) must satisfy

$$\begin{aligned} { \varvec{\sigma }} :\dot{\varvec{\varepsilon }}_p + \varvec{q}:\dot{\varvec{r}} + g \dot{\gamma } \ge { \varvec{\tau }}:\dot{\varvec{\varepsilon }}_p +\varvec{p}:\dot{\varvec{r}} + k\dot{\gamma } \qquad \forall ({\varvec{\tau }},\varvec{p}, k)\in \mathbb {E}. \end{aligned}$$
(7)

Since the set \(\mathbb {E}\) is invariant under the addition of a multiple of the identity tensor to \(\varvec{\sigma }\) and \(\varvec{q}\), (7) necessarily implies that the trace of \(\dot{\varvec{\varepsilon }}_p +\dot{\varvec{r}}\) vanishes. Furthermore, (7) yields the following characterization of \(\mathcal {D}\)

$$\begin{aligned} \mathcal {D}(\dot{\varvec{\varepsilon }}_p, \dot{\varvec{r}}, \dot{\gamma }) = \sup _{({\varvec{\tau }},\varvec{p}, k)\in \mathbb {E}}\left( { \varvec{\tau }}:\dot{\varvec{\varepsilon }}_p +\varvec{p}:\dot{\varvec{r}} + k\dot{\gamma }\right) , \end{aligned}$$
(8)

where the supremum is attained at \((\varvec{\sigma }, \varvec{q}, g).\) This maximization ensures that the normality law is satisfied (Han and Reddy 2013)

$$\begin{aligned}&f(\varvec{\sigma }, \varvec{q}, g) < 0 \implies \dot{\varvec{\varepsilon }}_p = \varvec{0},\ \dot{\varvec{r}} = \varvec{0},\ \dot{\gamma } = 0 \nonumber \\&f(\varvec{\sigma }, \varvec{q}, g) = 0 \implies \dot{\varvec{\varepsilon }}_p = \zeta {\partial _{\varvec{\sigma }} f},\ \dot{\varvec{r}} = \zeta \partial _{\varvec{q}}f ,\ \dot{\gamma } = \zeta {\partial _g f}, \end{aligned}$$
(9)

where \(\zeta\) is a Lagrange multiplier satisfying

$$\begin{aligned} \zeta \ge 0 \quad \hbox { and } \quad \zeta f(\varvec{\sigma }, \varvec{q}, g) =0. \end{aligned}$$

The derivatives of f (normal to the elastic domain) are given by

$$\begin{aligned} {\partial _{\varvec{\sigma }} f} = \frac{\varvec{\sigma }^D - \varvec{q}^D}{|\varvec{\sigma }^D - \varvec{q}^D|},\quad {\partial _{\varvec{q}} f} = -\frac{\varvec{\sigma }^D - \varvec{q}^D}{|\varvec{\sigma }^D - \varvec{q}^D|}\quad \hbox {and}\quad {\partial _g f} = \sqrt{\frac{2}{3}}. \end{aligned}$$
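The expressions above for the normal to the elastic domain can be checked numerically. The sketch below compares them with central finite differences of f at a randomly chosen state; the helper names (`fd_derivative`, `normal`), the step size and the default yield strength are assumptions made for this example only.

```python
import numpy as np

def deviator(t):
    """Deviatoric part of a d x d tensor."""
    return t - np.trace(t) / t.shape[0] * np.eye(t.shape[0])

def f(sigma, q, g, sigma_y=1.0):
    """Von Mises yield function (2)."""
    return np.linalg.norm(deviator(sigma - q)) + np.sqrt(2.0 / 3.0) * (g - sigma_y)

def normal(sigma, q):
    """Analytic derivative of f with respect to sigma; the derivative in q is its opposite."""
    s = deviator(sigma - q)
    return s / np.linalg.norm(s)

def fd_derivative(fun, x, direction, h=1.0e-6):
    """Central finite difference of the scalar function `fun` at x along `direction`."""
    return (fun(x + h * direction) - fun(x - h * direction)) / (2.0 * h)

# Random symmetric test state (hypothetical values, away from sigma^D = q^D).
rng = np.random.default_rng(0)
sym = lambda m: 0.5 * (m + m.T)
sigma, q, e = (sym(rng.normal(size=(3, 3))) for _ in range(3))

n = normal(sigma, q)
d_sigma = fd_derivative(lambda s: f(s, q, 0.0), sigma, e)
d_q = fd_derivative(lambda p: f(sigma, p, 0.0), q, e)
print("df/dsigma along e:", d_sigma, " analytic:", np.sum(n * e))
print("df/dq along e:    ", d_q, " analytic:", -np.sum(n * e))
```

Since the two derivatives are opposite, the flow rule (9) gives plastic strain and internal variable rates that are opposite to each other, and the normal itself is trace-free with unit norm.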

The multiplier \(\zeta\) is determined by imposing the consistency condition \(\dot{f} = 0\) (Simo and Hughes 2006) and in our case (of linear isotropic and kinematic hardening) an analytic formula is available in the plastic zone (where \(f =0\))

$$\begin{aligned} \zeta = \frac{ \partial _{\varvec{\sigma }}f:\dot{\varvec{\sigma }} }{ \frac{2}{3}E_{\text{iso}} + \mathbb {H}\partial _{\varvec{\sigma }}f:\partial _{\varvec{\sigma }}f }. \end{aligned}$$

From (9), we get \(\dot{\varvec{\varepsilon }}_p = -\dot{\varvec{r}}\). Assuming that the plastic variables \(\varvec{\varepsilon }_p\) and \(\varvec{r}\) are zero at the initial time instant, we deduce \(\varvec{\varepsilon }_p = -\varvec{r}\) for all time t. The internal variable \(\varvec{r}\) has thus been characterized and \(\mathcal {D}(\dot{\varvec{\varepsilon }}_p, \dot{\varvec{r}}, \dot{\gamma }) = \mathcal {D}(\dot{\varvec{\varepsilon }}_p, \dot{\gamma })\). Using the definition of a sub-differential, the maximization (8) can then be written as

$$\begin{aligned} \mathcal {D}(\varvec{\varepsilon }_{q}, \mu ) \ge \mathcal {D}(\dot{\varvec{\varepsilon }}_p, \dot{\gamma }) +({{\varvec{\sigma }}}-\varvec{q}):(\varvec{\varepsilon }_{q}-\dot{\varvec{\varepsilon }}_p) + g(\mu - \dot{\gamma }) \quad \forall \varvec{\varepsilon }_{q}\in \mathcal {M}^d_s, \mu \in \mathbb {R}. \end{aligned}$$
(10)

The primal variables are \((\varvec{u}, \varvec{\varepsilon }_p, \gamma ).\) We wish to work with a primal formulation, and hence we need an expression of \(\mathcal {D}(\dot{\varvec{\varepsilon }}_p, \dot{\gamma })\) in terms of the primal variables. The dissipation function \(\mathcal {D}\) satisfies (Reddy and Martin 1994)

$$\begin{aligned} \mathcal {D}(\dot{\varvec{\varepsilon }}_p,\dot{\gamma }) = {\left\{ \begin{aligned} \sqrt{\frac{2}{3}}\sigma _Y |\dot{\varvec{\varepsilon }}_p|\ \quad \hbox {if}\quad&\sqrt{\frac{2}{3}}|\dot{\varvec{\varepsilon }}_p|\le \dot{\gamma }, \\ \infty \quad \hbox {if}\quad&\sqrt{\frac{2}{3}}|\dot{\varvec{\varepsilon }}_p|> \dot{\gamma }. \end{aligned}\right. } \end{aligned}$$
(11)

The above expression is obtained by substituting \(f=0\) in (8) and performing simple algebra to determine the variables \((\varvec{\sigma },\varvec{q},g),\) which maximize \(\mathcal {D}.\) The first expression in (11) also follows from a simple substitution of (9) in (8). As a consequence, the domain of \(\mathcal {D}\) is defined by

$$\begin{aligned} \hbox {dom}\mathcal {D} = \left\{ (\dot{\varvec{\varepsilon }}_{p}, \dot{\gamma }), \ \sqrt{\frac{2}{3}}|\dot{\varvec{\varepsilon }}_{p}|\le \dot{\gamma } \hbox { a.e. in }\Omega \right\} . \end{aligned}$$
(12)
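To make the extended-valued character of \(\mathcal {D}\) concrete, the following sketch evaluates (11) pointwise and tests membership in the set (12) at a single point. The function names `in_domain` and `dissipation`, as well as the sample values, are assumptions introduced for this illustration.

```python
import numpy as np

def in_domain(eps_p_rate, gamma_rate):
    """Pointwise version of (12): sqrt(2/3) |eps_p_rate| <= gamma_rate."""
    return np.sqrt(2.0 / 3.0) * np.linalg.norm(eps_p_rate) <= gamma_rate

def dissipation(eps_p_rate, gamma_rate, sigma_y):
    """Dissipation function (11): finite on dom D, +infinity elsewhere."""
    if in_domain(eps_p_rate, gamma_rate):
        return np.sqrt(2.0 / 3.0) * sigma_y * np.linalg.norm(eps_p_rate)
    return np.inf

# Hypothetical trace-free plastic strain rate.
rate = np.array([[1.0, 0.5], [0.5, -1.0]])
admissible_gamma = 2.0 * np.linalg.norm(rate)
print("admissible pair:  ", dissipation(rate, admissible_gamma, sigma_y=3.0))
print("inadmissible pair:", dissipation(rate, 0.0, sigma_y=3.0))
```

On its domain the function is positively homogeneous of degree one, which is the property exploited later when the well-posedness theorem is applied.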

Finally, the plasticity model used in this paper is:

$$\begin{aligned} \varvec{\sigma }&= \mathbb {C}(\varvec{\varepsilon }(\varvec{u})-\varvec{\varepsilon }_p)\qquad&\hbox {in }\Omega \times (0,T], \end{aligned}$$
(13)
$$\begin{aligned} \hbox {div}({\varvec{\sigma }}) + \varvec{f}&= \varvec{0} \qquad&\hbox {in }\Omega \times (0,T],\nonumber \\ {\varvec{\sigma }}\cdot \varvec{n}&= \varvec{g} \qquad&\hbox {on } \Gamma _N \times (0,T],\nonumber \\ {\varvec{\sigma }}\cdot \varvec{n}&= \varvec{0} \qquad&\hbox {on } \Gamma \times (0,T],\nonumber \\ \varvec{u}&= \varvec{0} \qquad&\hbox {on }\Gamma _D\times (0,T],\nonumber \\ \sqrt{\frac{2}{3}}\sigma _Y |{\varvec{\varepsilon }}_q|&\ge \sqrt{\frac{2}{3}}\sigma _Y |\dot{\varvec{\varepsilon }}_p| +({{\varvec{\sigma }}}-\mathbb {H}{\varvec{\varepsilon }}_p):(\varvec{\varepsilon }_{q}-\dot{\varvec{\varepsilon }}_p) - E_{\text {iso}}\gamma (\mu - \dot{\gamma }) \quad \forall (\varvec{\varepsilon }_{q}, \mu) \in \hbox {dom}\mathcal {D} \qquad&\hbox {on }\Omega \times (0,T] . \end{aligned}$$
(14)

The inequality (14) is obtained by substituting (6) and (11) into (10).

Very often, the partial differential equations (13) are solved in conjunction with the ordinary differential equations (9). Here, however, we solve (13) coupled to the inequality (14). This coupling, which is purely in terms of the variables \((\varvec{u},\varvec{\varepsilon }_p,\gamma ),\) results in the so-called primal formulation.

If the dissipation function \(\mathcal {D}\) is expressed in terms of the stress variables, the plasticity problem is formulated in terms of \((\varvec{u},\varvec{\sigma },\varvec{q},g),\) resulting in the dual formulation. Since the analytical treatment of the primal formulation is much easier than that of the dual formulation, we have chosen the former in this article.

2.2 Primal formulation

The material tensors \(\mathbb {C}\) and \(\mathbb {H}\) are assumed to be coercive, i.e., \(\exists \ c_0>0,\ \exists \ h_0>0\) such that, \(\forall \varvec{\xi }\in \mathcal {M}^d_s\),

$$\begin{aligned} \mathbb {C}\varvec{\xi }:\varvec{\xi } \ge c_0 |\varvec{\xi }|^2\quad \hbox {and} \quad \mathbb {H}\varvec{\xi }:\varvec{\xi } \ge h_0 |\varvec{\xi }|^2 . \end{aligned}$$

We define the displacement space

$$\begin{aligned} V = \{ \varvec{u}\in H^1(\Omega )^d ,\ \varvec{u}=\varvec{0} \hbox { on } \Gamma _D \} \end{aligned}$$

and the space of plastic strain Q as

$$\begin{aligned} Q = \{\varvec{\varepsilon }_q\in L^2(\Omega )^{d\times d}, \ \hbox {tr}(\varvec{\varepsilon }_q)=0\ \hbox {a.e. in }\Omega \}. \end{aligned}$$

We then define the product space

$$\begin{aligned} Z = V\times Q\times L^2(\Omega ) , \end{aligned}$$
(15)

where we seek the solution \(\varvec{w} =(\varvec{u}, \varvec{\varepsilon }_p, \gamma )\). The space Z is a Hilbert space equipped with the scalar product, for \(\varvec{w} = (\varvec{u}, \varvec{\varepsilon }_p, \gamma )\) and \(\varvec{z} = (\varvec{v}, \varvec{\varepsilon }_q, \mu )\),

$$\begin{aligned} \langle \varvec{w}, \varvec{z} \rangle = \int _{\Omega }\varvec{u}\cdot \varvec{v}\ \ dx + \int _{\Omega }\varvec{\varepsilon }_p:\varvec{\varepsilon }_q\ \ dx + \int _{\Omega }\gamma \mu \ \ dx . \end{aligned}$$
(16)

Let \(Z^*\) be the dual space of Z. The forces are assumed to be smooth in time, in the sense that

$$\begin{aligned} \varvec{f}\in H^1([0,T], L^2(\Omega )^d)\quad \hbox {and}\quad \varvec{g}\in H^1([0,T], L^2(\Gamma _N)^d). \end{aligned}$$

Indeed, since \(H^1([0,T],H) \subset \mathcal {C}^0([0,T], H)\) for any Hilbert space H, at any time t the forces \(\varvec{f}(t)\) and \(\varvec{g}(t)\) are well defined. We introduce a bilinear form \(a:Z\times Z\rightarrow \mathbb {R}\),

$$\begin{aligned} a(\varvec{w}, \varvec{z}) = \int _{\Omega } \left( \mathbb {C}(\varvec{\varepsilon }(\varvec{u}) - \varvec{\varepsilon }_p):(\varvec{\varepsilon }(\varvec{v})-\varvec{\varepsilon }_q) + \varvec{\varepsilon }_q :\mathbb {H} \varvec{\varepsilon }_p + E_{\text {iso}}\gamma \mu \right) \ \ {\text {d}}x , \end{aligned}$$
(17)

and a linear form \(l_t:Z\rightarrow \mathbb {R}\) such that

$$\begin{aligned} l_t(\varvec{z}) = \int _{\Omega } \varvec{f}(t)\cdot \varvec{v} \ \ {\text {d}}x + \int _{\Gamma _N} \varvec{g}(t)\cdot \varvec{v} \ {\text {d}}s, \end{aligned}$$
(18)

with the forces \(\varvec{f}(t)\in L^2(\Omega )^d\), \(\varvec{g}(t)\in L^2(\Gamma _N)^d\) and a nonlinear convex functional \(j:Z\rightarrow \mathbb {R}\) such that

$$\begin{aligned} j(\varvec{z}) = \int _{\Omega } \mathcal {D}(\varvec{\varepsilon }_q, \mu )\ \ dx, \end{aligned}$$
(19)

where \(\mathcal {D}(\varvec{\varepsilon }_q, \mu )\) is defined by (11). This functional \(j(\cdot )\) is convex and lower semi-continuous on Z and it is Lipschitz continuous on the convex set \(K\subset Z\) defined as

$$\begin{aligned} K= V\times \hbox {dom}\mathcal {D}, \end{aligned}$$

where \(\hbox {dom}\mathcal {D}\) is defined by (12). The admissible plastic flow rates \(\dot{\varvec{\varepsilon }}_p, \ \dot{\gamma }\) belong to the convex set \(\hbox {dom}\mathcal {D}\).

Lemma 1

The bilinear form \(a(\cdot ,\cdot )\) defined in (17) is coercive on Z.

Proof

From (17) with \(\varvec{z} = \varvec{w}\in Z\), and for any \(s\in (0,1)\), we get

$$\begin{aligned} a(\varvec{w},\varvec{w})&= \int _{\Omega }\mathbb {C}(\varvec{\varepsilon }(\varvec{u}) - \varvec{\varepsilon }_p):(\varvec{\varepsilon }(\varvec{u}) - \varvec{\varepsilon }_p) \ {\text {d}}x + \int _{\Omega }\mathbb {H}\varvec{\varepsilon }_p:\varvec{\varepsilon }_p\ {\text {d}}x + \int _{\Omega }E_{\text {iso}}\gamma ^2 \ {\text {d}}x \\&\ge c_0 \left\Vert \varvec{\varepsilon }(\varvec{u})\right\Vert _{L^2(\Omega )}^2 + (c_0 + h_0) \left\Vert \varvec{\varepsilon }_p\right\Vert _{L^2(\Omega )}^2 - 2c_0\int _{\Omega } \varvec{\varepsilon }(\varvec{u}):\varvec{\varepsilon }_p \ {\text {d}}x + E_{\text {iso}}\left\Vert \gamma \right\Vert _{L^2(\Omega )}^2 \\&\ge (c_0 - c_0(1-s)) \left\Vert \varvec{\varepsilon }(\varvec{u})\right\Vert _{L^2(\Omega )}^2 + \left( c_0 + h_0 - \frac{c_0}{1-s}\right) \left\Vert \varvec{\varepsilon }_p\right\Vert _{L^2(\Omega )}^2 + E_{\text {iso}}\left\Vert \gamma \right\Vert _{L^2(\Omega )}^2 \\&= c_0s \left\Vert \varvec{\varepsilon }(\varvec{u})\right\Vert _{L^2(\Omega )}^2 + \left( h_0 - \frac{c_0s}{1-s}\right) \left\Vert \varvec{\varepsilon }_p\right\Vert _{L^2(\Omega )}^2 + E_{\text {iso}}\left\Vert \gamma \right\Vert _{L^2(\Omega )}^2\\&\ge \min \left( c_0s, \left( h_0 - \frac{c_0s}{1-s}\right) , E_{\text {iso}}\right) \left( \left\Vert \varvec{\varepsilon }(\varvec{u})\right\Vert _{L^2(\Omega )}^2 + \left\Vert \varvec{\varepsilon }_p\right\Vert _{L^2(\Omega )}^2 + \left\Vert \gamma \right\Vert _{L^2(\Omega )}^2 \right) . \end{aligned}$$

We choose \(s = \tfrac{h_0}{2c_0+h_0}\) in order to make the right-hand side positive for all \(\varvec{w}\in Z.\) Finally, using Korn’s inequality, this proves the coercivity of \(a(\cdot ,\cdot )\) on Z.
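The choice of s in the proof of Lemma 1 can be checked numerically. The sketch below computes the two resulting coefficients and verifies the pointwise lower bound on random tensors. It assumes, for illustration only, an isotropic Hooke tensor with \(\lambda \ge 0\) (so that one may take \(c_0 = 2\mu\)) and a hypothetical hardening tensor \(\mathbb {H} = h\,\mathbb {I}\) (so that \(h_0 = h\)); the function names are not taken from this article.

```python
import numpy as np

def coercivity_constants(c0, h0):
    """Return s = h0 / (2 c0 + h0) and the two coefficients appearing in Lemma 1."""
    s = h0 / (2.0 * c0 + h0)
    coeff_strain = c0 * s                     # multiplies ||eps(u)||^2
    coeff_plastic = h0 - c0 * s / (1.0 - s)   # multiplies ||eps_p||^2, equals h0 / 2
    return s, coeff_strain, coeff_plastic

def integrand(a, b, lam, mu, h):
    """Pointwise integrand of a(w, w) without the gamma term.

    a plays the role of eps(u), b of eps_p, with C = lam 1 (x) 1 + 2 mu I
    and the assumed hardening tensor H = h I.
    """
    e = a - b
    return lam * np.trace(e) ** 2 + 2.0 * mu * np.sum(e * e) + h * np.sum(b * b)

# Hypothetical material constants.
lam, mu, h = 0.7, 1.0, 0.5
s, coeff_strain, coeff_plastic = coercivity_constants(c0=2.0 * mu, h0=h)

rng = np.random.default_rng(0)
worst_gap = np.inf
for _ in range(1000):
    a = rng.normal(size=(3, 3)); a = 0.5 * (a + a.T)
    b = rng.normal(size=(3, 3)); b = 0.5 * (b + b.T)
    b -= np.trace(b) / 3.0 * np.eye(3)        # plastic strains are trace-free
    bound = coeff_strain * np.sum(a * a) + coeff_plastic * np.sum(b * b)
    worst_gap = min(worst_gap, integrand(a, b, lam, mu, h) - bound)
print("s =", s, " smallest gap over samples:", worst_gap)
```

With this value of s, the coefficient of the plastic strain norm reduces to \(h_0/2\), which makes explicit why the argument fails when \(h_0 = 0\).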

In order to obtain the primal formulation of (13) and (14), we multiply (1) by \(\varvec{v} - \dot{\varvec{u}},\) use (5) and integrate the product over \(\Omega\) by parts to obtain

$$\begin{aligned} \int _{\Omega }\mathbb {C} (\varvec{\varepsilon }(\varvec{u})-\varvec{\varepsilon }_p):(\varvec{\varepsilon }(\varvec{v})-\varvec{\varepsilon }(\dot{\varvec{u}}))\ {\text {d}}x = \int _{\Omega } \varvec{f}(t)\cdot (\varvec{v}-\dot{\varvec{u}})\ {\text {d}}x + \int _{\Gamma _N}\varvec{g}(t)\cdot (\varvec{v}-\dot{\varvec{u}}){\text {d}}s\qquad \forall \varvec{v} \in V, \end{aligned}$$
(20)

We then integrate (14) over \(\Omega ,\) add (20) to it and obtain the variational inequality, for any \(\varvec{z}\in K\),

$$\begin{aligned}&\int _{\Omega }\sqrt{\frac{2}{3}}\sigma _Y |{\varvec{\varepsilon }}_q|\ {\text {d}}x \ge \int _{\Omega }\sqrt{\frac{2}{3}}\sigma _Y |\dot{\varvec{\varepsilon }}_p|\ {\text {d}}x +\int _{\Omega } \varvec{f}(t)\cdot (\varvec{v}-\dot{\varvec{u}})\ {\text {d}}x + \int _{\Gamma _N}\varvec{g}(t)\cdot (\varvec{v}-\dot{\varvec{u}})\ {\text {d}}s\\&\quad -\int _{\Omega } \left( \mathbb {C}(\varvec{\varepsilon }(\varvec{u}) - \varvec{\varepsilon }_p):(\varvec{\varepsilon }(\varvec{v})-\varvec{\varepsilon }_q - \varvec{\varepsilon }(\dot{\varvec{u}}) + \dot{\varvec{\varepsilon }}_p) + \mathbb {H}\varvec{\varepsilon }_p:(\varvec{\varepsilon }_q - \dot{\varvec{\varepsilon }}_p) + E_{iso}\gamma ( \mu - \dot{\gamma }) \right) \ \ {\text {d}}x . \end{aligned}$$

We complement this variational inequality with the following initial conditions

$$\begin{aligned} \varvec{u}(0) = \mathbf {0},\quad \varvec{\varepsilon }_p(0) = \mathbf {0},\quad \gamma (0) = 0 \qquad \hbox {in }\Omega . \end{aligned}$$

To prove existence and uniqueness of a solution, we rely on Theorem 4.3 in Han et al. (1997), which requires some additional regularity in time for the solution. Therefore, we assume that the forces satisfy

$$\begin{aligned} \varvec{f}(0) = \varvec{0} \quad \hbox {in }\Omega \quad \hbox {and}\quad \varvec{g}(0) = \varvec{0}\quad \hbox {on }\Gamma _N. \end{aligned}$$

Using the bilinear form, the linear form and the nonlinear functional defined earlier, we obtain the primal form of the plasticity problem (13) and (14): find \(\varvec{w}(t)=(\varvec{u}, \varvec{\varepsilon }_p, \gamma )(t)\) with \(\varvec{w}(0)=\varvec{0}\) such that \(\dot{\varvec{w}}(t)\in K\) (for almost all \(t\in (0, T)\)) and

$$\begin{aligned} a(\varvec{w}, \varvec{z} - \dot{\varvec{w}}) + j(\varvec{z}) - j(\dot{\varvec{w}}) \ge l_t(\varvec{z}-\dot{\varvec{w}}) \quad \forall \varvec{z}\in K. \end{aligned}$$
(21)

As a result of Theorem 4.3 in Han et al. (1997), the variational inequality (21) is well-posed.

Theorem 1

(Han et al. 1997) Let Z be a Hilbert space; \(K\subset Z\) be a nonempty, closed, convex cone; \(a:Z\times Z\rightarrow \mathbb {R}\) be a continuous bilinear form that is symmetric and coercive; \(j:K\rightarrow \mathbb {R}\) be a non-negative, convex, positively homogeneous, Lipschitz continuous functional; and \(l_t\in H^1([0, T], Z^*)\) with \(l_0(\cdot )=0.\) Then there exists a unique \(\varvec{w}\in H^1([0,T], Z)\) satisfying (21).

Remark 1

In the absence of kinematic hardening, i.e., when \(h_0=0,\) we cannot show the coercivity of \(a(\cdot ,\cdot )\) and thus cannot establish the well-posedness of the problem (21).

Problem (21) is not shape-differentiable (Mignot 1976; Sokolowski and Zolésio 1992) in the classical sense, so we approximate it by a smooth variational equation. The non-differentiability of (21) is due to \(\mathcal {D},\) which is discontinuous exactly where \(\sqrt{\frac{2}{3}}|\dot{\varvec{\varepsilon }}_p| = \dot{\gamma }\) (or equivalently, where \(f = 0\)). Thus the function \(\mathcal {D}\) admits only directional derivatives where the yield limit \(f = 0\) is attained.

2.3 Penalization

We approximate the problem (21), posed on the convex set K, by a problem posed on the full vector space Z by penalizing the constraint \(\varvec{z}(t)\in K.\) We introduce a small penalization parameter \(0< \epsilon \ll 1\) and modify the dissipation function \(\mathcal {D}(\dot{\varvec{\varepsilon }}_p,\dot{\gamma })\) to \(\mathcal {D}_{\epsilon }(\dot{\varvec{\varepsilon }}_p,\dot{\gamma })\) as

$$\begin{aligned} \mathcal {D}_{\epsilon }(\dot{\varvec{\varepsilon }}_p,\dot{\gamma }) = \sqrt{\frac{2}{3}}\sigma _Y\left( |\dot{\varvec{\varepsilon }}_p| + \frac{1}{\epsilon }\max \left( \sqrt{\frac{2}{3}}|\dot{\varvec{\varepsilon }}_p|-\dot{\gamma }, 0\right) \right) . \end{aligned}$$
(22)

The above penalization is similar to viscoplastic regularization (see Simo and Hughes (2006), equation 7.5b), in the sense that in both situations the stress state is allowed to exceed the von Mises yield limit by some value. However, the exact correspondence between the two is not clear to us.
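To make the effect of the penalization concrete, the following minimal sketch evaluates the integrand (22) at a single material point. The function names and the representation of the plastic strain rate as a nested list are assumptions made for illustration only; they are not part of the formulation above.

```python
import math


def frobenius_norm(tensor):
    """Euclidean (Frobenius) norm of a tensor given as a nested list."""
    return math.sqrt(sum(c * c for row in tensor for c in row))


def dissipation_penalized(eps_p_rate, gamma_rate, sigma_y, epsilon):
    """Pointwise value of the penalized dissipation function (22).

    eps_p_rate : plastic strain rate (nested list, hypothetical layout)
    gamma_rate : rate of the hardening variable
    sigma_y    : yield strength
    epsilon    : penalization parameter, 0 < epsilon << 1
    """
    k = math.sqrt(2.0 / 3.0)
    norm = frobenius_norm(eps_p_rate)
    # Amount by which the constraint sqrt(2/3)|eps_p_rate| <= gamma_rate is violated.
    violation = max(k * norm - gamma_rate, 0.0)
    return k * sigma_y * (norm + violation / epsilon)
```

When the constraint \(\sqrt{\frac{2}{3}}|\dot{\varvec{\varepsilon }}_p| \le \dot{\gamma }\) holds, the returned value does not depend on \(\epsilon\); when it is violated, the penalty term grows like \(1/\epsilon\).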

We then modify \(j(\cdot )\) to \(j_{\epsilon }:Z\rightarrow \mathbb {R}\) as

$$\begin{aligned} j_{\epsilon }(\dot{\varvec{w}}) = \int _{\Omega }\mathcal {D}_{\epsilon }(\dot{\varvec{\varepsilon }}_p,\dot{\gamma })\ \ {\text {d}}x . \end{aligned}$$

Problem (21) is penalized as: find \(\varvec{w}_{\epsilon }(t)\in Z\) such that \(\varvec{w}_{\epsilon }(0)=\varvec{0},\) \(\dot{\varvec{w}}_{\epsilon }(t)\in Z\) and

$$\begin{aligned} a(\varvec{w}_{\epsilon }, \varvec{z}-\dot{\varvec{w}}_{\epsilon }) + j_{\epsilon }(\varvec{z}) - j_{\epsilon }(\dot{\varvec{w}}_{\epsilon }) \ge l_t( \varvec{z} - \dot{\varvec{w}}_{\epsilon } ) \qquad \forall \varvec{z}\in Z. \end{aligned}$$
(23)

The above penalized problem is well-posed as the following theorem shows.

Theorem 2

Problem (23) admits a unique solution \(\varvec{w}_{\epsilon }\in H^1([0,T], Z).\)

Proof

The bilinear form \(a(\cdot ,\cdot )\) is coercive in Z (as shown in Lemma 1), and the nonlinearity \(\mathcal {D}_{\epsilon }(\cdot )\) is convex, positively homogeneous and Lipschitz continuous. \(j_{\epsilon }(\cdot )\) is thus non-negative, convex, positively homogeneous and Lipschitz continuous on Z. With these properties, (23) admits a unique solution \(\varvec{w}_{\epsilon }\in H^1([0,T], Z)\) using Theorem 1.

Now we split \(j_{\epsilon }(\varvec{z})\) in two as

$$\begin{aligned}&j_{\epsilon }(\varvec{z}) = j_1(\varvec{z}) + \frac{1}{\epsilon }j_2(\varvec{z})\nonumber \\&\quad \hbox {with}\qquad j_1(\varvec{z}) = \sqrt{\frac{2}{3}}\sigma _Y\int _{\Omega } |\varvec{\varepsilon }_q|\ {\text {d}}x \qquad \hbox {and}\qquad j_2(\varvec{z}) = \sqrt{\frac{2}{3}}\sigma _Y\int _{\Omega } \max \left( \sqrt{\frac{2}{3}}|\varvec{{\varepsilon }}_q|-{\mu }, 0\right) \ {\text {d}}x. \end{aligned}$$
(24)

By exploiting the convexity of the functional \(j_{\epsilon }(\cdot )\) the next theorem proves that the sequence of solutions to (23) converges weakly and strongly to the solution of (21) as the penalization parameter \(\epsilon\) goes to zero.

Theorem 3

The sequence of solutions \(\varvec{w}_{\epsilon }\) to (23) satisfies, as \(\epsilon \longrightarrow 0\),

$$\begin{aligned} \varvec{w}_{\epsilon } \buildrel *\over \rightharpoonup \varvec{w}\quad \hbox {in }L^{\infty }([0,T],Z) \quad \hbox {and}\quad \dot{\varvec{w}}_{\epsilon } \rightharpoonup \dot{\varvec{w}}\quad \hbox {in }L^{2}([0,T],Z), \end{aligned}$$

where \(\varvec{w}\) is the solution to (21). Moreover as \(\epsilon \longrightarrow 0,\)

$$\varvec{w}_{\epsilon }\longrightarrow \varvec{w} \quad \hbox {in } L^{\infty }([0, T], Z).$$

The proof of Theorem 3 is given in Appendix (A).

2.4 Penalization and regularization

The nonlinearity \(j(\varvec{z})\) being unbounded for \(\varvec{z}\notin K\), (21) is not differentiable with respect to parameters such as the shape of the domain (Mignot 1976; Sokolowski and Zolésio 1992). In contrast, \(j_{\epsilon }(\varvec{z})\) is bounded on the full space Z, so one should be able to differentiate the penalized formulation (23). However, \(j_{\epsilon }(\varvec{z})\) is still non-smooth because of the maximum operator and the norm of the plastic strain tensor. We therefore need to regularize the nonlinearity \(j_{\epsilon }(\cdot )\).

We introduce a small regularization parameter \(0< \eta \ll 1\). The dissipation function (22) has two kinds of non-smoothness: \(\max (\cdot ,0)\) and \(|\cdot |\) (the Euclidean norm): we regularize them with operators \(M_{\eta }:L^2(\Omega )\rightarrow L^2(\Omega )\) and \(N_{\eta }:Q\rightarrow L^2(\Omega )\) respectively, defined as

$$\begin{aligned} M_{\eta }(\gamma ) = \frac{1}{2}\left( \gamma + \sqrt{\gamma ^2 + \left( \frac{\sigma _Y\eta }{TE}\right) ^2}\right) , \quad N_{\eta }(\varvec{\varepsilon }_p) = \sqrt{\varvec{\varepsilon }_p:\varvec{\varepsilon }_p + \left( \frac{\sigma _Y\eta }{TE}\right) ^2}, \end{aligned}$$

where T is the final time, \(\sigma _Y\) is the yield strength and E is the Young’s modulus. In the above, the factor \(\eta\) is multiplied by \(\tfrac{\sigma _Y}{TE}\) so that the regularization is consistent with the order of magnitude of the solution \(\dot{\varvec{\varepsilon }}_p\). For ease of numerical implementation, a globally smooth regularization is chosen rather than a piecewise one. The dissipation function (22) is regularized as

$$\begin{aligned} \mathcal {D}_{\epsilon ,\eta }(\dot{\varvec{w}}) = \sqrt{\frac{2}{3}}\sigma _Y\left( N_{\eta }\left( \dot{\varvec{\varepsilon }}_p\right) + \frac{1}{\epsilon }M_{\eta }\left( \sqrt{\frac{2}{3}} N_{\eta }(\dot{\varvec{\varepsilon }}_p) - \dot{\gamma } \right) \right) , \end{aligned}$$
(25)

and we define \(j_{\epsilon ,\eta }:Z\rightarrow \mathbb {R}\) in the same manner as before,

$$\begin{aligned} j_{\epsilon ,\eta }(\dot{\varvec{w}}) = \int _{\Omega }\mathcal {D}_{\epsilon ,\eta }(\dot{\varvec{w}})\ {\text {d}}x . \end{aligned}$$
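As a hedged illustration of the two regularizing operators, the sketch below evaluates \(M_{\eta }\) and \(N_{\eta }\) pointwise and can be used to check that each one lies strictly above the non-smooth function it replaces, with a gap controlled by \(\tilde{\eta } = \sigma _Y\eta /(TE)\). The function names are hypothetical, and \(N_{\eta }\) is written here in terms of the scalar norm \(|\varvec{\varepsilon }_p|\) rather than the tensor itself.

```python
import math


def eta_tilde(eta, sigma_y, final_time, young):
    """Scaled regularization parameter sigma_Y * eta / (T * E)."""
    return sigma_y * eta / (final_time * young)


def m_eta(x, et):
    """Smooth approximation of max(x, 0); et is the scaled parameter."""
    return 0.5 * (x + math.sqrt(x * x + et * et))


def n_eta(norm, et):
    """Smooth approximation of the Euclidean norm, given the norm value."""
    return math.sqrt(norm * norm + et * et)


def regularization_gaps(x, et):
    """Return (M_eta(x) - max(x, 0), N_eta(|x|) - |x|) for a scalar x."""
    return m_eta(x, et) - max(x, 0.0), n_eta(abs(x), et) - abs(x)
```

A short calculation shows that the first gap is at most \(\tilde{\eta }/2\) and the second at most \(\tilde{\eta }\), both being attained at \(x = 0\); this is consistent with the strict inequality \(j_{\epsilon } < j_{\epsilon ,\eta }\) stated in Lemma 2.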

Lemma 2

The function \(j_{\epsilon ,\eta }(\cdot )\) is convex, lower semi-continuous and satisfies

$$\begin{aligned} |j_{\epsilon ,\eta }(\varvec{z}) - j_{\epsilon }(\varvec{z})| \le C\eta \left\Vert \varvec{z}\right\Vert _Z\qquad \hbox {and}\qquad j_{\epsilon }(\varvec{z}) < j_{\epsilon ,\eta }(\varvec{z}) \qquad \forall \eta >0, \ \varvec{z}\in Z, \end{aligned}$$
(26)

where C is a constant independent of \(\eta .\)

We safely leave the proof of Lemma 2 to the reader. We consider a new problem: find \(\varvec{w}_{\epsilon , \eta }(t)\in Z\) such that \(\varvec{w}_{\epsilon , \eta }(0)=\varvec{0},\) \(\dot{\varvec{w}}_{\epsilon ,\eta }(t)\in Z\) and

$$\begin{aligned} a(\varvec{w}_{\epsilon , \eta }, \varvec{z}-\dot{\varvec{w}}_{\epsilon , \eta }) + j_{\epsilon ,\eta }(\varvec{z}) - j_{\epsilon ,\eta }(\dot{\varvec{w}}_{\epsilon , \eta }) \ge l_t(\varvec{z} - \dot{\varvec{w}}_{\epsilon , \eta })\qquad \forall \varvec{z}\in Z. \end{aligned}$$
(27)

Theorem 4

The variational inequality (27) admits a unique solution \(\varvec{w}_{\epsilon ,\eta }\in H^1([0, T], Z).\)

The proof, given in Appendix (B), is inspired by that of Theorem 4.3 in Han et al. (1997). One cannot directly apply Theorem 1 because the functional \(j_{\epsilon ,\eta }\) is not positively homogeneous.

We now convert the variational inequality (27) into an equation. Since the function \(\mathcal {D}_{\epsilon ,\eta }\) is smooth, we can define its gradient

$$\begin{aligned} \nabla _{Z} \mathcal {D}_{\epsilon ,\eta }(\varvec{w}) = \left( \frac{\partial \mathcal {D}_{\epsilon ,\eta }(\varvec{w})}{\partial \varvec{u}}, \frac{\partial \mathcal {D}_{\epsilon ,\eta }(\varvec{w})}{\partial \varvec{\varepsilon }_p}, \frac{\partial \mathcal {D}_{\epsilon ,\eta }(\varvec{w})}{\partial \gamma } \right) . \end{aligned}$$

Lemma 3

The variational inequality (27) is equivalent to the variational formulation: find \(\varvec{w}_{\epsilon , \eta }(t)\in Z\) such that \(\varvec{w}_{\epsilon , \eta }(0)=\varvec{0},\) \(\dot{\varvec{w}}_{\epsilon ,\eta }(t)\in Z\) and

$$\begin{aligned} a(\varvec{w}_{\epsilon , \eta }, \varvec{z}) + \langle \nabla _{Z} \mathcal {D}_{\epsilon ,\eta }(\dot{\varvec{w}}_{\epsilon , \eta }), \varvec{z} \rangle = l_t( \varvec{z}) \qquad \forall \varvec{z} \in Z , \end{aligned}$$
(28)

where \(\langle ,\rangle\) is the scalar product defined by (16).

Proof

By definition of the convexity of \(j_{\epsilon ,\eta }\) we get

$$\begin{aligned} j_{\epsilon ,\eta }(\varvec{z})- j_{\epsilon ,\eta }(\dot{\varvec{w}}_{\epsilon , \eta }) \ge \langle \nabla _{Z} \mathcal {D}_{\epsilon ,\eta }(\dot{\varvec{w}}_{\epsilon , \eta }), \varvec{z}-\dot{\varvec{w}}_{\epsilon , \eta }\rangle \quad \forall \varvec{z}\in Z. \end{aligned}$$

The right hand side in the above is the tangent hyperplane to \(j_{\epsilon ,\eta }\) at \(\varvec{z} = \dot{\varvec{w}}_{\epsilon , \eta }\). On the other hand, (27) can be written as

$$\begin{aligned} j_{\epsilon , \eta }(\varvec{z}) - j_{\epsilon , \eta }(\dot{\varvec{w}}_{\epsilon , \eta }) \ge a(\varvec{w}_{\epsilon , \eta }, \dot{\varvec{w}}_{\epsilon , \eta } - \varvec{z}) + l_t(\varvec{z} - \dot{\varvec{w}}_{\epsilon , \eta } ) \quad \forall \varvec{z}\in Z. \end{aligned}$$

Again, the right hand side in the above inequality is affine in \(\varvec{z}\) and it vanishes at \(\varvec{z}=\dot{\varvec{w}}_{\epsilon , \eta }\), implying that it is also tangent at \(\varvec{z} = \dot{\varvec{w}}_{\epsilon , \eta }\). Since \(j_{\epsilon , \eta }\) is smooth, the two tangent hyperplanes must be equal:

$$\begin{aligned} a(\varvec{w}_{\epsilon , \eta }, \dot{\varvec{w}}_{\epsilon , \eta } - \varvec{z} ) + l_t(\varvec{z} - \dot{\varvec{w}}_{\epsilon , \eta }) = \langle \nabla _{Z} \mathcal {D}_{\epsilon ,\eta }(\dot{\varvec{w}}_{\epsilon , \eta }), \varvec{z} - \dot{\varvec{w}}_{\epsilon , \eta } \rangle \qquad \forall \varvec{z}\in Z. \end{aligned}$$

Replacing \(\varvec{z}\) in the above by \(\dot{\varvec{w}}_{\epsilon , \eta } + \varvec{z}\in Z\), we deduce (28).

Equation (28) is the approximation of the plasticity problem (21) that we treat for the rest of this article. We call it the state equation and its solution the state solution. As expected, for a fixed \(\epsilon\), one can prove the convergence of the sequence \(\varvec{w}_{\epsilon , \eta }\) of solutions to (27) to the solution \(\varvec{w}_{\epsilon }\) to (23) as \(\eta \longrightarrow 0\). We content ourselves with proving weak convergence.

Theorem 5

The sequence of solutions \(\varvec{w}_{\epsilon , \eta }\) to (27) satisfies

$$\begin{aligned} \eta \longrightarrow 0,\qquad \varvec{w}_{\epsilon , \eta } \buildrel *\over \rightharpoonup \varvec{w}_{\epsilon }\quad \hbox {in }L^{\infty }([0,T],Z) \quad \hbox {and}\quad \dot{\varvec{w}}_{\epsilon ,\eta } \rightharpoonup \dot{\varvec{w}}_{\epsilon }\quad \hbox {in }L^{2}([0,T],Z), \end{aligned}$$

where \(\varvec{w}_{\epsilon }\) is the solution to (23).

The proof of Theorem 5 is postponed to the Appendices.

3 Shape derivative computation

In this section, to simplify the notations, we drop the indices \(\epsilon\) and \(\eta ,\) and simply write \(\varvec{w}\) instead of \(\varvec{w}_{\epsilon ,\eta }.\) We minimize an objective function \(J(\Omega )\) defined as

$$\begin{aligned} J(\Omega ) = \int _0^T\left( \int _{\Omega } m(\varvec{w}(\Omega ))\ {\text {d}}x + \int _{\Gamma _N} p(\varvec{w}(\Omega )){\text {d}}s \right) {\text {d}}t, \end{aligned}$$
(29)

where \(\varvec{w}(\Omega )\) is solution to the state equation (28) and the integrands \(m(\cdot )\) and \(p(\cdot )\) are assumed to be smooth functions at least of class \(\mathcal {C}^1.\) In addition we assume a growth condition on \(m(\cdot )\) and \(p(\cdot )\) such that the objective function is well-defined and the adjoint equation (33) is well-posed. This objective can represent a mechanical property such as the total compliance, total power, elastic energy, plastic energy as well as a geometric property such as the volume. An industrially relevant objective is the total compliance, given by

$$\begin{aligned} J(\Omega ) = \int _0^T \int _{\Gamma _N} \varvec{g}\cdot \varvec{u}(\Omega )\ \ {\text {d}}s\ {\text {d}}t. \end{aligned}$$
(30)

Fig. 2 Design domain D and the shape \(\Omega\)

In practice, the shape \(\Omega\) is designed inside a pre-fixed design space \(D\subset \mathbb {R}^d.\) As shown in Fig. 2, the blue region represents the shape \(\Omega ,\) and the blue and grey regions together represent the design space D. We define the space of admissible shapes \(\mathcal {U}_{ad}\) as

$$\begin{aligned} \mathcal {U}_{ad} = \left\{ \Omega \subset D,\ \int _{\Omega }{\text {d}}x = V_f \right\} , \end{aligned}$$
(31)

where \(\Omega\) is an open set and \(V_f\) is a target volume. The optimization problem then reads

$$\begin{aligned} \min _{\Omega \in \mathcal {U}_{ad}}J(\Omega ). \end{aligned}$$

The question of existence of optimal shapes \(\Omega\) is a delicate one and we shall not dwell on it (see Henrot and Pierre (2018) for a discussion). Rather, we content ourselves with computing numerical minimizers, using a gradient descent method.

3.1 Preliminaries

The gradient in the context of shape optimization is based on the notion of the Hadamard shape derivative (Allaire 2007; Allaire et al. 2021; Henrot and Pierre 2018; Sokolowski and Zolésio 1992). Starting from a smooth domain \(\Omega ,\) the perturbation of the domain is expressed as

$$\begin{aligned} \Omega _{\varvec{\theta }} = (I_d + \varvec{\theta })(\Omega ), \end{aligned}$$

where \(\varvec{\theta }\in W^{1, \infty }(\mathbb {R}^d, \mathbb {R}^d)\) and \(I_d\) is the identity map. It is well-known that when the norm of \(\varvec{\theta }\) is sufficiently small, the map \(I_d + \varvec{\theta }\) is a diffeomorphism in \(\mathbb {R}^d\). With this perturbation of the domain, one can define the notion of a Fréchet derivative for a function \(J(\Omega ) .\)

Definition 1

The shape derivative of \(J(\Omega )\) at \(\Omega\) is defined as the Fréchet derivative in \(W^{1, \infty }(\mathbb {R}^d, \mathbb {R}^d)\) evaluated at 0 for the mapping \(\varvec{\theta } \mapsto J( (I_d + \varvec{\theta })(\Omega ) )\) i.e.,

$$\begin{aligned} J((I_d + \varvec{\theta })(\Omega )) =\; J(\Omega ) + J^{\prime }(\Omega )(\varvec{\theta }) + {o}(\varvec{\theta }) \qquad \hbox {with }\quad \lim _{\varvec{\theta }\rightarrow 0}\frac{{o}(\varvec{\theta })}{\left\Vert \varvec{\theta }\right\Vert _{W^{1, \infty }}} = 0, \end{aligned}$$

where \(J'(\Omega )\) is a continuous linear form on \(W^{1, \infty }(\mathbb {R}^d, \mathbb {R}^d).\)

Given an initial shape \(\Omega ,\) one can then use the above derivative as a gradient and move the shape iteratively so as to decrease the objective. In general, nothing ensures that the iterations converge. Moreover, even in the case of convergence, one ends up with a final shape that is often only a local minimum and depends on the choice of the initial design.

Typically when a structure is designed, the clamped and the forced boundaries are assumed to be non-optimizable. Hence in our optimization, we allow only \(\Gamma\) to move along \(\varvec{\theta }\) as shown in Fig. 2. To incorporate this constraint, we introduce the space

$$\begin{aligned} W^{1, \infty }_0 (\mathbb {R}^d, \mathbb {R}^d) = \{ \varvec{\theta }\in W^{1, \infty }(\mathbb {R}^d, \mathbb {R}^d),\ \varvec{\theta } = \varvec{0}\ \hbox {on }\Gamma _N\cup \Gamma _D \} \end{aligned}$$

and state a classical lemma we shall use later.

Lemma 4

Let \(\Omega\) be a smooth bounded open set and \(\varphi ,\psi \in W^{1,1}(\mathbb {R}^d, \mathbb {R}).\) Define \(J(\Omega )\) by

$$\begin{aligned} J(\Omega ) = \int _{\Omega } \varphi (x)\, dx + \int _{\Gamma _N}\psi (x)\, {\text {d}}s , \end{aligned}$$

then \(J(\Omega )\) is differentiable at \(\Omega\) with the derivative being

$$\begin{aligned} J^{\prime }(\Omega )(\varvec{\theta }) = \int _{\Gamma } \varvec{\theta }\cdot \varvec{n}\ \varphi \ ds \qquad \forall \varvec{\theta }\in W^{1, \infty }_0(\mathbb {R}^d, \mathbb {R}^d). \end{aligned}$$
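The boundary formula of Lemma 4 can be checked on a simple example. The sketch below is an illustration under assumptions not taken from the article: \(\Omega\) is a disk of radius r in \(\mathbb {R}^2\) whose whole boundary is free (so \(\Gamma = \partial \Omega\) and there is no \(\Gamma _N\) term), \(\varphi (x) = |x|^2\), and \(\varvec{\theta }\) is a radial field with constant normal speed on the boundary (smoothly cut off near the origin). In that case \(J(\Omega ) = \pi r^4/2\), and the perturbed domain is again a disk.

```python
import math


def volume_integral(radius):
    """Exact integral of phi(x) = |x|^2 over the disk of the given radius."""
    return math.pi * radius ** 4 / 2.0


def boundary_formula(radius, normal_speed, n_quad=2000):
    """Midpoint quadrature of the boundary integral of (theta.n) * phi."""
    total = 0.0
    d_angle = 2.0 * math.pi / n_quad
    for k in range(n_quad):
        angle = (k + 0.5) * d_angle
        x = radius * math.cos(angle)
        y = radius * math.sin(angle)
        phi = x * x + y * y
        # Arc-length element on the circle is radius * d_angle.
        total += normal_speed * phi * radius * d_angle
    return total


def finite_difference(radius, normal_speed, step=1e-6):
    """Forward difference of J along the radial perturbation."""
    perturbed = volume_integral(radius + step * normal_speed)
    return (perturbed - volume_integral(radius)) / step
```

Both `boundary_formula` and `finite_difference` should approximate \(2\pi r^3\) times the normal speed, which is what the lemma predicts for this particular choice.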

3.2 Shape derivative

Since the regularized nonlinearity \(j_{\epsilon ,\eta }(\cdot )\) is \(\mathcal {C}^{\infty }\), it is possible to compute the shape derivative of the objective function \(J(\Omega )\) defined by (29).

Theorem 6

Let \(\Omega \subset \mathbb {R}^d\) be a smooth bounded open set. Let \(\varvec{f}\in \mathcal {C}^0([0,T],H^1(\mathbb {R}^d)^d),\ \varvec{g}\in \mathcal {C}^0([0,T], H^2(\mathbb {R}^d)^d)\) and \(\varvec{w}(\Omega )\in H^1([0, T], Z)\) the solution to (28). Then the shape derivative of \(J(\Omega )\) along \(\varvec{\theta } \in W^{1, \infty }_0(\mathbb {R}^d, \mathbb {R}^d),\) \(J^{\prime }(\Omega )(\varvec{\theta })\) is given by

$$\begin{aligned} J^{\prime }(\Omega )(\varvec{\theta }) = \int _0^T\int _{\Gamma } \varvec{\theta }\cdot \varvec{n}\Big (m(\varvec{w}) + \mathbb {C}(\varvec{\varepsilon }(\varvec{u}) - \varvec{\varepsilon }_p)\,:(\varvec{\varepsilon }(\varvec{v})-\varvec{\varepsilon }_q) + \varvec{\varepsilon }_q\,:\mathbb {H} \varvec{\varepsilon }_p + E_{iso}\gamma \mu + \nabla _{Z} \mathcal {D}_{\epsilon ,\eta }(\dot{\varvec{w}})\cdot \varvec{z} - l_t(\varvec{z}) \Big ){\text{d}}s\, {\text {d}}t, \end{aligned}$$
(32)

where \(\varvec{z}(\Omega )\in H^1([0, T], Z)\) is the solution to the adjoint problem, with the final condition \(\varvec{z}(T)=\varvec{0}\),

$$\begin{aligned} a(\varvec{z},\varvec{\varphi }) - \left\langle \frac{d}{{\text {d}}t}\left( \nabla _{Z}^2 \mathcal {D}_{\epsilon ,\eta }(\dot{\varvec{w}})\varvec{z}\right) , \varvec{\varphi }\right\rangle = -\langle \nabla _{Z} m(\varvec{w}), \varvec{\varphi }\rangle - \int _{\Gamma _N}\nabla _{Z} p(\varvec{w})\varvec{\varphi } \ ds\qquad \forall t\in [0,T),\, \forall \varvec{\varphi }\in Z, \end{aligned}$$
(33)

which is assumed to be well-posed (recall that \(\langle ,\rangle\) is the scalar product defined by (16) in Z).

Proof

The idea of the proof is classical and, assuming that the adjoint equation is well-posed, it relies on Céa’s techniques (Céa 1986). Define three spaces \(\tilde{V},\) \(\tilde{Q}\) and \(\tilde{Z}= \tilde{V}\times \tilde{Q} \times L^2(\mathbb {R}^d)\) (which are similar to those in (15) except that \(\Omega\) is replaced by \(\mathbb {R}^d\)) by

$$\begin{aligned} \tilde{V} = \{ \varvec{u}\in H^1(\mathbb {R}^d)^d,\ \varvec{u} = \varvec{0}\hbox { on }\Gamma _D \} \quad \hbox {and} \quad \tilde{Q} = \{ \varvec{\varepsilon }_q \in L^2(\mathbb {R}^d)^{d\times d},\ \hbox {tr}(\varvec{\varepsilon }_q)=0\hbox { a.e. in }\mathbb {R}^d \} . \end{aligned}$$
(34)

For \(\tilde{\varvec{w}} = (\tilde{\varvec{u}}, \tilde{\varvec{\varepsilon }}_p, \tilde{\gamma }) \in H^1([0,T],\tilde{Z})\), \(\tilde{\varvec{z}} = (\tilde{\varvec{v}}, \tilde{\varvec{\varepsilon }}_q, \tilde{\mu }) \in H^1([0,T],\tilde{Z})\) (the Lagrange multiplier for the state equation (28)) and \(\tilde{\varvec{\lambda }}\in L^2(\mathbb {R}^d)^d\) (the Lagrange multiplier for the initial condition \(\tilde{\varvec{w}}(0) = \varvec{0}\)), define a Lagrangian by

$$\begin{aligned} \mathcal {L}(\Omega , \tilde{\varvec{w}}, \tilde{\varvec{z}}, \tilde{\varvec{\lambda }}) = \int _0^T\left( \int _{\Omega } m(\tilde{\varvec{w}})\ {\text {d}}x + \int _{\Gamma _N} p(\tilde{\varvec{w}}){\text {d}}s \right) {\text {d}}t + \int _0^T \left( a(\tilde{\varvec{w}}, \tilde{\varvec{z}} ) - l_t(\tilde{\varvec{z}}) + \langle \nabla _{Z} \mathcal {D}_{\epsilon ,\eta }(\dot{\tilde{\varvec{w}}}), \tilde{\varvec{z}} \rangle \right) \ {\text {d}}t + \int _{\Omega } \tilde{\varvec{\lambda }}\cdot \tilde{\varvec{w}}(0)\, {\text {d}}x . \end{aligned}$$
(35)

We remark that here the variables \(\tilde{\varvec{w}}(t), \tilde{\varvec{z}}(t)\) and \(\tilde{\varvec{\lambda }}\) are defined on the full space \(\mathbb {R}^d\) and are thus independent of \(\Omega\). Although \(\tilde{\varvec{u}}(t)\) and \(\tilde{\varvec{v}}(t)\) are required to vanish on \(\Gamma _D\), they do not depend on \(\Omega\) since \(\Gamma _D\) is a fixed boundary. Therefore, writing the optimality conditions applied to the Lagrangian (35), namely that its partial derivatives with respect to the independent variables \((\Omega , {\varvec{w}},{\varvec{z}},\varvec{\lambda })\) vanish, yields the state equation, the adjoint equation and the shape derivative.

When the Lagrangian (35) is differentiated with respect to the adjoint variable \(\tilde{\varvec{z}}\), along \(\varvec{\varphi }\in H^1([0,T],\tilde{Z})\), and equated to zero, followed by the substitution \(\tilde{\varvec{w}} = \varvec{w},\) we get

$$\begin{aligned} \frac{\partial \mathcal {L}}{\partial \varvec{z}}(\varvec{\varphi }) = \int _0^T \left( a(\varvec{w}, \varvec{\varphi } ) + \langle \nabla _{Z} \mathcal {D}_{\epsilon ,\eta }(\dot{\varvec{w}}), \varvec{\varphi } \rangle - l_t(\varvec{\varphi })\right) \ {\text {d}}t = 0\qquad \forall \varvec{\varphi }\in H^1([0,T],\tilde{Z}). \end{aligned}$$

Since the bilinear form \(a(\cdot ,\cdot )\) and the linear forms in the above are defined only on \(\Omega ,\) we can replace \(\tilde{Z}\) by Z. Differentiating (35) with respect to \(\tilde{\varvec{\lambda }}\) at \(\tilde{\varvec{w}} = \varvec{w}\), equating it to zero, we deduce the initial condition \(\varvec{w}(0) = \varvec{0}\) a.e. on \(\Omega\). We thus recover the state equation (28). Next, we differentiate the Lagrangian (35) with respect to \(\tilde{\varvec{w}}\) along \(\varvec{\varphi }\in H^1([0,T],\tilde{Z})\) and equate it to zero at \(\tilde{\varvec{w}} = \varvec{w},\, \tilde{\varvec{z}} = \varvec{z},\, \tilde{\varvec{\lambda }} = \varvec{\lambda }\), to get

$$\begin{aligned}&\frac{\partial \mathcal {L}}{\partial \varvec{w}}(\varvec{\varphi }) = \int _{0}^T\int _{\Omega }\nabla _{Z} m(\varvec{w})\varvec{\varphi } \ {\text {d}}x\ {\text {d}}t + \int _{0}^T\int _{\Gamma _N}\nabla _{Z} p(\varvec{w})\varvec{\varphi } \ {\text {d}}s\ {\text {d}}t\\&\quad + \int _{0}^T a(\varvec{\varphi }, \varvec{z}){\text {d}}t + \int _0^T\langle \nabla _{Z}^2 \mathcal {D}_{\epsilon ,\eta }(\dot{\varvec{w}})\dot{\varvec{\varphi }}, \varvec{z}\rangle dt + \int _{\Omega }\varvec{\lambda }\cdot \varvec{\varphi }(0)\ {\text {d}}x \hbox { =0}\qquad \forall \varvec{\varphi }\in H^1([0,T],\tilde{Z}). \end{aligned}$$

Using the symmetry of the second derivative \(\nabla _{Z}^2 \mathcal {D}_{\epsilon ,\eta }(\dot{\varvec{w}})\), and integrating by parts in time, we deduce

$$\begin{aligned}&\int _{0}^T\int _{\Omega }\nabla _{Z} m(\varvec{w})\varvec{\varphi } \ {\text {d}}x\ {\text {d}}t+ \int _{0}^T\int _{\Gamma _N}\nabla _{Z} p(\varvec{w})\varvec{\varphi } \ {\text {d}}s\ {\text {d}}t + \int _{0}^T a(\varvec{\varphi }, \varvec{z}){\text {d}}t + \left. \langle \varvec{\varphi },\nabla _{Z}^2 \mathcal {D}_{\epsilon ,\eta }(\dot{\varvec{w}})\varvec{z}\rangle \right| _{t=T}\\&\quad - \left. \langle \varvec{\varphi },\nabla _{Z}^2 \mathcal {D}_{\epsilon ,\eta }(\dot{\varvec{w}})\varvec{z}\rangle \right| _{t=0} - \int _0^T\left\langle {\varvec{\varphi }}, \frac{d}{{\text {d}}t}(\nabla _{Z}^2 \mathcal {D}_{\epsilon ,\eta }(\dot{\varvec{w}})\varvec{z})\right\rangle dt + \int _{\Omega }\varvec{\lambda }\cdot \varvec{\varphi }(0)\ {\text {d}}x =0\quad \forall \varvec{\varphi }\in H^1([0,T],\tilde{Z}) . \end{aligned}$$

Since all integrals in the above are defined only on \(\Omega ,\) we can replace \(\tilde{Z}\) by Z. Varying the test function \(\varvec{\varphi }\), we derive the following adjoint equation:

$$\begin{aligned}&\varvec{\lambda } = \left. \nabla _{Z}^2 \mathcal {D}_{\epsilon ,\eta }(\dot{\varvec{w}})\varvec{z}\right| _{t=0}, \quad \varvec{z}(T) = \varvec{0}\quad \hbox {and}\\&a(\varvec{z},\varvec{\varphi }) - \left\langle \frac{d}{{\text {d}}t}\left( \nabla _{Z}^2 \mathcal {D}_{\epsilon ,\eta }(\dot{\varvec{w}})\varvec{z}\right) , \varvec{\varphi }\right\rangle = -\langle \nabla _{Z} m(\varvec{w}), \varvec{\varphi }\rangle - \int _{\Gamma _N}\nabla _{Z} p(\varvec{w})\varvec{\varphi } \ {\text {d}}s\qquad t\in [0, T),\ \forall \ \varvec{\varphi }\in {Z}. \end{aligned}$$

Finally, using the relation \(J(\Omega ) = \mathcal {L}(\Omega ,\varvec{w},\tilde{\varvec{z}}, \tilde{\varvec{\lambda }})\), we determine the shape derivative \(J^{\prime }(\Omega )(\varvec{\theta })\) for any \(\varvec{\theta }\in W^{1, \infty }_0 (\mathbb {R}^d, \mathbb {R}^d)\) by

$$\begin{aligned} J^{\prime }(\Omega )(\varvec{\theta }) = \frac{\partial \mathcal {L}}{\partial \Omega }(\varvec{\theta }) + \frac{\partial \mathcal {L}}{\partial \varvec{w}}\left( \frac{\partial \varvec{w}}{\partial \Omega }(\varvec{\theta })\right) , \end{aligned}$$

because \(\tilde{\varvec{z}}\) and \(\tilde{\varvec{\lambda }}\) do not depend on \(\Omega\). Now, replacing them by their precise values \(\varvec{z}\) and \(\varvec{\lambda }\), given by the adjoint problem, the last term vanishes and we get

$$\begin{aligned} J^{\prime }(\Omega )(\varvec{\theta }) = \frac{\partial \mathcal {L}}{\partial \Omega }(\varvec{\theta }) \end{aligned}$$

and formula (32) is deduced by application of Lemma 4.

3.3 Well-posedness of the time-discretized version of the adjoint equation (33)

In the previous proof, we assumed that the adjoint equation (33) was well-posed. The adjoint problem (33) is a linear backward parabolic equation with a final condition at \(t=T.\) The right hand side of (33) involves the derivative of the objective function which is assumed to satisfy a growth condition that renders it well-defined. The only difficult point is that the time derivative of \(\varvec{z}\) is multiplied by the Hessian operator of the convex dissipation function. If we knew that this operator is coercive, then existence and uniqueness would be easy (assuming further that \(\dot{\varvec{w}}\) is a smooth function). In full generality, the analysis for the time-continuous adjoint problem (33) is very complicated. However, if we consider a time-discretized version of (33), then the analysis is much simpler as we shall now show. We split the time interval [0, T] in N intervals of length \(\delta t\). We denote the solution of the state problem (28), \(\varvec{w}(t)\) evaluated at time instant \(t_n = n\delta t\) by \(\varvec{w}_n = (\varvec{u}_n,\varvec{\varepsilon }_{p,n},\gamma _n)\). Similarly, \(\dot{\varvec{w}}_n = (\dot{\varvec{u}}_n,\dot{\varvec{\varepsilon }}_{p,n},\dot{\gamma }_n)\) denotes the time derivative \(\dot{\varvec{w}}(t)\) at time instant \(t_n\). On the other hand, \(\varvec{z}_n\) denotes an approximation of the adjoint state (33) at time \(t_n\) defined as the solution to the system below: for \(\varvec{z}_N = 0,\) find a family \(\varvec{z}_{n} \in Z\), \(N-1\ge n\ge 0\), such that

$$\begin{aligned}&a(\varvec{\varphi }, \varvec{z}_{n}) + \frac{1}{\delta t}\left\langle \nabla _{Z}^2 \mathcal {D}_{\epsilon ,\eta }(\dot{\varvec{w}}_{n})\varvec{z}_{n} - \nabla _{Z}^2 \mathcal {D}_{\epsilon ,\eta } (\dot{\varvec{w}}_{n+1})\varvec{z}_{n+1},\varvec{\varphi }\right\rangle \nonumber \\&\quad = -\langle \nabla _{Z} m(\varvec{w}_{n+1}), \varvec{\varphi } \rangle -\int _{\Gamma _N}\nabla _{Z} p(\varvec{w}_{n+1}) \cdot \varvec{\varphi } \ {\text {d}}s \qquad \forall \varvec{\varphi }\in Z. \end{aligned}$$
(36)
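The structure of the backward recursion (36) can be illustrated on a scalar toy problem. In the sketch below, which is not the finite element implementation of the article, the bilinear form is replaced by a positive number `a`, the Hessian \(\nabla _{Z}^2 \mathcal {D}_{\epsilon ,\eta }(\dot{\varvec{w}}_n)\) by a non-negative number `hess[n]`, and the derivative of the objective by a number `rhs[n]`; all three names are hypothetical. Multiplying (36) by \(\delta t\) then gives \((\delta t\, a + h_n) z_n = h_{n+1} z_{n+1} - \delta t\, r_{n+1}\).

```python
import math


def adjoint_backward_sweep(a, hess, rhs, dt):
    """Scalar analogue of the time-discretized adjoint problem (36).

    a    : positive scalar standing in for the bilinear form
    hess : list h_0..h_N standing in for the Hessian of the dissipation
    rhs  : list r_0..r_N standing in for the derivative of the objective
    dt   : time step
    Returns the list z_0..z_N with the final condition z_N = 0.
    """
    n_steps = len(hess) - 1
    z = [0.0] * (n_steps + 1)  # final condition z_N = 0
    for n in range(n_steps - 1, -1, -1):
        # Each step is a linear solve; here it reduces to a division,
        # which is well defined as long as dt * a + hess[n] > 0.
        z[n] = (hess[n + 1] * z[n + 1] - dt * rhs[n + 1]) / (dt * a + hess[n])
    return z
```

For constant data `a = 1`, `hess[n] = 1`, `rhs[n] = 1`, the scalar continuous problem reads \(z - \dot{z} = -1\) with \(z(T) = 0\), whose solution is \(z(t) = e^{t-T} - 1\); the sweep should reproduce it to first order in \(\delta t\).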

Theorem 7

We assume that \(\dot{\varvec{w}}_n\in V\times L^{\infty }(\Omega )^{d\times d}\times L^2(\Omega )\) and \(\epsilon>0,\ \eta > 0.\) Then the time-discretized adjoint problem (36) admits a unique solution \(\varvec{z}_n\in Z,\ n = N-1, \ldots ,1, 0.\)

Proof

Every equation in the system (36) is linear in \(\varvec{z}_n.\) The form \(a:Z\times Z\rightarrow \mathbb {R}\) is bilinear, symmetric, bounded and coercive as shown in Lemma 1 for \(h_0 > 0.\) In what follows, we show that the adjoint equation is well-posed even for \(h_0=0.\) The bilinear form \(\delta t a(\cdot ,\cdot ) + \langle \nabla _{Z}^2 \mathcal {D}_{\epsilon ,\eta }(\dot{\varvec{w}}_{n})\cdot ,\cdot \rangle\) is symmetric and bounded. To demonstrate its coercivity, we evaluate it for an arbitrary \(\varvec{z}=(\varvec{v},\varvec{\varepsilon }_q,\mu )\in Z,\)

$$\begin{aligned} \delta t a(\varvec{z},\varvec{z}) + \langle \nabla _{Z}^2 \mathcal {D}_{\epsilon ,\eta }(\dot{\varvec{w}}_n)\varvec{z}, \varvec{z}\rangle = \delta t\left( \int _{\Omega }\mathbb {C}(\varvec{\varepsilon }(\varvec{v}) - \varvec{\varepsilon }_q)\,:(\varvec{\varepsilon }(\varvec{v}) - \varvec{\varepsilon }_q) \ \ {\text {d}}x + \int _{\Omega }\mathbb {H}\varvec{\varepsilon }_q\,:\varvec{\varepsilon }_q\ \ {\text{d}}x + \int _{\Omega }E_{iso}\mu ^2 \ {\text {d}}x\right) +\langle \nabla _{Z}^2 \mathcal {D}_{\epsilon ,\eta }(\dot{\varvec{w}}_n)\varvec{z}, \varvec{z}\rangle . \end{aligned}$$

We write the expression of \(\langle \nabla _{Z}^2 \mathcal {D}_{\epsilon ,\eta }(\dot{\varvec{w}}_n)\varvec{z}, \varvec{z}\rangle\) (which is the second derivative of \(\mathcal {D}_{\epsilon ,\eta }(\dot{\varvec{w}})\) along two directions \(\varvec{z},\varvec{z}\)):

$$\begin{aligned} \langle \nabla _{Z}^2 \mathcal {D}_{\epsilon ,\eta }(\dot{\varvec{w}}_n) \varvec{z},\varvec{z} \rangle = \int _{\Omega }\sqrt{\frac{2}{3}}\sigma _Y \left( {N}_{\eta }^{\prime \prime }(\dot{\varvec{\varepsilon }}_p)\varvec{\varepsilon }_q^2 + \frac{1}{\epsilon }{M}_{\eta }^{\prime \prime }\left( \sqrt{\frac{2}{3}}{N}_{\eta }(\dot{\varvec{\varepsilon }}_p) - \dot{\gamma } \right) \left( \sqrt{\frac{2}{3}}{N}_{\eta }^{\prime }(\dot{\varvec{\varepsilon }}_p)\varvec{\varepsilon }_q- \mu \right) ^2 + \frac{1}{\epsilon }\sqrt{\frac{2}{3}}{M}_{\eta }^{\prime }\left( \sqrt{\frac{2}{3}}{N}_{\eta }(\dot{\varvec{\varepsilon }}_p) - \dot{\gamma } \right) {N}_{\eta }^{\prime \prime }(\dot{\varvec{\varepsilon }}_p)\varvec{\varepsilon }_q^2 \right) dx\nonumber \\&\quad \hbox {where}\qquad {N}_{\eta }^{\prime \prime }(\dot{\varvec{\varepsilon }}_p)\varvec{\varepsilon }_q^2 = \left( \frac{\varvec{\varepsilon }_q^2}{(\dot{\varvec{\varepsilon }}_p^2 + \tilde{\eta }^2)^{1/2}} - \frac{(\dot{\varvec{\varepsilon }}_p:\varvec{\varepsilon }_q)^2}{(\dot{\varvec{\varepsilon }}_p^2 + \tilde{\eta }^2)^{3/2}} \right) \quad \hbox {and}\quad \tilde{\eta }= \frac{\sigma _Y\eta }{TE}. \end{aligned}$$
(37)

By construction \(M^{\prime }_{\eta }(\cdot ),\ M^{\prime \prime }_{\eta }(\cdot )\ge 0.\) Moreover

$$\begin{aligned} {N}_{\eta }^{\prime \prime }(\dot{\varvec{\varepsilon }}_p)\varvec{\varepsilon }_q^2 \ge \left( \frac{\varvec{\varepsilon }_q^2}{(\dot{\varvec{\varepsilon }}_p^2 + \tilde{\eta }^2)^{1/2}} - \frac{\dot{\varvec{\varepsilon }}_p^2\ \varvec{\varepsilon }_q^2}{(\dot{\varvec{\varepsilon }}_p^2 + \tilde{\eta }^2)^{3/2}} \right) = \frac{\tilde{\eta }^2 \varvec{\varepsilon }_q^2 }{(\dot{\varvec{\varepsilon }}_p^2 + \tilde{\eta }^2)^{3/2}}. \end{aligned}$$

The second derivative (37) can then be bounded from below by

$$\begin{aligned} \langle \nabla _{Z}^2 \mathcal {D}_{\epsilon ,\eta }(\dot{\varvec{w}}_n) \varvec{z},\varvec{z} \rangle \ge \int _{\Omega }\sqrt{\frac{2}{3}}\sigma _Y \varvec{\varepsilon }_q^2 \left( \frac{\tilde{\eta }^2}{(\dot{\varvec{\varepsilon }}_{p,n}^2 + \tilde{\eta }^2)^{3/2}} + \frac{1}{\epsilon }\sqrt{\frac{2}{3}}{M}_{\eta }^{\prime }\left( \sqrt{\frac{2}{3}} {N}_{\eta }(\dot{\varvec{\varepsilon }}_{p,n}) - \dot{\gamma }_n \right) \frac{\tilde{\eta }^2}{(\dot{\varvec{\varepsilon }}_{p,n}^2 + \tilde{\eta }^2)^{3/2}} \right) dx \ge \int _{\Omega }\sqrt{\frac{2}{3}} \frac{\sigma _Y\tilde{\eta }^2\varvec{\varepsilon }_q^2}{(\dot{\varvec{\varepsilon }}_{p,n}^2 + \tilde{\eta }^2)^{3/2}}{\text {d}}x. \end{aligned}$$
(38)

Then, performing a similar calculation as in Lemma 1, we get for \(s\in (0,1)\)

$$\begin{aligned} \delta t a(\varvec{z},\varvec{z}) + \langle \nabla _{Z}^2 \mathcal {D}_{\epsilon ,\eta }(\dot{\varvec{w}}_n)\varvec{z}, \varvec{z}\rangle \ge \delta t\left( c_0s \left\Vert \varvec{\varepsilon }(\varvec{v})\right\Vert _{L^2(\Omega )}^2 + \left( h_0 - \frac{c_0 s}{1-s}\right) \left\Vert \varvec{\varepsilon }_q\right\Vert _{L^2(\Omega )}^2 + E_{iso}\left\Vert \mu \right\Vert _{L^2(\Omega )}^2\right) + \sqrt{\frac{2}{3}}\sigma _Y\min _{x\in \Omega }\left( \frac{\tilde{\eta }^2}{(\dot{\varvec{\varepsilon }}_{p,n}^2 + \tilde{\eta }^2)^{3/2}}\right) \left\Vert \varvec{\varepsilon }_q\right\Vert _{L^2(\Omega )}^2. \end{aligned}$$

Denote \(C= \sqrt{\tfrac{2}{3}}\sigma _Y\min _{x\in \Omega }\left( \tfrac{\tilde{\eta }^2}{(\dot{\varvec{\varepsilon }}_{p,n}^2 + \tilde{\eta }^2)^{3/2}}\right)\), which is finite since \(\eta >0\). By the assumption \(\dot{\varvec{\varepsilon }}_{p,n}\in L^{\infty }(\Omega )^{d\times d}\) we have that \(C>0\). If \(h_0>0,\) we take \(s = \tfrac{h_0}{2c_0 +h_0}\), while if \(h_0=0,\) we take \(s = \tfrac{C}{2c_0\delta t +C}\) and find the left hand side in the above to be coercive. The adjoint equation (36) thus admits a unique solution \(\varvec{z}_n\in Z,\ n = N-1, \ldots ,1, 0.\)
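The two choices of s at the end of the proof can be verified with exact rational arithmetic. The sketch below computes the coefficient multiplying \(\left\Vert \varvec{\varepsilon }_q\right\Vert _{L^2(\Omega )}^2\) in the lower bound, namely \(\delta t\left( h_0 - \tfrac{c_0 s}{1-s}\right) + C\); the function names and the sample values are assumptions chosen only for illustration.

```python
from fractions import Fraction


def choose_s(h0, c0, dt, big_c):
    """Value of s used in the proof, depending on whether h0 vanishes."""
    h0, c0, dt, big_c = (Fraction(v) for v in (h0, c0, dt, big_c))
    if h0 > 0:
        return h0 / (2 * c0 + h0)
    return big_c / (2 * c0 * dt + big_c)


def eps_q_coefficient(h0, c0, dt, big_c, s):
    """Coefficient of ||eps_q||^2 in the coercivity estimate."""
    h0, c0, dt, big_c, s = (Fraction(v) for v in (h0, c0, dt, big_c, s))
    return dt * (h0 - c0 * s / (1 - s)) + big_c
```

With \(s = \tfrac{h_0}{2c_0+h_0}\) the coefficient equals \(\delta t\, h_0/2 + C\), and with \(s = \tfrac{C}{2c_0\delta t + C}\) (case \(h_0 = 0\)) it equals \(C/2\); both are strictly positive, and in both cases \(s\in (0,1)\) so that the coefficient \(c_0 s\) of \(\left\Vert \varvec{\varepsilon }(\varvec{v})\right\Vert _{L^2(\Omega )}^2\) is positive as well.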

Remark 2

As shown in the previous theorem, the approximate dissipation function (25) is constructed so that \(\langle \nabla _{Z}^2 \mathcal {D}_{\epsilon ,\eta }(\dot{\varvec{w}})\varvec{z}, \varvec{z}\rangle\) is positive for all non-zero \(\varvec{z}\in Z.\) This ensures the well-posedness of the system (36) even for \(h_0=0.\) This is remarkable because the adjoint system (36) is well-posed even when the state equation (28) cannot be shown to be well-posed.

3.4 Formal analysis of the limit adjoint equation

At this stage, one may ask what happens in the adjoint (33) when the parameters \(\epsilon ,\eta \longrightarrow 0.\) In this subsection, we answer this question in a formal manner. Note that the adjoint equation is the adjoint operator of the linearized equation corresponding to the nonlinear problem (27). We have been able to pass to the limit \(\epsilon ,\eta \longrightarrow 0\) in the nonlinear problem (27) (end of Sect. 2), and obtain the formulation (21). Instead of passing to the limit \(\epsilon ,\eta \longrightarrow 0\) in (33), one may equivalently linearize (21). This linearization is not possible in a classical sense as (i) it contains a non-smooth functional \(j(\cdot )\), (ii) it is posed on a convex set K. The first difficulty is addressed using the sub-differential of \(j(\cdot )\). The second difficulty may be addressed using the notion of conical derivative, which is well studied for inequalities of the first kind (Sokolowski and Zolésio 1992). It finds application in the obstacle problem and contact mechanics (Maury 2016). The conical derivative for a particular inequality of the second kind is studied in Sokolowski and Zolésio (1992) and an application to the continuous viscoplasticity problem is considered in Sokolowski and Zolésio (1990).

The plasticity problem (21) is an inequality neither of the first kind nor of the second. Its time-discretized version, however, classifies as an inequality of the second kind. To the best of our knowledge, the conical derivative has been calculated neither for the continuous plasticity problem (21) nor for its time-discretized version, and this calculation remains outside the scope of this article. Nevertheless, we shall determine a shape derivative for a simplified version of problem (21), in a formal manner because of the very strong assumption (A) below. Problem (21) is simplified by considering only the first time step of its time-discretized version and by assuming that the isotropic hardening vanishes, \(E_{\text {iso}}=0\). These two simplifications are made only to simplify the proof, without loss of generality. We remind the reader that in problem (21) the solution and the forces are assumed to vanish at time \(t=0\). An incremental body force \(\varvec{f}\) and surface load \(\varvec{g}\) are applied, so that the problem reads: find \(\varvec{w} = ( \varvec{u}, \varvec{\varepsilon }_{p})\in V\times Q\) such that

$$\begin{aligned} a( \varvec{w}, \varvec{z}- \varvec{w}) + j(\varvec{z}) - j( \varvec{w}) \ge l_0(\varvec{z} - \varvec{w}) \qquad \forall \varvec{z}\in V\times Q, \end{aligned}$$
(39)

where \(\varvec{z} = (\varvec{v},\varvec{\varepsilon }_q),\) and the forms \(a(\cdot , \cdot ),\, j(\cdot )\) and \(l_0(\cdot )\) are given by

$$\begin{aligned} \begin{aligned}&a( \varvec{w}, \varvec{z}) = \int _{\Omega } \left( \mathbb {C}(\varvec{\varepsilon }( \varvec{u}) - \varvec{\varepsilon }_p):(\varvec{\varepsilon }(\varvec{v})-\varvec{\varepsilon }_q) + \mathbb {H} \varvec{\varepsilon }_p\,:\varvec{\varepsilon }_q \right) {\text {d}}x ,\\&j(\varvec{z}) = \int _{\Omega }\sqrt{\frac{2}{3}}\sigma _Y |\varvec{\varepsilon }_q|\, {\text {d}}x\quad \hbox {and}\quad l_0(\varvec{z}) = \int _{\Omega } \varvec{f}\cdot \varvec{v}\, {\text {d}}x + \int _{\Gamma _N} \varvec{g}\cdot \varvec{v}\, {\text {d}}s, \end{aligned} \end{aligned}$$

respectively. The problem (39) can equivalently be formulated as the minimization of \(\mathcal {J}:V\times Q\rightarrow \mathbb {R}\)

$$\begin{aligned}&\min _{\varvec{z}\in V\times Q}\mathcal {J}(\varvec{z}), \end{aligned}$$
(40)
$$\begin{aligned} \hbox {where}\quad \mathcal {J}(\varvec{z})&= \left( \frac{1}{2} a(\varvec{z}, \varvec{z}) + j(\varvec{z}) - l_0(\varvec{z})\right) . \end{aligned}$$
(41)

Since the functional \(\mathcal {J}(\cdot )\) is convex on its convex domain \(V\times Q,\) problem (40) admits a unique solution \(\varvec{w} = (\varvec{u},\varvec{\varepsilon }_p)\in V\times Q\). We seek an adjoint problem for a shape optimization problem with (39) as a constraint. To this end, we must convert the inequality (39) into an equation. In order to facilitate this conversion, we introduce the set where the plastic flow takes place:

$$\begin{aligned} \Omega _{\varvec{w}} = \left\{ \varvec{x}\in \Omega ,\,\, {\varvec{\varepsilon }}_{p}(\varvec{x}) \ne \varvec{0}\quad \hbox { in }\Omega \right\} , \end{aligned}$$
(42)

and a space \(Q_{\varvec{w}}\) defined by

$$\begin{aligned} Q_{\varvec{w}} = \{ \varvec{\varepsilon }_q\in Q,\,\, \varvec{\varepsilon }_q(\varvec{x}) = \varvec{0}\quad \forall \varvec{x}\in \Omega \backslash \Omega _{\varvec{w}} \}. \end{aligned}$$
(43)

Both the set \(\Omega _{\varvec{w}}\) and the space \(Q_{\varvec{w}}\) are dependent on the solution \(\varvec{w}\). We suppose that the set \(\Omega _{\varvec{w}}\) is open and Lipschitz. To simplify notations, we let

$$\begin{aligned} \varvec{\sigma }= \mathbb {C}(\varvec{\varepsilon }( \varvec{u}) - \varvec{\varepsilon }_{p})\quad \hbox {and}\quad \varvec{q} = \mathbb {H} \varvec{\varepsilon }_{p}. \end{aligned}$$
(44)

We now consider the minimization of (41) over the smaller space \(V\times Q_{\varvec{w}}:\)

$$\begin{aligned} \min _{\varvec{z}\in V\times Q_{\varvec{w}}}\mathcal {J}(\varvec{z}). \end{aligned}$$
(45)

This problem is well posed as the following lemma shows.

Lemma 5

Let \(\varvec{w}\) be the solution to problem (40). Let \(\Omega _{\varvec{w}}\) and \(Q_{\varvec{w}}\) be as defined in (42) and (43) respectively and the set \(\Omega _{\varvec{w}}\) be open and Lipschitz. Then there exists a unique solution \(\varvec{w}^*\in V\times Q_{\varvec{w}}\) to problem (45). In addition, if \(a(\cdot ,\cdot )\) is coercive, then

$$\begin{aligned} (\varvec{u}^*, \varvec{\varepsilon }_p^*) = \varvec{w}^* = \varvec{w}, \end{aligned}$$

and \(\varvec{w}^*\) equivalently satisfies

$$\begin{aligned}&\int _{\Omega }\mathbb {C}\varvec{\varepsilon }(\varvec{v})\,:( \varvec{\varepsilon }(\varvec{u}^*) - \varvec{\varepsilon }_{p}^*){\text {d}}x = \int _{\Omega } \varvec{f}\cdot \varvec{v}\ {\text {d}}x + \int _{\Gamma _N} \varvec{g}\cdot \varvec{v}\ {\text {d}}s \qquad \forall \varvec{v}\in V, \end{aligned}$$
(46)
$$\begin{aligned}&\int _{\Omega _{\varvec{w}}} \varvec{\varepsilon }_q \,:\left( \varvec{\sigma }^* - \varvec{q}^* - \sqrt{\frac{2}{3}}\sigma _Y \frac{ \varvec{\varepsilon }_p^*}{| \varvec{\varepsilon }_p^*|} \right) dx = 0\qquad \forall \varvec{\varepsilon }_q\in Q_{\varvec{w}}. \end{aligned}$$
(47)

Proof

The functional (41) is convex on the convex domain \(V\times Q_{\varvec{w}}.\) Thus problem (45) admits a unique solution \(\varvec{w}^*\in V\times Q_{\varvec{w}} .\) This solution also satisfies

$$\begin{aligned} a( \varvec{w}^*, \varvec{z}- \varvec{w}^*) + j(\varvec{z}) - j(\varvec{w}^*) \ge l_0(\varvec{z} - \varvec{w}^*) \qquad \forall \varvec{z}\in V\times Q_{\varvec{w}}. \end{aligned}$$
(48)

We now establish the equality between \(\varvec{w}^*\) and \(\varvec{w}.\) One easily verifies that \(\varvec{w}\in V\times Q_{\varvec{w}}.\) Substituting \(\varvec{z} = \varvec{w}\) in (48) and \(\varvec{z} =\varvec{w}^*\) in (39), and adding the two resulting inequalities, we get

$$\begin{aligned} a(\varvec{w} - \varvec{w}^*, \varvec{w} - \varvec{w}^*) \le 0. \end{aligned}$$

Given the coercivity of \(a(\cdot ,\cdot ),\) we deduce \(\varvec{w} = \varvec{w}^*\). In order to derive equation (47), we rewrite the functional (41)

$$\begin{aligned} \forall \varvec{z}\in V\times Q_{\varvec{w}}, \quad \mathcal {J}(\varvec{z}) = \frac{1}{2} a(\varvec{z}, \varvec{z}) + \int _{\Omega _{\varvec{w}}}\sqrt{\frac{2}{3}}\sigma _Y|\varvec{\varepsilon }_q|\, {\text {d}}x - l_0(\varvec{z}) . \end{aligned}$$

In the above, the integral on \(\Omega _{\varvec{w}}\) is differentiable in the classical sense if \(\varvec{\varepsilon }_q(\varvec{x})\ne \varvec{0}\) over the open Lipschitz set \(\Omega _{\varvec{w}}.\) We know that the minimizer \(\varvec{\varepsilon }_p (\varvec{x})\) is non-zero over \(\Omega _{\varvec{w}}\) by definition. Thus the functional \(\mathcal {J}(\cdot )\) is differentiable at \(\varvec{w}^*\), and equating its derivative to zero leads to (46)–(47).
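The differentiability argument above rests on the fact that the norm \(|\varvec{\varepsilon }_q|\) has the classical derivative \(\varvec{\varepsilon }_q/|\varvec{\varepsilon }_q|\) away from zero, which is the term appearing in (47). The following minimal sketch checks this pointwise against a central finite difference. The flat-list storage of tensors, the function names and the numerical values are assumptions made only for this illustration.

```python
import math

def frobenius_norm(e):
    """Frobenius norm |e| = sqrt(e:e) of a tensor stored as a flat list."""
    return math.sqrt(sum(x * x for x in e))

def norm_gradient(e):
    """Classical derivative e/|e| of the norm, valid only for e != 0."""
    n = frobenius_norm(e)
    if n == 0.0:
        raise ValueError("the norm is not differentiable at the zero tensor")
    return [x / n for x in e]

def directional_fd(e, psi, step=1e-6):
    """Central finite difference of t -> |e + t*psi| at t = 0."""
    plus = frobenius_norm([x + step * p for x, p in zip(e, psi)])
    minus = frobenius_norm([x - step * p for x, p in zip(e, psi)])
    return (plus - minus) / (2.0 * step)

# Hypothetical trace-free plastic strain and test direction (row-major 3x3).
eps_p = [0.02, 0.01, 0.0, 0.01, -0.005, 0.0, 0.0, 0.0, -0.015]
psi = [0.3, -0.1, 0.2, -0.1, -0.5, 0.0, 0.2, 0.0, 0.2]

exact = sum(g * p for g, p in zip(norm_gradient(eps_p), psi))
approx = directional_fd(eps_p, psi)
```

At the zero tensor the helper raises an error instead of returning a value, which mirrors the reason why the formulation is restricted to \(\Omega _{\varvec{w}}\).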

Remark 3

No derivatives are involved in (47) so it implies

$$\begin{aligned} \varvec{\sigma }^{*D} - \varvec{q}^{*D} - \sqrt{\frac{2}{3}}\sigma _Y \frac{ \varvec{\varepsilon }_p^*}{| \varvec{\varepsilon }_p^*|} = \varvec{0} \quad \hbox {and}\quad | \varvec{\sigma }^{*D} - \varvec{q}^{*D}| = \sqrt{\frac{2}{3}}\sigma _Y \qquad \hbox {in }\Omega _{\varvec{w}^*} \end{aligned}$$

since \(\varvec{\varepsilon }_q\) is trace-free. In other words, the normality law is respected. Alongside, the yield limit is attained wherever there is plastic flow. Problem (39) takes \(\Omega\) and the forces \(\varvec{f},\ \varvec{g}\) as input whereas problem (46)–(47) takes \(\Omega ,\,\varvec{f},\ \varvec{g}\), as well as \(\Omega _{\varvec{w}}\) (and hence \(Q_{\varvec{w}}\)), as input.

For the shape optimization problem, we minimize a simplified version of (29), namely

$$\begin{aligned} J(\Omega ) = \left( \int _{\Omega }m( \varvec{w}(\Omega ))\, {\text {d}}x + \int _{\Gamma _N}p( \varvec{w}(\Omega ))\, {\text {d}}s\right) , \end{aligned}$$
(49)

where \(\varvec{w}(\Omega )\) is the solution to (46)–(47). We compute the shape derivative of (49) under the following strong assumption.

(A):

When \(\Omega\) is perturbed to \((I_d+\varvec{\theta })\Omega\), with a small vector field \(\varvec{\theta } \in W^{1, \infty }_0(\mathbb {R}^d, \mathbb {R}^d)\), the corresponding solution \(\varvec{w}_{\varvec{\theta }}\equiv \varvec{w}((I_d+\varvec{\theta })\Omega )\) of problem (40) is differentiable with respect to \(\varvec{\theta }\) and the plastic zone \(\Omega _{\varvec{w}_{\varvec{\theta }}}\) is perturbed to \((I_d+\varvec{\theta }_p)\Omega _{\varvec{w}_0}\) where \(\varvec{\theta }_p\) is a vector field which smoothly depends on \(\varvec{\theta }\) and \(\varvec{w}_0\), while \(\Omega _{\varvec{w}_0}\) is an open Lipschitz set.

In particular, this assumption implies that the plastic zone does not change its topology, meaning that there is no creation of new plastic zones or creation of elastic zones inside the plastic zone.

Theorem 8

Let \(\Omega \subset \mathbb {R}^d\) be a smooth and bounded open set, \(\varvec{f}\in H^1(\mathbb {R}^d)^d\) and \(\varvec{g}\in H^2(\mathbb {R}^d)^d\) be smooth loads. Assume that the solution to (46)–(47), \(\varvec{w}=( \varvec{u}, \varvec{\varepsilon }_p)\in V \times Q\), is smooth, namely belongs to \(H^2(\Omega )^d\times H^1(\Omega )^{d\times d}\). Assume that the integrand \(p(\cdot )\), in the cost function (49), does not depend on \(\varvec{\varepsilon }_p\) and that \(\mathbb {H}\) is proportional to the identity tensor.

Under assumption (A), the shape derivative of (49), in the direction \(\varvec{\theta }\in W^{1,\infty }_0(\mathbb {R}^d, \mathbb {R}^d)\), is given by

$$\begin{aligned} J^{\prime }(\Omega )(\varvec{\theta }) = \int _{\Gamma }\varvec{\theta }\cdot \varvec{n}\left( m( \varvec{w}) + \mathbb {C}\varvec{\varepsilon }(\varvec{v})\,:(\varvec{\varepsilon }( \varvec{u}) - \varvec{\varepsilon }_{p}) - \varvec{f}\cdot \varvec{v} \right) {\text{d}}s, \end{aligned}$$
(50)

where \(\varvec{\sigma }, \varvec{q}\) are defined in (44) and \(\varvec{z} = (\varvec{v},\varvec{\varepsilon }_{q})\in V\times Q\) is the adjoint variable satisfying, \(\forall \varvec{\varphi }\in V\),

$$\begin{aligned}&\int _{\Omega } \mathbb {C}\varvec{\varepsilon }(\varvec{\varphi })\,:(\varvec{\varepsilon }(\varvec{v}) + \varvec{\varepsilon }_{q}){\text{d}}x = -\left( \int _{\Omega }\partial _{ \varvec{u}}m( \varvec{w})\cdot \varvec{\varphi }\, {\text {d}}x + \int _{\Gamma _N}\partial _{ \varvec{u}} p( \varvec{w})\cdot \varvec{\varphi }\, {\text {d}}s\right) , \end{aligned}$$
(51)
$$\begin{aligned}&\varvec{\varepsilon }_{q}= - \left( \mathbb {C}+\mathbb {H} + \sqrt{\frac{2}{3}}\sigma _Y\left( \frac{\mathbb {I}}{| \varvec{\varepsilon }_{p}|} - \frac{ \varvec{\varepsilon }_{p}\otimes \varvec{\varepsilon }_{p} }{| \varvec{\varepsilon }_{p}|^3}\right) \right) ^{-1}\left( \mathbb {C}\varvec{\varepsilon }(\varvec{v}) - \partial _{ \varvec{\varepsilon }_{p}}m( \varvec{w}) \right) {^D} \quad \hbox {in }\Omega _{\varvec{w}}. \end{aligned}$$
(52)

Remark 4

Note that the variation \(\varvec{\theta }_p\) of the plastic zone, which is assumed to exist in assumption (A), does not play any role in the shape derivative (50). Theorem 8 is a mathematically clean version of many results in the engineering literature (often in a discretized setting) where the equations are derived without taking into account the non-differentiability issues. Of course, our result relies on a very strong assumption, which is the price to pay for a formula as simple as (50).

Proof

The shape derivative is determined by applying Céa’s method. The main idea is to define a Lagrangian for the simpler equations (46)–(47) rather than for the variational inequality (39). The Lagrangian is defined by

$$\begin{aligned}&\mathcal {L}\left( \tilde{\varvec{w}}, \tilde{\varvec{z}}, \tilde{\varvec{\lambda }},\Omega _p, \Omega \right) = \int _{\Omega }m( \tilde{\varvec{w}}){\text {d}}x + \int _{\Gamma _N}p( \tilde{\varvec{w}}){\text {d}}s + \int _{\Omega }\mathbb {C}\varvec{\varepsilon }(\tilde{\varvec{v}})\,:(\varvec{\varepsilon }( \tilde{\varvec{u}}) - \tilde{\varvec{\varepsilon }}_{p}){\text {d}}x\nonumber \\&\quad - \int _{\Omega } \varvec{f}\cdot \tilde{\varvec{v}}\ {\text {d}}x - \int _{\Gamma _N} \varvec{g}\cdot \tilde{\varvec{v}}\ {\text {d}}s + \int _{\Omega _p} \tilde{\varvec{\varepsilon }}_{q} \,:\left( \tilde{\varvec{\sigma }}^D - \tilde{\varvec{q}}^D - \sqrt{\frac{2}{3}}\sigma _Y \frac{ \tilde{\varvec{\varepsilon }}_{p}}{| \tilde{\varvec{\varepsilon }}_{p}|} \right) {\text{d}}x + \int _{\Omega \backslash \Omega _p}\tilde{\varvec{\lambda }}\,:\tilde{\varvec{\varepsilon }}_{p}\, {\text{d}}x, \end{aligned}$$
(53)

where \(\tilde{\varvec{w}} = (\tilde{\varvec{u}},\tilde{\varvec{\varepsilon }}_{p})\in \tilde{V}\times \tilde{Q}\) (defined in (34)), \(\tilde{\varvec{z}} = (\tilde{\varvec{v}},\tilde{\varvec{\varepsilon }}_{q})\in \tilde{V}\times \tilde{Q}\) is the adjoint variable, \(\Omega _p\) is the plastic zone (which is independent of \(\tilde{\varvec{w}}\)), \(\tilde{\varvec{\lambda }} \in \tilde{Q}\) is a Lagrange multiplier penalizing the constraint \(\tilde{\varvec{\varepsilon }}_{p}=\varvec{0}\) in the elastic zone \(\Omega \backslash \Omega _p\), \(\tilde{\varvec{\sigma }} = \mathbb {C}( \varvec{\varepsilon }(\tilde{\varvec{u}}) - \tilde{\varvec{\varepsilon }}_p)\) and \(\tilde{\varvec{q}} =\mathbb {H}\tilde{\varvec{\varepsilon }}_p\). The variables \(\tilde{\varvec{w}}\) and \(\tilde{\varvec{z}}\) vanish on \(\Gamma _D\), which is a fixed set, so this boundary condition does not cause any problem when differentiating the Lagrangian (53).

We now compute the optimality condition for the Lagrangian (53). The optimal variables are denoted by \(\left( {\varvec{w}}, {\varvec{z}}, {\varvec{\lambda }} \right)\). Since the Lagrangian is linear with respect to \(\tilde{\varvec{z}}\) and \(\tilde{\varvec{\lambda }}\), it is easy to compute its partial derivatives with respect to these two variables. Equating to zero the partial derivative for \(\tilde{\varvec{z}}\) in the direction \((\varvec{\varphi },\varvec{\psi })\in V\times Q\) yields

$$\begin{aligned}&\int _{\Omega }\mathbb {C}\varvec{\varepsilon }(\varvec{\varphi })\,:( \varvec{\varepsilon }(\varvec{u}) - \varvec{\varepsilon }_{p})dx = \int _{\Omega } \varvec{f}\cdot \varvec{\varphi }\ {\text{d}}x + \int _{\Gamma _N} \varvec{g}\cdot \varvec{\varphi }\ {\text{d}}s \qquad \forall \varvec{\varphi }\in V, \end{aligned}$$
(54)
$$\begin{aligned}&\int _{\Omega _p} \varvec{\psi } \,:\left( \varvec{\sigma }- \varvec{q} - \sqrt{\frac{2}{3}}\sigma _Y \frac{ \varvec{\varepsilon }_{p}}{| \varvec{\varepsilon }_{p}|} \right) {\text{d}}x = 0\qquad \forall \varvec{\psi } \in Q. \end{aligned}$$
(55)

Equating to zero the partial derivative for \(\tilde{\varvec{\lambda }}\) in the direction \(\varvec{\mu }\in Q\) leads to

$$\begin{aligned} \int _{\Omega \backslash \Omega _p}\varvec{\mu }: \varvec{\varepsilon }_p {\text {d}}x = 0 \qquad \forall \varvec{\mu } \in Q , \end{aligned}$$
(56)

which implies that \(\varvec{\varepsilon }_p = \varvec{0}\) in \(\Omega \backslash \Omega _p\). Choosing \(\Omega _p = \Omega _{\varvec{w}}\) one can check that the optimality conditions (54)–(55) and (56) are precisely the state equations (46)–(47).

The adjoint equations (51)–(52) are obtained by writing the optimality condition of the Lagrangian (53) with respect to \(\tilde{\varvec{w}}\) in the direction \((\varvec{\varphi },\varvec{\psi }) \in V\times Q\). The adjoint \(\varvec{z}\) is evaluated at the state \(\varvec{w}\), \(\varvec{\lambda }\) and \(\Omega _p = \Omega _{\varvec{w}}\). The adjoint problem amounts to finding \(\varvec{z}=(\varvec{v},\varvec{\varepsilon }_{q})\in V\times Q\) such that

$$\begin{aligned}&\int _{\Omega } \mathbb {C}\varvec{\varepsilon }(\varvec{\varphi })\,:(\varvec{\varepsilon }(\varvec{v}) + \varvec{\varepsilon }_{q}){\text {d}}x = -\left( \int _{\Omega }\partial _{ \varvec{u}}m( \varvec{w})\cdot \varvec{\varphi }\, {\text {d}}x + \int _{\Gamma _N}\partial _{ \varvec{u}} p( \varvec{w})\cdot \varvec{\varphi }\, {\text {d}}s\right) \qquad \forall \varvec{\varphi }\in V,\\&\quad \int _{\Omega _{\varvec{w}}} \varvec{\psi } \,:\left( \mathbb {C}\varvec{\varepsilon }(\varvec{v}) + (\mathbb {C}+\mathbb {H})\varvec{\varepsilon }_{q} + \sqrt{\frac{2}{3}}\sigma _Y\left( \frac{\varvec{\varepsilon }_{q}}{| \varvec{\varepsilon }_{p}|} - \frac{( \varvec{\varepsilon }_{p}\,:\varvec{\varepsilon }_{q}) \varvec{\varepsilon }_{p} }{| \varvec{\varepsilon }_{p}|^3}\right) \right) {\text {d}}x \\&\quad = \int _{\Omega _{\varvec{w}}}\partial _{ \varvec{\varepsilon }_{p}}m( \varvec{w}):\varvec{\psi } \, {\text {d}}x + \int _{\Omega \backslash \Omega _{\varvec{w}}}(\partial _{ \varvec{\varepsilon }_{p}}m( \varvec{w}):\varvec{\psi }\, + \varvec{\lambda }\,:\varvec{\psi })\, {\text {d}}x \qquad \forall \varvec{\psi }\in Q, \end{aligned}$$

where we used the assumption that the cost function \(p(\cdot )\) does not depend on \(\varvec{\varepsilon }_p\). The second equation has no derivative on the test function \(\varvec{\psi }\) so it yields

$$\begin{aligned} \varvec{\lambda }&= -\partial _{ \varvec{\varepsilon }_{p}}m( \varvec{w}) \quad \hbox {in }\Omega \backslash \Omega _{\varvec{w}} , \\ \left( (\mathbb {C}+\mathbb {H})\varvec{\varepsilon }_{q}\right) {^D} + \sqrt{\frac{2}{3}}\sigma _Y\left( \frac{\varvec{\varepsilon }_{q}}{| \varvec{\varepsilon }_{p}|} - \frac{( \varvec{\varepsilon }_{p}\,:\varvec{\varepsilon }_{q}) \varvec{\varepsilon }_{p} }{| \varvec{\varepsilon }_{p}|^3}\right)&=\left( - \mathbb {C}\varvec{\varepsilon }(\varvec{v}) + \partial _{ \varvec{\varepsilon }_{p}}m( \varvec{w})\right) {^D} \quad \hbox {in }\Omega _{\varvec{w}}. \end{aligned}$$

Some easy algebra shows that \(\varvec{\varepsilon }_{q}\) is indeed given by formula (52) (where we used the fact that \(\left( (\mathbb {C}+\mathbb {H})\varvec{\varepsilon }_{q}\right) ^D = (\mathbb {C}+\mathbb {H})\varvec{\varepsilon }_{q}\)).
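The algebra behind (52) can be made explicit in a special case. The sketch below assumes an isotropic Hooke tensor with shear modulus `mu` (an assumption that is not stated in this section), so that \(\mathbb {C}\) acts as \(2\mu\) on trace-free tensors, and \(\mathbb {H} = h\,\mathbb {I}\) as in Theorem 8. Under these assumptions the operator to invert is a rank-one perturbation of the identity, and the Sherman-Morrison formula gives its inverse in closed form. All function names, the flat-list tensor storage and the numerical values are hypothetical.

```python
import math

def ddot(a, b):
    """Double contraction a:b of two tensors stored as flat lists of 9 entries."""
    return sum(x * y for x, y in zip(a, b))

def dev(a):
    """Deviatoric (trace-free) part of a row-major 3x3 tensor."""
    mean = (a[0] + a[4] + a[8]) / 3.0
    out = list(a)
    for i in (0, 4, 8):
        out[i] -= mean
    return out

def apply_operator(x, eps_p, mu, h, sigma_y):
    """Apply C + H + sqrt(2/3)*sigma_Y*(I/|eps_p| - eps_p (x) eps_p/|eps_p|^3)
    to a trace-free x, assuming C x = 2*mu*x and H = h*I."""
    c = math.sqrt(2.0 / 3.0) * sigma_y
    n = math.sqrt(ddot(eps_p, eps_p))
    proj = ddot(eps_p, x)
    return [(2.0 * mu + h + c / n) * xi - c * proj * ei / n**3
            for xi, ei in zip(x, eps_p)]

def adjoint_plastic_strain(strain_v, dm_deps_p, eps_p, mu, h, sigma_y):
    """Pointwise eps_q of (52) in the plastic zone; also returns the right-hand side r."""
    n = math.sqrt(ddot(eps_p, eps_p))
    if n == 0.0:
        raise ValueError("formula (52) only holds where eps_p != 0")
    c = math.sqrt(2.0 / 3.0) * sigma_y
    a = 2.0 * mu + h + c / n   # coefficient of the identity
    b = c / n**3               # coefficient of eps_p (x) eps_p
    # r = (C eps(v) - dm/d eps_p)^D; the spherical part of C eps(v) drops out.
    r = dev([2.0 * mu * e - g for e, g in zip(strain_v, dm_deps_p)])
    proj = ddot(eps_p, r)
    # Sherman-Morrison inverse, using a - b*|eps_p|^2 = 2*mu + h.
    x = [ri / a + b * proj * ei / (a * (2.0 * mu + h)) for ri, ei in zip(r, eps_p)]
    return [-xi for xi in x], r

# Hypothetical data at one point of the plastic zone.
eps_p = [0.02, 0.01, 0.0, 0.01, -0.005, 0.0, 0.0, 0.0, -0.015]
strain_v = [1e-3, 2e-4, 0.0, 2e-4, -5e-4, 1e-4, 0.0, 1e-4, 3e-4]
dm = [0.0] * 9
eps_q, r = adjoint_plastic_strain(strain_v, dm, eps_p, mu=80e3, h=1e3, sigma_y=250.0)
lhs = apply_operator(eps_q, eps_p, 80e3, 1e3, 250.0)
residual = [l + ri for l, ri in zip(lhs, r)]
```

The residual measures how well the computed \(\varvec{\varepsilon }_q\) satisfies the pointwise relation preceding (52). The closed form requires \(2\mu + h > 0\), which holds for any physically meaningful choice of moduli.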

Since \(\varvec{w}\), the solution of problem (40), together with \(\Omega _p = \Omega _{\varvec{w}}\), satisfies the state equations (46)–(47), we have, for any \(\tilde{\varvec{z}}, \tilde{\varvec{\lambda }}\),

$$\begin{aligned} \mathcal {L} \left( \varvec{w}, \tilde{\varvec{z}}, \tilde{\varvec{\lambda }},\Omega _{\varvec{w}}, \Omega \right) = J(\Omega ) . \end{aligned}$$

Now we differentiate both sides in the above with respect to the shape \(\Omega\) in the direction of a vector field \(\varvec{\theta }\). Because of assumption (A), when \(\varvec{\theta }\) moves the domain \(\Omega\) to \((I_d+\varvec{\theta })\Omega\), the plastic zone \(\Omega _{\varvec{w}}\) is displaced to \((I_d+\varvec{\theta }_p)\Omega _{\varvec{w}}\), with another vector field \(\varvec{\theta }_p\). Thus, by the chain rule lemma, we obtain

$$\begin{aligned}&J^{\prime }(\Omega )(\varvec{\theta }) = \left\langle \frac{\partial \mathcal {L}}{\partial \Omega }\left( \varvec{w},\tilde{\varvec{z}},\tilde{\varvec{\lambda }},\Omega _{\varvec{w}}, \Omega \right) , \varvec{\theta }\right\rangle + \left\langle \frac{\partial \mathcal {L}}{\partial {\Omega }_{\varvec{w}}}\left( \varvec{w},\tilde{\varvec{z}},\tilde{\varvec{\lambda }}, \Omega _{\varvec{w}}, \Omega \right) , \varvec{\theta }_p\right\rangle \nonumber \\&\quad + \left\langle \frac{\partial \mathcal {L}}{\partial \varvec{\varepsilon }_p}\left( \varvec{w},\tilde{\varvec{z}},\tilde{\varvec{\lambda }}, \Omega _{\varvec{w}}, \Omega \right) , \frac{\partial \varvec{\varepsilon }_p }{\partial \Omega }(\varvec{\theta })\right\rangle + \left\langle \frac{\partial \mathcal {L}}{\partial \varvec{u}}\left( \varvec{w},\tilde{\varvec{z}},\tilde{\varvec{\lambda }}, \Omega _{\varvec{w}}, \Omega \right) , \frac{\partial \varvec{u} }{\partial \Omega }(\varvec{\theta })\right\rangle . \end{aligned}$$
(57)

Now, substituting \(\tilde{\varvec{z}} = \varvec{z},\) \(\tilde{\varvec{\lambda }} = \varvec{\lambda }\) and using the adjoint equations (51)–(52), the last line of (57) vanishes because it is the adjoint variational formulation. We obtain

$$\begin{aligned} J^{\prime }(\Omega )(\varvec{\theta }) = \left\langle \frac{\partial \mathcal {L}}{\partial \Omega }\left( \varvec{w},\varvec{z},{\varvec{\lambda }}, \Omega _{\varvec{w}}, \Omega \right) , \varvec{\theta }\right\rangle + \int _{\partial \Omega _{\varvec{w}}}(\varvec{\theta }_p\cdot \varvec{n})\, \varvec{\varepsilon }_q\,:\left( {\varvec{\sigma }}^D - {\varvec{q}}^D - \sqrt{\frac{2}{3}}\sigma _Y \frac{{\varvec{\varepsilon }}_{p}}{| {\varvec{\varepsilon }}_{p}|} \right) {\text {d}}s + \int _{\partial (\Omega \backslash \Omega _{\varvec{w}})}\varvec{\theta }_p\cdot \varvec{n}\, ({\varvec{\lambda }}\,:\varvec{\varepsilon }_p)\, {\text {d}}s. \end{aligned}$$

Given that \(\varvec{\varepsilon }_p=\varvec{0}\) in \(\Omega \backslash \Omega _{\varvec{w}}\) and the assumption that \(\varvec{\varepsilon }_p\in H^1(\Omega )^{d\times d}\), we deduce that the last integral vanishes. Furthermore, the smoothness assumption \((\varvec{u},\varvec{\varepsilon }_p)\in H^2(\Omega )^d\times H^1(\Omega )^{d\times d}\) implies that the yield strength is attained even on \(\partial \Omega _{\varvec{w}}\), so the penultimate integral vanishes too. Thus we find

$$\begin{aligned} J^{\prime }(\Omega )(\varvec{\theta }) = \left\langle \frac{\partial \mathcal {L}}{\partial \Omega }\left( \varvec{w},\varvec{z},{\varvec{\lambda }}, \Omega _{\varvec{w}}, \Omega \right) , \varvec{\theta }\right\rangle . \end{aligned}$$

Finally, using Lemma 4 leads to formula (50).

4 Numerical implementation

We first discuss the numerical resolution of the state equation (28), then that of the adjoint equation (33) and finally we describe the shape optimization algorithm. The domain \(\Omega\) is discretized using a simplicial unstructured mesh and the space Z, defined by (15), is discretized as \(Z^h\), using the finite element framework

$$\begin{aligned} Z^h = \mathbb {P}^1(\Omega )^d \times \mathbb {P}^0(\Omega )^{d\times d} \times \mathbb {P}^0(\Omega ). \end{aligned}$$
(58)

The space K is discretized as \(K^h,\) defined by

$$\begin{aligned} K^h = \left\{ (\varvec{u},{\varvec{\varepsilon }}_{p}, {\gamma })\in Z^h, \ \sqrt{\frac{2}{3}}|{\varvec{\varepsilon }}_{p}|\le {\gamma } \hbox { a.e. in }\Omega \right\} . \end{aligned}$$
(59)

The maximal mesh size is denoted by \(h_{\max }\), the minimal mesh size by \(h_{\min }\) and the number of mesh vertices by \(N_v\). We assume the mesh to be regular, that is, \(h_{\max }\) and \(h_{\min }\) are of the same order. The space time-discretized state solution is \(\tilde{\varvec{w}}(t) \in Z^h\) and the space time-discretized adjoint solution is \(\tilde{\varvec{z}}(t) \in Z^h\). The time interval [0, T] is discretized into N intervals of length \(\delta t\). We label the time at the end of the n-th time interval as \(t_n,\ n = 1, 2,\cdots , N.\) All our numerical experiments are performed with the open-source software FreeFEM++ (Hecht 2012).

4.1 Resolution of the plasticity formulation

The space-discretized version of the problem (21) reads: find \(\varvec{w}^h(t)\in K^h\) such that

$$\begin{aligned} a( \varvec{w}^h,\varvec{z}^h - \dot{\varvec{w}}^h ) + j(\varvec{z}^h) - j(\dot{\varvec{w}}^h) \ge l_t(\varvec{z}^h - \dot{\varvec{w}}^h ) \qquad \forall \varvec{z}^h\in K^h. \end{aligned}$$
(60)

The solution of the time-discretized version of (60) is denoted by \(\tilde{\varvec{w}}(t)\in Z^h\). More precisely, it is defined by its values \(\tilde{\varvec{w}}_n = \tilde{\varvec{w}}(t_n)\) at each time step and extended by affine interpolation as \(\tilde{\varvec{w}}(t) = \tilde{\varvec{w}}_n + \delta {\tilde{\varvec{w}}}_n( t - t_n)\) for \(t\in [t_n,t_{n+1}]\), where \(\delta {\tilde{\varvec{w}}}_n = (\tilde{\varvec{w}}_{n+1}-\tilde{\varvec{w}}_n)/\delta t\) is the increment. Equation (60) could be regularized and penalized as before, but we refrain from doing so and instead solve its time-discretization via the radial return algorithm (Simo and Hughes 2006).
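To fix ideas, a radial return update at a single material point may be sketched as follows. This is a textbook-style illustration under simplifying assumptions, not the implementation used for the experiments: isotropic elasticity with shear modulus `mu`, linear kinematic hardening \(\mathbb {H} = h\,\mathbb {I}\), no isotropic hardening, and tensors stored as flat lists. All names and numerical values are hypothetical.

```python
import math

def ddot(a, b):
    """Double contraction a:b of two tensors stored as flat lists of 9 entries."""
    return sum(x * y for x, y in zip(a, b))

def dev(a):
    """Deviatoric (trace-free) part of a row-major 3x3 tensor."""
    mean = (a[0] + a[4] + a[8]) / 3.0
    out = list(a)
    for i in (0, 4, 8):
        out[i] -= mean
    return out

def relative_stress(strain, eps_p, mu, h):
    """xi = sigma^D - q^D for isotropic elasticity and H = h*I (eps_p trace-free)."""
    e = dev(strain)
    return [2.0 * mu * (ei - pi) - h * pi for ei, pi in zip(e, eps_p)]

def radial_return(strain, eps_p_old, mu, h, sigma_y):
    """One implicit Euler step at a material point; returns the new plastic strain."""
    radius = math.sqrt(2.0 / 3.0) * sigma_y
    # Trial state: the plastic strain is frozen at its previous value.
    xi_trial = relative_stress(strain, eps_p_old, mu, h)
    xi_norm = math.sqrt(ddot(xi_trial, xi_trial))
    if xi_norm <= radius:
        return list(eps_p_old)  # elastic step, no plastic flow
    # Plastic step: return radially onto the yield surface |xi| = radius.
    d_lambda = (xi_norm - radius) / (2.0 * mu + h)
    return [pi + d_lambda * x / xi_norm for pi, x in zip(eps_p_old, xi_trial)]

mu, h, sigma_y = 80e3, 1e3, 250.0
strain = [0.004, 0.001, 0.0, 0.001, -0.002, 0.0, 0.0, 0.0, -0.002]
eps_p_new = radial_return(strain, [0.0] * 9, mu, h, sigma_y)
xi_new = relative_stress(strain, eps_p_new, mu, h)
yield_gap = math.sqrt(ddot(xi_new, xi_new)) - math.sqrt(2.0 / 3.0) * sigma_y
```

After a plastic step the updated relative stress lies on the yield surface, so `yield_gap` is zero up to rounding, in agreement with the condition \(| \varvec{\sigma }^{D} - \varvec{q}^{D}| = \sqrt{2/3}\,\sigma _Y\) of Remark 3.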

4.2 Resolution of adjoint system

We denote by \(\tilde{\varvec{z}}_n= \tilde{\varvec{z}}(t_n)\) the discrete values of the adjoint, which is linearly interpolated in time on each sub-interval. We further discretize in space the time-discrete adjoint system (36) which was studied in Sect. 3 (and proved to be well-posed). The space time-discretized adjoint problem is defined by: \(\tilde{\varvec{z}}_ N = 0\) and, for \(n = N-1,\cdots , 1, 0\), find the solution \(\tilde{\varvec{z}}_{n}\in Z^h\) of

$$\begin{aligned}&\delta t\langle \nabla _{Z}m(\tilde{\varvec{w}}_{n+1}), \varvec{\varphi } \rangle + \delta t \int _{\Gamma _N}\nabla _{Z} p(\tilde{\varvec{w}}_{n+1})\varvec{\varphi }\ {\text {d}}s + \delta t a(\varvec{\varphi }, \tilde{\varvec{z}}_{n}) \nonumber \\&\quad + \left\langle \nabla _{Z}^2 \mathcal {D}_{\epsilon ,\eta }(\delta {\tilde{\varvec{w}}}_{n})\tilde{\varvec{z}}_{n} - \nabla _{Z}^2 \mathcal {D}_{\epsilon ,\eta } (\delta {\tilde{\varvec{w}}}_{n+1})\tilde{\varvec{z}}_{n+1},\varvec{\varphi }\right\rangle = 0 \qquad \forall \varvec{\varphi }\in Z^h . \end{aligned}$$
(61)

This system is solved backward in time. One must solve the state equation (60) up to the last time step, store the solutions \(\tilde{\varvec{w}}_n\) for every time step and retrieve them one by one, starting from the last time step. This is quite demanding in terms of memory for numerical simulations. Finally, the time-discretized shape derivative reads

$$\begin{aligned} J^{\prime }(\Omega )(\varvec{\theta }) = \sum _{n=0}^{N-1}\delta t\int _{\Gamma } \varvec{\theta }\cdot \varvec{n}\left( m(\tilde{\varvec{w}}_n) + \mathbb {C}(\varvec{\varepsilon }(\tilde{\varvec{u}}_{n+1}) - \tilde{\varvec{\varepsilon }}_{p, n+1}):(\varvec{\varepsilon }(\tilde{\varvec{v}}_n) - \tilde{\varvec{\varepsilon }}_{q,n}) + \mathbb {H}\tilde{\varvec{\varepsilon }}_{p, n+1}:\tilde{\varvec{\varepsilon }}_{q, n} + E_{iso}\tilde{\gamma }_{n+1}\tilde{\mu }_n +\nabla _{Z} \mathcal {D}_{\epsilon ,\eta }\left( \frac{\tilde{\varvec{w}}_{n+1}-\tilde{\varvec{w}}_n}{\delta t} \right) \tilde{\varvec{z}}_n - \varvec{f}(t_n)\cdot \tilde{\varvec{v}}_n \right) {\text{d}}s, \end{aligned}$$
(62)

where \((\tilde{\varvec{u}}_n,\tilde{\varvec{\varepsilon }}_{p,n}, \tilde{\gamma }_n ) = \tilde{\varvec{w}}_n\) and \((\tilde{\varvec{v}}_n,\tilde{\varvec{\varepsilon }}_{q,n}, \tilde{\mu }_n ) = \tilde{\varvec{z}}_n\).
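The forward and backward sweeps required by (61) can be organized as in the following sketch. The driver is generic: `advance_state` and `solve_adjoint_step` are hypothetical callables standing for one step of the state solver and one solve of (61), and the scalar toy problem at the end only exercises the bookkeeping.

```python
def forward_backward(w0, n_steps, advance_state, solve_adjoint_step, z_final):
    """Run the state forward, store it, then run the adjoint backward.

    advance_state(w_n, n) returns w_{n+1}.
    solve_adjoint_step(states, z_next, n) returns z_n from the stored states
    and z_{n+1}; the whole list is passed because (61) involves several
    neighbouring time steps.
    """
    # Forward sweep: all N+1 states are kept, which drives the memory cost.
    states = [w0]
    for n in range(n_steps):
        states.append(advance_state(states[n], n))

    # Backward sweep: z_N is prescribed, then z_{N-1}, ..., z_0 follow.
    adjoints = [None] * (n_steps + 1)
    adjoints[n_steps] = z_final
    for n in range(n_steps - 1, -1, -1):
        adjoints[n] = solve_adjoint_step(states, adjoints[n + 1], n)
    return states, adjoints

# Toy scalar problem, used only to exercise the two sweeps.
states, adjoints = forward_backward(
    w0=0.0,
    n_steps=3,
    advance_state=lambda w, n: w + 1.0,
    solve_adjoint_step=lambda ws, z_next, n: z_next + ws[n + 1],
    z_final=0.0,
)
```

When storing every state is too expensive, checkpointing strategies trade memory for recomputation. This is mentioned only as a general remark and is not part of the method described here.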

In numerical practice, denoting by L a characteristic length of the domain D, we choose the values of \(\epsilon ,\eta\) for penalization and regularization according to the following rule (see Desai 2021 for more details)

$$\begin{aligned} \epsilon = \left( \frac{h_{\min }}{L}\right) ^{1+d/2} \quad \hbox {and}\quad \eta = \epsilon ^2. \end{aligned}$$
(63)
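Rule (63) is straightforward to evaluate. The helper below is a minimal sketch; the function name and the argument names are ours.

```python
def penalization_parameters(h_min, length, dim):
    """Return (epsilon, eta) following rule (63).

    h_min  : minimal mesh size
    length : characteristic length L of the domain
    dim    : space dimension d
    """
    if h_min <= 0.0 or length <= 0.0:
        raise ValueError("h_min and length must be positive")
    epsilon = (h_min / length) ** (1.0 + dim / 2.0)
    eta = epsilon ** 2
    return epsilon, eta

eps_2d, eta_2d = penalization_parameters(h_min=0.01, length=1.0, dim=2)
eps_3d, eta_3d = penalization_parameters(h_min=0.01, length=1.0, dim=3)
```

For a fixed ratio \(h_{\min }/L<1\), the exponent \(1+d/2\) makes \(\epsilon\) smaller in 3D than in 2D.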

Remark 5

For all of our numerical experiments, we replace \(\tilde{\varvec{w}}_n\) in the adjoint equation (61) and in the shape derivative (62) by the solution obtained via radial return, \({\varvec{w}}_r(t_n)\in K^h\), which does not take into account the penalization and regularization. In formula (62) of the shape derivative, and more precisely in the term \(\nabla _{Z} \mathcal {D}_{\epsilon ,\eta }\), we neglect the contribution \(\tfrac{1}{\epsilon } M_{\eta }^{\prime }(\cdot )\). The reason is that we replace the penalized solution \(\tilde{\varvec{w}}(t)\) by the non-penalized one \(\varvec{w}_r(t)\). For the penalized solution, the contribution \(\tfrac{1}{\epsilon } M_{\eta }^{\prime }\left( \frac{\tilde{\varvec{w}}_{n+1}-\tilde{\varvec{w}}_n}{\delta t} \right)\) is of order \(\mathcal {O}(1)\) since it satisfies the problem (28). However, the same term is of order \(\mathcal {O}(1/\epsilon )\) for the non-penalized solution because the regularization \(M_{\eta }(s)\) of \(\max (0,s)\) is not exactly zero for negative values of s. To avoid this numerical artifact, we found it more efficient to simply cancel this term in (62).

4.3 Level-set method

The level-set method was introduced by Osher and Sethian (Osher and Fedkiw 2006) and adapted to the shape optimization framework in Allaire et al. (2002), Wang et al. (2003). This method describes the shape \(\Omega \subset \mathbb {R}^d\) by the level-set function \(\phi :D\rightarrow \mathbb {R}\) defined as

$$\begin{aligned} \left\{ \begin{array}{cl} \phi (x)<0 &{}\quad \text {if } x \in \Omega , \qquad \\ \phi (x)=0 &{}\quad \hbox {if } x \in \Gamma , \\ \phi (x)>0 &{}\quad \hbox {if } x \in \overline{\Omega }^c \end{array} \right. \end{aligned}$$

where \(\Gamma\) is the movable part of the boundary \(\partial \Omega\) and D is the design space as shown in Fig. 2. The crux of the method lies in letting the shape deform along a velocity field \(\varvec{\theta }:D\rightarrow \mathbb {R}^d.\) The evolution of the shape is governed by the transport equation

$$\begin{aligned} \frac{\partial \phi }{\partial t} + \varvec{\theta }\cdot \nabla \phi = 0. \end{aligned}$$
(64)

Very often, the velocity field is oriented along the normal, namely \(\varvec{\theta }=\theta \varvec{n}\) where \(\varvec{n} = \nabla \phi / |\nabla \phi |\) and the scalar function \(\theta\) is the normal velocity. In such a case, (64) can be re-written as a Hamilton-Jacobi equation

$$\begin{aligned} \frac{\partial \phi }{\partial t} + \theta | \nabla \phi | = 0 . \end{aligned}$$

In our numerical setting, we work with the linear transport equation (64) because we use non-Cartesian meshes and rely on the library advect (Bui et al. 2012), which solves (64) by the method of characteristics, known to be unconditionally stable. The level-set function is a \(\mathbb {P}^1\) function on a simplicial mesh. After every advection, the new shape is captured using a body-fitted mesh obtained using MMG (Dapogny et al. 2014).
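The principle of the method of characteristics for (64) can be illustrated in one space dimension. The sketch below is not the advect library: it assumes a uniform grid, a constant velocity and piecewise-linear interpolation, and all names are hypothetical. Each nodal value is read at the foot of the characteristic passing through that node.

```python
def interp_linear(x, values, xq):
    """Piecewise-linear interpolation on a uniform grid, constant outside it."""
    if xq <= x[0]:
        return values[0]
    if xq >= x[-1]:
        return values[-1]
    dx = x[1] - x[0]
    i = min(int((xq - x[0]) / dx), len(x) - 2)
    t = (xq - x[i]) / dx
    return (1.0 - t) * values[i] + t * values[i + 1]

def advect(phi, x, velocity, dt):
    """One step of d(phi)/dt + velocity * d(phi)/dx = 0 with constant velocity.

    The new value at x_i is the old value at the characteristic foot
    x_i - velocity * dt.
    """
    return [interp_linear(x, phi, xi - velocity * dt) for xi in x]

def zero_crossing(x, phi):
    """Position where phi changes sign from negative to non-negative."""
    for i in range(len(x) - 1):
        if phi[i] < 0.0 <= phi[i + 1]:
            t = -phi[i] / (phi[i + 1] - phi[i])
            return x[i] + t * (x[i + 1] - x[i])
    return None

x = [i / 50.0 for i in range(51)]
phi0 = [xi - 0.3 for xi in x]  # shape {phi < 0} = [0, 0.3)
phi1 = advect(phi0, x, velocity=0.5, dt=0.2)
interface = zero_crossing(x, phi1)
```

In this example the time step is five times larger than the ratio of the mesh size to the velocity, yet the interface is simply transported from 0.3 to 0.4, which illustrates why no CFL-type restriction is needed.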

The level-set method is known to capture rather smooth surfaces. It is even more the case when each new shape is precisely remeshed with MMG. However, it is not the only possible approach to obtain smooth shapes. Let us mention the recent work (Areias et al. 2021) which is also based on a remeshing strategy and a so-called screened Poisson equation for regularization, which is actually very similar to our own regularization process (65).

It is well known that the level-set method cannot nucleate new holes in the shape \(\Omega\), although it can easily merge or close them. However, in 3D, the level-set function can evolve in such a way that it digs into the boundary \(\Gamma ,\) creating holes in the shape. In any case, both in 2D and 3D, we initialize the shape optimization algorithm with a shape that has many holes (see Sect. 5).

4.4 Regularization and extension of the shape derivative

During optimization the produced shapes may not have a smooth boundary so the shape derivative may have no rigorous meaning on the boundary \(\Gamma\). In such a case, it is imperative to regularize the shape derivative (Burger 2003; De Gournay 2006; Allaire et al. 2021) in such a way that it is still a descent direction. One possibility is to consider the \(H^1\) scalar product instead of the \(L^2\) scalar product by finding a function \(dj(\Omega )\in H^1(D)\) such that

$$\begin{aligned} \int _{D}\left( h_{\min }^2\nabla dj(\Omega )\cdot \nabla \varphi + dj(\Omega )\varphi \right) {\text{d}}x = \int _{\Gamma } j^{\prime }(\Omega )\, \varphi \ {\text{d}}s\qquad \forall \varphi \in H^1(D), \end{aligned}$$
(65)

where \(h_{\min }\) is the fixed minimal mesh size, and the function \(j^{\prime }(\Omega )\) is defined by formula (62) with

$$\begin{aligned} J^{\prime }(\Omega )(\varvec{\theta }) = \int _{\Gamma } \varvec{\theta }\cdot \varvec{n} \, j^{\prime }(\Omega ) \, {\text{d}}s . \end{aligned}$$
(66)

Since we have chosen \(\mathbb {P}^1\) basis elements for the displacement vector and the plastic strain, the shape derivative in (62) is \(\mathbb {P}^0\) smooth and so \(j^{\prime }(\Omega ) \in \mathbb {P}^0(\Omega )\). Thus, it is enough to discretize (65) with \(\mathbb {P}^1\) finite elements, so that \(dj(\Omega )\in \mathbb {P}^1(D)\).
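
As an illustration of the regularization (65), the following sketch solves its one-dimensional analogue with \(\mathbb {P}^1\) elements. It is only a toy under stated assumptions: the domain is the unit interval, the "boundary" \(\Gamma\) reduces to a single mesh node, and the names `regularize_1d`, `gamma_node` and `j_prime` are hypothetical.

```python
import numpy as np

def regularize_1d(n_cells, h_min, gamma_node, j_prime):
    """Solve (h_min^2 K + M) dj = b with P1 elements on the unit interval."""
    h = 1.0 / n_cells
    n_nodes = n_cells + 1
    K = np.zeros((n_nodes, n_nodes))  # stiffness matrix: integral of u' v'
    M = np.zeros((n_nodes, n_nodes))  # mass matrix: integral of u v
    k_loc = np.array([[1.0, -1.0], [-1.0, 1.0]]) / h
    m_loc = h * np.array([[2.0, 1.0], [1.0, 2.0]]) / 6.0
    for e in range(n_cells):          # assemble element by element
        K[e:e + 2, e:e + 2] += k_loc
        M[e:e + 2, e:e + 2] += m_loc
    b = np.zeros(n_nodes)
    b[gamma_node] = j_prime           # in 1D the integral over Gamma is a point evaluation
    return np.linalg.solve(h_min**2 * K + M, b)

# Hypothetical data: unit shape derivative concentrated at the mid-node.
dj = regularize_1d(n_cells=100, h_min=0.01, gamma_node=50, j_prime=1.0)
descent_pairing = 1.0 * dj[50]        # discrete counterpart of the integral of j' * dj over Gamma
```

The pairing `descent_pairing` is positive because the system matrix is symmetric positive definite. This mirrors the continuous argument: taking \(\varphi = dj(\Omega )\) in (65) shows that \(\int _{\Gamma } j^{\prime }(\Omega )\, dj(\Omega )\, {\text{d}}s \ge 0\), so the regularized field remains aligned with the original derivative.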

4.5 Shape optimization algorithm

We consider the shape optimization problem

$$\begin{aligned} \min _{\Omega \in \mathcal {U}_{ad}}J(\Omega ), \end{aligned}$$

where we remind the reader that \(\mathcal {U}_{ad}\) is the set of admissible shapes inside the design space D (see Fig. 2). In order to devise an optimization strategy taking the volume constraint into account, we introduce a Lagrangian \(\mathcal {L}(\tilde{\varvec{w}}, \tilde{\varvec{z}}, \Omega ,\lambda )\) defined as

$$\begin{aligned} \mathcal {L}(\tilde{\varvec{w}}, \tilde{\varvec{z}}, \Omega ,\lambda ) = \frac{J(\Omega )}{C_1} + \frac{\lambda }{C_2} \left( \int _{\Omega }\ {\text{d}}x - V_f\right) \end{aligned}$$
(67)

where \(\lambda\) is the Lagrange multiplier for the volume constraint and \(C_1, C_2\) are two normalization constants. Starting from some initial shape \(\Omega _0\), these constants are chosen as

$$\begin{aligned} C_1 = \int _{\partial \Omega _0}| j^{\prime }(\Omega _0)|\ {\text{d}}s,\quad C_2 = \left| \int _{\Omega _0}\ {\text{d}}x - V_f\right| , \end{aligned}$$
(68)

where \(j^{\prime }\) is the integrand of the shape derivative, defined in (66), and the initial volume is usually larger than the target volume \(V_f\). In the context of a gradient algorithm, the descent step is also a pseudo-time step for the level-set transport equation (64), denoted by \(\tau\), which we choose as

$$\begin{aligned} \tau = \frac{h_{\min }}{2}, \end{aligned}$$
(69)

where \(h_{\min }\) is the minimal mesh size at the first iteration. The number of gradient descent iterations is \(I_{\max }=200\). The volume constraint is not enforced at each iteration, but the volume converges to its target value by applying a gradient algorithm to the Lagrange multiplier with the same step \(\tau\). We thus perform Algorithm 1.

Algorithm 1 Repeat over \(i = 0, \cdots , I_{\max }\)

1. Solve for \(\tilde{\varvec{w}}\) using the radial return algorithm on the mesh of \(\Omega _i\), starting from \(t_1\) until the last time \(t_N\).

2. Solve for the adjoint \(\tilde{\varvec{z}}\) using (61) on the mesh of \(\Omega _i\), starting from the last time \(t_N\) until \(t_1\).

3. Compute the regularized shape derivative \(dj(\Omega _i)\) by solving (65) with the right-hand side (62).

4. Apply a gradient ascent algorithm, with step \(\tau\) given by (69), to get

\(\lambda _{i+1}=\lambda _{i}+\frac{\tau }{C_2}\left( \int _{\Omega _i}{\text{d}}x-V_f\right) .\)

5. Set \(\varvec{n} = \nabla \phi _i\), where \(\phi _i\) is the level-set function for \(\Omega _i\), and solve (64) with the initial data \(\phi _i\) and the velocity

\(\varvec{\theta }_{i}=\left( \frac{dj(\Omega _i)}{C_1}+\frac{\lambda _{i+1}}{C_2}\right) \varvec{n}\)

for a pseudo-time step \(\tau\) to obtain \(\tilde{\phi }_{i+1}\).

6. Re-initialize \(\tilde{\phi }_{i+1}\) to the signed distance function, using mshdist [14], to obtain \(\phi _{i+1}\) corresponding to the new shape \(\Omega _{i+1}\).

7. Compute the volume \(V_{i+1}\). If \(|V_{i+1} - V_f|\le 10^{-5} V_f\), then update the level-set \(\phi_{i+1}\) by adding to it the constant \(\phi_{i+1}\).

8. Remesh the box D using MMG [13] to obtain the body-fitted mesh of the new shape \(\Omega_{i+1}\).
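
Steps 4 and 5 of the algorithm can be sketched as two small functions. This is only a transcription of the stated formulas under assumptions: the function names, the initial multiplier \(\lambda _0 = 0\), and every numerical value (including the placeholder for \(C_1\) and the nodal values of \(dj\)) are hypothetical and are not taken from the test-cases.

```python
import numpy as np

def update_multiplier(lam, volume, v_target, tau, c2):
    """Step 4: gradient ascent on the Lagrange multiplier."""
    return lam + tau / c2 * (volume - v_target)

def normal_velocity(dj, lam_next, c1, c2):
    """Step 5: scalar factor multiplying the normal n in the advection velocity."""
    return dj / c1 + lam_next / c2

# Hypothetical numbers, chosen only to exercise the formulas.
h_min = 0.01
tau = h_min / 2.0                   # step size (69)
v0, v_target = 1.0, 0.7             # initial and target volumes (assumed)
c1 = 2.0                            # placeholder for the boundary integral in (68)
c2 = abs(v0 - v_target)             # normalization (68)
dj = np.array([-0.4, 0.0, 0.6])     # placeholder nodal values of dj(Omega_0)

lam1 = update_multiplier(0.0, v0, v_target, tau, c2)   # lambda_0 = 0 is an assumption
theta = normal_velocity(dj, lam1, c1, c2)
```

With \(\lambda _0 = 0\) and an initial volume larger than \(V_f\), the normalization by \(C_2\) makes the first update equal to \(\lambda _1 = \tau\), independently of the size of the initial volume gap.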

Remark 6

Once the working domain D is remeshed in the last step of Algorithm 1, the volume tolerance \(|V_{i+1} - V_f|\le 10^{-5} V_f\) of the previous step is no longer satisfied. One has to move the mesh points of the remeshed shape \(\Omega _{i+1}\) using the lag 0 option of MMG to ensure that the volume tolerance is satisfied.

5 Results

This section displays 2D and 3D optimization results with three minimization criteria: total compliance (30), total energy (72) and plastic energy (73). In each case a volume constraint \(|\Omega | = V_f\) is imposed and Algorithm 1 is applied. The structure is composed of mild steel with the properties \(E = 210\;{\text{GPa}},\) \(\nu = 0.3,\) \(\sigma _Y = 279\;{\text{MPa}},\) \(E_{iso} = 712\;{\text{MPa}}\). For all test-cases in this section except the one corresponding to (74), we consider a force \(\varvec{g}\) that increases from zero to a final value in one second in a constant direction, with a time step \(\delta t = 0.05\;{\text{s}}\). The time-discretized adjoint equation (61) is solved using \(\epsilon ,\eta\) given in (63).

Fig. 3 2D cantilever boundary conditions

5.1 2D cantilever

We study a \(2\;{\text{m}}\times 1\;{\text{m}}\) 2D cantilever beam which is partially clamped on its left side (the Dirichlet boundary is slightly shorter than the left edge of the beam), while a vertical concentrated force is applied at the middle of the right side of the beam (see Fig. 3). The left side is not completely clamped in order to allow the shape to move around \(\Gamma _D\) and to avoid the plastic zone which often appears around the Dirichlet boundary. A target volume \(V_f = 0.7\;{\text{m}}^2\) is imposed. Under the quasi-static assumption, the rate of force increment has no impact on the solution at the final time \(t=1\). However, the rate does impact the objective function (30). If the force grows faster in the beginning and then slowly after the onset of plasticity, the objective function is influenced more by the plastic flow. To see a greater impact of the plastic flow on the shape derivative (and hence the shape), we choose

$$\begin{aligned} \varvec{g}=(0, 220\min (1.5t, 1) )MN/m,\, t\in [0,1]s. \end{aligned}$$
(70)
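
The load history (70) can be tabulated on the time grid as follows. This is a minimal sketch: it assumes the grid \(t_n = n\,\delta t\) with \(\delta t = 0.05\) s, and the function name `load_70` is hypothetical.

```python
import numpy as np

def load_70(t, peak=220.0):
    """Vertical component of (70) in MN/m: linear ramp up to t = 2/3 s, then constant."""
    return peak * np.minimum(1.5 * t, 1.0)

dt = 0.05
times = np.arange(1, 21) * dt   # assumed grid t_1, ..., t_N with N = 20 and t_N = 1 s
g_y = load_70(times)
```

On this grid the load reaches its plateau of 220 MN/m from \(t = 0.7\) s onwards, so the last seven steps are computed under a constant force.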

Fig. 4 Von Mises stress at \(t=1\)s corresponding to various shapes for a target volume \(V_f=0.7\;{\text{m}}^2\) and force (70)

Fig. 5 Convergence history corresponding to shapes (4b) and (4d)

Fig. 6 Plastic zones (\(\gamma >0\)) at \(t = 1\)s computed for shapes (4b) and (4d)

Table 1 2D Cantilever shape comparison for force (70)

The parameters of the remeshing tool MMG are fixed to \(h_{\min } = 0.01\;{\text{m}}\) (minimal mesh size), \(h_{\max } = 0.02\;{\text{m}}\) (maximal mesh size). First, we minimize the total compliance (30). The initial shape and the final shapes for the linear elasticity and plasticity models are shown in Fig. 4. Let us first note that the presence, or not, of the hardening tensor \(\mathbb {H}\) hardly changes the resulting optimized shape (Fig. 4c, d). As can be seen in Fig. 4b, d, the optimized shapes for linear elasticity and plasticity are very similar. The only slight difference is near the Dirichlet boundary condition, where the bars are thicker for the plasticity case. It turns out that the displacement for linear elasticity is numerically very close to the one for plasticity. Although the plastic deformation \(\varvec{\varepsilon }_p\) does contribute to the shape derivative for the plasticity case, it does not induce a different topology compared to the elasticity case. The convergence history for the total compliance is depicted in Fig. 5.

To quantitatively compare the two optimized shapes in Fig. 4b (elasticity) and Fig. 4d (plasticity), we perform a plasticity computation for both of them with \(E_{iso}=712\;{\text{MPa}},\, \mathbb {H}=105\mathbb {I}\;{\text{MPa}}\) and the force (70). The plastic zones (where \(\gamma >0\)) at time \(t=1\)s along with the mesh are plotted in Fig. 6 and the total compliance (30) is reported in Table 1. In Fig. 6, we observe that the plastic zones are slightly smaller for (4d) compared to (4b). As seen in Table 1, the total compliance for the cantilever beam obtained for plasticity is \(2.75\%\) lower than the one obtained for the linear elasticity case. This improvement is real but modest. On the other hand, Table 1 confirms that Fig. 4b is (slightly) better than Fig. 4d for linear elasticity.

Next, we investigate whether changing a few parameters of the previous test-case (external force, optimization criterion or initialization) results in a drastic change of the plastic zone. Specifically, we investigate three variations.

  1. Increase the external force to

    $$\begin{aligned} \varvec{g}=(0, 400\min (1.5t, 1) )MN/m,\, t\in [0,1]s\end{aligned}$$
    (71)

    such that the entire shape undergoes a plastic deformation.

  2. Consider two new criteria for minimization: total energy

    $$\begin{aligned} J(\Omega ) = \int _0^T\int _{\Omega }\frac{1}{2}\left( \mathbb {C}\varvec{\varepsilon }_e:\varvec{\varepsilon }_e + \mathbb {H}\varvec{\varepsilon }_p:\varvec{\varepsilon }_p + E_{iso}\gamma ^2 \right) {\text{d}}x \end{aligned}$$
    (72)

    and energy due to kinematic hardening

    $$\begin{aligned} J(\Omega ) = \int _0^T\int _{\Omega }\frac{1}{2}\mathbb {H}\varvec{\varepsilon }_p:\varvec{\varepsilon }_p {\text{d}}x , \end{aligned}$$
    (73)

    in addition to the total compliance criterion (30).

  3. Consider three different initializations (as shown in Fig. 7) for total compliance minimization.

Fig. 7 Von Mises stress at \(t=1\)s for the initial shapes (on the left) and optimized shapes for total compliance (30) (on the right), with \(V_f = 0.7\;{\text{m}}^2,\) \(E_{iso}=712\;{\text{MPa}}\), \(\mathbb {H}=105\mathbb {I}\;{\text{MPa}}\) and force (71)

The shapes obtained for the three different initializations are plotted in Fig. 7, their corresponding compliances (30) are presented in Table 2, and their convergence histories are depicted in Fig. 8. As expected, we obtain three different topologies. In Fig. 7, we observe that plastic deformation occurs everywhere in the optimal shapes. This was not expected, as yielding should have resulted in a high accumulated plastic deformation and hence a high total compliance. What actually happens is that, when the shapes reach the yield point, hardening occurs. Once the shape hardens, its load-bearing capacity increases. Hence the optimal shapes are the ones that strike a balance between hardening and plastic deformation. Consequently, it is unrealistic to expect a dramatic reduction in the size of the plastic zones. As seen in Table 2, the cantilever beam is best optimized if initialized by the solution obtained for the linear elasticity case (Fig. 7c). In Fig. 8, we see almost no decrease in the objective function for the shape of Fig. 7f. This means that the shape obtained for the linear elasticity case is almost optimal for plasticity.

The shapes obtained for different objective functions, namely total energy (72) and plastic energy (73), are plotted in Fig. 9. The shapes 9a, b are similar to the previous shapes of Fig. 7d, f, respectively. In both cases they were initialized with Fig. 7c. Again, the size of the plastic zone (where \(\gamma >0\)) has not decreased. We believe it is because plastic zones are hardened zones and, as a result, are necessary for minimizing the total energy or the plastic energy.

Fig. 8 Convergence history for the shapes (Fig. 7b, d, f)

Fig. 9 Von Mises stress at \(t=1\)s for optimized shapes, initialized from (7c), with \(V_f = 0.7\;{\text{m}}^2,\) \(E_{\text{iso}}=712\;{\text{MPa}}\), \(\mathbb {H}=105\mathbb {I}\;{\text{MPa}}\) and force (71)

Table 2 2D Cantilever shape comparison for force (71)

5.2 2D wedge

We study a 2D wedge (Fig. 10) which is fixed on its leftmost leg, has a vanishing vertical displacement on its rightmost leg, and is loaded on the middle of its upper boundary.

Fig. 10 2D wedge boundary conditions

Fig. 11 Von Mises stress at \(t=1\)s for the initial shapes (on the left) and the optimized shapes for total compliance (30) (on the right), with \(V_f = 0.2\;{\text{m}}^2,\) \(E_{iso}=712\; {\text{MPa}}\), \(\mathbb {H}=105\mathbb {I}\; {\text{MPa}}\)

A target volume \(V_f = 0.2\;{\text{m}}^2\) is imposed. As before, the force grows in the beginning and then remains constant

$$\begin{aligned} \varvec{g}=(0, 500\min (1.5t, 1) )MN/m,\, t\in [0,1]s. \end{aligned}$$

The parameters of the remeshing tool MMG are fixed to \(h_{\text{min}} = 0.005 \;{\text{m}}\) (minimal mesh size), \(h_{\max } = 0.01\; {\text{m}}\) (maximal mesh size). Isotropic hardening combined with kinematic hardening is considered using the parameters \(E_{\text{iso}} = 712\;{\text{MPa}},\) \(\mathbb {H}=105\mathbb {I}\;{\text{MPa}}\). Two initializations are considered for the minimization of total compliance (30): one consisting of periodically distributed holes (Fig. 11a), the other being the optimal shape for compliance minimization in linear elasticity (see Fig. 11c). The two initializations result in two different shapes, as shown in Fig. 11b, d. The corresponding convergence histories are plotted in Fig. 12. As can be seen in Fig. 12, the shape optimized for linear elasticity performs better in terms of total compliance than the shape optimized for plasticity (Fig. 11d), starting from a periodically perforated initialization. Once again, this stresses the importance of the initialization. In all cases, the optimized shapes undergo plastic deformation everywhere in the solid.

Fig. 12 Convergence history for the shapes (Fig. 11b, d)

Finally we consider the case of a force whose direction changes in time, defined as

$$\begin{aligned} \varvec{g} = \left( 80 \cos \left( \frac{\pi t}{2}\right) \left| \sin \left( 3\pi t\right) \right| , -80\sin \left( \frac{\pi t}{2}\right) \left| \sin \left( 3\pi t\right) \right| \right) MPa \qquad \hbox {for } t\in [0,1]s. \end{aligned}$$
(74)

To get an intuitive idea of the direction of forcing, we also plot the force vectors for a few time steps in Fig. 13.
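
The force (74) can also be evaluated numerically at a few instants. The sketch below is only illustrative; the function name `rotating_force` and the sample time \(t = 0.5\) s are our own choices.

```python
import math

def rotating_force(t, amplitude=80.0):
    """Components of the force (74) at time t, in the units used in (74)."""
    envelope = amplitude * abs(math.sin(3.0 * math.pi * t))  # magnitude of the force
    angle = 0.5 * math.pi * t                                # rotation angle of the direction
    return (envelope * math.cos(angle), -envelope * math.sin(angle))

gx, gy = rotating_force(0.5)
magnitude = math.hypot(gx, gy)
```

The magnitude of (74) is \(80\,|\sin (3\pi t)|\), which vanishes at \(t = 0, 1/3, 2/3, 1\) s: the structure goes through three loading and unloading cycles while the direction of the force rotates from the positive horizontal axis towards the negative vertical axis.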

Fig. 13 Rotating force (74) applied to the wedge

Fig. 14 Von Mises stress at \(t = 1\)s for the shape optimized for total compliance (30), force (74), \(V_f=0.2\;{\text{m}}^2\)

Only for this test-case, the time step for plasticity is taken smaller, \(\delta t = 0.01\;{\text{s}}\). This implies that there are at least 100 time steps for solving the plasticity equations and the adjoint system. The initial shape is the same as in Fig. 11a. We minimize the total compliance (30) for plasticity as well as for linear elasticity. For the linear elasticity case, the displacement vector \(\varvec{u}\) is computed for the force (74) at every time step assuming quasi-static evolution. Because of this assumption, \(\varvec{u}\) does not depend on the forcing trajectory (this could also be seen as a multiple loading test-case). In the case of plasticity, however, the forcing trajectory plays an important role in influencing \(\varvec{u}\), as the shape undergoes plastic deformation at every time step. This test-case is thus indicative of the role the forcing trajectory plays in shape optimization for plasticity. The shape optimized for the force (74) in linear elasticity is plotted in Fig. 14a and the one for combined hardening in Fig. 14b. Table 3 compares the two shapes (14a) and (14b). As anticipated, the shape of Fig. 14b performs better for plasticity.

Table 3 2D wedge shape comparison for force (74)

Fig. 15 3D cantilever boundary conditions

5.3 3D cantilever

We now consider the minimization of the total compliance (30) for a 3D cantilever beam of dimensions \(5\;{\text{m}}\times 2.4\;{\text{m}}\times 3\; {\text{m}},\) as shown in Fig. 15. The cantilever beam is fixed on its leftmost side and loaded downwards on a circular region of radius \(0.1\;{\text{m}}\) on its rightmost side with \(\varvec{g} =(0, 5000 t, 0)\;{\text{MN/m}}\) where \(t\in [0,1]\)s. For this test-case, we consider combined hardening with \(E_{\text{iso}} = 712\;{\text{MPa}},\) \(\mathbb {H}=105\mathbb {I}\;{\text{MPa}}\) and a target volume \(V_f = 12\;{\text{m}}^3.\) The parameters of MMG are set to \(h_{\text{min}} = 0.04\;{\text{m}}\) and \(h_{\text{max}} = 0.12\;{\text{m}}\). We initialize the shape optimization with a perforated shape as in Fig. 16a. Learning from the previous test-cases, we also initialize with the shape obtained after minimizing compliance for linear elasticity (see Fig. 16c). The optimization from the initialization in Fig. 16a is run for longer, 250 iterations instead of 200 iterations as in the other test-cases. This is because an initialization with holes is far from the optimum and it takes longer to converge to a form with plate-like features (which is known to be optimal for maximizing rigidity). As seen in Fig. 16, the two initializations result in the same shape (Fig. 16b, d). Their corresponding convergence histories are plotted in Fig. 17. The shapes (16b, d) are compared quantitatively in Table 4. As seen in Fig. 17, it takes a long time for the optimization started from shape (16a) to converge, whereas the one started from shape (16c) converges in the first few iterations. Consequently, we conclude that it is often advantageous to first optimize the shape for linear elasticity, and then use the optimized shape as initialization to minimize for plasticity.

Fig. 16 Von Mises stress at \(t=1\)s for the initial shapes (on the left) and the optimized shapes for total compliance (30) (on the right), with \(V_f = 12\;{\text{m}}^3,\) \(E_{iso}=712\;{\text{MPa}}\), \(\mathbb {H}=105\mathbb {I}\;{\text{MPa}}\)

Fig. 17 Convergence history for shapes (16b, d)

Table 4 3D Cantilever shape comparison

Fig. 18 3D wedge boundary conditions

5.4 3D wedge

We now consider a 3D wedge of dimensions \(1.2\;{\text{m}}\times 0.6\;{\text{m}}\times 0.6\;{\text{m}}\) as shown in Fig. 18. The geometry is supported on four square surfaces, each being \(0.05\;{\text{m}}\times 0.05\;{\text{m}},\) three of which can be seen in Fig. 18. The wedge is clamped along all three axes on one surface and only along the y-direction on the remaining three surfaces. The wedge is loaded on a square surface on the topmost plane with \(\varvec{g} =(0, -500t, 0)\;{\text{MN/m}}\) where \(t\in [0,1]\)s. The parameters of MMG are set to \(h_{\text{min}} = 0.012\;{\text{m}}\) and \(h_{\text{max}} = 0.032\;{\text{m}}\). We consider combined hardening with \(E_{iso} = 712\;{\text{MPa}},\) \(\mathbb {H} = 105\mathbb {I}\;{\text{MPa}}\) and impose a target volume of \(V_f = 0.07\;{\text{m}}^3.\) Optimized shapes for linear elasticity and plasticity are displayed in Fig. 19. Again, we consider two initializations: one with periodically distributed holes and one obtained by minimizing compliance for linear elasticity. They yield two topologically different optimized shapes, as shown in Fig. 19. As can be seen in Table 5, the shape (19d) outperforms the shape (19b) in terms of (30) in plasticity as well as in linear elasticity.

Fig. 19 Von Mises stress at \(t=1\)s for the initial shapes (on the left) and the optimized shapes for total compliance (30) (on the right), with \(V_f = 0.07\;{\text{m}}^3,\) \(E_{\text{iso}}=712\;{\text{MPa}}\), \(\mathbb {H}=105\mathbb {I}\;{\text{MPa}}\)

Table 5 3D wedge shape comparison