1 Introduction

We consider the numerical treatment of a geometrically nonlinear hyperelastic planar Cosserat shell model. This model has been obtained by dimensional reduction from a full three-dimensional Cosserat continuum model. Its degrees of freedom are the deformation m of the shell midsurface, together with the orientation of an orthonormal director triple \(\overline{R}\) at each point. Consequently, if \(\omega \) denotes the two-dimensional parameter domain, configurations of such a shell are pairs of functions

$$\begin{aligned} (m,\,\overline{R}){\text {:}}\,\omega \rightarrow {\mathbb {R}}^3 \times \text {SO(3)}, \end{aligned}$$

of suitable smoothness, where \(\text {SO}(3),\) the special orthogonal group, is the set of orthogonal \(3 \times 3\) matrices with determinant 1. We consider a hyperelastic material law of the form

$$\begin{aligned} I(m,\,\overline{R})= & {} \int _\omega hW_{\mathrm{mp}}(\overline{U})+ hW_{\mathrm{curv}}\left( {\mathfrak {K}}_s\right) \nonumber \\&+ \,\frac{h^3}{12} W_{\mathrm{bend}}\left( {\mathfrak {K}}_b\right) \mathrm {d}\omega + \text {external loads}, \end{aligned}$$
(1)

where \(W_\mathrm{mp}\) is the membrane energy, \(W_\mathrm{bend}\) is the bending energy, and \(W_\mathrm{curv}\) is a curvature term depending only on the orientation field \(\overline{R}.\) This energy, originally proposed in [31, 34], is a first-order model, frame-invariant, and allows for large elastic strains and finite rotations. The membrane contribution \(W_\mathrm{mp}\) is polyconvex and uniformly Legendre–Hadamard-elliptic. Existence of minimizers in the space \(H^1(\omega ,\,{\mathbb {R}}^3) \times W^{1,q}(\omega ,\,\text {SO}(3))\) has been shown in [31, 34] for any \(q\ge 2.\)

In this article we consider planar shells only, i.e., we assume that the undeformed configuration \((m_0,\,\overline{R}_0){\text {:}}\, (x,\,y) \mapsto ((x,\,y,\,0),\,\text {Id})\) is a stress-free state. However, our numerical treatment can also be generalized to a general nonplanar shell model. We arrive at the planar model in two steps: first, dimensional reduction of a parent three-dimensional Cosserat model yields a shell model with a quadratic membrane energy, suitable only for small membrane strains. We then generalize this shell model to obtain the finite-strain membrane term.

The shell formulation presented here is closely related to the theory of six-parameter shells with drilling rotations [12, 18, 27, 50]. A detailed comparison between the two approaches in the case of plates has been given in [6, 7], where we have shown existence results for isotropic, orthotropic, and composite plates. In [10] we have adapted the methods of [31] to prove the existence of minimizers for geometrically nonlinear six-parameter shells. In [9] we have considered shells insensitive to drilling rotations, and established a useful representation theorem for this case (corresponding to the Cosserat couple modulus \(\mu _c=0\)). Readers accustomed to the notation of the engineering literature may find a more accessible description of our shell model in [9, 10].

We also mention that the kinematic assumption underlying our Cosserat shell formulation is similar to the one used in describing a viscoelastic membrane, see [32, 49]. Indeed, the viscoelastic membrane is based on the same kinematics, but the independent rotations are evolving through a local evolution equation, whereas for the Cosserat planar shell model, they are determined by energy minimization.

Problems with directional or orientational degrees of freedom are notoriously difficult to discretize. This difficulty is caused by the nonlinearity of the orientation configuration space \(W^{1,q}(\omega ,\,\text {SO}(3))\) (or, in fact, any space of functions mapping into \(\text {SO}(3)\)). As a consequence, discretization methods based on piecewise linear or piecewise polynomial functions cannot be formulated directly for such spaces. Instead, previous discretizations have used ad hoc approaches, each with its particular shortcomings.

An obvious approach uses Euler angles to describe the rotations, and finite elements (FEs) to discretize the angles [54]. However, this leads to instabilities near certain configurations, and such models are suitable only for situations with moderately large rotations [22]. Also, the resulting discrete models are generally not objective.

Alternatively, rotations can be interpolated by means of the Lie algebra \({\mathfrak {so}}(3),\) i.e., the tangent space at the identity rotation. A rotation \(R \in \text {SO}(3)\) is represented as a rotation vector \(a \in {\mathfrak {so}}(3)\) with \(R = \exp a.\) Since \({\mathfrak {so}}(3)\) is a linear space, the rotation vectors a can be interpolated normally using FEs of first or higher order [28, 29]. This approach works only for orientation values bounded away from the cut locus of the identity rotation. To deal with larger rotations, [29] switches to a different tangent space when large rotations are detected.

Unfortunately, using a fixed tangent space for interpolation introduces a preferred direction into the discrete model. The discrete solution therefore depends on the orientation of the observer, and objectivity is not preserved.
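The loss of objectivity can be observed in a small numerical experiment. The following Python sketch is an illustration only and not the implementation of [28, 29]; the helper names `hat`, `exp_so3`, `log_so3`, and `interpolate_at_identity`, as well as the test data, are assumptions made for this example. It interpolates two rotations in the tangent space at the identity, and then repeats the interpolation after both rotations have been multiplied from the left by a fixed rotation Q. For an objective scheme the second result would equal Q times the first.

```python
import numpy as np

def hat(a):
    """Skew-symmetric matrix in so(3) associated with the vector a in R^3."""
    return np.array([[0.0, -a[2], a[1]],
                     [a[2], 0.0, -a[0]],
                     [-a[1], a[0], 0.0]])

def exp_so3(a):
    """Matrix exponential of hat(a), evaluated with the Rodrigues formula."""
    theta = np.linalg.norm(a)
    A = hat(a)
    if theta < 1e-12:
        return np.eye(3) + A  # first-order expansion near the identity
    return (np.eye(3) + np.sin(theta) / theta * A
            + (1.0 - np.cos(theta)) / theta**2 * (A @ A))

def log_so3(R):
    """Rotation vector of R; valid for rotation angles strictly below pi."""
    c = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(c)
    w = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    if theta < 1e-12:
        return 0.5 * w
    return theta / (2.0 * np.sin(theta)) * w

def interpolate_at_identity(R0, R1, t):
    """Linear interpolation of rotation vectors in the fixed space so(3)."""
    return exp_so3((1.0 - t) * log_so3(R0) + t * log_so3(R1))

# Illustrative data (assumed values; all rotation angles are well below pi).
R0 = exp_so3(np.array([0.0, 0.0, 0.3]))
R1 = exp_so3(np.array([0.0, 1.0, 0.0]))
Q = exp_so3(np.array([0.9, 0.0, 0.0]))  # change of observer

mid = interpolate_at_identity(R0, R1, 0.5)
mid_rotated = interpolate_at_identity(Q @ R0, Q @ R1, 0.5)

# A nonzero value means that rotating the data and interpolating do not commute.
objectivity_defect = np.linalg.norm(Q @ mid - mid_rotated)
print(objectivity_defect)
```

The printed defect is small but clearly nonzero for these data, which is the observer dependence described above.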

For their model of a shell with a single director, Simo and Fox [43] propose to avoid nonlinear interpolation altogether. Instead, they introduce the director vector directions at the quadrature points as separate variables [45]. The discrete problem is solved using a Newton method. After each Newton step, the correction is interpolated from the vertices to the quadrature points. This is easily possible, since the corrections are elements of a tangent space (and hence a linear space). A similar approach is used in [15, 16] in the context of isogeometric analysis, where NURBS basis functions are employed for a geometrically exact representation of the director vector at the quadrature points. However, for a related model [44], Crisfield and Jelenić [14] showed that this approach leads to an artificial path dependence of the solution. An additional disadvantage is that discretization and solution algorithm are not clearly separated. This makes analyzing the method difficult.

One last approach regards the manifold \(\text {SO}(3)\) as a submanifold of a linear space. One can then interpolate in this space, and project the result back onto the manifold. To the knowledge of the authors this approach has never been used for shell models. For harmonic maps into the unit sphere it has been proposed and analyzed in [4]. The approach is attractive for its simplicity. However, the result of the discrete problem depends on the embedding. This is of particular importance in the case of rotations, which can be interpreted as a submanifold of \({\mathbb {R}}^{3 \times 3}\) (in which case the projection is the polar decomposition), but also (as quaternions) as a submanifold of \({\mathbb {R}}^4\) (see “Quaternion coordinates for SO(3)” in Appendix). Furthermore, the approach has only been investigated for discretizations of first order, and it is unclear whether higher approximation orders are possible as well.
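As a minimal sketch of the projection idea for the embedding into \({\mathbb {R}}^{3 \times 3},\) the following Python code interpolates two rotation matrices linearly and projects the result back onto \(\text {SO}(3)\) by taking the orthogonal polar factor, computed here through a singular value decomposition. The function names `project_to_so3` and `projected_interpolation` are hypothetical, and the code is not taken from [4].

```python
import numpy as np

def project_to_so3(A):
    """Orthogonal polar factor of A, computed via the SVD A = U S V^T.

    The sign correction keeps the result in SO(3) if det(U V^T) is negative.
    """
    U, _, Vt = np.linalg.svd(A)
    sign = 1.0 if np.linalg.det(U @ Vt) >= 0.0 else -1.0
    return U @ np.diag([1.0, 1.0, sign]) @ Vt

def projected_interpolation(R0, R1, t):
    """Interpolate linearly in R^{3x3}, then project back onto SO(3)."""
    return project_to_so3((1.0 - t) * R0 + t * R1)

def rot_z(theta):
    """Rotation about the z-axis by the angle theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Midpoint between the identity and a rotation about z by 1 radian.
R_mid = projected_interpolation(np.eye(3), rot_z(1.0), 0.5)
```

For two rotations about a common axis, the midpoint obtained this way is the rotation by half the angle, because the averaged matrix is a positive multiple of that rotation in the plane orthogonal to the axis.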

In this article we propose a new discretization based on geodesic FEs (GFEs), which overcomes most of the shortcomings of the previous methods. GFEs, originally introduced in [40, 41], are a natural generalization of standard Lagrangian FEs to spaces of functions mapping into a general Riemannian manifold M. The core idea is to write Lagrangian interpolation \(T_\mathrm{ref} \rightarrow {\mathbb {R}}\) of values \(v_1,\ldots ,v_m \in {\mathbb {R}}\) on a reference element \(T_\mathrm{ref}\) as a minimization problem

$$\begin{aligned} \xi \mapsto \mathop {\text {arg min}}\limits _{{w \in {\mathbb {R}}}} \sum _{i=1}^m \lambda _i(\xi ) \left| {v_i -w}\right| ^2, \end{aligned}$$

where the \(\lambda _1, \ldots , \lambda _m{\text {:}}\,T_\mathrm{ref} \rightarrow {\mathbb {R}}\) are the Lagrangian shape functions. For values \(v_1, \ldots , v_m\) in a Riemannian manifold M,  this formulation can be generalized using the Riemannian distance

$$\begin{aligned} \xi \mapsto \mathop {\text {arg min}}\limits _{{w \in M}} \sum _{i=1}^m \lambda _i(\xi ) \text {dist}\left( v_i,\,w\right) ^2. \end{aligned}$$

This construction is also known as the Karcher mean [24] or the Riemannian center of mass. It forms the basis of a general FE theory for functions mapping into a manifold M [40, 41]. FE spaces constructed this way are conforming in the sense that FE functions belong to the Sobolev space \(W^{1,q}(\omega ,\,M)\) for all \(q \ge 2.\) Since their formulation is based on metric properties of M,  they are naturally equivariant under isometries of M. Optimal a priori discretization error bounds have been given in [20].
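In the scalar case the minimizer is simply \(\sum _i \lambda _i(\xi ) v_i,\) because the shape functions sum to one. For \(M = \text {SO}(3)\) no closed formula is available in general, and the minimizer has to be computed iteratively. The following Python sketch uses a simple fixed-point iteration in which the weighted sum of logarithms is mapped back with the exponential map. This is one common way to compute a Karcher mean and is an assumption of this example, not necessarily the algorithm used in [40, 41]; all function names are hypothetical, and the weights in the example are nonnegative.

```python
import numpy as np

def hat(a):
    """Skew-symmetric matrix in so(3) associated with the vector a in R^3."""
    return np.array([[0.0, -a[2], a[1]],
                     [a[2], 0.0, -a[0]],
                     [-a[1], a[0], 0.0]])

def exp_so3(a):
    """Matrix exponential of hat(a), evaluated with the Rodrigues formula."""
    theta = np.linalg.norm(a)
    A = hat(a)
    if theta < 1e-12:
        return np.eye(3) + A
    return (np.eye(3) + np.sin(theta) / theta * A
            + (1.0 - np.cos(theta)) / theta**2 * (A @ A))

def log_so3(R):
    """Rotation vector of R; valid for rotation angles strictly below pi."""
    c = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(c)
    w = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    if theta < 1e-12:
        return 0.5 * w
    return theta / (2.0 * np.sin(theta)) * w

def karcher_mean(rotations, weights, tol=1e-12, max_iter=100):
    """Weighted Riemannian center of mass on SO(3) by fixed-point iteration."""
    # Start at the value with the largest weight.
    R = np.array(rotations[int(np.argmax(weights))], dtype=float)
    for _ in range(max_iter):
        # Weighted sum of the logarithms, expressed in the tangent space at R.
        delta = sum(w * log_so3(R.T @ V) for w, V in zip(weights, rotations))
        if np.linalg.norm(delta) < tol:
            break
        R = R @ exp_so3(delta)
    return R

def rot_z(theta):
    """Rotation about the z-axis by the angle theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# First-order interpolation between two nodal values at xi = 0.75.
xi = 0.75
value = karcher_mean([rot_z(0.0), rot_z(1.0)], [1.0 - xi, xi])
```

For two nodal values the result lies on the geodesic between them. In this example it is the rotation about the z-axis by 0.75 radians.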

When using this technique for the case \(M = \text {SO}(3)\) considered here (the same holds when discretizing one-director models with \(M=S^2,\) such as the one proposed in [43]), the resulting discrete model has many desirable properties. Since the FE spaces are conforming, there is no consistency error introduced when evaluating the continuous energy for FE functions. Since no angles and no “special orientations” appear in the discretization, the discrete model is not restricted to small or moderate rotations. Indeed, as we demonstrate in Sect. 6.3, arbitrary rotations in the deformation can be handled with ease. Finally, it follows from the equivariance of the nonlinear interpolation that the frame invariance of the continuous model (1) is preserved by the discretization, and we obtain a completely frame-invariant discrete problem.

As an additional advantage, the fact that the FE space is contained in the continuous Ansatz space \(H^1(\omega ,\,{\mathbb {R}}^3) \times W^{1,q}(\omega ,\,\text {SO}(3))\) implies that properties of the tangent matrix can be inferred from corresponding properties of the continuous tangent operator. In particular, we directly obtain symmetry of the tangent matrix. The tangent matrix is positive definite if the continuous tangent operator is.

The algebraic formulation corresponding to the discrete problem is a minimization problem posed in the product space \({\mathcal {M}} :={\mathbb {R}}^{3N} \times \text {SO(3)}^N,\) where N is the number of Lagrange nodes of the grid. The space \({\mathcal {M}}\) is a 6N-dimensional Riemannian manifold. To solve this minimization problem we use a Riemannian trust-region algorithm [1], which is a globalized Newton method. As such, it is guaranteed to converge to at least a stationary point of the algebraic energy for any initial iterate, and without using intermediate loading steps. At each step of the method, a constrained quadratic minimization problem needs to be solved. We propose to use a monotone multigrid method [26, 40], which allows efficient and robust solutions of the constrained problems even on fine grids. As a variant of the Newton method, the trust-region algorithm requires tangent matrices of the energy. We obtain those matrices completely automatically by using automatic differentiation (AD) as implemented in the software ADOL-C [19, 48].
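To make the structure of \({\mathcal {M}}\) concrete, the following Python sketch stores an algebraic configuration as N midsurface positions and N rotation matrices, and applies a tangential correction with 6N coefficients. Updating the rotations through the exponential map is an assumption of this example; the sketch does not reproduce the trust-region solver, the multigrid method, or the ADOL-C interface, and the names `apply_correction`, `dm`, and `dw` are hypothetical.

```python
import numpy as np

def hat(a):
    """Skew-symmetric matrix in so(3) associated with the vector a in R^3."""
    return np.array([[0.0, -a[2], a[1]],
                     [a[2], 0.0, -a[0]],
                     [-a[1], a[0], 0.0]])

def exp_so3(a):
    """Matrix exponential of hat(a), evaluated with the Rodrigues formula."""
    theta = np.linalg.norm(a)
    A = hat(a)
    if theta < 1e-12:
        return np.eye(3) + A
    return (np.eye(3) + np.sin(theta) / theta * A
            + (1.0 - np.cos(theta)) / theta**2 * (A @ A))

def apply_correction(m, R, dm, dw):
    """Move the configuration (m, R) along the tangent vector (dm, dw).

    m, dm, dw have shape (N, 3); R has shape (N, 3, 3).
    Positions are updated additively, rotations multiplicatively.
    """
    m_new = m + dm
    R_new = np.stack([Ri @ exp_so3(wi) for Ri, wi in zip(R, dw)])
    return m_new, R_new

# Illustrative data: N = 4 nodes, starting with identity rotations.
N = 4
m = np.zeros((N, 3))
R = np.tile(np.eye(3), (N, 1, 1))
rng = np.random.default_rng(0)
dm = 0.1 * rng.standard_normal((N, 3))
dw = 0.1 * rng.standard_normal((N, 3))

m_new, R_new = apply_correction(m, R, dm, dw)
n_dofs = dm.size + dw.size  # 3N translational plus 3N rotational coefficients
```

The updated rotations remain in \(\text {SO}(3)\) by construction, so no renormalization step is needed after the correction.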

In this article we show four numerical examples. First, we simulate bending of a clamped beam using GFEs of different orders. The result shows that shear locking does not occur unless we are using first-order elements for the deformation. Second, we compute the post-critical behavior of an L-shaped beam. This was posed as a benchmark problem in [3, 44, 45, 54], and we compare our results with results given there. Third, we demonstrate that our discretization does indeed allow unrestricted rotations. For this we simulate a long elastic strip, which we clamp on one short end, and subject it to several full rotations at the other end. Finally, to show that the Cosserat shell model can represent non-classical microstructure effects, we use it to produce wrinkles in a sheared rectangular membrane. Such shearing tests have been performed experimentally in [52], and we obtain excellent quantitative agreement with the results given there.

This article is structured as follows: In Sect. 2 we present the continuous model and discuss a few of its properties. Section 3 introduces the GFE method, specialized for the case \(M=\text {SO(3)}\) needed for the Cosserat shell model. Section 4 discusses the resulting discrete and algebraic models. Section 5 explains the Riemannian trust-region method used to find energy minimizers without loading steps. Section 6 gives the four numerical examples. Finally, an Appendix collects various important facts about \(\text {SO}(3)\) needed to implement the GFE method.

2 The continuous Cosserat shell model

In this section we present the planar Cosserat shell model and discuss its features. The detailed derivation of the shell model from a three-dimensional parent Cosserat model was presented in the papers [31, 34]. The intermediate shell model for infinitesimal strain is described in Sect. 2.1. The complete finite-strain model is then introduced in Sect. 2.2.

2.1 The small-strain planar Cosserat shell model

We consider a thin domain \(\Omega _h \subset {\mathbb {R}}^3\) of the form \(\Omega _h=\omega \times [{-}h/2,\,h/2],\) where \(\omega \) is a bounded domain in \({\mathbb {R}}^2\) with smooth boundary \(\partial \omega ,\) and \(h>0\) is the thickness of the planar shell. The domain \(\Omega _h\) is the region occupied by the reference configuration of the parent 3D Cosserat continuum. Let \(\{e_1,\,e_2,\,e_3\}\) be the unit vectors along the axes of the reference Cartesian coordinate system, denote by \(\varphi {\text {:}}\,\Omega _h \rightarrow {\mathbb {R}}^3\) the deformation, and by \(\overline{R}{\text {:}}\,\Omega _h\rightarrow \mathrm {SO}(3)\) the independent microrotation of this micropolar continuum.

For the planar shell model we want to find a reasonable approximation \((\varphi _s,\,\overline{R}_s)\) of \((\varphi ,\,\overline{R})\) involving only two-dimensional quantities, i.e., expressed with the help of functions of the in-plane coordinates \((x,\,y).\) Therefore, we assume a quadratic Ansatz in the thickness coordinate z for the finite deformation \(\varphi _s{\text {:}}\,\Omega _h\rightarrow {\mathbb {R}}^3\)

$$\begin{aligned} \varphi _s(x,\,y,\,z)= & {} m(x,\,y)\nonumber \\&+\left( z\varrho _m(x,\,y)+\dfrac{z^2}{2}\varrho _b(x,\,y)\right) \varvec{d}(x,\,y). \end{aligned}$$
(2)

Here \(m{\text {:}}\,\omega \rightarrow {\mathbb {R}}^3\) describes the deformation of the midsurface of the shell, and \(\varvec{d}{\text {:}}\,\omega \rightarrow {\mathbb {R}}^3\) is an independent unit director. We assume the rotations \(\overline{R}_s{\text {:}}\,\Omega _h\rightarrow \mathrm {SO}(3)\) for thin and homogeneous shells to be independent of the thickness variable z,  i.e.,

$$\begin{aligned} \overline{R}_s(x,\,y,\,z)=\overline{R}_s(x,\,y,\,0)\quad \text {for}\quad z\in \left[ {-}\frac{h}{2},\,\frac{h}{2}\right] , \end{aligned}$$

and we specialize the independent unit director \(\varvec{d}\) in the Ansatz (2) by choosing

$$\begin{aligned} \varvec{d}(x,\,y):=\overline{R}_s(x,\,y,\,0) e_3 =:\overline{R}_3. \end{aligned}$$

Thus, the director \(\varvec{d}(x,\,y)\) is taken as the third column of the orthogonal matrix \(\overline{R}_s(x,\,y),\) and the model now also includes drilling rotations about the director \(\varvec{d}.\) The drilling rotations are determined by the first two columns of \(\overline{R}_s.\) For the sake of simplicity, we drop the index s and write \(\overline{R} \) instead of \(\overline{R}_s\) in what follows.

When the director \(\varvec{d}(x,\,y)\) is not normal to the midsurface \(m(x,\,y),\) then transverse shear deformation occurs. The scalar functions \(\varrho _m,\,\varrho _b{\text {:}}\,\omega \rightarrow {\mathbb {R}}\) in (2) describe the symmetric thickness stretch (for \(\varrho _m\ne 1\)) and the asymmetric thickness stretch (for \(\varrho _b\ne 0\)) about the midsurface. The scalar field \(\varrho _m\) is mainly membrane related, while \(\varrho _b\) is mainly bending related. Imposing that the stress vectors on the upper and lower surfaces of the shell have zero normal components (which is a common assumption in the theory of shells, see, e.g., [46], Sect. 5), we obtain the following expressions for \(\varrho _m\) and \(\varrho _b\) [31]

$$\begin{aligned} \begin{aligned}&\varrho _m :=1-\dfrac{\lambda }{2\mu +\lambda }[\langle (\nabla m|0),\,\overline{R}\rangle -2]+\dfrac{\langle N_{\mathrm {diff}},\, \overline{R}_3\rangle }{2\mu +\lambda }, \\&\varrho _b :=-\dfrac{\lambda }{2\mu +\lambda }\left\langle \left( \nabla \overline{R}_3|0\right) ,\,\overline{R}\right\rangle +\dfrac{\langle N_{\mathrm {res}},\, \overline{R}_3\rangle }{(2\mu +\lambda )h}, \end{aligned} \end{aligned}$$

where the parameters \(\lambda ,\,\mu > 0\) are the Lamé constants of classical isotropic elasticity, and \(N_{\mathrm {res}},\,N_{\mathrm {diff}}{\text {:}}\,\omega \rightarrow {\mathbb {R}}^3\) are defined in terms of the prescribed tractions \(N^{\mathrm {trans}}\) on the transverse boundaries \(z={\pm } h/2\) by

$$\begin{aligned} \begin{aligned}&N_{\mathrm {res}}(x,\,y) :=\left[ N^{\mathrm {trans}}\left( x,\,y,\,\frac{h}{2}\right) +N^{\mathrm {trans}}\left( x,\,y,\,{-}\frac{h}{2}\right) \right] , \\&N_{\mathrm {diff}}(x,\,y) :=\frac{1}{2}\left[ N^{\mathrm {trans}}\left( x,\,y,\,\frac{h}{2}\right) -N^{\mathrm {trans}}\left( x,\,y,\,{-}\frac{h}{2}\right) \right] . \end{aligned} \end{aligned}$$

The strain measures for the planar Cosserat shell model are the following: the micropolar non-symmetric stretch tensor \(\overline{U}\) is defined as

$$\begin{aligned} \overline{U} = \overline{R}^T\hat{F} \quad \text {with}\quad \hat{F}=\left( \nabla m| \overline{R}_3\right) \in {\mathbb {M}}^{3\times 3}, \end{aligned}$$

while the micropolar curvature tensor \({\mathfrak {K}}_s\) (of third order) and the micropolar bending tensor \({\mathfrak {K}}_b\) (of second order) are given by

$$\begin{aligned} {\mathfrak {K}}_s&:=\left( \overline{R}^T\left( \nabla \overline{R}_1|0\right) ,\, \overline{R}^T\left( \nabla \overline{R}_2|0\right) ,\, \overline{R}^T\left( \nabla \overline{R}_3|0\right) \right) \\&=:\left( {\mathfrak {K}}_s^1,\, {\mathfrak {K}}_s^2,\,{\mathfrak {K}}_s^3\right) \in {\mathbb {M}}^{3 \times 3 \times 3},\\ {\mathfrak {K}}_b&:=\overline{R}^T\left( \nabla \overline{R}_3|0\right) = {\mathfrak {K}}_s^3 \in {\mathbb {M}}^{3 \times 3}. \end{aligned}$$

We have used the superposed caret and bars for \(\hat{F},\, \overline{R},\,\overline{U}\) in order to distinguish these tensors from the classical notations in 3D elasticity for deformation gradient F,  the continuum rotation \(R=\text {polar}(F),\) and the symmetric continuum stretch tensor \(U=R^TF=\sqrt{F^TF}.\)
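The strain measures \(\overline{U}\) and \({\mathfrak {K}}_b\) are straightforward to evaluate at a single point once \(\nabla m,\) \(\overline{R},\) and \(\nabla \overline{R}_3\) are known. The following Python sketch does this for assumed test data; the function names are hypothetical, gradients are stored as \(3 \times 2\) arrays, and the numerical values do not come from the article. It also checks two properties that follow directly from the definitions: the planar reference configuration yields \(\overline{U} = \mathbbm {1},\) and both measures are unchanged when m and \(\overline{R}\) are rotated by the same \(Q \in \text {SO}(3).\)

```python
import numpy as np

def stretch_tensor(grad_m, R):
    """U = R^T (grad_m | R_3), with grad_m of shape (3, 2) and R in SO(3)."""
    F_hat = np.column_stack([grad_m, R[:, 2]])
    return R.T @ F_hat

def bending_tensor(grad_R3, R):
    """K_b = R^T (grad_R3 | 0), with grad_R3 of shape (3, 2)."""
    return R.T @ np.column_stack([grad_R3, np.zeros(3)])

def rot_x(theta):
    """Rotation about the x-axis by the angle theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def rot_z(theta):
    """Rotation about the z-axis by the angle theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Planar reference configuration: m_0(x, y) = (x, y, 0), R_0 = Id.
grad_m_ref = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
U_ref = stretch_tensor(grad_m_ref, np.eye(3))

# Arbitrary test data at one point (assumed values, for illustration only).
R = rot_z(0.7) @ rot_x(0.4)
grad_m = np.array([[1.1, 0.2], [-0.1, 0.9], [0.3, 0.05]])
grad_R3 = np.array([[0.2, -0.1], [0.0, 0.3], [0.1, 0.1]])
U = stretch_tensor(grad_m, R)
K_b = bending_tensor(grad_R3, R)

# Superposed rigid rotation Q applied to both the deformation and the rotation.
Q = rot_x(-1.2) @ rot_z(2.0)
U_rotated = stretch_tensor(Q @ grad_m, Q @ R)
K_b_rotated = bending_tensor(Q @ grad_R3, Q @ R)
```

Note that the third column of \(\overline{U}\) always equals \(e_3,\) since \(\overline{R}^T \overline{R}_3 = e_3.\)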

We mention that the kinematical structure of this Cosserat shell model is in fact equivalent to the kinematical structure of nonlinear six-parameter resultant shell theory [12, 18, 27], as was pointed out in [7–10].

As a result of the dimensional reduction procedure, the following two-dimensional minimization problem for the deformation of the midsurface \(m{\text {:}}\,\omega \rightarrow {\mathbb {R}}^3\) and the microrotation field \(\overline{R}{\text {:}}\,\omega \rightarrow \text {SO(3)}\) is obtained [31]:

Problem 1

Find a pair \((m,\,\overline{R})\) that minimizes the functional

$$\begin{aligned} I(m,\,\overline{R})= & {} \int _\omega hW_{\mathrm {mp}}(\overline{U})+ hW_{\mathrm {curv}}\left( {\mathfrak {K}}_s\right) \nonumber \\&+\, \dfrac{h^3}{12}W_{\mathrm {bend}}\left( {\mathfrak {K}}_b\right) \mathrm {d}\omega - \Pi \left( m,\,\overline{R}_3\right) , \end{aligned}$$
(3)

subject to suitable boundary conditions for the deformation and rotation.

The three parts of the total elastically stored energy density of the shell correspond to membrane-strain \(W_{\mathrm{mp}},\) total curvature-strain \(W_{\mathrm{curv}}\) and specific bending-strain \(W_{\mathrm{bend}}.\) They have the expressions

$$\begin{aligned} W_{\mathrm {mp}}(\overline{U})= & {} \mu \Vert \text {sym}(\overline{U}-\mathbbm {1})\Vert ^2 +\mu _c\Vert \mathrm{skew}(\overline{U}-\mathbbm {1})\Vert ^2 \nonumber \\&+ \, \dfrac{\mu \lambda }{2\mu +\lambda }\text {tr}[\text {sym}(\overline{U}-\mathbbm {1})]^2 \nonumber \\= & {} \mu \underbrace{\Vert \text {sym}( ( \overline{R}_1|\overline{R}_2)^T\nabla m- \mathbbm {1}_2 )\Vert ^2}_{{\text {shear-stretch\,energy}}}\nonumber \\&+ \, \mu _c\underbrace{\Vert \mathrm{skew}(( \overline{R}_1|\overline{R}_2)^T\nabla m)\Vert ^2}_{\mathrm{first\,order\,drill\,energy}}\nonumber \\&+ \, \dfrac{(\mu +\mu _c)}{2}\underbrace{\kappa (\langle \overline{R}_3,\,m_x\rangle ^2 + \langle \overline{R}_3\,,\,m_y\rangle ^2)}_{\mathrm{classical\,transverse\,shear\,energy}} \nonumber \\&+ \, \dfrac{\mu \lambda }{2\mu +\lambda }\underbrace{\text {tr}[\text {sym}((\overline{R}_1|\overline{R}_2)^T\nabla m- \mathbbm {1}_2)]^2}_{\mathrm{volumetric\,stretch\,energy}}, \end{aligned}$$
(4)
$$\begin{aligned} W_{\mathrm {curv}}\left( {\mathfrak {K}}_s\right)= & {} \mu L_c^q \left\| {\mathfrak {K}}_s\right\| ^q \nonumber \\= & {} \mu L_c^q\left( \left\| {\mathfrak {K}}_s^1\right\| ^2+\left\| {\mathfrak {K}}_s^2\right\| ^2+\left\| {\mathfrak {K}}_s^3\right\| ^2\right) ^{q/2},\nonumber \\ W_{\mathrm {bend}}\left( {\mathfrak {K}}_b\right)= & {} \mu \left\| \text {sym}\left( {\mathfrak {K}}_b\right) \right\| ^2 +\mu _c\left\| \mathrm{skew}\left( {\mathfrak {K}}_b\right) \right\| ^2\nonumber \\&+ \, \dfrac{\mu \lambda }{2\mu +\lambda } \text {tr}\left[ \text {sym}\left( {\mathfrak {K}}_b\right) \right] ^2, \end{aligned}$$
(5)

where the additional parameter \(\mu _c\ge 0 \) is called the Cosserat couple modulus, and \(\kappa \) is a shear correction factor (\(0<\kappa \le 1\)). For \(\mu _c > 0\) the elastic strain energy density \(W_{\mathrm {mp}}(\overline{U})\) is uniformly convex in \(\overline{U},\) but for the important case \(\mu _c = 0\) this property is lost. Therefore, the case \(\mu _c = 0\) must be investigated separately. In the curvature energy density \(W_\mathrm{curv},\) the parameter \(L_c > 0\) is an internal length which is characteristic for the material, and is responsible for size effects. Note that \(W_{\mathrm {curv}}\) is a specific contribution which is strictly related to the new Cosserat effects and should not be confused with the bending terms. We mention that this is a first-order model, i.e., no second or higher derivatives of the independent variables m and \(\overline{R}\) appear. Also, the energy depends on the midsurface deformation m and microrotations \(\overline{R}\) only through the frame-indifferent measures \(\overline{U}\) and \({\mathfrak {K}}_s.\) Thus, in the absence of external forces, the planar shell model is fully frame-indifferent in the sense that

$$\begin{aligned} I(Qm,\,Q\overline{R}) = I(m,\,\overline{R}),\quad \forall Q\in \text {SO(3)}. \end{aligned}$$
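The two representations of the membrane energy in (4) can be compared numerically. Writing \(\overline{U}-\mathbbm {1}\) in block form shows that the tensor expression and the split expression coincide when the shear correction factor is \(\kappa = 1\); for \(\kappa < 1\) the split form scales the transverse shear part. The following Python sketch evaluates both forms for assumed test data and also checks the frame indifference stated above at the level of the energy density. The function names and all parameter values are hypothetical.

```python
import numpy as np

def sym(A):
    return 0.5 * (A + A.T)

def skew(A):
    return 0.5 * (A - A.T)

def w_mp_tensor(U, mu, lam, mu_c):
    """First expression in (4), written in terms of the 3x3 tensor U."""
    E = U - np.eye(3)
    return (mu * np.sum(sym(E)**2) + mu_c * np.sum(skew(E)**2)
            + mu * lam / (2.0 * mu + lam) * np.trace(sym(E))**2)

def w_mp_split(grad_m, R, mu, lam, mu_c, kappa=1.0):
    """Second expression in (4), split into the four named contributions."""
    A = R[:, :2].T @ grad_m   # (R_1|R_2)^T grad m, a 2x2 matrix
    s = R[:, 2] @ grad_m      # (<R_3, m_x>, <R_3, m_y>)
    E2 = A - np.eye(2)
    shear_stretch = mu * np.sum(sym(E2)**2)
    drill = mu_c * np.sum(skew(A)**2)
    transverse_shear = 0.5 * (mu + mu_c) * kappa * np.sum(s**2)
    volumetric = mu * lam / (2.0 * mu + lam) * np.trace(sym(E2))**2
    return shear_stretch + drill + transverse_shear + volumetric

def rot_x(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def rot_z(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Assumed material parameters and test data (for illustration only).
mu, lam, mu_c = 1.0, 1.5, 0.3
R = rot_z(0.7) @ rot_x(0.4)
grad_m = np.array([[1.1, 0.2], [-0.1, 0.9], [0.3, 0.05]])
U = R.T @ np.column_stack([grad_m, R[:, 2]])

energy_tensor = w_mp_tensor(U, mu, lam, mu_c)
energy_split = w_mp_split(grad_m, R, mu, lam, mu_c, kappa=1.0)

# The same energy density after a superposed rigid rotation Q.
Q = rot_x(-1.2) @ rot_z(2.0)
energy_rotated = w_mp_split(Q @ grad_m, Q @ R, mu, lam, mu_c, kappa=1.0)
```

Both evaluations agree up to rounding for \(\kappa = 1,\) and the value does not change under the rotation Q.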

The reduced external loading functional \(\Pi (m,\,\overline{R}_3)\) appearing in (3) is a linear form in \((m,\,\overline{R}_3),\) defined in terms of the underlying three-dimensional loads by

$$\begin{aligned} \Pi \left( m,\,\overline{R}_3\right)= & {} \int _\omega \langle \overline{f},\,m\rangle +\left\langle \overline{M},\,\overline{R}_3\right\rangle \mathrm {d}\omega +\int _{\gamma _s} \langle \overline{N},\,m\rangle \nonumber \\&+\left\langle \overline{M}_c,\,\overline{R}_3\right\rangle ds, \end{aligned}$$

where \(\gamma _s\times \left[ {-}\frac{h}{2},\,\frac{h}{2}\right] \subset \partial \omega \times \left[ {-}\frac{h}{2},\,\frac{h}{2}\right] \) is the part of the lateral boundary of \(\Omega _h\) where external surface forces and couples are prescribed. The vector fields \(\overline{f},\, \overline{M},\, \overline{N} \) and \(\overline{M}_c\) denote the resultant body force, resultant body couple, resultant surface traction and resultant surface couple, respectively [31].

For the Dirichlet boundary conditions we suppose that there exists a prescribed function \(g_d{\text {:}}\,\Omega _h \rightarrow {\mathbb {R}}^3,\) whose restriction to the Dirichlet part of the boundary gives the prescribed displacement. We further introduce the abbreviation

$$\begin{aligned} g_d^{\prime }{\text {:}}\,\omega \rightarrow {\mathbb {R}}^3, \quad g_d^{\prime }(x,\,y) :=\nabla g_d(x,\,y,\,0) e_3. \end{aligned}$$

For the midsurface deformation m we then consider the boundary conditions

$$\begin{aligned} m(x,\,y)_{|\gamma _0}=g_d(x,\,y,\,0), \end{aligned}$$
(6)

on the Dirichlet part \(\gamma _0\) of the boundary \(\partial \omega .\)

For the microrotations \(\overline{R}\) we can consider various possible alternative boundary conditions on \(\gamma _0,\) see [31, 34]. In what follows, we consider two types:

$$\begin{aligned}&\text {(1)}\,free~boundary~conditions~on \,\overline{R},\nonumber \\&\quad \text {i.e., induced Neumann type (natural) conditions}; \end{aligned}$$
(7)
$$\begin{aligned}&\text {(2)}\,~rigid~director~prescription ,\quad i.e., \quad {\overline{R}_3}_{|\gamma _0}=\frac{g_d^{\prime }}{\Vert g_d^{\prime }\Vert },\nonumber \\&\quad \text {together with zero Neumann conditions for}\nonumber \\&\quad \text {the drilling degree of freedom}. \end{aligned}$$
(8)

The existence of minimizers for this Cosserat planar shell model under various assumptions on the coefficients and boundary conditions has been proved in [31, 34]. For instance, in the case when the Cosserat couple modulus is positive (\(\mu _c>0\)) and for rigid director prescription boundary conditions (8) on \(\gamma _0,\) the following existence result has been shown in [31], using the direct method of the calculus of variations.

Theorem 1

Let \(\omega \subset {\mathbb {R}}^2\) be a bounded Lipschitz domain, and assume that the material parameters satisfy

$$\begin{aligned} \mu _c>0,\quad q\ge 2.\end{aligned}$$

Let the boundary data and external loads functions satisfy the regularity conditions

$$\begin{aligned}&g_d(x,\,y,\,0)\in H^1\left( \omega ,\,{\mathbb {R}}^3\right) ,\nonumber \\&\quad \text {polar}\left( \nabla g_d(x,\,y,\,0)\right) \in W^{1,q}(\omega ,\,\text {SO(3)}),\nonumber \\&\overline{f}\in L^2\left( \omega ,\,{\mathbb {R}}^3\right) ,\quad \overline{M}\in L^1\left( \omega ,\,{\mathbb {R}}^3\right) ,\nonumber \\&\quad \overline{N} \in L^2\left( \gamma _s,\,{\mathbb {R}}^3\right) ,\quad \overline{M}_c\in L^1\left( \gamma _s,\,{\mathbb {R}}^3\right) . \end{aligned}$$
(9)

Then the minimization problem (3)–(5) with boundary conditions (6) and (8) admits at least one minimizing solution pair \((m,\,\overline{R})\in H^1(\omega ,\,{\mathbb {R}}^3) \times W^{1,q}(\omega ,\, \mathrm {SO}(3)).\)

In the case of zero Cosserat couple modulus (\(\mu _c=0\)) the mathematical treatment of the minimization problem is more difficult, due to the lack of unqualified coercivity of the energy function with respect to the midsurface deformation m. The corresponding existence result for this case has been proved in [34] using a new extended Korn’s first inequality for plates and elasto-plastic shells [30, 39]. In this case, we need q to be strictly larger than 2. However, the numerical evidence in Sect. 6 suggests that existence also holds for \(q=2.\) For the sake of simplicity, we present this result in the case of zero external loads, i.e., \(\overline{f}=0,\, \overline{M}=0,\,\overline{N} =0,\,\overline{M}_c=0.\)

Theorem 2

Let \(\omega \subset {\mathbb {R}}^2\) be a bounded Lipschitz domain and assume that the material parameters satisfy

$$\begin{aligned} \mu _c=0,\quad q> 2. \end{aligned}$$

Let the boundary data satisfy the regularity conditions

$$\begin{aligned}&g_d(x,\,y,\,0)\in H^1\left( \omega ,\,{\mathbb {R}}^3\right) ,\nonumber \\&\quad \text {polar}\left( \nabla g_d(x,\,y,\,0)\right) \in W^{1,q}(\omega ,\,\text {SO(3)}). \end{aligned}$$

Then the minimization problem for the functional (3)–(5) with boundary conditions (6) and (8) admits at least one minimizing solution pair \((m,\,\overline{R})\in H^1(\omega ,\,{\mathbb {R}}^3) \times W^{1,q}(\omega ,\, \mathrm {SO}(3)).\)

The statement of Theorem 2 holds also in the case of non-vanishing external loads. In this respect, see the paper [34], where a modification of the external loading potential has been used.

Of particular interest is the choice of the new material parameters \(\mu _c\) (the Cosserat couple modulus) and \(L_c.\) Our model is derived from a 3D-Cosserat model in which the Cosserat couple modulus appears traditionally. It controls the skew-symmetric part of the stresses, and enforces \(\overline{R} = \text {polar}(\hat{F})\) for the limit case \(\mu _c \rightarrow \infty .\) In the literature, there is not a single material for which the value of the parameter \(\mu _c\) has been identified unambiguously. Given this situation, it is argued in [33] that this parameter must be set to zero when modeling a continuous body. In [37, 38] the same question has been discussed in the larger framework of (infinitesimal) micromorphic continua with the same result: the absence of \(\mu _c\) leads to a more stringent physical description. Indeed, it implies that a linear Cosserat model collapses into classical linear elasticity.

However, in a geometrically nonlinear context, which is our case, a vanishing Cosserat couple modulus only implies that there is no first-order coupling between rotations and deformation gradients [35]. Compared with the classical Reissner–Mindlin kinematics without drill energy [36], setting \(\mu _c=0\) appears again as the most plausible choice. Since there is therefore no specific reason to have \(\mu _c>0,\) we omit this parameter.

The internal length \(L_c\) appears in Cosserat models as a measure of the length scale of the material microstructure. The numerical results of Sect. 6 show that values of \(L_c\) in the micrometer range lead to realistic results. However, we also note that the shell model with \(L_c\gg h\) can be useful for the description of graphene sheets, which have practically zero thickness but still show a bending stiffness. In a classical shell model, we would expect zero bending resistance.

2.2 A modified large strain Cosserat shell model

We observe that the planar shell model presented above is appropriate for finite rotations, but only small elastic membrane strains, since the membrane part \(W_{\mathrm {mp}}\) of the energy functional I is quadratic. We now slightly generalize the model to allow for large elastic stretch as well. We consider again a minimization problem for the energy functional

$$\begin{aligned} I(m,\,\overline{R})= & {} \int _\omega hW_{\mathrm {mp}}(\overline{U}) + hW_{\mathrm {curv}}\left( {\mathfrak {K}}_s\right) \nonumber \\&+ \, \dfrac{h^3}{12}W_{\mathrm {bend}}\left( {\mathfrak {K}}_b\right) \mathrm {d}\omega - \Pi \left( m,\,\overline{R}_3\right) , \end{aligned}$$
(10)

formulated again in two-dimensional quantities m and \(\overline{R}.\) This time, we replace the membrane part of I by

$$\begin{aligned}&W_{\mathrm {mp}}(\overline{U}) = \mu \Vert \text {sym}(\overline{U}-\mathbbm {1})\Vert ^2 +\mu _c\Vert \mathrm{skew}(\overline{U}-\mathbbm {1})\Vert ^2 \nonumber \\&\quad \quad + \dfrac{\mu \lambda }{2\mu +\lambda }\dfrac{1}{2}\left( (\det \overline{U}-1 )^2+ \left( (\det \overline{U})^{-1}- 1 \right) ^2\right) \nonumber \\&\quad = \mu \underbrace{\Vert \text {sym}( ( \overline{R}_1|\overline{R}_2)^T\nabla m- \mathbbm {1}_2 )\Vert ^2}_{{\text {shear-stretch\,energy}}}\nonumber \\&\quad \quad +\mu _c\underbrace{\Vert \mathrm{skew}(( \overline{R}_1|\overline{R}_2)^T\nabla m)\Vert ^2}_{\mathrm{first\,order\,drill\,energy}} \nonumber \\&\quad \quad +\dfrac{(\mu +\mu _c)}{2}\underbrace{\kappa (\langle \overline{R}_3,\,m_x\rangle ^2 + \langle \overline{R}_3,\,m_y\rangle ^2)}_{\mathrm{classical\,transverse\,shear\,energy}}\nonumber \\&\quad \quad +\dfrac{\mu \lambda }{2\mu +\lambda }\underbrace{\dfrac{1}{2}((\det (\nabla m|\overline{R}_3) -1)^2+ (\det (\nabla m|\overline{R}_3)^{-1}- 1)^2)}_{\mathrm{modified\,volumetric\,stretch\,response}}. \end{aligned}$$
(11)

In this expression, we have replaced the quadratic volumetric stretch part \(\text {tr}[\text {sym}(\overline{U}-\mathbbm {1})]^2\) of (4) by the non-quadratic expression

$$\begin{aligned} \frac{1}{2}\left( (\det \overline{U}-1 )^2+ \left( (\det \overline{U})^{-1}- 1 \right) ^2\right) , \end{aligned}$$

which is volumetrically exact. However, since

$$\begin{aligned}&\frac{1}{2}\left( (\det \overline{U}-1 )^2+ \left( (\det \overline{U})^{-1}- 1 \right) ^2\right) \\&\quad = \text {tr}[ \text {sym}(\overline{U}-\mathbbm {1})]^2 +O\left( \Vert \overline{U}-\mathbbm {1}\Vert ^3\right) , \end{aligned}$$

the quadratic membrane energy (4) of the previous section can be recovered by linearization at \(\mathbbm {1} \in {\mathbb {M}}^{3 \times 3}.\)
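The expansion above can be checked in two short steps. The abbreviation \(\delta :=\det \overline{U}-1\) is introduced here only for this sketch. First,

$$\begin{aligned} \frac{1}{2}\left( \delta ^2+ \left( \frac{1}{1+\delta }- 1 \right) ^2\right) = \frac{1}{2}\left( \delta ^2+ \left( -\delta + O\left( \delta ^2\right) \right) ^2\right) = \delta ^2 + O\left( \delta ^3\right) . \end{aligned}$$

Second, expanding the determinant around the identity and using that the trace of a skew-symmetric matrix vanishes gives

$$\begin{aligned} \delta = \det \left( \mathbbm {1} + (\overline{U}-\mathbbm {1})\right) - 1 = \text {tr}[ \text {sym}(\overline{U}-\mathbbm {1})] + O\left( \Vert \overline{U}-\mathbbm {1}\Vert ^2\right) , \end{aligned}$$

so that \(\delta ^2 = \text {tr}[ \text {sym}(\overline{U}-\mathbbm {1})]^2 + O(\Vert \overline{U}-\mathbbm {1}\Vert ^3).\)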

For the nonlinear modified model (10) we set the following expression for the modified thickness stretch

$$\begin{aligned} \varrho _m :=\dfrac{1}{1+\frac{\lambda }{2\mu +\lambda }(\det \overline{U}-1)}\in (0,\,\infty ), \end{aligned}$$

which can be used for the a posteriori reconstruction of the bulk deformation.

The modified membrane energy density (11) represents an improvement over the initial planar shell model (4) in various regards. Indeed, we note that

$$\begin{aligned} W_{\mathrm {mp}}(\overline{U})\rightarrow \infty \quad \text {if}\quad \det \overline{U}\rightarrow 0. \end{aligned}$$

Moreover, for any fixed \(\overline{R}\) the energy \(W_{\mathrm {mp}}\) is polyconvex [17, 42] with respect to \(\nabla m,\) and it is uniformly Legendre–Hadamard elliptic, independently of \(\mu _c\ge 0.\)
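To illustrate the blow-up of \(W_{\mathrm {mp}}\) for \(\det \overline{U}\rightarrow 0,\) the following minimal Python sketch evaluates the first form of (11) and the modified thickness stretch \(\varrho _m\) for a given \(3 \times 3\) matrix. The function names and the parameter values \(\mu = \lambda = 1,\,\mu _c = 0\) are assumptions made for this illustration only; they are not taken from the numerical experiments.

```python
def det3(A):
    """Determinant of a 3x3 matrix stored as nested lists."""
    return (A[0][0] * (A[1][1] * A[2][2] - A[1][2] * A[2][1])
            - A[0][1] * (A[1][0] * A[2][2] - A[1][2] * A[2][0])
            + A[0][2] * (A[1][0] * A[2][1] - A[1][1] * A[2][0]))


def membrane_energy(U, mu, lam, mu_c):
    """First form of (11): shear-stretch, drill and modified volumetric part."""
    E = [[U[i][j] - (1.0 if i == j else 0.0) for j in range(3)] for i in range(3)]
    # Squared Frobenius norms of the symmetric and skew-symmetric parts of U - 1
    sym_sq = sum((0.5 * (E[i][j] + E[j][i])) ** 2 for i in range(3) for j in range(3))
    skew_sq = sum((0.5 * (E[i][j] - E[j][i])) ** 2 for i in range(3) for j in range(3))
    d = det3(U)
    volumetric = 0.5 * ((d - 1.0) ** 2 + (1.0 / d - 1.0) ** 2)
    return mu * sym_sq + mu_c * skew_sq + mu * lam / (2.0 * mu + lam) * volumetric


def thickness_stretch(U, mu, lam):
    """Modified thickness stretch rho_m as defined in the text."""
    return 1.0 / (1.0 + lam / (2.0 * mu + lam) * (det3(U) - 1.0))


# Hypothetical material parameters, chosen only for illustration
mu, lam, mu_c = 1.0, 1.0, 0.0
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Compress one direction more and more: det U = eps tends to zero
energies = [
    membrane_energy([[eps, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]], mu, lam, mu_c)
    for eps in (1e-1, 1e-2, 1e-3)
]
```

The list `energies` grows without bound as `eps` decreases, driven by the \((\det \overline{U})^{-1}\) term, whereas the quadratic energy (4) would remain bounded.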

The following existence result for the modified model, in the important case \(\mu _c = 0,\) was originally proved in [34]. Again, we assume vanishing external loads for simplicity.

Theorem 3

Let \(\omega \subset {\mathbb {R}}^2\) be a bounded Lipschitz domain and assume that the boundary data satisfies (9).

Then the minimization problem for the functional (10) with the parameters

$$\begin{aligned} \mu _c=0\quad \text {and} \quad q>2, \end{aligned}$$

with boundary conditions (6), (8) admits at least one minimizing solution pair \((m,\,\overline{R})\in H^1(\omega ,\,{\mathbb {R}}^3) \times W^{1,q}(\omega ,\, \text {SO(3)}),\) with

$$\begin{aligned} \det \left( \nabla m|\overline{R}_3\right) =\det \hat{F}>0, \end{aligned}$$

almost everywhere in \(\omega .\)

We note that the formulation (10) has the same linearized behavior as the initial model (3) and it reduces upon linearization to the classical infinitesimal-displacement Reissner–Mindlin model for the choice of parameters \(\mu _c=0\) and \(q>2.\)

Remark 1

The Cosserat model presented above can be extended to a general nonplanar shell model. Indeed, instead of the domain \(\Omega _h\) and the Ansatz for plates (2), one can begin with a shell-like (curved) thin domain and an appropriate Ansatz for shells. Then, the formal dimensional reduction to a two-dimensional shell model is derived analogously as in the case of plates, but involves additional tools from classical differential geometry of surfaces for the description of shell configurations. The resulting Cosserat shell model is quite general and has the advantage that it can be used to also describe elasto-plastic and visco–elasto-plastic material behavior. This work is currently in progress.

3 Geodesic finite elements

Discretization of the shell models presented in the previous section is difficult, because the orientation configuration space \(W^{1,q}(\omega ,\,\text {SO(3)})\) is not linear. As a consequence, linear, and more generally polynomial, interpolation is undefined in these spaces, and standard FE methods cannot be used.

GFEs are a generalization of standard FEs to problems for functions with values in a nonlinear Riemannian manifold M. We give a brief introduction and state the relevant features without proof. While GFEs can be constructed easily for very general M,  we state all results here for the case \(M = \text {SO}(3)\) only. The interested reader is referred to the original publications [40, 41] for more details.

The definition of GFE spaces consists of two parts. First, nonlinear interpolation functions are constructed that interpolate values given on a reference element. Then, these interpolation functions are pieced together to form global FE spaces for a given grid.

3.1 Geodesic interpolation

Fig. 1 Second-order geodesic interpolation from the reference triangle into a sphere

We focus on the case of a two-dimensional domain \(\omega .\) All constructions and results work mutatis mutandis also for domains of other dimensions.

Let \(T_\mathrm{ref}\) be a triangle or quadrilateral in \({\mathbb {R}}^2.\) We call \(T_\mathrm{ref}\) the reference element. On \(T_\mathrm{ref}\) we assume the existence of a set of pth order Lagrangian interpolation polynomials, i.e., a set of Lagrange nodes \(a_i \in T_\mathrm{ref},\,i=1,\ldots ,m,\) and corresponding polynomial functions \(\lambda _i{\text {:}}\,T_\mathrm{ref} \rightarrow {\mathbb {R}}\) of order p such that

$$\begin{aligned} \lambda _i\left( a_j\right) = \delta _{ij}\quad \text {for}\quad i,\,j=1,\ldots ,m, \quad \text {and} \quad \sum _{i=1}^m \lambda _i \equiv 1. \end{aligned}$$

We want to generalize Lagrangian interpolation to the case of values \(R_1,\ldots ,R_m \in \text {SO(3)}\) associated to the Lagrange nodes \(a_i.\) In other words, we want to construct a function \(\Upsilon {\text {:}}\,T_\mathrm{ref} \rightarrow \text {SO(3)}\) such that \(\Upsilon (a_i) = R_i\) for all \(i = 1,\ldots ,m.\) This is a non-trivial task because \(\text {SO(3)}\) is not a vector space.

To motivate our construction we note that the usual Lagrangian interpolation of values \(v_1,\ldots ,v_m\) in \({\mathbb {R}}\) can be written as a minimization problem

$$\begin{aligned} \xi \mapsto \mathop {\text {arg min}}\limits _{{w \in {\mathbb {R}}}} \sum _{i=1}^m \lambda _i(\xi ) \left| {v_i -w}\right| ^2, \end{aligned}$$

for each \(\xi \in T_\mathrm{ref}.\) This formulation can be generalized to values in \(\text {SO(3)}.\) We use \(\text {dist}(\cdot ,\,\cdot )\) for the canonical (geodesic) distance on \(\text {SO(3)},\) which is

$$\begin{aligned} \text {dist}\left( R_1,\,R_2\right) = \left\| {\log R_1^T R_2}\right\| . \end{aligned}$$
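For rotation matrices, this distance can be evaluated without computing a matrix logarithm explicitly, because \(\log R_1^T R_2\) is a skew-symmetric matrix whose axial vector has the relative rotation angle as its length. The following Python sketch returns that angle. Which matrix norm is meant in the formula above is an assumption here: with the Frobenius norm, the distance would be \(\sqrt{2}\) times the returned angle. All function names are illustrative.

```python
import math


def rotation_about_z(angle):
    """Rotation matrix about the z-axis, used only to build test data."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def relative_rotation(R1, R2):
    """Return R1^T R2 for 3x3 matrices stored as nested lists."""
    return [[sum(R1[k][i] * R2[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def geodesic_angle(R1, R2):
    """Rotation angle of R1^T R2, a value in [0, pi]."""
    Q = relative_rotation(R1, R2)
    cos_theta = 0.5 * (Q[0][0] + Q[1][1] + Q[2][2] - 1.0)
    # The skew-symmetric part of Q has an axial vector of length 2 sin(theta)
    w = (Q[2][1] - Q[1][2], Q[0][2] - Q[2][0], Q[1][0] - Q[0][1])
    sin_theta = 0.5 * math.sqrt(sum(c * c for c in w))
    return math.atan2(sin_theta, cos_theta)
```

Using `atan2` with both the sine and the cosine of the angle avoids the loss of accuracy that `acos` alone shows for nearly identical rotations.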

Definition 1

([41]) Let \(\{\lambda _i\}_{i=1}^m\) be a set of pth order scalar Lagrangian shape functions on the reference element \(T_\mathrm{ref},\) and let \(R_i \in \text {SO(3)},\,i=1,\ldots ,m\) be values at the corresponding Lagrange nodes. We call

$$\begin{aligned}&\Upsilon {\text {:}}\,\text {SO(3)}^m \times T_\mathrm{ref} \rightarrow \text {SO(3)}, \\&\Upsilon \left( R_1,\ldots ,R_m;\,\xi \right) = \mathop {\text {arg min}}\limits _{{R \in \text {SO(3)}}} \sum _{i=1}^m \lambda _i(\xi ) \text {dist}\left( R_i,\,R\right) ^2, \end{aligned}$$

pth order geodesic interpolation on \(\text {SO(3)}.\)

To make the construction easier to understand we work out a simple example.

Example

Let \(T_\mathrm{ref}\) be the reference triangle

$$\begin{aligned} T_\mathrm{ref} :=\left\{ \xi \in {\mathbb {R}}^2{\text {:}}\,\xi _1 \ge 0,\, \xi _2 \ge 0, \,\xi _1 + \xi _2 \le 1\right\} , \end{aligned}$$

and consider the first-order case \(p=1.\) In this case, the Lagrange nodes \(a_1,\,a_2,\,a_3\) are the triangle vertices, and the corresponding shape functions are

$$\begin{aligned} \lambda _1(\xi ) = 1 - \xi _1 -\xi _2, \quad \lambda _2(\xi ) = \xi _1, \quad \lambda _3(\xi ) = \xi _2. \end{aligned}$$

These are simply the barycentric coordinates of \(\xi \) with respect to \(T_\mathrm{ref}.\) Let \(R_1,\, R_2,\, R_3\) be given values on \(\text {SO(3)}.\) The image of \(T_\mathrm{ref}\) under \(\Upsilon \) is then a (possibly degenerate) geodesic triangle on \(\text {SO(3)}\) with corners \(R_1,\, R_2,\, R_3.\) In particular, the edges of \(T_\mathrm{ref}\) map onto geodesics on \(\text {SO(3)}\) ([40, Lemma 2.2 with Corollary 2.2]). Even more, the map \(\Upsilon \) is equivariant under permutations of the values \(R_1,\, R_2,\, R_3\) ([41, Lemma 4.3]), a property not shared by various other commonly used discretization techniques [28, 29, 45]. Figure 1 shows the corresponding second-order case.
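The first-order example can be reproduced numerically. The sketch below minimizes the functional of Definition 1 with a simple fixed-point iteration \(R \leftarrow R \exp (\sum _i \lambda _i \log (R^T R_i)),\) which is one common way to compute weighted Riemannian means. This iteration is an assumption chosen for brevity and is not necessarily the solver used for the results of this paper; all function names are illustrative.

```python
import math


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transpose(A):
    return [[A[j][i] for j in range(3)] for i in range(3)]


def exp_so3(v):
    """Rodrigues' formula: rotation matrix for the rotation vector v."""
    theta = math.sqrt(sum(c * c for c in v))
    K = [[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]]
    K2 = matmul(K, K)
    a = 1.0 if theta < 1e-12 else math.sin(theta) / theta
    b = 0.5 if theta < 1e-12 else (1.0 - math.cos(theta)) / theta ** 2
    return [[(1.0 if i == j else 0.0) + a * K[i][j] + b * K2[i][j]
             for j in range(3)] for i in range(3)]


def log_so3(R):
    """Rotation vector of R, valid for rotation angles below pi."""
    w = [R[2][1] - R[1][2], R[0][2] - R[2][0], R[1][0] - R[0][1]]
    norm_w = math.sqrt(sum(c * c for c in w))
    if norm_w < 1e-14:
        return [0.5 * c for c in w]
    theta = math.atan2(0.5 * norm_w, 0.5 * (R[0][0] + R[1][1] + R[2][2] - 1.0))
    return [theta / norm_w * c for c in w]


def p1_shape_functions(xi):
    """Barycentric coordinates on the reference triangle."""
    return [1.0 - xi[0] - xi[1], xi[0], xi[1]]


def geodesic_interpolation(rotations, weights, max_iter=100, tol=1e-14):
    """Approximate the minimizer of sum_i weights[i] * dist(R_i, R)^2."""
    start = max(range(len(weights)), key=lambda i: weights[i])
    R = rotations[start]  # initial guess: the value with the largest weight
    for _ in range(max_iter):
        step = [0.0, 0.0, 0.0]
        for R_i, lam in zip(rotations, weights):
            v = log_so3(matmul(transpose(R), R_i))
            step = [s + lam * c for s, c in zip(step, v)]
        R = matmul(R, exp_so3(step))
        if math.sqrt(sum(c * c for c in step)) < tol:
            break
    return R


# Corner values: rotations about the z-axis by 0, 0.4 and 0.8 radians
corners = [exp_so3([0.0, 0.0, angle]) for angle in (0.0, 0.4, 0.8)]
value = geodesic_interpolation(corners, p1_shape_functions((0.25, 0.5)))
```

For corner values that share a rotation axis, as in this example, the interpolated value is the rotation about that axis by the weighted average of the three angles. For general corner values no such closed form is available, and the iteration has to be run to convergence.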

While Definition 1 is an obvious generalization of Lagrangian interpolation in linear spaces, it is by no means clear that it leads to a well-defined interpolation function for all coefficient sets \(R_1,\ldots , R_m \in \text {SO(3)}\) and \(\xi \in T_\mathrm{ref}.\) Intuitively, for fixed \(\xi \in T_\mathrm{ref},\) one would expect the functional

$$\begin{aligned} f_{\xi }{\text {:}}\,R \mapsto \sum _{i=1}^m \lambda _i(\xi ) \text {dist}\left( R_i,\,R\right) ^2, \end{aligned}$$
(12)

to have a unique minimizer if the \(R_i \in \text {SO(3)}\) are close enough to each other in a certain sense. For the first-order case \(p=1,\) where all \(\lambda _i\) are non-negative on \(T_\mathrm{ref},\) this follows from a classic result of Karcher [24], which was later strengthened by Kendall [25] (see also [21]). Note that \(\text {SO}(3)\) is complete and has constant sectional curvature of 1 [51, Theorem 2.7.1].

Theorem 4

(Kendall [25]) Let \(B_\rho \) be an open geodesic ball of radius \(\rho < \pi / 2\) in \(\text {SO}(3),\) and \(R_1,\ldots ,R_m \in B_\rho .\) Let \(\{ \lambda _i\}_{i=1}^m\) be a set of first-order Lagrangian shape functions. Then the function

$$\begin{aligned} f_\xi {\text {:}}\,R \mapsto \sum _{i=1}^m \lambda _i(\xi ) \text {dist}\left( R_i,\,R\right) ^2, \end{aligned}$$

has a unique minimizer in \(B_\rho \) for all \(\xi \in T_\mathrm{ref}.\)

If the polynomial order p is larger than 1, the weights \(\lambda _i\) attain negative values on \(T_\mathrm{ref},\) and the results of Karcher and Kendall cannot be used anymore. Having all \(R_i\) in a convex ball still guarantees existence of a unique minimizer, but that minimizer may only be contained in a ball of larger size.
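The sign change of the weights is easy to observe for \(p=2.\) The following Python sketch evaluates the standard quadratic Lagrange shape functions on the reference triangle, written in barycentric coordinates. The node ordering (three vertices, then three edge midpoints) is an assumption made for this illustration.

```python
def p2_shape_functions(xi):
    """Quadratic Lagrange shape functions on the reference triangle."""
    l1, l2, l3 = 1.0 - xi[0] - xi[1], xi[0], xi[1]  # barycentric coordinates
    return [
        l1 * (2.0 * l1 - 1.0),  # vertex (0, 0)
        l2 * (2.0 * l2 - 1.0),  # vertex (1, 0)
        l3 * (2.0 * l3 - 1.0),  # vertex (0, 1)
        4.0 * l1 * l2,          # midpoint of the edge between vertices 1 and 2
        4.0 * l2 * l3,          # midpoint of the edge between vertices 2 and 3
        4.0 * l1 * l3,          # midpoint of the edge between vertices 1 and 3
    ]


weights = p2_shape_functions((0.25, 0.25))
```

At the point \(\xi = (0.25,\,0.25)\) two of the vertex weights are negative, while all six weights still sum to one. The functional (12) is then no longer a convex combination of squared distances, which is why the argument for \(p=1\) does not carry over.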

Theorem 5

(Sander [41]) Let \(B_D \subset B_\rho \) be two concentric geodesic balls in \(\text {SO(3)}\) of radii D and \(\rho ,\) respectively, and let \(R_1, \ldots ,R_m \in \text {SO(3)}.\) There are numbers D and \(\rho \) such that if \(R_1,\ldots ,R_m \in B_D,\) then the functional (12) has a unique minimizer in \(B_\rho .\)

A quantitative version of this result is given as Theorem 3.19 in [41]. Unfortunately, it is quite technical and we have chosen to omit it here. When preparing the numerical examples of Sect. 6, we have not encountered any problems stemming from a possible ill-posedness of the interpolation for extreme configurations of the \(R_1,\ldots ,R_m.\)

To be able to use the interpolation functions as the basis of a FE theory, they need to have sufficient regularity. The following result follows directly from the implicit function theorem.

Theorem 6

Let \(R_1,\ldots ,R_m\) be coefficients on \(\text {SO(3)}\) with respect to a pth order Lagrange basis \(\{ \lambda _i \}\) on a domain \(T_\mathrm{ref}.\) Under the assumptions of Theorem 5, the function

$$\begin{aligned} \Upsilon \left( R_1,\ldots ,R_m;\, \xi \right) {\text {:}}\,\text {SO(3)}^m \times T_\mathrm{ref} \rightarrow \text {SO(3)}, \end{aligned}$$

is infinitely differentiable with respect to the \(R_i\) and \(\xi .\)

This result is proved in [40, 41] for interpolation in general manifolds.

3.2 Geodesic finite element functions

The interpolation functions of the previous section can be used to construct a generalization of Lagrangian FE spaces to functions with values in \(\text {SO(3)}.\)

For this, let \(\omega \) be the two-dimensional parameter domain of our planar Cosserat shell model, and suppose it has piecewise linear boundary. Let \({\mathcal {G}}\) be a conforming grid for \(\omega \) with triangle and/or quadrilateral elements. Let \(n_i \in \omega ,\, i=1,\ldots ,N\) be a set of Lagrange nodes such that for each element T of \({\mathcal {G}}\) there are m nodes \(a_{T,i}\) contained in T,  and such that the pth order interpolation problem on T is well posed.

Definition 2

(GFEs [41]) Let \({\mathcal {G}}\) be a conforming grid on \(\omega .\) We call \(R_h {\text {:}}\,\omega \rightarrow \text {SO(3)}\) a GFE function if it is continuous, and for each element \(T \in {\mathcal {G}}\) the restriction \(R_h|_T\) is a geodesic interpolation in the sense that

$$\begin{aligned} R_h|_T(x) = \Upsilon \left( R_{T,1}, \ldots , R_{T,m};\, {\mathcal {F}}_T(x)\right) , \end{aligned}$$

where \({\mathcal {F}}_T{\text {:}}\,T \rightarrow T_\mathrm{ref}\) is affine or multilinear and the \(R_{T,i}\) are values in \(\text {SO(3)}\) corresponding to the Lagrange nodes \(a_{T,i}.\) The space of all such functions \(R_h\) will be denoted by \(V_{p,h}^{\text {SO(3)}}.\)

This construction has various desirable properties. As a first result we note that the functions constructed in this way are \(W^{1,q}\)-conforming for all \(q \ge 2.\) This follows from a slight generalization of the proof for Theorem 3.1 in [40].

Theorem 7

\(V_{p,h}^{\text {SO(3)}}(\omega ) \subset W^{1,q}(\omega ,\,\text {SO(3)})\) for all \(p \ge 1,\,q \ge 2.\)

Hence discrete approximation functions for the Cosserat microrotation field \(\overline{R}{\text {:}}\,\omega \rightarrow \text {SO(3)}\) are elements of the space \(W^{1,q}(\omega ,\,\text {SO(3)}),\) in which the Cosserat shell problem is well posed (Theorems 1 and 3). This means that the energies (3) and (10) can be directly evaluated for GFE functions, which simplifies the analysis considerably.

Since GFEs are defined using metric properties of \(\text {SO(3)}\) alone, we naturally get the following equivariance result.

Lemma 8

Let O(3) be the orthogonal group on \({\mathbb {R}}^3,\) which acts isometrically on \(\text {SO(3)}\) by left multiplication. Pick any element \(Q \in O(3).\) For any GFE function \(R_h \in V_{p,h}^{\text {SO(3)}}\) we define \(QR_h{\text {:}}\,\omega \rightarrow \text {SO(3)}\) by \((QR_h)(x) = Q(R_h(x))\) for all \(x \in \omega .\) Then \(QR_h \in V_{p,h}^{\text {SO(3)}}.\)

This lemma forms the basis of the frame-invariance of our discrete Cosserat shell model.
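The mechanism behind this equivariance can be made explicit in one line. As a sketch, restricted to \(Q \in \text {SO(3)}\) and using only the distance formula of Sect. 3.1, we have for any \(R_i,\,R \in \text {SO(3)}\)

$$\begin{aligned} \text {dist}\left( QR_i,\,QR\right) = \left\| {\log \left( R_i^T Q^T Q R\right) }\right\| = \left\| {\log R_i^T R}\right\| = \text {dist}\left( R_i,\,R\right) . \end{aligned}$$

Hence the functional (12) for the coefficients \(QR_1,\ldots ,QR_m\) takes the same value at QR as the functional for \(R_1,\ldots ,R_m\) takes at R. Whenever the minimizer is unique, this gives \(\Upsilon (QR_1,\ldots ,QR_m;\,\xi ) = Q\Upsilon (R_1,\ldots ,R_m;\,\xi )\) on each element.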

Optimal discretization error bounds for general GFE problems have been proved in [20]. The application of those abstract results to the energy functionals considered in this paper will be left for future work.

4 Discrete and algebraic Cosserat planar shell problem

We now discuss the minimization problem obtained by discretizing the continuous Cosserat shell model of Sect. 2 by GFEs. For that, assume that the two-dimensional domain \(\omega \) is discretized by a grid containing triangle and/or quadrilateral elements. For simplicity, we again assume that the domain boundary is resolved by the grid. We also assume that the grid resolves the Dirichlet boundary \(\gamma _0.\)

4.1 The discrete problem

The functional I given in (10) is defined on the Cartesian product of the spaces \(H^1(\omega ,\,{\mathbb {R}}^3)\) and \(W^{1,q}(\omega ,\,\text {SO}(3)).\) The first factor is a standard Sobolev space of vector-valued functions. For its discretization we introduce the space \(V_{p_1,h}^{{\mathbb {R}}^3}\) of conforming Lagrangian FEs of \(p_1\)th order with values in \({\mathbb {R}}^3.\) In the following we write \(m_h\) for discrete displacement functions from \(V_{p_1,h}^{{\mathbb {R}}^3}.\) For the rotation degree of freedom \(\overline{R}{\text {:}}\,\omega \rightarrow \text {SO}(3)\) we use the GFEs described in the previous chapter. Denote by \(V_{p_2,h}^\text {SO(3)}\) the \(p_2\)th order GFE space for functions on \(\omega \) with respect to the grid, and with values in \(\text {SO}(3).\) In the following we write \(\overline{R}_h\) for discrete microrotations from \(V_{p_2,h}^\text {SO(3)}.\)

It is well known that \(V_{p_1,h}^{{\mathbb {R}}^3} \subset H^1(\omega ,\,{\mathbb {R}}^3)\) (see, e.g., [11, Satz 5.2]). Additionally, we know from Theorem 7 that the FE space \(V_{p_2,h}^\text {SO(3)}\) is a subset of \(W^{1,q}(\omega ,\,\text {SO}(3))\) for all \(p_2 \in {\mathbb {N}}.\) Therefore, the energy functional I is well defined on the product space \(\mathbf {V}_h :=V_{p_1,h}^{{\mathbb {R}}^3} \times V_{p_2,h}^\text {SO(3)}\) for all \(p_1,\,p_2 \in {\mathbb {N}}.\) A suitable discrete approximation of the geometrically nonlinear planar Cosserat shell model therefore consists of the unmodified energy functional I restricted to the space \(\mathbf {V}_h.\)

In analogy to the continuous model, we consider the following boundary conditions for the discrete problem. Let \(g_{d,h} \in V_{p_1,h}^{{\mathbb {R}}^3}\) be a FE approximation of the Dirichlet boundary value function \(g_d{\text {:}}\,\omega \rightarrow {\mathbb {R}}^3,\) and let \(g_{d,h}^{\prime } \in V_{p_2,h}^{{\mathbb {R}}^3}\) be an approximation of the vector field \(g_d^{\prime }.\) Then we demand that the discrete displacement \(m_h\) fulfill the condition

$$\begin{aligned} m_h(x,\,y) = g_{d,h}(x,\,y) \quad \text {for all}\,(x,\,y) \in \gamma _0. \end{aligned}$$
(13)

For the microrotations \(\overline{R}\) we can define discrete approximations of the boundary conditions (7) and (8): we either leave them free, corresponding to homogeneous Neumann conditions for \(\overline{R},\) or, alternatively, corresponding to (8), we can specify the direction of the transversal director vector \(\overline{R}_3\) (rigid director prescription)

$$\begin{aligned} \left( \overline{R}_h\right) _3 (x,\,y) = \frac{g_{d,h}^{\prime }(x,\,y)}{\Vert g_{d,h}^{\prime }(x,\,y)\Vert } \quad \text {for all}\, (x,\,y) \in \gamma _0.\nonumber \\ \end{aligned}$$
(14)

Summing up, the discrete Cosserat shell problem is:

Problem 2

(Discrete Cosserat shell problem) Find a pair of functions \((m_h,\,\overline{R}_h)\) with \(m_h \in V_{p_1,h}^{{\mathbb {R}}^3}\) and \(\overline{R}_h \in V_{p_2,h}^{\text {SO(3)}}\) that minimizes the functional I given in (10), subject to the constraints (13) and (14) on \(\gamma _0.\)

Note that frame indifference of the discrete model is retained naturally, because we simply restrict the frame-indifferent functional I to a subset \(V_{p_1,h}^{{\mathbb {R}}^3} \times V_{p_2,h}^{\text {SO(3)}}\) of its original domain of definition, and this subset is closed under rigid body motions (Lemma 8).

Remark 2

We have discretized the midsurface deformation m using standard FEs, and we have used the novel GFEs only for the rotation field \(\overline{R}.\) We can unify the two approaches when a more abstract viewpoint is taken. Indeed, revisiting the definitions of Sect. 3 it is obvious that GFEs may as well be defined for the target manifold \({\mathbb {R}}^3\) instead of \(\text {SO(3)};\) standard Lagrangian FEs are the result. In this sense, we have used GFEs for both the midsurface deformation and the microrotation field.

When the two orders \(p_1\) and \(p_2\) coincide, i.e., \(p=p_1 = p_2,\) we can go one step further. Note that the space \(\text {SE}(3) :={\mathbb {R}}^3 \times \text {SO(3)}\) is well known as the special Euclidean group (the group of rigid body motions in \({\mathbb {R}}^3\)). We therefore introduce the GFE space \(V_{p,h}^\text {SE(3)},\) and observe that it is isomorphic to \(\mathbf {V}_h :=V_{p,h}^{{\mathbb {R}}^3} \times V_{p,h}^\text {SO(3)}.\) We can therefore also interpret the discrete Cosserat shell problem as a minimization problem in the single GFE space \(V_{p,h}^\text {SE(3)}.\)

4.2 The algebraic problem

For the numerical minimization of the Cosserat shell energy we need an algebraic formulation. For standard FEs there is a bijective correspondence between FE functions and coefficient vectors, via the representation of the functions with respect to a basis. For GFEs, the situation is more involved. Since GFE functions are continuous by definition, we can always associate a coefficient vector \(\overline{\overline{R}} \in \text {SO(3)}^{N_2}\) to a function \(\overline{R}_h \in V_{p_2,h}^{\text {SO(3)}}\) by pointwise evaluation at the \(N_2\) Lagrange nodes. To formalize this we introduce the evaluation operator

$$\begin{aligned}&{\mathcal {E}}_{p_2} {\text {:}}\,V_{p_2,h}^{\text {SO(3)}} \rightarrow \text {SO(3)}^{N_2}, \quad {\mathcal {E}}_{p_2}\left( \overline{R}_h\right) _i = \overline{R}_h(n_i),\\&\quad i= 1,\ldots ,N_2, \end{aligned}$$

where \(n_i \in \omega ,\,i=1,\ldots ,N_2\) are the Lagrange nodes of the FE space of order \(p_{2}\) on the grid. However, for a given set of coefficients \(\overline{\overline{R}} \in \text {SO(3)}^{N_2}\) there may be more than one GFE function that interpolates \(\overline{\overline{R}}.\) This happens when the set of values violates the assumptions of Theorems 4 or 5 (depending on the FE approximation order \(p_2\)).

All GFE functions that do comply with the conditions of Theorems 4 or 5 element-wise can be identified with coefficient sets \(\overline{\overline{R}} \in \text {SO}(3)^{N_2}.\) In most cases this situation can be achieved by making the grid fine enough. This has been formalized in [41, Theorem 5.2], which we repeat here, adapted to the Cosserat shell problem.

Theorem 9

Let \(\overline{R}{\text {:}}\,\omega \rightarrow \text {SO(3)}\) be Lipschitz continuous in the sense that there exists a constant L such that

$$\begin{aligned} \text {dist}(\overline{R}(x),\,\overline{R}(y)) \le L||x-y||, \end{aligned}$$

for all \(x,\,y \in \omega .\) Let \({\mathcal {G}}\) be a grid of \(\omega \) and h the length of the longest edge of \({\mathcal {G}}.\) Set \(\overline{\overline{R}} = {\mathcal {E}}_{p_2}(\overline{R}),\) tacitly extending the definition of \({\mathcal {E}}_{p_2}\) to all continuous functions \(\omega \rightarrow \text {SO(3)}.\) For h small enough, the inverse of \({\mathcal {E}}_{p_2}\) has only a single value in \(V_{p_2,h}^{\text {SO(3)}}\) for each \(\widetilde{\overline{R}} \in \text {SO(3)}^{N_2}\) in a neighborhood of \(\overline{\overline{R}}.\)

The restrictions posed by this theorem do not appear to pose any difficulties in practice. We therefore assume in the following that \({\mathcal {E}}_{p_2}\) is a (local) bijection.

Analogously to \({\mathcal {E}}_{p_2}\) we define the corresponding operator \({\mathcal {E}}_{p_1}\) doing point-wise evaluation of functions in \(V_{p_1,h}^{{\mathbb {R}}^3}.\) With these operators, it is straightforward to define the algebraic Cosserat shell energy

$$\begin{aligned}&\bar{I}{\text {:}}\,{\mathbb {R}}^{3N_1} \times \text {SO(3)}^{N_2} \rightarrow {\mathbb {R}},\nonumber \\&\quad \bar{I}(\bar{m},\,\overline{\overline{R}}) :=I\left( {\mathcal {E}}_{p_1}^{-1}(\bar{m}),\, {\mathcal {E}}_{p_2}^{-1}(\overline{\overline{R}})\right) , \end{aligned}$$
(15)

where I is the functional (10). The algebraic Cosserat shell problem then is:

Problem 3

(Algebraic Cosserat shell problem) Find a pair \(\bar{m} \in {\mathbb {R}}^{3N_1},\,\overline{\overline{R}} \in \text {SO(3)}^{N_2}\) that minimizes \(\bar{I},\) subject to suitable boundary conditions.

Implementation of Dirichlet boundary conditions for the deformation \(m_h\) is straightforward. For the rotation field we again have the choice between leaving the rotation free, or prescribing the transversal director vector \(\overline{R}_3\) (rigid director prescription)

$$\begin{aligned} \left( \overline{\overline{R}}_i\right) _3 = \frac{g_d^{\prime }(n_i)}{{|g_d^{\prime }(n_i)|}}, \end{aligned}$$

for all Lagrange nodes \(n_i\) on the Dirichlet boundary \(\gamma _0.\)

Remark 3

If \(N_1 = N_2 = N\) we can also interpret the functional (15) as being defined on the manifold \(({\mathbb {R}}^3 \times \text {SO(3)})^N.\)

It was mentioned in Sect. 2 that the shell energy is frame-invariant in the sense that

$$\begin{aligned} I(Qm,\,Q\overline{R}) = I(m,\,\overline{R}), \end{aligned}$$

where Q is any element of \(\text {SO(3)},\) acting on functions in \(H^1(\omega ,\,{\mathbb {R}}^3)\) and \(W^{1,q}(\omega ,\,\text {SO(3)})\) by pointwise multiplication. By the equivariance property (Lemma 8) of GFEs this frame invariance does not get lost by discretization.

Theorem 10

The algebraic energy functional \(\bar{I}\) is frame-invariant in the sense that

$$\begin{aligned} \bar{I}(Q\bar{m},\, Q\overline{\overline{R}}) = \bar{I}(\bar{m},\,\overline{\overline{R}}), \end{aligned}$$

for all \(Q \in \text {SO(3)},\) which, by an abuse of notation, now acts on the components of \(\bar{m}\) and \(\overline{\overline{R}}.\)

This sets the GFE discretization apart from alternative approaches like [28, 29], which do not have this property.

5 Numerical minimization of the algebraic energy

All previous work on nonlinear shell elements has used the Newton method to solve the resulting nonlinear systems of equations. However, it is well known that this method converges only locally. Therefore, a sequence of loading steps is traditionally used to obtain a solution. These loading steps have to be selected carefully to make sure that the Newton solver converges at each loading step. This selection of loading steps can be tedious in practice.

For energy minimization problems there exist globalized versions of the Newton method, i.e., methods that converge for any initial iterate, without using intermediate loading steps. One such method is the so-called trust-region method [13], which replaces each Newton step with a quadratic minimization problem on a convex set. Under reasonable conditions, it degenerates to a standard Newton method when close enough to a solution, and hence local quadratic convergence is recovered.

While the standard trust-region method works for energies defined on Euclidean spaces, a generalization to energies on Riemannian manifolds has been introduced and investigated by Absil et al. [1]. This Riemannian trust-region method can be applied to the algebraic Cosserat energy (15), which is defined on the product manifold \({\mathbb {R}}^{3N_1} \times \text {SO(3)}^{N_2}.\) As an extension of Newton’s method, it shows locally quadratic behavior. On the other hand, it can be shown to converge globally without intermediate loading steps.

5.1 Trust-region methods

We briefly review the trust-region method for Euclidean spaces [13], and then show how it can be generalized to functionals on a Riemannian manifold. Consider a twice continuously differentiable functional

$$\begin{aligned} J{\text {:}}\,{\mathbb {R}}^N \rightarrow {\mathbb {R}}, \end{aligned}$$
(16)

supposed to be coercive and bounded from below. Given any initial iterate \(x^0 \in {\mathbb {R}}^N,\) we want to find a local minimizer of J.

Fig. 2 One step of the trust-region method. The new iterate \(x^{k+1}\) is the minimizer of the quadratic model \(m_k\) restricted to the ball \(B_{x^k}(\rho _k)\) (shaded region), unless the energy decrease predicted by the model deviates too much from the true energy decrease \(J(x^k) - J(x^{k+1})\)

The Newton method does this in the following way. Let \(x^k \in {\mathbb {R}}^N\) be any iterate. Approximate J around \(x^k\) by the quadratic Taylor expansion

$$\begin{aligned}&m_k {\text {:}}\,{\mathbb {R}}^N \rightarrow {\mathbb {R}}, \\&m_k(s) = J\left( x^k\right) + \partial J\left( x^k\right) s + \frac{1}{2} s^T \partial ^2 J\left( x^k\right) s, \end{aligned}$$

which in this context is called a quadratic model of J around \(x^k.\) The variable s is to be interpreted as a correction \(s = x - x^k.\) Then, compute a stationary point \(s^k\) of \(m_k,\) and use it as the correction to the next iterate

$$\begin{aligned} x^{k+1} :=x^k + s^k. \end{aligned}$$

Computing the stationary point \(s^k\) is done by the well-known Newton update formula

$$\begin{aligned} s^k = x^{k+1}-x^k = {-}\partial ^2 J\left( x^k\right) ^{-1} \partial J\left( x^k\right) . \end{aligned}$$
(17)

Observe that if the Hessian \(\partial ^2 J(x^k)\) is positive definite at all iterates, then the algorithm produces a sequence of iterates with decreasing energy, i.e., \(J(x^{k+1}) \le J(x^k)\) for all \(k \in {\mathbb {N}}.\) However, iterates with indefinite \(\partial ^2 J(x^k)\) may lead to energy increase.

To enforce global convergence, the trust-region method first replaces the search for a stationary point of \(m_k\) by the search for a minimizer \(s^k\) of \(m_k.\) As a consequence, iterates of the trust-region method are energy decreasing in all cases. Secondly, it notes that the quadratic model \(m_k\) is a good approximation of J only in a neighborhood of \(x^k.\) This observation is made explicit by restricting the minimization problem for \(m_k\) to a ball of radius \(\rho _k\) around \(x^k,\) the name-giving trust region (Fig. 2). In other words, the Newton step (17) is replaced by

$$\begin{aligned} s^k = \mathop {\text {arg min}}\limits _{{s \in {\mathbb {R}}^N}} m_k(s), \quad \big \Vert {s^k}\big \Vert \le \rho _k. \end{aligned}$$
(18)

Since we now look for a minimizer on a compact set only, Problem (18) is well-defined even if \(\partial ^2 J\) is not positive definite.

Unlike the original Newton method, the trust-region method is monotone in the sense that \(J(x^{k+1}) \le J(x^k)\) for all \(k \in {\mathbb {N}}.\) A more quantitative monitoring of the energy decrease allows one to control the trust-region radius, i.e., the trust in the quality of the quadratic approximation. The quality of the correction step \(s^k\) is estimated by comparing the functional decrease to the model decrease. If the quotient

$$\begin{aligned} \kappa _k = \frac{J(x^k) - J(x^k + s^k)}{m_k(0) - m_k(s^k)}, \end{aligned}$$
(19)

is smaller than a fixed value \(\eta _1,\) then the step is rejected, and \(s^k\) is recomputed for a smaller trust-region radius. Otherwise the step is accepted. If \(\kappa _k\) is larger than a second value \(\eta _2,\) the trust-region radius is enlarged for the next step. Common values are \(\eta _1 = 0.01\) and \(\eta _2 = 0.9\) [13].
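The acceptance and radius-update logic described above can be sketched for a scalar model problem, where the constrained subproblem (18) can be solved exactly by comparing at most three candidate steps. The test functional \(J(x) = \frac{1}{4}x^4 - \frac{1}{2}x^2,\) the initial radius, and the halving and doubling factors for \(\rho _k\) are assumptions chosen for this illustration; they are not taken from [13] or from the experiments of this paper.

```python
def J(x):
    return 0.25 * x ** 4 - 0.5 * x ** 2


def dJ(x):
    return x ** 3 - x


def d2J(x):
    return 3.0 * x ** 2 - 1.0


def model_decrease(g, h, s):
    """m_k(0) - m_k(s) for the quadratic model with gradient g and Hessian h."""
    return -(g * s + 0.5 * h * s * s)


def solve_subproblem(g, h, rho):
    """Exact minimizer of the quadratic model on the interval [-rho, rho]."""
    candidates = [-rho, rho]
    if h > 0.0 and abs(g / h) <= rho:
        candidates.append(-g / h)  # interior Newton step
    return max(candidates, key=lambda s: model_decrease(g, h, s))


def trust_region(x, rho=0.5, eta1=0.01, eta2=0.9, tol=1e-6, max_iter=100):
    energies = [J(x)]
    for _ in range(max_iter):
        g, h = dJ(x), d2J(x)
        if abs(g) < tol:
            break
        s = solve_subproblem(g, h, rho)
        kappa = (J(x) - J(x + s)) / model_decrease(g, h, s)  # quotient (19)
        if kappa < eta1:
            rho *= 0.5  # reject the step and retry with a smaller radius
            continue
        x += s
        energies.append(J(x))
        if kappa > eta2:
            rho *= 2.0  # very good step: enlarge the trust region
    return x, energies


def newton(x, tol=1e-6, max_iter=100):
    """Plain Newton iteration (17) for comparison."""
    for _ in range(max_iter):
        g = dJ(x)
        if abs(g) < tol:
            break
        x -= g / d2J(x)
    return x


x_tr, energies = trust_region(0.1)
x_newton = newton(0.1)
```

Started at \(x^0 = 0.1,\) where the second derivative is negative, the plain Newton iteration is attracted by the local maximizer at 0, whereas the trust-region iteration reaches the local minimizer at 1 with monotonically decreasing energy.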

For the trust-region algorithm, the following convergence properties can be shown.

Theorem 11

([13, Theorems 6.4.6 and 6.5.5]) Suppose that J is twice continuously differentiable, bounded from below, and such that its Hessian remains bounded for all \(x \in {\mathbb {R}}^N.\)

  (1) For all initial iterates we get

    $$\begin{aligned} \lim _{k \rightarrow \infty } \left\| {\partial J\left( x^k\right) }\right\| = 0. \end{aligned}$$

  (2) Suppose that \(\{x^{k_i}\}\) is a subsequence of the iterates converging to the first-order critical point \(x_{*}.\) Suppose furthermore that \(s^k \ne 0\) for all k sufficiently large. Finally suppose that \(\partial ^2 J(x_*)\) is positive definite. Then the complete sequence of iterates \(\{x^k\}\) converges to \(x_*,\) eventually the step quality \(\kappa _k\) remains above \(\eta _2,\) and the trust-region radius \(\rho _k\) is bounded away from zero.

In particular, since \(\kappa _k > \eta _2\) for all k large enough, the trust-region radius grows near local minimizers, the method eventually degenerates to a pure Newton method, and we get locally quadratic convergence.

Various algorithms for solving the constrained quadratic minimization problems (18) have been proposed in the literature. The monograph [13] gives a good overview.

Trust-region methods are much more convenient than standard Newton methods, because they relieve the user of the tedious load-stepping. They typically do not need more iterations than a Newton method. If the tangent problems are very badly conditioned (which, for our shell model, is unfortunately the case if \(L_c\) is small), then Newton methods can be faster because they can use direct solvers to solve the inner problems. Trust-region methods, on the other hand, need to employ iterative solvers, whose convergence speed depends on the matrix condition number. This argument becomes void if large problems are considered, because for such problems the memory consumption of direct solvers makes their use impossible. Also, constructing iterative solvers that are tailor-made for the tangent problems of nonlinear Cosserat shell models may lead to a large speed increase. This is a subject of future research.

Fig. 3

In the Riemannian trust-region method, the energy functional defined on M is lifted onto the tangent space at \(x^k\) using the exponential map. Then, a linear correction step is computed on \(T_{x^k} M,\) and applied to \(x^k\) using the exponential map \(x^{k+1} = \exp _{x^k} s^k\)

5.2 Riemannian trust-region methods

The algebraic energy functional \(\bar{I}\) defined in (15) is not a functional of the type (16). Rather, its domain of definition is the nonlinear manifold \({\mathbb {R}}^{3N_1} \times \text {SO(3)}^{N_2}.\) The trust-region method has been generalized to such energies by Absil et al. [1]. Let M be a Riemannian manifold with metric g,  and \(J{\text {:}}\, M \rightarrow {\mathbb {R}}\) twice differentiable and bounded from below (in our case: \(M ={\mathbb {R}}^{3N_1} \times \text {SO(3)}^{N_2}\)). The basic idea of such a Riemannian trust-region algorithm is that in a neighborhood of a point \(x \in M\) the functional J can be lifted onto the tangent space \(T_xM.\) There, a vector space trust-region subproblem can be solved and the result transported back onto M (Fig. 3).

More formally, let again \(k \in {\mathbb {N}}\) be an iteration number and \(x^k \in M\) the current iterate. We obtain the lifted functional by setting

$$\begin{aligned} \hat{J}_k{\text {:}}\, T_{x^k} M \rightarrow {\mathbb {R}}, \quad \hat{J}_k (s) = J\left( \exp _{x^k}s\right) . \end{aligned}$$

Let \(\rho _k > 0\) be the current trust-region radius. The Riemannian metric g turns \(T_{x^k} M\) into a Banach space with the norm \(||\cdot ||_{x^k} = \sqrt{g_{x^k}(\cdot ,\, \cdot )}.\) There, the trust-region subproblem reads

$$\begin{aligned} s^k = \mathop {\text {arg min}}\limits _{{s \in T_{x^k}M}} m_k(s), \quad ||s||_{x^k} \le \rho _k, \end{aligned}$$
(20)

with the quadratic, but not necessarily convex model

$$\begin{aligned} m_k(s)= & {} \hat{J}_k(0) + g_{x^k}\left( \nabla \hat{J}_k(0),\,s\right) \nonumber \\&+ \, \frac{1}{2} g_{x^k}\left( \text {Hess}\hat{J}_k(0)s,\,s\right) . \end{aligned}$$
(21)

Here \(\nabla \hat{J}_k\) is the Riemannian gradient and \(\text {Hess}\hat{J}_k\) the Riemannian Hessian of \(\hat{J}_k\) (see [1] for definitions), and both are evaluated at \(0 \in T_{x^k}M.\) Note that (21) is independent of a specific coordinate system on \(T_{x^k}M.\) As a minimization problem of a continuous function on a compact set, (20) has at least one solution \(s^k,\) which generates the new iterate by

$$\begin{aligned} x^{k+1} = \exp _{x^k} s^k. \end{aligned}$$
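For a single node, the update by the exponential map can be sketched as follows. This is a minimal illustration under the assumption that rotations are stored as unit quaternions and that \(S^3\) carries the round metric inherited from \({\mathbb {R}}^4\); the function names `exp_sphere` and `exp_node` are hypothetical and not taken from the implementation used for the results.

```python
import math

def exp_sphere(x, s):
    """Exponential map on the unit sphere.

    x: unit vector (base point), s: tangent vector at x, i.e. dot(x, s) == 0.
    """
    norm_s = math.sqrt(sum(si * si for si in s))
    if norm_s < 1e-14:
        # A zero step leaves the base point unchanged.
        return list(x)
    c, sn = math.cos(norm_s), math.sin(norm_s)
    # Follow the great circle through x in direction s for arc length |s|.
    return [c * xi + sn * si / norm_s for xi, si in zip(x, s)]

def exp_node(m, q, s_m, s_q):
    """Update one node (m, q) in R^3 x S^3 by the tangent step (s_m, s_q)."""
    m_new = [mi + si for mi, si in zip(m, s_m)]  # exp on R^3 is plain addition
    q_new = exp_sphere(q, s_q)
    return m_new, q_new
```

Because the step follows a great circle, the updated quaternion stays on \(S^3\) up to rounding, so no separate renormalization is needed in exact arithmetic.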

As in trust-region methods in linear spaces, the quality of a correction step \(s^k\) is estimated by comparing the functional decrease and the model decrease. The quotient (19) now takes the form

$$\begin{aligned} \kappa _k = \frac{J(x^k) - J(\exp _{x^k} s^k)}{m_k(0) - m_k(s^k)}. \end{aligned}$$
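How the quotient \(\kappa _k\) typically steers the iteration can be sketched as below. The thresholds `eta1`, `eta2` and the factors `shrink`, `grow` are illustrative assumptions (common textbook choices), not the values used for the computations in this article.

```python
def trust_region_update(J_old, J_new, model_decrease, rho,
                        eta1=0.01, eta2=0.9, shrink=0.5, grow=2.0):
    """Decide on step acceptance and the next trust-region radius.

    J_old, J_new: energy at x^k and at exp_{x^k}(s^k).
    model_decrease: m_k(0) - m_k(s^k), assumed positive.
    All default parameter values are hypothetical.
    """
    if model_decrease <= 0.0:
        raise ValueError("model decrease must be positive")
    kappa = (J_old - J_new) / model_decrease
    accept = kappa > eta1
    if kappa > eta2:
        rho_new = grow * rho      # model is very accurate: enlarge the region
    elif accept:
        rho_new = rho             # acceptable step: keep the radius
    else:
        rho_new = shrink * rho    # poor step: reject and shrink the region
    return accept, rho_new, kappa
```

A rejected step leaves the iterate unchanged, so the energy never increases from one iterate to the next.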

For this method, Absil et al. proved global convergence to first-order stationary points, and, depending on the exactness of the inner solver, locally superlinear or even locally quadratic convergence [1]. For our numerical results we use the monotone multigrid method [26] together with an \(\infty \)-norm trust region. Details can be found in [40].

5.3 Computing the algebraic tangent problem numerically

Solving the constrained quadratic problems (20) numerically involves the algebraic Riemannian gradient \(\nabla \bar{I}\) and Hessian \(\text {Hess}\bar{I}\) of the functional \(\bar{I}.\) While those could in principle be evaluated analytically, such an approach is involved and error-prone (consider the derivative formulas for the gradient in [40, Chap. 5]). It is much more convenient to use automatic differentiation (AD) to compute the derivatives. AD is a technique to algorithmically compute first and higher derivatives of functions given in the form of computer programs [19]. This includes computer programs involving iterative solvers like the Newton method used to evaluate GFE functions (see “Quaternion coordinates for SO(3)” in Appendix). Many good implementations of AD are available as external libraries, and the choice can have a considerable impact on the computational cost of assembling the tangent problem. For this article we have used the open-source ADOL-C software [48], which is one of the few that provides all the features needed to compute first and second derivatives of the energy functionals considered here.

For simplicity we assume that the deformation m and the microrotation \(\overline{R}\) have been discretized with finite elements of equal approximation order. Then there is an equal number of Lagrange nodes \(N = N_1 = N_2\) for both of them, and we can consider the algebraic energy \(\bar{I}\) as being defined on the manifold \(M = ({\mathbb {R}}^3 \times \text {SO(3)})^N.\)

Unfortunately, current AD tools do not directly support derivatives of energies defined on manifolds. We therefore use the following trick. Interpret elements R of \(\text {SO(3)}\) as unit vectors q in \({\mathbb {R}}^4\) using quaternion coordinates (see “Quaternion coordinates for SO(3)” in Appendix). The algebraic energy functional \(\bar{I}\) can then be interpreted as being defined on \(({\mathbb {R}}^3 \times S^3)^N \subset {\mathbb {R}}^{7N}.\) To extend \(\bar{I}\) to a neighborhood of \(({\mathbb {R}}^3 \times S^3)^N\) in \({\mathbb {R}}^{7N}\) we first introduce \(\bar{q} \in {\mathbb {R}}^{4N},\) a vector of quaternions. Componentwise normalization leads to a vector of unit quaternions, which we denote by \(\bar{q}/{|\bar{q}|} \in (S^3)^N\) in an abuse of notation. Using the map F defined in (28) we can construct \(F(\bar{q}/{|\bar{q}|}) \in \text {SO(3)}^N\) (the application of F again component-wise). Then we set

$$\begin{aligned} \tilde{I} (\bar{m},\,\bar{q}) :=\bar{I} (\bar{m},\, F(\bar{q}/{|\bar{q}|})), \end{aligned}$$

which is a smooth functional on an open subset of the Euclidean space \({\mathbb {R}}^{7N}.\) Given a computer implementation of \(\tilde{I},\) an AD system like ADOL-C can then compute the Euclidean gradient \(\partial \tilde{I} \in {\mathbb {R}}^{7N}\) and Hessian \(\partial ^2 \tilde{I} \in {\mathbb {R}}^{7N \times 7N}\) automatically.

To obtain the Riemannian gradient \(\nabla \bar{I}\) and Hessian \(\text {Hess}\bar{I}\) we need additional manipulations. For the gradient we use the following well-known result (see, e.g., [1], Sect. 3.6.1).

Lemma 12

Let M be a smooth Riemannian manifold isometrically embedded in a Euclidean space \({\mathbb {R}}^l.\) For each \(x \in M\) let \(P_x{\text {:}}\, T_x{\mathbb {R}}^l \rightarrow T_xM\) be the orthogonal projection onto the tangent space at x. Let \(f{\text {:}}\,M \rightarrow {\mathbb {R}}\) be continuously differentiable and \(\tilde{f}\) a smooth extension of f to a neighborhood of M in \({\mathbb {R}}^l.\) Then

$$\begin{aligned} \nabla f = P_x \partial \tilde{f}, \end{aligned}$$
(22)

where \(\nabla \) is the gradient operator on M,  and \(\partial \) is the gradient in \({\mathbb {R}}^l.\)

Since \(\bar{I}\) is defined on the N-fold product of \({\mathbb {R}}^3 \times \text {SO(3)}\) we obtain the Riemannian gradient \(\nabla \bar{I}\) by applying Lemma 12 to each factor. Hence, the Riemannian gradient is given by componentwise projection

$$\begin{aligned} ( \nabla \bar{I})_i = P_x (\partial \tilde{I})_i, \quad i = 1,\ldots , N, \end{aligned}$$

where \(P_x\) is the orthogonal projector from \({\mathbb {R}}^7\) onto \({\mathbb {R}}^3 \times T_x S^3.\) This projector can be constructed from the corresponding projector for \({\mathbb {R}}^3\) (which is the identity), and the corresponding projector for \(S^3\)

$$\begin{aligned} P^{S^3}_x = I - x x^T. \end{aligned}$$
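The componentwise projection can be illustrated for one node. The sketch assumes that the Euclidean gradient block is stored as three deformation entries followed by four quaternion entries; this layout and the name `riemannian_gradient_block` are assumptions made for illustration.

```python
def riemannian_gradient_block(q, g):
    """Project one 7-entry Euclidean gradient block onto R^3 x T_q S^3.

    q: unit quaternion (4 entries).
    g: Euclidean gradient block, assumed ordered as (3 deformation, 4 quaternion).
    """
    g_m, g_q = g[:3], g[3:]
    # Radial component q^T g_q, removed by the projector I - q q^T.
    radial = sum(qi * gi for qi, gi in zip(q, g_q))
    return list(g_m) + [gi - radial * qi for qi, gi in zip(q, g_q)]
```

The deformation part passes through unchanged, while the quaternion part loses exactly its component along q.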

A similar formula for the Riemannian Hessian is given in the following lemma. As we now consider second derivatives, the curvature of \(\text {SO(3)}\) comes into play.

Lemma 13

(Absil et al. [2]) With the same notation as in Lemma 12, we have

$$\begin{aligned} \text {Hess}f(x)[z] = P_x \partial ^2 \tilde{f}(x)z + {\mathfrak {A}}_x\left( z,\,P^\perp _x \partial \tilde{f}\right) , \end{aligned}$$

where \({\mathfrak {A}}_x(z,\,v)\) is the Weingarten map of M,  and \(P_x^\perp \) is the orthogonal projector onto the normal space of M at x.

The Weingarten map for the unit sphere in \({\mathbb {R}}^4\) is [2]

$$\begin{aligned} {\mathfrak {A}}_x(z,\,v) :={-}\big (x^Tv\big )z, \end{aligned}$$

and the orthogonal projector onto the normal space at \(x \in S^3\) is

$$\begin{aligned} P_x^\perp = I - P_x = xx^T. \end{aligned}$$

Written in canonical coordinates of \({\mathbb {R}}^{7N},\) the matrix \(\text {Hess}\tilde{I}\) is a sparse symmetric \(7N \times 7N\)-matrix, consisting of dense \(7 \times 7\) blocks. Using this representation for numerical computations is undesirable for two reasons. First of all, it is rank deficient, because the extended functional \(\tilde{I}\) is constant along each normal vector of \(S^3.\) Secondly, it is bigger than necessary: since \(\text {SO(3)}\) (or the set of unit quaternions for that matter) is only three-dimensional, the entire Riemannian Hessian should fit into a \(6N \times 6N\) matrix. To construct such a representation for the Riemannian Hessian at a point \((\bar{m},\, \bar{q}) \in {\mathbb {R}}^{7N}\) we pick a basis for the tangent space of \(({\mathbb {R}}^3 \times S^3)^N\) at \((\bar{m},\, \bar{q}),\) and write \(\text {Hess}\tilde{I}\) in that basis. Luckily, such a basis is easily available. For the components in \({\mathbb {R}}^3,\) the canonical basis can be used. For any point \(q \in S^3,\) an orthonormal basis of \(T_q S^3\) is given by

$$\begin{aligned} D_{q,1} = \begin{pmatrix} q_3 \\ q_2 \\ {-}q_1 \\ {-}q_0 \end{pmatrix}, \quad D_{q,2} = \begin{pmatrix} {-}q_2 \\ q_3 \\ q_0 \\ {-}q_1 \end{pmatrix}, \quad D_{q,3} = \begin{pmatrix} q_1 \\ {-}q_0 \\ q_3 \\ -q_2 \end{pmatrix}, \end{aligned}$$

and this basis depends smoothly on q. We combine the vectors to a \(7 \times 6\)-matrix

$$\begin{aligned} D_q = \begin{pmatrix} 1 &{} &{} &{} &{} &{} \\ &{}\quad 1 &{} &{} &{} &{} \\ &{} &{}\quad 1 &{} &{} &{} \\ &{} &{} &{}\quad q_3 &{}\quad {-}q_2 &{}\quad q_1 \\ &{} &{} &{}\quad q_2 &{}\quad q_3 &{}\quad {-}q_0 \\ &{} &{} &{}\quad {-}q_1 &{}\quad q_0 &{}\quad q_3 \\ &{} &{} &{}\quad {-}q_0 &{}\quad -q_1 &{}\quad -q_2 \end{pmatrix}, \end{aligned}$$
(23)

whose columns form an orthonormal basis of the tangent space \({\mathbb {R}}^3 \times T_q S^3.\)
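That the three vectors \(D_{q,1},\,D_{q,2},\,D_{q,3}\) are orthonormal and tangent to \(S^3\) at q can be checked numerically. The following sketch builds them for a given unit quaternion; the function name `tangent_basis` is hypothetical.

```python
import math

def tangent_basis(q):
    """Return D_{q,1}, D_{q,2}, D_{q,3} for a unit quaternion q = (q0, q1, q2, q3)."""
    q0, q1, q2, q3 = q
    return [
        [q3, q2, -q1, -q0],
        [-q2, q3, q0, -q1],
        [q1, -q0, q3, -q2],
    ]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Example: normalize an arbitrary quaternion and inspect the Gram matrix.
raw = [1.0, 2.0, 3.0, 4.0]
norm = math.sqrt(dot(raw, raw))
q = [v / norm for v in raw]
D = tangent_basis(q)
gram = [[dot(D[i], D[j]) for j in range(3)] for i in range(3)]
normal_parts = [dot(d, q) for d in D]
```

Here `gram` is the identity up to rounding and `normal_parts` vanishes, which is the orthonormality and tangency claimed in the text.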

Table 1 Material parameters for the cantilever and the L-shape examples

We denote by D the block-diagonal \(7N \times 6N\)-matrix where the ith block is \(D_{q_i}\) as given by (23). Then, in these new coordinates, the Riemannian Hessian has the algebraic form

$$\begin{aligned} \text {Hess}\tilde{I} = D^T \partial ^2 \tilde{I} D - D^T\left( x^T P^\perp _x\partial \tilde{I}\right) D \in {\mathbb {R}}^{6N \times 6N}. \end{aligned}$$
(24)

This matrix has no degenerate directions caused by the embedding of the configuration space into \({\mathbb {R}}^{7N}.\) Indeed, it is again completely intrinsic. In each iteration of the trust-region solver, this is the matrix used to define the quadratic model.
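For a single \(S^3\) factor, the construction of the reduced Hessian can be sketched as follows. The test function \(\tilde{f}(x) = x^T A x\) is a toy example chosen for illustration, not the Cosserat energy, and all function names are hypothetical. For unit x the factor \(x^T P^\perp _x \partial \tilde{f}\) reduces to \(x^T \partial \tilde{f},\) which is used below.

```python
def tangent_basis(q):
    q0, q1, q2, q3 = q
    return [[q3, q2, -q1, -q0], [-q2, q3, q0, -q1], [q1, -q0, q3, -q2]]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, v):
    return [dot(row, v) for row in A]

def reduced_hessian(x, grad, hess):
    """3x3 Riemannian Hessian on S^3 in the basis D_x.

    x: unit vector in R^4; grad, hess: Euclidean gradient and Hessian
    of a smooth extension of the function, evaluated at x.
    """
    D = tangent_basis(x)
    radial = dot(x, grad)  # x^T P_perp grad equals x^T grad when |x| = 1
    H = [[dot(D[i], matvec(hess, D[j])) for j in range(3)] for i in range(3)]
    for i in range(3):
        H[i][i] -= radial  # curvature correction from the Weingarten map
    return H

# Toy example: f(x) = x^T A x with A = diag(1, 2, 3, 4), at x = e_4.
A = [[1.0, 0, 0, 0], [0, 2.0, 0, 0], [0, 0, 3.0, 0], [0, 0, 0, 4.0]]
x = [0.0, 0.0, 0.0, 1.0]
grad = [2.0 * v for v in matvec(A, x)]
hess = [[2.0 * a for a in row] for row in A]
H = reduced_hessian(x, grad, hess)
```

In this toy case x maximizes f on the sphere, and the reduced Hessian comes out negative definite, as it should.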

Finally, we point out one lucky coincidence that helps to increase efficiency. AD systems such as ADOL-C are able to compute the product \((\partial ^2 \tilde{I}) D\) directly. This is noticeably cheaper than using AD to compute \(\partial ^2 \tilde{I}\) and later multiplying by D,  because \((\partial ^2 \tilde{I}) D\) has fewer entries than \(\partial ^2 \tilde{I}\) (\(7N \times 6N\) compared to \(7N \times 7N\)). We noted a decrease of about 10 % of the time needed to assemble the Riemannian Hessian (24).

Fig. 4

Deformation test of a clamped cantilever. Left: reference grid and deformed grid. Right: vertical deflection as a function of the grid refinement for five different discretization order combinations. In the legend, the first number is the discretization order for the midsurface deformation, and the second number is the order of the microrotation discretization. Only the discretization using first-order elements for both deformation and microrotations shows locking behavior

Fig. 5

Left: L-shape structure with boundary conditions. Right: the grid, which is the one also used in [54]

6 Numerical tests

We now present several numerical tests. These demonstrate the capabilities of both our Cosserat shell model and of our discretization. First, we demonstrate that the elements do not suffer from shear locking, as long as the midsurface deformations are discretized with finite elements of at least second order. Then, we reproduce quantitative results from the literature (Sect. 6.2), and show how the model and discretization can handle large rotations with ease (Sect. 6.3). In Sect. 6.4 we simulate the wrinkling of a polyimide sheet, and find very good quantitative correspondence with experimental data. All examples in this section were programmed using the Dune libraries ([5], http://www.dune-project.org).

We deliberately do not give detailed measurements of the time spent in the various steps of the energy minimization algorithm. Automatic differentiation of the energy functional to get the algebraic tangent matrices and numerical solution of the constrained tangent problems together consume virtually all run-time, and neither of the two consistently dominates the other. Time spent assembling the tangent problems may possibly be reduced by switching to a different AD library. The run-time behavior of the iterative multigrid solver for the constrained quadratic problems is more difficult to judge. Due to the smallness of some of the parameters appearing in the Cosserat energy I,  the tangent problems are very badly conditioned. Therefore, in many cases the multigrid solver will simply iterate until a prescribed maximum number of iterations has been reached, and the wall-time taken by the solver is simply a multiple of this number. On the other hand, even in such cases the multigrid solver produces enough energy decrease for the outer trust-region method to converge. The precise interplay between the maximum number of allowed iterations of the multigrid solver and the convergence behavior of the trust-region method is delicate, and we have not investigated it here. There is hope that further insight into the problem structure will allow us to construct preconditioners that will greatly speed up the multigrid convergence. Alternatively, one may consider replacing the trust-region constraint by a line search globalization, which would allow the use of a direct solver for the tangent problems.

6.1 Deflection of a cantilever

To investigate the shear locking behavior of the proposed discretization, we use the classic benchmark of a clamped cantilever loaded transversally at one end. We see that there is no locking provided that the deformation m is discretized using at least second-order finite element functions.

Let the reference domain \(\omega \) be the rectangle \((0,\,100)\,\mathrm {mm} \times (0,\,10)\,\mathrm {mm}.\) We clamp the cantilever at one short end by requiring \(m(x) = (x_1,\, x_2,\, 0)\) and \(\overline{R}_3(x) = (0,\,0,\,1)\) for all \(x \in \{0\}\,\mathrm {mm} \times (0,\,10)\,\mathrm {mm}.\) For the shell material parameters we use the values given in Table 1. We load the cantilever by applying a transversal surface load at the far edge of magnitude 18 N.

We discretize the domain by ten quadrilateral elements (Fig. 4). From this grid, we create a sequence of finer grids by repeated uniform refinement. On this sequence of grids we discretize the solution space by five different GFE spaces: first, we use the same approximation order for deformation and microrotations, testing orders one, two, and three. Then, as the microrotations are related to the first derivative of the deformation m,  we also investigate two combinations where the microrotation field is discretized with one order lower than the deformation field.

For these different discretizations we measure the cantilever deflection as a function of the mesh size. The results are shown in Fig. 4. One can see that all but one of the discretizations give the same value for the deflection, and that this value is independent of the grid resolution. The exception is the discretization using first-order elements for the deformation. There, the discrete model is much stiffer for coarse elements, and the deflection only approaches the correct value asymptotically for high grid resolutions. We conclude that shear locking is not an issue with GFEs, if at least second-order elements are used for the deformation m. This agrees with the results given in [23].

For all further examples in this section we have used second-order elements both for the deformation and for the microrotations.

6.2 Deformation of an L-shape

We begin by comparing our approach to a benchmark problem taken from the literature. The following setup is used by Wriggers and Gruttmann [54], who compare their discrete model with the ones from [3, 44, 45] for the same problem. Our aim here is twofold: we want to show that our discrete model can reproduce quantitative results from the literature. Also, we want to highlight the speed and stability of our solver.

Let \(\omega \) be the L-shaped domain depicted in Fig. 5. Sizes of the shape are given in the figure, and we set the plate thickness to 0.6 mm. We model the material with the finite-strain hyperelastic material of Sect. 2.2. The material parameters are given in Table 1. The Lamé constants \(\mu ,\,\lambda \) correspond to the values \(E = 71\,240\,\text {N}/\mathrm{mm}^2,\,\nu =0.31\) given in [54]. As argued in Sect. 2.1, the coupling modulus \(\mu _c\) is set to \(\mu _c = 0\) N/mm. We set the curvature exponent q appearing in the curvature energy term \(W_\mathrm{curv}\) to \(q=2,\) and the internal length \(L_c\) to \(0.6\,\upmu \)m, following the suggestions of Sect. 2.1.
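The conversion from Young's modulus and Poisson ratio to Lamé constants can be sketched as below, assuming the standard relations of isotropic linear elasticity. The helper name `lame_constants` is hypothetical, and the values actually used are those listed in Table 1.

```python
def lame_constants(E, nu):
    """Lamé constants (mu, lam) from Young's modulus E and Poisson ratio nu.

    Uses the standard isotropic relations
        mu  = E / (2 (1 + nu)),
        lam = E nu / ((1 + nu) (1 - 2 nu)).
    The result carries the units of E.
    """
    mu = E / (2.0 * (1.0 + nu))
    lam = E * nu / ((1.0 + nu) * (1.0 - 2.0 * nu))
    return mu, lam

# Values quoted from [54], in N/mm^2.
mu, lam = lame_constants(71240.0, 0.31)
```

The inverse relations \(E = \mu (3\lambda + 2\mu )/(\lambda + \mu )\) and \(\nu = \lambda /(2(\lambda + \mu ))\) provide a quick consistency check of any converted pair.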

Fig. 6

Example deformation of the L-shape structure for \(P=1.62\) N. Upper picture: initial configuration and configuration under load. Lower picture: closeup of the clamped part of the structure, with the directors shown as red arrows. (Color figure online)

Fig. 7

Out-of-plane deflection as a function of the load. Left: own simulation. Right: corresponding plot taken from [54]

The boundary conditions are depicted on the left of Fig. 5. The structure is clamped on the left vertical end \(\gamma _0.\) By this we mean that on \(\gamma _0\) we set \(m(x,\,y) = (x,\,y,\,0),\) and the rigid director prescription \(\overline{R}_3 = (0,\,0,\,1)^T\) for the microrotations \(\overline{R}.\) On the lower horizontal end \(\gamma _s\) we prescribe a uniform surface load P (see Footnote 1) in the direction of the first unit basis vector. Zero Neumann boundary conditions are set everywhere else for displacements and rotations. We discretize the domain using 99 quadrilateral elements as depicted on the right of Fig. 5. The equations are discretized using second-order (i.e., nine-node) GFEs.

The first aim of this experiment is to study the buckling behavior of the structure for different values of P. When the structure is loaded, it deforms in-plane as long as the load P stays below a critical value \(P_s.\) For loads beyond this value, the structure starts to buckle laterally. An example deformation using \(P = 1.62\) N is shown in Fig. 6.

Since the in-plane deformation remains a stationary point of the energy even for loads larger than \(P_s,\) a perturbation needs to be applied to trigger the buckling. We do this by starting the trust-region method at the asymmetric initial iterate

$$\begin{aligned} m(x,\,y) = \left( x,\,y,\,z= {\left\{ \begin{array}{ll} 0 &{} \quad \text {if}\,x < 225\,\text {or}\,y < {-}15 \\ 10^{-3}(x-225)(y+15) &{}\quad \text {else} \end{array}\right. } \right) \quad \overline{R} = \text {Id}. \end{aligned}$$
(25)

This adds a little kink in the corner of the domain, which is enough to trigger the buckling.
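The out-of-plane component of the initial iterate (25) can be written as a small function. The name `initial_deflection` is hypothetical; coordinates are taken in the units of Fig. 5.

```python
def initial_deflection(x, y):
    """Out-of-plane perturbation z(x, y) of the initial iterate (25)."""
    if x < 225 or y < -15:
        return 0.0
    # Bilinear bump that vanishes on the lines x = 225 and y = -15,
    # so the perturbation is continuous across the kink.
    return 1e-3 * (x - 225) * (y + 15)

def initial_deformation(x, y):
    """Initial midsurface deformation m(x, y); the rotation is the identity."""
    return (x, y, initial_deflection(x, y))
```

The perturbation is nonzero only in one corner region of the domain, which breaks the in-plane symmetry without changing the boundary data elsewhere.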

A plot of the average lateral displacement of \(\gamma _s\) is shown in Fig. 7. For comparison we have also given the corresponding plot from [54]. It can be seen that the critical value we obtain is between 1.188 and 1.224 N. This is in good agreement with the other values from the literature [3, 44, 45, 54], which we print in Table 2.

Table 2 Literature results for the critical load
Fig. 8

Behavior of the Riemannian trust-region solver for the configuration shown in Fig. 6. Left: hyperelastic energy per iteration step. Center: maximum norm of the correction per iteration. Right: radius of the trust region per iteration. The vertical axis has logarithmic scale in all three images. Note how the solver enters quadratic convergence after iteration 310, and how the trust region opens up simultaneously

In a second step we want to highlight a few properties of the solver. For this we use the configuration described above with the surface load \(P= 1.62\) N at \(\gamma _s\) shown in Fig. 6. We solve the problem in a single loading step, using the trust-region method described in Sect. 5.2. For the quadratic minimization problems we use a monotone multigrid method as described in [40]. The \(\infty \)-norm is used to define the trust region. We scale the rotation part of the norm by a factor of \(10^{-3},\) so that corrections to the deformation (with numerical values in the two-digit range) are treated equally to corrections to the rotations (which cannot get larger than \(\pi \)).

We start the trust-region solver at the initial iterate given in (25) with an initial trust-region radius of 0.1 (see Footnote 2). We terminate the iteration as soon as the maximum norm of the correction drops below \(3\times 10^{-6}.\) This criterion was achieved after 334 iterations. Figure 8, left, shows the energy I per iteration (in a semi-logarithmic plot), and we observe that the trust-region method really is monotonically energy-decreasing. The sharp drop in the first few steps corresponds to a decrease of the membrane energy, which dominates the initial configuration (25).

Figure 8 also shows the correction step length and the trust-region radius per iteration step. We note that both remain bounded in the one-digit range until the solver reaches the vicinity of the minimizer at about iteration 310. At this point the behavior is as predicted by Theorem 11: the quadratic models start to match the energy functional very well. Correspondingly, the trust-region radius starts to increase, and the method turns into a pure Newton method. The expected fast local convergence can be observed in the plot of the correction step length. We stress that this solution is computed in a single loading step, i.e., without any path-following mechanism.

6.3 Torsion of a long elastic strip

The purpose of the next numerical example is to show that, unlike, e.g., the approach in [22], our discretization can easily handle large rotations. For this we simulate torsion of a long elastic strip, which we clamp at one short end. Using prescribed displacements, the other short edge is then rotated around the center line of the strip, to a final position of three full revolutions.

Table 3 Material parameters for the twisted strip
Fig. 9

Twisted rectangular strip at different parameter values t,  with t equal to the number of revolutions

Let \(\omega = (0,\,100)\,\text {mm} \times (-5,\,5)\,\text {mm}\) be the parameter domain, and \(\gamma _0\) and \(\gamma _1\) be the two short ends. We clamp the shell on \(\gamma _0\) by requiring

$$\begin{aligned} m(x,\,y) = (x,\,y,\,0), \quad \overline{R}_3 = (0,\,0,\,1)^T \quad \text {on}\, \gamma _0, \end{aligned}$$

and we prescribe a parameter dependent displacement

$$\begin{aligned} m_t(x,\,y)= & {} \begin{pmatrix} 1 &{}\quad 0 &{}\quad 0 \\ 0 &{}\quad \cos 2\pi t &{}\quad {-}\sin 2\pi t \\ 0 &{}\quad \sin 2\pi t &{}\quad \cos 2\pi t \end{pmatrix} \begin{pmatrix} x \\ y \\ 0 \end{pmatrix},\\ \left( \overline{R}_t\right) _3= & {} \left( \begin{array}{l} 0 \\ {-}\sin 2\pi t \\ \cos 2\pi t \end{array}\right) \quad \text {on}\,\gamma _1. \end{aligned}$$

For each increase of t by 1 this models one full revolution of \(\gamma _1\) around the shell central axis. Homogeneous Neumann boundary conditions are applied to the remaining boundary degrees of freedom. The material parameters are given in Table 3. We discretize the domain with \(10 \times 1\) quadrilateral elements, and use second-order (nine-node) GFEs to discretize the problem.
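The parameter-dependent Dirichlet data on \(\gamma _1\) can be sketched as follows. The function names and the equidistant choice of load steps are illustrative assumptions; the article does not state how the intermediate values of t were chosen.

```python
import math

def twisted_edge(x, y, t):
    """Prescribed deformation m_t and director (R_t)_3 on gamma_1 after t revolutions."""
    a = 2.0 * math.pi * t
    c, s = math.cos(a), math.sin(a)
    m = (x, c * y, s * y)   # rotation about the x-axis applied to (x, y, 0)
    r3 = (0.0, -s, c)       # the same rotation applied to (0, 0, 1)
    return m, r3

def load_steps(t_final, n):
    """Hypothetical equidistant loading steps t_1 < ... < t_n = t_final."""
    return [t_final * i / n for i in range(1, n + 1)]
```

The director stays orthogonal to the rotated edge direction \((0,\,\cos 2\pi t,\,\sin 2\pi t)\) for every t, so the clamping remains consistent while the edge turns.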

The result is pictured in Fig. 9 for several values of t. Because the strip has little bending stiffness, the configuration stays symmetric throughout the parameter range. Indeed, by increasing the length scale parameter \(L_c\) one can produce materials that are stiffer in bending. Strips of such material buckle sideways even at only two revolutions.

In order to arrive at configurations with more than one full twist, several intermediate loading steps have to be taken. This is not because the Riemannian trust-region solver would not converge for \(t\ge 1.\) Rather, it would converge, but to a minimizer in the wrong homotopy class (i.e., the minimizing configuration would never show more than a single twist). We note also that the finite-strain membrane energy (11) is essential for this example. Indeed, there appears to be no stable local minimizer of the small-strain energy (3) that corresponds to a twofold rotated strip. When the energy-minimizing Riemannian trust-region algorithm is used to minimize the small-strain energy starting from the two-revolutions configuration, the algorithm converges to the completely planar configuration.

6.4 Wrinkling of a sheared rectangular plastic sheet

In our last numerical example we demonstrate that our shell model does indeed display microstructure. We do this by simulating the wrinkling of a thin rectangular plastic sheet under shearing. Such wrinkling has been studied experimentally by Wong and Pellegrino [52]. Numerical simulations of their experiments can be found in [53] using the commercial FE software Abaqus, and in [47] using a Koiter model with a finite difference discretization. We obtain a good match between their experimental and our numerical results.

Fig. 10

Simulation results of the shearing tests. The color visualizes the elevation of the wrinkles, and the color scale has been chosen to match the one used in [47]. (Color figure online)

Fig. 11

Experimental results of the shearing tests. Images taken from Wong and Pellegrino [52]

Fig. 12

Wrinkle amplitudes at the plane \(y= 64\) mm. Black lines: experimental results from Wong and Pellegrino [52]. Red lines: our simulation results. Observe that the number of wrinkles is almost identical, but the amplitudes predicted by our simulation are generally too large. (Color figure online)

The experiment consists of a rectangular plastic sheet of dimension \(380\,\text {mm} \times 128\,\text {mm}.\) The sheet is clamped on the long horizontal edges, and free on the short vertical ones. More mathematically, we prescribe Dirichlet boundary conditions \(m(x,\,y) = (x,\,y,\,0),\,\overline{R}_3(x) = (0,\,0,\,1)^T\) on the lower horizontal edge. On the vertical sides of the domain we prescribe zero forces and moments. On the top horizontal side we apply a small horizontal shearing \(\delta _h\) and a vertical prestress \(\delta _v\) by prescribing the Dirichlet boundary condition \(m(x,\,y) = (x + \delta _h,\, y+\delta _v,\, 0),\,\overline{R}_3(x,\,y) = (0,\,0,\,1)^T.\)

Following Wong and Pellegrino, we set the Lamé constants to \(\mu = 5.6452 \times 10^9\,\mathrm {N}/\mathrm {m}^2\) and \(\lambda = 2.1796 \times 10^9\,\mathrm {N}/\mathrm {m}^2,\) which corresponds to the values \(E = 3.5\,\text {GPa},\nu = 0.31\) given in [52]. The shell thickness is \(h = 25\,\upmu \)m. Additionally, we set the Cosserat couple modulus \(\mu _c = 0,\) the curvature exponent \(q = 2,\) and the internal length scale \(L_c = 0.025\,\upmu \mathrm {m}.\) In [52], Wong and Pellegrino state that they vertically prestress their sheets slightly, but no numbers are given. For their own numerical simulations described in [53], they use a value of \(\delta _v = 0.5\) mm. In our own numerical experiments we found that \(\delta _v = 0.5\) mm leads to wrinkles that are too vertical, in particular if there is not much shearing. Low values of \(\delta _v\) on the other hand do not produce enough wrinkles. Best results were obtained using values between 0.2 and 0.4 mm.

We numerically reproduce two of the four shearing experiments described in [52]. The first has a shearing value of \(\delta _h = 0.5\) mm. For this we discretize the domain by a structured grid with \(120 \times 40 = 4800\) quadrilateral elements, and second-order GFEs. We set the vertical prestress to \(\delta _v = 0.2\) mm, and start the trust-region solver from the node-wise interpolant of the function

$$\begin{aligned}&m(x,\,y) = \left( x + \delta _h y / 128\,\mathrm {mm},\; y,\; 2\,\mathrm {mm} \cos (10 x)\right) , \\&\quad \overline{R}(x,\,y) = \mathrm{Id}, \end{aligned}$$

together with the Dirichlet boundary values on the top horizontal side. The cosine waves were added to break the initial symmetry. No attempt was made to influence the simulation results by deliberate adjustments of the initial value.

Plots of the wrinkle elevation are shown on the left of Fig. 10. The results of the corresponding experiment of Wong and Pellegrino can be seen in Fig. 11, also on the left. We obtain a very good quantitative match with our simulation. In particular, we obtain almost the same number of wrinkles (Fig. 12). Moreover, observe how the simulation faithfully reproduces a lot of the fine structure, such as the secondary wrinkles near the horizontal sides, and the wrinkles near the vertical sides.

On the other hand, the amplitudes predicted by our simulation are slightly larger than the ones observed in the experiments. Also, the wrinkles are inclined at a slightly steeper angle than the experimental ones. This suggests that the prestress value \(\delta _v\) is still too large. However, as mentioned above, a lower value of \(\delta _v\) leads to a lower number of wrinkles.

The second experiment uses a larger shear value of \(\delta _h = 3\) mm. With the other parameters as above we obtain a result that is qualitatively correct, but the number of wrinkles is smaller than what Wong and Pellegrino observed in their experiments. A better match is obtained by increasing the vertical prestress to \(\delta _v = 0.4\) mm and using a fine grid with \(240 \times 80 = 19,200\) elements. This simulation is what is plotted on the right of Figs. 10, 11, and 12. Now we observe a very good quantitative agreement also for this more extreme case, with the same restrictions as for the low-shear case. Since we have not observed artificial stiffness introduced by our discretization, we suspect that using the finer grid makes the trust-region algorithm end up in a different local minimizer of the energy.