1 Introduction

Crystal plasticity finite element (CPFE) models of polycrystalline materials, e.g. metals and alloys, are extensively used for simulating deformation under conditions of creep, fatigue, and high strain-rate loading [3,4,5,6,7,8,9,10,11,12]. Crystal plasticity models account for dislocation glide on crystallographic planes and the associated hardening due to evolving dislocation structures in the crystalline microstructure. Deformation twinning is also a critical deformation mechanism in the microstructure, which causes a change in lattice orientations with localized deformation inside thin twin bands. In hexagonal close-packed (HCP) crystals, e.g. magnesium alloys, with large differences in the slip system resistances, limited crystallographic slip in certain slip systems triggers the formation of microtwins, which can lead to room temperature brittle-like failure. Sharp changes in the crystallographic texture of polycrystalline microstructures due to twin formation are typically associated with crack nucleation at grain boundary-twin band intersections [13]. Twinning can induce characteristic features in the material response such as plastic anisotropy, tension-compression asymmetry, and local material softening.

Despite its high relevance, deformation twinning has not been adequately addressed in CPFE modeling until recent developments e.g. in [2, 14,15,16,17]. A method, which allows the use of a wide range of strain rate sensitivity without sacrificing the computational efficiency, has been proposed in [17]. In general, conventional crystal plasticity twin models have adopted the implicit twin volume fraction-based approach that treats twin propagation in the same way as slip [18,19,20,21,22,23]. These approaches ignore deformation heterogeneity within the discrete twins in the microstructure. Hence, they are mostly incapable of predicting the onset of material failure triggered by heterogeneous micro-twins. Explicit twin formation models within CPFE framework have been proposed based on phenomenological twin formation criteria and adaptive mesh-regeneration methods [14, 15, 17]. Such explicit CPFE twin formation models show promise of accurately predicting twinning induced material failure if the physics of twin nucleation, propagation and interactions are correctly taken into account [1, 2].

A major difficulty with conducting high fidelity, image-based CPFE simulations of polycrystalline microstructures with explicit twin formation is the prohibitively high demand on computing time and resources. This stems from the mismatch in deformation rates between the rapidly evolving twins and the slowly deforming surrounding crystalline matrix. The localized deformation with high rates of twin evolution in the twin bands causes numerical instability with the stiff non-linear crystal plasticity constitutive equations, which can only be overcome by adopting very fine simulation time steps. This requirement, however, needs to be met only within a small fraction of the overall polycrystalline domain. In the absence of a time integration scheme that can differentiate the twinned regions from the untwinned regions, a very high temporal resolution is required for the entire domain. The high resolution mesh of the polycrystalline microstructure, in combination with very fine time steps, makes CPFE simulations of statistically-equivalent representative volume elements (SERVEs) computationally exorbitant.

Numerical methods have been proposed to alleviate the lack of convergence of time-integration algorithms for crystal plasticity constitutive models. Methods of improving convergence have been suggested through the choice of an initial predictor to start iterations in an implicit scheme [24] or by adding a line search technique to the Newton–Raphson iteration scheme [25]. Numerical algorithms using forward Euler explicit time integration schemes for crystal plasticity models have been proposed in [26]. A Taylor series expansion has been introduced in the integration algorithm in [27] to linearize the crystal plasticity equations, which are then solved by Gauss elimination in a two-level solving sequence. The adaptive sub-stepping method in [28, 29] integrates the crystal plasticity models by splitting the deformation increment into sub-steps. A local error estimator for the crystal plasticity constitutive models with the Runge–Kutta method has been proposed in [30].

While these methods accelerate the slip-based crystal plasticity models, they are inappropriate for non-local micro-twin models as they use one single time step for the entire spatial domain. Subcycling time integration algorithms have been proposed for differentiated time integration in [31,32,33,34,35], where elements or nodes are separated into groups, each associated with a different time-step. This method has been used for improved efficiency in structural FE analysis, where the time step of each subdomain is chosen according to the local stability criteria.

The concept of subcycling is adopted in this paper for developing an efficient crystal plasticity FE model of polycrystalline materials with evolving micro-twins. The paper proposes a multi-time step subcycling algorithm that is based on the adaptive partitioning of the evolving computational domain into twinned and untwinned domains. The time-integration scheme treats the region of deforming and propagating twins with a much finer time step than the remaining crystalline matrix region. This paper is organized as follows. The finite deformation crystal plasticity finite element (CPFE) formulation with explicit micro-twin nucleation and evolution criteria is summarized in Sect. 2. The numerical implementation in CPFEM with twin nucleation and propagation is discussed in Sect. 3. Convergence issues with integrating crystal plasticity and twin evolution relations are evaluated in Sect. 4. In Sect. 5, an adaptive subcycling algorithm is developed for accelerating CPFE simulations. Numerical examples with the subcycling-augmented CPFEM are executed for validation in Sect. 6 and the paper is concluded in Sect. 7.

2 Finite deformation crystal plasticity finite element formulation

The finite element weak form of equilibrium equations for a microstructural representative volume element undergoing finite deformation is obtained by taking the product of the governing equations with a weighting function and integrating over the volume in the current or reference configuration. In an incremental formulation and solution process, where a typical time step transcends discrete temporal points t to \(t+\triangle t\), the principle of virtual work for a quasi-static process at time \(t+\triangle t\) occupying the domain \(\varOmega \subset R^3\) is expressed as [36]:

$$\begin{aligned}&\int _{\varOmega ^{t+\triangle t}}(\nabla \delta \mathbf {u}^{t+\triangle t}):\varvec{\sigma } \, d\varOmega ^{t+\triangle t}\nonumber \\&\quad =\int _{\varOmega ^{t+\triangle t}}\delta \mathbf {u}^{t+\triangle t}\cdot \mathbf {b}\, d\varOmega ^{t+\triangle t} \nonumber \\&\qquad +\int _{\varGamma ^{t+\triangle t}_{\sigma }}\delta \mathbf {u}^{t+\triangle t}\cdot \bar{\mathbf {t}}\, d\varGamma ^{t+\triangle t}_{\sigma }\quad \forall \delta \mathbf u ^{t+\triangle t}\in \varvec{\mathscr {U}} \end{aligned}$$
(1)

where \(\varvec{\sigma }\) is the Cauchy stress, \(\mathbf b \) is the body force per unit volume, \(\bar{\mathbf{t}}\) is the applied traction. \(\varGamma \) is the surface on which traction is applied. \(\mathscr {U}=\left\{ \delta u^{t+\triangle t}_{i}\mathbf {e}_{i}\in H^{1}({\varOmega }),\;\delta \mathbf {u}^{t+\triangle t}=\mathbf {0}\;\text {on}\;{\varGamma }_{u}\right\} \) is the space of virtual displacement. Using an updated Lagrangian formulation [37], Eq. (1) may be written in an incremental form as:

$$\begin{aligned}&\int _{\varOmega ^t} \delta \triangle \mathbf {E}\,:\triangle \mathbf {S}d\varOmega ^{t}+\int _{\varOmega ^{t}}\delta \varvec{\eta }\,:\varvec{\sigma }^{t} d\, \varOmega ^{t}\nonumber \\&\quad =R^{{ext}\,^{t+\triangle t}}-\int _{\varOmega ^{t}}\delta \mathbf {e}: \varvec{\sigma }^{t} d\,\varOmega ^{t} \end{aligned}$$
(2)

In the above equation, \(\triangle \mathbf {S}=\mathbf {S}_{t}^{t+\triangle t}- \varvec{\sigma }^{t}\) is the increment of the second Piola–Kirchhoff stress and \(\triangle \mathbf {E}= \mathbf {E}_{t}^{t+\triangle t}- \mathbf {E}_{t}^{t}\) is the increment of the Green–Lagrange strain. The entire right-hand side of Eq. (1) is the external virtual work and is expressed as \(R^{{ext}\,^{t+\triangle t}}\). Furthermore, \(\mathbf{e}=\frac{1}{2}\left[ \left( \frac{\partial \triangle \mathbf{u}}{\partial \mathbf{x}^t}\right) ^T+ \frac{\partial \triangle \mathbf{u}}{\partial \mathbf{x}^t} \right] \) and \({\varvec{\eta }}=\frac{1}{2} \left( \frac{\partial \triangle \mathbf{u}}{\partial \mathbf{x}^t}\right) ^T \frac{\partial \triangle \mathbf{u}}{\partial \mathbf{x}^t} \) are respectively the linear and non-linear parts of \(\triangle \mathbf {E}\). The non-linear Eq. (2) is solved using a Newton–Raphson iterative method. In the i-th iteration step of the update scheme from time t to \(t+\triangle t\), the linearized form of Eq. (2) to be solved is written as:

$$\begin{aligned} \mathbf {K}_{{t+\triangle t},{i}} \triangle \mathbf {u}=\mathbf {f}_{{t+\triangle t},i}^{ext}-\mathbf {f}_{{t+\triangle t},i}^{int} \end{aligned}$$
(3)

where \(\mathbf {K}_{{t+\triangle t},{i}}\) is the global tangent stiffness matrix in the i-th iterative step, and \(\mathbf {f}_{{t+\triangle t},i}^{ext}\) and \(\mathbf {f}_{{t+\triangle t},i}^{int}\) are respectively the prescribed applied external force vector and the internal force vector. For the i-th iteration step, the tangent stiffness matrix and the force vectors are written as:

$$\begin{aligned}&\mathbf {K}_{{t+\triangle t},{i}}=\int _{\varOmega ^{{t+\triangle t},{i}}} \mathbf {B}^{T}\mathbb {C}^{{t+\triangle t},{i}}\mathbf {B} d\varOmega ^{{t+\triangle t},{i}} \nonumber \\&\quad \quad \quad \quad \quad +\int _{\varOmega ^{{t+\triangle t},{i}} } \mathbf {G}^{T}\underset{\sim }{{\varvec{\sigma }}}^{{t+\triangle t},{i}}\mathbf {G} d\varOmega ^{{t+\triangle t},{i}} \end{aligned}$$
(4a)
$$\begin{aligned}&\mathbf {f}_{{t+\triangle t},{i}}^{int} =\int _{\varOmega ^{{t+\triangle t},{i}} }\mathbf {B}^{T}\varvec{\sigma }^{{t+\triangle t},{i}} d\varOmega ^{{t+\triangle t},{i}} \end{aligned}$$
(4b)
$$\begin{aligned}&\mathbf {f}_{{t+\triangle t},{i}}^{ext} =\int _{\varOmega ^{{t+\triangle t},{i}} } \mathbf{N}^{T} \mathbf{b}^{t+\triangle t}~ d \varOmega ^{{t+\triangle t},{i}} \nonumber \\&\quad \quad \quad \quad \quad + \int _{\varGamma ^{{t+\triangle t},{i}} } \mathbf{N}^{T} \bar{\mathbf{t}}^{t+\triangle t} ~d\varGamma ^{{t+\triangle t},{i}} \end{aligned}$$
(4c)

Here \(\mathbb {C}^{{t+\triangle t},{i}}\) is the elasto-plastic tangent stiffness matrix in the i-th iteration step, \(\mathbf B\) is the strain-displacement matrix, \(\mathbf {G}\) is the gradient operator matrix, and \(\mathbf N\) is the shape function matrix. The matrix \(\underset{\sim }{{\varvec{\sigma }}}^{{t+\triangle t},{i}}\) is explicitly written as:

$$\begin{aligned} \underset{\sim }{{\varvec{\sigma }}}^{{t+\triangle t},{i}}=\left[ \begin{array}{c@{\quad }c@{\quad }c} {\varvec{\sigma }}^{{t+\triangle t},{i}} &{} {\mathbf {0}} &{} {\mathbf {0}} \\ {\mathbf {0}} &{} {\varvec{\sigma }}^{{t+\triangle t},{i}} &{} {\mathbf {0}} \\ {\mathbf {0}} &{} {\mathbf {0}} &{} {\varvec{\sigma }}^{{t+\triangle t},{i}} \end{array}\right] \end{aligned}$$
(5)

where \({{\varvec{\sigma }}}^{{t+\triangle t},{i}}\) is the \(3 \times 3\) Cauchy stress matrix, \({\mathbf {0}}\) is a \(3 \times 3\) matrix of zeros. If the maximum value in the residual array \(\mathbf {f}_{{t+\triangle t},{i}}^{ext}-\mathbf {f}_{{t+\triangle t},{i}}^{int}\) is larger than a small tolerance, the displacement for the next iterate \(i+1\) is updated to the following value:

$$\begin{aligned} \mathbf {u}^{{t+\triangle t},{i+1}}=\mathbf {u}^{{t+\triangle t},i} + \triangle \mathbf {u} \end{aligned}$$
(6)

Equations (3)–(5) are evaluated again in the updated configuration.
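The iteration described by Eqs. (3)–(6) can be sketched on a toy problem. The snippet below is only an illustration under stated assumptions, not the CPFE implementation: it replaces the finite element assembly with a hypothetical two-degree-of-freedom system whose internal force has a cubic stiffening term, and the names `internal_force`, `tangent_stiffness` and `newton_solve` are introduced here purely for illustration.

```python
import numpy as np

# Hypothetical linear stiffness of a two-degree-of-freedom toy system.
K_LINEAR = np.array([[2.0, -1.0], [-1.0, 2.0]])

def internal_force(u):
    # Assumed toy internal force: linear part plus cubic stiffening.
    return K_LINEAR @ u + 0.5 * u**3

def tangent_stiffness(u):
    # Consistent tangent d(f_int)/du of the toy internal force.
    return K_LINEAR + np.diag(1.5 * u**2)

def newton_solve(f_ext, u0, tol=1e-10, max_iter=25):
    u = u0.copy()
    for i in range(max_iter):
        residual = f_ext - internal_force(u)        # right-hand side of Eq. (3)
        if np.max(np.abs(residual)) <= tol:         # largest residual entry vs. tolerance
            return u, i
        du = np.linalg.solve(tangent_stiffness(u), residual)  # solve Eq. (3) for the increment
        u = u + du                                  # displacement update, Eq. (6)
    raise RuntimeError("Newton-Raphson iteration did not converge")

f_ext = np.array([1.0, 0.5])
u, n_iter = newton_solve(f_ext, np.zeros(2))
```

In the actual formulation the tangent and the internal force come from Eqs. (4a) and (4b), evaluated in the updated configuration at every iteration.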

2.1 Summary of crystal plasticity-deformation twin constitutive models

A twin nucleation-evolution model has been proposed in conjunction with a crystal plasticity (CP) model in [2] for magnesium alloys undergoing deformation-induced twinning in polycrystalline microstructures. The crystal plasticity-twin model considers a total of 12 slip systems and 6 twinning systems, consisting of 3 \(\left\langle a\right\rangle \)-basal, 3 \(\left\langle a\right\rangle \)-prismatic, 6 second order \(\left\langle c+a\right\rangle \)-pyramidal slip systems and 6 \(\left\{ 10\bar{1}2\right\} \) extension twin systems. The CP-twin model adopts different flow rules for the twinned region \(\varOmega _{twin}\) and the untwinned matrix region \(\varOmega _{matrix}\), which are respectively expressed as:

$$\begin{aligned} {\mathbf {L}}^{p}= & {} \dot{\mathbf{F}}^{p}{\mathbf {F}^{p}}^{-1}\nonumber \\= & {} \left\{ \begin{array}{ll} \displaystyle \sum \limits _{\beta =1}^{N_{twin}}{\dot{\gamma }_{tw}^{\beta }\mathbf{s}_{0,tw}^{\beta }}+\displaystyle \sum \limits _{\alpha =1}^{N_{slip}}{\dot{\tilde{\gamma }}^{\alpha }{ \tilde{\mathbf{s}}}_{0,slip}^{\alpha }}&{}\quad \text {if} \; x\in \varOmega _{twin}\\ \displaystyle \sum \limits _{\alpha =1}^{N_{slip}}{\dot{\gamma }^{\alpha }\mathbf{s}_{0,slip}^{\alpha }}&{}\quad \text {if} \; x\in \varOmega _{matrix} \end{array}\right. \end{aligned}$$
(7)

where \(\mathbf {L}^{p}\) is the plastic velocity gradient, \(\dot{\gamma }^{\alpha }\) is the slip rate on a slip system \(\alpha \) and \(\mathbf{s}_{0,slip}^{\alpha }\) is its Schmid tensor in the reference configuration. \(\dot{\gamma }_{tw}^{\beta }\) is the shearing rate due to twin propagation on twinning system \(\beta \), and \(\mathbf{s}_{0,tw}^{\beta }\) is the Schmid tensor for twinning. After a material point has twinned, the crystallographic slip plane and direction are reorientated symmetrically by reflection across a mirror or twin plane in the reference configuration denoted by the Schmid tensor \({\tilde{\mathbf{s}}}_{0,slip}^{\alpha }\) [38]. The corresponding slip rate on the new crystallographic planes is denoted by \({\dot{\tilde{\gamma }}}^{\alpha }\). The dislocation slip-rates \(\dot{\gamma }^{\alpha }\) and \({\dot{\tilde{\gamma }}}^{\alpha }\) can be described using a power law model for magnesium, which is written as (\(\tilde{\;}\) dropped for convenience):

$$\begin{aligned} \dot{{\gamma }}^{\alpha }=\dot{\gamma }_{0}^{\alpha } \left| \frac{\tau ^{\alpha }-s_{a}^{\alpha }}{s_{*}^{\alpha }} \right| ^{\frac{1}{m}}sign(\tau ^{\alpha }-s_{a}^{\alpha }) \end{aligned}$$
(8)

Here \(\dot{\gamma }_{0}^{\alpha }\) is a reference slip rate for the slip system \(\alpha \) and m is the power law exponent representing strain-rate sensitivity. The resolved shear stress on slip system \(\alpha \) is expressed as \(\tau ^{\alpha }=\mathbf {F}^{eT}\mathbf {F}^{e}\mathbf {S} : {\mathbf {s}}_0^{\alpha }\). \(s^{\alpha }_{a}\) is the athermal shear resistance due to stress-field between parallel dislocation lines and \(s^{\alpha }_{*}\) is the thermal shear resistance due to local repelling forest dislocations. A hyper-elastic relation in the intermediate configuration relates the second Piola–Kirchhoff stress with the Green–Lagrange strain \(\mathbf {E}^{e}\left( =\frac{1}{2}\left( \mathbf {F}^{eT} \mathbf {F}^{e}-\mathbf {I}\right) \right) \) as:

$$\begin{aligned} \mathbf {S}=\mathbb {C}^{e}:\frac{1}{2}\left( \mathbf {F}^{eT} \mathbf {F}^{e}-\mathbf {I}\right) \quad \text {with}\quad \mathbf {F}^{e}=\mathbf {F}{\mathbf {F}^{p}}^{-1} \end{aligned}$$
(9)

where \(\mathbb {C}^{e}\) is the transversely isotropic elasticity tensor. \(\mathbf {F}^{e}\) is the elastic component of the deformation gradient obtained from the multiplicative decomposition of the total deformation gradient \(\mathbf {F}\). The slip system resistance evolves as a result of the interactions of mobile dislocations with sessile dislocations, twins and grain boundaries. The details of the hardening model for slip system resistance are given in Appendix A.

2.1.1 Micro-twin nucleation model

A detailed twin nucleation model along with its numerical implementation into CPFE framework has been established in [1, 2]. The model considers generation of a twin nucleus from the non-planar dissociation process of a sessile pyramidal \(\left\langle c+a \right\rangle \) dislocation. The sessile pyramidal \(\left\langle c+a \right\rangle \) dislocation dissociates into a multi-layer twin nucleus, which propagates with applied resolved shear stress and leaves behind a residual stair-rod dislocation to conserve the Burgers vector. The elastic theory of dislocations [39] is adopted to analyze the change of energy during the dissociation process, which suggests that a stable twin nucleus will form if the following three energy-based criteria are satisfied simultaneously.

$$\begin{aligned}&\text {Dissociation condition}: ~~~~ E_{ini} \ge E_{tw}(d=0, L_0)+E_{r} \end{aligned}$$
(10a)
$$\begin{aligned}&\text {Irreversibility condition}: ~~~ E_{ini}> E_{f}(d_{s},L_0,\tau _{tw}) \end{aligned}$$
(10b)
$$\begin{aligned}&\text {Reliability condition}: ~~~~~~~~~ d_{s}>2r_{0} \end{aligned}$$
(10c)

The initial energy of the system \(E_{ini}\) is given by the self-energy of the sessile \(\left\langle c+a \right\rangle \) dislocation before dissociation and \(L_0\) is the length of the sessile \(\left\langle c+a \right\rangle \) dislocation. After the occurrence of dissociation, \(E_{tw}\) is the self-energy of the twinning dislocation loop, \(E_r\) is the self-energy of the stair-rod dislocation, and d is the separation distance between the front segment of the twinning dislocation loop and the stair-rod dislocation. \(E_{f}\) is the total post-dissociation dislocation energy of the system. It includes \(E_{tw}\), \(E_{r}\), the stacking fault energy, the interaction energy between the twin and stair-rod partial dislocations, and the external work. The first criterion states that the dissociation process occurs spontaneously only if the initial energy exceeds the energy of the two partials before any further separation. The second criterion suggests that the equilibrium condition for separation is energetically favorable and the process is irreversible if the final energy is less than the initial energy at a stable separation distance \(d_s\). The third criterion requires that \(d_s\) exceed a threshold separation distance \(2r_0\) so that the elastic dislocation theory is valid. Here \(d_s\) minimizes the total energy \(E_f\) after dissociation, i.e. \(\frac{\partial E_{f}}{\partial d}=0,\ \frac{\partial ^{2}E_{f}}{\partial d^{2}}\ge 0\). Critical twin nucleation parameters in Eq. (10) have been calibrated from experiments in [1].

Fig. 1 Illustration of the scheme for implementing twin nucleation and propagation models into CPFEM

2.1.2 Micro-twin propagation model

After a stable micro-twin has nucleated, it propagates by two types of mechanisms, viz. (i) elongation of the twin by rapid gliding of twin partial dislocations on twin planes, and (ii) thickening or growth of the twin by migration of twin boundaries from one twin plane to adjacent \(\left\{ 10\bar{1}2\right\} \) twin planes [40]. Gliding of twin partial dislocations occurs by a mixed shear-shuffle process as discussed in [41]. The shear process requires all atoms in the twin partial dislocation core to move by a Burgers vector distance \(b_{tw}\) in the same direction. However, unlike slip, the shear motion does not preserve the lattice structure. The configuration after shearing corresponds to a high energy state. Corrective non-unidirectional atomic shuffling is required at the same time as shearing to place atoms into low-energy symmetric positions, which reduces the energy barrier and allows further gliding of the twin partial dislocations. Due to the non-unidirectional character of shuffling, the propagation of twins is treated as a thermally activated process. As modeled in [2], the elongation and growth velocities of the twin band are respectively expressed as:

$$\begin{aligned} v_{glide}&=f\lambda _{shear}\exp \left( -\frac{\triangle F-\tau A_{P} b_{tw}}{K_{B}T}\right) \end{aligned}$$
(11a)
$$\begin{aligned} v_{grow}&=\frac{d_{tw}}{\triangle t_{tw}}=d_{tw}P_{promoter}\rho _{tot}l_{tw}v_{glide} \end{aligned}$$
(11b)

Here f is the shear-shuffle frequency, \(\lambda _{shear}\) is the shear distance and the term \(\exp \left( -\frac{\triangle F-\tau A_{P} b_{tw}}{K_{B}T}\right) \) is the probability of continuous gliding in the presence of an internal energy barrier \(\triangle F\). \(A_p\) is the shearing area during the plastic deformation, \(\rho _{tot}\) is the total dislocation density, \(P_{promoter}\) is the fraction of a special type of sessile dislocations that penetrate multiple twin planes and act as promoters to help twin partial dislocations move to adjacent planes, and \(l_{tw}\) is the length of twin partial dislocations. By substituting Eq. (11) into the Orowan equation, the shear-rate on twin systems has been derived in [2] as:

$$\begin{aligned} \dot{\gamma }_{tw}=\dot{\gamma }_0 \left| \frac{\tau (T)}{s_{tw}}\right| ^{\frac{\triangle F}{K_{B}T}}sign\left( \tau (T)\right) \end{aligned}$$
(12)

where \(\dot{\gamma }_0=\rho _{tw} b_{tw} f_{shuffle}\lambda _{shuffle}\) is a reference shear rate, and \(s_{tw}\) is twin system shear resistance. The evolution of \(s_{tw}\) is the result of interaction between mobile dislocations and twin boundaries, which is detailed in Appendix B. In addition, deformation twinning can only provide shear in one-direction, which leads to mirror symmetric lattice configurations. Thus, the shear associated with deformation twinning should satisfy the constraint:

$$\begin{aligned} 0\le \gamma _{tw}\le \gamma _{tw}^{max} \end{aligned}$$
(13)

where \(\gamma _{tw}^{max}\) is the maximum shear associated with a specific twin variant in an integration volume. \(\gamma _{tw}^{max}\) varies among HCP materials. For Mg and Mg alloys, \(\gamma _{tw}^{max}=0.1289\) for \(\left\{ 10\bar{1}2\right\} \) extension twins.
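As a small illustration of Eqs. (12) and (13), the sketch below evaluates the twin shear rate and accumulates the twin shear with a simple clamp. The clamp is an assumption made here for illustration, since the text states the constraint but not how it is enforced; `exponent` stands for \(\triangle F/(K_{B}T)\), and the function names are hypothetical.

```python
import math

GAMMA_TW_MAX = 0.1289  # maximum twin shear for {10-12} extension twins in Mg, from the text

def twin_shear_rate(tau, s_tw, gamma0_dot, exponent):
    # Eq. (12): power law in |tau / s_tw| carrying the sign of tau.
    # `exponent` stands for Delta F / (K_B T).
    return gamma0_dot * abs(tau / s_tw) ** exponent * math.copysign(1.0, tau)

def update_twin_shear(gamma_tw, rate, dt):
    # Explicit increment, then projection onto the admissible range of Eq. (13).
    # The projection is an assumed way of enforcing the constraint.
    return min(max(gamma_tw + rate * dt, 0.0), GAMMA_TW_MAX)
```

For example, `update_twin_shear(0.12, 1.0, 1.0)` saturates at `GAMMA_TW_MAX`, and a negative rate applied to an untwinned point leaves the twin shear at zero.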

3 Numerical implementation in CPFEM with twin nucleation and propagation

The micro-twin nucleation and propagation models are explicitly implemented in the crystal plasticity finite element model for simulating discrete twin evolution. The multi-stage implementation scheme is illustrated in Fig. 1. In stage 1, all element integration points are examined for twin nucleation using the criteria in Eq. (10). Once the nucleation criteria are satisfied at an element integration point, it is designated as a twin nucleation site. For the four-noded tetrahedral elements with one integration point, this implies that the element is a nucleation site. In the subsequent time steps corresponding to stage 2, all integration points belonging to a grain that contains the twin-nucleated element are examined to check whether the twin boundary propagates to them, using an explicit criterion. For a twin to propagate to a neighboring point \(\mathbf{X}\) in Fig. 2, the criterion requires:

$$\begin{aligned} v_{glide} \ge \frac{l_{glide}}{\triangle t_{twin}} ~~~~~\text {and}~~~~~ v_{grow} \ge \frac{l_{growth}}{\triangle t_{twin}} \end{aligned}$$
(14)

As illustrated in Fig. 2, \(l_{glide}\) is the distance between the nucleation site and the point \(\mathbf{X}\) projected onto the twin plane, \(l_{growth}\) is the distance between them along the normal to the twin plane, and \(\triangle t_{twin}\) is the time interval from the time of twin nucleation to the current time. Substituting Eq. (11) into Eq. (14), a non-local criterion is established to determine twin propagation to neighboring integration points, given as:

$$\begin{aligned}&\tau _{crit}\ge \frac{\ln \left( \frac{l_{glide}}{\triangle t_{twin}f_{shuffle}\lambda _{shear}}\right) K_{B}T+\triangle F}{A_{p}b_{tw}} \end{aligned}$$
(15a)
$$\begin{aligned}&\tau _{crit}\ge \frac{\ln \left( \frac{l_{growth}}{\triangle t_{twin}f_{shuffle}\lambda _{shear}d_{tw}P_{promoter}\rho _{tot}l_{tw}}\right) K_{B}T+\triangle F}{A_{p}b_{tw}} \end{aligned}$$
(15b)

The two Eqs. (15) provide quantitative measures of propagating a twin to a neighboring material point in longitudinal and transverse (through-thickness) directions respectively. In the CPFE implementation with the incremental time-marching process, this step is executed explicitly at the end of each time increment.
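The propagation check can be sketched directly from the velocities of Eq. (11) and the distance conditions of Eq. (14). The snippet below is a minimal illustration under assumptions: the geometric split of the position vector follows the description of \(l_{glide}\) and \(l_{growth}\) in the text, `growth_factor` is a hypothetical shorthand for \(d_{tw}P_{promoter}\rho _{tot}l_{tw}\), and no calibrated material parameters are implied.

```python
import numpy as np

K_B = 1.380649e-23  # Boltzmann constant in J/K

def twin_distances(x_nuc, x, normal):
    """Split the vector from the nucleation site to X into in-plane and normal parts."""
    n = normal / np.linalg.norm(normal)
    r = x - x_nuc
    l_growth = abs(r @ n)                       # distance along the twin plane normal
    l_glide = np.linalg.norm(r - (r @ n) * n)   # distance within the twin plane
    return l_glide, l_growth

def glide_velocity(tau, f, lam_shear, delta_f, a_p, b_tw, temperature):
    """Eq. (11a): thermally activated glide velocity of the twin partial dislocation."""
    return f * lam_shear * np.exp(-(delta_f - tau * a_p * b_tw) / (K_B * temperature))

def twin_reaches_point(v_glide, growth_factor, l_glide, l_growth, dt_twin):
    """Eq. (14); growth_factor stands for d_tw * P_promoter * rho_tot * l_tw in Eq. (11b)."""
    v_grow = growth_factor * v_glide
    return bool(v_glide >= l_glide / dt_twin and v_grow >= l_growth / dt_twin)
```

Both conditions must hold for the point to be added to the twinned domain, so a point lying far from the twin plane can remain untwinned even when the glide condition alone is met.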

Fig. 2 Illustration of a twin partial dislocation propagation and growth (thickening) to a neighboring point \(\mathbf{X}\) by gliding on the twin plane and growing normal to it

3.1 Time integration algorithm for the constitutive relation

Time integration of crystal plasticity-twin evolution constitutive model in Sect. 2.1 requires evaluation of the non-local variables related to GND and twins from neighboring elements. The non-locality is accounted for using a staggered integration algorithm, which proceeds in three steps as follows:

  • Step I: With known values of deformation variables at time t, and an incremented deformation gradient \(\mathbf{F}(t+\triangle t)\) at time \(t+\triangle t\), update the stress \(\mathbf {S}\), plastic deformation gradient \(\mathbf{F}^p\) and all internal state variables at each Gauss quadrature point, keeping the non-local GND density and twin variables fixed. This step assumes that the primary unknown variable is the second Piola–Kirchhoff stress \(\mathbf {S}\), and seeks its solution from a set of six nonlinear equations by a Newton–Raphson type iterative solver. For the i-th iteration, the residual \(\mathbf {G}\mathbf {(S}^{i})\) is defined as:

    $$\begin{aligned}&\mathbf {G}\mathbf {(S}^{i}(t+\triangle t))=\mathbf {S}^{i}(t+\triangle t)-\mathbf {S}^{tr} +\sum _{\alpha =1}\triangle \gamma ^{\alpha } \mathbf {B}^{\alpha } \end{aligned}$$
    (16a)
    $$\begin{aligned}&\mathrm {where} ~~~~~~ \mathbf {S}^{tr}=\mathbf {C}:\frac{1}{2}\left( \mathbf {A}(t+\triangle t)-\mathbf {I} \right) \end{aligned}$$
    (16b)
    $$\begin{aligned}&\triangle \gamma ^{\alpha }=\triangle t \dot{\gamma }_{0}^{\alpha }\left| \frac{\tau ^{i,\alpha }-s_{a}^{\alpha }}{s_{*}^{\alpha }}\right| ^{\frac{1}{m}}sign(\tau ^{i,\alpha }-s_{a}^{\alpha })\end{aligned}$$
    (16c)
    $$\begin{aligned}&\mathbf {B}^\alpha =\mathbf {C}:\left[ \frac{1}{2}\left( \mathbf {A}(t+\triangle t)\mathbf {s}_{0}^{\alpha }+\mathbf {s}_{0}^{\alpha }\mathbf {A}(t+\triangle t) \right) \right] \end{aligned}$$
    (16d)
    $$\begin{aligned}&\mathbf {A}(t+\triangle t)=\mathbf {F}^{p^{-T}}(t) \mathbf {F}^{T}(t+\triangle t) \mathbf {F}(t+\triangle t)\mathbf {F}^{p^{-1}}(t) \end{aligned}$$
    (16e)

    The i-th iteration update to \(\mathbf {S}\) is evaluated as:

    $$\begin{aligned} \mathbf {S}^{i+1}(t+\triangle t)=\mathbf {S}^{i}(t+\triangle t)-\left. \frac{\partial \mathbf {G}}{\partial \mathbf {S}}\right| _{i}^{-1}\mathbf {G}(\mathbf {S}^{i}(t+\triangle t)) \nonumber \\ \end{aligned}$$
    (17)

    Upon achieving convergence to within a tolerance, update the SSD-related slip system resistances \(s^{\alpha }_{a,SSD}\) and \(s^{\alpha }_{*,SSD}\) using Eqs. (28) and (29). Repeat the Newton–Raphson calculation of the second Piola–Kirchhoff stress \(\mathbf {S}\) until \(s^{\alpha }_{a,SSD}\) and \(s^{\alpha }_{*,SSD}\) converge. Finally, update the plastic deformation gradient \(\mathbf {F}^p\), the Cauchy stress and the elasto-plastic tangent stiffness \(\mathbb {C}=\frac{\partial \varvec{\sigma }}{\partial \varvec{\varepsilon }}\) before proceeding to step II.

  • Step II: Evaluate the nodal values of \(\mathbf {F}^p\) from the Gauss quadrature points by using the super-convergent patch recovery (SPR) method [1, 42]. Interpolate \(\mathbf {\nabla }_{X} \times {\mathbf {F}^p}^{T}\) at quadrature points with finite element shape functions and update the GND densities and their rates of hardening from Eqs. (30) to (34).

  • Step III: Evolve the twinned domain by (a) checking for new twin nucleation at each Gauss quadrature point using the criteria in Eq. (10), and (b) checking the criteria in Eq. (15) for propagation and growth of already nucleated twins into the elements neighboring the twin nucleation site.
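The Newton–Raphson update of Step I can be illustrated with a scalar analogue of Eqs. (16a), (16c) and (17). The sketch below assumes a single slip system whose resolved shear stress equals the scalar stress, so the six coupled equations reduce to one; the parameter values and the names `slip_increment`, `slip_increment_slope` and `solve_stress` are hypothetical and chosen only so that the example runs.

```python
import math

def slip_increment(tau, dt, gamma0_dot, s_a, s_star, m):
    # Eq. (16c) for a single slip system.
    x = (tau - s_a) / s_star
    return dt * gamma0_dot * abs(x) ** (1.0 / m) * math.copysign(1.0, x)

def slip_increment_slope(tau, dt, gamma0_dot, s_a, s_star, m):
    # Derivative of Eq. (16c) with respect to tau.
    x = (tau - s_a) / s_star
    return dt * gamma0_dot / (m * s_star) * abs(x) ** (1.0 / m - 1.0)

def solve_stress(s_trial, b, dt, gamma0_dot=1e-3, s_a=0.0, s_star=100.0, m=0.02,
                 tol=1e-10, max_iter=50):
    s = s_trial  # start the iteration from the trial stress
    for i in range(max_iter):
        # Scalar form of the residual in Eq. (16a), assuming tau = s.
        g = s - s_trial + slip_increment(s, dt, gamma0_dot, s_a, s_star, m) * b
        if abs(g) <= tol:
            return s, i
        dg = 1.0 + slip_increment_slope(s, dt, gamma0_dot, s_a, s_star, m) * b
        s -= g / dg  # Newton update, Eq. (17)
    raise RuntimeError("stress update did not converge")

stress, iterations = solve_stress(s_trial=105.0, b=1.0e4, dt=0.01)
```

With the exponent \(1/m = 50\), the residual slope changes rapidly with stress, which is why larger values of `dt` make this kind of iteration progressively harder to converge.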

It has been observed that such a staggered update of twinning can lead to convergence difficulties, as well as inaccuracies in the computation of deformation variables in the twinned domain. A subcycling algorithm to overcome these obstacles is developed in the subsequent sections.

4 Convergence issues with integrating crystal plasticity and deformation twinning constitutive relations

Time integration of rate-dependent crystal plasticity equations incorporating twin evolution suffers from numerical instabilities and lack of convergence, ensuing from the highly localized plastic flow in narrow twin bands. Experimental studies in compression tests have reported the formation of thin \(\left\{ 10\bar{1}2\right\} \) twin bands as early as \(0.4\%\) overall strain [43]. In contrast, the local shear strain within twin bands is found to be \({\sim }12.89\%\). The significant increase of strains from the matrix domains to those in the twin bands leads to widely differing critical time steps needed for convergence of the iterative time-integration scheme over the computational domain.

Fig. 3 (a) Strain rate distribution at \(2\%\) strain in a CPFE simulation of AZ31 polycrystal microstructure, (b) distribution of critical time steps for integrating CP equations

Table 1 Distribution of critical time-step sizes among elements in the polycrystalline ensemble

The rate-dependent power-law type crystal plasticity constitutive models are quite stiff [24, 25, 29, 44]. The low strain-rate sensitivity of many metals at room temperature results in a very small value of the rate-sensitivity parameter m in the flow rule Eq. (8). For example, \(m\approx 0.02\) for Mg at room temperature. Small variations in the resolved shear stress cause large changes in the slip-rates. In Eq. (8), the slip-rate \(\dot{\gamma }\) quickly reaches very high values for \(\tau _{eff}^{\alpha }/s_*^{\alpha } > 1\). The slope of this non-linear equation also rapidly increases for \(\tau _{eff}^{\alpha } > s_*^{\alpha }\). When multiple heterogeneous slip/twin systems exist, the rapid slip-rate changes cause major convergence issues with the Newton–Raphson solver discussed in Sect. 3.1 (step I). The criterion governing convergence is given as:

$$\begin{aligned} \dot{\gamma }^{\alpha }\triangle t\le \triangle \gamma _{crit} \end{aligned}$$
(18)

where \(\triangle \gamma _{crit}\) is a critical slip increment [45]. An alternative criterion has been proposed in [46] as:

$$\begin{aligned} \tau _{eff}^{\alpha } / s_{a}^{\alpha } \le r_{crit} \end{aligned}$$
(19)

A value of \(r_{crit}=2.0\) has been assumed in [46]. The above criteria establish a critical time step \(\triangle t_{crit}\) for numerical time integration of the crystal plasticity model. Since the slip rate \(\dot{\gamma }^{\alpha }\) and the resolved shear stress \(\tau _{eff}^{\alpha }\) are implicitly dependent on the time step \(\triangle t\) in a backward Euler update algorithm, \(\triangle t_{crit}\) is a function of \((m, \gamma _0, \mathbf {C}^e, s_{a}^{\alpha }, s_{*}^{\alpha }, \tau ^{\alpha }, \triangle \gamma _{crit}, r_{crit})\). Thus the critical time step evolves with the evolution of internal state variables.
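The stiffness of the flow rule and its effect on the time step can be made concrete with a short calculation. The sketch below evaluates Eq. (8) for \(m=0.02\) and applies Eq. (18) with known slip rates. This is an explicit estimate adopted for illustration only, since the text notes that the true dependence on \(\triangle t\) is implicit, and the stress and resistance values are hypothetical.

```python
def slip_rate(tau, s_a, s_star, gamma0_dot, m):
    # Eq. (8): power-law slip rate on one slip system.
    x = (tau - s_a) / s_star
    return gamma0_dot * abs(x) ** (1.0 / m) * (1.0 if x >= 0 else -1.0)

def critical_time_step(rates, dgamma_crit):
    # Largest dt that satisfies Eq. (18) on every slip system,
    # treating the slip rates as known (explicit estimate).
    fastest = max(abs(r) for r in rates)
    return float("inf") if fastest == 0.0 else dgamma_crit / fastest

# A 5 % increase in resolved shear stress at m = 0.02 (exponent 1/m = 50).
base = slip_rate(100.0, 0.0, 100.0, 1e-3, 0.02)
raised = slip_rate(105.0, 0.0, 100.0, 1e-3, 0.02)
amplification = raised / base  # roughly an order of magnitude
```

Because the admissible time step in Eq. (18) scales inversely with the fastest slip rate, the same 5 % stress increase reduces the estimated critical time step by the same factor.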

When deformation twinning is included in the form of the power-law Eq. (12), the instability criterion is more stringent since twinning is even less strain-rate sensitive [47]. Also, the twin evolution rate can be much faster than the strain evolution rate. Furthermore, lattice misorientations across grain and twin boundaries render the strain fields strongly heterogeneous. Strains localize inside twin bands, while stresses concentrate at the intersections of grain–twin, twin–twin and grain–grain boundaries. As a consequence, the distribution of critical time steps inside the microstructural volume is not uniform. Figure 3a shows the strain-rate distributions at \(2\%\) overall strain in a polycrystalline microstructural SERVE of the Mg alloy AZ31 containing 152 grains. The results show that the strain-rates inside twin bands are in general \({\sim }\)(5–10) times higher than those in the matrix. The corresponding heterogeneous distribution of critical time steps according to Eq. (18) is depicted in Fig. 3b. Elements inside the twin bands, especially those near twin boundaries and grain boundaries, require much smaller time steps to integrate the twin constitutive equations in comparison with those in the exterior regions. A quantitative comparison of the critical time step sizes is given in Table 1. \(97\%\) of the elements can proceed with a time step of \(\varDelta t=10\) s, while only \(3\%\) of the elements, located in critical regions, require a significantly smaller time step of \(\varDelta t=0.0391\) s. The reduction in time step is by a factor of \(\sim 255\). This can cause a huge loss of efficiency if all elements are switched to this time step, especially for high resolution microstructural models.
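The potential gain from treating the two groups of elements separately can be estimated with simple bookkeeping on the figures quoted above. The sketch below is only a back-of-the-envelope estimate made here, not a result from the paper: it counts constitutive updates per element per coarse increment and ignores global solves, interface coupling and any other overhead.

```python
coarse_dt, fine_dt = 10.0, 0.0391   # s, time steps quoted in the text
frac_fine = 0.03                    # fraction of elements needing the fine step

# Number of fine steps needed to span one coarse increment (nearest integer).
substeps = round(coarse_dt / fine_dt)

# Average constitutive updates per element per coarse increment.
uniform_cost = substeps                                       # all elements on the fine step
subcycled_cost = (1.0 - frac_fine) * 1 + frac_fine * substeps # fine step only where needed

ideal_speedup = uniform_cost / subcycled_cost
```

Under these assumptions the count of constitutive updates drops by a factor of about 30, which should be read as an upper bound on the achievable speed-up rather than a prediction.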

Fig. 4 Schematic of the subcycling algorithm showing partitioning and equilibrating domains

5 Adaptive subcycling algorithm for accelerating CPFE simulation

An adaptive multi-time-domain subcycling algorithm is proposed for increasing the overall computational efficiency, while accounting for the above-mentioned convergence issues. It averts the loss of efficiency caused by the minimum critical time step requirement, while allowing differential temporal resolution in the computational domain. The algorithm partitions the computational domain of the microstructure into sub-domains that are classified as critical (high strain-rate) and non-critical (low strain-rate). Simulations in each sub-domain can proceed with independent time steps, as required by stability and accuracy criteria for optimal efficiency. For a twinned microstructure, regions of twin bands may be solved with fine time steps, while the remaining regions are solved with coarse time steps. A schematic layout of the algorithm is shown in Fig. 4. Starting from a known state at time t, the integration algorithm for the time increment t to \(t + \triangle t\) solves the non-critical sub-domain problem using the coarsest possible time-increment \(\triangle t\) and the critical sub-domain problem using fine time steps \(\triangle \tau \ll \triangle t\). To achieve global equilibrium for the computational domain, the different sub-domains are coupled, and residuals at the interfaces between sub-domains with different time-steps are minimized using a predictor-corrector scheme. Displacement fields at nodal points of adjacent sub-domains that are integrated with different time-steps will not, in general, satisfy compatibility. The subcycling algorithm evaluates displacement correctors by equilibrating the nodal residual forces.

5.1 The subcycling algorithm: formalism and implementation

The subcycling algorithm partitions the computational domain of the virtual microstructure \(\varOmega \) into a fine time-scale domain \(\varOmega ^F\) and a coarse time-scale domain \(\varOmega ^C\). The formulation is based on Eqs. (3) and (4) in Sect. 2. The nodal degrees of freedom \(\triangle \mathbf {u}\) for the partitioned domain may be categorized into three parts, viz. \(\triangle \mathbf {u}^{F}\), \(\triangle \mathbf {u}^{C}\), and \( \triangle \mathbf {u}^{I}\), depending on whether they belong to \(\varOmega ^F\), \(\varOmega ^C\) or their interface \(\partial \varOmega ^I\), respectively. The tangent stiffness matrix \(\mathbf K\) and out-of-balance force vector \(\mathbf {f}=\mathbf {f}^{ext}-\mathbf {f}^{int}\) in Eq. (3) are also split into multiple parts and the equilibrium equation is written as:

$$\begin{aligned} \left[ \begin{array}{c@{\quad }c@{\quad }c} \mathbf {K}_{\varOmega ^{F}}^{F,F} &{} \mathbf {K}_{\varOmega ^{F}}^{F,I} &{} 0\\ \mathbf {K}_{\varOmega ^{F}}^{I,F} &{} \mathbf {K}_{\varOmega ^{F}}^{I,I}+\mathbf {K}_{\varOmega ^{C}}^{I,I} &{} \mathbf {K}_{\varOmega ^{C}}^{I,C}\\ 0 &{} \mathbf {K}_{\varOmega ^{C}}^{C,I} &{} \mathbf {K}_{\varOmega ^{C}}^{C,C} \end{array}\right] \left\{ \begin{array}{c} \triangle \mathbf {u}^{F}\\ \triangle \mathbf {u}^{I}\\ \triangle \mathbf {u}^{C} \end{array}\right\} =\left\{ \begin{array}{c} \mathbf {f}_{\varOmega ^{F}}^{F}\\ \mathbf {f}_{\varOmega ^{F}}^{I}+\mathbf {f}_{\varOmega ^{C}}^{I}\\ \mathbf {f}_{\varOmega ^{C}}^{C} \end{array}\right\} \end{aligned}$$
(20)

where \(\left[ \mathbf {K}_{\varOmega ^{F}}\right] =\left[ \begin{array}{c@{\quad }c} \mathbf {K}_{\varOmega ^{F}}^{F,F} &{} \mathbf {K}_{\varOmega ^{F}}^{F,I}\\ \mathbf {K}_{\varOmega ^{F}}^{I,F} &{} \mathbf {K}_{\varOmega ^{F}}^{I,I} \end{array}\right] \) and \(\left\{ \mathbf {f}_{\varOmega ^{F}}\right\} =\left\{ \begin{array}{c} \mathbf {f}_{\varOmega ^{F}}^{F}\\ \mathbf {f}_{\varOmega ^{F}}^{I} \end{array}\right\} \) are the tangent stiffness matrix and the out-of-balance force vector that are integrated and assembled from the fine time-scale elements respectively. Likewise, \( \left[ \mathbf {K}_{\varOmega ^{C}}\right] =\left[ \begin{array}{cc} \mathbf {K}_{\varOmega ^{C}}^{I,I} &{}\quad \mathbf {K}_{\varOmega ^{C}}^{I,C}\\ \mathbf {K}_{\varOmega ^{C}}^{C,I} &{}\quad \mathbf {K}_{\varOmega ^{C}}^{C,C} \end{array}\right] \) and \( \left\{ \mathbf {f}_{\varOmega ^{C}}\right\} =\left\{ \begin{array}{c} \mathbf {f}_{\varOmega ^{C}}^{I}\\ \mathbf {f}_{\varOmega ^{C}}^{C} \end{array}\right\} \) are the tangent stiffness matrix and the force vector integrated and assembled from the coarse time-scale elements.
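The block structure of Eq. (20) can be illustrated with a small assembly routine. The sketch below is a hedged example in plain Python, assuming the fine sub-domain quantities are ordered as (F, I) and the coarse ones as (I, C), as in the definitions above; the function name `assemble_partitioned` and the numerical values are hypothetical.

```python
def assemble_partitioned(K_F, f_F, K_C, f_C, n_I):
    """Assemble the global system of Eq. (20) from sub-domain contributions.

    K_F, f_F: fine sub-domain matrix/vector with DOF ordering (F, I).
    K_C, f_C: coarse sub-domain matrix/vector with DOF ordering (I, C).
    n_I:      number of interface DOFs shared by both sub-domains.
    """
    n_F = len(f_F) - n_I
    n_C = len(f_C) - n_I
    n = n_F + n_I + n_C
    K = [[0.0] * n for _ in range(n)]
    f = [0.0] * n
    # Fine contribution occupies the leading (F, I) block.
    for i in range(n_F + n_I):
        f[i] += f_F[i]
        for j in range(n_F + n_I):
            K[i][j] += K_F[i][j]
    # Coarse contribution occupies the trailing (I, C) block; the interface
    # entries overlap and therefore add up, as in the (I, I) term of Eq. (20).
    for i in range(n_I + n_C):
        f[n_F + i] += f_C[i]
        for j in range(n_I + n_C):
            K[n_F + i][n_F + j] += K_C[i][j]
    return K, f


# Hypothetical one-DOF-per-block example.
K, f = assemble_partitioned(
    [[3.0, -2.0], [-2.0, 2.0]], [0.0, 0.5],
    [[4.0, -4.0], [-4.0, 4.0]], [-0.5, 1.0],
    n_I=1,
)
print(K, f)
```

The zero blocks of Eq. (20) arise automatically, since fine and coarse DOFs interact only through the interface.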

The subcycling algorithm proceeds in steps that are described below.

  1.

    With the loading increments in a time step from t to \(t+\triangle t\), compute a predictor displacement increment \(\triangle \mathbf {u}^{pred}_{t \rightarrow t+\triangle t}\) for the entire domain \(\varOmega \) using the known global stiffness matrix \(\mathbf {K}_{t}\) at time t.

  2.

    Use the predictor displacement increment at the interface \(\left\{ \triangle \mathbf {u}^{I,pred}_{t \rightarrow t+\triangle t}\right\} \) as the displacement boundary condition to solve the CPFE problem for only the coarse domain in the interval t to \(t+\triangle t\), with a coarse time-step increment \(\triangle t\). The coarse time-scale domain problem is solved for the incremental displacement \(\left\{ \triangle \mathbf {u}^{C}_{t\rightarrow t+\triangle t}\right\} \) from a subset of Eq. (20), given as:

    $$\begin{aligned} \left[ \mathbf {K}_{\varOmega ^{C},t}^{C,C}\right] \left\{ \triangle \mathbf {u}^{C}_{t\rightarrow t+\triangle t}\right\}= & {} \left\{ \mathbf {f}_{\varOmega ^{C},t+\triangle t}^{C,ext}\right\} -\left\{ \mathbf {f}_{\varOmega ^{C},t}^{C,int}\right\} \nonumber \\&-\left[ \mathbf {K}_{\varOmega ^{C},t}^{C,I}\right] \left\{ \triangle \mathbf {u}_{t\rightarrow t+\triangle t}^{I,pred}\right\} \end{aligned}$$
    (21)

    The tangent stiffness matrix and force vector can then be updated to time \(t+\triangle t\) as \(\left[ \mathbf {K}_{\varOmega ^{C},t+\triangle t}\right] \), and \( \left\{ \mathbf {f}_{\varOmega ^{C},t+\triangle t}\right\} \).

  3.

    Apply the predictor displacement increment at the interface \(\left\{ \triangle \mathbf {u}^{I,pred}_{t \rightarrow t+\triangle t}\right\} \) as the displacement boundary condition to solve the explicit CPFE problem for only the fine time-scale domain from time t to \(t+ \triangle t\), with the time increment \(\triangle t\) sub-divided into smaller increments \(\triangle \tau \ll \triangle t\). The displacement boundary condition for each sub-step \(\tau \rightarrow \tau +\triangle \tau \) is linearly interpolated from \(\left\{ \triangle \mathbf {u}^{I,pred}_{t \rightarrow t+\triangle t}\right\} \) as:

    $$\begin{aligned} \left\{ \mathbf {u}^{I,pred}_{\tau +\triangle \tau }\right\} = \left\{ \mathbf {u}^{I,pred}_{\tau }\right\} + \frac{\triangle \tau }{\triangle t}\left\{ \triangle \mathbf {u}^{I,pred}_{t \rightarrow t+\triangle t}\right\} \end{aligned}$$
    (22)

    where \(\left\{ \mathbf {u}^{I,pred}_{\tau }\right\} \) and \(\left\{ \mathbf {u}^{I,pred}_{\tau +\triangle \tau }\right\} \) are the displacements at \(\tau \) and \(\tau +\triangle \tau \), respectively. From Eq. (20), the fine time-scale domain problem is solved in the increment \(\tau \rightarrow \tau +\triangle \tau \) by applying the displacement \(\left\{ \triangle \mathbf {u}^{I,pred}_{\tau \rightarrow \tau +\triangle \tau }\right\} \) as:

    $$\begin{aligned} \left[ \mathbf {K}_{\varOmega ^{F},\tau }^{F,F}\right] \left\{ \triangle \mathbf {u}^{F}_{\tau \rightarrow \tau +\triangle \tau }\right\}= & {} \left\{ \mathbf {f}_{\varOmega ^{F},\tau +\triangle \tau }^{F,ext}\right\} -\left\{ \mathbf {f}_{\varOmega ^{F},\tau }^{F,int}\right\} \nonumber \\&-\left[ \mathbf {K}_{\varOmega ^{F},\tau }^{F,I}\right] \left\{ \triangle \mathbf {u}_{\tau \rightarrow \tau +\triangle \tau }^{I,pred}\right\} \end{aligned}$$
    (23)

    The following variables are obtained at the completion of step 3:

    • Fine time-scale domain displacement increments from t to \(t+\triangle t\): \(\left\{ \triangle \widetilde{ \mathbf {u}}^F_{t \rightarrow t+ \triangle t}\right\} \)

    • Tangent stiffness matrix of fine time-scale elements at time \(t+\triangle t\): \(\left[ \widetilde{\mathbf {K}}_{\varOmega ^F, t+\triangle t}\right] \)

    • Force vector of fine time-scale elements at time \(t+ \triangle t\): \(\left\{ \widetilde{\mathbf {f}}_{\varOmega ^F,t+\triangle t}\right\} \)

    The symbol (\(\widetilde{\;}\)) indicates that the variables are obtained by solving a sequence of fine time-increment FE problems.

  4.

    Check for satisfaction of global equilibrium, i.e.:

    $$\begin{aligned} \left\| \left\{ \tilde{\mathbf {f}}_{\varOmega ^{F},t+\triangle t}^{F},\;{\tilde{\mathbf{f}}}_{\varOmega ^{F},t+\triangle t}^{I}+\mathbf {f}_{\varOmega ^{C},t+\triangle t}^{I},\;\mathbf {f}_{\varOmega ^{C},t+\triangle t}^{C}\right\} ^{T}\right\| \le R_{crit} \end{aligned}$$
    (24)

    where \(R_{crit}\) is a scalar convergence tolerance.

    • if yes, exit iteration. \(\left\{ \triangle \widetilde{\mathbf {u}}^{F}_{t \rightarrow t+\triangle t}, \triangle \mathbf {u}^{I,pred}_{t \rightarrow t+\triangle t},\right. \left. \triangle \mathbf {u}^{C}_{t \rightarrow t+\triangle t} \right\} ^T\) is the solution to Eq. (3).

    • if no, calculate the corrector to the displacement increment vector as:

      $$\begin{aligned}&\triangle \mathbf {u}_{t\rightarrow t+\triangle t}^{corr}=\left\{ \begin{array}{c} \triangle \widetilde{\mathbf {u}}_{t\rightarrow t+\triangle t}^{F}\\ \triangle \mathbf {u}_{t\rightarrow t+\triangle t}^{I,pred}\\ \triangle \mathbf {u}_{t\rightarrow t+\triangle t}^{C} \end{array}\right\} +\left\{ \begin{array}{c} \triangle \hat{\mathbf {u}}^{F}\\ \triangle \hat{\mathbf {u}}^{I}\\ \triangle \hat{\mathbf {u}}^{C} \end{array}\right\} \nonumber \\&\left\{ \begin{array}{c} \triangle \hat{\mathbf {u}}^{F}\\ \triangle \hat{\mathbf {u}}^{I}\\ \triangle \hat{\mathbf {u}}^{C} \end{array}\right\} =\left[ \begin{array}{ccc} \mathbf {\widetilde{K}}_{\varOmega ^{F}}^{F,F} &{} \mathbf {\widetilde{K}}_{\varOmega ^{F}}^{F,I} &{} \mathbf {0}\\ \mathbf {\widetilde{K}}_{\varOmega ^{F}}^{I,F} &{} \mathbf {\widetilde{K}}_{\varOmega ^{F}}^{I,I}+\mathbf {K}_{\varOmega ^{C}}^{I,I} &{} \mathbf {K}_{\varOmega ^{C}}^{I,C}\\ \mathbf {0} &{} \mathbf {K}_{\varOmega ^{C}}^{C,I} &{} \mathbf {K}_{\varOmega ^{C}}^{C,C} \end{array}\right] _{t+\triangle t}^{-1}\nonumber \\&\quad \quad \quad \quad \quad \quad \times \left\{ \begin{array}{c} \widetilde{\mathbf {f}}_{\varOmega ^{F}}^{F}\\ \widetilde{\mathbf {f}}_{\varOmega ^{F}}^{I}+\mathbf {f}_{\varOmega ^{C}}^{I}\\ \mathbf {f}_{\varOmega ^{C}}^{C} \end{array}\right\} _{t+\triangle t} \end{aligned}$$
      (25a)

    Replace the predictor displacement increment vector with \(\triangle \mathbf {u}^{corr}_{t \rightarrow t+\triangle t}\) and repeat steps 2-4.
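Steps 1–4 can be traced on a toy problem. The following Python sketch is only an illustration under strong assumptions: instead of a CPFE model it uses a one-dimensional chain of three springs with one DOF each for the fine (F), interface (I) and coarse (C) parts, a hypothetical hardening law for the fine spring, a local Newton solve in each fine sub-step, and the Euclidean norm in the check of Eq. (24). All names and constants are hypothetical.

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            fac = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= fac * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


KA, KB, KC = 1.0, 2.0, 4.0  # hypothetical springs: ground-F (nonlinear), F-I, I-C


def f_a(u):  # force in the nonlinear (hardening) fine-domain spring
    return KA * (u + u ** 3)


def k_a(u):  # its tangent stiffness
    return KA * (1.0 + 3.0 * u ** 2)


def f_int(u):  # internal force vector ordered (F, I, C)
    uF, uI, uC = u
    return [f_a(uF) - KB * (uI - uF), KB * (uI - uF) - KC * (uC - uI), KC * (uC - uI)]


def tangent(u):  # tangent stiffness with the block layout of Eq. (20)
    return [[k_a(u[0]) + KB, -KB, 0.0], [-KB, KB + KC, -KC], [0.0, -KC, KC]]


def subcycle_increment(u_t, load, n_sub=8, tol=1e-10, max_iter=25):
    """Advance one coarse increment; the external load acts on the C DOF only."""
    f_ext = [0.0, 0.0, load]
    uF0, uI0, uC0 = u_t
    # Step 1: predictor for the interface increment from the stiffness at time t.
    dI = solve(tangent(u_t), [e - i for e, i in zip(f_ext, f_int(u_t))])[1]
    for iteration in range(1, max_iter + 1):
        # Step 2: coarse domain in one step with prescribed interface motion (Eq. 21).
        dC = (load - KC * (uC0 - uI0) + KC * dI) / KC
        # Step 3: fine domain in n_sub sub-steps, interface BC interpolated (Eq. 22).
        uF = uF0
        for s in range(1, n_sub + 1):
            uI_s = uI0 + dI * s / n_sub
            for _ in range(50):  # local Newton iterations for the fine DOF
                res = KB * (uI_s - uF) - f_a(uF)
                if abs(res) < 1e-13:
                    break
                uF += res / (k_a(uF) + KB)
        u_new = [uF, uI0 + dI, uC0 + dC]
        # Step 4: global equilibrium check (Eq. 24) and corrector (Eq. 25a).
        r = [e - i for e, i in zip(f_ext, f_int(u_new))]
        if sum(x * x for x in r) ** 0.5 <= tol:
            return u_new, iteration
        dI += solve(tangent(u_new), r)[1]
    raise RuntimeError("subcycling iterations did not converge")


u, iterations = subcycle_increment([0.0, 0.0, 0.0], load=1.0)
print(u, iterations)
```

In this toy chain every spring must carry the applied load at equilibrium, which provides a simple check of the converged state. Only the interface part of the corrector is carried into the next pass, because the fine and coarse DOFs are recomputed from the state at time t in steps 2 and 3.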

Fig. 5 Illustration of the need for a critical time step associated with explicit staggered twin update algorithm

5.2 Method of identifying fine time-scale elements and corresponding time-steps

The decomposition of the computational micro-domain into coarse (\(\varOmega ^C\)) and fine (\(\varOmega ^F\)) time-scale sub-domains necessitates the evaluation of the critical time steps at all integration points. Since an explicit expression \(\triangle t_{crit}=\triangle t_{crit}(m, \gamma _0, \) \(\mathbf {C}^e, s_{a}^{\alpha }, s_{*}^{\alpha }, \tau ^{\alpha }, \triangle \gamma _{crit}, r_{crit})\) is difficult to obtain, an empirical method is developed. For known state-variables at time t, the method estimates the critical time increments in all elements using the following steps:

  • Step I: Starting with a large time increment \(\triangle t\), evaluate a trial nodal displacement increment \(\triangle \mathbf {u}^{trial}_{t \rightarrow t+\triangle t}\) as:

    $$\begin{aligned} \triangle \mathbf {u}^{trial}_{t \rightarrow t+\triangle t}=\mathbf {K}_{t}^{-1}\left\{ \mathbf {f}_{t+\triangle t}^{ext}-\mathbf {f}_{t}^{int}\right\} \end{aligned}$$
    (26)

    where \(\mathbf {K}_{t}\) and \(\mathbf {f}_{t}^{int}\) are the tangent stiffness matrix and internal force vector at time t, respectively. \(\mathbf {f}_{t+\triangle t}^{ext}\) is the external load vector due to applied boundary conditions.

  • Step II: With the incremented nodal displacements \(\mathbf {u}_t+\triangle \mathbf {u}^{trial}_{t \rightarrow t+\triangle t}\) for each element, integrate the constitutive model using the time-integration algorithm in Sect. 3.1. If the time integration converges, the critical time step associated with this element is \(\triangle t_c \ge \triangle t\). If the time integration fails to converge, continue to step III.

  • Step III: Reduce the time increment to \(a\triangle t\), where the factor \(a\in (0,1)\). In this work a is chosen to be \(a=0.5\). Obtain a new trial nodal displacement increment \(\triangle \mathbf {u}^{trial}_{t \rightarrow t+ a\triangle t}\) from:

    $$\begin{aligned} \triangle \mathbf {u}_{t\rightarrow t+{a\triangle t}}^{trial}=\frac{\left\| \mathbf {f}_{t+ a\triangle t}^{ext}-\mathbf {f}_{t}^{int}\right\| }{\left\| \mathbf {f}_{t+\triangle t}^{ext}-\mathbf {f}_{t}^{int} \right\| }\triangle \mathbf {u}_{t\rightarrow t+\triangle t}^{trial} \end{aligned}$$
    (27)

  • Step IV: For all elements that failed to converge in step II, integrate the constitutive model with the nodal displacements \(\mathbf {u}_t+\triangle \mathbf {u}^{trial}_{t \rightarrow t+a\triangle t}\). If the time integration converges, the critical time step associated with this element is \(a\triangle t\). If the time integration fails to converge, reduce the time increment further to \(a^n\triangle t, (n=2,3,\ldots )\) and repeat steps III–IV until all elements converge.

The fine time-scale elements are identified by the above process and then assigned the same smallest time step for subcycling calculation. An additional constraint is imposed in the selection of the finer time-scale sub-domains. The volume fraction of fine-scale sub-domain to the total computational domain is constrained to be less than \(10\%\) from the consideration of optimal acceleration as analyzed in Sect. 5.3.1.
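The step-reduction logic of Steps I–IV and the subsequent partitioning can be sketched as follows. This is a minimal Python illustration, not the authors' implementation: the callback `converges(elem, dt)` is a hypothetical stand-in for scaling the trial displacement (Eq. 27) and integrating the constitutive model of one element, and the \(10\%\) volume-fraction constraint is not enforced here.

```python
def critical_time_steps(elements, converges, dt0, a=0.5, max_reductions=20):
    """Estimate a critical time step per element by successive reduction.

    converges(elem, dt) is a hypothetical callback that returns True when the
    constitutive time integration of elem converges for the increment dt.
    """
    dt_crit = {}
    pending = list(elements)
    dt = dt0
    for _ in range(max_reductions + 1):
        still_failing = []
        for elem in pending:
            if converges(elem, dt):
                dt_crit[elem] = dt      # largest tried step that converged
            else:
                still_failing.append(elem)
        pending = still_failing
        if not pending:
            break
        dt *= a                         # Step III: reduce the increment
    if pending:
        raise RuntimeError("no convergent time step found for some elements")
    return dt_crit


def partition(dt_crit, dt0):
    """Split elements into the fine set and return the common fine step."""
    fine = {elem for elem, dt in dt_crit.items() if dt < dt0}
    dtau = min(dt_crit.values())        # same smallest step for all fine elements
    return fine, dtau


# Hypothetical convergence limits for two matrix elements and one twin element.
limits = {"matrix_1": 20.0, "matrix_2": 12.0, "twin_1": 0.05}
steps = critical_time_steps(limits, lambda e, dt: dt <= limits[e], dt0=10.0)
print(partition(steps, dt0=10.0))
```

Elements that converge with the initial increment are recorded with that increment, which is a lower bound on their critical step, in line with Step II.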

5.3 Numerical implementation of subcycling algorithm for twin model

A straightforward way for evolving twins is through a staggered algorithm, discussed in Sect. 3.1, where the equilibrium problem is first solved for deformation and state variables assuming a fixed twin configuration. This is followed by updating the twin domain by nucleating new twins and propagating existing twins. However, such an algorithm will introduce an instability when the time step is too large, as demonstrated with Fig. 5.

Figure 5a shows a single crystal deformation in a time period from t1 to t2 with a micro-twin nucleating at an intermediate time t3. The deformation is characterized by two stages, viz. homogeneous deformation before twin nucleation, and heterogeneous deformation in the twin band and matrix after twin nucleation. For a sufficiently small simulation time step, as demonstrated in Fig. 5b, the CPFE simulation using the explicit staggered algorithm converges and captures the twin nucleation correctly. However, if the time step increment is large, e.g., \(\triangle t = \frac{t2-t1}{2}\) as shown in Fig. 5c, the simulation fails to capture the nucleation of the twin at t3. It predicts nucleation at an incorrect time \(t4=t1+\triangle t\) instead. Hence, the inhomogeneity in deformation is missed in the interval \(t3-t4\). The resulting error accumulates in the history-dependent CPFE simulation and eventually leads to divergence. Iterations alone do not resolve the issue in Fig. 5c, since they would place the twin nucleation at t1, which also interpolates the deformation incorrectly.
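The delay described above can be quantified with a simple count of step ends. The sketch below is an idealization, assuming the nucleation criterion is checked only at the end of each time step of the staggered scheme; the function name and the numerical values are hypothetical.

```python
import math


def detected_nucleation_time(t1, t_nuc, dt):
    """Time at which a staggered scheme registers a twin nucleating at t_nuc.

    The criterion is assumed to be checked only at step ends t1 + k*dt, so the
    twin is registered at the first step end at or after t_nuc.
    """
    steps = math.ceil((t_nuc - t1) / dt)
    return t1 + steps * dt


t1, t2, t3 = 0.0, 10.0, 3.0                    # hypothetical times
for dt in (1.0, (t2 - t1) / 2):
    t4 = detected_nucleation_time(t1, t3, dt)
    print(f"dt = {dt}: registered at {t4}, missed interval {t4 - t3}")
```

With the fine step the twin is registered at t3 itself, whereas the coarse step of half the interval registers it at \(t4=t1+\triangle t\), leaving the interval \(t3-t4\) integrated without the twin.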

Fig. 6 Subcycling steps incorporating an implicit twin update algorithm

A subcycling method, incorporating a novel implicit twin update algorithm, is proposed to overcome the above issues and provide a reliable way for modeling twinning induced heterogeneous deformation. The algorithm is schematically illustrated in Fig. 6. With known deformation variables at time t1, the steps in this algorithm for integrating from time t1 to t2 are delineated below.

  1.

    At time t1, assign elements in existing twin bands that are undergoing high strain-rates to the fine time-scale domain. Use the subcycling algorithm to solve for the displacement increments \(\triangle \mathbf {u}^{pred}\) and stress \(\mathbf {\sigma }^{pred}\) at time t2. Update the state and internal variables at t2, keeping the twin structure fixed.

  2.

    Substitute the stress \(\mathbf {\sigma }^{pred}\) into the twin nucleation criteria of Eq. (10) and the twin propagation criteria of Eqs. (15), and check for the nucleation of new twins and the propagation of existing twins.

  3.

    Update the twinned configuration and add the newly twinned elements to the fine time-scale domain. Transfer the fully twinned elements from the fine time-scale domain to coarse time-scale domain.

  4.

    Return to time t1. Use the known state and internal variables at time t1 and the updated fine time-scale domain to solve again for the displacements in the time increment \(t \rightarrow t+\triangle t\). This step ensures that the twin nucleation, propagation, and their induced deformation heterogeneities are captured adequately by using smaller time steps. Without this step, deformation in the new twins will not be captured adequately and the staggered twin-update scheme can introduce significant errors.
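The control flow of the four steps above can be sketched as a single function. This is a hedged Python outline only: `solve`, `find_new_twins` and `fully_twinned` are hypothetical callbacks standing in for the subcycled CPFE solve, the criteria of Eqs. (10) and (15), and the detection of fully twinned elements.

```python
def implicit_twin_step(state_t1, fine_set, solve, find_new_twins, fully_twinned):
    """One increment t1 -> t2 following steps 1-4 of the implicit twin update.

    solve(state, fine_set)      -> predicted state at t2 (hypothetical callback)
    find_new_twins(predicted)   -> elements that nucleate or propagate a twin
    fully_twinned(predicted)    -> elements that have become fully twinned
    """
    # Step 1: solve with the twin structure held fixed.
    predicted = solve(state_t1, fine_set)
    # Step 2: evaluate nucleation and propagation criteria with the predicted stress.
    new_twins = find_new_twins(predicted)
    # Step 3: add newly twinned elements to the fine domain and move fully
    # twinned elements back to the coarse domain.
    updated = (set(fine_set) | set(new_twins)) - set(fully_twinned(predicted))
    # Step 4: return to t1 and solve the same increment with the updated partition.
    return solve(state_t1, updated), updated
```

The second solve starts from the stored state at t1 rather than from the predicted state, so that the response of newly twinned elements is integrated with fine steps over the whole increment.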

At this point, it is useful to summarize the entire algorithmic structure with discrete twin evolution and subcycling enhancement. This is illustrated in Fig. 7, while the structure without subcycling enhancement is shown in Fig. 8.

Fig. 7 Flowchart showing the entire algorithmic structure with discrete twin evolution and subcycling enhancement

Fig. 8 Flowchart showing the algorithmic structure with discrete twin evolution but without subcycling enhancement

5.3.1 Estimated speed-up with subcycling

The speed-up is estimated by comparing the number of operations (NPs) for CPFEM simulations with and without subcycling. Two factors are responsible for speed-up with the subcycling method. In an incremental CPFEM analysis from t to \(t+\triangle t\), they are:

\(\left( 1\right) \) the ratio of degrees of freedom (DOF) in fine time-scale sub-domain to the DOF of entire domain, i.e. \(\frac{N^F}{N^{total}}\);

\(\left( 2\right) \) the ratio of fine time step \(\triangle \tau \) to coarse time step \(\triangle t\), i.e. \(\frac{\triangle \tau }{\triangle t}\).

In each increment of analysis without subcycling using the fine time-step \(\triangle \tau \), a total of m Newton–Raphson iterations are required. In each iteration, a linear system of equations is solved by an LU decomposition-based direct solver, so that the total NPs are \(\mathcal {O}\left( \left( N^{total}\right) ^{3}\frac{\triangle t}{\triangle \tau }m\right) \). When the subcycling method is used in a time increment from t to \(t+\triangle t\), only the fine time-scale domain is solved using fine increments \(\triangle \tau \), while the coarse domain is solved with a coarse time increment \(\triangle t\). If \(m^{sc}\) iterations are required for global equilibrium, the total NPs in the subcycling method are \(\mathcal {O}\left( \left( N^{F}\right) ^{3}\frac{\triangle t}{\triangle \tau }m^{sc} + \left( N^{total}-N^{F}\right) ^{3}m^{sc} +\left( N^{total}\right) ^{3}m^{sc}\right) \). The first, second and third terms correspond to the number of operations for solving the fine time-scale domain, solving the coarse time-scale domain, and obtaining the displacement corrector for the entire domain, respectively. The comparison indicates that \(\frac{\triangle t}{\triangle \tau }\) and \(\frac{N^{total}}{N^F}\) are the key factors in reducing the NPs with the subcycling method if \(m^{sc} \approx m\). A higher acceleration can be achieved by the subcycling method if the deformation is localized in smaller regions, and if the deformation rates are more heterogeneous.
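The two operation counts above can be compared numerically. The following Python sketch evaluates their ratio, with the constants hidden in the \(\mathcal {O}(\cdot )\) notation dropped; the function name and the sample inputs are hypothetical, and the result is an order-of-magnitude estimate rather than a measured speed-up.

```python
def estimated_speedup(fine_fraction, step_ratio, m=1.0, m_sc=1.0):
    """Ratio of estimated operation counts without and with subcycling.

    fine_fraction = N^F / N^total, step_ratio = dt / dtau.
    All counts are expressed in units of (N^total)**3.
    """
    without = step_ratio * m
    fine = fine_fraction ** 3 * step_ratio * m_sc      # fine domain, many sub-steps
    coarse = (1.0 - fine_fraction) ** 3 * m_sc         # coarse domain, one step
    corrector = m_sc                                   # global corrector solve
    return without / (fine + coarse + corrector)


# Hypothetical inputs: several fine-domain fractions at a step ratio of 256.
for frac in (0.03, 0.10, 0.50):
    print(frac, round(estimated_speedup(frac, 256.0), 2))
```

The estimate degrades rapidly as the fine sub-domain grows, which is the reasoning behind limiting its volume fraction. For a vanishing fine fraction and \(m^{sc}=m\), the ratio tends to half the step ratio, because the coarse solve and the global corrector each cost about one full-domain solve.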

The subcycling algorithm is also amenable to parallelization. Operations in steps 2 and 3 of the subcycling algorithm can be distributed to multiple processors based on domain partitioning. The FE code is parallelized with the Open Multi-Processing programming interface OpenMPI-1.4.3. Parallelization in the context of the subcycling algorithm distributes the load evenly on all processors, with the fewer elements in localized deformation regions being allocated to a smaller number of processors.

The subcycling method requires a two-part procedure at the beginning of each time step: the identification of fine time-scale elements, and the partitioning of the domain into fine and coarse sub-domains. Their time requirements are discussed next. The identification of the fine time-scale elements evaluates the crystal plasticity constitutive laws at each integration point. Its computing time scales almost linearly with the degrees of freedom of the domain. In comparison with the time for solving the system of equations, which scales non-linearly with the DOF of the domain, the identification time is small. For the polycrystalline test problem with 520,404 four-noded linear tetrahedral elements in Sect. 6.3, the computation time spent on the identification of the fine time-scale elements is negligible. The subsequent step of partitioning the domain into fine and coarse sub-domains requires exchanging history-dependent state variables of the fine domain elements among processors. Due to the small volume fraction of the fine time-scale sub-domain, the computation time of this part is also negligible.

6 Numerical examples with the subcycling augmented CPFEM

The accuracy, efficiency and robustness of the subcycling algorithm are verified through several numerical simulations and tests. The first example checks the efficiency and reliability of the subcycling-augmented CPFE model for a polycrystalline microstructure of the Mg alloy AZ31 that is undergoing deformation without twinning. Its effectiveness for deformation twinning in single-crystal and polycrystalline microstructures is tested in Sects. 6.2 and 6.3, respectively.

Fig. 9 a The image-based virtual microstructure of the 24-grain polycrystalline AZ31 alloy showing loading and boundary conditions, and b macroscopic stress–strain response by CPFEM with and without subcycling under constant strain-rate loading

6.1 Accuracy and efficiency evaluation for CPFEM simulations without twinning

To verify the reliability of the subcycling algorithm, a polycrystalline microstructure of the Mg alloy AZ31 under uniaxial loading is simulated with and without the subcycling algorithm, and the results are compared to assess convergence. An image-based microstructure of the polycrystalline alloy AZ31 is constructed using the DREAM.3D software [48] as shown in Fig. 9. The RVE in Fig. 9a is of size 25 \(\upmu \)m \(\times \) 25 \(\upmu \)m \(\times \) 25 \(\upmu \)m and contains 24 grains with an average grain size of \({\sim }10\, \upmu \)m. The RVE is discretized into 16,371 four-noded linear tetrahedral (TET4) elements with a total of 3270 nodes. A constant strain-rate compressive loading of 0.0001/s is imposed on the top surface. Minimum displacement boundary conditions are imposed on other surfaces to remove the rigid body modes, as illustrated in Fig. 9a. The constitutive parameters for each slip system used in the analyses are those calibrated in [1]. Deformation twinning is deactivated for this example.

The macroscopic stress-strain responses simulated with subcycling acceleration and without it, but with refined time-steps, are plotted in Fig. 9b. The responses predicted by the two methods agree well. Significantly fewer time increments are required with the subcycling algorithm, leading to accelerated computations. For local variables, the distribution of stresses in the microstructure by CPFE simulations with the subcycling algorithm is plotted in Fig. 10a and compared with contour plots of the stress by CPFE simulation with finer time-steps in Fig. 10b. Additionally, the local stress component \(\sigma _{xx}\) at \(3 \%\) strain is plotted along a line passing through the interior of the microstructure in Fig. 10c. The two results with and without the subcycling algorithm are almost identical, attesting to the accuracy of the subcycling algorithm.

Fig. 10 Contour plot of loading direction stress \(\sigma _{xx}\) distribution in the 24-grain polycrystalline microstructure by CPFEM simulations: a with subcycling acceleration algorithm, b without subcycling but with refined time-steps, and c comparison of the stress \(\sigma _{zz}\) along a line through the middle section of the microstructure by CPFEM with and without subcycling

For comparing the computational efficiency of the CPFEM simulations with and without subcycling, the simulations in Sect. 6.1 are repeated on a single processor. For \(10\%\) strain, the simulation without subcycling acceleration takes 285 time steps and a total CPU time of 21,250 s. On the other hand, the simulation with subcycling acceleration takes only 109 steps and a total CPU time of 10,182 s. This corresponds to approximately a twofold speed-up (a factor of about 2.1) with the subcycling algorithm. The distribution of time step lengths is shown in the histogram of Fig. 11a. Both simulations have a small fraction of time steps shorter than 1 s. The simulation without subcycling has most time-steps in the range of 2–4 s over the duration of the simulation. Figure 11b plots the number of subcycling iterations to reach global equilibrium and the number of iterations required by the non-linear solver in CPFEM without subcycling. It shows that the subcycling algorithm does not require a large number of iterations in this study. Finally, the polycrystal test is performed with 2, 4, and 8 processors to verify the parallelization efficiency, and the CPU times are plotted in Fig. 12. For this small test problem, the efficiency gain from parallelization is limited. Better scaling is expected for problems with a larger number of elements.
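The reported step counts and CPU times can be cross-checked with a few lines of arithmetic. The sketch below assumes that the applied strain equals the constant strain-rate of Sect. 6.1 multiplied by the elapsed time; the variable names are hypothetical.

```python
strain_rate = 1.0e-4          # 1/s, loading rate stated in Sect. 6.1
target_strain = 0.10          # 10% strain
# Assumption: nominal strain grows linearly in time at the stated rate.
total_time = target_strain / strain_rate

# (number of time steps, total CPU time in s) as reported in the text.
runs = {"without subcycling": (285, 21250.0), "with subcycling": (109, 10182.0)}
for name, (steps, cpu) in runs.items():
    mean_step = total_time / steps
    print(f"{name}: mean step {mean_step:.2f} s, CPU time per step {cpu / steps:.1f} s")

speedup = runs["without subcycling"][1] / runs["with subcycling"][1]
print(f"CPU-time speed-up: {speedup:.2f}")
```

Under this assumption the mean step without subcycling is about 3.5 s, in line with the 2–4 s range noted for Fig. 11a, while the subcycled run averages roughly 9 s per coarse step. Each subcycled step costs more CPU time, so the overall gain is smaller than the reduction in the number of steps.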

Fig. 11 Distribution of a time steps in CPFEM simulation with and without subcycling algorithm, and b number of iterations in non-linear solver in conventional CPFEM and the number of subcycling iterations in subcycling-accelerated CPFEM

6.2 Simulating deformation with twin evolution in single crystal Mg by subcycling augmented CPFEM

The subcycling-augmented CPFEM is used to simulate deformation and twinning in a single crystal of pure Mg, as shown in Fig. 13a. The computational domain of the single crystal has dimensions of 20 \(\upmu \)m \(\times \) 10 \(\upmu \)m \(\times \) 10 \(\upmu \)m, and is discretized into 67,418 four-noded linear tetrahedral (TET4) elements with 13,021 nodes. A uniaxial, constant strain-rate of \(1\times 10^{-4}\) is applied on the top surface along the x-axis, and minimum displacement boundary conditions are applied to the bottom surface to prevent rigid body motion. The crystal has a single orientation, designated by the Euler angles \([0^\circ ,5^\circ , 0^\circ ]\) in ZXY convention as shown in Fig. 13a. The direction of the applied straining causes the formation of \(\left\{ 10\bar{1}2\right\} \) tension twins. The \(5^\circ \) tilt in the crystal orientation makes the Schmid factor of twin variant 1 the highest among all 6 twin variants. Thus only twin variant 1 will be formed during simulations. Without this tilt, twin variants 1 and 4 both have the highest Schmid factor and the interaction between them must be considered. This mechanism is currently not implemented in the twin evolution part of CPFEM. Constitutive parameters used for CPFEM simulations of pure Mg have been calibrated in [1]. To allow for the formation of multiple twin bands, a small perturbation in the twin nucleation source size \(L_0\) in Eq. (10) is applied.

The reference solution is established by performing six groups of simulations using time-steps of 1 s, 2 s, 4 s, 6 s, 8 s and 10 s, respectively, in the explicit staggered twin update scheme of Sect. 5.3. Figure 13b shows the fraction of twinned elements at \(1\%\) strain as a function of the simulation time step size. For time increments smaller than 2 s, the simulation gives a converged result. Hence the explicit staggered simulation results with a time-step size of \(\triangle t=1\) s can serve as the reference solution for this problem.

Fig. 12 Computational CPU time for CPFEM simulation with and without subcycling algorithm

For validating the implicit, subcycling-accelerated twin formation algorithm of Sect. 5.3, a simulation with an initial coarse time-step of 10 s is conducted. The number fraction of twinned elements as a function of the applied strain is compared with the reference solution in Fig. 14a. Results by the implicit subcycling model with \(\triangle t=10\) s show very good agreement with the reference solution. It can also be seen that the solution of the explicit staggered algorithm with a similar time step \(\triangle t=10\) s suffers from inaccuracy.

Figure 14b plots the stress-strain response by different models up to \(10\%\) strain. Prior to \(7\%\) strain, the stress levels and hardening rates are very low, since deformation is dominated by plastic flow induced by twin nucleation and propagation. Near \(7\%\) strain, twins begin to saturate and occupy larger volumes in the crystal. The dominant deformation mechanism then switches from twin evolution to \(\left\langle c+a\right\rangle \) dislocation slip, causing the hardening rate to increase rapidly. This results in the sigmoidal shape of the stress-strain curves, as is experimentally observed for Mg. This change in deformation mechanisms and the sigmoidal shape of the stress-strain response curves are predicted by all simulations. The subcycling model yields excellent results, identical to the reference solution. However, the explicit staggered twin update scheme with large time steps yields a lower hardening rate after twin saturation.

A comparison of the discrete twins in the microstructure as predicted by the different models is made in Fig. 15. Figure 16 compares the distribution of the Lagrangian strain \(E_{yy}\) at \(1\%\) strain. Solutions by the small time-step staggered model and the subcycling model are shown in Figs. 15a and 16a respectively. The \([0^\circ , 5^\circ , 0^\circ ]\) orientation (Euler angles) causes the \(\left( \bar{1}102\right) \left[ 1\bar{1}01\right] \) twin variant to have the highest Schmid factor in comparison with other extension twin variants. The only exception is that a \(\left( 10\bar{1}2\right) \left[ \bar{1}011\right] \) twin variant (variant 4 of the extension twin systems) occurs at the upper left corner of the model due to the local stress-state. This is shown in Fig. 15a, c respectively. The subcycling-augmented CPFE simulation predicts almost the same twinned microstructures as the reference solution by the staggered method with a small time-step. Results of the staggered method with a larger time step \(\triangle t =10s\) in Fig. 15b however do not capture the variant 4 twin and also predict a smaller twinned volume fraction. Deformation localization is predicted within twin bands in the reference and subcycling model solutions. Plastic deformation requires gliding on twin systems or dislocation glide in the \(\left\langle c+a\right\rangle \) slip systems to compress the crystal along the X-axis. The localized strain distribution is caused by easy gliding of micro-twins, in contrast to the \(\left\langle c+a\right\rangle \) system dislocation glide with high shear resistance.

Fig. 13

(a) The Mg single crystal computational model subjected to constant strain-rate, uniaxial loading, and (b) fraction of twinned elements with different simulation time steps

Fig. 14

Comparison of (a) the evolution of the fraction of twinned elements as a function of strain, and (b) the stress–strain response of single crystal Mg simulations using the explicit staggered and implicit subcycling twin update models

Fig. 15

Simulated twinned single crystal Mg microstructure at \(1\%\) strain, using: (a) explicit staggered model with a time step \(\triangle t = 1\) s, (b) explicit staggered model with a time step \(\triangle t = 10\) s, and (c) implicit subcycling model with a time step \(\triangle t = 10\) s

Fig. 16

Lagrangian strain \(E_{yy}\) distribution at \(1\%\) overall strain, using: (a) explicit staggered model with a time step \(\triangle t = 1\) s, (b) explicit staggered model with a time step \(\triangle t = 10\) s, and (c) implicit subcycling model with a time step \(\triangle t = 10\) s

Finally, Table 2 lists the nucleation times of selected twins for the three simulation cases. The simulation with the implicit subcycling twin update algorithm predicts nucleation times far more accurately than the simulation with the explicit staggered algorithm using a coarse time step.

Table 2 Nucleation time of some twins in the three simulations of the single crystal test

Significant computational efficiency is achieved with the subcycling model. The simulation with the explicit staggered twin update algorithm and a fine time step of \(\triangle t = 1\) s takes 4576 s to reach \(10\%\) strain, while the simulation with the implicit subcycling twin update algorithm takes only 1273 s to reach the same strain. A speed-up of approximately 3.6 times is thus achieved with subcycling without any loss of accuracy.

6.3 Deformation and twin evolution in polycrystalline statistically equivalent RVE (SERVE) by subcycling augmented CPFEM

The subcycling augmented CPFEM is used to simulate deformation-induced twinning in polycrystalline microstructures of Mg alloys. An image-based SERVE of the Mg alloy AZ31 is first generated from the experimental microstructure data given in [49]. These data include statistical distributions of the grain size, crystallographic orientation and crystallographic misorientation across grain boundaries. The discrete data are fitted to probability distribution functions for input into the microstructure simulator software Dream3D [48]. For a specified number of grains in the ensemble, Dream3D generates a SERVE with statistical distributions matching the experimental data. The resulting SERVE, shown in Fig. 17a, contains 620 grains with an average grain size of 32 \(\upmu \)m in a box of dimensions 300 \(\upmu \)m \(\times \) 300 \(\upmu \)m \(\times \) 300 \(\upmu \)m. The computational domain of the SERVE is discretized into 520,404 four-noded tetrahedral elements with 96,849 nodes. The texture of the SERVE is represented by the pole figures in Fig. 17b.

A uniaxial, constant strain-rate loading of \(1 \times 10^{-3}\) s\(^{-1}\) is applied along the Y-axis, normal to the transverse direction (TD) surface of the virtual microstructure in Fig. 17a. The color contour corresponds to the angle of alignment between the [0001] lattice axis in each grain and the normal direction (ND). Minimum displacement boundary conditions that constrain the rigid body modes are applied to the bottom surface. In contrast to the single crystal simulations, the polycrystalline simulations account for interactions between twins and grain boundaries, as well as between different twin variants. These interactions are important mechanisms of deformation localization and affect the stability of CPFEM simulations. The subcycling augmented CPFEM is able to provide stable solutions for this combination of complex deformation mechanisms. The macroscopic stress-strain responses from CPFE simulations with and without subcycling are compared in Fig. 17c. The two simulations give almost identical results. The CPFE simulation without subcycling takes 172832 s to compute the deformation up to \(2\%\) strain, while the CPFE simulation with the subcycling algorithm takes 29844 s to reach the same strain. This corresponds to a speed-up by a factor of approximately 5.8.
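The reported speed-up factors follow directly from the quoted wall-clock times. The short check below recomputes them for both the single crystal and the polycrystalline SERVE cases; the dictionary labels and the helper name `speedup` are illustrative choices, not part of the original work.

```python
# Wall-clock times in seconds, as quoted in the text:
# (run without subcycling, run with subcycling)
runs = {
    "single crystal Mg, 10% strain": (4576.0, 1273.0),
    "polycrystalline AZ31 SERVE, 2% strain": (172832.0, 29844.0),
}


def speedup(t_reference: float, t_subcycling: float) -> float:
    """Ratio of reference wall-clock time to subcycling wall-clock time."""
    if t_reference <= 0.0 or t_subcycling <= 0.0:
        raise ValueError("wall-clock times must be positive")
    return t_reference / t_subcycling


for label, (t_ref, t_sub) in runs.items():
    print(f"{label}: speed-up = {speedup(t_ref, t_sub):.2f}")
```

The two ratios round to the factors of about 3.6 and about 6 quoted for the single crystal and polycrystalline simulations.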

Fig. 17

(a) The polycrystalline SERVE of AZ31 alloy showing misorientation across grains, and boundary conditions, (b) pole figures showing the texture of the AZ31 alloy SERVE, and (c) stress-strain responses from simulations with and without subcycling

7 Summary and conclusions

This paper develops and validates a multi-time-domain subcycling algorithm for numerical time integration of crystal plasticity FE models. Nucleation, propagation and growth of explicit twins are considered in the CPFE formulation. Explicit twin evolution suffers from intrinsically low computational efficiency, since the simulation time increment is bounded by the high deformation rate inside twin bands. The subcycling augmented CPFEM is beneficial for predicting discrete twin formation and the associated heterogeneous deformation in single crystal and polycrystalline microstructures of metals and alloys.

The subcycling method decomposes the spatial domain of the simulation into a critical region of localized deformation and non-critical regions of low deformation rate. The algorithm accelerates the simulation by solving the critical domains with fine time steps and the non-critical domains with coarse time steps. A predictor-corrector algorithm for the interface displacements ensures that equilibrium of the entire computational domain is satisfied. Even when the polycrystalline microstructure does not have any explicit twins, the subcycling method accelerates the CPFE simulations by a factor of \({\sim }2\) over simulations without subcycling. When the formation of deformation twins is taken into account, a \({\sim }3.6\)-fold speed-up is observed for pure Mg single crystals, while a \({\sim }6\)-fold acceleration is seen for polycrystalline AZ31 microstructures. The level of accuracy in these simulations with the subcycling method is found to be excellent. The capability developed here for explicit twin evolution in polycrystalline microstructures is needed for addressing twin-related material failure.
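A rough cost model helps explain why the speed-up depends on how much of the domain is critical. The sketch below is not the authors' algorithm and is not calibrated to their simulations. It assumes that cost is proportional to the number of element updates, and it introduces three hypothetical parameters: the fraction of elements in the critical region (`critical_fraction`), the ratio of coarse to fine time step (`step_ratio`), and a lumped per-coarse-step cost for the predictor-corrector interface iterations (`interface_overhead`).

```python
def subcycling_speedup(critical_fraction: float,
                       step_ratio: float,
                       interface_overhead: float = 0.0) -> float:
    """Estimated speed-up of subcycling over a uniform fine-step run.

    All costs are measured per coarse time step, in units of one update
    sweep over the whole mesh. Every parameter is a modelling assumption:
      critical_fraction  -- share of elements integrated with fine steps
      step_ratio         -- coarse step / fine step (subcycles per coarse step)
      interface_overhead -- extra cost of interface predictor-corrector work
    """
    if not 0.0 <= critical_fraction <= 1.0:
        raise ValueError("critical_fraction must lie in [0, 1]")
    if step_ratio < 1.0:
        raise ValueError("step_ratio must be at least 1")
    if interface_overhead < 0.0:
        raise ValueError("interface_overhead must be non-negative")

    # Uniform fine stepping: every element is updated step_ratio times.
    uniform_cost = step_ratio
    # Subcycling: critical elements are updated step_ratio times,
    # non-critical elements once, plus the interface overhead.
    subcycled_cost = (critical_fraction * step_ratio
                      + (1.0 - critical_fraction)
                      + interface_overhead)
    return uniform_cost / subcycled_cost


# Illustrative sweep over hypothetical critical fractions.
for fraction in (0.05, 0.10, 0.25, 0.50):
    estimate = subcycling_speedup(fraction, step_ratio=10.0)
    print(f"critical fraction {fraction:.2f}: estimated speed-up {estimate:.2f}")
```

Under these assumptions the gain approaches the step ratio when the critical region is small and vanishes when the whole domain is critical, so the achievable acceleration is tied to how strongly deformation localizes.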