1 Introduction

Specialized techniques are available for modeling physical phenomena at the extremes of the length and time scales. At the small/fast end of the spectrum, phenomena involving a few atoms on very fast time scales can often be reproduced using first-principles techniques [1, 2]. Ensembles of many more atoms can routinely be simulated on time scales below a millisecond using molecular dynamics [3,4,5]. At the opposite end of the spectrum, continuum models treat material as a continuous homogenized medium rather than as a granular assembly of atoms [6]; this assumption places a lower limit on the length and time scales to which continuum theories apply, though that limit shifts with the acceptable level of error. Mesoscale, multiscale, and coupled multiphysics models have proliferated for studying phenomena between or spanning these extremes. Mesoscale examples include dislocation dynamics [7,8,9,10], phase field models [11,12,13], and some kinetic Monte Carlo models [14,15,16,17].

This paper concerns a particular class of mesoscale model that uses kinetic Monte Carlo (kMC) to govern discrete, small-scale, relatively fast deformation events, and the Finite Element Method (FEM) to calculate both the interactions between the discrete events and their cumulative continuum-level effect (i.e. a sample’s macroscopic shape change). These models are cyclical: the FEM computes the sample’s stress field and passes it to kMC; kMC uses that stress field to select a localized “transformation” (e.g. a shear event or phase transformation), which is passed back to the FEM; the FEM then applies that transformation as an eigenstrain [18] and calculates an updated stress field. Because they appeal directly to the kinetics of the underlying deformation mechanisms, these methods capture much more granular detail than a continuum constitutive law would, while avoiding the many atomic vibrations that molecular dynamics so exhaustively simulates. Consequently, these methods share an exceptional compromise between simulation fidelity and size (spatially and especially temporally). Prototypical of this class of models is Homer’s shear transformation zone dynamics (STZD) model [19] for deformation of bulk metallic glasses (outlined in the next subsection). Other closely related models (cyclically coupling kinetics and the FEM) include a quantized crystal plasticity model for nanocrystalline materials [20,21,22,23,24,25,26] and a kMC model for martensitic phase transformations in shape memory alloys [27]. The Discrete Shear-Transformation-Zone Plasticity model [28, 29] also models metallic glass deformation by cycling between kinetics and elasticity, but uses a hybrid of analytical and FEM calculations in its elastic portion.

The computational scaling of coupled kMC–FEM models is generally dominated by the continuum FEM calculation. The memory and computational time required to evaluate an up-to-date stress field at each step have limited most instantiations of this class of models to two-dimensional approximations, and the few three-dimensional examples in the literature (for example, [30, 31]) invariably model very small samples (at most 60 nm in any direction).

To address this shortcoming of the above-described class of models, this manuscript borrows the well-established concept of stiffness matrix factor caching from closely related modeling techniques. Two particularly relevant examples of stiffness matrix decomposition reuse in mesoscale modeling are discrete dislocation dynamics [32] and coupled atomistic/continuum multiscale models [33]. Despite the historical success of stiffness matrix factor caching in other mesoscale models, this strategy has never before been applied to STZD or its sibling models cited above.

This paper’s Methods section describes stiffness matrix factor caching and shows how it accelerates these models. While these methods apply to the entire class of models described above, the data presented in Sect. 3 focus on STZD as a case study. In anticipation of this, the following subsection provides a brief review of the STZD model; the reader is referred to [34] for a more in-depth presentation. This paper concludes with a presentation of the largest-ever three-dimensional STZD simulation, which was executed using stiffness matrix factor caching, and which showed an acceleration of nearly 200 \(\times \) over the original approach.

1.1 Introduction to the STZD model

The STZD model is based on Argon’s theory of metallic glass deformation [35], which postulates shear transformations, groups of atoms collectively shearing, as the fundamental plastic event. This manuscript uses the kinetic model in [30] to calculate the activation rates for various possible shear transformation events; this model is in turn a simplification of Argon’s treatment of shear transformations using Eshelby’s solutions for ellipsoidal inclusions with stress-free strain [36]. STZD models a sample with a finite element mesh, where the mesh elements coarse-grain the sample’s atoms. Clusters of elements (often sharing a common node) constitute potential shear transformation zones (STZs). The physical size of an STZ therefore sets an upper bound on the element size of the STZD method’s FEM mesh, so the physical sample size that can be simulated by STZD is closely tied to the FEM mesh size that can be handled. That is, simply coarsening the FEM mesh is not an option for reaching longer length scales with STZD.

Table 1 Sizes of STZD simulations in the literature, with length scales and brief descriptions of the simulated loading
Table 2 Selected micromechanical experiments on metallic glass from the literature, with length scales and brief descriptions of loading

Each step of the STZD model begins with the sample’s stress state, which is calculated by the FEM, taking into account the sample’s loading and preexisting eigenstrain. The activation rate for each STZ is then estimated using transition state theory [37], which predicts an Arrhenius-like relation [38]:

$$\begin{aligned} \dot{s} = \nu _0 \exp \left( -\frac{\Delta F}{k_B T}\right) \int _{g \in G} \exp \left( \frac{\tau (\sigma ,g) \gamma _0 \Omega _0}{2 k_B T}\right) \ dg \end{aligned}$$
(1)

where \(\nu _0\) is the transition attempt frequency (on the order of the material’s Debye frequency), \(\Delta F\) is a fixed activation energy barrier, G is the set of combinations of shear plane and direction, \(\tau \) is the shear stress resolved on \(g \in G\), \(\gamma _0\) is the characteristic STZ shear strain, and \(\Omega _0\) is the STZ volume. The kMC algorithm then stochastically selects a single STZ shear event as the next transition and computes a time step (a “residence time” before the transition); the probability of choosing any STZ shear event is weighted in proportion to its rate. To close the cycle, the FEM applies the appropriate eigenstrain (also called “thermal strain” or “initial strain” in the FEM literature) to the mesh elements comprising the selected STZ, increments the sample’s loading conditions, and computes the updated sample stress field. The STZD cycle then repeats. Gradually the individual STZ activation events cause eigenstrain to accumulate in the FE mesh, resulting in macroscopic plastic deformation of the sample.
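For concreteness, the rate evaluation of Eq. (1) and the residence-time selection just described can be sketched in C++ (the language of the codes in Sect. 2.1). This is a minimal illustration under stated assumptions, not the actual STZD implementation: the integral over G is approximated by a discrete sum, and all function and parameter names are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <random>
#include <vector>

// Sketch of Eq. (1) for a single STZ: the integral over shear systems
// g in G is approximated by a discrete sum. `resolvedShear` stands in
// for tau(sigma, g); `dg` is the measure attached to each discrete g.
double stzRate(const std::vector<double>& resolvedShear,
               double dg, double nu0, double dF,
               double gamma0, double Omega0, double kB, double T) {
  double integral = 0.0;
  for (double tau : resolvedShear)
    integral += std::exp(tau * gamma0 * Omega0 / (2.0 * kB * T)) * dg;
  return nu0 * std::exp(-dF / (kB * T)) * integral;
}

struct KmcChoice {
  std::size_t event;  // index of the selected STZ shear event
  double dt;          // residence time before the transition
};

// Standard residence-time (n-fold way) kMC selection: pick one event with
// probability proportional to its rate, then draw an exponentially
// distributed time step from the total rate. Assumes `rates` is non-empty.
KmcChoice selectTransition(const std::vector<double>& rates,
                           std::mt19937_64& rng) {
  double total = 0.0;
  for (double r : rates) total += r;

  // Linear scan over the cumulative rate to find the chosen event.
  std::uniform_real_distribution<double> pick(0.0, total);
  double target = pick(rng);
  std::size_t i = 0;
  double cumulative = rates[0];
  while (cumulative < target && i + 1 < rates.size())
    cumulative += rates[++i];

  // Residence time ~ Exp(total); the lower bound avoids log(0).
  std::uniform_real_distribution<double> u01(
      std::numeric_limits<double>::min(), 1.0);
  double dt = -std::log(u01(rng)) / total;
  return {i, dt};
}
```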

The STZD model was originally implemented in two dimensions [19] and then extended to three dimensions [39]. It has been successfully used to simulate shear samples [19], tensile samples [39, 40], and single and cyclic nanoindentation [39, 41]. It has also been extended to include free volume as an evolving state variable [42, 43] and to study metallic glass matrix composite materials [40]. The key papers reporting results from STZD simulations are listed in Table 1, along with the dimensionality and length scales of those simulations. These papers have produced valuable insights into shear band nucleation and structure [31] and metallic glass deformation modes [30], among other phenomena, while relying mostly on two-dimensional approximations. The few three-dimensional samples in the literature never exceeded 60 nm in any direction and typically took weeks to compute on multicore architectures. Comparison to selected micromechanical experiments (cited in Table 2; see also [44]) reveals a gap between experimentally relevant length scales and simulation capabilities, which prevents side-by-side comparison for calibration, validation, and forward-modeling purposes. This gap is of particular interest in view of the experimentally observed transition in metallic glass plasticity in uniaxially loaded samples between 80 and 500 nm in diameter [44,45,46,47,48,49]; the technique in this paper brings STZD much closer to being able to study this transition in silico.

2 Method

The method described below uses the FEM through its constituent pieces rather than as a “black box.” The reader is referred to the first two chapters of [52] for an in-depth introduction to the FEM, but a high-level overview is provided here for context. The FEM takes a discretized mesh of a sample and constructs interpolation functions (“shape functions”) on the mesh elements. Then, under the postulate that (in the case of elasticity) the displacement field satisfying stress equilibrium can be approximated by a weighted sum of the shape functions, the FEM constructs a linear system:

$$\begin{aligned} \mathbf {K}\mathbf {d}=\mathbf {F} \end{aligned}$$
(2)

where the unknown vector \(\mathbf {d}\) consists of the shape function weights best satisfying the underlying differential equation. The symmetric positive definite matrix \(\mathbf {K}\) is termed the “stiffness matrix,” and is constructed from the elastic constants of the sample and the mesh shape functions. The vector \(\mathbf {F}\) is termed the “force vector,” and contains (in addition to the stiffness matrix’s ingredients) information on Dirichlet and Neumann boundary values, body forces, eigenstrains, and eigenstresses. Equation (2) is often solved by Cholesky decomposition [53,54,55] of the stiffness matrix, \(\mathbf {K}=\mathbf {LL}^T\), followed by solution of \(\mathbf {LL}^T\mathbf {d}=\mathbf {F}\) by forward- and back-substitution.
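As a concrete sketch (using Eigen3, the matrix library on which the codes in Sect. 2.1 are built), a single solve of Eq. (2) might look like the following; the sparse matrix K and vector F are assumed to be already assembled, and the function name is illustrative.

```cpp
#include <Eigen/Sparse>
#include <Eigen/SparseCholesky>

// Sketch: solve K d = F (Eq. 2) by sparse Cholesky factorization
// K = L L^T, followed by forward- and back-substitution. Assumes K is
// symmetric positive definite, as the FEM stiffness matrix is.
Eigen::VectorXd solveOnce(const Eigen::SparseMatrix<double>& K,
                          const Eigen::VectorXd& F) {
  Eigen::SimplicialLLT<Eigen::SparseMatrix<double>> llt(K);  // factor K
  return llt.solve(F);  // forward- and back-substitution
}
```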

The STZD algorithm can be framed as a cycle with six steps (as shown in Fig. 1a). First, after a brief setup phase, the stiffness matrix \(\mathbf {K}\) is constructed from the sample mesh and elastic stiffness tensors, at a computational cost of \(O(n)\), where n is the number of mesh nodes. Second, a sparse Cholesky algorithm factors \(\mathbf {K} = \mathbf {LL}^T\); this step is the most computationally expensive, with theoretical \(O(n^3)\) complexity, empirically closer to \(O(n^2)\) when sparse linear algebra is leveraged. Third, the force vector \(\mathbf {F}\) is constructed from the sample loading, body force, and preexisting eigenstrain fields, in \(O(n)\) time. Fourth, the system \(\mathbf {Kd}=\mathbf {F}\) is solved by forward- and back-substitution, with theoretical \(O(n^2)\) complexity, empirically closer to \(O(n)\) with sparse computations. Fifth, in \(O(n)\) time the displacement field is postprocessed into stress and strain fields for the sample. Sixth and finally, kMC selects the next transition, also in \(O(n)\) time. The asymptotic complexities of these steps are listed in Table 3.
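Schematically, the original cycle has the following shape. This is a sketch, not the STZD code: the assembly and kMC functions are passed in as placeholders for the six steps just listed.

```cpp
#include <functional>
#include <Eigen/Sparse>
#include <Eigen/SparseCholesky>

// Schematic of the original STZD cycle (Fig. 1a): the stiffness matrix is
// rebuilt and refactored on every pass, even though it never changes.
// `assembleK` and `assembleF` stand in for steps 1 and 3; `step` stands in
// for steps 5-6 (postprocessing plus the kMC update).
void originalCycle(
    int nSteps,
    const std::function<Eigen::SparseMatrix<double>()>& assembleK,
    const std::function<Eigen::VectorXd()>& assembleF,
    const std::function<void(const Eigen::VectorXd&)>& step) {
  for (int i = 0; i < nSteps; ++i) {
    Eigen::SparseMatrix<double> K = assembleK();               // step 1, O(n)
    Eigen::SimplicialLLT<Eigen::SparseMatrix<double>> llt(K);  // step 2, dominant cost
    Eigen::VectorXd F = assembleF();                           // step 3, O(n)
    Eigen::VectorXd d = llt.solve(F);                          // step 4, substitution
    step(d);                                                   // steps 5-6
  }
}
```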

The transition selected by kMC takes the form of an eigenstrain which is applied to a cluster of elements in the FEM mesh. Under the original algorithm the cycle then repeats, starting with construction of a new stiffness matrix. However, the new stiffness matrix will be identical to the previous one: modifying the boundary values and adding eigenstrain to the model changes neither the sample’s elastic constants nor the shape functions. Therefore, construction and factorization of \(\mathbf {K}\) is completely redundant after the first step.

Fig. 1 a Kinetic-FEM cycle; b kinetic-FEM cycle with shortcut shown

Table 3 Complexity of pieces of the STZD method, assuming dense numerical linear algebra, where n is the number of nodes in the mesh

This suggests a simple innovation: to calculate the FEM stiffness matrix and its factors once as a setup step, and then to cache those stiffness matrix factors in memory. This eliminates the necessity of calculating and factoring \(\mathbf {K}\) in each simulation cycle; each cycle simply calculates the new force vector, solves the cached stiffness matrix factors against the new force vector, and then postprocesses the newly calculated displacement field to obtain strain and stress data (see Fig. 1b). This strategy is termed “stiffness matrix factor caching.” It produces precisely the same results as the original approach; the physics are not altered, nor is the numerical approximation. This is simply an adjustment to the code’s logical flow to eliminate redundant calculations.
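In the same schematic terms as the sketch in Sect. 2 above (with the same placeholder functions), the cached cycle simply moves the assembly and factorization out of the loop:

```cpp
#include <functional>
#include <Eigen/Sparse>
#include <Eigen/SparseCholesky>

// Schematic of the cycle with stiffness matrix factor caching (Fig. 1b):
// K is assembled and factored once, and the cached factors are reused in
// every subsequent step.
void cachedCycle(
    int nSteps,
    const std::function<Eigen::SparseMatrix<double>()>& assembleK,
    const std::function<Eigen::VectorXd()>& assembleF,
    const std::function<void(const Eigen::VectorXd&)>& step) {
  Eigen::SparseMatrix<double> K = assembleK();               // once, at setup
  Eigen::SimplicialLLT<Eigen::SparseMatrix<double>> llt(K);  // factor once, cache
  for (int i = 0; i < nSteps; ++i) {
    Eigen::VectorXd F = assembleF();   // new force vector each step
    Eigen::VectorXd d = llt.solve(F);  // substitution only, from cached factors
    step(d);                           // postprocess + kMC, as before
  }
}
```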

The potential value of this optimization is apparent from the complexities of each part of the STZD algorithm in Table 3; by eliminating the need to factor a stiffness matrix with each step, the overall asymptotic complexity of each step is reduced from \(O(n^3)\) to \(O(n^2)\), assuming dense numerical linear algebra, and by a similar margin using sparse linear algebra. Of course this approach, while novel in the context of STZD modeling (and of the other closely related models mentioned in the introduction), is a straightforward application of well-established ideas within mesoscale modeling [32, 33]; also, commercial FEM packages routinely reuse stiffness matrix factors in time-series calculations. More broadly, reusing matrix factors or inverses is a standard practice in algorithms for fields as diverse as optimization and image processing.

Since this approach consists only of modifying the logical flow of the STZD algorithm, altering neither the physics nor the order of calculations, the results of the old and improved STZD algorithms are identical (down to and including floating-point error). The author has verified that this is the case for the STZD codes described below.

2.1 Implementation details

For this study two STZD codes were constructed, one following the conventional algorithm in Fig. 1a and one leveraging stiffness matrix factor caching as in Fig. 1b, but both otherwise as similar as possible. Both codes are written in C++11 and, in lieu of a commercial FEM solver, use a simple in-house FEM library built on the Eigen3 matrix library [56] and the CHOLMOD sparse linear system solver [57]. Both codes were compiled using the Intel compiler, linked against a single-threaded version of Intel’s MKL library, and run in serial on an Intel Xeon processor clocked at 2.6 GHz in a workstation with 128 GB of memory. Both codes are instrumented to report the timing breakdown between parts of the STZD cycle, enabling a more granular comparison between the algorithms. The simulation input takes the form of an .ini file, and the output uses the HDF5 file format [58]. The code that does not use stiffness matrix factor caching performs similarly to commercial linear FEM implementations in serial execution mode.
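The in-house library’s internals are not reproduced in this paper, but for reference, one way to reach CHOLMOD from Eigen3 is through Eigen’s CholmodSupport module; a minimal sketch of a cached solver built that way follows. The struct and its names are illustrative assumptions, not the in-house library itself.

```cpp
#include <Eigen/Sparse>
#include <Eigen/CholmodSupport>  // Eigen's CHOLMOD wrapper; link against SuiteSparse

// Sketch: a factorization computed once via CHOLMOD's supernodal Cholesky
// and held for reuse across STZD steps.
struct CachedSolver {
  Eigen::CholmodSupernodalLLT<Eigen::SparseMatrix<double>> llt;

  explicit CachedSolver(const Eigen::SparseMatrix<double>& K) {
    llt.compute(K);  // analyze and factor once
  }

  // Each STZD step reuses the cached factors; no refactorization occurs.
  Eigen::VectorXd solve(const Eigen::VectorXd& F) const { return llt.solve(F); }
};
```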

3 Results

To examine the scaling of the STZD algorithm with respect to mesh size, wall-clock times were averaged over 10 STZD steps for meshes with between 1,606 and 938,407 nodes; the timings are plotted on log–log axes in Fig. 2. The STZD code using stiffness matrix factor caching is empirically faster than the original approach, with a speedup of 196\(\times \) for the largest meshes studied for this paper. The observed deviation from dense-matrix asymptotic behavior is due to extensive use of sparse numerical linear algebra.

Fig. 2 Time required to execute the STZD code described in the text, as a function of FEM mesh size

The effectiveness of caching stiffness matrix factors is further illustrated by breaking the execution time of an original STZD step into its fractional pieces in Fig. 3. The optimization described in this paper eliminates the striped regions of that plot (corresponding to building and factoring the FEM stiffness matrix), cutting 98–99.5% of the computation per STZD step.
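These fractions are consistent with the measured speedup: eliminating a fraction f of the work in each step bounds the attainable speedup at

$$\begin{aligned} \mathrm {speedup} = \frac{1}{1-f}, \end{aligned}$$

so f = 0.98–0.995 corresponds to speedups of 50–200\(\times \), in line with the 196\(\times \) measured for the largest meshes in Fig. 2.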

Fig. 3 Fraction of time spent on each part of a step of the STZD algorithm described in the text, as a function of FEM mesh size

Fig. 4 Legend to the STZD simulation Figs. 6, 7, 8 and 9. Part a shows the dimensions of the uniaxial samples in terms of the diameter of the gauge portion of the sample. Part b maps the norm of STZ strain to color and dot size. The dot sizes shown here are scaled much larger than those in the figures to follow, but are proportionally correct relative to each other

Fig. 5 Relative sizes of three-dimensional STZD simulations in the literature and this paper. In the top-left corner are the three largest three-dimensional STZD simulations from the literature, with a and b from [30], and c from [31]. Along the bottom are the various samples reported in this paper with their respective gauge section diameters

Fig. 6 STZD compression test of a sample. Numbers along the top are kMC steps. This simulation took less than 5 min to run on the machine described in Sect. 2.1 of the text

Fig. 7 STZD compression test of a sample. Numbers along the top are kMC steps. Execution time: 13 h

Fig. 8 STZD tension test of a sample. Numbers along the top are kMC steps. Execution time: 3 days. In the “front” view of steps 23,930 and 28,716, subtle wavelike variations in the density of shear transformations are evident near the plane of the nascent shear band

Fig. 9 STZD compression test of a sample. Numbers along the top are kMC steps. Execution time: 19 days

To concretely illustrate the utility of this technique, the next subsection presents a series of simulated uniaxial tension and compression tests on the largest-ever STZD samples.

3.1 Uniaxial tensile and compression tests

This section describes uniaxial tests on nanoscale cylindrical samples with gauge diameters from 10 to 50 nm and gauge lengths from 30 to 150 nm; the geometry of these samples is drawn in Fig. 4a. The sizes of this paper’s samples relative to three-dimensional STZD samples in the literature are shown in Fig. 5. Each simulation was run for a number of steps proportional to the volume of the sample, to ensure roughly equal amounts of plastic deformation between the simulations. These simulations were run under the conditions described in Sect. 2.1 above; in particular, they were run in serial. A selection of the simulations is plotted in Figs. 6, 7, 8 and 9, and the remainder are included in the supplementary material to this article; in these plots, STZs appear as small dots, with the size and color of each dot corresponding to the norm of the cumulative STZ strain. The colorbar for STZs is shown in Fig. 4b. The STZD parameters for all the simulations are given in Table 4.

The most notable feature of the compression sample in Fig. 6 is its runtime of less than 5 min. This is a dramatic improvement on the original approach, which would have taken at least a day to run a comparable simulation on multiple cores. This suggests that stiffness matrix factor caching will enable simulation of large ensembles of small samples for statistical analysis; this is of particular value because these simulations are stochastic in nature, so analysis of any one simulation might not be representative of the ensemble.

Of course, stiffness matrix factor caching could also enable use of a finer FEM mesh on these small samples. Mesh refinement has been shown not to be a particular issue in STZ Dynamics (assuming that the element size is an appropriate fraction of the material’s characteristic STZ volume, as is the case in these simulations), but it may prove useful as kMC–FEM models are extended to new materials systems in the future.

Moving up to the sample in Fig. 7, which is already larger than any previously published STZD sample, nucleation of orthogonal competing shear bands is observed. The interaction between the shear bands apparently prevents either of them from crossing the full diameter of the gauge section. This behavior has implications for understanding shear band nucleation and growth, and can only be observed in samples large enough to sustain multiple instances of shear localization. This issue will be explored thoroughly in future articles.

Looking closely at the tensile sample in Fig. 8, one can observe periodic “waves” in the STZ strain field perpendicular to, and along the length of, the main shear band. Interestingly, these appear very early in the shear band nucleation process (they are visible as early as step 23,930 of the simulation). The wavelength of these oscillations (between 10 and 15 nm) is such that they would be impossible to observe in the smaller STZD samples published to date.

The compression sample in Fig. 9 shows nucleation of four shear bands along orthogonal planes, but one of the shear bands ultimately dominates the others and propagates across the diameter of the sample (unlike in Fig. 7). It is, however, apparent from the axonometric view that the dominant shear band is impeded by its orthogonal competitors. Again, a smaller STZD sample would be unable to sustain these multiple shear localization instances. This sample and the tension sample in the supplementary material are the largest STZD simulations run to date, exceeding the volume of the previous maximum by a factor of eighteen; these simulations’ dimensions approach the scale of physical nanomechanical experiments currently in the literature (as in Table 2).

It is worth noting here that stiffness matrix factor caching does not negate the necessity of remeshing when plastic deformation of the sample invalidates the linear expansion underpinning the FEM. The FEM stiffness matrix must be reconstructed and factored after each remeshing. However, in the context of STZD, remeshing events should be spaced many STZD steps apart, so the speed gains described above remain representative of expected performance even with remeshing.
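In code, this amounts to refreshing the cached factors once per remesh rather than once per step; a minimal sketch, using the same hypothetical Eigen types as in Sect. 2:

```cpp
#include <Eigen/Sparse>
#include <Eigen/SparseCholesky>

// Sketch: only remeshing invalidates the cached factorization; applying STZ
// eigenstrain never does. Called once per remesh, not once per STZD step.
void refreshCache(const Eigen::SparseMatrix<double>& newK,
                  Eigen::SimplicialLLT<Eigen::SparseMatrix<double>>& llt) {
  llt.compute(newK);  // refactor once on the new mesh and re-cache
}
```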

Table 4 Parameters to all of the STZD simulations appearing in this paper

As mentioned in the implementation details above, the results here are for a serial implementation of STZ Dynamics. Stiffness matrix factor caching is moderately amenable to parallel implementation; libraries are available with parallel sparse \(\mathbf {LDL}^T\) decomposition and solution from cached factors (e.g. [59]), but increasing inter-process communication would be expected to yield diminishing returns as additional processes are added.

4 Conclusions

Simulation methods that iteratively link kinetics with localized updates in the FEM, typified by the STZ Dynamics method, have suffered from long run times due to the superlinear scaling of the FEM with mesh size. However, if these methods do not require modifications to the mesh or to the elastic properties of the sample from step to step, as is the case in STZD, then the FEM stiffness matrix is also unchanged from step to step. This enables an acceleration strategy: calculate the FEM stiffness matrix, factor it, and cache the factorization in an initialization step, and then reuse the factorization in each step. This is termed “stiffness matrix factor caching.” While reuse of stiffness matrix factors is common practice in mesoscale modeling, it has never before been applied to STZD and closely related methods. Stiffness matrix factor caching constitutes an asymptotic improvement and empirically has produced a speedup of nearly 200 \(\times \) over the original method. This speedup is useful in two respects: it enables simulation of large numbers of small samples to form an ensemble, and it enables simulation of samples on experimentally relevant length scales in three dimensions. These simulations of larger samples exhibit multiple (often competing) shear bands, sometimes in apparently periodic arrangements. Future work looks to directly compare real nanomechanical experiments to these large STZD simulations, either for validation purposes or to illuminate avenues for improving the STZD model’s physics. One such improvement might be the inclusion of dilatational strain in shear transformations (as in [42]); so long as the elastic stiffness remains static from step to step, the speedup reported here holds. The results in this paper are readily extensible to STZD-like methods in both two and three dimensions; it is hoped that stiffness matrix factor caching will make three-dimensional simulation the norm rather than the exception for STZD and its sibling methods, and that studies comparing these simulations to physical nanomechanical experiments will be forthcoming.