1 Introduction

The boundary element method (BEM) is a technique for numerically solving boundary integral equations (BIEs), which relate unknown parameters over an arbitrary surface defining a domain via integrals of functions over that surface. Advantageously, the BEM restricts the numerical solution to the boundary surface (in contrast to domain discretisation methods such as the finite element method) and may also represent infinite exterior domains with no additional complexity to the problem.

The BEM for elastodynamic frequency domain problems was first researched by Kupradze in the 1930/1940s [1], and by Cruse and Rizzo in 1968 [2, 3], and is now considered a mature field with a number of review articles [4, 5] and books [6–9] available on the subject. However the BEM traditionally suffers from a high computational cost, in terms of both the algorithmic complexity and memory requirements. This is due to the BEM coefficient matrices resulting from the numerical integration of the BIE being dense complex matrices of size \(N \times N\) for N boundary unknowns. Thus the iterative solution of the BEM has algorithmic and memory complexities proportional \(N^2\)—a prohibitive cost for large N.

The use of the fast multipole method (FMM) in conjunction with the BEM has been shown to significantly reduce the computational/memory requirements of the BEM. The FMM was first introduced by Rokhlin [10] for solving 2D Laplace problems and was popularised by Greengard and Rokhlin’s application of the method to particle interaction problems [11]. The fast multipole boundary element method (FMBEM) has since been applied to a number of BIEs in both the time and frequency domains, with much of this work cited in the review articles by Nishimura [12] and Liu et al. [1].

The FMBEM was first applied to the elastodynamic BIE for 2D frequency-domain problems in 1997 by Chen et al. [13] using Rokhlin’s high frequency diagonal translation method [14]. Elastodynamic 2D scattering problems using the low frequency FMM were treated by Fukui and Inoue [15] and Fujiwara [16]. The elastodynamic FMBEM for 3D frequency-domain problems was implemented by Fujiwara [17] in 2000 using the high frequency diagonal translation method and a level-independent truncation number. Yoshida et al. [18] implemented a low-frequency elastodynamic FMBEM in 3D using a translation method which requires \(O(p^5)\) operations to translate a truncated expansion with \(p^2\) coefficients [19], as well as a high-frequency elastodynamic FMBEM using diagonal translations [20]. Isakari et al. [21] applied the low frequency elastodynamic FMBEM to periodic inclusion problems using the \(O(p^3)\) rotation, co-axial translation, rotation (RCR) method [22]. Chaillat et al. [23] improved upon the high-frequency 3D elastodynamic FMBEM by using a level-dependent truncation number to reduce the algorithmic complexity to \(O(N\mathrm{log}_2N)\). Sanz et al. [24] implemented a sparse approximate inverse (SAI) preconditioner for the high-frequency elastodynamic FMBEM while Chaillat [25] used a nested outer–inner GMRES loop for preconditioning (analogous to that employed for the Helmholtz FMBEM [26]). Elastodynamic FMBEM problems involving multiple piecewise-homogeneous domains were treated by Chaillat et al. [27] and by Tong and Chew [28]. Preconditioning strategies for periodic structures using the low frequency elastodynamic FMBEM were treated by Isakari et al. [29]. Visco-elastodynamic problems were investigated by Grasso et al. [30] and elastodynamic half-space problems by Chaillat and Bonnet [31]. Takahashi has recently developed a ‘broadband’ 2D elastodynamic FMBEM [32] which employs both low and high frequency translation methods. Reviews of the literature can be found in Refs. [1] and [33]. Finally it should be noted that other fast techniques, such as the hierarchical matrix method [34] and the precorrected-FFT method [35], have also been successfully applied to the elastodynamic BIE.

This paper presents a 3D elastodynamic FMBEM in the ‘low frequency’ regime, which has received relatively little attention in the literature compared to the high frequency models which employ Rokhlin’s diagonal translation method and so are unstable at lower frequencies. Here the multipole expansions and translations of the elastodynamic fundamental solutions are instead applied using the spherical basis functions and the RCR translation method respectively, and so the present model is stable at lower frequencies. Advantageously, the RCR translations natively incorporate the required interpolation/truncation procedures between octree levels and so no filtering operations are required (as with the diagonal translations). The low frequency FMBEM models are thus ideal for treating problems which require a large number of elements per wavelength e.g. for numerical models which require a high degree of accuracy, or involve boundary surfaces with complex geometries (i.e. where the fine element discretisation required to adequately represent surface features may result in many elements per wavelength). Prior work on the low frequency elastodynamic FMBEM models has focused on scattering problems for arrays of cracks in the context of fracture mechanics [18] and for modelling periodic arrays of spherical inclusions in the context of phononic crystals [21, 29]. The later application is related to the wider field of numerical modelling of metamaterials via FMBEM models (see for example [36] for an application to electromagnetic metamaterials) which typically requires the low frequency implementation of the FMBEM to model the sub-wavelength features of the material. The low frequency elastodynamic FMBEM is also well suited to modelling seismic events for realistic surface geometries at very low frequencies: see for example [37] where the application of the high frequency FMBEM to model an alpine valley required the storage of a large part of the BEM coefficient matrix to ensure stability in the diagonal translations. The low frequency FMBEM does not require these restrictions and so the memory footprint can be substantially reduced for this application. Of course at ‘intermediate’ frequencies both the low and high frequency FMBEM formulations are applicable and so many problems of interest may be treated with either method.

The present model introduces the following new features to the low frequency elastodynamic FMBEM.

  1. 1.

    The efficient multipole expansion of the elastodynamic fundamental solutions is achieved via the use of compact recursion relations for the Cartesian partial derivatives of the the spherical basis functions [38], which are extended here to provide new second-order partial derivative recursion relations.

  2. 2.

    Iterative solution is achieved via a nested outer–inner GMRES solution, where a fast low accuracy inner FMBEM is used as a preconditioner for the full accuracy outer FMBEM, while the inner FMBEM is itself preconditioned using a SAI preconditioner (thus combining the preconditioning strategies employed in [24] and [25] and newly applying them to the low frequency elastodynamic FMBEM).

  3. 3.

    The ‘translation stencils’, which were first applied to the Laplace FMBEM for the local 8-box groups for each level of the octree structure [39] are implemented here for the elastodynamic FMBEM to reduce the total number of multipole translations that must be applied. Details of the implementation of the translation stencils for local 4 and 2-box groups are also presented here for the first time, with the application of all 3 stencil types shown to reduce the number of translations per octree level by up to \(60\,\%\). This combination of strategies yields a solution convergence 2 to 2.5 times faster than the standard GMRES solution of the FMBEM without translation stencils.

The paper is organised as follows: Sect. 2 presents the elastodynamic BEM. Section 3 presents the multipole expansions of the elastodynamic fundamental solutions. Pertinent details of the implemented FMBEM model are then discussed in Sect. 4, including details of the octree structure, truncation number, near field numerical integration, FMBEM procedure, translation stencils, and the iterative solution and preconditioning techniques. Finally, Sect. 5 presents numerical results. The performance of the model is characterised in terms of both its accuracy and its algorithmic/memory complexities, which are shown to be in good agreement with the theoretical estimates. This also appears to be the first time that algorithmic/memory complexity results have been presented for the 3D low frequency elastodynamic FMBEM (with such results absent for the other models in the literature [18, 21, 29]).

2 The elastodynamic BEM

2.1 The elastodynamic BIE

The elastodynamic BIE for an elastic solid domain V exterior to an arbitrary closed surface S is written as:

$$\begin{aligned} \frac{1}{2}u_i(\mathbf{y})&= \int _S \Big (U_{ij}^{\mathrm{T}}(\mathbf{x}-\mathbf{y})t_i(\mathbf{x}) -T_{ij}^{\mathrm{T}}(\mathbf{x}-\mathbf{y})u_i(\mathbf{x})\Big )dS(\mathbf{x}) \nonumber \\&\quad + u_i^{\mathrm{inc}}(\mathbf{y}) \end{aligned}$$
(1)

where \(u_i\) and \(t_i\) are the \(i\hbox {-}{th}\) Cartesian components of the total displacement and surface stress (or traction), \(\mathbf{x}\) and \(\mathbf{y}\) are points on the surface (assumed to be locally smooth at \(\mathbf{y}\)), and the superscript \({}^{\mathrm{T}}\) denotes the transpose of the \(3 \times 3\) coefficient matrices of the fundamental displacement \(U_{ij}\) and traction \(T_{ij}\) solutions [28]. The total displacement \(u_i(\mathbf{x}) = u_i^{\mathrm{inc}}(\mathbf{x}) + u_i^{\mathrm{sca}}(\mathbf{x})\) is the sum of the incident \(u_i^{\mathrm{inc}}(\mathbf{x})\) and scattered \(u_i^{\mathrm{sca}}(\mathbf{x})\) field components and similarly for the total surface stress. The fundamental solutions may be expressed as [23]:

$$\begin{aligned}&U_{ij}(\mathbf{x}-\mathbf{y}) \nonumber \\&\quad =\frac{1}{k_s^2 \mu } \left( (\delta _{qj}\delta _{ir}-\delta _{qr}\delta _{ij}) \big [G_s(\mathbf{x}-\mathbf{y})\big ]_{qr}-\big [G_p(\mathbf{x}-\mathbf{y})\big ]_{ij}\right) \end{aligned}$$
(2)
$$\begin{aligned}&T_{ij}(\mathbf{x}-\mathbf{y}) =\mu \left( \beta \delta _{ik}\delta _{lm} + \delta _{il}\delta _{km} + \delta _{kl}\delta _{im} \right) \nonumber \\&\times \big [U_{lj}(\mathbf{x}-\mathbf{y})\big ]_mn_k(\mathbf{x}) \end{aligned}$$
(3)

where the subscripts \(i, j, k, l, m, q, r = 1,2,3\) (repeated indices in products imply summation), \(\beta = \tfrac{2\nu }{1-2\nu }\), \(n_k(\mathbf{x})\) is the \(k\hbox {-}{th}\) component of the outward surface normal at \(\mathbf{x}\) and \(\delta _{ij}\) is the Kronecker delta function. \(G_p(\mathbf{x}-\mathbf{y})\) and \(G_s(\mathbf{x}-\mathbf{y})\) denote the Helmholtz Green’s function:

$$\begin{aligned} G_f(\mathbf{x}-\mathbf{y}) = \frac{e^{ik_fr}}{4\pi r} \end{aligned}$$
(4)

where \(f=p,s\) denotes the compressional \(k_p\) and shear \(k_s\) wavenumbers respectively, \(r = |\mathbf{x} - \mathbf{y}|\), \(i = \sqrt{-1}\) and the first partial derivative of \(G_f(\mathbf{x}-\mathbf{y})\) with respect to the position vector \((\mathbf{x}-\mathbf{y})\) is denoted as:

$$\begin{aligned} \big [G_f(\mathbf{x}-\mathbf{y})\big ]_{i} = \frac{\partial G_f(\mathbf{x}-\mathbf{y})}{\partial \mathbf{x}_{i}} \end{aligned}$$

with similar notation employed for the higher order derivatives (e.g. \(\big [G_f(\mathbf{x}-\mathbf{y})\big ]_{ij}\), \(\big [G_f(\mathbf{x}-\mathbf{y})\big ]_{ijk}\)). Finally, the compressional and shear waves are defined by the following sound speeds (\(c_p\), \(c_s\)) and wavenumbers:

$$\begin{aligned} c_p= & {} \sqrt{\frac{2(1-\nu )}{1-2\nu }}c_s \quad k_p= \sqrt{\frac{1-2\nu }{2(1-\nu )}}k_s \\ c_s= & {} \sqrt{\frac{\mu }{\rho _s}}\quad k_s = \sqrt{\frac{\rho _s \omega ^2}{\mu }} \end{aligned}$$

where \(\rho _s\), \(\mu \) and \(\nu \) are respectively the density, shear modulus and Poisson’s ratio of the elastic solid.

2.2 The elastodynamic BEM

Equation (1) may be discretised on the boundary surface and numerically solved by applying a boundary condition (BC) to eliminate one of the unknowns (\(u_i\), \(t_i\)) on S. Here the surface is discretised into N plane triangular elements with piecewise constant unknowns and the collocation method is applied at the centre of each element to yield a set of 3N equations relating 3N unknowns at the set of N source/receiver points \(\mathbf{x}_N\) and \(\mathbf{y}_N\) (with \(\mathbf{x}_N = \mathbf{y}_N\) for the collocation method). Applying a traction-free BC (\(t_i = 0\) on S) yields:

$$\begin{aligned} \left( \frac{1}{2}\mathbf{I}_{3N} + \mathbf{T}_{3N}\right) u_i(\mathbf{x}_N) = u_i^{\mathrm{inc}}(\mathbf{y}_N) \end{aligned}$$
(5)

where \(\mathbf{T}_{3N} = \int _{S} T_{ij}^{\mathrm{T}}(\mathbf{x}_N,\mathbf{y}_N) dS(\mathbf{x}_N)\) is the coefficient matrix of size \(3N \times 3N\) containing the numerical integrations of the traction fundamental solution between each and every pair of elements and \(\mathbf{I}_{3N}\) is the identity matrix. Similarly for a rigid surface BC (\(u_i = 0\) on S):

$$\begin{aligned} \big (\mathbf{U}_{3N}\big )t_i(\mathbf{x}_N) = - u_i^{\mathrm{inc}}(\mathbf{y}_N) \end{aligned}$$
(6)

where \(\mathbf{U}_{3N}\) has an analogous form to \(\mathbf{T}_{3N}\). In either case, the discretised BIE takes the form of an \(\mathbf{Az} = \mathbf{b}\) matrix equation with coefficient matrix \(\mathbf{A}\) and vectors of unknowns \(\mathbf{z}\) and knowns \(\mathbf{b}\).

2.3 BEM algorithmic complexity

The BEM results in a dense matrix \(\mathbf{A}\) which for N unknowns requires \(O(N^2)\) computational memory to store. A direct solution of the \(\mathbf{Az} = \mathbf{b}\) matrix equation has a computational cost of \(O(N^3)\) operations for the inversion of \(\mathbf{A}\), while an iterative solution has a computational cost of \(O(N^2)\) operations per matrix-vector product. For large scale BEM problems the \(O(N^2)\) algorithmic and memory complexity of the iterative BEM solution is prohibitive. The FMM is able to alleviate these issues by approximately calculating the matrix-vector products to a prescribed accuracy [26], but without explicitly forming nor directly multiplying the full coefficient matrix, thus reducing the complexity of the FMBEM to less than \(O(N^2)\).

3 The FMM for the elastodynamic BIE

3.1 Multipole expansion of the Helmholtz Green’s function

The central idea of the FMM is to ‘factorise’ the Helmholtz Green’s functions \(G_f(\mathbf{x}-\mathbf{y})\) such that an intermediate point or expansion centre, \(\mathbf{c}\), is introduced into the radial distance r in Eq. (4) and so \(r = |\mathbf{x}-\mathbf{y}|\) can be re-expressed as \(r = |(\mathbf{c}-\mathbf{y}) - (\mathbf{c}-\mathbf{x})|\). The multipole expansions which separately involve \(\mathbf{x}\) and \(\mathbf{y}\) may then be independently calculated and reused with other expansions involving different source/receiver points.

In the low frequency regime the multipole expansion of \(G_f(\mathbf{x}-\mathbf{y})\) is achieved via the singular S and regular R spherical basis functions [22]:

$$\begin{aligned} \big [S_{n}^{m}(\mathbf{r})\big ]^f&= h_{n}(k_fr)Y_{n}^{m}(\theta ,\phi ) \end{aligned}$$
(7)
$$\begin{aligned} \big [R_{n}^{m}(\mathbf{r})\big ]^f&= j_{n}(k_fr)Y_{n}^{m}(\theta ,\phi ) \end{aligned}$$
(8)

where \(r = |\mathbf{r}|\), \(j_{n}(k_fr)\) and \(h_{n}(k_fr)\) are respectively the spherical Bessel and spherical Hankel functions of the first kind, the integer degree n and order m have ranges \(n \ge 0\) and \(-n \le m \le n\), and \(Y_{n}^{m}(\theta ,\phi )\) is defined as:

$$\begin{aligned} Y_{n}^{m}(\theta ,\phi ) = (-1)^{m} \sqrt{\frac{2n+1}{4\pi } \frac{(n - |m|)!}{(n + |m|)!}}P_{n}^{|m|}(\cos \theta )e^{im\phi } \end{aligned}$$
(9)

where \(P_{n}^{|m|}\) is the associated Legendre function and \(\theta \) and \(\phi \) denote the angular coordinates of the spherical coordinate system \(\mathbf{r} = r(\sin \theta \cos \varphi , \sin \theta \sin \varphi , \cos \theta )\). Eqs. (7) and (8) may be used to reconstruct \(G_f(\mathbf{x}-\mathbf{y})\) as:

$$\begin{aligned} G_f\big (\mathbf{x}-\mathbf{y}\big ) = ik_f \displaystyle \sum _{n=0}^{\infty } \displaystyle \sum _{m=-n}^{n} \big [S_{n}^{m}(\mathbf{r}_1)\big ]^f \big [R_{n}^{-m}(\mathbf{r}_2)\big ]^f \end{aligned}$$
(10)

where \(\mathbf{r}_1 = \mathbf{c}-\mathbf{x}\), \(\mathbf{r}_2 = \mathbf{c}-\mathbf{y}\) and \(|\mathbf{r}_2| < |\mathbf{r}_1|\) [22].

3.2 Multipole expansion of \(U_{ij}\) and \(T_{ij}\)

The \(U_{ij}\) and \(T_{ij}\) solutions may be rewritten as:

$$\begin{aligned} U_{ij}&= \frac{1}{k_s^2 \mu }\Big (k_s^2G_s\delta _{ij} + \big [G_s\big ]_{ij} - \big [G_p\big ]_{ij}\Big ) \end{aligned}$$
(11)
$$\begin{aligned} T_{ij}&= \frac{1}{k_s^2}\Big [\beta k_p^2\big [G_p\big ]_{j}\delta _{ik} +k_s^2\Big (\big [G_s\big ]_{k}\delta _{ij} + \big [G_s\big ]_{i}\delta _{kj}\Big ) \nonumber \\&\quad + 2\Big (\big [G_s\big ]_{ijk} - \big [G_p\big ]_{ijk}\Big )\Big ]n_k(\mathbf{x}) \end{aligned}$$
(12)

where the \((\mathbf{x}-\mathbf{y})\) arguments have been neglected for brevity. Thus to apply Eq. (10) to \(U_{ij}\) and \(T_{ij}\) requires the first and second-order partial derivatives of R and S to express the third-order partial derivatives of \(G_f\). Gumerov and Duraiswami [38] derived compact analytic recursion relations for the first-order partial derivatives via the use of differential operators which are expressed in the spherical coordinate system of the basis functions. Here the first-order relations have been extended to provide new recursion relations for the second-order partial derivatives (see the Appendix) which circumvent the need for the more unwieldy expressions which result from a direct evaluation of the partial derivatives (see for example Ref. [40]). Adopting a similar notation for the partial derivatives of the spherical basis functions (i.e. \([F_n^m]_{i}^f\), \([F_n^m]_{ij}^f\) etc. for \(F = S,R\) and \(f=p,s\)) and neglecting the \(\mathbf{r}\) arguments for brevity, the following common terms result from the \(U_{ij}^{\mathrm{T}}t_i\) product in Eq. (1):

$$\begin{aligned} \Big (k_s^2G_s + \big [G_s\big ]_{ij}\Big )t_i= & {} ik_s\sum _{n=0}^\infty \sum _{m=-n}^{n} \Big (k_s^2\big [S_n^m\big ]^st_j\nonumber \\&\quad +\,\big [S_n^m\big ]_{ij}^st_i\Big ) \Big (\big [R_n^{-m}\big ]^s\Big ) \end{aligned}$$
(13)
$$\begin{aligned}&\big [G_p\big ]_{ij}t_i = -ik_p\sum _{n=0}^\infty \sum _{m=-n}^{n} \Big (\big [S_n^m\big ]_{i}^pt_i\Big )\Big (\big [R_n^{-m}\big ]_{j}^p\Big ) \end{aligned}$$
(14)

and similarly, the \(T_{ij}^{\mathrm{T}}u_i\) product has the common terms:

$$\begin{aligned}&\Big (\beta k_{p}^2\big [G_p\big ]_{k}\delta _{ij} - 2 \big [G_p\big ]_{ijk}\Big )n_iu_j \nonumber \\&\quad =ik_p\sum _{n=0}^\infty \sum _{m=-n}^{n}\left( \left( \beta k_{p}^2\big [S_n^m\big ]^p\delta _{ij} -2\big [S_n^m\big ]_{ij}^p \right) n_iu_j \right) \nonumber \\&\qquad \times \Big (\big [R_n^{-m}\big ]_{k}^p \Big ) \end{aligned}$$
(15)
$$\begin{aligned}&2\big [G_s\big ]_{ijk}n_iu_j = ik_s\sum _{n=0}^\infty \sum _{m=-n}^{n} \Big ( 2\big [S_n^m\big ]_{ij}^sn_iu_j \Big )\Big (\big [R_n^{-m}\big ]_{k}^s \Big ) \nonumber \\ \end{aligned}$$
(16)
$$\begin{aligned}&k_{s}^2 \big [G_s\big ]_{i}\big (n_iu_j + n_ju_i \big ) \nonumber \\&\quad = -ik_s^3\sum _{n=0}^\infty \sum _{m=-n}^{n} \Big (\big [S_n^m\big ]_{i}^s\big (n_iu_j +n_ju_i \big )\Big )\Big (\big [R_n^{-m}\big ]^s \Big ) \end{aligned}$$
(17)

The integral of \(U_{ij}^{\mathrm{T}}t_i\) over a plane triangular element E with area A and source point \(\mathbf{x}\) at its centre can be evaluated at a ‘well separated’ receiver point \(\mathbf{y}\), assuming a constant value of the unknown over the element, as:

$$\begin{aligned} \int _{E}{U_{ij}^{\mathrm{T}}t_i}dE \approx \frac{A}{k_s^2\mu } \Big ( k_s^2G_s - \big [G_p\big ]_{ij} + \big [G_s\big ]_{ij}\Big )t_i \end{aligned}$$
(18)

via Eqs. (13) and (14), where it is assumed that for a large separation distance between \(\mathbf{x}\) and \(\mathbf{y}\), the integral can be approximated by the product of the element area and the value of the unknown/fundamental solution at \(\mathbf{x}\). The integral for \(T_{ij}^{\mathrm{T}}u_i\) is similarly evaluated as:

$$\begin{aligned}&\int _{E}{T_{ij}^{\mathrm{T}}u_i}dE \approx A \bigg [2\Big (\big [G_s\big ]_{ijk} - \big [G_p\big ]_{ijk}\Big )n_iu_j \nonumber \\&\quad +k_{s}^2\big [G_s\big ]_{i}\big (n_iu_k + n_ku_i\big )+ \beta k_{p}^2\big [G_p\big ]_{k}\delta _{ij}n_iu_j \bigg ] \end{aligned}$$
(19)

via Eqs. (15), (16) and (17). Thus the multipole expansion in Eq. (18) requires four sets of S expansions: three for Eq. (13) with \(j=1\ldots 3\) and one for Eq. (14) while that for Eq. (19) requires five sets: one each for Eq. (15) and Eq. (16) and three for Eq. (17) with \(j=1\ldots 3\). Equation (19) can also be calculated using only four S expansions (as is done in [23]) by combining Eqs. (15) and (17):

$$\begin{aligned}&k_{s}^2 \big [G_s \big ]_{i}\big (n_iu_j + n_ju_i \big ) + 2\big [G_s\big ]_{ijk}n_iu_j \nonumber \\&\quad =-ik_s\sum _{n=0}^\infty \sum _{m=-n}^{n} \Big (k_s^2\big [S_n^m\big ]^s_{i}\big (n_iu_j + n_ju_i \big ) +2\big [S_n^m\big ]^s_{ijk}n_iu_j \Big )\nonumber \\&\quad \quad \times \Big (\big [R_n^{-m}\big ]^s \Big ) \end{aligned}$$
(20)

However Eq. (20) requires explicit recursion relations for the third-order partial derivatives of the S expansions.

4 The elastodynamic FMBEM: implementation

The FMBEM accelerates the matrix-vector products of the iterative BEM solution by calculating the far field part of the boundary integrals which involve ‘well separated’ elements via the FMM (Eqs. 18–19) and the near field part via the conventional BEM (Fig. 1a).

Fig. 1
figure 1

Calculation of the \(\mathbf{Ax} = \mathbf{b}\) matrix-vector product using the FMM (a) and the interactions between well-separated groups using multipole expansions (b). The source expansion set centred about \(\mathbf{c}_\mathbf{1}\) can be evaluated at the distant groups of receivers centred about \(\mathbf{c}_\mathbf{2}\) and \(\mathbf{c}_\mathbf{3}\) by translating the expansion centre along vectors \(\mathbf{t}_\mathbf{1} = \mathbf{c}_\mathbf{2}-\mathbf{c}_\mathbf{1}\) and \(\mathbf{t}_\mathbf{2} = \mathbf{c}_\mathbf{3}-\mathbf{c}_\mathbf{1}\)

The computational savings of the FMBEM result from the fact that multipole expansions which share the same expansion centre \(\mathbf{c}\) may be combined into a single set of coefficients and so the FMBEM is able to treat the interaction between groups of points/elements, instead of the pair-wise interactions of the conventional BEM. Furthermore, the ability to shift or ‘translate’ the expansion centres allows each expansion set to interact with multiple well separated groups by shifting the expansion centres to each group in turn (see Fig. 1b).

Thus the FMBEM requires two main mechanisms:

  1. 1.

    A method to translate the expansion centres, to build larger or smaller expansion sets by combining or separating pre-existing expansion groups.

  2. 2.

    A system for discretising the 3D space occupied by a BEM surface to both construct local element groups, and identify which groups are well separated.

Here the RCR translation method is used to apply the multipole translations, while an octree structure is used for the volume discretisation.

4.1 The rotation, co-axial translation, rotation (RCR) translation method

Three types of translation operations are required to implement the FMBEM, denoted here as R|R, S|S and S|R. Their purpose is to shift the expansion centre of an established set of S or R coefficients to a new location:

$$\begin{aligned} \big [R_n^m(\mathbf{r}+\mathbf{t})\big ]^f =&\sum _{n'=0}^\infty \sum _{m'=-n'}^{n'} (R|R)_f(\mathbf{t})\big [R_{n'}^{m'}(\mathbf{r})\big ]^f \end{aligned}$$
(21)
$$\begin{aligned} \big [S_n^m(\mathbf{r}+\mathbf{t})\big ]^f =&\sum _{n'=0}^\infty \sum _{m'=-n'}^{n'} (S|S)_f(\mathbf{t})\big [S_{n'}^{m'}(\mathbf{r})\big ]^f \end{aligned}$$
(22)
$$\begin{aligned} \big [S_n^m(\mathbf{r}+\mathbf{t})\big ]^f =&\sum _{n'=0}^\infty \sum _{m'=-n'}^{n'} (S|R)_f(\mathbf{t})\big [R_{n'}^{m'}(\mathbf{r})\big ]^f \end{aligned}$$
(23)

where \(|\mathbf{r}| > |\mathbf{t}|\) and \(|\mathbf{r}| < |\mathbf{t}|\) for Eqs. (22) and (23) respectively [22]. Applying the above translations using the Wigner 3-j symbols [22] requires \(O(p^5)\) operations to translate a truncated multipole expansion with \(p^2\) coefficients for some positive integer p [19] (note that a full discussion on the proper choice of p is presented in Sec. 4.3), while the translation coefficients may be derived in \(O(p^4)\) operations [22]. The RCR translation method [38] is used in the present model, which only requires \(O(p^3)\) operations per translation, as both arbitrary rotations and translations co-axial to the z-axis of the spherical coordinate system can be applied in \(O(p^3)\) operations. Thus the RCR method first rotates the coordinate system such that the new z-axis is co-axial to the translation vector, applies the z-axis translation to the new expansion centre, and then reverses the rotation, each costing \(O(p^3)\) operations (see Fig. 2b).

Fig. 2
figure 2

Direct (a) and RCR (b) multipole translations, costing \(O(p^5)\) and \(O(p^3)\) operations per translation respectively

4.2 The octree structure

The FMBEM requires the volume occupied by the boundary surface to be systematically discretised (separate to the BEM discretisation) to construct multipole expansions for local element groups and to apply the interactions between well-separated groups. An octree structure can discretise the unit cube by recursively splitting each cube into eight smaller cubes. The octree is implemented by building a bit-interleaved binary integer from the normalised set of source points, in the form [x-bit, y-bit, z-bit, x-bit, y-bit, z-bit, \(\ldots \)] from the most to least significant bits [22]. The leading three bits then indicate in which of the eight boxes the point resides: for each dimension a 0 or 1 bit indicates the box closer or further from the origin respectively. Subsequent three-bit groups then recursively indicate which of the eight ‘children’ boxes the point occupies when the ‘parent’ cube is subdivided. Truncating the bit-interleaved numbers after 3l bits gives l octree levels (see Fig. 3). The octree structure has several advantageous features:

  • The octree is constructed from the bit-interleaved numbers and so empty boxes are never constructed or searched to find in which box a point resides.

  • The truncated bit-interleaved numbers on any octree level l will give the same box number \(\mathbf{d}_l\) for all points contained within each box, yielding a simple method to build local multipole expansion groups.

  • Expansion groups are considered ‘well separated’ if separated by at least one box on the same level.

  • Expansion groups of various sizes can be constructed by either combining children boxes on level \(l+1\) to give the parent group on level l or vice-versa.

The octree structure thus provides fast search procedures (bit-deinterleaving or bit-shift operations [22]) for the grouping and interaction of the sets of multipole expansions. The finest level of subdivision is determined here by specifying an upper limit to the allowed number of elements per box on the lowest octree level.

Fig. 3
figure 3

Octree structure for an example point with a bit-interleaved binary number of 101100 truncated to six bits. The first three bits 101 indicate the point lies within the right x-box, left y-box and right z-box on level one (8 boxes total). The next three-bit group 100 then indicates which of the eight children boxes (resulting from subdividing the parent box on level one) that the point resides on level two (64 boxes total)

4.3 Truncation of the spherical basis functions

The S and R expansions must be truncated for the numerical implementation of the FMM. The approximation errors inherent in the FMM are dependent upon the truncation length, translation algorithm and the specified ‘well-separated’ criterion. Gumerov and Duraiswami showed that the errors introduced by the truncation and translation of the spherical basis functions are bounded [22] and proposed the following formulae for the truncation number \(p \equiv \hbox {max}(n)\) in Eq. (10) [26]:

$$\begin{aligned} p_{\mathrm{h}}&=ka+\frac{\left( 3\, \mathrm{ln}\left( \frac{1}{\xi } \right) \right) ^{\frac{2}{3}}(ka)^{\frac{1}{3}}}{2},\quad p_{\mathrm{l}}=1-\frac{\mathrm{ln}\left[ \xi \left( 1- \frac{1}{\sigma } \right) ^{\frac{3}{2}} \right] }{\mathrm{ln}(\sigma )} \nonumber \\ p&= \left( p_{\mathrm{h}}^{4} + p_{\mathrm{l}}^{4} \right) ^{\frac{1}{4}} \end{aligned}$$
(24)

where k is the wavenumber, 2a is the diagonal length of an unnormalised box on level l of the octree, \(\xi \) is a specifiable error tolerance for the multipole expansions and \(\sigma =2\). A p-truncated expansion thus has \((p-1)^2\) (n, m) coefficients (see Eq. 9). All octree boxes on each octree level l have the same truncation number \(p_l\) and so the translations between octree levels require interpolation or ‘anterpolation’ (adjoint-interpolation) procedures to extend or contract the truncated expansions. The RCR method automatically applies these operations in the decomposition of the translation (see Fig. 4).

Fig. 4
figure 4

Interpolation (a) and anterpolation (b) procedures applied via the RCR method as 3 sparse matrix-vector products. An expansion with \((p)^2\) coefficients is interpolated to \((p')^2\) coefficients (\(p' > p\)) by multiplying in turn by the rotation matrix \((p)^2 \times (p)^2\), the rectangular co-axial translation matrix \((p')^2 \times (p)^2\) and the reverse rotation matrix \((p')^2 \times (p')^2\). Similarly, anterpolation from \((p')^2\) to \((p)^2\) coefficients is applied via rotation \((p')^2 \times (p')^2\), co-axial translation \((p)^2 \times (p')^2\) and reverse rotation \((p)^2 \times (p)^2\). The sparsity patterns correspond to ascending (n, m) coefficients in the truncated expansions

4.4 Numerical integration of the near field

The near field for each element, consisting of all elements contained within the same octree box or the nearest neighbour octree boxes on the lowest octree level, are not considered well separated for multipole expansion and so are treated via regular numerical integration and stored in a sparse near field matrix (see Fig. 1). Furthermore the \(\frac{1}{r}\) dependence of the Helmholtz Green’s function results in near-singular and singular behaviour in the \(U_{ij}\) and \(T_{ij}\) fundamental solutions when \(r \rightarrow 0\) and \(r = 0\) respectively. Substitution of Eq. (4) into Eqs. (2) and (3) reveals apparent singularities of the order \(\frac{1}{r^3}\) and \(\frac{1}{r^4}\) in \(U_{ij}\) and \(T_{ij}\) respectively. Bonnet [8] showed that by applying a series expansion to the exponential terms the singularities in \(U_{ij}\) and \(T_{ij}\) can be weakened to \(\frac{1}{r}\) and \(\frac{1}{r^2}\) respectively. Thus the three types of near field element integrals are treated as follows:

Regular integration: The regular integrals of \(U_{ij}\) and \(T_{ij}\) are treated using standard low order Gauss–Legendre quadrature for plane triangular elements [41].

Near-singular integration: Elements which may exhibit near-singular behaviour are treated using a near-singular integration technique [42] and are applied to the weakened singularities of the fundamental solutions.

Singular integration: The singular integrals are applied to the weakened singularities. The \(\frac{1}{r}\) singularity of \(U_{ij}\) is treated using a radial coordinate transformation [43], while the \(\frac{1}{r^2}\) singularity of \(T_{ij}\) is approximated using a limiting case of the near-singular integral (where the receiver point is placed arbitrarily close to the collocation source point of the singular element). The approximated \(T_{ij}\) singular integrals were compared to an analytic integration technique [37] for the main diagonal \(T_{ii}\) terms, yielding an agreement of 3–4 significant figures when using a 30 point Gauss–Legendre quadrature rule. This level of accuracy is sufficient for the constant elements and is consistent with the errors inherent in the FMM. An obvious avenue for future work is to implement a more robust singular integration technique for the \(T_{ij}\) terms, or to implement a regularisation method (for example [44]) to reduce the order of the singularities.

4.5 FMBEM procedure

The FMM is thus applied to the BEM to calculate the far fields (\(\textit{FF}\)) parts of the surface integrals via Eqs. (18) and (19) for all elements considered well separated on the lowest octree level, while the near field (\(\textit{NF}\)) parts of the surface integrals are precalculated and stored in sparse near field matrices \(\mathbf{U}_{(\textit{NF})}\) and \(\mathbf{T}_{(\textit{NF})}\) to be reused in each step of the iterative solution. The FMBEM first requires a precalculation stage that must only be performed once prior to the iterative solution, and then for each iteration the FMM procedure consists of an ‘upward pass’, a ‘downward pass’ and a ‘final summation’.

4.5.1 Precalculation stage

The iterative FMBEM solution is set up as follows:

  • Calculate the bit-interleaved binary numbers for the octree structure from the set of collocation points.

  • Calculate the lowest (most refined) octree level \(\hat{l}\) using the bit-interleaved numbers and the user-specified maximum number of elements per box. Truncate the bit-interleaved numbers to \(3\hat{l}\) bits.

  • Determine the truncation numbers via Eq. (24).

  • Precalculate and store the RCR S|R, S|S and R|R translation coefficients for all octree levels.

  • Finally, calculate the near field part of the surface integrals and store as sparse matrices.

4.5.2 Upward pass

The purpose of the upward pass is to build a single combined multipole expansion for each set of source points contained in every box on every level of the octree structure. This is achieved by first building the required source expansions for \(U_{ij}\) and/or \(T_{ij}\) about the set of expansion centres \(\mathbf{c}_{\hat{l}}\) at the box centres on the lowest octree level \(\hat{l}\). The sets of expansions for each octree box are then combined into a single source expansion per box. Thus the 4 sets of source expansions for \(U_{ij}^{\mathrm{T}}t_i\) [Eq. (13) for \(j=1\ldots 3\) and Eq. (14)], are:

$$\begin{aligned} \Big [R_{\hat{l}}^{[U(j)]}(\mathbf{c}_{\hat{l}})\Big ]^s&= \sum _{\mathbf{x}_N \in \mathbf{d}_{\hat{l}}} ik_s\frac{A_{N}}{k_s^2\mu }\Big (k_s^2\big [R_{\hat{l}}(\mathbf{c}_{\hat{l}} - \mathbf{x}_N)\big ]^st_j(\mathbf{x}_N) \nonumber \\&\quad + [R_{\hat{l}}(\mathbf{c}_{\hat{l}} - \mathbf{x}_N)]^s_{ij}t_i(\mathbf{x}_N)\Big ) \end{aligned}$$
(25)
$$\begin{aligned} \Big [R_{\hat{l}}^{[U(4)]}(\mathbf{c}_{\hat{l}})\Big ]^p&= \sum _{\mathbf{x}_N \in \mathbf{d}_{\hat{l}}} -ik_p\frac{A_{N}}{k_s^2\mu }[R_{\hat{l}}(\mathbf{c}_{\hat{l}} - \mathbf{x}_N)]^p_it_i(\mathbf{x}_N) \end{aligned}$$
(26)

where the expansions are built and summed for each set of source points \(\mathbf{x}_N\) contained within each of the octree boxes \(\mathbf{d}_{\hat{l}}\) with corresponding box centres \(\mathbf{c}_{\hat{l}}\). Similarly the 5 sets of expansions for \(T_{ij}^{\mathrm{T}}u_i\) are:

$$\begin{aligned} \Big [R_{{\hat{l}}}^{[T(1)]}(\mathbf{c}_{\hat{l}})\Big ]^p&= \sum _{\mathbf{x}_N \in \mathbf{d}_{\hat{l}}} ik_pA_{N}\Big (\beta k_p^2\big [R_{\hat{l}}(\mathbf{c}_{\hat{l}} - \mathbf{x}_N)\big ]^p\delta _{ij} \nonumber \\&\quad -2\big [R_{\hat{l}}(\mathbf{c}_{\hat{l}} - \mathbf{x}_N)\big ]^p_{ij} \Big )n_i(\mathbf{x}_{N})u_j(\mathbf{x}_{N}) \end{aligned}$$
(27)
$$\begin{aligned} \Big [R_{\hat{l}}^{[T(2)]}(\mathbf{c}_{\hat{l}})\Big ]^s&= \sum _{\mathbf{x}_N \in \mathbf{d}_{\hat{l}}} 2ik_sA_{N}\big [R_{\hat{l}}(\mathbf{c}_{\hat{l}} - \mathbf{x}_N)\big ]^s_{ij} \nonumber \\&\quad \times n_i(\mathbf{x}_{N})u_j(\mathbf{x}_{N}) \end{aligned}$$
(28)
$$\begin{aligned} \Big [R_{{\hat{l}}}^{[T(j)]}(\mathbf{c}_{\hat{l}})\Big ]^s&= \sum _{\mathbf{x}_N \in \mathbf{d}_{\hat{l}}}{}-ik_s^3A_{N}\big [R_{\hat{l}}(\mathbf{c}_{\hat{l}} - \mathbf{x}_N)\big ]^s_i\nonumber \\&\quad \times \Big (n_i(\mathbf{x}_{N})u_j(\mathbf{x}_{N}) + n_j(\mathbf{x}_{N})u_i(\mathbf{x}_{N}) \Big ) \end{aligned}$$
(29)

Note that in Eqs. (25)–(29) the source expansions are constructed using R expansions (Eq. 8) as the S|R translations (Eq. 23) are applied to R expansions and yield S expansions upon RCR translation.

The combined source expansions for each box on the lowest octree level \(l=\hat{l}\) which share the same parent box on level \(l-1\) may then be R|R translated (via Eq. 21) from the child to the parent box centre and combined to yield an expansion set for each box on level \(l-1\):

$$\begin{aligned} \Big [R_{l-1}^{[U(g)]}(\mathbf{c}_{l-1})\Big ]^f&= \sum _{\mathbf{d}_{l} \in \mathbf{d}_{l-1}} (R|R)_f(\mathbf{c}_{l-1}-\mathbf{c}_{l})\Big [R_{l}^{[U(g)]}(\mathbf{c}_{l})\Big ]^f\end{aligned}$$
(30)
$$\begin{aligned} \Big [R_{l-1}^{[T(h)]}(\mathbf{c}_{l-1})\Big ]^f&= \sum _{\mathbf{d}_{l} \in \mathbf{d}_{l-1}} (R|R)_f(\mathbf{c}_{l-1}-\mathbf{c}_{l})\Big [R_{l}^{[T(h)]}(\mathbf{c}_{l})\Big ]^f \end{aligned}$$
(31)

for \({g=1\ldots 4}\), \({h=1\ldots 5}\) and \({f=s,p}\) as per Eqs. (25)–(29). The interpolation required to extend the expansions from length \(p_l\) on level l to length \(p_{l-1}\) on level \(l-1\) is automatically applied by the RCR algorithm. The process of children to parent R|R translation and summation is repeated up to \(l=2\) (containing \([4\,{\times }\, 4\,{\times }\, 4]\) boxes) as on level 1 (8 boxes total) no pair of boxes is well separated from one another for S|R translation.

4.5.3 Downward pass

The purpose of the downward pass is to apply the S|R translations between all well separated boxes on each successive octree level. Beginning from level 2, the S|R translations are applied between box centres for each set of well separated source boxes \([\mathbf{d}_{l}]_{w}\) to each receiver box (see Fig. 5a) and then combined into a single expansion per box. The set of nearest neighbour boxes \([\mathbf{d}_{l}]_{n}\) for each receiver (the \([3\times 3\times 3]\) region of surrounding boxes) are not well separated for S|R translation. The S|R translations are applied via Eq. (23) as:

$$\begin{aligned} \Big [S_{l}^{[U(g)]}(\mathbf{c}_{l})\Big ]^f&= \sum _{\mathbf{d}_{l} \not \in [\mathbf{d}_{l}]_{n}}(S|R)_f(\mathbf{c}_{l} - [\mathbf{c}_{l}]_{w})\Big [R_{l}^{[U(g)]}([\mathbf{c}_{l}]_{w})\Big ]^f \end{aligned}$$
(32)
$$\begin{aligned} \Big [S_{l}^{[T(h)]}(\mathbf{c}_{l})\Big ]^f&= \sum _{\mathbf{d}_{l} \not \in [\mathbf{d}_{l}]_{n}} (S|R)_f(\mathbf{c}_{l} - [\mathbf{c}_{l}]_{w})\Big [R_{l}^{[T(h)]}([\mathbf{c}_{l}]_{w})\Big ]^f \end{aligned}$$
(33)

The combined S expansions for each receiver box are then S|S translated from the parent box centres on level 2 to the children boxes on level 3, thus allowing a new region of the octree boxes to be S|R translated using the level 3 S expansions (see Fig. 5b). The downward S|S translations are applied as:

$$\begin{aligned} \Big [S_{l+1}^{[U(g)]}(\mathbf{c}_{l+1})\Big ]^f&= (S|S)_f(\mathbf{c}_{l+1} - \mathbf{c}_{l})\Big [S_{l}^{[U(g)]}(\mathbf{c}_{l})\Big ]^f \end{aligned}$$
(34)
$$\begin{aligned} \Big [S_{l+1}^{[T(h)]}(\mathbf{c}_{l+1})\Big ]^f&= (S|S)_f(\mathbf{c}_{l+1} - \mathbf{c}_{l})\Big [S_{l}^{[T(h)]}(\mathbf{c}_{l})\Big ]^f \end{aligned}$$
(35)

The RCR algorithm automatically applies any required anterpolation to contract the truncation degree \(p_l\) on level l to \(p_{l-1}\) on level \(l-1\). The new intermediate region which become available upon S|S translation to the lower octree level consists of a maximum of \(6^3-3^3=189\) occupied boxes and so at most 189 S|R translations are applied per box per octree level in the FMBEM. The downward pass repeats the process of S|R translation (via Eqs. 32–33) and S|S translation (via Eqs. 34–35) until the lowest octree level \(\hat{l}\) is reached.

Fig. 5
figure 5

S|R translations (black lines) for 2 arbitrary boxes on octree levels 2 (a) and 3 (b). The S|R translations are directed from the well-separated light grey source boxes to each dark grey receiver box. On level 3 the larger level 2 boxes have already been S|R translated to the level 2 parent box and so are R|R translated to the children box centres on level 3. The neighbour boxes of each receiver (empty regions) are not well separated for S|R translation

4.5.4 Final summation

The final S expansions from the downward pass represent the radiated field from all well separated sources from all octree levels, evaluated at each receiver box on the lowest octree level. The complete surface integrals are then constructed as:

$$\begin{aligned}&\int _{S}U_{ij}^{\mathrm{T}}(\mathbf{x}_{N}, \mathbf{y}_{N}) t_i(\mathbf{x}_{N})dS(\mathbf{x}_{N}) \approx \displaystyle \sum _{n=0}^{p_{\hat{l}}} \displaystyle \sum _{m=-n}^{n} \Bigg (\Big [S_{\hat{l}}^{[U(i)]}(\mathbf{c}_{\hat{l}})\Big ]^s\nonumber \\&\quad \times \overline{\Big [R_{\hat{l}}(\mathbf{c}_{\hat{l}} - \mathbf{x}_{N})\Big ]^s} -\Big [S_{\hat{l}}^{[U(4)]}(\mathbf{c}_{\hat{l}})\Big ]^p\overline{\Big [R_{\hat{l}}(\mathbf{c}_{\hat{l}} - \mathbf{x}_{N})\Big ]^p_{i}} \Bigg ) \nonumber \\&\quad +\mathbf{U}_{(NF)}t_i(\mathbf{x}_{N})\end{aligned}$$
(36)
$$\begin{aligned}&\int _{S}T_{ij}^{\mathrm{T}}(\mathbf{x}_{N}, \mathbf{y}_{N}) u_i(\mathbf{x}_{N})dS(\mathbf{x}_{N}) \approx \displaystyle \sum _{n=0}^{p_{\hat{l}}} \displaystyle \sum _{m=-n}^{n} \Bigg (\Big [S_{\hat{l}}^{[T(1)]}(\mathbf{c}_{\hat{l}})\Big ]^s \nonumber \\&\quad \times \overline{\Big [R_{\hat{l}}(\mathbf{c}_{\hat{l}} - \mathbf{x}_{N})\Big ]^p_{i}} +\Big [S_{\hat{l}}^{[T(2)]}(\mathbf{c}_{\hat{l}})\Big ]^s\overline{\Big [R_{\hat{l}}(\mathbf{c}_{\hat{l}} - \mathbf{x}_{N})\Big ]^s_{i}} \nonumber \\&\quad +\Big [S_{\hat{l}}^{[T(i)]}(\mathbf{c}_{\hat{l}})\Big ]^s\overline{\Big [R_{\hat{l}}(\mathbf{c}_{\hat{l}} - \mathbf{x}_{N})\Big ]^s}\Bigg ) + \mathbf{T}_{(NF)}u_i(\mathbf{x}_{N}) \end{aligned}$$
(37)

where overbars denote the complex conjugate of the R expansions and \(\overline{R_n^m(\mathbf{r})} = R_n^{-m}(\mathbf{r})\) [22] (see Eq. 10). Finally, for simplicity we have assumed here a single truncation number \(p_l\) for all expansions per octree level, when in fact \(p_l\) is also dependent on the wavenumber and so the truncated expansions will have different lengths on each octree level for the \(k_s\) and \(k_p\) terms.

4.6 Translation stencils

The number of required S|R translations may be reduced by noting that many of the well separated source boxes are common to more than 1 of the 8 receiver boxes in each local group (i.e. sharing the same parent box). The S|R translations for the common source boxes may instead be applied to an expansion point at the centre of the local group and the combined expansions can then be S|S translated to each individual receiver box. Gumerov and Duraiswami implemented such a translation stencil for the Laplace fundamental solution [39] for source boxes which are separated from all 8 boxes in the local receiver groups, and they note (without details) that a similar method can be applied to groups of 4 and 2 receiver boxes. Such an implementation for the 4 and 2-box stencils is presented here for the elastodynamic FMBEM for the first time, along with details for the 8-box stencil which are included for completeness.

The well separated criterion required for the S|R translation is described by the parameter:

$$\begin{aligned} \eta = \frac{\mathrm{max}({r}_1,{r}_2)}{D - \mathrm{min}({r}_1,{r}_2)} \end{aligned}$$
(38)

where \(r_1\), \(r_2\) and D are the magnitudes of the 2 expansion vectors and the translation vector respectively [39]. The largest possible expansion vector in any box is \(\frac{3^{1/2}}{2}\) (assuming a unit length for the box) while the shortest S|R translation vector is \({D=2}\). Thus in the worst case \(\eta \approx 0.7637\) and so provided:

$$\begin{aligned} D \ge \mathrm{min}({r}_1,{r}_2) + \frac{\mathrm{max}({r}_1,{r}_2)}{0.7637} \end{aligned}$$
(39)

the S|R expansions can be applied to local receiver groups with larger expansion vectors for \(r_1\) and/or \(r_2\).

4.6.1 8-box stencil

The \(6\times 6\times 6\) region of well separated source boxes contains a set of 80 boxes which satisfy Eq. (39) for the group of 8 receiver boxes (e.g. \(r_1=\frac{3^{1/2}}{2}\), \(r_2 = \sqrt{3}\)) as shown in Fig. 6. The S|R translations for these source boxes are applied to the centre of the parent receiver box and then S|S translated to each of the 8 receiver boxes. In fact the required parent to child S|S translations are applied in the downward pass of the FMBEM procedure and so the S|R translations for the 8-box stencil can be applied on the parent octree level. Thus the 8-box stencil reduces 8 sets of 80 S|R translations to 80 S|R translations and so the total number of translations for the local group of 8 receivers, \(8\times 189 = 1512\) translations, reduces to \(1512-8\times 80+80=952\) translations.

Fig. 6
figure 6

Translation stencil boxes for the group of 8 receivers. The set of 80 light grey source boxes are considered well separated from the parent box containing the 8 children receiver boxes (dark grey). For clarity the stencil has been split into halves which connect via the dashed lines

4.6.2 4-box stencil

The local group of 8 receiver boxes may also be split into subgroups of 4 boxes which are in planar configurations aligned along the \(\pm \)xyz-axis directions (i.e. 6 overlapping groups in total) with each receiver box being a member of 3 of the 4-box groups. The sets of well separated sources boxes for each 4-box group contains 40 source boxes which satisfy Eq. (39) \(\big (r_1=\frac{3^{1/2}}{2}\), \(r_2 = \frac{3}{2}\big )\) and are not contained within the 8-box stencil. The 4-box stencils partially overlap one another and so if they are first applied, for example, to the sets aligned in the z-axis planes (see Fig. 7) then the 4-box stencils for the receiver groups aligned in the x/y planes will only contain 4 boxes per stencil which have not already been treated in the 8-box and z-plane 4-box stencils. Thus the 4-box stencils reduce 4 sets of 40 S|R translations for 2 of the 4-box receiver groups (aligned in the same axis planes) to 40 S|R translations + 4 S|S translations (to translate from the 4-box group centre to the individual boxes) per group, while the 4-box groups aligned in the other axis directions each reduce 4 sets of 4 S|R translations to 4 S|R + 4 S|S translations per group. Hence the total number of S|R translations is further reduced to \(952-2(4\times 40-40)-4(4\times 4-4)=664\) S|R translations + 24 S|S translations.

Fig. 7
figure 7

Translation stencils for the 4-box receiver groups aligned in planes in the z-axis. The left/right diagrams show the sets of 40 boxes which are well separated from the 4-box groups aligned in the upper/lower z-plane (the clear boxes indicate the other receiver boxes in the local 8-box group)

4.6.3 2-box stencil

Finally the local group of 8 receiver boxes may also be split into 4 groups of 2 boxes \(\big (r_1=\frac{3^{1/2}}{2}\), \(r_2 = \sqrt{\frac{3}{2}}\big )\), which can be aligned along each of the axis directions. The number of source boxes per stencil set differs depending on which of the axis directions the 4-box stencils were first applied. Assuming for example that the 4-box stencils were first applied to the z-aligned groups of receivers, the translation stencils for the 2-box groups will be as shown in Fig. 8, where the z-axis 2-box stencils each contain 10 source boxes (as the majority of the boxes have already been treated in the z-aligned 4-box translation stencils) while the x/y aligned box pairs each have 24 boxes per stencil. The combined sets of the well separated source boxes for all 4 x/y aligned 2-box groups exactly intersect, while that of the z-aligned groups is a subset of the x/y stencils, and so the 2-box stencils need only be applied along one of the axis directions. Here the x/y aligned box pairs would be the preferred choice to maximise the number of source translations treated by the stencils. The 2-box stencils thus reduce \(8\times 24\) S|R translations to \(4\times 24\) S|R + 8 S|S translations, further reducing the total to \(664-8\times 24+4\times 24=568\) S|R + 32 S|S translations.

Fig. 8
figure 8

Translation stencils for the 2-box receiver groups aligned along the x, y and z axes, assuming the 4-box translation stencils were first applied to the z-aligned groups. The translation stencils for the x/y aligned receiver groups each have 24 boxes while the z-aligned groups have 10 boxes

Finally, there remains 37 source boxes per receiver which are too close to satisfy Eq. (39) with any kind of receiver grouping and so they are directly applied for each receiver box (see Fig. 9). Using the 8, 4 and 2 box translation stencils thus allows a reduction of 1512 possible S|R translations to 568 S|R translations + 32 S|S translations, or 71 S|R translations + 4 S|S translations per receiver box (a \(~60\,\%\) reduction in the total number of applied translations). Gumerov and Durasiwami [39] state that they were able to reduce the number of S|R translations to 61 per receiver box using 8, 4 and 2-box stencils but do not provide details of their implementation. It should be noted that the best computational savings are achieved when all source and receiver boxes are occupied. This is generally not the case for the octree structrues constructed for arbitrary 3D BEM meshes, and so the efficiency is reduced. In the present implementation the orientations of the 4 and 2-box receiver groups are optimally choosen to minimise the total number of S|R translations required when the octree structure is only partially filled.

Fig. 9
figure 9

Full translation stencil for 1 receiver box in the local group. The top, middle and bottom rows show the contributions from the 2, 4 and 8-box stencils which construct the complete well separated region of \(6^3-3^3\) boxes

4.7 FMBEM iterative solution and preconditioning

In this work the GMRES method is used for the iterative BEM solution of the \(\mathbf{Ax} = \mathbf{b}\) discretised system of equations. Central to the convergence of the iterative solution is the use of an appropriate preconditioner matrix \(\mathbf{A}_{\mathrm{pre}}\), which seeks to improve the condition number of \(\mathbf{A}\) and so reduce the number of iterations required for convergence of the preconditioned system of equations \(\mathbf{A}_{\mathrm{pre}}\mathbf{Ax} = \mathbf{A}_{\mathrm{pre}}\mathbf{b}\). Particularly one would like \(\mathbf{A}_{\mathrm{pre}} \sim \mathbf{A}^{-1}\) so that \(\mathbf{A}_{\mathrm{pre}}\mathbf{A} \sim \mathbf{I}\).

A combination of preconditioning strategies is employed here to accelerate the convergence of the iterative solution. Firstly a low accuracy FMBEM is used as a preconditioner for the main FMBEM iterative solution [25, 26]. For each outer GMRES iteration, a low accuracy FMBEM is used to converge an inner GMRES solution to a coarse convergence tolerance, and this ‘preconditioned’ solution is then passed to the current outer GMRES loop iteration, which is calculated using the full accuracy FMBEM and iterated to a fine convergence tolerance. The nested outer–inner GMRES solution results in a different preconditioner being used for each outer iteration, thus making it a flexible GMRES (fGMRES) method [45]. Advantageously, the multipole expansions and RCR translations may be reused for the preconditioner FMBEM by further truncating the full accuracy expansions/translations and so there is no additional memory cost. Secondly, the low accuracy inner GMRES loop is itself preconditioned using a sparse approximate inverse (SAI) preconditioner [46] constructed from the sparse near field matrices \(\mathbf{U}_{(NF)}\) and/or \(\mathbf{T}_{(NF)}\), as was similarly done by Sanz et al. [24] for the high frequency elastodynamic FMBEM. It has been observed by some authors [26, 47] that the exclusive use of local preconditioners constructed from the sparse near field matrices provide slower convergence at higher wavenumbers due to the reduced diagonal dominance of the coefficient matrices, while the low accuracy FMBEM preconditioner works well at all wavenumbers [26]. Thus the present implementation novelly combines the preconditioning strategies in Refs. [24] and [25] and is newly applied to the low frequency elastodynamic FMBEM.

5 Numerical results

This section presents a number of numerical results for the low frequency elastodynamic FMBEM, including a characterisation of the model in terms of its algorithmic complexity and memory requirements. This appears to be the first time that such results have been presented in the literature for the low frequency elastodynamic FMBEM models [18, 21, 29]. Comparisons to analytic solutions and/or published results from high frequency elastodynamic FMBEM models also quantify the accuracy of the present model in terms of the element per shear wavelength (\(\lambda _s\)) discretisation.

For all numerical examples the convergence tolerances of the outer and inner GMRES loops are specified as \(1\times 10^{-4}\) and \(1.5\times 10^{-1}\) respectively, while the corresponding accuracies of the multipole translations (\(\xi \) in Eq. 24) are specified as \(1\times 10^{-4}\) and \(2\times 10^{-1}\) respectively. The maximum octree level is determined by specifying a maximum of 40 sources/elements per box (unless otherwise stated) on the lowest octree level \(\hat{l}\). All results were calculated on an Intel i7 hexacore processor running at 3.2GHz, equipped with 64GB of RAM.

5.1 P-wave scattering from a spherical traction-free cavity

Numerical results for the developed low frequency elastodynamic FMBEM are first presented for the problem of a unit amplitude compressional plane wave scattering from a unit radius traction-free spherical cavity embedded in an infinite exterior elastic solid domain. The problem is solved for shear wavenumbers of \(k_s=13\) and \(k_s=43\), the elastic material properties are specified as \(\nu =1/3\) and \(\mu =1/4\) Pa, and the mesh contains 148, 390 elements (\(N=445,170\)), giving approximately 85 and 20 elements per \(\lambda _s\) for the \(k_s=13\) and \(k_s=43\) wavenumbers respectively. Fig. 10 shows the real/imaginary parts of the radial component of total surface displacement as a function of the angle from the incident wave direction (\(0^{\circ }\) being the backscatter direction) compared to the analytic solution [48]. The relative error norm (r.e. norm) \(\frac{\hbox {norm}(\mathbf{s} -\mathbf{s}^{\mathrm{an}})}{\hbox {norm}(\mathbf{s}^{\mathrm{an}})}\) between the FMBEM \((\mathbf{s})\) and analytic \((\mathbf{s}^{\mathrm{an}})\) solution was 1.50 and \(1.72\,\%\) for the \(k_s=13\) and \(k_s=43\) wavenumbers respectively. The present FMBEM took 2.9 and 39 hours to solve the problem at the \(k_s=13\) and \(k_s=43\) wavenumbers, and required 18.24 GB and 19.95 GB of storage space respectively when using \(\hat{l} = 5\) and a maximum of 80 sources/elements per \(\hat{l}\) box. The \(k_s=13\) problem represents a large-scale problem in the low frequency regime, having \(\approx 0.13 \lambda _s\) per \(\hat{l}\) box, while the \(k_s=43\) problem represents the practical upper limit of the wavenumber range for the low frequency RCR translations, whose \(O(p^3)\) cost becomes prohibitive for the large truncation numbers required at higher frequencies.

Fig. 10
figure 10

Real/imaginary parts of the radial component of the FMBEM total surface displacement versus the angle from the incident wave direction (i.e. \(0^{\circ } =\) backscatter direction), compared to the analytic solution [48]

The FMBEM solutions for both wavenumbers required a relatively large amount of memory due to the storage of the sparse near field matrices—15.5GB in both cases—which constitutes about \(0.5\,\%\) of the full coefficient matrix. Solution of the same problem configeration and wavenumbers using \(\hat{l} = 6\) and a maximum of 40 sources/elements per \(\hat{l}\) box (giving \(\approx 6.5 \times 10^{-2} \lambda _s\) per \(\hat{l}\) box for the \(k_s=13\) problem) reduces the memory requirements to 5.46 GB and 6.65 GB of memory but increases the solution times to 4.7 and 73 hours respectively for the \(k_s=13\) and \(k_s=43\) wavenumbers. This trade-off between the solution time and memory usage is typical of the FMBEM.

5.2 Elastic sphere under a radial traction

The accuracy of the present FMBEM model is now investigated at very low frequencies. The numerical example of a solid elastic sphere excited by a time-harmonic radial traction on the sphere surface is adopted, which has a known analytic solution [6]. The sphere has a radius of 6 m, the elastic properties are specified as \(\nu =1/4\) and \(\mu =10^6\) Pa, and the radial surface traction (i.e. pressure) is specified as 100 Pa. The solution at all wavenumbers is calculated on a 2700 element mesh and compared to the full BEM results. A small amount of material damping is also included in both the numerical and analytic solutions via complex wavenumbers \(k_s^* = k_s(1+0.01i)\) and \(k_p^* = \sqrt{\frac{1-2\nu }{2(1-\nu )}}k_s^*\) [30] to provide a finite value in the analytic solution at the singularity occurring at \(k_s \approx 0.74\). It should be noted that the rules for determining truncation numbers for the FMBEM should be suitably modified when using complex wavenumbers to maintain a specified accuracy (as shown in Ref. [30] for the high frequency visco-elastodynamic FMBEM). However in the limiting case of very small complex wavenumber components (as used here) the modified rules simplify to the standard truncation formula and so the unmodified truncation numbers based on Eq. (24) have been employed for this example. Fig. 11 shows the modulus of the mean radial displacement results from the BEM, FMBEM and the corresponding analytic solution versus \(k_s\).

Fig. 11
figure 11

Absolute value of mean radial displacement versus \(k_s\) for a solid elastic sphere excited by a time-harmonic radial traction for the BEM and FMBEM, compared to the analytic solution [6]. A small amount of material damping has also been included in the numerical/analytic solutions [30]

The numerical (BEM and FMBEM) and analytic solutions presented in Fig. 11 can be seen to be in good agreement. The FMBEM solution uses 2 octree levels (\(\hat{l} = 3\)), and so the level 3 octree boxes have a length of 1.5 m, corresponding to \(~2.4\times 10^{-2}\lambda _s\) at the lowest wavenumber of \(k_s=0.1\). Thus the present FMBEM allows low frequency problems to be treated without restriction to the minimum fractional wavelength per octree box, which would otherwise be required by the high frequency FMBEMs to avoid instabilities in the diagonal S|R translation coefficients [1, 12]. Finally it should noted that for the relatively small problem size used in this comparison (\(N=8100\)), the total solution time of the FMBEM is up to \(20\,\%\) slower than that of the BEM. This is typically the case for the FMBEM solution of small problems due to the initial computational overheads of setting up the FMM (multipole expansions, octree structure etc.) and so the ‘cross over’ point where the FMBEM becomes faster than the BEM is typically of the order of \(10^3-10^4\) unknowns [17, 18] depending (strongly) on the type of FMBEM model/settings and the problem of interest. For the present numerical example and FMBEM settings the cross-over point is approximately \(10^4\) unknowns.

5.3 Algorithmic complexity

The measured algorithmic complexity and memory requirements of the developed FMBEM model are now presented. A detailed theoretical evaluation of the multilevel FMBEM using the \(O(p^3)\) RCR translations can be found in Ref. [22], where a theoretical algorithmic complexity of \(O(N^{3/2})\) was derived for the case when the number of unknowns is proportional to the wavenumber (i.e. \(N \propto k\)), while at low frequencies (i.e. ) the derived theoretical algorithmic complexity was O(N). Algorithmic complexity results are provided for these two scenarios which involve (1) an increasing wavenumber with an approximately fixed number of elements per wavelength \((N \propto k)\), and (2), an increasing number of unknowns for a fixed wavenumber .

Table 1 first presents the iterative solution time (normalised to the GMRES solution time for each problem), the number of GMRES/fGMRES iterations and the final solution residual versus N for a number of solver configurations (GMRES, fGMRES, fGMRES/SAI and fGMRES/SAI + translation stencils) to indicate the relative performance of each solution method. The results presented are for \((N \propto k)\) using approximately 20 elements per shear wavelength, for the the case of a unit radius solid elastic sphere excited by a unit amplitude time-harmonic radial traction on the sphere surface [6], with elastic material properties specified as \(\nu =1/3\) and \(\mu =1/4\) Pa. The \(N \propto k\) case was chosen to present problems which have a slow convergence and thus more clearly demonstrate the performance of each component of the iterative solver.

Table 1 The normalised solution time (\(\hat{t}\)), number of GMRES/fGMRES iterations (iter.) and the solution residual (res.) versus N for several iterative solver configurations. Solution times for each problem are normalised to the GMRES solution time, the fGMRES iterations are presented in the format outer/inner, and all residuals are presented with an order of \(\times 10^{-5}\)

It can be seen from Table 1 that the use of the nested outer–inner fGMRES solution reduces the total solution by 9–56 % compared to the GMRES solver. The use of the SAI preconditioner (fGMRES/SAI) for the inner GMRES loop provides mixed results, further reducing the total solution time of the fGMRES by 3–12 % for 4 of the problem sizes, while increasing the total solution time for the \(N=30,822\) and \(N=526,188\) problem. This is due to the SAI preconditioner changing the rate of convergence in the outer GMRES loop, while at the same time minimising the required number of inner GMRES iterations. The two slower solutions thus required one additional outer GMRES iteration to converge to the specified tolerance. Conversely, for the \(N=75,690\) and \(N=378,948\) problems, the use of the SAI preconditioner both reduced the total number of inner GMRES iterations and yielded a smaller final solution residual, indicating a faster rate of convergence. The effect of the SAI preconditioner is reduced for the larger problems, where the upper limit for the inner GMRES iterations is reached before the desired convergence tolerance is reached, and so the SAI does not reduce the number of inner iterations below this limit. Finally, the use of the fGMRES/SAI with translation stencils provides a further 5–20 % reduction in the total solution time compared to the fGMRES/SAI solution. The maximum computational reduction from the translation stencils is not reached in practise for the spherical scatterer, which has a simple surface geometry and so not all boxes are occupied in the octree structure. The translation stencils also slightly reduces the rate of convergence, as evinced by the generally increased solution residuals yielded this from solver, while the \(N=274,350\) problem with translation stencils also required one additional outer GMRES iteration to converge to the specified tolerance. In all cases the fGMRES/SAI solver with translation stencils provides equivalent or faster solution times than the other solution methods and is typically 2 to 2.5 times faster than the standard GMRES solution.

Figure 12 presents the outer GMRES single matrix-vector product calculation time, total solution time, memory usage in MB, and solution error versus N for element discretisations of 10 and 20 elements per \(\lambda _s\) (i.e. \(N \propto k\)) for the same problem configuration and material properties presented in Sect. 5.1. The wavenumbers range from \(k_s=13 \ldots 46\) and \(k_s=7 \ldots 46\) for the 10 and 20 element/\(\lambda _s\) discretisations respectively. Note that the standard deviations for the various least-squares data fits of the N-scaling are also presented. However the residual errors of these data fits should not be considered to follow a normal probability distribution due to there being a correlation between the octree structure and the calculation time/memory requirements.

Fig. 12
figure 12

Single matrix-vector product calculation time (a), total solution time (b), memory usage in MB (c) and solution error (d) versus the number of unknowns N for an increasing wavenumber, shown for discretisations of 10 and 20 elements/\(\lambda _s\)

Figure 12a shows that the calculation time per matrix-vector product for the full-accuracy outer GMRES loop of the fGMRES solution have algorithmic complexities of \(O(N^{1.3})\) and \(O(N^{1.1})\) for the 10 and 20 element/\(\lambda _s\) discretisations respectively, which is less than the theoretical algorithmic complexity of \(O(N^{3/2})\) [22]. However the total solution time of the FMBEM (Fig. 12b, which includes both the set up and iterative solution time, for which the later dominates: typically contributing \({>}95\,\%\) of the total solution time) show complexities of \(O(N^{2.5})\) and \(O(N^2)\) for the 10 and 20 element/\(\lambda _s\) discretisations respectively i.e. \(\ge O(N^2)\) complexity of the direct BEM solution (albiet with leading constants much less than 1). The larger algorithmic complexity of the total solution times is due to the slow convergence of the iterative solution, where a large number of inner GMRES iterations are required for solution convergence. Clearly a more efficient preconditioning strategy is needed. The FMBEM memory requirements for the 10 and 20 element/\(\lambda _s\) discretisations (Fig. 12c) show a less than N proportionality in both cases, while the r.e. norms (Fig. 12d) can be seen to be generally lower when using a larger number of elements per wavelength (as expected). The sharp increases in the r.e. norms for certain problem sizes is due to the wavenumbers for those element/wavelength configurations lying close to the eigenfrequencies of the adjoint interior problem, where the elastodynamic BIE (Eq. 1) is known to break down. A suitable stabilisation technique (for example [49]) is required to provide a stable solution at all frequencies for the exterior problems. Away from these eigenfrequencies the present FMBEM yields average r.e. norms of approximately 5.5 and \(3.7\,\%\) for the 10 and 20 element/\(\lambda _s\) discretisations respectively. For a similar problem configuration RMS errors in the range of 0.4–2.3 % [23] and 2–3 % [30] were observed when using at least 10 nodes per shear wave and a minimum octree box size of 0.3\(\lambda _s\) for high frequency elastodynamic FMBEMs employing linear isoparametric elements. Thus the present low frequency FMBEM yields solution errors which appear to be consistent with those of the high frequency models (at wavenumbers where both models are applicable), when considering the constant elements used in the present model.

Figure 13 shows the outer GMRES single matrix-vector product calculation time, total solution time, memory usage in MB, and solution error versus N for fixed wavenumbers of \(k_s=\) 5, 13 and 25 (i.e. ), again presented for the problem configuration and material properties as specified in Sect. 5.1. These wavenumbers provide results over a range of low to moderate frequencies, while the problem sizes yield discretisations ranging from approximately 10–100 elements/\(\lambda _s\).

Fig. 13
figure 13

Single matrix-vector product calculation time (a), total solution time (b), memory usage in MB (c) and solution error (d) versus the number of unknowns N for fixed wavenumbers of \(k_s=\) 5, 13 and 25

For the algorithmic complexity scenario both the calculation time per matrix-vector product of the FMBEM (Fig. 13a) and the total solution time (Fig. 13b) have an algorithmic complexity of N to a power less than one for all three wavenumbers, which is in good agreement with the O(N) theoretical estimate [22]. Clear sets of stepped increases in the calculation time can be seen in Fig. 13a, b, which correspond to the problem sizes which require a new octree level to be introduced to the structure to allow the problem to be solved with the specified maximum number of sources/elements per lowest octree box. The additional S|R translations required on each new octree level sharply increase both the calculation time per matrix-vector product and the total solution time. The problem size may then be made several times larger before another octree level is required and so there is only a small additional cost from building the source/receiver expansions as N is increased, while the cost of the required translations remains unchanged for the fixed wavenumber. The introduction of each new octree level corresponds to a downward step in the total memory required by the FMBEM (see Fig. 13c) as the part of the surface integrals that must be directly calculated and stored in the sparse near field matrices is reduced for the smaller boxes on the new octree level. Again the inverse relation between the solution speed (Fig. 13b) and the memory requirements (Fig. 13c) of the FMBEM is obvious. Each new octree level halves the separation distances between the source/receiver boxes while the element size is still relatively large compared to the box size, resulting in an initial increase in the solution error for the problems which introduce a new octree level (Fig. 13d). The solution errors then reduce with increasing N (i.e. an increasing number of elements per wavelength) until the problem size requires another octree level to be introduced to the octree structure. Figure 13d shows that the r.e. norms for all 3 wavenumbers are approximately 5–6 % for the 10 element/\(\lambda _s\) discretisation, while a minimum r.e. norm of 1–2 % is reached for the larger element/\(\lambda _s\) discretisations (depending on where the new octree levels are introduced). Clearly both the rate of reduction of the r.e. norms with increasing N and the achieved minimum residual is dependant on the exact wavenumber. The algorithmic/memory complexity of the present model is thus shown to be in good agreement with the theoretical estimates in the low to moderate frequency range.

5.4 Scattering of plane elastic waves by canyons

As an example of the application of the low frequency elastodynamic FMBEM (and to demonstrate the model for non-spherical geometry), displacement results are presented for the scattering of compressional plane waves from a semi-ellipsoidal canyons embedded in the free surface of a homogeneous elastic half-space. Numerical modelling of such configurations have been the focus of a number of studies, which include the use of elastodynamic BEMs [50], and elastodynamic FMBEMs [23]. The configuration of interest is shown in Fig. 14, where the canyon and a surrounding circular disc (of radius D) of the half-space boundary surface is meshed and a traction-free BC is applied to the surface. The semi-ellipsoidal canyon has semi-axes a, b, a in the xyz axis directions respectively. The incident \(k_p\) plane wave is defined by the angle \(\theta \) from the z-axis.

Fig. 14
figure 14

Schematic of the problem for the scattering of plane elastic waves by a canyon embedded into the free-surface of an elastic half-space

The incident displacement field in the half-space takes the form of a ‘free-field’ \(u_i^{F}\), a combination of the incident compressional plane wave and both the compressional and shear waves reflected from the half-space under a traction-free boundary condition [6, 37]. The incident free-field also induces an incident traction or surface stress \(t_i^{F}\) on the half-space boundary surface, which may be expressed in terms of the free-field displacement [50]. The introduction of an irregularity to the surface of the half-space results in a scattered displacement field \(u_i^{\mathrm{sca}}\) and so the total field \(u_i = u_i^{F} + u_i^{\mathrm{sca}}\). Under the traction-free BC \(t_i^{F} = 0\) on the free-surface while the scattered traction field \(t_i^{\mathrm{sca}}\) resulting from the surface irregularity must be 0 on the planar part of the surface and \(t_i^{\mathrm{sca}} = - t_i^{F}\) on the irregularity. The interior BIE for the scattered displacement field is thus [37]:

$$\begin{aligned}&\frac{1}{2}u_i^{\mathrm{sca}}(\mathbf{y})-\int _ST_{ij}^{\mathrm{T}}(\mathbf{x},\,\mathbf{y}) u_i^{\mathrm{sca}}(\mathbf{x})dS(\mathbf{x}) \nonumber \\&\quad =\int _CU_{ij}^{\mathrm{T}}(\mathbf{x},\mathbf{y})t_i^F(\mathbf{x})dS(\mathbf{x}) \end{aligned}$$
(40)

where S denotes the complete discretised surface (canyon + half-space) and C denotes the canyon surface.

Figure 15 shows the displacement results for a compressional plane wave impinging upon a semi-ellipsoidal canyon with semi-axis \(b=3a\) in the y-axis direction (see Fig. 14) at an oblique incidence (\(\theta = 30^{\circ }\)). The specified wavenumber is \(k_sa = 0.5\pi \) and the elastic half-space has material properties \(\nu = 1/3\) and \(\mu = 1\) Pa. The canyon and a surrounding circular disc of radius \(D = 6a\) of the half-space boundary surface is modelled with a 8372 element boundary mesh (\(N=25,116\)). The displacement results in Fig. 15 are shown along a cross section of the mesh from \(x=-3a \ldots 3a\) and \(y=0\). Good agreement is observed between the present FMBEM and the digitised results from a high frequency FMBEM [37].

Fig. 15
figure 15

Modulus of the \(u_z\) (vertical) and \(u_x\) (horizontal) total displacement components from a compressional plane wave impinging upon a semi-ellipsoidal canyon at an oblique incidence (\(\theta = 30^{\circ }\)) for a wavenumber of \(k_sa = 0.5\pi \). The displacement results are shown along a cross section from \(x = -3a \ldots 3a\) and \(y = 0\) and are compared to those from a high frequency FMBEM [37]

Canyon scattering results are finally presented for a large-scale problem which involves the same geometry, material properties and incident field configuration as the previous example, but for a higher wavenumber of \(k_sa = 2\pi \). The canyon and surrounding half-space (\(D = 6a\)) are modelled with 119, 278 elements (\(N=357,384\)). Figure 16 shows the modulus of the \(u_z\) (vertical) and \(u_x\) (horizontal) displacement fields from a top down view of the canyon. The present FMBEM solved the problem in 130 minutes, with 93 minutes required for the fGMRES solution and 37 minutes required for the problem set up, predominantly for the calculation of \(\mathbf{U}_{(NF)}\) and \(\mathbf{T}_{(NF)}\). The solution converged to a residual of \(6.40 \times 10^{-5}\) using 8 outer and 75 inner GMRES iterations in total, costing 226 s and 45 s per iteration respectively. The FMBEM required a total of 4.03 GB of memory for \(\hat{l} = 7\) (corresponding to \(~9.4\times 10^{-2}\lambda _s\) per \(\hat{l}\) box), with \(\mathbf{U}_{(NF)}\) and \(\mathbf{T}_{(NF)}\) each requiring 1.31 GB, and the SAI preconditioner requiring 342 MB. As a point of comparison, a similar problem with \(N=353,232\) was treated in Ref. [37] using a high frequency elastodynamic FMBEM (having an \(O(N\hbox {log}_2N)\) algorithmic complexity), which solved the problem using \(\hat{l} = 5\) and a single GMRES loop with a convergence tolerance of \({<}10^{-3}\) in 76 minutes. Thus the solution times of the present low frequency model and the referenced high frequency model would appear to be approximately comparable for this example in the mid-frequency range where both models are applicable. Of course the differences in programming language, computer hardware, parameters used in the FMBEM calculations (number of octree levels, GMRES tolerances, use of preconditioners etc.) and the different algorithmic complexities of the low and high frequency FMBEM models would all have significant effects on the calculation times for each model, and so the present analysis should only be considered a cursory comparison.

Fig. 16
figure 16

Modulus of the \(u_z\) (vertical) and \(u_x\) (horizontal) total displacement components from a compressional plane wave impinging upon a semi-ellipsoidal canyon at an oblique incidence (\(\theta = 30^{\circ }\)) for a wavenumber of \(k_sa = 2\pi \). The displacement fields are shown from a top–down perspective (\(-\)z-axis) with the semi-ellipsoidal canyon edge outlined in black

6 Conclusions

This paper presents a low frequency 3-D elastodynamic FMBEM model which has the following new features:

  1. 1.

    The multipole expansion and translation of the elastodynamic fundamental solutions were treated with the spherical basis functions and RCR translation method respectively, and this has required the development of new compact recursion relations to efficiently calculate the second-order Cartesian partial derivatives of the spherical basis functions.

  2. 2.

    The convergence of the iterative solution was accelerated via a novel combination of an outer–inner fGMRES method and SAI preconditioner matrix, where the inner GMRES loop solves a fast low accuracy inner FMBEM to a coarse convergence tolerance which is then used to precondition the full accuracy outer GMRES iteration. Furthermore the inner GMRES loop was also preconditioned via a SAI matrix to further accelerate the solution convergence.

  3. 3.

    Translation stencils were newly applied to the elastodynamic FMBEM and an implementation for the 8, 4 and 2-box stencils was shown to reduce the total number of translations that must be applied per octree level by up to \(~60\,\%\). This combination of schemes was shown to yield total solution times which were 2 to 2.5 times faster than the standard GMRES solution.

  4. 4.

    A characterisation of the algorithmic and memory complexities for this type of low frequency elastodynamic FMBEM model was presented for what appears to be the first time in the literature. The demonstrated algorithmic complexities were in good agreement with the theoretical predictions, while the model’s accuracy is consistent with that of the published high frequency elastodynamic models when considering the lower order elements employed here.

Finally, the present model was shown to provide stable results in the low wavenumber range (as expected), where the high frequency elastodynamic FMBEMs prevalent in the literature are known to break down.

Although the typically published CPU times per matrix-vector product for the present model satisfied the theoretical estimates, the total solution times showed an algorithmic scaling equal to or greater than \(O(N^2)\) when using a fixed number of elements per wavelength. This is due to the slow convergence of the solution, despite the combination of strategies employed to accelerate the solution convergence. A more efficient preconditioning strategy, such as the use of an analytical preconditioner, is expected to greatly improve the convergence of the iterative solution. Stability of the elastodynamic BIE for exterior problems and proper treatment of the singular integrals are other issues to be addressed. Finally the present low frequency elastodynamic FMBEM model may be extended to incorporate the high frequency expansions/translations at the coarser octree levels, yielding a broadband FMBEM model.