1 Introduction

The extended reality (XR) industry has seen a tremendous boom in the past 8 years, with its frontiers stretching into various domains and applications. Spatial audio, which provides spatial information about auditory events, is a fundamental ingredient of immersive virtual reality scenes [1, 2]. Virtual spatial audio can be synthesized using perceptual or physical methods. Perceptual spatial audio recreates the three-dimensional auditory display through earphones, while physical methods use loudspeakers placed at multiplanar locations. Most commonly, perceptual spatial audio is generated using head-related transfer functions (HRTFs).

HRTFs describe how human anthropometric features transform sound waves arriving from different spatial locations on their way to the ear. Generally, an HRTF is measured as the alteration in sound pressure when audio waves travel from a certain point to the ear canal. Most studies have emphasized the directional dependence of the HRTF, while its distance dependence has been overlooked [3,4,5,6]. Thus, the HRTF has predominantly been measured at different points on an imaginary spherical surface with the listener at the center. However, due to the nonlinear distance characteristics of hearing, the HRTF in the entire spherical volume has attracted much attention from the research community. A series of works on close-range sound perception and the associated nonlinear complexities was reported in [7, 8]. Incorporating proximity region auditory effects might improve the plausibility of virtual auditory scenes when recreating moving and close-range auditory sources [9, 10]. The synthesis of the distance-dependent HRTF can play a vital role here. In this article, the following definitions are used to denote the distance-dependent spherical regions with the listener at the center: (i) proximity region: radius within one meter, (ii) distal region: radius more than one meter, (iii) near-field region: radius within one wavelength, and (iv) far-field region: radius more than one wavelength. Note that the proximity and distal regions do not depend on the sound source frequency, whereas the near-field and far-field regions do.
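The four region definitions can be made concrete with a short sketch. The function name `classify_regions` and the speed of sound of 343 m/s are assumptions introduced only for illustration; they do not come from the article.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def classify_regions(distance_m, frequency_hz, c=SPEED_OF_SOUND):
    """Return (distance-based label, wavelength-based label) for a source position."""
    wavelength = c / frequency_hz
    # Proximity/distal: fixed 1 m radius, independent of frequency.
    distance_region = "proximity" if distance_m <= 1.0 else "distal"
    # Near/far field: radius compared with one wavelength at this frequency.
    field_region = "near-field" if distance_m <= wavelength else "far-field"
    return distance_region, field_region


if __name__ == "__main__":
    # The same 0.5 m source is near-field at 200 Hz but far-field at 8 kHz.
    for f in (200.0, 1000.0, 8000.0):
        print(f, classify_regions(0.5, f))
```

The sketch shows why the two pairs of labels are independent: a source at a fixed distance keeps its proximity or distal label while its near-field or far-field label changes with frequency.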

Conventionally, the HRTF is measured experimentally [4], and the procedures are tedious and require considerable human effort. Most of the available HRTF databases are distance-independent, owing to the substantial experimental cost of measuring high-resolution HRTFs in the entire spherical volume around the listener [3,4,5,6, 11, 12]. However, the HRTF must be measured in both the proximity and distal regions to understand and incorporate its distance dependency [13]. When measuring the distance-dependent HRTF, the massive number of measuring positions and the poor directivity of loudspeakers at close range aggravate the intricacy of the experimental procedure. Some experimental measurements of the distance-dependent HRTF have been reported using tiny sound sources such as micro-dodecahedral loudspeakers, spark noise, or spark gaps [14,15,16,17]. However, in all these works, the process was very demanding, expensive, and required hours of human endurance. Computational solutions are feasible substitutes for measuring the high-resolution distance-dependent HRTF. Numerical tools have become convenient given the advances in data processing power and solution methods over the past decades. An early computational attempt to solve the HRTF problem at a fixed distance was reported using the boundary element method (BEM) on scanned geometric models [18, 19]. Poor speed and insufficient accuracy at very high frequencies were the main disadvantages of these simulations. Irregular mathematical errors were also reported in BEM solutions, and additional algorithms were incorporated to eliminate them [20]. The problem becomes harder with larger geometrical models at higher frequencies. Moreover, implementing BEM to assess the HRTF in the whole spherical volume can be even more complicated due to the absence of local interconnectivity of elements in BEM [21].

This study proposes two computational solutions that combine finite element tools with exterior acoustic techniques to compute high-quality distance-dependent HRTFs. The major challenge associated with the finite element method (FEM) is the large computational resource needed to mesh the whole acoustic domain at high frequencies while maintaining discretization requirements [22]. When computing the distance-dependent HRTF, a massive finite-domain volume must be meshed to accommodate the entire space around the listener. In the proposed simulation tools, the finite domain volume is limited and combined with exterior acoustic domains using either infinite elements (IFE) or non-reflective boundary conditions (NRBCs) implemented with absorbing layers. The truncated bounded region is meshed adaptively with frequency for optimal computational performance. The evaluated high-resolution HRTFs are compared with experimental data and BEM solutions. The proposed methods also allow bioacoustic properties to be incorporated easily into the computational models of the HRTF. Appropriate middle ear attributes, such as ear canal absorption coefficients, make the computed HRTF more congruent with the experimental data.

2 Hybrid Computational Methods: Background, Theory, and Formulation

As discussed in the introduction, the HRTF has generally been assessed on a fixed boundary surface around the listener. Hence, the boundary element method (BEM) is the conventional simulation technique for evaluating the HRTF, as its formulation usually involves discretizing only the domain boundaries [23]. Thus, fewer equations are involved in BEM due to the reduction of the problem’s dimensionality. However, the inherent nonlocal connectivity of elements in BEM formulations usually gives less structured and fully populated matrices, which reduce the expected efficiency [21]. Initially, regular BEM was employed for HRTF computation with massive computational time, even up to 50 days for narrow bandwidths [18]. Later, the speed of BEM was increased to a certain extent by incorporating fast multipole methods [24,25,26]. However, its limitation in addressing acoustic problems that require the evaluation of volumetric fields, such as the distance-dependent HRTF, has not been tackled well. The computational cost and storage requirements of BEM for sizeable exterior domain problems are enormous, and BEM sporadically provides non-unique solutions at some frequencies [23]. Moreover, incorporating appropriate acoustic attributes of the listener's ear canal, hair, skin, and clothes into the HRTF is complicated with BEM owing to the lack of local interconnectivity. Hence, extending the boundary integral solution to the whole acoustic volume, which is required for evaluating the distance-dependent HRTF, is not straightforward.

On the other hand, the finite element method (FEM) can be effectively implemented for large volumetric fields as a common algebraic eigenvalue problem. The numerical advantage of having sparse matrices substantially accelerates the computations and reduces the memory requirement in FEM. Due to this computational edge, FEM is definitively competitive with BEM even with its higher-order formulations. Additionally, FEM can provide more accurate solutions due to inherent local connectivity [27]. The advantages of FEM over BEM in these predicaments have been well described in the literature [21, 27].

However, efficiently modeling a large acoustic volume in an unbounded space is a critical challenge in FEM. Exterior acoustic techniques must be combined with FEM to handle this. When transforming the FEM formulation from an unbounded space to a bounded domain, it is vital to prevent spurious reflections at the boundaries; otherwise, they may pollute the whole solution. Infinite elements and non-reflective absorbing layers can be employed at the boundary of the finite region (bounded domain) for this purpose. In the infinite element method, the solution in the exterior domain is given directly by the infinite-element nodes at the finite region boundary through their shape functions. The finite meshes can also be truncated by non-reflective absorbing layers that satisfy the Sommerfeld condition. In that case, far-field estimation techniques such as the Ffowcs Williams-Hawkings (FWH) method are subsequently applied to estimate the HRTF in the far-field. Brief theoretical formulations of the proposed techniques are described in the following sections.

2.1 Domains for Finite Element Formulation

Consider a scattering object \(\mathcal{H}\) of arbitrary shape with surface \({S}_{\mathcal{H}}\) in an unbounded domain \(\mathcal{U}\), as shown in Fig. 1a. The problem is governed by the Helmholtz differential equation [27, 28]. Additionally, the Sommerfeld radiation condition must be satisfied, which means that there are only outgoing waves at infinity. The problem can be formulated as,

$$ \nabla^{{2}} p + k^{2} p = 0\;{\text{in}}\;{\mathcal{U}} = {\text{R}}^{3} {\backslash \mathcal{H}} $$
(1)
$$ \frac{\partial p}{{\partial n}} + \beta p = g\;{\text{on}}\;S_{{\mathcal{H}}} $$
(2)
$$ \mathop {\lim }\limits_{r \to \infty } r\left( {\frac{\partial p}{{\partial r}} - ikp} \right) = 0 $$
(3)

where \(k\) is the wavenumber; \(r= \| x\| \) is the radial distance of the position \(x\) from the sound source; \(p\) is the acoustic pressure with time dependence \({\mathrm{e}}^{-i\omega t}\), where \(\omega \) is the angular frequency; \(\frac{\partial p}{{\partial n}}: = \nabla p \cdot n\), where \(n\) is the outward unit normal to \({S}_{\mathcal{H}}\); and \(\beta \left(x;k\right), g\left(x;k\right)\) are frequency-dependent complex boundary information functions [27].
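As a quick consistency check on (1) and (3), consider the free-field monopole, a standard outgoing solution used here purely as an illustration. The amplitude \(A\) is an arbitrary constant introduced for this example.

$$ p\left( r \right) = A\frac{{e^{ikr} }}{r},\quad \frac{\partial p}{{\partial r}} = \left( {ik - \frac{1}{r}} \right)p,\quad r\left( {\frac{\partial p}{{\partial r}} - ikp} \right) = - p \to 0\;{\text{as}}\;r \to \infty $$

For \(r > 0\), this field also satisfies (1), since \(\nabla^{2} p = \frac{1}{r}\frac{{\partial^{2} \left( {rp} \right)}}{{\partial r^{2} }} = - k^{2} p\). By contrast, the incoming wave \(A{e}^{-ikr}/r\) gives \(r\left( {\partial p/\partial r - ikp} \right) = - \left( {1 + 2ikr} \right)p\), whose magnitude tends to \(2k\left| A \right|\) rather than zero, so condition (3) rejects it.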

Fig. 1 Finite element formulation: (a) unbounded space and (b) bounded domain

Applying FEM directly to a huge unbounded volume requires impractical computational capacity. To limit the meshed volume, and consequently curtail the meshing load and computational cost, the unbounded region must be divided into a bounded region (\(\mathcal{F}\)) and an external region (\(\mathcal{E}\)).

$${\mathcal{U}} = {\mathcal{F}} + \mathcal{E} .$$
(4)

The bounded region is modeled with finite elements, and the bounded and external regions are separated by an artificial boundary \({S}_{\mathcal{F}}\), as shown in Fig. 1b. The solution in the finite region (\(\mathcal{F}\)) can be evaluated using FEM, and the solution at different points in the external region (\(\mathcal{E}\)) is estimated through the far-field expansion of the solution on the surface \({S}_{\mathcal{F}}\).

2.2 Boundary Formulation

There are different approaches to modeling the artificial boundary \({S}_{\mathcal{F}}\) and the exterior region (\(\mathcal{E}\)). Tools for truncating the finite region, such as infinite elements and non-reflecting boundary conditions using absorbing layers, are illustrated in [27]. These techniques, in combination with finite elements for the bounded volume, can be formulated as described in the following subsections.

2.2.1 Finite Elements with Infinite Elements Method (FIEM)

In the infinite element method, a single convex surface \({S}_{\mathcal{F}}\) is placed at the boundary of the finite region (\(\mathcal{F}\)), with outer layers extending to infinity, as shown in Fig. 2. The infinite element method for wave problems was established in [29]. The external domain (\(\mathcal{E}\)) is discretized using a collection of infinite elements. Each node on the boundary is attached to an infinite element in \(\mathcal{E}\). The methods for matching the regions \(\mathcal{F}\) and \(\mathcal{E}\) are well explained in [30]. The finite region field is evaluated using the finite element method and induced on the surface \({S}_{\mathcal{F}}\). Assume that the sound pressure \(p\left(x,k\right)\) satisfies the Helmholtz equation (1) with boundary conditions (5) and (6).

$$ \nabla p \cdot {\text{n}}_{{\mathcal{F}}} = - \rho a\left( {\theta ,\phi } \right). $$
(5)

Fig. 2 Hybrid method 1: finite elements in combination with infinite elements (FIEM)

Equation (5) is the kinematic condition on \(\mathcal{F}\) for a steady time-harmonic normal acceleration, \(a\left(\theta ,\phi \right){e}^{-i\omega t}\).

$$ \nabla p \cdot n_{E} = ikp + \eta $$
(6)

where \(\eta = \mathrm{O}\left(1/{X}^{2}\right)\) as \(X\) approaches infinity with Sommerfeld radiation condition.

A trial solution can be developed using variational formulation and discretization in the \(\mathcal{E}\) domain, as given by (7),

$$ p\left( {x,k} \right) = \mathop \sum \limits_{\mu = 1}^{m} \mathop \sum \limits_{\nu = 1}^{n} q_{\mu \nu } f_{\nu } \left( {r,k} \right)g_{\mu } \left( {\theta ,\phi } \right) $$
(7)

where \({g}_{\mu }\left(\theta ,\phi \right)\) is the global shape function of the finite region FEM solution on the surface \({S}_{\mathcal{F}}\); \({f}_{\nu }\left(r,k\right)\) is the radial interpolation function; and \({q}_{\mu \nu }\) gives the nodal coefficient values of the pressure at the corresponding nodes, i.e., the \(\nu \)th node on a radial path extending from the \(\mu \)th node on the surface \({S}_{\mathcal{F}}\), as shown in Fig. 2 [29].

The infinite element formulation is a function of material properties, interpolation order, and the coordinate system. The sufficient convergence condition for IFE to work adequately is that the finite domain \(\mathcal{F}\) and sound sources should be enclosed within the surface \({S}_{\mathcal{F}}\). Interpolation order is also an important criterion for accurate simulations.
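The structure of the trial solution (7) can be sketched as a plain double sum. The radial function used below, \(e^{ik(r-a)}(a/r)^{\nu}\) with \(a\) the radius of \({S}_{\mathcal{F}}\), is an assumed multipole-type form chosen only for illustration; the article does not state the exact \({f}_{\nu}\) used, and all function names are hypothetical.

```python
import cmath


def radial_basis(nu, r, k, a):
    """Assumed outgoing radial function: exp(ik(r - a)) * (a / r)**nu, for nu >= 1."""
    return cmath.exp(1j * k * (r - a)) * (a / r) ** nu


def trial_pressure(q, g_values, r, k, a):
    """Evaluate sum_mu sum_nu q[mu][nu] * f_nu(r, k) * g_mu(theta, phi).

    q        : m x n nested list of nodal coefficients q_mu_nu
    g_values : list of m angular shape-function values g_mu at one direction
    r, k, a  : evaluation radius, wavenumber, radius of the truncation surface
    """
    total = 0j
    for mu, g_mu in enumerate(g_values):
        for nu, q_mu_nu in enumerate(q[mu], start=1):
            total += q_mu_nu * radial_basis(nu, r, k, a) * g_mu
    return total


if __name__ == "__main__":
    # One surface node, two radial terms: the pressure magnitude halves from r = a to r = 2a.
    print(abs(trial_pressure([[1.0, 0.0]], [1.0], 2.0, 5.0, 1.0)))
```

With this assumed basis, raising the interpolation order adds higher powers of \(a/r\), which is one way to see why the order affects accuracy close to the truncation surface.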

2.2.2 Finite Elements with Absorbing Layers Method (FALM)

Another technique for transforming the unbounded problem into a bounded problem is to implement non-reflective boundary conditions (NRBCs) through absorbing layers. Perfectly matched layers (PMLs) belong to the family of absorbing layers with NRBCs. The PML was first developed in electromagnetics and later widely adapted for acoustic problems [31,32,33,34,35]. In theory, PMLs do not reflect any wave regardless of its angle of incidence, which gives them an extra edge over infinite elements. This reflectionless characteristic may provide better accuracy in PML-based methods. Earlier works have reported that PML may provide satisfactory results even if the finite region is truncated within the near-field region, and that it can accommodate non-homogeneous situations [34].

In PML, an exterior layer (\(\mathcal{L}\)) of finite thickness has been introduced at an external boundary \({S}_{\mathcal{F}}\) of the finite domain (\(\mathcal{F}\)), as shown in Fig. 3. Consequently, the waves are truncated by the finite-absorbing layer using complex variable change, also known as the stretching process. The distance and direction of the stretching should be computed for modeling the stretching function. The PML can be defined in all coordinate systems. The wave equation has to be modified with absorbing material properties to implement PML. A detailed formulation can be found in the literature [27, 32, 35], and the wave equation can be revised as,

$$ \nabla \cdot \left( {D\nabla p} \right) + k^{2} sp = 0\; {\text{in}}\; {\mathcal{L}} $$
(8)
$$ s = s_{1} s_{2} s_{3} $$
(9)
$$ D = \left( {\begin{array}{*{20}c} {s_{2} s_{3} /s_{1} } & 0 & 0 \\ 0 & {s_{1} s_{3} /s_{2} } & 0 \\ 0 & 0 & {s_{1} s_{2} /s_{3} } \\ \end{array} } \right) $$
(10)

where \(D\) is a complex-valued material tensor with coefficients \({s}_{i}\left({x}_{i}\right)=1-i{\sigma }_{i}/k\), and \({\sigma }_{i}\left({x}_{i}\right)\), \(i = 1, 2, 3\), are absorption functions.
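The coefficients in (8) to (10) can be evaluated directly, as in the following minimal sketch. The quadratic absorption profile and all function names are assumptions made for illustration; the article does not specify the profile used in the simulations.

```python
def stretch(sigma, k):
    """Complex stretching coefficient s_i = 1 - i * sigma_i / k."""
    return 1 - 1j * sigma / k


def quadratic_sigma(depth, thickness, sigma_max):
    """Assumed profile: zero at the PML interface, sigma_max at the outer edge."""
    if depth <= 0.0:
        return 0.0
    return sigma_max * (min(depth, thickness) / thickness) ** 2


def pml_coefficients(sigmas, k):
    """Return s of Eq. (9) and the diagonal of D of Eq. (10) for three sigma values."""
    s1, s2, s3 = (stretch(sig, k) for sig in sigmas)
    s = s1 * s2 * s3
    d_diag = (s2 * s3 / s1, s1 * s3 / s2, s1 * s2 / s3)
    return s, d_diag


if __name__ == "__main__":
    # With zero absorption, s = 1 and D is the identity, so Eq. (8) reduces to Eq. (1).
    print(pml_coefficients((0.0, 0.0, 0.0), 10.0))
    # Absorption along x1 only, halfway into a 0.1 m thick layer.
    print(pml_coefficients((quadratic_sigma(0.05, 0.1, 20.0), 0.0, 0.0), 10.0))
```

The zero-absorption case is a useful sanity check: the layer equation must coincide with the Helmholtz equation at the interface, so that the layer is matched to the finite region.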

Fig. 3 Hybrid method 2: finite elements in combination with non-reflecting boundary conditions using absorbing layers (FALM)

Conventionally, the value of the absorption function gradually increases inside the layer \(\mathcal{L}\) in the outward direction. The PML equation and absorption functions are fully compatible with finite element data structures and can be easily incorporated into FEM tools. In addition to PML, field values in the external region must be estimated using near-to-far-field estimation techniques; in this work, the Ffowcs Williams-Hawkings (FWH) technique is employed.

The FWH technique can be applied to predict the far-field pressure generated by the distributed volume source induced by the finite element domain. It can be considered an extension of the Kirchhoff problem, whose essential character is finding an expression for the wavefield from given surface boundary conditions. The theoretical concept of FWH is well described in the literature [36]. The fundamental FWH formulation carries an impermeability condition: waves should not pass through the surface on which the FWH equation is applied. However, if the equation is extended by relaxing the impermeability of the surface, the FWH equations can be used to estimate the far-field degrees of freedom (DoF) from the near-field solution [37, 38]. In that case, the FWH formulation can be applied to an arbitrary, imaginary mathematical surface that divides the domain into the near-field and the far-field. The field value on this imaginary surface is then the only parameter required for the far-field calculations. A complete derivation of the FWH equation used for near-field to far-field estimation can be found in [37,38,39,40], and the formulation can be given as,

$$ { \boxdot }^{2} p^{\prime} = \left( {\partial /\partial t} \right)\left\{ {\left[ {\rho u_{n} - \left( {\rho - \rho_{0} } \right)v_{n} } \right]\delta \left( f \right)} \right\} $$
(11)
$$ - \left( {\partial /\partial x_{i} } \right)\left\{ {\left[ {\rho \left( {u_{n} - v_{n} } \right)u_{i} + pn_{i} } \right]\delta \left( f \right)} \right\} + \left( {\overline{\partial }^{2} /\partial x_{i} \partial x_{j} } \right)\left[ {T_{ij} {\text{H}}\left( f \right)} \right] $$
(12)

where the wave operator is \({ \boxdot }^{2} = \frac{1}{c^{2}}\frac{\partial^{2}}{\partial t^{2}} - \nabla^{2}\) and \(p^{\prime} = \left( {\rho - \rho_{0} } \right)c^{2}\); \(\rho\) is the density, \(\rho_{0}\) is the density of the undisturbed medium, \(c\) is the speed of sound, \({u}_{n}\) and \({v}_{n}\) are the normal velocities of the fluid and the surface, respectively, \({T}_{ij}\) is the Lighthill stress tensor, \(p\) is the surface pressure on \(f = 0\), \(\delta (f)\) is the Dirac delta function, \(H(f)\) is the Heaviside function, and \(\overline{\partial }/\partial \) denotes generalized differentiation, with \({p}^{\prime}\) keeping its physical value outside \(f=0\) and \({p}^{\prime}=0\) inside \(f=0\). The acoustic pressure at different locations in the entire external region (\(\mathcal{E}\)) can be evaluated from the sound pressure fields on the boundary surface (\({S}_{\mathcal{F}}\)) using the FWH formulation.

3 Implementation Using Adaptive Frequency Meshing

The two FEM-based methods (FIEM and FALM) are simulated to evaluate the HRTF, and their accuracy and performance are validated against other HRTF data. To validate the proposed methods in the distal region, FIEM and FALM are implemented using surface meshes of the human upper body provided by the SYMARE database (Sydney-York Morphological And Recording of Ears Database) [4, 41]. The measured and BEM-synthesized HRTFs, along with the corresponding scanned surface meshes of the subjects, are available in the SYMARE depository. However, due to the dearth of proximity region HRTF data with corresponding surface meshes, proximity region experimental HRTFs were measured using a manikin created for an Indian subject. These experimental measurements were then compared to the HRTFs computed by FIEM and FALM.

3.1 3D Modeling of Human Head and Experiment Preparations

The computational analysis of the HRTF requires high-quality surface meshes of the human upper body. The surface meshes of the subject’s head are generated using a handheld Artec 3D Space Spider scanner. A similar capturing process was reported in [42]. The maximum scanning rate of the scanner is 15 frames per second, and it produces a high-grade scan. The Artec scanner’s software tool uses a ‘global registration algorithm,’ which automatically aligns multiple 3D scan data to create an initial mesh [43]. Later, MeshLab [44], an open-source tool, is employed to improve the mesh without losing any essential details. Some inner parts of the pinna are very difficult to scan, and they are approximated. Mesh resolution and quality metrics are assessed using MeshLab tools [44]. The quality and resolution mapping tool, Per face quality, is used to visualize two standard mesh quality criteria [45]: (i) the ratio of the triangle’s area to the length of its largest side (as shown in Fig. 4a) and (ii) the ratio of the inscribed to the circumscribed ball radii (as shown in Fig. 4b). Both parameters indicate high quality, as illustrated in Fig. 4, and show that the mesh has the high resolution required for computational modeling. The scanned head model is first attached to a generalized torso mesh, as shown in Fig. 5a. Then, the combined model is used to compute the distance-dependent HRTF using FIEM and FALM.

Fig. 4 Quality mapping of mesh model: (a) ratio of the triangles’ area and length of largest side and (b) ratio of inscribed and circumscribed ball radii

Fig. 5 Experiment arrangements: (a) surface mesh created for numerical methods and (b) experimental setup in anechoic chamber

To measure the HRTF experimentally, a physical dummy model is 3D printed from the scanned surface mesh. The dummy is printed as two different parts: the ears in acrylonitrile butadiene styrene (ABS) and the remaining head in polystyrene foam. The two ears and the head are then attached to a generalized torso made of fiberglass, as shown in Fig. 5b.

A low-cost HRTF experiment setup was constructed, as shown in Fig. 5b. The measurements were carried out in a fully anechoic chamber (5 m × 5 m × 3 m) with a cutoff frequency of 200 Hz and a noise rejection ratio (with respect to the outside) of 65 dB. The reverberation time of the room (T60) at 1600 Hz was evaluated to be 250 ms. The speakers (Sony SRS-XB10/BC) were fixed at equal angular intervals (18°) on a rotatable arc. The distance from the listener to the audio source can be adjusted using arcs of different radii. The pre-polarized microphones (PCB Piezotronics 130F20 Preamplifiers) were placed at the ear canal openings of the dummy. The center of the head was aligned to the center of the arc, viz., the origin. The HRTF was measured using logarithmic sine sweep signals with frequencies ranging from 0.2 kHz to 20 kHz. A longer source signal (32 k samples) was chosen to improve the signal-to-noise ratio. Signal acquisition was performed with an NI 9234 sound and vibration input module at a sampling rate of 51,200 samples per second. The signal acquisition was repeated twelve times to reduce the noise level and increase consistency. A rectangular window of 32 k samples was applied to the measured signal. The HRTF was calculated using (13) from fast Fourier transforms with an FFT length (nfft) of 32,768. Sound pressure levels (SPLs) were taken at four different distances (\(r\) = 25 cm, 50 cm, 75 cm, and 100 cm) from the origin in various directions. The post-processing was performed using Matlab™. The SPL measured at the center when the dummy was absent was used as the reference measurement.

$$ {\text{HRTF}}\left( {\theta ,\phi ,r,\omega } \right) = \frac{{{\text{Sound Pressure}}_{{{\text{ear}}}} \left( {\theta ,\phi ,r,\omega } \right)}}{{{\text{Sound Pressure}}_{{{\text{ref}}}} \left( {\theta ,\phi ,r,\omega } \right)}}. $$
(13)
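The ratio in (13) can be sketched in a few lines. This is only an illustration of the spectral division, not the authors' Matlab post-processing: the function names are hypothetical, a naive discrete Fourier transform stands in for the FFT so that the block needs no external library, and the sampling rate is read as 51,200 samples per second.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform, adequate for short illustrative signals."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
            for f in range(n)]


def hrtf_from_recordings(ear_signal, ref_signal, eps=1e-12):
    """Eq. (13): ear spectrum divided by reference spectrum, bin by bin."""
    ear_spec = dft(ear_signal)
    ref_spec = dft(ref_signal)
    # Bins where the reference carries no energy are returned as None instead of dividing by ~0.
    return [e / r if abs(r) > eps else None for e, r in zip(ear_spec, ref_spec)]


def magnitude_db(h):
    """Magnitude of one complex HRTF bin in decibels."""
    return 20.0 * math.log10(abs(h))


def bin_spacing_hz(sample_rate_hz, nfft):
    """Frequency spacing between adjacent FFT bins."""
    return sample_rate_hz / nfft


if __name__ == "__main__":
    ref = [1.0] + [0.0] * 7        # reference: unit impulse
    ear = [0.0, 0.5] + [0.0] * 6   # ear: half amplitude, one sample later
    print([round(magnitude_db(h), 2) for h in hrtf_from_recordings(ear, ref)])
    print(bin_spacing_hz(51200, 32768))  # 1.5625 Hz for the quoted settings
```

For signals of the length quoted above, a real FFT routine such as `numpy.fft.rfft` would replace the naive transform; the division step stays the same.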

3.2 Numerical Calculations

The acoustic principle of reciprocity is used for the computational process. It asserts that the acoustic source and microphone locations in an HRTF measurement can be swapped; the proof of the theorem is established in the literature [46]. Hence, the acoustic source is placed near the ear canal, and sound pressures are evaluated at different locations in the spherical volume around the listener. This technique facilitates numerical computation by reducing the number of acoustic sources to one.

The element size is a significant concern in any numerical computation, and six elements have been chosen per wavelength in the FEM simulations. Without truncating finite domains, the implementation of FEM is almost impossible for audible range simulations. Above 5 kHz, FEM simulations require enormous computational resources and impractical time, as seen in Table 1. Therefore, the viable simulation methods, (i) finite elements with infinite elements method (FIEM) and (ii) finite elements with absorbing layers method (FALM), were performed for the 3D scanned surface mesh of Indian subject and SYMARE depository meshes [4, 41]. Although we could reduce the mesh volume of the finite domain using FIEM and FALM, the number of elements can still be huge at higher frequencies owing to the minimum volume of the meshes required to avoid truncation within the near-field.
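The six-elements-per-wavelength rule translates into element sizes as follows. The speed of sound \(c = 343\) m/s is an assumed value, and the element-count estimate assumes a fixed meshed volume filled with roughly uniform elements.

$$ h = \frac{\lambda }{6} = \frac{c}{6f},\quad h\left( {5\;{\text{kHz}}} \right) = \frac{343}{{6 \times 5000}} \approx 11.4\;{\text{mm}},\quad h\left( {20\;{\text{kHz}}} \right) = \frac{343}{{6 \times 20000}} \approx 2.9\;{\text{mm}} $$

$$ \frac{{N\left( {20\;{\text{kHz}}} \right)}}{{N\left( {5\;{\text{kHz}}} \right)}} \approx \left( {\frac{20}{5}} \right)^{3} = 64 $$

Under these assumptions, going from 5 kHz to 20 kHz in the same volume multiplies the element count by about 64, which illustrates why untruncated FEM becomes impractical at the upper end of the audible range.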

Table 1 Computation time of different finite element simulations

3.3 Adaptive Frequency Meshing

For accurate results, it is best to place the truncating boundary surface just outside the near-field of the sound source, so that all near-field effects and reflections from the listener's anthropometric features are captured. At higher frequencies, the mesh volume should be reduced as much as possible for faster simulation. In FALM, the presence of the PML mesh slightly raises the computational cost. Frequency-based adaptive meshing is an effective approach to reduce the volume of the finite meshes without trading off accuracy. To evaluate the distance-dependent HRTF over the entire audible range with limited computational resources, the mesh volume and element size must be optimized for each frequency band. Hence, the 0.2–20 kHz frequency spectrum is divided into different bands. The lowest frequency of each band decides the volume of the meshes. The optimal element size is determined by the highest frequency in each band. The mesh thickness, defined as the minimum radial distance between the exterior and interior boundaries of the mesh, is the major component in determining the volume of the adaptive finite and PML domains. A thickness of one wavelength has been employed as a rule of thumb for each frequency band. Adaptive frequency meshing of the PML and finite region in each frequency band is visualized in Fig. 6. As described above, both the mesh volume of the finite region and PML and the element size decrease as frequency increases (from band 1 to band \(n\)). FIEM is employed with an interpolation factor of five to evaluate radial functions, as discussed in Sect. 2.2.1.
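The two band rules (thickness from the lowest frequency, element size from the highest) can be sketched as a small planner. The band edges, the speed of sound of 343 m/s, and the function name `plan_bands` are hypothetical choices for illustration; the article does not list the bands actually used.

```python
SPEED_OF_SOUND = 343.0        # m/s, assumed
ELEMENTS_PER_WAVELENGTH = 6   # value stated in Sect. 3.2


def plan_bands(band_edges_hz, c=SPEED_OF_SOUND, n_per_wavelength=ELEMENTS_PER_WAVELENGTH):
    """Return mesh thickness and element size for each band between consecutive edges."""
    plans = []
    for f_low, f_high in zip(band_edges_hz[:-1], band_edges_hz[1:]):
        plans.append({
            "f_low_hz": f_low,
            "f_high_hz": f_high,
            # One wavelength at the lowest frequency of the band sets the thickness.
            "thickness_m": c / f_low,
            # The highest frequency of the band sets the element size.
            "element_size_m": c / (n_per_wavelength * f_high),
        })
    return plans


if __name__ == "__main__":
    # Hypothetical band edges spanning 0.2 kHz to 20 kHz.
    for plan in plan_bands([200.0, 1000.0, 5000.0, 20000.0]):
        print(plan)
```

Narrower bands keep the ratio of thickness to element size smaller within each band, at the price of more separate meshes and simulations.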

Fig. 6 Adaptive meshing of finite meshes in FALM: the inner mesh is the finite region and the outer mesh is the PML. The frequency increases from left to right (from band 1 to band n)

In the initial acoustic simulations, the human body is considered an acoustically rigid model (the absorption factor is zero). The monopole source is placed at the ear canal opening, and the sound pressure at different locations in the proximity and distal regions is evaluated using FIEM and FALM. The reference simulations are carried out by placing the source at the origin without the 3D model. Simulations are performed using Actran™ [47], and the results are later post-processed using Matlab™. To understand how the proposed methods depend on mesh quality, a reduced-quality surface mesh (with the maximum element size doubled) was created in MeshLab, and the simulations were repeated.

4 Results and Discussion

Spectral features of the HRTF, such as peaks and notches, are significant and are generally considered cues for median plane localization [48]. The relative position of frequency components is important when comparing various HRTF data [49]. To analyze the spectral distributions of different HRTFs effectively, three metrics, frequency scaling difference (FSD), spatial correlation metric (SCM), and spatial magnitude difference (SMD), were determined during the post-processing of the results. SCM is measured as the mean of the correlation of the frequency responses of two different functions over the whole spatial region. SMD is the difference in magnitude of the frequency responses of two different functions over the spatial domain. Consider two HRTF data sets, \({HRTF}^{(1)}\) and \({HRTF}^{(2)}\). The SCM and SMD of these data sets can be calculated using Eqs. (14) and (15), respectively.

$$ {\text{SCM}} = \frac{1}{N}\sum\limits_{j = 1}^{N} {\frac{1}{M}\sum\limits_{i = 1}^{M} {\frac{{\left( {{\text{HRTF}}_{i}^{\left( 1 \right)} - \overline{{{\text{HRTF}}^{\left( 1 \right)} }} } \right)\left( {{\text{HRTF}}_{i}^{\left( 2 \right)} - \overline{{{\text{HRTF}}^{\left( 2 \right)} }} } \right)}}{{\sigma_{{{\text{HRTF}}^{\left( 1 \right)} }} \sigma_{{{\text{HRTF}}^{\left( 2 \right)} }} }}} } $$
(14)
$$ {\text{SMD}} = \left| {{\text{HRTF}}^{\left( 1 \right)} } \right| - \left| {{\text{HRTF}}^{\left( 2 \right)} } \right| $$
(15)

where \(N\) is the number of spatial data points, \(M\) is the number of spectral data points, and \(\sigma \) is the standard deviation.
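Equations (14) and (15) can be sketched with the standard library alone. The data layout (one list of \(M\) spectral magnitudes per spatial point, collected in a list of \(N\) points) and the function names are assumptions made for this illustration.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def correlation(a, b):
    """Inner sum of Eq. (14): correlation using population standard deviations.

    Raises ZeroDivisionError if either spectrum is constant.
    """
    mean_a, mean_b = _mean(a), _mean(b)
    cov = _mean([(x - mean_a) * (y - mean_b) for x, y in zip(a, b)])
    std_a = math.sqrt(_mean([(x - mean_a) ** 2 for x in a]))
    std_b = math.sqrt(_mean([(y - mean_b) ** 2 for y in b]))
    return cov / (std_a * std_b)


def scm(hrtf1, hrtf2):
    """Eq. (14): spectral correlation averaged over the N spatial points."""
    return _mean([correlation(a, b) for a, b in zip(hrtf1, hrtf2)])


def smd(hrtf1, hrtf2):
    """Eq. (15): pointwise difference of magnitudes, |HRTF1| - |HRTF2|."""
    return [[abs(x) - abs(y) for x, y in zip(a, b)] for a, b in zip(hrtf1, hrtf2)]


if __name__ == "__main__":
    h1 = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]]
    h2 = [[2.0, 4.0, 6.0, 8.0], [8.0, 6.0, 4.0, 2.0]]
    print(scm(h1, h2))   # one point correlates fully, the other inversely
    print(smd(h1, h2))
```

Because each spectrum is normalized by its own standard deviation, SCM is insensitive to an overall gain offset between the two data sets, while SMD exposes exactly that offset.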

Frequency scaling difference is a measure of similarity between two HRTF spectra, and its measurement is well described in [49]. FSD value provides the amount of frequency scaling that must be applied to an HRTF to give the best spatial and spectral correlation with another HRTF. FSD value closer to 1.0 gives better correlation [49]. For example, an FSD value of 0.99 or 1.01 between two functions indicates the same level of correlation and would mean that they are better correlated than two other functions with an FSD value of 0.9 or 1.1.
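The idea behind a frequency scaling factor can be illustrated with a simplified grid search for a single direction. This is not the procedure of [49]: the correlation criterion, the linear interpolation, the candidate grid, and all function names are assumptions chosen to keep the sketch short.

```python
import math


def interp(freqs, values, f):
    """Linear interpolation on an ascending grid; returns None outside the grid."""
    if f < freqs[0] or f > freqs[-1]:
        return None
    for i in range(len(freqs) - 1):
        if freqs[i] <= f <= freqs[i + 1]:
            w = (f - freqs[i]) / (freqs[i + 1] - freqs[i])
            return (1.0 - w) * values[i] + w * values[i + 1]
    return None


def pearson(a, b):
    """Pearson correlation of two equally long sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def best_frequency_scale(freqs, spec1, spec2, candidates):
    """Return (scale, correlation) maximizing corr(spec1(f / scale), spec2(f))."""
    best_scale, best_corr = None, -2.0
    for s in candidates:
        pairs = [(interp(freqs, spec1, f / s), v2) for f, v2 in zip(freqs, spec2)]
        pairs = [(v1, v2) for v1, v2 in pairs if v1 is not None]
        if len(pairs) < 2:
            continue
        corr = pearson([p[0] for p in pairs], [p[1] for p in pairs])
        if corr > best_corr:
            best_scale, best_corr = s, corr
    return best_scale, best_corr


if __name__ == "__main__":
    freqs = [4000.0 + 50.0 * i for i in range(161)]
    notch = [-math.exp(-((f - 8000.0) / 500.0) ** 2) for f in freqs]          # notch at 8 kHz
    shifted = [-math.exp(-((f / 1.05 - 8000.0) / 500.0) ** 2) for f in freqs]  # notch 5% higher
    grid = [round(0.90 + 0.01 * i, 2) for i in range(21)]
    print(best_frequency_scale(freqs, notch, shifted, grid))
```

In this toy case the search recovers the factor by which the notch was moved, which mirrors how a value close to 1.0 signals that little scaling is needed to align two spectra.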

The azimuthal angles of 0°, − 90°, 180°, and 90° represent the acoustic sources in the front, to the right, in the back, and to the left, of the listener, respectively. The elevation angles of 0°, 90°, and − 90° represent the sources on the line-of-sight plane, at the top and the bottom of the listener, respectively. The HRTF was not measured in the region below − 45° elevation angle, and it was linearly interpolated for visualizations.
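The angle convention above can be checked by converting directions to Cartesian coordinates. The axis orientation (x to the front, y to the left, z up, listener at the origin) and the function name are assumptions for illustration; the article fixes only the meaning of the angles.

```python
import math


def direction_to_cartesian(azimuth_deg, elevation_deg, r=1.0):
    """Map (azimuth, elevation, distance) to (x, y, z) with x front, y left, z up."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    x = r * math.cos(el) * math.cos(az)
    y = r * math.cos(el) * math.sin(az)
    z = r * math.sin(el)
    return x, y, z


if __name__ == "__main__":
    # Front, right, back, left on the horizontal plane, then top.
    for az, el in ((0, 0), (-90, 0), (180, 0), (90, 0), (0, 90)):
        print(az, el, tuple(round(v, 3) for v in direction_to_cartesian(az, el)))
```

With these axes, an azimuth of − 90° lands on the negative y axis, i.e., to the listener's right, in agreement with the convention stated above.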

4.1 Comparison of the SYMARE Database Measurements with the Proposed Methods in the Distal Region

BEM-based and experimental HRTFs from the SYMARE database [4, 41] were compared with the HRTFs simulated by FIEM and FALM. In most spatial directions, the results were congruent and matched well with the BEM results, as shown in Fig. 7. The spectral shape of the experimental data also matches well with the computational results, but a difference in amplitude is observed. The computational methods adequately captured the distribution pattern of peaks and notches in the HRTF spectrum. The FSD was evaluated as 1.023, 1.026, and 1.032 for FALM, FIEM, and BEM, respectively, for subject-01 of the SYMARE database. This indicates slightly better congruity between the experimental data and the HRTFs calculated by the proposed methods than by BEM, mainly owing to the advantage of FEM in capturing near-field effects. Additionally, FALM and FIEM provided slightly better SCMs of 0.8768 and 0.8667, respectively, than BEM’s 0.8528. Similar trends were also observed for other subjects in the SYMARE depository. Good frequency correlation (0.85–0.9) between the experimental results and the numerical methods has also been achieved, as seen in Fig. 8.

Fig. 7 HRTFs comparison in distal region: HRTFs measured at 100 cm using different methods FALM, FIEM, Experiment measurement from SYMARE database (EXP) and BEM measurement from SYMARE database at (azimuthal angle, elevation angle): (a) 0°, 0°, (b) 180°, 0°, (c) − 90°, 0°, and (d) 90°, 0° (HRTF measured for the right ear of subject-01 in the SYMARE database)

Fig. 8

Frequency correlation: the frequency correlation of experimental HRTF with FALM, FIEM, and BEM

The numerical results showed more peaks at higher frequencies and higher amplitude across the whole spectrum than the experimental results. The overestimation of magnitude in all numerical results relative to SYMARE’s experimental HRTF (measured directly on human subjects) might be due to the acoustic impedance of the subject's skin, hair, and clothes in the experiments, as suggested in earlier studies [50]. The authors presume that acoustic reflections from the ear canal walls in the simulation are significant contributors to this difference.

To infer the contribution of middle ear reflections to the overestimated magnitude of the simulated data, the computations were repeated with specific ear canal properties incorporated. The simulations can exclude most of the sound reflections occurring inside the middle ear based on the following assumptions: (i) the middle ear absorbs sound pressure waves as vibrations during hearing, and (ii) the experimental HRTF is measured at the ear canal opening and hence does not include middle ear reflections. Based on this rationale, the inner part of the ear canal wall is modeled with a high absorption coefficient. A frequency-independent absorption factor of 0.8 was therefore assigned to the ear canal walls, while all other parts remained acoustically rigid in the simulations.
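As a rough illustration of what an absorption factor of 0.8 implies, the short calculation below assumes that the factor denotes the usual energy absorption coefficient α, i.e., the fraction of incident sound energy that is not reflected. This interpretation is an assumption; the text does not state how the factor is defined in the solver. Under that assumption, the magnitude of the pressure reflection coefficient R and the level of the reflected wave relative to the incident wave are:

```latex
\alpha = 1 - |R|^{2}
\quad\Rightarrow\quad
|R| = \sqrt{1 - \alpha} = \sqrt{1 - 0.8} \approx 0.447,
\qquad
10\log_{10}(1 - \alpha) = 10\log_{10}(0.2) \approx -7.0\ \text{dB}
```

In other words, under this reading each reflection at the ear canal wall returns about 20% of the incident energy, roughly 7 dB below the incident level, whereas an acoustically rigid wall (α = 0) would reflect all of it.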

The integration of the middle ear model improved the congruity between the spectral components of the simulated and experimental results, as shown in Fig. 9. The amplitude of the synthesized HRTF at mid-to-high frequencies is now at the same level as that of the experimental HRTF. The model also reduced the extra high-frequency peaks that did not appear in the experimental results. FSD and SCM also improved to 1.011 and 1.018 for FALM and FIEM with ear canal modeling. The SMD between the hybrid methods and the experimental HRTF shows a better match than the SMD between BEM and the experimental results, as shown in Fig. 10. For the hybrid methods, an SMD greater than 10 dB is present only at certain locations, compared with almost the entire spatial region for BEM.
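The kind of comparison summarized by the SMD can be sketched in a few lines. The example below is illustrative only: it assumes that the SMD at one frequency is the absolute difference, in dB, between the magnitudes of two HRTF sets sampled at the same directions, which may differ from the exact definition used in this article. The function names `spatial_magnitude_difference_db` and `fraction_above` are hypothetical, and the input values are made-up placeholders rather than data from the study.

```python
import numpy as np


def spatial_magnitude_difference_db(h_a, h_b, eps=1e-12):
    """Absolute magnitude difference in dB, direction by direction.

    h_a, h_b: HRTF values (real or complex) at one frequency, sampled at
    the same set of directions. This definition is an assumption made for
    illustration, not necessarily the one used in the article.
    """
    mag_a = np.maximum(np.abs(np.asarray(h_a)), eps)  # avoid log of zero
    mag_b = np.maximum(np.abs(np.asarray(h_b)), eps)
    return np.abs(20.0 * np.log10(mag_a / mag_b))


def fraction_above(smd_db, threshold_db=10.0):
    """Fraction of directions whose difference exceeds the threshold."""
    return float(np.mean(np.asarray(smd_db) > threshold_db))


if __name__ == "__main__":
    # Placeholder magnitudes at three directions (not measured data).
    simulated = [1.0, 10.0, 2.0]
    measured = [1.0, 1.0, 1.0]
    smd = spatial_magnitude_difference_db(simulated, measured)
    print(smd, fraction_above(smd))
```

A summary such as `fraction_above` gives a single number for how much of the sphere exceeds a 10 dB difference, which is one way to express the contrast between "certain locations" and "almost the entire spatial region".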

Fig. 9

Comparison of ear canal modeling: HRTFs for the right ear at (azimuthal angle, elevation angle): a 0°, 0°, b 180°, 0°, c −90°, 0°, and d 90°, 0°

Fig. 10

Spatial magnitude difference: SMD evaluated at 8 kHz and 12 kHz

Spatial frequency response surfaces (SFRS) [51] were created for the computational and experimental HRTFs, as shown in Fig. 11. FALM and FIEM show good agreement with the experimental results, especially below 12 kHz. At higher frequencies, especially at contralateral locations, the experimental data themselves have higher noise levels, which might contribute to the differences in the SFRS. Most of the peaks and notches are well captured by FALM and FIEM. The magnitude level is also in good agreement with BEM, owing to middle ear modeling. This implies that these methods can be used as accurate substitutes for HRTF experiments. Still, there is scope for improvement in FIEM and FALM. Accounting for the bioacoustic properties of skin, hair, etc., can help reduce the differences between the experimental and computational results. The magnitude difference at lower frequencies, as shown in Figs. 9 and 11, is mainly due to the limited bioacoustic attributes of the computational models [50]. In a real-life scenario, a fraction of the sound waves passes through the skull and mouth; this is also not considered in the numerical analysis here. Additionally, the middle and inner ears are highly sophisticated biological structures. They require meticulous computational models with attributes such as eardrum inclination, a frequency-dependent absorption factor, and the presence of ear wax, in contrast to the simple model used in this work, which is based on the single attribute of a frequency-independent absorption factor. Integrating these factors would make the computational methods more accurate and better replacements for experimental methods. BEM tools usually do not assure local connectivity of elements and also require complex modeling procedures to add intermediate bioacoustic parts between the sound source and the surface on which the boundary integrals are evaluated [21, 50]. In contrast, the proposed methods offer greater scope and convenience for bioacoustic modeling because complex connected parts are much easier to model with the locally interconnected computational elements employed in FEM [21].

Fig. 11

SFRS comparison: spatial frequency response surfaces of the right ear HRTFs of subject-01 from the SYMARE database

Subjective listening analysis is more appropriate for comparing the spatial hearing accuracy of different HRTFs. However, audio perception studies are outside the scope of the present work, and the HRTF comparisons are limited to analytical comparisons of spectral features, as in [49]. The results of various psychophysical experiments [48, 52] show that the frequency distribution of peaks and sharp notches in the HRTF spectra plays a significant role as a localization cue. Hence, it is meaningful to examine different HRTFs analytically and compare their spectral elements. In the next phase of validation, experimental HRTFs in the proximity region are compared with the corresponding simulation results.

4.2 Comparison of the Proposed Methods with Experimental Measurements in the Proximity Region

The proximity region HRTFs of the Indian subject simulated using FALM and FIEM are compared with experimental results. As mentioned earlier, there are limitations in conducting accurate proximity region acoustic experiments. Factors such as the directivity of the speakers and poor microphone responses at higher frequencies influence the experimental results. Nevertheless, as shown in Fig. 12, the spectral features of the experimental and computational HRTFs agree satisfactorily in the proximity region, especially below 12 kHz. The first peak and notch are closely aligned, especially on the ipsilateral side. The FALM results are also in better agreement with the experimental results than the FIEM results, especially at higher frequencies. As mentioned earlier, PML performs better in eliminating unwanted reflections and in accommodating the near-field effects of the sound source. These factors contribute to the better accuracy of FALM, especially in the proximity region.

Fig. 12

Proximity region comparison of HRTFs: the HRTFs were calculated for the manikin using FALM, FIEM, and the experimental measurement method (EXP) at (azimuthal angle, elevation angle, distance from the center of the listener’s head): a 0°, 0°, 100 cm, b 0°, 0°, 75 cm, c 0°, 0°, 50 cm, d 0°, 0°, 25 cm, e 180°, 0°, 100 cm, f 180°, 0°, 75 cm, g 180°, 0°, 50 cm, h 180°, 0°, 25 cm, i −90°, 0°, 100 cm, j −90°, 0°, 75 cm, k −90°, 0°, 50 cm, l −90°, 0°, 25 cm, m 90°, 0°, 100 cm, n 90°, 0°, 75 cm, o 90°, 0°, 50 cm, and p 90°, 0°, 25 cm

4.3 Computational Time

A major advantage of FIEM and FALM over BEM is high-resolution volumetric HRTF computation at reduced computational cost. Initially, BEM methods reported a computational time of 50 h per frequency [18]. Later, with the use of fast multipoles, the simulation time was reduced to 5 h [24]. In comparison, the FIEM simulation took nearly 2 h and the FALM simulation took 3 h with 68 frequencies. Without adaptive meshing, the simulations could perhaps take many weeks to converge. The authors believe that comparing computational speeds across different systems and surface meshes is not meaningful. Nevertheless, both proposed methods could calculate the whole-region HRTF in a time comparable to that which BEM took to calculate the HRTF on a spherical surface alone. In the future, with cloud computing and parallel processing capabilities, simulation run times may fall further, to the order of several minutes.
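For a sense of scale, the reported totals can be turned into an average time per frequency. The arithmetic below assumes that both the FIEM and FALM runs covered the same 68 frequencies and that the reported durations (nearly 2 h and 3 h) are whole-run totals; the time per frequency is unlikely to be uniform in practice, so these are averages only.

```latex
\bar{t}_{\mathrm{FIEM}} \approx \frac{2 \times 60\ \text{min}}{68} \approx 1.8\ \text{min per frequency},
\qquad
\bar{t}_{\mathrm{FALM}} \approx \frac{3 \times 60\ \text{min}}{68} \approx 2.6\ \text{min per frequency}
```

These averages are specific to the hardware and meshes used in this work, so they are best read as an order-of-magnitude indication rather than a benchmark.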

4.4 Mesh Criteria

In the BEM-based HRTF approach, the field integrals are calculated on the head surface mesh, so accurate results require a high-resolution surface mesh. In FALM and FIEM, the head surface is not used for integral evaluation, which gives these methods a slight advantage: they can give good results with reduced-quality head surface meshes. When simulating with a reduced-quality mesh (maximum element size doubled), only the contralateral regions show significant spatial magnitude differences, as shown in Fig. 13. The errors are larger on the contralateral side than on the ipsilateral side, mainly because of head shadowing and the directivity of the pinna [4]. This is a considerable benefit because capturing a high-quality head surface mesh is a challenge in producing individualized HRTFs. The proposed methods can be applied with low-resolution meshes to produce acceptable results without a large loss of accuracy, which is one of their notable capabilities.

Fig. 13

Mesh quality comparison: SMD between high-quality and reduced-quality mesh simulations

At times, BEM produces critical frequency errors; hence, it may require a unique approach and collocation process for each head surface mesh [19, 20]. The proposed methods have not shown similar compatibility issues at mesh-dependent critical frequencies, owing to the mathematical properties of finite element methods. Hence, they can also be implemented much more easily for automated simulation because they do not require extensive pre-processing.

4.5 Spatial Resolution

The generation of high-resolution distance-dependent field data is another advantage of the proposed methods. The HRTF at 91,801 points within a spherical region of 1.25 m radius was calculated within four hours. This is almost a hundred times the number of calculation points in regular HRTF measurements. It means that 3D audio rendering tools using this HRTF do not need additional algorithms for the smooth rendering of moving and proximity region objects, as in [9]. Owing to the high-resolution whole-region data, the methods can contribute to studies on distance-dependent human hearing and on the proximity region effects of anthropometric features [53, 54].

5 Conclusion

This work demonstrates the successful implementation of two finite element-based numerical methods, viz., FALM and FIEM, for measuring distance-dependent HRTF with good accuracy and low computational resource requirements. The proposed methods also showed computational convenience in incorporating the absorption factor for the middle ear. Likewise, they can accommodate bone conduction in HRTF computation, provided that magnetic resonance imaging (MRI) is used to create internal models of the head. Additionally, integrating the acoustic properties of hair, skin, and clothes is also feasible in the proposed finite element-based methods.

FALM and FIEM can be constructively used to understand the various diffraction and reflection patterns of the human body in proximity regions and are hence convenient for hearing perception studies. The distance dependence of sound perception is a comparatively less explored area of research, and appropriate computational tools can be beneficial. Evaluating personalized distance-dependent HRTFs using these methods with less human effort and time can be very beneficial in perceptual spatial audio rendering technologies, especially for moving and proximity region sound sources. Another advantage of the proposed methods is their better performance with lower-quality surface meshes. As a result, photogrammetry tools can be easily incorporated into these simulation techniques to create surface meshes from 2D images of the subjects and measure personalized HRTFs with less effort. Besides their application in understanding the HRTFs of human adults, FALM and FIEM could be employed in studies pertaining to children or even other members of the animal kingdom, especially mammals and birds, where the experimental approach is impractical.

As virtual auditory applications spread their wings, the proposed simulation tools for evaluating distance-dependent HRTFs can be vital for the faster generation of personalized spatial audio. Along with fast-growing computational speed, cloud computing technologies, and advances in machine learning, simulation-based production of spatial audio can play a significant part in future extended reality applications.