Introduction

According to a widely accepted taxonomy (Shera and Guinan 1999), DPOAE components with different phase-gradient delays arise from two different generation mechanisms. The main component (distortion, or D) of the DPOAE at fDP = 2f1−f2 is generated nonlinearly by cubic intermodulation distortion of the f1 and f2 tones. A backward distortion product (DP) wave is generated in the region around the characteristic place of f2 (x2), where the overlap between the mechanical responses to the two primary tones is maximal. The forward wave generated by the same distortion mechanism acts as an intracochlear stimulus for the second component (reflection, or R), which arises from the fDP tonotopic region through linear coherent reflection from randomly distributed irregularities (roughness) along the basilar membrane (BM). The R component is therefore approximately equivalent to a stimulus-frequency OAE (SFOAE) or transient-evoked OAE (TEOAE) evoked by a stimulus generated within the cochlea. Because the two components have very different phase-gradient delays, they can be separated effectively by filtering in the time-frequency domain and studied separately, either experimentally or with cochlear models. In DPOAE acquisition paradigms where the primary frequency ratio r = f2/f1 is kept constant, the D and R components, generated by wave-fixed and place-fixed mechanisms, respectively, have constant and rapidly rotating phase in a scaling-symmetric cochlea. This study uses a fixed-f2 paradigm instead, which slightly alters this prediction, but the two components can still generally be separated in the time-frequency domain.

The two DPOAE components also differ in their sensitivity to r. For the distortion component D, the overlap between the f2 and f1 excitation patterns increases with decreasing r, so the DPOAE level would tend to increase as a greater number of DPOAE generators is recruited. However, this effect is progressively offset by the loss of coherence of the intracochlear backward DP wave, as its constituent wavelets have increasingly different phases. The reflection component R remains unaffected, as coherence is regained along the forward path to the place-fixed reflection site(s) (e.g., Shera and Guinan 2007). At commonly used stimulus levels, the typically dominant D component determines the basic bandpass shape of the overall DPOAE response as a function of r (or fDP) in recordings with fixed f2. This bandpass function has been proposed as a manifestation of cochlear tuning by some (Brown et al. 1992, 1993) and of a second cochlear filter by others (Allen 1980). The second-filter explanation gained some credence because it could also account for the presumed discrepancy between the tuning measured on the BM and that derived from neural thresholds in early animal experiments. Although more recent experiments have shown nearly identical BM and neural tuning, at least in the basal part of the cochlea (reviewed in Robles and Ruggero 2001), a second filter related to the mechanical action of other cochlear elements is still being considered, both as a possible reconciliation of BM and neural measurements and as a possible explanation for the results of fixed-f2 DPOAE experiments.

Interference-based explanations of the bandpass shape of the DPOAE response, which involve no second filter, have been proposed by several authors. Van Hengel (1996) reproduced the bandpass DPOAE response with a nonlinear model without a second filter, solved numerically in the time domain. Liu and Neely (2010) obtained similar results with their nonlinear cochlear model. Talmadge et al. (1998) used a perturbative approach to generate DPOAEs in a linear delayed-stiffness model and demonstrated, by removing the phase dependence of the basis functions in the generation integral of the backward DP wave, that the bandpass shape was generated by interference. The same mechanism, which highlights the importance of the relation between the spatial width of the OAE generation region and the local wavelength of the traveling wave, was also put forward by Shera and colleagues (Shera 2003; Shera and Guinan 2008). Interference between elementary wavelets from extended cochlear regions is also relevant for interpreting the Allen-Fahey experiments (Allen and Fahey 1992), in which DPOAE levels are reported as a function of ratio while the intracochlear DP level in the x2 region is held constant, as a method for estimating the gain function of the cochlear amplifier. In that case too, the variable size of the DP generation region plays a relevant role (e.g., Shera 2003).

In this report, experimental data and numerical simulations of DPOAEs are presented as functions of r, with the explicit purpose of studying the behavior of the unmixed D and R components. We propose an interpretation of the observed phenomenology based on the increasing loss of coherence of the backward DP waves as the size of the overlap region increases. Using a heuristic model of the DPOAE response, we derive quantitative relationships between cochlear tuning and the optimal value of r, at which the DPOAE response peaks, and, equivalently, between tuning and the width of the DPOAE vs. ratio function. These relationships are successfully tested with analytical and numerical model simulations and then applied to sample experimental data to obtain tuning estimates at different frequencies and stimulus levels.

Methods

Model

A linear transmission-line cochlear model based on that proposed by Zweig (1991) was used, in which the cochlea is schematized as a one-dimensional box model, with a tonotopic basilar membrane (BM) of surface density σbm separating two volumes filled with a homogeneous incompressible fluid. A nonlinear version (Sisto et al. 2015) of the same delayed-stiffness cochlear model was also implemented and solved in the time domain with an iterative technique. Both models were discretized into N = 2000 partitions and tuned with the exponential scale-invariant map \( \omega_{bm}(x)=\omega_0 e^{-k_{\omega}x} \), with ω0 = 2π·20,600 rad/s and kω = 1.38 cm−1, representing the tonotopic structure of the human cochlea.
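As a quick numerical check of this map, the following Python sketch converts between characteristic frequency and place using the ω0 and kω values quoted above. The helper names `cf_of_place` and `place_of_cf` are hypothetical, and x is measured in cm from the base; this is an illustration, not the model code used in the study.

```python
import math

# Exponential (scale-invariant) cochlear map, parameter values as quoted in the text.
OMEGA_0 = 2 * math.pi * 20_600.0  # rad/s, angular characteristic frequency at x = 0
K_OMEGA = 1.38                    # 1/cm, spatial rate of the log-frequency map


def cf_of_place(x_cm: float) -> float:
    """Characteristic frequency in Hz at distance x_cm (cm) from the base."""
    return OMEGA_0 * math.exp(-K_OMEGA * x_cm) / (2 * math.pi)


def place_of_cf(f_hz: float) -> float:
    """Place (cm from the base) whose characteristic frequency is f_hz."""
    return math.log(OMEGA_0 / (2 * math.pi * f_hz)) / K_OMEGA


# One octave corresponds to the same distance, ln(2)/k_omega, anywhere on the BM.
for f in (1000.0, 2000.0, 4000.0, 8000.0):
    print(f"{f:6.0f} Hz -> x = {place_of_cf(f):.3f} cm")
```

With these values, f2 = 2 kHz falls about 1.69 cm from the base, and every octave spans ln 2/kω ≈ 0.50 cm.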

The equations for the differential pressure p and the transverse BM displacement ξ are those of a one-dimensional tonotopic transmission line:

$$ \left\{\begin{array}{l}\frac{\partial^2 p(x,t)}{\partial x^2}=k_0^2\,\sigma_{bm}\,\ddot{\xi}(x,t)\\ \frac{p(x,t)}{\sigma_{bm}}=\ddot{\xi}(x,t)+\omega_{bm}(x)\,\delta\,\dot{\xi}(x,t)+\omega_{bm}^2(x)\,\xi(x,t)+\rho\,\omega_{bm}^2(x)\,\xi(x,t-\tau)\end{array}\right. $$
(1)

where δ, ρ, and τ are the parameters of the local BM admittance, k0 = 31 cm−1 (Talmadge et al. 1998) is a constant related to the average geometrical and density parameters of the cochlear cavity, and σbm is the BM surface density. The linear version of model (1) was solved in the frequency domain for different values of cochlear tuning. The admittance associated with the linear model (1) was assumed to have a double-pole structure near its peak, so that the cochlear partition impedance has the form:

$$ \mathrm{Z}\propto {\left(\mathrm{s}-\widehat{\mathrm{s}}\right)}^2, $$
(2)

where s = jω + Γ is the Laplace complex variable and \( \widehat{\mathrm{s}} \) is the position in the complex plane of the double pole of the admittance 1/Z. Following Shera (2001), Eq. (2) implies that both Z and its first derivative vanish at \( \widehat{\mathrm{s}} \):

$$ \mathrm{Z}(\widehat{\mathrm{s}})=0;\quad {\left.\frac{\mathrm{dZ}}{\mathrm{ds}}\right|}_{\mathrm{s}=\widehat{\mathrm{s}}}=0 $$
(3)

Because Z and dZ/ds are complex, these conditions yield four independent real equations, which fix four parameters: the three model parameters δ, ρ, and τ, and the distance α* of the pole from the imaginary axis. Following Shera's strategy, by imposing the conditions of Eq. (3) and setting the tuning value Q, the model parameters can be chosen so that the poles move approximately parallel to the real axis.

In the linear version of the model, which is solved in the frequency domain, Q is set to a constant value along the BM, and the four parameters of the model are computed accordingly, using an empirical relation between tuning and α* (Sisto et al. 2016). The generation of DP waves, which eventually produce DPOAEs at the cochlear base, was introduced as a perturbation, following the scheme outlined by Talmadge et al. (1998). In this scheme, the perturbative DP source function is proportional to the local value of the product of the square of the displacement at frequency f1 and the displacement at frequency f2. The source is integrated over the BM length to compute how the associated wavelets propagate back to the cochlear base and add as complex vectors (i.e., in a phase-dependent way) to generate the total DPOAE signal. Because the DP source term depends on the local displacement at both primary frequencies, the generation region coincides with the so-called overlap region around x(f2). No roughness perturbation was introduced, so as to focus on the D component of the response.

The nonlinear model was numerically solved in the time domain, using a fast algorithm described elsewhere (e.g., Sisto et al. 2015). The dynamic nonlinearity was introduced assuming a dependence of tuning on the instantaneous and local value of the BM velocity:

$$ Q_{nl}(\dot{\xi})=Q_{pass}+\left(Q_{act}-Q_{pass}\right)\left(1-\tanh\left(\frac{\dot{\xi}^2(x,t)}{\dot{\xi}_{sat}^2}\right)\right), $$
(4)

where Qpass and Qact are the constant tuning factors of the two asymptotic linear regimes approached by the nonlinear model at high and low stimulus levels, respectively, and \( \dot{\xi}_{sat} \) is the BM velocity threshold for the onset of nonlinear saturation.
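A minimal Python sketch of Eq. (4) follows, to show how the tanh term interpolates between the two linear regimes. The function name and the numerical values of Qpass, Qact, and the saturation velocity are hypothetical, chosen only for illustration; they are not the parameters of the nonlinear model used in this study.

```python
import math


def q_nonlinear(v: float, q_pass: float, q_act: float, v_sat: float) -> float:
    """Eq. (4): local tuning as a function of the instantaneous BM velocity v.

    For |v| << v_sat the tanh term vanishes and Q -> q_act (low levels);
    for |v| >> v_sat it saturates at 1 and Q -> q_pass (high levels).
    """
    return q_pass + (q_act - q_pass) * (1.0 - math.tanh(v * v / (v_sat * v_sat)))


# Illustrative (hypothetical) values, not the parameters used in the study.
Q_PASS, Q_ACT, V_SAT = 3.0, 8.0, 1.0
for v in (0.0, 0.5, 1.0, 2.0, 5.0):
    print(f"v/v_sat = {v:3.1f} -> Q = {q_nonlinear(v, Q_PASS, Q_ACT, V_SAT):.3f}")
```

Because only the square of the velocity enters, the effective tuning is the same for both signs of the BM velocity.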

Extracting Tuning Estimates From DPOAE Level vs. Ratio Curves

From an experimental viewpoint, one may distinguish between "direct" tuning estimates, in which the experimental output is the ratio between the frequency and the bandwidth of some bandpass response, and "indirect" estimates, in which tuning is inferred by measuring another physical quantity (e.g., the OAE group delay) and assuming the validity of some theoretical relation predicted by cochlear mechanics. Several OAE-based methods for measuring cochlear tuning have been proposed so far. To evaluate their performance, their ultimate goal needs to be specified. Behavioral tuning estimates, obtained with masking techniques, might be considered the "gold standard", as they directly interrogate perception. However, such estimates capture any and all filtering that may occur in the auditory system. Mechanical tuning in the cochlea, on the other hand, is best defined by the ratio between the frequency and the bandwidth of the BM response to a pure tone, measured "directly" as a function of level and frequency. Behavioral tuning estimates obtained in humans are related to, but not quantitatively identical to, the underlying tuning properties of the BM; the differences arise from critical differences in measurement methods, which evoke additional phenomena such as nonlinear suppression, and from the possible influence of additional filters in the auditory system.

Cochlear models provide a theoretical link between BM tuning and the properties of the complex OAE response (both amplitude and phase). Measurements of OAE suppression tuning curves (e.g., Gorga et al. 2011) perhaps most closely mimic the conditions under which behavioral tuning is measured with simultaneous masking. Although nonlinear suppression shapes the outcomes of both experiments, tuning estimates from behavioral and OAE suppression measurements cannot be readily compared, because no valid theoretical framework connects these measurements at different frequencies and stimulus levels. Indeed, the estimated tuning value depends on the level (relative to the maximum of the bandpass response) at which the width is measured, and the dependent variables (perceptual threshold and OAE level) are not necessarily equivalent or proportional to each other. Other OAE-based tuning estimates (Shera et al. 2002, 2010; Moleti and Sisto 2003, 2016; Sisto and Moleti 2007; Sisto et al. 2013) are based on measurements of the phase-gradient delay, τ(f), of OAEs generated by linear coherent reflection (SFOAEs, TEOAEs, or the reflection component of DPOAEs). In this case, cochlear models provide a relation between τ and BM tuning only, so these indirect methods do not even attempt to reproduce behavioral tuning estimates, unless BM tuning and behavioral tuning are assumed to be the same quantity. Moreover, because OAE delays (and therefore tuning estimates) have been observed to depend on stimulus level (e.g., Sisto et al. 2013; Moleti and Sisto 2016), τ-derived tuning estimates must be compared with behavioral tuning estimates at the same level of BM displacement.

Because the DPOAE level vs. ratio curve has a bandpass shape as a function of fDP, a direct estimate of tuning can be obtained from its width, e.g., by measuring the equivalent rectangular bandwidth (ERB) and computing the ratio between frequency and bandwidth. This procedure may be applied to any curve with a bandpass shape, but what it estimates is just the "tuning" of the bandpass curve itself, whose relation to the tuning properties of any underlying resonant system is necessarily model dependent. In the case of the DPOAE level vs. ratio curve, it has been questioned whether the bandpass shape reflects any filtering property of the auditory periphery (e.g., Van Hengel 1996; Talmadge et al. 1998; Shera 2003; Shera and Guinan 2008). Nevertheless, a theoretical relation between the parameters of the bandpass response and BM tuning may be used to obtain indirect estimates of BM tuning, either from the optimal ratio of the response or from the width of the DPOAE level vs. ratio curve. A simple model was developed to obtain a heuristic relation between these parameters and BM tuning. The proposed schematization is deliberately simple, chosen to highlight the physical explanation of the observed dependence of the optimal ratio and of the width of the DPOAE vs. r curve on tuning.
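For reference, the "direct" estimate can be sketched in a few lines of Python. The text does not specify whether the ERB is computed on amplitude or power, or on linear or dB values; the sketch assumes the usual power-based definition (area under the squared amplitude divided by its peak) applied to linear amplitudes, and the function name `direct_q_erb` and the synthetic curve are hypothetical.

```python
import numpy as np


def direct_q_erb(f_hz, amplitude):
    """Direct tuning estimate Q = f_peak / ERB of a sampled bandpass curve.

    The ERB is taken on power (amplitude squared): the area under the power
    curve divided by its peak value, integrated with the trapezoidal rule so
    that nonuniform frequency grids are also handled.
    """
    f = np.asarray(f_hz, dtype=float)
    power = np.asarray(amplitude, dtype=float) ** 2
    area = np.sum(0.5 * (power[1:] + power[:-1]) * np.diff(f))
    erb = area / power.max()
    f_peak = f[np.argmax(power)]
    return f_peak / erb, erb


# Synthetic Gaussian-shaped curve (hypothetical numbers), amplitude in linear units.
f_dp = np.linspace(400.0, 2000.0, 1601)
amp = np.exp(-((f_dp - 1200.0) ** 2) / (2 * 200.0 ** 2))
q_direct, erb = direct_q_erb(f_dp, amp)
print(f"ERB = {erb:.1f} Hz, Q_direct = {q_direct:.2f}")
```

For a Gaussian amplitude profile with standard deviation σ, this definition gives ERB = √π σ, which the example reproduces (about 354 Hz for σ = 200 Hz).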

Assuming, for simplicity, equal amplitude (and, therefore, equal BM tuning Q) for the two primary tones, the spatial width at half maximum of each resonant response peak is

$$ \Delta {x}_Q=\frac{1}{{\mathrm{k}}_{\upomega}\mathrm{Q}} $$
(5)

The width at half maximum of the superposition region between the two primaries can be found with elementary geometrical considerations (see Fig. 1) as

$$ \Delta {x}_{over}=\Delta {\mathrm{x}}_{\mathrm{Q}}-\frac{{\Delta \mathrm{x}}_{\mathrm{r}}}{2}, $$
(6)
Fig. 1

Schematic view, on a linear scale, of the dependence of the width of the overlap region on ratio and tuning. Because the wavelength of the 2f1−f2 traveling wave and the spatial extent of the peak region are determined by the mechanical tuning of the cochlea, whereas the width of the overlap region also depends on ratio, the main features of the DPOAE level vs. r curve (optimal ratio and width) depend on tuning in a way that can be predicted theoretically, although in a model-dependent form.

where Δxr represents the spatial distance between the tonotopic places of the primary tones, related to r:

$$ \Delta x_r=x(f_1)-x(f_2)=\frac{\ln r}{k_{\omega}}\approx\frac{r-1}{r\,k_{\omega}}=\frac{\beta}{k_{\omega}}, $$
(7)

and

$$ \beta=\frac{\Delta f}{f_2}=\frac{r-1}{r} $$
(8)
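The geometric quantities of Eqs. (5)–(8) can be tabulated with the short Python sketch below. Because the map is exponential, the exact distance between the f1 and f2 places is ln r/kω, and (r − 1)/r is its first-order approximation; the sketch exposes both so that the size of the approximation can be checked. The function names are hypothetical.

```python
import math

K_OMEGA = 1.38  # 1/cm, rate of the exponential cochlear map


def beta_of_ratio(r: float) -> float:
    """Eq. (8): beta = (f2 - f1)/f2 = (r - 1)/r, with r = f2/f1."""
    return (r - 1.0) / r


def delta_x_r(r: float, exact: bool = False) -> float:
    """Eq. (7): distance (cm) between the f1 and f2 places.

    exact=True returns ln(r)/k_omega, the exact value for the exponential map;
    the default returns beta/k_omega, the first-order form used in the text.
    """
    return math.log(r) / K_OMEGA if exact else beta_of_ratio(r) / K_OMEGA


def delta_x_q(q: float) -> float:
    """Eq. (5): half-maximum spatial width (cm) of a response peak with tuning q."""
    return 1.0 / (K_OMEGA * q)


def delta_x_over(r: float, q: float) -> float:
    """Eq. (6): half-maximum width (cm) of the f1/f2 overlap region."""
    return delta_x_q(q) - delta_x_r(r) / 2.0


for r in (1.1, 1.2, 1.3, 1.4):
    print(f"r = {r:.1f}: beta = {beta_of_ratio(r):.3f}, "
          f"dx_r = {delta_x_r(r):.4f} cm (exact {delta_x_r(r, exact=True):.4f}), "
          f"dx_over(Q=6) = {delta_x_over(r, 6.0):.4f} cm")
```

For r = 1.2, the first-order form underestimates the exact place separation by about 9 %.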

As the ratio decreases, significant destructive interference begins when the width of the DPOAE generation region exceeds half the fDP wavelength at the peak, because wavelets in phase opposition begin to cancel each other. We therefore predict that the maximum of the DPOAE vs. r curve should occur (approximately) at the ratio for which the width of the DPOAE generation region equals half the fDP wavelength at the peak. For a second-order filter function, the fDP wavelength at the peak is

$$ \widehat{\uplambda}=\frac{2\uppi}{\widehat{\mathrm{k}}}=\frac{2\uppi}{{\mathrm{k}}_0\sqrt{\mathrm{Q}}} $$
(9)

From Eqs. (5), (6), (7), and (9), one may estimate the ratio corresponding to the maximum DPOAE level:

$$ \frac{\widehat{\uplambda}}{2}=\Delta {\mathrm{x}}_{over}\Rightarrow \frac{\uppi}{{\mathrm{k}}_0\sqrt{\mathrm{Q}}}=\frac{1}{{\mathrm{k}}_{\upomega}\mathrm{Q}}-\frac{\beta_{\mathrm{max}}}{2{\mathrm{k}}_{\upomega}}, $$

hence,

$$ {\beta}_{\mathrm{max}}=\frac{2}{\mathrm{Q}}-2\uppi \frac{{\mathrm{k}}_{\omega }}{{\mathrm{k}}_0\sqrt{\mathrm{Q}}}=\frac{2}{\mathrm{Q}}-\frac{b_0}{\sqrt{\mathrm{Q}}}, $$
(10)

where, using the cochlear parameters of Talmadge et al. (1998),

$$ {b}_0=2\uppi \frac{{\mathrm{k}}_{\omega }}{{\mathrm{k}}_0}\cong 0.28 $$
(11)

Multiplying Eq. (10) by Q gives a quadratic equation in √Q, \( \beta_{\mathrm{max}}Q+b_0\sqrt{Q}-2=0 \); keeping its positive root, one obtains a tuning estimate based on the optimal ratio:

$$ {Q}_{peak}={\left(-\frac{b_0}{2{\beta}_{\mathrm{max}}}+\sqrt{{\left(\frac{b_0}{2{\beta}_{\mathrm{max}}}\right)}^2+\frac{2}{\beta_{\mathrm{max}}}}\;\right)}^2 $$
(12)
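The following Python sketch evaluates b0 from Eq. (11) and converts an optimal ratio into Q through Eq. (12), with a round-trip check against Eq. (10). The parameter values are those quoted in the text; the function names and the example r_max = 1.25 are hypothetical.

```python
import math

K_OMEGA = 1.38   # 1/cm, cochlear map
K_0 = 31.0       # 1/cm, Talmadge et al. (1998)
B0 = 2 * math.pi * K_OMEGA / K_0  # Eq. (11): about 0.28


def beta_max_of_q(q: float, b0: float = B0) -> float:
    """Eq. (10): optimal relative separation beta_max predicted for tuning q."""
    return 2.0 / q - b0 / math.sqrt(q)


def q_peak(beta_max: float, b0: float = B0) -> float:
    """Eq. (12): Q from the positive root sqrt(Q) of beta_max*Q + b0*sqrt(Q) - 2 = 0."""
    if beta_max <= 0:
        raise ValueError("beta_max must be positive (r_max > 1)")
    a = b0 / (2.0 * beta_max)
    return (-a + math.sqrt(a * a + 2.0 / beta_max)) ** 2


r_max = 1.25                      # example optimal ratio (hypothetical)
beta_max = (r_max - 1.0) / r_max  # Eq. (8)
q = q_peak(beta_max)
print(f"b0 = {B0:.4f}, Q_peak = {q:.2f}, check beta_max(Q) = {beta_max_of_q(q):.4f}")
```

With r_max = 1.25 (βmax = 0.2), the sketch gives b0 ≈ 0.280 and Q_peak ≈ 6.4; a smaller optimal ratio maps to sharper tuning.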

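A similar relationship may be found between the width of the DPOAE level vs. ratio curve and BM tuning by assuming that the minimum of the experimental curve on the low-ratio side corresponds to the condition in which the width of the generation region equals one full wavelength, \( \Delta x_{over}(\beta_{\mathrm{min}})=\widehat{\lambda} \). Subtracting the optimal-ratio condition \( \Delta x_{over}(\beta_{\mathrm{max}})=\widehat{\lambda}/2 \) and using Eqs. (6), (7), (9), and (11) gives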

$$ {\beta}_{\mathrm{max}}-{\beta}_{\mathrm{min}}=\frac{b_0}{\sqrt{\mathrm{Q}}}, $$
(13)

hence,

$$ {\mathrm{Q}}_{width}={\left(\frac{b_0}{\beta_{\mathrm{max}}-{\beta}_{\mathrm{min}}}\right)}^2. $$
(14)

In the analysis of experimental curves, a Gaussian fit to the data may help reduce fluctuations of the estimated tuning. In this case, the ratio corresponding to the minimum on the low-ratio side may be approximated by that at two standard deviations from the peak.
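At fixed f2, 1/r = 1 − β, so fDP = 2f1 − f2 = f2(1 − 2β): β is a linear function of fDP, and a Gaussian in fDP with standard deviation σf is also a Gaussian in β with σβ = σf/(2f2). The Python sketch below uses this to extract βmax and βmin = βmax − 2σβ and then applies Eqs. (12) and (14). It assumes the fit is done on linear amplitude (the text does not specify), and it substitutes a log-parabola (Caruana-style) fit for a general nonlinear least-squares Gaussian fit; the function names, the `floor` threshold, and the synthetic curve are hypothetical.

```python
import math
import numpy as np

B0 = 2 * math.pi * 1.38 / 31.0  # Eq. (11), k_omega = 1.38 /cm, k0 = 31 /cm


def beta_of_fdp(f_dp, f2):
    """beta = (r - 1)/r written in terms of f_DP = 2*f1 - f2 at fixed f2."""
    return 0.5 * (1.0 - np.asarray(f_dp, dtype=float) / f2)


def gaussian_beta_limits(f_dp, amplitude, f2, floor=0.1):
    """Gaussian fit of |D| vs beta; returns (beta_max, beta_max - 2*sigma).

    The Gaussian is fitted as a parabola to log(amplitude), using only points
    above `floor` times the maximum amplitude.
    """
    beta = beta_of_fdp(f_dp, f2)
    amp = np.asarray(amplitude, dtype=float)
    keep = amp > floor * amp.max()
    c2, c1, _ = np.polyfit(beta[keep], np.log(amp[keep]), 2)
    if c2 >= 0:
        raise ValueError("curve is not peaked; Gaussian fit failed")
    beta_max = -c1 / (2.0 * c2)
    sigma = math.sqrt(-1.0 / (2.0 * c2))
    return beta_max, beta_max - 2.0 * sigma  # low-ratio side = smaller beta


def q_peak(beta_max, b0=B0):
    a = b0 / (2.0 * beta_max)  # Eq. (12)
    return (-a + math.sqrt(a * a + 2.0 / beta_max)) ** 2


def q_width(beta_max, beta_min, b0=B0):
    return (b0 / (beta_max - beta_min)) ** 2  # Eq. (14)


# Synthetic D-component curve (hypothetical): f2 = 2 kHz, Gaussian in f_DP.
f2 = 2000.0
r = np.linspace(1.02, 1.5, 200)
f_dp = f2 * (2.0 / r - 1.0)
amp = np.exp(-((f_dp - 1200.0) ** 2) / (2 * 200.0 ** 2))
b_max, b_min = gaussian_beta_limits(f_dp, amp, f2)
print(f"beta_max = {b_max:.3f}, beta_min = {b_min:.3f}, "
      f"Q_peak = {q_peak(b_max):.2f}, Q_width = {q_width(b_max, b_min):.2f}")
```

On this synthetic curve (centered at fDP = 1.2 kHz, i.e., r = 1.25, with σf = 200 Hz), the fit returns βmax = 0.2 and βmin = 0.1, hence Q_width ≈ 7.8.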

We must now evaluate how the crudeness of the schematization of Fig. 1 could affect the quantitative tuning estimates obtained from Eqs. (12) and (14). Several effects should be considered:

1) The fDP wavelength in the generation region is longer than its value at the peak, by a factor that depends on Q and r and is of order 2 at moderate ratios (e.g., for Q = 6 and r = 1.2).

2) The variation of the DPOAE phase due to the spatial dependence of the primary-tone phases within the overlap region is not negligible. This correction affects the evaluation of the "optimal" width of the overlap region.

3) The slope of the BM profile is not constant, as implicitly assumed in Fig. 1. More importantly, the effective width of the generation region is narrower than that obtained by the geometrical approach, because the DPOAE source is a cubic function of the local amplitudes at the primary frequencies and is therefore much more sharply peaked than the overlap region plotted in Fig. 1. The latter effect would require correcting the width by a factor of order 2.

4) In Eq. (10), the same variable Q denotes both the tuning in the xDP region (related to the wavelength) and the tuning in the overlap region (related to the spatial widths of the BM response). Moreover, if the level of the f1 stimulus is higher than that of the f2 stimulus, as generally happens in experiments, Eq. (6) still holds, but with ΔxQ referring to f1. Therefore, a frequency shift (not negligible at large ratios) occurs between the two terms involving tuning in Eq. (10).

Given all of the above uncertainties, the proposed method should be considered just a "first-order" improvement relative to the direct method, in which the frequency is simply divided by the experimental width of the DPOAE vs. r curve. Any refinement of this method would be strongly model dependent and would lead to much more complicated and entangled relations; we therefore chose to use Eqs. (12) and (14) in their simplest form, without corrections. This choice is justified by (a) the results of numerical simulations of the full cochlear model (see Fig. 5 in the "Results" section) and (b) the results of a semi-analytical computation, both of which account for all the above-mentioned effects, to different degrees of approximation.

The proposed semi-analytical computation assumes that, in the frequency domain, the local DPOAE source is proportional to \( \xi_1^2\xi_2^* \) and that the BM response ξ is proportional to \( k^{3/2} \). The wave vector is approximated as \( k(\omega,x)=\frac{k_0\,\omega}{\sqrt{\omega_{bm}^2(x)-\omega^2+i\omega\gamma}} \), with γ = ω/Q. For each frequency, the DPOAE phase shift between x2 and a point at distance Δxover from x2 is computed as the spatial integral of Re(k), also including the contribution 2ϕ1−ϕ2, and the "optimal ratio" condition is reformulated by requiring that, between these two points:

a) the amplitude of the DPOAE source drops by a factor of 2, and

b) the phase of the DPOAE wave changes by π.

This requirement yields an implicit relation between tuning and optimal ratio, which has to be inverted numerically. The resulting dependence of Q on rmax is monotonic and not much different from that predicted by Eq. (12) (see Fig. 2): the difference between the two curves is smaller than 20 % for r between 1.1 and 1.4. Despite the crudeness of the scheme of Fig. 1, the different effects therefore seem to partly compensate each other. Note that the physical explanation of the pseudo-resonant curve is the same in both schematizations, and that no adjustment of the cochlear parameters was made in either case.
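The implicit semi-analytical relation is not written out here, so the Python sketch below only illustrates the numerical inversion step on the closed-form Eq. (10), where Eq. (12) provides an independent check; the same bisection routine would apply to any monotonic relation between Q and the optimal ratio. The helper names, the bracket [1, 50], and the example ratio are hypothetical.

```python
import math

B0 = 2 * math.pi * 1.38 / 31.0  # Eq. (11)


def beta_max_of_q(q: float, b0: float = B0) -> float:
    """Eq. (10), a stand-in for the implicit Q -> beta_max relation.

    It is monotonically decreasing for q < (4/b0)**2, i.e., well beyond q = 50.
    """
    return 2.0 / q - b0 / math.sqrt(q)


def invert_monotonic(func, target, lo, hi, tol=1e-10, max_iter=200):
    """Solve func(q) = target by bisection on [lo, hi], assuming a sign change."""
    g_lo = func(lo) - target
    if g_lo * (func(hi) - target) > 0:
        raise ValueError("target is not bracketed by [lo, hi]")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        g_mid = func(mid) - target
        if g_lo * g_mid <= 0:
            hi = mid               # root lies in [lo, mid]
        else:
            lo, g_lo = mid, g_mid  # root lies in [mid, hi]
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


r_max = 1.25                    # example optimal ratio (hypothetical)
beta = (r_max - 1.0) / r_max    # Eq. (8)
q_numeric = invert_monotonic(beta_max_of_q, beta, 1.0, 50.0)
a = B0 / (2.0 * beta)
q_closed = (-a + math.sqrt(a * a + 2.0 / beta)) ** 2  # Eq. (12), for comparison
print(f"bisection Q = {q_numeric:.6f}, closed form Q = {q_closed:.6f}")
```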

Fig. 2

Comparison of the relation between optimal ratio and tuning predicted by the rough schematization of Fig. 1 (Eq. (12)) with a more accurate implicit relation between the two quantities, based on a semi-analytical computation

Data Acquisition and Subjects

Sample DPOAE measurements were performed on six young normal-hearing subjects at primary levels (L1/L2) of 65/55 and 55/40 dB SPL, using a fixed-f2 paradigm in which f2 was fixed at 1, 2, 4, 6, 8, 10, and 12.5 kHz while f1 was swept linearly over 4 s to obtain f2/f1 ratios between 1 and 1.5. Responses were averaged over eight sets of measurements for each f2 frequency. The magnitude and phase of the total ear canal DPOAE were derived using the least-squares fit method. The DPOAE response may be represented as a function of either r or fDP; in either case, it generally shows a bandpass pattern, with a broad peak around r = 1.2–1.3.
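The Python sketch below illustrates least-squares estimation of DPOAE magnitude and phase in one short window of a swept-primary recording. The sampling rate, sweep direction, window length, and the choice to fit the two primaries jointly with the DP are assumptions for illustration, not the acquisition settings of this study; the function names and the synthetic signal are hypothetical. The fit uses `numpy.linalg.lstsq`.

```python
import numpy as np


def sweep_phases(t, f2, f1_start, f1_end, duration):
    """Phases (rad) of f1 (linear sweep), f2 (fixed), and 2f1 - f2 at times t (s)."""
    slope = (f1_end - f1_start) / duration  # Hz/s
    phi1 = 2 * np.pi * (f1_start * t + 0.5 * slope * t ** 2)
    phi2 = 2 * np.pi * f2 * t
    return phi1, phi2, 2 * phi1 - phi2


def lsf_window(segment, phases):
    """Least-squares fit of sum_k A_k*cos(phase_k + theta_k) to one window.

    Each component contributes a cos and a -sin column, so its coefficients are
    (A_k*cos(theta_k), A_k*sin(theta_k)). Returns a list of (A_k, theta_k).
    """
    cols = []
    for ph in phases:
        cols += [np.cos(ph), -np.sin(ph)]
    coef, *_ = np.linalg.lstsq(np.column_stack(cols), segment, rcond=None)
    c, s = coef[0::2], coef[1::2]
    return list(zip(np.hypot(c, s), np.arctan2(s, c)))


# Synthetic recording (all values hypothetical): f2 = 2 kHz, f1 swept so that
# r = f2/f1 goes from 1.0 to 1.5 in 4 s; DP amplitude 1e-3 with phase 0.7 rad.
fs, T, f2 = 48_000, 4.0, 2000.0
t = np.arange(int(fs * T)) / fs
phi1, phi2, psi = sweep_phases(t, f2, f2 / 1.0, f2 / 1.5, T)
rec = np.cos(phi1) + 0.5 * np.cos(phi2) + 1e-3 * np.cos(psi + 0.7)

win = (t >= 2.0) & (t < 2.05)  # 50-ms analysis window
(a_dp, th_dp), *_ = lsf_window(rec[win], [psi[win], phi1[win], phi2[win]])
print(f"DP amplitude = {a_dp:.2e}, phase = {th_dp:.3f} rad")
```

Sliding the window along the sweep yields magnitude and phase as functions of r (or fDP); with real recordings, noise would set the useful window length.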

The same experiment was simulated numerically for f2 = 2 kHz, using both the nonlinear model and the analytical perturbative model, to further validate Eqs. (12) and (14).

Wavelet Filtering Technique

A time-frequency filtering method based on the wavelet transform (Moleti et al. 2012; Sisto et al. 2013) was applied to separate the DPOAE components generated by different emission mechanisms, both in the experimental data and in the model simulations. In the fixed-ratio paradigm generally used for DPOAE acquisition, the component D has nearly zero phase-gradient delay, while the component R has rapidly rotating phase as a function of the DP frequency. Because, in an approximately scaling-symmetric cochlea, delays scale approximately as the inverse of frequency, the filtering regions for the different DPOAE components are delimited by hyperbolic curves in the time-frequency domain.
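The wavelet transform itself is not reproduced here; the Python sketch below only shows how hyperbolic boundaries, i.e., delay limits corresponding to a fixed number of periods N (τ = N/f), partition a delay-frequency grid into D and R regions. In a wavelet-based filter, masks of this kind would be applied to the coefficients before reconstruction. The function name and the values of `n_d` and `n_r` are hypothetical placeholders, not the limits used in this study.

```python
import numpy as np


def hyperbolic_masks(delays_s, freqs_hz, n_d=1.0, n_r=15.0):
    """Boolean masks for the D and R regions on a (delay x frequency) grid.

    Boundaries are hyperbolas tau = N / f, i.e., a fixed number N of periods:
      D region: |tau| < n_d / f              (near-zero delay)
      R region: n_d / f <= tau < n_r / f     (single-reflection delays)
    """
    tau = np.asarray(delays_s, dtype=float)[:, None]  # rows: delay (s)
    f = np.asarray(freqs_hz, dtype=float)[None, :]    # columns: frequency (Hz)
    d_mask = np.abs(tau) < n_d / f
    r_mask = (tau >= n_d / f) & (tau < n_r / f)
    return d_mask, r_mask


delays = np.linspace(-0.005, 0.02, 251)  # s
freqs = np.linspace(500.0, 4000.0, 8)    # Hz
d_mask, r_mask = hyperbolic_masks(delays, freqs)
print("D cells per frequency:", d_mask.sum(axis=0))
print("R cells per frequency:", r_mask.sum(axis=0))
```

Because both limits shrink as 1/f, the D window narrows with increasing frequency, which is the hyperbolic shape described above.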

Results

The time-frequency representations of the DPOAE vs. fDP "spectra" show a distortion component with approximately zero delay, along with a second component with greater delay, which decreases approximately as 1/f2. This pattern is visible in the experimental data displayed in Fig. 3 (top) for one subject at two values of f2 (2 and 6 kHz).

Fig. 3

Top: Time-frequency representation of a human DPOAE vs. ratio response for f2 = 2 kHz (left) and 6 kHz (right). The stimulus levels are 65 and 55 dB SPL for f1 and f2, respectively. The green lines delimit the filtering region of the D component, while the yellow line is the upper delay limit for single-reflection R components. Bottom: Corresponding D components plotted as a function of ratio. With increasing frequency, both the optimal ratio and the width of the curve decrease, which, according to Eqs. (12) and (14), indicates sharper cochlear tuning

After time-frequency filtering, the D component of the DPOAE response is plotted as a function of r in Fig. 3 (bottom) for the same data. The bandpass shape of the response is evident and is reasonably well fit by a Gaussian function of frequency (dashed line). The data from this subject exhibit two general features: both the optimal ratio and the width of the curve decrease with increasing frequency, consistent with an increase in the sharpness of BM tuning (Eqs. (12) and (14)).

The D component of the DPOAE response is also plotted against r in Fig. 4 (top) for two subjects at f2 = 2 kHz and two stimulus levels. As the stimulus level increases, both the optimal ratio and the curve width increase, suggesting a decrease in the sharpness of tuning. The D component obtained from the nonlinear model at f2 = 2 kHz is shown as a function of ratio in Fig. 4 (bottom left) for several stimulus levels. As in the experimental data, the maximum shifts to larger ratios and the bandwidth increases as the stimulus level increases, probably because effective tuning is lowered by nonlinear saturation. A very similar behavior is obtained (Fig. 4, bottom right) with the analytical linear model, in which tuning is changed explicitly as an external parameter.

Fig. 4

Top: Filtered DPOAE D component as a function of r, fitted with a Gaussian profile, for f2 = 2 kHz at two stimulus levels (65/55 and 55/40 dB SPL, thin and thick lines, respectively) from two different ears. Bottom left: Dependence on r of the DPOAE D component in the nonlinear model solved in the time domain, for f2 = 2 kHz and four stimulus levels L2 between 40 and 55 dB SPL in 5-dB steps (lines of decreasing thickness). Bottom right: In the analytical linear model, the overall tuning was varied between 3 and 8 in unit steps (lines of increasing thickness)

The theoretical relations between cochlear tuning, optimal ratio, and width of the DPOAE vs. ratio curve were tested on simulations obtained with the linear model. Cochlear tuning was estimated directly from the width of the computed BM response to a pure tone and compared in Fig. 5 with the theoretical predictions of Eq. (12) (squares) and Eq. (14) (circles). The overall agreement is quite satisfactory, considering the crudeness of the geometrical schematization leading to Eqs. (12) and (14) and the fact that no model parameter was tuned. This agreement suggests that the tuning of the experimental DP vs. ratio curves can be used directly to estimate the sharpness of BM tuning.

Fig. 5

Correlation between Q measured on the BM in the linear model and that estimated either from the width (Eq. (14), filled circles) or from the peak (Eq. (12), open squares) of the DPOAE vs. r response, for f2 = 2 kHz. The model tuning was varied between 4 and 10

In Fig. 6, we report the average tuning estimates for six subjects as a function of frequency, obtained from the DPOAE vs. ratio curves at the 65/55-dB SPL stimulus level with three different methods. Open circles represent tuning computed from the optimal ratio using Eq. (12); open diamonds represent tuning computed from the optimal and minimal ratios using Eq. (14); and filled squares represent tuning computed with a direct method, by dividing the frequency of the maximum by the bandwidth of the DPOAE vs. fDP curve. In all cases, the method was applied to a Gaussian fit to the experimental curve. The three methods yield similar results, the second being affected by larger numerical instability (because the denominator of Eq. (14) contains the difference between two terms of the same order). Tuning generally increases with frequency, reaching a flat maximum plateau around 6–10 kHz. That the absolute value of the direct tuning estimate almost coincides with that obtained using Eq. (12) was not expected, because no adjustment had been applied to the parameters of Eq. (12), which were taken directly from Talmadge et al. (1998). The difficulty of comparing tuning estimates obtained with different techniques was highlighted in the "Methods" section. In the present case, the crudeness of the theoretical assumptions underlying Eqs. (12) and (14) calls for an even more cautious interpretation of such comparisons. Nevertheless, the behavioral estimates of Glasberg and Moore (1990), obtained with simultaneous masking and approximately represented by the dash-dotted line, are not inconsistent with the present results. The OAE-based estimates of Shera et al. (2002), obtained from SFOAE phase-gradient delays, have a similar frequency dependence and sharper tuning, in agreement with the behavioral tuning estimates obtained with forward masking by the same authors (Oxenham and Shera 2003; not shown in Fig. 6).
On the other hand, the tuning estimates of Moleti and Sisto (2016), obtained from the phase-gradient delay of the DPOAE reflection components at the same stimulus level (65/55 dB SPL), are comparable in sharpness but show a steeper slope as a function of frequency.

Fig. 6

Comparison of three average tuning estimates obtained for six subjects from the DPOAE vs. ratio curve, using Eqs. (12) and (14) and by directly dividing the frequency of the peak by the bandwidth of the DPOAE vs. ratio curve. Error bars represent one standard deviation. Two OAE-based tuning estimates and a behavioral tuning estimate are also shown for reference. A small frequency offset, not present in the data, was added to improve the visibility of the error bars

Discussion

Overall, the experimental DPOAE level vs. r functions shown in Figs. 3 and 4 are more symmetrical than those predicted by both the numerical and analytical models, which also tend to overestimate the shift of the maximum and the change in width with stimulus level. Nevertheless, both the shift of the maximum and the variation of the width are qualitatively well predicted by both models. The agreement between the experimental results and the analytical linear model, which obviously does not account for two-tone suppression, suggests that interference phenomena play a crucial role in determining the behavior of the DPOAE vs. r curves. This behavior can be largely attributed to a compromise between the positive effect of the increasing width of the overlap region and the negative interference among DP wavelets of different phases within the generation region. These observations further suggest that the shape of the DPOAE vs. r curves depends on a single parameter, the mechanical tuning of the BM, which is changed explicitly in the linear model and changes with stimulus level in the nonlinear model. Therefore, the results of this study strongly support the possibility of using DPOAE vs. r curves for objective estimates of cochlear tuning. Interestingly, Liu and Neely (2010) obtained bandpass DPOAE vs. r curves with a nonlinear cochlear model, showing the same dependence on stimulus level as demonstrated here; we attribute this dependence to the dynamic change of tuning of the cochlear amplifier, an explanation that applies to their results as well.

A few studies have explored the DPOAE stimulus parameter space in detail, varying the f2 frequency, r, and both primary levels, L1 and L2 (Kummer et al. 1998, 2000; Johnson et al. 2006). In particular, Johnson et al. (2006) performed extensive measurements of the DPOAE response at f2 = 1, 2, 4, and 8 kHz as a function of three parameters: L1, L2, and r. In such studies, the nonmonotonic dependence of the DPOAE level on L1, at fixed L2 and r, was interpreted as the result of nonlinear suppression of the f2 response by the f1 tone. Although a significant role of suppression cannot be ruled out, a possible alternative explanation of this phenomenology is that, at a fixed ratio, the optimal condition \( \widehat{\lambda}/2 = \Delta x_{\mathrm{over}} \) is fulfilled for a specific level L1, according to a variant of our simple “geometrical” model in which the constraint L1 = L2 is relaxed.

The crudeness of the geometrical model used to relate BM tuning to the parameters of the DPOAE vs. r curve suggests that the resulting tuning estimates could be affected by rather large systematic errors. A further source of uncertainty is the numerical values of the cochlear parameters k0 and kω, which in this study were taken from Talmadge et al. (1998). On the other hand, the proposed methods agree rather well with numerical simulations based on more realistic BM response shapes, in which tuning was a controlled parameter. The method could be further refined and validated in animal experiments, where these tuning estimates could be compared directly with the width of the measured BM response.

Other OAE-based techniques for estimating BM tuning have been proposed, such as those based on SFOAE or TEOAE group delay (e.g., Shera et al. 2002; Sisto et al. 2013; Moleti and Sisto 2016). The comparison in Fig. 6 of the frequency dependence of tuning obtained with behavioral techniques, with OAE-delay-based techniques, and with our analysis of the DPOAE vs. ratio curves shows that the latter agree rather well with behavioral tuning estimates obtained with simultaneous masking. On the other hand, Fig. 6 also shows that different theoretical assumptions significantly affect the steepness of the estimated dependence of tuning on frequency. Because absolute tuning values are difficult to compare, for the reasons discussed above, it may be more profitable to focus on this frequency dependence. For example, Moleti and Sisto (2016) seem to overestimate its steepness, probably because the assumption that the wavelength in the peak region scales as the square root of tuning is strictly valid only in the long-wave limit. The same assumption is made in the present model (Eq. (9)); it could slightly affect the steepness of the frequency dependence of tuning estimated with Eq. (14) and, to a lesser extent, that estimated with Eq. (12). Generally speaking, the methods based on the DPOAE vs. r curve are less sensitive to this assumption because they measure the DPOAE D component, whose generation-source amplitude and phase are also sensitive to the f1 primary tone, for which the long-wave approximation is more accurate in the DP generation region. Equation (12) is even less sensitive to this assumption, because tuning enters two terms of Eq. (10), and only one of them relies on the long-wave assumption.
This brief discussion was meant to emphasize that, in addition to opening the possibility of using different independent techniques to get objective estimates of cochlear tuning, such comparisons are also important to validate or falsify the models of the cochlea that provide the theoretical basis for the proposed methods.
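The way an exponent assumption propagates into the estimated steepness can be seen with a toy power-law calculation. It is a hypothetical sketch, not a reconstruction of Eq. (9) or of any published method. A measured quantity M is assumed to scale with tuning as Q^a; the square-root scaling corresponds to an assumed a = 0.5, while the "true" exponent (0.6), the tuning values, and the frequency grid are all made up for illustration. Inverting M with the wrong exponent rescales the log-log slope of tuning vs. frequency by a_true/a_assumed.

```python
import numpy as np


def loglog_slope(f, q):
    """Exponent of a power-law fit q ∝ f**slope (least squares on log-log axes)."""
    slope, _intercept = np.polyfit(np.log(f), np.log(q), 1)
    return slope


def inferred_slope(a_assumed, a_true=0.6, true_slope=0.3):
    """Frequency slope of tuning recovered when M ∝ Q**a_true is inverted
    with an assumed exponent a_assumed (all values hypothetical)."""
    f = np.array([1.0, 2.0, 4.0, 8.0])  # kHz
    q_true = 10.0 * f**true_slope        # hypothetical "true" tuning
    m = q_true**a_true                   # measured quantity
    q_est = m**(1.0 / a_assumed)         # tuning inferred under the assumed exponent
    return loglog_slope(f, q_est)


for a in (0.6, 0.5):
    print(f"assumed exponent {a}: slope {inferred_slope(a):.2f}")
# prints:
# assumed exponent 0.6: slope 0.30
# assumed exponent 0.5: slope 0.36
```

In this toy, underestimating the exponent inflates the slope from 0.30 to 0.36; the direction and size of any real bias would depend on how the actual wavelength departs from the long-wave scaling.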

The fact that the DPOAE R component typically shows a less sharply peaked, or even monotonic, dependence on ratio (see, e.g., Botti et al. 2016) further supports the above interpretation of the behavior of the D component. Indeed, the beamforming mechanism (Shera and Guinan 2007) predicts that the phase differences producing destructive interference in the D response are canceled along the path of the forward DP wave to the x(fDP) region, where it is partially reflected to generate the R component.
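The contrast between the two pathways can be sketched with an idealized toy model. It is hypothetical and is not the beamforming analysis of Shera and Guinan (2007). Unit-amplitude wavelets are generated across an overlap region of width W with a source phase k·x. For D, the backward wavelets keep their phase differences and interfere. For R, the forward path to x(fDP) is assumed to add a phase -k·x that exactly compensates those differences. In this toy the D amplitude peaks and then falls as W grows, while the R amplitude grows monotonically.

```python
import numpy as np


def component_sums(overlap_width, wavelength, n=1000):
    """Idealized toy contrast between the D and R pathways (hypothetical).

    Unit-amplitude DP wavelets sit at the midpoints of n cells spanning [0, W],
    with source phase k*x rotating across the region (k = 2*pi/wavelength).
    D: backward wavelets keep their phase differences, so they interfere.
    R: the forward path to x(fDP) is assumed to add a phase -k*x (a constant
       offset is dropped), so the wavelets arrive in phase.
    Returns (D amplitude, R amplitude), each a midpoint-rule sum times dx.
    """
    k = 2.0 * np.pi / wavelength
    dx = overlap_width / n
    x = (np.arange(n) + 0.5) * dx
    source_phase = k * x
    d_amp = abs(np.sum(np.exp(1j * source_phase)) * dx)
    r_amp = abs(np.sum(np.exp(1j * (source_phase - k * x))) * dx)
    return d_amp, r_amp


for w in (0.25, 0.5, 0.75, 1.0):
    d, r = component_sums(w, wavelength=1.0)
    print(f"W = {w:.2f}: D = {d:.3f}, R = {r:.3f}")
```

Perfect phase compensation is the extreme case; any residual phase mismatch along the forward path would make the R curve flatter rather than strictly growing.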

Although the explanation for the bandpass shape of the DPOAE vs. ratio curves provided in this study is consistent with theoretical models and experimental data, an alternative (or complementary) explanation based on a mechanical second filter cannot be ruled out, partly because the predictions of that hypothesis have not been clearly formulated. It would be useful to design experiments capable of discriminating between the two hypotheses.

Conclusions

The peaked dependence of the DPOAE (D component) level on the primary frequency ratio, which has been considered evidence for a second cochlear filter, could alternatively be explained by a simple linear interference phenomenon: as the ratio decreases and the overlap region widens, destructive interference reduces the DPOAE D component. A simple model was developed that assumes that destructive interference sets in when the overlap region exceeds half the local wavelength of the distortion product. On the basis of this geometrical model, a relation was derived between the ratio at the maximum of the DPOAE vs. ratio curve and the tuning factor along the BM. A similar relation holds between tuning and the width of the same curve. The width of the DP vs. ratio curves was estimated using a perturbative linear model and a nonlinear model solved in the time domain, confirming the relation with the width of the BM profile obtained with the heuristic geometrical model. These findings support the use of experimental DPOAE vs. ratio curves for objective estimates of cochlear tuning in humans. Applied to human subjects, the method yields tuning values (and dependences on frequency and stimulus level) in reasonable agreement with previous tuning estimates.