1 Introduction

According to the World Health Organization, around 466 million people worldwide suffer from some hearing disorder, and 34 million of these are children [1]. When an auditory disorder is detected early, treatments such as prostheses or implants tend to be more effective.

The gold standard test for hearing screening is pure tone audiometry. This test requires a behavioral response from the patient to obtain pure tone hearing thresholds. Since it depends on feedback from the individual, it is very difficult to perform on patients who are unable to cooperate, such as babies and children. For these cases, objective, automatic audiometric techniques were developed, most of which use brain electric potentials evoked by external stimuli [2].

The Auditory Steady-State Response (ASSR) is an evoked potential used for the objective prediction of hearing thresholds [3]. The ASSR is elicited in the brain by sound stimuli repeated at a rate high enough that the responses to successive stimuli overlap. According to [4], the ASSR evoked by an amplitude-modulated tone is characterized by an increase in energy at the modulation frequency (and its harmonics) in the Electroencephalogram (EEG) power spectrum.

In objective audiometry, the behavioral feedback of the patient can be replaced by a test for the presence or absence of the ASSR. The tests can be performed statistically through Objective Response Detectors (ORDs), usually in the frequency domain by means of the Discrete Fourier Transform (DFT) [3]. The Magnitude-Squared Coherence (MSC) is an ORD frequently used to detect the ASSR [5, 6]. ORD functions depend on the Signal-to-Noise Ratio (SNR) as well as on the degrees of freedom used in the detector's estimation. In addition, the statistical threshold for the presence or absence decision is obtained from the detector distribution under the null hypothesis, which is the lack of response.

The challenge of objective audiometry is the trade-off between sensitivity and specificity, together with the test time. The test time is directly proportional to the epoch length. Thus, reducing the number of points per epoch without losing performance is desirable in practical objective audiometry systems. Usually, the epoch length cannot be varied because it is predefined by the coherent sampling criterion, which is used to prevent spectral leakage. This method consists in adjusting the stimulus frequency so that a fixed epoch length contains an integer number of cycles [7].

In this sense, this work proposes a new objective response detector based on coherence, least squares and phase compensation. This new detector eliminates the need for a fixed epoch length in the analysis and keeps the coherent sampling criterion.

2 Methods

2.1 Magnitude-Squared Coherence (MSC)

The coherence estimate between a deterministic and periodic input signal \(x\left[ n \right]\), representing the auditory stimulus, and the output signal \(y\left[ n \right]\), representing the EEG signal, depends only on the output signal and is given by [5]:

$$M\hat{S}C = \frac{{\left| {\sum\nolimits_{i = 1}^{M} {Y_{i} \left( f \right)} } \right|^{2} }}{{M\sum\nolimits_{i = 1}^{M} {\left| {Y_{i} \left( f \right)} \right|^{2} } }},$$
(1)

where “^” denotes estimation, \(Y_{i} \left( f \right)\) is the DFT of the \(i\)-th epoch of \(y\left[ n \right]\), \(f\) is the frequency of the input signal and \(M\) is the number of epochs used in the calculation.

In order to use this function as an ORD, the associated critical value must be found. It is a threshold above which values indicate a response. Critical values are commonly obtained from the inverse cumulative distribution function of the detector distribution under the null hypothesis (H0) of lack of response. Under the null hypothesis, \(y\left[ n \right]\) is assumed to be Gaussian noise. Thus, the distribution of the MSC under H0 is given by [8]:

$$M\hat{S}C\left( f \right)|_{{H_{0} }} \sim \beta_{(1,M - 1)} ,$$
(2)

where \(\beta_{{\left( {1,M - 1} \right)}}\) is the beta distribution with 1 and M − 1 degrees of freedom. Thus, the detection threshold is given by [9]:

$$MSC_{crit} = 1 - \alpha^{{\frac{1}{M - 1}}} ,$$
(3)

where \(\alpha\) is the given significance level. Thus, the ASSR is detected when \(M\hat{S}C > MSC_{crit}\).
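As a minimal sketch of Eqs. (1) and (3), the following Python code computes the MSC estimate from the per-epoch spectral values at the stimulus frequency, together with the critical value. NumPy is assumed, and the function names `msc` and `msc_critical_value` are illustrative only; they are not taken from the original work.

```python
import numpy as np

def msc(Y):
    """MSC estimate of Eq. (1) from the M complex spectral values Y_i(f)."""
    Y = np.asarray(Y, dtype=complex)
    M = Y.size
    return np.abs(Y.sum()) ** 2 / (M * np.sum(np.abs(Y) ** 2))

def msc_critical_value(alpha, M):
    """Critical value of Eq. (3) for significance level alpha and M epochs."""
    return 1.0 - alpha ** (1.0 / (M - 1))

# Illustrative check: a spectral value that is identical in every epoch
# (perfectly phase-locked response, no noise) gives an MSC of 1.
M = 49
Y_locked = np.full(M, 2.0 * np.exp(1j * 0.3))
crit = msc_critical_value(0.05, M)
detected = msc(Y_locked) > crit
```

With 49 epochs and a significance level of 0.05, the critical value is close to 0.06, so any MSC estimate above that value would be taken as a detected response.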

2.2 New Detector

The MSC is an ORD that depends on the amplitude and phase estimates of the ASSR in different signal epochs. Estimates obtained with the DFT are affected by spectral leakage, so coherent sampling is required and only epoch lengths that contain an integer number of ASSR cycles can be used.

The new detector uses the MSC, but it replaces the DFT with least squares and phase compensation to estimate the ASSR spectral content.

Least Squares: The least squares method is a mathematical optimization technique that finds the best fit for a data set by minimizing the sum of the squares of the differences between the estimated values and the observed data.

To use least squares to extract this information, the signal in each epoch must be modeled as a sinusoid at the same frequency as the ASSR. The response model for the \(i\)-th epoch can be given by:

$$y_{i} \left[ n \right] = A_{i} \cos \left( {\frac{{2\pi f_{m} n}}{{F_{s} }} + \theta_{i} } \right),$$
(4)

where \({f}_{m}\) is the modulation frequency, \(n\) is the discrete time, \(F_{s}\) is the sampling frequency, \(A_{i}\) is the amplitude and \(\theta_{i}\) is the phase.

\(y_{i} \left[ n \right]\) can be rewritten as follows:

$$y_{i} \left[ n \right] = A_{i} \cos \left( {\theta_{i} } \right)\cos \left( {\frac{{2\pi f_{m} n}}{{F_{s} }}} \right) - A_{i} \sin \left( {\theta_{i} } \right)\sin \left( {\frac{{2\pi f_{m} n}}{{F_{s} }}} \right).$$
(5)

For the \(i\)th epoch with \(L\) samples we have the following matrix form:

$$\left[ {\begin{array}{c} {y_{i} \left( 1 \right)} \\ {y_{i} \left( 2 \right)} \\ {y_{i} \left( 3 \right)} \\ \vdots \\ {y_{i} \left( L \right)} \\ \end{array} } \right] = \left[ {\begin{array}{cc} {\cos \left( {\frac{{2\pi f_{m} \cdot 1}}{{F_{s} }}} \right)} & { - \sin \left( {\frac{{2\pi f_{m} \cdot 1}}{{F_{s} }}} \right)} \\ {\cos \left( {\frac{{2\pi f_{m} \cdot 2}}{{F_{s} }}} \right)} & { - \sin \left( {\frac{{2\pi f_{m} \cdot 2}}{{F_{s} }}} \right)} \\ {\cos \left( {\frac{{2\pi f_{m} \cdot 3}}{{F_{s} }}} \right)} & { - \sin \left( {\frac{{2\pi f_{m} \cdot 3}}{{F_{s} }}} \right)} \\ \vdots & \vdots \\ {\cos \left( {\frac{{2\pi f_{m} L}}{{F_{s} }}} \right)} & { - \sin \left( {\frac{{2\pi f_{m} L}}{{F_{s} }}} \right)} \\ \end{array} } \right]\left[ {\begin{array}{c} {A_{i} \cos \left( {\theta_{i} } \right)} \\ {A_{i} \sin \left( {\theta_{i} } \right)} \\ \end{array} } \right].$$
(6)

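Denoting the matrix of cosine and sine terms by \({\varvec{M}}\), and defining \(R_{i} = A_{i} \cos \left( {\theta_{i} } \right)\) and \(I_{i} = A_{i} \sin \left( {\theta_{i} } \right)\), Eq. (6) can be rewritten as follows: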

$${\varvec{y}}_{{\varvec{i}}} = {\varvec{M}}\left[ {\begin{array}{*{20}c} {R_{i} } \\ {I_{i} } \\ \end{array} } \right].$$
(7)

The parameters \(R_{i}\) and \(I_{i}\) can be estimated as follows:

$$\left[ {\begin{array}{*{20}c} {\widehat{{R_{i} }}} \\ {\hat{I}_{i} } \\ \end{array} } \right] = \left( {{\varvec{M}}^{{\varvec{T}}} {\varvec{M}}} \right)^{ - 1} {\varvec{M}}^{{\varvec{T}}} {\varvec{y}}_{{\varvec{i}}} .$$
(8)

The amplitude and phase estimates can then be determined as follows:

$$\widehat{{A_{i} }} = \sqrt {\widehat{{R_{i} }}^{2} + \widehat{{I_{i} }}^{2} } ,$$
(9)

and

$$\widehat{{\theta_{i} }} = \arctan \left( {\frac{{\widehat{{I_{i} }}}}{{\widehat{{R_{i} }}}}} \right).$$
(10)

In this way, the amplitude and phase estimates of the ASSR are independent of the epoch length. For epoch lengths that satisfy the coherent sampling criterion, the least squares method estimates the same spectral content as the DFT.

Phase Compensation: The least squares method estimates the ASSR amplitude and phase regardless of the epoch length, but for epoch lengths that do not contain an integer number of ASSR cycles, the expected value of the phase differs from epoch to epoch. This is because each epoch begins at a different phase of the ASSR. For the least squares estimates to be used in the MSC, phase compensation is proposed. The phase compensation predicts the difference in the expected value of the phase in each epoch and removes it from the calculated phase, so that the phases computed in all epochs have the same expected value.

The phase compensation increments the calculated phases by the following term:
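The estimation in Eqs. (6) to (10) can be sketched in Python as follows. This is only an illustration under stated assumptions: NumPy is assumed, the function name `ls_estimate` is hypothetical, `numpy.linalg.lstsq` is used in place of the explicit normal equations of Eq. (8) (it solves the same least squares problem), and the four-quadrant `arctan2` is used instead of the plain arctangent of Eq. (10) so that the phase is recovered over the full circle.

```python
import numpy as np

def ls_estimate(y_i, fm, fs):
    """Least squares amplitude and phase of a sinusoid at fm in one epoch."""
    y_i = np.asarray(y_i, dtype=float)
    n = np.arange(1, y_i.size + 1)              # n = 1, ..., L as in Eq. (6)
    w = 2.0 * np.pi * fm * n / fs
    M = np.column_stack((np.cos(w), -np.sin(w)))  # model matrix of Eq. (6)
    # Solves the same problem as (M^T M)^{-1} M^T y_i in Eq. (8)
    (R, I), *_ = np.linalg.lstsq(M, y_i, rcond=None)
    amplitude = np.hypot(R, I)                  # Eq. (9)
    phase = np.arctan2(I, R)                    # four-quadrant form of Eq. (10)
    return amplitude, phase

# Synthetic, noise-free epoch whose length (1000 samples) does NOT hold an
# integer number of cycles of 37.5 Hz at a 600 Hz sampling rate.
fs, fm, L = 600.0, 37.5, 1000
n = np.arange(1, L + 1)
y = 1.5 * np.cos(2.0 * np.pi * fm * n / fs + 0.7)
A_hat, theta_hat = ls_estimate(y, fm, fs)
```

For this noise-free example, the estimates match the amplitude (1.5) and phase (0.7 rad) used to synthesize the epoch up to floating-point error, even though the epoch does not satisfy coherent sampling.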

$$\widehat{{\theta_{i} }} \leftarrow \widehat{{\theta_{i} }} + 2\pi \left( {i - 1} \right)\left( {1 - decimal\left( {\frac{{Lf_{m} }}{{F_{s} }}} \right)} \right),$$
(11)

where \(i\) is the index of the collected epoch and \(decimal\) is a function that extracts the fractional part of a number.
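The effect of Eq. (11) can be illustrated with a short Python sketch. It assumes NumPy, uses the hypothetical function name `compensate_phases`, and works on noise-free expected phases rather than on recorded EEG. The epoch length of 1000 samples is chosen for illustration because it contains 62.5 cycles of 37.5 Hz at 600 Hz, so consecutive epochs start half a cycle apart.

```python
import numpy as np

def compensate_phases(theta, L, fm, fs):
    """Apply Eq. (11) to the per-epoch phase estimates theta[0], ..., theta[M-1]."""
    theta = np.asarray(theta, dtype=float)
    k = np.arange(theta.size)                 # k = i - 1 for epoch i = 1, ..., M
    frac = (L * fm / fs) % 1.0                # fractional part of L * fm / Fs
    return theta + 2.0 * np.pi * k * (1.0 - frac)

def msc_from_phases(theta):
    """MSC of Eq. (1) for unit-amplitude spectral values with phases theta."""
    Y = np.exp(1j * np.asarray(theta))
    return np.abs(Y.sum()) ** 2 / (Y.size * np.sum(np.abs(Y) ** 2))

fs, fm, L, M = 600.0, 37.5, 1000, 50
theta0 = 0.7                                  # phase seen in the first epoch
k = np.arange(M)
# Expected phase in each epoch: epoch i starts (i - 1) * L samples later
theta_raw = theta0 + 2.0 * np.pi * fm * k * L / fs
theta_comp = compensate_phases(theta_raw, L, fm, fs)

# Deviation from theta0 after wrapping to (-pi, pi]
deviation = np.angle(np.exp(1j * (theta_comp - theta0)))
msc_raw = msc_from_phases(theta_raw)          # phases alternate by pi and cancel
msc_comp = msc_from_phases(theta_comp)        # phases aligned across epochs
```

In this example the uncompensated phases alternate by half a cycle, so their coherent sum cancels and the MSC is close to 0, whereas after compensation all phases coincide (modulo \(2\pi\)) and the MSC is close to 1.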

2.3 Windowing

Windowing is a well-known technique to minimize the effects of spectral leakage, especially in the presence of power-line interference. This technique consists of multiplying the signal point by point by a weighting function before estimating the spectral content. When no windowing is used explicitly, this is equivalent to using a rectangular window [10].
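A minimal Python sketch of this point-by-point weighting is given below. It assumes NumPy and writes out a Tukey (tapered cosine) window explicitly; the function names and the taper fraction `alpha=0.5` are assumptions made for illustration, since the window parameter is not stated here. SciPy's `scipy.signal.windows.tukey` provides an equivalent window.

```python
import numpy as np

def tukey_window(L, alpha=0.5):
    """Tukey window of length L; alpha (0 < alpha <= 1) is the tapered fraction."""
    n = np.arange(L, dtype=float)
    w = np.ones(L)
    edge = alpha * (L - 1) / 2.0              # length of each cosine taper
    left = n < edge
    right = n > (L - 1) - edge
    w[left] = 0.5 * (1.0 - np.cos(np.pi * n[left] / edge))
    w[right] = 0.5 * (1.0 - np.cos(np.pi * ((L - 1) - n[right]) / edge))
    return w

def apply_window(epoch, window):
    """Multiply the epoch point by point by the weighting function."""
    return np.asarray(epoch, dtype=float) * window

epoch = np.ones(1024)                         # placeholder epoch for illustration
windowed = apply_window(epoch, tukey_window(1024, alpha=0.5))
rectangular = apply_window(epoch, np.ones(1024))  # same as no explicit windowing
```

The rectangular case leaves the epoch unchanged, which is why omitting the window is equivalent to rectangular windowing.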

3 Material and Methods

The experiments were conducted in a soundproof booth located in the Interdisciplinary Center for Signal Analysis (NIAS) at the Federal University of Viçosa (UFV). This study was conducted on 5 adults with healthy hearing (age range 21 to 29 years). Each subject participated in 5 sessions that consisted of EEG recording during AM auditory stimulation, according to the protocol approved by the Local Ethics Committee (UFV/CAAE: 56346916.4.0000.5153). The subjects were instructed to sit comfortably, keep their eyes closed and not fall asleep during the exam.

3.1 Stimuli

The volunteers were stimulated by an amplitude-modulated tone. The carrier frequency was 1000 Hz and the modulation frequency was fixed at 37.5 Hz, in order to fit 64 cycles in an epoch of 1024 points, according to the coherent sampling criterion (the sampling rate was 600 Hz). A modulation depth of 100% was used. The stimuli were generated digitally with CD quality and presented monaurally to the right ear, through a shielded cable coupled to an E-A-R Tone 5A insert earphone (Aero Technologies). The intensity of the stimuli was calibrated to 70 dB SPL with a sound pressure level meter (Brüel and Kjær model 2250 with coupler 2 cc DB 0138, DKK).
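The coherent sampling arithmetic above can be checked with a few lines of Python. The function names `cycles_per_epoch` and `is_coherent` and the tolerance are illustrative assumptions; only the values 37.5 Hz, 600 Hz and 1024 samples come from the text.

```python
def cycles_per_epoch(L, fm, fs):
    """Number of modulation cycles contained in an epoch of L samples."""
    return L * fm / fs

def is_coherent(L, fm, fs, tol=1e-9):
    """True when the epoch holds an integer number of modulation cycles."""
    c = cycles_per_epoch(L, fm, fs)
    return abs(c - round(c)) < tol

fs, fm = 600.0, 37.5
cycles_1024 = cycles_per_epoch(1024, fm, fs)   # 1024 * 37.5 / 600 = 64.0
coherent_1024 = is_coherent(1024, fm, fs)      # True
coherent_1000 = is_coherent(1000, fm, fs)      # False: 62.5 cycles
```

For these stimulus settings, one cycle of the modulation spans 16 samples, so any epoch length that is a multiple of 16 samples also satisfies the criterion.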

3.2 EEG Data

The electroencephalograph BrainNet BNT 36 (Lynx Tecnologia, Brazil) was used for EEG acquisition. The parameter settings were a 100 Hz low-pass filter, a 0.1 Hz high-pass filter and a sampling frequency of 600 Hz. The gold-plated electrodes, with 10 mm diameter, were connected to the signal amplifier and placed on the scalp with the assistance of an electrolytic gel. The electrode positions were defined according to the International 10−20 System, with reference to electrode \({C}_{z}\) and ground on \({F}_{pz}\), in the derivations: \(F_{7} ,T_{3} ,T_{5} ,F_{p1} ,F_{3} ,C_{3} ,P_{3} ,O_{1} ,F_{8} ,T_{4} ,T_{6} , F_{p2} ,F_{4} ,C_{4} ,P_{4} ,O_{2} ,F_{z} ,O_{z} ,P_{z} ,A_{1} \, \text{and}\,A_{2} .\)

EEG Bipolar Derivation: Bipolar derivations are formed by the potential difference between two scalp positions. In this case, the total number of available bipolar derivations is the number of pairwise combinations of the 22 electrode positions, which results in 231 bipolar derivations. Since each of the 5 participants repeated the recording procedure 5 times, the number of bipolar derivation signals available for analysis was 5775. Each recording lasted about 1 min and 23 s, generating signals with 49 epochs of 1024 samples. Since each bipolar derivation contains a different intensity of the ASSR, each one has a different SNR. In other words, the procedure allowed the analysis of 5775 EEG signals with different SNR levels, which improves the statistical significance of the results.
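The counts above follow from simple combinatorics, as the short Python check below illustrates. The electrode positions are represented by integer labels, which is an assumption made only to keep the example generic; a bipolar signal would be obtained as the difference between the two channels of each pair.

```python
from itertools import combinations
from math import comb

n_positions = 22                      # electrode positions stated in the text
n_participants, n_sessions = 5, 5

# Every unordered pair of positions defines one bipolar derivation
pairs = list(combinations(range(n_positions), 2))
n_derivations = len(pairs)            # equals comb(22, 2)
n_signals = n_derivations * n_participants * n_sessions

# Recording length implied by 49 epochs of 1024 samples at 600 Hz
duration_s = 49 * 1024 / 600.0        # about 83.6 s, i.e. roughly 1 min 23 s
```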

3.3 Epoch Length

The data were acquired to have 49 epochs of 1024 samples, resulting in signals with 50,176 samples. The use of least squares allowed the epoch length to be varied to values other than 1024. The epoch length was varied between 4 and 4000 samples. For epoch lengths where it is not possible to use all the data, the last samples were discarded.

Considering the results achieved in [10], both the standard windowing and the best windowing for mitigating the effects of spectral leakage were used, which are the rectangular and the Tukey windowing, respectively.
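The segmentation described above can be sketched in Python as follows, assuming NumPy; the function name `split_into_epochs` is illustrative, and a zero-valued array stands in for a recorded signal.

```python
import numpy as np

def split_into_epochs(signal, L):
    """Split a 1-D signal into consecutive epochs of L samples.

    Samples at the end that do not fill a whole epoch are discarded.
    Returns an array of shape (number_of_epochs, L).
    """
    signal = np.asarray(signal)
    M = signal.size // L                      # number of complete epochs
    return signal[: M * L].reshape(M, L)

signal = np.zeros(50176)                      # placeholder for one EEG signal
epochs_1024 = split_into_epochs(signal, 1024) # 49 epochs, nothing discarded
epochs_4000 = split_into_epochs(signal, 4000) # 12 epochs, 2176 samples discarded
discarded_4000 = signal.size - epochs_4000.size
```

The second case shows why longer epochs can waste more data: with 4000-sample epochs, 2176 of the 50,176 samples are left unused.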

3.4 Performance Measurement

The performance measures were calculated considering all available EEG bipolar derivations. For each epoch length, the detection rate and the false positive rate were calculated considering a significance level of 0.05.

Detection Rate: The detection rate was calculated as the percentage of the 5775 signals in which the MSC detected the presence of the ASSR.

False Positive: The false positive rate was computed as the detection rate at 20 frequencies corresponding to the 20 bins neighboring the modulation frequency bin for an epoch of 1024 samples. That is, the frequencies were taken in the range between 35.24 and 41.12 Hz.
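Both measures reduce to counting how often the MSC exceeds the critical value, as the Python sketch below illustrates. It assumes NumPy and uses synthetic complex Gaussian values in place of the spectral estimates of recorded EEG, so the numbers it produces are not results of this study; the function name `detection_rate` and the random seed are illustrative.

```python
import numpy as np

def detection_rate(msc_values, crit):
    """Percentage of signals whose MSC exceeds the critical value."""
    msc_values = np.asarray(msc_values, dtype=float)
    return 100.0 * np.mean(msc_values > crit)

rng = np.random.default_rng(0)                # fixed seed for repeatability
M, n_signals, alpha = 49, 5775, 0.05
crit = 1.0 - alpha ** (1.0 / (M - 1))         # critical value of Eq. (3)

# Synthetic null hypothesis: no response, only complex Gaussian noise
Y = rng.standard_normal((n_signals, M)) + 1j * rng.standard_normal((n_signals, M))
msc_h0 = np.abs(Y.sum(axis=1)) ** 2 / (M * np.sum(np.abs(Y) ** 2, axis=1))

false_positive_rate = detection_rate(msc_h0, crit)
```

When the noise assumptions behind Eq. (2) hold, as in this synthetic case, the false positive rate should stay near the significance level of 5%; marked deviations in real data suggest that those assumptions are violated, for example by spectral leakage.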

4 Results and Discussion

The database signals have a fixed number of samples, namely 50,176. The total number of epochs increases when the signals are divided into smaller epoch lengths. For most epoch lengths, it is not possible to use all 50,176 samples, which requires discarding part of these samples. Part of the variation in the new detector performance for different epoch lengths may be associated with the different amounts of data used, especially for larger epochs, for which more samples are discarded.

Figure 1 shows the detection rate for different epoch lengths obtained by analyzing the modulation frequency and neighboring frequencies, and also depicts the false positive rate using the new detector with rectangular windowing. Regardless of the epoch length, the false positive rate was below 5%, the value expected for a significance level of 0.05. This result corroborates the one found in [10], which analyzed different types of windowing, but with the epoch length fixed at 1024. From Fig. 1, it can also be observed that the detection rate of the new detector with rectangular windowing is very sensitive to small variations in the epoch length and is low for small epoch lengths.

Fig. 1 Detection rate of the new detector calculated at the modulation frequency and at the neighboring frequencies for different epoch lengths, using rectangular windowing

Figure 2 illustrates the detection rate for different epoch lengths obtained by analyzing the modulation frequency and neighboring frequencies, and also shows the false positive rate using the new detector with Tukey windowing. The false positive rates were close to the expected value, except for smaller epoch lengths. This result agrees with [10]. From Fig. 2, it can also be noticed that the detection rate of the MSC with Tukey windowing is less sensitive to small variations in the epoch length, but is still low for small epoch lengths.

Fig. 2 Detection rate of the new detector calculated at the modulation frequency and at the neighboring frequencies for different epoch lengths, using Tukey windowing

Although the data were collected so that the signals could be split into 1024-sample epochs, it is possible to observe that smaller epochs can be analyzed with the new detector without loss of performance. This can be an advantage when using a sequential testing strategy, in which the time between one test and another is reduced [11,12,13].

Table 1 reports the detection rate and false positive rate determined for different epoch lengths using rectangular and Tukey windowing. These epoch lengths correspond to those in which no data were discarded, so that a fair comparison can be made. As expected from [10], the rectangular window is very sensitive to spectral leakage and presented false positive rates well below expectations, while the Tukey windowing is more robust to spectral leakage and its false positive rates were close to the significance level for epochs larger than 256 samples. Using the Tukey windowing, the new detector showed higher detection rates, but for epochs shorter than 256 samples the false positive rates were lower than the significance level.

Table 1 Detection rate and false positive rate for different epoch lengths and windowing using the MSC with least squares and phase compensation

The data used in this work were collected in order to respect the coherent sampling criterion for epoch length of 1024 samples. Compared to this epoch length, the detection rate using the epoch length of 512 was 5.8% higher and using the epoch length of 256 was 10.2% higher.

The low detection rate presented by the detector is due to the use of all EEG bipolar derivations, which include several signals with low SNR.

5 Conclusions

In this work, a new objective response detector was proposed. This new detector uses the MSC, replacing the DFT with least squares and phase compensation. The advantage of this new detector is that it allows different epoch lengths to be analyzed.

The new detector does not require the use of coherent sampling, but it is still sensitive to the presence of non-white noise such as power-line interference. Windowing pre-processing continues to mitigate the spectral leakage even with the new detector. The Tukey windowing presented the best result.

From the results, it is possible to notice that small epochs are not recommended, as there is no control of the false positive rate. It was also found that large epochs present lower detection rates. For the data used in this work, the best epoch lengths to use with the new detector ranged from 256 to 512 samples.

In future work, the effects of varying the epoch length in sequential test strategies can be investigated, in addition to using least squares with phase compensation in other types of ORD beyond the MSC.