Introduction

The middle ear and the Eustachian tube (ET) work as a functional unit. The ET provides ventilation, protection, and drainage. The most important muscle of the ET is the tensor veli palatini, which opens the anterior fibrocartilaginous part along two-thirds of the auditory tube. An impaired ET is an important factor in the pathogenesis of various middle ear diseases. Although several methods have been developed to probe ET functioning, such as tympanometric, manometric, radiologic, endoscopic, and sonometric approaches [15], a clinically useful test for evaluating the actual state of ET functioning is still lacking.

Sonotubometry is a non-invasive method that offers an assessment of ET functioning under physiological conditions. The technique is based on a sound signal being introduced into the nose, and on passage through the ET its effect on the external ear canal during tubal activity being recorded and evaluated.

Several studies have shown that noise pollution from pharyngeal motion was a major obstacle for reliable detection and identification of the probe signal in the ear canal. These studies were performed using either sinusoidal or broadband signals as the stimulus. By and large most researchers, including our group, have used pure (i.e., mono-frequency) tones; see for example [68]. Despite some promising results sonotubometry using pure-tone signals failed to gain wider acceptance as a routine method, due to technical shortcomings and its low reliability. Only few studies related to the use of broadband signals exist [9, 10]. While in principle being more appropriate for testing ET function, the methods based on broadband signals could not overcome the aforementioned problems of noise pollution either.

In the present study the use of a special class of broadband signals, so-called perfect sequences, is investigated. Perfect sequences (PSEQ) are periodic, deterministic random noise signals with an ideally flat spectrum [1113]. Such noise signals are mainly used for system identification and acoustic echo control in mobile communication systems [14, 15]. In contrast to other broadband signals such as white noise, with PSEQs all frequency components are stimulated equally during every period. In addition, in combination with specific system identification algorithms, its particular features allow for dynamic estimation of impulse responses and to devise, in principle, a dynamic virtual tube model for further investigation [16, 17]. This gives PSEQ a theoretical advantage over all other mono- and multi-frequency signals used in sonotubometry thus far. Note, however, that the evaluation of the virtual tube model does not constitute part of the present study.

The individuals, who were tested in this study, using our new PSEQ sonotubometry approach, are part of a larger cohort that had been tested using pure-tone signals as stimulus. It was demonstrated that using sinusoidal signals only 55% of the actual openings could be detected; clearly, such low positive-result identification is insufficient to be acceptable in clinical application [5]. In this context, the aim of the present study was to introduce PSEQs into sonotubometry, as a novel type of stimulus signal, and to evaluate whether this innovation would provide the desired improvements for reliable clinical use.

Materials and methods

Subjects

The study involved 20 otologically healthy subjects: 14 men and 6 women. The mean age of the study group was 28 years (range 8–52 years). No subject had a history of ear disease. Before applying sonotubometry a complete clinical examination of head and neck was performed on all subjects, with special attention to the ear, the nose, and the nasopharynx. Subsequent to the clinical examination all subjects were tested using pure-tone audiometry, tympanometry, and rhinomanometry. Note that patients with pathological findings were excluded from the study.

The study was approved by the ethics committee of the Aachen University and conducted in accordance with the Declaration of Helsinki. A written consent to undergo the new test methodology was obtained in every case.

Measurement procedure

The measurement procedure was carried out in a quiet environment to avoid background noise interfering with the results. Before testing, all ear wax was removed from the external ear canals under visual microscopy control. Each subject received an induction into the test procedure, instructing them specifically to avoid, as much as possible, any kind of undesirable background noise (clenching the teeth together, collecting of saliva before swallowing, etc.). All subjects were asked to report any changes in their auditory perception, including changes in loudness and the location of the sound variation (left or right ear).

As stated above, perfect sequences (PSEQ) were used as the stimulus signal. The acoustic stimulus signal was coupled into the nasal cavity through a small loudspeaker (professional earphones) for the duration of 15 s in each measurement cycle. The sound was directed along the floor of the nose. A miniature microphone was placed into the outer ear canal for recording the signal transmitted from nose to ear.

During the course of one measurement procedure four different maneuvers were performed in sequence, constituting four separate measurements, in order to induce an auditory tube opening. The different maneuvers performed were (1) yawning, (2) dry swallowing, (3) water swallowing, and (4) the so-called Toynbee maneuver (pinching the nostrils during swallowing). In every subject measurements were taken before and after nasal decongestion (induced by four drops of xylometazoline 0.1% in each nostril). The aforementioned duration of 15 s of the stimulus signal provided sufficient time so that the subjects could make multiple attempts at provoking a tubal opening, and to judge whether there was any subjectively noticeable change in the auditory perception of the measurement signal in the ear. As multiple maneuvers of the same type were carried out during a single measurement, normally multiple openings of the ET were observed.

During a measurement the temporal position of the maneuvers was signaled by the subject by activating a button. The button was pressed at the start of a maneuver and released at the end of it. The time periods during which the button was pressed are visualized on the sonotubograms by colored vertical bars (see Figs. 4, 5, 6, 7, 8). The audio data were recorded and stored for later analysis; some important metadata were added as well, like the age, the side (left/right) the measurement was performed, and whether the subject recognized a variation in loudness and tone characteristics during the measurement. Each sonotubogram was analyzed with respect to its characteristic shape, objective increase in sound pressure level, and opening duration of the auditory tube for any of the four maneuvers.

For evaluation, the sonotubometric results were combined with the subjects’ perception of tone and loudness variation, in order to obtain an accurate picture of ET function. Note that tube openings were defined as an increase of sound energy at the microphone location of at least 5 dB. To increase the reliability of the analysis the registered tube openings were considered only to be valid when simultaneously the subject also reported a subjective sound level increase.

Sonotubometry device

For this study, a custom-made device was developed, including the measurement software (Fig. 1); the system was centered on a standard-laptop. An external soundcard (PreSonus FIREBOX, PreSonus Audio Electronics, Inc., Baton Rouge, Louisiana, USA) was used as the acoustical interface. The stimulus signal was coupled into the nose via professional earphones (Shure E3, Heilbronn, Germany), with a frequency range of 25 Hz–18.5 kHz. The sound pressure level of the emitted signal was 71 dB. At the receiving end a miniature microphone (Sennheiser KE4, Wedemark, Germany), with a frequency range of 20 Hz–20 kHz, was placed into the external ear canal. Note that neither the microphone nor the earphone needed calibration because only the change in sound intensity at the ear was exploited. At the microphone location ambient noise with significant low-frequency components interfered with the wanted signal. In order to reduce its impact on the measurement results, a digital high-pass filter was integrated into the measurement software. The measurement system ran at a sampling rate of 48 kHz.

Fig. 1
figure 1

Schematic of the sonotubometric device strategy

In the sonotubograms shown further below, the sound energy transmitted from the loudspeaker to the microphone is plotted over time. Note that the values on the y-axis do not reflect the sound pressure level; rather they are a measure for the fraction of the emitted sound energy, which is received at the microphone location. The actual values, in the range of 10−9 to 10−5 on a linear scale, are shown on the logarithmic dB-scale \( \left( {X_{\text{dB}} = 10\log \left( {X_{\text{lin}} } \right)} \right) \) in the range of −90 dB to −50 dB for easier visualization and evaluation. As an additional feature two special sound levels are introduced with this new type of sonotubogram representation:

  1. 1.

    Passive level The sound level measured in the external ear canal with stimulus, but no maneuver being executed.

  2. 2.

    Noise level The sound level measured in the external ear canal without stimulus and neither a maneuver being executed.

As depicted in Figs. 4, 5, 6, 7, 8, the two levels are marked by colored horizontal bars. It turns out that the incorporation of these two values into the evaluation increases the diagnostic reliability [17].

The acoustic hardware components themselves (i.e., loudspeaker, microphone and sound card) naturally influence the measurements. However, using an appropriate equalizer this influence can be kept sufficiently small not to adversely affect the results.

Stimulus signal

For the loudspeaker signal, i.e., the stimulus signal, a periodically repeated perfect sequence (PSEQ) [11, 13] is used. PSEQs are discrete-time, binary, ternary, or poly-phase pseudo-noise sequences of distinctive length, with advantageous properties in the context of digital system analysis and identification [15]. In contrast to other broadband signals, like white noise, their main advantage is their ideally flat spectrum, i.e., all possible frequency components (here 0–24 kHz) are excited equally. Compared to state-of-the-art systems with sinusoidal stimulus, where only the information at one frequency component can be exploited, the PSEQ stimulus provides a significant gain in simultaneous information over a wide frequency range [17].

Naturally, the length of one period of the PSEQ had to be adjusted to the length of the impulse response of the transmission link between nose and ear. The latter can be determined by means of system identification techniques; as a result, a PSEQ of length 992 was selected, which corresponds to a length of about 20 ms, at a system sampling rate of 48 kHz. The PSEQ stimulus of 15 s duration is made up of 725 repeats of the selected PSEQ (see Fig. 2 for the general principle).

Fig. 2
figure 2

Principle of the construction of a stimulus signal × (t) of length 12 with an odd-perfect sequence of length 4

The system identification process provides impulse and frequency responses continuously changing over time. These responses describe the dynamic characteristics of the nose-to-ear transmission link. Examples for impulse and frequency responses are depicted in Fig. 3 for two points in time for an open and a closed ET, respectively. Comparison of the two curves clearly reveals the differences in amplitude and spectral form between the two states of the ET. Furthermore, note that all relevant frequency components are covered by the proposed measurement system.

Fig. 3
figure 3

Impulse (a) and frequency (b) response determined by system identification using PSEQs as stimulus signal, for open and closed ET

Statistical evaluation

Statistical analyses of variance were performed for the mean sound intensity and the mean duration using the type of maneuver, the ear side, and the status of decongestion as independent factors. For comparisons in pairs adequate t tests were calculated. For testing independence chi-square tests were calculated. All statistical analyses were performed with SAS for Windows (version 9.1).

Results

A total of 320 measurements were performed: 20 individuals, left and right auditory tube (×2), maneuvers for attempting opening of the ET (×4), before and after decongestion of the nasal tract (×2). The subjects reported subjective ET openings in 298/320 measurements (93.12%). Using our sonotubometry methodology in 237/320 measurements (74.1%) ET openings were registered objectively. Since in all these 237 measurements the subjects also noticed openings subjectively, these sonotubograms were classified as valid, according to the criteria mentioned above. Each subject demonstrated at least three valid measurements. During a single measurement cycle multiple openings could be observed, so that a total of 888 ET openings were found in the valid sonotubograms.

Associated with the four maneuvers, subjective ET openings appeared in 80/80 measurements (100%) during water swallowing, in 80/80 measurements (100%) during dry swallowing, in 77/80 measurements (96.25%) during Toynbee maneuver, but only in 61/80 measurements (76.25%) during yawning. The evaluation of the sonotubograms corresponding to the specific measurements yielded valid measurements in 72/80 cases (90%) with water swallowing, in 69/80 cases (86.25%) with dry swallowing, in 64/80 cases (80%) with Toynbee maneuver, and a very low number of 32/80 cases (40%) with yawning.

Evidence of different shapes of sonotubograms could be observed, and these were classified into four types. The most common shape was a spike-shaped type (37%, see the example in Fig. 4), followed by the double-peak type (31%, see the example in Fig. 5); much less frequent are the descendant type (15%, see the example in Fig. 6) and the plateau type (8%, see the example in Fig. 7). In rare situations (4.5%) a mixture of sonotubogram types could be observed in an individual subject for a single maneuver (see the example in Fig. 8). It is noteworthy that the different sonotubogram types could be associated with the different maneuvers: dry and liquid swallowing as well as Toynbee maneuver mostly exhibited spike- or double peak-type sonotubograms, while yawning mostly resulted in plateau-type sonotubograms (22/32, ~69%). These association findings are summarized in Table 1.

Fig. 4
figure 4

Example for a spike-type sonotubogram. The vertical bars indicate the time periods during which the response button was pressed, i.e., a maneuver was performed. The two horizontal bars indicate noise (lower) and passive (upper) levels

Fig. 5
figure 5

Example for a double peak-type sonotubogram

Fig. 6
figure 6

Example for a descendant-type sonotubogram

Fig. 7
figure 7

Example for a plateau-type sonotubogram

Fig. 8
figure 8

Example for the occurrence of two different sonotubogram types in one measurement sequence (here double-peak type and spike type)

Table 1 Different sonotubogram types associated with different maneuvers

For systematics, the variation in sound intensity during the auditory tube openings was evaluated. For the four maneuver types it was in the ranges of 10–30 dB (median 18, mean 19.06, SD ± 5.04) for water swallowing, of 6–30 dB (median 16, mean 17.1, SD ± 4.38) for dry swallowing, of 9–29 dB (median 16, mean 17.2, SD ± 4.09) for Toynbee maneuver, and of 5–20 dB (median 10, mean 10.17, SD ± 3.8) for yawning. Overall, a mean increase in sound intensity of 16.9 dB (SD ± 5.3, median 16.0) was observed during ET activity.

No significant difference in the increase of sound intensity (P > 0.05) could be observed when comparing the right and the left ear for each of the four performed maneuvers, at least not in the normal, healthy ears surveyed in this study. The sound variation measured before and after nasal decongestion, showed no significant difference either (P > 0.05).

Results from sonotubometry measurements for the four maneuver types can be compared with each other. Those during yawning were worse than those observed during dry or water swallowing, and during the Toynbee maneuver. During yawning valid measurements could be registered in only 13 subjects (65%); the sound increase was also significantly less than for the other maneuvers (P < 10−4). Although water swallowing seems to provide the clearest results, really significant differences (P < 0.021) were only observed in 4 of the 20 subjects, in comparison to dry swallowing and Toynbee maneuver. A comparison of dry swallowing and Toynbee maneuver showed no significant difference (P > 0.05).

For the 888 recorded openings, the mean opening duration of the auditory tube was 670 ms (SD ± 324 ms, median 600 ms). Of these the mean opening durations for the various maneuvers were as following:

  • for water swallowing 576 ms (SD ± 213 ms, median 500 ms);

  • for dry swallowing the mean is 683 ms (SD ± 290 ms, median 700 ms);

  • for the Toynbee maneuver the mean is 646 ms (SD ± 307 ms, median 600 ms); and

  • for yawning the mean is 911 ms (SD ± 484 ms, median 1,000 ms).

When comparing the measured sound level increase with the subjects’ auditory perception, we find a significant correlation between the subjects’ (subjective) perception of sound variation and our (objective) registered sonotubograms (P < 0.05, chi-square test); this indicates a clear interdependence between the two variables.

Discussion

Eustachian tube activity can be detected using a variety of methods. Unfortunately, nearly all approaches attempted thus far have their particular drawbacks. Of these methods sonotubometry (assessment of the dynamic properties of the acoustical transmission link between nose and ear), in which ET activity is observed under physiological conditions seems to show rather promising potential, although some results scatter widely. When pure-tone signal stimuli were used, the range of detected ET openings in healthy subjects was reported to be in the range of less than 55% to more than 90% [5, 6, 8, 18]. For example, in their recent studies van der Avoort [8, 18] reported up to 81% openings in children and 91% openings in adults. In contrast, in a cohort studied recently by our group only 55% of openings were detected, using an 8-kHz stimulus signal [5]. Mondain and co-workers [19] reported similar results with 62% detected openings. This broad range of results questions the reliability of the approach with a pure-tone test signal. We think that multi-tone stimuli may be the answer. For example, conventional broadband signals yielded up to 87% agreement with (traditional) tympanometric results [9]. However, due to technical difficulties as well as a limited reliability of both approaches, a broader clinical use of sonotubometry did not materialize thus far.

First experiences with our new approach exploiting a special class of acoustical signals, the so-called perfect sequences (PSEQs), show that such stimuli signals in combination with bespoke signal-processing techniques may significantly enhance the potential of sonotubometry for clinical applications [3, 4, 15]. For example, the mean increase in sound intensity was 16.9 dB for sonotubometry using PSEQ versus 13.8 dB in sonotubometry based on an 8 kHz pure-tone stimulus [5]. Furthermore, a prolonged mean opening duration of the ET was also registered for the measurements with PSEQ, namely, 670 ms in our current study versus 355.6 ms in sonotubometry using an 8 kHz pure-tone stimulus [5].

The individuals were randomly chosen from the cohort mentioned further above; all of them were tested with PSEQs immediately after pure-tone signal sonotubometry. The 74.1% detected openings with PSEQs during the current study clearly demonstrate the advantage of this new approach when compared to the 55% detected openings with pure-tone signals.

As discussed above, our sonotubograms can be broadly divided into classes with different, characteristic shapes. This confirms results from our earlier studies with pure-tone signal stimuli and those of other groups [6, 19]. Because of the similar results it can be postulated that the different patterns reflect a particular activity of the involved muscles. Note that the majority of the individuals generated a typical sonotubogram type for a given maneuver; in a minority of cases (<5% in this study) different sonotubogram types could be observed for an individual subject during a single measurement. The muscular activity results obtained by other groups [2022], using alternative methods, indeed seem to confirm our interpretation of a link between opening activity type and sonotubogram type. In this regard, the different sonotubogram types observed for the different maneuvers performed may reflect a different duration and amplitude of ET muscle activity. The spike-type sonotubogram, which is most frequent during dry swallowing, may be explained by a shorter duration of ET muscle activity compared to water swallowing. Although swallowing with water seems to be easier for the subjects, the impact of the liquid on the pharynx walls may be responsible in some cases for one of the two peaks in double-peak sonotubograms. The plateau curve, which was mostly observed during yawning, may reflect a prolonged ET muscle activity with a relatively constant amplitude.

While swallowing triggers a reflex that almost certainly led to ET opening in our healthy subjects, yawning did not. Yawning was the most problematic maneuver in this regard. Because in general yawning cannot be performed consciously, obviously it is more difficult to induce ET activity by moving the jaw. When yawning was successfully performed, an opening could be observed in 32 (40%) of the 80 yawning maneuvers. A controlled trial with a larger number of healthy subjects and repeated measurements will be necessary to determine the efficacy of the use of yawning in sonotubometry when ET function is sought from its analysis.

Background noise events were minimized as much as feasible; however, rarely could they be eliminated completely. For example, in a number of cases muscular activity of the pharynx, water on the pharyngeal walls during water swallowing, or temporo-mandibular joint crepitus during yawning caused high levels of underlying noise pollution. These unwanted noise sources could be identified as the cause for most of the invalid measurement results. While such involuntary events can hardly be avoided at all times, our current research activity addresses the reduction of noise in general, and, in particular, how to cope with the aforementioned events when they occur.

The most common underlying cause of middle ear disease is dysfunction of the Eustachian tube. Although in some recent studies sonotubometry is being considered for the diagnosis and evaluation of surgical treatment of different pathological entities of the middle ear [23, 24], up to now it has not yet gained wider acceptance as a tool in the diagnostics of ET dysfunction in clinical work and research.

While significant work still needs to be done, we are reasonably confident that sonotubometry based on PSEQ stimuli may in future allow for an easier identification of those patients at particular risk for developing middle ear pathology and associated hearing loss. A long-term monitoring of ET activity would be desirable in order to provide us the opportunity to detect and study transient or prolonged periods of ET dysfunction. In particular, studies on otologically affected patients are on the way in order to determine whether sonotubometry based on PSEQ stimuli could provide a reliable tool for the normalization of the ET function, and thus the prediction of malfunction.

Conclusion

Sonotubometry using PSEQs is an innovative approach that provides improved results and enhanced reliability when compared to conventional sonotubometric testing. In the study presented here 20 otologically healthy subjects were examined. Of the four ET opening maneuvers performed water swallowing exhibited the best reliability and reproducibility while yawning was found not to be suited for sonotubometric testing, as it seems very difficult for most of the patients to induce tubal openings by this maneuver.

Exploiting PSEQ stimuli the characteristics of the transmission link between nose and ear are obtained for all relevant frequency components. This information can then be used to significantly improve the visualization of the ET activity. Besides this evident gain, our new excitation strategy shows even further potential. The particular characteristics of the PSEQs allow for the application of sophisticated signal-processing algorithms, and these in turn can be exploited for new approaches for reducing the influence of unavoidable noise distortion. These studies are well under way, together with research into the extraction of a virtual tube model for the transmission link [16, 17].