Introduction

The intense growth of air traffic in recent decades has had a strong impact on society and the environment. Residents in the vicinity of airports are annoyed by aircraft noise and fear negative effects on their health and well-being. Nocturnal aircraft noise may lead to sleep fragmentation as a primary effect, which may in turn cause decrements in daytime performance as a secondary effect. Studies with objective measures of performance during the day following noise-disturbed nights are rare and show inconsistent results. Different acoustical parameters could account for the impact on performance: the number of noise events during the night, the maximum sound pressure level (SPL) of a single event and the energy equivalent noise level (LAeq), which is calculated by integrating the sound energy of all noise events over a given time period (usually the whole night).

Some studies point to the importance of the number of noise events during the night (Öhrstrom and Rylander 1990; Öhrstrom 1995). There are, however, reports that associate daytime performance impairments with the maximum SPL or LAeq-values to which the subjects were exposed during the night and with intermittent noise exposure in general (Öhrstrom and Rylander 1982; Öhrstrom et al. 1988). In a field study, a reduction of the nightly LAeq from 46.6 dB(A) by 5.8 dB(A) resulted in improved reaction times (Wilkinson and Campbell 1984). However, some studies did not find performance decrements due to road, rail or aircraft noise (Griefahn et al. 2000; Marks and Griefahn 2005; Schapkin et al. 2006).

Nocturnal aircraft noise alters sleep architecture (Basner et al. 2006a; Michaud et al. 2007; Basner and Samel 2005). Polysomnographical results from the present study have shown that nocturnal aircraft noise led to the fragmentation of sleep (Basner et al. 2006a; Basner and Samel 2004; Basner and Samel 2005). Which of the observed changes in sleep structure account for daytime performance impairments is unknown so far. Sleep deprivation studies hint at sleep duration as being the most important factor for daytime fatigue and performance loss (Bonnet 1986b; Gillberg 1995; Jewett et al. 1999; Martin et al. 1999).

So far, studies on nocturnal traffic noise effects have examined only small parts of the possible noise exposure spectrum, concentrating either on the number of noise events or on a few categories of maximum SPL or LAeq. The aim of this study was to examine the impact of nocturnal aircraft noise on daytime performance in a large subject sample. It was designed to cover a wide range of numbers of aircraft noise events per night, maximum SPLs and LAeq-levels to allow the calculation of dose–response relationships. Using the same objective performance measurements in the laboratory and in the field allowed a direct comparison of the findings from both settings.

It was hypothesized that nocturnal aircraft noise alters performance depending on acoustical criteria (i.e. the number of aircraft noise events per night, maximum SPL and LAeq) and depending on the day in the course of the study (time in study).

Methods

Participants

One hundred and twenty-eight participants were examined in a laboratory study. One hundred and twelve of these participants were exposed to aircraft noise, while 16 participants were not exposed to noise and formed a control group. The participants were between 19 and 65 years old (mean age 38 ± 13 (SD) years); 75 (58.6%) were women. Sixty-four participants volunteered for the field experiment; two participants of the field study had to be excluded from the performance data analysis since they were repeatedly disturbed by their small children while executing the tasks. Included participants were between 19 and 61 years old (mean age 37 ± 13 (SD) years); 35 (54.7%) were female. Participants were selected in a multi-level selection procedure which included different questionnaires, a medical examination and training on the computer-assisted performance tests. They were in good physical health, free from intrinsic sleep disorders, and had a normal hearing threshold for their age. The selection process is described in detail elsewhere (Basner et al. 2004; Basner et al. 2006b).

All participants gave written informed consent in accordance with the Declaration of Helsinki and were reimbursed for participation. The study was approved by the Ethics Committee of the North Rhine Medical Board.

Design

In the laboratory experiment, participants slept in the occupational medical simulation facility (AMSAN) of the German Aerospace Center (DLR) for 13 consecutive nights. We conducted 16 study periods in total with 8 participants each. They stayed in the laboratory from 7 p.m. to 8 a.m. and were free to proceed with their normal activities during the day. Time in bed was scheduled from 11 p.m. to 7 a.m., which allowed for a maximum sleep period of 8 h. Naps during the day were prohibited, and compliance was controlled with actigraphic monitoring. The first night served as adaptation to the polysomnographic equipment and the unfamiliar environment. The second night was the baseline night; nights 12 and 13 were recovery nights, all of them without noise exposure. During nights 3–11, participants were exposed to changing patterns of aircraft noise which were presented via loudspeakers in each private sleeping room. Each sleeping room was acoustically calibrated before the study started to guarantee a correct playback of the original aircraft noise events recorded in the vicinity of Düsseldorf airport. There were 4, 8, 16, 32, 64 or 128 aircraft noise events played back during the night, with an A-weighted maximum SPL between 45 and 80 dB(A) in 5-dB steps (Table 1). LAeq ranged from ≤30 dB(A) under baseline conditions to 57 dB(A). The noise events were equidistantly distributed throughout the night between 23:15 and 06:45 h. Table 1 shows the number of nights provided with each combination of number of aircraft noise events and maximum SPL. The design was intended to represent realistic noise situations, so very quiet and very noisy patterns were not included. The experiments were performed in an “incomplete block design” (blocks correspond to participants). Nine out of the 30 possible exposure combinations were randomly assigned to the nine noise nights of each study period. Figure 1 shows that this random process led to slightly higher mean LAeq-values during the later noise exposure nights of the study. The experiment was conducted double blind; i.e. neither the experimenters nor the participants knew which noise pattern would be presented.

Table 1 Number of study nights with a certain combination of maximum sound pressure level (SPL) and number of aircraft noise events
Fig. 1 Mean reaction time (±SE) in psychomotor vigilance task (PVT) in the course of the study and respective mean energy equivalent noise level (LAeq) (±SE) values during the previous nights of the laboratory setting

The field experiment was conducted in the vicinity of Cologne/Bonn airport, which is among Germany’s airports with the highest night-time traffic density, mainly due to cargo flights. Regions were preselected in which participants were exposed primarily to aircraft noise and in which road traffic density during the night was low. Participants were examined during nine consecutive nights in their homes. They had to keep a core sleep time between midnight and 6 a.m. The first night again served as adaptation. Using class-1 sound level meters, SPLs were recorded outside (2 m in front of the window) and inside the bedroom at the sleeper’s ear. In the field, maximum SPLs inside ranged from 16 to 73 dB(A) and LAeq from ≤30 to 51 dB(A).

Cognitive performance measurements

The computer-assisted performance tests were conducted in the morning shortly after getting up (before breakfast and shower). The tasks were implemented in the test software ERTS (experimental run time system) of the Berisoft Company (Frankfurt, Germany). Participants completed 40 training runs of the tests in the run-up to the study to guarantee a stable baseline. They were asked to perform the tests as quickly and correctly as possible with constant effort to keep the influence of motivation low.

Psychomotor vigilance task (PVT)

Participants had to respond to a white digital stopwatch (showing the time in ms) that appeared at irregular intervals (1.5–10 s) on a dark screen by pressing a key as fast as possible. The test lasted 10 min. The number of presented signals depended on the reaction times of the subject. In this sample, the number of presented signals averaged 83 (±3 SD) per 10-min trial. Reaction times of 500 ms or longer were regarded as lapses (Dinges and Powell 1985). Reaction times shorter than 130 ms were most probably reactions without stimulus (false starts) and were therefore excluded from analysis.
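The scoring rules for a single PVT trial can be sketched in a few lines of Python. This is only an illustration of the thresholds stated above, not the ERTS implementation; the function name `summarize_pvt` and the example values are hypothetical, and keeping lapses in the mean reaction time is an assumption, since the text does not say how they entered the mean.

```python
from statistics import mean

LAPSE_MS = 500        # reaction times >= 500 ms count as lapses
FALSE_START_MS = 130  # reaction times < 130 ms are treated as false starts


def summarize_pvt(reaction_times_ms):
    """Summarize one PVT trial from a list of reaction times in ms."""
    # False starts are excluded before any further statistics are computed.
    valid = [rt for rt in reaction_times_ms if rt >= FALSE_START_MS]
    # Assumption: lapses stay in the valid set and contribute to the mean.
    lapses = [rt for rt in valid if rt >= LAPSE_MS]
    return {
        "n_valid": len(valid),
        "n_false_starts": len(reaction_times_ms) - len(valid),
        "n_lapses": len(lapses),
        # Binary outcome of the kind used in a lapse/no-lapse analysis.
        "any_lapse": int(len(lapses) > 0),
        "mean_rt_ms": mean(valid) if valid else None,
    }


# Hypothetical reaction times, chosen to hit both thresholds exactly.
example = [95, 130, 240, 251, 499, 500, 612]
summary = summarize_pvt(example)
print(summary)
```

The `any_lapse` flag corresponds to the binary coding (absence of lapses = 0, at least one lapse = 1) used for the logistic models.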

Memory search task (MST)

Six letters were presented at the beginning of the task and had to be memorized without time pressure. When the participant was ready, the 3-min recall phase started, in which letters of the whole alphabet were presented. The subject had to decide whether the current letter belonged to the learning set using two predefined keys. It was equally likely that the letter belonged to the learning set or not. In this sample, participants had a mean rate of 109 (±8) presented letters per trial. Reaction times were recorded. Reaction times shorter than 130 ms were regarded as false responses and excluded from analysis. The task belongs to the AGARD STRES battery (AGARD 1989).

NASA task load index (TLX)

TLX is a questionnaire that asks for the subjective and retrospective judgment of workload (Hart and Staveland 1988). It comprises six questions concerning mental, physical and temporal demand of the task, performance, effort and frustration. The scale ranges from 0 points (very low workload) to 20 points (very high workload). Additionally, each subject created an individual ranking of the importance of the questions ranging from 5 (very important) to 0 (not important). Scale value and rank importance were multiplied and summed up to get a single value. Workload was then grouped into five categories (covering 60 points each) from 0 points (very low workload) to 300 points (very high workload). TLX rating was not conducted during the field experiment.
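As a rough illustration of this scoring scheme, the weighted sum and the five 60-point categories can be written as follows. The sketch assumes that each importance rank from 0 to 5 is used exactly once (which is what makes 300 the maximum, 20 × 15) and that category boundaries belong to the upper category; neither detail is stated in the text. The dimension keys and function names are hypothetical.

```python
DIMENSIONS = ("mental", "physical", "temporal", "performance", "effort", "frustration")


def tlx_score(ratings, ranks):
    """Weighted TLX score: sum of rating (0-20) times importance rank (0-5)."""
    for d in DIMENSIONS:
        if not 0 <= ratings[d] <= 20:
            raise ValueError(f"rating for {d} must lie between 0 and 20")
    # Assumption: every rank from 0 to 5 is assigned exactly once.
    if sorted(ranks[d] for d in DIMENSIONS) != [0, 1, 2, 3, 4, 5]:
        raise ValueError("ranks must be a permutation of 0..5")
    return sum(ratings[d] * ranks[d] for d in DIMENSIONS)


def tlx_category(score):
    """Map a score of 0-300 to one of five 60-point categories (1 = lowest)."""
    if not 0 <= score <= 300:
        raise ValueError("score must lie between 0 and 300")
    # Assumption: a boundary value falls into the upper category, except 300.
    return min(int(score // 60), 4) + 1


# Hypothetical answers of one participant.
ratings = {"mental": 10, "physical": 2, "temporal": 8,
           "performance": 6, "effort": 9, "frustration": 4}
ranks = {"mental": 5, "physical": 0, "temporal": 3,
         "performance": 2, "effort": 4, "frustration": 1}
score = tlx_score(ratings, ranks)
print(score, tlx_category(score))
```

Under these assumptions, a score between 60 and 119 points falls into the second category.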

Additional data

In addition to the data described earlier, classical polysomnographical data were recorded including an electroencephalogram (EEG), an electrooculogram (EOG) and an electromyogram (EMG). The participants filled in various questionnaires, including annoyance ratings. The nightly urine was collected and analysed for stress hormones. These findings have been described elsewhere (Basner and Samel 2004, 2005; Basner et al. 2004, 2006a, b; Quehl and Basner 2006; Maaß and Basner 2006; Quehl 2006).

Statistics

Statistical analyses were conducted using SAS version 9.2. The first step was to investigate aircraft noise effects in general (noise versus baseline) with mixed models (PROC MIXED in SAS) in the laboratory. Subject effects were considered random because the subjects in the experiment were only a small subset of eligible persons. The mixed model accounted also for the fact that single subjects were investigated repeatedly over several nights, i.e. that the data were not independent. The multiple testing was furthermore accounted for by adjustment according to Dunnett.

The second step was to calculate dose–response relationships for laboratory and field conditions. Again subject effects were considered random. Mean reaction times were analysed with mixed model regressions using a random intercept model (PROC MIXED in SAS). Lapses in the PVT were so rare that dose–response relationships were calculated using random intercept logistic regression models (absence of lapses = 0, at least one lapse = 1). In the MST, hit rate (HR, correct yes response on a signal trial), false alarm rate (FAR, incorrect yes response on a noise trial), sensitivity (A′) and bias (B_D) were calculated according to the non-parametric signal detection theory (Stanislaw and Todorov 1999). A′ is a measure of detection accuracy and varies between 0.5 (signals are not differentiated from noise trials) and 1.0 (correct separation of signal and noise trials). B_D ranges within the limits of −1 (yes response to all trials) and +1 (no response to all trials), with 0 representing the absence of response bias. Gender and age were included as moderators in all of the models (male = 1, female = 0). The iterative modelling process first tested the single parameters for a statistically significant influence on performance. Only parameters that proved to be individually significant at p < 0.05 were tested for non-linearity and for interactions. In order to distinguish between nights with and without noise, an indicator variable was included in the modelling of the laboratory data (time in study × noise/noise-free conditions). For the parameters number of aircraft noise events, LAeq, and time in study, the data of noise-free conditions were included in the models since they formed a stepless extension to the experimental conditions. For maximum SPL, the data of noise-free conditions were excluded, because there was a gap between 0 and 45 dB maximum SPL and the following 5-dB steps which might have distorted the regression coefficients. Furthermore, maximum SPL could not be used for field data analyses since, in contrast to the laboratory setting, maximum SPL was changing within each night and from subject to subject.
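For readers unfamiliar with the non-parametric indices, the following Python sketch computes sensitivity and bias from a hit rate and a false alarm rate. It uses the standard formulas for A′ and for the index written B″D in Stanislaw and Todorov (1999); that the bias measure used here is this index is an assumption, based on the matching range of −1 to +1. The function names are hypothetical.

```python
import math


def a_prime(hit_rate, false_alarm_rate):
    """Non-parametric sensitivity A' (0.5 = chance, 1.0 = perfect separation)."""
    h, f = hit_rate, false_alarm_rate
    if h == f:
        # No discrimination; this also avoids a zero denominator.
        return 0.5
    sign = 1.0 if h > f else -1.0
    diff = abs(h - f)
    return 0.5 + sign * (diff ** 2 + diff) / (4 * max(h, f) - 4 * h * f)


def bias_bd(hit_rate, false_alarm_rate):
    """Non-parametric bias: -1 = always 'yes', +1 = always 'no', 0 = unbiased."""
    h, f = hit_rate, false_alarm_rate
    no_part = (1 - h) * (1 - f)   # tendency to answer 'no'
    yes_part = h * f              # tendency to answer 'yes'
    if no_part + yes_part == 0:
        return math.nan           # undefined, e.g. for h = 1 and f = 0
    return (no_part - yes_part) / (no_part + yes_part)


# Hit rate 95.3% and false alarm rate 2.7%, as rates between 0 and 1.
print(round(a_prime(0.953, 0.027), 3), round(bias_bd(0.953, 0.027), 2))
```

With a hit rate of 95.3% and a false alarm rate of 2.7%, the sketch yields an A′ of about 0.981 and a positive bias value, i.e. a slightly conservative response tendency.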

Thirty-two participants received a special noise pattern in the recovery nights (testing a noise-free core sleeping time concept), so the data of these participants were excluded from the analyses of days 12 and 13.

Results

General noise effects in the laboratory

Figure 1 displays the mean reaction time in PVT in the course of the study days. Mean reaction time in PVT was 241.0 ms (±2.0 SE) under baseline conditions (day 2) and increased up to 245.9 ms (±2.5 SE) at day 11. Reaction time improved immediately to 242.3 ms (±2.8 SE) after one recovery night (day 12). Comparing performance under baseline (day 2) and noise conditions (pooled data of days 3–11) in the experimental group using mixed model ANOVA showed that the impairment under noise conditions in PVT was significant (p = 0.0014).

In contrast, mean reaction time in MST improved from day 2 (639.5 ms (±12.3 SE)) to day 11 (618.6 ms (±12.2 SE), Fig. 2a). After one recovery night, reaction time slowed again to 625.5 ms (±14.3 SE). Along with the improvement in mean reaction time, FAR increased from 1.4% (±0.2 SE) at day 2 up to 2.4% (±0.5 SE) at day 11 (Fig. 2b). Mixed model ANOVA of MST performance data comparing baseline with pooled noise conditions in the experimental group revealed that mean reaction time improved significantly by 15.9 ms (±6.0 SE) during noise conditions (p = 0.0083). FAR rose significantly from 1.6% (±0.3 SE) to 1.9% (±0.2 SE) (p = 0.0417), while A′ decreased from 0.987 (±0.002 SE) to 0.984 (±0.002 SE) (p = 0.0206). HR (p = 0.1219) and B_D (p = 0.9537) did not change significantly.

Fig. 2 a Mean reaction time (±SE) in memory search task (MST) in the course of the study with regression line and 95% confidence interval of the laboratory setting. b Mean false alarm rate (FAR) (±SE) in memory search task (MST) in the course of the study with regression line and 95% confidence interval of the laboratory setting. The figure displays regression models and formulae that are not adjusted for age and gender. Thus, regression lines represent the average performance level of the complete sample and are more straightforward. Tables and text give the regression coefficients for the adjusted models. Adjusted and unadjusted models differed only regarding the intercept values

TLX subjective rating of workload ranged in the low category under baseline conditions with 107 points (±6 SE). Under noise conditions, workload increased significantly reaching 114 points (±6 SE) (p = 0.0216). Nevertheless, workload was still rated as being low.

Dose–response relationships in the laboratory

Mean reaction time in PVT increased continuously with time in study, thereby reflecting the LAeq pattern of the previous night (Fig. 1), and decreased immediately after only one recovery night. A regression model that accounted for the acoustical criteria LAeq and number of noise events, and the course of performance during noise nights and noise-free nights reflected the data best. Mean reaction time increased linearly with the number of aircraft noise events during the night (0.02 ms (±0.01 SE)/noise event, p = 0.0137) and with increasing LAeq (0.10 ms (±0.05 SE)/dB, p = 0.0413) (Fig. 1). Time in study (0.06 ms (±0.08 SE)/day, p = 0.4591) and the interaction time in study × noise/noise-free conditions (0.25 ms (±0.10 SE), p = 0.0119) were included in the model in order to distinguish between the performance loss caused by noise exposure and performance under noise-free conditions. Age was a significant moderator, whereas gender was not (age: 0.4 ms (±0.2 SE)/year, p = 0.0172; gender: −3.5 ms (±4.2 SE), p = 0.3981).

The probability of the occurrence of at least one lapse in the PVT increased significantly from day to day (0.10 (±0.03 SE)/day on the logit scale, p = 0.0043; time in study × noise/noise-free condition: 0.01 (±0.02 SE), p = 0.7247; intercept: −2.19 (±0.55 SE), p = 0.0001). Age (−0.02 (±0.01 SE)/year, p = 0.0987) and gender (0.40 (±0.32 SE), p = 0.2214) did not have a significant influence on lapses in the PVT.
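Because the slope is given on the logit scale, it may help to translate it into odds. The sketch below is an illustration only: it uses the reported time-in-study slope of 0.10 per day, ignores the interaction term and the moderators, and therefore does not reproduce the fitted model. The function names and the 20% starting probability are hypothetical.

```python
import math

BETA_DAY = 0.10  # reported time-in-study slope on the logit scale, per day


def odds_ratio(days, beta=BETA_DAY):
    """Multiplicative change in the odds of at least one lapse after `days` days."""
    return math.exp(beta * days)


def shifted_probability(p_start, days, beta=BETA_DAY):
    """Probability after `days` days when starting from probability `p_start`."""
    logit = math.log(p_start / (1.0 - p_start)) + beta * days
    return 1.0 / (1.0 + math.exp(-logit))


print(round(odds_ratio(1), 3))  # 1.105, odds ratio per additional study day
print(round(odds_ratio(9), 2))  # 2.46, accumulated over nine days
# Hypothetical starting probability of 20%; the change in probability
# depends on this starting value, whereas the odds ratio does not.
print(round(shifted_probability(0.20, 9), 3))
```

A slope of 0.10 per day thus corresponds to roughly 10% higher odds of at least one lapse with each additional day.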

For MST, time in study showed significant effects (Fig. 2a, b).

Mean reaction time improved linearly from day to day by 2.6 ms (±0.5 SE) (p = 0.0001, Table 2). In parallel with the decrease in mean reaction time, HR and A′ decreased and FAR increased linearly and significantly with time in study (Table 2). HR diminished from 95.3% on day 2 by 0.10% (±0.03 SE)/day (p = 0.0046), A′ was reduced from 0.981 on day 2 by 0.0006 (±0.0001 SE)/day (p < 0.0001) and FAR increased from 2.7% on day 2 by 0.09% (±0.02 SE)/day (p < 0.0001). B_D did not change significantly (Table 2).

Table 2 Results of the regression models for time in study effects in MST

Dose–response relationships in the field

For PVT, one model including LAeq and time in study showed significant results (Fig. 3).

Fig. 3 Mean reaction time (±SE) in psychomotor vigilance task (PVT) in the course of the study and respective mean energy equivalent noise level (LAeq) (±SE) values during the previous nights of the field study. Day 2 is Wednesday morning and so on

Mean reaction time increased linearly with rising LAeq (0.3 ms (±0.1 SE)/dB(A), p = 0.0284) and with time in study (0.6 ms (±0.2 SE)/day, p = 0.0008). In this model, age and gender were non-significant moderators (age: 0.5 ms (±0.3 SE), p = 0.0854; gender: −12.7 ms (±7.7 SE), p = 0.0977; intercept: 218.1 ms (±13.4 SE), p < 0.0001).
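To make the size of these coefficients tangible, the fixed-effect part of the field model can be evaluated for two nights that differ only in LAeq. The sketch assumes that all predictors entered the model uncentered, that age is measured in years and that gender is coded male = 1; the random intercept is ignored. The absolute predictions are therefore illustrative at best, while the difference between the two nights depends only on the LAeq slope. The dictionary and function names are hypothetical.

```python
# Reported fixed-effect estimates of the field PVT model (ms).
FIELD_PVT = {"intercept": 218.1, "laeq": 0.3, "day": 0.6, "age": 0.5, "male": -12.7}


def predict_rt(laeq_db, day, age, male, coef=FIELD_PVT):
    """Fixed-effect prediction of mean PVT reaction time in ms."""
    # Assumption: uncentered predictors, age in years, male coded as 1.
    return (coef["intercept"]
            + coef["laeq"] * laeq_db
            + coef["day"] * day
            + coef["age"] * age
            + coef["male"] * male)


# Same hypothetical participant and study day, lowest versus highest field LAeq.
quiet = predict_rt(laeq_db=30, day=5, age=37, male=0)
loud = predict_rt(laeq_db=51, day=5, age=37, male=0)
print(round(loud - quiet, 1))  # 6.3
```

Across the LAeq range recorded in the field (30 to 51 dB(A)), the model thus implies a difference of about 6.3 ms in mean reaction time.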

For MST, the time in study effect proved significant. FAR increased linearly and significantly by 0.14% (±0.05 SE)/day (p = 0.0046), while B_D decreased from day to day by 0.0235 (±0.0109 SE) (p = 0.0313). Age and gender were non-significant moderators for B_D (age: 0.0042 (±0.0035 SE), p = 0.2322; gender: −0.0219 (±0.0895 SE), p = 0.8068; intercept: 0.3466 (±0.1593 SE), p = 0.0335). FAR decreased significantly with age (age: −0.1% (±1.0 SE), p = 0.0072; gender: −0.1% (±1.0 SE), p = 0.9112; intercept: 6.2% (±1.7 SE), p = 0.0005). Acoustical criteria and time in study did not affect mean reaction time.

Discussion

In the present studies, performance impairments caused by nocturnal aircraft noise were observed in PVT and MST, both as a general noise effect (comparing the pooled noise data to baseline) and in dose–response relationships.

In the laboratory, mean reaction time in PVT was prolonged depending on the number of nocturnal aircraft noise events and on LAeq. For the examined range of 0–128 aircraft noise events per night, the model showed a maximum increment in reaction time of 2.6 ms (0.02 ms/aircraft noise event), and for the range of 30–57 dB LAeq, the model showed a maximum addition of 2.7 ms (0.10 ms/dB LAeq). In the field, LAeq and time in study showed a significant effect on mean reaction time in PVT.
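The two laboratory maxima follow directly from the reported slopes and the exposure ranges. As a worked check, with results rounded to one decimal as in the text:

```latex
\begin{align*}
\Delta \mathrm{RT}_{\text{events}} &= 0.02\ \tfrac{\text{ms}}{\text{event}} \times 128\ \text{events} = 2.56\ \text{ms} \approx 2.6\ \text{ms},\\
\Delta \mathrm{RT}_{L_{\mathrm{Aeq}}} &= 0.10\ \tfrac{\text{ms}}{\text{dB}} \times (57 - 30)\ \text{dB} = 2.7\ \text{ms}.
\end{align*}
```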

A model including all three acoustical parameters was not developed, since the parameters are not independent; i.e. LAeq can be calculated from maximum SPL and number of noise events.
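The dependence can be illustrated with a deliberately simplified energy average. The sketch below treats every noise event as a rectangular pulse at its maximum SPL with a fixed effective duration, on top of a constant background level; real events rise and decay, so this is an assumption for illustration and not the procedure used in the study. The effective duration of 10 s, the background of 30 dB(A) and the 8-h averaging period are hypothetical parameter choices.

```python
import math


def laeq_from_events(n_events, lmax_db, event_s=10.0,
                     background_db=30.0, period_s=8 * 3600):
    """Approximate LAeq in dB(A) for n identical rectangular noise events."""
    event_time = n_events * event_s
    if event_time > period_s:
        raise ValueError("events do not fit into the averaging period")
    # Sum sound energy (proportional to 10^(L/10)) weighted by duration.
    energy = (event_time * 10 ** (lmax_db / 10)
              + (period_s - event_time) * 10 ** (background_db / 10))
    # Average over the whole period and convert back to a level.
    return 10 * math.log10(energy / period_s)


for n in (4, 32, 128):
    print(n, round(laeq_from_events(n, lmax_db=65), 1))
```

Under these assumptions, doubling the number of events raises LAeq by about 3 dB as long as the background contribution is negligible, which shows why the three parameters cannot be varied independently.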

The impairments in reaction time in PVT were small, but significant. Their magnitude was less than the range that we found in this study for gender difference or age progression and less than has been reported previously (Philip et al. 2004). Thus, it was important to adjust for these two moderators in the regression models. Most probably, nocturnal aircraft noise exposure led to a moderate partial sleep deprivation which affects daytime performance (Basner et al. 2006a). Measured against the impact of cumulative partial sleep deprivation or acute alcohol exposure on PVT (Elmenhorst et al. 2009), nocturnal aircraft noise exposure had a considerably smaller influence on reaction time. Reaction time impairments caused by four nights of 5-h sleep ranged between 5.7 and 8.8 ms; a breath alcohol concentration of 0.6‰ led to reaction time extensions of about 13 ms. However, in the present study, participants had a relatively long time in bed of 8 h, which could partly compensate for the effects of sleep fragmentation due to aircraft noise, so that severe daytime performance impairments were not expected. Nevertheless, infrared pupillography has shown that participants in the present study were objectively sleepier during the day after noise-disturbed nights (Basner 2008).

A cumulative performance loss was observed with increasing time in study in PVT and MST, in both the laboratory and the field. Mean reaction time in PVT increased in the laboratory and in the field experiment. Probabilities for lapses increased in the laboratory. Due to the recovery nights in the laboratory, we could show that mean reaction time in PVT increased depending on the LAeq-value of the previous night and immediately recovered after one night without noise. The cumulative performance impairment in the field could hint at a loss of motivation of the participants to conduct the performance tests, which is quite improbable in view of the results of the laboratory study. It is more likely that the relatively high LAeq-values in the beginning of the second week, which led to respective performance impairments on days 8 and 9 (Tuesday and Wednesday morning), distorted the regression line and caused a significant effect.

A time in study effect could be observed in MST as well. In the laboratory, mean reaction time in MST improved in the course of the study at the expense of a decrease in HR and A′, and an increase in FAR. In the field, FAR increased in the same way. Additionally, B_D indicated that participants answered less conservatively, i.e. assigned more items to the learning set in the course of the experiment. The observed changes in MST could hint at a change in working strategy which causes the participants to work faster but less accurately. That could be a consequence of nocturnal aircraft noise as well as a mere response to numerous repetitions of the task during the study. Similar performance increments in episodic memory (recognition and recall) as found in children living in noise-exposed regions (Stansfeld et al. 2005; Clark et al. 2006) were not seen in the MST.

In the field experiments, fewer regression models proved significant than in the laboratory. Some studies have reported similar differences and have attributed the smaller influences in the field to habituation effects (Horne et al. 1994; Fidell et al. 1995; Pearsons et al. 1995). In this study, noise exposure in the laboratory covered a wide range; i.e. participants were exposed to maximum SPLs that were not recorded in the field. This may also have contributed to stronger aircraft noise effects on performance in the laboratory.

Subjectively evaluated workload increased little but significantly after noise-disturbed nights. Workload, however, ranged in the low category, both under baseline and noise conditions. Thus, the small objectively measured performance impairments are well reflected by the subjective workload ratings.

The presented results of performance impairments after noise-disturbed nights are consistent with the findings of former studies. Öhrstrom et al. (1988) and Öhrstrom (1995) hinted at the importance of the number of noise events during the night and the meaning of different maximum SPLs for performance the following day. Mostly, studies on nocturnal traffic noise and performance presented trends but were not able to confirm significant performance deteriorations. This could be a consequence of field experiments, which showed only small effects in this study as well, or of the design of laboratory studies with, compared to our study, less noisy exposure patterns and smaller subject samples. Öhrstrom and Rylander (1982) could not detect significant performance impairments until noise events reached 80 dB(A). Disturbances in sleep architecture are believed to be the cause of performance impairments the following day, but performance loss after sleep deprivation (Wilkinson 1968, 1969; Wilkinson et al. 1966) and sleep fragmentation (Bonnet 1985, 1986a, 1987; Stepanski 2002; Martin et al. 1996) is considerably more pronounced.

The practical relevance of the observed daytime performance alterations after nocturnal aircraft noise exposure remains speculative. Performance deteriorations depending on the acoustical characteristics of the night are significantly and consistently verifiable, but small. They lie within the range that is also found for age progression and gender differences. Nevertheless, the performance alterations hint at changes in physiological processes caused by nocturnal aircraft noise exposure. Considering the limited representativeness of the sample (only healthy adults were included), stronger performance impairments in children or ill persons due to nocturnal aircraft noise cannot be ruled out.