Introduction

Recent studies of participant compliance with cortisol sampling procedures in ambulatory settings indicate that estimates of cortisol values may be compromised by failure to adhere to the proper timing of sampling protocols [1]. One aspect of cortisol activity, the size of the cortisol awakening response (CAR), has been shown to be particularly sensitive to non-compliance [1, 2]. The CAR refers to the change in cortisol levels upon awakening; levels typically increase by 50% to 75% during the first 30 to 40 min after waking [3] before declining throughout the remainder of the day.

The rapid rate of change in cortisol levels after waking means that the accuracy of CAR measurement depends heavily on the timing of the samples used to measure it. Often, participants are asked to provide one saliva sample immediately upon waking and another at the estimated peak of the CAR, approximately 30 min post-awakening [1, 4]. If participants fail to provide their first sample immediately upon waking, the cortisol value for that sample could be affected by the post-awakening cortisol rise. Similarly, if the wake +30 min sample is taken too early or too late, it may miss the CAR peak. Most prior studies of cortisol sampling compliance have relied on participant-reported waketimes in judging the accuracy of participant sampling times. The current study builds upon prior research by using objective measures of waketimes to examine the extent to which inaccurate reporting of waketimes may contribute to poorly timed CAR samples and inaccurate assessments of the CAR.

One recent study has taken this approach, noting that self-reported waketimes were relatively accurate and had minimal influence on the CAR if taken within a 15-min time frame [5]. However, that research was conducted with a sample of older adults who might have higher-than-average compliance rates. The current research focuses on adolescents, a group expected to have lower levels of compliance, due to developmental changes in sleep patterns, including a shift towards eveningness, and lower alertness in the morning [6]. This study addresses the following questions: (1) How accurate are adolescents in reporting their own waketimes? (2) Do inaccurate reports of waketimes contribute to failure to comply with sampling protocols? and (3) What are the implications of non-compliance for estimates of the CAR?

Method

Participants

Participants were 91 late adolescents, ages 17 to 20 years (mean = 19), who were part of a larger study on the development of emotional disorders over the transition to adulthood. For the larger study, all juniors in two high schools (in suburban Chicago and suburban Los Angeles) were asked to complete a screening questionnaire, including the neuroticism scale from the Eysenck Personality Questionnaire-Revised [7]. In total, 923 of the 1,370 students screened were invited, and 491 participated. A subsample (75%) from suburban Chicago were invited to participate in a longitudinal cortisol protocol. The current study uses data from the second wave of that protocol, which is when the objective (actigraphy) assessment of sleep–wake timing was added. Of these, 121 (81%) continued to wave 2. Because the study's original goal was to predict the onset of emotional disorders, those who scored in the top tertile on the neuroticism scale were over-sampled and constituted 61% of the final sample (Footnote 1). Thirty participants were excluded for taking corticosteroid medications, pregnancy, and/or missing data, leaving a total of 91 participants. Not all participants provided complete data (30% completed 1 day, 35% completed 2 days, and 33% completed 3 days), leaving a total of 181 days of data from the 91 participants in the final sample.

Procedures and Measures

In addition to providing saliva samples, participants wore wrist-based actigraphy monitors for three weekdays that provided ongoing activity records from which sleep timing was estimated. Diary reports, in which they recorded their wakeup time each morning and momentary emotions, and health questionnaires, which were used to determine exclusionary criteria and medical controls, were also analyzed.

Saliva Sampling and Diary Reports

Participants were asked to provide six samples of saliva per day for three consecutive weekdays. They were instructed to express their saliva through a small straw into a 2-ml polypropylene vial. Samples were requested at wakeup, 30 min after waking, and four additional time points. These analyses focus on the two morning samples, which were used to calculate the CAR. Participants were instructed not to eat, drink, or brush their teeth in the 30 min before providing their samples. They noted the time that each sample was taken on vial labels at the time of each sampling. All cortisol values were initially measured in micrograms per deciliter and were subjected to a natural logarithmic transform. The CAR was calculated by subtracting the cortisol level at wakeup from the level 30 min later.
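As a minimal sketch of the CAR calculation described above, the following Python snippet log-transforms two cortisol values and takes their difference. It assumes the subtraction is applied to the log-transformed values; the function name and the example values are illustrative and are not study data.

```python
import math

def cortisol_awakening_response(wake_ugdl, wake30_ugdl):
    """Return the CAR as ln(wake+30 value) - ln(wakeup value).

    Inputs are raw cortisol values in micrograms per deciliter.
    """
    if wake_ugdl <= 0 or wake30_ugdl <= 0:
        raise ValueError("cortisol values must be positive to take a log")
    return math.log(wake30_ugdl) - math.log(wake_ugdl)

# Hypothetical values: a 60% rise from waking to 30 min later.
car = cortisol_awakening_response(0.30, 0.48)
print(round(car, 3))
```

Because a difference of logs equals the log of the ratio, a CAR computed this way reflects proportional rather than absolute change in cortisol.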

Actigraphy Data

Participants wore the Actiwatch-64 (Mini Mitter Respironics, Inc., Bend, OR, USA) on the wrist of their non-dominant hand for 3 days. Actiwatches use accelerometers to assess participants' motor activity across the waking day and while sleeping. Actigraph data were scored using Actiware-Sleep software, version 3.4 (Mini Mitter). Epoch length used to calculate sleep analysis statistics was set to 1 min, which is considered adequate for the determination of sleep–wake timing [8]. The software applies a validated sleep scoring algorithm (Footnote 2) [9] that objectively estimates waketime after sleep onset [10, 11] based on significant movement after at least 10 min of inactivity. As recommended [8], actigraphy raw data were visually inspected, and adjustments were made if it appeared the automatic algorithm had made an error in determining the waketime. Actigraphy data on sleep timing are highly correlated with polysomnography (PSG) [12].

Assay Procedures

Completed cortisol samples, diaries, and Actiwatches were returned to the university-based laboratory by courier. Saliva samples were then stored at −20°C until they were sent by courier to Trier, Germany to be assayed. Samples were assayed in duplicate using a time-resolved immunoassay with fluorometric detection (DELFIA), and the average of the duplicate values was used [13]. The intra-assay coefficients of variation were between 4.0% and 6.7%, and the corresponding inter-assay coefficients of variation ranged from 7.1% to 9.0%.

Data Analysis

First, we examined the absolute differences between the self-reported and actigraphy-based waketimes. Next, we calculated whether these differences related to compliance with the requested timing of the first two samples each day. Participants were deemed “compliant” with the wakeup sample if they provided it within ±5 min of actigraphy-based waketimes (Footnote 3) and with the wakeup +30 min sample if they provided it between 25 and 35 min after waking. Using logistic regression, we examined whether the degree of accuracy of self-reported waketimes predicted compliance with these protocols. Lastly, we predicted the size of the CAR from compliance estimates based on self-reported and actigraphy-based waketimes.
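The compliance windows defined above can be sketched in Python as follows. This is an illustration only: the function names, the inclusive window boundaries, and the example times are assumptions and are not taken from the study data.

```python
from datetime import datetime

def minutes_between(start, end):
    """Signed minutes from start to end."""
    return (end - start).total_seconds() / 60.0

def classify_compliance(waketime, sample1_time, sample2_time):
    """Apply the timing windows from the text to one day of data.

    Sample 1 is on time within +/-5 min of waking; sample 2 is on time
    25 to 35 min after waking (boundaries assumed inclusive).
    Returns (sample1_ok, sample2_ok).
    """
    delay1 = minutes_between(waketime, sample1_time)
    delay2 = minutes_between(waketime, sample2_time)
    return abs(delay1) <= 5, 25 <= delay2 <= 35

# Hypothetical day: actigraphy wake at 06:50, diary wake at 07:00.
objective = datetime(2024, 1, 8, 6, 50)
self_report = datetime(2024, 1, 8, 7, 0)
s1 = datetime(2024, 1, 8, 7, 2)
s2 = datetime(2024, 1, 8, 7, 31)

print(classify_compliance(self_report, s1, s2))  # relative to diary waketime
print(classify_compliance(objective, s1, s2))    # relative to actigraphy waketime
```

In this hypothetical day, both samples appear on time relative to the diary waketime, but neither is on time relative to the actigraphy waketime, which is the kind of misclassification the analysis is designed to detect.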

We hypothesized that taking the wakeup sample too late or the CAR sample earlier or later than the desired 30 min after waking could produce inaccurate estimates of the CAR, most likely resulting in an underestimation of this parameter. We further hypothesized that use of objective waketimes would provide better estimates of compliance, and that compliance based on actigraphy-based waketimes would better predict the size of the CAR.

Regression analyses were clustered within individuals in order to calculate robust standard errors. Race/ethnicity (20% Black, 7% Hispanic), gender (24% male), age, objective waketime, oral contraceptives (30% of females), and nicotine use (16% smokers) were included as covariates because prior research indicates that these factors are associated with cortisol parameters [14–16]. Although level of neuroticism was not related to wake delay or compliance, analyses were weighted to adjust for the over-sampling of adolescents with high levels of neuroticism.

Results

Concordance between Self-Reported and Objective (Actigraph-Based) Waketimes

Self-reported waketimes were found to be relatively accurate: 75% were within 5 min and 92% were within 15 min of objective waketimes (see Fig. 1). On average, however, self-reported waketimes occurred 6.2 min after objective waketimes (SD = 14.3; range = 0–153 min). The first sample occurred, on average, 2.8 min after subjective waketimes (SD = 9.3; range = 0–90) but 7.1 min after objective waking (SD = 15.9; range = 0–153). On average, the second sample occurred 35.4 min after subjective waking (SD = 12.83; range = 15–120) and 37.9 min (SD = 19; range = 15–183) after objective waking (Table 1).

Fig. 1 Plot of actigraph-based and self-reported wakeup times

Table 1 Levels of compliance with cortisol sampling protocol based on actigraph and self-reported waketimes

Associations between Waketime Accuracy and Compliance

The gap between self-reported and objective waketimes influenced whether participants were found to be compliant with taking their samples within the optimal time frames after objective waking. Participants with identical self-reported and objectively measured waketimes (gaps of less than 1 min) had about six times the odds of complying with the entire morning sampling protocol (i.e., they were on time for both samples 1 and 2; OR = 6.04, p < 0.01), while those with larger gaps between objective and subjective waketimes (more than 5 min) had 90% lower odds of being compliant for both samples (OR = 0.10, p < 0.001), compared to participants with discrepancies between 1 and 5 min. This was due to lower compliance with sampling protocols for both sample 1 and sample 2: compared to those with identical subjective and objective waketimes, participants whose self-reported wakeup times differed from objective waketimes by more than 5 min had 97% lower odds of being compliant with the timing of sample 1 (OR = 0.03, p < 0.001) and 91% lower odds of being compliant with the timing of sample 2 (OR = 0.09, p < 0.001). Those with waketime discrepancies between 1 and 4 min did not differ significantly from those with discrepancies of 0 min.
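The percentages quoted alongside the odds ratios above follow from simple arithmetic, which the short Python sketch below makes explicit. The function name and labels are illustrative. Note that an odds ratio describes a change in odds, which approximates a change in probability only when the outcome is uncommon.

```python
def percent_change_in_odds(odds_ratio):
    """Convert an odds ratio to a percent change in odds."""
    return (odds_ratio - 1.0) * 100.0

# Odds ratios as reported in the text.
for label, odds_ratio in [
    ("both samples, gap > 5 min", 0.10),
    ("sample 1, gap > 5 min", 0.03),
    ("sample 2, gap > 5 min", 0.09),
    ("both samples, gap < 1 min", 6.04),
]:
    print(f"{label}: OR = {odds_ratio:.2f}, "
          f"{percent_change_in_odds(odds_ratio):+.0f}% change in odds")
```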

Associations between Compliance and CAR

Participants who took both samples on time relative to objective waketimes had significantly larger CARs (0.41 SD, p < 0.05) than those who provided neither sample on time, after adjusting for health and demographic covariates. The CARs of participants who provided one sample on time did not differ significantly from those who provided zero or two samples on time. In original units, the raw difference in CAR between compliant and non-compliant participants was 0.19 μg/dl. This effect size is not trivial: CAR differences of a similar magnitude have been found to significantly differentiate those at high vs. low levels of chronic stress [17] and clinical groups such as obese vs. non-obese men [18].

In contrast to actigraphy-based analyses of compliance, participants whose self-reported waketimes indicated compliance with sampling protocols for one (β = 0.29 SD, p > 0.05) or both (β = 0.22 SD, p > 0.05) samples did not have significantly higher CARs, although the positive coefficients suggest that the impact of non-compliance on the CAR may be obscured by the subtle bias introduced by reliance on self-reported waketimes.

Associations between Waketime Discrepancies and the CAR

Finally, we analyzed whether discrepancies between actigraphy-based and self-reported waketimes significantly predicted the CAR. We found a positive but non-significant association in the full sample, using weights that adjust for over-sampling adolescents with high levels of neuroticism (β = 0.09, p > 0.10). However, when data were analyzed separately for participants with low, medium, and high levels of neuroticism, in line with Dockray et al. [5], we found that gaps greater than 15 min between subjective and objective waketimes predicted significantly lower CARs among the low neuroticism participants (β = −1.08 SD, p < 0.05). There was also a trend for lower CARs among those with discrepancies between 5 and 15 min (β = −0.96 SD, p < 0.10). Thus, for normative (non-high-risk) participants, inaccurate waketime reporting may have a significant effect on the accuracy of measurement of the CAR.

Discussion

These results indicate that accurate reporting of waketimes has implications for compliance with the timing of morning cortisol collection protocols, which, in turn, has implications for estimates of the size of the CAR. Because cortisol levels increase so rapidly after waking, it is particularly important that the sample timing protocols are strictly followed when examining this parameter. Inaccurate assessments of waketime contribute to non-compliance with morning sampling protocols and may lead to significant under-estimates of the CAR among low-risk samples. Failure to adhere to sample timing protocols may result in incorrect calculations (typically underestimations) of the CAR, which may obscure associations between the CAR and other variables of interest.

Nonetheless, adolescents were reasonably accurate when reporting their waketimes, with an average delay between actigraphy-based and subjective waketimes of 6.2 min, and 90% of participants self-reporting waketimes within 15 min of their objectively determined waketimes. Thus, although objective sleep measures may improve the ability to estimate participant compliance with requested sampling times, and thus accurately calculate the size of the CAR, reliance on self-reported waketimes will likely yield reasonable estimates of waketimes for the majority of participants.

These analyses are limited by the fact that only two data points were used to estimate the CAR, and it is possible that certain individuals who complied with our sampling protocol may have naturally experienced their peak awakening response at a time point that was slightly earlier or later than we requested. Moreover, because the actigraph assesses sleep based on level of activity, periods of low activity may be falsely scored as sleep and vice versa. Although actigraphy data have been found to be highly correlated with polysomnography, the actigraphy data likely included a small degree of measurement error, and the use of other objective measures of sleep timing (e.g., PSG) may further increase the level of accuracy of sleep timing measures [12]. In addition, because we over-sampled participants who were high in neuroticism, the results may not be generalizable to a low-risk population; however, when high-risk participants were excluded from analyses, results were similar.

In summary, inaccuracies in subjective reports of waketime significantly influence participant compliance with sampling protocol, which is associated with slightly lower estimates of the CAR. However, in line with results found with older adults [5], most adolescents are fairly accurate in estimating time of waking. Thus, the large added cost of actigraphy (hundreds of dollars per device) specifically for the purpose of monitoring objective waketime may not be warranted, but, where available, use of actigraph-based waketimes may be a useful tool for improving the reliability of CAR measurement.