Introduction

Caffeine is widely regarded as a useful psychostimulant (it is even sold as an over-the-counter medicine to “relieve tiredness and help maintain mental alertness”), although consumers also recognise that caffeine can disrupt sleep and that caffeine withdrawal is associated with various adverse effects, including fatigue and headache. What then are the overall or net benefits (if any) of caffeine consumption for psychological functioning? While the information available from research on the behavioural effects of caffeine is extensive, it does not provide an unequivocal answer to this question (James 1997; Rogers and Dernoncourt 1998). The reasons for this are primarily methodological. While many studies have found that caffeine (versus placebo) increases self-ratings of alertness, and improves mood and cognitive performance, in the vast majority of these studies the participants had a history of regular caffeine consumption, and they were tested after a substantial period of caffeine abstinence (i.e., withdrawal). What this experimental protocol leaves open is the question of whether the results obtained are due to beneficial effects of caffeine or to deleterious, including fatiguing, effects of caffeine withdrawal. Furthermore, because significant fatigue results from even overnight caffeine withdrawal (e.g., Richardson et al. 1995), it is possible that the psychostimulant effects of caffeine felt by regular caffeine consumers represent only reinstatement of functioning (James 1994). That is, caffeine intake merely restores mood, alertness and performance to baseline levels (i.e., the levels displayed by individuals completely free of the effects of caffeine and acute caffeine withdrawal).

One way to investigate this question is to compare the effects of caffeine in caffeine consumers and non-consumers. However, these groups are self-selected, and pre-existing differences might therefore account for variation found in responses to caffeine administration (Goldstein et al. 1969; Rogers et al. 2003). Another approach is to test non-caffeine-withdrawn individuals (Smith et al. 1994; Warburton 1995; Warburton and Bersellini 2001), but again this is inconclusive, because the possibility that poorer performance in the placebo condition is due to residual caffeine withdrawal cannot be definitely excluded.

A much better approach to the problem is to measure the psychostimulant effects of an acute caffeine versus placebo challenge in long-term (i.e., ‘fully’) caffeine-withdrawn participants (LTW) compared with overnight caffeine-withdrawn participants (ONW). A net beneficial effect of caffeine consumption would be demonstrated if caffeine administration led to significant improvements in functioning in LTW as well as in ONW participants, and especially if caffeine brought both groups up to the same level of improved functioning. In contrast, the reinstatement or withdrawal reversal hypothesis predicts that without caffeine ONW participants will perform worse than LTW participants, and that caffeine administration will affect functioning only of ONW participants, bringing their level of functioning up to but not exceeding that of LTW participants. More or less exactly this result was obtained in two independent studies, one using a letter recognition task (James 1998) and the other using a long-duration simple reaction time task (Rogers et al. 1998; see also Bruce et al. 1991).

This methodology has therefore already contributed important new information relevant to estimating the impact of caffeine consumption in everyday life. However, its application to date has been restricted to the performance of simple tasks during the morning when alertness is relatively high. Consequently, it remains possible that there are circumstances under which caffeine consumption will bring about a significant net improvement of functioning. In particular, it has been suggested that caffeine is most likely to benefit performance when alertness is low (e.g., Johnson et al. 1990; Lorist et al. 1994; Smith et al. 1994; Horne and Reyner 1996; Lieberman et al. 2002; Reyner and Horne 2002; Wesensten et al. 2002). Accordingly, the present study investigated the effects of caffeine on self-rated mood and alertness, and performance of various cognitive and psychomotor tasks in LTW and ONW participants whose sleep had been restricted to 5 h on the night before testing. Overnight caffeine withdrawal was used because this is a feature of many previous studies of the psychostimulant effects of caffeine (James 1997; Rogers and Dernoncourt 1998), and it is also part of everyday patterns of caffeine consumption (Smit and Rogers 2002).

Materials and methods

Participants

Forty-eight participants (29 female) aged between 20 and 34 years took part in the study. They were all students at the University of Bristol, and they participated for payment. The study was described as an investigation into the effects of a fruit squash drink on mood and cognitive function in the context of sleep deprivation. The participants were moderate to high caffeine consumers, all consuming at least three cups of tea or coffee a day (this information was gathered by email before the participants had been recruited to the study). They were either non-smokers or light smokers, and reported that they had no food sensitivities and were not pregnant or breastfeeding. All participants read an information sheet and signed a consent form before taking part in the study.

Study design and procedure

Participants were provided with supplies of tea and/or coffee and instructed to avoid all other caffeine-containing drinks during the 3 weeks immediately before testing. Twenty-three of the participants were provided with decaffeinated tea and/or coffee (LTW) and 25 were provided with regular tea and/or coffee (ONW). They were told that the purpose of this part of the study was to investigate responses to changes in “brands of tea and coffee”, and no reference was made to the caffeine content of the supplies. They were monitored closely during this period, which included periodic visits to the laboratory to replenish their supplies of tea and/or coffee, and to download data from electronic diaries used for measuring mood, and tea and coffee consumption patterns (details of this part of the study will be reported elsewhere—in preparation).

On the night before testing, participants’ sleep was restricted to 5 h. Information collected on their usual sleeping patterns revealed that this was a reduction of between 1 and 5.5 h (mean 3 h) in a normal night’s sleep for these individuals. In order to ensure that they slept for only 5 h, they arrived at the laboratory at 10.30 p.m., where they were supervised while they watched films until 2.30 a.m. After this they were returned home by taxi to get to sleep at around 3 a.m. They were then required to check into the laboratory again at 9 a.m., so it was anticipated they would wake about 8 a.m. in order to arrive on time.

These sleep-restricted participants were tested after overnight caffeine abstinence on a battery of mood and cognitive performance tasks and other measures before (baseline) and beginning 30 min after administration of 1.2 mg per kg body weight of caffeine and on a separate occasion after placebo (order counterbalanced across participants). This is a dose of 84 mg of caffeine for a person weighing 70 kg, which is similar to the amount of caffeine contained in a serving of instant coffee (James 1997). Testing took place 2 days apart either in the morning or early afternoon (counterbalanced across participants) on Tuesdays and Thursdays. Note that this meant that half the LTW participants tested on Thursday had received caffeine 2 days previously. It was assumed that this single exposure to a modest dose of caffeine would not significantly affect their status as long-term withdrawn. The alternative procedure of withdrawing them perhaps again for a further 3 weeks was felt to be unnecessarily demanding and costly. On both occasions each participant provided a sample of saliva, which was analysed (see below) to check on compliance with the instructions regarding consumption of caffeine-containing drinks. Participants were tested alone, either in a room with an experimenter present (blood pressure, heart rate and hand steadiness), or accommodated in a separate, private booth (cognitive and psychomotor tasks and mood).

These procedures were approved by the University of Bristol, Department of Experimental Psychology Human Research Ethics Committee, and the study was performed in accordance with the ethical standards laid down by the 1964 Declaration of Helsinki.

Drug administration

Caffeine was administered double-blind in a blackcurrant drink (Blackcurrant Squash, Sainsbury’s Supermarkets, UK, diluted according to manufacturer’s instructions). Food grade caffeine hydrochloride was dissolved in this drink at a concentration of 0.336 mg/ml, and the volume of drink given to individual participants was varied so that each participant received a dose of 1.2 mg caffeine per kg body weight (e.g., a 250-ml drink for a 70-kg person). The placebo drink was the same volume of blackcurrant squash drink, without caffeine.
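The dosing arithmetic above can be checked with a short sketch. The function and constant names are illustrative and are not taken from the study materials; the sketch assumes only the stated concentration (0.336 mg/ml) and dose (1.2 mg per kg).

```python
# Illustrative check of the dosing arithmetic; all names are hypothetical.
DOSE_MG_PER_KG = 1.2             # caffeine dose stated in the text
CONCENTRATION_MG_PER_ML = 0.336  # caffeine concentration of the drink


def drink_volume_ml(body_weight_kg: float) -> float:
    """Return the drink volume (ml) that delivers 1.2 mg caffeine per kg."""
    if body_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    dose_mg = DOSE_MG_PER_KG * body_weight_kg
    return dose_mg / CONCENTRATION_MG_PER_ML


if __name__ == "__main__":
    # Volume for a 70-kg participant, rounded to one decimal place.
    print(round(drink_volume_ml(70.0), 1))
```

For a 70-kg participant this gives a dose of about 84 mg in about 250 ml, in line with the example given in the text.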

Test battery

Various tasks were used to measure cognitive and psychomotor performance. Included in this battery were tasks which assessed the ability to sustain attention (e.g., simple reaction time and focus of attention tasks) and the ability to inhibit responses (impulsivity task), tasks which assessed memory, and a cognitively demanding, reasoning task. Two psychomotor tasks assessed tapping speed and hand steadiness. Mood, including alertness, some physical sensations (symptoms), and perceived task demand were measured using self-rating scales. Blood pressure and heart rate were also recorded. The cognitive performance and tapping tasks, and self-report rating scales were programmed and presented using E-Prime 1.0 (Psychology Software Tools, 1996–2001) run on networked PCs with 15-in. colour monitors and standard keyboards. Participants were tested in individual booths to ensure minimal unintended distraction. The tasks are described in detail below in the order that they were completed. The total time taken to complete this test battery was about 50 min.

Self-reported mood and physical sensations

Participants were instructed to rate mood and physical sensation states according to how they were feeling “at the moment” using 9-point rating scales presented one at a time on the computer monitor. Ratings were made by keying the appropriate number from 1 to 9. Four aspects of mood were rated on bipolar scales: energetic mood (drowsy/sluggish–energetic/alert), tense mood (tense–relaxed), hedonic tone (sad/gloomy–happy/cheerful), and overall mood (bad mood–good mood). Physical sensations were rated on one bipolar scale and three unipolar scales. The physical sensation descriptors were clear-headed–muzzy/dazed (bipolar), and light-headed/feeling faint, jittery/shaky, and headache (unipolar). On the three unipolar scales 1 represented “not at all” and 9 represented “extremely”.

Hand steadiness

Participants held a 1.59-mm-diameter stylus in holes drilled in a 1-mm-thick metal plate, which stood on a table so that it leaned away from the participant at an angle of 50° from the vertical. The holes measured 3.96, 3.18, 2.76, 2.36 and 1.98 mm in diameter (32011 Steadiness tester, Lafayette Instrument, IN, USA). Participants were instructed to hold the stylus in each hole for 30 s, as far as possible avoiding contact with the metal plate. They were permitted to use their preferred hand but not to use their other hand or the table top for support. A buzzer indicated if the stylus touched the side of the hole, and chimes indicated the start and finish of the 30 s. The number of contacts and the time spent in contact (ms) were recorded for each hole using an interface and programme designed by a colleague (W.S. Maggs, Department of Electrical and Electronic Engineering).

Systolic and diastolic blood pressure and heart rate

Systolic and diastolic blood pressure and heart rate were measured using the Omron 711 Intellisense automatic inflation monitor (Omron Healthcare UK, West Sussex, UK).

Two-finger tapping task

Participants were required to press alternately the n and m keys on the computer keyboard as quickly as possible using the index and middle fingers of their preferred hand. They were told that the task would end automatically when they had tapped 300 times, and that this would be signalled on the computer monitor. The time taken to make 300 alternate taps was recorded.

Memory (immediate and delayed recall)

Fifteen words were presented one at a time. They were displayed for 1 s with an inter-stimulus interval of 1 s. The words were nouns between four and seven letters long and they were all of similar word frequency and imagery. No plurals or names were used. Before presentation of the words participants were told that they would be required to remember as many words as possible. Immediately after presentation of the words, and then again 25 min later, participants were given 2 min in which to write down all the words that they could remember. For both immediate and delayed recall the dependent variable was the number of words correctly recalled.

Long duration variable foreperiod Simple Reaction Time (SRT) task

For this task participants were instructed to press the space bar as quickly as possible upon detection of a stimulus, a small star, presented in the centre of the computer screen. There was a variable stimulus onset of 1, 3, 7 and 16 s. The trials were divided into six blocks, each consisting of four of each of the variable stimulus onset durations occurring in random order. The dependent variable was mean reaction time.
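The block structure described above can be sketched as follows. This is a minimal illustration of one way to build such a schedule and is not the E-Prime script used in the study; the function name and the use of a seeded random generator are assumptions.

```python
import random

FOREPERIODS_S = (1, 3, 7, 16)  # stimulus onset delays given in the text (s)
REPEATS_PER_BLOCK = 4          # four of each delay per block
N_BLOCKS = 6                   # six blocks in total


def build_srt_schedule(seed=None):
    """Return a list of blocks; each block is a shuffled list of delays (s)."""
    rng = random.Random(seed)  # seeding is an assumption, for reproducibility
    schedule = []
    for _ in range(N_BLOCKS):
        block = list(FOREPERIODS_S) * REPEATS_PER_BLOCK
        rng.shuffle(block)     # random order within each block
        schedule.append(block)
    return schedule


if __name__ == "__main__":
    schedule = build_srt_schedule(seed=1)
    n_trials = sum(len(block) for block in schedule)
    total_delay_s = sum(sum(block) for block in schedule)
    print(n_trials, total_delay_s)
```

Under this reading the task comprises 96 trials, and the delays alone sum to 6 × 4 × (1 + 3 + 7 + 16) = 648 s, or roughly 11 min before response times are added.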

Focus of attention

This was based on the two-choice reaction time task developed by Eriksen and Eriksen (1974) (see also Broadbent et al. 1989). On each trial three warning crosses were presented on the computer monitor for 500 ms and then replaced by a target letter (A or B). This target was either presented alone or accompanied by distracter stimuli on both sides. The distracters were stars, letters the same as the target letter, or letters different from the target letter, that were positioned either near or far from the target. Participants were required to indicate whether the target was an A or B by pressing keys labelled A and B on the computer keyboard (A = J key and B = F key on the keyboard). Twenty-four continuous blocks of 16 trials were completed. The dependent variables were mean reaction time, number of errors, and Eriksen effect scores (Broadbent et al. 1989).

Reasoning task

In this task (Baddeley 1999) participants were presented with a series of sentences, each describing the order of presentation of two letters, A and B. Each sentence was followed by the letters AB or BA. The participants’ task was to decide whether the sentence correctly described the order of the letter pair. Participants were given the following examples: “B follows A–AB”, which is true, and “B precedes A–AB”, which is false. The sentences also varied in difficulty, for example, “A does not precede B” and “B is not preceded by A”. There were 32 different sentences that ran continuously in random order for 3 min. Participants responded by pressing the F key labelled T (true) and J key labelled F (false) on the computer keyboard. The dependent variable was mean number of correct responses.

Impulsivity (inhibition) task

The Test of Variables of Attention described by Leark et al. (1996) was adapted as follows. Participants saw different coloured squares appear successively one at a time in the centre of the computer monitor. They were required to press the space bar as quickly as possible on appearance of each coloured square, except for the blue one, to which they were to respond by doing nothing. There was no inter-trial interval (i.e., the next stimulus was presented as soon as the space bar was pressed). For blue squares, when the space bar was not pressed, the stimulus remained displayed for 2 s, after which it disappeared and the next trial followed immediately. There were six blocks of 65 trials (15 red, 15 green, 15 yellow, 15 pink and five blue), and within each block these trials occurred in random order. The dependent variables were the number of correctly withheld responses and mean reaction time of correct responses.

Self-reported task demand

On completing the test battery, participants rated how difficult, effortful and tiring they found the tasks to be. Responses were made on scales ranging from 1 to 9 where 1 represented “not at all” and 9 represented “extremely”.

Habitual caffeine intake

Average daily caffeine consumption was calculated from drink intake questionnaires (Richardson et al. 1995) using the following estimates adapted from James (1997): instant coffee 90 mg/cup, tea 60 mg/cup, cola 30 mg/can, energy drinks such as Red Bull 80 mg/can, and caffeine tablets 50 mg/tablet.
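The calculation amounts to a weighted sum over reported servings. The sketch below uses the per-serving estimates listed above; the dictionary keys and function name are hypothetical, and the questionnaire format itself is not reproduced here.

```python
# Per-serving caffeine estimates (mg) as listed in the text.
CAFFEINE_MG_PER_SERVING = {
    "instant_coffee_cup": 90,
    "tea_cup": 60,
    "cola_can": 30,
    "energy_drink_can": 80,
    "caffeine_tablet": 50,
}


def daily_caffeine_mg(servings_per_day):
    """Sum caffeine (mg/day) over a mapping of item name -> servings per day."""
    unknown = set(servings_per_day) - set(CAFFEINE_MG_PER_SERVING)
    if unknown:
        raise KeyError(f"no caffeine estimate for: {sorted(unknown)}")
    return sum(
        CAFFEINE_MG_PER_SERVING[item] * count
        for item, count in servings_per_day.items()
    )


if __name__ == "__main__":
    # Hypothetical respondent: two cups of instant coffee and two cups of tea.
    print(daily_caffeine_mg({"instant_coffee_cup": 2, "tea_cup": 2}))
```

For the hypothetical respondent shown, the estimate is 2 × 90 + 2 × 60 = 300 mg per day.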

Enzyme immunoassay for caffeine

A competitive enzyme immunoassay for caffeine was established and validated against an HPLC assay and then used for analysis of caffeine levels in saliva samples. Wells were initially coated with rabbit anti-mouse immunoglobulins (DAKO Cat. no. Z0109) and then with anti-caffeine antibody (Biodesign Cat. no. G45110M AMS Biotechnology, UK) at 1 in 128,000 dilution. Standard solutions (5–500 ng/ml; 0.026–2.6 μM) of caffeine (Sigma Cat. no. C-8960) were prepared in phosphate-buffered saline containing 0.1% bovine serum albumin (PBS–BSA). The caffeine derivative, 7-(5-carboxypentyl)-1,3-dimethylxanthine (Department of Biochemistry, University of Surrey, UK), was conjugated to HRP using the mixed-anhydride method (Erlanger et al. 1957), as modified by Dawson et al. (1978). Salivary samples were thawed overnight before analysis and centrifuged at 10,000 rpm for 15 min using a Heraeus Sepatech Biofuge A centrifuge, and the supernatants were diluted 1 in 100 in PBS–BSA. For the assay, 10 μl standard, control saliva or diluted unknown saliva was added to individual wells in duplicate, followed immediately by 100 μl caffeine–peroxidase conjugate at 1/800 dilution. Plates were shaken briefly and then incubated for 1 h at 37°C. After washing, bound peroxidase activity was determined using TMB as substrate. A standard curve of %B/Bo (absorbance reading for each well (B) divided by the average absorbance for the zero standard wells (Bo) ×100) versus standard concentration was used to determine the concentration of caffeine in the samples.
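The %B/Bo calculation and the read-off from the standard curve can be sketched as below. The text does not state how the curve was fitted, so the linear interpolation against log concentration used here is an assumption, as are all function names; the multiplication by 100 reflects the stated 1 in 100 dilution of saliva.

```python
import math

DILUTION_FACTOR = 100  # saliva supernatants were diluted 1 in 100


def percent_bound(absorbance, zero_standard_absorbances):
    """Return %B/Bo: well absorbance over mean zero-standard absorbance, x100."""
    b0 = sum(zero_standard_absorbances) / len(zero_standard_absorbances)
    return absorbance / b0 * 100.0


def interpolate_concentration(pct, curve):
    """Read a concentration (ng/ml) off a standard curve.

    curve: (concentration_ng_ml, percent_bound) pairs for the standards.
    Assumes %B/Bo falls as concentration rises (competitive assay) and
    interpolates linearly against log10(concentration).
    """
    points = sorted(curve)  # ascending concentration
    for (c_lo, p_lo), (c_hi, p_hi) in zip(points, points[1:]):
        if p_lo <= p_hi:
            raise ValueError("standard curve must decrease with concentration")
        if p_hi <= pct <= p_lo:
            frac = (p_lo - pct) / (p_lo - p_hi)
            log_c = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    raise ValueError("reading lies outside the range of the standards")


def saliva_concentration(pct, curve):
    """Concentration in undiluted saliva (ng/ml), correcting for dilution."""
    return interpolate_concentration(pct, curve) * DILUTION_FACTOR
```

With standards spanning 5–500 ng/ml and a 1 in 100 dilution, the lowest standard corresponds to 500 ng/ml in undiluted saliva.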

Data analysis

Saliva samples taken on the two test days showed that a number of participants had clearly failed to comply with the instructions concerning avoidance of caffeine-containing drinks. These participants were excluded from the analyses of the effects of caffeine and caffeine withdrawal presented below. They were six participants from the LTW group who gave one or two saliva samples showing a caffeine concentration >500 ng/ml, and eight participants from the ONW group who gave one or two saliva samples showing a caffeine concentration >2,000 ng/ml. The use of these different ‘thresholds’ is based on the expectation that LTW participants who had complied would not have detectable levels of caffeine in their saliva (the assay was not reliably sensitive to caffeine concentrations below 500 ng/ml), but that ONW participants might have residual, but low systemic levels of salivary caffeine as a result of caffeine consumed the previous day. The latter assumption is supported by observations from a previous study (Heatherley et al., submitted) in which participants known to have complied with instructions to abstain from caffeine overnight, nevertheless had caffeine concentrations of between 545 and 1,834 ng/ml in saliva sampled during the morning or early afternoon the next day. Note that using the stricter criterion (<500 ng/ml) for the ONW group would have excluded participants with higher intakes of caffeine (throughout the day or in the evening) and/or slower caffeine-elimination rates, which would then have limited the generality of the findings.
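The exclusion rule can be expressed compactly. The sketch below assumes each participant contributes up to two salivary readings in ng/ml and is excluded if any reading exceeds the group threshold; the names are illustrative.

```python
# Group-specific exclusion thresholds (ng/ml) as stated in the text.
THRESHOLD_NG_ML = {"LTW": 500, "ONW": 2000}


def is_compliant(group, saliva_ng_ml):
    """Return True if no saliva sample exceeds the threshold for the group."""
    limit = THRESHOLD_NG_ML[group]
    return all(reading <= limit for reading in saliva_ng_ml)


if __name__ == "__main__":
    # Hypothetical readings, for illustration only.
    print(is_compliant("LTW", [0, 620]))      # one sample above 500 ng/ml
    print(is_compliant("ONW", [545, 1834]))   # both samples below 2,000 ng/ml
```

By the counts given above, excluding six of the 23 LTW participants and eight of the 25 ONW participants leaves 17 participants in each group.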

The data on cognitive performance and mood were analysed to test the two main predictions of the withdrawal reversal hypothesis (see “Introduction”); namely that, (1) in the absence of caffeine LTW participants will perform better and have better mood than ONW participants, and (2) caffeine administration will affect performance and mood of ONW participants but not of LTW participants. This was done by examining ONW versus LTW differences in pre-treatment (baseline) mood and performance, and examining differences in their subsequent responses to caffeine, respectively (cf. Rogers et al. 2003). Pre-treatment data were analysed using ANOVA with ‘withdrawal’ (LTW versus ONW) and ‘time of testing’ (morning versus afternoon) as between-subjects factors. Note that the study design coupled with this method of analysis provides two estimates (one from the occasion when participants subsequently received placebo and one from when they subsequently received caffeine) of pre-treatment dependent variables for each individual for the crucial comparison of LTW versus ONW between-subjects effects. The effects of caffeine administration were analysed by calculating difference scores (post-treatment minus pre-treatment) and subjecting these to ANOVA with ‘treatment’ (caffeine versus placebo) as a within-subjects factor and ‘withdrawal’ and ‘time of testing’ as between-subjects factors. By controlling for ‘trait’ and ‘state’ differences in performance measured pre-treatment, this procedure greatly increases sensitivity in relation to detecting effects of caffeine versus placebo. Note that F ratios and P values for main and interaction effects calculated for these difference (change) scores are numerically the same as effects calculated for raw scores with the pre-/post-treatment factor included. Where appropriate, paired comparisons were carried out using t-tests.
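The difference-score logic can be illustrated with a minimal sketch. It is not the ANOVA itself: it computes only each participant's change score (post-treatment minus pre-treatment) and the caffeine-minus-placebo contrast whose group difference corresponds to the treatment by withdrawal interaction. The reaction times below are invented for illustration and are not data from the study.

```python
from statistics import mean

# Invented reaction times (ms), for illustration only.
EXAMPLE_DATA = [
    {"group": "ONW", "pre_pla": 460, "post_pla": 480, "pre_caf": 460, "post_caf": 455},
    {"group": "ONW", "pre_pla": 450, "post_pla": 475, "pre_caf": 455, "post_caf": 455},
    {"group": "LTW", "pre_pla": 400, "post_pla": 405, "pre_caf": 402, "post_caf": 406},
    {"group": "LTW", "pre_pla": 405, "post_pla": 409, "pre_caf": 400, "post_caf": 405},
]


def change_score(pre, post):
    """Post-treatment minus pre-treatment score."""
    return post - pre


def caffeine_effect(p):
    """Caffeine-minus-placebo difference in change scores for one participant."""
    return (change_score(p["pre_caf"], p["post_caf"])
            - change_score(p["pre_pla"], p["post_pla"]))


def group_effects(participants):
    """Mean caffeine effect per withdrawal group."""
    by_group = {}
    for p in participants:
        by_group.setdefault(p["group"], []).append(caffeine_effect(p))
    return {group: mean(values) for group, values in by_group.items()}


if __name__ == "__main__":
    print(group_effects(EXAMPLE_DATA))
```

In the invented data, caffeine shortens reaction time relative to placebo by 25 ms on average in the ONW group and has no average effect in the LTW group, which is the pattern the withdrawal reversal hypothesis predicts.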

Results

Table 1 shows that the two groups of participants (LTW and ONW) did not differ significantly in gender ratio, age, body weight, or habitual level of caffeine intake.

Table 1 Participant characteristics

Tables 2, 3 and 4 show the results for cognitive performance, mood and blood pressure. The results in the left-hand panels of these tables show the effects of withdrawal (LTW and ONW) on performance etc., and the results in the right-hand panels show these participants’ subsequent responses to caffeine, corresponding respectively to the first and second predictions of the withdrawal reversal hypothesis.

Table 2 Results for cognitive and psychomotor performance
Table 3 Results for mood, physical sensations and task demand
Table 4 Results for blood pressure and heart rate

Prior to receiving caffeine the ONW and LTW participants differed on a number of measures of mood and cognitive performance in ways consistent with the withdrawal reversal hypothesis. Specifically, the ONW participants’ performance on the focus of attention, reasoning, and impulsivity tasks was significantly worse or marginally significantly worse than the performance of the LTW participants on these tasks. There was also a substantial, though non-significant, difference in reaction time on the SRT task, and ONW participants reported that they found the tasks more tiring, and difficult (marginally significant). Ratings of mood and other feelings also showed marked negative effects of overnight caffeine withdrawal, with the ONW participants reporting heightened feelings of tension, light-headedness, and jitteriness (marginally significant), more headache, and lower clear-headedness, and energy (marginally significant) than the LTW participants. There were no significant pre-treatment differences between ONW and LTW participants for psychomotor performance (tapping and hand steadiness tasks), blood pressure or heart rate.

Caffeine (versus placebo) had effects on performance, mood and blood pressure. In relation to performance, these effects were broadly as predicted by the withdrawal reversal hypothesis. For two of the cognitive performance tasks, SRT and focus of attention, there was a significant and a marginally significant treatment by withdrawal interaction effect, respectively, because caffeine significantly affected the performance of the ONW participants but not of the LTW participants. In both cases there was a further deterioration in performance from pre-treatment levels when the ONW participants received placebo, and caffeine prevented this (Table 2). Caffeine did not significantly affect the performance of the LTW participants on any of the cognitive tasks.

In contrast, the (rather weak) effects of caffeine on mood and other feelings occurred in both ONW and LTW participants. These effects were, relative to placebo, a decrease in light-headedness, and an increase in clear-headedness (marginally significant), but also an increase in jitteriness (marginally significant).

Caffeine relative to placebo also improved tapping performance, but tended to increase tremor (impair hand steadiness) (Table 2), and it increased blood pressure (Table 4). The magnitude of each of these effects of caffeine was very similar in the two groups of participants.

None of the effects of caffeine withdrawal and caffeine described above varied systematically with time of testing (morning versus afternoon) (P>0.1).

Finally, it is worth noting that analysis of raw post-treatment scores confirmed these various findings. ANOVA, with withdrawal (LTW versus ONW) and time of testing (morning versus afternoon) as factors, showed that in the placebo condition ONW participants performed significantly worse on the SRT, focus of attention (errors), reasoning and impulsivity tasks (response time), and reported significantly worse mood (energetic mood, overall mood, clear-headedness, light-headedness, jitteriness), greater headache, and greater task demand (‘difficult’ and ‘tiring’) than LTW participants (P<0.05). However, when they had received caffeine, ONW and LTW participants differed significantly (P<0.05) on only five variables, namely focus of attention (response time), energetic mood, headache, light-headedness and ‘tiring’. ONW and LTW participants did not differ significantly (P>0.1) on tapping speed, hand steadiness, blood pressure and heart rate under either the placebo or caffeine conditions.

Discussion

A clear finding from this study was that acute (i.e., overnight) caffeine withdrawal was associated with negative effects, including impaired cognitive performance and the perception that the cognitive tasks were more difficult and tiring to perform, greater headache, reduced alertness and clear-headedness, and an increased feeling of light-headedness. Feelings of tension and jitteriness were also increased, which might be further symptoms of acute caffeine withdrawal or effects of caffeine consumed the previous day (e.g., Goldstein et al. 1969; Rogers et al. 2003).

Some aspects of mood were weakly improved by caffeine and certain other feelings, such as light-headedness, were more clearly improved. Although these effects occurred in both the acutely (ONW) and long-term (LTW) caffeine withdrawn participants, they were not accompanied by improved cognitive performance in the LTW participants. That is, caffeine consumption failed to benefit the performance of participants who were free of the negative effects of acute caffeine withdrawal, even in the context of low alertness induced by sleep restriction. This is a key result, because it contradicts suggestions that caffeine is especially beneficial for performance when alertness is low (e.g., Johnson et al. 1990; Lorist et al. 1994; Smith et al. 1994; Horne and Reyner 1996; Lieberman et al. 2002; Reyner and Horne 2002; Wesensten et al. 2002). Caffeine did affect the cognitive performance of the ONW participants, but merely to prevent yet further deterioration in their poorer pre-treatment performance. In the study by Lieberman et al. (2002) participants were withdrawn from caffeine and ‘almost totally sleep-deprived’ for 72 h before caffeine administration. The authors state in relation to their findings that the ‘typical dietary levels of caffeine intake by the subjects were not high’ and therefore ‘caffeine withdrawal would have been modest in these volunteers’ (p. 254). The evidence provided for this, however, is unconvincing. No data on the usual caffeine intakes of the volunteers are presented and, although pre-study salivary levels of caffeine appeared to be low, information on what time of day the saliva samples were taken is not given. The latter omission is crucial, as the levels reported (mean ≅ 600 ng/ml) are similar to those found in the present study after overnight caffeine withdrawal.

The finding that the SRT and focus of attention tasks revealed effects of caffeine and caffeine withdrawal is consistent with the general finding that performance on such vigilance and continuous performance tasks is particularly sensitive to caffeine administration (e.g., James 1997; Rogers and Dernoncourt 1998). Memory performance, on the other hand, appears to be much less reliably affected by caffeine. In the present study the SRT and focus of attention tasks took up respectively 12 and 8 min of the 50 min taken to complete the entire task battery. It is unlikely that more or clearer performance effects would have been found had a higher dose of caffeine been used. The average dose received by participants in this study was 79 mg, and various previous studies show that the dose–response relationship for caffeine and cognitive performance effects is very ‘flat’ in the range above about 30 mg (Lieberman et al. 1987; Rogers and Dernoncourt 1998; Smit and Rogers 2000).

It is worth noting that sleep restriction did successfully lower alertness. Many of the participants complained of feeling tired during the testing sessions, and their mean pre-treatment alertness ratings (LTW = 3.69, ONW = 2.65) were markedly lower than those of caffeine non-consumers and caffeine consumers (NC = 5.20, C = 4.27) who participated in a similar study but who were not sleep-restricted (Rogers et al. 2003). Sleep restriction also appears to have degraded performance. Parallel data for the SRT task are as follows: LTW = 402 ms and ONW = 460 ms (present study), and NC = 371 ms and C = 376 ms (Rogers et al. 2003).

One aspect of psychomotor performance, tapping speed, was improved by caffeine in both ONW and LTW participants; however, at the same time caffeine tended to impair hand steadiness. These observations confirm previous findings (e.g., Richardson et al. 1995). Indeed, increased tremor is a well-known effect of caffeine, and especially of moderate to high doses of the drug (e.g., James 1990, 1997; Arnold et al. 1993; Bovim et al. 1995; Miller et al. 1998). Furthermore, and again consistent with many previous reports (e.g., Goldstein et al. 1969; James 1990, 1997), there were negative effects of caffeine for both groups in relation to feelings of jitteriness (marginally significant) and also blood pressure. Pre-treatment blood pressure did not differ between LTW and ONW participants, and it was increased by caffeine quite substantially and to the same extent in both groups.

The approach of comparing the performance of LTW and ONW participants before and after an acute caffeine challenge has been used previously in only a very few studies (Bruce et al. 1991; James 1998; Rogers et al. 1998). The present results extend the results of those studies by measuring effects on a much wider range of performance and other variables and, crucially, by showing that caffeine did not benefit the LTW participants’ performance even though their alertness had been lowered by restricting their sleep. James (1998) and Rogers et al. (1998) both found that caffeine failed to improve LTW participants’ performance on sustained attention tasks, and indeed in the latter study, if anything, caffeine tended to impair performance (not significant). However, self-rated alertness of LTW participants was increased by caffeine in James’s study, and to a lesser extent in the study by Bruce et al. (1991). The explanation for this mismatch between apparent alerting and performance effects of caffeine is uncertain, although one possibility, as noted previously (Rogers et al. 2003), is that the increase in alertness is a misinterpretation of other subjective effects of caffeine, including an increase in ‘jitteriness’ (e.g., Goldstein et al. 1969; and the present study).

Bruce et al. (1991) also included a tapping task in their study, and found that there was a trend for caffeine to improve performance in the ONW participants (actually 24 h caffeine withdrawn), but not in the LTW participants. Nevertheless, the small effect of the lower (250 mg) of the two doses of caffeine administered in that study was identical for LTW and ONW participants, which agrees with the present results (Table 2). Five hundred milligrams of caffeine failed to further improve tapping performance, or subject state, in LTW participants, whereas it did so for ONW participants. This perhaps suggests the presence of tolerance to certain performance-disrupting, adverse effects of caffeine in the ONW participants, which is lost with long-term withdrawal.

A surprising finding from the present study was the failure of nearly one third of the participants (14 of 48) to comply with the instructions concerning restrictions on consumption of caffeine-containing drinks. It may be that this happened in part because some participants were attempting (mistakenly) to ameliorate the impact of sleep restriction. Nonetheless, such poor compliance is a cause for concern, as inclusion of the non-compliers would have caused the study to underestimate the effects of caffeine consumption and caffeine withdrawal (data not shown). This may well be a general problem in this area of research, as measurement of systemic caffeine concentration has to date been used only rarely to check participant compliance in such studies. In the present study the use of different criteria for including LTW and ONW participants is justified by the fact that systemic caffeine concentration might well remain moderately elevated after overnight caffeine abstinence, especially in slow eliminators and/or in individuals who have consumed caffeine during the evening prior to testing. These are simply variations in everyday patterns of caffeine consumption. If anything, the criteria used would be expected to have reduced the likelihood of finding differences in responses of the LTW and ONW participants, because for the ONW participants presumably the effects of caffeine withdrawal and subsequent caffeine administration would be greatest when systemic caffeine levels are at their lowest.

Taken together, the findings from this study provide further strong support for the withdrawal reversal hypothesis. In particular, cognitive performance was found to be affected adversely by acute caffeine withdrawal, and cognitive performance was not improved by caffeine in the absence of these adverse effects. Different patterns of effects (or lack of effects) of caffeine and caffeine withdrawal were found for other variables, but overall these results also suggest that there is little benefit to be gained from caffeine consumption.