Background

Sleepiness and sleep deprivation are more prevalent now than ever before, driven by the shift towards a 24/7 society (Bonnefond et al. 2006). As a result, diurnal working patterns are no longer considered the norm (Ganesan et al. 2019). Industries such as manufacturing, defence and healthcare are heavily reliant on this highly functioning, nonstop workforce. Circadian mismatch can result from poor sleeping habits, rapidly rotating shifts, nonconventional business hours and a poor sleeping environment (Bonnefond et al. 2006; Ganesan et al. 2019). Consequently, fatigue, decreased alertness and lowered employee morale can prevail throughout a workforce. This poses many challenges within 24/7 industries; however, many contemporary approaches exist to combat fatigue, such as regular rostering reform, fatigue-related education and fatigue monitoring (Sadeghniiat-Haghighi and Yazdi 2015; Wolkow et al. 2019). Techniques to monitor fatigue have been deemed essential in some industries to ensure employee safety and optimise productivity and performance (Wolkow et al. 2019).

Fatigue can be evaluated through various objective or subjective measures. Employers often implement a series of tests to monitor and assess fatigue in employees to determine whether they pose an increased risk or are safe to be at work. Quantitative fatigue information can be sought through objective testing. Objective tests for measuring fatigue can include computer/electronic-based assessments, such as the psychomotor vigilance test (PVT) (Basner et al. 2011), as well as drowsiness measures assessed via biological markers, such as saliva (Harris et al. 2010), urine (Flynn-Evans et al. 2018) and blood (Wolkow et al. 2019). The latter three, however, can be considered more invasive and take hours to days to be conclusive, rendering them incompatible as a rapid measure.

The PVT is a short reaction time test used to measure objective vigilant attention (Basner et al. 2011; Basner and Dinges 2011; Arsintescu et al. 2017). The test requires an individual to press a key in response to a visual stimulus displayed on an electronic screen (Basner and Dinges 2011; Arsintescu et al. 2017). Typically, the test is delivered on a handheld electronic device, mobile telephone, tablet, or desktop or laptop computer. Through the recruitment of the prefrontal, motor and visual cortices, the PVT tests the cognitive domain of vigilant attention (Basner et al. 2011). Performance on the PVT is typically based on three elements: reaction time, performance lapses and false starts (Basner et al. 2011; Basner and Dinges 2011; Arsintescu et al. 2017). An individual’s reaction time is the duration from when the stimulus is first displayed to when it is acknowledged by pressing the key. A performance lapse occurs when the key is pressed too quickly, typically ≤ 100 ms, or too slowly, ≥ 500 ms (Basner and Dinges 2011; Arsintescu et al. 2017). Finally, a false start is registered when the key is pressed without a stimulus. PVT tests can be 3, 5 or 10 min in duration; however, each of these tests has similar performance outcomes (i.e., reaction time, performance lapses and false starts) and high levels of specificity (Basner et al. 2011; Basner and Dinges 2011; Arsintescu et al. 2017). The PVT has been described as the gold standard for fatigue observation due to its sensitivity in detecting sleepiness (Arsintescu et al. 2017).
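The three performance outcomes described above can be illustrated with a minimal scoring sketch. It is not the scoring routine of any particular PVT device or of the included studies; the `Trial` structure, the function name and the example timings are hypothetical, and the 100 ms and 500 ms thresholds are simply those quoted in the text.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional

# Thresholds quoted in the text; treated here as adjustable assumptions.
FAST_LIMIT_MS = 100.0  # a response at or below this is counted as a lapse
SLOW_LIMIT_MS = 500.0  # a response at or above this is counted as a lapse


@dataclass
class Trial:
    """One key press (hypothetical structure for illustration)."""
    stimulus_ms: Optional[float]  # stimulus onset time; None if no stimulus was shown
    press_ms: float               # time at which the key was pressed


def score_pvt(trials: List[Trial]) -> dict:
    """Return mean reaction time (ms), lapse count and false-start count."""
    reaction_times = []
    lapses = 0
    false_starts = 0
    for trial in trials:
        if trial.stimulus_ms is None:
            # Key pressed without a stimulus on screen.
            false_starts += 1
            continue
        rt = trial.press_ms - trial.stimulus_ms
        reaction_times.append(rt)
        if rt <= FAST_LIMIT_MS or rt >= SLOW_LIMIT_MS:
            lapses += 1
    mean_rt = mean(reaction_times) if reaction_times else None
    return {"mean_rt_ms": mean_rt, "lapses": lapses, "false_starts": false_starts}


# Made-up example session: reaction times of 250, 650 and 300 ms plus one false start.
example = [
    Trial(1000.0, 1250.0),
    Trial(5000.0, 5650.0),
    Trial(None, 7000.0),
    Trial(9000.0, 9300.0),
]
result = score_pvt(example)
```

In this sketch, false starts are excluded from the mean reaction time because they have no stimulus to time against; whether lapses are also excluded from the mean varies between scoring conventions and is not specified in the text.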

The Karolinska Sleepiness Scale (KSS) is a subjective situational sleepiness measure used to judge attention and performance whilst undertaking routine tasks (Miley et al. 2016). It is a 9-point self-rated scale of one’s propensity to fall asleep at that moment in time (i.e., 1 = extremely alert; 9 = very sleepy, great effort to keep alert, fighting sleep) (Miley et al. 2016). Previous literature has frequently collected data from both the PVT and KSS to assist fatigue analysis in response to differing shift schedules. As a result, both PVT and KSS will be examined in this review.

Fatigue is an important factor for individuals and organisations alike to monitor and review to ensure safety within a workplace. Alertness in the workforce is essential for health and safety. Subjective fatigue measures may not necessarily align with objective measures, as results can be falsified (to meet an accepted fatigue level to be at work) or people may not comprehend each subjective criterion, resulting in distorted scores (Flynn-Evans et al. 2018). This subjectivity can result in unsafe working conditions for employees. As a result, an objective measure (e.g., PVT) is optimal for employers because it reduces the opportunity to cheat or make fraudulent declarations. Although many studies have examined the influence of shift rosters on PVT performance in naturalistic/real-world settings, few reviews have examined this area of the literature as a whole. This systematic review sought to evaluate PVT performance among shift workers in response to different 24/7 rostering schedules (i.e., day, afternoon, evening and night shifts) performed under naturalistic conditions.

Methods

The design, implementation and reporting of this systematic review were conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines (Moher et al. 2009).

Data sources and search strategy

The online databases searched for this review included Embase, Medline via OVID, PsycINFO and CINAHL. The literature search was conducted in April 2020 with no exclusion search filters applied. The Population, Intervention and Outcome (PIO) framework (Sayers 2008) was used to construct the terms used to search each database. These included (P) shift work* AND (I) psychomotor vigilance test OR PVT OR reaction time OR Joggle AND (O) alert* OR recovery OR fitness* OR response* OR safety OR performance OR fatigue OR health OR sleep*.
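As an illustration of how the PIO terms above combine, the following sketch assembles them into a single Boolean string. The grouping of each PIO element in parentheses is an assumed reading of the search strategy, and the helper name `or_group` is hypothetical; database-specific syntax (field tags, phrase quoting, truncation rules) is not reproduced.

```python
def or_group(terms):
    """Join the terms of one PIO element with OR and wrap them in parentheses."""
    return "(" + " OR ".join(terms) + ")"


# Terms copied from the search strategy described in the text.
pio_terms = {
    "population": ["shift work*"],
    "intervention": ["psychomotor vigilance test", "PVT", "reaction time", "Joggle"],
    "outcome": [
        "alert*", "recovery", "fitness*", "response*", "safety",
        "performance", "fatigue", "health", "sleep*",
    ],
}

# Assumed structure: the three elements are combined with AND.
query = " AND ".join(
    or_group(pio_terms[element])
    for element in ("population", "intervention", "outcome")
)
print(query)
```

Keeping the terms in a dictionary makes it straightforward to rerun or adapt the same strategy for each database.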

Records identified in the initial search were uploaded into Covidence© (Veritas Health Innovation 2017), an electronic data extraction and referencing tool, where duplicates were removed. Abstract and title screening of the records was completed by two independent authors, with conflicts resolved via phone discussion between the authors. The full texts of the remaining articles were uploaded and reviewed independently by the same two authors against the predetermined inclusion and exclusion criteria. Conflicts at this stage were addressed by a third author through the same software. Hand searching was completed by reviewing the reference lists of all included articles; however, no further articles were identified.
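The dual-reviewer screening step can be sketched as follows. This is not how Covidence works internally; the record identifiers, decision labels and function name are hypothetical and serve only to show how agreed decisions are separated from conflicts that need discussion or a third reviewer.

```python
def compare_screening(decisions_a, decisions_b):
    """Split records into agreed decisions and conflicts between two reviewers.

    Both arguments map a record identifier to "include" or "exclude".
    """
    agreed = {}
    conflicts = []
    for record_id, decision_a in decisions_a.items():
        decision_b = decisions_b[record_id]
        if decision_a == decision_b:
            agreed[record_id] = decision_a
        else:
            # Disagreements are set aside for discussion or a third reviewer.
            conflicts.append(record_id)
    return agreed, conflicts


# Made-up decisions for three hypothetical records.
reviewer_1 = {"rec-001": "include", "rec-002": "exclude", "rec-003": "include"}
reviewer_2 = {"rec-001": "include", "rec-002": "include", "rec-003": "include"}

agreed, conflicts = compare_screening(reviewer_1, reviewer_2)
```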

Inclusion and exclusion criteria

Articles were excluded if they met any of the following criteria: (a) not written in English, (b) opinion pieces, (c) participants did not work a rotating 24/7 roster (i.e., any combination of day, afternoon, evening or night shifts), (d) systematic reviews or meta-analyses, (e) studies that examined the influence of factors outside of the roster (e.g., caffeine or food) on PVT performance, (f) studies that took place in simulated or laboratory settings and (g) no full text was available.

Study quality assessment

The ‘Quality assessment tool for observational cohort and cross-sectional studies’, published by the National Heart, Lung, and Blood Institute (NHLBI), was chosen to evaluate the risk of bias in each of the included studies (National institute of Health 2014). Each article was independently reviewed by two authors and conflicts were addressed via phone correspondence. Grading of Recommendations, Assessment, Development and Evaluation (GRADE) guidelines were used by the lead author to assess the primary evidence (Atkins et al. 2004).

Synthesis of findings

The data extraction was performed by the lead author and separated into two main summaries. The first was an overall summary of the included studies based on the following categories: (a) study design, (b) country where the study was conducted, (c) occupational group/industry, (d) shift/roster type, (e) fatigue-related measures, (f) study aims and conclusions, (g) NHLBI rating and (h) relationship between fatigue measures (Appendix, Table 2). In the second summary, the data from the included studies were extracted based on the following categories: (a) sample size, (b) age range, (c) gender ratio, (d) sleep between day shifts, (e) KSS results, (f) PVT duration, timing and model, (g) PVT performance outcomes (i.e., PVT mean reaction time, PVT mean lapses) and (h) PVT application (Table 1). Many articles did not report their raw data, providing instead a descriptive analysis or graphical representation. Owing to large variability between studies in rostering patterns and inconsistent application of objective and subjective fatigue measures, a meta-analysis of the study results was not feasible. Articles were grouped broadly into categories associated with the application of the PVT for discussion and comparison. These included (a) multiple instances per shift, (b) commencement and cessation of shift and (c) other varying times throughout the shift.

Table 1 Details of included studies

Results

Search results

The search strategy yielded 135 results, of which 23 were duplicates and removed from the review. Abstracts and titles of the remaining 112 articles were then screened for eligibility based on the ‘Inclusion and exclusion criteria’ outlined above, and from these, 60 records were excluded (Fig. 1). Fifty-two articles were evaluated at the full text level, of which a further 35 were removed as per Fig. 1. After the screening and eligibility assessment, 16 articles were included in the review and progressed to data extraction (Fig. 1). No new studies were identified through hand searching (Fig. 1).

Fig. 1 PRISMA flow chart

Study characteristics

Included articles were published between 2006 and 2019. Studies originated from the United States (n = 8), Finland (n = 3), Norway (n = 2), Australia (n = 2) and Germany (n = 1). A range of occupational industries was represented, including healthcare (nurses, support staff and doctors) (n = 7), airline support staff (n = 3), mining (n = 3), police officers (n = 1), military (n = 1) and airline pilots (n = 1). The roster and shift types examined in the included studies varied considerably, from randomised shifts to forward- or reverse-rotating rosters. In line with the inclusion criteria, every study used the PVT as a primary measure of fatigue. In addition, secondary fatigue-related measures included the KSS (n = 8), other recognised subjective sleepiness scales (n = 5), sleep duration assessed using actigraphy (n = 8) and sleep/work diaries (n = 3), biological markers of fatigue (n = 2) and electroencephalography (EEG) recordings (n = 1) (Appendix 1, Table 2).

Half of the included studies (50%) used the 10-min version of the PVT and the most common modality (25%) was the palm pilot version of the PVT; however, many of the included studies (31.3%) did not indicate how the PVT was applied. Half (50%) of the included studies asked participants to undertake the PVT before, during and after their shift, while others administered the PVT based specifically on the schedule/roster (19%) or at a single timepoint in the morning, afternoon or night (19%).
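For readers who prefer counts, the percentages above can be converted back to approximate numbers of studies out of the 16 included. The counts below are back-calculated for illustration and are not reported directly in the text; the dictionary labels are shorthand introduced here.

```python
TOTAL_STUDIES = 16  # number of included studies stated in the review

# Percentages as reported in the text; labels are shorthand for this sketch.
reported_percentages = {
    "10-min PVT": 50.0,
    "palm pilot modality": 25.0,
    "modality not indicated": 31.3,
    "PVT before, during and after shift": 50.0,
    "PVT based on schedule/roster": 19.0,
    "PVT at a single timepoint": 19.0,
}

# Round to the nearest whole study, since percentages were reported to one decimal or fewer.
approximate_counts = {
    label: round(TOTAL_STUDIES * pct / 100)
    for label, pct in reported_percentages.items()
}

for label, count in approximate_counts.items():
    print(f"{label}: about {count} of {TOTAL_STUDIES}")
```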

Quality assessment

Risk of bias

As per the NHLBI Quality Assessment Tool for Observational Cohort and Cross-Sectional Studies (National institute of Health 2014), all included studies were of fair methodological quality. None of the 16 studies used blinding, which may have introduced an increased risk of detection bias from outcome assessors. Three of the included studies did not have a participation rate of greater than 50% of eligible individuals and thus their results may not be generalisable to the wider population. Five of the included studies had no sample size justification and four had a loss to follow-up after baseline of greater than 20%, mainly due to participants being no longer eligible to participate or unable to be re-contacted. All studies used similar ANOVA-based statistical approaches to examine data.

GRADE assessment

As per the GRADE recommendations, the initial body of evidence in this review is ‘low’ quality because the majority of included studies were observational (Atkins et al. 2004). The quality of the overall data is further limited by the lack of a blinded allocation process. A number of studies had limited follow-up assessments and/or experienced a large dropout of participants for a variety of reported reasons (e.g., researchers could no longer contact participants, participants no longer wished to remain a part of the study, or complex methodologies required extensive and time-consuming participant interactions). Although three-quarters of the included studies had consistent findings in relation to PVT performance, four studies had mixed results (Bonnefond et al. 2006; Ganesan et al. 2019; Waggoner et al. 2012; Vanttola et al. 2019). Therefore, the overall body of evidence in this review was determined to be of ‘very low’ quality.
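The rating logic described above, a starting level followed by downgrading, can be sketched as below. The four-level scale follows the GRADE terminology used in the text, but the function name and the choice of a single downgrade step are assumptions for illustration; the review does not state how many steps were applied for each concern.

```python
# GRADE quality levels, ordered from lowest to highest.
LEVELS = ["very low", "low", "moderate", "high"]


def downgrade(start_level, steps):
    """Lower a GRADE level by a number of steps, never going below 'very low'."""
    index = LEVELS.index(start_level) - steps
    return LEVELS[max(index, 0)]


# Observational evidence starts at 'low' in the text; one downgrade step is assumed here.
final_level = downgrade("low", 1)
print(final_level)
```

Because 'very low' is the floor of the scale, any number of downgrade steps from 'low' gives the same final rating in this sketch.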

Narrative synthesis

The PVT was administered to participants in each study at varying timepoint intervals. The positioning of these timepoints was based on the roster design being examined in the study or on the potential for the timepoint to identify key changes in workers’ fatigue levels, as hypothesised by the study authors. In combination with the PVT, other subjective and objective fatigue-related measures were applied. The KSS was the most popular subjective measure used in the included studies and was frequently coupled with the PVT. In addition to the PVT, the most commonly used objective measure was actigraphy, which was used to track movement and sleep patterns. Actigraphy was often used alongside sleep and work diaries, which provided subjective information on sleep and work hours.

There appeared to be a relationship between objective and subjective fatigue measures in several studies which included both of these types of measures. For example, Bjorvatn et al. (Bjorvatn et al. 2006) found that if the mean reaction time (mRT) in the PVT increased towards the end of a shift, then KSS levels also showed an increase and this finding was consistent across several other studies included in this review (Bonnefond et al. 2006; Basner and Dinges 2011; Bjorvatn et al. 2006; Harma et al. 2006; Basner et al. 2017; Waggoner et al. 2012; Vanttola et al. 2019). Conversely, Wilson et al. found poor congruence between PVT performance and the KSS among nurses (Wilson et al. 2019). Furthermore, a number of included studies investigated relationships between fatigue and actigraphy, with findings indicating that shorter sleep durations were related to increased objective fatigue levels (e.g., slower PVT response speed) (Bonnefond et al. 2006; Flynn-Evans et al. 2018; Ferguson et al. 2011; Basner et al. 2017). Finally, Shattuck et al. found a direct correlation between decreased mRT and worsened mood (Shattuck and Matsangas 2016).

To facilitate the synthesis of the literature, the below section has been split into three categories based on the application of the PVT in the included studies (i.e., ‘Multiple instances per shift’, ‘Commencement and cessation of shift’ and ‘Other varying times’; Table 1).

Multiple instances per shift

The PVT was applied at multiple times (i.e., ≥ 3 administrations) throughout each shift in five of the included studies (Flynn-Evans et al. 2018; Shattuck and Matsangas 2016; Anderson et al. 2012; Bjorvatn et al. 2006; Wilson et al. 2019). Anderson et al. administered the PVT every 6 h through a 24–30-h shift for resident interns, which showed a slow and steady decline in cognitive attention (p < 0.05) (Anderson et al. 2012). Similarly, Flynn-Evans et al. tested pilots at key stages throughout their shift (start of shift, end of each ascent, prior to each descent and end of shift), as well as before and after major sleep periods, yielding a similar decline in cognitive attention measured on the PVT (p < 0.05) (Flynn-Evans et al. 2018). In contrast, Bjorvatn et al., in oil rig workers, and Wilson et al., in hospital-based nurses, set fixed times at which the PVT was to be completed throughout the 12-h shift (Bjorvatn et al. 2006; Wilson et al. 2019). This typically occurred at 0000, 0300 and 0600 on night shifts or 1200, 1500 and 1800 on day shifts (Bjorvatn et al. 2006; Wilson et al. 2019). These two studies observed comparable decreases in PVT mRT on the first night shift. However, Wilson and colleagues found reaction time performance slowly steadied the longer a nurse was on consecutive night shifts (p < 0.0001) (Wilson et al. 2019). Furthermore, Wilson et al. found mean lapses on the PVT tripled on night shifts compared to days (p < 0.006) (Wilson et al. 2019). Conversely, Bjorvatn et al. found lapse data in oil rig workers remained consistent across the shifts (Bjorvatn et al. 2006). Overall, the use of the PVT at multiple instances throughout a shift shows a decline in cognitive attention.

Commencement and cessation of shift

The PVT was administered to participants at the commencement and cessation of each shift in six of the included studies (Bonnefond et al. 2006; Ruggiero et al. 2012; Behrens et al. 2019; Ferguson et al. 2011; Thompson et al. 2016; Harma et al. 2006). Behrens et al., Ferguson et al., Ruggiero et al. and Thompson et al. all examined rosters that progressed forward (i.e., days to nights to rostered days off) and included shifts that were 12–13 h in length (Ruggiero et al. 2012; Behrens et al. 2019; Ferguson et al. 2011; Thompson et al. 2016). The studies each reported significantly lower mRT on the PVT from the beginning to the end of the first shift. After this first shift, however, mRT typically remained constant for the rest of the roster duration, suggesting participants were accustomed to the test process and results did not decrease further (Ruggiero et al. 2012; Behrens et al. 2019; Ferguson et al. 2011; Thompson et al. 2016). This was consistent with the other studies which examined PVT performance at multiple instances across a shift.

Bonnefond et al. and Harma et al. both found that age and shift type were related to fatigue assessed using the PVT (Bonnefond et al. 2006; Harma et al. 2006). The middle and older age groups performed significantly worse (threefold) in response time and lapses on night shifts than younger participants (p < 0.01) (Bonnefond et al. 2006). This was significant among maintenance workers performing a backwards rotating roster (mornings to nights to evenings) (p < 0.0003) (Bonnefond et al. 2006). Conversely, Harma et al. found a weaker correlation between age and PVT performance; however, in this study, participants performed a forward-rotating roster which resulted in decreased performance across all age groups in the later night shifts (p < 0.0064) (Harma et al. 2006).

Other varying times

In the remaining five studies, the timing of the PVT was more targeted towards specific rosters or industry designs (Ganesan et al. 2019; Vanttola et al. 2019; Basner et al. 2017; Harris et al. 2010; Waggoner et al. 2012). Basner et al. examined sleep patterns and sleep inertia of physician interns and residents (Basner et al. 2017). In this study, the PVT was administered to these workers between 0600 and 0900 each morning after either a night on-call or a night at home (Basner et al. 2017). The findings from this study revealed reaction time was significantly slower during the mornings after being on-call (p < 0.001) compared to regular shifts (Basner et al. 2017).

Waggoner et al. administered the PVT to police officers to examine vigilant attention at the end of their rostered work ‘week’, where rotations always finished with three consecutive 12-h nights (Waggoner et al. 2012). In this study, police were tested on their driving performance, cognition and vigilant attention at a research facility after the third and final 12-h night shift and these results were compared to the same test battery completed three days later after rostered days off (Waggoner et al. 2012). The KSS results showed officers reported double the level of subjective sleepiness post-night shift (p < 0.005) as compared to when well rested (Waggoner et al. 2012). Furthermore, the number of performance lapses on the PVT more than doubled (p < 0.001) post-night shift as compared to when rested (Waggoner et al. 2012).

The remaining three studies typically tested participants on their first shift and on subsequent shifts (i.e., 3rd, 5th and 7th shift), or twice during random allocation of shifts (Vanttola et al. 2019; Harris et al. 2010; Ganesan et al. 2019). Ganesan et al., who looked at healthcare workers, found day shift mRT on the PVT slowly increased on consecutive days of work, whereas mRT measured across consecutive night shifts was found to increase by 50 ms in comparison (Ganesan et al. 2019). These results indicate a faster decrease in healthcare workers’ vigilant attention when working nights (p < 0.001) (Ganesan et al. 2019). Similarly, when examining performance lapses, workers performed the worst on night duty and the number of lapses almost doubled at the end of the roster rotation (p < 0.001) (Ganesan et al. 2019). Harris et al. examined PVT performance among offshore oil rig personnel who were working either a 12-h day shift, 12-h night shift or swing shift (seven night shifts followed by seven day shifts, each 12 h in duration) schedule (Harris et al. 2010). The PVT was administered to personnel on their first, seventh (or eighth if on swing shift) and fourteenth shift (Harris et al. 2010). There was no statistically significant difference in mRT between the 14-day day shift and swing shift schedules. Interestingly, the PVT results showed a significant increase in mRT on the first night of the 14-day night schedule; however, this promptly normalised on subsequent tests (p < 0.05) (Harris et al. 2010).

Discussion

The aim of this systematic review was to evaluate participants’ PVT performance in response to 24/7 shift rosters performed in different occupational settings under naturalistic conditions. The PVT has been applied through varying study protocols, which have yielded consistently significant results. There is a reasonable quantity of objective data demonstrating decreased alertness and increased sleepiness throughout the majority of included studies.

Among the included studies, half used the 10-min PVT and the most common PVT modality was the palm pilot. Despite this variation in the duration and modality of PVT applied across the included studies, general conclusions can still be drawn from the cumulative results. For instance, there is evidence to support the notion that the varying length PVTs (i.e., 3-, 5- and 10-min variations) produce comparable results and one version or modality does not necessarily produce stronger or more accurate results (Basner et al. 2011; Basner and Dinges 2011; Arsintescu et al. 2017). It is, however, pertinent to note that in some prospective and/or in-field studies, the 10-min PVT may be impractical due to time, budget and/or logistical constraints. Thus, a 3- or 5-min PVT may be more appropriate and yield similar results to a 10-min version. Further, in the majority of studies that included multiple fatigue measures, there appeared to be a positive relationship between subjective and objective fatigue measures.

Multiple instances per shift

Although the studies by Anderson et al. and Flynn-Evans et al. had similar designs in their application of the PVT (i.e., administered 6-hourly or > 3 times per shift), they had quite different results that may be explained by differences in roster design (Flynn-Evans et al. 2018; Anderson et al. 2012). The medical interns examined by Anderson et al. had an mRT of 284.6 ms at the start of the shift, which slowly increased by 100 ms by the end of their duty period (p < 0.05), up to 30 h later (Anderson et al. 2012). When compared with these medical interns (Anderson et al. 2012), the pilots examined by Flynn-Evans et al. had a similar mRT at the start of their shift (p < 0.05) (Flynn-Evans et al. 2018). However, the duty period examined among pilots by Flynn-Evans et al. was substantially shorter than the shift duration examined in interns by Anderson and colleagues, which may explain the increase in mRT (i.e., slower to react to the stimulus) among interns at the cessation of shift due to prolonged working hours (Anderson et al. 2012; Flynn-Evans et al. 2018). Both studies had a moderate number of PVT performance lapses in the early phases of their respective shifts, possibly explained by participant learning error; however, the lapses tripled for the medical interns at the end of their duty period (p < 0.05) (Flynn-Evans et al. 2018; Anderson et al. 2012). Thus, this finding could indicate interns are at significant risk of impaired performance towards the end of their shift due to extended duty hours. Limited research has examined how this level of impairment on PVT performance relates to real-world medical errors. This is an area for further evaluation in future research.

Wilson et al. and Bjorvatn et al. both demonstrated a significant decrease through the first shift (typically nights) in PVT mRT and then slower decreases on consecutive shifts afterwards (p < 0.05) (Bjorvatn et al. 2006; Wilson et al. 2019). This is indicative of the circadian rhythm attempting to adapt (Wilson et al. 2019). However, as demonstrated in the study by Wilson et al., this cannot be maintained for extended periods of time (i.e., > 4 shifts) (Wilson et al. 2019). Finally, both Wilson et al. and Bjorvatn et al. found key PVT performance challenges at the change of shift, including changes between day-to-night and night-to-day (i.e., swing shift) (p < 0.05) (Bjorvatn et al. 2006; Wilson et al. 2019).

The regular use of the PVT in the above-mentioned studies has provided a clear demonstration of cognitive vigilant attention changes throughout shift work. This is, however, not without research limitations. Components of data collection had to be removed as some participants undertook the tests too early/late which could have skewed the data (Flynn-Evans et al. 2018). Other studies reported missing individual assessments or full-days of data collection, requiring the application of missing data imputation methods (Bjorvatn et al. 2006). This resulted in incomplete data sets and thus the findings may not necessarily be a true representation of the sample. Future research could benefit by collecting data from participants at a central location, for example a research room in a hospital, to minimise incomplete data sets.

Commencement and cessation of shift

Analysis of studies examining PVT performance at the start and end of a shift was impeded by 33% of the raw data not being reported in the studies, with descriptive techniques used instead. PVT performance examined at the beginning and end of each shift showed a decrease in cognitive vigilant attention, as expected; however, this was more extreme on night shifts (Ruggiero et al. 2012; Behrens et al. 2019; Ferguson et al. 2011; Thompson et al. 2016). Thompson et al. also support this notion through the testing of nurses’ vigilance after three 12-h shifts within a four-day period (Thompson et al. 2016). In this study, cumulative fatigue resulted in a 21.2% increase in the standard deviation of mRT in this cohort of nurses (Thompson et al. 2016).

Two studies found that work hours are seldom the primary factor contributing to fatigue (Ruggiero et al. 2012; Ferguson et al. 2011). This adds to the growing literature indicating that reduced sleep quantity is a major determinant of mRT later in the shift (Ferguson et al. 2011). For instance, Ferguson et al. reported that miners performing the PVT with less than 6 h of sleep in the preceding 24 h had slower response times than those who had 7+ h of sleep (Ferguson et al. 2011). Laboratory studies, however, indicated PVT performance could be maintained in the short term (i.e., 1–2 days) with 6 h of sleep (Ferguson et al. 2011). This finding is consistent with research by Ruggiero et al., who found sufficient rest (7+ h of sleep) is required for improved PVT performance in critical care registered nurses (Ruggiero et al. 2012). Sleep can be, and often is, supplemented by shift workers undertaking sanctioned napping during the night shift (Ruggiero et al. 2012; Wilson et al. 2019), which has been found to improve reaction times and reduce the effects of sleepiness (Hilditch et al. 2016).

The time of day at which the PVT is completed and the number of hours of wakefulness are also known to have an impact on reaction time (Wyatt et al. 1999). For instance, laboratory-based studies have found that when the PVT is performed later in the day or during the night in workers, this can lead to a mismatch between the clock time and the natural biological clock of participants, resulting in significantly decreased neurobehavioral outputs (e.g., slower PVT mRT) (Wyatt et al. 1999). Indeed, several of the included studies found that PVT reaction time was remarkably slower at times not congruent with a natural biological sleep/wake cycle, such as on night shifts, very early mornings or late afternoons (Bonnefond et al. 2006; Flynn-Evans et al. 2018; Anderson et al. 2012; Bjorvatn et al. 2006). This is a significant finding; however, few of these studies also measured each participant’s individual sleep patterns and prior hours of wakefulness (Anderson et al. 2012; Bonnefond et al. 2006; Flynn-Evans et al. 2018). Thus, like prior laboratory-based studies (Wyatt et al. 1999), it is important that future field studies take time of day as well as the number of hours of wakefulness into consideration when examining PVT performance in naturalistic shift work settings.

Although the effects of age on PVT performance were not a focus of this review, age was a noteworthy factor in the included studies. For instance, Bonnefond et al. and Harma et al. both found that alertness and fatigue levels among older adults were affected the least on single night shifts and rapidly forward rotating rosters as compared to multiple successive nights (Bonnefond et al. 2006; Harma et al. 2006). This not only promoted employment longevity, but also decreased recovery time to transition back to a normal circadian rhythm (Bonnefond et al. 2006). Notwithstanding this, PVT mRT and mean lapses were still markedly higher in older adults when compared with younger and middle-aged adults (Bonnefond et al. 2006). Together, these studies suggest that to reduce fatigue-related impairment, older shift workers should have rapidly forward progressing rosters, increased recovery time, increased flexibility and lower frequencies of night duty (Harma et al. 2006).

Although the data appear to have reasonable academic rigour, the limitations of these studies include single institution research (Behrens et al. 2019), small to medium sample sizes (Ruggiero et al. 2012), low response rates (Ruggiero et al. 2012) and high participant drop-out in some studies (Bjorvatn et al. 2006), which may impede the generalisability of findings. However, in combination, these studies contribute to a growing evidence base that indicates fatigue levels can be maintained through sanctioned napping and rapidly forward rotating rosters. Further, the sample size in each age cohort was small (n = 13 to 19), which limits the generalisation of the age-related findings to the wider population (Bonnefond et al. 2006). Finally, Harma et al. lowered the chance of behavioural abnormalities present in their study results by designing a 2-week roster intervention (Harma et al. 2006). In addition, alcohol and sleeping medicines were restricted and a stringent research protocol was provided (Harma et al. 2006). However, this may not be reflective of usual behaviours of shift workers in naturalistic settings who may have access to alcohol and sleeping medications (Richter et al. 2021).

Other varying times

Analysis for this domain was hampered by a 60% lack of raw data; descriptive statistics have therefore been used. Basner et al. concluded that post on-call fatigue levels could cause clinical concerns due to significantly lower levels of vigilant attention (Basner et al. 2017). It has been reported that fewer than 7 h of sleep in a 24-h period can increase errors and create safety challenges within the therapeutic patient–clinician relationship (Basner et al. 2017). Waggoner et al. agreed with Basner et al. (Basner et al. 2017) that extended periods of duty can decrease performance and alertness and impair clinical and personal decision making (e.g., driving home) (Waggoner et al. 2012).

Ganesan et al. found that subjective fatigue was perceived to be lowest on the first night shift in a series of consecutive night shifts; however, objectively measured PVT performance indicated that impairment was similar across all subsequent shifts (Ganesan et al. 2019). This finding contrasts with the results of other studies (Bonnefond et al. 2006; Ganesan et al. 2019; Waggoner et al. 2012; Vanttola et al. 2019). Furthermore, Harris et al. found that circadian adaptation to night duty occurred within a week (when working more than 7 night shifts); however, recovery back to normality took longer, with no adverse effect on PVT mRT (Harris et al. 2010).

Overall summary

The majority of included articles suggest that subjective fatigue measures are consistent with objective measures in response to different rostering schedules (Basner et al. 2017; Waggoner et al. 2012). There was a consistent correlation between increased mRT measured on the PVT and increased KSS levels, particularly during evening/night shifts (Ganesan et al. 2019; Basner et al. 2011; Waggoner et al. 2012; Vanttola et al. 2019). Harris et al. (Harris et al. 2010) also found that a prolonged series of night shifts resulted in longer re-adaptation times in participants' cortisol levels, a biological marker found in prior simulator-based research to be associated with increased levels of subjective fatigue in shift workers (Wolkow et al. 2016). Finally, Bjorvatn et al. (Bjorvatn et al. 2006) found that objective sleep duration measured using actigraphy was similar to subjective sleep duration, a finding consistent with other studies included in this review (Bonnefond et al. 2006; Flynn-Evans et al. 2018; Ferguson et al. 2011; Harma et al. 2006). However, a small number of included studies reported divergent findings in this area (Ganesan et al. 2019; Vanttola et al. 2019). Future research should be directed towards exploring the potential human and experimental factors that may explain these deviations in the literature, especially in naturalistic settings.

Limitations

Methodological limitations of this review include the restriction of the inclusion criteria to English-language articles, which unfortunately could not be avoided as none of the research team were bilingual. The requirement for articles to be peer-reviewed increases the academic rigour of the review; however, it also means that some primary, contemporary research is excluded. The influence of timed caffeine or food consumption on fatigue-related impairment would be an interesting comparison; however, these factors were not within the scope of this review and could be considered in future evaluations. Further, evaluating simulated studies against the real-world research examined in the current review could prompt further investigation into study design or roster review.

Conclusion

Through this review, impairments in PVT performance were consistently demonstrated in response to varying roster patterns and shift types (night and day shifts), indicating increased fatigue levels and reduced alertness. There is, however, a distinct lack of evidence examining changes in vigilant attention during sustained periods of work and on-call work (30+ h). Healthcare providers, defence personnel, pilots and communication specialists (translators and interpreters) spend extended periods on-call and could experience more pronounced fatigue-related impairments in performance. Future research addressing this area is required to improve workplace safety, productivity and performance.