Introduction

The effect of surgeon fatigue on clinical outcome has been a topic of great debate for decades. In a 1965 editorial titled “The tired surgeon,” Schenk [1] raised the question of whether it is in the patient’s best interest for sleep-deprived residents to perform an operation. Almost 40 years later, in 2003, the Accreditation Council for Graduate Medical Education in the United States put into effect a work-hour reform that established a maximum 80-h workweek for medical residents. This reform was prompted by concern over the negative effect of restricted sleep on performance [2, 3].

Over the past two decades, numerous cohort studies have been conducted to assess the effect of surgeon fatigue on surgical outcomes. In 2018, Gates et al. concluded in their systematic review that surgeon fatigue does not affect surgical outcome [4]. However, the effects of surgeon fatigue on different types of surgery, namely elective and non-elective, remain unclear. In this study, we analyzed the effect of surgeon fatigue on surgical outcomes after separating surgeries into elective and non-elective groups. We hypothesized that surgeon fatigue would have a greater negative effect on elective surgery than on non-elective surgery, because many non-elective procedures are emergency or transplantation procedures, in which the benefit of timeliness would counter any negative effect of surgeon fatigue.

Methods

Data sources

We searched PubMed for original articles published between January 2000 and January 2020 using the following keyword combinations: “[surgeon] and [(sleep deprivation) or (sleep deprived) or (fatigued)]” and “{(surgeon) and [(sleep deprivation) or (sleep deprived) or (fatigued)] and [(mortality) or (morbidity) or (outcomes)]}.” These combinations identified 1024 and 403 articles, respectively.

Inclusion and exclusion criteria

We included articles that discussed the effect of surgeon fatigue on postoperative mortality or total postoperative complications. We found 19 such articles [5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23], 3 of which were excluded: one because it did not distinguish between the numbers of patients who underwent surgery by fatigued and by rested surgeons [17]; one because it did not report the actual numbers of patients [20]; and one [23] because its authors published an update [22] the following year. The remaining 16 articles [5,6,7,8,9,10,11,12,13,14,15,16, 18, 19, 21, 22] were selected for the analysis of postoperative mortality. Of these 16 articles, 8 [5, 7, 9, 11, 12, 15, 16, 19] also reported total postoperative complications and were included in a similar analysis (Fig. 1).

Fig. 1  The selection flow chart for systematic review

Data extraction

From each article, we extracted and organized data on the total number of patients, postoperative mortality, and total postoperative complications after surgeries performed by either fatigued or rested surgeons. We also extracted the following information: authors, title, publication year, journal, types of surgery, and definitions of fatigued and rested states.

Statistical analysis

We used Microsoft Office Excel (Microsoft Corp., Redmond, WA) to organize the extracted data and R (version 4.00 for Windows; R Foundation for Statistical Computing, Vienna, Austria) with the “meta” package (version 4.12–0; Schwarzer 2007; Balduzzi et al. 2019), the “metafor” package (version 2.4–0; Wolfgang Viechtbauer), and the “dmetar” package (version 0.0.9000; Mathias Harrer and David Daniel Ebert) to produce the forest plots of risk ratios (RRs) and the funnel plots and to perform Egger’s test. The RR for each article was calculated from the extracted data with two-sided 95% confidence intervals (CIs). The heterogeneity between articles was analyzed using the I2 statistic and P values. We considered an I2 value below 50% to be small and an I2 value above 50% to be large. Publication bias was analyzed using Egger’s test for funnel plot asymmetry. P values less than 0.05 were considered significant for both heterogeneity and Egger’s test. All RRs and 95% CIs were calculated using the random-effects model because of the expected high heterogeneity between studies, which focused on different types of surgery.
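The calculations described above can be illustrated with a minimal Python sketch. It is not the R code used in this study: it assumes a Wald interval on the log scale for each RR and a DerSimonian-Laird estimate of the between-study variance, which may differ from the settings of the R packages. The function names and the study counts are hypothetical and serve only as an illustration.

```python
import math

def risk_ratio(events_f, n_f, events_r, n_r, z=1.96):
    """Risk ratio (fatigued vs. rested) with a Wald CI built on the log scale."""
    rr = (events_f / n_f) / (events_r / n_r)
    # Large-sample standard error of log(RR) for a 2x2 table.
    se = math.sqrt(1 / events_f - 1 / n_f + 1 / events_r - 1 / n_r)
    log_rr = math.log(rr)
    return {
        "rr": rr,
        "lower": math.exp(log_rr - z * se),
        "upper": math.exp(log_rr + z * se),
        "log_rr": log_rr,
        "se": se,
    }

def pool_random_effects(log_rrs, ses, z=1.96):
    """Inverse-variance random-effects pooling (DerSimonian-Laird, assumed here)."""
    k = len(log_rrs)
    w = [1 / se ** 2 for se in ses]
    fixed = sum(wi * yi for wi, yi in zip(w, log_rrs)) / sum(w)
    # Cochran's Q measures dispersion of study estimates around the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_rrs))
    df = k - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # I2: share of total variation attributed to between-study heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    w_star = [1 / (se ** 2 + tau2) for se in ses]
    pooled = sum(wi * yi for wi, yi in zip(w_star, log_rrs)) / sum(w_star)
    se_pooled = math.sqrt(1 / sum(w_star))
    return {
        "rr": math.exp(pooled),
        "lower": math.exp(pooled - z * se_pooled),
        "upper": math.exp(pooled + z * se_pooled),
        "tau2": tau2,
        "i2": i2,
    }

# Hypothetical counts: (deaths_fatigued, n_fatigued, deaths_rested, n_rested).
studies = [(12, 900, 10, 950), (30, 2100, 25, 2000), (8, 400, 9, 420)]
per_study = [risk_ratio(*s) for s in studies]
pooled = pool_random_effects(
    [s["log_rr"] for s in per_study], [s["se"] for s in per_study]
)
print(f"Pooled RR {pooled['rr']:.2f} "
      f"(95% CI {pooled['lower']:.2f} to {pooled['upper']:.2f}), "
      f"I2 = {pooled['i2']:.0f}%")
```

Working on the log scale keeps each interval symmetric around log(RR), which is why the reported CIs are asymmetric around the RR itself.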

Results

Article characteristics

All 16 articles [5,6,7,8,9,10,11,12,13,14,15,16, 18, 19, 21, 22] included postoperative mortality and 8 of these articles [5, 7, 9, 11, 12, 15, 16, 19] also included total postoperative complications. The type of surgery discussed in each article differed. Five articles [7, 12, 15, 16, 21] discussed the effect of surgeon fatigue on elective surgery and 11 [5, 6, 8,9,10,11, 13, 14, 18, 19, 22] discussed the effect of surgeon fatigue on non-elective surgeries, such as emergency or transplantation surgeries (Table 1).

Table 1  Summary characteristics of selected papers

Although the definition of “fatigued” differed among articles, we followed each article’s definitions, except for three articles that described multiple groups of “fatigued” surgeons [5, 13, 19]. Two articles [5, 13] defined “being fatigued” by the starting time of the operation, and another [19] defined it by the hours of sleep during the night before the operation. For the two former articles [5, 13], we grouped the “fatigued” and “rested” surgeons so that they were as similar as possible to those defined by Zafar et al. [14], because that study had the largest number of patients among those defining the terms by the starting times of the operations. For the study by Chu et al. [19], we grouped “rested” surgeons as those who had slept for more than 6 h and “fatigued” as those who had slept for less than 6 h. Table 1 gives the modified definitions of being “fatigued” and “rested” with the number of patients. Of the 16 articles [5,6,7,8,9,10,11,12,13,14,15,16, 18, 19, 21, 22], 8 defined being “fatigued” by partially focusing on the surgeons’ preoperative activities, such as sleep duration, performing surgery, or treating patients before the surgeries of interest [6, 8, 12, 15, 16, 19, 21, 22], whereas the other 8 articles [5, 7, 9,10,11, 13, 14, 18] defined being “fatigued” by focusing solely on the starting times of surgical procedures. One of the latter 8 articles specified the working hours of the medical teams’ shifts, stating that the surgical team worked a 24 h shift from “8.00am–8.00am the next day.” The other 7 articles did not mention shift work hours or other reasons for possibly different levels of surgeon fatigue during the time intervals used to define being “rested” and “fatigued”.

Patient characteristics

A total of 51,508 patients underwent elective surgeries: 22,597 by fatigued surgeons and 28,911 by rested surgeons, with 468 postoperative mortalities (232 and 236, respectively). A total of 143,789 patients underwent non-elective surgeries (49,971 and 93,818, respectively), with 6111 postoperative mortalities (2168 and 3943, respectively). Among the eight articles that included total postoperative complications, a total of 51,238 patients underwent elective surgeries (22,575 and 28,663, respectively), with 7384 total postoperative complications (3665 and 3729, respectively). A total of 40,342 patients underwent non-elective surgeries (8154 and 32,188, respectively), with 2055 total postoperative complications (549 and 1506, respectively).

Effects on postoperative mortality

The RR of the rate of postoperative mortality among elective surgeries was 1.03 (95% CI, 0.86–1.24), with a prediction interval of 0.77–1.38. The heterogeneity (I2 and P value) was 0% and 0.47. Egger’s test showed an intercept of 0.36 (95% CI, − 0.82 to 1.54), with a P value of 0.58 (Fig. 2). The RR of the rate of postoperative mortality among non-elective surgeries was 1.08 (95% CI, 0.85–1.38), with a prediction interval of 0.48–2.45. The heterogeneity (I2 and P value) was 88% and < 0.01. Egger’s test showed an intercept of 2.63 (95% CI, 0.86–4.39), with a P value of 0.02 (Fig. 3). We were unable to lower the heterogeneity (I2) of non-elective surgeries to below 50% by subgroup analysis according to procedure type.
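The intercept reported by Egger’s test can be understood with a short Python sketch of the underlying regression: each study’s standardized effect (log RR divided by its standard error) is regressed on its precision (1 divided by the standard error), and an intercept far from zero suggests funnel plot asymmetry. This is an illustrative sketch under those assumptions, not the dmetar implementation used in this study; the function name and the input values are hypothetical. The 95% CI and P value of the intercept additionally require a t distribution with k − 2 degrees of freedom, which is omitted here.

```python
import math

def egger_intercept(log_effects, ses):
    """Ordinary least squares of (effect / SE) on (1 / SE).

    Returns the intercept, its standard error, and the t statistic.
    """
    n = len(log_effects)
    if n < 3:
        raise ValueError("Egger's regression needs at least 3 studies")
    x = [1 / se for se in ses]                          # precision
    y = [e / se for e, se in zip(log_effects, ses)]     # standardized effect
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual variance with n - 2 degrees of freedom.
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    s2 = rss / (n - 2)
    se_intercept = math.sqrt(s2 * (1 / n + mx ** 2 / sxx))
    t_stat = intercept / se_intercept if se_intercept > 0 else float("nan")
    return intercept, se_intercept, t_stat

# Hypothetical log RRs and standard errors: smaller studies (larger SE)
# show larger effects, the pattern that produces a positive intercept.
log_rrs = [0.05, 0.10, 0.30, 0.45, 0.60]
ses = [0.05, 0.10, 0.20, 0.30, 0.40]
b0, se_b0, t = egger_intercept(log_rrs, ses)
print(f"Egger intercept = {b0:.2f} (SE {se_b0:.2f}), t = {t:.2f}")
```

If all studies estimated the same effect regardless of size, the standardized effects would lie on a line through the origin and the intercept would be close to zero.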

Fig. 2  Forest plot and funnel plot of the rates of postoperative mortality among fatigued and rested surgeons for elective surgeries

Fig. 3  Forest plot and funnel plot of the rates of postoperative mortality among fatigued and rested surgeons for non-elective surgeries

Effects on total postoperative complications

The RR of the rate of total postoperative complications among elective surgeries was 0.99 (95% CI, 0.95–1.04), with a prediction interval of 0.91–1.09. The heterogeneity (I2 and P value) was 0% and 0.80. Egger’s test showed an intercept of − 0.40 (95% CI, − 1.18 to 0.39), with a P value of 0.39 (Fig. 4). The RR of the rate of total postoperative complications among non-elective surgeries was 0.93 (95% CI, 0.67–1.28), with a prediction interval of 0.22–3.88. The heterogeneity (I2 and P value) was 88% and < 0.01. Egger’s test showed an intercept of − 2.85 (95% CI, − 9.32 to 3.62), with a P value of 0.47 (Fig. 5). We were unable to lower the heterogeneity (I2) of non-elective surgeries to below 50% by subgroup analysis according to procedure type.
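Because RR intervals are constructed on the log scale, a reported interval can be used to recover an approximate standard error of log(RR) and to check that the point estimate sits at the geometric midpoint of its bounds. The Python sketch below applies this to the values reported above. It assumes a normal (Wald) interval with a multiplier of 1.96; if a different interval method was used, the recovered standard error would differ. The function names are hypothetical.

```python
import math

def se_from_ci(lower, upper, z=1.96):
    """Approximate SE of log(RR) recovered from a reported 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * z)

def log_midpoint(lower, upper):
    """Geometric midpoint of the interval, expected to match the point estimate."""
    return math.exp((math.log(lower) + math.log(upper)) / 2)

# (reported RR, lower bound, upper bound) for total postoperative complications.
reported = {
    "elective": (0.99, 0.95, 1.04),
    "non-elective": (0.93, 0.67, 1.28),
}

for group, (rr, lower, upper) in reported.items():
    print(f"{group}: reported RR {rr}, "
          f"log-scale midpoint {log_midpoint(lower, upper):.2f}, "
          f"approximate SE of log(RR) {se_from_ci(lower, upper):.3f}")
```

The much wider interval for non-elective surgeries corresponds to a larger standard error of the pooled estimate, consistent with the high heterogeneity in that group.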

Fig. 4  Forest plot and funnel plot of the rates of total postoperative complications among fatigued and rested surgeons for elective surgeries

Fig. 5  Forest plot and funnel plot of the rates of total postoperative complications among fatigued and rested surgeons for non-elective surgeries

Discussion

We found no significant differences in the rates of postoperative mortality or total postoperative complications between surgeries performed by fatigued surgeons and those performed by rested surgeons, for either elective or non-elective surgeries. Heterogeneity among the studies of elective surgeries was small for both postoperative mortality and total postoperative complications, whereas heterogeneity among the studies of non-elective surgeries was large for both outcomes. We were unable to reduce the latter to below 50% by subgroup analysis according to procedure type. Regarding publication bias, we found significant asymmetry for the rates of postoperative mortality after non-elective surgeries, but not in any other analysis.

We hypothesized that surgeon fatigue would have a greater negative effect on the outcomes of elective surgeries than on those of non-elective surgeries. However, based on our results, surgeon fatigue had no effect on the outcomes of elective surgeries and little to no effect on the outcomes of non-elective surgeries. One reason why surgeon fatigue may have had little or no effect on the surgical outcomes might be that, even when the surgeon was fatigued and likely to make more mistakes, other members of the surgical team were present to prevent potential mistakes before they were made. After all, surgery is performed by a team and most procedures are under someone else’s watch. This may act as a safety net for mistakes and counter the negative effects of surgeon fatigue on surgical outcomes. Although previous studies concluded that fatigue had a negative effect on the dexterity and motor skills of individual surgeons in laboratory settings [24,25,26], our findings were that fatigue has little or no effect on surgical outcomes in the real-world settings of hospitals.

Limitations

This study has four key limitations. The first is the inconsistency in the definition of being “fatigued,” as shown in Table 1. The second is that 8 of the 16 articles we reviewed focused on only one surgeon when evaluating surgeon fatigue and thus do not reflect the level of fatigue of the surgical team as a whole [6, 8, 12, 15, 16, 19, 21, 22]. The other eight articles defined surgeon fatigue solely by the starting times of the surgery, which reflected the fatigue level of the whole surgical team [5, 7, 9,10,11, 13, 14, 18]. However, seven of these articles did not mention shift work hours or other reasons for possibly different levels of surgeon fatigue during the time intervals used to define being “fatigued” and “rested”. This meant that we were unable to identify surgeons who were more rested than expected in the “fatigued” group or more fatigued than expected in the “rested” group. The third limitation is the high level of heterogeneity among the non-elective surgeries. We were unable to explain the underlying reasons by subgroup analysis according to procedure type, and this high heterogeneity meant that we were unable to draw a clear conclusion for non-elective surgeries. The fourth limitation is the risk of bias. As the number of articles was limited, we included all available articles, regardless of their level of risk of bias, to increase the volume of data available for analyzing the difference between elective and non-elective surgeries. Another source of bias is publication bias, which may discourage researchers from publishing studies that indicate worse outcomes caused by fatigue. Although we found significant asymmetry for the rate of postoperative mortality after non-elective surgeries (Fig. 3), we found no significant asymmetry otherwise (Figs. 2, 4, and 5). This is possibly because of the limited number of articles available.

Looking forward, an objectively measurable index of fatigue is needed. One possible solution is to measure salivary amylase activity. In their 2010 study, Uesato et al. showed that patients’ sympathetic agitation can be measured as an increase in salivary amylase during endoscopic submucosal dissection [27]. If salivary amylase activity can be shown to reflect the level of fatigue of surgeons, it may be usable as an indicator of whether a surgeon is fit to perform an upcoming surgical procedure. Another possible solution is to measure peak saccade velocity. In 2014, Di Stasi et al. reported a significant decrease in saccadic velocity among surgical residents after a call day [28]. This measurement could be adopted as an indicator immediately, using eye-tracking devices that are readily available. While current devices may be expensive, prototype computational eyeglasses, called the iShadow, have been developed by a team led by Walker at the University of Massachusetts Amherst to make eye-tracking technology affordable for patients with cancer-related fatigue and other conditions [29]. The affordability of such devices may encourage surgical teams to introduce saccadic velocity as an objective measurement of the level of fatigue.

Further, objective measurement of fatigue will help us to understand how much rest is required between surgeries. The eight articles that defined being “fatigued” by partially focusing on surgeons’ preoperative activities all concluded that there were no significant differences in the outcomes of their surgery. However, the definitions of being “fatigued” and “rested” used in each article are too specific to draw a general conclusion yet as to how much rest is required between surgeries.

Conclusion

Based on our analysis of the published literature available, surgeon fatigue does not affect the rates of postoperative mortality or total postoperative complications after elective surgeries and it may have little or no effect on the rates of postoperative mortality or total postoperative complications after non-elective surgeries.