Surgeons frequently feel distracted while performing operative tasks [1]. Distractions in the operating room (OR), such as a door opening, a phone ringing, or alarm sounds from medical devices, may divert clinicians’ attention and lead to adverse events. Many observational studies have reported frequent distractions in the OR [2], ranging from 6 distractions/h for urological procedures [3] to 33 distractions/h for endourological procedures [4]. Further, it has been suggested that distractions may be a contributing factor to adverse events in surgery [5]. However, this relationship between distractions and adverse events has not been well established.

A number of controlled experiments have been conducted in simulated settings, and they generally showed distractions to have negative effects on surgical performance (see [2] for a review). However, these studies employed surrogate surgical tasks (e.g., peg transfer task) and mainly had a sole novice surgeon as the participant. Therefore, these studies do not entirely reflect the true OR environment: the participants had little experience in real OR environments, and real operations are not completed by one surgeon alone but require extensive teamwork. To the best of our knowledge, only one direct observational study to date has investigated the effects of distraction on adverse surgical events [6]. Through real-time observations of 31 cardiac surgeries, it was found that surgical flow disruptions were significantly correlated with surgical errors, that is, occasions in which a planned sequence of activities failed to achieve its intended outcome (e.g., incorrect placement of an aortic valve suture). Surgical flow disruptions were defined as “deviations from the natural progression of an operation, thereby potentially compromising the safety of the operation” and were categorized as being related to teamwork issues, equipment and technology issues, resource-based issues, supervisory/training issues, and extraneous interruptions. The authors found that when surgical flow disruptions increased, so did surgical errors; however, teamwork failures were the only factor that was significantly linked to surgical errors. A limitation of direct observation studies such as [6] is that observers may miss or misinterpret events.

Naturalistic data collected by audiovisual recordings instead of observers in the OR can to a large extent overcome this limitation [7]. In this paper, we report analysis conducted on such a dataset to investigate the relation between OR distractions and adverse events. The data was collected through a multisource platform called the OR Black Box® (ORBB, Surgical Safety Technologies, Inc.), which is a surgical safety initiative that started in 2013 at St. Michael’s Hospital, a large teaching hospital in Toronto, Canada. A previous analysis of the ORBB dataset covered 132 elective laparoscopic general surgeries and reported a median of 138 auditory distractions, 20 surgical errors, and 8 adverse events per case, and at least 1 cognitive distraction in 84 of the observed cases [8]. Surgical errors and adverse events were found to be more common during dissection and reconstruction steps, suggesting that some procedural steps may be more critical and require the surgeon’s focused attention. However, an analysis of the relation between distractions and adverse events was not conducted.

In the current paper, we focused on this relationship and analyzed a subset of ORBB data that consisted of laparoscopic Roux-en-Y gastric bypass (LRYGB) operations, which formed the majority of recorded cases within the ORBB dataset. In particular, we analyzed severe technical adverse events: that is, intraoperative events that are due to errors and can lead to serious injury or death (e.g., a bleeding event from a major artery due to the use of incorrect instruments). We hypothesized that the rate of intraoperative distractions is correlated with the occurrence of severe technical adverse events in LRYGB operations.

Methods

Data collection

Intraoperative data was collected using the ORBB (Fig. 1). ORBB collects audiovisual recordings of the OR environment, laparoscopic videos of the operation, and physiological measurements of the patient from the time the patient is fully draped until the start of the removal of drapes. Raw data from microphones, wall-mounted panoramic cameras, and laparoscopes are recorded, time-synchronized, encrypted, and stored in secure servers in St. Michael’s Hospital. Analysts review and annotate the raw data. Distraction analysts receive two months of training on a distraction-annotation framework: a modified Disruptions in Surgery Index, DiSI [9]. Clinical analysts are staff-level surgeons who receive three months of training in administering standardized protocols to annotate the technical data, including procedural steps of the surgery, intraoperative technical events, event severity ratings, technical errors, and surgeons’ technical skills. All annotations are completed within 30 days, at which point the raw data is deleted and the annotations anonymized.

Fig. 1 ORBB instrumentation including cameras, microphones, and computers installed in the operating room

Dataset

Operative data was prospectively collected from 64 LRYGB operations between 2017 and 2019 at three Canadian hospitals, including St. Michael’s Hospital. Written consent was obtained from both the OR team members and the patient. Retrospective analysis of this data was approved by St. Michael’s Hospital Research Ethics Board (SMH REB #16-243). Cases were excluded because distraction data was missing (n = 1) or the surgery type was mislabeled (n = 3); thus, 60 cases were included in the final analysis (Fig. 2).

Fig. 2 Cases included for analysis

Data coding

Technical events are identified through the SEVerity of intraoperative Events and REctification framework (SEVERE), a validated tool that quantifies an event’s risk of harm as well as the extent to which that event was rectified [10]. Technical errors are identified based on the Generalized Error Rating Tool, GERT [11], and surgeons’ technical skills are rated through the Objective Structured Assessment of Technical Skills tool, OSATS [12].

Technical events were defined as clinically relevant intraoperative adverse events that can potentially result in an injury or harm to the patient. Technical events were coded according to the SEVERE framework and included bleeding, mechanical injury, thermal injury, ischemic injury, and insufficient closure of anastomosis; loss of pneumoperitoneum, gastrointestinal spillage, and hematoma formation were coded separately. The SEVERE tool is provided as a table in the Appendix. Some technical events may be expected and common in surgeries, and at times simply due to patient anatomy and the nature of the surgical task, such as bleeding during dissections. To account for this difference, if a technical event was due to the nature of the task (e.g., a bleeding event during dissection), this event was coded with a “no error” label; otherwise (e.g., a bleeding event while grasping the bowel), it was coded with an “error” label, where an error is defined as “the failure of a planned action to be completed as intended or the use of a wrong plan to achieve an aim” [11]. Our error definition is similar to the surgical error definition used in the previously described study of cardiovascular surgeries [6]. Because we were interested in preventable adverse events that may be associated with distractions, we focused only on events that were accompanied by an error.

Event severity was assessed as the clinical impact level of an event and was rated on the 5-point scale of the SEVERE framework [10]. Ratings 1–3 indicated minor to moderate technical events that do not require the surgeon’s attention immediately, while events with ratings 4 and 5 indicated harm to vital tissues and required immediate attention and rectification from the surgeon for patient safety. For example, a focal thermal injury to the abdominal wall is rated 1, whereas thermal injury to a small bowel causing a full-thickness injury through all layers of bowel wall is rated 5. Minor and moderate events (i.e., events rated 1, 2, or 3) may not need to be rectified and are almost expected. However, if severe events (i.e., events that are rated 4 or 5) are not rectified, they could lead to adverse outcomes. Thus, in our analysis, we focused on severe technical events (ratings 4 and 5).

Distractions in this study were defined as external sources that may lead to a break in attention [13]. Distractions annotated included people entering/exiting, machine alarms, external communication (phone and pager calls), staff being late, loud music in the OR, and surgeon switches. Staff being late (n = 3 events total) and loud music (n = 6 events total) were infrequent and represented a very small portion of all distractions recorded in a given case. Therefore, these distractions were not included in the statistical models. As per the distraction analysts’ directions, surgeon switches were also excluded from analysis since switches between surgeons, residents, and fellows were often not determinable during distraction labeling. Therefore, the final list of distractions explored included people entering/exiting, machine alarms, and external communications. Table 1 provides more details on these distractions.

Table 1 Description of logistic regression variables and their measurement methods

Procedural steps were annotated as access/exposure, dissection/mobilization, reconstruction (i.e., jejunojejunostomy, gastric pouch creation, or gastrojejunostomy steps), inspection, and closure. Periods when surgeons wait for an instrument or stop to plan for future actions were marked as “no progress” (n = 13 cases). Occasionally, secondary procedures such as hernia repairs or cholecystectomy took place during LRYGB operations (n = 7 cases). Specimen resection and removal of specimen were also labeled if the secondary procedure involved those two steps. To account for procedural steps in our analysis, we asked two clinical analysts to rate procedural steps in terms of their criticality. Criticality was defined as the potential danger to a patient if the procedural step was done without focused attention. The two analysts rated each procedural step separately on three levels: low, medium, and high criticality. Agreement was reached after discussions. As a result, access/exposure and closure steps were labeled as low criticality (LC); dissection/mobilization was rated as low to medium criticality (LMC). Because the surgeon would assess the work completed during the reconstruction step as part of inspection, reconstruction and inspection were combined and rated as high criticality (HC). The analysts were not able to rate criticality for no progress and secondary procedure steps with the given level of information: criticality of these two would depend on the surgeon’s task at hand. For example, within no progress steps, waiting for an instrument would be considered low criticality while planning future steps would be high criticality. The overall agreement for procedural criticality categorization was 92.3% with a free marginal kappa of 0.88 (95% CI 0.66, 1.00), which is considered to be almost perfect [14].
Examining the prevalence of the severe technical events across procedural step categories within our dataset revealed that severe events almost always occurred during HC procedural steps (91 out of 92 severe technical events, 98%). Therefore, the analysis of severe events focused on the HC procedural steps only.
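The reported kappa can be checked against the reported agreement. The sketch below assumes the standard free-marginal (Brennan-Prediger/Randolph) form for two raters, in which chance agreement is 1/k for k rating categories, and takes k = 3 from the three criticality levels; the function name is illustrative and is not part of the study's tooling.

```python
def free_marginal_kappa(observed_agreement: float, n_categories: int) -> float:
    """Free-marginal kappa: chance agreement is assumed to be 1 / n_categories."""
    chance_agreement = 1.0 / n_categories
    return (observed_agreement - chance_agreement) / (1.0 - chance_agreement)


# Reported overall agreement (92.3%) and the three levels: low, medium, high.
kappa = free_marginal_kappa(0.923, 3)
print(round(kappa, 2))  # 0.88
```

Under these assumptions the result agrees with the reported value of 0.88.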

Surgical team factors consisted of the operating surgeon’s technical skills and staff changeovers to account for team composition. Technical skills of the operating surgeon were rated by clinical analysts using the OSATS tool [12] for each 20-min segment of the operation. Ratings did not differentiate the training level of the primary operator (resident, fellow, or staff surgeon) to maintain the privacy and confidentiality of the OR staff. Three OSATS items (respect for tissue, knowledge of instruments, and instrument handling) were excluded from our analyses because clinical analysts informed us that the ratings given to these three items depended on the occurrence of a technical event, and if used as a covariate to predict a technical event, these items would have resulted in a circular argument (i.e., lack of technical skill assumed to lead to events, but technical skill scored lower when an event is observed). To measure technical skill independent of the technical events recorded in our data, only the remaining four OSATS items (time and motion, use of assistance, knowledge of specific procedure, and flow of operations and forward planning) were included in our analysis. Previous studies have also made adaptations to the OSATS tool, including item removals [15]. Our adaptation maintains the properties of the original OSATS tool (i.e., “the behaviorally-oriented global rating scale, the task-specific checklist, and the use of multiple stations or tasks” [15]) while keeping the majority of the OSATS items. For each OSATS item, we calculated a time-weighted average using the OSATS scores rated during HC procedural steps. These time-weighted averages for the 4 items were then summed to obtain a single OSATS value (out of 20). Staff changeovers were regarded as a surgical team factor since a change in the team composition may affect the information flow between members and hence could affect team performance.
Changeover-related staff traffic was already accounted for in the distraction variable, people entering/exiting. Therefore, we considered staff changeover as a team factor, although there could be other aspects of staff changeover that could lead to a distraction. As per the study protocol, changeovers were collected for nurses (n = 66), surgeons (n = 3), and those that were not determinable (n = 12). Because surgeon changeovers were rarely observed, these observations were excluded from the analysis along with not determinable observations. Hence, surgeon’s technical skills and nurse changeovers were included in analysis as the surgical team factors.
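The composite OSATS value described above can be illustrated with a minimal sketch. It assumes each rating is weighted by the number of minutes of its 20-min segment that fall within HC steps; the weighting unit, the function names, and all scores below are hypothetical and are not taken from the study data.

```python
def time_weighted_mean(scores_and_minutes):
    """Average of scores weighted by the minutes each score applies to."""
    total_minutes = sum(minutes for _, minutes in scores_and_minutes)
    if total_minutes <= 0:
        raise ValueError("total weighting time must be positive")
    weighted_sum = sum(score * minutes for score, minutes in scores_and_minutes)
    return weighted_sum / total_minutes


def composite_osats(item_segments):
    """Sum of per-item time-weighted means (4 items rated 1-5, so maximum 20)."""
    return sum(time_weighted_mean(segments) for segments in item_segments.values())


# Hypothetical case: two rating segments overlap HC steps for 20 and 10 min.
example_case = {
    "time_and_motion": [(4, 20), (3, 10)],
    "use_of_assistance": [(4, 20), (4, 10)],
    "knowledge_of_specific_procedure": [(5, 20), (4, 10)],
    "flow_of_operation_and_forward_planning": [(3, 20), (3, 10)],
}
print(round(composite_osats(example_case), 2))  # 15.33
```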

Patient information prospectively collected for the 36 cases recorded in St. Michael’s Hospital consisted of (1) patient’s BMI and (2) whether the patient had an abdominal surgery prior to the current operation. As this study was a retrospective analysis of anonymized data, with some differences in prospective data collection between sites, patient chart data could not be retrieved from other sites.

Data analysis

First, rank differences in rates of distractions between LC, LMC, and HC procedural steps were explored through the Friedman test given the non-normality of the data. The significant Friedman test was followed with post hoc tests as described in [16].
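As an illustration of the test, the sketch below computes the Friedman statistic for cases (blocks) measured under three conditions (LC, LMC, HC). It is a simplified version that uses average ranks for ties but omits the tie correction that statistical packages typically apply, so it is not the exact routine used in the study, and the input rows are hypothetical. For three conditions the statistic has 2 degrees of freedom, for which the chi-square upper tail reduces to exp(-x/2).

```python
import math


def average_ranks(values):
    """Rank values from 1..k, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def friedman_statistic(rows):
    """Friedman chi-square (no tie correction); rows are cases, columns are conditions."""
    n, k = len(rows), len(rows[0])
    rank_sums = [0.0] * k
    for row in rows:
        for column, rank in enumerate(average_ranks(row)):
            rank_sums[column] += rank
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)


def chi_square_upper_tail_df2(x):
    """Upper-tail probability of a chi-square variable with exactly 2 df."""
    return math.exp(-x / 2.0)


# Hypothetical alarm rates per 10 min for three cases: (LC, LMC, HC).
hypothetical_rates = [(6.0, 4.5, 3.0), (5.5, 4.0, 2.5), (7.0, 3.5, 3.0)]
statistic = friedman_statistic(hypothetical_rates)
print(statistic, chi_square_upper_tail_df2(statistic))
```

Applied to the statistic reported in the Results for people entering/exiting, χ²(2) = 4.41, the same tail formula gives p ≈ 0.11, in line with the reported p value.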

To investigate the relation between distractions and severe technical events, logistic regression analyses were conducted. Because 98% of severe technical events were observed during HC procedural steps, only this procedural step was used in the regression analyses. The outcome variable was initially divided into the following 2 classes: cases without severe technical events (n = 15) and cases with at least 1 severe technical event (n = 45). However, because one of the classes had only 15 observations, the outcome variable was regrouped into 2 new classes: (1) cases with at most 1 severe technical event (n = 35) and (2) cases with more than 1 severe technical event (n = 25). First, unadjusted odds ratios (ORs) were calculated for each factor of interest, along with their 95% confidence intervals. Then, multivariate logistic regression models were developed. Due to the small sample size and limited access to patient charting data, two models were built to limit the number of factors included in each multivariate model: (1) a multivariate logistic regression model for HC procedural steps that investigated the relation between distractions (people entering/exiting, machine alarms, external communication) and severe technical events while controlling for surgical team factors (nurse changeovers, OSATS scores), and (2) a multivariate logistic regression model that investigated the relation between patient factors (BMI level, previous abdominal surgery history) and severe technical events. This second model was built to explore the relationship between patient factors and severe technical events to inform data collection for our future studies. Multicollinearity was checked through variance inflation factors (no issues were identified); goodness of fit was checked through Hosmer–Lemeshow goodness of fit tests. All statistical analyses were conducted in R [17].
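The regrouping of the outcome variable amounts to moving the cut-point on the per-case count of severe technical events. The sketch below shows both groupings on a small set of hypothetical counts (not the study data); the function name and threshold parameter are illustrative.

```python
from collections import Counter


def dichotomize(event_counts, threshold):
    """Label a case 1 if it has more than `threshold` severe events, else 0."""
    return [1 if count > threshold else 0 for count in event_counts]


# Hypothetical per-case counts of severe technical events.
hypothetical_counts = [0, 0, 1, 1, 1, 2, 3, 6]

initial_classes = Counter(dichotomize(hypothetical_counts, threshold=0))
regrouped_classes = Counter(dichotomize(hypothetical_counts, threshold=1))

print(dict(initial_classes))    # {0: 2, 1: 6}
print(dict(regrouped_classes))  # {0: 5, 1: 3}
```

In the study, the corresponding shift changed the class sizes from 15 vs. 45 to 35 vs. 25, which gave the smaller class more observations.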

Results

A total of 60 LRYGB operations were analyzed. Overall, the mean operative duration was 83.18 min (SD = 21.97). There were 1.53 severe technical events per case on average (SD = 1.41), with a range of 0 (n = 15) to 6 events (n = 1); 82% of the cases contained 2 or fewer events. On average, 47.6 distractions (SD = 20.3) were observed per hour: people entered/exited the OR 17.8 times (SD = 7.80), a machine alarm was heard 26.7 times (SD = 17.3), and an external communication occurred 2.34 times (SD = 1.68) per hour. Detailed descriptive statistics for all 60 cases are presented in Table 2 for case duration, severe technical events, distractions, and surgical team factors. As stated earlier, patient information was available for 36 of the cases: the mean BMI level for these patients was 48.1 kg/m² (SD = 8.23); 16 had previous abdominal surgeries.

Table 2 Descriptive statistics for the entire case and different procedural steps for the 60 cases analyzed

In 3 cases, a staff member arrived late to the OR within the first 18 min of the surgery, and during non-HC steps. Severe technical events (2 events in particular) were observed in only 1 of these 3 cases. In 5 separate cases, a team member commented on loud music. Four of these cases had the loud music comment during HC steps: 1 case had no severe technical event, 2 cases had 1 event each, and 1 case had 3 events. In the 5th case, which did not have any severe technical events, 2 loud music comments were made within 3 min of each other during a secondary procedure.

Rate of machine alarms, χ²(2) = 63.68, p < 0.001, and percent time spent on external communications, χ²(2) = 24.5, p < 0.001, significantly varied across procedural categories, whereas rate of people entering/exiting the OR did not, χ²(2) = 4.41, p = 0.11. Follow-up tests showed that LC steps had a higher rate of machine alarms compared to LMC and HC steps; and both LC and LMC had less external communication than HC (p < 0.05).

Regression models

As stated earlier, the dependent variable had two classes: (1) cases with at most one severe technical event and (2) cases with more than one severe technical events. Descriptive statistics across these two classes are presented in Table 3.

Table 3 Descriptive statistics of distractions, surgical team factors, and patient factors across the two severe technical event categories

In unadjusted analysis, machine alarms (OR = 1.29, 95% CI 1.05–1.66) and OSATS scores (OR = 0.65, 95% CI 0.43–0.93) were found to be significantly associated with severe technical events. In adjusted analysis (logit model presented in Table 4), controlling for surgical team factors, an additional machine alarm observed in a 10-min period during HC procedural steps was associated with a 58% increase in the odds of severe technical events (95% CI 18–133%). Further, an OSATS score that was 1 unit higher (maximum score of 20) was associated with a 50% decrease in the odds of severe technical events (95% CI 12%, 70%). A Wilcoxon signed rank test comparing the rates of machine alarms 10 s prior to and 10 s after a severe technical event showed no significant difference, W = 29, p = 0.47. Therefore, no evidence was found that the increased rate of machine alarms could be due to the occurrence of a severe technical event. Supporting this, low criticality steps, where no severe technical events were recorded, also had a higher rate of machine alarms than high criticality steps as reported earlier.
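The percentage statements above follow from the usual relation between a logistic regression coefficient β, its odds ratio exp(β), and the percent change in odds, (OR − 1) × 100. The sketch below illustrates that conversion; the adjusted odds ratios used here (1.58 and 0.50) are back-calculated from the reported percentages rather than read from Table 4, and the function names are illustrative.

```python
import math


def odds_ratio_from_coefficient(beta):
    """Odds ratio implied by a logistic regression coefficient."""
    return math.exp(beta)


def percent_change_in_odds(odds_ratio):
    """Percent change in odds per one-unit increase in the predictor."""
    return (odds_ratio - 1.0) * 100.0


# Back-calculated from the text: +58% per extra alarm per 10 min, -50% per OSATS point.
alarm_odds_ratio = 1.58
osats_odds_ratio = 0.50

print(round(percent_change_in_odds(alarm_odds_ratio)))  # 58
print(round(percent_change_in_odds(osats_odds_ratio)))  # -50

# A confidence interval converts endpoint by endpoint: OR 1.18 to 2.33 is 18% to 133%.
print([round(percent_change_in_odds(bound)) for bound in (1.18, 2.33)])  # [18, 133]
```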

Table 4 Logistic regression results for predicting severe technical events through distractions controlling for surgical team factors

The logit model investigating the relation between patient factors and severe technical events on 36 cases (Table 5) revealed a marginally statistically significant result for BMI (p < 0.1), with an increase of 1 unit in BMI associated with a 9% increase in the odds of severe technical events (95% CI 0%, 21%).
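Because a logistic model is linear on the log-odds scale, a per-unit odds ratio compounds multiplicatively over larger differences in the predictor. The sketch below applies this to the BMI point estimate (OR of 1.09 per unit, implied by the 9% figure); the 5-unit difference is an arbitrary illustration, and the calculation assumes the fitted linear relation holds over that range. Since the estimate was only marginally significant, the output is an arithmetic illustration rather than a finding.

```python
def odds_ratio_for_difference(per_unit_odds_ratio, difference):
    """Odds ratio for a `difference`-unit change, assuming linearity in the logit."""
    return per_unit_odds_ratio ** difference


bmi_per_unit_odds_ratio = 1.09  # implied by the reported 9% increase per BMI unit
five_unit_odds_ratio = odds_ratio_for_difference(bmi_per_unit_odds_ratio, 5)
print(round(five_unit_odds_ratio, 2))  # 1.54
```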

Table 5 Logistic regression results for predicting severe technical events with patient factors

Discussion and conclusions

This research investigated the relation between intraoperative distractions (people entering/exiting, machine alarms, and external communication) and severe technical events in laparoscopic Roux-en-Y gastric bypass operations. A naturalistic dataset collected through a comprehensive operative capture platform, OR Black Box, was utilized to analyze 60 operations. Descriptive analysis showed that every hour, on average, 18 people entered or exited the OR, 27 machine alarms went off, and 2 external communications took place. Overall, these distractions occurred 48 times/h. This rate is higher than those reported in direct observational studies [2]. For example, [4] reported 33 distractions/h during 28 endourological surgeries collected from 1 teaching hospital. Our study may have captured a larger rate of distractions due to differences in type of surgery studied or how we defined distraction, or it may also be that we were able to capture a larger rate of distractions as our retrospective analysis of recordings is less prone to missing distractions than direct observational studies. In general, however, our findings support other research in the conclusion that distractions are frequent in the OR.

In our data, all but one severe technical event occurred during high criticality procedural steps (i.e., reconstruction and inspection). Therefore, our logistic regression modelling to investigate the relation between intraoperative distractions and severe technical events focused on high criticality procedural steps. Controlling for surgical team factors (nurse changeovers and surgeons’ technical skills), an additional machine alarm observed in a 10-min period was associated with a 58% increase in the odds of severe technical events (the case having 2 or more severe technical events as opposed to 1 or no events). This significant relation may imply that machine alarms can draw valuable attentional resources away from critical procedural tasks. However, it should be noted that further investigation is needed into this effect. Alarms can potentially be detrimental to surgeons’ performance if they occur during a critical task which requires the surgeon’s attention, but alarms can also convey critical information and draw the team’s attention to an urgent issue. A potential strategy to minimize the distracting effects of alarms is to employ the sterile cockpit rule: reducing unnecessary distractions during critical steps and developing protocolized communication for necessary ones, as pilots do in aviation [18]. Certain alarms might be directed to staff members that need that information (e.g., anesthesia) through individual headsets [19] to reduce alarm distractions for other team members. Additionally, medical devices may be designed with modes that only alert the entire team of critical alarms, thus reducing unnecessary alarms during phases of surgery that require focused attention. This may have the additional desired effect of reducing alarm fatigue, allowing a subset of the team to address low level alarms while alarms of high importance are distinguished so teams can respond more quickly and effectively.

We did not observe any severe technical events during procedural steps that were not deemed to be of high criticality. Further, the steps that were deemed to be of low criticality (i.e., access/exposure and closure) had higher rates of machine alarms than high criticality steps. In general, distractions may have no detrimental effects on surgical performance when they occur during tasks that do not require high levels of focused attention. Some distractions, such as music, may even improve performance by increasing arousal during monotonous tasks [20, 21]. In addition, percent time spent on external communications (i.e., phone calls and pagers) was found to be higher during lower criticality procedural steps compared to high criticality ones. It is possible that other OR team members may have been actively adjusting their engagement in external communications during critical phases of operation. Similar phenomena have been observed in studies of distraction in Intensive Care Units, with other medical professionals interrupting nurses less when nurses conduct critical tasks [22, 23].

Although we did not find a significant relation between severe technical events and the percent duration of external communications or the rate of people entering/exiting, it is possible that these distractions, if untimely, can have detrimental effects on surgical performance. Controlled experiments suggest that pager calls and phone calls can interfere with the surgeon’s performance [13, 24, 25]. We did not differentiate incoming and outgoing communications or pagers and phone calls as this coding was not available in our dataset. It is possible that how distracting an external communication is would depend on these factors. For example, pager calls received by the operating surgeon may be more distracting to them compared to an outgoing call made by the circulating nurse. Further, frequent door openings in the OR due to people entering or exiting can also be detrimental to patient safety by increasing the risk of surgical site infections [26, 27, 28]. The three most common reasons for people entering or exiting the OR have been identified as getting information, supplying equipment, and scrubbing in and out [26, 29, 30]. Interventions can be implemented to reduce the frequency of people entering/exiting the OR and adjust their timing to occur during lower criticality phases to minimize unnecessary distractions to the surgical team. The reason for entering/exiting the OR could be taken into account in such interventions. One potential intervention is to implement a preoperative briefing to ensure that required equipment is available in the OR and is functioning properly.

A survey study reported that surgeons felt distracted in the OR [1]. In our study, team members commented on music being loud in five cases; four such comments were made during high criticality steps of the surgery. Team members may have felt distracted by loud music during these critical tasks. Teamwork training can help facilitate such essential communication as individuals may feel hesitant to speak up in the OR if a rigid hierarchy is in place [31]. Actively reducing or removing such distractions during phases that require focused attention can also help enhance patient safety. Through briefings, surgical goals, expectancies, and critical tasks can be made explicit to the team members. A shared understanding can be formed on when to refrain from initiating distractions and when to handle distractions for other team members. Increasing awareness about distractions, training for non-technical skills such as teamwork, and warning systems (e.g., lights that indicate when critical tasks are being performed) are some other example strategies that can be used to mitigate OR distractions. However, these mitigation strategies need to be carefully evaluated before implementation; the strategy must not block the potential benefits of distractions (e.g., conveying critical information, reducing boredom) and must not introduce new distractions to the OR environment.

The surgeon’s technical skill (as measured by OSATS) was found, on multivariate logistic regression analysis, to be associated with a decreased likelihood of severe technical events, in line with previous research [32]. Due to limited sample size, we were not able to investigate the interaction between technical skills and distractions. However, skilled surgeons are likely less affected by distractions given that they may have obtained automaticity in many surgical tasks [2] and can therefore have more spare cognitive capacity. The results of [33] support this argument; in a controlled experiment, experienced surgeons were able to attend to secondary tasks while maintaining their primary task performance, whereas novice surgeons could not perform secondary tasks as well. Our other covariate, rate of nurse changeovers, was not found to be significant. However, changeovers may disrupt information flow [34], and it may be beneficial, where possible, to schedule changeovers for lower criticality phases of operation.

Although this paper is the first to investigate the relation between intraoperative distractions and severe technical events through the analysis of a naturalistic dataset, it has limitations that can inform future research. Our statistical analyses were constrained by sample size. Our dependent variable was whether a case had no or one event vs. more than one event; this grouping was selected as there were relatively few cases with no events. Further, because we only had patient health record access for 36 of the cases, we were not able to control for patient factors in the results discussed above. Although our secondary analysis conducted on these 36 cases highlighted the need for further access to patient data, this secondary analysis only focused on BMI and previous abdominal surgery and excluded anatomical data (liver size, mesenteric thickness/length, etc.) as we did not have access to it. Further, we investigated only one type of elective bariatric surgery. Moreover, some of the distractions captured may have been detrimental to other team members’ performance, but our analysis focused on the surgeon’s performance. We were also not able to capture technical skills of individual surgeons for privacy reasons. Future directions for this research include increasing the sample size, investigating different procedure types, and investigating the effects of distractions on other team members’ performance, as well as capturing additional contextual details on distractions (e.g., reasons for people entering the OR, urgency of machine alarms), studying other distraction types (e.g., case-irrelevant conversations, missing or malfunctioning equipment), and investigating the interactions between distractions and technical skills. A larger dataset can also enable future studies to look into the associations between certain distractions that may affect OR culture (e.g., staff being late, loud music). 
Although the analysis of a naturalistic dataset provides many advantages, in particular, capturing distractions as they naturally happen in the OR, the results can only be interpreted as correlations. Experimental methods are needed to support these conclusions for causal inferences. Further, interventional studies can inform the design and effectiveness of different distraction mitigation strategies.