Introduction

Retained surgical items (RSI) are serious adverse events in operating room (OR) settings around the world. RSI are items unintentionally left in a patient after surgery; some are discovered long after the postoperative period [1]. Risk factors associated with RSI as reported in the literature are identified in Table 1. Surgical counting is a manual process of counting the materials used in the sterile field during surgery to prevent items from being retained in patients [2]. However, performing a manual count does not guarantee that the count is correct. For example, one study found 61 cases of RSI in operations where counts were performed, and 88 percent involved a final count that was in error yet thought to be correct [3]. The consequences of RSI for patients may include infection, pain and suffering, readmission, reoperation, sepsis, abscess, and death [3,4,5]. The reported incidence of RSI has ranged from 1 in 5,500 to 1 in 7,000 operations [5]. However, the studies did not report the timeframe during which RSI were measured, so it is difficult to compare RSI incidence rates. RSI may also affect the reputation of health care organizations. For example, a recent news story reported surgical sponges left inside a woman for one month, and this report had an impact on the perception of patient safety in that organization [6].

Table 1 Risk factors associated with retained surgical items

RSI are considered to be avoidable. Nevertheless, RSI continue to occur despite prevention strategies recommended by organizations such as the World Health Organization (WHO) and the Association of periOperative Registered Nurses (AORN) [1, 7]. Hospitals have either adopted the recommendations of one of these organizations or developed safety practice guidelines of their own to address RSI. As a result, a variety of prevention strategies have been implemented. However, the effectiveness of these guidelines and locally developed practices is unclear because studies evaluating them have produced mixed results.

In this systematic review, we examine the available published evidence regarding the types and effectiveness of interventions implemented to prevent RSI in the OR.

Objectives

Our overall objective was divided into two research questions:

1) What types of intervention have been implemented to prevent retained surgical items?

2) What is the effectiveness of those interventions?

Methods

Protocol and registration

Reporting of this systematic review complied with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [17]. The protocol for conducting this review was developed a priori and deposited in the University of Michigan’s institutional repository, Deep Blue. It is available at https://doi.org/10.7302/n909-mt98.

Eligibility criteria

We included studies that evaluated interventions to prevent or reduce RSI in the OR during general surgery. We included all study types, both quantitative and qualitative, and also included quality improvement (QI) projects. We limited our review to studies focusing on adult patients undergoing surgery within the hospital setting. Our primary outcome was RSI, but we also included near misses, recounts, miscounts, and count discrepancies, as these events are also associated with RSI and thus pose a risk to patient safety [1, 3, 4, 18, 19]. We included studies reported in both the peer-reviewed and the gray literature. We excluded studies conducted in minimally invasive, laparoscopic, or robotic surgery. We excluded conference abstracts, editorials, letters, and opinions, as these article types typically provide insufficient detail to assess intervention quality. Similarly, we excluded audit and review articles.

Search methodology

An informationist (K.M.S.) developed comprehensive searches, covering database inception to November 10, 2020, in PubMed, Embase (Elsevier), CINAHL Complete Plus (EBSCO), Cochrane Library (Wiley), and Scopus (Elsevier). The authors supplemented the database searches with ClinicalTrials.gov, Mednar, and OpenGrey to identify relevant gray literature. We did not include non-indexed journals because the majority of these do not publish rigorous studies. To minimize the possibility of missing relevant studies, the reference lists of all included studies were reviewed. The resulting citations were exported to the citation manager EndNote X9 (Clarivate Analytics) for multi-pass duplicate detection and removal.

The searches were built around three main concepts: surgical objects, surgical procedures, and counting interventions. Each search consisted of a combination of keywords and controlled terms appropriate for the selected database. An English language limit was applied across all databases, and a human limit and source type limit were applied in a few of the included databases. The reproducible searches for all databases and associated search files are available in the University of Michigan’s Deep Blue repository at https://doi.org/10.7302/n909-mt98. Unique citation records were uploaded to Rayyan QCRI (https://www.rayyan.ai/) for screening. Two authors (M.M. and R.S.) independently reviewed citations following the inclusion/exclusion criteria outlined in the protocol. The two screeners resolved conflicts by discussion and consensus.

Data collection process

We developed a data extraction table. One reviewer extracted the data from the included studies and the other reviewer checked the extracted data. There was no need to contact authors for further information.

Data items

Information was extracted from each included study on: (1) study characteristics (including author/year/title, objectives, study design, setting and country, sample size); (2) type of intervention (including intervention characteristics, procedure phase, effectiveness/results, duration of the study, and health care professional focus); and (3) type of outcome measure (including RSI, near misses, recounts, miscounts, and count discrepancies).

Data sources and search strategy

The search yielded 1,792 articles, of which 1,790 were found through database searching and an additional two through forward citation tracking. We also searched for sources published in professional conferences, but these were excluded because they did not meet inclusion criteria. We removed 569 duplicates via the deduplication process in EndNote X9. For the remaining 1,223 articles, we assessed titles and abstracts and excluded 1,136 articles that did not meet inclusion criteria. Two reviewers independently reviewed the full text of 87 articles for potential eligibility. Disagreements were resolved through discussion and clarification, and 17 articles were selected by both reviewers for inclusion in our final analysis (Fig. 1).

Fig. 1

PRISMA flow diagram
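The screening counts reported above can be checked with simple arithmetic. The following is a minimal sketch, not part of the original analysis: the function name and dictionary keys are hypothetical, and the number of articles excluded at full-text review is not stated in the text but is derived here from the reported figures.

```python
def prisma_flow(database_records, citation_tracking_records,
                duplicates_removed, excluded_on_title_abstract,
                included_studies):
    """Derive the PRISMA stage counts from the figures reported in the text."""
    identified = database_records + citation_tracking_records
    screened = identified - duplicates_removed
    full_text_reviewed = screened - excluded_on_title_abstract
    # Derived value: the text does not report this count directly.
    excluded_at_full_text = full_text_reviewed - included_studies
    return {
        "identified": identified,
        "screened": screened,
        "full_text_reviewed": full_text_reviewed,
        "excluded_at_full_text": excluded_at_full_text,
        "included": included_studies,
    }


# Counts taken from the text of this review.
flow = prisma_flow(
    database_records=1790,
    citation_tracking_records=2,
    duplicates_removed=569,
    excluded_on_title_abstract=1136,
    included_studies=17,
)

for stage, count in flow.items():
    print(f"{stage}: {count}")
```

Under these inputs, the derived totals match the 1,792 identified, 1,223 screened, and 87 full-text articles reported in the text.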

Study characteristics

Table 2 summarizes the 17 studies that were included: one randomized controlled trial (RCT), two observational studies, one case-control study, and 13 QI projects. The included studies represented four countries, with 14 studies conducted in the USA. The other countries represented were Brazil [20], the UK [21], and Australia [22]. Most studies were performed in academic medical centers; one study took place at a US Department of Veterans Affairs hospital [23] and one study in a community hospital [24]. There were two multicenter studies and 15 single-site studies. Collectively, the 17 studies examined general surgery, labor, gynecology, urology, orthopedic, plastic surgery, bariatric, ear, nose, and throat (ENT), and vascular surgical procedures.

Table 2 Characteristics of included articles

The majority of the studies delivered interventions in the intra-operative period during the counting process; five studies [20, 21, 25, 26, 27] targeted interventions during other activities such as handover, timeout, and patient identification. Two studies targeted the postoperative period [28, 29]. Studies mostly delivered interventions to multidisciplinary OR staff. One study targeted radiologists [30], three studies [23, 31, 32] did not identify the population targeted by the intervention, and three studies [22, 24, 33] targeted only OR nurses, as shown in Table 3.

Table 3 Features of interventions in included studies, grouped by intervention type

Interventions

Interventions fell into four broad categories: (a) technology-based interventions (n = 6), (b) communication-based interventions (n = 2), (c) practice or guideline interventions (n = 5), and (d) interventions in more than one category (n = 4).

Technology-based interventions

Technology-based interventions were the most common, with six studies employing technology interventions to reduce the incidence of RSI. The intervention strategies included radio frequency (RF) with wand, radio frequency detector (RFD), data-matrix-coded sponges (DMS), and bar coding. One study [34] reported a statistically significant reduction in the frequency and rate of RSI events. In contrast, two other studies reported negative results [23, 35], finding significantly higher RSI rates in settings that implemented a surgical count technology system. In addition, team performance and the counting process were rated lower in operations randomized to bar-coded sponges, due to technology malfunction and failure to use the technology when it was available. While two studies [32, 34] reported a reduced incidence of RSI, the additional cost of using a technology intervention ranged from $0.17 to $11.63 per case. The DMS intervention cost more than the RFD intervention.

Interventions based on practice or guideline changes

The second most common type of intervention involved making changes to practice or guidelines, and these were the focus of five studies. Intervention strategies included developing a new practice, using a new tracing sheet for additional items that were not typically included in the routine sheet, or using a timeout board to identify items that were packed inside patients. Four studies targeted only the surgical count process, and one study [20] sought to change both counting and hand-off practices by implementing an audit and feedback tool. These studies also measured a variety of outcomes. Two studies [20, 22] measured compliance with best practice, and one study [26] reported an increased ability to locate missing items by implementing a practice intervention that included a timeout board and timely X-ray. Only two studies [33, 36] focused on RSI as an outcome; neither reported a statistically significant reduction in RSI incidence, and both reported only a decrease in RSI frequency.

Communication-based interventions

Two studies implemented communication-based interventions [21, 25]. In one study [21], researchers developed an intervention to improve communication during verbal handovers from the delivery room to the postpartum unit after vaginal birth by including information on retained vaginal swabs. The outcomes in this study were RSI incidence, near-miss incidence, and staff compliance. The communication-based intervention was found to significantly decrease near misses, improve handovers, and decrease RSI incidence from two to none. Although RSI incidence decreased, it is unclear whether the decrease resulted from the intervention or from previous errors in reporting. The other study measured TeamSTEPPS skills [25] (e.g., briefing and debriefing performance), but not RSI.

Multiple interventions

Four studies [24, 27, 30, 37] implemented multiple interventions, including various combinations of technology (RFD and DMS), standardized practices, and changes to communication. All of these studies reported a decrease in RSI, although none of the decreases was statistically significant. Besides reducing RSI, two studies reported other outcomes such as improved staff performance [37], knowledge regarding RSI prevention, and compliance with the safety sponge technology system [24].

Intervention effectiveness

Studies reporting the effectiveness of interventions are shown in Table 2. The outcome that demonstrated intervention effectiveness was a decrease in RSI [21]. Seven studies reported outcomes involving other error events, such as a decrease in surgical count discrepancies, the ability to locate missing items, or miscount rates [25, 26, 29, 30, 33, 34, 36]. A majority of the studies reported the frequency or rate of RSI, near misses, or miscounts as a health outcome; however, seven studies [20, 21, 22, 24, 25, 34, 37] reported other outcomes related to staff performance, such as compliance or satisfaction. Three studies [29, 32, 34] attempted to estimate hospital costs related to an RSI intervention. Only two studies reported a statistically significant reduction in RSI and near-miss events. Of these, one study used a DMS counting system as the intervention (p < 0.001) [34], while the other study used a communication-based intervention to improve the swab handover process (p < 0.0001) [21]. A majority of the studies did not report comparator or pre-intervention event rates.

Quality of included studies and risk for bias

Two authors judged the quality of studies independently, and any disagreements were resolved by discussion. We used criteria described by the National Institutes of Health (NIH) for the four research studies and the Quality Improvement Minimum Quality Criteria Set (QI-MQCS) for the QI projects [38]. The reviewers used the ratings on the individual items to assess the risk of bias in each study due to flaws in study design or implementation. A score was given to classify the quality of each paper as poor (0–6), fair (7–11), or good (12–16). In general terms, a "poor" rating indicated significant risk of bias, a "fair" rating indicated some bias deemed not sufficient to invalidate the results, and a "good" rating indicated the least risk of bias, so that the results were considered to be valid. Overall, study quality was poor to fair, with only four studies achieving scores categorized as good on the quality assessment tools. The majority of studies were identified as having insufficient information to assess quality because of a lack of information about study design, comparators, adherence and fidelity, or health outcomes. Nearly all of the studies had low external validity.
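The score bands described above can be expressed as a small lookup. This is a minimal sketch under the assumption that each study receives a single integer score from 0 to 16; the function name is hypothetical, and the sketch does not reproduce the item-level NIH or QI-MQCS criteria.

```python
def classify_quality(score: int) -> str:
    """Map a total quality score to the bands used in this review.

    Bands taken from the text: poor (0-6), fair (7-11), good (12-16).
    Assumes one integer score per study on a 0-16 scale.
    """
    if not 0 <= score <= 16:
        raise ValueError(f"score must be between 0 and 16, got {score}")
    if score <= 6:
        return "poor"
    if score <= 11:
        return "fair"
    return "good"


# Boundary values for each band.
for example_score in (0, 6, 7, 11, 12, 16):
    print(example_score, classify_quality(example_score))
```

Making the cut points explicit in this way allows two reviewers to apply the same thresholds when scores fall on a band boundary.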

Risk of bias ranged from moderate to high. Of the four research studies, the RCT [35] had a moderate risk of bias because randomization was performed at the patient level, not the provider level where the intervention was targeted. Although several studies targeted multidisciplinary members’ behaviors, none reported staff member characteristics before and after the interventions, which both lowered study quality and increased the risk of bias.

Discussion

Interest in RSI has been increasing internationally, rising to the level of a global patient safety issue because of adverse patient outcomes that result from RSI [5]. The findings from our review highlight current trends in the types of interventions being used to prevent RSI, and also point out the effectiveness of each intervention.

There were four types of interventions used to reduce RSI: technology, changes to practice or guidelines, communication, or some combination of these three types. A majority of the studies implemented technology interventions. Heterogeneity in the interventions used and variable study quality limit our confidence in the interventions’ ability to reduce RSI. None of the studies that deployed multiple interventions tested the effectiveness of each type of intervention separately. Such approaches make it difficult to assess the cumulative effectiveness of multiple interventions, or the relative value of each intervention, and contribute to ongoing confusion in our understanding of how best to prevent RSI. Moreover, most studies did not report statistical results; therefore, conclusions cannot be drawn about the significance of the findings.

The vast majority of the studies included in this review employed interventions that involved multidisciplinary teams, with only a small number of studies including only OR nurses. We found two studies that involved all stakeholders and leadership in policy development and reported a reduced RSI incidence and an improvement in staff performance [34, 37]. An analysis of scrub nurses’ perspectives on teamwork found that teamwork played a significant role in preventing retained swabs [39]. Remarkably, only one study included radiologists, despite the important role radiologists can play in identifying RSI [30]. A recent review of RSI prevention also emphasized that surgical counting improved when team members complied with evidence-based standards [40]. Teamwork in the OR is important for patient safety [41]; however, the majority of studies did not report important details about team performance either before or after the intervention. Of the 13 QI projects, only one [25] reported an improvement in team performance, but the primary outcome was compliance with audits of a surgical safety checklist in the OR, rather than RSI.

We found a lack of consistent and rigorous study designs. The majority of articles in our review were QI projects rather than research studies. Of the four research studies, only one used an RCT design. The other three studies may have been constrained by feasibility issues or ethical concerns. The study in our review that used an RCT design reported negative conclusions on team performance during the counting process, suggesting that an RCT design in and of itself does not guarantee positive outcomes. The RCT was unable to determine whether the intervention (bar code system) decreased the rate of RSI because of the large sample required to show statistical significance [35].
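To illustrate why a rare outcome such as RSI demands a very large trial, the following minimal sketch applies the standard normal-approximation formula for comparing two independent proportions. It is not the power calculation used by the RCT [35], which is not reported here. The baseline rate of 1 in 5,500 operations comes from the incidence range cited in the Introduction; the 50% reduction, the two-sided alpha of 0.05, and the power of 0.80 are hypothetical assumptions chosen only for illustration, and the function name is hypothetical as well.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(p_control, p_intervention, alpha=0.05, power=0.80):
    """Approximate sample size per arm for comparing two proportions.

    Uses the unpooled normal-approximation formula:
    n = (z_(1-alpha/2) + z_power)^2 * [p1(1-p1) + p2(1-p2)] / (p1 - p2)^2
    Assumes individual-level randomization and independent outcomes.
    """
    if not (0 < p_control < 1 and 0 < p_intervention < 1):
        raise ValueError("proportions must lie strictly between 0 and 1")
    if p_control == p_intervention:
        raise ValueError("proportions must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = (p_control * (1 - p_control)
                + p_intervention * (1 - p_intervention))
    effect = p_control - p_intervention
    return math.ceil((z_alpha + z_power) ** 2 * variance / effect ** 2)


# Baseline incidence cited in the Introduction: 1 RSI in 5,500 operations.
baseline_rate = 1 / 5500
# Hypothetical assumption: the intervention halves the RSI rate.
intervention_rate = baseline_rate / 2

n_per_arm = sample_size_per_arm(baseline_rate, intervention_rate)
print(f"Operations required per arm: {n_per_arm:,}")
```

Under these assumptions, the required sample is in the hundreds of thousands of operations per arm, which is consistent with the feasibility constraints described above.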

There may be several reasons why our review yielded mostly QI projects. Although cluster RCT study designs can minimize bias, not all investigators have the funding or resources to conduct such studies. Most of the guidelines and recommendations were developed for use in high-income countries, and they may not be practical in low- and middle-income countries (LMIC). Cultural differences and hierarchical relationships may be major barriers to successfully implementing practice guidelines [42, 43]. For example, “speak up” recommendations from AORN may not be practical in countries with a hierarchical, male-dominated workplace culture.

Several articles reported error events such as miscounts or count discrepancies rather than RSI events. An integrative review showed that error events were a risk factor for RSI [44]. Several factors contribute to the challenges in measuring RSI. RSI events may not be detected immediately after surgery, which is when researchers were investigating. Because RSI are rare events, measuring them takes a long time, so researchers likely chose outcomes that they could measure during the research timeframe.

The effectiveness of interventions may depend on the context in which they are applied. For example, the same type of intervention yielded different results in two studies. One study using DMS technology to prevent RSI was able to reduce the frequency of retained sponges [34], whereas another study using RFD-tagged items found that the technology was not successful in preventing RSI because the surgical count technology either malfunctioned or was not used [23].

Our findings can provide guidance on how to move forward with efforts to reduce RSI globally. First, context, culture, and workflow differ in every OR; therefore, guidelines and standardized practices may not be generalizable across settings. Interventions are more likely to succeed when they are informed by the unique characteristics of each setting, which may require measuring context, culture, and workflow differences with quantitative or qualitative methods as part of developing an RSI prevention intervention. Second, each OR has its own error events, types of RSI, and ways of reporting these events; therefore, RSI prevention strategies need to reflect the true situation in each setting. Third, more attention is needed on the costs of interventions. Two studies reported extra costs from technology-based interventions [32, 34]. Given the high costs of adopting technology-based interventions, these may not be feasible in LMIC, where interventions targeting social systems may be more appropriate.

Our review has several limitations. First, we restricted our search to English- and Thai-language literature (the latter using the ThaiJO database); however, we did not find any studies in Thai. Although we wanted to capture a global picture of the types of interventions to prevent RSI and their effectiveness, the majority of studies were conducted in high-income countries (the USA, the UK, and Australia), with only one study conducted in a less wealthy country, Brazil. Second, a majority of the studies that we included had methodological problems, were of poor to fair quality, lacked comparator data, or had insufficient sample sizes. These methodological limitations prevent us from drawing a conclusion about the best intervention to reduce RSI.

Conclusion

In conclusion, we found that interventions to reduce RSI fell into two broad categories: those that targeted the social system through education and/or policy changes, and those that targeted the technical system through the adoption of new technology. In LMIC, technology-based interventions may not be financially feasible, so in those contexts interventions that target the social system may be more appropriate. However, none of the interventions were especially effective, leaving the larger question of how to prevent RSI unanswered. A fresh approach may be needed, one that uses rigorous methods to investigate local contexts and build knowledge so that interventions to prevent RSI have a greater likelihood of success.