Anesthesia care occupies a unique scope of practice in medicine, as it facilitates the safe delivery of surgical care and postoperative pain management.1,2,3 Since its inception, anesthesiology has driven improvements in the quality and safety of surgical care. There has been a 97% decrease in anesthesia-related deaths since the 1940s and the current mortality risk attributable to anesthesia for surgical inpatients is approximately 1 in 100,000.4

Improvement in health care delivery depends on the ability to measure outcomes that direct meaningful changes in health systems.5,6,7,8 This process requires derivation of valid and usable quality indicators specific to the area of health care delivery. Quality indicators are commonly categorized using the Donabedian framework, which classifies indicators as structure (resources and capacity), process (health care providers’ actions during delivery of care), and outcome (impact of health care service or intervention on health status).9,10 Classification systems aid in communicating the relevance and significance of quality indicators but do not provide insight into the level of evidence on which they are based. In the USA, the National Quality Forum (NQF)-endorsed quality measures are the gold standard for evidence-based health care quality measurement.11,12

Within the field of anesthesiology, the Anesthesia Quality Institute (AQI) was established in 2010. The AQI has implemented a number of quality initiatives (e.g., National Anesthesia Clinical Outcomes Registry, Anesthesia Incident Reporting System)13 to improve delivery of anesthesia care. Similar efforts in the UK include the development of the Perioperative Quality Improvement Programme in 2016, with the goal of collecting perioperative information such as complications and patient-reported outcomes in patients undergoing major noncardiac surgery.14 These registries have driven improvements in large-scale measurement of health care delivery, but their ability to effect meaningful clinical change remains unknown. Furthermore, the measures used to improve health care delivery are often derived from expert consensus rather than systematic reviews or prospective evaluation. Many anesthesia-specific indicators are not NQF-endorsed.11 Ultimately, it remains unclear which quality indicators should be used to drive improvements in perioperative patient outcomes from an anesthesiology perspective.

Our objective was to conduct an umbrella review (a systematic review of systematic reviews15) to identify and synthesize systematic reviews that provide an overview of anesthesia-attributable quality indicators in noncardiac surgery. As multiple up-to-date systematic reviews of quality indicators were available, an umbrella review design was effective and efficient, with the advantages of allowing a broad overview of indicators while supporting a detailed synthesis to identify consistent recommendations that can inform care.16 The umbrella review design also allowed identification of important gaps requiring future scholarship. We further planned to synthesize the identified indicators according to the Donabedian framework to align our findings with a key quality improvement framework. Ultimately, our goal was to highlight robust quality indicators that should be routinely considered in clinical practice, while identifying key knowledge gaps to inform further research and development in anesthesia quality.

Methods

The present umbrella review was conducted in accordance with best practice methodology recommended by the Joanna Briggs Institute (JBI).17 The protocol was registered a priori with the International Prospective Register of Systematic Reviews (PROSPERO; CRD42020164691). This manuscript adheres to the applicable Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines.18

Literature search

We developed a comprehensive systematic search with the assistance of an information specialist. Our search strategy adhered to the Peer Review of Electronic Search Strategies (PRESS) checklist, an evidence-based set of guidelines used by librarians and information specialists to evaluate search strategies.19 We applied the search to MEDLINE, Embase, CINAHL, and Cochrane databases for articles published from database inception to 25 January 2022. We included electronic publications ahead of print, in-process, and other nonindexed citations. No language restrictions were applied. We combined keywords for quality indicators with surgery-, anesthesia-, and perioperative-specific keywords and the term “systematic review” (see Electronic Supplementary Material [ESM] eAppendix for full search strategy).

Study selection

We included systematic reviews examining perioperative quality indicators in patients ≥ 18 yr of age undergoing noncardiac surgery. We excluded reviews that 1) included mixed populations of medical or surgical patients where perioperative indicators could not be separated from other types, 2) focused on surgical, nursing, or critical care quality indicators, or 3) did not meet all eleven criteria of the JBI critical appraisal checklist17 (see ESM eTable 1 for full checklist).

Screening

Title and abstract screening was performed in duplicate by two independent reviewers (F. N., G. L.). Disagreements were reviewed and resolved by consensus with a senior team member (G. H.). If uncertainty remained, the full text of the study was reviewed. Full texts were also reviewed independently and in duplicate for consensus; disagreements were resolved with a third reviewer. The screening process was completed using DistillerSR (Evidence Partners, Ottawa, ON, Canada), a web-based systematic review platform.

Risk of bias assessment

Risk of bias assessment of the included systematic reviews was performed by adaptation of an existing risk of bias assessment tool.20 As there is no specific tool available to assess the risk of bias in systematic reviews of quality indicators, we used the AMSTAR 2 (revised A MeaSurement Tool to Assess systematic Reviews)21 criteria that our team determined to be applicable to reviews of quality indicators (see ESM eTable 2 for all AMSTAR 2 criteria). AMSTAR 2 denotes criteria as critical or noncritical. Critical criteria relevant to our study were criteria 2 (protocol registration prior to commencement of review), 4 (adequacy of the literature search), and 7 (justification for excluding individual studies). Noncritical criteria that were also assessed were 1, 3, 5, 6, and 16. Criteria 8–15 did not apply to our included studies.

The risk of bias assessment was completed in duplicate by two independent reviewers. Discrepancies were resolved by consensus. Included studies were assigned an overall level of confidence as “high” (0–1 noncritical weakness), “moderate” (> 1 noncritical weakness), “low” (one critical flaw with or without a noncritical weakness), or “critically low” (more than one critical flaw with or without noncritical weaknesses) according to the AMSTAR 2 criteria as outlined in Box 2.17
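
The rating rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than the tool used in the review: the function name and the input format (a set of criterion numbers judged as not met) are assumptions, and the sketch treats each criterion as simply met or unmet, which ignores the "partial yes" response that AMSTAR 2 allows for some items.

```python
# Criteria the review treated as applicable (numbers taken from the text above).
CRITICAL = frozenset({2, 4, 7})
NONCRITICAL = frozenset({1, 3, 5, 6, 16})


def amstar2_confidence(unmet_criteria):
    """Map unmet AMSTAR 2 criteria to an overall confidence rating.

    `unmet_criteria` is a hypothetical input: the criterion numbers
    that a review failed to meet.
    """
    unmet = set(unmet_criteria)
    unknown = unmet - CRITICAL - NONCRITICAL
    if unknown:
        raise ValueError(f"criteria not assessed in this review: {sorted(unknown)}")

    critical_flaws = len(unmet & CRITICAL)
    noncritical_weaknesses = len(unmet & NONCRITICAL)

    if critical_flaws > 1:
        return "critically low"
    if critical_flaws == 1:
        return "low"
    if noncritical_weaknesses > 1:
        return "moderate"
    return "high"
```

Under these assumptions, a review that missed only criteria 2 and 7 would be rated "critically low", while one that missed criteria 1 and 5 would be rated "moderate".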

Data extraction

The primary outcome was any perioperative quality care indicator that could be directly attributed to the practice of anesthesiology. For example, the appropriate administration, timing, and dose of prophylactic antibiotics by an anesthesiologist was considered to be an anesthetic indicator, while the development of surgical site infection was considered to be multifactorial. Data were extracted from full text using a prespecified and piloted data collection form created with Airtable (San Francisco, CA, USA), a spreadsheet-database platform. Quality indicators were collected from each systematic review by an individual reviewer, unless a systematic review included more than ten indicators, in which case that review was extracted in duplicate to ensure no indicators were missed. All indicators were reviewed in duplicate after being collected from the original systematic review. Discrepancies or uncertainties were discussed with a senior author.

Where applicable, the surgical subspecialty focus of an included quality indicator was extracted and categorized as reported in the included review. All indicators were classified into the Donabedian framework per the original systematic review, or by the review team if not previously classified. Each indicator was also classified according to the most relevant perioperative phase of care (preoperative, intraoperative, or postoperative). An indicator that applied to more than one phase, whether to two phases or to all three, could not be uniquely assigned to a single phase and was therefore classified as “perioperative.” The classification process was performed in duplicate by two authors (F. N., G. L.). Any disagreements or uncertainty were reviewed and resolved by consensus with a senior team member (G. H.). If an indicator was assigned to multiple domains by the original authors, reviewers determined the most relevant domain through consensus.
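
The phase-classification rule can be expressed as a short Python sketch. The function name and the input (the set of phases a reviewer judged an indicator to apply to) are assumptions made for illustration; in the review itself this judgement was made by two reviewers, not by software.

```python
PHASES = ("preoperative", "intraoperative", "postoperative")


def classify_phase(applicable_phases):
    """Return the single phase label for an indicator.

    An indicator tied to exactly one phase keeps that phase; an indicator
    that applies to two or all three phases is labelled "perioperative".
    """
    phases = set(applicable_phases)
    if not phases:
        raise ValueError("an indicator must apply to at least one phase")
    invalid = phases - set(PHASES)
    if invalid:
        raise ValueError(f"unknown phase(s): {sorted(invalid)}")

    if len(phases) == 1:
        return next(iter(phases))
    return "perioperative"
```

For example, an indicator judged relevant to both the intraoperative and postoperative phases would be labelled "perioperative", the same label given to an indicator spanning all three phases.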

Using these two frameworks allowed for the creation of an evidence matrix where indicators could be concurrently mapped by Donabedian domain within the context of the perioperative journey. To address overlapping data between systematic reviews, data were extracted from all included systematic reviews and duplicate quality indicators were removed during data synthesis, as outlined in the methodology of performing umbrella reviews.22
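
As a rough sketch of how such an evidence matrix could be tabulated, the Python below counts unique indicators in each Donabedian domain and perioperative phase cell. The record fields (`text`, `domain`, `phase`) and the duplicate rule (matching on case- and whitespace-normalized wording) are assumptions for illustration only; the review does not describe automated matching, and indicators worded differently but meaning the same thing would still need manual reconciliation.

```python
from collections import Counter


def build_evidence_matrix(indicators):
    """Count unique indicators per (domain, phase) cell.

    `indicators` is a hypothetical list of dicts with the keys
    "text", "domain", and "phase". Exact duplicates (after lowercasing
    and collapsing whitespace) are counted once.
    """
    seen = set()
    matrix = Counter()
    for indicator in indicators:
        key = " ".join(indicator["text"].lower().split())
        if key in seen:
            continue  # same indicator already extracted from another review
        seen.add(key)
        matrix[(indicator["domain"], indicator["phase"])] += 1
    return matrix
```

Each key of the returned counter is one cell of the matrix, so `matrix[("process", "preoperative")]` would give the number of unique preoperative process indicators.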

Level of evidence

We also collected the Oxford Centre for Evidence-Based Medicine (OCEBM) level of evidence23 for each indicator using the OCEBM 2009 criteria, as this was the most commonly used system in the included studies. Levels of evidence were extracted as reported for each indicator in each included systematic review. If an included review did not state the OCEBM level of evidence, the level of evidence was set as missing (i.e., we did not assign our own level of evidence if none was reported).
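
The handling of reported and missing levels can be sketched as follows. This is an illustrative helper, not part of the review's workflow: the function name is hypothetical, levels are assumed to be recorded as strings such as "1a" or "2b", and the "level 1 or 2" cut-off for high-level evidence is the definition this review applies when reporting its results.

```python
from typing import Optional


def is_high_level(ocebm_level: Optional[str]) -> Optional[bool]:
    """Flag OCEBM 2009 level 1 or 2 evidence.

    Returns None when no level was reported, so that a missing level
    stays missing instead of being counted as low-level evidence.
    """
    if ocebm_level is None:
        return None
    level = ocebm_level.strip().lower()
    if not level or level[0] not in "12345":
        raise ValueError(f"unrecognized OCEBM level: {ocebm_level!r}")
    return level[0] in "12"
```

Keeping `None` distinct from `False` mirrors the decision not to assign a level where the source review reported none.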

Results

Included studies

Our systematic search returned 1,475 citations. After duplicate removal and title and abstract screening, 92 citations advanced to full-text review; in the end, 23 systematic reviews were included. The oldest included systematic review was from 2005 and the most recent review was from 2019. The most common reasons for excluding a citation at full-text review were lack of anesthesia indicators (n = 24), focus on nursing, surgical, or intensive care unit indicators (n = 10), and duplicate articles (n = 7). Three citations were excluded for not meeting all eleven of the JBI criteria.24,25,26 A PRISMA flowchart outlining the search results is shown in Fig. 1.

Fig. 1 PRISMA flow diagram

Study characteristics

A description of the 23 included reviews27–49 with a summary of key findings is shown in Table 1. The 23 included systematic reviews synthesized data from a total of 3,164 primary studies. The most common country of publication was the USA (11/23). Most included systematic reviews used the Donabedian quality framework (13/23).

Table 1 Characteristics of included systematic reviews

Risk of bias

Using the AMSTAR 2 tool, the overall quality of included reviews was “high” (n = 0), “moderate” (n = 1), “low” (n = 7), or “critically low” (n = 15). Specific critical AMSTAR 2 domains leading to decreased overall confidence were 1) the lack of explicit statements that review methods were established a priori, 2) failing to outline the search strategy, or 3) failing to list the excluded studies and reasons for exclusion. Methodological quality ratings for each included review are summarized in ESM eTable 3.

Quality indicators

In total, 330 unique perioperative quality indicators were collected from the included systematic reviews (full list in ESM eTable 4). One hundred of the 330 quality indicators were specific to a surgical subspecialty, with orthopedic (n = 27) and colorectal surgery (n = 21) having the most specialty-specific indicators. Donabedian process indicators occurred with the highest frequency (n = 169; 51% of indicators), followed by outcome indicators (n = 114; 35%) and structure indicators (n = 47; 14%). Most indicators applied to the intraoperative phase of care (n = 135; 41%), followed by the postoperative (n = 75; 23%), perioperative (n = 63; 19%), and preoperative (n = 57; 17%) phases. The distribution of structure, process, and outcome quality indicators by perioperative phase of care is shown in Fig. 2.

Fig. 2 Quality indicator distribution organized by Donabedian and perioperative domain
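
The reported percentages can be checked against the reported counts with a few lines of Python. The counts below are copied from the text above; the helper name is an assumption, and rounding to the nearest whole percent is assumed to be how the published figures were derived.

```python
def percentages(counts):
    """Convert raw counts to whole-number percentages of their total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot compute percentages of an empty total")
    return {name: round(100 * n / total) for name, n in counts.items()}


# Counts as reported in the Results.
by_domain = {"process": 169, "outcome": 114, "structure": 47}
by_phase = {
    "intraoperative": 135,
    "postoperative": 75,
    "perioperative": 63,
    "preoperative": 57,
}

print(sum(by_domain.values()), percentages(by_domain))
print(sum(by_phase.values()), percentages(by_phase))
```

Both breakdowns sum to the 330 unique indicators, and the rounded shares match the percentages given in the text.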

Structure indicators (Table 2) were identified across all perioperative phases. Level 2 evidence supported preoperative multidisciplinary care, while the strongest evidence (level 1b) supported intraoperative quality indicators (i.e., that anesthetists performing regional techniques should be trained in ultrasound-guided techniques) and perioperative indicators (i.e., the need for protocols to manage perioperative complications and comorbidities). No postoperative recommendations were supported by high-level evidence.

Table 2 Qualitative synthesis of quality indicators by perioperative phase

Process indicators (Table 2) were also present across all phases of perioperative care, but were predominantly focused on pre- and intraoperative phases. Within the preoperative phase of care, there was high-quality evidence (level 1a) supporting the use of prophylactic antibiotics prior to surgical incision, use of thromboembolism prophylaxis, and avoidance of premedication or sedation prior to procedures. Four other preoperative processes also achieved level 1b evidence.

For process indicators in the intraoperative phase of care, the strongest evidence (level 1a) supported the use of sterile technique for central venous line insertion, avoidance of systemic morphine intraoperatively, and the presence of a trained anesthetist throughout the operative period. Three intraoperative level 1b process indicators and five level 2 process indicators were also identified.

Perioperatively, level 1a evidence supported indicators for continuation of chronic beta-blocker therapy and maintenance of normothermia. Three further quality indicators achieved level 1b evidence and three indicators achieved level 2 evidence. The highest level of evidence achieved by postoperative process quality indicators was level 2b, specifically the use of a multimodal approach to optimize gut function.

Outcome indicators (Table 2) were most frequently reported in the intraoperative phase of care; however, no outcome indicators were supported by high-level evidence (i.e., OCEBM level 1 or 2). Most outcome indicators related to failure at the process level (e.g., failure of a regional technique, airway incident, hypothermia) or medical and surgical complications.

Overall, 88 of the 330 indicators had an OCEBM level of evidence assigned based on the original systematic review. Of these, 45 indicators were supported by level 1 or 2 evidence. Most of these 45 indicators were process indicators (n = 40), and the most commonly represented phase of care was the preoperative phase (n = 16). The quality indicators supported by the highest evidence are highlighted in Table 3.

Table 3 Quality indicators with highest level of evidence

Discussion

In the present umbrella review of 23 systematic reviews that reported 330 anesthesia quality indicators from 3,164 primary studies, we synthesized available indicators as structures, processes, or outcomes organized by the applicable perioperative phase of care. By situating indicators from up-to-date systematic reviews within the Donabedian quality framework across the perioperative period, supported by the underlying quality of evidence for each indicator, we provide a map of quality indicators that can be used to monitor anesthesia quality and drive improvement. Our map further identifies gaps among available indicators that should be addressed through further scholarship. High-level evidence supports current monitoring of important perioperative structures related to staff training and clinical management protocols, and processes associated with antibiotic and thromboembolic prophylaxis, sterile procedural techniques, presence of trained anesthesia providers, maintenance of normothermia, and beta-blocker continuation. A clear gap exists in outcome indicators, as none were supported by high-quality evidence, which highlights a key focus for anesthesia quality initiatives moving forward.

Improving the quality and outcomes of perioperative care is a long-established priority and a strength of the field of anesthesiology. Over the past 80 years, quality and safety measures have allowed anesthesia-related deaths to decrease by more than 97%.4 Concurrent with these remarkable improvements in safety, the formal field of quality improvement has also emerged, led by early pioneers such as Codman, who focused on adequate measurement of long-term patient outcomes, and Donabedian, who codified the need to consider the related impacts of structures and processes of care on the end results experienced by patients.50 Such efforts have been taken up in recent decades by anesthesia-specific quality organizations such as the USA-based AQI13 and the UK-based Perioperative Quality Improvement Programme,14 which have worked towards improving the assessment and delivery of high-quality perioperative care.

Based on the findings of our umbrella review, substantial effort has been expended in the field to develop quality indicators to drive further improvement in anesthesia care. Since 2005, at least 23 systematic reviews of quality indicators have been published, which synthesized findings from more than 3,000 original studies. Taken together, this has led to over 300 quality indicators that have been recommended at least once. While a common saying in health care improvement is that “we cannot improve what we can’t measure,”51 clinicians, quality leads, administrators, and other stakeholders require high-quality information to choose relevant and evidence-based quality indicators from the hundreds that have been described. Our synthesis suggests that at least five structure indicators and 40 process indicators are supported by high-quality evidence (i.e., OCEBM level 1 or 2). These indicators should likely be considered for regular monitoring in many settings, recognizing that applicability may vary by surgical specialty or perioperative setting. Our review found that the quality indicators with the strongest evidence were for antibiotic prophylaxis and venous thromboembolism (VTE) prophylaxis, including their appropriate selection and timely administration. These indicators coincide with national consensus guidelines for antibiotic and VTE prophylaxis proposed by the NQF,52,53 as well as national and international guidelines by the Centers for Disease Control and Prevention and the World Health Organization.54,55 By highlighting indicators supported by high-level evidence, we propose potential evidence-based targets for the development of further guidelines that can direct the measurement, assessment, and improvement of health care quality metrics within the field of anesthesiology. Concurrently, the lack of high-quality evidence supporting other core areas of anesthesia management, such as monitoring and reversal of neuromuscular blockade, highlights the continued need to develop a high-certainty evidence base for routine anesthesia care.

Several indicators that were classified as OCEBM level 1 evidence in their original systematic review could be applied to all cases of noncardiac surgery, but others may be patient- or surgery-specific. For example, avoiding the use of routine preoperative medication or sedation and avoiding the intraoperative use of systemic morphine were indicators supported by level 1a evidence in an included systematic review,29 without additional information in the included review regarding their applicability as quality indicators for specific patient populations, types of surgery, or other clinical contexts. This is an acknowledged limitation of umbrella reviews, which necessitates that clinicians interpret and apply this evidence synthesis through the lens of clinical experience and expertise. For example, while high-certainty randomized controlled trial evidence shows no efficacy for preoperative sedation with lorazepam in older patients,56 this cannot be directly generalized to provision of midazolam in the operating room immediately prior to or as part of induction of general anesthesia. Therefore, these indicators may suggest that premedication and systemic morphine should be used judiciously and tailored to specific patient populations rather than used routinely in all patients. This highlights the importance of developing specific, objective, and contextualized quality indicators in the perioperative period.

Once quality indicators have been identified, it is important to develop methods of measuring them appropriately so that they can be applied towards improving patient outcomes. Importantly, the adoption of electronic health records has allowed meaningful patient data to be collected more easily and on a larger scale.57 For quality improvement departments, it will be important to prioritize measurement of quality indicators that can feasibly and consistently be collected, as this will vary by hospital system and electronic health record. By highlighting quality indicators of the highest level of evidence, we propose several possible indicators that can be used by anesthesiology departments moving forward. Collection of data into nationwide databases such as the National Anesthesia Clinical Outcomes Registry (NACOR)13 developed by the AQI allows for the broad evaluation of indicators, with the potential for use in large-scale health care analytics that can drive health care quality improvement using comprehensive evidence.

The identification and measurement of quality indicators is not enough to improve health care delivery; measurement of these indicators needs to encourage change in the actions of health care providers, a process that requires initiatives that can provide feedback and re-evaluation. Feedback should be continuous and should occur over an extended time, two features that have been shown to influence physicians’ acceptance of feedback.58 Within our own hospital, we have explored the use of a performance assessment tool that collected postoperative outcome data (e.g., postoperative nausea, vomiting, and pain) that were sent as feedback to anesthetists.59 The majority of participants in this initiative found that it would be an effective tool for professional development, but further studies will be required to explore how the feedback can be directly used to change outcomes within our institution. Other groups have used methods such as monthly feedback systems or peer audits to direct change60 and have shown the ability to create clinically significant differences in process measures of quality after the implementation of these feedback systems. The focus on process measures can support high levels of adherence and change in clinician behaviour to improve care, as process measures are directly under the purview of health care providers.61 While determining which quality indicators to measure is important, continual reassessment of their use based on feedback will be vital to ensuring that they improve patient care.

Ultimately, while optimization of health care structures and processes is an important goal, the primary measure of success in perioperative care should reflect the outcomes that patients experience. This makes it notable that none of the outcome measures synthesized were supported by high-quality evidence. Moving forward, collaboration between quality improvement experts and researchers working to optimize outcomes in clinical research could represent one pathway to achieve a consistent and high-quality set of patient outcome indicators. Efforts such as the Standardized Endpoints for Perioperative Medicine (StEP) group and Core Outcomes Measures in Perioperative and Anaesthetic Care (COMPAC) initiative represent key building blocks.62 Nevertheless, in addition to inputs from clinicians, quality leads and researchers, patients should also be directly involved to ensure relevance of any measures to the individuals who ultimately receive anesthesia care. Further efforts to causally link improvements in structure and process indicators to these key outcomes should also be prioritized.

Our review has several limitations. First, no risk of bias tool exists that is specific to qualitative umbrella reviews examining quality indicators. To ensure we approached risk of bias assessment in a structured manner, we used the AMSTAR 2 tool with inapplicable criteria omitted to evaluate the systematic reviews included in our umbrella review. As such, our risk of bias assessment may not capture all relevant qualities of the included reviews.

Second, our umbrella review search was limited to systematic reviews published in bibliographic databases. There may be quality indicators which are housed in other settings such as guidelines, websites, or hospital policies, which have not been included in systematic reviews. Initiatives from other groups such as StEP or COMPAC that have examined quality indicators in anesthesiology would not be captured in our study if their work was not published in a systematic review.

Additionally, overlap in data across systematic reviews is a known problem in umbrella reviews.22 Although we tried to overcome this by removing duplicate quality indicators, it is likely that some of the reviews included in our umbrella review drew from the same or similar sources. Further analysis of the degree of overlap could provide additional insight into whether disproportionate consideration is being given to indicators for which an established evidence base is already present.

In the perioperative realm, there is significant overlap between the work of anesthesiologists, surgeons, nurses, and other clinicians. Delineating clinical indicators as purely attributable to anesthesia or surgery is difficult in the collaborative setting of perioperative medicine and is an acknowledged challenge in the field.63 For example, the development of a surgical site infection can be considered attributable to anesthesia, surgery, or potentially jointly attributable. While we were unable to identify any qualitative evidence that could proportionally attribute surgical site infections to surgeons vs anesthesiologists, we acknowledge that this indicator could be considered an anesthetic, surgical, or combined quality indicator. Moving forward, future efforts to develop quality indicator sets will likely need to address methods to navigate quality metrics that are not clearly attributable to only one group of clinical actors.

Lastly, heterogeneity existed across the systematic reviews included in our study. Seven of 23 included studies used a form of expert consensus to determine quality indicators, using a Delphi, modified Delphi, or RAND/UCLA (University of California, Los Angeles) Appropriateness Method.64 Only a subset of our included reviews used this approach, while the others were systematic reviews of existing quality indicators, which introduced heterogeneity into our data.

From an umbrella review that synthesized 23 systematic reviews of anesthesia quality indicators, we mapped 330 indicators across the perioperative period according to the Donabedian structure-process-outcome framework. Our review highlights high-quality indicators specific to anesthesia currently supported by level 1 evidence such as antibiotic prophylaxis, VTE prophylaxis, postoperative nausea and vomiting prophylaxis, maintenance of normothermia, perioperative beta-blocker management, and multidisciplinary care. These may represent useful and valuable targets for anesthesiology quality-improvement initiatives. Further development of quality indicators at the patient outcome level is required and should build upon patient-oriented and multistakeholder engagement.