Introduction

The rising cost of health care has been a driving force behind the nascent field of quality improvement. In the USA, healthcare spending represents 17% of the gross domestic product (GDP) and is expected to increase by as much as 5% each year in the coming decade [1, 2]. This trend is in sharp contrast with the majority of the world’s developed nations—including Canada, France, Sweden, Japan, the UK and Australia—which spend a stable 10% or less of their GDP on health care [1]. Notably, higher healthcare spending in the USA has not resulted in better health outcomes, which is reflected in high maternal mortality, high infant mortality, and decreasing life expectancy [3,4,5].

The Affordable Care Act of 2010 contained several provisions for improving quality in health care, but the most visible effort has been the Centers for Medicare & Medicaid Services Hospital Readmissions Reduction Program [6]. The result has been an explosion of quality improvement efforts primarily focused on reducing readmissions for heart failure, myocardial infarction, and pneumonia. In particular, a brief review of the literature on quality improvement shows that the majority of published studies focus on heart failure and the transition from acute care to outpatient care, with primary outcomes of readmission rate and cost [7,8,9,10,11]. Readmissions are often treated as a substitute metric for quality.

As quality improvement (QI) projects have become more popular and numerous, there is a need to examine which strategies have proven successful in order to guide future efforts and facilitate implementation in clinical practices and departments. Our field of gastroenterology in particular has witnessed a recent increase in QI-related publications. Here, we aimed to compile a practical guide to successful QI within gastroenterology, focusing upon established quality metrics for a range of acute and chronic diseases. To that end, we conducted a systematic review of the literature.

Methods

Defining Quality Improvement in Gastroenterology

To frame our literature search, we used a commonly accepted definition of quality improvement (QI): a systematic approach to the analysis of practice performance and efforts to improve performance. In the context of health care, we viewed QI as a means of improving both process and outcomes. Accordingly, we did not include studies that simply assessed how well quality standards are currently being met, but restricted our results to studies with an active intervention [12]. To obtain quality metrics for specific GI diseases, we reviewed current disease guidelines (ACG, AGA, AASLD), Choosing Wisely, and the Physician Quality Reporting System (PQRS) [13, 14].

Systematic Review

We performed a systematic search of the QI literature within gastroenterology. Our PubMed/Medline search included terms for specific diseases within gastroenterology: inflammatory bowel disease (IBD), irritable bowel syndrome (IBS), celiac disease, gastroesophageal reflux disease (GERD), chronic and acute pancreatitis, chronic liver disease and cirrhosis, colorectal cancer screening, endoscopy, and gastrointestinal bleeding. Our terms for quality improvement included: health services research, healthcare delivery, healthcare improvement, clinical practice, quality of care, quality indicators, quality improvement, quality metrics, and reducing readmissions (please see Supplement 1 for exact search terms). We restricted our search to studies performed in adult human subjects. We included only experimental and quasi-experimental studies and excluded studies of surgical treatment or surgical outcomes. While we restricted our search to studies written in English, we did include studies conducted in other countries. We also searched the grey literature, including OpenGrey, Health Systems Evidence, and the New York Academy of Medicine Grey Literature Report.

Results

Systematic Review

Our initial search of PubMed yielded 5345 results (see Supplement 1 for search terms). After title, abstract, and full text review cycles, we refined this list to 28 studies. We added five more studies found by hand search, either from the reference lists of already included studies or through Google. Our search of the grey literature did not yield any additional titles that met our criteria.

Descriptive characteristics for each of the 33 studies are summarized in Table 1. The majority of studies were conducted in the USA, after passage of the Affordable Care Act. There were 17 studies on endoscopy, six on chronic liver disease including cirrhosis, four on IBD, three on GI bleeding, two on GERD, and one on celiac disease. We did not find any relevant studies of IBS or pancreatitis.
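The selection arithmetic and the topic breakdown reported above can be cross-checked with a short tally. The following is a minimal Python sketch that uses only the counts stated in this section; the variable names are illustrative and are not part of the review's methods.

```python
# Counts reported in the Results text (variable names are illustrative).
initial_pubmed_hits = 5345      # records returned by the initial PubMed search
retained_after_review = 28      # left after title, abstract, and full-text review
added_by_hand_search = 5        # found via reference lists or Google
added_from_grey_literature = 0  # grey literature yielded no eligible titles

total_included = (
    retained_after_review + added_by_hand_search + added_from_grey_literature
)

# Studies per disease area, as listed in the text.
studies_by_topic = {
    "endoscopy": 17,
    "chronic liver disease and cirrhosis": 6,
    "IBD": 4,
    "GI bleeding": 3,
    "GERD": 2,
    "celiac disease": 1,
    "IBS": 0,
    "pancreatitis": 0,
}

# The per-topic counts should account for every included study.
assert sum(studies_by_topic.values()) == total_included

excluded_from_pubmed = initial_pubmed_hits - retained_after_review
print(f"Included studies: {total_included}")
print(f"PubMed records excluded during review: {excluded_from_pubmed}")
```

The internal assertion simply confirms that the six disease areas with eligible studies sum to the 33 studies summarized in Table 1.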

Table 1 Description of quality improvement studies in gastroenterology

Celiac Disease

There are no formally established quality metrics for celiac disease. To guide our search, we reviewed the ACG guidelines for diagnosis and management of celiac disease and searched for QI papers that would address appropriate serologic testing, taking adequate biopsies during EGD, or referring to a nutritionist for gluten-free diet counseling. The one study we found used a pre-/posttest design to encourage providers to order tissue transglutaminase antibody (TTG) alone for serologic diagnosis, as compared to a “celiac disease panel” that included TTG as well as anti-gliadin antibody (GA IgG and IgA) and endomysial antibody (EMA IgG/IgA) [15]. Guidelines recommend TTG testing alone for initial testing for celiac disease. Based upon discussion with gastroenterologists, the study initiated the following interventions: eliminating the celiac panel order and educating physicians about the expected first-line and second-line tests through a laboratory memo. After this intervention, a repeat audit of laboratory testing showed that the rate of ordering TTG alone increased and the rates of ordering GA or EMA decreased significantly. While this study used education as a means to reach out to ordering physicians, this intervention was also paired with changing the EMR to encourage following the guidelines. As in the majority of studies in this review, education is a common component of QI but is often insufficient on its own.

Chronic Liver Disease and Cirrhosis

Studies of liver disease were evenly divided between the inpatient and outpatient settings. Among inpatient studies, Tapper developed an electronic checklist for use in the hepatology ward to increase compliance with guidelines for management of cirrhosis and its complications [16]. Guidelines covered acute inpatient issues such as chemical prophylaxis for venous thromboembolism and antibiotic prophylaxis for spontaneous bacterial peritonitis, but also included initiation of long-term therapy to continue post-discharge, such as beta-blockers for varices and lactulose and rifaximin for hepatic encephalopathy. Ultimately, the electronic checklist resulted in more patients receiving secondary prophylaxis for spontaneous bacterial peritonitis and appropriate dosing of lactulose, and its use was associated with significantly lower 30-day readmissions overall and lower readmissions specifically for hepatic encephalopathy. Of note, all patients with hepatic encephalopathy in the checklist study were routinely placed on rifaximin. The Ghaoui and Desai studies each created protocols for mandatory GI involvement for admitted patients with liver disease [17, 18]. Ghaoui restricted this initiative to patients with decompensated cirrhosis, while Desai went further by dictating hospitalist and hepatologist co-management for all liver disease patients. Despite these differences, both studies found significantly increased compliance with quality measures for liver disease, including early paracentesis, prophylactic antibiotics for SBP or GI bleeding, early endoscopy for GI bleeding, and appropriate use of FFP (Table 2).

Table 2 Selection of useful strategies for QI

In the outpatient setting, the Kennedy and Aberra studies both targeted screening for hepatocellular carcinoma (HCC); the patient population in Kennedy was chronic viral hepatitis, versus cirrhosis for Aberra [19, 20]. Both studies utilized nursing-based protocols to improve screening rates; in these protocols, patients were entered into a separate database and contacted for HCC screening independent of their clinic visits. Further, nurses were able to order ultrasounds directly and to schedule reminder calls. Both programs, which were remarkably similar, showed significantly increased screening rates post-intervention. Similarly, the Loy study sought to improve HCC screening rates in patients with cirrhosis but also included hepatitis A and B vaccination and screening for esophageal varices [21]. In contrast to Kennedy and Aberra, the goal of the Loy study was to increase provider rather than patient compliance with cirrhosis quality measures, using individual feedback. Provider feedback was given at baseline following chart review, and this process was repeated intermittently over 3 years. Using this continuous QI approach, Loy was able to demonstrate a significant increase in compliance with vaccination and HCC screening starting at 2 months that was sustained through 3 years; screening for esophageal varices was not affected.

Among the liver disease studies, there were no RCTs. Study designs were split between interrupted time series (Loy, Kennedy) and pretest/posttest (all others), the most common designs for QI. All studies were performed at a single center, and sample size varied from 56 patients to over 2000. However, there was no clear relationship between sample size and how successful the study was, likely due to heterogeneity in the study populations (decompensated inpatients vs. stable outpatients) and in interventions. Impressively, each study showed increased compliance with the disease-specific quality measure of choice, but none were able to directly demonstrate effects on mortality, length of stay, or readmissions. The most likely reason for this is unaccounted-for confounding; the role of sample size is unclear. As in most QI studies, there were no sample size calculations, and it was unclear what factors were considered in determining sample size.

Endoscopy

The 17 endoscopy studies can be grouped into two categories: those focusing on provider metrics, such as adenoma detection rate (ADR), withdrawal time (WT), documentation, and referrals; and those focusing on patient metrics, such as bowel prep quality. The most frequent study design, used by ten papers, was pretest/posttest, followed by three randomized controlled trials, two interrupted time series, one non-randomized controlled trial, and one prospective cohort study. Units of study included individual providers, practices/clinics, and endoscopies; sample sizes ranged from 10 to 15 providers at a single clinic to over 10,000 endoscopies. Despite the variations in design, intervention, and sample size, only three of the 17 studies were negative: the two negative studies of ADR looked at individual changes in performance rather than aggregate performance (Inra, Shaukat), while the negative study of bowel prep solely recruited patients with decompensated cirrhosis (Clayton) [22,23,24].

Among the 12 studies on provider metrics, the most common metric was ADR and the most common intervention was individual and group feedback. However, only three studies (Inra, Mai, and Kahi) had education as their sole intervention [22, 25, 26]. Neither Inra nor Kahi showed differences in ADR; Mai resulted in improved documentation for EGD and colonoscopy. The studies that also incorporated retraining, financial incentives, or other active interventions were more successful. Abdul-Baki et al. [27] utilized public reporting of endoscopists’ quality metrics, including rate of complications, appropriate documentation, correct recommendation for follow-up, withdrawal time, and adenoma detection rate. The study then followed participating gastroenterologists for 4 years after the public report and demonstrated a significant increase in average ADR, albeit without a concurrent control group. In Ross, physicians at a single center received their individual versus practice average ADR and sessile serrated adenoma (SSA) detection rates and were provided a financial incentive based upon productivity, ADR, and SSA detection [28]. The study showed a nonsignificant increase in both ADR and SSA detection. The EQUIP series of studies deployed a program of video-based education, individual feedback, and direct retraining sessions in endoscopy techniques [29,30,31]. The initial single-center study showed a significant increase in ADR (Coe), which was sustained over several months in the follow-up study by Ussui. However, the subsequent multicenter RCT done by Wallace was equivocal, showing increased ADR in both control and intervention sites. Like EQUIP, the Shaukat study also used direct retraining sessions but additionally included a financial penalty for not reaching minimum ADR and withdrawal time metrics; nonetheless, no significant difference was found from this intervention [23]. Of note, Shaukat looked for differences in individual-level quality metrics, rather than an aggregate before-and-after ADR across providers. The results of the Shaukat study raise an interesting question: Does imposing a financial penalty, instead of a financial reward, hurt performance by harming physician morale?
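The distinction between aggregate and individual-level ADR can be made concrete with a small calculation. ADR is conventionally the proportion of screening colonoscopies in which at least one adenoma is detected. The sketch below is illustrative Python: the `ProviderCounts` class, the function names, and all counts for the three hypothetical endoscopists are invented for this example and do not come from any of the reviewed studies.

```python
from dataclasses import dataclass


@dataclass
class ProviderCounts:
    colonoscopies: int  # screening colonoscopies performed
    with_adenoma: int   # those in which at least one adenoma was detected


def adr(counts: ProviderCounts) -> float:
    """Adenoma detection rate for a single provider."""
    return counts.with_adenoma / counts.colonoscopies


def aggregate_adr(providers: dict) -> float:
    """Pooled ADR across all providers (total positives / total procedures)."""
    total = sum(p.colonoscopies for p in providers.values())
    positive = sum(p.with_adenoma for p in providers.values())
    return positive / total


# Hypothetical counts before and after an intervention (not study data).
before = {
    "A": ProviderCounts(200, 40),
    "B": ProviderCounts(100, 30),
    "C": ProviderCounts(100, 15),
}
after = {
    "A": ProviderCounts(200, 56),
    "B": ProviderCounts(100, 28),
    "C": ProviderCounts(100, 15),
}

# Providers whose own ADR rose after the intervention.
improved = [name for name in before if adr(after[name]) > adr(before[name])]

print(f"Aggregate ADR before: {aggregate_adr(before):.4f}")
print(f"Aggregate ADR after:  {aggregate_adr(after):.4f}")
print(f"Providers with higher individual ADR: {improved}")
```

In this invented example the pooled ADR rises even though only one of the three providers improves, which illustrates how an aggregate before-and-after comparison and an individual-level analysis of the same data can lead to different conclusions.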

In Rajasekhar, the study group trained “local leaders” on their colonoscopy technique bundle, which included supine positioning, rectal retroflexion, and reaching minimum WT; these leaders were then tasked to hold training sessions for clinic staff and send periodic reminders about adherence [32]. The study was positive, but its results are difficult to interpret because compliance with a single one of the bundle’s measures was used as a surrogate for overall compliance with all measures. Still, this “trickle down” training approach may be a cost- and time-efficient way to disseminate education and retraining. The Imperiali et al. [33] study implemented a “continuous quality improvement” approach in which they serially measured colonoscopy completion rates and ADR in 6-month audit cycles and identified several points to intervene. Ultimately, they changed sedation practices to increase patient comfort, gave lower performing physicians access to training sessions, and continually gave individual feedback. At study end, all endoscopists had reached over 90% completion rate and 20% ADR (Fig. 1).

Fig. 1 Study selection

As their sole intervention, Barclay placed a timer set to 8 min in the endoscopy suite to ensure the minimum withdrawal time; in addition to sounding at 8 min, there were intermediate alarms set to help endoscopists pace their procedure (the timer could be paused while taking biopsies) [34]. The timer system led to significantly increased withdrawal time and ADR. However, the long-term effectiveness of this approach is unclear, as it would be impractical, or at least irritating, to have a timer present during procedures indefinitely.

Four studies (Clayton, Hayat, Park, and Hsueh) focused upon patient education to improve bowel prep during outpatient screening colonoscopy [24, 35,36,37]. Each study created a patient education video, ranging from 6 to 30 min, explaining the importance of bowel prep in successfully completing colonoscopy. Despite this similarity, one study did not show any change in bowel prep while the others did. The unsuccessful study, Clayton, had a relatively small sample size; more importantly, the video was played in the office during the course of a visit for liver transplantation evaluation rather than at home at the patient’s leisure. These patients were higher acuity in general and may have had more difficulty with bowel prep than the average patient; they also had some degree of ascites, which posed an additional challenge. Two of the positive studies, Hayat and Park, also examined whether improved bowel prep leads to increased ADR, but both studies were negative on this front. The length of the video varied significantly but did not appear to affect the success of the study.

The final endoscopy study, Grassini, targeted inappropriate referrals for colonoscopy [38]. Family physicians in an open-access endoscopy system were educated on criteria for appropriate colonoscopy referrals via a full-day continuing medical education (CME) course, along with a follow-up letter repeating the list of approved criteria. Gastroenterologists rated the appropriateness of referrals they received before and after this intervention and found a significant improvement. While this was an education-based intervention, the Grassini study wisely offered providers CME, which likely led to increased attendance and engagement.

Gastroesophageal Reflux Disease and Proton Pump Inhibitor Use

Both studies in this section occurred in the clinic setting and included measures for appropriate use of proton pump inhibitor (PPI) therapy. The Player study was an RCT to test an EMR-based decision support tool encouraging physicians to initiate PPI therapy if GERD had been diagnosed previously and to consider the diagnosis of GERD if atypical symptoms such as chronic cough and asthma were present (based on ICD-9 coding) [39]. Implementation of the tool increased the rate of new GERD diagnoses and the odds of initiating treatment for those with atypical symptoms. At the other end of the care pathway, the Walsh study promoted reassessment of patients on chronic PPI therapy, with the goal of de-escalating or de-prescribing if appropriate [40]. Their study also used the EMR, to first deliver a message to reassess therapy in patients on PPI for greater than 8 weeks and then to present a decision support tool to guide either continuation, lower dosing, or complete discontinuation of PPI. The study used a pretest/posttest design; subsequent chart review showed the need for PPI was reassessed in over 90% of these patients, and about 25% were de-prescribed. Both studies showed that the EMR can be successfully leveraged to improve adherence to guidelines, although the cumulative effect of multiple EMR-based clinical decision support tools may lead to “click fatigue” among providers.

Gastrointestinal Bleeding

Although several studies included GI bleeding among their outcomes, three studies focused exclusively on GI bleeding. The Loftus study initiated a protocol for a mandatory conference call between consultants from GI, interventional radiology, and surgery in the case of severe GI bleeding, defined by any of the following: large-volume bleeding, hemodynamic instability, four or more units of red blood cells transfused within 24 h or 8 units total, history of recurrent bleed, re-bleeding after endoscopy or no clear source of continued bleeding on endoscopy, a Jehovah’s Witness patient refusing transfusion, or the GI service’s discretion [41]. This multi-disciplinary approach led to shorter time to procedures (whether endoscopic, surgical, or radiographic) and fewer transfusions and was associated with shorter LOS and lower readmissions for recurrent GI bleeding.
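The trigger criteria in the Loftus protocol amount to a simple "any of the following" rule, which can be sketched as a predicate. The Python below is an illustrative encoding of the criteria as summarized in this review, not the study's actual implementation: the `BleedAssessment` class, its field names, and the function name are hypothetical, and reading "8 units total" as "8 or more units" is an assumption.

```python
from dataclasses import dataclass


@dataclass
class BleedAssessment:
    # All field names are hypothetical; each mirrors one criterion from the text.
    large_volume_bleeding: bool = False
    hemodynamic_instability: bool = False
    rbc_units_last_24h: int = 0      # red blood cell units transfused within 24 h
    rbc_units_total: int = 0         # red blood cell units transfused in total
    history_of_recurrent_bleed: bool = False
    rebleed_after_endoscopy: bool = False
    no_clear_source_on_endoscopy: bool = False
    refuses_transfusion: bool = False
    gi_service_discretion: bool = False


def meets_severe_bleed_criteria(a: BleedAssessment) -> bool:
    """Return True if any one criterion for the conference call is met."""
    return any([
        a.large_volume_bleeding,
        a.hemodynamic_instability,
        a.rbc_units_last_24h >= 4,   # four or more units within 24 h
        a.rbc_units_total >= 8,      # assumption: "8 units total" means >= 8
        a.history_of_recurrent_bleed,
        a.rebleed_after_endoscopy,
        a.no_clear_source_on_endoscopy,
        a.refuses_transfusion,
        a.gi_service_discretion,
    ])


# Example: 3 units in 24 h and 7 total does not trigger; a fourth unit does.
below_threshold = BleedAssessment(rbc_units_last_24h=3, rbc_units_total=7)
at_threshold = BleedAssessment(rbc_units_last_24h=4, rbc_units_total=8)
print(meets_severe_bleed_criteria(below_threshold))
print(meets_severe_bleed_criteria(at_threshold))
```

Because a single positive criterion is sufficient, the rule is deliberately permissive, which is consistent with the protocol's aim of convening GI, interventional radiology, and surgery early.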

The Johnson study focused on GI bleeding within the subset of suspected variceal bleeding in cirrhosis [42]. Initially, the study team attended departmental conferences to educate hospitalists and intensivists on “optimal care,” defined by use of a proton pump inhibitor, somatostatin analogue, and prophylactic antibiotics. An electronic admission order set was then implemented, which led to significantly more patients receiving optimal care and to lower 30-day readmissions. LOS was unaffected, and the study was not sufficiently powered to test the effect on mortality.

Finally, Pfau debuted a standard care pathway for non-variceal upper GI bleeding, including type and timing of laboratories, when to obtain X-ray imaging, use of IV versus PO acid blocking medications, and indications for ICU admission or for discharge [43]. The standardized pathway was developed by an interdisciplinary hospital panel and disseminated through e-mails and lectures. Their outcomes were time to endoscopy, length of stay, inappropriate use of X-rays, and inappropriate use of IV H2 blockers or PPIs. In a pretest/posttest study design, they were able to show reduced rates of inappropriate IV H2 blockers or PPIs, but no difference in other outcomes including LOS or time to endoscopy.

All three GI bleeding studies used a pretest/posttest design and occurred at a single hospital center. Johnson and Loftus were able to show both compliance with their interventions and resulting improvements in readmission rate; Loftus also showed shorter LOS, whereas LOS was unaffected in Johnson. Pfau, while having the largest sample size, was not able to show an effect on these quality metrics; one reason may be that their standardized care pathway was simply too long, discouraging adoption. Their pathway had seven categories (such as physical examination and criteria for ICU admission) with 2–6 items in each category. In addition, several items on the pathway were appropriate and may reduce cost but probably do not affect clinical outcomes, including the use of PO instead of IV acid blockers. It is possible that the most important interventions were lost in the excessively comprehensive set of recommendations. Overall, however, these studies showed that checklists and other care guidelines can improve outcomes for critically ill patients.

Inflammatory Bowel Disease

All four studies of IBD had a pretest/posttest design and studied ways to improve adherence to IBD guidelines among a small group of community-based physicians. Three of the four IBD studies originated from the same research group and focused on increasing understanding of and adherence to Physician Quality Reporting System (PQRS) and National Quality Strategy (NQS) measures. PQRS measures for IBD include minimizing corticosteroid use, performing appropriate vaccinations, and testing for hepatitis B and tuberculosis prior to initiating anti-TNF therapy [14]. The six NQS priorities are broader and non-disease specific, encompassing patient safety, patient-centered care, care coordination, preventative care, lifestyle changes, and cost effectiveness [44].

The Sapir study provided education to gastroenterologists through group chart review (which counted as continuing medical education) and included a control group [45]. Subsequent chart audit showed significantly increased adherence for the intervention group across several measures including influenza and pneumonia vaccination, testing for hepatitis B and tuberculosis, and assessment of treatment side effects.

The two Greene papers studied the same PQRS measures for IBD, but separately in the UC versus Crohn’s populations; each study used education as well as chart auditing with individual physician feedback [46, 47]. The Greene UC study showed increased understanding of the measures and intent to apply them, but did not actually perform repeat chart audit or other follow-up to measure post-intervention compliance. However, their later Crohn’s study did include a post-intervention repeat chart audit. While there was no overall difference in adherence before and after intervention, a secondary analysis showed significant improvement in four of ten measures among initially “low performing” physicians: assessment of IBD type, assessment of IBD activity, testing for bone loss from corticosteroid use, and tobacco cessation counseling.

Finally, the Walsh study developed a proforma covering vaccination and screening for several infections in IBD: HPV, HIV, hepatitis C, varicella, influenza, pneumococcus and tuberculosis [48]. Gastroenterologists were surveyed for their baseline awareness of the relevant guidelines and their current practice; charts were also audited to determine baseline and post-intervention screening and vaccination rates. However, these post-intervention rates were not communicated to physicians. Following this education, physicians self-reported greater compliance with guidelines, while chart review showed increased orders and referrals placed for these vaccinations and tests. Overall, the IBD studies were positive and showed higher adherence to guidelines after physician education. This is somewhat surprising, since many QI studies using education as the sole intervention have not been successful. It is possible that physicians are more likely to be interested in adhering to PQRS measures since there is a financial incentive for doing so.

Discussion

Our review of quality improvement literature in gastroenterology revealed several useful takeaway points to guide departments and clinicians in implementation. First, education was the most widespread and simple intervention used in these studies; however, its success was highly variable. This finding is likely the result of the heterogeneity contained within education as an intervention and the fact that many studies paired education with another intervention. For example, education in some of the colonoscopy studies could be as simple as giving physicians a “report card” on their adenoma detection rate performance; this approach was generally unsuccessful unless coupled with another tactic such as retraining sessions for low-performing endoscopists. In contrast, the Grassini study used education successfully to reduce the rate of inappropriate referrals for colonoscopy in an open-access system. However, here education referred to a full-day information session for family medicine clinicians that granted them several CME credits. The largely successful studies to improve compliance with PQRS measures in IBD utilized education, but improved compliance with PQRS also has a potential financial benefit for physicians treating Medicare patients. While education is a viable, low-cost strategy in quality improvement, the likelihood of success is directly related to the quality of the education and whether it is paired with any other interventions. It would be helpful to use education as an “active control” in future QI studies in order to quantify what contribution education makes to the success of a QI project when paired with other interventions.

Other successful QI programs used the electronic medical record to increase physician compliance with high-quality and guideline-based care. Tapper used an EMR checklist to improve inpatient cirrhosis care, Johnson implemented an EMR order set to outline “optimal care” for variceal bleeding, and both GERD studies (Walsh and Player) used decision support tools to encourage appropriate initiation and discontinuation of proton pump inhibitors. All four of these studies were quite effective and had the advantage of not requiring any additional staff time or resources to operate continuously once implemented. On the other hand, EMR messages, mandatory forms, and decision support tools can cause user fatigue if applied too widely or too frequently. In a given system, the total number of EMR tools used must be carefully considered to avoid habitual “alert override,” a rising problem in which clinicians become inundated with alerts and begin to ignore them indiscriminately, with a resulting decrease in compliance and quality [49]. Another consideration for EMR interventions is the level of complexity. The Pfau GI bleeding study was less successful than Tapper or Johnson, and one potential reason is the length and scope of their standardized care pathway for upper GI bleeding. The EMR itself has added complexity to providers’ tasks, and any interventions utilizing the EMR should be as streamlined as possible to increase the likelihood of adherence.

To improve patient compliance, dedicated staff and protocols appear to have merit. The Kennedy and Aberra studies of HCC screening in cirrhosis were remarkably similar, and both utilized nurse-driven protocols to identify patients in need of screening and remind patients by phone or messaging. Both studies were effective and circumvented the common problem of patient no-shows or lack of follow-up: Patients were contacted independently of office visits and nurses were able to order ultrasounds. While this approach is labor-intensive, employing dedicated staff may well be cost-effective when used to prevent highly morbid outcomes such as diagnosis of advanced HCC.

Another “low tech” strategy for QI was to mandate GI and other sub-specialty involvement in the management of select patient populations. Loftus required a multi-disciplinary meeting among GI, IR, and surgery for severe GI bleeding, and Ghaoui implemented GI consultation for all inpatients with decompensated cirrhosis. Desai went still further, by coordinating co-management of chronic liver disease patients between hospitalists and hepatology. All three studies showed increased compliance with high-quality care such as early endoscopy for GI bleeding, early paracentesis, or prophylaxis for SBP; only Loftus showed a reduction in length of stay and mortality, but the other studies were not adequately powered. Redesigning workflows in this way has great potential to streamline necessary care, particularly in the time-sensitive inpatient setting.

Rigorous study design is an ongoing issue in quality improvement studies. The vast majority of studies included here utilized a quasi-experimental design, most often a simple pretest/posttest approach. There were a handful of randomized and non-randomized controlled trials, including those on endoscopy retraining and improving patient bowel prep. It was unclear what factors led the authors to choose one design over another, although it is likely that financial and time constraints influenced the rigor of study design. Overall, the studies were heterogeneous and it was unclear whether one design was more generally successful than others; however, the general argument can be made that the field of QI would benefit from the use of more rigorous designs with control groups, such as RCTs. There was also wide variation in sample size and no obvious pattern relating sample size to likelihood of having a successful intervention, which again may relate to overall study heterogeneity. Notably, none of the studies contained power calculations and it is likely that several of the smaller studies were underpowered. In general, there is a need for standardized application of research methods to quality improvement studies.
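To illustrate what a power calculation for a QI study might involve, the sketch below applies the standard normal-approximation formula for comparing two independent proportions (for example, an outcome rate in a control group versus an intervention group). It is a minimal Python example under stated assumptions, not a method used by any reviewed study: the function name is hypothetical, the 20% and 30% rates are invented inputs, and the formula assumes two independent groups, which a pretest/posttest design on the same providers would not satisfy.

```python
import math
from statistics import NormalDist


def sample_size_two_proportions(p1: float, p2: float,
                                alpha: float = 0.05,
                                power: float = 0.80) -> int:
    """Per-group sample size to detect p1 vs p2 with a two-sided test.

    Uses the normal approximation without continuity correction and
    assumes two independent, equally sized groups.
    """
    if p1 == p2:
        raise ValueError("p1 and p2 must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)           # quantile for desired power
    p_bar = (p1 + p2) / 2                          # pooled proportion under H0
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)


# Hypothetical inputs: baseline rate of 20%, target rate of 30%.
n_per_group = sample_size_two_proportions(0.20, 0.30)
print(f"Patients (or procedures) needed per group: {n_per_group}")

# A smaller expected effect requires a substantially larger sample.
n_small_effect = sample_size_two_proportions(0.20, 0.25)
print(f"Per group for 20% vs 25%: {n_small_effect}")
```

The second call shows why small single-center studies are at risk of being underpowered: halving the expected improvement increases the required sample size several-fold.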

Our study is also notable for what was not found: we did not find any studies on improving patient-reported outcomes (PROs) or the impact of the patient experience on compliance with care. Since patient engagement and satisfaction are crucial factors in achieving high-quality care, explicit measurement of PROs should be an integral part of quality improvement. There was also a lack of studies on specific areas within GI such as IBS and chronic pancreatitis. We suspect this gap reflects the paucity of quality metrics for these diseases and the need for metric development.

Moving forward, a crucial question in quality improvement is how to focus limited resources on strategies likely to benefit the most patients. While technology can be a crucial and time-saving aid, it cannot replace dedicated staffing or reimagining of clinical workflows in certain cases. Here, we have attempted to characterize the attributes of successful QI programs for select disease areas in gastroenterology and consider the challenges involved in their implementation. Appropriate management of limited resources is becoming more important as we move from volume-based to value-based health care, and accordingly QI will continue to play an important role in all fields of medicine. Value in this setting represents the health outcomes achieved per dollar spent and an alignment of patients’ and providers’ goals. Combining this approach with improving the patient experience should be the goal of quality improvement.