Key Points
  • Diagnostic imaging has contributed substantially to patient care and the practice of medicine, but is accompanied by continuing gaps in quality of care and patient safety.

  • The Institute of Medicine has defined six domains of healthcare quality—safe, timely, effective, efficient, equitable, and patient centered. Additional domains include measures of “value” as well as evaluations of patient experience and provider well-being.

  • Quality measures serve to identify and quantify performance gaps, evaluate interventions to improve performance, monitor and sustain the gains achieved, and demonstrate accountability and value.

  • Measures for accountability and value should optimally assess patient outcomes, but process measures can serve as effective tools for performance improvement.

  • Good quality metrics are clinically meaningful to good patient care, can be created and maintained with high quality using available data, are actionable, relate to a target for quality improvement, and have good validity and reproducibility.

  • Exemplar measures for diagnostic radiology include percent of critical results communicated within appropriate predefined timeframes (safety domain), timeliness of examination and reporting completion, adherence to evidence-based clinical practice guidelines (effectiveness), and patient satisfaction with radiology services (patient-centeredness).

  • Data from disparate database systems, such as the picture archiving and communication system and the electronic health record, can be aggregated into a radiology data warehouse from which quality measures can be constructed using visualization and analytics software tools to populate performance dashboards or scorecards.

  • Quality measures alone are insufficient to improve performance, which requires leading and managing change to address technology, processes, and behaviors (personnel).

1 Overview

Advances in diagnostic imaging have helped revolutionize the practice of medicine. These advances have enhanced physicians’ understanding of diseases, improved diagnostic accuracy, and contributed tremendously to patient care. However, imaging studies are also associated with potential safety risks, including kidney injury (Mitchell et al. 2012), allergic reactions from intravenous contrast, and exposure to radiation (Sodickson et al. 2009; Gee 2012). Despite these benefits, significant performance gaps relevant to quality of care remain in diagnostic radiology. In its seminal report, Crossing the Quality Chasm, the Institute of Medicine (IOM) identified waste as a substantial feature of our healthcare delivery system (Institute of Medicine 2001). Heterogeneity and unwarranted practice variation contribute to this waste. Variations in diagnostic radiology practices are well documented and numerous. For example, in one large urban emergency department (ED), use of head CT for patients with trauma ranged by physician from 7.2% to 24.5% of patient encounters (with a single outlier of 41.7%) (Andruchow et al. 2012). Nationally, among 34 million Medicare fee-for-service beneficiaries in 2012, the average adjusted CT utilization intensity ranged from 330.4 studies per 1000 beneficiaries in the lowest decile hospital referral region (HRR) to 684.0 in the highest decile HRR; adjusted MR imaging utilization intensity varied from 105.7 studies per 1000 beneficiaries to 256.3 (Ip et al. 2015).

Even in a single radiology practice, substantial unexplained variation exists among radiologists in the frequency of follow-up recommendations in radiology reports, such as for pancreatic cysts, with a 2.8-fold difference in recommendation rates between readers (Ip et al. 2011), as well as in adherence to evidence-based guidelines for follow-up recommendations for pancreatic cysts (Bobbin et al. 2017), pulmonary nodules (Lu et al. 2016), and renal masses (Maehara et al. 2014). Variations among radiologists in the terminology used to convey diagnostic certainty (Khorasani et al. 2003; Hillman et al. 2004) can create ambiguity and confusion. Such unexplained and unwarranted variations in the practice of diagnostic radiology can lead to suboptimal quality of care, waste, and a diminished patient experience. Initiatives to close such performance gaps will enhance the value of radiologists and diagnostic imaging in health care.

2 What Is Quality?

In 2001 as a part of Crossing the Quality Chasm (Institute of Medicine 2001), the IOM identified six domains of healthcare quality which have come to frame the definition of quality in the United States today:

  • Safe: Avoiding harm to patients from the care that is intended to help them.

  • Effective: Providing services based on scientific knowledge to all who could benefit and refraining from providing services to those not likely to benefit (avoiding underuse and misuse, respectively).

  • Patient centered: Providing care that is respectful of and responsive to individual patient preferences, needs, and values and ensuring that patient values guide all clinical decisions.

  • Timely: Reducing waits and sometimes harmful delays for both those who receive and those who give care.

  • Efficient: Avoiding waste, including waste of equipment, supplies, ideas, and energy.

  • Equitable: Providing care that does not vary in quality because of personal characteristics such as gender, ethnicity, geographic location, and socioeconomic status.

More recently, additional domains have been proposed, including those of value, as well as evaluations of patient experience and provider well-being. The IOM domains are not mutually exclusive; several are interrelated and interventions to improve quality in multiple domains have the most leverage to improve overall healthcare quality. For example, ensuring timely booking and conduct of appointments for imaging procedures will improve efficiency of the system (and potentially equitable distribution of care) in addition to timeliness. However, improvements in timeliness and efficiency should not come at the expense of patient safety or effectiveness, and an ability to perform more MRI and CT scans must be coupled with assurances that only appropriate orders are completed (i.e., be effective by refraining from providing services to those not likely to benefit), and that unnecessary radiation exposure and other patient safety risks are minimized.

3 Why Measure Quality?

“Quality” and “value” have become integral components of the US healthcare regulatory, compliance, and reimbursement systems. In order for radiology to successfully compete for resources in our rapidly changing healthcare system, we must be able to measure, demonstrate, and continually improve quality and value. However, measuring quality is necessary but not sufficient to change performance. “Insanity is doing the same thing over and over and expecting different results” (attributed to Albert Einstein). Therefore, to improve performance (quality, safety, and efficiency) and create value, we must successfully manage change that addresses people, processes, and technology. Within this framework, quality measures serve multiple purposes: (1) to identify and quantify performance gaps, (2) to evaluate interventions to improve performance, (3) to monitor and sustain the gains achieved, and (4) to demonstrate value or accountability (Boland et al. 2017), such as adherence to regulatory or accreditation requirements. Measures for accountability or value should optimally assess patient outcomes; however, process measures can serve as effective tools for performance improvement.

4 Characteristics of Good Quality Metrics

“Not everything that counts is measurable, and not everything that is measurable counts” (attributed to Albert Einstein). In other words, not all processes or desired outcomes can be measured, and even when a process can be measured, measuring it may not meaningfully advance the desired outcome(s). It is also important to distinguish metrics (e.g., radiology report turnaround time) from target performance (e.g., 80th percentile at 6 h). Characteristics of good quality metrics include the following:

  • Clinically meaningful: The motivation behind a metric must be trusted by the people who will be using it and affected by it. Gaining user trust and support is significantly easier when a metric is genuinely clinically meaningful to the ultimate goal of good patient care. Demonstrating how a metric aligns with the interests of patients as well as those of the clinician users will greatly improve its impact. Metrics to address compliance requirements are critical to ensuring that necessary processes are in place. However, compliance metrics alone limit the opportunity to motivate clinically meaningful changes in practice to create value in healthcare delivery.

  • Relates directly to a defined target for quality improvement (QI): A metric must be clear and focused on an objective for QI. To optimize practice, measurement should be embedded in change management initiatives that address technology, people, and process gaps to enable the desired goals. Simply measuring performance may have short-term effects on the performance of some, but any such gains are likely to vary among users and to be unsustainable over time.

  • Distinguish metrics from target performance: A good quality metric enables adjustment of the performance target, when clinically or operationally relevant, to ensure continuous QI.

  • Easy to measure: This requirement seems simple, but numerous complexities may be encountered in accessing and comprehending the data necessary to create a quality metric. For example, if a metric is a proportion, the data in the numerator and denominator must be explicitly defined and measurable. There are several important caveats to consider. An important QI initiative in your practice may require data recording and capture by people who observe or participate in your current workflow. Such “manual” data collection strategies are often used in QI initiatives. However, to sustain any gains from such initiatives once the QI team has completed their work, easily measured, system-generated data will be needed to efficiently monitor the practice’s performance over time to help avoid sliding back to prior behaviors, processes, or outcomes.

An asset utilization metric for an expensive capital asset such as MRI helps illustrate some of the complexities. If the metric is the % of time the scanner is in clinical use, the numerator can be the number of minutes a patient was in the room (the timestamp of the patient entering the room subtracted from the timestamp of the patient leaving the room), summed for all the patients scanned each day, divided by the denominator of the total number of minutes the scanner was operational that day (a minimal calculation sketch appears after this list). This may seem simple enough, but it requires that each timestamp for each patient be accurately and consistently documented and available (easily extracted), and that expected and unexpected scanner downtime be accurately captured and available for calculation each day. Also, inefficient or unnecessarily long imaging protocols will not be apparent: a single patient scanned all day would yield 100% capacity utilization, completely masking the performance gap. Thus a second metric may need to be added to measure the length of each exam, which necessarily varies across body parts and indications for the study. Figure 1 illustrates a weekly scorecard of a capacity utilization metric for CT and MRI, based on the proportion of predetermined appointment slots used at each imaging location at a large, urban, academic medical center radiology practice, Brigham and Women’s Hospital (BWH), in Boston, MA.

  • “Easily” obtained: This attribute is particularly important to the sustainability of a metric and related QI efforts. The data needed to create the metric should optimally reside in systems used in your practice, and should be extractable from your operational systems for reporting using commercially available, off-the-shelf data visualization tools. The more the data needed to construct a metric can be captured automatically, the more sustainable the metric is. An important caveat is the limited ability of most systems used in clinical operations to visualize and present data in meaningful forms suitable for QI initiatives. Practices focused on QI will thus need to invest in data visualization and analytics tools, and in human resources capable of extracting the needed data from operational systems. The advent of machine learning techniques such as natural language processing (NLP) is helping certain metrics, previously unsustainable over time, become more feasible. For example, NLP can replace manual chart review for indications when assessing the appropriateness of MRI lumbar spine examinations performed in the ED for back pain. It is likely that artificial intelligence will help further automate the creation of useful metrics.

  • Reproducible: A foundation of the scientific process, a metric must be calibrated and reproducible, measuring the same thing consistently.

  • Valid: Credibly measures the desired attribute. For example, if a technologist enters the timestamp manually for each patient entering and leaving a scanner, errors may occur by delays in data entry or erroneous data entry into systems. The proportion of such erroneous data can make a metric for patient exam time invalid for QI or performance monitoring purposes.

  • Easy to explain: A metric’s ultimate purpose is to be consumed by a user. If a metric is too convoluted, however conceptually sound it may be, its message cannot be conveyed clearly enough to affect behavior and, ultimately, to drive meaningful change and improvement.

  • Actionable: A metric whose results cannot be acted upon is useless as it will not produce the desired change or improvement.

  • Enables identification of performance gaps and opportunities for improvement: If the ideal target performance of a quality metric is achieved by all in your practice, the metric is no longer a tool for QI. Rather, it may become a useful tool for marketing your practice’s services. Thus a useful metric should help identify processes, behaviors, or outcomes that should be improved.
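Returning to the MRI asset utilization example above, the following minimal sketch computes daily capacity utilization and mean exam length from in-room timestamps. The records, field layout, and operational-minutes figure are illustrative assumptions rather than the BWH implementation; in practice these values would be extracted from the RIS or EHR.

```python
from datetime import datetime

# Hypothetical exam records: (patient entered room, patient left room),
# as they might be extracted from the RIS/EHR for one scanner on one day.
exams = [
    (datetime(2017, 6, 1, 7, 5),  datetime(2017, 6, 1, 7, 50)),
    (datetime(2017, 6, 1, 8, 0),  datetime(2017, 6, 1, 8, 40)),
    (datetime(2017, 6, 1, 9, 10), datetime(2017, 6, 1, 10, 15)),
]

# Denominator: minutes the scanner was operational that day
# (scheduled hours minus documented downtime); illustrative numbers.
operational_minutes = 12 * 60 - 30

in_room_minutes = sum((left - entered).total_seconds() / 60 for entered, left in exams)
utilization = in_room_minutes / operational_minutes

# Complementary metric: mean exam length, to surface unnecessarily long protocols.
mean_exam_minutes = in_room_minutes / len(exams)

print(f"Capacity utilization: {utilization:.0%}")
print(f"Mean exam length: {mean_exam_minutes:.0f} min")
```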

Fig. 1 Weekly scorecard of capacity utilization for CT and MRI slots (target = 85%)

5 Examples of Imaging Quality Metrics

Quality measures for diagnostic radiology can be defined in each of the six IOM domains of quality. A recent report of the American College of Radiology’s Economics Committee on value-based payment models also provides a very useful framework for developing clinically meaningful metrics for your practice (Boland et al. 2017). As one example, Fig. 2 displays a “dashboard” of key quality, safety, and performance metrics for the Radiology Department at BWH, arrayed by IOM quality domain. The subsections that follow review exemplar imaging quality metrics in several domains.

Fig. 2 Radiology Department Quality Dashboard at Brigham and Women’s Hospital

5.1 Safety

Failure to promptly communicate critical imaging test results is not uncommon, and such delays are a major source of malpractice claims in radiology and a potential source of patient harm. Therefore, communication of critical results from diagnostic procedures between caregivers was named a 2011 Joint Commission national patient safety goal. BWH established an enterprise-wide Communication of Critical Test Results policy for critical imaging results (Khorasani 2009) and developed an automated system, Alert Notification of Critical Results (ANCR), designed to facilitate such communication (Lacson et al. 2014a, b, 2016; O’Connor et al. 2016). Nearly 50,000 critical result alerts are generated annually; >98% have closed-loop acknowledgment within the timeframe stipulated by BWH policy. The BWH dashboard tracks the daily percentage of critical results with closed-loop acknowledgment within BWH policy parameters, critical results (“alerts”) acknowledged over time, as well as the number of alerts that are overdue (unacknowledged beyond the timeframe stipulated by BWH policy). Target performance is >95% of critical results acknowledged within the policy timeframe (1 h for Level 1 or red alerts; 3 h for Level 2 or orange alerts; 15 days for Level 3 or yellow alerts) (Lacson et al. 2014b).
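A minimal sketch of how such a safety metric could be computed from alert records follows. The record layout and sample data are illustrative assumptions rather than the ANCR data model; only the policy timeframes quoted above are taken from the text.

```python
from datetime import datetime, timedelta

# Policy timeframes by alert level, per the policy described above.
DEADLINES = {"red": timedelta(hours=1),
             "orange": timedelta(hours=3),
             "yellow": timedelta(days=15)}

# Hypothetical alert records: (level, time created, time acknowledged or None).
alerts = [
    ("red",    datetime(2017, 6, 1, 9, 0),  datetime(2017, 6, 1, 9, 20)),
    ("orange", datetime(2017, 6, 1, 10, 0), datetime(2017, 6, 1, 14, 30)),
    ("yellow", datetime(2017, 6, 1, 11, 0), None),
]

now = datetime(2017, 6, 2, 8, 0)  # time the dashboard query runs

within_policy = sum(1 for level, created, acked in alerts
                    if acked is not None and acked - created <= DEADLINES[level])
overdue = sum(1 for level, created, acked in alerts
              if acked is None and now - created > DEADLINES[level])

print(f"Acknowledged within policy: {within_policy / len(alerts):.0%} (target > 95%)")
print(f"Overdue (unacknowledged) alerts: {overdue}")
```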

5.2 Timeliness

These metrics should be created and measured for various modalities and care settings. At BWH, timely ambulatory MRI access is defined as the time to the third available outpatient appointment. The third appointment is used because the next available appointment invariably overstates capacity, as one or two cancellations occur daily. This is also congruent with how the healthcare delivery system reports outpatient access to other specialists. Inpatient and ED MRI access is defined by the time from an examination request until the examination is performed (target performance: 90% of exams performed within 5 and 12 h, respectively). Clicking on the summary measure for ED or inpatient access on the dashboard’s home page (Fig. 2) links to a more detailed weekly scorecard (Fig. 3) depicting CT and MRI performance for the ED and inpatients. At most practices, this information resides in the Radiology Information System (RIS). At BWH, because of the full adoption of an electronic health record (EHR) and an embedded computerized provider order entry (CPOE) system for all imaging studies, the request time is taken from the CPOE database, and the examination completion time is taken from the RIS module of the EHR.
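The two access measures described above can be sketched as follows. The appointment slots, order and completion timestamps, and the 5 h threshold used in the example are illustrative assumptions; the actual targets and data sources are those described in the text.

```python
from datetime import date, datetime, timedelta

# --- Ambulatory access: days to the third available appointment slot ---
# Hypothetical open outpatient MRI slots from the scheduling system; using
# the third slot dampens the effect of day-to-day cancellations.
open_slots = sorted([date(2017, 6, 5), date(2017, 6, 5), date(2017, 6, 8), date(2017, 6, 9)])
today = date(2017, 6, 1)
third_available_days = (open_slots[2] - today).days

# --- Inpatient/ED access: % of exams completed within a target window ---
# Hypothetical (request time from CPOE, completion time from RIS) pairs;
# a 5 h window is used here purely for illustration.
exams = [
    (datetime(2017, 6, 1, 2, 0), datetime(2017, 6, 1, 5, 30)),
    (datetime(2017, 6, 1, 4, 0), datetime(2017, 6, 1, 11, 0)),
]
target = timedelta(hours=5)
within_target = sum(1 for requested, completed in exams if completed - requested <= target)

print(f"Days to third available outpatient slot: {third_available_days}")
print(f"Exams completed within target: {within_target / len(exams):.0%} (target 90%)")
```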

Fig. 3 Weekly scorecard of performance indicators for CT and MRI for emergency department and inpatients

Various timeliness-of-interpretation metrics can be constructed with data obtained from the RIS or report generation databases (e.g., speech recognition solutions), depending on the practice setting. These measures span the timeliness and efficiency domains of quality. Examination and report milestones can be designated as follows: (1) examination complete (all images obtained), (2) examination dictated by the radiologist, (3) report transcribed and ready for the radiologist’s signature, and (4) report signed and finalized by the radiologist. The time intervals between these milestones describe practice-level or individual radiologist performance for timeliness of reporting. For example, the time from completion to finalization depicts report turnaround time, while the time from transcription to finalization represents radiologist signature time (the transcribed report is a report in preliminary status, created by a trainee or, in a small and diminishing number of practices, by a transcriptionist who converts the voice file into text for the radiologist to edit and sign). With the use of speech recognition technology, the time from dictation to transcription may be irrelevant at many practices.
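A minimal sketch of computing these intervals from the milestones is shown below; the milestone records are hypothetical, and the 6 h threshold anticipates the signature-time target described in the next paragraph.

```python
from datetime import datetime

# Hypothetical report milestone timestamps (exam complete, preliminary/
# transcribed report, final signature) from the RIS / reporting databases.
reports = [
    {"complete": datetime(2017, 6, 1, 8, 0),
     "preliminary": datetime(2017, 6, 1, 9, 0),
     "final": datetime(2017, 6, 1, 11, 30)},
    {"complete": datetime(2017, 6, 1, 9, 0),
     "preliminary": datetime(2017, 6, 1, 13, 0),
     "final": datetime(2017, 6, 1, 16, 0)},
    {"complete": datetime(2017, 6, 1, 10, 0),
     "preliminary": datetime(2017, 6, 1, 11, 0),
     "final": datetime(2017, 6, 1, 12, 0)},
]

def hours(delta):
    return delta.total_seconds() / 3600

turnaround = [hours(r["final"] - r["complete"]) for r in reports]     # complete -> final
signature = [hours(r["final"] - r["preliminary"]) for r in reports]   # preliminary -> final

within_6h = sum(1 for s in signature if s <= 6) / len(signature)
print(f"Mean report turnaround time: {sum(turnaround) / len(turnaround):.1f} h")
print(f"Signature time within 6 h: {within_6h:.0%} (target 90%)")
```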

The BWH Radiology Dashboard tracks the hours from preliminary to final report (preliminary reports are generated by a trainee), as well as the hours from examination completion to final report. Target performance for signature time is 90% of reports within 6 h, 24 × 7 × 365, inclusive of all care settings (ED, inpatient, and outpatient). Clicking on the summary measure on the dashboard’s home page (Fig. 2) links to a more detailed analytics module displaying various complementary metrics, such as the proportion of reports generated by trainees in different radiology subspecialty divisions (Fig. 4a) or the number of imaging studies completed each hour of each day (averaged over a predefined time period) to enable optimization of the radiologist workforce for timely delivery of needed clinical care (Fig. 4b).

Fig. 4 (a) Scorecard of hours from preliminary to final (PtoF) report vs. % trainee-generated reports by subspecialty division, January–June 2017. (b) Scorecard of the number of imaging studies completed each hour of each day (averaged over January–June 2017)

5.3 Effectiveness

Measures in the domain of effectiveness assess whether services are provided based on scientific knowledge to those who could benefit and not provided to those not likely to benefit (avoiding overuse and waste). Numerous measures are possible to assess the appropriateness of the radiology examination ordered (“the right procedure”), e.g., the % of appropriate head CT orders among ED patients with head trauma. For most radiology practices, the determination of appropriateness can typically be made by comparing the order indications to appropriate use criteria, such as the American College of Radiology (ACR) Appropriateness Criteria® (American College of Radiology 2017), or to published evidence-based or local best practice guidelines. Such metrics for adherence to evidence can be constructed and used in QI initiatives. Multifaceted health information technology-enabled QI initiatives can improve adherence to evidence-based guidelines during the radiology test ordering process (Gupta et al. 2014; Raja et al. 2014; Ip et al. 2014), reaching 85% adherence to Wells criteria when ordering chest CT for pulmonary embolism in the ED and 96% adherence to American College of Physicians guidelines for use of MRI in primary care patients with low back pain. Similar multifaceted interventions have been shown to improve report signature time (Andriole et al. 2010), quality of multiparametric prostate MRIs (Silveira et al. 2015), and quality of rectal cancer staging MRI reports (Sahni et al. 2015). Tracking and improving appropriate use of imaging will be an important focus of QI initiatives and a potential target of federal regulations (Protecting Access to Medicare Act of 2014) as we transition from transactional healthcare financing to value-based payment systems.
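As a toy illustration of how an adherence metric might be automated (tying together the NLP point under “Easily obtained” above and the appropriateness measures just described), the sketch below screens free-text order indications for guideline-recognized red flags before lumbar spine MRI. The keyword list, sample indications, and the equation of a documented red flag with adherence are illustrative assumptions only, not a validated NLP approach or any specific guideline’s logic.

```python
# Toy keyword screen standing in for an NLP step: does the free-text
# indication for an ED lumbar spine MRI order document a red flag?
# The terms and sample orders below are illustrative only.
RED_FLAGS = ("cauda equina", "malignancy", "cancer", "fever",
             "trauma", "osteomyelitis", "neurologic deficit")

indications = [
    "low back pain x 2 weeks, no injury",
    "back pain with saddle anesthesia, r/o cauda equina",
    "chronic low back pain, worse with activity",
]

def has_red_flag(text):
    text = text.lower()
    return any(flag in text for flag in RED_FLAGS)

adherent = sum(1 for ind in indications if has_red_flag(ind))
print(f"Orders with a documented red flag: {adherent}/{len(indications)} "
      f"({adherent / len(indications):.0%})")
```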

Most practices include some assessment of interpretation accuracy as part of their quality assurance programs. More recently, information technology (IT) solutions have been developed and implemented at some practices. The ACR’s RADPEER® system is an example of such a program and can be integrated into a picture archiving and communication system (PACS). While interpreting a current examination, a radiologist can review the report of a prior examination and agree or disagree with the prior interpretation. The substance of the disagreement can also be graded. Using such software, one can create metrics at the practice or individual radiologist level, using peer review agreement or disagreement as a proxy for accuracy of interpretation.
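A minimal sketch of such a peer review metric follows. The score convention (1 = agreement, higher scores = graded discrepancies) mirrors common peer review scales, but the records and the treatment of scores here are illustrative assumptions, not the RADPEER specification.

```python
from collections import defaultdict

# Hypothetical peer review entries: (radiologist whose prior report was
# reviewed, score), where 1 denotes agreement and higher scores denote
# graded discrepancies.
reviews = [
    ("Radiologist A", 1), ("Radiologist A", 1), ("Radiologist A", 3),
    ("Radiologist B", 1), ("Radiologist B", 2),
]

tallies = defaultdict(lambda: {"agree": 0, "total": 0})
for radiologist, score in reviews:
    tallies[radiologist]["total"] += 1
    if score == 1:
        tallies[radiologist]["agree"] += 1

for radiologist, t in tallies.items():
    print(f"{radiologist}: {t['agree'] / t['total']:.0%} agreement "
          f"({t['total']} reviews)")
```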

5.4 Patient Centered

Although debate persists regarding survey content, timing of survey administration, and relevant risk adjustment methodologies, there is evidence that self-reported measures of patient experience are distinctive indicators of healthcare quality (Manary et al. 2013). Thus engaging patients and eliciting their feedback to motivate improvements have become major initiatives across the nation’s healthcare delivery systems. However, there are few reports of such initiatives in radiology. Surveys are typically delivered to patients on paper or electronically, using standard survey content to enable comparison between peer institutions. Results are presented as mean patient satisfaction scores and percentile rankings relative to peer institutions. Free-text comments from patient respondents can be categorized as negative, positive, or mixed. Given the multitude of imaging locations within some practices (distributed by physical location and modality, for example), it is possible to create a heat map based on the percentage of surveys with negative patient comments to identify targets for performance improvement (Fig. 5). Although it remains to be seen whether such an approach can improve patient satisfaction performance, experiments with various strategies to engage and train the workforce to improve patient interactions will be needed to shape optimal interventions to address this important quality domain.
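The table underlying such a heat map can be assembled as in the sketch below, which computes the percentage of negative comments by imaging center and comment category. The sample comments, centers, and categories are hypothetical; a plotting library would then color this table to produce a display like Fig. 5.

```python
import pandas as pd

# Hypothetical categorized survey comments: one row per free-text comment,
# labeled 1 if negative and 0 otherwise, with imaging center and category.
comments = pd.DataFrame({
    "center":   ["Main", "Main", "Satellite A", "Satellite A", "Satellite B", "Satellite B"],
    "category": ["Wait time", "Staff courtesy", "Wait time", "Facility", "Wait time", "Facility"],
    "negative": [1, 0, 1, 0, 0, 1],
})

# Percentage of comments that are negative, by category and center; this is
# the matrix a heat-map visualization would color.
heat = (comments.pivot_table(index="category", columns="center",
                             values="negative", aggfunc="mean") * 100).round(0)
print(heat)
```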

Fig. 5 Brigham and Women’s Hospital (BWH) patient experience heat map: % of patient comments that are negative, by category and by imaging center

6 Creation, Presentation, and Distribution of Quality Metrics

In a typical practice, multiple health IT systems are used in clinical operations. In radiology, such systems include the EHR, the RIS module, the report generation system (e.g., speech recognition system), and the PACS, among others. Each system has its own database, often with different definitions for similar data or milestones. Combining the data from these various databases can provide a very useful infrastructure for developing metrics. In reality, however, informatics challenges, as well as the need for human resources with appropriate skills, hamper such an approach in many organizations. Still, the most practical approach to quality metrics creation and reporting requires creating a new database (a data warehouse) populated by data from the disparate systems in use (Prevedello et al. 2008). Business intelligence refers to the set of tools needed to integrate, store, analyze, and present data from nonintegrated sources. Integration is a key process step to ensure that data from different sources are checked for consistency and subsequently converted into a unified format. This integration, referred to as the Extract, Transform, Load (ETL) process, can be used to extract data from each database to populate the data warehouse. The process can be enhanced to normalize data across the varied operational databases and help automate near-real-time population of the data warehouse.

The normalization of data is needed to minimize heterogeneous encoding of data across various databases. A simple example is validating whether a milestone called “exam begin” in one system is or is not the same as “exam start” in another operational system. Such attention to detail is critical when creating the data warehouse to help ensure that metrics are ultimately clinically relevant, accurate, and reproducible. Relational databases, where data are represented in numerous related tables, are very common but are not ideal for ad hoc analysis because additional data processing is needed to easily understand the results of queries. Another method of organizing the data is multidimensional data cubes accessed with Online Analytical Processing (OLAP) tools, which enable the user to better understand the results of ad hoc queries. Relational databases can thus be enhanced by connecting to OLAP tools to enable easily understood real-time queries to the data warehouse (Prevedello et al. 2010). Once the data warehouse is created, analytic and visualization tools can leverage the normalized data in the warehouse to create near-real-time views of desired metrics. Although definitions are somewhat arbitrary, a dashboard often refers to a near-real-time, online view of performance measures, analogous to a speedometer in an automobile. A scorecard, in distinction, refers to a static view of performance updated at some predetermined interval (e.g., weekly, monthly). Analytics tools, meanwhile, enable a user to create numerous custom queries of the data warehouse as needed. Figure 2 represents the current BWH quality “dashboard,” with key quality, safety, and performance indicators on the home page, some updated daily and others weekly or monthly.
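The normalization step described above can be sketched as a small ETL routine: extract milestone events from each operational system, map heterogeneous milestone names onto a single warehouse vocabulary, and load the result. The table layout, the milestone names beyond the “exam begin”/“exam start” example, and the use of an in-memory SQLite database are illustrative assumptions.

```python
import sqlite3

# Transform step: map each system's milestone names onto one warehouse vocabulary.
MILESTONE_MAP = {"exam begin": "exam_start", "exam start": "exam_start",
                 "exam end": "exam_complete", "exam complete": "exam_complete"}

def transform(rows):
    """Normalize milestone names; drop events that have no warehouse mapping."""
    for accession, milestone, timestamp in rows:
        normalized = MILESTONE_MAP.get(milestone.lower())
        if normalized:
            yield accession, normalized, timestamp

# Extract step: hard-coded rows standing in for queries against the RIS and EHR.
ris_rows = [("A1001", "exam begin", "2017-06-01 08:00"),
            ("A1001", "exam end",   "2017-06-01 08:45")]
ehr_rows = [("A1002", "exam start",    "2017-06-01 09:00"),
            ("A1002", "exam complete", "2017-06-01 09:40")]

# Load step: populate the (here, in-memory) data warehouse table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE events (accession TEXT, milestone TEXT, ts TEXT)")
warehouse.executemany("INSERT INTO events VALUES (?, ?, ?)",
                      list(transform(ris_rows)) + list(transform(ehr_rows)))
print(warehouse.execute(
    "SELECT milestone, COUNT(*) FROM events GROUP BY milestone").fetchall())
```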

7 Managing Change

Creating and publishing the results of quality metrics alone is highly unlikely to result in sustainable, meaningful improvement in your practice. Rather, performance improvement requires managing change in your practice, including leaders who can address technology, process, and people issues to create and sustain gains. Within such a change framework, quality measures are a necessary, but not sufficient, tool. Successful change management is a discipline in its own right and requires dedicated skills and resources (Khorasani 2004; Kotter 1995), a topic beyond the scope of this chapter.

Conclusion

National initiatives (Choosing Wisely—An Initiative of the ABIM Foundation [Internet] 2015; Medicare Access and CHIP Reauthorization Act of 2015 (MACRA) [Internet] 2015) are under way to improve quality, reduce waste, and transform the healthcare system from its current transactional payment model to one based on quality and value. Measuring, monitoring, and reporting radiology quality measures, combined with multifaceted change management initiatives to address information technology, care processes, and behaviors (people) of providers who order radiology studies, and those who perform and interpret them, can encourage and enable evidence-based practice, improve quality and patient experience of care, and reduce waste. Additional research will continue to inform best practices to develop, measure, and employ quality measures as part of meaningful interventions to improve the healthcare delivery system.