Introduction

Quality assessment and public reporting are powerful approaches to improving quality of care, whether in preventive services, acute surgical care, or chronic illness management. We can learn a lot from the 20 years of experience with coronary artery bypass grafting (CABG) surgery report cards (Hannan et al. 2012). It is also widely recognized that the chief factor in the success of the cardiac surgery report cards was the development of the New York State (and then national) coronary angioplasty reporting system to ensure collection of high-quality clinical data, including data elements not routinely available from administrative databases. Establishment of the Percutaneous Coronary Interventions Reporting System (PCIRS) in 1992 allowed for development and ongoing calibration of the cardiac surgery risk-adjusted mortality model, which in turn provides meaningful reports that local practices can use to compare their performance with similar groups and national benchmarks, without the fear of being penalized for treating high-risk patients. In the last 20 years, largely owing to the publicly available CABG Report Cards, the outcomes of CABG and other cardiac surgical procedures have improved dramatically (Mukamel and Mushlin 1998; Hannan et al. 1994, 1995).

In recent decades, as life expectancy has continued to grow around the world, the illness profile of highly populated countries in the Middle East and Asia has undergone an epidemiologic transition from predominantly infectious diseases to primarily chronic illness, vastly expanding the role and importance of surgical services. Surgical procedures that were previously extremely rare as well as “simple, ineffective, and relatively safe” became common, “complex, effective, and potentially dangerous” (Chantler 1999). On average, an American patient is expected to undergo about 10 surgical procedures in a lifetime, translating into an estimated 234 million operations annually worldwide (Weiser et al. 2008; Lee et al. 2008). While surgery can be extremely beneficial, often saving lives, surgical procedures are also associated with the risk of complications, infection, and death. Furthermore, surgical interventions are the key treatment modalities for many prevalent conditions including cancer, trauma, and obstetrics, positioning surgical quality and safety as one of the top public health concerns.

Public worry about, and focus on, medical outcomes are entirely warranted. The Institute of Medicine (IOM), in its landmark 1999 patient safety report “To Err is Human,” concluded that healthcare in the United States is not as safe as it should be. One of the report’s most revolutionary conclusions was that the majority of medical errors in the United States did not result from individual recklessness. More commonly, errors are caused by faulty systems, processes, and underlying conditions that lead people to either make mistakes or fail to prevent them. The report advocated reducing harm through system-based initiatives rather than increasing pressure on individual providers (Brown and Patterson 2001). A focus on outcomes is all the more important in surgery, where any small slip can quickly lead to disastrous consequences.

While the IOM report led to some system-level improvements, including expansion of health insurance coverage through PPACA in 2010, many problems remained or even worsened. In 2013, the IOM convened a committee of experts to examine the quality of cancer care in the United States and formulate recommendations for improvement. Delivering High-Quality Cancer Care: Charting a New Course for a System in Crisis presented the committee’s findings and recommendations. The committee concluded that the cancer care delivery system is in crisis due to a growing demand for cancer care, increasing treatment complexity (including surgical procedures), a shrinking workforce, and rising costs (Levit et al. 2013).

While it is widely recognized and accepted that assessment of surgical quality and outcomes should be a continuous process alongside care delivery, there is no clear consensus on how, when, and what outcomes should be measured. The problem is fueled by the fact that quality’s definition changes depending on the stakeholder’s perspective. For instance, surgeons evaluate each other’s quality based on technical skills, board certifications, and morbidity which is under their perceived direct control, characteristics that are often invisible and hence meaningless to patients. Instead, patients prefer clinicians with excellent communication skills who are always on time, regardless of whether or not the surgeon is a board-certified Fellow of the American College of Surgeons (FACS). Similar discrepancies and misalignments can be observed with respect to surgical outcomes. The vast majority of surgical oncologists will consider clean margins as synonymous with being “cured of cancer,” despite the fact that a patient may still have to endure many months of exhausting and toxic chemotherapy and radiation, temporary or permanent colostomy, fatigue, depression, and undesirable cosmetic changes. Successful quality improvement in clinical practice requires a common vision, multidisciplinary plans, and cooperation among all involved stakeholders, across the spectrum of all clinical providers including healthcare administrators, payers, social services, community organizations, and patient advocates.

Hurtado (Hurtado et al. 2001) defines quality as “the degree to which health services for individuals and populations increase the likelihood of desired health outcomes and are consistent with current professional knowledge,” but such broad definitions can have limited direct applications. A more useful definition of quality measures it over six domains: effectiveness, timely access, capacity, safety, patient centeredness, and equity (Leatherman and Sutherland 2003). Within each of these domains, it is possible to measure various elements, and so from this paradigm, a picture of a service’s quality of care can be outlined. However, such comprehensive assessment can be too burdensome and thus not practical for frequent monitoring and real-time evaluation.

In addition, there have been significant efforts to identify and assess important elements of care pathways, rather than individual procedures, which may lead to better outcomes and higher quality (Donabedian 1966; Hurtado et al. 2001; Maxwell 1984; Schiff and Rucker 2001; Sitzia and Wood 1997). Many countries have made significant progress with the implementation of national quality programs (Department of Health Office 1995; Department of Health 2000) including NSQIP (Agency for Healthcare Research and Quality 2009; Australian Commission on Safety and Quality in Healthcare 2008; American College of Surgeons 2014a), but further research is required to accurately and affordably improve assessments of surgical quality.

Stakeholders for Surgical Outcome Assessment

Many stakeholders actively participate in surgical quality initiatives. When these groups share a common purpose, progress can be made easily; often, however, their agendas do not align, making advancement difficult. Understanding the key stakeholders, their perspectives, and their roles is fundamental to quality improvement.

Medical societies and professional groups have long been the leaders in developing clinical practice guidelines, supporting provider accreditation, and both auditing and providing clinical training and continuing medical education activities. While heavily dominated by surgeons, the field of surgical outcome assessment also includes medical and radiation oncologists, imaging scientists, primary care providers, other advanced care partners, and allied health professionals. The relevant societies include, but are not limited to, the American College of Surgeons (ACS), the Commission on Cancer (CoC), the Consortium for Optimizing Surgical Treatment of Rectal Cancer (OSTRiCh), the American Society of Colon and Rectal Surgeons (ASCRS), the Society for Surgery of the Alimentary Tract (SSAT), the Society of Surgical Oncology, and others (American College of Surgeons 2014b, c; Optimizing the Surgical Treatment of Rectal Cancer 2014; Society for Surgery of the Alimentary Tract 2016; Society for Surgical Oncology 2014).

The provider stakeholder structure can take many forms and can work at every level of the healthcare system. For instance, the American College of Surgeons is an umbrella organization that pushes an overarching quality agenda. Its purpose is broad, as the organization spans multiple disciplines. While the ACS pursues lobbying initiatives in Congress, it has also recently employed benchmarking for hospitals, and now for individual providers, through data collection and risk adjustment. Other broad organizations, such as the National Comprehensive Cancer Network (NCCN), release specific consensus guidelines aimed at improving care by using the best available evidence. Societies with a narrower focus also contribute guidelines aimed at standardizing care for specific biologic systems, as demonstrated by the American Society of Colon and Rectal Surgeons, which releases guidelines on colon screening recommendations, prophylaxis, and other elements of cancer care. There are also disease-specific groups such as the Consortium for Optimizing Surgical Treatment of Rectal Cancer (OSTRiCh) and regional groups such as the Upstate New York Surgical Quality Initiative (UNYSQI), which currently focuses on improving the quality of colon resections. Quality improvement also occurs at the hospital and surgical division level, aimed at more specific interventions such as thromboprophylaxis protocols or surgical site infection prevention bundles that are more applicable to single providers or individual hospital systems. This hierarchical structure, however, is neither partitioned nor independent: there is extensive overlap between organizations, societies, disease-specific coalitions, and locoregional initiatives. Collaboration among all groups can propel initiatives; however, their recommendations are not always aligned with one another, and nuanced differences can create confusion and potentially hinder quality improvement efforts.

In the current post-PPACA environment, accountable care organizations are frequently the key drivers of clinical quality improvement. This is because, according to the Triple Aim principle developed by Don Berwick and the IHI, high-quality care is less expensive overall than poor care. Accountable Health Partners LLC (AHP) is one of the accountable care organizations in the Greater Rochester area. It was organized to create a partnership between URMC and community physicians and to enable them to succeed in the looming era of value-based contracts through creative initiatives that deliver high-quality care at a lower cost. The goals and interests of the AHP parallel those of PPACA: to engage specialty providers in the delivery of integrated care pathways; to establish efficient communication between care managers in medical homes, primary care, and specialist practices; to develop an integrated information system capable of monitoring quality of care measures; and to develop a payment mechanism to facilitate such engagement.

Other community-based stakeholders may include medical societies, public health and safety providers and agencies, social and aging services, and educational organizations. Stakeholders outside of the healthcare system and the not-for-profit world may include patient support groups and organizations, payers, large self-insured corporations, and business alliances that are also interested in improving overall community health at a lower cost (Blackburn 1983; Brownson et al. 1996; Group 1991; Fawcett et al. 1997; Goodman et al. 1995; Howell et al. 1998; Johnston et al. 1996; Mayer et al. 1998; Zapka et al. 1992; Roussos and Fawcett 2000). In Upstate New York, the Greater Rochester and Finger Lakes regions are well recognized for their long history of community-wide collaborations, including the University of Rochester Medical Center, the Finger Lakes Health Systems Agency (FLHSA), the Monroe County Medical Society (MCMS), the Rochester Business Alliance, the Rochester regional office of the American Cancer Society (2014), local payers (e.g., Excellus Blue Cross Blue Shield), accountable care organizations, and others. The FLHSA is an independent community health planning organization that works collaboratively with multi-stakeholder groups to improve healthcare quality and access and to eliminate healthcare disparities in the nine-county Finger Lakes region. Its mission is to bring community health issues into focus via data analysis and community engagement and to implement solutions through community collaboration and partnership. It has become the convener and facilitator of multi-stakeholder community initiatives to measure and improve health, healthcare, and cost of care. In the initial round of the CMMI Innovation Challenge, the FLHSA was awarded $26.6 million for the initiative “Transforming the Delivery of Primary Care: A Community Partnership.”

Excellus Blue Cross Blue Shield is a nonprofit health plan whose mission is to work collaboratively with local hospitals, doctors, employers, and community leaders to offer affordable healthcare products. For instance, Excellus administers its managed care products for Medicaid-eligible individuals through its partnering organization, Monroe Plan. Over the years, Excellus has partnered with many other community stakeholders (e.g., Kodak, MCMS, URMC) to lead several area-wide initiatives aimed at improving quality of care and population health and at reducing unnecessary variation in care and overuse of services.

Types of Data for Surgical Outcome Assessment

Existing Data Sources

Multiple types of medical data are available, and each has its own complexities: while answering important questions, each also leaves gaps that require further analysis from the alternative perspectives offered by other data sources. Typical datasets comprise hospital discharge, claims, registry, and survey data. Administrative data include hospital discharge or billing data as recorded and provided by the hospital itself. These datasets are highly dependent on local practices and can vary between institutions. They can be linked with other subject data to provide an in-depth chart review; however, they are limited to the cases performed at an individual hospital. Some states, including California and New York, have statewide discharge census data (Hannan et al. 1994, 1995, 2012; CA Society of Thoracic Surgeons 2014). These datasets provide billing data at a larger scale, including ICD-9 diagnosis codes, the ability to track hospital- and surgeon-level variation, longitudinal linkage of subjects within the state, and charges (in contrast to claims paid out) (Table 1).

Table 1 Types of data used to assess surgical outcomes, quality, and safety

Claims data are available at the national as well as the local level and include Medicare data, which can be linked to other datasets, and insurance claims (e.g., from Excellus Blue Cross Blue Shield, large self-insured corporations such as Xerox and Kodak, and data warehouses such as Thomson Reuters). Registry data can be quite detailed, albeit specific to the registry’s purpose. Examples of registry datasets include tumor registries like SEER, which can be linked to Medicare for more robust analysis; NCDB, which expands cancer data beyond the identified cancer centers included within SEER; and the National Surgical Quality Improvement Program (NSQIP) registry, which samples approximately 20% of all cases performed at participating hospitals. Other registries include those maintained by provider organizations (AMA, AHA). Finally, survey data can provide the patient perspective that is lacking from other large dataset analyses. Two prime examples are the Medicare Current Beneficiary Survey and the Hospital Consumer Assessment of Healthcare Providers and Systems (HCAHPS) Survey.

The first database for surgical outcomes was developed in NYS for cardiothoracic surgery (Hannan et al. 1990), leading to substantial quality improvement, facilitating development of the field of quality assessment and risk adjustment in medicine, and challenging the traditional approach of confidential reporting of adverse events. Based on its success, this was expanded to the STS National Database established in 1989. The STS states that “physicians are in the best position to measure clinical performance accurately and objectively” (Surgeons 2014), serving as a mandate for surgeon participation in these initiatives.

While cardiac surgery has long maintained such a database for tracking quality, the approach was later expanded nationally to help improve surgical outcomes more broadly. The National Surgical Quality Improvement Program (NSQIP) has been a major development within the surgical community, as it provides more detailed surgical information at a national level than was ever previously available. The main purpose of this program was to improve quality through benchmarking, whereby hospitals were given risk-adjusted data comparing their outcomes nationally with those of hospitals of similar size. Drawing on the depth of the data, numerous research studies have been conducted, describing surgical risk factors and comparing operative approaches. While this has been very useful for expanding our understanding of surgical quality as a whole, it was quickly realized that different operations need specific in-depth data in order to design meaningful quality improvement strategies. One approach to providing more detailed data has been the rollout of procedure-targeted variables, which institutions can add to the traditional NSQIP data at additional cost. This allows for a more detailed view of individual procedures. It was first made available with the release of the 2012 NSQIP dataset, and the impact remains to be seen. Targeted variables require consensus from experts, which can be difficult to obtain, and they can be limited in scope. This in-depth approach also requires more resources, limiting participation.

Another specialty-specific approach is the Organ Procurement and Transplantation Network (OPTN) database, aimed at monitoring transplant programs nationally. This is monitored and run by the US Department of Health and Human Services (National Cancer Institute 2014). The desire for more detailed data has led to a number of subspecialty datasets modeled after NSQIP. A few examples include a vascular surgery-specific dataset, the Vascular Quality Initiative (2014); Pediatric NSQIP; and an endocrine surgery-specific dataset, CESQIP (Collaborative Endocrine Surgery Quality Improvement Collective 2014). The methods of data collection vary: NSQIP employs a clinical nurse reviewer, whereas CESQIP does not yet have the same infrastructure and requires the surgeon or the surgeon’s designee to input data.

Another approach has been the creation of regional collaboratives, which requires a high level of collaboration among academic and nonteaching hospitals alike. Regional collaboratives will likely play a role in decreasing unnecessary variability and tracking quality at a more manageable, regional level, where it is easier to implement change than at the national level. Thus far, the regional approach has been seen in both Michigan and Central New York. The Central New York collaborative, called UNYSQI (Upstate New York Surgical Quality Initiative), has focused predominantly on colorectal surgery and more specifically on addressing the question of readmissions. NSQIP allows for 40 additional variables, and given this narrow limit, the questions to be addressed must be specific.

Participation in data collection programs is promoted because it meets criteria for both maintenance of certification (MOC) and the Physician Quality Reporting System (PQRS), which is part of CMS (EHealth University: Centers for Medicare & Medicaid Services 2014). The MOC section for maintaining credentials requires that providers evaluate their performance based upon specialty-established requirements, which must include national benchmarking. The MOC outlines six core competencies, one of which is practice-based learning and improvement. Part IV of the process for continuous learning includes practice performance assessment. For the American Board of Surgery, diplomates must participate in a national, regional, or local surgical outcome database or quality assessment program. The PQRS is the second specific incentive promoting the use of outcome data collection programs, as it uses both payment adjustments as penalties and incentive payments to ensure that providers report quality data (Table 2).

Table 2 Databases and outcomes used to assess surgical outcomes, quality, and safety

Data Quality

A common saying in large database analysis is “garbage in, garbage out,” and while there are methods to account for missing data, extensive missing data points remain a major limitation. One approach is to limit case inclusion to only those with a full set of data; however, this quickly reduces the number of patients included. This approach may be appropriate for some major data points such as sex, where it can be assumed that if subject sex is not recorded, then other variables are likely to be of questionable quality. Missing data may also be secondary to the data collection process. For instance, in NSQIP, preoperative laboratory values are gathered; however, there remains extensive variation in the timing of preoperative labs, as well as in whether a specific blood level is checked at all. One particular example is albumin level. Albumin level has demonstrated associations with nutrition and overall health status. Studies have shown associations with surgical outcomes as well; however, this laboratory value is not always checked preoperatively. In fact, the value may be checked preferentially in patients who may be at risk for malnutrition or who have other major comorbidities. This may bias results, leading to concern about its inclusion in multivariable analysis, even though it holds clinical value. Some suggest it should not be included at all, while others suggest it requires a more nuanced approach. Albumin, for instance, is reported as a continuous variable but can be transformed into a binary variable using a clinically meaningful cutoff, previously described as 3.5 g/dL. By assuming all missing values fall within the normal range, one creates a differential misclassification that underestimates the true effect, as some in this group may in fact have low albumin levels. Thus, if an association is observed, it is likely true, albeit an underestimate. The data can then still be useful for clinical decision making even though many values are in fact missing.

Another approach to the same problem is to assess whether subjects with missing values differ from the others with respect to the endpoint. This specifically tests whether there is differential misclassification. If there is, then one can treat the missing data group as its own categorical level and compare it with subjects who have data, without making any assumptions about the missing values. Another method is imputation of data. These methods are beyond the scope of this chapter, but briefly, they involve a separate analysis that predicts the specific data point from the subject’s other characteristics.
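The two strategies described above, coding missing albumin as normal versus keeping it as its own category, can be sketched as follows. This is a minimal illustration only: the records are invented, the function names are hypothetical, and treating values below the 3.5 g/dL cutoff as "low" is an assumption made for the example.

```python
from collections import Counter

ALBUMIN_CUTOFF_G_DL = 3.5  # cutoff cited in the text; "below cutoff = low" is assumed


def albumin_category(value, missing_as_normal=False):
    """Map an albumin value in g/dL (or None if not measured) to a category."""
    if value is None:
        return "normal" if missing_as_normal else "missing"
    return "low" if value < ALBUMIN_CUTOFF_G_DL else "normal"


def complication_rates(records, missing_as_normal=False):
    """Return {category: complication rate} for (albumin, had_complication) pairs."""
    totals, events = Counter(), Counter()
    for albumin, had_complication in records:
        category = albumin_category(albumin, missing_as_normal)
        totals[category] += 1
        events[category] += int(had_complication)
    return {category: events[category] / totals[category] for category in totals}


# Invented records: (albumin in g/dL or None, complication yes/no)
records = [
    (2.8, True), (3.0, True), (3.1, False), (2.9, True),
    (4.0, False), (4.2, False), (3.9, False), (4.1, True),
    (None, True), (None, False), (None, True), (None, False),
]

# Missing kept as its own level: no assumption about the unmeasured values.
three_level = complication_rates(records)
# Missing assumed normal: the "normal" group is diluted by unmeasured patients.
two_level = complication_rates(records, missing_as_normal=True)

for label, rates in (("three-level", three_level), ("two-level", two_level)):
    print(label, {k: round(v, 3) for k, v in sorted(rates.items())})
```

In this invented sample the gap between the low and normal groups narrows once missing values are folded into the normal group, which is the underestimation the text describes; real data need not behave the same way.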

Missing data of the first type (e.g., missing sex) can be avoided through auditing processes. Many data collection programs employ auditing to ensure data quality, and sites are excluded if they cannot conform to predetermined standards.

Another major limitation of all large datasets is that variable definitions change over time. While this process is necessary to some extent, as clinically meaningful definitions may change with time, it can drastically limit the number of subjects available for analysis of a given endpoint. One such example is postoperative transfusion within NSQIP. Initially, the number of units transfused intraoperatively was recorded, and postoperative transfusion was defined as greater than 4 units. Researchers were then able to describe this endpoint as major postoperative bleeding and to describe specifically the extent of intraoperative blood loss. This changed in 2011, when the number of intraoperative units of blood was removed altogether and postoperative transfusion was redefined as 2 units or more of packed red blood cells. The danger lies in merging datasets across years without understanding these changes. First, if the change is ignored, researchers may erroneously code the missing intraoperative transfusions as no transfusion given and draw conclusions that will clearly be mistaken. Second, the postoperative transfusion variable in the newer dataset has a different clinical meaning. Two units of blood can be given merely to optimize a patient with a low hematocrit and comorbidities, and so no longer necessarily represent a postoperative bleeding event. Given these changes, the two transfusion variables are not comparable over time, which limits analysis.
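One way to guard against these pitfalls when pooling years is to tag every record with the definition in force and to keep uncollected fields as missing rather than zero. The sketch below assumes a hypothetical record layout (the keys `year`, `intraop_units`, and `postop_transfusion` are invented for illustration and are not actual NSQIP field names) and assumes that records from 2011 onward follow the newer definition.

```python
DEFINITION_CHANGE_YEAR = 2011  # year of the change described in the text


def harmonize_transfusion(record):
    """Return a copy of the record annotated with its transfusion definition.

    Hypothetical keys: 'year', 'intraop_units', 'postop_transfusion'.
    """
    harmonized = dict(record)
    if record["year"] >= DEFINITION_CHANGE_YEAR:
        harmonized["transfusion_definition"] = "2 or more units"
        # Intraoperative units were no longer collected: keep them missing
        # instead of letting a default of 0 read as "no transfusion given".
        harmonized["intraop_units"] = None
    else:
        harmonized["transfusion_definition"] = "more than 4 units"
    return harmonized


def split_by_definition(records):
    """Group harmonized records so each definition is analyzed separately."""
    groups = {}
    for record in map(harmonize_transfusion, records):
        groups.setdefault(record["transfusion_definition"], []).append(record)
    return groups


# Invented records spanning the definition change
records = [
    {"year": 2009, "intraop_units": 2, "postop_transfusion": False},
    {"year": 2010, "intraop_units": 0, "postop_transfusion": True},
    {"year": 2012, "intraop_units": 0, "postop_transfusion": True},
]

groups = split_by_definition(records)
for definition, members in groups.items():
    print(definition, [m["year"] for m in members])
```

Analyzing each group on its own avoids treating a 2-unit transfusion after 2011 as equivalent to the earlier, more stringent endpoint.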

Changes in Surgical Procedures and Practices Over Time

Other issues regarding data collection include the constantly evolving process of case definition and even the addition of new surgical procedures over time. For instance, the change from ICD-9 to ICD-10 is looming, and how this will impact data collection remains to be seen. The nuanced changes between the two systems will likely impact some areas more than others, and a deep understanding of these nuances will be necessary to compare cases between these two time periods. The last major ICD coding change was in 1975, and the medical arena has changed dramatically in that time including the advent of the electronic record.

Some databases include only ICD-9 diagnosis coding, even though numerous different procedures may be relevant for treatment of a given diagnosis. For instance, appendicitis can be treated by an open approach, making an incision in the right lower quadrant, or by laparoscopic techniques, using three small incisions and a camera for appendix extraction. Where only ICD-9 codes are available, such datasets lack discrimination, preventing comparison of operative approaches.

The introduction of laparoscopic procedures is one example of how surgical procedures change over time; while the first report of laparoscopic appendectomy was published in 1981, this practice did not become ubiquitous until the turn of the century and now represents the preferred technique (Korndorffer et al. 2010).

These changes can significantly impact research, as each procedure has specific complications; however, the available data may be limited because such changes are not captured by the coding systems. For instance, CPT coding does not capture robotic techniques, lumping them with laparoscopic procedures. This has limited observational studies comparing or even tracking robotics usage over the past decade. Another example of the limits of CPT coding is the absence of codes for transanal endoscopic microsurgery (TEMS), used for distal rectal cancer resections when rectal wall invasion is sufficiently minimal. This minimally invasive approach spares the rectum and the sphincter, allowing for essentially full rectal function in low-grade tumors; however, these cases are lumped in with other rectal cancer resections, which often include complete rectal resection with end colostomy or loss of the sphincter. The differences in quality of life and even in the types of complications are huge. While this clearly makes it impossible to perform observational studies on TEMS within large datasets, it also adds variation and error to any assumptions about outcomes after low rectal cancer resections. There are some ways to exclude TEMS from a dataset, by selecting cases where the tumor stage was sufficiently high to make TEMS contraindicated; however, this does not help elucidate the specific advantages of TEMS.

Another example where CPT coding fails is in differentiating between some specific laparoscopic approaches. Although open inguinal hernia repair has been a bread-and-butter surgical operation, within the last decade surgeons have increasingly applied their laparoscopic skills to hernia repair. There are two available laparoscopic approaches: totally extraperitoneal (TEP) and transabdominal preperitoneal (TAPP). The TAPP approach enters the abdominal cavity in standard laparoscopic fashion, repairing the hernia from the inside using tacks, whereas the TEP approach enters a space above the peritoneum, placing the mesh between layers, and usually does not require tacks to keep the mesh in place. The two approaches may have different risk profiles and long-term sequelae; however, observational evaluation is limited because CPT codes do not differentiate between them.

There also remain many processes that are not coded in most databases. These include many data points that may impact outcomes, such as patient follow-up strategies, staffing, utilization of trainees, and even postdischarge medications. As large datasets evolve, there may be opportunities to expand the data as research questions arise. UNYSQI is one example in which, through the ACS-NSQIP, institutions can track their own data points to help answer specific questions.

The surgical field is constantly progressing, not only with new procedures but also with the introduction of entirely new specialties. For example, endocrine surgery is becoming a major surgical subspecialty; although it is not yet a board-certified specialty, the presence of these more specialized surgeons may impact outcomes. Other major changes in surgery that have not been included in current databases may also impact outcomes. For example, resident work hour restrictions by the ACGME continue to change and become increasingly strict. Previously, it was not unheard of for surgical residents to work 100–120 h weekly, whereas work hours are now capped at 80 per week and interns are prevented from taking 24-h call. These changes have drastically altered patient coverage and in some cases have required supplemental staffing through advanced practice providers or moonlighters. These changes have not been tracked, and it is unclear how the changing workforce structure has impacted outcomes. Although controversial, this question holds some urgency as more and more restrictions are being implemented. In fact, a new randomized controlled trial will observe how these restrictions impact care; one arm of the trial will require surgical residents to follow the new regulations, while the other will function without work hour restrictions. However, such data are largely absent from current datasets.

Other major changes include the advent of telemedicine. With robotics, even remote operations are now possible: the first transatlantic cholecystectomy, the so-called “Lindbergh” operation, was performed in 2001 (Marescaux et al. 2002). Such operations became possible only through improvements in electronic communication that decreased the lag time sufficiently.

The role that virtual communication will have in the future remains unclear, but its use will likely increase in the coming decades. Currently, such approaches are not tracked; however, including such practices in large healthcare databases may be useful for understanding their uptake and impact on clinical care. Other adjunct advances also impact surgical care, although they are largely unappreciated, such as major advances in, and wider availability of, high-quality imaging. Whereas 20 years ago computed tomography was limited, it is now ubiquitous, and high-quality scans are available within minutes. These advances change diagnostic paradigms and the quality of surgical decision making, yet the availability of such high-quality CT scans is not recorded in databases, even those that track whether CT scanning was done at all. Other technological advances include intraoperative imaging through 3D laparoscopy and the development of new instruments that make previously unthinkable operative approaches possible, such as single-incision surgery or natural orifice transluminal endoscopic surgery, which allows surgeons to perform cholecystectomy through the vagina.

There are many other changes to the structure of healthcare that may drastically impact outcomes, including advances in patient monitoring or quality of care in the intensive care unit. While it would be onerous to include all of these changes in any given dataset, it is important to remember the many forces that impact outcomes. A projectile in physics is subject to many forces that alter its course, such as friction, rotation, and wind, yet many of these can be ignored and an estimated course obtained from the major factors of velocity and gravity alone. In the same way, the overall picture of surgical outcomes can be estimated from a few major factors; however, keeping the other forces in mind remains important, as they may prove to be key forces in surgical care.

Individual Surgeon Variation (Preferences, Techniques, and Skills)

Even if there is a single code and an agreed-upon surgical treatment or practice, its implementation can vary considerably. Laparoscopic cholecystectomy, for instance, one of the most commonly performed operations, has considerable variation in the way the procedure itself is performed. The absence of this precise detail is an obstacle to standardizing procedures nationally. There are statistical techniques for controlling for variation at the surgeon level, specifically hierarchical modeling with random effects. Hierarchical random effect modeling also addresses an issue that most multivariable models ignore: independence assumptions are violated in healthcare studies because patients are treated by surgeons within hospitals, which have been shown to impact quality. Surgeon volume is one such factor: the volume-outcome relationship was initially noted in 1979, and complex procedures such as pancreatectomy and coronary artery bypass graft have better outcomes when performed by higher-volume surgeons (Solomon et al. 2002; Birkmeyer et al. 2002; Katz et al. 2004). This may in part reflect standardization of technique, evidence-based practice, and skill, which may be a function of practice. Teasing out how outcomes depend on technique variation is virtually impossible in current large datasets, although one could argue that this variation might explain quality to a much greater degree than even risk adjustment based on patient factors.
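
As an illustration of the hierarchical approach, a random-intercept logistic model for a binary outcome can be written as follows. This is a minimal sketch rather than a model taken from any of the cited studies; the notation (patient i, surgeon j, hospital k, covariate vector x) is assumed here for illustration only.

```latex
\operatorname{logit}\,\Pr(Y_{ijk} = 1)
  = \beta_0 + \boldsymbol{\beta}^{\top}\mathbf{x}_{ijk} + u_{jk} + v_k,
\qquad
u_{jk} \sim N(0, \sigma_u^2), \quad
v_k \sim N(0, \sigma_v^2)
```

Here Y_{ijk} is the outcome (e.g., a postoperative complication) for patient i treated by surgeon j in hospital k, x_{ijk} holds the patient-level risk factors, and u_{jk} and v_k are surgeon and hospital random intercepts. Because patients of the same surgeon or hospital share an intercept, their outcomes are allowed to be correlated, which is how the model relaxes the independence assumption; the variances indicate how much of the unexplained variation sits at each level.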

Timing of Complications

Even if a reasonable outcome is chosen, it is essential to understand the interplay of that complication with the hospital course. Incorrect assumptions about this can lead to incorrect answers. Recent studies on readmissions have suffered from major errors when they attempt to include complications as a risk factor for readmission (Aquina et al. 2014b). Some studies suggest that complications are the biggest risk factor for readmission, and while this may seem reasonable, they often confuse the reason the patient was readmitted with a risk factor for readmission. This has had serious consequences, as inclusion of such reasons for readmission in the model can make all other risk factors no longer statistically significant; in one model, the authors came to the incorrect conclusion that the only risk factor for readmission was postoperative complications, although subsequent studies have demonstrated this to be false. This can be avoided by using complication timing to separate complications that occur during the inpatient stay from those that occur after discharge. While predischarge complications have been associated with readmissions, the effect estimates have been much lower than previously described when all complications are considered together.
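
The timing rule can be made concrete with a short sketch. The function names, field names, and the convention that a complication diagnosed on the discharge date counts as predischarge are all assumptions made for illustration; they do not come from the cited studies.

```python
from datetime import date


def split_complications_by_timing(discharge_date, complications):
    """Split (name, diagnosis_date) pairs into pre- and postdischarge lists.

    Assumption: a complication diagnosed on the discharge date itself
    is treated as predischarge.
    """
    predischarge = [name for name, dx in complications if dx <= discharge_date]
    postdischarge = [name for name, dx in complications if dx > discharge_date]
    return predischarge, postdischarge


def readmission_model_row(discharge_date, complications, readmitted):
    """Build one analysis row with separate timing-specific indicators."""
    pre, post = split_complications_by_timing(discharge_date, complications)
    return {
        # Candidate risk factor: known before the patient leaves the hospital.
        "predischarge_complication": int(bool(pre)),
        # Often the reason for readmission itself; kept separate so it is
        # not entered as a predictor alongside true risk factors.
        "postdischarge_complication": int(bool(post)),
        "readmitted": int(readmitted),
    }


# Hypothetical patient: ileus during the stay, wound infection after discharge.
example = readmission_model_row(
    date(2014, 3, 10),
    [("ileus", date(2014, 3, 7)),
     ("surgical site infection", date(2014, 3, 18))],
    readmitted=True,
)
```

Keeping the two indicators separate lets an analyst fit the readmission model with predischarge complications only and report postdischarge complications descriptively as reasons for readmission.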

Limited Information on Socioeconomic Drivers of Health

Analyses of patterns and outcomes of care require an assessment of the complex relationships among patient characteristics, treatments, and outcomes. Furthermore, according to the Andersen healthcare utilization model (Aday and Andersen 1974), usage of health services (including inpatient care, outpatient physician visits, imaging, etc.) is determined by three dynamics: predisposing factors, enabling factors, and need. Predisposing factors can be characteristics such as race, age, and health beliefs. For instance, an individual who believes surgery is an effective treatment for cancer is more likely to seek surgical care. Examples of enabling factors could be familial support, access to health insurance, one’s community, etc. Need represents both perceived and actual need for healthcare services. To conduct and interpret outcome analyses properly, researchers should both understand the strengths and limitations of the primary data sources from which these characteristics are derived and have a working knowledge of the strategies used to translate primary data into the categories available in public databases. For instance, SEER-Medicare documents details on individual cancer diagnoses, demographics (age, gender, race), Medicare eligibility and program enrollment by month, and aggregate measures of the individual’s “neighborhood” (e.g., average income and years of education presented at the zip-code and census-tract level) as determined through a linkage to recent US Census data. However, such area-level census data do not allow for assessment of differences among individuals within those zip-code areas.

Many analyses of large databases focus on the patient’s race or ethnicity as a confounder, a predictor of outcome, or a marker for other unobserved factors (e.g., disadvantaged geographic area or low health literacy). Information on race is generally available, while information on ethnicity is often missing or inappropriately coded. While most US data surveys allow only one category for Hispanic ethnicity (yes/no), the NCDB classifies cancer patients into seven categories (Mexican, Cuban, Puerto-Rican, Dominican, South/Central American, Hispanic by name, and Other). In our analysis of treatment patterns for Hispanic cancer patients in NCDB, we demonstrated persistent disparities in receipt of guideline-recommended care. Care in the Hispanic group as a whole was not significantly different from that in the non-Hispanic group, while individual subgroups demonstrated significant differences, highlighting a critical need to acknowledge Hispanic subgroups in outcome research.

Need for Linked Data

Surgical safety and quality are multifactorial issues with more than one risk factor and hence multiple potential mechanisms for improvement. For instance, reduction in postsurgical complications could be partially achieved by more efficient patient education about early symptoms, improvement in surgeon’s skills, changes in nursing and hospital practices, use of surgical visiting nurse services, and other interventions. Similarly, one quality improvement intervention may have impact on multiple stakeholders including patients and their caregivers, clinic personnel, and health insurance. Hence, a comprehensive evaluation may require information about all involved parties. Such data are rarely available in one dataset, and therefore, many surgical outcomes and quality improvement studies are using multiple merged sources of data.

The SEER-Medicare data are a product of a linkage between two large population-based datasets: the Surveillance, Epidemiology, and End Results (SEER) Program of the National Cancer Institute and beneficiaries’ healthcare claims data collected by the Centers for Medicare and Medicaid Services for billing and enrollment purposes. The linked dataset includes Medicare beneficiaries with cancer from selected states participating in the SEER Program, with the unit of observation being one healthcare utilization event. This includes all Medicare-covered healthcare services from the time of a person’s Medicare eligibility (before or after cancer diagnosis) until their death. Because of the complex sampling design, the number of included variables, and specific data reporting practices for tumor characteristics and services utilization, the investigator considering a SEER-Medicare-based study or a proposal should spend time understanding SEER-Medicare data limitations (National Institute of Health 2014) and learning about data layout and coding (manuals and training are available at the NCI and other cancer research organizations).

The Medicare Current Beneficiary Survey (MCBS) is a longitudinal survey of a nationally representative sample of the Medicare population. The MCBS contains data about sociodemographics, health and medical history, healthcare expenditures, and sources of payment for all services for a randomly selected representative sample of Medicare beneficiaries (Centers for Medicare and Medicaid Services 2014). For every calendar year, there are two separate MCBS data files released: Access to Care and Cost and Use files which can be ordered directly from the CMS with assistance from the Research Data Assistance Center at the University of Minnesota (Research Data Assistance Center 2014).

The MCBS Access to Care file contains information on beneficiaries’ healthcare access, healthcare satisfaction, and their usual sources of care (Goss et al. 2013; Research Data Assistance Center 2014). The MCBS Cost and Use file offers a complete summary of all healthcare expenditure and source of payment data on all healthcare services, including expenditures not covered by CMS (Research Data Assistance Center 2015). The information collected in the surveys is combined with the claims data on the use and cost of services. Medicare claims data include information on the utilization and cost of a broad range of services, including inpatient hospitalizations, outpatient hospital care, skilled nursing home services, and other medical services. Because the Cost and Use (C&U) file requires that accurate payment information be collected, summarized, and validated, its release is usually delayed by 2 years compared with the MCBS Access to Care (AC) file.

In addition to publicly available merged datasets, individual investigators can create their own aggregated databases by linking together information from multiple sources and combining existing data with prospectively collected and patient-reported information. Examples of such studies include a NSQIP-based evaluation of preoperative use of statins and whether it is associated with decreased postoperative major noncardiac complications in noncardiac procedures (Iannuzzi et al. 2013c), a study of recipients of abdominal solid organ transplant (ASOT) using additional data from patient medical records (Sharma et al. 2011), and a retrospective review of the data from medical records of patients diagnosed with hepatocellular carcinoma compared to patients in the California Cancer Registry (CCR) (Atla et al. 2012).
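
A minimal sketch of how an investigator-built linkage might look is given below. It assumes a deterministic match on a shared identifier; the key name `patient_id`, the record fields, and the example rows are hypothetical and are not drawn from any of the datasets named above. Real linkages often need probabilistic matching and must comply with each data owner's privacy rules.

```python
def link_records(registry, claims, key="patient_id"):
    """Attach all matching claims to each registry row (one-to-many).

    Returns (linked, unmatched): registry rows that found at least one
    claim, and registry rows that found none.
    """
    # Index claims by the linkage key so each lookup is a dictionary access.
    claims_by_key = {}
    for claim in claims:
        claims_by_key.setdefault(claim[key], []).append(claim)

    linked, unmatched = [], []
    for row in registry:
        matches = claims_by_key.get(row[key])
        if matches:
            linked.append({**row, "claims": matches})
        else:
            unmatched.append(row)
    return linked, unmatched


# Hypothetical example rows.
registry = [
    {"patient_id": "A1", "cancer_site": "colon"},
    {"patient_id": "B2", "cancer_site": "rectum"},
]
claims = [
    {"patient_id": "A1", "service": "colectomy"},
    {"patient_id": "A1", "service": "CT scan"},
    {"patient_id": "C3", "service": "office visit"},
]
linked, unmatched = link_records(registry, claims)
```

Reporting the unmatched rows alongside the linked ones is a deliberate choice: the proportion of records that fail to link is itself a data quality measure and a potential source of selection bias.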

Data Management and Big Data

More and more data are being collected for different purposes and are available to be linked together, including electronic memberships, online purchasing and consumer behavior records, electronic transactions, and others. The datasets become so large and complex that they are difficult to manage using traditional resources, and organizations have to increase their resources in order to be able to manage them. Before we knew what to do with it, we entered a new era of big data. Big data is high-volume, high-velocity, and/or high-variety information assets that require new forms of processing to enable enhanced decision making, insight discovery, and process optimization (Gartner 2013). The challenges of working with big data include analysis, capture, curation, search, sharing, storage, transfer, visualization, and privacy violations, among many others. Innovative solutions such as cloud computing chip away at some challenges while remaining limited by others. For instance, outside cloud computing services such as Amazon EC2, Box, Dropbox, Internet2, etc. provide storage or processing capabilities, but without internal infrastructure or agreements with the outside services, there is the potential for privacy violations. Yet, just as with administrative data several decades earlier, the opportunities provided by big data potentially outweigh the risks and, in time, data-driven analytics may become as routine as EMR and digital image sharing.

Structure-Process-Outcome Assessment in Surgery

Theoretical Framework of Quality Assessment in Healthcare

According to Donabedian (1966), if there is evidence that good structure leads to appropriate processes which in turn result in good outcomes, quality of healthcare intervention could be measured in terms of either structures (S), processes (P), or outcomes (O) (Fig. 1).

Fig. 1 Donabedian approach for evaluating outcomes

These indicators can be measured using readily available electronic data from the organizational health information systems, data collected by cancer trackers, and other regional data systems, such as the Rochester RHIO. It is important to work closely with each hospital’s clinical quality assessment team to avoid redundancy with other data collection, quality assessment, and reporting initiatives (e.g., Hospital Scorecard, the Clinical Service Scorecard, the Management Plan Tracking Reports, SCIP, and HCAHPS) (Hospital Consumer Assessment of Healthcare Providers and Systems 2014; The Joint Commission Core Measure Sets 2014a). Additional financial and pre- and postadmission cost and utilization information about patients can be obtained from CMS claims data for Medicare fee-for-service beneficiaries and Excellus BCBS claims for commercially insured and Medicare HMO patients (the Medicare Health Insurance Claim (HIC) number or health insurance ID will be abstracted from the patients’ medical charts).

The bundles of care for surgical patients can be defined by multidisciplinary care teams for specific diagnoses and surgical service lines. A care bundle identifies a set of key interventions from evidence-based guidelines that, when implemented, are expected to improve patient outcomes (Institute for Healthcare Improvement 2006). The aim of care bundles is to change patient care processes and thereby encourage guideline compliance in a number of clinical settings (Brown et al. 2002; Burger and Resar 2006; Pronovost et al. 2006). Using regional or national healthcare utilization and expenditure data with Medicare or private plan reimbursement schedule, clinicians and hospital administrators can estimate annual cost of care for surgical patients receiving various care bundles, by disease stage. These bundled cost estimates can be used internally (e.g., for budgeting projections or to calculate return on investment for new programs and interventions) or externally, to provide a foundation for contract negotiations with payers, regional healthcare systems, and accountable care organizations (Froimson et al. 2013; Ugiliweneza et al. 2014).
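
The cost estimation described above amounts to multiplying expected utilization by a payment rate and summing over the services in a bundle. The sketch below illustrates that arithmetic only; the service names, utilization counts, and dollar amounts are placeholders invented for the example and do not reflect any actual Medicare or private plan fee schedule.

```python
def bundle_cost(utilization, fee_schedule):
    """Expected annual cost per patient for one care bundle.

    utilization:  {service: expected units per patient-year}
    fee_schedule: {service: payment per unit}
    """
    missing = set(utilization) - set(fee_schedule)
    if missing:
        # Fail loudly rather than silently pricing a service at zero.
        raise KeyError(f"no payment rate for: {sorted(missing)}")
    return sum(units * fee_schedule[service]
               for service, units in utilization.items())


# Placeholder payment rates (not real reimbursement amounts).
fee_schedule = {"office_visit": 100.0, "ct_scan": 300.0, "colonoscopy": 800.0}

# Placeholder bundles defined by disease stage.
bundles_by_stage = {
    "stage I": {"office_visit": 2, "colonoscopy": 1},
    "stage III": {"office_visit": 4, "ct_scan": 2, "colonoscopy": 1},
}

annual_cost_by_stage = {
    stage: bundle_cost(utilization, fee_schedule)
    for stage, utilization in bundles_by_stage.items()
}
```

Swapping in a different fee schedule (for example, a commercial payer's rates) while holding the bundles fixed gives the payer-specific estimates that could inform budgeting or contract negotiations.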

While it is tempting to seek out a single perfect metric of surgical quality, anybody familiar with the complexity and variation in patient risks and the delivery of surgical care would agree that such a metric could not possibly exist. More suitable would be a multidimensional measure similar to the six-domain definition of healthcare quality suggested by the World Health Organization (WHO). These dimensions require that healthcare be:

  • Effective: delivering healthcare that is adherent to an evidence base and results in improved health outcomes for individuals and communities

    Example: each cancer case is reviewed by a specialty multidisciplinary team at least once before the final decision about treatment is reached.

  • Efficient: delivering healthcare in a manner that maximizes resource use and avoids waste

    Example: avoid unnecessary imaging for colorectal cancer (CRC) patients such as PET scans or multiple CT scans.

  • Accessible: delivering healthcare that is timely, geographically reasonable, and provided in a setting where skills and resources are appropriate to the medical need

    Example: providing a hub-and-spoke model for chemotherapy delivery for CRC patients residing far from major cancer centers

  • Acceptable/patient centered: delivering healthcare which takes into account the preferences and aspirations of individual service users and the cultures of their communities

    Example: offering palliative care to all patients with advanced cancer

  • Equitable: delivering healthcare that does not vary in quality because of personal characteristics such as gender, race, ethnicity, geographical location, or socioeconomic status

    Example: providing financial assistance to low-income cancer patients assuring that out-of-pocket expenses do not represent a barrier for adequate treatment

  • Safe: delivering healthcare that minimizes risks and harm to service users

    Example: following WHO surgical checklist to minimize the risk of surgical complications and never events

As illustrated by the examples above, this definition of healthcare quality provides the link between the organization of care, care processes, surgical quality, and outcomes. Hence, it enables all participating stakeholders (e.g., clinicians, researchers, payers, and hospital administrators) to rely on Donabedian’s framework when assessing quality of surgical services. According to Donabedian, if there is evidence that good structure leads to appropriate processes which in turn result in good outcomes, quality of healthcare intervention could be measured based on the presence of appropriate structures (S) or processes (P).

Below we provide several examples of evidence-based measures of quality in surgical care.

Structure

Lord Darzi, international expert on quality and innovation in cancer care, world-leading colorectal surgeon, the former Minister of Health in the United Kingdom, and the lead author of the UK Darzi Plan to redesign care delivery, encouraged healthcare agencies to “localize care where possible, and centralize services where necessary” for efficacy and safety. This implies that routine healthcare, like cancer survivorship services, should take place as close to home as possible, while more complex care, like active cancer treatment, should be centralized to ensure it is carried out by the most skilled professionals with cutting-edge equipment and high volume/experience.

There exist several validated care delivery models to improve access to specialty care for patients with complex chronic disease living in underserved or remote communities (for instance, using videoconferencing technology for enhanced care coordination). There is a large body of literature demonstrating that standardized care pathways, use of multidisciplinary teams (MDTs), resident involvement (Iannuzzi et al. 2013a, b), availability of specialized providers (e.g., board-certified surgical specialists, surgical nurses, and PA) and services (e.g., stoma care, wound care, surgical ICU), and receiving care in a high-volume center of excellence are associated with better outcomes (Reames et al. 2014; Howell et al. 2014).

Evidence that hospital volume influences outcomes has been verified in nearly every major type of surgery (Begg et al. 1998; Birkmeyer et al. 2002; Katz et al. 2004). This body of work highlighted important and previously unrecognized variations in hospital performance and ignited efforts to improve surgical quality among poorly performing hospitals. In an effort to reduce these variations among hospitals, new health policy and quality improvement initiatives, such as public reporting, pay-for-performance, and surgical checklists, have been implemented to promote best practice and improve standards of care (Hannan et al. 1990, 2012; Haynes et al. 2009; Lindenauer et al. 2007). Over the last decade, surgical mortality rates have significantly decreased throughout the country, possibly due to such measures (Weiser et al. 2011; Finks et al. 2011; Birkmeyer 2012). While surgical/facility volume is easy to measure, the mechanism of association between procedure volume and outcomes remains poorly understood. Possible explanations highlight the importance of surgical expertise, specialized services, and infrastructure that tend to be associated with large-volume centers.

Patient management following multidisciplinary principles consistently leads to superior outcomes at much lower costs. Published supporting evidence for improved cancer-specific outcomes with the use of multidisciplinary teams is available for a range of cancers, including breast, lung, head and neck, esophageal, and colorectal (Chang et al. 2001; Coory et al. 2008; Gabel et al. 1997; Stephens et al. 2006; Wille-Jorgensen et al. 2013; Burton et al. 2006).

Process

Many factors that constitute the structure and organization of surgical services contribute to the processes of care and, ultimately, affect patient outcomes. For instance, in addition to knowing structural features, such as whether a hospital has a surgical ICU, it is also important to identify processes of care, such as how the ICU is staffed and what policies, regulations, and checklists the SICU personnel adhere to, including failure to rescue, escalation of care, communication, use of imaging and antibiotics, and patient nutritional protocols. If a residency program is housed in a hospital (structure), what, when, and how surgical residents are required to perform during cases (processes) may vary by institution and may have a serious impact on institutional outcomes.

There is also a growing interest regarding the potentially detrimental impact of interruptive operating room (OR) environments on surgical performance (Healey et al. 2006; Wiegmann et al. 2007). Previous investigations showed that interruptions occur frequently in ORs, across various surgical specialties (Weigl et al. 2015).

In an effort to improve surgical outcomes and potentially lower costs, recent attention has been placed on efficiency of care delivery and the surgical volume-outcome relationship. Luft et al. first explored this concept in 1979 showing that there was a relation between hospital volume and mortality for complex procedures such as open-heart surgery or coronary bypass (Luft et al. 1979). Since then, Birkmeyer et al. expanded on this idea by showing a significant relationship between both hospital volume and surgeon volume and operative mortality for many different procedures, including resections for lung, bladder, esophageal, and pancreatic cancer (Birkmeyer et al. 2002). Subsequent surgical oncology studies have shown an association between volume and negative margin status, superior nodal harvest, and both short-term and long-term survival. Recently, volume-outcome relationship has been demonstrated even for less specialized procedures, such as incisional hernia repair (Aquina et al. 2014a).

Evidence of the volume-outcome relationship, along with financial pressures, implementation of surgical bundled payments, and the shift to accountable care organizations, brought to light the importance of efficient and coordinated models of care delivery. With the increase in the number of surgical subspecialties and nonsurgical specialties performing surgical procedures (e.g., interventional radiology and cardiology, urogynecology), there is an increase in the involvement of advanced practice providers in patient care delivery (e.g., nurse practitioners (NPs), physician assistants (PAs), technicians, and therapists) and growing acceptance of multidisciplinary care pathways (oncology, geriatrics, orthopedics, among others). For example, high-volume bariatric surgery practices can hire psychologists, nutritionists, exercise therapists, and specialty nurses to provide additional supportive services. This approach can free the surgeon’s time and improve care coordination and patient experience. The specialty and training of the provider are also important for procedures that can be performed by different types of providers, for instance, placement of an inferior vena cava (IVC) filter, a type of vascular filter that is implanted to prevent life-threatening pulmonary emboli (PEs). IVC filters can be placed by a number of different types of providers (vascular surgeons, general surgeons, cardiologists, interventional radiologists) for various indications. The outcomes of the intervention (mortality, complications, PE) could potentially depend on the specialty and skill of the provider.

In general, clinic staff rarely bill for their services and often are employed by the institution. Multidisciplinary consultations for cancer patients are also not reimbursable and often count toward “academic time” for faculty physicians. As a result, these services may be “invisible” from insurance claims or medical records. In fact, only one provider can be associated with each billable service (procedure or hospital admission). For any service delivered by more than one provider (e.g., resident participating in a surgical case, several APPs involved in hospital discharge process), additional data may need to be included (e.g., operating notes, individual provider claims).

Surgical Outcomes

A choice of optimal outcome for each study or evaluation depends on the goal of the assessment as well as factors that may be driving this outcome (causal pathway) and resources available to the investigators as some of the outcome collection processes may be very costly and time consuming (e.g., health utility and quality of life measurement) (Drummond et al. 2005; Iezzoni 2004). Below we describe some of the most common types of outcomes used in surgical outcome research and quality assessment and discuss their applications, limitations, and sources of data.

Clinical Outcomes

Mortality: When defining mortality, it is important to be specific about the duration of the observation period (e.g., in-hospital vs. 30-day mortality) as well as the starting point for the observation period (e.g., the day the procedure was performed for 30-day postsurgical mortality versus the day of hospital discharge for 30-day postdischarge mortality). Using hospital discharge abstracts and publicly available software, one can measure in-hospital mortality using the definitions most appropriate for the needs of the project. For instance, if there is significant variation in the hospital length of stay between patients in the study, it may be more accurate to define hospital mortality based on the 30-day postadmission interval rather than postdischarge time (Borzecki et al. 2010; Hannan et al. 1990, 2013).
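
To show how much the choice of starting point matters, the following sketch computes several mortality indicators for a single hospital stay. The function name, the dictionary keys, and the convention that the anchor date counts as day 0 are assumptions made for illustration; they are not definitions taken from the cited sources.

```python
from datetime import date, timedelta


def mortality_flags(admission, procedure, discharge, death=None, window_days=30):
    """Return mortality indicators under different observation windows.

    All arguments are datetime.date values; death is None when the patient
    is not known to have died. The anchor date itself is day 0.
    """
    def died_within(anchor):
        if death is None:
            return False
        return anchor <= death <= anchor + timedelta(days=window_days)

    return {
        "in_hospital": death is not None and admission <= death <= discharge,
        "post_procedure": died_within(procedure),
        "post_admission": died_within(admission),
        "post_discharge": died_within(discharge),
    }


# Hypothetical long stay: death occurs 5 days after a 45-day admission.
flags = mortality_flags(
    admission=date(2014, 1, 1),
    procedure=date(2014, 1, 3),
    discharge=date(2014, 2, 15),
    death=date(2014, 2, 20),
)
```

In this made-up case the death is missed by the in-hospital, postprocedure, and postadmission definitions and is captured only by the postdischarge window, which is why length-of-stay variation should inform the choice of definition.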

Cancer Survival: For surgical oncology studies, cancer survival rate is often a more appropriate outcome metric than surgical mortality because the vast majority of cancer patients receive multimodal therapy. Cancer survival is reported by most tumor registries or can be calculated from pathology reports. Cancer survival is defined as the percentage of people who have survived a certain type of cancer for a specific amount of time (e.g., 12 months, 2 or 5 years). Certain cancers can recur many years after first being diagnosed and treated (e.g., breast cancer). During this time, a former cancer patient (also called a survivor) may die from a different condition (oncologic or benign), and hence, the most appropriate choice of reported statistic in this case would be tumor site-specific mortality. For instance, a patient may be successfully treated for thyroid cancer but die from colon cancer 20 years later. Other measures that give more specific information include the disease-free survival rate (the proportion of cancer patients who are cancer-free), the progression-free survival rate (the proportion of cancer patients who are not cured but whose cancer is not progressing), and cancer recurrence (cancer that has returned after treatment and after a period of time during which the cancer was not detected). Sometimes, without detailed pathology data, it is impossible to distinguish cancer recurrence from cancer progression. An example of the recurrence versus progression dilemma can be observed in rectal cancer patients who received nonsurgical neoadjuvant treatment. Following neoadjuvant chemoradiotherapy (CRT) and interval proctectomy, 15–20% of patients are found to have a pathological complete response (pCR) to combined multimodal therapy, but controversy persists about whether this yields a survival benefit (Martin et al. 2012).

Surgical Complications: Incisional Hernia. An incisional hernia occurs when the abdominal wall fascia fails to heal. It is a common postoperative complication following major abdominal surgery. Data on the incidence of incisional hernia are highly variable, with reported values ranging from 0% to 91%. Incisional hernias are typically diagnosed within the first 3 years after the initial laparotomy (Yahchouchy-Chouillard et al. 2003; Rosen et al. 2003; Rea et al. 2012); however, a hernia may take up to 10 years after the initial surgery to become evident (LeBlanc et al. 2000; Akinci et al. 2013). This large variation in the reported rates of incisional hernia is not unexpected, given the wide range of patient groups included in the studies, the surgery performed, and the length of follow-up (Caglià et al. 2014). Several outcome measures could be appropriate for a study on incisional hernia, including incidence, prevalence, rates of hospital admission, and reoperation.

Surgical Complications: Surgical Site Infection (SSI) (Schweizer et al. 2014). In addition to pain, discomfort, and high risk for readmission, surgical site infections (SSIs) are associated with substantial morbidity and mortality. The costs of SSIs have been the focus of quality improvement and safety efforts ever since the Centers for Medicare and Medicaid Services halted reimbursement for the growing costs linked with SSIs after some surgical operations (so-called potentially preventable infections) (Aquina et al. 2014b). Prior studies have reported the cost of hospitalizations after SSIs in the range from $24,000 to $100,000 (Schweizer et al. 2014).

Patient-Reported Outcome Measures (PROMs)

Patient-Reported Outcomes Measurement Information System (PROMIS®): Measures included in PROMIS® are intended for standardized assessment of various patient-reported outcome domains, including pain, fatigue, emotional distress, physical functioning, and social role participation (Devlin and Appleby 2010). PROMIS® is a new set of tools intended to be used in routine clinical practice as part of an electronic medical record (EMR) system (Cella et al. 2007). PROMIS® was established in 2004 with funding from the National Institutes of Health (NIH). PROMIS® measures are based on common validated metrics to ensure a computerized and burden-free data collection process in any healthcare setting that yields accurate measurement of patient health status domains over time with few items (National Institute of Health 2015a).

Hospital Consumer Assessment of Healthcare Providers and Systems (HCAHPS) (Hospital Consumer Assessment of Healthcare Providers and Systems 2014): As with other consumer goods and services, many providers and organizations have collected information on patient satisfaction with healthcare. However, prior to HCAHPS, there was no national standard for collecting and publicly reporting patients’ perspectives on their healthcare experience that would enable valid comparisons to be made across providers. In May 2005, the National Quality Forum (NQF), an organization responsible for standardization of healthcare quality measurement and reporting, formally endorsed the CAHPS® Hospital Survey (Press Ganey Associates Inc 2014).

The HCAHPS survey is mailed to a random sample of hospital patients after a recent discharge. The survey asks patients to rate 21 aspects of their hospital care combined into nine key topics: communication between patients and doctors, communication between patients and nurses, responsiveness of the hospital staff, pain management, communication with patients about medicines, discharge information, the hospital’s cleanliness, the noise level of the hospital environment, and transition of care. Patients’ perception of care is a key performance metric and is used to determine payments to hospitals (Hospital Consumer Assessment of Healthcare Providers and Systems 2014). The Hospital Compare database (4605 hospitals) can be used to examine complication rates and patient-reported experience for hospitals across the nation. Prior studies have demonstrated an inverse relationship between patient experience and complication rates. This negative correlation suggests that reducing these complications can lead to a better hospital experience. Overall, these results suggest that patient experience is generally correlated with the quality of care provided.

Depending on the type of surgery and patient population, other outcome measures may also be relevant (e.g., pain, functional status, and cognitive ability). Quality of life is a multidomain indicator that combines all aspects of health relevant to patients and, hence, may serve as an aggregate outcome measure.

Quality of Life and Subjective Well-Being (Lee et al. 2013): Quality continues to be placed at the heart of discussions about healthcare. This raises important questions about how quality of care should be measured and from whose perspective: the patient’s, the provider’s, or the payer’s. Subjective well-being (SWB) is a measure of the overall “wellness” of an individual and as such has the potential to be used as a global marker for how treatments affect people in the experience of their lives. SWB links all stages in the treatment and care process, thus allowing the overall quality of care to be determined and valued according to its direct effect on people’s lives. SWB has been shown to have an effect on outcomes at all stages of the treatment experience, and improved health and quality outcomes are shown to consistently enhance SWB (Lee et al. 2013). Furthermore, SWB measures have been shown to be a suitable method to value the impact of healthcare on the families and caregivers of patients and, in this way, can link health outcomes to the wider effects of treatment on patients’ lives. Measuring an individual’s SWB throughout the treatment experience can enable a full appraisal of the quality of care that the individual receives. This could facilitate service improvements at the microlevel and help value treatments for resource allocation purposes at the macrolevel.

Surrogate Outcomes

Although the importance of measuring patient outcomes is widely recognized and several valid and accurate measures (as described above) are available, there are several practical barriers to measuring patient outcomes. These include time (waiting for cancer recurrence or mortality to occur while maintaining regular follow-up with a patient), personnel costs (to perform routine surveillance and follow-ups), and patient burden (repeated follow-up, evaluations, and surveys). One potential solution to these problems is the use of surrogate outcomes. A surrogate outcome (or endpoint) is a measure of the effect of a specific treatment that may substitute for a real clinical endpoint but does not necessarily have a guaranteed relationship with it (Cohn 2004). Surrogate markers are also used when the number of events is very small, thus making it impractical to conduct a clinical trial to detect a statistically significant effect (e.g., instead of measuring VTE events, which have an incidence of less than 1%, studies often use ultrasound-detected blood clots, which are much more prevalent but do not always result in PE or VTE) (Fleming and DeMets 1996). A correlate does not make a surrogate. It is a common misconception that if an outcome is a correlate (i.e., correlated with the true clinical outcome), it can be used as a valid surrogate endpoint (i.e., a replacement for the true clinical outcome). However, proper justification for such replacement requires that the effect of the intervention on the surrogate endpoint predicts the effect on the clinical outcome, a much stronger condition than correlation. Other examples of commonly used surrogate outcomes in surgery include costs of care as a measure of poor outcomes and disability, and positive surgical margins, carcinoembryonic antigen (CEA), and number of lymph nodes retrieved as measures of long-term cancer recurrence and mortality (Nussbaum et al. 2014).

Composite Outcomes: Episode of Care or Care Bundles

The value of quality reporting in surgical care, however, is limited by problems with existing measures of quality, mainly that existing quality indicators are designed to measure the quality of a specific facility (e.g., hospital) or a specific provider (e.g., surgeon). This does not reflect the current paradigm of care delivery, in which a patient may be diagnosed in the community, referred to a regional center of excellence for neoadjuvant chemoradiation, followed up for 6 months by an academic colorectal surgeon, and then returned to the community for years of posttreatment surveillance. Regional standardized pathways of care and a multidisciplinary team (MDT) approach have been widely recommended by clinical societies to better identify, coordinate, deliver, and monitor the optimal treatment on an individual patient-by-patient basis (Chang et al. 2001; Coory et al. 2008; Stephens et al. 2006; Abbas et al. 2014; Wille-Jorgensen et al. 2013; Morris et al. 2006; Gatt et al. 2005; Adamina et al. 2011).

Risk Adjustment

Risk adjustment is a set of analytic tools used for an array of functions in healthcare (Iezzoni and Long-Bellil 2012; Schone and Brown 2013). One of the primary uses of risk adjustment is to provide a fair comparison between different patient populations, providers, or programs. Risk adjustment is also necessary for health plans to set costs that reflect the expected treatment expenses of their specific membership group. Because health and treatment needs differ from person to person, so do the costs and outcomes of healthcare. Without risk adjustment, plans or providers have an incentive to enroll and treat healthier patients (so-called cream skimming or cherry-picking) and avoid sick, frail, or complex patients. After appropriate risk adjustment, plans and providers receive a larger amount of reimbursement for members with numerous chronic illnesses than for members with few or no health problems. In addition to costs, risk adjustment is also applied to health outcomes when comparing performance across providers (e.g., risk-adjusted mortality is reported by the STS National Database, NSQIP, the New York State CABG Report Cards, and UK surgical mortality reports) (National Health Services 2015; The Society of Thoracic Surgeons National Database 2014). The methodology used for risk adjustment varies, depending in part on healthcare market regulations, the populations served, and the source of payments. Risk adjustment is used in all major public programs offering health coverage in the United States, including Medicare Advantage (MA), Medicare Part D, and state Medicaid managed care programs. The STS National Database, with its three million patient records, has long used risk adjustment to provide more accurate patient outcomes. If not risk adjusted, the records of surgeons who perform operations on higher-risk patients would always look worse than the records of surgeons who treat low- or average-risk patients.
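As a minimal sketch of why unadjusted comparisons can mislead, the example below compares two hypothetical surgeons using an observed-to-expected (O/E) mortality ratio, one common form of indirect standardization. The caseloads and predicted risks are invented for illustration and stand in for the output of a risk model; they are not drawn from the STS National Database, NSQIP, or any other registry, and the function name is an assumption.

```python
from math import fsum

def observed_expected(patients):
    """Return (observed deaths, expected deaths, O/E ratio) for one provider.

    `patients` is a list of (predicted_risk, died) pairs, where predicted_risk
    is assumed to come from a risk model fitted on a reference population.
    """
    observed = sum(died for _, died in patients)
    expected = fsum(risk for risk, _ in patients)
    return observed, expected, observed / expected

# Hypothetical caseloads (illustrative numbers only).
surgeon_a = [(0.02, 1)] * 2 + [(0.02, 0)] * 98   # low-risk patients, 2 deaths
surgeon_b = [(0.10, 1)] * 8 + [(0.10, 0)] * 92   # high-risk patients, 8 deaths

# Crude mortality in the pooled population of both surgeons.
overall_rate = (2 + 8) / 200

for name, patients in [("A", surgeon_a), ("B", surgeon_b)]:
    obs, exp, oe = observed_expected(patients)
    crude = obs / len(patients)
    adjusted = oe * overall_rate  # indirectly standardized mortality rate
    print(f"Surgeon {name}: crude {crude:.1%}, O/E {oe:.2f}, "
          f"risk-adjusted {adjusted:.1%}")
```

In this constructed example, surgeon B has four times the crude mortality of surgeon A, yet an O/E ratio below 1 indicates fewer deaths than the risk model predicts for that caseload.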

From Data to Quality Improvement

Understanding Hospital Billing Data

For many hospital and outpatient services, there is a wide difference between billed charges and the amounts that providers expect to receive for services. Hospital charges are usually determined by hospital administrators depending on prior history and demand. Reimbursement rates, on the other hand, are the payments that hospitals are actually willing to accept for a specific service or product, and they vary by payer and specific plan. On average, hospitals billed Medicare 3.77 times (standard deviation = 1.83) what they were actually reimbursed, with a range of 0.42 to 16.23 (Muhlestein 2013). The ratio may vary for private payers.

High hospital charges, though, do have some important consequences. First, since the charges do not correlate with the amount being paid or with the hospital expenditures required to produce a specific service (i.e., true cost), it becomes difficult, if not impossible, to compare process between hospitals and draw conclusions about the financial sustainability of various service lines. Second, and potentially devastating for some, uninsured patients who receive care at a hospital, or insured patients who receive care at an out-of-network hospital, may face a bill that exceeds by many times the negotiated price paid by any payer.

Focusing on Modifiable Factors

One of the major paradoxes that limits our ability to improve practice based on the results of published studies is that most available predictors are not modifiable (e.g., for readmissions: patient severity and comorbidities), while most modifiable factors are not routinely collected through standard clinical data systems (e.g., socioeconomic status (SES) and organizational structure). Furthermore, reported statistical associations do not equal causation (although causation is often assumed), and hence modifying a predictor may not result in the desired change in the outcome of interest. Let’s consider the example below.

Failure to rescue (FTR) refers to the mortality among patients with serious complications (Johnston et al. 2014; Pucher et al. 2014; Almoudaris et al. 2013). Typically, it is hospitals with greater FTR rates (not greater complication rates) that have the greatest rate of mortality. Thus although complications may occur, outcomes can still be improved by optimizing the quality of care provided to the patient post-complication. Although there have been several studies highlighting the importance of FTR as a marker for quality of care, these have only considered organizational aspects of healthcare. Few have explored the underlying human factors that lead up to this critical event. Two main factors may contribute toward an FTR event: first, a failure to recognize a sick patient and, second, a failure to act promptly once deterioration has been detected. In both situations, an escalation of care (EOC) process is required if FTR is to be avoided.

EOC involves a nurse recognizing a change in patient status and communicating it to a postgraduate year 1 (PGY1) resident, who subsequently reviews the patient and then escalates care further for advice and/or management. Escalation is a difficult process, as the first doctor called by the nurses will usually be the most junior; this is the traditional hierarchy. After initial assessment, the junior doctor must then contact his or her senior to explain why they need help and the urgency of response required. All of this places a premium on the value of communication between team members. However, failures in communication are ubiquitous and frequent in the postoperative phase. Although this EOC process lies at the center of FTR and is critically important for safety and quality of surgical care, it remains difficult to measure and quantify and, hence, relatively unexplored in the research literature.
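To make the distinction between complication rates and FTR rates concrete, the short sketch below computes both for two hypothetical hospitals. The counts and the function name are assumptions chosen for illustration, not data from the cited studies.

```python
def complication_and_ftr_rates(n_patients, n_complications, n_deaths_after_complication):
    """Return (complication rate, failure-to-rescue rate) for one hospital."""
    complication_rate = n_complications / n_patients
    # FTR: deaths among patients who developed a serious complication.
    ftr_rate = n_deaths_after_complication / n_complications
    return complication_rate, ftr_rate

# Hypothetical hospitals with identical complication rates (illustrative only).
# Each tuple: (patients, patients with serious complications, deaths among them).
hospitals = {
    "Hospital X": (1000, 200, 20),
    "Hospital Y": (1000, 200, 40),
}

for name, counts in hospitals.items():
    comp, ftr = complication_and_ftr_rates(*counts)
    print(f"{name}: complication rate {comp:.0%}, FTR rate {ftr:.0%}")
```

Both hypothetical hospitals have the same complication rate, but Hospital Y loses twice as many patients after a complication, which is the difference that the FTR measure is designed to expose.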

Identifying Actionable Goals

Despite sound study design and state-of-the-art statistical methodology, outcome studies do not always lead to meaningful improvement in care quality and patient outcomes. Is this grounds for skepticism? Not at all. Just like many investigations in basic biomedical sciences, outcomes and quality assessment projects often fall short of their potential impact by simply reporting barriers to high-quality care without considering strategies for systematically overcoming these limitations and obstacles. Another common mistake is assuming that just because some risk factors are statistically associated with poor quality or outcomes, they represent a target for improvement. For instance, if low patient education is associated with poor cancer prognosis, it may be naïve to assume that more education would improve outcomes in cancer patients without a high school diploma. In this case, low education is likely to be a marker for social and economic deprivation in this demographic group. Addressing this issue may require developing a system-wide solution such as providing a care navigator, graphics-based rather than text-based decision support tools, and phone-based rather than internet-based communication with care providers.

Sometimes, when large administrative datasets are used for the analysis, statistically significant risk factors are not necessarily clinically significant. Before considering any change in clinical practice, it may be beneficial to review the results for face validity with all stakeholders involved in the care process. One approach is to use a systematic, quantitative, validated method to assess risks in the process of information transfer across all phases of surgical care. The method is known as failure mode and effect analysis (FMEA) and was originally developed by engineers to accomplish proactive risk analyses (McDermott et al. 1996). The National Center for Patient Safety of the US Department of Veterans Affairs adjusted FMEA for use in healthcare, resulting in healthcare FMEA (HFMEA) (DeRosier et al. 2002). Healthcare FMEA is a multistep process (Fig. 2) that uses a multidisciplinary team to proactively evaluate a healthcare process. The team uses process flow diagrams, hazard scoring, and decision trees to identify potential vulnerabilities and to assess their potential effect on patient care. The method captures the likelihood of risks, the severity of consequences, and the probability that they may be detected and intercepted before causing harm. Healthcare FMEA has so far been applied to medication administration (Fletcher 1997; McNally et al. 1997; Kunac and Reith 2005; Weir 2005), intravenous drug infusion (Adachi and Lodolce 2005; Apkon et al. 2004; Wetterneck et al. 2006), blood transfusions (Burgmeier 2002), equipment problems (Weinstein et al. 2005; Wehrli-Veit et al. 2004), and surgery (Nagpal et al. 2010).

Fig. 2 Main steps in surgical healthcare failure mode and effect analysis (HFMEA) (Adapted from the Veterans Affairs National Center for Patient Safety, DeRosier et al. 2002)
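One common way to combine likelihood, severity, and detectability in classical FMEA is a risk priority number (RPN), the product of the three ratings. The sketch below ranks a few failure modes by RPN. The 1 to 10 scales, the class name, and the failure modes themselves are assumptions for illustration; HFMEA as described by DeRosier et al. relies on its own hazard scoring and decision tree, which this sketch does not reproduce.

```python
from dataclasses import dataclass

@dataclass
class FailureMode:
    description: str
    severity: int    # 1 (minor) to 10 (catastrophic); scale is an assumption
    occurrence: int  # 1 (remote) to 10 (frequent)
    detection: int   # 1 (almost always intercepted) to 10 (rarely intercepted)

    @property
    def rpn(self) -> int:
        """Risk priority number: severity x occurrence x detection."""
        return self.severity * self.occurrence * self.detection

# Hypothetical failure modes in information transfer across surgical care.
modes = [
    FailureMode("Allergy not communicated at handoff", 9, 3, 6),
    FailureMode("Pending lab result omitted from discharge summary", 6, 5, 7),
    FailureMode("Illegible consent form", 4, 2, 2),
]

# Rank failure modes so the team reviews the highest-priority ones first.
ranked = sorted(modes, key=lambda m: m.rpn, reverse=True)
for mode in ranked:
    print(f"RPN {mode.rpn:4d}  {mode.description}")
```

A ranking of this kind is only a starting point for the multidisciplinary team's discussion; the ratings themselves are judgments and should be agreed on by the stakeholders involved in the care process.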

Presenting Results

Quality outcome research results may be presented in a variety of ways depending in part upon the endpoint and how the data will be used. Standard statistical approaches using Student’s t-test for continuous data and the chi-square test for categorical data, for instance, have long been noted to give biased results when patient factors are unevenly distributed between comparison groups. This is particularly important for observational studies using data where patients have not been randomized. Higher-level statistical packages using multivariable approaches to adjust for patient-level factors are now readily available, providing adjusted estimated effects in terms of odds ratios. Despite the ubiquity of such methods, if the analysis is not well thought out, results can be drastically skewed. Only confounding factors and covariates not on the causal pathway should be included. If one controls for factors on the causal pathway, one may find that no presumed risk factors are associated with the outcome, because they have been effectively controlled for in the multivariable analysis. This will be discussed further below. Confounders such as comorbidities may also be highly collinear, and grouping them or using already established practices for comorbidity adjustment may be helpful in decreasing the number of variables, particularly if the research question concerns a comparison of two different surgical approaches where one only desires to adjust for comorbidities rather than ascertain their independent contribution to risk for poor outcome.
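The effect of an unevenly distributed patient factor can be illustrated with a small constructed example. The sketch below uses Mantel-Haenszel stratification as a simple stand-in for the multivariable regression discussed above; the counts, the two surgical approaches, and the single binary comorbidity stratum are all hypothetical.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.

    a, b: events and non-events in the exposed group
    c, d: events and non-events in the reference group
    """
    return (a * d) / (b * c)

def mantel_haenszel_or(strata):
    """Pooled odds ratio across strata of a confounder (Mantel-Haenszel)."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator

# Hypothetical counts (open vs laparoscopic approach), stratified by comorbidity.
# Each tuple: (open events, open non-events, lap events, lap non-events).
strata = [
    (5, 95, 20, 380),    # low-comorbidity patients, mostly treated laparoscopically
    (80, 320, 20, 80),   # high-comorbidity patients, mostly treated open
]

# The crude table ignores comorbidity by summing the strata column by column.
crude_table = [sum(column) for column in zip(*strata)]
crude_or = odds_ratio(*crude_table)
adjusted_or = mantel_haenszel_or(strata)

print(f"Crude OR: {crude_or:.2f}")
print(f"Comorbidity-adjusted OR: {adjusted_or:.2f}")
```

In this constructed example the crude odds ratio is above 2, while the odds ratio within each comorbidity stratum, and therefore the pooled estimate, equals 1: the apparent harm of the open approach is entirely attributable to case mix.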

While multivariable analyses are presented with odds ratios, even this relatively straightforward result presentation requires some additional thought in terms of the desired interpretation. One particular nuance is whether to choose a reference group that makes the odds ratio greater than one, in other words suggesting increased risk, or one that makes the odds ratio less than one, suggesting a protective effect. It is often more intuitive to present odds ratios suggesting increased risk; however, this is not always appropriate.
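The choice of reference group changes only the direction in which the same association is expressed: swapping the reference group replaces the odds ratio with its reciprocal. A minimal sketch with hypothetical counts (complications with and without prophylaxis, invented for illustration):

```python
def odds_ratio_vs_reference(group, reference):
    """Odds ratio of `group` relative to `reference`.

    Each argument is a (events, non_events) pair.
    """
    events, non_events = group
    ref_events, ref_non_events = reference
    return (events * ref_non_events) / (non_events * ref_events)

# Hypothetical counts: (complications, no complications).
with_prophylaxis = (10, 190)
without_prophylaxis = (30, 170)

# Reference = no prophylaxis: the odds ratio reads as a protective effect.
or_protective = odds_ratio_vs_reference(with_prophylaxis, without_prophylaxis)
# Reference = prophylaxis: the same data read as increased risk.
or_risk = odds_ratio_vs_reference(without_prophylaxis, with_prophylaxis)

print(f"Prophylaxis vs none: OR = {or_protective:.2f}")
print(f"None vs prophylaxis: OR = {or_risk:.2f}")
```

Because the two values are reciprocals, the decision is purely one of presentation; what matters is that the reference group is stated explicitly alongside every reported odds ratio.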

As quality data become more prevalent, multiple metrics reportedly measuring the same poor outcome may exist. Auditing these results and comparing which approach is more reliable and better measures the underlying disease state is of utmost importance, particularly if these data are to lead to clinical change. For instance, a study comparing NSQIP data with regional data on anastomotic leaks found, using Pearson’s correlation coefficient, that the traditional proxy of “organ space infection” correlated poorly with the more specifically defined anastomotic leak variable. These findings suggest that prior reports are based on identifying organ space infection as an anastomotic leak in colorectal surgery.
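As an illustration of how agreement between two such metrics can be audited, the sketch below computes Pearson’s correlation coefficient between two binary flags recorded for the same patients. The ten patient records are invented for illustration and do not reproduce the NSQIP comparison described above.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    spread_x = sum((xi - mean_x) ** 2 for xi in x)
    spread_y = sum((yi - mean_y) ** 2 for yi in y)
    return covariance / sqrt(spread_x * spread_y)

# Hypothetical flags for ten patients (1 = recorded, 0 = not recorded).
organ_space_infection = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
anastomotic_leak      = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]

r = pearson_r(organ_space_infection, anastomotic_leak)
print(f"Pearson r between the two flags: {r:.2f}")
```

For two binary variables, Pearson’s coefficient is equivalent to the phi coefficient; a value near zero, as in this constructed example, indicates that one flag is a poor proxy for the other.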

Odds ratios may be difficult to put into clinically meaningful terms other than demonstrating relative importance. Another approach to taking multivariable analysis to the next step is the creation of risk scores aimed at guiding clinical decision making. This approach effectively operationalizes the data available in multivariable analysis by weighting risk factors. The approach to these analyses is slightly different, as they are aimed at predicting an event rather than identifying all potential risk factors. This changes which variables are included in the analysis, as only those that improve the predictive ability should be used. There may be a high degree of crossover; however, risk scores are most useful when they are simple, and so one may desire to make a parsimonious model, that is, a model with the fewest covariates that still maximizes predictive power (Iannuzzi et al. 2013d, 2014a; Kelly et al. 2014a). In order to perform a predictive analysis, data should be split into a development dataset and a validation dataset so that the risk score can be tested on naive subjects, estimating its ability to be applied to novel patients. Another similar approach is the use of nomograms, which are simply another way to organize risk score-type data.
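The development and validation workflow can be sketched on simulated data. In the example below, the predictors, their effect sizes, and the rule of one point per 0.5 log-odds are all assumptions made for illustration; for brevity the weights come from univariable odds ratios in the development set, whereas a real risk score would normally take its weights from multivariable model coefficients. Discrimination in the validation set is summarized with the c-statistic.

```python
import math
import random

FACTORS = ["age_over_70", "diabetes", "emergency_case"]  # hypothetical predictors

def simulate(n, seed=0):
    """Simulate patients with binary risk factors and a binary outcome."""
    rng = random.Random(seed)
    true_log_odds = {"age_over_70": 1.5, "diabetes": 1.0, "emergency_case": 0.5}
    prevalence = {"age_over_70": 0.3, "diabetes": 0.2, "emergency_case": 0.4}
    patients = []
    for _ in range(n):
        p = {f: int(rng.random() < prevalence[f]) for f in FACTORS}
        logit = -3.0 + sum(true_log_odds[f] * p[f] for f in FACTORS)
        p["event"] = int(rng.random() < 1 / (1 + math.exp(-logit)))
        patients.append(p)
    return patients

def derive_points(dev):
    """Assign integer points per factor from univariable log odds ratios."""
    points = {}
    for f in FACTORS:
        # 2x2 counts with a 0.5 continuity correction to avoid division by zero.
        a = sum(p["event"] for p in dev if p[f]) + 0.5
        b = sum(1 - p["event"] for p in dev if p[f]) + 0.5
        c = sum(p["event"] for p in dev if not p[f]) + 0.5
        d = sum(1 - p["event"] for p in dev if not p[f]) + 0.5
        log_or = math.log((a * d) / (b * c))
        points[f] = max(0, round(log_or / 0.5))  # one point per 0.5 log-odds
    return points

def score(patient, points):
    """Total risk score: sum of points for the factors a patient has."""
    return sum(points[f] * patient[f] for f in FACTORS)

def c_statistic(scores, outcomes):
    """Probability that a random event outranks a random non-event (ties = 0.5)."""
    events = [s for s, y in zip(scores, outcomes) if y == 1]
    non_events = [s for s, y in zip(scores, outcomes) if y == 0]
    wins = sum((e > n) + 0.5 * (e == n) for e in events for n in non_events)
    return wins / (len(events) * len(non_events))

patients = simulate(2000)
random.Random(1).shuffle(patients)
dev, val = patients[:1000], patients[1000:]  # development / validation split

points = derive_points(dev)                  # weights come from development data only
val_scores = [score(p, points) for p in val]
val_outcomes = [p["event"] for p in val]
print("Points per factor:", points)
print("Validation c-statistic:", round(c_statistic(val_scores, val_outcomes), 3))
```

Keeping the validation patients out of the weight derivation is the essential design choice: a c-statistic computed on the development set would overstate how well the score applies to novel patients.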

With the advent of the electronic record, some of this risk scoring can now be integrated directly into the clinical record, alerting physicians to patients at high risk for readmission or DVT and prompting some action such as prophylaxis prescription. This approach has increased the use of guideline-based approaches and may be an effective tool moving forward. NSQIP also provides individual patient risk calculators for many complications, which allow in-office estimates of risk based on individual patient factors. Anecdotally, this tool has a high degree of satisfaction among patients and providers alike and likely improves the consent process.