“Quality is never an accident; it is always the result of intelligent effort.”

John Ruskin

The term “data-driven” has long since left academic circles and become a household term describing decision making across a wide range of areas and industries, including marketing, weather forecasting, finance, human resources, communication, and, of course, healthcare. It is widely recognized that clinicians, researchers, and policymakers depend on large administrative, clinical, and registry datasets to deliver high-quality medical and surgical care, support research and innovation, and improve population health. Building on programs originally created at the Department of Veterans Affairs, the American College of Surgeons (ACS) introduced and supports several widely used surgical datasets, starting with the National Surgical Quality Improvement Program (NSQIP) made available in 2004 [1]. In 2017, the ACS, in collaboration with the American Society for Metabolic and Bariatric Surgery (ASMBS), released the 2015 Metabolic and Bariatric Surgery Accreditation and Quality Improvement Program (MBSAQIP) Participant Use Data File (PUF) [2].

Given the growing importance of clinical quality as a driver of innovation as well as cost containment, it is anticipated that the MBSAQIP PUF will allow researchers to evaluate quality of care and outcomes among patients undergoing bariatric surgery [2, 3]. In just 2 years since its release, 28 papers have been published utilizing MBSAQIP data, with the potential to improve quality of care and outcomes for patients with metabolic disorders. The methodology for using large datasets has been well established in other fields (economics, sociology, and epidemiology). However, only a few studies have outlined general recommendations and checklists for analyzing large surgical datasets [4, 5], and most clinical studies utilizing MBSAQIP data do not report their approaches to data validation, imputation of missing values, or assurance of data accuracy, completeness, and consistency [6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33].

In this paper, we describe specific approaches that investigators can use before analyzing MBSAQIP data, regardless of the purpose of the analysis. This framework could be used to facilitate resident research training, strengthen institutional quality improvement programs, or improve the quality of the national MBSAQIP registry.

Methods

MBSAQIP data

The MBSAQIP PUF, first released in 2017, is the largest bariatric-specific clinical dataset in the US and includes data on all bariatric surgery procedures performed at MBSAQIP-participating centers. It is a Health Insurance Portability and Accountability Act (HIPAA)-compliant data file containing cases submitted to the MBSAQIP Data Registry. The PUF comprises patient-level data and does not identify hospitals, healthcare providers, or patients [34]. Since the MBSAQIP data do not contain any personal health information and are publicly available in an anonymized manner, their analysis does not constitute Human Subjects Research under federal regulations [35]. Therefore, the study was exempt from University at Buffalo Institutional Review Board (IRB) review.

The 2015 PUF includes cases with operation dates between January 1, 2015 and December 31, 2015 [34]. The data collection processes and variable definitions are described elsewhere [34, 36]. Telem and Dimick summarized the contents, strengths, and limitations of the dataset and provided statistical recommendations for new investigators [2]. In short, the Main file is a flat data file in which each row represents a case (surgery); it contains data on 168,093 patients from 742 participating centers, with 154 associated variables (e.g., patient demographics, preoperative patient characteristics, operative information, outcomes, and other case descriptors). The additional four files (BMI, Reoperation, Readmission, and Intervention) may have multiple rows per case. The BMI file has 13 variables and provides details on postoperative weight and BMI (recorded once or multiple times) and the number of days from the operation when they were recorded. The Reoperation file has 15 variables on patients who underwent any operation within 30 days of the index MBS, including the procedure type, whether it was an emergency, whether it was unplanned (yes/no), whether it was related to the index MBS procedure, the reason for reoperation, and the number of days from the index surgery to the reoperation. The Readmission and Intervention files have nine variables each and, similar to the Reoperation file, provide information on readmissions and interventions, respectively, performed within 30 days following the index MBS. Released data in the PUF are limited to 30-day outcomes and exclude the longer-term variables collected within the MBSAQIP registry.

For the purpose of this study, we merged the five subsets of the 2015 MBSAQIP PUF (Main, BMI, Reoperation, Readmission, and Intervention) to develop a single flat analytic dataset (Fig. 1).

Fig. 1

Development of a single flat file for the current study from the five component data files of the 2015 MBSAQIP PUF. Three data quality measures (completeness, accuracy, and consistency) were assessed in the analyses. Postoperative BMI is the main outcome of bariatric surgery. Reoperations and readmissions serve as quality indicators for providers and hospitals, respectively
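The merge of the five component files into one flat file can be sketched in pandas. The toy data, the case identifier (`CASEID`), and all column names below are hypothetical stand-ins, not the actual PUF field names:

```python
import pandas as pd

# Hypothetical stand-ins for three of the five 2015 MBSAQIP PUF files;
# each file shares a case identifier ("CASEID" here is an assumption).
main = pd.DataFrame({"CASEID": [1, 2, 3], "AGE": [44, 51, 38]})
bmi = pd.DataFrame({"CASEID": [1, 1, 3],
                    "BMI_POST": [38.2, 35.1, 41.0],
                    "DAYS_FROM_OP": [30, 90, 45]})
readm = pd.DataFrame({"CASEID": [2], "READM_DAYS": [12]})

# Left joins keep every case from the Main file even when it has no rows
# in a supplemental file; cases with multiple supplemental rows (e.g.,
# several postoperative BMI visits) are duplicated accordingly.
flat = (main
        .merge(bmi, on="CASEID", how="left")
        .merge(readm, on="CASEID", how="left"))

print(flat.shape)  # one row per (case, BMI visit) combination
```

Left joins preserve the Main file as the spine of the analytic dataset, so no index cases are lost during the merge.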

Analysis

We examined data completeness, accuracy, and consistency for all registry variables in the flat analytic file. We developed several logic and validity tests, including the following:

  1. i.

    Individual profiles of patient BMI changes over time:

    In MBSAQIP, BMI was recorded on different dates before and after surgery. Before MBS, the highest BMI as well as the BMI closest to surgery were recorded. Postoperative BMI was recorded from one or more clinic visits occurring from day 0 to day 150 after MBS. We examined the units of measurement [pounds (lbs.) or kilograms (kg)] used to report patient weight and calculated weight changes between visits. If a patient's BMI varied by more than 10 units between two consecutive appointments, we also checked the original weight and height reported at those visits and recalculated the BMI to check for computational errors. We excluded patients whose pre/postoperative weight/BMI was recorded as ‘0.’

  2. ii.

    Individual patient care pathways (chronologic record of patient admission, discharge, and procedure history)

    Using the combined flat file and the various MBSAQIP-created variables giving the number of days from MBS to postoperative events/outcomes, we reconstructed each patient's chronologic care pathway, starting with the MBS admission and including perioperative and postoperative outcomes and any (none, single, or multiple) interventions, readmissions, and reoperations within 30 days of the index MBS.

  3. iii.

    Clinical and face validity tests between pairs of variables describing the same clinical encounters (emergency intervention vs. procedure type, related admission with intervention vs. planned intervention)

    We examined whether the entries for admission type (urgent or elective, planned or unplanned) matched the provided categories of readmissions, interventions, and reoperations. The registry includes two follow-up questions for any reported hospital readmission. One asks whether the admission was unplanned at the time of the principal operation: an answer of “yes” implies that the readmission was unplanned, and an answer of “no” implies that it was planned. The second follow-up question requires choosing one of 25 possible reasons for the readmission. We assessed the clinical validity of the reasons for unplanned readmission using the opinion of experts in surgery and emergency medicine. A similar coding analysis was used for postoperative interventions. We excluded 11 of 8715 readmissions and 5 of 3855 interventions because of missing entries for the reason for readmission and intervention, respectively.
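As a concrete illustration of check (i), the following minimal pandas sketch flags BMI swings greater than 10 units between consecutive visits and recomputes BMI from the raw weight and height. All column names and values are hypothetical, not actual PUF fields:

```python
import pandas as pd

# Toy postoperative visit records (hypothetical field names)
visits = pd.DataFrame({
    "CASEID":    [1, 1, 1, 2, 2],
    "DAYS":      [30, 60, 90, 30, 90],
    "WEIGHT_KG": [120.0, 118.0, 80.0, 95.0, 92.0],
    "HEIGHT_M":  [1.70, 1.70, 1.70, 1.65, 1.65],
    "BMI":       [41.5, 40.8, 27.7, 34.9, 33.8],
})

visits = visits.sort_values(["CASEID", "DAYS"])
# Change in recorded BMI between consecutive visits for the same patient
visits["BMI_JUMP"] = visits.groupby("CASEID")["BMI"].diff().abs()

# Flag implausible swings (> 10 BMI units between consecutive visits) and
# recompute BMI from raw weight/height to detect computational errors
flagged = visits[visits["BMI_JUMP"] > 10].copy()
flagged["BMI_RECALC"] = flagged["WEIGHT_KG"] / flagged["HEIGHT_M"] ** 2

print(flagged[["CASEID", "DAYS", "BMI", "BMI_RECALC"]])
```

When the recomputed BMI agrees with the recorded one, the swing reflects the underlying weight entries rather than a calculation error, pointing to a possible data-entry problem in weight itself.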

Results

Data completeness

The variables most commonly missing in the 2015 MBSAQIP PUF were BMI and ethnicity. Overall, 2442 (1.5%) patients had zero values for the preoperative weight recorded closest to surgery (and hence for the corresponding preoperative BMI, which is calculated from the patient's weight). An additional 8888 (5.3%) patients did not have a recorded postoperative weight (and the corresponding postoperative BMI). Hispanic ethnicity was recorded as “unknown” for 17,230 (10.3%) patients. A multivariable analysis using standard statistical software (e.g., SAS) could potentially lose 33,868 (20%) cases, resulting in an analytic dataset of 142,319 cases (Fig. 2).

Fig. 2

Amount of data that is potentially unusable because of missing/invalid/unknown/no response entries in the 2015 MBSAQIP PUF
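A completeness assessment along these lines can be sketched as follows. The sentinel codings (weight of 0, ethnicity of “Unknown”) mirror those described above, while the column names and toy values are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical mini-dataset; in the PUF, a missing weight appears as 0
# and ethnicity may be recorded as "Unknown" rather than left blank.
df = pd.DataFrame({
    "WEIGHT_PREOP":  [110.0, 0.0, 95.0, 120.0],
    "WEIGHT_POSTOP": [98.0, 85.0, np.nan, 101.0],
    "ETHNICITY":     ["Hispanic", "Non-Hispanic", "Unknown", "Non-Hispanic"],
})

# Recode sentinel values to NaN so they are counted as missing
df["WEIGHT_PREOP"] = df["WEIGHT_PREOP"].replace(0.0, np.nan)
df["ETHNICITY"] = df["ETHNICITY"].replace("Unknown", np.nan)

missing_per_var = df.isna().sum()       # completeness by variable
complete_cases = df.dropna().shape[0]   # cases surviving listwise deletion
loss = 1 - complete_cases / len(df)     # fraction lost to complete-case analysis

print(missing_per_var)
print(complete_cases, loss)
```

Recoding sentinels to `NaN` before counting is essential: a zero weight silently passes numeric summaries but would corrupt any BMI calculation downstream.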

Data accuracy

Among 168,093 patients in the combined data file, only 125,803 (75%) had the same unit of measurement (lbs. or kg) recorded for the preoperative weight closest to surgery and the first postoperative weight after MBS. This difference in units of measure may result in inaccurate calculations of patient BMI, a key indicator of MBS success. While MBS is generally indicated for patients with a BMI of 35 or higher [37], 12,972 (7.7%) patients in the 2015 PUF had a BMI less than 35 [BMI < 18: (n = 82); 18 ≤ BMI < 25: (n = 1202); 25 ≤ BMI < 30: (n = 2252); 30 ≤ BMI < 35: (n = 9436)]. An additional 5.5% (n = 9266) of patients had a pre-surgical BMI of 60 or higher, which corresponds to 350 lbs. for a 5′4″ woman or 410 lbs. for a 5′9″ man [38]. The average weight loss between pre- and post-MBS was 16.1 lbs. (standard deviation = 24.2 lbs.) and ranged from − 625 to 524 lbs. (− 71% to 132% of the patient's preoperative weight measured closest to MBS) (Fig. 3A).

Fig. 3

A Range of weight reduction comparing patients’ preoperative and postoperative weights in the 2015 MBSAQIP PUF. B Number of readmissions reported to have occurred on the day of or the day after the index bariatric surgery procedure in the 2015 MBSAQIP PUF

Out of 8715 hospital readmissions within 30 days of MBS, 325 (3.7%) were reported on postoperative day 0 or 1, where day 0 is the day of operation (Fig. 3B). Given that the dataset does not contain a variable describing whether the surgery was performed in an inpatient or ambulatory setting, it is impossible to conclude whether this represents a data error, admission to an ICU, transfer to a higher-level hospital, or inpatient admission after ambulatory surgery.
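Accuracy checks of this kind (harmonizing weight units before computing weight change, and flagging day-0/1 readmissions) might look like the following sketch; the unit codes, field names, and values are assumptions for illustration:

```python
import numpy as np
import pandas as pd

LB_TO_KG = 0.453592

# Hypothetical records with mixed weight units and readmission days
df = pd.DataFrame({
    "WEIGHT_PRE":  [280.0, 120.0, 95.0],
    "UNIT_PRE":    ["lbs", "kg", "kg"],
    "WEIGHT_POST": [110.0, 250.0, 92.0],
    "UNIT_POST":   ["kg", "lbs", "kg"],
    "READM_DAY":   [12.0, 0.0, np.nan],
})

def to_kg(weight, unit):
    """Convert a weight to kilograms if it was recorded in pounds."""
    return weight * LB_TO_KG if unit == "lbs" else weight

df["PRE_KG"] = [to_kg(w, u) for w, u in zip(df.WEIGHT_PRE, df.UNIT_PRE)]
df["POST_KG"] = [to_kg(w, u) for w, u in zip(df.WEIGHT_POST, df.UNIT_POST)]

# Flag cases whose recorded units differ between visits, and readmissions
# reported on day 0 or 1 (possible data errors)
unit_mismatch = (df.UNIT_PRE != df.UNIT_POST).sum()
early_readm = (df.READM_DAY <= 1).sum()
print(unit_mismatch, early_readm)
```

Converting everything to a single unit before computing weight change avoids the spurious 500-plus-pound swings that arise when lbs. and kg values are subtracted directly.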

Data consistency

Among the 25 provided reasons for readmission, all categories except one (internal hernia, n = 49) represented emergency conditions according to current bariatric and acute care surgery standards (Table 1). Among the 8715 readmissions, 611 (7%) were coded as planned (the ‘no’ option for the ‘Unplanned Readmission’ variable), including 166 cases of nausea and vomiting, 67 cases of unspecified abdominal pain, and 21 cases of anastomotic leak.

Table 1 Unplanned and planned readmissions and their reasons in the 2015 MBSAQIP PUF

Similarly, among the 24 possible reasons for postoperative interventions, all categories except one (internal hernia, n = 10) represented emergency procedures (Table 2). Out of 3850 postoperative interventions performed in the 30 days following MBS, 480 (12%) were coded as planned at the time of the index MBS, including interventions for respiratory failure (n = 6), gallstone disease (n = 4), and pulmonary embolism (n = 4).

Table 2 Unplanned and planned interventions and their reasons in the 2015 MBSAQIP PUF

Discussion

Data cleaning and imputation of missing values are key initial steps when working with preexisting (administrative, clinical, registry, patient-reported) datasets collected for non-research purposes. While preparing the MBSAQIP PUF for analysis, we uncovered several problems with data completeness, accuracy, and consistency. We identified missing and out-of-range values for key parameters (such as weight, BMI, and date of readmission). We also encountered several variables with implausible values (e.g., a 600-lb weight loss between the day of surgery and the post-surgical follow-up appointment, or patients with a BMI as low as 15) and erroneous categories (e.g., planned readmissions for fever, nausea, and respiratory failure; zero-value weight/BMI). Preoperative BMI was less than 35 for 7.7% of patients. Recently, the ASMBS updated its recommendations to state that MBS may be offered as an option for individuals with Class I obesity (30 ≤ BMI < 35) and obesity-related comorbidities who do not achieve substantial, durable weight loss and improvement in their comorbidities with reasonable non-surgical methods [39]. However, without additional information, it is difficult to determine whether some patients' weight and BMI recordings were real, resulted from data-entry errors, or reflected unusual circumstances in which these patients had to undergo MBS. Overall, data quality issues resulted in the loss of 20% of observations from the analytic dataset.

Despite these significant data quality issues, to date none of the manuscripts using the MBSAQIP PUF has reported a comprehensive plan for data cleaning, imputation of missing values, and data quality assurance. While the current MBSAQIP Program Standards Manual (version 2.0) [36] emphasizes the importance of data quality for assessing quality of surgical care, no specific steps or procedures for data quality assurance are currently required. Such a lack of transparency and awareness regarding data quality is, at best, impractical and potentially dangerous, since the data are being widely used for QI/QA, provider comparison, and payment.

The issue of data quality is neither new nor unique to surgical datasets or the US. Several comprehensive reviews on methodology for validating data quality, standardizing data to facilitate sharing, and harmonizing data collection practices and systems have been published during the last decade by leading national and European healthcare, public health, health informatics, health insurance, and clinical quality organizations [40,41,42,43,44,45,46,47,48,49]. The approaches and procedures that could reliably improve data quality include the following:

  1. 1.

    Automated data checks for completeness (no-skip pattern) and accuracy (e.g., flagging values outside a pre-defined acceptable range, such as BMI = 140) [43, 45, 46]

  2. 2.

    Data audits for consistency (e.g., using multiple variables to conduct logic checks: a readmission dated before discharge from the index admission is not valid) [45]

  3. 3.

    In addition to providing definitions of all registry variables, the data publisher should also include standardized algorithms for abstracting these values from commonly used clinical data systems (e.g., which value to use if multiple inconsistent values for patient weight are reported in the EMR, or whether to assign reasons for readmission based on the DRG, CPT codes, or clinical notes) [40, 41, 43].

  4. 4.

    Annual conferences, continuing education courses, and workshops for clinical data reviewers and abstractors to share successful practices and enhance the homogeneity of national registries and databases [42, 44, 48].

  5. 5.

    Reporting commonly used diagnostic codes, procedures, and interventions to facilitate linkage with insurance reimbursement schedules [47].

  6. 6.

    For any new or recoded variables created by the data registry and released with the PUF, a detailed methodology should be included to help researchers understand the definition and limitations of the new parameter (e.g., was the readmission unplanned: yes/no) [50].

  7. 7.

    Regular workshops to teach how to use the MBSAQIP appropriately and avoid common data and analytic pitfalls, similar to the approach used by CMS/ResDAC [51]
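Recommendations 1 and 2 above lend themselves to automation. A minimal sketch follows; the field names and the BMI plausibility window are illustrative assumptions, not registry specifications:

```python
import numpy as np
import pandas as pd

# Hypothetical case records for automated range and logic checks
cases = pd.DataFrame({
    "CASEID":        [1, 2, 3],
    "BMI":           [42.0, 140.0, 36.5],
    "DISCHARGE_DAY": [2, 3, 1],
    "READM_DAY":     [15.0, 2.0, np.nan],
})

# Range check (recommendation 1): flag BMI outside a pre-defined
# plausible window
range_flags = cases[(cases.BMI < 12) | (cases.BMI > 100)]

# Logic check (recommendation 2): a readmission dated before discharge
# from the index admission is internally inconsistent
logic_flags = cases[cases.READM_DAY < cases.DISCHARGE_DAY]

# The BMI of 140 and the readmission before discharge are each flagged
print(len(range_flags), len(logic_flags))
```

Checks like these can run automatically at submission time, returning flagged cases to the abstractor before the record enters the registry.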

Depending on the type of data standards and the aspects of data quality considered, data standards may or may not affect the quality of data analysis. Because of substantial differences in the normative language and jargon used by different professionals, significant differences in expectations can arise between suppliers (data reviewers and abstractors) and customers (researchers and data analysts). For example, some view data quality as pertaining to the data values themselves (e.g., when was the patient discharged), while others also include contextual features such as discharge disposition, the discharge plan, who provided the discharge education, and whether the patient felt ready to be discharged. Lack of specificity about data standards and data quality enables diverging expectations of the value of data standards and impedes their adoption and progress. We hope that clarity in language and intent will hasten the promise of standardization that has benefitted so many other industries.

While not immune to its own flaws, one of the major advantages of the ACS NSQIP dataset is its reliance on inter-rater reliability checks between data collectors and audits of participating institutions. In this system, variables with greater than 5% disagreement are flagged for future data collection education programs [1]. NSQIP has seen tremendous benefits from this system. The Society of Thoracic Surgeons (STS) has demonstrated a similar, successful system; its unique distinction is in categorizing quality assurance at three levels (internal, regional, and national), which has led to the STS correctly verifying an average of 96.4% of its data [52, 53].

Conclusion

The importance and value of large datasets in this new era of research cannot be overstated. Moreover, as we continue to focus on value-based, patient-centered care, quality assessments and their consequences have implications across many areas of the healthcare system. While always keeping patient safety and evidence-based medicine at the forefront, it is imperative that we work together within the medical community. There is a need for collaboration with allied healthcare personnel, as well as partnership with experts in the public health sector, including epidemiologists, statisticians, and policymakers, in order to properly utilize and interpret these complex datasets. Finally, we must learn from other successful databases and utilize some of their techniques in our pursuit of excellence in data quality and patient care.