Laparoscopic vertical sleeve gastrectomy (LVSG), since its Medicare approval as a stand-alone procedure in 2012, has rapidly become the most commonly performed primary bariatric operation in the USA, surpassing the Roux-en-Y gastric bypass (RYGB). The safety profile of LVSG has thus far been excellent, with acceptably low rates of postoperative complications including infection, thromboembolism, stricture, bleeding, and staple line leak. In 2012, the Metabolic and Bariatric Surgery Accreditation and Quality Improvement Program (MBSAQIP) was formed to establish process and outcome measures for accreditation. Among the standards used to quantify and define safety are 30-day postoperative outcomes, notably, readmission, reoperation, reintervention (e.g. endoscopy), or mortality.

Nationwide data on LVSG is contained within the MBSAQIP participant use data file (PUF), an extensive database containing information on operations performed at all the accredited bariatric centers in the USA. In the dataset from operations performed in 2016, almost 800 different centers submitted information on over 186,000 unique bariatric operations [1]. This robust cohort is sufficiently powered to address difficult questions in bariatric surgery in an effort to improve cost, safety, and efficiency. Its data has potential to drive changes in practice with regard to operative technique, preoperative patient preparation and guidance, as well as best practices postoperatively.

Artificial neural networks (ANNs) are complex modeling systems in which an algorithm is generated by teaching the system to predict an outcome. Although used far more widely in other fields such as computer engineering, ANNs are only now emerging as potentially useful tools for projecting clinical outcomes [2]. In surgery, ANNs have been developed successfully to predict survival after liver transplantation, diagnosis of acute appendicitis, and prompt extubation after coronary bypass, as a few representative examples [3,4,5]. Despite their limitations, ANNs can offer predictive modeling with higher fidelity than statistical techniques commonly used in surgery, such as multiple regression. In this investigation, we aim not only to use the 2016 MBSAQIP dataset to identify the panel of risk factors that may portend the composite endpoint of 30-day morbidity and mortality after LVSG, but also to optimize the modeling of the variance contained within the identified risk factors using an ANN.

Materials and methods

For this study, all patients were taken from the 2016 MBSAQIP dataset. The MBSAQIP is a joint venture of the American College of Surgeons and the American Society for Metabolic and Bariatric Surgery, and its database represents the largest bariatric surgical dataset in the country. Information on all bariatric procedures performed at accredited centers is entered into this database, which captures over 200 patient and surgeon/center variables as well as short-term outcomes after surgery. The MBSAQIP PUF is a Health Insurance Portability and Accountability Act compliant document, and all variables and outcomes are as defined in the accompanying PUF Variables and Definitions Manual [1]. As this dataset does not contain identifying information, the study was deemed exempt from review by the University of Minnesota Institutional Review Board.

Our study cohort was derived first by querying all patients who underwent an LVSG as indicated by the designation of CPT code 43775 (n = 114,251). Patients were excluded only if they did not undergo traditional multi-port LVSG (robotic, single-incision, etc.), leaving 101,721 patients for study. Select patient factors were chosen a priori for inclusion based on simplicity, accuracy of quantification, and plausibility of an association with the outcome of interest. These variables included demographics, major comorbidities (severe hypertension, diabetes mellitus), non-independent functional status, and a history of previous obesity/foregut surgery. Among demographics, binary variables included female gender and non-white race. Continuous variables included age and initial body mass index (BMI0), defined as BMI at the time of surgery. All other variables were dichotomized based on their presence or absence. Severe hypertension was defined as the use of three or more anti-hypertensive medications at the time of surgery.

The primary endpoint of interest was a composite of 30-day morbidity and mortality, defined by the presence of a 30-day readmission, reoperation, reintervention, or death. Specific interventions and justifications for these events, reported non-uniformly as free text, were not addressed. Patients were subsequently stratified by presence of a 30-day endpoint, and a bivariate analysis was used to determine the association of each patient factor with an event. Bivariate analysis was performed using the chi-square test or the Mann–Whitney U test as appropriate. Categorical variables were represented as percentages and continuous variables as median (interquartile range).
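As an illustration of the two bivariate tests named above, the following is a minimal sketch in plain Python. It is not the GraphPad Prism procedure used in the study: the chi-square statistic is computed without continuity correction, the Mann–Whitney P value uses a normal approximation without tie correction, and the function names and all example numbers are hypothetical.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for a 2x2 table.

    Rows: factor present / absent; columns: event / no event.
    Returns (statistic, two-sided P value) with 1 degree of freedom.
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of a chi-square variable with 1 df: erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))

def mann_whitney_u(x, y):
    """Mann-Whitney U for group x, with a normal-approximation P value.

    Tied values receive the average of the ranks they span; the variance
    is NOT tie-corrected (a simplifying assumption of this sketch).
    """
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j][0] == combined[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = average_rank
        i = j
    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, g) in zip(ranks, combined) if g == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    z = (u - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u, math.erfc(abs(z) / math.sqrt(2))

# Hypothetical counts: diabetic patients with/without an event,
# then non-diabetic patients with/without an event.
stat, p_categorical = chi_square_2x2(10, 20, 30, 40)

# Hypothetical ages in the event and no-event groups.
u, p_continuous = mann_whitney_u([51, 48, 60, 55], [39, 44, 41, 47, 36])
```

For a cohort of this size the normal approximation is reasonable; with many tied values a tie-corrected variance would be preferable.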

Factors significant on bivariate analysis were subsequently included in a multivariate nominal logistic regression analysis to test for independent association with the 30-day endpoint. Results of the multivariate analysis were expressed as odds ratios with 95% confidence intervals, and strength of association was also characterized with a P value. The quality of the multivariate model was characterized by r² goodness of fit, P value, and the area under the receiver-operating characteristic curve (AUROC). This analysis excluded patients with a missing value for any included variable.
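The two summary quantities described above can be sketched briefly. The odds ratio for a factor is the exponential of its logistic regression coefficient, with a Wald 95% confidence interval of exp(β ± 1.96 × SE). The AUROC equals the probability that a randomly chosen patient with an event receives a higher predicted risk than a randomly chosen patient without one, with ties counted as one half. The function names, coefficient, standard error, and scores below are hypothetical and are not taken from the study's tables.

```python
import math

def odds_ratio_ci(beta, se, z=1.96):
    """Odds ratio and Wald confidence interval from a logistic coefficient."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

def auroc(scores, labels):
    """AUROC as the fraction of (event, non-event) pairs ranked correctly.

    Pairwise comparison is quadratic in cohort size; it is used here only
    because it mirrors the definition directly.
    """
    events = [s for s, y in zip(scores, labels) if y == 1]
    non_events = [s for s, y in zip(scores, labels) if y == 0]
    correct = sum((e > n) + 0.5 * (e == n) for e in events for n in non_events)
    return correct / (len(events) * len(non_events))

# Hypothetical coefficient and standard error for one risk factor.
or_point, or_low, or_high = odds_ratio_ci(beta=0.25, se=0.05)

# Hypothetical predicted risks and observed 30-day outcomes.
area = auroc([0.10, 0.40, 0.35, 0.80], [0, 0, 1, 1])
```

An AUROC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect discrimination, which gives context for comparing two models on the same cohort.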

Similarly, an ANN model was generated using a three-node back-propagation technique with k-fold validation. Each node was assigned an equal training value of 0.333, representing previously used conservative parameters chosen to minimize further model complexity and mitigate overfitting [6]. Eighty percent of the patients were randomly selected to train the model, and the other 20% were withheld to constitute the internal validation set. The ANN model is illustrated in Fig. 1. ANN models were characterized by r² goodness of fit, P value, and AUROC. The AUROC values of the two multiple variable models were then compared [7].
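To make the structure of such a model concrete, the following is a minimal sketch of a network with three hidden nodes trained by back-propagation on a random 80/20 split. It is not the JMP Pro implementation used in the study. The tanh hidden activation, learning rate, epoch count, simple holdout (rather than k-fold) validation, and the synthetic two-variable data are all assumptions made for illustration, and the study's training value of 0.333 is not reproduced here.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def init_network(n_inputs, n_hidden=3, seed=0):
    """Small random weights; the last entry of each weight list is the bias."""
    rng = random.Random(seed)
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
                for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    return w_hidden, w_out

def forward(net, x):
    """Return hidden activations (tanh) and the predicted event probability."""
    w_hidden, w_out = net
    hidden = [math.tanh(sum(w * v for w, v in zip(ws, x)) + ws[-1])
              for ws in w_hidden]
    out = sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + w_out[-1])
    return hidden, out

def train(net, rows, labels, epochs=200, lr=0.1):
    """Stochastic gradient descent with back-propagation of the log-loss."""
    w_hidden, w_out = net
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            hidden, out = forward(net, x)
            delta_out = out - y  # d(log-loss) / d(output pre-activation)
            # Propagate the error to each hidden node before updating w_out.
            delta_hidden = [delta_out * w_out[j] * (1.0 - h * h)
                            for j, h in enumerate(hidden)]
            for j, h in enumerate(hidden):
                w_out[j] -= lr * delta_out * h
            w_out[-1] -= lr * delta_out
            for ws, dh in zip(w_hidden, delta_hidden):
                for i, v in enumerate(x):
                    ws[i] -= lr * dh * v
                ws[-1] -= lr * dh
    return net

def log_loss(net, rows, labels, eps=1e-12):
    """Mean negative log-likelihood of the observed labels."""
    total = 0.0
    for x, y in zip(rows, labels):
        p = min(max(forward(net, x)[1], eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(rows)

def split_80_20(rows, labels, seed=0):
    """Random 80% training / 20% withheld validation split."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = int(0.8 * len(idx))
    tr, va = idx[:cut], idx[cut:]
    return ([rows[i] for i in tr], [labels[i] for i in tr],
            [rows[i] for i in va], [labels[i] for i in va])

# Synthetic data (hypothetical): two scaled inputs, event when their sum > 1.
rng = random.Random(1)
rows = [[rng.random(), rng.random()] for _ in range(200)]
labels = [1 if a + b > 1.0 else 0 for a, b in rows]

x_tr, y_tr, x_va, y_va = split_80_20(rows, labels)
net = init_network(n_inputs=2)
loss_before = log_loss(net, x_va, y_va)
train(net, x_tr, y_tr)
loss_after = log_loss(net, x_va, y_va)
```

Evaluating the loss on the withheld 20% rather than on the training rows is what allows overfitting to be detected: a model that merely memorizes the training set will not improve on patients it has not seen.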

Fig. 1 Diagram of 3-node artificial neural network used to predict 30-day morbidity and mortality

A two-tailed P value of 0.05 or less was taken as the threshold to denote statistical significance. All basic statistical analyses were performed using GraphPad Prism 8 (La Jolla, CA). Multivariate analysis and artificial neural network modeling were performed with JMP Pro 13 (Cary, NC).

Results

Of the 101,721 LVSG patients included in the analysis, 79.4% were female, and the median age was 44.3 years (n = 101,704). Additionally, 27.6% were of non-white race, and the median BMI0 was 43.3 kg/m² (n = 100,807). Within this cohort, there were 3853 patients with a 30-day morbidity or mortality and 97,868 patients without, for a national rate of 3.8%. Of these 3853 patients, 81.5% (n = 3140) had a 30-day readmission, 23.0% (n = 887) had a 30-day reoperation, 23.0% (n = 887) had a 30-day reintervention, and 1.7% (n = 65) had a 30-day mortality. Of the patients with a 30-day endpoint, 25.3% (n = 976) were noted as having two or more of the four events that comprise the composite endpoint. These results are summarized in Table 1.
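The percentages above follow directly from the reported counts, as the short check below shows. It uses only the figures given in this paragraph; the variable names are illustrative. Because a patient may experience more than one component event, the component counts sum to more than the number of patients with an endpoint.

```python
# Counts as reported in the text for the 2016 cohort.
cohort = 101_721
patients_with_endpoint = 3_853
components = {
    "readmission": 3_140,
    "reoperation": 887,
    "reintervention": 887,
    "mortality": 65,
}

# Composite 30-day event rate across the whole cohort.
composite_rate = patients_with_endpoint / cohort

# Share of endpoint patients experiencing each component event.
shares = {name: n / patients_with_endpoint for name, n in components.items()}

# Component events in excess of one per endpoint patient; this can only be
# non-zero if some patients had two or more events.
excess_events = sum(components.values()) - patients_with_endpoint

for name, share in shares.items():
    print(f"{name}: {share:.1%}")
print(f"composite rate: {composite_rate:.1%}, excess events: {excess_events}")
```

The excess of component events over endpoint patients is at least as large as the 976 patients reported with two or more events, as expected when some patients had three or more.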

Table 1 Baseline characteristics of the patient cohort

Bivariate analysis identified the factors associated with a 30-day endpoint. Associated demographics included advanced age (P = 0.003), non-white race (P < 0.001), and initial BMI (P < 0.001). The comorbidities of severe hypertension and diabetes mellitus, as well as non-independent functional status and previous obesity/foregut surgery, were also associated with the 30-day endpoint (P < 0.001). These factors were included in a multiple logistic regression analysis to determine independent association with a 30-day outcome. All factors remained statistically significant (n = 100,791, P < 0.001, r² = 0.008). Results of the bivariate and multivariate analyses are summarized in Table 2.

Table 2 Bivariate and multivariate analyses of associations with the 30-day endpoint

The multivariate analysis was subsequently subjected to ROC curve analysis, which generated an AUROC of 0.572 (n = 100,791). The same factors considered in multivariate analysis were input into an artificial neural network as described. The algorithm derived from the ANN training set (80% of the patients, chosen randomly) generated an AUROC of 0.581 (n = 80,633); similarly, the validation set (the 20% of patients withheld) generated an AUROC of 0.585 (n = 20,158). A comparison of the ROC curves of the multivariable and ANN training set models is illustrated in Fig. 2, revealing an improved goodness of fit for the ANN model.

Fig. 2 Receiver-operating characteristic curves for the 7-variable multivariate nominal logistic regression model (left, AUROC = 0.572) and the 7-variable artificial neural network training set (right, AUROC = 0.581). Curve for artificial neural network validation set not shown (AUROC = 0.585)

Discussion

In this investigation, we elected to examine outcomes after LVSG. Mechanistically, the drastically reduced stomach volume restricts bolus capacity and provides for earlier satiety, allowing for significant caloric reduction [8]. Its 5-year weight loss has been shown to be commensurate with that of the gastric bypass, though with less potential for postoperative digestive syndromes [9]. Relative to the bypass, its principal drawbacks include the potential for continued or worsening reflux and potentially inferior resolution of comorbidities, specifically diabetes mellitus [10].

The primary outcomes of interest, and those used to help characterize whether a bariatric center meets accreditation criteria, are 30-day morbidities (readmission, reoperation, and reintervention), as well as 30-day mortality. In particular, significant attention has been given to 30-day readmission as an outcome measure, as this event is frequently avoidable, may not reflect a true complication, is costly, and consumes significant emergency department resources [11, 12]. LVSG has a readmission rate of 2.8%, lower than that of the gastric bypass, most frequently for nausea, vomiting, and dehydration. Demonstrated risk factors for readmission include black race, diabetes, hypertension, renal failure, and severe chronic obstructive pulmonary disease [13]. As such, our selection of a compact, directed panel of preoperative factors to examine a priori was influenced primarily by previously characterized risk factors for readmission, the most prevalent and perhaps most avoidable of the four components of the 30-day endpoint [14]. Demographics, comorbidities with high prevalence, functional status, and revisional surgery were thus taken as our examinable risk factors.

Predicting which patients will have a 30-day morbidity or mortality represents a challenge, as it is ostensibly governed by preoperative, intraoperative, and postoperative patient and surgeon/center factors, in addition to an element of random chance. Nonetheless, optimally characterizing the variance attributable to simple preoperative patient factors can help identify and stratify, early in the course of their bariatric care, those patients more likely to have a positive endpoint. In this study, multivariable analysis demonstrated the independence of advanced age, non-white race, and higher initial BMI as predictors of 30-day morbidity and mortality. Additionally, confirming previous reports, severe hypertension and diabetes mellitus were also risk factors, as were non-independent functional status and previous obesity/foregut surgery [15]. Using the logistic regression function to predict the development of a 30-day endpoint on the basis of only the seven included preoperative patient factors generated an AUROC of 0.572, a measure reflecting the goodness of fit of the model in its predictive ability. The ANN model, in contrast, demonstrated an improved AUROC.

The potential for ANN use in surgery is significant, as the algorithms derived from ANNs are sophisticated, non-linear, and capable of recognizing complex interactions among both continuous and categorical variables in order to optimize outcome prediction. Moreover, if desired, the algorithms can be refined continually and prospectively as new patient data are entered. Analogous to neuronal synapse interaction in the brain, ANNs are model systems taught to predict an outcome by considering an input layer of variables on which to perform pattern recognition. Intermediate data derived from each layer are subsequently processed at hidden nodes, which, similar to a neuron, integrate and weight inputs and pass on the information for further processing and prediction. The ANN model used in this investigation benefited from the 20% of patients randomly withheld to provide internal validation of the algorithm and to protect against overfitting. Despite their potential, ANNs have seen only limited adoption in clinical outcomes modeling in surgery, including bariatric surgery.

In 2007, Lee and colleagues used a 249-patient set to demonstrate an improvement in the prediction of post-bariatric surgery weight loss using an ANN model relative to logistic regression, on the basis of type of operation as well as preoperative triglyceride and hemoglobin A1c levels [16]. ANNs were then used to predict excess weight loss after adjustable gastric banding in two subsequent small studies [17, 18]. Most recently, we reported our use of ANN modeling to predict weight loss at 6 months and 1 year after laparoscopic Roux-en-Y gastric bypass [19]. Using over 647 patients from a single institution, five factors associated with postoperative weight loss were modeled using multiple linear regression and, optimally, by an ANN. That study aimed to overcome the biggest weakness of the ANN model, its complexity precluding clinical use, by constructing a web-based, patient-centered tool that uses the ANN algorithm to estimate expected weight loss at 6 and 12 months postoperatively [19]. Similarly, the development of a user-friendly, neural network-based tool to identify high-risk patients early may constitute the best means of early intervention on modifiable risk factors and subsequent potential improvement in 30-day morbidity and mortality. This investigation contributes to the body of work using ANNs to predict bariatric surgical outcomes, as we use an ANN to demonstrate superior prediction of the occurrence of a 30-day composite endpoint.

The findings of this study must be considered in the context of its limitations. While nationally comprehensive, the MBSAQIP dataset is subject to the selection bias inherent in retrospective analysis of prospectively collected data. Furthermore, despite standardized definitions of variables and outcomes, attribution of data is subject to bias due to medical record misrepresentation, misclassification, and ultimately misinterpretation by the institutional coders who contribute. In part because of this unavoidable variability, the 30-day endpoints chosen for consideration in this study were the most well defined, and their attribution was less subject to interpretation of nebulous or conflicting information in the medical record. Next, the use of a composite of 30-day reintervention, readmission, reoperation, or mortality as our endpoint does not allow the characteristics of each endpoint to be distinguished, although there may be significant variability in the factors portending each of these four outcomes. Notably, one might hypothesize that risk factors for readmission are more related to a patient's psychiatric health and burden of comorbidities, while risk factors for 30-day reintervention and reoperation are more related to the technical complexity of the LVSG and the patient's ability to heal. This separation was not discerned in this study. However, use of 30-day composite endpoints after sleeve gastrectomy has been established, and in fact such endpoints are generally broader in the scope of events that constitute a 30-day morbidity and mortality [20]. We believe the endpoint chosen is clinically significant and eminently quantifiable. Finally, ANN algorithms, though beneficial due to their high fidelity and ability to be refined continually, are algorithmically complex, and their clinical use remains a challenge. In contrast to logistic or linear regression techniques, the relative weighted contribution of each variable toward the endpoint is not known in ANNs, and thus intervening to best improve a patient's risk profile is less straightforward. Nonetheless, ANNs remain a promising advanced modeling system for the prediction of surgical outcomes, particularly as datasets grow ever more comprehensive and complex.

In conclusion, this study reveals several risk factors for 30-day morbidity and mortality after LVSG using the best available dataset. In addition, we demonstrate, on a small scale, that ANN models can be used to optimize prediction of postoperative outcomes in bariatric surgery. We acknowledge the limited variance attributable to the factors considered, among the other limitations. A more comprehensive analysis of a greater number of variables, using an ANN with a much larger input layer, may therefore be warranted in the future.