Introduction

In obese patients with type 2 diabetes mellitus (T2DM), bariatric and metabolic surgery demonstrated excellent glycemic control compared with intensive medical intervention [1,2,3,4]. The short- and mid-term diabetes remission rate after bariatric and metabolic surgery ranged from 50 to 70% in the literature [5,6,7]. Predicting diabetes remission before surgery could be crucial for patients and physicians during the decision-making process for controlling diabetes [8]. Earlier intervention may provide better long-term metabolic outcomes, especially in patients with a high possibility of diabetes remission [9,10,11].

Several predictors of diabetes remission after surgery have been identified, including β-cell reserve, degree of diabetes control, diabetes duration, age, weight loss, and miscellaneous factors [12,13,14]. Furthermore, numerous risk prediction models have been proposed to integrate these multiple predictors of diabetes remission before surgery [15]. Two types of model are available: scoring systems and logistic regression models. In a scoring system, each variable is scored individually according to its value, and the total score is the sum of the individual scores. For example, the ABCD score was the first widely used risk model. It consists of four variables, with a total score ranging from 0 to 10 [16]. A higher ABCD score means a higher chance of diabetes remission. A logistic regression model calculates the odds of diabetes remission from a logistic regression equation; the odds range from 0 to infinity. For example, Simona et al. proposed the formula Log (odds) = 0.089 + (− 0.145 × fasting glycemia) + (− 0.21 × diabetes duration) + (0.059 × body mass index [BMI]) [17]. Higher odds mean a higher probability of diabetes resolution.
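As an illustration of how a logistic regression model of this kind turns preoperative values into odds and a probability, the equation quoted above can be sketched as follows. This is only a sketch under stated assumptions: "Log" is taken to be the natural logarithm, the units of each variable are not given in the text, and the patient values are hypothetical and chosen purely for demonstration.

```python
import math


def remission_log_odds(fasting_glycemia, diabetes_duration, bmi):
    """Log(odds) of diabetes remission, using the coefficients quoted in the text [17].

    Units are NOT stated in the text; the caller must supply values in the
    units of the original model (an assumption of this sketch).
    """
    return 0.089 + (-0.145 * fasting_glycemia) + (-0.21 * diabetes_duration) + (0.059 * bmi)


def odds_to_probability(odds):
    """Convert odds (0 to infinity) to a probability (0 to 1)."""
    return odds / (1.0 + odds)


# Hypothetical patient, for illustration only.
log_odds = remission_log_odds(fasting_glycemia=8.0, diabetes_duration=2.0, bmi=40.0)
odds = math.exp(log_odds)  # assumes "Log" means the natural logarithm
probability = odds_to_probability(odds)
print(f"log(odds)={log_odds:.3f}, odds={odds:.2f}, probability={probability:.2f}")
```

Odds above 1 correspond to a probability above 0.5, which is why higher odds indicate a higher probability of diabetes resolution.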

The major limitation of current prediction models is that most were developed using Roux-en-Y gastric bypass (RYGB) cohorts, whereas sleeve gastrectomy (SG) is now the most widely performed bariatric and metabolic surgery globally [18, 19]. Although RYGB and SG have similar short- and mid-term diabetes control rates [6, 7], the mechanisms causing diabetes remission differ [20, 21]. Furthermore, many models were created using a small sample size, and external validation was lacking [15]. Hence, the objective of our study was to evaluate the performance of the current risk models for predicting diabetes remission at 1 year after SG.

Patients and Methods

Study Patients

This retrospective study was conducted at Taipei Medical University Shuang-Ho Hospital in Taiwan and approved by the Institutional Review Board (IRB No N201804060). Consecutive patients with T2DM who underwent SG at Shuang-Ho Hospital between December 2011 and September 2016 and completed 1 year of postoperative follow-up were included in this study. The indication for bariatric and metabolic surgery was BMI ≥ 32.5 kg/m² with T2DM, or poorly controlled diabetes with BMI ≥ 27.5 kg/m². The exclusion criteria for this study were the presence of end-organ damage, previous gastrointestinal operation, and age < 18 or > 65 years.
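The indication and exclusion criteria above can be written as a simple screening check. The sketch below is illustrative only: the function and parameter names are hypothetical, and "poorly controlled diabetes" is passed in as a yes/no flag because the text does not define a threshold for it.

```python
def meets_surgical_indication(bmi, has_t2dm, poorly_controlled):
    """Indication as stated in the text: BMI >= 32.5 with T2DM,
    or poorly controlled diabetes with BMI >= 27.5."""
    if not has_t2dm:
        return False
    return bmi >= 32.5 or (poorly_controlled and bmi >= 27.5)


def is_excluded(age, end_organ_damage, previous_gi_operation):
    """Exclusion criteria as stated in the text."""
    return end_organ_damage or previous_gi_operation or age < 18 or age > 65


def eligible_for_study(bmi, has_t2dm, poorly_controlled, age,
                       end_organ_damage=False, previous_gi_operation=False):
    """Combine indication and exclusion criteria (hypothetical helper)."""
    return (meets_surgical_indication(bmi, has_t2dm, poorly_controlled)
            and not is_excluded(age, end_organ_damage, previous_gi_operation))


# Hypothetical patients, for illustration only.
print(eligible_for_study(bmi=29.0, has_t2dm=True, poorly_controlled=True, age=45))
print(eligible_for_study(bmi=29.0, has_t2dm=True, poorly_controlled=False, age=45))
```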

Patients undergoing bariatric and metabolic surgery were evaluated pre- and postoperatively by a multidisciplinary team of surgeons, physicians, psychologists, clinical nurse specialists, dietitian nutritionists, and anesthetists. SG consisted mainly of vertical transection of the stomach using multiple 60-mm linear staplers guided by a 36-French orogastric tube, with the distance between the pylorus and the staple line ranging from 4 to 6 cm. Subsequently, the staple line was reinforced using 3-0 V-Loc™ to prevent hemorrhage and leakage. Postoperative follow-up visits were at 1, 3, 6, 9, and 12 months. Demographic, anthropometric, and clinical data were collected retrospectively from medical records.

Diabetes remission was defined according to the 2009 American Diabetes Association criteria [22]. Complete remission of diabetes was defined as HbA1c < 6.0% without oral or injectable diabetes medication (including metformin); partial remission as HbA1c < 6.5% without oral or injectable diabetes medication; and diabetes in control as HbA1c < 7.0% with or without diabetes medication.
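The three outcome categories above can be expressed as a small decision function. This is a minimal sketch: the function name and the label "not in control" for patients who meet none of the three definitions are assumptions introduced here, not terms from the text.

```python
def classify_glycemic_outcome(hba1c_percent, on_diabetes_medication):
    """Classify a 1-year outcome using the thresholds stated in the text.

    hba1c_percent: HbA1c in percent (e.g. 5.7).
    on_diabetes_medication: True if the patient takes any oral or injectable
    diabetes medication (including metformin).
    """
    if not on_diabetes_medication and hba1c_percent < 6.0:
        return "complete remission"
    if not on_diabetes_medication and hba1c_percent < 6.5:
        return "partial remission"
    if hba1c_percent < 7.0:
        return "diabetes in control"
    return "not in control"  # hypothetical label for all remaining cases


# Hypothetical examples, for illustration only.
print(classify_glycemic_outcome(5.7, on_diabetes_medication=False))
print(classify_glycemic_outcome(5.7, on_diabetes_medication=True))
```

Note that a patient with HbA1c below 6.0% who still takes medication falls into "diabetes in control", not remission, because both remission categories require being off medication.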

Model Selection

We searched PubMed, Embase, and the Cochrane Central Register of Controlled Trials for full-text articles. Articles containing the keywords morbid obesity, bariatric or metabolic surgery, diabetes remission, and predict were considered for inclusion. We excluded models that included postoperative variables or variables not measured in our practice. For example, we excluded the diabetes remission score of Ugale et al., which contained the variable “stimulated C-peptide,” which was not routinely checked in our daily practice [23]. Searches were performed on December 31, 2017.

Statistical Analysis and Assessment of Model Performance

Preoperative patient characteristics were presented as mean and standard deviation for continuous variables and as percentage (%) for categorical variables. Clinical data before and after SG were compared using a paired t test. The diabetes remission and non-remission groups were compared using an independent t test or Chi-square test, as appropriate.

The performance of each model was validated by using its preoperative variables to predict diabetes remission after surgery. The score or the odds of diabetes remission was calculated for each patient. Scores were calculated according to the definition of the original scoring model. For the logistic regression models, because not every model reported its original equation, we re-estimated all constants and beta values from our data and used them to calculate the final odds.

Discrimination and calibration are the major tools for assessing model performance [15, 24, 25]. The discrimination of a model is how well it separates positive outcomes from negative outcomes. It was evaluated using the area under the receiver operating characteristic (ROC) curve (AUC), calculated with the method of DeLong et al. An AUC of 0.501–0.699 was considered poor discrimination, 0.700–0.799 acceptable discrimination, and 0.800–0.899 excellent discrimination [26]. The ROC curves of the various models were compared according to DeLong et al. A two-sided p value < 0.05 was considered statistically significant. Youden’s index was used to determine the optimal cut-off point on the ROC curve. At this point, the sum of the model’s sensitivity and specificity is maximal [27]. The corresponding positive predictive value (PPV) and negative predictive value (NPV) of the cut-off point were also determined.
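To make these discrimination measures concrete, the sketch below computes an AUC, labels it with the bands quoted above, and finds the Youden cut-off with its sensitivity, specificity, PPV, and NPV. It is an illustration under stated assumptions, not the analysis performed in the study (which used MedCalc): the AUC is computed as the rank-based probability that a remitter scores higher than a non-remitter, the DeLong comparison of curves is not implemented, all function names are hypothetical, and the scores and outcomes are made-up toy data.

```python
def auc(scores, outcomes):
    """AUC as the probability that a random remitter (1) has a higher score
    than a random non-remitter (0); ties count as 0.5."""
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def describe_auc(value):
    """Label an AUC using the bands quoted in the text."""
    if 0.8 <= value < 0.9:
        return "excellent"
    if 0.7 <= value < 0.8:
        return "acceptable"
    if 0.5 < value < 0.7:
        return "poor"
    return "outside the bands quoted in the text"


def youden_cutoff(scores, outcomes):
    """Return (cutoff, sensitivity, specificity, ppv, npv) for the cut-off that
    maximizes Youden's J = sensitivity + specificity - 1.
    Remission is predicted when score >= cutoff; the lowest such cut-off wins ties."""
    best = None
    for c in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, outcomes) if s >= c and y == 1)
        fp = sum(1 for s, y in zip(scores, outcomes) if s >= c and y == 0)
        fn = sum(1 for s, y in zip(scores, outcomes) if s < c and y == 1)
        tn = sum(1 for s, y in zip(scores, outcomes) if s < c and y == 0)
        sens = tp / (tp + fn)
        spec = tn / (tn + fp)
        j = sens + spec - 1
        if best is None or j > best[0]:
            ppv = tp / (tp + fp) if tp + fp else float("nan")
            npv = tn / (tn + fn) if tn + fn else float("nan")
            best = (j, c, sens, spec, ppv, npv)
    return best[1:]


# Toy data (hypothetical): model score and observed remission (1) or not (0).
toy_scores = [2, 3, 4, 5, 6, 7, 8, 9]
toy_outcomes = [0, 0, 1, 0, 1, 1, 0, 1]
toy_auc = auc(toy_scores, toy_outcomes)
print(toy_auc, describe_auc(toy_auc))
print(youden_cutoff(toy_scores, toy_outcomes))
```

PPV and NPV depend on the remission rate in the cohort, so cut-off points derived this way carry over to other populations only to the extent that their remission rates are similar.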

The calibration of a model is how closely its predictions resemble the observed outcomes. Calibration was assessed using the Hosmer–Lemeshow goodness-of-fit test, which compares the predicted and observed values in logistic regression; a lack of fit was defined as p < 0.05 [26]. Each scoring model was transformed into a logistic regression model to calculate calibration. Furthermore, the predicted-to-observed diabetes remission ratio was calculated from the logistic regression equation of each model. Predicted diabetes remission was defined as positive if the odds were greater than 1 and negative if the odds were less than 1. Statistical analysis was performed using MedCalc Statistical Software version 18 (MedCalc Software bvba, Ostend, Belgium).
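The two calibration measures described above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the MedCalc procedure used in the study: patients are split into equal-sized groups by predicted probability, the p value uses a closed-form chi-square tail that is valid only for an even number of degrees of freedom (groups − 2), function names are hypothetical, and the data are made up.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable; valid for even df only."""
    if df < 2 or df % 2 != 0:
        raise ValueError("closed form requires an even df >= 2")
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


def hosmer_lemeshow(probabilities, outcomes, groups=10):
    """Return (statistic, p value); p < 0.05 indicates a lack of fit."""
    pairs = sorted(zip(probabilities, outcomes))
    n = len(pairs)
    if n < groups:
        raise ValueError("need at least one patient per group")
    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        expected = sum(p for p, _ in chunk)   # predicted number of remissions
        observed = sum(y for _, y in chunk)   # observed number of remissions
        mean_p = expected / size
        variance = size * mean_p * (1.0 - mean_p)
        if variance > 0:                      # skip groups with no variability
            statistic += (observed - expected) ** 2 / variance
    return statistic, chi2_sf_even_df(statistic, groups - 2)


def predicted_to_observed_ratio(odds, outcomes):
    """Predicted remission is positive when odds > 1, as defined in the text."""
    predicted = sum(1 for o in odds if o > 1)
    return predicted / sum(outcomes)


# Toy data (hypothetical): predicted probabilities and observed remission.
toy_probabilities = [0.2, 0.2, 0.4, 0.4, 0.6, 0.6, 0.8, 0.8]
toy_outcomes = [0, 0, 0, 1, 1, 0, 1, 1]
hl_statistic, hl_p = hosmer_lemeshow(toy_probabilities, toy_outcomes, groups=4)
toy_odds = [p / (1.0 - p) for p in toy_probabilities]
toy_ratio = predicted_to_observed_ratio(toy_odds, toy_outcomes)
print(round(hl_statistic, 3), round(hl_p, 3), toy_ratio)
```

A ratio above 1 means the model predicts more remissions than were observed, that is, it overestimates remission; a ratio of 1 means predicted and observed counts agree.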

Results

Study Patients

Between December 2011 and September 2016, a total of 182 patients with diabetes underwent SG in our hospital. Five patients were excluded because they did not meet the inclusion criteria, and 49 patients (26.9%) were excluded on account of loss to follow-up, resulting in 128 patients in our study. The mean age of the study participants was 42.4 years, and 45.3% were men. The mean BMI was 39.2 kg/m², and excess weight loss (EWL) was 61.4%. Mean preoperative HbA1c was 8.0%, which decreased to 5.7% after 1 year, and the T2DM complete remission rate was 71.9%. Detailed patient characteristics are listed in Table 1.

Table 1 Characteristics of the patients (n = 128)

Risk Prediction Models

A literature search was conducted for preoperative risk prediction models of diabetes remission after bariatric or metabolic surgery. A total of 11 models (6 scoring models and 5 logistic regression models) were included in our study. The scoring models were ABCD [16], Individualized Metabolic Surgery (IMS) [28], DiaRem [29], Advanced-DiaRem (Ad-DiaRem) [30], DiaBetter [31], and the model proposed by Robert et al. [32]. The logistic regression models were those proposed by Dixon et al. [33], Hayes et al. [34], Park et al. [35], Ana et al. [36], and Simona et al. [17]. Each model was composed of two to six variables. The models by Dixon et al. and Hayes et al. had the fewest variables, whereas Ad-DiaRem and the model by Ana et al. had the most. Diabetes duration and HbA1c were the most frequently used variables across all models; by contrast, fasting glucose and C-peptide were the least used. The complete overview of preoperative variables for each model is listed in Table 2, and the details of all the models are provided in Supplementary Table 1.

Table 2 Preoperative variables in the risk prediction models for diabetes remission after bariatric and metabolic surgery

None of the models were developed from patients who underwent solely SG. The IMS and DiaBetter models included SG and RYGB patients [28, 31], whereas that of Ana et al. included biliopancreatic diversion, RYGB, and SG [36]. Furthermore, the DiaRem, Ad-DiaRem, ABCD, and Park et al. models were derived solely from RYGB patients [16, 29, 30, 35]. All scoring models were externally validated by at least one independent cohort, except for that of Robert et al. However, no logistic regression models were externally validated.

Comparison of Remission and Non-remission Group

The remission group was younger and had higher C-peptide levels, shorter diabetes duration, fewer oral diabetes drugs, less insulin use, and greater EWL than the non-remission group, all with statistical significance (Table 3). All models showed significant differences between the remission and non-remission groups.

Table 3 Characteristics of the patients who achieved complete remission of diabetes and those without remission 1 year postoperatively

Discrimination

The sensitivity and specificity of the scoring models and logistic regression models are shown as ROC curves in Fig. 1a, b, respectively. The IMS, Ad-DiaRem, DiaBetter, ABCD, DiaRem, and Ana et al. models achieved AUC > 0.800 and revealed excellent discrimination power (Table 4). The models of Robert et al. and Hayes et al. demonstrated poor discrimination power, with an AUC ranging from 0.501 to 0.699. The best performance among the six models was by IMS and Ad-DiaRem, with an AUC of 0.849. However, they were not statistically different from the other models, except those of Robert et al. and Hayes et al. The cut-off points for the ROC curve of each model and the corresponding PPV and NPV are also shown in Table 4.

Fig. 1

Discrimination power of different risk prediction models using the ROC plot. a Scoring systems. b Logistic regression models. IMS, ABCD, DiaRem, Ad-DiaRem, DiaBetter, Ana et al., and Dixon et al. showed excellent discrimination power (AUC 0.800–0.899), and Robert et al. and Hayes et al. showed poor discrimination power (AUC 0.501–0.699)

Table 4 Discrimination of different models using AUC and the associated cut-off points

Calibration

The calibration of each risk prediction model was assessed using the Hosmer–Lemeshow test (Table 5). All models demonstrated a statistically good fit (p > 0.05). In addition, predicted diabetes remission rates were calculated according to the equation of each model and compared with the observed diabetes remission rates (Fig. 2). All models overestimated diabetes remission by 5 to 30% compared with the observed rates. The predicted diabetes remission of the DiaBetter model was the most similar to the observed diabetes remission.

Table 5 Calibration of different models was evaluated using goodness-of-fit Hosmer–Lemeshow test and the predicted-to-observed ratio
Fig. 2

The predicted-to-observed diabetes remission ratio of different risk prediction models. a Scoring systems. b Logistic regression models. All models overestimated diabetes remission by 5 to 30% compared with observed rates. The predicted diabetes remission rate of the DiaBetter model was the most similar to the observed diabetes remission rate

Discussion

This study validated the performance of 11 risk models for predicting diabetes remission 1 year after SG. The IMS, ABCD, DiaRem, Ad-DiaRem, DiaBetter, Ana et al., and Dixon et al. models showed excellent discrimination power, with an AUC range of 0.800–0.899. Furthermore, all models overestimated diabetes remission by 5 to 30% but did not lose their goodness of fit. The objective of our study was to provide the most comprehensive external validation of current risk prediction models of diabetes remission after bariatric and metabolic surgery in patients undergoing SG.

Effective models focus on one specific group of patients to provide precise prediction. The majority of the current models were created from RYGB patients [15]. Aurélie et al. performed a similar validation of six models in 84 patients who underwent RYGB with a 1-year follow-up [39]. However, SG has become the mainstream bariatric and metabolic surgery worldwide [18, 19], which has led to the need for risk models suitable for SG patients rather than RYGB patients. In the literature, only the ABCD, IMS, and DiaRem scores have been validated in solely SG patients [28, 37, 40,41,42]. Some head-to-head comparisons of individual models have been conducted, but their cohorts were mostly composed of RYGB patients [30, 31, 38, 43]. The advantage of our study was that it used solely patients with SG to validate all models created from non-SG cohorts. Hence, it could offer improved evidence for current clinical utility.

Performance evaluation of a clinical prediction model requires precise statistical tools, usually discrimination and calibration [24, 25]. Discrimination refers to the model’s ability to separate positive outcomes from negative ones, whereas calibration refers to how similar the prediction is to the observed outcome. In some conditions, even if the model demonstrates effective discrimination, its prediction may differ greatly from the actual condition [24]. However, a recent systematic review of risk prediction models for diabetes remission after bariatric surgery concluded that few underwent external validation, and most did not report discrimination and calibration [15]. Ad-DiaRem, DiaBetter, Robert et al., Ana et al., and Dixon et al. reported discrimination power in their original derivations, with AUC values of 0.911, 0.867, 0.950, 0.923, and 0.69, respectively. These differed greatly from the values in our validation. In a validation cohort conducted by Aurélie et al., the prediction error was used to calculate calibration [39]. By contrast, we applied the Hosmer–Lemeshow goodness-of-fit test and the predicted-to-observed ratio to provide a more intuitive understanding of the calibration results.

Effective risk models not only provide accurate prediction but are also easy to use [15]. Scoring models are far more convenient to use in daily clinical practice than logistic regression models. Hence, we recommend the IMS, ABCD, DiaRem, Ad-DiaRem, and DiaBetter scoring models because they have strong discrimination power and are convenient to use. In addition, we calculated the optimal cut-off point of each model for clinical application. Using these cut-off points, patients with a high probability of diabetes remission can be identified, which may motivate them to undergo surgery earlier rather than postponing it. In the group with a low probability of diabetes resolution, although surgery could provide better glycemic control than medical therapy alone, effective risk models could prevent unreasonable expectations of surgery.

The models by Hayes et al. and Robert et al. showed poor discrimination power (AUC < 0.7); this is because the model of Hayes et al. contains only two variables [34], whereas most models contain > 3 variables to integrate the complex mechanism of diabetes remission after surgery. Furthermore, unlike other scoring models, Robert et al. presented the only model using a binary scoring system [32], which does not weight different degrees of severity differently. This explains its poor performance.

The major limitation of our study is that it included only results at 1 year after surgery, which is insufficient to predict long-term diabetes remission after SG. The durability of SG in long-term diabetes control remains controversial [28, 44, 45]. The results of this study should be interpreted with caution in clinical practice because of the possibility of diabetes relapse. However, even in patients with diabetes relapse, the “metabolic memory” or “legacy effect” persisted and provided long-term metabolic benefits [9, 10], which strengthens the need for early intervention [11]. Moreover, this study was retrospective and included a small sample size; further prospective validation with a larger sample size will be required to improve prediction power. In addition, this study cohort was composed of only Asian patients; thus, further studies that include other races will be necessary to confirm ethnic differences.

In conclusion, our results revealed that the IMS, ABCD, DiaRem, Ad-DiaRem, DiaBetter, Ana et al., and Dixon et al. models are excellent at predicting diabetes remission at 1 year after SG. Scoring models are recommended because of their ease of use. Further research is warranted to establish risk prediction models of diabetes remission after SG.