The capacity of bariatric surgery procedures to induce type 2 diabetes remission (DR) is still unmatched by any other intervention. Long-term (> 1 year postoperative) DR rates have been reported less frequently, but may reach 50–63% at 5 years postoperatively [1,2,3]. Hence, while DR is achievable in a considerable number of patients following bariatric procedures, others still have diabetes after surgery (non-diabetes remission, NDR). Although bariatric surgeries entail considerable risks and costs, the number of operations is increasing due to the obesity epidemic and newly defined indications [4]. Thus, there is a growing need for tools that better predict long-term outcomes, particularly the resolution of major obesity-related co-morbidities, such as type 2 diabetes. These tools could improve clinical decisions and help set realistic outcome expectations, which are important for patients, their carers, and healthcare systems.

Predictive tools for DR after bariatric surgeries, predominantly following Roux-en-Y gastric bypass (RYGB), have been proposed [5,6,7]. The ABCD and DiaRem scores are both based on patients’ preoperative characteristics [7,8,9,10,11,12]. These tools highlight parameters such as older age, longer duration of diabetes, the use of anti-diabetic medications (particularly insulin), and poorer metabolic control as predictors of a lower chance of DR. It is commonly thought that diabetes severity underlies the link between these predictors and lower DR rates. Accordingly, a compromised pancreatic beta-cell reserve limits the postoperative endogenous insulin secretion required to maintain euglycemia without additional medications, which is the ADA definition of DR. Indeed, preoperative C-peptide levels, which are included in the ABCD score [11], may improve prediction of DR after bariatric surgery [13].

The DiaRem score [7] was shown to exhibit an acceptable predictive power for DR in various populations 1 year post-RYGB, with a predictive performance superior to other scores [9]. The DiaRem has also been shown to predict DR following other types of bariatric surgeries [14]. The score is easily implemented in clinical practice, since it relies on basic clinical parameters (age, BMI, HbA1c, and the use of insulin therapy and classical oral hypoglycemic agents), rather than on less-conventional or non-standard biomarkers, like C-peptide, which is included in the ABCD score [6]. Nonetheless, the following issues limit the universal use of the DiaRem score: (i) DiaRem was demonstrated to be predictive of DR 1 year after RYGB, but its predictive capacity for longer-term DR is controversial [15, 16]. (ii) Limited information is available regarding DiaRem’s ability to predict DR after other bariatric procedures, such as sleeve gastrectomy (SG) and adjustable gastric banding (GB) [14]. (iii) Although DiaRem performs well at the extreme score values (low values reliably predict DR and high values predict NDR), its performance in the middle score range was reported as sub-optimal [17]. (iv) DiaRem accounts for the use of oral hypoglycemic agents, but not for newer classes of drugs, including GLP-1 analogs, DPP-4 inhibitors, and SGLT2 inhibitors, which have become highly prevalent in current type 2 diabetes pharmacotherapy.

Recently, we proposed a new scoring system based on the DiaRem, the Advanced (Ad)-DiaRem. The Ad-DiaRem outperformed DiaRem in predicting DR 1 year post-RYGB in two independent populations [8]. Ad-DiaRem includes the items present in DiaRem with a re-defined “penalty score” for each item, to which we added the number of anti-diabetic drugs (including new drug classes) and diabetes duration (Supplemental Table 1). In the present analyses, we aimed to evaluate the capacity of DiaRem and Ad-DiaRem to predict longer-term DR (i.e., at 2 and 5 years) following RYGB, SG, and GB. We hypothesized that Ad-DiaRem may perform better than DiaRem in predicting DR and NDR, as we recently reported for DR prediction 1 year post-RYGB [8]. To address this hypothesis, we assessed the capacity of the two related scores to predict DR 2 and 5 years after RYGB, SG, and GB in a large HMO registry database. Our findings provide real-world data on the potential clinical usefulness of the examined tools.

Research Design and Methods

Study Population

The population included in the current analyses was previously described in detail [1]. Ethical approval for the study was obtained from the Rabin Medical Center Ethics Committee, Petach Tikvah, Israel. From the electronic medical records of Clalit Health Services (CHS), the largest healthcare organization in Israel, we identified 13,425 persons who underwent bariatric procedures during 1999–2011; of them, 2190 (16%) had a preoperative diagnosis of type 2 diabetes based on the ADA criteria (Fig. 1). CHS criteria for bariatric surgery in Israel are consistent with those issued by the NIH [18] and include BMI > 40 kg/m2, or BMI > 35 kg/m2 with at least one obesity-related risk factor. The full preoperative data required to calculate both the DiaRem and Ad-DiaRem scores (detailed below), together with postoperative glycemic status information, were available for 1502 (68.6%) persons at 2 years and 1459 (66.6%) persons at 5 years. Complete and partial DR were defined according to the ADA criteria [19]. DR was achieved at 2 and 5 years in 62.4% and 53.7% of patients after RYGB, 61.8% and 53.5% after SG, and 56.4% and 53.8% after GB, respectively (Fig. 1).

Fig. 1 Flow chart of study population

The DiaRem Scoring Systems

DiaRem scores (Supplemental Table 1) were calculated as reported [7], based on age, preoperative HbA1c, and the use of metformin, sulfonylurea, glitazones, and/or insulin; the range of scores was 0–21. A low DiaRem score should predict a high chance of DR, and a high DiaRem score should predict NDR. To depict the scores’ respective sensitivities visually (qualitatively), we calculated the lowest score range that cumulatively included at least 80% of the DR patients and the highest score range that included ≥ 80% of the NDR patients.
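The score-range calculation described above can be illustrated with a minimal Python sketch. It is not the code used in the study: the function name `lowest_range_for_sensitivity` and the toy data are hypothetical and serve only to show how a cutoff with sensitivity ≥ 0.8 for DR, and the positive predictive value within that range, could be derived from a list of scores and outcomes.

```python
def lowest_range_for_sensitivity(scores, remitted, target=0.80):
    """Find the smallest cutoff such that patients scoring <= cutoff
    include at least `target` of all DR patients.

    scores   -- preoperative score per patient (lower = DR more likely)
    remitted -- 1 if the patient achieved DR, 0 if NDR
    Returns (cutoff, sensitivity, ppv) for the range 0..cutoff.
    """
    total_dr = sum(remitted)
    if total_dr == 0:
        raise ValueError("no DR patients in the sample")
    for cutoff in sorted(set(scores)):
        # Outcomes of all patients whose score falls in the low range.
        in_range = [r for s, r in zip(scores, remitted) if s <= cutoff]
        dr_in_range = sum(in_range)
        sensitivity = dr_in_range / total_dr
        if sensitivity >= target:
            # PPV: share of patients in the low range who achieved DR.
            ppv = dr_in_range / len(in_range)
            return cutoff, sensitivity, ppv
    raise RuntimeError("unreachable: the full range always has sensitivity 1.0")


# Made-up toy data (10 patients), for illustration only.
toy_scores = [0, 1, 2, 2, 3, 5, 8, 12, 15, 19]
toy_remitted = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
print(lowest_range_for_sensitivity(toy_scores, toy_remitted))  # (3, 0.8, 0.8)
```

The high-score range covering ≥ 80% of NDR patients could be obtained the same way by scanning cutoffs from the top of the score range downward.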

The Ad-DiaRem score was developed based on the French BARICAN cohort [20], using machine learning, as detailed elsewhere [8]. In brief, the Ad-DiaRem is based on six DR predictors selected from 43 screened clinical, laboratory, and adipose tissue variables. Ad-DiaRem includes the DiaRem criteria, plus the number of anti-diabetic drugs (all currently clinically available drug classes used for type 2 diabetes treatment) and diabetes duration (originally, patient-reported; here, based on the electronic medical file database). Machine learning was used to determine optimal categories and to identify the respective weights (penalty scores) of each variable that result in the best prediction of postoperative diabetes status.

Statistical Analysis

The predictive performances of the DiaRem and Ad-DiaRem were evaluated by areas under the receiver operating characteristic (AUROC) curves using the DeLong method. Analyses were conducted using SPSS for Windows, version 20.0 (Chicago, IL), and ROC analyses were compared using http://vassarstats.net/roc_comp.html.
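For readers less familiar with the AUROC, the following minimal Python sketch shows what the statistic measures in this setting. It is an illustration only and does not reproduce the SPSS or VassarStats computations, confidence intervals, or the comparison between curves; the function name and the toy scores are hypothetical. Because a low score should predict DR, the AUROC here equals the probability that a randomly chosen DR patient has a lower score than a randomly chosen NDR patient, with ties counted as one half.

```python
def auroc_low_score_predicts_dr(dr_scores, ndr_scores):
    """Empirical AUROC via pairwise comparison (Mann-Whitney form).

    dr_scores  -- scores of patients who achieved DR
    ndr_scores -- scores of patients who did not (NDR)
    A DR/NDR pair is concordant when the DR patient scored lower.
    """
    if not dr_scores or not ndr_scores:
        raise ValueError("both outcome groups must be non-empty")
    concordant = 0.0
    for d in dr_scores:
        for n in ndr_scores:
            if d < n:
                concordant += 1.0   # correctly ordered pair
            elif d == n:
                concordant += 0.5   # tie counts as half
    return concordant / (len(dr_scores) * len(ndr_scores))


# Made-up toy scores, for illustration only.
toy_dr = [0, 1, 2, 3, 5]
toy_ndr = [2, 8, 12, 15, 19]
print(round(auroc_low_score_predicts_dr(toy_dr, toy_ndr), 2))  # 0.9
```

A value of 0.5 corresponds to no discrimination between DR and NDR patients, and a value of 1.0 to complete separation of the two score distributions.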

Results

Preoperative baseline characteristics of the population included in the current analyses are presented in Table 1, stratified by either 5-year DR/NDR status or by surgical procedure (the parallel data for 2-year postoperative DR/NDR status are presented in Supplemental Table 2). As expected, patients who exhibited long-term DR were more likely than patients who exhibited postoperative NDR to be younger, to have lower preoperative HbA1c, and to have been diagnosed with diabetes for a shorter period of time, and they were less likely to be treated with insulin or with a high number of diabetes medications. Importantly, baseline characteristics differed between patients who underwent the three procedures; this reflects common clinical practice in Israel. For example, the more radical procedure, RYGB, is more often performed in patients with more intensively treated diabetes and with longer preoperative diabetes duration. Thus, this real-world dataset is better suited to within-procedure comparisons (between DiaRem and Ad-DiaRem, and between 2- and 5-year postoperative DR prediction) than to comparisons of the scores’ performance between the three procedures.

Table 1 Baseline characteristics of study population

The capacity of DiaRem versus Ad-DiaRem to predict DR 5 years post-RYGB is shown in Fig. 2. Patients who exhibited DR tended to cluster, in both scores, at the lower range (Fig. 2a, green bars). Yet, patients with 5-year postoperative NDR exhibited a bi-modal score distribution by DiaRem, but not by Ad-DiaRem (Fig. 2a, red bars). The lowest score range that includes at least 80% of the DR patients (i.e., sensitivity ≥ 0.8, Fig. 2a, green shaded area) had a positive predictive value (PPV) for DR of 73.2% for DiaRem and 78.2% for Ad-DiaRem [the corresponding values for predicting the 2-year postoperative outcome were 81.3% and 86.4%, respectively (2-year DR status by score distribution is presented in Supplemental Fig. 1i and 1ii)]. This apparent superiority of Ad-DiaRem over DiaRem in predicting DR is also manifested by a smaller overlap (1 versus 2 score categories for Ad-DiaRem and DiaRem, respectively) between the red and green shaded areas, which represent 80% or more of the NDR and DR patients, respectively (Fig. 2a). More formally, by ROC analysis, Ad-DiaRem had a larger area under the ROC curve [AUC = 0.85 (0.76–0.93)] than did DiaRem [AUC = 0.78 (0.69–0.88), Fig. 2b], though the difference did not reach statistical significance. Importantly, while the AUC of DiaRem for predicting postoperative diabetes status decreased from 0.81 at 2 years (Supplemental Fig. 1iii) to 0.78 at 5 years, Ad-DiaRem maintained an equivalent predictive capacity for DR post-RYGB at 2 and 5 years postoperatively (both AUC = 0.85). Further, the chance of DR 5 years postoperatively spanned from 100 to 0% for the lowest to the highest Ad-DiaRem score ranges, whereas the corresponding values with DiaRem were 88.9 to 16.7% (Fig. 2c). This demonstrates a mild advantage of post-RYGB DR prediction by Ad-DiaRem over DiaRem. By a complementary representation, increasing score ranges were associated with a diminishing chance of DR in a more consistent and continuous manner when using Ad-DiaRem than when using DiaRem (Supplemental Fig. 4A).

Fig. 2 The capacity of DiaRem versus Ad-DiaRem to predict diabetes remission (DR) 5 years post-Roux-en-Y gastric bypass (RYGB). (a) Patients with diabetes who underwent RYGB were followed and classified as either diabetes free (i.e., DR, green bars) or remaining with diabetes (i.e., non-DR, red bars) 5 years postoperatively (i: by DiaRem scoring; ii: by Ad-DiaRem scoring). The green shaded area denotes the lowest score range that includes at least 80% of the DR patients (i.e., sensitivity ≥ 0.8). The red shaded area denotes the highest score range that includes at least 80% of the non-DR patients. (b) ROC analysis for DiaRem score (blue line) 5 years post-RYGB (area under the ROC curve: 0.78) and Ad-DiaRem score (red line) 5 years post-RYGB (area under the ROC curve: 0.85). (c) Percent DR 5 years postoperatively by predicting score. DR 5 years postoperatively spanned from 88.9 to 16.7% for the lowest to highest DiaRem score ranges and from 100 to 0% for the lowest to highest Ad-DiaRem score ranges

Prediction of DR 5 years post-SG using the two scores is shown in Fig. 3 (2-year postoperative outcome status is presented in Supplemental Fig. 2). As for RYGB, the score distribution of the NDR patients exhibited a bi-modal pattern by DiaRem, but not by Ad-DiaRem. The PPV of the score range that included at least 80% of the 5-year postoperative DR population was 71.0% for DiaRem, a decrease from 82.1% for 2-year postoperative DR prediction (Supplemental Fig. 2), while the corresponding values for Ad-DiaRem were 76.2% and 85.1% for 5-year and 2-year outcome predictions, respectively. Nevertheless, ROC analysis and the range of %DR across score ranges did not reveal a clearly improved predictive capacity of Ad-DiaRem over DiaRem for SG (Fig. 3b, c), although the cumulative %DR declined more with increasing score when Ad-DiaRem was applied compared to DiaRem (Supplemental Fig. 4B).

Fig. 3 The capacity of DiaRem versus Ad-DiaRem to predict diabetes remission (DR) 5 years post-sleeve gastrectomy (SG). (a) Patients with diabetes who underwent SG were followed and classified as either diabetes free (i.e., DR, green bars) or remaining with diabetes (i.e., non-DR, red bars) 5 years postoperatively (i: by DiaRem scoring; ii: by Ad-DiaRem scoring). The green shaded area denotes the lowest score range that includes at least 80% of the DR patients (i.e., sensitivity ≥ 0.8). The red shaded area denotes the highest score range that includes at least 80% of the non-DR patients. (b) ROC analysis for DiaRem score (blue line) 5 years post-SG (area under the ROC curve: 0.82) and Ad-DiaRem score (red line) 5 years post-SG (area under the ROC curve: 0.82). (c) Percent DR 5 years postoperatively by predicting score. DR 5 years postoperatively spanned from 84.2 to 3.4% for the lowest to highest DiaRem score ranges, and from 82.8 to 0% for the lowest to highest Ad-DiaRem score ranges

Prediction of DR post-GB, using either DiaRem or Ad-DiaRem, is presented in Fig. 4 (DR 2-year postoperative prediction is shown in Supplemental Fig. 3). With DiaRem, the scores of NDR patients greatly overlapped with those of patients achieving DR 5 years postoperatively (Fig. 4a-i), and this remained the case when using Ad-DiaRem, though the latter did not show a bi-modal NDR score distribution (Fig. 4a-ii). The PPVs of the score ranges that included at least 80% of the post-GB DR patients were only 64.3% for DiaRem and 66.3% for Ad-DiaRem, which represent decreases from 70.4% and 72.2%, respectively, for the 2-year postoperative DR prediction by the two scores. ROC analysis (Fig. 4b), %DR per score range (Fig. 4c), and the cumulative %DR with increasing score (Supplemental Fig. 4C) all revealed similarly low predictive power of the two scores.

Fig. 4 The capacity of DiaRem versus Ad-DiaRem to predict diabetes remission (DR) 5 years post-gastric banding (GB). (a) Patients with diabetes who underwent GB were followed and classified as either diabetes free (i.e., DR, green bars) or remaining with diabetes (i.e., non-DR, red bars) 5 years postoperatively (i: by DiaRem scoring; ii: by Ad-DiaRem scoring). The green shaded area denotes the lowest score range that includes at least 80% of the DR patients (i.e., sensitivity ≥ 0.8). The red shaded area denotes the highest score range that includes at least 80% of the non-DR patients. (b) ROC analysis for DiaRem score (blue line) 5 years post-GB (area under the ROC curve: 0.73) and Ad-DiaRem score (red line) 5 years post-GB (area under the ROC curve: 0.73). (c) Percent DR 5 years postoperatively by predicting score. DR 5 years postoperatively spanned from 76.1 to 9.1% for the lowest to highest DiaRem score ranges, and from 79.4 to 0% for the lowest to highest Ad-DiaRem score ranges

Finally, to better understand the bi-modal distribution of NDR patients seen with DiaRem, we performed a secondary analysis of the patients with DiaRem < 11 (Table 2). Those who, despite this favorable low DiaRem score range, exhibited NDR 2 or 5 years postoperatively tended to be younger and to have a lower preoperative BMI. Nevertheless, they had higher fasting glucose and HbA1c levels despite more intensively treated diabetes.

Table 2 DR and NDR among those with favorable DiaRem score (< 11 points)

Conclusions

Of the several scoring systems that have been described for predicting DR post-bariatric surgery, DiaRem is one of the most easily implemented and the most reported [9, 12]. DiaRem provides acceptable predictability, as has been demonstrated in several cohorts [7, 9, 12, 14,15,16]. However, data on the usefulness of DiaRem to predict DR more than 1 year postoperatively appear limited [12, 16], and only sparse data have been published regarding its use following common procedures other than RYGB [14]. Here, we evaluated the potential to extend the application of DiaRem using a large HMO cohort. Our results demonstrated that a low-range DiaRem score was similarly useful for predicting DR at 2 and 5 years post-RYGB and SG as at 1 year post-RYGB. However, trends were observed toward lower performance with time from surgery; this was largely consistent with another recently published report [16]. Notably, DR predictability by DiaRem was significantly lower for GB than for the other two surgeries. Although a lower degree of weight loss achieved by GB compared to the other two procedures could be suspected as an explanation, the difference in weight loss was statistically significant in this cohort only at 1 and 2 years, and not at 5 years postoperatively [1]. Furthermore, the Ad-DiaRem score, which was developed using machine-learning tools, improved the positive predictive value in the lower score range for all three procedures at both time points, but tended to improve AUROC and diminish overlap between the DR and NDR score ranges only for RYGB, the procedure for which DiaRem was originally designed. Ad-DiaRem also remained inferior to the DiaRem prediction post-GB.

Analysis of the distribution of DiaRem scores reveals possible reasons for the limited predictive capacity of this scoring system: NDR patients exhibited a bi-modal distribution of scores in our study population, with a substantial proportion of NDR patients demonstrating low scores, within the range that should have predicted DR. Intriguingly, according to DiaRem, relatively few patients scored in the range of 11–15. The bi-modal NDR distribution was largely eliminated with the Ad-DiaRem, possibly reflecting the inclusion in this score of new anti-diabetic drugs that were not included in the DiaRem. This may have contributed to the mildly improved predictive power of Ad-DiaRem compared to DiaRem. Yet, despite eliminating the bi-modal NDR score distribution, Ad-DiaRem did not increase the AUROC compared to that achieved by DiaRem, for either SG or GB. Unlike for RYGB, for SG and GB, DR prediction at 5 years tended to decrease compared to 2 years postoperatively. This may imply differential mechanisms by which each procedure induces DR.

Limitations of this study include our reliance on a single nationwide HMO cohort and the relatively low number of patients post-RYGB, which reflects the procedures commonly performed in Israel. The data required to perform the analyses were available for 68.6% and 66.6% of the patients with diabetes who underwent bariatric surgery, at 2 and 5 years postoperatively, respectively. This is a limitation, as those not included in the analysis had lower preoperative scores than did those included (mean Ad-DiaRem scores of 5.31 and 5.59 compared to 9.35 and 9.29 for the 2- and 5-year postoperative analyses, respectively). Moreover, patient baseline characteristics differed for the investigated procedures, and DR rates following GB were not inferior at 5 years postoperatively compared to those of the other procedures. Nonetheless, the electronic medical records platform enabled tracking a large number of patients with preoperative diabetes up to 5 years postoperatively, and defining their glycemic status using the ADA criteria, which rely on fasting glucose and HbA1c. Diabetes duration, one of the two criteria added to the Ad-DiaRem, was based on data from patients’ electronic files (originally, patient-reported). The relatively large number of patients in the cohort is an added strength of this study, as is the high diversity in patients’ cultural backgrounds, representative of Israel’s population. Collectively, this report includes “real-world data,” an essential complement to results of randomized controlled trials. Within this context, the comparable 5-year DR rates among GB, RYGB, and SG may be attributed to a number of factors. First, patients who underwent GB tended to have shorter diabetes duration and to be less likely to use insulin therapy (Table 1). This is consistent with prior 2-year post-GB results, in which patients with newly diagnosed diabetes were enrolled [21]. In addition, in Israel, GB is performed in a limited number of specialized centers, with strict long-term postoperative management by dedicated surgeons.

In summary, Ad-DiaRem is an easily implemented score that provides a modest improvement over the original DiaRem in predicting DR 2 and 5 years postoperatively. The Ad-DiaRem provides reasonable predictive capacity, which is most suitable for RYGB and similarly for SG, but may be lower for GB. If confirmed in other independent cohorts, Ad-DiaRem may contribute to better precision care of obese persons with diabetes who are candidates for bariatric surgery. Future studies should consider refining Ad-DiaRem to suit the specific procedure and the postoperative follow-up period, potentially by including early (1-year) postoperative outcomes to increase the predictive capacity for longer-term DR.