Introduction

Postoperative complications in colorectal cancer surgery are associated with increased morbidity and mortality as well as risk for cancer recurrence [1,2,3].

The ability to identify patients at high risk for complications has potential for patient optimization prior to surgery and increased surveillance postoperatively, and can aid in better informed decision-making for both patient and surgeon in guiding surgical treatment. The multidisciplinary team (MDT) meeting, where patients with newly diagnosed cancer are discussed and their surgical treatment is planned, could benefit from calculated risk assessments.

Factors that are associated with an increased risk for developing severe postoperative complications or anastomotic leakage have been reported in previous studies [4,5,6]. However, assessing the full phenotype of a patient to quantify an individual’s risk is more multifaceted. Machine learning (ML) algorithms are increasingly being utilized in medical research due to their ability to capture complex relationships between a multitude of variables, including in surgical risk prediction [7].

The aim of our study was to train and internally validate models to predict the occurrence of complications, as well as anastomotic leakage specifically, after resection for colorectal cancer. Using only variables that are available prior to surgery from the Danish Colorectal Cancer Group (DCCG) database, we tested whether the models could achieve sufficient discriminative power to be used in a clinical setting.

Materials and methods

Data sources

Prospectively collected patient data for building the prediction models were taken from the national quality assurance ‘Danish Colorectal Cancer Group’ (DCCG) database, which has recorded information on over 95% of all patients who have received a colorectal cancer diagnosis in Denmark since 2001 [8]. It consists of demographic data as well as detailed information on comorbidities, tumor stage and localization, chemotherapy, procedure type and resection, and whether intra- and/or postoperative complications occurred. The DCCG has an overall data completeness of > 96%, last validated in 2020 with an accuracy of 95% [9]. While the registration of postoperative complications has been part of the DCCG registry from inception, more detailed variables were introduced in 2014, using the Clavien–Dindo classification of severity of complications [10]. As the outcome was defined as postoperative complications of Clavien–Dindo grade 3B or higher, only data recorded from 2014 onward were used.

Source vocabulary data from the DCCG were transformed into standard vocabularies, primarily the systematized nomenclature of medicine clinical terms (SNOMED CT), with custom concepts used where granularity was lacking in currently available standard vocabularies. Patient sensitive information was encrypted and data were de-identified. Data were standardized to the observational medical outcomes partnership (OMOP) common data model (CDM) and the standardized vocabularies [11, 12]. Data conversion, transformation and quality control (QC) were achieved using the observational health data sciences and informatics (OHDSI) open source tools guided by the European health data evidence network (EHDEN) and in collaboration with EdenceHealth NV (Veldkant 33A, Kontich, Belgium).

The study was approved by the Region Zealand (REG-047-2020). The study results are presented in accordance with the transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD) statement [13].

Patient population

All patients over 18 years of age undergoing any resection for colorectal cancer between January 1, 2014 and December 31, 2019 from the nationwide DCCG database were included. Seventy-five percent of the patients were randomly selected for model training, while the remaining 25% were used for internal model validation, following a three-fold cross-validation strategy for hyper-parameter optimization. End of follow-up was 30 days after surgery. For the prediction models analyzing anastomotic leakage, only those patients receiving a gastrointestinal anastomosis were included. This included patients that received more than one resection of bowel segment, with both an intestinal anastomosis and an ostomy performed within one surgery. Table 1 shows the patient demographics and their preoperative characteristics used for training the models. In order for models to be utilized for all patients undergoing colorectal surgery, the data set was not limited to specific surgeries. Rather, when inputting patient data into the model, the model can predict outcomes for a variety of open, laparoscopic and endoscopic resections in emergent and elective settings.
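
The 75/25 split with three-fold cross-validation on the training portion can be sketched as follows. This is a minimal illustration in plain Python, not the OHDSI tooling used in the study; the function names, the fixed seed and the integer patient identifiers are assumptions made for the example.

```python
import random


def split_train_test(patient_ids, train_fraction=0.75, seed=42):
    """Randomly assign patients to a training and an internal validation set."""
    ids = list(patient_ids)
    rng = random.Random(seed)  # fixed seed is an assumption, for reproducibility
    rng.shuffle(ids)
    cut = int(len(ids) * train_fraction)
    return ids[:cut], ids[cut:]


def k_folds(train_ids, k=3):
    """Yield (fit_ids, holdout_ids) pairs for k-fold cross-validation."""
    folds = [train_ids[i::k] for i in range(k)]
    for i in range(k):
        holdout = folds[i]
        fit = [pid for j, fold in enumerate(folds) if j != i for pid in fold]
        yield fit, holdout


# Hypothetical cohort of 1000 patients identified by integers.
train, test = split_train_test(range(1000))
folds = list(k_folds(train, k=3))
```

The validation set is set aside before cross-validation, so it plays no part in hyper-parameter optimization.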

Table 1 Patient cohort used for prediction model building and their preoperative characteristics

While having predicted risks available prior to surgery is mostly relevant to elective surgery and the MDT setting, it was decided that models should still be able to calculate the risk for postoperative complications for emergent cases as well. Risks can still be calculated before surgery and the prediction may have an impact on surgical strategy.

Outcomes

The primary outcome of this study was defined as a patient experiencing a postoperative complication of grade 3B or higher as defined by the Clavien–Dindo postoperative complication classification within 30 days after surgery for colorectal cancer [10]. This classification grades the severity of the complication by the necessity, manner, and invasiveness of a resulting intervention. Thus, a patient requiring a postoperative intervention under general anesthesia, admission to the intensive care unit (ICU) or death within 30 days, was considered to have experienced the target outcome.
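
As an illustration of this outcome definition, the labeling rule can be written as a small function. The string coding of the grades and the function name are assumptions for this sketch and do not reflect how the DCCG stores the variable.

```python
# Ordered Clavien-Dindo grades, mildest to most severe (string coding is assumed).
CD_GRADES = ["1", "2", "3A", "3B", "4A", "4B", "5"]


def is_target_outcome(grade, days_after_surgery, threshold="3B", window_days=30):
    """Return True if a complication counts as the primary outcome.

    grade: Clavien-Dindo grade as a string, or None if no complication occurred.
    days_after_surgery: day on which the complication was registered.
    """
    if grade is None:  # no complication recorded
        return False
    severe = CD_GRADES.index(grade.upper()) >= CD_GRADES.index(threshold)
    return severe and 0 <= days_after_surgery <= window_days
```

For example, a grade 3B complication on day 10 counts as the outcome, whereas a grade 3A complication on the same day, or a grade 4A complication on day 31, does not.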

A secondary prediction model was designed for patients receiving an anastomosis during surgery, and their risk of experiencing anastomotic leakage within 30 days after surgery. While those anastomotic leaks that were the cause of a Clavien–Dindo grade 3B or higher complication were included in the previous outcome, we also examined whether anastomotic leakage could be predicted specifically. This could facilitate the decision of whether an anastomosis or a stoma is more appropriate for the individual patient at the time of resection. Anastomotic leakage was defined as either a type A, B or C leakage diagnosed clinically, radiologically, endoscopically or surgically [14].

Prediction models compute the probability between 0 and 1 of a patient developing the predicted outcomes.

Predictors

All variables available prior to surgery as well as decisions made prior to surgery that would be available in an MDT meeting, such as planned primary procedure, surgical approach, priority and intent, were used as candidate predictors in the prediction models. Predictors included demographic variables, tumor staging and localization, biopsy results, imaging results, known comorbidities, planned primary procedure and neo-adjuvant chemotherapy. Some predictors could be available only for certain patients, such as tumor distance from the anal verge measured by flexible or rigid endoscopy or MRI staging of lymph nodes, which would therefore only be available for risk calculations in patients with rectal cancer.

Missing data

Missing data were considered missing at random. Standard practices in ML were followed by using one-hot encoding for categorical variables. Missing data were classified as the absence of these categorical variables.
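
A minimal sketch of this encoding, assuming a hypothetical variable name and category list: each category becomes one binary indicator, and a missing value yields all zeros, that is, the absence of every category.

```python
def one_hot(variable, value, categories):
    """Encode one categorical value as binary indicators.

    A missing value (None) matches no category, so every indicator is 0.
    """
    return {f"{variable}={c}": int(value == c) for c in categories}


ASA_LEVELS = ["1", "2", "3", "4"]  # hypothetical category list for illustration

encoded = one_hot("asa", "3", ASA_LEVELS)   # exactly one indicator set to 1
missing = one_hot("asa", None, ASA_LEVELS)  # all indicators set to 0
```

With this scheme no separate "missing" indicator is created; a patient with a missing value is represented as belonging to none of the categories.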

Statistical analysis

All preoperative variables were used as predictors in initial analysis, after which a manual feature selection was performed and variables were selected that were relevant and available at an MDT meeting. All categorical predictors were converted into binary predictors. Minimum cohort sample size calculations were performed using the ‘pmsampsize’ package v.1.1.2 in R.

For prediction model training, the ATLAS version 2.9.0 interface for OHDSI tools was used to design the study and run characterizations. The prediction models were trained using the ‘Patient Level Prediction’ package, R version 4.1.0 and Python version 3.9.6 [15, 16]. Prediction model algorithms included least absolute shrinkage and selection operator (LASSO) logistic regression, gradient boosting machines, AdaBoost, random forest, k-nearest neighbors, multilayer perceptrons and decision trees, and models were trained using the default hyper-parameter settings. The models with the best performance on the internal validation sets were then selected based on their capability of discrimination using the AUROC and Precision-Recall curve.

Sensitivity and specificity were obtained for their respective thresholds. As there is a tradeoff between sensitivity and specificity, thresholds can be set depending on which is favored in a specific clinical setting. A threshold set for high specificity would thereby be preferred in a setting where knowing whether an individual will not get a complication is prioritized. When high diagnostic accuracy is required for the probability of an outcome such as anastomotic leakage, a high sensitivity might be preferred and a threshold can be set accordingly, in order to identify those who might benefit from a stoma.
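
The tradeoff can be made concrete with a small sketch that computes sensitivity and specificity at a chosen risk threshold. The outcome labels and predicted risks below are invented toy values, not study data, and the function name is an assumption.

```python
def sensitivity_specificity(y_true, y_prob, threshold):
    """Sensitivity and specificity when risk >= threshold is called 'high risk'.

    Assumes both outcome classes are present in y_true.
    """
    tp = fn = tn = fp = 0
    for y, p in zip(y_true, y_prob):
        flagged = p >= threshold
        if y == 1 and flagged:
            tp += 1
        elif y == 1:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Toy data: 1 = outcome occurred, 0 = no outcome.
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
y_prob = [0.02, 0.05, 0.08, 0.12, 0.15, 0.30, 0.35, 0.60]

low = sensitivity_specificity(y_true, y_prob, 0.10)   # favors sensitivity
high = sensitivity_specificity(y_true, y_prob, 0.40)  # favors specificity
```

In this toy example, lowering the threshold flags every patient who developed the outcome at the cost of more false positives, while raising it removes the false positives but misses some cases.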

Results

A total of 23,907 patients underwent surgery for colorectal cancer in Denmark between 2014 and 2019. Of these, 2,958 patients (12.4%) experienced a complication of Clavien–Dindo grade 3B or higher within 30 days after surgery. Of 17,190 patients that received a gastrointestinal anastomosis in the same timeframe, a total of 929 patients (5.4%) experienced anastomotic leakage. Table 1 shows the patient data used for prediction modeling. Minimum sample size calculation results can be found in the supplementary Tables 1 and 2.
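
The reported percentages follow directly from the counts above; a short check, using a hypothetical helper name:

```python
def incidence_percent(events, total):
    """Incidence as a percentage, rounded to one decimal place."""
    return round(100 * events / total, 1)


complication_rate = incidence_percent(2958, 23907)  # Clavien-Dindo 3B or higher
leak_rate = incidence_percent(929, 17190)           # anastomotic leakage
```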

Prediction models

Postoperative complications Clavien–Dindo grade 3B or higher

The prediction model with the best discrimination and calibration for predicting postoperative complications of Clavien–Dindo grade 3B or higher used 111 preoperative predictors in a LASSO logistic regression model with an AUROC of 0.704 (95%CI 0.683–0.724, Fig. 1). Table 3 in the supplement shows performance metrics for all other ML models trained. After reverting binary variables to categorical variables, the variable selection process singled out a total of 30 variables for predicting the risk of postoperative complications of Clavien–Dindo grade 3B or higher (Table 2 shows the demographic predictors used in the model; the full list can be found in supplementary Table 4). The area under the precision-recall curve (AUPRC) was 0.285 (95%CI 0.252–0.317). Calibration on the validation set was good, with calibration-in-the-large 1.01 (ratio of mean predicted risk to mean observed risk), calibration-intercept 0.10 and calibration-slope 1.06 (Fig. 2). The Brier score was 0.10.
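
Two of the calibration measures reported here are simple to compute by hand. The sketch below assumes the ratio definition of calibration-in-the-large given above and the standard definition of the Brier score (mean squared difference between predicted risk and observed outcome). The data are toy values, not study data, and the calibration intercept and slope, which require fitting a logistic recalibration model, are not covered.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def calibration_in_the_large(y_true, y_prob):
    """Ratio of mean predicted risk to mean observed risk (1.0 is ideal)."""
    mean_predicted = sum(y_prob) / len(y_prob)
    mean_observed = sum(y_true) / len(y_true)
    return mean_predicted / mean_observed


# Toy data: 1 = outcome occurred, 0 = no outcome.
y_true = [0, 0, 0, 1, 0, 1, 0, 0, 1, 0]
y_prob = [0.1, 0.2, 0.1, 0.6, 0.3, 0.7, 0.2, 0.1, 0.5, 0.2]

brier = brier_score(y_true, y_prob)
citl = calibration_in_the_large(y_true, y_prob)
```

A ratio above 1 indicates that risk is overestimated on average, and a ratio below 1 that it is underestimated.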

Fig. 1

Receiver operating characteristic plot of LASSO logistic regression model for predicting postoperative complications CD 3B or higher for patients operated on between 2014 and 2019. CD-3B: Clavien–Dindo grade 3B

Table 2 Demographic preoperative variables list used in the model for predicting Clavien–Dindo complications grade 3B or higher, with regression coefficients for binary predictors (Full list of predictors can be found in the supplementary table 1)
Fig. 2

Calibration plot of LASSO logistic regression model using Loess algorithm for internal validation set for predicting postoperative complications CD 3B or higher for patients operated on between 2014 and 2019. Dots represent subgroups of patients and their predicted vs. observed probability of the outcome. Loess: locally estimated scatterplot smoothing; CD-3B: Clavien–Dindo grade 3B

Based on regression coefficients, American Society of Anesthesiologists (ASA) scores of 3 or higher, performance status of 2 or higher, abdominal colectomies, open surgery, emergency surgery and tumor perforation had high impact on the model and were associated with a higher risk for postoperative complications. Predictors associated with a lower risk of postoperative complications were endoscopic resections, female patients, right-sided and sigmoid colectomies, ASA score of 1, curative resections, resections without creation of an ostomy, and patients with slightly elevated body mass index (BMI).

Anastomotic leakage

The prediction model with the best discrimination and calibration for predicting anastomotic leakage used 58 preoperative predictors in a LASSO logistic regression model with an AUROC of 0.690 (95%CI 0.655–0.724, Fig. 3). Supplementary Table 5 shows performance metrics for all other ML models trained. When reverting binary variables to categorical variables, the model used a total of 27 variables for predicting the risk of anastomotic leakage (Table 3). The AUPRC was 0.119 (95%CI 0.079–0.162). Calibration on the testing set was also good, with a calibration-in-the-large of 1.00, a calibration-intercept of -0.15 and a calibration-slope of 0.94 (Fig. 4). The Brier score was 0.05.

Fig. 3

Receiver operating characteristic plot of LASSO Logistic Regression model for predicting anastomotic leakage for patients operated on and receiving a gastrointestinal anastomosis between 2014 and 2019

Table 3 Full preoperative variables list used in the model for predicting anastomotic leakage, with regression coefficients for binary predictors
Fig. 4

Calibration plot of LASSO Logistic Regression model using Loess algorithm for internal validation set for predicting anastomotic leakage for patients operated on and receiving a gastrointestinal anastomosis between 2014 and 2019. Dots represent subgroups of patients and their predicted vs. observed probability of the outcome. Loess: locally estimated scatterplot smoothing

Patients with rectal resections were predicted to be at higher risk for anastomotic leakage, as were patients with a higher BMI, an ASA score of 3 or an ileorectal anastomosis, and smokers. Variables used as predictors for not developing anastomotic leakage were, among others, right-sided hemicolectomies, female patients, non-smokers, patients receiving a permanent ostomy and an ASA score of 1.

The full covariate list with the intercept is shown in the supplementary Table 6.
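
Given an intercept and regression coefficients for binary predictors, an individual risk follows from the standard logistic regression formula. The sketch below uses invented coefficient values and predictor names purely for illustration; they are NOT the published coefficients, which are listed in the supplement.

```python
import math


def predicted_risk(intercept, coefficients, patient):
    """Logistic regression risk: 1 / (1 + exp(-(b0 + sum(b_j * x_j)))).

    Predictors absent from `patient` are treated as 0.
    """
    linear = intercept + sum(
        beta * patient.get(name, 0) for name, beta in coefficients.items()
    )
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical values for illustration only; NOT the published model.
intercept = -3.0
coefficients = {"rectal_resection": 0.6, "smoker": 0.3, "female": -0.4}
patient = {"rectal_resection": 1, "smoker": 1}

risk = predicted_risk(intercept, coefficients, patient)
```

Positive coefficients raise the predicted risk and negative coefficients lower it, which is how the direction of the associations described above can be read from the coefficient table.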

Thresholds

Positive and negative predictive values as well as sensitivity and specificity were calculated according to the various risk thresholds (see supplementary Tables 7 and 8).
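
Predictive values depend on the prevalence of the outcome as well as on sensitivity and specificity. The sketch below applies Bayes' rule with the anastomotic leakage prevalence reported in the Results (929 of 17,190); the sensitivity and specificity values are hypothetical and do not correspond to any threshold in the supplementary tables.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Positive and negative predictive value from Bayes' rule."""
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    fp = (1 - specificity) * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


# Prevalence from the Results; sensitivity and specificity are hypothetical.
ppv, npv = predictive_values(0.70, 0.60, 929 / 17190)
```

For a rare outcome, the negative predictive value tends to be high and the positive predictive value low even at moderate sensitivity and specificity, which should be kept in mind when interpreting thresholds.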

Discussion

We trained two prediction models using a national quality assurance database to predict the risk of postoperative complications and anastomotic leakage after colorectal cancer surgery. Only predictors that were available during the MDT meeting were included in order to enable informed decision-making prior to surgery.

Model performance using only preoperative variables available from the registry was not yet sufficient for use in a clinical setting, with AUROCs of 0.704 (95%CI 0.683–0.724) and 0.689 (95%CI 0.654–0.724), respectively. Interestingly, the automatically selected predictors for the postoperative complications risk models agreed with the current literature. Variables with the highest absolute weights associated with a higher risk and used by the prediction model were ASA score 3 or higher, performance status ≥ 2, emergency surgery, undergoing total colectomy and open surgery. Similarly, variables in the model identified as being associated with a lower risk for complications were endoscopic resections, ASA 1 score, female patients, and patients with a slightly elevated BMI.

The model for prediction of anastomotic leakage equally identified and utilized predictors that are known to be associated with anastomotic leakage [17]. Predictors associated with the occurrence of anastomotic leakage were patients that had rectal cancer, were smokers and with a BMI > 30 kg/m2. Female patients, non-smokers, ASA 1 score and right-sided resections were predictors associated with not getting an anastomotic leak.

Besides selecting predictors that are known to be associated with postoperative complications or anastomotic leakage, our models selected additional parameters using a total of 30 and 27 categorical predictors respectively. This illustrates the complexity of individual risk assessment and how a data-driven approach can add value to predictions.

The American College of Surgeons National Surgical Quality Improvement Program (ACS NSQIP) has similarly been used to construct a universal risk calculator that can predict complications after a wide variety of surgeries. A specific machine learning model for hepatopancreatic and colorectal surgeries was also created [7, 18]. While predictive performance for the latter was very high (AUROC = 0.816) in the universal risk calculator for postoperative complications, discriminatory power varied greatly during subsequent external validation by other groups [19,20,21,22]. Performing an external validation of existing models such as the ACS NSQIP surgical risk calculator using our data might allow for better comparison of predictive performance between models, but is not currently possible because some data required by the risk calculator, such as steroid use, are not available in our dataset.

Models to predict anastomotic leakage have previously been trained using literature reviews or expert opinion of prognostic factors as well as existing datasets [17, 23, 24]. Models that have been externally validated have used both preoperative and intraoperative variables to predict the risk for anastomotic leakage [17, 25]. Among the variables used to predict anastomotic leakage were obesity, sex, ASA score, preoperative serum total proteins, ongoing anticoagulant treatment, the occurrence of intraoperative complications, blood loss or transfusion as well as duration of operation. External validation performed relatively well, with AUROCs reported between 0.623 and 0.96 [26,27,28,29]. Using intraoperative variables for prediction of anastomotic leakage means a model is unavailable prior to surgery, but it still allows the surgeon to make intraoperative treatment decisions such as creating a stoma instead of an anastomosis.

It is important to acknowledge that prediction models must always be taken into context, and predictions can, but need not, force a change in treatment. The identification of a patient at high risk in itself can influence an outcome. Increased surveillance in order to identify complications early in the postoperative phase has been shown to lower mortality in high-risk patients [30]. Another option can be a form of pre-habilitation for optimizing patients’ health status prior to surgery [31, 32]. Alterations in treatment, such as creating an ostomy instead of a primary anastomosis, should always be a decision made by the surgeon and the patient. Considering the high negative predictive value of our model for predicting the risk of anastomotic leakage, having a low risk for anastomotic leakage could encourage the creation of an anastomosis (Supplementary Table 6).

In prediction modeling, the ROC curve is essentially a visualization of predictive discrimination across all thresholds. Defining a threshold is not a necessity for predictive modeling, as the calculation of the individual risk can be sufficient for informed decision-making. Furthermore, multiple thresholds can be set based on different clinical scenarios and patients. In the clinical setting, it is the treating colorectal surgeon together with the patient who effectively sets an individual threshold for a patient. However, if a binary ‘high-risk’ or ‘low-risk’ classification is desired, thresholds can be set based on a harm-benefit analysis of morbidity and mortality, health care costs as well as patient and surgeon preferences.
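
Because the AUROC summarizes discrimination without fixing a threshold, it can be computed directly as the probability that a randomly chosen patient with the outcome receives a higher predicted risk than a randomly chosen patient without it. A minimal sketch of this rank-based calculation on invented toy data (not study data; the function name is an assumption):

```python
def auroc(y_true, y_prob):
    """AUROC as a pairwise ranking probability; ties count as one half.

    Assumes both outcome classes are present in y_true.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum((pp > pn) + 0.5 * (pp == pn) for pp in pos for pn in neg)
    return wins / (len(pos) * len(neg))


# Toy data: 1 = outcome occurred, 0 = no outcome.
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
y_prob = [0.02, 0.05, 0.08, 0.12, 0.15, 0.30, 0.35, 0.60]

area = auroc(y_true, y_prob)
```

A value of 0.5 corresponds to ranking no better than chance and 1.0 to perfect separation of patients with and without the outcome.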

Strengths of this study are that data-driven predictions are based on a large, validated, well-curated, nationwide quality assurance registry with high completeness. Furthermore, models only used preoperative predictors, making them available for aiding in clinical decision-making prior to surgery at the MDT meeting. Limitations include the restriction of complications to 30 days after surgery, potentially missing out on some surgery-related complications (e.g., late anastomotic leakage), and the complexity of the prediction models based on 30 and 27 categorical preoperative predictors respectively. Use of a model in a clinical setting could require simplification with additional sacrifice of performance of the model, or an application that can access the required variables directly from electronic health records to eliminate manual entering of a large amount of data to increase usability. Due to lack of a gold standard, we deem a clinical validation of prediction models necessary before considering them in a clinical setting. Furthermore, accuracy could also be reduced because we aimed to predict complications for a wide variety of colorectal cancer surgeries in two ‘universal’ models, versus if we had constructed separate models for colon or rectum cancer surgery, or even separate primary procedures. Training procedure-specific models could be another way forward to increase predictive performance.

Utilizing additional data sources containing more in-depth phenotypic data might improve prediction models so that they can aid in informed decision-making for personalized medicine in cancer surgery. Previous prediction models have used albumin and hemoglobin levels, and it has been shown that a combination of physiological measurements and medical history makes better predictions [23, 33]. Adding preoperative laboratory data and past medical history to our prediction model might improve predictive performance. External validation will indicate the extent to which the models can be generalized to populations outside of Denmark, and after clinical validation studies we believe that prediction modeling will be an integral part of the MDT in the future.

Conclusions

In this study, we demonstrated that it was possible to train and validate prediction models that could predict the occurrence of postoperative complications and anastomotic leakage. Discriminative power was deemed insufficient for current use in a clinical setting, but increasing data coverage to include a wider spectrum of the patient’s phenotype and genotype, or constructing procedure-specific models, could improve the models so that they become valuable for clinical practice.

Machine learning methods using observational health data utilized known factors associated with a higher risk for complications and anastomotic leakage and identified additional predictors, confirming the usefulness of a data-driven approach in prediction modeling.