Introduction

Surgery for colorectal cancer (CRC) puts patients at a considerable risk of adverse events, such as surgical site infection, anastomosis insufficiency, or bladder voiding problems (in case of rectal cancer), as well as multiple other complications [1, 2]. The case fatality rate is substantial: reports of 30-day mortality vary between ca. 3% after elective surgery and over 40% after emergency operations [3, 4].

The individual risk of an adverse event depends on multiple factors, such as tumor characteristics (e.g., location and stage), type of surgery, socio-demographic variables like age and sex, comorbidity (e.g., diabetes mellitus, cardiovascular, or neurologic diseases), behavioral risk factors (smoking, alcohol abuse), and concomitant treatments (immunosuppression, glucocorticoids, radio-, radio-chemo-, and chemotherapy). Realistic preoperative assessment of complication probabilities is crucial for patient information, shared decision-making, and fine-tuning therapies [1].

Traditionally, risk estimates rely on the personal experience of the treating surgeon [5, 6]. Statistical models published in terms of formulas for calculating risk scores provide a more objective basis for decision-making, but are often felt to be cumbersome to apply in clinical practice [7]. This has led to the development of web-based risk calculators, often based on large databases from the US population [1]. Informal personal communications indicate that using these tools is common practice among surgeons in Germany, and thus, probably internationally. Herein lies a danger, since statistical prediction models cannot simply be transported from one population to another [8,9,10]. This motivated the development and validation of a new risk calculator based on a database of CRC surgery patients treated in German hospitals. In this paper, we describe the development and validation of the statistical models underlying the web tool.

Patients and methods

Data source

Model development and validation relied on subsamples from the Study, Documentation, and Quality Center (StuDoQ) database of the German Society for General and Visceral Surgery (Deutsche Gesellschaft für Allgemein- und Viszeralchirurgie (DGAV)), a registry of CRC operations in participating hospitals throughout Germany. Inspired by the National Surgical Quality Improvement Program of the American College of Surgeons (ACS NSQIP), but not simply copying its database architecture, the StuDoQ database was implemented with the objective of documenting, assuring, and improving the quality of surgery in CRC patients.

Each StuDoQ database record consists of more than 200 patient variables, covering demographic characteristics, medical history, tumor features, comorbidity, behavioral risk factors, surgical technique, adverse events, and clinical outcomes. Data are entered via a web interface and automatically checked for completeness, plausibility, and consistency. Once obligatory patient data have been entered and passed the plausibility checks, records are flagged as analysis cases.

Our learning samples for model development consisted of all 6729 colon and 4381 rectum cancer patients flagged as analysis cases between 2010 and 2015. The corresponding validation sets consisted of 2407 colon and 1287 rectum cancer analysis cases documented between January 2016 and February 2017.

Participants

Since the implementation of the StuDoQ database in 2010, the number of hospitals using the system has risen steadily. Participating hospitals are primary care centers located in all 16 states of the Federal Republic of Germany. As of February 2017, 184 of approximately 750 surgical departments in Germany contribute patient records. The spectrum ranges from municipal hospitals to university medical centers. All centers take part in an audit procedure by the DGAV to guarantee a functioning quality system. Documentation quality is also checked by routines built into the data management of the registry.

Eligible patients had to have first-time abdominal tumor resection for cure or palliation of CRC, either elective or in case of emergency. We included patients of both genders without age restriction.

Outcomes

We developed and validated models for the following outcomes: severe complication (defined by a Clavien-Dindo score greater than three), surgical site infection requiring revision of the wound, anastomotic leakage, bladder voiding problems, bladder voiding disturbance after rectal surgery, abdominal wall dehiscence, ileus, bleeding requiring transfusions, unplanned ventilation, renal failure, myocardial infarction, pulmonary embolism, stroke, 30-day hospital readmission, 30-day reoperation rate, and 30-day mortality.

Predictors

A set of potential predictors was selected from the available variables based on surgical literature and the judgment of surgical experts: age, sex, type of surgery (recorded as left vs. right colectomy, extended colon resection, low anterior rectum resection, abdominoperineal rectum resection, and extended rectum resection), tumor location (right/left/transverse/other colon; rectum under 6, 6 to 12, or 12 to 16 cm from the anus, rectum without further specification), body mass index, emergency versus elective surgery, smoking, alcohol abuse, American Society of Anesthesiologists (ASA) category, need of care (none/partial/full), New York Heart Association (NYHA) Functional Classification, coronary artery disease, peripheral artery disease, diabetes mellitus (none, not insulin dependent, insulin dependent), dialysis, disseminated cancer, weight loss, radio- and chemotherapy, anticoagulatory medication (none/aspirin/clopidogrel/other), presence of distant metastases, presence and synchronous resection of liver metastases, chronic obstructive pulmonary disease, blood pressure medication, history of conditions with an increased risk of colorectal cancer (familial adenomatous polyposis, ulcerative colitis, Crohn’s disease, hereditary non-polyposis colon cancer), corticosteroid medication, cerebrovascular disease (none, with/without neurologic deficit), elevated white blood cell count, anemia (none, mild with hemoglobin concentrations between 6 and 12 g/dL, severe with hemoglobin concentrations below 6 g/dL), C-reactive protein elevation, and creatinine elevation.

Since all data were collected during clinical routine by the treating surgeons, documentation of the predictors was not blinded to the outcomes under study and vice versa.

Sample size

The learning set consisted of all available patients documented prior to 2016 (colon 6729, rectum 4381), the validation set of all patients treated between 2016 and February 2017 (colon 2407, rectum 1287). No formal sample size estimation was undertaken.

Missing data

Missing data in categorical variables were imputed by majority voting, i.e., by replacing the missing information with the modal value. The R package mice (version 2.25) was used for multiple imputation of quantitative variables with missing values (C-reactive protein, hemoglobin, creatinine) prior to classification using the categories mentioned previously.
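The modal-value replacement described above can be illustrated with a minimal sketch. It is written in Python rather than the R used for the actual analyses, and the function name `impute_mode`, the variable `smoking`, and its values are hypothetical; it is not the registry's code.

```python
from collections import Counter

def impute_mode(values):
    """Replace missing entries (None) with the most frequent observed value.

    Ties are resolved in favour of the value observed first, which is an
    arbitrary choice made for this illustration only.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    mode, _count = Counter(observed).most_common(1)[0]
    return [mode if v is None else v for v in values]

# Hypothetical categorical predictor with two missing entries.
smoking = ["no", "yes", None, "no", None, "no"]
imputed = impute_mode(smoking)
```

Single-value imputation of this kind ignores the uncertainty about the missing entries, which is one reason why the quantitative variables were handled by multiple imputation instead.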

Statistical methods

All analyses were performed using R version 3.3.0 [11]. Prediction models were based on logistic ridge regression using the glmnet package (version 2.0-5). Independent variables were standardized to account for the different measurement scales used. The reported regression coefficients were back-transformed to the original scale. The optimal value of the penalty parameter λ was determined by tenfold cross-validation based on the learning sample, using the mean squared error (MSE) as optimality criterion. Model discrimination in the validation set was assessed by means of ROC curves and the corresponding areas under the curves (AUC) using the R package ROCR (version 1.0-7). Model calibration was assessed graphically by splitting the validation set into groups defined by the deciles of the predicted event probabilities and plotting the observed frequencies in the resulting groups versus the predicted probabilities. Predictive accuracy was assessed using Brier scores. A perfect prediction produces a Brier score of zero. Predicting the exact opposite of what happened produces a Brier score of one. Predicting each event with the coin toss probability 0.5 produces a Brier score of 0.25.
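The three reference values of the Brier score quoted above, and the meaning of the area under the ROC curve (c statistic), can be checked with a small sketch. It is written in Python rather than R, does not reproduce the ROCR computations, and uses a made-up toy data set of four patients; the function names are illustrative.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between predicted probability and outcome (0/1)."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

def c_statistic(probs, outcomes):
    """Share of (event, non-event) pairs in which the event case received
    the higher predicted probability; tied pairs count one half."""
    events = [p for p, y in zip(probs, outcomes) if y == 1]
    non_events = [p for p, y in zip(probs, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for p_event in events:
        for p_non_event in non_events:
            if p_event > p_non_event:
                concordant += 1.0
            elif p_event == p_non_event:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))

# Toy data: two patients without and two with the event.
outcomes = [0, 0, 1, 1]
perfect = brier_score([0.0, 0.0, 1.0, 1.0], outcomes)   # every prediction correct
opposite = brier_score([1.0, 1.0, 0.0, 0.0], outcomes)  # every prediction reversed
coin = brier_score([0.5, 0.5, 0.5, 0.5], outcomes)      # constant coin-toss prediction
constant_c = c_statistic([0.5, 0.5, 0.5, 0.5], outcomes)
```

In this toy example, a model that assigns the same probability to every patient cannot rank patients at all, so its c statistic is 0.5 regardless of the outcome frequency.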

Results

Table 1 describes the demographic characteristics of the patients and the distributions of the predictor variables in the learning and the validation sets, stratified by main diagnosis. Table 2 gives an overview of the incidence of the complications modeled in the prediction models. The regression coefficients are displayed in Tables 3 (colon cancer) and 4 (rectal cancer).

Table 1 Demographic and clinical characteristics of the colon and rectum cancer patients in the learning and the validation sets
Table 2 Complications in colon and rectum cancer patients in the learning and the validation sets
Table 3 Regression coefficients from the logistic ridge regression models for prediction of complications in colon cancer patients
Table 4 Regression coefficients from the logistic ridge regression models for prediction of complications in rectum cancer patients

The regression coefficients can be used to weight the corresponding patient variables in the calculation of an additive risk score for the respective outcomes: starting with the intercept, weighted individual variable values add up to yield a sum score η_i for patient i. Higher scores indicate higher individual risks. The formula exp(η_i) / [1 + exp(η_i)] (where exp is the exponential function with base e) can be applied to transform risk scores into individual probability estimates.
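As a worked illustration of the calculation just described, the following Python sketch adds up weighted variable values and applies the logistic transformation. The intercept, the two coefficients, and the example patient are invented for this illustration and are not taken from Tables 3 or 4; the function names are likewise hypothetical.

```python
import math

def risk_score(intercept, coefficients, patient):
    """Additive sum score: intercept plus coefficient-weighted variable values."""
    return intercept + sum(coefficients[name] * patient[name] for name in coefficients)

def probability(eta):
    """Logistic transformation exp(eta) / (1 + exp(eta)).

    The branch for eta >= 0 uses the algebraically identical form
    1 / (1 + exp(-eta)) to avoid overflow for very large scores.
    """
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)

# Invented example values, NOT the published coefficients.
intercept = -3.0
coefficients = {"age_years": 0.02, "emergency_surgery": 1.0}
patient = {"age_years": 70, "emergency_surgery": 1}

eta = risk_score(intercept, coefficients, patient)  # -3.0 + 0.02 * 70 + 1.0 * 1
p = probability(eta)
```

With these invented numbers the sum score is about -0.6, which corresponds to an estimated event probability of roughly 35%. A score of zero always corresponds to a probability of 0.5.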

Table 5 lists the c statistics and Brier scores of all models when applied to the learning and validation sets. In both the validation sets of colon and rectal cancer patients, the models predicting 30-day mortality exhibited the highest discriminatory power with c values of 0.844 and 0.846, respectively. The model predicting pulmonary embolism in rectal cancer patients did not contain a single non-zero regression coefficient and consequently failed to discriminate between persons with and without events (c = 0.500, Brier score 0.25). The majority of the Brier scores was less than 0.1.

Table 5 c statistics and Brier scores for the prediction models

The risk calculator can be accessed via the world wide web at http://risikorechner.dgav.de/. Figure 1 shows the graphical user interface of the program.

Fig. 1 Screenshot of the user interface of the DGAV risk calculator

Discussion

Most of our models show satisfactory discriminatory power and calibration. There are, however, exceptions to this rule, notably the prediction model for post-operative pulmonary embolism in patients with rectal cancer. It should be noted that good discrimination and calibration do not preclude individual predictions that are far off the mark. The vast majority of the analyzed patients (and most future cases) presented risk factor patterns resulting in only low complication probabilities. It follows that, while predictions for low-risk persons tend to be stable, those for the small, yet particularly interesting, subpopulation of high-risk patients tend to be unreliable.

Our study has some limitations. It is unclear whether the contributing hospitals are a representative sample of the population of all surgical centers. Regarding the selection of patients, there is only a low potential of bias, since documentation quality and completeness are monitored regularly. There is some doubt, however, regarding the completeness of follow-up after hospital discharge. The 30-day mortality in the StuDoQ sample is lower than the values reported in the international literature and in hospital remuneration data for the population of German statutory health insurance members available at the German Federal Bureau of Statistics. While mere underreporting of outcomes unavailable at the time of hospital discharge need not have biased the regression slopes for the predictors, it must have resulted in risk estimates biased too low.

Surgical decision-making is a difficult process. Traditionally, it uses intuition and experience, but studies have shown that both physicians and patients often overestimate benefits and underestimate harms from medical interventions [12, 13]. Especially in complex situations like cancer surgery, where the level of uncertainty is high, a more analytical approach would be helpful to support the decision-making process [14]. The DGAV risk calculator is such a decision aid. It enables patients and surgeons to balance risks and expected outcomes and arrive at a decision together.

To prevent potential misuse and misinterpretation of the predictions from the risk calculator, we have complemented it with a web-based e-learning course raising awareness of the statistical issues, untoward clinical consequences, and the appropriate application of the tool. One of the major challenges for patients and physicians is understanding and dealing with the inherent prognostic uncertainty. The e-learning tool not only provides practical tips to communicate percentages and probabilities in a comprehensible way but also encourages users to integrate the patients’ values, preferences, and their current situation in the decision-making process.

Future research will have two major foci. Firstly, as the number of eligible patients documented in the StuDoQ database rises, models will have to be revised and updated regularly, steadily increasing the reliability of the predictions. Secondly, the likely benefits and possible perils of using the risk calculator will have to be investigated systematically. We are confident that we are providing a tool that facilitates and supports shared decision-making, but clinical evidence for a beneficial effect has yet to be provided.

Other information

The work of AR and CK was funded by the Deutsche Gesellschaft für Allgemein- und Viszeralchirurgie (DGAV).