Introduction

Colorectal cancer (CRC) is a common cancer and one of the leading cancer-related causes of death worldwide, accounting for approximately 35,000 new cases and over 16,000 deaths per year [1], with the majority of patients undergoing potentially curative surgery. In this context, accurate evaluation of perioperative risk is crucial to support shared decision-making and informed consent for patients undergoing surgery and to improve the quality of clinical practice along the perioperative pathway. Moreover, an accurate risk stratification tool enables meaningful comparison of surgical outcomes among healthcare providers, for either service evaluation or clinical audit. Several risk stratification tools have been introduced into clinical practice [2]. These tools may be subdivided into risk scores and risk prediction models, both of which are usually developed using multivariable analysis of risk factors for a specific outcome [2]. Despite the increasing interest in more advanced risk prediction methods, risk stratification models remain the most easily accessible choice for this purpose. Nonetheless, they are not commonly used in daily clinical practice, possibly because of poor awareness among clinicians, as well as concerns regarding their complexity and accuracy [3].

The Surgical Outcome Risk Tool (SORT) was developed following the 2011 National Confidential Enquiry into Patient Outcome and Death (NCEPOD) report to provide a simpler means of identifying high-risk surgical patients [3]. To this end, the SORT model uses only six routinely collected data items to predict a patient's probability of 30-day postoperative mortality. It has compared favorably with other previously validated risk stratification tools, such as the ASA physical status (ASA PS) grade, and has been externally validated in patients undergoing hip fracture surgery [4] and hepatectomy [5]. In both populations [4, 5], SORT showed acceptable discrimination [AUC: 0.70 (95% CI: 0.66–0.74) and 0.822 (95% CI: 0.728–0.916), respectively] but poor calibration. However, it has not yet been validated in patients undergoing colorectal cancer surgery. The purpose of the present study was to validate the SORT model in Greek adult patients undergoing surgery for colorectal cancer and to perform subgroup sensitivity analyses.

Methods

Data extraction

The present study was conducted according to a protocol agreed by all authors. Data were obtained from a prospectively maintained database of consecutive patients undergoing surgery for colorectal cancer between January 1st 2011 and December 31st 2019. All procedures were performed by the same surgical team, led by the senior author (GT), at the Department of Surgery, University Hospital of Larissa, Greece. The choice between an open and a laparoscopic approach depended either on the patient's preference (many people in Greece believe that laparoscopic colectomy is still somewhat “experimental”) or on logistics (time/theatre space, availability of disposables, etc.). Ethical approval was obtained from the Scientific Committee of the University Hospital of Larissa (Protocol number: 33606/16-07-19). Informed consent was not required because of the retrospective nature of the study.

Data on age, gender, surgical approach (laparoscopic or open), American Society of Anesthesiologists (ASA) grade, operative priority, surgical severity, malignancy status, staging and type of procedure were prospectively collected. Mortality was defined as any death occurring within the first 30 days after surgery, or during the index hospital admission if this lasted longer than 30 days. The predicted risk of mortality was calculated using the SORT model. In addition, we calculated predicted mortality using POSSUM, P-POSSUM, CR-POSSUM, and ACPGBI CRC for all patients. Patients with incomplete data were excluded from the analysis.
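The mortality definition above can be written as a simple labelling rule. The Python function below is a minimal sketch, not the authors' code: `days_to_death` and `index_los_days` are hypothetical field names, and it assumes that a death during the index admission can be identified by a time to death that does not exceed the length of that admission.

```python
from typing import Optional


def perioperative_death(days_to_death: Optional[int], index_los_days: int) -> bool:
    """Return True if a death meets the study's mortality definition (sketch).

    days_to_death: days from surgery to death, or None if the patient was alive
        at the end of follow-up (hypothetical field).
    index_los_days: length of the index hospital admission in days
        (hypothetical field).
    """
    if days_to_death is None:
        return False
    # Any death within the first 30 postoperative days counts.
    if days_to_death <= 30:
        return True
    # Assumption: a later death counts only if it occurred before discharge
    # from an index admission lasting longer than 30 days.
    return index_los_days > 30 and days_to_death <= index_los_days


print(perioperative_death(12, 7))    # True: death within 30 days
print(perioperative_death(45, 60))   # True: in-hospital death during a long admission
print(perioperative_death(45, 10))   # False: death after discharge and after day 30
print(perioperative_death(None, 5))  # False: patient survived
```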

A separate sensitivity analysis was performed to determine the accuracy of each model in predicting perioperative mortality in patients undergoing surgery for colorectal cancer according to (1) procedure-related variables: surgical approach (laparoscopic vs open) and operative priority (elective vs acute); (2) a cancer-related variable: cancer site (colon vs rectum); and (3) a patient-related variable: age (≥ 80 vs < 80 years). These variables were chosen because they may affect perioperative mortality.
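The subgroup analysis amounts to re-estimating each model's discrimination within each stratum. The Python sketch below is an illustration under stated assumptions, not the software used in the study: patient records are dictionaries with hypothetical fields (`approach`, `priority`, `site`, `age`, `died` and a predicted risk such as `sort_risk`), and the AUC is estimated with the Mann–Whitney formulation, in which ties between a death and a survivor count as one half.

```python
from typing import Any, Callable, Dict, List, Optional

Record = Dict[str, Any]


def auc(risks: List[float], outcomes: List[int]) -> Optional[float]:
    """Mann-Whitney AUC estimate; ties between a death and a survivor count as 0.5."""
    deaths = [r for r, y in zip(risks, outcomes) if y == 1]
    survivors = [r for r, y in zip(risks, outcomes) if y == 0]
    if not deaths or not survivors:
        return None  # undefined unless the subgroup contains both outcomes
    score = sum(
        1.0 if d > s else 0.5 if d == s else 0.0
        for d in deaths
        for s in survivors
    )
    return score / (len(deaths) * len(survivors))


# Strata for the four variables in the sensitivity analysis (field names are hypothetical).
SUBGROUPS: Dict[str, Callable[[Record], bool]] = {
    "laparoscopic": lambda r: r["approach"] == "laparoscopic",
    "open": lambda r: r["approach"] == "open",
    "elective": lambda r: r["priority"] == "elective",
    "acute": lambda r: r["priority"] == "acute",
    "colon": lambda r: r["site"] == "colon",
    "rectum": lambda r: r["site"] == "rectum",
    "age >= 80": lambda r: r["age"] >= 80,
    "age < 80": lambda r: r["age"] < 80,
}


def subgroup_auc(records: List[Record], risk_field: str) -> Dict[str, Optional[float]]:
    """Compute the AUC of one model's predicted risk within each subgroup."""
    results: Dict[str, Optional[float]] = {}
    for name, belongs in SUBGROUPS.items():
        members = [r for r in records if belongs(r)]
        results[name] = auc(
            [float(r[risk_field]) for r in members],
            [int(r["died"]) for r in members],
        )
    return results


toy = [  # invented records for illustration only
    {"approach": "open", "priority": "acute", "site": "colon", "age": 84, "died": 1, "sort_risk": 0.20},
    {"approach": "open", "priority": "elective", "site": "colon", "age": 70, "died": 0, "sort_risk": 0.02},
    {"approach": "laparoscopic", "priority": "elective", "site": "rectum", "age": 66, "died": 0, "sort_risk": 0.01},
    {"approach": "laparoscopic", "priority": "elective", "site": "colon", "age": 81, "died": 1, "sort_risk": 0.05},
    {"approach": "open", "priority": "acute", "site": "rectum", "age": 77, "died": 0, "sort_risk": 0.08},
    {"approach": "open", "priority": "elective", "site": "colon", "age": 79, "died": 0, "sort_risk": 0.06},
]
print(subgroup_auc(toy, "sort_risk"))
```

With an overall mortality of 1.9%, several strata contain very few deaths, so subgroup AUCs are imprecise; the sketch returns `None` when a stratum lacks either deaths or survivors rather than reporting a misleading value.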

Endpoints

The primary endpoint of the study was the validation of the SORT model in Greek adult patients undergoing surgery for colorectal cancer. Secondary endpoints were (1) the comparison of SORT with the POSSUM, P-POSSUM, CR-POSSUM, and ACPGBI CRC models regarding their accuracy in predicting perioperative mortality and (2) a subgroup sensitivity analysis.

Statistical analysis

The SORT score was calculated using the method and web-based calculator developed by Protopapa et al. [3]. The SORT model incorporates the following variables: ASA physical status (PS), operative priority (elective, urgent, immediate), surgical specialty (gastrointestinal, thoracic, or vascular surgery), surgical severity (major/complex), malignancy status, and age (65–79 or ≥ 80 years). POSSUM, P-POSSUM, CR-POSSUM, and ACPGBI CRC scores were calculated using the methods described by Copeland et al. [6], Prytherch et al. [7], Tekkis et al. [8], and Ferjani et al. [9], respectively.
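POSSUM and P-POSSUM convert two summed component scores, a physiological score (PS) and an operative severity score (OS), into a predicted risk R through a logistic equation. The Python sketch below uses the widely published mortality equations, ln[R/(1 − R)] = −7.04 + 0.13 × PS + 0.16 × OS for POSSUM and ln[R/(1 − R)] = −9.065 + 0.1692 × PS + 0.1550 × OS for P-POSSUM. It assumes the two component scores have already been obtained from the original scoring sheets, which are not reproduced here. CR-POSSUM, ACPGBI CRC and SORT use their own variables and coefficients, which should be taken from the cited sources [3, 8, 9]; they are not sketched here.

```python
import math


def logistic(log_odds: float) -> float:
    """Convert log-odds into a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


def possum_mortality(physiological_score: int, operative_score: int) -> float:
    """POSSUM predicted mortality: ln(R/(1-R)) = -7.04 + 0.13*PS + 0.16*OS."""
    return logistic(-7.04 + 0.13 * physiological_score + 0.16 * operative_score)


def p_possum_mortality(physiological_score: int, operative_score: int) -> float:
    """P-POSSUM predicted mortality: ln(R/(1-R)) = -9.065 + 0.1692*PS + 0.1550*OS."""
    return logistic(-9.065 + 0.1692 * physiological_score + 0.1550 * operative_score)


# Hypothetical patient with the lowest possible component scores (PS = 12, OS = 6).
print(round(possum_mortality(12, 6), 4))
print(round(p_possum_mortality(12, 6), 4))
```

For this minimum-score patient, P-POSSUM predicts about 0.2% mortality versus about 1.1% for POSSUM, consistent with P-POSSUM having been developed to correct POSSUM's overprediction in low-risk patients.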

We assessed the discrimination (i.e., the ability to separate patients who died from those who did not) and calibration (i.e., the agreement between predicted and observed mortality rates) of each model. Discrimination was assessed by generating receiver operating characteristic (ROC) curves and calculating the area under the ROC curve (AUC). AUCs are reported with 95% confidence intervals and were compared using the nonparametric paired method described by DeLong et al. [10]. Model discrimination was defined as poor, fair and excellent for AUC values of < 0.70, 0.70–0.79 and 0.80–1.00, respectively [10].
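The discrimination analysis can be sketched in Python as follows. The AUC is the Mann–Whitney probability that a randomly chosen patient who died received a higher predicted risk than a randomly chosen survivor; DeLong's method estimates its variance from per-patient "placement" values, which also gives a z test for two correlated AUCs measured on the same patients. This is a minimal illustration, not the software used in the study: the Wald-type 95% CI (AUC ± 1.96 SE, truncated to [0, 1]) is one common choice and is an assumption here, and the functions require at least two deaths and two survivors.

```python
import math
from typing import List, Sequence, Tuple


def _placements(risks: Sequence[float], died: Sequence[int]) -> Tuple[List[float], List[float]]:
    """DeLong placement values for deaths (v10) and survivors (v01)."""
    pos = [r for r, y in zip(risks, died) if y == 1]
    neg = [r for r, y in zip(risks, died) if y == 0]
    if len(pos) < 2 or len(neg) < 2:
        raise ValueError("need at least two deaths and two survivors")

    def psi(x: float, y: float) -> float:
        return 1.0 if x > y else 0.5 if x == y else 0.0

    v10 = [sum(psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(psi(x, y) for x in pos) / len(pos) for y in neg]
    return v10, v01


def _cov(a: List[float], b: List[float]) -> float:
    """Sample covariance with an n - 1 denominator."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def auc_with_ci(risks: Sequence[float], died: Sequence[int], z: float = 1.959964) -> Tuple[float, float, float]:
    """AUC with an assumed Wald-type CI based on the DeLong variance."""
    v10, v01 = _placements(risks, died)
    auc = sum(v10) / len(v10)
    se = math.sqrt(_cov(v10, v10) / len(v10) + _cov(v01, v01) / len(v01))
    return auc, max(0.0, auc - z * se), min(1.0, auc + z * se)


def delong_paired(risks_a: Sequence[float], risks_b: Sequence[float], died: Sequence[int]) -> Tuple[float, float]:
    """Two-sided DeLong test for the difference between two AUCs on the same patients."""
    a10, a01 = _placements(risks_a, died)
    b10, b01 = _placements(risks_b, died)
    diff = sum(a10) / len(a10) - sum(b10) / len(b10)
    var = (
        (_cov(a10, a10) + _cov(b10, b10) - 2 * _cov(a10, b10)) / len(a10)
        + (_cov(a01, a01) + _cov(b01, b01) - 2 * _cov(a01, b01)) / len(a01)
    )
    if var <= 0:
        return diff, float("nan")  # test undefined when the variance estimate is zero
    z = diff / math.sqrt(var)
    return diff, math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p value


def discrimination_category(auc: float) -> str:
    """Thresholds used in this study: < 0.70 poor, 0.70-0.79 fair, >= 0.80 excellent."""
    if auc < 0.70:
        return "poor"
    return "fair" if auc < 0.80 else "excellent"


died = [1, 1, 1, 0, 0, 0, 0, 0]  # invented example data
model_a = [0.9, 0.8, 0.3, 0.4, 0.2, 0.1, 0.05, 0.35]
model_b = [0.6, 0.2, 0.5, 0.3, 0.4, 0.1, 0.7, 0.05]
print(auc_with_ci(model_a, died))
print(delong_paired(model_a, model_b, died))
```

Because both AUCs are estimated on the same patients, the paired test subtracts twice their covariance; treating the two AUCs as independent would overstate the variance of their difference when the models rank patients similarly.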

Calibration of each model was assessed by comparing predicted (expected) mortality with observed mortality. An observed/expected (O/E) ratio of 1 indicates perfect agreement, a ratio < 1 indicates overprediction and a ratio > 1 indicates underprediction of mortality. Calibration was further evaluated using the Hosmer–Lemeshow (H–L) goodness-of-fit test, with a p value ≤ 0.05 indicating lack of fit [11]. Finally, the chi-squared test was used to compare observed and expected outcomes across all patients.
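The calibration measures can be sketched in Python as follows. Expected deaths are taken as the sum of the individual predicted risks, which is the usual convention but an assumption here. The H–L statistic orders patients by predicted risk, splits them into (by default) ten groups of roughly equal size and is referred to a chi-squared distribution with g − 2 degrees of freedom; the p value uses a series expansion of the regularized incomplete gamma function so that only the standard library is needed. This is an illustration, not the software used in the study.

```python
import math
from typing import Sequence, Tuple


def oe_ratio(died: Sequence[int], risks: Sequence[float]) -> float:
    """Observed deaths divided by expected deaths (assumed: sum of predicted risks)."""
    return sum(died) / sum(risks)


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail chi-squared probability, 1 - P(df/2, x/2), via the gamma series."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > total * 1e-15 and n < 100_000:
        term *= z / (a + n)
        total += term
        n += 1
    lower = math.exp(-z + a * math.log(z) - math.lgamma(a)) * total
    return min(1.0, max(0.0, 1.0 - lower))


def hosmer_lemeshow(died: Sequence[int], risks: Sequence[float], groups: int = 10) -> Tuple[float, float]:
    """H-L statistic over risk-ordered groups and its p value (df = groups - 2)."""
    ordered = sorted(zip(risks, died))
    n = len(ordered)
    stat = 0.0
    for g in range(groups):
        chunk = ordered[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        expected = sum(p for p, _ in chunk)
        observed = sum(y for _, y in chunk)
        denom = expected * (1.0 - expected / len(chunk))
        if denom > 0:  # skip groups whose mean predicted risk is exactly 0 or 1
            stat += (observed - expected) ** 2 / denom
    return stat, chi2_sf(stat, groups - 2)


risks = [0.01, 0.02, 0.02, 0.03, 0.05, 0.05, 0.08, 0.10, 0.15, 0.30] * 2  # invented
died = [0, 0, 0, 0, 0, 0, 0, 1, 0, 1] * 2
print(oe_ratio(died, risks))         # > 1 here, i.e. the risks underpredict deaths
print(hosmer_lemeshow(died, risks))
```

When the event rate is low, many risk groups have fewer than one expected death, and the chi-squared approximation behind the H–L p value may then be poor.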

All data were analyzed using Microsoft® Excel 2019 (Microsoft, Redmond, Washington, USA) and GraphPad Prism® 8.4 for Mac (GraphPad Software, San Diego, CA, USA).

Results

Baseline characteristics

We report our outcomes according to the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) guidelines [12]. The flowchart of the data extraction strategy is presented in Fig. 1. A total of 526 patients were included, and their baseline characteristics are shown in Table 1. One hundred ninety-three patients (36.7%) were female, and the mean age was 69.75 (standard deviation, SD: 10.46) years. The majority of patients presented with stage I/II cancer (69.4%) and underwent an elective procedure (88.2%). The tumor was located in the colon in 343 (65.2%) patients. A total of 277 (52.7%) patients underwent open surgery and 249 (47.3%) laparoscopic surgery. The overall mortality rate was 1.9%.

Fig. 1 Trial flowchart

Table 1 Patient baseline characteristics

Performance of SORT model in the entire dataset

As shown in Table 2 and Fig. 2, SORT showed a high level of discrimination in the total study population [AUC: 0.81 (95% CI: 0.68–0.94); p = 0.001]. Furthermore, SORT had the lowest Hosmer–Lemeshow statistic (H–L: 2.82; p = 0.83), indicating the best calibration of all models in the entire dataset. Nonetheless, SORT underestimated mortality, as indicated by an observed/expected ratio > 1.

Table 2 Discrimination and calibration of the studied scores for predicting mortality in colorectal patients
Fig. 2 ROC curves regarding the discrimination of each model in the total study population

Comparison of SORT with other mortality prediction models in the entire dataset

P-POSSUM was also associated with a high level of discrimination [AUC: 0.85 (95% CI: 0.76–0.94); p = 0.005]. POSSUM, CR-POSSUM and ACPGBI CRC showed fair discrimination (Table 2, Fig. 2). While SORT showed the best calibration, P-POSSUM had the highest H–L statistic and thus the poorest calibration (H–L: 8.29; p = 0.41), although the test did not indicate a lack of fit (p > 0.05) (Table 2, Fig. 2).

Performance of mortality prediction models in subgroups

Subgroup analysis outcomes are provided in Table 3. The SORT model showed high discrimination in predicting perioperative mortality in three subgroups: (1) open surgery, (2) emergency/acute surgery and (3) colon cancer (Fig. 3). In all other subgroups, SORT showed fair discrimination. Moreover, SORT demonstrated a high level of calibration in all subgroups, with the lowest value observed in patients undergoing open surgery. Notably, in patients undergoing open surgery, the ACPGBI CRC model provided the best discrimination [AUC: 0.96 (95% CI: 0.91–1.00); p < 0.001] and calibration (H–L: 2.82; p = 0.396). It also showed the highest discrimination in patients undergoing acute surgery. In contrast, the accuracy of ACPGBI CRC in predicting perioperative mortality was poor to fair in all other subgroups. P-POSSUM provided the best discrimination [AUC: 0.91 (95% CI: 0.80–1.00); p = 0.002] and calibration (H–L: 1.59; p = 0.979) in patients undergoing laparoscopic surgery. POSSUM and CR-POSSUM showed the worst balance between discrimination and calibration in all subgroups. All models underpredicted perioperative mortality in all subgroups.

Table 3 Discrimination and calibration of the studied scores for predicting mortality in colorectal surgical subpopulations
Fig. 3 ROC curves regarding the discrimination of SORT in each study subgroup

Discussion

To our knowledge, the present study is the first to (1) evaluate the validity of the SORT model in CRC surgery, (2) evaluate it in non-UK surgical patients, (3) compare it with other risk models in this setting and (4) perform a subgroup sensitivity analysis. The results of the current study may have a direct impact on clinical practice, suggesting a possible role for SORT in the perioperative pathway and in the shared decision-making process for CRC patients.

The SORT scoring system, proposed by Protopapa et al. [3], is a useful tool for predicting 30-day postoperative mortality. In the original study, six preoperatively available factors predicted postoperative mortality with higher accuracy than other traditional risk assessment tools, such as ASA-PS [3]. Other risk stratification models that have been used in clinical practice, and that were included for comparison in the current study, are POSSUM, P-POSSUM, CR-POSSUM, and ACPGBI CRC. Because these tools are already used in the counseling process, it was important to compare them with SORT. In addition, according to a recent study [13], POSSUM, P-POSSUM, CR-POSSUM and ACPGBI CRC showed poor accuracy in the setting of CRC surgery, and the authors concluded that new models based on prospectively collected data are required [13]. Our findings respond to this call: SORT demonstrated the best calibration of all the risk stratification models assessed in the present study, together with a high level of discrimination that was second only to that of P-POSSUM (AUC: 0.81 vs 0.85). All models underpredicted mortality. These findings are therefore relevant when counseling CRC patients about perioperative mortality risk and deciding on a treatment strategy.

The performance of SORT was also supported by the subgroup sensitivity analysis, in which SORT showed fair to excellent discrimination and good calibration. Nonetheless, two comparative findings with direct clinical relevance should be highlighted: ACPGBI CRC demonstrated the best discrimination and calibration in patients undergoing open surgery, whereas P-POSSUM demonstrated the best discrimination and calibration in patients undergoing laparoscopic surgery. This observation may be important in daily practice, during counseling and shared decision-making regarding the optimal surgical approach (laparoscopic or open) for a given patient. In addition, SORT showed fair discrimination in patients aged ≥ 80 years. This observation is clinically important, since age and preoperative frailty are associated with postoperative morbidity in patients undergoing surgery for colorectal cancer [14, 15]. Furthermore, the findings of the present study regarding the value of the clinical variables used by SORT are in accordance with evidence from administrative datasets [16]. Moreover, previous studies have reported that SORT is more accurate than other preoperative (Barwon Health 2009, BH 2009) [17] and intraoperative (Surgical Apgar Score) [18] risk stratification models, while remaining easy to use, since it requires only six preoperative variables.

ACPGBI CRC has previously been validated in patients undergoing CRC surgery [19], where it showed poor predictive power. Our findings are in accordance with that study [19], with the exception of the open surgery subgroup. Although P-POSSUM has been extensively validated [2], SORT has a number of advantages over it. First, SORT incorporates only six preoperative variables, compared with the eighteen perioperative variables of P-POSSUM, making it considerably easier to implement in daily clinical practice. Second, P-POSSUM includes intra- and postoperative variables that are not available during the preoperative assessment. Finally, P-POSSUM contains subjective variables, which increase interobserver variability and heterogeneity and may bias its predictive accuracy.

One limitation of the present study is its single-institution, retrospective design. However, the data were prospectively collected, the patients were consecutive, the surgical team was the same throughout, and surgeon bias in patient/approach selection was minimized, as the choice of approach mostly depended on patients' preference and/or logistics. Another potential source of bias is the belief of many patients in Greece that laparoscopic colectomy is still “experimental”. To reduce this bias, each patient received extensive counseling so that the surgical strategy was chosen through a shared decision-making process.

The current outcomes suggest that SORT is an easy-to-use, feasible and efficient risk stratification tool that could be incorporated into the preoperative counseling and shared decision-making process for CRC patients.

Conclusion

In this study, we validated the SORT risk stratification model in Greek adult patients undergoing surgery for colorectal cancer. Compared with POSSUM, P-POSSUM, CR-POSSUM, and ACPGBI CRC, SORT demonstrated the best calibration and a high level of discrimination, second only to that of P-POSSUM. The value of SORT was further supported by the subgroup sensitivity analysis. SORT is a feasible and efficient risk stratification tool that could be implemented in the perioperative pathway of CRC patients.