Introduction

Complex pelvic anatomy, difficult disease processes including cancer and inflammatory bowel disease, cases that are clean-contaminated at best, and anastomoses within unfavorable geometry all contribute to high complication rates in rectal surgery. Single-institution series and our own previous work with the National Surgical Quality Improvement Program (NSQIP) database showed that the rate of 30-day complications after rectal surgery approaches 40%.1,2,3,4,5 Despite these high observed complication rates, earlier risk adjustment models supplied with NSQIP data predicted a complication rate of just 23% for rectal surgery.4 The large discrepancy between NSQIP-predicted and actual complication rates spurred development of the Iowa Rectal Surgery Risk Calculator, which provided more accurate risk estimates for these operations.4,5

To develop the Original Iowa Rectal Surgery Risk Calculator (OIRC), NSQIP data were analyzed to identify factors associated with 30-day complications in proctectomy. From these, 17 factors available prior to operation and strongly associated with complications were included in a multivariable logistic regression model. The model was trained using 2010–2011 data and back-validated on 2005–2009 data. The OIRC performed significantly better than the NSQIP-supplied General/Vascular Surgery Morbidity Model (NSQIP-Model) in terms of both discrimination and calibration using 2005–2009 back-validation data (C-statistic of 0.637 and percent deviance of − 1.7%, compared to 0.621 and − 48.4% for the NSQIP-Model).

Despite this strong performance, deficiencies remained in the OIRC. At the time, no new data were available for the prospective validation recommended for clinical risk estimation tools.6 This forced a back-validation with older data. In the older dataset, a problem with recording bleeding complications necessitated imputation for this data point, potentially introducing error.4 Additionally, the OIRC did not provide risk estimates for low anterior resection (LAR), which represents a significant proportion of rectal surgery in the USA. Finally, it was not possible to compare the OIRC’s performance to the then-newly-released American College of Surgeons Online Surgical Risk Calculator (ACS-Calculator).7 Therefore, the present study sought to expand applicability of the Iowa Calculator to LAR, to provide a full prospective validation, and to determine whether the Iowa Calculator provides superior risk estimation compared to the ACS-Calculator.

Methods

Patients and Variables

Data were obtained from ACS-NSQIP Participant-Use-Data-Files for 2010–2015.8 Inclusion and exclusion criteria, variables, and procedure categorization matched those described for the OIRC with LAR procedures added (n = 65,683, Table 1).4,5 To ensure a fair comparison between the ACS-Calculator and the Updated Iowa Rectal Surgery Risk Calculator (UIRC), records missing data such that a risk score could not be calculated by both models were excluded (n = 282 in the 2010–2011 dataset). Pre- and post-operative variables included in the original study but no longer recorded in NSQIP were coded as not present in more recent patients when missing. Morbidity was defined as in the original model as death, reoperation, or any NSQIP-recorded complication within 30 days of surgery, matching definitions in the ACS-Calculator.4,8,9 Significance in comparisons of continuous variables was assessed by Welch’s t test, except for length of stay, which was assessed by the Wilcoxon rank-sum test. Chi-squared or Fisher-exact tests assessed categorical variables. The Benjamini-Hochberg false-discovery-rate adjustment was used to correct P values for multiple comparisons when appropriate.10 This study was Institutional Review Board-exempt.
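For readers unfamiliar with the Benjamini-Hochberg adjustment, the following is a minimal sketch of the standard step-up procedure. It is not the study's analysis code (the analyses were run in R); the function name and the raw P values are hypothetical and chosen only for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(p_values)
    # Indices of the P values, sorted from smallest to largest P value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw P values, not taken from the study.
raw_p_values = [0.01, 0.04, 0.03, 0.20]
adjusted_p_values = benjamini_hochberg(raw_p_values)
print([round(p, 4) for p in adjusted_p_values])
```

Each raw P value is scaled by the number of comparisons divided by its rank, and the running minimum keeps a smaller raw P value from receiving a larger adjusted value than a bigger one.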

Table 1 Included CPT codes, procedure groups, sample sizes, and percentages for 2010–2011 and 2012–2015 (n = 65,683). Note: included proctectomy case numbers from 2010 to 2011 differ slightly from those previously reported due to exclusion of records for which an ACS Online Risk Calculator prediction could not be calculated because of missing data.5

Models and Predictions

The OIRC is based on a multivariable logistic regression model described previously and was not altered for validation.4 After adding LAR procedures to the 2010–2011 training dataset, the model was refitted using the same predictor variables to create the UIRC. LAR procedures were coded as laparoscopic or open without considering whether a stoma was created (Supplement). Risk predictions for the 2012–2015 validation dataset were generated using both Original and Updated Calculators. NSQIP-Model predictions were obtained from NSQIP (MORBPROB in Participant-Use-Data-Files). To obtain ACS-Calculator any-morbidity predictions, a custom web-script was developed which submitted data from each NSQIP entry for all 2012–2015 patients and recorded the ACS-Calculator prediction.7,9 ACS-Calculator predictions were obtained in February–March, 2017.

Model Analysis

Models were compared in terms of discrimination using the C-statistic with DeLong’s test for significance.11,12 Percent deviance of observed to expected rates of complications assessed calibration, and the Brier Skill Score was used to measure combined discrimination and calibration.13,14 Statistical analyses used R v.3.4.0 (Vienna, Austria).
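The two simplest metrics named above can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the study's R code. Percent deviance is taken here as (predicted − observed) / observed, which reproduces the sign and size of the figures reported in the Results. The C-statistic is computed by direct pairwise comparison. DeLong's test is not implemented. Function names and the four-patient toy data are hypothetical.

```python
def percent_deviance(predicted_rate, observed_rate):
    """Relative difference of the predicted from the observed rate, in percent."""
    return 100.0 * (predicted_rate - observed_rate) / observed_rate


def c_statistic(predictions, outcomes):
    """Probability that a randomly chosen patient with a complication received
    a higher prediction than a randomly chosen patient without one.
    Tied predictions count one half."""
    events = [p for p, y in zip(predictions, outcomes) if y == 1]
    non_events = [p for p, y in zip(predictions, outcomes) if y == 0]
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Rates reported in the Results: 40.4% predicted vs. 37.0% observed.
print(round(percent_deviance(40.4, 37.0), 1))

# Hypothetical toy cohort of four patients (1 = complication).
print(c_statistic([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
```

The pairwise loop scales with the product of event and non-event counts, so a rank-based implementation would be preferable for a cohort of this size; the loop is used here only because it mirrors the definition directly.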

Results

Patient Characteristics

Included were 65,683 patients in 2010–2015. The predominant procedure in 2012–2015 was laparoscopic LAR (41.6% of operations), followed by open LAR (25.8%), and abdominoperineal resection (APR, 7.8%) (Table 1). Preoperative patient characteristics and risk factors were first compared in the original, non-LAR proctectomy cohort (Table 2, left columns). This revealed several differences, which, while statistically significant, were small in absolute terms. Median age increased by 1 year to 57.0, median body mass index (BMI) increased by 0.3 kg/m2, and hematocrit increased by 0.4% (P < 0.01 for all). Larger and potentially more clinically meaningful differences between 2010–2011 and 2012–2015 time periods were observed in the proportions of patients treated laparoscopically (27.3 to 35.0%) and those with disseminated cancer (4.8 to 8.1%, P < 0.001 for both). Notably, NSQIP stopped recording a history of cerebrovascular accident or radiation therapy, and these declined sharply such that no patient was reported to have either after 2012. Proportions of female sex, diabetes, smoking, COPD, antihypertensive use, large weight loss, and preoperative leukocyte count did not significantly change between the two time periods among non-LAR proctectomy patients.

Table 2 Patient characteristics and preoperative risk factors using original (without LAR) inclusion criteria, and with LAR included. P values represent comparisons between time periods

Compared to the non-LAR proctectomy cohort, patients undergoing LAR in 2012–2015 were significantly older (median 61.0 vs. 57.0 years), more likely to be female (50.5 vs. 43.6%), less likely to be on preoperative steroids or immunosuppressants (3.9 vs. 14.3%), and more frequently had benign postoperative diagnoses (51.7 vs. 13.7%), and laparoscopic operations (61.7 vs. 35.0%, P < 0.001 for all, Table S1). Due to these differences and the large number of LAR patients influencing the composition of the overall patient population, both the original non-LAR proctectomy cohort and the group with LAR included are presented separately (Table 2, right columns).

Observed Morbidity

Patients undergoing non-LAR proctectomies had fewer complications in 2012–2015 compared to 2010–2011 (overall morbidity rate 37.0 vs. 40.4%, P < 0.001, Table 3). Contributing to lower morbidity were small decreases in most types of postoperative complications and significant decreases in rates of superficial wound infections (9.2 to 7.4%), urinary tract infections (5.7 to 4.4%), and prolonged ventilation (1.6 to 1.1%), while deep wound infections increased (2.6 to 3.2%, P < 0.05 for all). Rates of reoperation and readmission were similar between the time periods, while median length of stay decreased to 6.0 from 7.0 days (P < 0.001).

Table 3 Selected 30-day postoperative outcomes in cohorts with and without LAR. P values represent comparisons between time periods

Low anterior resection was associated with significantly fewer adverse 30-day surgical outcomes as compared to other proctectomy procedures (Table S2). The rate of having any complication after LAR was 22.4 versus 37.0% with other proctectomies in 2012–2015. The individual complication demonstrating the largest difference was postoperative bleeding (7.2 vs. 16.4%), and significantly lower rates of most individual complications were observed for LAR. Readmissions following LAR occurred in only 10.1% of patients, versus 17.2% after non-LAR proctectomy, and median length of stay was 5.0 days, a full day shorter than other proctectomies. The postoperative mortality rate was similar in LAR and other proctectomies at about 1%. Thus, after inclusion of LAR (Table 3, right columns), overall morbidity in 2012–2015 was only 27.2%. In this complete cohort, significant decreases were also observed between the time periods in postoperative bleeding, reintubation, sepsis, stroke, and death (P < 0.05 for all). Readmission and length of stay likewise decreased, and the increase in deep wound infections seen in non-LAR proctectomies was not observed. As noted during development of the Original Iowa Calculator, morbidity varied widely between different procedures,4 from 79.0% in pelvic exenteration to only 16.0% in laparoscopic LAR (Table S3).

Prospective Validation of the OIRC

Hypothesizing that valid risk estimation tools should accommodate differences in preoperative risk factors while still returning accurate predictions for postoperative outcomes, the OIRC was used to generate risk estimates for all 2012–2015 non-LAR proctectomy patients. Risk estimates were also obtained from the NSQIP-Model and the ACS-Calculator (Table 4). Model calibration was first examined using the percent deviance of predicted to observed morbidity. Of the risk models examined, the OIRC’s prediction of a 40.4% overall complication rate was closest to the observed rate of 37.0%, with a percent deviance of predicted-to-actual of + 9.2%. As it had for 2005–2011 data,4 the NSQIP-Model predicted a morbidity rate lower than was observed (20.0% predicted, − 45.9% deviance). The ACS-Calculator performed better, predicting a 25.2% complication rate for − 31.9% deviance. Each model provided reasonable discrimination of higher versus lower risk patients. The ACS-Calculator returned a C-statistic of 0.653 (95% confidence interval, CI 0.645–0.662), while the NSQIP-model reached 0.671 (CI 0.663–0.680). The OIRC provided the best performance of the three with a C-statistic of 0.676 (CI 0.667–0.684). This exceeded its C-statistic of 0.660 in the 2010–2011 training dataset and 0.637 in the 2005–2009 back-validation.4

Table 4 Model performance in the full 2012–2015 validation cohort (n = 51,344)

Brier Skill Scores (BSS) for each model were calculated to combine measures of discrimination and calibration. Using this metric, a score of 0 indicates predictive skill equal to assigning the average complication rate to all patients. Numbers greater than 0 indicate better performance, while scores less than zero correspond to predictive skill worse than guessing the average. By this measure, the OIRC performed well with BSS of + 0.089, as compared to − 0.047 for the NSQIP-Model and + 0.005 for the ACS-Calculator. When plotted as a density function, the OIRC predictions show a wide range of risk estimates roughly centered around the true complication rate, whereas most predictions of the other two tools cluster below the observed complication rate (Fig. 1). From this, we conclude that the OIRC is valid for non-LAR proctectomy risk estimation.
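The interpretation of the Brier Skill Score given above can be made concrete with a short sketch. It assumes the common definition BSS = 1 − BS / BS_ref, where BS is the mean squared difference between predictions and outcomes and BS_ref is the same quantity when every patient is assigned the cohort's average complication rate. The function names and toy data are hypothetical, and this is not the study's R code.

```python
def brier_score(predictions, outcomes):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for p, y in zip(predictions, outcomes)) / len(outcomes)


def brier_skill_score(predictions, outcomes):
    """Skill relative to assigning the average event rate to every patient."""
    base_rate = sum(outcomes) / len(outcomes)
    reference = brier_score([base_rate] * len(outcomes), outcomes)
    return 1.0 - brier_score(predictions, outcomes) / reference


# Hypothetical toy cohort with a 50% complication rate (1 = complication).
toy_outcomes = [0, 0, 1, 1]
print(brier_skill_score([0.5, 0.5, 0.5, 0.5], toy_outcomes))  # guessing the average
print(brier_skill_score([0.2, 0.2, 0.8, 0.8], toy_outcomes))  # informative predictions
print(brier_skill_score([0.0, 0.0, 0.0, 0.0], toy_outcomes))  # systematic underestimate
```

In this toy cohort, guessing the average scores zero, informative predictions score above zero, and a model that underestimates risk for everyone scores below zero, which is the pattern described for the three tools.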

Fig. 1

Density plot showing the distribution of 2012–2015 predicted morbidity probabilities (x-axis) in 16,722 non-LAR proctectomy patients. The Original Iowa Calculator (dashed and dotted blue line) includes a wider range of predictions and centers closer to the true 37.0% morbidity rate (vertical orange line), than the predictions of the ACS-Calculator (solid red line) or NSQIP morbidity model (dashed green line)

Inclusion of LAR

We next hypothesized that the same predictors of complications in the OIRC would accurately predict risk in LAR. To test this, LAR procedures from 2010 to 2011 (n = 9,051) were included with the original training dataset, and the model was regenerated to produce the Updated Iowa Rectal Surgery Risk Calculator (UIRC). The UIRC performed well in the training dataset: it predicted a mean complication rate equal to that observed (31.2%, 0% deviance), achieved a C-statistic of 0.696 (CI 0.687–0.705), and demonstrated a Brier Skill Score of + 0.107.

It remained possible that inclusion of a large number of LAR procedures might cause the lower morbidity profile of LAR to overpower other proctectomy procedures for which the calculator was initially developed. Examination of the Original and Updated Calculators’ Odds Ratios (OR) revealed that incorporating LAR resulted in only small changes in ORs for many predictors, but larger changes for some (Table S4). The OR for preoperative hematocrit remained 0.94, as it was in the OIRC, for example, but the OR for history of stroke decreased from 2.01 in the OIRC to 1.58 in the Updated Calculator. Odds ratios of the procedure term for LAR and laparoscopic LAR were 0.29 and 0.52 (compared to the reference of 1.00 for APR). Much of the different risk profile of LAR might therefore be accounted for by this procedure term, but to verify that changes in the UIRC’s odds ratios for other risk factors would not adversely affect its performance in non-LAR proctectomies, predictions from the Original and Updated Calculators in 2010–2011 non-LAR proctectomies were compared. This revealed similar performance by C-statistic (0.657 with UIRC vs. 0.659 by OIRC, P = 0.09) and percent deviance (0% for both), indicating that the UIRC performed well in non-LAR proctectomy patients.
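To illustrate what a change in an odds ratio means for a predicted probability, the sketch below applies the two reported stroke odds ratios (2.01 in the OIRC, 1.58 in the Updated Calculator) to a hypothetical patient. The 40% baseline risk is an assumption chosen for illustration, not a model coefficient, and the function name is hypothetical. The full risk equations are those published with the online calculators.

```python
def apply_odds_ratios(baseline_probability, odds_ratios):
    """Convert a baseline probability to odds, multiply by each odds ratio,
    and convert the result back to a probability."""
    odds = baseline_probability / (1.0 - baseline_probability)
    for ratio in odds_ratios:
        odds *= ratio
    return odds / (1.0 + odds)


# Assumed baseline risk for a hypothetical patient without prior stroke.
baseline = 0.40

risk_oirc = apply_odds_ratios(baseline, [2.01])   # stroke OR reported for the OIRC
risk_uirc = apply_odds_ratios(baseline, [1.58])   # stroke OR reported for the UIRC
print(round(risk_oirc, 3), round(risk_uirc, 3))
```

Under this assumed baseline, the lower odds ratio in the Updated Calculator moves the hypothetical patient's predicted risk from roughly 57% to roughly 51%, which shows why changes in individual odds ratios warranted the subgroup comparison described above.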

Prospective Validation of UIRC

As models are expected to perform well when tested against the same data used to develop them, we next performed a prospective validation of the Updated Iowa Risk Calculator using independent 2012–2015 proctectomy data including LARs (n = 51,344). In the complete prospective validation dataset, the UIRC was well-calibrated, predicting an average complication risk of 29.3%, a + 7.9% deviance from the actual complication rate of 27.2%. Discrimination of higher and lower risk patients was better in the validation than training dataset, with a C-statistic of 0.706 (CI 0.701–0.711). The Brier Skill Score was + 0.114.

In the validation cohort, the Original and Updated Iowa Calculators’ performance was examined in LAR and non-LAR subgroups. In the subgroup of 2012–2015 non-LAR proctectomies, the UIRC performed better than the OIRC in terms of discrimination (C-statistic 0.679 vs. 0.676, P < 0.001) and calibration (+ 8.5 vs. + 9.2% deviance). The distribution of predicted risks was similar between the two models and centered around the observed non-LAR proctectomy morbidity risk of 37.0% (Figure S1). When considering only LARs in 2012–2015, the Updated Calculator returned a C-statistic and percent deviance of 0.689 and + 7.5%. These results show that broadening the scope of the model to include LAR did not compromise performance for non-LAR proctectomies.

Comparison to Other Prediction Tools

In the complete 2012–2015 cohort, the NSQIP General/Vascular Surgery Morbidity Model underestimated risk, predicting an average morbidity of 16.1%, with deviance of − 40.8% (Table 4). The ACS-Calculator was closer, predicting 19.5% morbidity, which deviated from the true complication rate by − 28.1%. Both performed well in discriminating patients at greater and lesser risk with C-statistics of 0.703 and 0.693, respectively. Despite small absolute differences, each C-statistic was significantly different from the others by DeLong’s test (adjusted P < 0.05 for all between-group comparisons). The NSQIP-Model and ACS-Calculator both recorded Brier Skill Scores above zero. In each of these metrics, the UIRC’s performance exceeded the other risk prediction tools. Examination of the probability predictions’ density plots again showed most predictions from the NSQIP-Model and ACS-Calculator fell to the left of the true 27.2% morbidity rate, while the UIRC predictions distributed more evenly above and below this level (Fig. 2).

Fig. 2

Density plot showing the distribution of all 2012–2015 (n = 51,344 with LAR included) proctectomy morbidity predictions (x-axis) by the Updated Iowa Calculator (dashed and dotted blue line), ACS-Calculator (solid red line), and NSQIP morbidity model (dashed green line). The Updated Iowa Risk Calculator predictions more closely approximate the true 27.2% morbidity rate (vertical orange line), while the other models’ predictions skew to the left of the actual complication probability

To determine how calibration varied across procedures, predictions by the NSQIP-Model, ACS-Calculator, and UIRC were compared by procedure group. The UIRC returned average predictions within 20% of actual morbidity for 13 of the 18 included procedure types, representing 93% of total patients (Figure S2A-D). ACS-Calculator predictions fell within 20% of actual morbidity for 3 of 18 procedure categories, representing 44% of patients. The NSQIP-Model’s predictions were below the − 20% bound for all procedure groups.

Calibration of each set of predictions was examined graphically by ordering the predicted morbidity probabilities and comparing the true and predicted morbidity rates of 10 approximately equally sized groups (Fig. 3). By this analysis, the UIRC showed excellent calibration with mean predicted morbidity in all 10 subgroups falling within 20% of those groups’ true complication rate. For the ACS-Calculator, 2/10 groups were within this limit and 0/10 NSQIP-Model prediction groups were well-calibrated. The Original and Updated Iowa Risk Calculators, including their complete risk prediction equations are available online at http://apps.cromwell.lab.uiowa.edu/calculator/rectalcalc.html.

Fig. 3

Calibration plot of 2012–2015 (n = 51,344) actual versus mean predicted morbidity for the Updated Iowa Risk Calculator (blue circles), the ACS-Calculator (red triangles), and the NSQIP-Model (green diamonds). For each risk tool, predictions were ordered and then divided into 9 groups of 5000 morbidity predictions (with the 10th containing 6344). For each group, the actual complication rate (x-axis) is plotted against the mean predicted morbidity (y-axis). Bars show standard deviation of predicted morbidity for each group. Perfect prediction is indicated by the 45° black line, while the dashed black lines show ± 20%. All 10 UIRC groups fall within this 20% error bound. The ACS-Calculator placed 2/10 groups within the bound, while all 10 NSQIP-Model prediction groups fell outside of this limit
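The grouping behind this calibration plot can be sketched as follows. The sketch assumes that the ± 20% bound is relative to each group's observed complication rate; the function names and toy data are hypothetical, and this is not the study's R code.

```python
def calibration_groups(predictions, outcomes, group_size, n_groups):
    """Order patients by predicted risk and split them into n_groups groups.
    Returns (mean predicted risk, observed rate, group size) for each group."""
    paired = sorted(zip(predictions, outcomes), key=lambda pair: pair[0])
    groups = []
    for g in range(n_groups):
        start = g * group_size
        # The final group absorbs any remainder.
        end = len(paired) if g == n_groups - 1 else start + group_size
        chunk = paired[start:end]
        mean_predicted = sum(p for p, _ in chunk) / len(chunk)
        observed_rate = sum(y for _, y in chunk) / len(chunk)
        groups.append((mean_predicted, observed_rate, len(chunk)))
    return groups


def within_bound(mean_predicted, observed_rate, tolerance=0.20):
    """True if the mean prediction lies within +/- tolerance (relative) of the
    observed rate. Assumes the bound is relative to the observed rate."""
    return abs(mean_predicted - observed_rate) <= tolerance * observed_rate


# Hypothetical toy cohort of ten patients (1 = complication).
toy_predictions = [0.5, 0.1, 0.3, 0.5, 0.1, 0.3, 0.5, 0.1, 0.3, 0.5]
toy_outcomes = [1, 0, 0, 0, 0, 1, 1, 0, 0, 0]
example_groups = calibration_groups(toy_predictions, toy_outcomes, 3, 3)
for mean_predicted, observed_rate, size in example_groups:
    print(size, round(mean_predicted, 3), round(observed_rate, 3))
```

Called with a group size of 5000 and 10 groups on 51,344 predictions, the same function yields nine groups of 5000 and a tenth of 6344, matching the grouping described in the caption.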

Discussion

Despite the known tendency of clinical risk models to perform better when tested against the same data used to develop them, systematic reviews of published risk calculators conclude that for most, external validation is reported inappropriately or not at all.6,15,16 Serious limitations in the development and testing of clinical risk prediction models have led expert consensus panels to conclude that most reported risk calculators “should not be used,” and to advance standards for new prediction tools.6,17 While adhering to these guidelines, the present study found that the OIRC returned valid morbidity estimates for patients undergoing proctectomy procedures in 2012–2015. Using the same predictor variables for 2010–2011 patients, an Updated Calculator also performed well on prospective validation with independent 2012–2015 data, providing better risk discrimination than in the training dataset with good calibration. Risk estimates from both the OIRC and UIRC were more accurate for their respective populations than the estimates given by the NSQIP-Model and the ACS Online Risk Calculator. The Iowa Rectal Surgery Risk Calculators therefore provide clinically meaningful risk estimates to inform actual clinical practice.

The UIRC extends the scope of the OIRC to include LAR patients. Leaving out LAR from the OIRC despite it representing the most frequently performed proctectomy procedure was not specifically intended. Rather, OIRC-included procedures were those NSQIP placed in its “proctectomy basket,” whereas NSQIP grouped LAR with colectomies.4 The lower LAR complication rate (25.9% in 2010–2011), which more closely matches that of colectomy (24.4% reported in 2009–2012),7 as compared to non-LAR proctectomy (40.4% in 2010–11), seems to justify such a grouping and exclusion of LAR from a higher-risk proctectomy calculator. Further justification comes from the nature of operations constituting LAR. NSQIP does not record how much rectum is removed in an LAR. Rather, application of an LAR CPT code requires only that a coloproctostomy be performed. Therefore, procedures coded as LAR range from definite proctectomies such as total mesorectal excision with ultralow anastomosis, to procedures more closely resembling colectomies, such as sigmoidectomy for diverticulitis with high coloproctostomy. That 33% of LARs in the present 2012–2015 cohort had “diverticulitis” as the primary diagnosis suggests that many NSQIP LARs are the latter. An LAR case-mix that skews towards such operations, with less-extensive pelvic dissection and more favorable anastomotic geometry, could explain the overall lower LAR complication rate and more colectomy-like risk profile. Still, because many LARs involve significant rectal resection and because including LAR in the model did not degrade its performance in non-LAR proctectomy, we believe that inclusion of LAR in the UIRC represents an improvement over the Original Calculator.

Differences existed in preoperative characteristics and patient outcomes between the 2010–2011 and 2012–2015 time periods. Many of these were statistically significant due more to large cohort size than actual clinically meaningful differences. Larger preoperative demographic changes worth noting include higher rates of disseminated cancer (from 4.8 to 8.1% of patients in non-LAR proctectomies, a 69% relative increase). This could reflect either greater efforts to ensure charting of a strong predictor of higher complication rates for risk adjustment purposes, or greater willingness of surgeons to operate on patients with metastatic rectal cancer. In a review of 1600 hepatectomy patients with colorectal liver metastases, including 561 with rectal primaries, recent median disease-specific survival improved to 64 months from 43 months in the era prior to modern chemotherapy.18 Indeed, more effective chemotherapy and strong survival results for hepatic metastasectomy support that selected patients with metastatic rectal cancer should receive operations.

Another important change in preoperative factors was the decrease in recorded history of stroke or radiation due to NSQIP no longer recording these variables. A valid model accounts for variation in patient characteristics, but when predictors are missing entirely, factors distinguishing higher and lower risk individuals are lost. Stroke strongly predicted complications in both the OIRC and UIRC, and omission of this data point, which required treating it as “not present” in later patients, decreases the calculators’ predictive abilities. Few patients (1% in 2010–2011) had a history of stroke, mitigating this impact. Potentially more important was loss of information about radiation treatment, which correlated with complication risk in the OIRC,4 and which many rectal cancer patients receive (14.1% of all non-LAR proctectomy patients in 2010–2011). Despite these changes in data collection, the OIRC and UIRC still provided excellent discrimination and calibration in morbidity prediction.

Overall complications showed a substantial relative decrease of approximately 9% in both non-LAR and LAR-included cohorts between 2010–2011 and 2012–2015. Lower rates of superficial wound infections (SSI), urinary tract infection (UTI), prolonged ventilation, and sepsis led this trend. The influence of colorectal surgery subspecialty training and surgeon case volumes on these outcomes cannot be determined from these data as NSQIP does not record these factors (Supplement). Although NSQIP data cannot establish the exact causes of this change, large-scale efforts targeting exactly these complications have been implemented during the years included in this study, as payors increasingly refuse payment for hospital-acquired infections such as SSI, catheter-associated UTI, and pneumonia.19,20,21 Increased adherence to recommendations for preventing SSI, UTI, and DVT in colorectal operations has been shown to produce detectable reductions in these complications.22 Improved administration of preoperative antibiotics, resurgence in the use of bowel preparation, and early urinary catheter removal have each been associated with lower morbidity and may have reduced these complications during the study period.20,23,24,25 Relative contributions of different aspects of prevention to falling complication rates deserve further study, but irrespective of precise causes, that SSI decreased from 11.0% in 2005–2009 non-LAR proctectomy,4 to 9.2% in 2010–2011, to 7.4% in 2012–2015 suggests real progress in reducing proctectomy morbidity.

To assess the strength of different prediction models for estimating proctectomy risk, this study evaluated discrimination and calibration. Discrimination reflects each model’s ability to accurately rank individuals at higher and lower risk. Calibration describes agreement between the level of risk predicted and that observed.4,6 As commented upon previously,4 the C-statistic accurately evaluates models’ discrimination, but its dependence on the magnitude and dispersion of underlying risk in the evaluated cohort complicates interpretation.17,26,27 The UIRC illustrates this. In the complete 2012–2015 cohort, the UIRC recorded a higher C-statistic than in either the LAR or non-LAR subpopulations (0.706 vs. 0.689 and 0.679). Because LAR carries lower inherent morbidity, adding LAR to non-LAR proctectomies broadens the cohort’s risk distribution. Most LARs will then be correctly assigned lower risk than non-LARs, and the C-statistic improves compared to when considering the more closely clustered subpopulation predictions. The true event rate also affects the maximum achievable C-statistic, and calculators for rare events, such as mortality, therefore routinely show near-perfect C-statistics, whereas more common events pose greater difficulties for discrimination.4,7,26 In light of the observed 2012–2015 morbidity of 27.2%, discrimination by all evaluated models in the 2012–2015 complete cohort was strong. Of the calculators, the UIRC performed best with a C-statistic of 0.706. The ACS-Calculator C-statistic for 2012–2015 non-LAR proctectomy of 0.653 closely matched the 0.656 reported for the ACS-Calculator in 2010–2014 proctectomy data.28
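The dependence of the C-statistic on the spread of underlying risk can be shown with a small deterministic example. All data below are hypothetical and constructed only to illustrate the effect: two subgroups, each with two risk strata in which the prediction equals the observed event rate, are evaluated separately and then pooled. Function names are hypothetical.

```python
def c_statistic(pairs):
    """Pairwise C-statistic for (prediction, outcome) pairs; ties count one half."""
    events = [p for p, y in pairs if y == 1]
    non_events = [p for p, y in pairs if y == 0]
    score = sum(
        1.0 if e > n else 0.5 if e == n else 0.0
        for e in events
        for n in non_events
    )
    return score / (len(events) * len(non_events))


def stratum(prediction, n_patients, n_events):
    """Hypothetical stratum in which every patient shares one prediction."""
    return [(prediction, 1)] * n_events + [(prediction, 0)] * (n_patients - n_events)


# Lower-risk subgroup (an LAR-like risk profile, hypothetical numbers).
lower_risk = stratum(0.1, 10, 1) + stratum(0.3, 10, 3)
# Higher-risk subgroup (a non-LAR-like risk profile, hypothetical numbers).
higher_risk = stratum(0.5, 10, 5) + stratum(0.7, 10, 7)
pooled = lower_risk + higher_risk

c_lower = c_statistic(lower_risk)
c_higher = c_statistic(higher_risk)
c_pooled = c_statistic(pooled)
print(round(c_lower, 3), round(c_higher, 3), round(c_pooled, 3))
```

In this toy cohort the subgroup C-statistics are about 0.66 and 0.60, while the pooled value is about 0.76, even though no prediction changed. The gain comes only from the wider range of risk in the pooled cohort.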

With good discrimination across the board, there were greater differences in calibration, with the Iowa Calculators standing out against the other models. As in our previous study, we elected to forgo Hosmer-Lemeshow (HL) testing as this test is designed for smaller sample sizes, and is difficult to interpret in a cohort of > 50,000 records.4,17,29,30 We instead present prediction density diagrams (Figs. 1 and 2) and calibration plots showing the agreement of predicted to actual results in risk subgroups (Fig. 3). These graphics provide more information and support easy comparison of the different models.27 While the other risk tools showed good discrimination and linear increases in risk across calibration subgroups, deviance in predicted to observed morbidity for these groups was at best − 12.5% and more than − 25% for most.

Assessment of a risk calculator’s ability depends on its intended application, especially in light of trade-offs between discrimination and calibration.26 A calculator intended for risk adjustment across large patient groups need not be well-calibrated. If it correctly discriminates higher from lower risk patients, risk adjustment may be performed based on patients’ risk relative to others without worrying about the magnitude of predicted risk. However, if a risk calculator is to support informed consent and influence treatment decisions for individuals, such as whether to have surgery at all, whether to undergo a simpler or more complex operation, or whether to undergo neoadjuvant as opposed to adjuvant chemotherapy based on the likelihood of postoperative complications,31 it is essential that the calculator be well-calibrated. In this application, the UIRC provides the most accurate morbidity estimates of tested tools, which could help support patient and surgeon decisions.

Another advantage of the UIRC for proctectomy concerns accessibility. For both the OIRC and UIRC, the available online calculators completely specify the model using open-source code. The model can therefore be examined, tested, and applied to other data, and could be incorporated into electronic health records to provide automatic risk estimates for preoperative patients. Additionally, its conclusions can be tested by others in populations distinct from NSQIP to determine its external validity. The ACS-Calculator’s models remain proprietary and inaccessible for external validation or research.32,33

Applying the ACS-Calculator to large numbers of patients requires submission of each record through the ACS-Calculator website. As a result, external assessments of the ACS-Calculator’s performance have small sample sizes relative to the entire NSQIP population and often represent single-institution data (n = 95–1480 in 15 studies listed in Supplement). Many of these have questioned the ACS-Calculator’s morbidity predictions,27 and together suggest that individual risk calculators tailored to specific procedure types could outperform an omnibus calculator. These studies’ limited samples, however, make it difficult to determine whether the ACS-Calculator predictions were poorly-calibrated, or the included patients simply differed from the overall NSQIP risk profile.27,33

The present study overcame this limitation by evaluating NSQIP records representing the same inputs and endpoints as the ACS-Calculator. A web-script automated NSQIP record submission, allowing tens of thousands of ACS-Calculator predictions to be obtained. With this large and robust sample, our results likewise found poor ACS-Calculator calibration in a complex and high-risk class of procedures. A limitation of this conclusion is that since the ACS-Calculator estimates were obtained, the ACS-Calculator has been updated with specific intent to improve calibration.28 How the current version differs from the one tested here remains unknown. Due to the addition of a feature on the website preventing automated record submission, we are unable to update our analysis. These results support further efforts to validate ACS-Calculator predictions for additional endpoints and surgical subgroups.

Conclusion

The OIRC and UIRC returned accurate proctectomy morbidity predictions when prospectively validated against independent 2012–2015 data. Discrimination and calibration of these predictions exceeded that of the NSQIP-Model and ACS-Calculator, which had good discrimination but significantly underestimated actual morbidity rates. The Iowa Calculators and underlying models are available at http://apps.cromwell.lab.uiowa.edu/calculator/rectalcalc.html and can be used to inform decisions about patient care.