
Introduction

Clinical scoring systems are frequently used to predict outcomes in medicine. These systems reflect the fact that patient outcomes such as morbidity and mortality are seldom determined by a single variable and are generally multi-factorial. While physicians may have a working knowledge of positive and negative influences on outcome, identifying the most important variables and weighing them for an individual patient can be difficult. For the surgeon, challenges arise when deciding who should be offered an operation and when providing information about prognosis, especially when patients are perceived to have both favorable and poor prognostic features. Beyond risk stratification, clinical scoring systems facilitate communication between physicians and assist in selecting patients for participation in clinical research. As such, clinical scoring systems have proved very useful in the evaluation of a wide variety of patients.

Commonly used systems include the Apgar score [1], the Glasgow coma scale [2], the Injury Severity Score [3], and the Acute Physiology and Chronic Health Evaluation (APACHE) scoring system [4]. Within surgery, scoring systems have been developed for a number of high-risk operations, including cardiac surgery [5], liver transplantation [6], and cytoreductive surgery with heated intraperitoneal chemotherapy [7]. As surgeons continue to push the limits of their technical ability, the question “can an operation be done?” shifts to “should an operation be done?”. Further questions arise regarding which patients are the best candidates for trials of new methodologies and innovative therapies. Patients with colorectal cancer liver metastases readily lend themselves to this type of stratification, since a proportion of these patients will be cured with surgery, whereas others will likely derive little or no benefit. A number of manuscripts have been published over the last two decades to facilitate the evaluation and care of these patients.

During this time frame, however, the standard of care has evolved and now frequently includes neoadjuvant chemotherapy. This change in care has increased the number of surgical candidates and the overall survival of these patients, necessitating revalidation of previous scoring systems. Here we review 17 scoring systems for patients with colorectal liver metastases, summarized in Tables 8.1, 8.2, 8.3, and 8.4. The discussion is divided into early scoring systems, developed prior to the widespread use of neoadjuvant chemotherapy, and contemporary systems that include patients treated after the year 2000, when modern neoadjuvant chemotherapy began to be given more frequently. We then discuss comparison studies and conclude with how best to apply scoring systems in present-day practice.

Table 8.1 Colorectal liver metastases scoring systems by institution and patient characteristics
Table 8.2 Colorectal liver metastasis scoring systems by publication and reported metrics
Table 8.3 Colorectal liver metastases scoring systems by common prognostic factors
Table 8.4 Additional prognostic factors considered in scoring systems for colorectal liver metastases prognosis

Early Scoring Systems

Nordlinger et al. derived one of the earliest scoring systems in 1996, using a French multi-institutional database of 1568 patients treated from 1968 to 1990; 95% of the patients included were treated after 1980 [8]. The authors observed a 2.3% perioperative mortality rate and a 28% 5-year overall survival after liver resection. Using regression analysis, they identified features significantly associated with disease-free and overall survival and selected seven clinical criteria, each worth one point because each factor carried a relative risk between 1.0 and 2.0 (Fig. 8.1a). The criteria were based on patient age, the stage of the primary tumor, the disease-free interval, the size and number of the liver metastases, and the margin status.

Fig. 8.1
figure 1

Nordlinger et al. 1996 Clinical Scoring System. Scoring system variables (a) and overall survival curve demonstrating stratification between groups (b) in Nordlinger et al. 1996. (Reproduced with permission from Nordlinger et al. [8])

Patients were divided into low-risk (scores 0–2), intermediate-risk (scores 3–4), and high-risk (scores 5–7) groups. Two-year (rather than 5-year) survival rates were reported and were significantly higher in the low-risk group (79%) than in the intermediate-risk (60%) and high-risk (43%) groups (Fig. 8.1b). Of note, CEA was not available for all patients in this study; when it was included in the model, age and size of the largest tumor dropped out, and an alternative scoring system was proposed in which CEA >5 μg/L or >30 μg/L was worth one or two additional points, respectively, again making a total of seven possible points.
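To make the arithmetic of the Nordlinger grouping concrete, the minimal sketch below (in Python) simply sums the seven one-point criteria and maps the total to the reported risk tiers; the individual criterion cutoffs are not reproduced here and should be taken from Fig. 8.1a.

```python
# Minimal sketch of the Nordlinger et al. (1996) risk grouping.
# The seven one-point criteria are listed in Fig. 8.1a; they are passed in
# as booleans rather than re-derived, since the exact cutoffs are not
# reproduced in this text.

def nordlinger_risk_group(criteria_met: list[bool]) -> str:
    """Sum the one-point criteria (0-7) and map to the reported risk tiers."""
    if len(criteria_met) != 7:
        raise ValueError("Nordlinger score expects exactly seven criteria")
    score = sum(criteria_met)
    if score <= 2:
        return "low risk (reported 2-year survival 79%)"
    if score <= 4:
        return "intermediate risk (reported 2-year survival 60%)"
    return "high risk (reported 2-year survival 43%)"

# Example: a patient meeting three of the seven criteria
print(nordlinger_risk_group([True, True, True, False, False, False, False]))
# -> intermediate risk (reported 2-year survival 60%)
```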

Fong et al. created a frequently cited scoring system in 1999, utilizing data from 1001 patients treated from 1985 to 1999 at Memorial Sloan-Kettering Cancer Center [9]. In this study, the perioperative mortality rate was 2.8%, and 5-year survival after liver resection was 37%. The authors performed regression analyses to identify features that correlated with disease-free and overall survival, resulting in the selection of five clinical criteria, each worth one point (Fig. 8.2a): positive colorectal lymph nodes, disease-free interval <12 months, number of liver tumors, size of the largest liver tumor, and preoperative CEA level. When this scoring system was applied, the correlation between score and 5-year survival was r² = 0.92; patients with 0 points had a 5-year survival of 60%, compared with 14% for patients with 5 points (Fig. 8.2b). The score was superior to prediction based on the number of tumors alone and was considered by the authors to be more practical than the Nordlinger score, since all criteria are generally known prior to surgery (excluding patients with planned synchronous resections).

Fig. 8.2
figure 2

Fong et al. 1999 Clinical Scoring System. Scoring system variables (a) and overall survival curve demonstrating stratification between groups (b) in Fong et al. 1999. (Reproduced with permission from Fong et al. [9])
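As an illustration of how the Fong clinical risk score is tallied, the hedged sketch below assigns one point per criterion; the cutoffs used for tumor number, size, and CEA are the commonly cited values and are assumptions that should be confirmed against Fig. 8.2a.

```python
# Hedged sketch of the Fong et al. (1999) clinical risk score (0-5 points).
# Cutoffs for tumor number (>1), largest tumor size (>5 cm), and CEA
# (>200 ng/mL) are commonly cited values and are assumptions here;
# confirm against Fig. 8.2a before use.

def fong_score(node_positive_primary: bool,
               disease_free_interval_months: float,
               num_liver_tumors: int,
               largest_tumor_cm: float,
               cea_ng_ml: float) -> int:
    points = 0
    points += node_positive_primary              # node-positive primary tumor
    points += disease_free_interval_months < 12  # disease-free interval < 12 months
    points += num_liver_tumors > 1               # multiple liver tumors (assumed cutoff)
    points += largest_tumor_cm > 5               # largest tumor > 5 cm (assumed cutoff)
    points += cea_ng_ml > 200                    # CEA > 200 ng/mL (assumed cutoff)
    return int(points)

# Example: node-positive primary, DFI 8 months, 2 tumors, largest 3 cm, CEA 50 ng/mL
print(fong_score(True, 8, 2, 3.0, 50))  # -> 3
```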

Of note, Lee et al. (2008) created a scoring system based on 138 patients who had synchronous liver and colorectal resections from 1994 to 2005 [10]. The risk factors in their scoring system, however, were similar to those identified by Fong et al. (1999) [9], including nodal status, number of liver tumors, preoperative CEA, and margin status, and thus could not easily be applied preoperatively. Iwatsuki et al. (1999), Ueno et al. (2000), Lise et al. (2001), and Zakaria et al. (2007) also published scoring systems during this early time frame [11,12,13,14]. Additional risk factors for mortality in these scoring systems included >30% tumor invasion of the liver (relative risk 9.21) [13], bilobar tumors (relative risk 1.25) [11], serum alanine aminotransferase >55 U/L (relative risk 3.56) [13], hepatoduodenal lymph node positivity (hazard ratio 2.8) [14], and need for perioperative blood transfusion (hazard ratio 1.5) [14].

The treatment and prognosis of colorectal cancer liver metastases were significantly altered by the approval of multiple chemotherapeutic regimens in the early 2000s. Irinotecan was approved in April 2000 as first-line therapy for patients with metastatic colorectal cancer in conjunction with fluorouracil and leucovorin [15]. Oxaliplatin was then approved in August 2002, also in combination with fluorouracil and leucovorin, for patients with metastatic colorectal cancer whose disease had recurred or progressed during or within 6 months of completing first-line therapy [16]. These combination regimens doubled the median overall survival for metastatic colorectal cancer from 10 to 20 months [17].

A number of clinical scoring systems straddled the periods of infrequent and frequent use of neoadjuvant chemotherapy (see Table 8.1, years included in each study) and thus may be more difficult to interpret. Schindl et al. (2005) reported that neither neoadjuvant nor adjuvant therapy was used routinely in their cohort [18]. Malik et al. (2007) reported that neoadjuvant therapy was used in 10.3% of their cohort and that it did not alter the prognosis of high-scoring patients [19]. Interestingly, the authors of this study hypothesized that preoperative inflammatory markers were clinically important and criticized previous scoring systems for requiring pathologic information not available at the time of clinical decision-making. On multivariate analysis, they showed that disease-free and overall survival were significantly associated with a serum C-reactive protein >10 mg/L or a serum neutrophil-to-lymphocyte ratio >5:1. They also showed no difference in survival between patients in the highest scoring group who underwent liver resection and patients deemed inoperable because of advanced disease discovered at laparotomy. Konopke et al. (2008) reported that 21% of patients in their cohort received neoadjuvant therapy but that these patients were not well stratified by the proposed scoring system [20]; high-scoring patients who received neoadjuvant therapy, however, lived longer than those who did not. Rees et al. reported that 21% of patients received chemotherapy as a “downsizing strategy” but that its use was not associated with outcome [21].
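For readers who wish to apply the Malik et al. (2007) inflammatory criteria described above, the brief sketch below computes the neutrophil-to-lymphocyte ratio and flags either criterion; the cell-count units in the example are assumed for illustration.

```python
# Sketch of the preoperative inflammatory criteria reported by Malik et al.
# (2007): serum C-reactive protein >10 mg/L or a neutrophil-to-lymphocyte
# ratio >5. Count units (cells x10^9/L) are an assumption for the example.

def inflammatory_risk(crp_mg_per_l: float,
                      neutrophils: float,
                      lymphocytes: float) -> bool:
    """Return True if either inflammatory criterion associated with worse
    disease-free and overall survival is met."""
    nlr = neutrophils / lymphocytes
    return crp_mg_per_l > 10 or nlr > 5

print(inflammatory_risk(crp_mg_per_l=6.0, neutrophils=7.2, lymphocytes=1.2))
# -> True (NLR = 6)
```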

House et al. (2010) examined the same group of patients used for the original Fong et al. (1999) scoring system and updated the cohort through 2004. Patients were compared before and after 1999, when the use of neoadjuvant therapy shifted at this institution [22]. Patients in the later group were more likely to be treated with neoadjuvant chemotherapy, adjuvant chemotherapy, and adjuvant hepatic artery infusion pump therapy than the earlier group. Five-year overall survival was improved in patients treated in the later years (1999–2004) compared with those treated earlier (43% vs. 35%). The difference in survival was accounted for by low-risk patients, whereas higher-risk patients (clinical risk score >2 points) had similar survival between time periods. The authors concluded that improved patient selection combined with better chemotherapy was the likely reason for the difference in survival. This study did not, however, address how many patients were converted from unresectable to resectable status by neoadjuvant treatment.

Two scoring systems from this early era, both from the same institution (Hôpital Universitaire Paul Brousse, Villejuif, France), focused exclusively on patients initially deemed unresectable. Adam et al. (2004) defined unresectability as a <30% expected liver remnant or concomitant extra-hepatic disease [23]. Of 1104 patients identified from 1988 to 1999, 138 (13%) went on to surgery, and 58% had received FOLFOX as first-line therapy. Points were assigned for rectal primary disease, CA 19-9 >100 IU/L, and maximum liver tumor diameter >10 cm. Patients with three and four points had 0–6% 3-year survival. Imai et al. (2016) updated this cohort to include 439 patients treated from 1990 to 2012 [24]. Unresectability again included a <30% expected liver remnant, but the threshold was raised to <40% in patients who had received more than eight cycles of chemotherapy, because the effect of neoadjuvant chemotherapy on liver parenchyma was better understood by that time [25]. The chemotherapy regimens used in Imai et al. (2016), however, were not reported. Scoring in this manuscript included node-positive primary disease, number of liver tumors, CA 19-9, concomitant extra-hepatic disease, and radiologic response to first-line chemotherapy based on Response Evaluation Criteria in Solid Tumors (RECIST) criteria [26].

Contemporary Scoring Systems

In 2008, Nordlinger et al. published a randomized controlled trial of perioperative FOLFOX for patients with potentially resectable colorectal liver metastases and showed that progression-free survival was increased by 9.2% in patients who ultimately underwent resection [27]. The 2013 follow-up to this study showed a non-significant trend toward increased disease-specific survival in the treatment arm but no difference in overall survival [28]. Randomized controlled trials have also demonstrated an increased number of patients with colorectal liver metastases who become surgical candidates after treatment with FOLFOXIRI regimens [29, 30]. As such, chemotherapy is now often given prior to surgical resection of colorectal liver metastases. RAS mutational status and epidermal growth factor receptor inhibitors are further relevant factors in the twenty-first century: in a number of randomized studies, initially inoperable patients with KRAS wild-type colorectal cancers treated with cetuximab were more frequently converted to surgical candidates [31,32,33]. These advances in treatment have prognostic implications and raise questions about the validity of previous colorectal liver metastasis scoring systems.

Three scoring systems have been published based on patient populations treated after the year 2000 [34,35,36]. Brudvik et al. (2017) examined 564 patients who had surgical treatment of colorectal liver metastases from 2005 to 2013, of whom 87% were treated with neoadjuvant chemotherapy [34]. They tested two different scoring systems: a traditional system that included nodal status, disease-free interval, number of liver tumors, size of the largest liver tumor, and CEA level, and a modified system that included nodal status, size of the largest liver tumor, and RAS mutational status of the liver tumor. The modified scoring system discriminated recurrence-free and overall survival better than the traditional system in this cohort and in four international validation cohorts. The number of viable tumor cells in the pathologic specimen was evaluated but did not improve the discrimination of the score; preoperative radiologic response to chemotherapy was not addressed. The authors specifically examined the KRAS and NRAS status of the liver metastases postoperatively but noted that concordance of RAS mutational status between the primary tumor and liver metastases is generally high. Other limitations of this scoring system are that recurrence-free and overall survival rates were not reported by year and score, and that the median follow-up time for the cohort was not reported.

Wang et al. (2017) published a scoring system based on 300 patients, all of whom had been treated with neoadjuvant chemotherapy from 2006 to 2016 [35]. The final value, called a tumor biology score, was calculated from positive KRAS mutational status, Fong score >2, and poor radiologic response to neoadjuvant chemotherapy (progressive or stable disease according to RECIST criteria) [26], with each risk factor counting as one point. The tumor biology score correlated with survival: patients with 0 points had 64% 5-year survival, whereas patients with 3 points had 14% 5-year survival.
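A minimal sketch of how the Wang et al. (2017) tumor biology score is assembled from the three stated risk factors is shown below; the RECIST categories counted as a poor response (progressive or stable disease) follow the description above.

```python
# Sketch of the Wang et al. (2017) tumor biology score (0-3 points):
# one point each for KRAS mutation, Fong score >2, and poor radiologic
# response to neoadjuvant chemotherapy (RECIST progressive or stable disease).

def tumor_biology_score(kras_mutant: bool,
                        fong_score: int,
                        recist_response: str) -> int:
    poor_response = recist_response.upper() in {"PD", "SD"}  # progressive/stable disease
    return int(kras_mutant) + int(fong_score > 2) + int(poor_response)

# Reported 5-year overall survival by score (only 0 and 3 points given in the text)
REPORTED_5YR_SURVIVAL = {0: "64%", 3: "14%"}

# Example: KRAS-mutant tumor, Fong score 3, partial response on imaging
print(tumor_biology_score(kras_mutant=True, fong_score=3, recist_response="PR"))  # -> 2
```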

Sasaki et al. (2018) published a scoring system that included 604 patients treated from 2000 to 2015, 65% of whom were treated with neoadjuvant chemotherapy [36]. Points were calculated from tumor size and number of tumors in a Cartesian schema to create a tumor burden score: (tumor burden score)² = (maximum tumor diameter)² + (number of liver lesions)². Patients were then stratified into the lowest 25% (score <3.0), the 25th to 90th percentile (score 3.0–8.9), and the top 10% (score ≥9.0). Tumor burden score correlated with survival: patients with scores <3.0 had 69% 5-year survival, and patients with scores ≥9.0 had 26% 5-year survival. The score was validated in two other international cohorts and maintained discrimination both in patients treated with neoadjuvant chemotherapy and in those who were not.
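Because the tumor burden score is simply the Euclidean distance from the origin on a plot of maximum tumor diameter versus number of lesions, it can be computed directly, as in the sketch below; the zone cutoffs follow the stratification reported above.

```python
# Sketch of the Sasaki et al. (2018) tumor burden score:
# TBS = sqrt((maximum tumor diameter in cm)^2 + (number of liver lesions)^2).

import math

def tumor_burden_score(max_diameter_cm: float, num_lesions: int) -> float:
    return math.hypot(max_diameter_cm, num_lesions)

def tbs_zone(tbs: float) -> str:
    if tbs < 3.0:
        return "low (reported 5-year survival 69%)"
    if tbs < 9.0:
        return "intermediate"
    return "high (reported 5-year survival 26%)"

# Example: largest lesion 4 cm with 3 liver lesions
tbs = tumor_burden_score(max_diameter_cm=4.0, num_lesions=3)
print(round(tbs, 1), tbs_zone(tbs))  # -> 5.0 intermediate
```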

Validation and Comparison of Existing Scoring Systems

A number of the above studies compared their own scoring system with others, including Malik et al. (2007), Wang et al. (2017), and Sasaki et al. (2018) [19, 35, 36]. In each case, the authors found that their own scoring system was superior in their own cohort. Wang et al. (2017), examining patients treated with neoadjuvant chemotherapy, compared their score with nine others for prognostic power, including the Fong et al., Iwatsuki et al., Konopke et al., Nagashima et al., Nordlinger et al., Sasaki et al., Rees et al., and Brudvik et al. scores, and found that only the Fong et al., Brudvik et al., and Wang et al. scores had significant predictive power in their patient population [35].

Zakaria et al. (2007) compared the predictive capacity of the Fong et al., Nordlinger et al., and Iwatsuki et al. scoring systems and found that the Fong et al. score was the best for predicting survival and recurrence in their cohort [14]. Concordance probability estimates were calculated for the three previous scoring systems and for their own, and all models were found to be “only marginally better than chance alone” in predicting disease-specific survival and recurrence; the authors therefore concluded that the broad application of risk scoring systems had limited clinical value. They proposed instead that risk scores be used to provide prognostic information postoperatively and to determine the value of adjuvant therapies.

Ayez et al. (2011) examined 352 patients treated from 2000 to 2008 [37], who were divided according to whether they received neoadjuvant chemotherapy and then compared using four different scoring systems: Fong et al., Konopke et al., Nagashima et al., and Nordlinger et al. Patients who received neoadjuvant treatment had scores calculated before and after chemotherapy. Significant differences in disease-free and overall survival between scoring groups were detected in patients who did not receive neoadjuvant therapy. These differences were generally lost when the neoadjuvant chemotherapy group was scored before chemotherapy but were generally recovered when scoring was performed after neoadjuvant chemotherapy. Patients whose scores dropped after chemotherapy had the same outcome as patients who started with a lower score and did not receive chemotherapy. Notably, patients whose Nagashima scores dropped after chemotherapy had a better outcome than those with the same score who did not receive chemotherapy.

A meta-analysis by Kanas et al. in 2012 critically evaluated 54 studies that examined survival after liver resection in patients with colorectal liver metastases, with at least 24 months of median follow-up [38]. The lowest meta-relative risk for survival was associated with liver tumors >3 cm (overall meta-relative risk 1.52, 1.28–1.80), and the highest with a positive resection margin (overall meta-relative risk 2.02, 1.65–2.48); significant negative associations were also found for high CEA level, high tumor grade, and multiple liver tumors.

Should I Use a Scoring System? If So, Which One? When Should I Use a Scoring System?

The above studies highlight a number of important factors related to patients with colorectal liver metastases. Features including extra-hepatic metastases, positive lymph nodes, multiple liver tumors, and larger liver tumors generally carry negative prognostic value. They were the most frequently included features in the 17 scoring systems reviewed (Table 8.3) and thus can likely be used to estimate prognosis for patients at any institution. These studies also demonstrate that there is a group of surgically treated patients with outcomes similar to those of patients who do not have surgery, beyond the one in 100 patients who do not survive the perioperative period. Patients with colorectal liver metastases who are not resected have a 0% 5-year survival rate and a median survival of 13.3–21.4 months [18, 39, 40].

Patients who can be converted to curative resection experience, on average, better survival; however, patients in the highest tiers of these scoring systems have median survivals ranging from 14 to 22 months. Ueno et al. (2000) and Lise et al. (2001) specifically addressed this group. Ueno et al. (2000) reported that patients with a recurrence within 6 months had a median survival of 13 months and that scoring highly was associated with this outcome [12]. Lise et al. (2001) reported that 92% of patients who had an early recurrence were in their highest scoring group [13]. Therefore, scoring systems that identify a group with 0% 5-year survival and <24 months median survival are of the greatest value in determining surgical candidacy. As chemotherapy and interventional radiology treatments continue to improve, this group of patients may achieve the same survival as marginal candidates who undergo surgery, but without the associated postoperative morbidity. These patients are also the best candidates for enrollment in clinical trials. Colorectal liver metastasis scoring systems also identify a group of patients who experience significant longevity and can potentially be cured: patients who score the lowest have 48–85% 5-year survival, depending on the system used. One might consider forgoing the modest improvement in recurrence-free survival afforded by neoadjuvant chemotherapy for patients without risk factors who are clearly resectable upfront.

The above rationale suggests that scoring systems should be used to assist in decision-making for patients in the uppermost and lowest tiers of expected survival. While imperfect, they serve as an additional data point to drive discussion in a multi-disciplinary setting. Which scoring system to use depends on several additional factors. Scoring systems tend to perform best at the institution where they were created. If no such scoring system exists at your own institution, it is worth considering the country in which a scoring system was created and whether it was validated using an external dataset that may resemble your patient population. Scoring systems have been created based on patients in the United States, the United Kingdom, France, Italy, South Korea, Japan, and China, and have been validated in Norway. The number of patients used to create a scoring system and the average follow-up should also be considered, since validity presumably increases with greater patient numbers and longer follow-up. The study cohorts reviewed here ranged from 83 to 1568 patients, and median follow-up ranged from 16.4 to 52 months. These metrics are presented for comparison in Table 8.2.

One should exercise caution when using a scoring system without all of the data points needed to derive a score (Tables 8.3 and 8.4, Fig. 8.3). Scoring systems that require information only available postoperatively are best used for prognostication and are not well suited to preoperative decision-making (Fig. 8.3). Preoperative decisions may include whether to offer surgery, how extensive imaging should be, or whether a patient should undergo an additional procedure such as staging laparoscopy. If a synchronous colorectal and hepatic resection is being considered, a scoring system that requires nodal status may not be accurate when calculated preoperatively. Many institutions consider extra-hepatic disease or an anticipated positive margin to be contraindications to surgery; scoring systems predicated on these factors, such as those of Imai et al., Rees et al., Nagashima et al., Lee et al., and Nordlinger et al., may be less useful if such patients are already clinically excluded from surgery in your practice. Lastly, for patients who receive neoadjuvant chemotherapy, scoring systems appear to have the best prognostic value when applied after this treatment.

Fig. 8.3
figure 3

Colorectal Liver Metastasis Scoring Systems. Schematic of colorectal liver metastasis scoring systems based on time frame of treatment and treatment with neoadjuvant chemotherapy. (Note: ∗Requires biopsy for mutational status)

Many scoring systems created prior to the widespread use of neoadjuvant chemotherapy are less effective at predicting outcomes before treatment has been initiated. Scoring systems built on patients treated in the neoadjuvant setting generally use data obtained after chemotherapy. Scoring systems validated for use in patients treated with neoadjuvant chemotherapy are presented in Fig. 8.3. Reviewing the figures and tables in this chapter can assist in identifying the best scoring system for your patient, which should improve predictive accuracy. The preoperative scoring system most used worldwide remains the Fong score because it incorporates criteria available for any patient; thus, a “Fong plus” approach, meaning the Fong score plus response to chemotherapy, is practical, and the Fong score remains the benchmark against which most newly proposed scoring systems are compared.

Conclusion

A number of clinical scoring systems exist for colorectal liver metastases that may assist in clinical decision-making and provide prognostic information. Limitations of these scoring systems include that they were all devised retrospectively and without a power analysis, that many include patients from a wide range of time periods during which the standard of care was evolving, and that they frequently require information available only postoperatively. When appropriately applied, however, clinical scoring systems are useful as an additional data point to help predict which patients will have the best and poorest long-term outcomes. This information can then be used to guide clinical decision-making and facilitate patient discussions. In the future, as medical and interventional radiologic treatments for this population advance, older scoring systems will need to be revalidated, and new scoring systems should be evaluated with these additional variables taken into account.