Laparoscopic liver resection (LLR) is being increasingly utilized in the USA and worldwide. LLR is employed for both benign and malignant liver lesions, and brings multiple advantages, including decreased length of stay, blood loss, and wound infection rate without reducing quality of resection and oncologic outcome [1,2,3,4,5]. While advantageous in many aspects, LLR is technically challenging and surgeon experience plays a critical role in achieving a successful and safe resection.

The first and second International Consensus Conferences on LLR noted factors associated with difficulty of resection and the importance of surgeon expertise [6, 7]. LLR difficulty scores have been developed to predict the feasibility of achieving resection laparoscopically and guide surgical planning, and the second (Morioka) consensus conference strongly advocated their use [7]. Multiple scores that account for various patient and tumor factors have been proposed [8,9,10,11]. The IWATE difficulty scoring system is a four-level scoring system developed from a Japanese cohort based on six factors, including extent of resection, tumor location and size, proximity to major vessels, patient liver function, and plan for use of hand-assisted laparoscopy [12]. It has been shown to correlate with rate of conversion, operative time, blood loss, and postoperative complications [12]. Given the estimated learning curve of 45–60 major hepatectomies [13], the IWATE score may be used to guide an LLR program as its surgeons gradually and safely increase their operative capability.

The IWATE difficulty score has been validated on European and Japanese patient cohorts [12], but has not been examined in a North American patient population. Given the widespread geographic variability in the incidence of liver cancer [14], the indications for resection also likely differ between regions. Moreover, it is well known that body habitus differs between North America and other regions, and obesity is thought to add complexity to LLR [15]. In this study, we aim to validate the IWATE classification system on a North American cohort. Secondarily, we aim to describe the evolution of our LLR program over time and examine the ability of the classification system to predict rate of conversion to open resection.

Methods

Study population

This was a retrospective cohort study performed at a single large North American academic center. Patients who underwent LLR for a hepatic tumor between January 2006, at the founding of the laparoscopic liver surgery program, and December 2019 were selected from a prospectively maintained database. These included patients with hepatocellular carcinoma (HCC), intrahepatic cholangiocarcinoma (IHC), metastatic liver tumors, and benign liver lesions. Patients who underwent cyst fenestration, living donor liver resection, solely thermal ablation, vascular or biliary reconstruction, or repeat LLR (except wedge) were excluded. Patients were followed for 90 days for complications. Institutional review board (IRB) approval was obtained.

Study variables and definitions

Patient characteristics included for analysis were patient demographics, preoperative imaging, operative details, and post-operative course. Resection margin was determined by histologic analysis. Couinaud’s segments II, III, IVb, V, and VI were defined as anterolateral, and segments I, IVa, VII, and VIII were defined as posterosuperior. Brisbane 2000 terminology was used to describe LLR procedures [11, 16]. Data were extracted from the electronic medical record via manual chart review.

IWATE criteria

LLR difficulty was assessed using the IWATE Criteria, which comprise six variables with a total score ranging from 0 to 12 points (Supplemental Table 1). These variables were: tumor location (1–5 points), extent of hepatic resection (0–4 points), tumor size ≥ 3 or < 3 cm (0–1 points), proximity to a major blood vessel (0–1 points), Child–Pugh liver function (0–1 points), and HALS/hybrid resection (−1 to 0 points). The 0–12 difficulty index was further subdivided into four difficulty levels: low (0–3), intermediate (4–6), advanced (7–9), and expert (10–12).
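As an illustration of how the six components combine, the following minimal Python sketch sums the component points and maps the total onto the four difficulty levels. It is not the authors' implementation. The class and field names are hypothetical, and the per-category point assignments for tumor location and extent of resection (Supplemental Table 1) are not reproduced here, so those two components are taken as already-scored inputs. The sketch also assumes that the single size, vessel, and liver function points are awarded for tumors ≥ 3 cm, proximity to a major vessel, and impaired Child–Pugh function, respectively.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IwateComponents:
    """Component points for one planned resection (hypothetical field names)."""
    location_points: int            # 1-5, from the tumor location table
    extent_points: int              # 0-4, from the extent-of-resection table
    size_ge_3cm: bool               # assumed: +1 if tumor size >= 3 cm
    near_major_vessel: bool         # assumed: +1 if close to a major vessel
    impaired_liver_function: bool   # assumed: +1 for worse Child-Pugh class
    hals_or_hybrid: bool            # -1 if HALS/hybrid approach is planned


def iwate_score(c: IwateComponents) -> int:
    """Sum the six components into the 0-12 difficulty index."""
    if not 1 <= c.location_points <= 5:
        raise ValueError("location_points must be between 1 and 5")
    if not 0 <= c.extent_points <= 4:
        raise ValueError("extent_points must be between 0 and 4")
    return (
        c.location_points
        + c.extent_points
        + int(c.size_ge_3cm)
        + int(c.near_major_vessel)
        + int(c.impaired_liver_function)
        - int(c.hals_or_hybrid)
    )


def iwate_level(score: int) -> str:
    """Map the 0-12 index to low, intermediate, advanced, or expert."""
    if not 0 <= score <= 12:
        raise ValueError("score must be between 0 and 12")
    if score <= 3:
        return "low"
    if score <= 6:
        return "intermediate"
    if score <= 9:
        return "advanced"
    return "expert"
```

With these ranges, the minimum total is 0 (location 1, no other points, HALS/hybrid planned) and the maximum is 12, matching the index range described above.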

Table 1 Basic characteristics of study cohort

Outcomes

First, to test the validity of the IWATE criteria in a North American cohort, conversion to laparotomy, estimated blood loss (EBL), operative time, and postoperative complications were analyzed according to 4-level IWATE difficulty. EBL was estimated by the anesthesiologist immediately following the surgery. Operative time was defined as the time from skin incision to skin closure. Complications were graded using the Clavien-Dindo Classification and defined as Grade II or higher [17]. Secondary outcomes included length of stay (LOS) and intraoperative blood transfusion. Transfusion was defined as administration of any blood product, including packed red blood cells, fresh frozen plasma, cryoprecipitate, or platelets. To examine how the utility of the IWATE Criteria changed over time while minimizing bias, we stratified the cohort into four arbitrary chronological eras, with each of the first three eras consisting of 100 patients and the fourth era of 126. Outcomes were then compared between eras. A receiver operating characteristic (ROC) analysis for conversion to open resection was performed.
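The chronological stratification can be sketched as follows: patients are ordered by date of surgery, the first three blocks of 100 form eras 1 to 3, and all remaining patients form era 4. This is a minimal sketch rather than the study's actual procedure. The function name `assign_eras` and its parameters are hypothetical, and how ties on the same operative date were handled is not stated in the text (here they keep their input order).

```python
from datetime import date
from typing import List, Sequence


def assign_eras(
    surgery_dates: Sequence[date], era_size: int = 100, n_eras: int = 4
) -> List[int]:
    """Return the era (1..n_eras) of each patient, in input order.

    Patients are ranked by surgery date. Each of the first n_eras - 1 eras
    receives era_size patients, and the last era receives the remainder.
    """
    # Indices of patients sorted from earliest to latest operation.
    order = sorted(range(len(surgery_dates)), key=lambda i: surgery_dates[i])
    eras = [0] * len(surgery_dates)
    for rank, patient_index in enumerate(order):
        # Integer division gives the block number; cap it at the final era.
        eras[patient_index] = min(rank // era_size + 1, n_eras)
    return eras
```

Applied to a cohort of 426 patients with the default settings, this yields eras of 100, 100, 100, and 126 patients.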

Statistical analysis

Summary statistics were reported as frequencies with percentages or as medians with interquartile ranges (IQR). Differences between categorical values were estimated using the Chi-squared test, while differences between continuous values were assessed with the Mann–Whitney U test or the Kruskal–Wallis test, as appropriate. Postestimation concordance for conversion to laparotomy was assessed using ROC plots and the area under the curve (AUC). All testing was two-sided and used a 5% level of significance. Statistical analysis was performed using Stata 16.
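For readers less familiar with the AUC, the sketch below shows what it measures in this setting: the probability that a randomly chosen converted case has a higher IWATE score than a randomly chosen non-converted case, with ties counted as one half. This is the standard rank-based (Mann–Whitney) formulation, written in plain Python for illustration. The analysis in this study was performed in Stata 16, and the function name `auc_from_scores` is hypothetical.

```python
from typing import Sequence


def auc_from_scores(scores: Sequence[float], converted: Sequence[bool]) -> float:
    """Area under the ROC curve for predicting conversion from a score.

    Compares every converted case with every non-converted case and
    returns the fraction of pairs in which the converted case scores
    higher, counting tied pairs as 0.5.
    """
    if len(scores) != len(converted):
        raise ValueError("scores and converted must have the same length")
    pos = [s for s, c in zip(scores, converted) if c]
    neg = [s for s, c in zip(scores, converted) if not c]
    if not pos or not neg:
        raise ValueError("need at least one converted and one non-converted case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation of converted from non-converted cases. The pairwise loop is quadratic in cohort size, which is acceptable for a few hundred patients.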

Results

Patient characteristics

A total of 426 patients met inclusion and exclusion criteria and were included for analysis. Baseline clinical characteristics, overall and by chronological era, are presented in Table 1. There were 237 (55.6%) male and 189 (44.4%) female patients with a median (IQR) age of 61 (49–69) years and body mass index (BMI) of 27.6 kg/m2 (23.8–32.8). Indications for surgery included both benign (n = 129, 30.3%) and malignant (n = 297, 69.7%) tumors. Of the malignant tumors, the most common was colorectal liver metastases (CRLM) (n = 147, 49.5%), followed by HCC (n = 93, 31.3%), other malignancies (n = 36, 12.1%), and IHC (n = 21, 7.1%). The number of patients undergoing surgery for benign vs. malignant conditions remained similar between eras. There were significant differences in malignancy type between eras, with an increase in the percentage of patients undergoing resection for HCC and IHC and a decrease in the percentage for CRLM with time (p < 0.01, Table 1). The distribution of IWATE scores by era is displayed in Fig. 1. Other baseline characteristics, including largest tumor size and location, were not significantly different between eras. Patients with Child–Pugh B or C cirrhosis are not typically candidates for LLR in the USA and were therefore not represented in our cohort.

Fig. 1 Distribution of IWATE scores between eras

Procedural details

Operative details and surgical outcomes overall and by chronological era are presented in Table 2. Overall median operative time (IQR) was 230 (172–309) minutes. Median estimated blood loss was 200 (50–350) mL. Conversion to open surgery was required in 67 (15.7%) patients. Fifty-two (12.2%) patients experienced a complication, with the most frequent being pulmonary complications, such as pneumonia or pleural effusion. Between eras, utilization of a pure laparoscopic approach increased with time, while hand-assist and robotic approaches became less common. Type of resection also varied significantly between eras, with the percentage of major resections remaining stable at 10–11% of surgeries between eras 1–3 but increasing to 38.9% of cases during era 4. There was significant variation in operative time, estimated blood loss, conversion to open surgery, blood transfusion, complications, and length of stay between eras. The proportion of all hepatectomies approached laparoscopically increased as our program gained experience (Supplemental Fig. 1).

Table 2 Procedural and perioperative details

Validation of the IWATE criteria according to 4-level difficulty

Procedural and perioperative details according to the 4-level IWATE Criteria difficulty are presented in Table 3. The median difficulty score overall was 5. Intermediate difficulty resections were most common (n = 195, 45.8%), followed by low difficulty (n = 137, 32.1%), advanced (n = 56, 13.1%), and expert (n = 38, 8.9%). There was a general increase in IWATE score over time, with 50% and 81.6% of the advanced and expert level cases, respectively, performed during Era 4 (p < 0.01). A laparoscopic approach (92.1%) was utilized more frequently for expert cases compared to hand-assist (5.3%) or robotic (2.6%) approaches. The percentage of major resections increased in concordance with 4-level IWATE difficulty, comprising 0.0%, 1.5%, 73.2%, and 97.4% of the low, intermediate, advanced, and expert difficulty cases, respectively (p < 0.01). Wedge resections or segmentectomies were categorized most frequently as low or intermediate difficulty, whereas more extensive resections such as left or right hepatectomies fell within the advanced or expert IWATE difficulty levels. Operative time increased in direct concordance with IWATE difficulty level (p < 0.01, Fig. 2). Median estimated blood loss, percentage of patients requiring a blood transfusion, and rates of conversion to open surgery increased in concordance with difficulty until the advanced level; however, a continued increase was not demonstrated for the expert group (Fig. 2). There were no significant differences in rates of complications between difficulty levels. Hospital length of stay increased in concordance with difficulty level.

Fig. 2 Surgical outcomes according to the 4-level IWATE Criteria. A Conversion to open surgery, B Operative time, and C Estimated blood loss. *p < 0.05, **p < 0.01

Table 3 Procedural and perioperative details according to IWATE difficulty level

Performance of the IWATE criteria in predicting conversion to open surgery

The performance of the IWATE criteria in predicting conversion to open surgery is presented in Fig. 3. ROC plots revealed an overall area under the curve (AUC) of 0.694 (CI 0.629–0.759, Fig. 3A). An IWATE score of five was the cutoff that best balanced sensitivity and specificity. Analysis by era demonstrated a decreasing predictive performance of the IWATE criteria with time (Fig. 3B). During the first and second eras, the IWATE Criteria demonstrated superior performance, with AUCs of 0.771 and 0.775, respectively. Subsequent eras demonstrated a stepwise decrease in performance, with AUCs of 0.708 and 0.551 for Eras 3 and 4, respectively.
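The text does not state which criterion was used to select the cutoff. One common choice, shown here purely as an assumption, is Youden's J statistic (sensitivity + specificity − 1), evaluated at every observed score. In this minimal Python sketch, a case is predicted to convert when its score is at or above the cutoff. The function name `youden_cutoff` is hypothetical.

```python
from typing import Sequence, Tuple


def youden_cutoff(
    scores: Sequence[int], converted: Sequence[bool]
) -> Tuple[int, float, float]:
    """Return (cutoff, sensitivity, specificity) maximizing Youden's J.

    A case is classified as a predicted conversion when score >= cutoff.
    If several cutoffs share the maximal J, the lowest one is returned.
    """
    if len(scores) != len(converted):
        raise ValueError("scores and converted must have the same length")
    n_pos = sum(1 for c in converted if c)
    n_neg = len(converted) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one converted and one non-converted case")
    best = None
    for cutoff in sorted(set(scores)):
        # True positives: converted cases at or above the cutoff.
        tp = sum(1 for s, c in zip(scores, converted) if c and s >= cutoff)
        # True negatives: non-converted cases below the cutoff.
        tn = sum(1 for s, c in zip(scores, converted) if not c and s < cutoff)
        sensitivity = tp / n_pos
        specificity = tn / n_neg
        j = sensitivity + specificity - 1.0
        if best is None or j > best[0]:
            best = (j, cutoff, sensitivity, specificity)
    return best[1], best[2], best[3]
```

Other criteria, such as the point closest to the top-left corner of the ROC plot, can select a different cutoff on the same data, so the choice of criterion should be reported alongside the cutoff.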

Fig. 3 A Receiver operating characteristic for conversion to open surgery, B ROC analyses by era

Discussion

This single-institution retrospective study validated the IWATE Criteria in a large North American LLR cohort, as evidenced by their correlation with operative time, blood loss, blood transfusion, and conversion to open surgery. Additionally, we found a decrease in the IWATE Criteria’s ability to predict conversion to open surgery as our program gained experience. Utilization of LLR has been increasing annually since the Louisville Statement in 2009 [18]. The ability to preoperatively determine case difficulty is crucial for patient safety and the development of the liver surgeon’s practice. Ban et al. originally developed a difficulty score, which was improved upon at the 2014 Morioka Conference to create the “IWATE Criteria” [7, 19]. IWATE has the potential to be a useful tool in preoperative planning, based on its correlation with open conversion, blood loss, operative time, and complications in Japanese and French cohorts; however, these cohorts differ from the patient population encountered in North America. For example, BMI varies widely between regions and is considered a “high risk” factor, according to consensus guidelines [20, 21]. This is supported by increased rates of conversion to open surgery and bile leak in patients with elevated BMI [22, 23]. Despite this, BMI is not accounted for by the IWATE Criteria. There are also differences in disease prevalence that may contribute to variability in indications for LLR between regions. In addition, the Japanese Multi-Institution (JMI) and Institut Mutualiste Montsouris (IMM) cohorts utilized for previous validation comprised LLR data from 1995 to 2015. Much has changed since then, with major resections comprising a larger percentage of LLR cases [24]. Additionally, the widespread implementation of LLR in North America has been more gradual than in Asia and Europe. Therefore, further validation was needed prior to implementation of the IWATE Criteria in present-day North American LLR practice.

As anticipated, there were multiple differences between our cohort and the JMI and IMM cohorts utilized previously. For example, BMI from the IMM cohort was 24.7, compared to 27.6 in our study [25]. Although BMI for the JMI cohort was not reported, it has been shown to differ in the general population [21]. Furthermore, 21.8% of our patients underwent LLR for HCC, 34.5% for CRLM, and 30.3% for benign disease. These percentages are similar to those previously reported from the National Surgical Quality Improvement Program (NSQIP), suggesting that our cohort is representative of the North American population [18]. The JMI cohort, on the other hand, comprised 66% HCC, 22.2% metastatic disease, and 7.0% benign disease, while the IMM cohort comprised 9.2% HCC, 70% metastatic disease, and 14.3% benign disease [25]. The extent of resections between cohorts also varied; our cohort comprised 61% wedge/segmentectomy cases, similar to the 62.6% partial resections described in the JMI cohort. Interestingly, the IMM cohort included only 34.6% partial resections, suggesting that French surgeons may have been early to adopt LLR for larger resections. Such variability further supports the need for validation in a North American population to expand the Criteria’s utility beyond Asian and European cohorts.

Our analysis of perioperative details validated the IWATE Criteria’s ability to predict operative outcomes in a North American cohort. Operative time, blood loss, transfusion, and conversion to open surgery tended to increase with IWATE difficulty level, suggesting that the Criteria can effectively determine preoperative case difficulty. This empowers surgeons to take on cases appropriate for their level of expertise. Some of these trends were not observed for the expert group, which may be due to outliers in its small sample size or improved surgeon expertise by era 4, when the majority of expert level cases were performed. The ability of the IWATE Criteria to predict conversion to open surgery was a novel finding and may be of utility during discussion of preoperative risks and expectations with patients. Open conversion is known to be associated with inferior outcomes, making this particularly relevant [26]. As such, we encourage North American laparoscopic liver surgeons to utilize the Criteria preoperatively for two reasons: identification of cases appropriate for surgeon experience and determination of laparoscopic feasibility. Together, these can help optimize patient selection and allow less experienced surgeons to gradually increase their caseload of more technically demanding procedures.

The IWATE criteria were originally developed based on expert surgeon opinion. This proved effective; however, surgeon perception of difficulty is subject to change. As technologies advance and operative techniques improve, what was once considered difficult may no longer be seen as challenging. Therefore, we investigated the utility of IWATE criteria as our program gained experience. During eras 1 and 2, from 01/2006 to 08/2015, the Criteria demonstrated good predictive value for conversion to open surgery, with AUCs of 0.771 and 0.775. However, during eras 3 and 4, from 08/2015 to 04/2020, the predictive value was diminished, with AUCs of 0.708 and 0.551, respectively. Reasons for this are likely multifactorial and may include improved surgical technique, improved instruments, or increased surgeon LLR experience. It is also possible that the IWATE Criteria are less effective when used in today’s LLR population, which tends to comprise a higher percentage of major resections. Considering this, the IWATE Criteria may benefit from interval recalibration as practice evolves to account for changes in surgeon perception of difficulty, as well as case load.

In addition to its retrospective design, one limitation of this study is potential selection bias. Of all the patients requiring liver resection, only a subset were deemed candidates for a laparoscopic approach. Surgeon determination of laparoscopic feasibility is subjective and, based on the increased proportion of resections performed laparoscopically during later eras, likely varied over the course of the study. The exact effect of this variation is difficult to quantify; however, it is encouraging that baseline characteristics in terms of age, gender, and BMI of patients undergoing LLR remained largely stable over time. Nevertheless, it is possible that less stringent criteria for a laparoscopic approach may have contributed to the IWATE Criteria’s inferior performance in the later eras. Types of malignancies did vary between eras, mainly due to an increase in the proportion of patients undergoing resection for HCC and IHC. This may be related to case difficulty at presentation, with more difficult HCC resections performed laparoscopically as our program gained experience. That said, we anticipate that most centers will see case composition vary over time, so such variation may be the norm. Another limitation is this study’s single-center design, which lessens its generalizability, especially for centers with different available resources or patient populations. Lastly, changes in technology or surgical techniques over the course of the study likely added further bias. However, such changes are integral to all fields undergoing rapid growth and are necessary for improvement. Rapid change is one of the main reasons we recommend interval recalibration.

Conclusion

The IWATE Criteria are an effective means of stratifying preoperative LLR case difficulty within the North American patient population. They readily distinguish between low, intermediate, and advanced difficulty level liver resections. Chronologic analysis of predictive ability for conversion to open surgery demonstrated decreased predictive value as our program gained experience; interval recalibration may be warranted. Looking forward, a prospective study is needed to best understand the utility of the IWATE Criteria and how they may be most effectively applied.