The robotic system is used in various fields of surgery and its application to different indications continues to expand in parallel with the development of technology [1]. The feasibility, safety and promising outcomes of robotic total mesorectal excision (TME) have been demonstrated by various case series [2–7]. Preliminary data from the ROLARR trial (Robotic versus Laparoscopic Resection for Rectal Cancer) have shown that robotic TME does not increase the conversion rate, though no statistically significant clinical or oncological benefit has been reported so far [8]. Before taking part in the ROLARR trial, surgeons had performed an average of 91 laparoscopic versus only 25 robotic resections, which might suggest that some of the participating surgeons were still on their learning curve for robotic surgery whilst already being experts in laparoscopic rectal surgery during this trial. In fact, there are only sparse data available about the robotic learning curve of expert laparoscopic rectal surgeons, which might differ from that of a rectal surgeon starting with robotic TME without prior laparoscopic experience [9]. However, the latter situation is certainly less likely, as in most surgical centres only experienced laparoscopic colorectal surgeons adopt the robotic technique. Hence, the learning curve of an expert laparoscopic TME surgeon who starts robotic TME is likely of paramount interest for colorectal services, especially as the robotic system should theoretically simplify the operative procedure, which may result in a favourable learning curve. The aim of this study is to produce illustrative robotic learning curves for two experienced laparoscopic colorectal surgeons using cumulative sum (CUSUM) charts, whilst adjusting for confounders.

Materials and methods

A retrospective analysis of a prospectively maintained database was carried out (Fig. 1). Laparoscopic and robotic total mesorectal resections (anterior resections) for non-metastatic rectal cancers performed from October 2006 to November 2015 at the Minimally Invasive Colorectal Unit, Queen Alexandra Hospital, Portsmouth, were included in this study. Approximately 250–300 colorectal resections per annum are performed in our unit. During the study period, a dual console four arm robotic system (Da Vinci® Si, Intuitive Surgical®) was introduced and two experienced laparoscopic colorectal surgeons (surgeon A and B), who were also trainers for the National Training Programme (NTP) for Laparoscopic Colorectal Surgery in England (Lapco), adopted the robotic method. Both surgeons had reached a plateau in the learning curve with respect to their independent and proficient performance of a laparoscopic colorectal resection, according to Lapco standards (http://www.lapco.nhs.uk). The laparoscopic experience of surgeon A and B before starting robotics consisted of approximately 1500 and 400 colorectal procedures, respectively. The robotic cases for each individual surgeon were consecutive and therefore included the complete learning curve. A full-time research assistant collected all variables of interest in a structured database and validated the data, including ensuring consistency with patient notes and checking death records for patients lost to follow-up. For each individual surgeon, the robotic cases were 1:1 nearest neighbour propensity score matched to his laparoscopic cases to obtain comparable groups of patients, with the laparoscopic cases serving as a reference to calculate the proficiency targets for the robotic cases.
The variables used to calculate the propensity scores were age, gender, body mass index (BMI), American Society of Anesthesiologists (ASA) grade, height of tumour location (low, middle or upper rectum) and American Joint Committee on Cancer (AJCC) tumour stage. Outcome variables of interest, defining quality indicators, were operation time (minutes), conversion rate, lymph node harvest (count), length of hospital stay (days), reoperation rate, the presence or absence of major complications (defined as Clavien–Dindo Grade 3a–4b) and 30-day/in-hospital mortality (Clavien–Dindo grade 5) [10]. For each surgeon, the mean and standard deviation of each continuous quality indicator were calculated for his laparoscopic cases to define the individual baseline proficiency against which the robotic cases were compared when performing the cumulative sum (CUSUM) analysis. For the binary variables, the rate of each outcome for the laparoscopic cases was used as the baseline/inherent risk, again calculated separately for each surgeon. Potential sources of bias were addressed by focusing on treatment of non-metastatic rectal cancer with curative intent, i.e. anterior resection with TME, and by the propensity score matching, which provided a comparable laparoscopic group with similar operative and perioperative risk factors to avoid selection bias (due to case mix). The study size evolved from the number of consecutive robotic cases of the involved surgeons matched to an equivalent number of non-learning curve laparoscopic cases. Quantitative variables were analysed as such, with the exception of tumour height from the anal verge, which was categorised as low (2–5 cm), medium (6–11 cm) or high (12–15 cm) rectal cancer. Operating time was defined as the time from incision of the skin until the final stitch, which also included the docking time in robotic cases. All included patients signed an informed consent form allowing their data to be used for retrospective analysis and publication.
The requirements of the Data Protection Act 1998 for anonymisation of personal data were satisfied. According to the Health Research Authority (HRA), this study did not require their approval as it was classified as an audit.

Fig. 1 Flow chart of the method used to assess the learning process of an individual surgeon

Operative procedure

The laparoscopic TMEs were performed following a standardised approach. This included medial to lateral mobilisation, high tie of the inferior mesenteric artery, division of the inferior mesenteric vein at the lower border of the pancreas, complete mobilisation of the splenic flexure in all cases and total mesorectal excision using sharp diathermy dissection usually followed by a colorectal stapler anastomosis. The robotic procedures were performed following similar principles, again using a highly standardised approach including complete mobilisation of the splenic flexure in all cases [11]. A single-dock method was applied, and an experienced robotic surgeon proctored the first ten robotic TMEs of each surgeon. Although a dual console was used for proctoring, it was not intended for the proctor to take over parts of the operation, but only to demonstrate technical solutions and ensure a safe operation. Observations and simulator training (of at least 30 h) were carried out beforehand. Both surgeon A and B had colorectal fellows who assisted with the robotic cases. They were different but the team for each surgeon was generally consistent. Although the assistants had no previous robotic experience, they had extensive previous exposure to advanced laparoscopic colorectal surgery.

Cumulative sum charting

Cumulative sum (CUSUM) curves for each quality indicator of interest were used to monitor the performance of each surgeon [12, 13]; the presentation format described by Kestin was used [14]. The x-axis represents the consecutive case number and the y-axis represents the CUSUM score. For the binary outcomes (X), the CUSUM score is the cumulative sum of \(X_i - X_0\), where \(X_i\) represents the success (\(X_i = 0\)) or failure (\(X_i = 1\)) of each consecutive procedure. \(X_0\) is the inherent risk of failure of the procedure, which in our study has been calculated using each surgeon’s matched laparoscopic reference group as a baseline (Fig. 1). For the continuous quality indicators (T), such as operating time, the CUSUM score is the cumulative sum of \(T_i - (\bar{x} + k)\), where \(T_i\) is the outcome of each successive procedure; k is the tolerable slack from the reference mean (\(\bar{x}\)) in the laparoscopic group and was defined as a quarter of a standard deviation in order to set tight targets and to account for the dispersion of the reference data. A tolerable shift of half a standard deviation is a common rule-of-thumb in industrial processes where, however, the standard deviation is normally derived from a larger sample size [15]. The CUSUM curves ascend when the set targets are not reached, which reflects an ongoing learning process; the curve runs more or less parallel to the x-axis when the performance is similar to the laparoscopic standards, reflecting no learning process; the curve has a downward trend when the performance is more often on target than expected.
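The scoring rules above can be illustrated with a minimal Python sketch. It is not the software used in this study (the analyses were run in SPSS); the function names and all data values are hypothetical and chosen only to show the arithmetic. The sketch assumes an indicator for which larger values are worse, such as operating time; for an indicator where larger values are better, such as lymph node harvest, the sign convention would presumably be reversed, which the text does not spell out.

```python
from itertools import accumulate
from statistics import mean, stdev


def cusum_binary(outcomes, inherent_risk):
    """Cumulative sum of (X_i - X_0); X_i is 1 for failure and 0 for success."""
    return list(accumulate(x - inherent_risk for x in outcomes))


def cusum_continuous(values, reference, slack_sd=0.25):
    """Cumulative sum of T_i - (mean + k), with k = slack_sd * SD of the reference."""
    target = mean(reference) + slack_sd * stdev(reference)
    return list(accumulate(t - target for t in values))


# Hypothetical data for illustration only (not taken from the study).
lap_times = [180, 200, 220, 240, 210]        # reference laparoscopic times (min)
robot_times = [260, 250, 230, 205, 190]      # consecutive robotic times (min)
robot_major_complication = [0, 1, 0, 0, 0]   # 1 = failure, 0 = success

time_curve = cusum_continuous(robot_times, lap_times)
risk_curve = cusum_binary(robot_major_complication, inherent_risk=0.10)

print([round(v, 1) for v in time_curve])
print([round(v, 2) for v in risk_curve])
```

In this made-up series the operating-time curve rises over the first three cases and falls thereafter, which is the pattern read as a completed learning process in the charts.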

Statistical methods

Propensity scores were calculated via logistic regression analysis with method of procedure (laparoscopic versus robotic) as the outcome variable, and nearest neighbour 1:1 matching was then applied (with no discards, nor a specified calliper distance). The level of balance achieved was assessed by the absolute standardised mean difference for each baseline variable, the multivariate imbalance measure L1 (Iacus, King and Porro, 2010) and the overall balance test (Hansen and Bowers, 2010). For the matching process, PS Match for SPSS (Version 1.0, by Felix Thoemmes, 2011) [16] was used; underlying R packages included MatchIt, RItools, and cem [17–20]. For the outcomes, categorical variables were compared using a Pearson Chi-square or Fisher’s exact test, as appropriate, and continuous variables were compared using the non-parametric Mann–Whitney U test. The statistical analyses were performed using SPSS (Statistical Package for Social Sciences, Version 20, IBM Corp. Released 2011).
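To make the matching and balance steps concrete, the following Python sketch shows greedy 1:1 nearest neighbour matching without replacement and without a calliper, followed by an absolute standardised mean difference. It is an illustration under stated assumptions, not the PS Match or MatchIt implementation: the propensity scores are assumed to have been estimated already, the pooled-variance form of the standardised difference is one common definition that may differ from the one used by the software, and all identifiers and values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def match_nearest(treated, controls):
    """Greedy 1:1 nearest neighbour matching on the propensity score,
    without replacement and without a calliper.
    Both arguments map a case id to its (precomputed) propensity score."""
    available = dict(controls)
    pairs = {}
    for case_id, score in treated.items():
        if not available:
            break  # no reference cases left to match
        best = min(available, key=lambda c: abs(available[c] - score))
        pairs[case_id] = best
        del available[best]  # each reference case is used at most once
    return pairs


def abs_std_mean_diff(a, b):
    """Absolute standardised mean difference using the pooled variance
    (an assumed definition; packages differ in the denominator)."""
    pooled_sd = sqrt((variance(a) + variance(b)) / 2)
    return abs(mean(a) - mean(b)) / pooled_sd


# Hypothetical propensity scores, for illustration only.
robotic = {"R1": 0.62, "R2": 0.35}
laparoscopic = {"L1": 0.30, "L2": 0.60, "L3": 0.90}
pairs = match_nearest(robotic, laparoscopic)

# Hypothetical ages in the two matched groups.
d = abs_std_mean_diff([60, 65, 70], [62, 66, 70])
print(pairs, round(d, 3))
```

Greedy matching depends on the order in which the robotic cases are processed; the sketch simply takes them in the order given.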

Results

A total of 384 (294 laparoscopic and 90 robotic) total mesorectal resections met the inclusion criteria. Surgeon A performed 206 (70.1%) of the laparoscopic and 43 (47.8%) of the robotic cases, whilst surgeon B performed 88 (29.9%) of the laparoscopic and 47 (52.2%) of the robotic cases during the study period. After propensity score matching, no baseline variable exhibited an absolute standardised mean difference >0.25 (│d│>0.25), either for surgeon A or surgeon B. The multivariate imbalance measure L1 showed an increase in balance (0.976 and 0.955 before and 0.837 and 0.915 after matching for surgeon A and B, respectively), and the overall balance test showed no significant difference after matching (p = 0.963 and p = 0.988, respectively). Biometric, oncologic and operative baseline characteristics of the robotic and matched laparoscopic cases of each surgeon were similar and are summarised in Table 1. Overall, surgeon A treated more low rectal cancers (47 vs. 32%) but fewer cases with neoadjuvant radiotherapy (17 vs. 30%) when compared to surgeon B. The outcomes of interest, including the quality indicators, are summarised in Table 2. On an individual level, there was no significant difference with respect to operation time, R stage, lymph node harvest, length of hospital stay, conversion rate and major complications between the laparoscopic and robotic cases for both surgeons. Only the readmission rate was statistically significantly higher in the laparoscopic compared with the robotic group for surgeon B (21 vs. 6%, p = 0.035). The cut-off values for robotic on-target performance for the continuous quality indicators, and the expected inherent risk for the binary quality indicators, were calculated based upon the laparoscopic outcomes and are listed for each quality indicator in Table 3. The on-target cut-off for operation time was considerably higher for surgeon B than for surgeon A (292 vs. 233 min) as was the inherent risk for major complications Clavien–Dindo 3b–5 (15 vs. 7%); on the other hand, surgeon B had a slightly higher lymph node harvest cut-off (15 vs. 14) and also a shorter hospital stay cut-off (11 vs. 12 days). Finally, the CUSUM charts for the quality indicators are shown in Fig. 2. As the conversion rate was zero in the robotic group for both surgeons and only one conversion occurred in the laparoscopic group, no CUSUM chart was drawn for this quality indicator. For similar reasons (i.e. due to small numbers of events), mortality was included in the major complications outcome so that this constituted a single quality indicator (Clavien–Dindo grade 3b–5).

Table 1 Baseline characteristics of patients who underwent laparoscopic or robotic total mesorectal resection by two surgeons after propensity score matching
Table 2 Short-term outcomes of consecutive learning curve robotic total mesorectal excisions compared to non-learning curve conventional laparoscopic resections performed by surgeon A and B
Table 3 Individual on-target cut-off values and inherent risk for continuous and binary quality indicators, respectively, for surgeon A and B used for cumulative sum charting, based on laparoscopic cases
Fig. 2 Cumulative sum (CUSUM) charts for four quality indicators for the first consecutive robotic total mesorectal excisions performed by surgeon A and B

For surgeon A who had performed more laparoscopic TMEs (1500 laparoscopic colorectal procedures before starting robotics and 206 cases meeting the inclusion criteria during the study) before starting robotic TME, the CUSUM curves showed only a short learning process for operation time, necessitating seven tutored cases, whilst there was apparently minimal to no learning process for the other quality indicators, such as lymph node harvest, length of stay and major complications. In fact, the general downward trend for operation time, lymph node harvest and length of stay indicates that these quality indicators were more often on rather than off their pre-defined targets.

For surgeon B, who had performed fewer laparoscopic TMEs beforehand (400 laparoscopic colorectal procedures before starting robotics and 88 cases meeting the inclusion criteria during the study), the CUSUM curves showed a clear learning process for operation time, length of stay and major complications. However, all of these indicators were ultimately on target, showing a comparable performance to laparoscopy after just 15 robotic procedures. No evidence of a learning process was observed for lymph node harvest. No other systematic influences that might explain the learning curve, such as a sudden change of assistant, were found.

Discussion

This study suggests that experienced laparoscopic colorectal surgeons may undergo a short learning process when changing from laparoscopic to robotic TME. Furthermore, the number of previous laparoscopic TMEs performed may also influence the number of cases needed to reach an equivalent performance level in robotic TME. This is well demonstrated by the CUSUM curves running parallel to the x-axis or inflecting downwards after a maximum of only 15 robotic cases, whilst the more experienced surgeon (A) had a shorter learning process than the less-experienced surgeon (B). The results also show that the introduction of a robotic system into a specialist colorectal unit may have only a marginal effect on case load per time period and short-term outcomes. In case of extensive laparoscopic experience, there is no apparent learning curve, as observed for surgeon A. Similar observations have been previously reported for other procedures where skills were transferable to a new technique [21]. CUSUM plots are a widespread tool for sequential quality control and learning curve assessment [22] and have already been used to assess various colorectal procedures [23]. To our knowledge, there is only one recent study, by Yamaguchi et al., using CUSUM charts to assess the learning curve in robotic rectal surgery [9]. A minimum of 25 robotic procedures was reported as necessary to achieve proficiency. However, that study included a variety of procedures with and without lateral lymph node dissection and for a range of indications. No special measures were taken to account for selection bias (due to case mix) and only a single quality indicator, namely ‘operation time’, was used in the CUSUM analysis. Besides this, the level of proficiency in laparoscopic rectal surgery prior to starting with robotics was not stated.
In contrast, our study included a defined standard procedure for a narrow field of indications, and the proficiency targets were calculated based on non-learning curve laparoscopic TMEs which were matched to consecutive robotic cases to minimise selection bias. Although operation time is an important indicator for the learning curve, opinions on the appropriate choice of quality indicators differ and other indicators are likely equally important to assess safety and oncological adequacy. For many surgeons, important quality criteria are feasibility, safety and ergonomics. For the hospital management, important quality indicators might include operating time, associated costs and marketing effects. For the patient, postoperative pain, functional outcome, cosmesis and disease-free as well as overall survival are likely of most interest. For all of these quality indicators, CUSUM charts can be constructed. Operation time is often considered an important surrogate for the adoptability and competitiveness of a new technique. Consequently, to compensate for longer operation times, the other benefits of a new technique may need to be substantial. Although a longer operation time is still a common argument against robotic TME [24, 25], our experience does not support this even when including docking time. With increasing experience of the theatre team, the set-up and docking time could be reduced, allowing operation times similar to those in laparoscopy after only 15 cases. It is important to mention that our study reflects not only the learning curve of the surgeon but also that of the whole team, for whom the robotic platform was a novelty as well. The assistant standing at the patient’s side plays an important role when it comes to active assistance via the auxiliary port, although the four arm robotic system with operator-guided camera as used in this study makes the console surgeon less dependent on the assistant’s experience.

There was no statistically significant difference in the R0 resection rate between the laparoscopic and the robotic group. Although previous evidence has suggested similar circumferential margin positivity in open, laparoscopic and robotic TME, this was less clear during the learning curve phase [26]. In fact, surgeon B had fewer positive margins in robotic than in laparoscopic TME, which is not entirely explained by differences in stage, site of tumour and neoadjuvant treatment. This result is clinically relevant as it demonstrates non-inferiority of robotic TME even during the learning curve.

Lymph node harvest, often used as a surrogate marker for quality of surgery, was not adversely influenced during the learning period as shown in the corresponding CUSUM chart. However, lymph node count as a global quality indicator has been called into question as it reflects not only the surgeon’s effort but also pathological retrieval technique and tumour biology [27, 28].

The length of stay in hospital was prolonged in the first 15 cases during the learning curve of surgeon B, with a corresponding upward inflection in the CUSUM chart representing procedures not on target in this period. Length of stay may reflect the invasiveness of a lengthy procedure, which may influence the degree of inflammatory response and lead to a prolonged recovery time.

No perioperative death occurred in the robotic group and major complications were less common than in the laparoscopic group, which was more apparent for the less-experienced surgeon B. For a new technique to be implemented successfully, it is crucial that quality indicators concerning patient safety are on target as quickly as possible. Due to the relatively static robotic platform, the lack of haptic feedback and the tunnel view, there is a potential for specific and severe complications in robotic surgery [29]. In particular, the tunnel view may be challenging for inexperienced colorectal surgeons who need more integral views to establish pattern recognition, as provided in conventional laparoscopy. In our study, there was no conversion to open surgery in the robotic group. Similarly, a systematic review of short-term outcomes of robotic rectal cancer surgery found a lower conversion rate in the robotic than in the laparoscopic group [30]. Vascular dissection and medial to lateral mobilisation as well as the take down of the splenic flexure can be challenging in both laparoscopic and robotic surgery. For the medial to lateral approach in robotic TME, it is crucial that the inferior mesenteric axis is identifiable before docking the robot. Experience in conventional laparoscopy helps the surgeon to use patient positioning efficiently in order to optimise exposure. The novice robotic surgeon may find this especially challenging as re-positioning of the patient is not possible once the robot is docked unless a synchronised operating table is used, which was not the case in this study. In our institution, a standardised approach for complete mobilisation of the splenic flexure for both laparoscopic and robotic surgery has been developed which enables the surgeon to perform this complex step in a methodical manner, resulting in a low conversion rate [11].

Addressing selection bias is important when evaluating a new technique as often straightforward cases (females, low BMI, small tumour size, no previous abdominal surgery, no preoperative radiotherapy) are chosen initially with the aim to progress to more complex cases with growing expertise. Matching laparoscopic cases to the consecutive robotic cases ensures that the corresponding laparoscopic reference cases theoretically could have been operated by robotics at the same stage of training. So, not only the robotic but also the matched laparoscopic cases were of increasing complexity and were well suited to define inherent risk and allowable slack.

Using our approach, CUSUM curves of different surgeons should not be compared without caution as the robotic performance is only benchmarked against an already achieved personal laparoscopic level of proficiency. If a surgeon had an inconsistent laparoscopic performance, this would result in a high allowable slack or inherent risk of quality indicators. Although being on target with reference to their own performance, they might be below standard when compared to other surgeons. On the other hand, a skilled and experienced surgeon who achieves consistent results leading to low standard deviations or inherent risks of quality indicators may find it difficult to reach a similar level when adopting a new technique. Our approach specifically answers the question as to whether it is worth changing from the current method, even when it produces good results. To prevent unnecessary harm to the patient, constant monitoring of the learning process is mandatory.

CUSUM curves for continuous quality indicators may be subject to systematic bias. The transition point of the CUSUM curve from off-target to on-target depends upon the allowable slack which may be somewhat arbitrary in its definition. The higher the allowable slack, the shorter the learning process becomes and vice versa. Half of a standard deviation is a commonly used slack when controlling industrial processes where the reference group is usually large leading to a relatively small standard deviation and a tight slack. In our model where only the surgeon’s own expert cases constituted the reference group, standard deviations were relatively high resulting in large allowable slacks when using half a standard deviation. Therefore, the tolerable slack was reduced to a quarter of the standard deviation from the mean in our approach.
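The dependence of the transition point on the allowable slack can be seen in a short worked example. The numbers are hypothetical and are not taken from Table 3: assume a laparoscopic reference mean of \(\bar{x} = 200\) min with a standard deviation of \(s = 60\) min, and a robotic case lasting \(T_i = 225\) min.

```latex
\begin{align*}
k = \tfrac{1}{4}s = 15 &: \quad \bar{x} + k = 215, \quad T_i - (\bar{x} + k) = 225 - 215 = +10 \\
k = \tfrac{1}{2}s = 30 &: \quad \bar{x} + k = 230, \quad T_i - (\bar{x} + k) = 225 - 230 = -5
\end{align*}
```

Under these assumptions the same case pushes the curve upwards (off target) with a quarter of a standard deviation as slack, but downwards (on target) with half a standard deviation, so the more generous slack would make the learning process appear shorter.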

Many expert laparoscopic colorectal surgeons are reluctant to start with robotic TMEs as they fear that they may have to repeat the learning process with all its negative implications. This study, however, provides evidence that the laparoscopic skill and procedural knowledge may be transferable to the robotic approach. The CUSUM charts also suggest that there may even be a “stepping down” from a more difficult laparoscopic to a less difficult robotic technique. This means that surgeons who are not able to perform a TME by laparoscopy might be able to do so by the robotic approach.

This study has limitations due to its retrospective nature, low case numbers, small number of surgeons included and the lack of quality grading of the specimens by the pathologist. The high number of cases proctored may also have a considerable impact on the learning curve, which could not be assessed. We had ten cases proctored for each surgeon, which may vary from other centres due to the cost implications of proctoring. Furthermore, the model used for this study only allows the assessment of a surgeon’s robotic performance in comparison to his own laparoscopic proficiency. Nevertheless, this study suggests that skills attained during laparoscopic TME surgery are transferable to robotic surgery and that the formal learning curve of experienced laparoscopic colorectal surgeons for robotic TME may be limited when compared to their individualised laparoscopic proficiency targets.