Gastric cancer is the fifth most common cancer and the fourth leading cause of cancer-related deaths worldwide [1]. Surgical resection is the only curative treatment approach, and regional lymphadenectomy is recommended as a component of radical gastrectomy for gastric cancer [2]. Laparoscopic gastrectomy (LG) is increasingly being used for the treatment of gastric cancer because of its beneficial short-term effects and equivalent long-term outcomes when compared with open gastrectomy [3,4,5,6]. The da Vinci® Surgical System (Intuitive Surgical, Sunnyvale, CA, USA) was developed to overcome several disadvantages of conventional LG, including a limited range of motion with straight instruments and the surgeon’s hand tremor [2]. Most surgeons expect that the use of the da Vinci® Surgical System for the treatment of gastric cancer will overcome the technical difficulties of LG, improving its safety and reproducibility, and possibly leading to improved prognoses [7]. However, a large non-randomized prospective study (NCT01309256) showed that robotic gastrectomy (RG) has a longer duration of operation and higher cost than LG, with no difference in morbidity between the two methods, suggesting that RG may be less cost-effective than LG [8]. In contrast, an increasing number of studies, conducted mostly by expert surgeons in leading institutions for RG in Japan, have recently revealed favorable short-term outcomes of RG [2, 7, 9,10,11]. Our previous multi-institutional prospective study (UMIN000015388/jRCTs042180129), which was approved for Advanced Medical Technology (“Senshiniryo B”) managed by the Japanese Ministry of Health, Labour, and Welfare, successfully showed that RG for the treatment of cStage I/II gastric cancer reduced the morbidity rate (Clavien–Dindo classification grade ≥ IIIa) of patients to less than half of that in a historical control group of patients who underwent LG in three leading institutions, i.e., Kyoto, Saga, and Fujita Health universities [7].
Considering the clinical advantages of RG shown in that study, the Japanese Ministry of Health, Labour, and Welfare recognized RG as part of LG under the universal health insurance coverage, starting from April 2018. However, no additional fee is reimbursed to the hospital for the use of RG instead of LG. This is because only a few studies have investigated the survival benefit of RG, which is one of the most critical determinants of cost-effectiveness [7, 12, 13]. Therefore, the aim of the present study (UMIN000034366) was to determine and compare the 3-year oncological outcomes of RG and LG, using the data of patients who underwent RG in our previous study (UMIN000015388) and those of historical controls who underwent LG.

Materials and methods

Study design and cohort development

This multi-institutional, retrospective, comparative study was designed to assess whether RG improves the prognosis of patients with primary cStage I or II gastric cancer when compared with LG. The RG group comprised 326 patients from 15 institutions who prospectively underwent RG between October 2014 and January 2017 in the abovementioned previous study (UMIN000015388) [7]. The LG group consisted of the historical controls of that study, which included 801 patients from three institutions (338, 248, and 215 patients from Fujita Health University, Saga University, and Kyoto University, respectively) who underwent insured LG between 2009 and 2012 [7]. These three institutions closely communicated with each other, engaged in personnel exchange, and standardized the procedures for LG based on the outermost layer-oriented approach [14, 15]. In addition, since the early 2000s, they had performed LG for any operable patient with resectable gastric cancer who wished to undergo an insured minimally invasive procedure. A total of 1343 consecutive patients with primary gastric cancer underwent gastrectomy in these institutions between 2009 and 2012. Of these, 1212 patients underwent curative gastrectomy, including 998 who underwent curative LG procedures (laparoscopic distal, proximal, or total gastrectomy). A total of 801 of the 998 LGs were performed on patients with cStage I or II gastric cancer without preoperative chemotherapy. The RG procedure used in Japan was established and standardized by I.U. and his colleagues based on the concept of the outermost layer-oriented approach [2, 7, 12, 14]. The procedure was shared with the three abovementioned institutions and gradually expanded to the 15 institutions that participated in the UMIN000015388 study [2, 7, 12, 14].
Patients who met the following criteria were included in the study: operable under general anesthesia; histologically proven gastric adenocarcinoma (common type); cStage I or II disease not indicated for endoscopic resection according to the Japanese Gastric Cancer Treatment Guidelines [16]; curatively treated with total, distal, or proximal gastrectomy involving D1 + or D2 lymph node dissection; and age ≥ 18 years. Patients who underwent preoperative chemotherapy or those with serious mental disorders who might, therefore, not be able to provide informed consent were excluded.

Selection of quality indicators and confounding factors

Consensus meetings were held by a study team that consisted of surgeons and biostatisticians to determine quality indicators, adjust for confounding factors, and compare the outcomes of RG and LG using propensity score-based analyses, including the inverse probability of treatment weighting method and propensity score matching for primary and sensitivity analyses, respectively [17]. The primary outcome measure was the 3-year overall survival rate (3yOS) because the planned follow-up duration in the UMIN000015388 study was 3 years [7]. The secondary outcomes are described in Online Resource 1.

Preoperative factors that served as a basis for determining whether a patient would undergo RG or LG were identified to estimate the propensity score [18]. Several additional risk predictors identified in previous studies were also included in the model [19,20,21,22,23]. Covariates for propensity score estimation included patient’s age at the time of surgery, sex, body mass index, American Society of Anesthesiologists physical status (ASA-PS) classification, presence of comorbidities, history of laparotomy, tumor size, clinical tumor stage, type of resection, extent of lymph node dissection, type of alimentary tract reconstruction, and surgeon volume (the number of procedures performed by the surgeon). To control for surgeon volume, an operating surgeon who had performed ≥ 100 LGs before any of the patients enrolled in the LG group had undergone surgery was defined as an expert surgeon in the LG group [24]. Likewise, we recognized any RG surgeon who was able to perform LG expert procedures using the surgical robot as an RG expert, and an LG expert who had performed ≥ 40 RGs before any of the patients enrolled in the RG group had undergone surgery was defined as an expert surgeon in the RG group, considering the learning curve for RG among experienced LG surgeons [18, 25, 26]. All procedures were performed or supervised by an expert surgeon.

Clinicopathological findings and tumor stages were classified according to the 14th edition of the Japanese Classification of Gastric Carcinoma [27]. The extent of lymph node dissection and gastric resection was determined according to the Japanese Gastric Cancer Treatment Guidelines [16]. Details of preoperative diagnosis and postoperative management are shown in Online Resource 2. The observation period for each patient was 3 years after surgery. Overall survival was calculated from the date of resection to the date of the last follow-up or death from any cause. Recurrence-free survival was calculated from the date of resection to the date of the first recurrence, last follow-up, or death from any cause, whichever occurred first.
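The survival definitions above can be written as a small time-to-event derivation. The following Python sketch is illustrative only and is not the code used in the study (the analyses were conducted in SAS); the function names, the day-based time unit, and the 1095-day horizon standing in for the 3-year observation period are assumptions made for this example.

```python
from datetime import date
from typing import Optional, Tuple

# Assumption: the 3-year observation period is approximated as 3 * 365 days.
HORIZON_DAYS = 3 * 365


def overall_survival(resection: date, last_follow_up: date,
                     death: Optional[date]) -> Tuple[int, bool]:
    """Days from resection to death from any cause (event) or last follow-up (censored)."""
    if death is not None:
        return (death - resection).days, True
    return (last_follow_up - resection).days, False


def recurrence_free_survival(resection: date, last_follow_up: date,
                             recurrence: Optional[date],
                             death: Optional[date]) -> Tuple[int, bool]:
    """Days from resection to first recurrence or death, whichever occurs first."""
    event_dates = [d for d in (recurrence, death) if d is not None]
    if event_dates:
        return (min(event_dates) - resection).days, True
    return (last_follow_up - resection).days, False


def censor_at_horizon(time_event: Tuple[int, bool],
                      horizon: int = HORIZON_DAYS) -> Tuple[int, bool]:
    """Administratively censor follow-up that extends beyond the observation period."""
    days, event = time_event
    if days > horizon:
        return horizon, False
    return days, event
```

Each function returns a (time, event) pair, which is the input format expected by standard survival estimators; a patient without an event contributes follow-up time only up to the last contact or the end of the observation period.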

Data management

The data center (Center for Clinical Trial and Research Support, Fujita Health University) used in the UMIN000015388 study prospectively collected all data for the patients in the RG group using case report forms in a linkable anonymized fashion, as determined in our previous report [7]. The same data center created data sheets for the LG group in the present study, based on the case report forms used for the RG group, and provided them to each institution. The medical charts of each institution were retrospectively reviewed, and the data sheets were filled out and sent back to the data center. After the data center gathered all the raw data for each group, those data were reviewed on a patient-by-patient basis, and the dataset of each group was fixed thereafter.

Statistical analysis

A biostatistician blinded to the outcome conducted propensity score modeling and performed propensity score-based analyses, including inverse probability of treatment weighting and propensity score matching [17]. The propensity score was estimated using logistic regression models to predict the exposure of undergoing RG or LG from the confounding variables described above. The balance of the adjusted cohort was assessed by calculating the standardized difference between the two groups. An absolute standardized difference above 0.1 indicated a meaningful imbalance. Based on the propensity score, each patient was weighted using the inverse probability of receiving each treatment, thus generating weighted synthetic samples in which observed baseline covariates were not confounded by the assignment of treatment. For estimation of variance, we incorporated the robust variance estimator to deal with the within-subject correlation induced by weighting. In addition, propensity score matching was performed to evaluate the sensitivity of the results. Greedy nearest neighbor matching was performed at a ratio of 1:1 without replacement, using a caliper width of 0.2 standard deviations of the logit of the estimated propensity score. Categorical and continuous variables were compared using a linear mixed-effects model. Data are expressed as medians with ranges or odds ratios (ORs) with 95% confidence intervals (CIs) unless otherwise stated. Three-year outcomes were assessed using the Kaplan–Meier method and Cox proportional hazards regression analysis. Univariate and multivariate stratified Cox proportional hazards regression analyses were conducted to examine the factors that determine 3yOS and 3-year recurrence-free survival rate (3yRFS). Mortality risk was estimated by calculating the hazard ratio (HR) and 95% CI. All comparisons were two-sided, and a p-value < 0.05 was considered statistically significant. All analyses were conducted using SAS version 9.4 (SAS Institute, Cary, NC, USA).
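As a rough illustration of the weighting and balance check described above, the following Python sketch computes inverse probability of treatment weights and a (weighted) standardized difference for one continuous covariate. It is not the study's SAS code. The propensity scores are taken as given rather than estimated, the function names are hypothetical, and the simple weighted variance used here is one of several definitions in use.

```python
import math
from typing import List, Optional, Sequence


def iptw_weights(treated: Sequence[int], ps: Sequence[float]) -> List[float]:
    """Weight = 1/ps for treated (RG) and 1/(1 - ps) for controls (LG)."""
    return [1.0 / p if t == 1 else 1.0 / (1.0 - p) for t, p in zip(treated, ps)]


def weighted_mean(x: Sequence[float], w: Sequence[float]) -> float:
    return sum(wi * xi for xi, wi in zip(x, w)) / sum(w)


def weighted_var(x: Sequence[float], w: Sequence[float]) -> float:
    # Assumption: plain weighted variance without small-sample correction.
    m = weighted_mean(x, w)
    return sum(wi * (xi - m) ** 2 for xi, wi in zip(x, w)) / sum(w)


def standardized_difference(x: Sequence[float], treated: Sequence[int],
                            w: Optional[Sequence[float]] = None) -> float:
    """Difference in group means divided by the pooled standard deviation."""
    if w is None:
        w = [1.0] * len(x)  # unweighted case
    xt = [xi for xi, t in zip(x, treated) if t == 1]
    wt = [wi for wi, t in zip(w, treated) if t == 1]
    xc = [xi for xi, t in zip(x, treated) if t == 0]
    wc = [wi for wi, t in zip(w, treated) if t == 0]
    pooled_sd = math.sqrt((weighted_var(xt, wt) + weighted_var(xc, wc)) / 2.0)
    return (weighted_mean(xt, wt) - weighted_mean(xc, wc)) / pooled_sd


# Toy data (invented for illustration): age, treatment flag, propensity score.
age = [60.0, 70.0, 65.0, 75.0]
treated = [1, 1, 0, 0]
ps = [0.8, 0.5, 0.5, 0.2]
weights = iptw_weights(treated, ps)
before = standardized_difference(age, treated)
after = standardized_difference(age, treated, weights)
```

In this toy example the absolute standardized difference shrinks after weighting; in practice the check is repeated for every covariate and compared against the 0.1 threshold.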

Results

Patient demographic characteristics

A flow diagram of the patient selection process is shown in Fig. 1. A total of 1127 patients (326 in the RG group and 801 in the LG group) were enrolled in this study. We excluded 44 patients, all in the LG group, from the analysis set owing to multiple primary cancers (n = 38), special histological types (n = 3), cStage ≥ III or unknown disease (n = 2), or duplicate records (n = 1). Thus, the full analysis set comprised 326 patients in the RG group and 757 in the LG group. Inverse probability of treatment weighting was performed for 326 patients in the RG group and 752 in the LG group. Five patients in the LG group were excluded from the weighted population owing to missing covariate data. The background characteristics of the patients are summarized in Table 1. Before weighting, the patients treated using RG were younger and had smaller tumor sizes. The proportion of patients in the RG group who had comorbidities and were operated on by expert surgeons (RG, 65.0% vs. LG, 51.9%) was greater than that in the LG group. The proportion of patients in the RG group who had ASA-PS scores ≥ 2, had a history of laparotomy, and underwent total gastrectomy (RG, 14.4% vs. LG, 24.3%) was smaller than that in the LG group. After weighting, the standardized difference of all these confounding factors was reduced to 0.09 or less.
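The patient counts in this paragraph can be checked with simple arithmetic. The short Python sketch below only restates the numbers reported above; the variable names are hypothetical.

```python
# Counts taken directly from the text; names are illustrative only.
enrolled = {"RG": 326, "LG": 801}

lg_exclusions = {
    "multiple primary cancers": 38,
    "special histological types": 3,
    "cStage >= III or unknown disease": 2,
    "duplicate records": 1,
}

# Full analysis set: all exclusions were in the LG group.
full_analysis_set = {
    "RG": enrolled["RG"],
    "LG": enrolled["LG"] - sum(lg_exclusions.values()),
}

# Weighted population: five LG patients lacked covariate data.
missing_covariates_lg = 5
weighted_population = {
    "RG": full_analysis_set["RG"],
    "LG": full_analysis_set["LG"] - missing_covariates_lg,
}
```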

Fig. 1

Flow diagram of the patient selection process. RG robotic gastrectomy; LG laparoscopic gastrectomy; IPTW inverse probability of treatment weighting

Table 1 Patient background data

Three-year outcomes in the weighted population

The 3-year outcomes are shown in Figs. 2a–d and 3a–d. In the RG group, 3yOS was significantly improved (RG, 96.3% vs. LG, 89.6%; HR, 0.34 [0.15, 0.76]; p = 0.009) (Fig. 2b), and there was a trend toward an increase in 3yRFS (RG, 92.3% vs. LG, 87.2%; HR 0.58 [0.32, 1.05]; p = 0.073) (Fig. 3b). Sub-analyses stratified according to the presence of pStage IA and pStage ≥ IB disease revealed that RG improved both 3yOS (RG, 99.7% vs. LG, 94.4%; HR 0.05 [0.01, 0.38]; p = 0.004) and 3yRFS (RG, 99.7% vs. LG, 93.7%; HR 0.05 [0.01, 0.34]; p = 0.003) in patients with pStage IA disease (Figs. 2d and 3d). There was a tendency toward improvement in 3yOS (RG, 90.9% vs. LG, 80.8%; HR, 0.44 [0.19, 1.02]; p = 0.056) and 3yRFS (RG, 80.6% vs. LG, 75.3%; HR 0.74 [0.41, 1.36]; p = 0.338) in patients with pStage ≥ IB disease (Figs. 2d and 3d), but these differences were not significant. Similar trends were observed in sub-analyses stratified by the presence of pStage I and pStage ≥ II diseases (see Fig. S1, Online Resource 3). Univariate analyses, in which the independent variables consisted of treatment using RG and the covariates for propensity score estimation, showed that treatment using RG, age, tumor size, clinical tumor stage, type of resection, extent of lymph node dissection, and type of alimentary tract reconstruction were positive or negative risk factors for death from any cause. Multivariate analyses using these risk factors revealed that treatment using RG and distal gastrectomy were the factors that contributed to improvement in overall survival, whereas age and clinical tumor stage were associated with worse overall survival (Table 2). The results of multivariate analysis for 3yRFS are shown in Table 3. All-cause death and deaths from other diseases, but not gastric cancer-related deaths, were reduced in the RG group (Table 4).
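For readers unfamiliar with how a survival rate at a fixed time point is obtained from weighted data, the following Python sketch shows a weighted Kaplan–Meier estimate at a chosen horizon. It illustrates the general technique under stated assumptions and is not the study's SAS implementation; the function name, the per-patient weights, and the toy data are hypothetical.

```python
from typing import Sequence


def weighted_km_survival(times: Sequence[float], events: Sequence[bool],
                         weights: Sequence[float], horizon: float) -> float:
    """Weighted Kaplan-Meier survival probability at `horizon`.

    At each distinct event time t, survival is multiplied by
    1 - (weighted events at t) / (weighted number still at risk at t).
    """
    event_times = sorted({t for t, e in zip(times, events) if e and t <= horizon})
    survival = 1.0
    for t in event_times:
        at_risk = sum(w for ti, w in zip(times, weights) if ti >= t)
        failed = sum(w for ti, e, w in zip(times, events, weights) if e and ti == t)
        survival *= 1.0 - failed / at_risk
    return survival


# Toy data (invented): follow-up in years, event flags, and weights.
times = [1.0, 2.0, 3.0, 4.0]
events = [True, False, True, False]
unweighted = weighted_km_survival(times, events, [1.0, 1.0, 1.0, 1.0], horizon=3.0)
weighted = weighted_km_survival(times, events, [2.0, 1.0, 1.0, 1.0], horizon=3.0)
```

With all weights equal to 1 the function reduces to the ordinary Kaplan–Meier estimator; giving more weight to a patient who died early lowers the estimated survival.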

Fig. 2

Kaplan–Meier estimates of overall survival. a Unweighted overall survival of the RG and LG groups. b Weighted overall survival of the RG and LG groups. c Unweighted overall survival of the pStage IA/ ≥ IB subgroups. d Weighted overall survival of the pStage IA/ ≥ IB subgroups. RG robotic gastrectomy; LG laparoscopic gastrectomy; HR hazard ratio; OS overall survival. ^a Cox proportional hazards regression analysis

Fig. 3

Kaplan–Meier estimates of recurrence-free survival. a Unweighted recurrence-free survival of the RG and LG groups. b Weighted recurrence-free survival of the RG and LG groups. c Unweighted recurrence-free survival of the pStage IA/ ≥ IB subgroups. d Weighted recurrence-free survival of the pStage IA/ ≥ IB subgroups. RG robotic gastrectomy; LG laparoscopic gastrectomy; HR hazard ratio; RFS recurrence-free survival. ^a Cox proportional hazards regression analysis

Table 2 Factors associated with 3-year overall survival in the weighted population
Table 3 Factors associated with 3-year recurrence-free survival in the weighted population
Table 4 Recurrence sites and causes of death

There was no difference in re-operation rate (RG, 1.0% vs. LG, 1.1%, Table 5) and recurrence rate (RG, 7.5% vs. LG, 7.1%, Table 4) between the RG and LG groups. In addition, no differences were observed in the common patterns of recurrence, including peritoneal dissemination (RG, 3.9% vs. LG, 4.9%), hepatic metastasis (RG, 1.8% vs. LG, 1.5%), abdominal wall muscular layer metastasis (RG, 1.1% vs. LG, 0.5%), distant lymph node metastasis (RG, 0.8% vs. LG, 1.1%), and local recurrence (RG, 0.8% vs. LG, 0.3%) between the two groups (Table 4). Regarding the remaining patterns of recurrence, the number of events for each pattern was too small to determine practical significance.

Table 5 Postoperative complications

Postoperative complications in the weighted population

The postoperative complications are presented in Table 5. Unlike in the unweighted population, RG did not improve the morbidity rate in the weighted population (RG, 3.7% vs. LG, 5.0%). A similar trend was observed in the incidence of intra-abdominal infectious complications (RG, 2.4% vs. LG, 4.1%). RG reduced the incidence of some adverse events, including anastomotic leakage (RG, 0.2% vs. LG, 2.2%) and intra-abdominal abscess (RG, 0.0% vs. LG, 1.6%). However, there was no difference between the RG and LG groups in terms of pancreatic fistula incidence (RG, 2.2% vs. LG, 0.9%). Although pulmonary complications, sepsis, renal complications, anastomotic stenosis/passage obstruction, gastrointestinal bleeding, and in-hospital mortality seemed to be reduced, and intra-abdominal bleeding seemed to be increased in the RG group, the numbers of these events were too small to determine practical significance.

Surgical outcomes in the weighted population

The surgical outcomes are summarized in Table 6. Although RG increased medical costs and surgical costs, it reduced estimated blood loss and shortened the duration of postoperative hospitalization. No differences were observed between the RG and LG groups in terms of operative time, number of dissected lymph nodes, and conversion to open surgery.

Table 6 Surgical outcomes

Sensitivity analyses

After propensity score matching, data of 311 patients who underwent RG and 311 who underwent LG were retrieved from the full analysis set. The standardized difference of all the confounding factors was reduced to 0.08 or less (see Table S1, Online Resource 4, which shows patient demographic data before and after population matching). As shown in Online Resource 5 (Fig. S2), RG improved 3yOS (RG, 97.1% vs. LG, 89.2%; HR 0.28 [0.13, 0.59]; p < 0.001) and 3yRFS (RG, 94.2% vs. LG, 86.7%; HR 0.38 [0.21, 0.70]; p = 0.002). Univariate analyses showed that treatment using RG, tumor size, clinical tumor stage, and extent of lymph node dissection were positive or negative risk factors for 3yOS and 3yRFS. Multivariate analyses using these risk factors revealed that treatment using RG was the only factor associated with 3yOS and 3yRFS (see Table S2 and Table S3, Online Resources 6 and 7). The postoperative outcomes and surgical outcomes are summarized in Online Resources 8 and 9 (Table S4 and Table S5), respectively.
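The matching procedure behind this sensitivity analysis (greedy 1:1 nearest neighbor matching without replacement within a caliper on the logit of the propensity score) can be sketched as follows. This is a minimal Python illustration, not the SAS procedure used in the study. The function names are hypothetical, and two details are assumptions: the caliper is taken as 0.2 times the standard deviation of the logit across both groups, and treated patients are processed in input order.

```python
import math
from statistics import stdev
from typing import List, Sequence, Tuple


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def greedy_caliper_match(ps_treated: Sequence[float], ps_control: Sequence[float],
                         caliper_sd: float = 0.2) -> List[Tuple[int, int]]:
    """Return (treated_index, control_index) pairs matched 1:1 without replacement."""
    lt = [logit(p) for p in ps_treated]
    lc = [logit(p) for p in ps_control]
    # Assumption: caliper based on the SD of the logit over both groups combined.
    caliper = caliper_sd * stdev(lt + lc)
    available = list(range(len(lc)))  # controls not yet used
    pairs: List[Tuple[int, int]] = []
    for i, li in enumerate(lt):
        if not available:
            break
        # Nearest remaining control on the logit scale.
        j = min(available, key=lambda k: abs(lc[k] - li))
        if abs(lc[j] - li) <= caliper:
            pairs.append((i, j))
            available.remove(j)  # without replacement
    return pairs


# Toy propensity scores (invented for illustration).
pairs = greedy_caliper_match([0.5, 0.9], [0.55, 0.1, 0.48])
```

In the toy example the first treated patient is paired with the closest control, whereas the second treated patient has no control within the caliper and is left unmatched, which is how matching can reduce the sample (here, 326 RG patients yielded 311 matched pairs).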

Discussion

This study was conducted to determine the 3-year outcomes of RG for the treatment of gastric cancer. We expanded on our previous single-arm study (UMIN000015388) [7] and retrospectively confirmed our hypothesis that RG improves overall survival more than LG. Considering these outcomes, the Japanese Ministry of Health, Labour, and Welfare decided to increase the medical remuneration points for RG starting from April 2022. This study yielded three major findings.

First, the 3-year safety of RG was demonstrated. In the LG group, 3yOS (overall, 89.6%; pStage IA, 94.4%; pStage ≥ IB, 80.8%) and 3yRFS (overall, 87.2%; pStage IA, 93.7%; pStage ≥ IB, 75.3%) were comparable with those reported in previous studies conducted in high-volume centers in East Asia, considering that approximately a quarter of the patients enrolled in this study underwent total or proximal, but not distal, gastrectomy; had pStage ≥ II disease; and had lymph node metastasis [3,4,5,6, 22, 28, 29]. Short-term postoperative outcomes, including in-hospital mortality (0.3%) and morbidity (5.0%), were better than those reported in previous studies [18]. RG further improved 3yOS (overall, 96.3%; pStage IA, 99.7%; pStage ≥ IB, 90.9%) and 3yRFS (overall, 92.3%; pStage IA, 99.7%; pStage ≥ IB, 80.6%), as well as surgical and short-term outcomes including blood loss, duration of postoperative hospital stay, and some postoperative complications. Recurrence rates and patterns were similar between RG and LG. These data collectively suggest the surgical and oncological safety of RG.

Second, a survival benefit of RG was identified in the present study, as in a previous single-center retrospective study performed in Japan [23], although most previous reports failed to demonstrate a prognostic benefit of RG over LG [20, 30, 31]. This may be at least partly because RG reduces some postoperative complications. Various reports have shown that severe postoperative morbidities are associated with impaired long-term prognosis [32]. Better surgical margins and more radical lymph node dissection, which may be achieved with RG [33], are less likely to have contributed to better survival in the RG group because the survival benefit was more remarkable in patients with earlier-stage disease. It is plausible that the magnified vivid surgical view and the improved range of motion brought about by the da Vinci® Surgical System enable gentler tumor resection along the dissectable layers. This might reduce the intra- and postoperative dissemination of circulating tumor cells, decrease systemic inflammatory responses, and lead to better recovery and prognosis with a smaller chance of tumor recurrence [9, 10, 14, 15, 34]. Further research is required to examine the mechanisms through which RG improves survival, as well as to determine if RG is truly less invasive than LG.

Third, RG extended overall survival to a greater extent than recurrence-free survival and reduced deaths from other diseases rather than gastric cancer-related deaths. A possible explanation is that patients who underwent RG may have been in better physical condition, such that they were less likely to be affected by other diseases and, even if cancer recurred, were able to start chemotherapy sooner and tolerate it better. Additionally, the following biases might have affected the outcomes: First, chronological bias may be present because patients in the RG group received treatment for gastric cancer 5 years later than those in the LG group. We did not include patients who underwent preoperative chemotherapy, which is not recognized as a standard treatment option in the Japanese guidelines [16]. However, patients with pStage ≥ II disease generally received S-1-based adjuvant chemotherapy, whereas those with recurrent disease received palliative chemotherapy when applicable, in accordance with the Japanese Gastric Cancer Treatment Guidelines [16]. The outcomes of palliative chemotherapy, which can affect overall survival but not recurrence-free survival, may have considerably improved over time during the study period [35]. However, this impact should be minimal because the effectiveness of RG was the greatest in patients with pStage IA disease, who have little chance of recurrence and are treated with surgery alone unless tumor recurrence occurs [16]. In addition, perioperative interventions to prevent postoperative complications, including smoking cessation, oral hygiene, early ambulation, and physical and nutritional therapy, were mostly unchanged during the study period.
Second, selection bias due to differences in socioeconomic status between the groups may not have been fully eliminated because each patient who underwent RG needed to pay approximately 700,000 JPY, even when using the “Senshiniryo B” system, in addition to the 500,000 JPY reimbursement from Intuitive Surgical, Inc., whereas the use of insured LG involved a cost of only approximately 100,000 JPY per patient [7]. However, patients in both groups received the same postoperative management and cancer follow-up under the Japanese universal health insurance system, where socioeconomic status is less likely to systematically influence the treatment decision for intervention [36]. Third, the RG group was derived from the population of a prospective study, in which patients with good health conditions and physiological status might have been selected. To mitigate the influence of such a bias, we balanced the patient demographic data using inverse probability of treatment weighting because it can be used to estimate HRs with negligible bias when assessing survival outcomes as the treatment effect in the entire population (treated and untreated individuals, average treatment effect), but not in treated individuals (average treatment effect on the treated), without reducing the sample size [17]. However, when using the inverse probability of treatment weighting method, it should be noted that individuals with extremely large weights may disproportionately influence results and yield estimates with high variance [17]. In the present study, we examined several models for propensity score calculation, including weight censoring, and selected the optimal weights. Moreover, sensitivity analyses using propensity score matching, which determines the average treatment effect on the treated, confirmed a similar trend, indicating the robustness of the results.

The present study has some limitations. First, this was a retrospective study conducted using propensity score-based analyses, and we were unable to discuss unmeasured outcomes. Second, this study was conducted in high-volume institutions, and more than half of the RGs and LGs were performed by high-volume surgeons [19]. Therefore, it may be difficult to extrapolate these outcomes to real-world settings. Third, the cost-effectiveness of RG was not examined in this study. Further studies are warranted to determine whether the improved prognosis achieved with RG is worth its higher costs. Fourth, most patients in this study had cStage I disease; thus, it may be challenging to extrapolate the findings of this study to Western populations. Fifth, medical and surgical costs were examined using data from 325 RGs and 529 LGs, rather than from the full analysis set, largely because the cost data of patients who underwent LG at Saga University had not been retained.

In conclusion, this study showed the surgical and oncological safety of RG in terms of 3-year outcomes, compared with those of LG. A multicenter randomized controlled trial is warranted to determine if the advantageous 3-year outcomes of RG over LG revealed in this study are reproducible. We believe that the skills required to fully operate a robot based on the appropriate surgical concept could play a key role in enhancing the clinical benefits of RG.