Introduction

Randomized controlled trials (RCTs) are generally accepted as the gold standard for evaluating the efficacy of therapeutic interventions. In RCTs, random assignment of participants to treatment and control groups virtually eliminates distortion of results due to differences in patient characteristics between study groups. However, in most surgical studies, randomization is difficult for ethical and practical reasons [1]. In addition, RCTs are costly and inefficient because they require many resources, including subjects, time, and the cooperation of diverse experts, to estimate treatment effects with sufficient accuracy [2]. Therefore, as a practical alternative, many observational studies have been performed in actual clinical settings.

Observational studies are susceptible to biases such as confounding, selection, and differential ascertainment bias because they lack randomization and other elements of RCT design [3]. Some reports have suggested that both randomized and observational studies may produce very similar results [4, 5], while others have reported conflicting results [6]. However, the topics covered in these previous reports are very limited, and more empirical and quantitative evidence is needed to clarify the accuracy of and differences in each study design [7]. In recent years, case-matched studies have been frequently conducted in surgical research for appropriate confounder adjustment in observational studies, and the most common technique is propensity score matching [1]. The propensity score, proposed as a potential solution to the problem of confounding associations between treatment and outcome, represents the probability of being treated with an intervention based on variables measured during or before treatment [8]. Although there are methodological differences between case-matched studies and RCTs, such as patient selection and adjustment for confounders [9, 10], only one report, concerning rectal cancer, has investigated the similarities and differences between different study designs in the field of gastrointestinal surgery [2]. Therefore, it is still unclear what differences there are between RCTs, case-matched studies, and cohort studies in other gastrointestinal surgeries, and clarifying them is a very important clinical issue.
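To make the definition of the propensity score above concrete, the following is a minimal, illustrative sketch of how such a score might be estimated and then used for 1:1 matching. It does not reproduce the procedure of any study cited here: the two standardized covariates, the treatment labels, the learning rate, and the caliper of 0.2 are all hypothetical, and the function names (`fit_propensity`, `propensity`, `greedy_match`) are introduced only for illustration.

```python
import math

def fit_propensity(X, treated, lr=0.1, epochs=2000):
    """Fit a logistic model P(treated = 1 | x) by plain gradient ascent."""
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)  # intercept followed by one weight per covariate
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, ti in zip(X, treated):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            resid = ti - 1.0 / (1.0 + math.exp(-z))  # observed minus predicted
            grad[0] += resid
            for j, xj in enumerate(xi):
                grad[j + 1] += resid * xj
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    return w

def propensity(w, x):
    """Predicted probability of receiving the treatment for covariates x."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    return 1.0 / (1.0 + math.exp(-z))

def greedy_match(scores, treated, caliper=0.2):
    """1:1 nearest-neighbour matching on the score, without replacement."""
    controls = {i for i, t in enumerate(treated) if t == 0}
    pairs = []
    for i in (i for i, t in enumerate(treated) if t == 1):
        if not controls:
            break
        j = min(controls, key=lambda c: abs(scores[c] - scores[i]))
        if abs(scores[j] - scores[i]) <= caliper:  # discard poor matches
            pairs.append((i, j))
            controls.remove(j)
    return pairs

# Hypothetical standardized covariates (e.g. age, tumor size); 1 = treated.
X = [(-1.2, -0.5), (-0.8, 0.3), (-0.3, -1.0), (0.0, 0.4), (0.2, -0.2),
     (0.5, 1.1), (0.9, 0.0), (1.3, 0.8), (-0.5, 0.9), (0.7, -0.7)]
treated = [1, 1, 0, 1, 0, 0, 1, 0, 0, 1]

weights = fit_propensity(X, treated)
scores = [propensity(weights, x) for x in X]
pairs = greedy_match(scores, treated)
print(pairs)  # (treated index, matched control index) pairs
```

Treated patients without a control inside the caliper are left unmatched, which is one way in which patient selection in case-matched studies can differ from that in RCTs.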

Thus, the purpose of this study was to compare the estimated treatment effects among RCTs, case-matched studies, and cohort studies in the field of upper gastrointestinal surgery. As a clinical topic, we selected the comparison of laparoscopic distal gastrectomy (LDG) versus open distal gastrectomy (ODG) for advanced gastric cancer (AGC), which is one of the most widely discussed issues among gastrointestinal surgeons. While several meta-analyses have evaluated the efficacy of LDG in AGC [11,12,13,14], none of them focused on the differences in study design.

Therefore, in the present study, we evaluated the differences in study designs by addressing this clinical topic for which sufficient evidence has been accumulated.

Materials and methods

We performed a systematic review and meta-analysis in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [15].

Literature search strategy

We searched the PubMed, Cochrane Central Register of Controlled Trials, and Web of Science databases for studies in which LDG was compared with ODG for AGC published from inception until July 2021. The search terms used were “laparoscopy” OR “laparoscopic” AND “stomach neoplasms” OR “gastric cancer” OR “stomach cancer” AND “open gastrectomy” AND “distal gastrectomy” (Appendix S1). The reference lists of all relevant articles were evaluated to identify other related papers. The study title, study authors, year of publication, and study characteristics were checked, and duplicates were removed. Two authors (R.O. and Y.M.) independently reviewed the title and abstract of articles after eliminating duplicates. The same authors then evaluated the full text according to the study eligibility criteria described below. In cases of disagreement, the authors discussed or consulted a third author until agreement was reached.

Eligibility

The inclusion criteria were as follows: (1) RCTs, case-matched studies, or cohort studies; (2) studies that compared LDG versus ODG for AGC; (3) studies that provided available outcome data; and (4) articles written in English.

The exclusion criteria were as follows: (1) studies without appropriate data; (2) laboratory or animal studies; and (3) papers identified as letters, comments, correspondence, editorials, or reviews.

Data extraction and outcome parameters

Two authors (R.O. and Y.M.) collected the data independently. The following data were extracted: population characteristics (year of publication, study design, country in which the study was performed, number of patients), short-term outcome parameters (operative time, intraoperative blood loss, postoperative hospital stay, retrieved lymph nodes, postoperative complications), and long-term outcome parameters (recurrence, 3-year disease-free survival (DFS), 3-year overall survival (OS)). The collected data were double-checked by each author, and any discrepancies were resolved by rechecking and discussion.

Assessment of study quality and risk of bias

RCTs were assessed using the revised Cochrane risk-of-bias tool [16]. For observational studies, the Newcastle–Ottawa quality assessment scale (NOS) was used to assess the quality of the included studies [17]. The score ranges from 0 to 9 stars, and studies with a score of ≥ 6 were considered to be of high quality. For each outcome, a funnel plot was used to examine publication bias among the included studies.

Statistical analyses

All statistical analyses were carried out using Review Manager version 5.3 software (The Cochrane Collaboration, Oxford, UK). A random-effects model was used. Heterogeneity was assessed using the I² statistic. The odds ratio (OR) with the corresponding 95% confidence interval (CI) was calculated for categorical variables, and the mean difference (MD) with the corresponding CI was calculated for continuous variables. The mean and standard deviation (SD) were estimated from the median, the range, and the sample size using the method of Hozo et al. [18]. Survival outcomes were analyzed according to the pooled hazard ratio (HR) and 95% CI. If the HR was not provided directly, an estimated HR was calculated from Kaplan–Meier curves according to the method of Tierney et al. [19]. A P value of < 0.05 was considered statistically significant.
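The two main computational steps described above can be sketched as follows. This is an illustrative approximation rather than the code used in this study, which relied on Review Manager. It assumes the DerSimonian and Laird estimator of between-study variance (the text does not state which estimator was applied) and the sample-size cut-offs commonly attributed to Hozo et al. The function names and the input values in the usage lines are hypothetical.

```python
import math

def hozo_mean_sd(median, low, high, n):
    """Approximate mean and SD from the median, range, and sample size."""
    # Mean: weighted formula for small samples, the median itself otherwise.
    mean = (low + 2 * median + high) / 4 if n <= 25 else median
    spread = high - low
    if n <= 15:
        sd = math.sqrt(((low - 2 * median + high) ** 2 / 4 + spread ** 2) / 12)
    elif n <= 70:
        sd = spread / 4
    else:
        sd = spread / 6
    return mean, sd

def dersimonian_laird(effects, variances, z=1.959964):
    """Random-effects pooling with Cochran's Q, tau^2, and I^2 (in percent)."""
    k = len(effects)
    w = [1.0 / v for v in variances]              # inverse-variance weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                 # between-study variance
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return {"pooled": pooled, "ci": (pooled - z * se, pooled + z * se),
            "tau2": tau2, "i2": i2}

# Hypothetical inputs: one study reporting median 10 (range 4 to 20), n = 10,
# and two mean differences with unit variance.
mean_est, sd_est = hozo_mean_sd(10, 4, 20, 10)
res = dersimonian_laird([0.0, 10.0], [1.0, 1.0])
print(mean_est, sd_est, res)
```

For ratio measures such as the OR and HR, the same pooling would be applied to the logarithm of each estimate, with the result exponentiated afterwards.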

Results

Study characteristics

The comprehensive electronic literature search detected 1385 articles. In total, 392 articles were removed due to duplication. According to the eligibility criteria, 849 articles were excluded by title/abstract screening. The remaining 144 articles were evaluated by full-text review. Ultimately, 23 studies with 13698 patients were included (Fig. 1) [20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42]. Although two RCTs were from the same trial (CLASS-01 trial, NCT01609309) [31, 38], one reported the short-term outcomes [31] and the other was a follow-up that reported the long-term outcomes [38], so both were included in this study to analyze the results of each. The included studies were 5 RCTs, 8 case-matched studies, and 10 cohort studies. The characteristics of the included studies are summarized in Table 1.

Fig. 1 Flow diagram of study selection

Table 1 Study characteristics

The study quality and risk of bias

The risk of bias assessed using the revised Cochrane risk-of-bias tool is shown in Table 2. For the overall risk-of-bias judgement, all included RCTs were rated as having a low risk of bias. The quality of the included observational studies was assessed using the NOS, and all studies were graded as high quality (Table 3). In addition, we conducted a funnel plot analysis to assess the possibility of publication bias (Fig. 2). The spread of the effect sizes in the funnel plots was more pronounced in observational studies than in RCTs.

Table 2 Quality assessment of the included RCTs based on the revised Cochrane risk-of-bias tool
Table 3 Quality assessment of observational studies
Fig. 2 Funnel plot of publication bias. a Operative time. b Intraoperative blood loss. c Postoperative hospital stay. d Retrieved lymph nodes. e Postoperative complications. f Recurrence. g The 3-year disease-free survival. h The 3-year overall survival

Short-term outcomes

Operative time

A total of 21 studies with 6222 patients (4 RCTs with 2651 patients, 7 case-matched studies with 1792 patients, and 10 cohort studies with 1779 patients) reported operative time (Table 4). The meta-analysis showed that the operative time of the LDG group was significantly longer than that in the ODG group in RCTs (MD: 49.2, 95% CI: 29.38 to 69.02, P < 0.00001), case-matched studies (MD: 32.25, 95% CI: 15.2 to 55.3, P = 0.0006), and cohort studies (MD: 47.85, 95% CI: 29.37 to 66.33, P < 0.00001) (Fig. 3).

Table 4 Summary of meta-analysis
Fig. 3 Results of the meta-analysis of operative time stratified by study design

Intraoperative blood loss

In total, 17 studies with 4831 patients (3 RCTs with 1562 patients, 5 case-matched studies with 1612 patients, and 9 cohort studies with 1657 patients) reported intraoperative blood loss (Table 4). The LDG group showed significantly less intraoperative blood loss than the ODG group in RCTs (MD: − 35.91, 95% CI: − 67.54 to − 4.28, P = 0.03), case-matched studies (MD: − 44.89, 95% CI: − 64.65 to − 25.12, P < 0.00001), and cohort studies (MD: − 179.3, 95% CI: − 235.81 to − 122.8, P < 0.00001) (Fig. 4).

Fig. 4 Results of the meta-analysis of intraoperative blood loss stratified by study design

Postoperative hospital stay

Twenty-one studies with 6222 patients (4 RCTs with 2651 patients, 7 case-matched studies with 1792 patients, and 10 cohort studies with 1779 patients) reported postoperative hospital stay (Table 4). The LDG group had a significantly shorter postoperative hospital stay than the ODG group in RCTs (MD: − 0.73, 95% CI: − 1.28 to − 0.19, P = 0.009), case-matched studies (MD: − 2.49, 95% CI: − 3.84 to − 1.13, P = 0.0003), and cohort studies (MD: − 2.75, 95% CI: − 4.1 to − 1.41, P < 0.00001) (Fig. 5).

Fig. 5 Results of the meta-analysis of postoperative hospital stay stratified by study design

Number of retrieved lymph nodes

A total of 21 studies with 6222 patients (4 RCTs with 2651 patients, 7 case-matched studies with 1792 patients, and 10 cohort studies with 1779 patients) reported the number of retrieved lymph nodes (Table 4). The number of retrieved lymph nodes was significantly larger in the ODG group than in the LDG group in RCTs (MD: − 1.19, 95% CI: − 2.23 to − 0.04, P = 0.04). In contrast, there were no significant differences between the groups in case-matched studies (MD: − 0.14, 95% CI: − 1.63 to 1.35, P = 0.85) and cohort studies (MD: 0.21, 95% CI: − 2.16 to 2.58, P = 0.86) (Fig. 6).

Fig. 6 Results of the meta-analysis of retrieved lymph nodes stratified by study design

Postoperative complications

Twenty-two studies with 13698 patients (4 RCTs with 2651 patients, 8 case-matched studies with 9268 patients, and 10 cohort studies with 1779 patients) reported the incidence of postoperative complications (Table 4). There were no significant differences between the two groups in RCTs (OR: 0.82, 95% CI: 0.56 to 1.20, P = 0.30). Conversely, the LDG group had a significantly lower incidence of postoperative complications than the ODG group in case-matched studies (OR: 0.84, 95% CI: 0.74 to 0.95, P = 0.006) and cohort studies (OR: 0.60, 95% CI: 0.44 to 0.84, P = 0.002) (Fig. 7).

Fig. 7 Results of the meta-analysis of postoperative complications stratified by study design

Long-term outcomes

Recurrence

In total, 10 studies with 4525 patients (2 RCTs with 2013 patients, 2 case-matched studies with 1302 patients, and 6 cohort studies with 1210 patients) reported the incidence of recurrence (Table 4). There were no significant differences between the two groups in RCTs (OR: 1.10, 95% CI: 0.87 to 1.39, P = 0.45), case-matched studies (OR: 1.04, 95% CI: 0.82 to 1.32, P = 0.76), or cohort studies (OR: 0.85, 95% CI: 0.66 to 1.09, P = 0.21) (Fig. 8).

Fig. 8 Results of the meta-analysis of recurrence stratified by study design

The 3-year DFS

Six studies with 3631 patients (3 RCTs with 2209 patients and 3 case-matched studies with 1422 patients) reported the 3-year DFS (Table 4). There were no significant differences between the two groups in RCTs (HR: 1.07, 95% CI: 0.88 to 1.31, P = 0.51) or case-matched studies (HR: 0.83, 95% CI: 0.53 to 1.3, P = 0.42) (Fig. 9). However, the point estimate for 3-year DFS was more favorable to LDG in case-matched studies than in RCTs.

Fig. 9 Results of the meta-analysis of the 3-year disease-free survival stratified by study design

The 3-year OS

A total of 7 studies with 3565 patients (2 RCTs with 2013 patients and 5 case-matched studies with 1552 patients) reported the 3-year OS (Table 4). There were no significant differences between the two groups in RCTs (HR: 1.11, 95% CI: 0.87 to 1.43, P = 0.40) or case-matched studies (HR: 0.68, 95% CI: 0.38 to 1.24, P = 0.21) (Fig. 10). However, the point estimate for 3-year OS was more favorable to LDG in case-matched studies than in RCTs.

Fig. 10 Results of the meta-analysis of the 3-year overall survival stratified by study design

Discussion

In this study, we performed a meta-analysis of 23 studies covering 5 short-term outcomes and 3 long-term outcomes. The estimated treatment effects did not differ between RCTs and case-matched studies for any outcome except the number of retrieved lymph nodes and postoperative complications. For all analyzable items, the results of cohort studies were similar to those of case-matched studies. In terms of short-term outcomes, both RCTs and case-matched studies found a significantly longer operative time, less intraoperative blood loss, and a shorter postoperative hospital stay in LDG than in ODG. Postoperative complications were significantly less frequent with LDG in case-matched studies but not in RCTs. However, given the distribution of the 95% CIs for postoperative complications, we considered the estimated treatment effects in the two study designs to be comparable. Regarding long-term outcomes, although LDG had relatively better 3-year DFS and 3-year OS in case-matched studies, there was no significant difference between LDG and ODG in either RCTs or case-matched studies. Thus, the findings of RCTs and case-matched studies were similar for almost all outcomes.
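The comparability judgement for postoperative complications can be illustrated numerically. The sketch below is not an analysis reported in this study. It assumes that the two pooled ORs are independent and that their 95% CIs are symmetric on the log scale, and it back-calculates each standard error from the reported CI before forming a ratio of ORs. The function names are hypothetical; the input values are the pooled estimates reported in the Results.

```python
import math

Z95 = 1.959964  # two-sided 95% quantile of the standard normal distribution

def log_se_from_ci(lower, upper):
    """Standard error of a log ratio, recovered from its 95% CI limits."""
    return (math.log(upper) - math.log(lower)) / (2 * Z95)

def ratio_of_ratios(est_a, ci_a, est_b, ci_b):
    """Compare two independent ratio estimates (OR or HR) on the log scale."""
    diff = math.log(est_a) - math.log(est_b)
    se = math.hypot(log_se_from_ci(*ci_a), log_se_from_ci(*ci_b))
    return {
        "ratio": math.exp(diff),
        "ci": (math.exp(diff - Z95 * se), math.exp(diff + Z95 * se)),
        "z": diff / se,
    }

# Postoperative complications: RCTs versus case-matched studies.
result = ratio_of_ratios(0.82, (0.56, 1.20), 0.84, (0.74, 0.95))
print(result)
```

Under these assumptions, a ratio of ORs close to 1 with a CI that includes 1 is compatible with the two designs estimating a similar effect, even though only one of the two pooled estimates reached statistical significance.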

The estimated treatment effects of LDG in case-matched studies were intermediate between those in RCTs and those in cohort studies in terms of intraoperative blood loss, postoperative hospital stay, retrieved lymph nodes, and recurrence. Randomization in RCTs balances all confounders (including unknown ones), whereas propensity score matching, a typical case-matching method, has been shown to potentially miss some confounders [43]. Therefore, such differences in estimated treatment effects among study designs may be due to the differing degrees of adjustment for covariates in each study design. The amount of intraoperative blood loss, hospitalization period, number of retrieved lymph nodes, and recurrence are outcomes that can be objectively assessed from medical records, surgical records, pathology reports, and imaging findings. Hence, differences in study design may affect even objective endpoints. In addition, it has been reported that observational studies such as case-matched studies and cohort studies may overestimate treatment effects [44]. This tendency toward overestimation in observational studies, which was also observed in the present study of upper gastrointestinal surgery, is consistent with the results of a study conducted in the lower gastrointestinal surgery field [2]. The present review therefore adds to the accumulating evidence regarding differences in treatment effects among study designs in gastrointestinal surgery. Possible causes of overestimation in observational studies include missing data, possible crossover, publication bias, selective reporting of results, selection bias, outcome ascertainment bias, immortal-time bias, and residual confounding [43, 44]. Therefore, although observational studies are a very useful research tool in real clinical practice, the design and actual implementation of the analysis must be critically evaluated on a case-by-case basis in order to assess the true magnitude of treatment effects.

The strengths of this study are its novelty, given the absence of similar studies in the field of upper gastrointestinal surgery, and the inclusion of a relatively large number of studies to assess the differences among RCTs, case-matched studies, and cohort studies. However, the present study has several limitations. First, the cohort studies did not report the long-term outcome data needed to calculate HRs for the meta-analysis; such data would have allowed us to examine in more detail the differences in long-term outcomes among the study designs. Second, only published studies were included in the present study, which made it difficult to eliminate potential publication bias. Finally, most of the studies included in this review were conducted in East Asia. Therefore, more extensive studies should be conducted in other countries and regions to improve the quality of the research and to find general trends in differences among study designs.

In recent years, observational data representative of clinical practice have become available from nationwide clinical databases, such as the National Clinical Database (NCD) and the National Database of Health Insurance Claims and Specific Health Checkups of Japan (NDB) [45, 46]. Given this background, case-matched studies using modern design methods such as propensity score matching will become increasingly important in the future because they are an efficient way to evaluate the effects of interventions in typical clinical settings [1]. In addition, the results of properly conducted and analyzed observational studies are expected to help prioritize research needs that should be addressed in more resource-intensive RCTs. Therefore, this study, which compares the estimated treatment effects of RCTs and observational studies, has important implications for clinical practice and future research.

Conclusion

Our analysis indicated that the estimated treatment effects of LDG for AGC in case-matched studies were almost the same as those in RCTs. However, to assess the true magnitude of the treatment effect, the design and actual implementation of the analysis must be critically evaluated.