Early-stage gastric cancer (EGC) is defined as gastric carcinoma limited to the mucosa or submucosa, regardless of lymph node metastasis [1]. Patients with submucosal tumor invasion of less than 500 μm (SM1) are usually free from lymph node metastases. Endoscopic resection has therefore been widely accepted as the optimal treatment for mucosal and SM1 gastric cancers, because it is minimally invasive and its cure rate is comparable to that of surgery, whereas gastrectomy with lymph node dissection is usually recommended for patients with more deeply invasive EGCs. Overestimating the invasion depth of an EGC leads to unnecessary gastrectomy, while underestimating it may necessitate additional surgery after endoscopic submucosal dissection (ESD). Accurate estimation of the invasion depth before resection is therefore crucial for therapeutic decision making in EGC.

According to the Japanese classification of EGCs, the invasion depth can be estimated from macroscopic characteristics under conventional endoscopy, such as an uneven surface, marked marginal elevation, and abrupt cutting or fusion of converging folds, which have been reported to be associated with submucosal invasion [2, 3]. These empirical criteria, though widely used among Japanese endoscopists, are complicated, non-systematic, and inconsistently applied. For instance, the interobserver agreement for these endoscopic characteristics was only 0.54–0.60 [4,5,6], even among experienced reviewers [3]. Furthermore, the accuracy of endoscopic prediction for EGCs was unsatisfactory, reported as 73.7–83.6%, with the depth of invasion overestimated in 15–19.3% of EGCs and underestimated in 6.3–11.3% [7,8,9]. A systematic evaluation of the value and reproducibility of these endoscopic findings for deep invasion is needed.

In contrast, endoscopic ultrasonography (EUS), an effective tool for visualizing the tomographic structure of the gastric wall, can facilitate objective assessment of the invasion depth of EGCs and is recommended by the National Comprehensive Cancer Network (NCCN) guidelines for gastric cancer staging [10, 11]. However, when used alone, its accuracy for tumor staging in EGCs was only 70–76%, with rates of overestimation of the invasion depth as high as 18–42% [4, 8, 12].

We hypothesized that joint visualization by conventional endoscopy and EUS from different perspectives might improve the accuracy of invasion depth estimation in EGCs. Therefore, in this study we developed and validated a prediction model combining characteristics from both conventional endoscopy and EUS to facilitate invasion depth estimation and therapeutic decision making for EGCs.

Materials and methods

Study population

A total of 38,227 individuals underwent endoscopic biopsy at the Peking Union Medical College Hospital (PUMCH) from January 2006 to December 2015, of whom 3295 were diagnosed pathologically with gastric cancer in the biopsied specimens. We reviewed the endoscopic images of these patients and identified 691 patients as presenting with superficial lesions. Among them, only 421 patients underwent resection at PUMCH. Our institution followed the Japanese Gastric Cancer Treatment Guidelines [13]: endoscopic mucosal resection (EMR) or ESD is indicated for a differentiated-type adenocarcinoma without ulcerative findings [UL(−)], for which the depth of invasion is clinically diagnosed as T1a and the diameter is ≤2 cm. The expanded indications for ESD include the following tumors clinically diagnosed as T1a: (a) differentiated-type, UL(−), and >2 cm in diameter; (b) differentiated-type, UL(+), and ≤3 cm in diameter; (c) undifferentiated-type, UL(−), and ≤2 cm in diameter. Lesions beyond these indications were treated surgically.
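The indication criteria above can be read as a simple decision rule. The following is a minimal sketch of that rule only; the names `Lesion` and `resection_indication` are hypothetical, and the sketch is an illustration of the text, not the institution's actual workflow or a clinical tool.

```python
from dataclasses import dataclass


@dataclass
class Lesion:
    """Hypothetical container for the four criteria named in the text."""
    differentiated: bool   # True for differentiated-type histology
    ulcerated: bool        # True for UL(+), False for UL(-)
    clinical_t1a: bool     # depth of invasion clinically diagnosed as T1a
    diameter_cm: float     # lesion diameter in centimetres


def resection_indication(lesion: Lesion) -> str:
    """Return 'absolute', 'expanded', or 'surgery' per the criteria in the text."""
    if not lesion.clinical_t1a:
        return "surgery"
    if lesion.differentiated and not lesion.ulcerated:
        # <=2 cm is the absolute indication; >2 cm is expanded indication (a)
        return "absolute" if lesion.diameter_cm <= 2 else "expanded"
    if lesion.differentiated and lesion.ulcerated and lesion.diameter_cm <= 3:
        return "expanded"  # expanded indication (b)
    if not lesion.differentiated and not lesion.ulcerated and lesion.diameter_cm <= 2:
        return "expanded"  # expanded indication (c)
    return "surgery"


print(resection_indication(Lesion(True, False, True, 1.5)))   # absolute
print(resection_indication(Lesion(True, True, True, 3.5)))    # surgery
```

Note that the rule depends on the clinically estimated depth, which is why the accuracy of depth estimation drives the choice between endoscopic resection and surgery.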

Of these 421 patients, 153 subsequently underwent endoscopic submucosal dissection (ESD) and 268 underwent surgery. A total of 226 patients were excluded from the study because they either did not undergo EUS before ESD or surgery (n = 208) or had ambiguous pathological conclusions (n = 18). A total of 195 patients with 205 lesions were finally included as subjects of the study (Supplementary Figure 1). The study protocol was approved by the Institutional Ethics Committee of the PUMCH.

Endoscopy

All lesions were observed by conventional endoscopy (video endoscope Q260 or H260, Olympus Medical Systems, Tokyo, Japan) followed by EUS. EUS was performed by five endoscopists with more than 2 years of EUS experience using a radial array echoendoscope (GM-240, 7.5 MHz, Olympus, Tokyo, Japan) or a miniature ultrasonic probe (UM-DP12-25R; 20 MHz, Olympus) connected to an endoscopic ultrasonic observation unit (UM2000; Olympus, EUM-1, Aloka α5). De-aerated water was instilled to improve transmission of the ultrasound beam. During the procedures, a mini-probe was used to observe lesions on most occasions (n = 162), but a radial echoendoscope was used for certain lesion locations or clinical conditions, with endosonographic staging starting from the default frequency of 7.5 MHz until satisfactory imaging was obtained.

Endoscopic imaging review

The superficial gastric cancers were classified macroscopically according to the Japanese classification [1]. We grouped the macroscopic types into three categories: elevated type (I, IIa, and IIa + IIc without ulceration), flat type (IIb), and depressed type (IIc, III, IIc + IIa, and IIa + IIc with ulceration). Ten other endoscopic features were also selected from the literature as signs of deeper invasion [2, 3, 7,8,9, 14] to be investigated in the study (Supplementary Figure 2). Ulceration was defined as a lesion with active ulceration. Remarkable redness was defined as a reddish area similar to regenerative epithelium (Fig. 1A). Uneven surface was defined as nodules on the tumor surface. Margin elevation was defined as a subepithelial tumor-like protrusion without flexibility. Enlarged folds included thickened or merged convergent folds. Abrupt cutting of converging folds referred to sudden interruption of the converging folds (Fig. 1B). Fusion of convergent folds referred to melting of the convergent fold ends. Unclear demarcation was defined as the absence of a margin distinguishing the lesion from the background mucosa. Spontaneous bleeding was defined as hemorrhage from the lesion without contact. Stiffness and deformation of the gastric wall were defined as poor extension of the gastric wall under full air inflation or distortion of the gastric wall. In EUS staging, lesions confined to the first and/or second sonographic layers were considered mucosal tumors, and SM2 or deeper invasion (≥SM2) was defined as obvious irregular narrowing, massive invasion with a hypoechoic mass, or budding or interruption into the third layer or beyond (Fig. 1C). Lesions with a mildly unsmooth surface of ambiguous invasion into the third layer were defined as SM1 invasion [3].

Fig. 1 Typical features of conventional endoscopy and EUS for assessing the invasion of superficial gastric cancers. A Remarkable redness; B Abrupt cutting of converging folds; C Hypoechoic mass and budding and interruption of the sonographic layer 4; D Location in the upper third of the stomach

We de-identified and digitally coded all the imaging files. Two endoscopists, one senior with 10 years of experience in EUS (XW) and one junior with 3 years of experience in EUS (QWJ), reviewed the files independently. They evaluated the endoscopic signs and determined the endosonographic invasion depth for each case. To reduce contamination between the reading of conventional endoscopic images and EUS images, the endoscopists were required to review the EUS images first and the conventional endoscopic images at least 1 week later. When the two endoscopists disagreed on an image, the discrepancy was reviewed and discussed until a final consensus was reached. Two other endoscopists, one senior with more than 10 years of experience in EUS (FY) and one junior with 2 years of experience in EUS (YLF), were asked to review the imaging files independently according to the same protocol for the assessment of interobserver agreement. They were trained with representative images before reviewing.

Interobserver agreement

Interobserver agreement for the candidate variables of the model was assessed with the Kappa statistic [15] by comparing the readings of each of the two additional endoscopists (FY and YLF) against the consensus reached by the first two endoscopists (XW and QWJ). Agreement was graded as poor (κ = 0.00–0.20), fair (κ = 0.21–0.40), moderate (κ = 0.41–0.60), good (κ = 0.61–0.80), and excellent (κ = 0.81–1.00) based on the Kappa (κ) values [15].
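As an illustration of the agreement measure, the sketch below computes Cohen's kappa for one binary feature and maps it to the grades listed above. The function names and the ten example readings are hypothetical, and treating negative or in-between values (for example 0.205) by simple thresholds is an assumption, since the grading bands in the text do not cover them.

```python
from collections import Counter


def cohen_kappa(reader, consensus):
    """Cohen's kappa between one reader and the reference (consensus) reading."""
    if len(reader) != len(consensus) or not reader:
        raise ValueError("readings must be non-empty and of equal length")
    n = len(reader)
    observed = sum(a == b for a, b in zip(reader, consensus)) / n
    freq_reader, freq_consensus = Counter(reader), Counter(consensus)
    # Chance agreement: product of the marginal proportions, summed over categories
    expected = sum(freq_reader[c] * freq_consensus[c] for c in freq_reader) / (n * n)
    if expected == 1.0:
        return 1.0  # both readers used a single identical category
    return (observed - expected) / (1 - expected)


def grade(kappa):
    """Map kappa to the grades used in the text (thresholding is an assumption)."""
    if kappa <= 0.20:
        return "poor"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "good"
    return "excellent"


# Hypothetical readings of one feature (1 = present, 0 = absent) in ten lesions
consensus = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
reviewer = [1, 1, 0, 0, 0, 0, 1, 0, 1, 0]
kappa = cohen_kappa(reviewer, consensus)
print(round(kappa, 3), grade(kappa))  # 0.583 moderate
```

In this example the readers agree on 8 of 10 lesions (observed agreement 0.80), chance agreement is 0.52, and κ = (0.80 − 0.52)/(1 − 0.52) ≈ 0.58.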

Histopathological staging

Histopathological examination of the resected specimens was performed in sequential 3-mm-thick sections stained with hematoxylin and eosin. The tumor size, depth of invasion, degree of differentiation, and lymphatic and vascular involvement were evaluated based on the map template. The depth of submucosal invasion was classified into 2 layers: SM1 (minute invasion into the submucosal layer <500 μm from the muscularis mucosae) and SM2 (penetration ≥500 μm). Mucosal cancer (M) and SM1 were categorized as M-SM1 because the same therapeutic strategy is employed for both conditions. SM2 or deeper invasion was categorized as ≥SM2. The degree of differentiation was classified as differentiated or undifferentiated. Poorly differentiated adenocarcinoma, signet-ring cell carcinoma, and mucinous cell carcinoma were assigned to the undifferentiated-type. One experienced pathologist (WXZ) in the study team reviewed all the pathological examinations.
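The pathological grouping described above can be sketched as follows. The function names are hypothetical, and representing a purely mucosal tumor as a submucosal depth of 0 μm is an assumption made for this illustration.

```python
def pathological_depth(submucosal_depth_um: float) -> str:
    """Classify depth from the invasion measured below the muscularis mucosae (in μm)."""
    if submucosal_depth_um < 0:
        raise ValueError("depth cannot be negative")
    if submucosal_depth_um == 0:
        return "M"  # assumption: 0 μm encodes a tumor confined to the mucosa
    return "SM1" if submucosal_depth_um < 500 else "SM2"


def outcome_group(depth: str) -> str:
    """Collapse M and SM1 into one group, as the same treatment applies to both."""
    return "M-SM1" if depth in ("M", "SM1") else "≥SM2"


for depth_um in (0, 499, 500):
    depth = pathological_depth(depth_um)
    print(depth_um, depth, outcome_group(depth))
```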

Statistical analysis

Data description

Continuous variables were expressed as means (SD), and categorical variables were presented as proportions.

Modeling

A split-sample method with stratified randomization was used to derive and validate the predictive model [16]. The lesions (n = 205) were stratified into (1) M-SM1 and (2) ≥SM2 based on the pathological depth of invasion. Within each stratum, every lesion was assigned a random number by a random number generator, and the lesions were ranked in ascending order. The first two-thirds in each stratum comprised the derivation set used to develop the predictive model, and the remaining one-third was used to validate the model. Differences between the derivation and validation sets were analyzed by the χ² test for categorical variables or the independent-samples Student's t test for continuous variables.
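The split procedure can be sketched as below. This is an illustration under stated assumptions, not the authors' SPSS procedure: the function name `stratified_split`, the seed, and rounding the two-thirds cut upward are all assumptions. The stratum sizes in the example (170 M-SM1 and 35 ≥SM2 lesions) are summed from the pathology counts reported in the Results; with upward rounding they reproduce the reported set sizes of 138 and 67 lesions.

```python
import random


def stratified_split(lesions, stratum_of, seed=0):
    """Split lesions 2:1 within each stratum after random ranking."""
    rng = random.Random(seed)
    strata = {}
    for lesion in lesions:
        strata.setdefault(stratum_of(lesion), []).append(lesion)

    derivation, validation = [], []
    for members in strata.values():
        # Give every lesion a random number and rank in ascending order
        keyed = sorted((rng.random(), index) for index in range(len(members)))
        ranked = [members[index] for _, index in keyed]
        cut = (2 * len(ranked) + 2) // 3  # ceil(2n/3); rounding up is an assumption
        derivation.extend(ranked[:cut])
        validation.extend(ranked[cut:])
    return derivation, validation


# Lesions encoded as (id, pathological group); ids are placeholders
lesions = [(i, "M-SM1") for i in range(170)] + [(i, "≥SM2") for i in range(170, 205)]
derivation, validation = stratified_split(lesions, stratum_of=lambda lesion: lesion[1])
print(len(derivation), len(validation))  # 138 67
```

Stratifying before splitting keeps the proportion of ≥SM2 lesions nearly equal in both sets, which matters here because deep invasion is the less frequent outcome.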

A predictive model for invasion depth was developed based on logistic regression analysis. Candidate predictors with P < .05 in univariable analysis were entered into a forward stepwise multivariable logistic regression analysis to identify independent predictors (P < .05) associated with ≥SM2. A predictive model was constructed from these independent variables, each of which was weighted by a score according to its adjusted regression coefficient. The simple sum of these scores was used to predict the probability of ≥SM2 invasion.

The discriminative power of the predictive model was assessed by the area under the receiver operating characteristic curve (AUROC) in both the derivation and validation sets. The sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and positive likelihood ratio (LR+) were calculated for each cut-off point of the total score. The agreement between the model prediction and the pathological invasion depth was also calculated for each cut-off point of the total score.
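For readers unfamiliar with the AUROC, the sketch below computes it as the probability that a randomly chosen ≥SM2 lesion receives a higher total score than a randomly chosen M-SM1 lesion, counting ties as one half. The function name and the example scores are hypothetical and are not study data.

```python
def auroc(scores_deep, scores_superficial):
    """AUROC by pairwise comparison; ties between the two groups count as 0.5."""
    if not scores_deep or not scores_superficial:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for deep in scores_deep:
        for superficial in scores_superficial:
            if deep > superficial:
                wins += 1.0
            elif deep == superficial:
                wins += 0.5
    return wins / (len(scores_deep) * len(scores_superficial))


# Hypothetical total scores for four deep and five superficial lesions
deep_scores = [14, 9, 6, 11]
superficial_scores = [0, 3, 6, 5, 0]
print(auroc(deep_scores, superficial_scores))  # 0.975
```

Of the 20 possible pairs, 19 are ordered correctly and one is tied, giving (19 + 0.5)/20 = 0.975. An AUROC of 0.5 corresponds to no discrimination and 1.0 to perfect discrimination.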

All the analyses were performed using SPSS package version 17.0 (SPSS, Chicago, IL, USA). Two-sided P value <.05 was considered statistically significant.

Results

Baseline characteristics of enrolled patients

The mean age of the 195 patients was 62.0 ± 9.5 years old and 137 (70.26%) patients were male. There were no significant differences in the age of subjects (62.6 ± 9.1 vs. 60.6 ± 10.2, P = .17) and proportion of males (73.91 vs. 62.69%, P = .10) between the derivation (n = 138) and validation (n = 67) sets.

The characteristics of the 205 lesions are summarized in Table 1. The proportions of different lesion locations, macroscopic types, tumor sizes, ulceration, remarkable redness, uneven surface, margin elevation, enlarged folds, abrupt cutting of converging folds, fusion of convergent folds, unclear demarcation, spontaneous bleeding, and stiffness and deformation of the gastric wall were similar between the derivation set (138 lesions) and the validation set (67 lesions).

Table 1 Characteristics of the lesions in the derivation and validation sets

A total of 129 lesions (62.93%) were resected endoscopically, among which four lesions (in four individuals) were positive at the vertical margin after ESD. Surgical resection was performed in two of these four patients because their submucosal invasion exceeded 500 μm, but no cancerous lesions were found in the surgical specimens. The other two patients were followed up because their submucosal invasion was less than 500 μm, and multiple endoscopic biopsies did not reveal any cancerous lesions in situ over the following three consecutive years. Therefore, mucosal cancers were confirmed in 121 lesions (93.80%), SM1 cancers in 6 lesions (4.65%), and SM2 cancers in 2 lesions (1.55%). Surgery with curative intent was performed on 76 lesions (37.07%). The pathological findings were M-SM1 invasion in 43 lesions (56.58%) and ≥SM2 invasion in 33 lesions (43.42%). According to the pathology of the resected specimens, there was no lymph node metastasis in any lesion of SM1 or less in the surgery group. No lymphovascular invasion was detected in lesions of SM1 or less treated with ESD. In total, there were 5 lesions with lymphovascular invasion (3 undifferentiated and 2 differentiated gastric cancers with ≥SM2 invasion) and 13 lesions with lymph node metastasis (10 undifferentiated gastric cancers with submucosal invasion and 3 differentiated gastric cancers with ≥SM2 invasion). All patients with lymphovascular invasion or lymph node metastasis underwent surgery. There were no significant differences in the proportions of resection methods, invasion depths, and histological types between the two sets (Table 1).

Derivation of the MPHD score

In the derivation set, univariable analyses found that location in the upper third of the stomach, size larger than 3 cm, remarkable redness, marked marginal elevation, abrupt cutting of converging folds, stiffness and deformation of the gastric wall, spontaneous bleeding, and EUS invasion were significantly associated with ≥SM2 invasion. In multivariable analyses, however, only remarkable redness (OR 5.42; 95% CI 1.32–22.29) (Fig. 1A), abrupt cutting of converging folds (OR 8.58; 95% CI 1.65–44.72) (Fig. 1B), lesion location in the upper third of the stomach (OR 10.26; 95% CI 2.19–48.09) (Fig. 1D), and deep invasion based on EUS findings (OR 16.53; 95% CI 4.48–61.15) (Fig. 1C) were verified to be independently associated with ≥SM2 invasion (Table 2).

Table 2 Univariate and multivariate predictors of histological depth in the derivation set

To establish the depth prediction model, each parameter independently associated with deeper invasion was weighted according to its adjusted β-coefficient in the multivariable logistic regression: 3 points for remarkable redness, 4 points for abrupt cutting of converging folds, 5 points for a lesion located in the upper third of the stomach, and 6 points for deep invasion based on EUS findings (Table 3). The model for the prediction of histologic depth (MPHD) was based on a score ranging from 0 to 18.
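The link between the adjusted odds ratios and the points can be illustrated with a short calculation, using β = ln(OR). The exact scaling used to convert β-coefficients to points is not stated in the text; dividing each β by 0.5 and rounding to the nearest integer is an assumption that happens to reproduce the published points, and the variable and function names are hypothetical.

```python
import math

# Adjusted odds ratios reported for the four independent predictors
ADJUSTED_ODDS_RATIOS = {
    "remarkable_redness": 5.42,
    "abrupt_cutting_of_folds": 8.58,
    "upper_third_location": 10.26,
    "eus_deep_invasion": 16.53,
}


def points_from_odds_ratios(odds_ratios, beta_per_point=0.5):
    """Convert odds ratios to integer points; beta_per_point is an assumed scaling."""
    return {
        name: round(math.log(odds_ratio) / beta_per_point)
        for name, odds_ratio in odds_ratios.items()
    }


points = points_from_odds_ratios(ADJUSTED_ODDS_RATIOS)
print(points)
print(sum(points.values()))  # 18, the maximum total score
```

For example, ln(16.53) ≈ 2.81, and 2.81/0.5 ≈ 5.6, which rounds to 6 points for deep invasion on EUS.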

Table 3 The scoring system for predicting histological depth

The AUROC for the MPHD score to discriminate between M-SM1 and ≥SM2 lesions in the derivation set was 0.865 (95% CI 0.796–0.917) (Fig. 2A). The discriminatory power of MPHD by each cut-off point of total scores in the derivation set is shown in Table 4. We selected the cut-off point to be eight, because the proportion of correct estimation of invasion depth was highest and that of the overestimation was as low as 2.17%.
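Applying the score at the chosen cut-off can be sketched as follows, using the points listed in Table 3 and treating a total of 8 or more as predicting ≥SM2 invasion. The dictionary keys and function names are hypothetical labels introduced for this illustration.

```python
# Points per finding as listed in the scoring system (Table 3)
MPHD_POINTS = {
    "remarkable_redness": 3,
    "abrupt_cutting_of_folds": 4,
    "upper_third_location": 5,
    "eus_deep_invasion": 6,
}
CUT_OFF = 8  # a total score of 8 or more predicts >=SM2 invasion


def mphd_score(findings):
    """Sum the points of every finding marked as present (missing keys count as absent)."""
    return sum(
        points for name, points in MPHD_POINTS.items() if findings.get(name, False)
    )


def predicts_deep_invasion(findings):
    """True when the total score reaches the cut-off."""
    return mphd_score(findings) >= CUT_OFF


lesion = {"remarkable_redness": True, "eus_deep_invasion": True}
print(mphd_score(lesion), predicts_deep_invasion(lesion))  # 9 True
```

With these weights no single finding reaches the cut-off on its own: even deep invasion on EUS (6 points) requires at least one concordant endoscopic finding before the score predicts ≥SM2 invasion.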

Fig. 2 Model discrimination. Receiver operating characteristic curves were used to evaluate the discrimination performance in the derivation (A) and validation (B) sets. AUROC area under the ROC curve, CI confidence interval

Table 4 Diagnostic performance and accuracy of the MPHD scoring system for the identification of early gastric cancers with SM2 invasion in the derivation set

Validation of the MPHD score

In the validation set, the MPHD score showed good predictability of ≥SM2 invasion (AUROC 0.797; 95% CI 0.681–0.886) (Fig. 2B). At the cut-off point of eight, the sensitivity and specificity of the MPHD were 63.64 and 91.07%, respectively. The model correctly predicted the invasion depth of 86.57% of lesions; it overestimated the depth of 7.46% of lesions and underestimated the depth of 5.97% of lesions.
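These percentages can be checked with a short calculation. The 2 × 2 counts below are not stated in the text; they are back-calculated, as an assumption, from the reported percentages and the 67 validation lesions (7 true positives, 4 false negatives, 5 false positives, 51 true negatives). The function name is hypothetical.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Diagnostic measures for predicting >=SM2 invasion from a 2 x 2 table."""
    total = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        # LR+ is undefined (infinite) when there are no false positives
        "lr_positive": sensitivity / (1 - specificity) if fp else float("inf"),
        "correct": (tp + tn) / total,
        "overestimated": fp / total,   # M-SM1 lesions predicted as >=SM2
        "underestimated": fn / total,  # >=SM2 lesions predicted as M-SM1
    }


# Back-calculated counts (assumption, see lead-in)
metrics = diagnostic_metrics(tp=7, fp=5, fn=4, tn=51)
for name in ("sensitivity", "specificity", "correct", "overestimated", "underestimated"):
    print(name, round(metrics[name] * 100, 2))
```

Under these counts, overestimation corresponds to false positives (5/67) and underestimation to false negatives (4/67), which is why a high specificity keeps the overestimation rate low.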

Interobserver agreement

The interobserver agreement for the selected parameters in the model ranged from moderate to excellent, with κ values between 0.45 and 0.90 (Supplementary Table 1).

Discussion

In this retrospective study, we quantitatively and systematically evaluated the macroscopic characteristics widely used to predict the invasion depth of EGCs, as well as the EUS reading of the depth. Although many parameters have been reported to be associated with deeper invasion of EGCs, our study found only remarkable redness, abrupt cutting of converging folds, tumor location, and invasion depth by EUS to be independent predictors in the multivariable analyses, each with a different weight. Abe et al. [14] showed that remarkable redness in the lesions was significantly associated with ≥SM2 invasion, which might be explained by dilated tumor vessels. The development of regenerating epithelium is also presumed to result from malignant transformation and vertical invasion of a large, ulcerated lesion. In a study from Korea [2], abrupt cutting of convergent folds with a shallow and even depression yielded an accuracy of 70% in predicting the invasion depth of EGCs, and it may represent cancerous extension into the fibrosis of the submucosal layer. Tumor location in the upper third of the stomach has been verified to be a risk factor for deeper invasion of EGCs [17], which may be attributable to the anatomical characteristics of the gastric wall: the wall of the upper third is thinner than the rest, allowing easier and deeper invasion of the cancer. Upper location is also associated with an undifferentiated histological type [18]. Furthermore, lesions near the cardia or the upper posterior wall are more liable to be missed in the early stages because of technical difficulties in visualization [18]. Although some studies demonstrated that the discriminative power of EUS for the invasion depth of EGCs is only comparable to that of conventional endoscopy [3, 8], in this study EUS provided the strongest predictor in the MPHD system by visualizing the structure of the gastric wall directly.
Damage to the gastric wall structure, such as ulceration or multiple biopsies, may lead to incorrect estimation of the invasion depth of EGCs by EUS, and the complementary inclusion of macroscopic characteristics from conventional endoscopy might improve the accuracy of prediction. Compared with conventional methods, the MPHD developed in this study demonstrated enhanced predictability in a simpler and more systematic way by combining three parameters of conventional endoscopy with the EUS reading of the invasion depth, each with a different weight. Overestimation of the invasion depth of EGCs may result in unnecessary gastrectomy, which is irreversible and has a significant impact on patients' quality of life, whereas underestimation can be rectified by gastrectomy following endoscopic resection. A higher specificity, which minimizes overestimation of the invasion depth, is therefore more relevant when choosing the cut-off point of the total score in the model. In this study, when the cut-off point of the total score was set at 8, the overestimation rates among all lesions were only 2.17 and 7.46% in the derivation and validation sets, respectively. The probability of SM2 or deeper invasion was 81% when the total score of a lesion was 8 or more. This means that a lesion should be considered for surgery rather than ESD/EMR if its score is 8 or more.

In practice, endosonographers view the sonographic images after detecting and targeting the lesion by conventional endoscopy, so the real-time endosonographic judgment may be affected by the endoscopic impression [3, 8]. In this study, endoscopists were asked to review the EUS images instead of the real-time EUS reports, and additional measures, such as de-identification and an interval of 1 week between the reading of EUS and conventional endoscopy (CE) images, were taken to reduce contamination. Endoscopic characteristics of EGCs are often subjective and observer dependent, which may result in heterogeneity and inconsistency between studies. In tumor staging of EGCs, high predictive accuracy based on endoscopic characteristics can be reached by experienced endoscopists in large-scale studies [2, 8], and moderate interobserver agreement in conventional endoscopy and EUS has been shown among observers with more than 7 years of endoscopic experience [3]. In this study, image reviewers with varying endoscopic experience, from 2 to 10 years, were included, and the interobserver agreement for each parameter in the model was moderate to excellent after training with representative images, regardless of the endoscopists' experience.

This study also has some limitations. First, this is a retrospective study in which all investigated parameters were assessed from recorded imaging files. This may introduce bias, because some patients without pathological verification of the invasion depth were excluded, and judgments made during real-time visualization may differ from those made on archived images. Second, the study population, though split into derivation and validation sets, was enrolled from a single center, so the sample size was fixed by the study period, which may have limited the number of significant parameters identified for the predictive model. Third, the 7.5 MHz radial echoendoscope rather than the mini-probe was used to visualize relatively larger and ulcerated EGCs in 31 patients in the derivation set and 12 in the validation set, which might have predicted the invasion depth less accurately because of its lower resolution. However, we did not find a significant difference in accuracy between mini-probe and radial EUS (89.4 vs. 86.0%, P = .73, details available on request). Prospective studies with external validation in multiple centers are needed to confirm the validity of the model.

In conclusion, the MPHD scoring system developed in this study, which combines endoscopic characteristics and endosonographic reading, exhibits high accuracy in discriminating EGCs with deeper submucosal invasion and can minimize the risk of overestimating the invasion depth and of unnecessary gastrectomy.