Introduction

Gastric cancer is the fifth most common malignant tumor worldwide, after lung, breast, colorectal, and prostate cancer, and the third leading cause of cancer mortality in men and women [1]. Recent studies have indicated that gastric cancer accounts for 10.79% of new cancer cases each year and 12.80% of all cancer deaths [2]. Surgery, such as endoscopic resection or laparoscopic or open gastrectomy with D1 or D2 lymph node resection, is considered the best treatment for early gastric cancer [3], [4]. For advanced gastric cancer, surgery remains the preferred approach for resectable disease, with preoperative and perioperative adjuvant chemotherapy improving efficacy [5]. Gastric cancer is characterized by high postoperative recurrence, and early gastric cancer has a recurrence rate of 1.6% [6]. Lymphovascular invasion (LVI) is defined as the invasion of lymphatic and/or blood vessels by malignant tumor cells within the primary tumor and surrounding tissues [7]. LVI in gastric cancer is an independent risk factor for patient prognosis, especially in advanced cancers with lymph node metastases [8]. Even after surgical resection, patients with early gastric cancer who were positive for LVI had a higher rate of recurrence and a lower 3- to 5-year overall survival rate than those with a negative LVI status [9]. Moreover, LVI is also associated with lymph node metastasis and is an independent predictor of poor prognosis in patients with gastric cancer [10].

Therefore, accurate assessment of LVI is important for predicting the prognosis of gastric cancer, and if LVI status can be detected and predicted noninvasively before surgery, this will aid clinical decision-making and personalized treatment for patients [11]. LVI is currently detected after surgery by pathological examination, and preoperative noninvasive diagnosis of LVI is limited when it relies solely on traditional clinical examinations (such as dynamic enhanced CT). Dynamic enhanced CT has been reported to be an effective tool for assessing tumor angiogenesis by quantitative enhancement measurement [12]. 18F-fluorodeoxyglucose positron emission tomography/computed tomography (18F-FDG PET/CT), a new quantitative and non-invasive imaging technology, uses 18F-FDG as a tracer to reflect differences in the metabolic levels of tissues in vivo through differences in tracer uptake across tissues [13]. Enhanced CT or PET/CT images can be further analyzed quantitatively by radiomics with machine-learning approaches [14]. Recently, several applications of radiomics in gastric cancer have been reported, including the prediction of histopathological characteristics [15], the prediction of lymph node metastasis [16], and the evaluation of patients’ prognosis [17]. However, it remains unclear whether radiomic models based on both enhanced CT and PET/CT can be used as preoperative prediction tools for LVI in gastric cancer patients, and reports on this question are scarce. By contrast, Li et al. reported that clinicopathological characteristics including the level of CA19-9, Lauren classification, tumor differentiation, TNM stage, and gastric wall invasive depth are associated with the presence of LVI in patients with gastric cancer [18], and Shen et al. found that LVI is significantly correlated with tumor size, age, status of ALN, and histological grade in breast cancer [19], indicating that clinical variables might also help predict LVI.

Therefore, this study aimed to investigate the value of PET/CT radiomics and clinical variables in the preoperative prediction of LVI status in gastric cancer, and to develop and validate a combined model that incorporates radiomics features and clinical characteristics to improve model performance.

Materials and methods

Patient enrollment

The study was approved by the ethics committee of the Affiliated Cancer Hospital of Zhengzhou University, Henan Cancer Hospital (Zhengzhou, China), and the need for written informed consent was waived. Patients with gastric cancer and histologically confirmed LVI status were retrospectively recruited from February 2014 to June 2019. The exclusion criteria were as follows: (1) abdominal enhanced CT and PET/CT examination was performed more than 2 weeks before surgery or biopsy, (2) tumor lesions could not be identified on the enhanced CT or PET/CT images, (3) clinicopathological information was insufficient, and (4) the patient had received any anticancer therapy before the scan. Finally, 34 gastric cancer patients with LVI and 67 patients without LVI were selected for this study.

The patients were randomly divided into a training dataset (27 LVI and 49 non-LVI, n = 76) and a validation dataset (7 LVI and 18 non-LVI, n = 25). The predictive models were developed in the training dataset, and the validation dataset was a held-out dataset and never used before the evaluation and comparison of model performance (Fig. 1).

Fig. 1 Flowchart of the study

Collection and pre-processing of clinical variables

Basic demographic information, including age and gender, was collected directly from the electronic medical record system. By applying fixed staging criteria, the depth of tumor invasion, clinical T stage, and clinical N stage on CT images were prospectively diagnosed by a radiologist with 9 years of experience and confirmed by a senior expert with more than 20 years of experience [20]. The clinical T stage and clinical N stage were assessed according to the AJCC/UICC 8th edition staging system for gastric cancer, and regional lymph nodes with a short-axis diameter greater than 10 mm were considered suspected metastases. The two radiologists showed excellent inter-rater agreement in assessing clinical T stage (weighted kappa = 0.936). Examples of the CT signs of the different cT stages in gastric cancer are shown in Fig. 2. Borrmann classification was based on the results of endoscopy. The mean CT value in the venous phase (CTV-P value) and the thickness of the tumor were measured by an experienced radiologist using a radiology workstation. Before being used as input for model development, the clinical variables were pre-processed and coded as follows:

Fig. 2 CT signs of gastric cancer by T1–4 staging. a The mucosa of the gastric antrum is significantly enhanced (short white arrow), with the lesion not exceeding 50% of the thickness of the gastric wall. b Local thickening of the gastric antrum wall, with a highly enhanced lesion involving more than 50% of the wall thickness (short white arrow) and a smooth serosal layer (red arrow). c Gastric cardia mass with slightly uneven enhancement (short white arrow) and a slightly blurred fat space on the serosal surface with short, thin cords (red arrow). d Stiff, non-uniform thickening of the gastric antrum and gastric body wall with multiple lymph node metastases around the stomach (short white arrow) and nodular changes in the serosal surface (red arrow), indistinguishable from the pancreatic tail (hexagon) and intestine (rhombus) (Color figure online)

(1) Age, a continuous variable.

(2) Gender, a dichotomous variable (male = 1, female = 0).

(3) Clinical T stage, an ordinal variable (T1 = 0, T2 = 1, T3 = 2, T4 = 3).

(4) Clinical N stage, an ordinal variable (N0 = 1, N1 = 2, N2 = 3, N3 = 4).

(5) Borrmann type, a categorical variable (type I = 1, type II = 2, type III = 3, type IV = 4).

(6) Tumor location, a categorical variable (cardiac fundus and lower esophagus = 1, gastric curvature = 2, greater curvature of the stomach = 3, gastric antrum and pylorus = 4, entire stomach = 5).

(7) CTV-P value, a continuous variable.

(8) Thickness of tumor, a continuous variable.
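The coding scheme listed above can be illustrated with a short Python sketch. The function name, the dictionary keys, and the example record are hypothetical and chosen only for illustration; the numeric codes are those given in the list.

```python
# Hypothetical lookup tables that mirror the coding scheme listed above.
GENDER = {"male": 1, "female": 0}
T_STAGE = {"T1": 0, "T2": 1, "T3": 2, "T4": 3}
N_STAGE = {"N0": 1, "N1": 2, "N2": 3, "N3": 4}
BORRMANN = {"I": 1, "II": 2, "III": 3, "IV": 4}


def encode_patient(record):
    """Turn one raw clinical record (a dict) into a numeric feature vector.

    The key names are assumptions made for this sketch only.
    """
    return [
        float(record["age"]),          # (1) age, continuous
        GENDER[record["gender"]],      # (2) gender, 0/1
        T_STAGE[record["cT"]],         # (3) clinical T stage
        N_STAGE[record["cN"]],         # (4) clinical N stage
        BORRMANN[record["borrmann"]],  # (5) Borrmann type
        int(record["location"]),       # (6) tumor location code, 1 to 5
        float(record["ctv_p"]),        # (7) CTV-P value, continuous
        float(record["thickness"]),    # (8) tumor thickness, continuous
    ]


example = {
    "age": 63, "gender": "male", "cT": "T3", "cN": "N1",
    "borrmann": "III", "location": 4, "ctv_p": 85.5, "thickness": 12,
}
vector = encode_patient(example)
```

Keeping the codes in explicit lookup tables makes an unexpected category fail loudly with a `KeyError` instead of being encoded silently.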

Image acquisition

Enhanced CT images

Abdominal CT examinations were performed with a Siemens Somatom Perspective CT scanner (Siemens, Forchheim, Germany). The scan covered the upper or the entire abdomen. Twenty patients with gastric cancer were examined with a single-phase enhanced CT scan (venous phase), and 81 patients were examined with a triple-phase spiral enhanced CT scan. For the enhanced CT scan, patients received 1.5 ml/kg of iodinated contrast agent (370 mg I/ml, 50 ml) through a pump injector at a rate of 3.0 ml/s into an antecubital vein. Images were obtained 30 s, 70 s, and 300 s after the start of contrast material injection, corresponding to the arterial, venous, and delayed phases, respectively. The CT scanning parameters were as follows: 120 kV tube voltage, 200–250 mA tube current, 0.7 s tube rotation time, 40–50 cm field of view, 512 × 512 matrix, and 5 mm section thickness. All CT images were reconstructed with 5 mm slice thickness.

PET/CT images

A non-contrast CT scan was performed first, with a tube voltage of 120 kV, a tube current of 80 mAs, a reconstruction spacing of 3.75 mm, and a pitch of 1.25. 3D PET images were then collected every 3 min, with an axial field of view of 15.7 cm and a tomographic resolution of 4 mm. After acquisition, Fourier rebinning (FORE) was used to reconstruct the images, and the CT data were used to apply attenuation correction to the PET images twice. Finally, the PET and CT data were transmitted to a Xeleris workstation to obtain multi-planar and fusion images.

Tumor region segmentation and radiomics features extraction

All CT and PET images were retrieved from the Picture Archiving and Communication System to a local workstation for region of interest (ROI) segmentation and analysis. A gastrointestinal radiologist with 9 years of experience manually labeled the 3D tumor regions on each slice of the CT and PET images using ITK-SNAP software (v3.8.0, http://www.itksnap.org). The segmentation results were then reviewed and modified by a senior gastrointestinal radiologist with more than 20 years of experience. Both radiologists were blinded to the pathologic results.

We used an open-source Python package (PyRadiomics version 2.1.2, https://github.com/Radiomics/pyradiomics) to automatically extract radiomics features from the manually segmented ROIs in the enhanced CT and PET/CT images. A total of 1454 radiomics features were extracted from the manually labeled ROIs, including shape features (n = 14), first-order intensity statistics features (n = 288), and texture features [Gray Level Dependence Matrix (GLDM, n = 224), Gray Level Co-occurrence Matrix (GLCM, n = 336), Gray Level Size Zone Matrix (GLSZM, n = 256), Gray Level Run Length Matrix (GLRLM, n = 256), and Neighboring Gray Tone Difference Matrix (NGTDM, n = 80)]; all of these features have been used in the literature [21].

Evaluation of inter- and intra-observer reproducibility and selection of radiomics features

To ensure the reproducibility and accuracy of the radiomics features, 30 CT scans and 30 PET scans (each from 20 non-LVI and 10 LVI patients) were randomly selected, and the ROIs were manually labeled by radiologist 1 and radiologist 2. Radiologist 1 then repeated the same procedure 2 weeks later. The radiomics features from the paired ROIs were automatically extracted, and the inter- and intra-observer agreement was assessed by intra-class correlation coefficients (ICC). Radiomics features with an ICC greater than 0.75 were considered to show good agreement and were used for further analysis.
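The ICC-based reproducibility filter can be sketched as follows. The text does not state which ICC form was used, so this sketch assumes ICC(2,1) (two-way random effects, absolute agreement, single measurement); the function names and the toy feature values are hypothetical.

```python
def icc2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: list of rows, one row per ROI, one column per rater or session.
    The choice of ICC(2,1) is an assumption; the paper does not name the form.
    """
    n = len(ratings)      # number of ROIs
    k = len(ratings[0])   # number of raters (or repeated sessions)
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


def keep_reproducible(feature_ratings, threshold=0.75):
    """Return the names of features whose ICC exceeds the threshold."""
    return [name for name, r in feature_ratings.items() if icc2_1(r) > threshold]


# Toy values: each row is one ROI, each column one reader.
features = {
    "stable": [[1.0, 1.1], [2.0, 2.1], [3.0, 2.9], [4.0, 4.0]],
    "noisy": [[1.0, 4.0], [2.0, 1.0], [3.0, 2.0], [4.0, 3.0]],
}
selected = keep_reproducible(features)
```

In practice the same filter would be applied twice, once to the two readers (inter-observer) and once to the repeated session of radiologist 1 (intra-observer).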

To avoid overfitting and reduce computational complexity, the least absolute shrinkage and selection operator (LASSO) algorithm was applied to select the key radiomics features most closely associated with gastric cancer LVI status [22], with the penalty parameter tuned by ten-fold cross-validation. At the optimal log(lambda) values (− 2.695 for enhanced CT-based features and − 2.434 for PET/CT-based features), six enhanced CT-based key features and four PET/CT-based key features were selected for further analysis, including shape (n = 2), first-order (n = 3), GLCM (n = 1), GLDM (n = 1), and GLSZM (n = 3) features. Feature extraction and selection were implemented in Python 3.6.0 (www.python.org).

Development of predictive models

To investigate the capability of radiomics features and clinical variables to preoperatively differentiate LVI from non-LVI patients with gastric cancer, we developed 3 types of models, each built with several machine-learning (ML) classifiers and a different set of input data. The clinical variables were used to construct the clinical model, the selected radiomics features were used to develop the image model, and the combined model was constructed from both the clinical variables and the selected radiomics features.

Logistic regression (LR), adaptive boosting (AdaBoost), and linear discriminant analysis (LDA) classifiers were used to discriminate LVI groups from non-LVI groups of patients with gastric cancer. Moreover, LVI groups and non-LVI groups were defined as positive and negative in the classification, respectively.

LR was a statistical modeling technique where the probability of a category was related to a set of explanatory variables [23]. The logistic model was defined by the following equations:

$$z={a}_{0}+{\sum }_{i=1}^{n}{a}_{i}{x}_{i},$$
(1)
$$P(z)=\frac{{e}^{z}}{1+{e}^{z}},$$
(2)

where z was a measure of the contribution of the explanatory variables \({x}_{i}\) (i = 1, …, n), \({a}_{i}\) were the regression coefficients obtained by maximum likelihood in conjunction with their standard errors \(\Delta {a}_{i}\), and P(z) was the predicted probability of the positive category.
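Equations (1) and (2) can be written as a minimal Python sketch. The coefficient values below are made up for illustration and are not the fitted coefficients of the study.

```python
import math


def linear_predictor(a0, coeffs, x):
    """Eq. (1): z = a0 + sum_i a_i * x_i."""
    return a0 + sum(a * xi for a, xi in zip(coeffs, x))


def logistic_probability(z):
    """Eq. (2): P(z) = e^z / (1 + e^z), evaluated in a numerically stable way."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Hypothetical intercept, coefficients, and feature values.
z = linear_predictor(-1.0, [0.5, 2.0], [2.0, 0.5])
p_lvi = logistic_probability(z)
```

A patient would be assigned to the LVI (positive) group when the predicted probability exceeds a chosen cut-off, for example 0.5.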

AdaBoost classifier was an ensemble classifier that included various weak classifiers [24]. In each iteration during the learning process, a component classifier Ck was trained according to the weights.

If the training pattern was classified correctly, its chance of being used again in the next component classifier decreased; conversely, if the pattern was classified mistakenly, its chance of being used again increased. The ensemble classifier could be expressed as follows:

$$C(x)=\mathrm{sign}\left\{{\sum }_{k=1}^{N}{\alpha }_{k}{C}_{k}(x)\right\},$$
(3)

where x denoted the input vector, N denoted the number of weak classifiers, and \({\alpha }_{k}\) (k = 1, 2, …, N) represented the weight of each weak classifier \({C}_{k}\).
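The weighted vote of Eq. (3) can be sketched in Python as follows. The threshold-based weak classifiers ("stumps") and the weights are hypothetical; the training loop that produces the weights is not shown.

```python
def stump(feature_index, threshold):
    """Hypothetical weak classifier: +1 if the feature exceeds the threshold, else -1."""
    return lambda x: 1 if x[feature_index] > threshold else -1


def ensemble_predict(x, weak_classifiers, alphas):
    """Eq. (3): sign of the alpha-weighted sum of the weak classifier outputs."""
    score = sum(a * c(x) for a, c in zip(alphas, weak_classifiers))
    return 1 if score >= 0 else -1


# Three made-up weak classifiers and their weights.
classifiers = [stump(0, 0.5), stump(1, 1.0), stump(0, 2.0)]
alphas = [0.9, 0.4, 0.3]

label_a = ensemble_predict([1.0, 0.0], classifiers, alphas)  # votes +1, -1, -1
label_b = ensemble_predict([0.2, 3.0], classifiers, alphas)  # votes -1, +1, -1
```

In the first case the single heavily weighted classifier outvotes the two lighter ones, which shows how the weights, rather than a simple majority, decide the output.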

LDA was a classical linear classifier we used to solve the problem [25]:

$$y={\beta }_{0}+{\beta }^{T}x,$$
(4)

where x denoted the feature vector, \(\beta\) denoted the weight vector, and \({\beta }_{0}\) was the constant; both were determined by maximizing the distance between the 2 classes' means while minimizing the within-class variance.
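As an illustration of Eq. (4), the sketch below fits a two-class Fisher discriminant on samples with two features. Restricting the example to two features (so that the matrix inverse has a closed form) and placing the decision boundary at the midpoint of the class means, which corresponds to equal class priors, are simplifying assumptions of this sketch; the toy data are made up.

```python
def fit_lda_2d(class0, class1):
    """Fit a two-class Fisher LDA on 2-feature samples; return (beta0, beta)."""
    def mean(points):
        n = len(points)
        return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]

    m0, m1 = mean(class0), mean(class1)

    # Pooled within-class scatter matrix S = [[sxx, sxy], [sxy, syy]].
    sxx = sxy = syy = 0.0
    for points, m in ((class0, m0), (class1, m1)):
        for p in points:
            dx, dy = p[0] - m[0], p[1] - m[1]
            sxx += dx * dx
            sxy += dx * dy
            syy += dy * dy

    # beta = S^{-1} (m1 - m0), using the closed-form inverse of a 2x2 matrix.
    det = sxx * syy - sxy * sxy
    d0, d1 = m1[0] - m0[0], m1[1] - m0[1]
    beta = [(syy * d0 - sxy * d1) / det, (sxx * d1 - sxy * d0) / det]

    # Put the boundary y = 0 at the midpoint of the class means (equal priors).
    mid = [(m0[0] + m1[0]) / 2.0, (m0[1] + m1[1]) / 2.0]
    beta0 = -(beta[0] * mid[0] + beta[1] * mid[1])
    return beta0, beta


def lda_score(x, beta0, beta):
    """Eq. (4): y = beta0 + beta^T x; y > 0 is assigned to class 1."""
    return beta0 + beta[0] * x[0] + beta[1] * x[1]


non_lvi = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
lvi = [(3.0, 3.0), (4.0, 3.0), (3.0, 4.0), (4.0, 4.0)]
beta0, beta = fit_lda_2d(non_lvi, lvi)
```

Using the scatter matrix instead of the covariance matrix only rescales the weights and does not move the decision boundary.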

The development and validation of all models were performed with InferScholar platform version 3.5 (InferVision).

Evaluation of model performance

The performance of the predictive models was evaluated by the area under the receiver operating characteristic curve (AUC), sensitivity, and specificity in the validation dataset. Sensitivity denoted the number of correctly predicted vascular invasion samples divided by the total number of vascular invasion samples, and specificity denoted the number of correctly predicted vascular non-invasion samples divided by the total number of vascular non-invasion samples. The AUC evaluates a classifier's ability to separate LVI from non-LVI samples across the entire range of decision thresholds, independent of the class distribution. An AUC > 0.7 was considered good classification performance [26].
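The three metrics can be computed with a few lines of Python. This is a minimal sketch on made-up labels and scores; the AUC is computed through its rank interpretation (the probability that a randomly chosen LVI sample scores higher than a randomly chosen non-LVI sample), which is equivalent to the area under the ROC curve.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, with 1 = LVI."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Made-up example: 2 LVI and 3 non-LVI samples.
y_true = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.3, 0.1]
y_pred = [1 if s >= 0.45 else 0 for s in scores]

sens, spec = sensitivity_specificity(y_true, y_pred)
area = auc(y_true, scores)
```

Sensitivity and specificity depend on the chosen cut-off (0.45 in this toy example), whereas the AUC summarizes performance over all cut-offs.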

Calibration analysis and decision curve analysis

We used 1,000 bootstrap resamples for the evaluation of calibration, and the consistency between the actual and predicted LVI probability was represented graphically and assessed by the Hosmer–Lemeshow test [27]. The potential net benefit of the predictive models at different threshold probabilities was quantified, and clinical usefulness was evaluated by decision curve analysis (DCA) [28].
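The net benefit plotted in a decision curve can be sketched as follows, using the standard definition in which false positives are weighted by the odds of the threshold probability. The labels and probabilities are made up, and the function names are hypothetical.

```python
def net_benefit(y_true, probs, threshold):
    """Net benefit of treating patients whose predicted probability >= threshold.

    NB = TP/n - FP/n * threshold / (1 - threshold)
    """
    n = len(y_true)
    tp = sum(1 for t, p in zip(y_true, probs) if p >= threshold and t == 1)
    fp = sum(1 for t, p in zip(y_true, probs) if p >= threshold and t == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: every patient is treated as LVI-positive."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Made-up example: 2 LVI and 3 non-LVI samples.
y_true = [1, 1, 0, 0, 0]
probs = [0.9, 0.4, 0.5, 0.3, 0.1]

nb_model = net_benefit(y_true, probs, 0.2)
nb_all = net_benefit_treat_all(y_true, 0.2)
```

A decision curve repeats this calculation over a grid of threshold probabilities and compares the model with the "treat all" line and the "treat none" line, whose net benefit is zero by definition.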

Statistical analysis

Differences in continuous clinical variables between groups were evaluated by the Mann–Whitney U test, and the chi-squared test or Fisher’s exact test was used to compare categorical clinical characteristics between groups. Receiver operating characteristic (ROC) curve analysis was performed, and the discrimination performance of the different models was quantified by the AUC, sensitivity, and specificity. DeLong’s test was used to compare the AUCs of 2 different models [29]. A heatmap of the selected radiomic features was generated with HemI v1.0 software [30]. The calibration curve was plotted using the “rms” package (version 6.2), and the decision curve was plotted using the “rmda” package (version 1.6). A two-sided p value less than 0.05 was considered statistically significant. All statistical analyses were performed with SPSS (version 21.0) and R software (version 3.6.3).

Results

Study design and patient characteristics

Two hundred and fifty-one patients diagnosed with gastric cancer in our hospital from February 2014 to June 2019 were initially recruited. According to the inclusion and exclusion criteria, 74 men (median age, 64 years; range, 24–78 years) and 27 women (median age, 55 years; range, 24–84 years) were enrolled in the study. Based on the pathological results, the patients were divided into 2 groups: an LVI group (n = 34) and a non-LVI group (n = 67). The prevalence of LVI was 33.7% (34/101).

No significant difference in LVI prevalence was observed between the training and validation datasets (35.5% and 32.0%, respectively; p = 0.75). No significant differences were observed in age, gender, cT category, cN category, Borrmann type, or tumor location between the LVI and non-LVI groups in either the training or the validation dataset (all p > 0.05). The LVI group showed a higher CTV-P value and lower tumor thickness than the non-LVI group in the training dataset; however, these differences were not significant in the validation dataset. The demographic and clinicopathological characteristics of all patients are summarized in Table 1.

Table 1 Comparison of clinical variables between the LVI group and non-LVI group

Radiomics feature selection

After inter- and intra-observer agreement analysis, 1105 and 726 radiomics features extracted from the enhanced CT and PET images, respectively, showed good agreement (ICC > 0.75). Six CT-based radiomics features and 4 PET-based radiomics features with non-zero coefficients remained after the LASSO penalty was applied (Fig. 3). The heatmap of these selected key radiomics features in the training and validation datasets was plotted according to the normalized radiomics feature values (Fig. 4).

Fig. 3 Selection of radiomics features by the variance threshold method and LASSO regression. A The variance threshold approach-based dimensionality reduction profile of the CT-based radiomics features. B The coefficient profile plot of 6 CT-based non-zero coefficients against the optimal log(lambda) sequence. C The variance threshold approach-based dimensionality reduction profile of the PET-based radiomics features. D The coefficient profile plot of 4 PET-based non-zero coefficients against the optimal log(lambda) sequence

Fig. 4 Heatmap of selected CT and PET-based radiomics features in the training and validation datasets. Each row represented a radiomic feature, and each column corresponded to 1 patient (separately grouped for LVI and non-LVI patients)

Development and validation of the predictive models

The performance of the predictive models was evaluated by ROC analysis with respect to the AUC in the validation dataset. The clinical models using the AdaBoost, LDA, and LR classifiers reached AUCs of 0.742 (95% CI, 0.529–0.894), 0.706 (95% CI, 0.492–0.870), and 0.690 (95% CI, 0.476–0.858), respectively (Fig. 5a–c). The AUCs of the image models using the AdaBoost, LDA, and LR classifiers were 0.849 (95% CI, 0.650–0.960), 0.778 (95% CI, 0.568–0.918), and 0.810 (95% CI, 0.604–0.937), respectively (Fig. 5d–f). The combined models achieved better discrimination than the other models, with the AdaBoost, LDA, and LR classifiers reaching AUCs of 0.944 (95% CI, 0.774–0.997), 0.929 (95% CI, 0.751–0.993), and 0.921 (95% CI, 0.741–0.990), respectively (Fig. 5g–i). The detailed performance of each model is presented in Table 2.

Fig. 5 Performance evaluation of the predictive models. ROC curve of the clinical models using A AdaBoost, B LDA, and C LR classifiers in the training and validation datasets, respectively. ROC curve of the image models using D AdaBoost, E LDA, and F LR classifiers in the training and validation datasets, respectively. ROC curve of the combined models using G AdaBoost, H LDA, and I LR classifiers in the training and validation datasets, respectively

Table 2 Performance comparison of the predictive models in the validation dataset

Model comparison

Within each of the clinical, image, and combined models, the AdaBoost, LDA, and LR classifiers showed no significant difference in performance in the validation dataset (DeLong’s test, all p > 0.05). Although the differences were not statistically significant, the combined models showed better discrimination than the clinical models (DeLong’s test: AdaBoost classifier p = 0.177, LDA classifier p = 0.063, LR classifier p = 0.132) and the image models (DeLong’s test: AdaBoost classifier p = 0.136, LDA classifier p = 0.164, LR classifier p = 0.146), suggesting that incorporating both radiomics features and clinical variables may benefit the prediction of LVI status in patients with gastric cancer.

Calibration and clinical usefulness analysis

The calibration curve analysis demonstrated good agreement between prediction and observation for the combined models, and the non-significant Hosmer–Lemeshow test statistics suggested no significant deviation from an ideal fit (p = 0.926 for the AdaBoost classifier, Fig. 6a; p = 0.744 for the LDA classifier, Fig. 6b; p = 0.907 for the LR classifier, Fig. 6c). The DCA for the combined models using the different classifiers is presented in Fig. 7, and the results indicated that these combined models were clinically useful.

Fig. 6 Calibration curves of the combined models using a AdaBoost, b LDA, and c LR classifiers, respectively. The solid line represented the performance of the model without correction for overfit. The dotted line was the bootstrap-corrected performance of the model with a scatter estimate for future accuracy

Fig. 7 Decision curve analysis for the combined models using A AdaBoost, B LDA, and C LR classifiers, respectively. The red line represented the net benefit of the model across the full range of threshold probabilities. Gray line: all positive, assuming all patients should undergo surgery to confirm the presence of LVI. Black line: all negative, assuming no possibility of LVI in patients

Discussion

As one of the most common independent predictors of prognosis in gastric cancer, LVI refers to the invasion of lymphatic spaces, blood vessels, or both in the peritumoral region by tumor emboli. Positive LVI status in gastric cancer increases the incidence of lymph node metastasis and distant metastasis and reduces patient survival [18]. TNM staging is the most widely used tool for assessing cancer prognosis [31], but combining LVI status with the TNM staging system can improve the accuracy of prognosis prediction in patients with N0 stage gastric cancer [32]. Because LVI has no distinctive imaging features, it is difficult for radiologists to differentiate LVI from non-LVI on preoperative CT images. Li et al. found that the level of CA19-9, tumor size, Lauren grading, tumor differentiation, depth of gastric wall infiltration, involvement of lymph nodes, distant metastasis, and advanced TNM stage were significantly correlated with the presence of vascular invasion in gastric cancer [18]. However, Lauren grading, tumor differentiation, depth of gastric wall infiltration, and involvement of lymph nodes are confirmed by postoperative pathological diagnosis, and the TNM stage is also difficult to classify before biopsy or surgery [33]. Therefore, it remains challenging to predict the presence of LVI preoperatively from clinicopathological characteristics. According to our review of the literature, no studies have used both enhanced CT and PET/CT images together with clinical variables to predict LVI in gastric cancer.

In this study, the cT stage and cN stage, which can be classified on preoperative CT images, were used for model development. Other clinical variables, including age, gender, Borrmann type, tumor location, thickness of tumor, and CTV-P value, were used as candidate risk factors. The Borrmann type has been associated with LVI in advanced proximal gastric cancer [34]. Tumor location (the upper and middle third of the stomach) is a significant risk factor for submucosal or lymphovascular invasion [35]. Ma et al. found that ΔAP (the difference in tumor CT attenuation between the non-contrast and portal phases) could be an independent predictor of LVI because of the increased microvascular permeability caused by destruction of the lymphatic vascular structure, suggesting that the CT values of the venous phase might be correlated with LVI in gastric cancer [36]. Univariate analysis showed significant differences in thickness of tumor and CTV-P value between the LVI group and non-LVI group in the training dataset, which was consistent with another study [8].

Gastric cancer is a clinically heterogeneous disease, and CT images and clinical variables provide insights into different tumor biology characteristics on multiple levels. In this study, we developed 3 types of predictive models based on radiomics features from both enhanced CT and PET/CT images, clinical variables, and the combination of the two, respectively. The combined model performed better than the clinical model and the image model alone with the AdaBoost (AUC = 0.944 vs. 0.742 and 0.849), LDA (AUC = 0.929 vs. 0.706 and 0.778), and LR (AUC = 0.921 vs. 0.690 and 0.810) classifiers. The results indicated that the combined model could reflect tumor heterogeneity more accurately and be more reliable. We reasoned that both the radiomics features selected from enhanced CT and PET/CT images and the preoperative clinical variables of patients with gastric cancer could be used to predict LVI status by machine-learning-based methods, and that predictive performance could be improved by combining them. In other studies, researchers found that incorporating radiomics features could improve the evaluation of survival and chemotherapeutic benefit in patients with gastric cancer compared with the TNM staging system and a clinicopathologic nomogram [37], and that a combined model could predict the preoperative LVI status of patients with breast cancer more effectively than either type of feature alone [38].

We also compared the performance of 3 ML classifiers (AdaBoost, LDA, and LR) and found no significant difference, possibly because the sample size was relatively small. In clinical practice, the appropriate method should be chosen according to the specific need. Considering stability and predictive performance, a combined model using the AdaBoost classifier might be the best choice.

Several limitations of this study deserve consideration. First, this was a retrospective study in which patients were enrolled from a single institution and the sample size was limited; thus, prospective multicenter research with a larger sample is necessary. Second, the value of radiomics features extracted from multiphasic contrast-enhanced CT images and of other clinical variables in the prediction of LVI also requires investigation. Third, because three-dimensional manual tumor segmentation was time-consuming and complicated, a deep-learning method for the automatic segmentation of gastric cancer should be developed.

In conclusion, the results of this study showed that the LVI status of gastric cancer could be predicted by radiomics features from enhanced CT and PET/CT images, alone or in combination with clinical variables. The combined model incorporating radiomics features and clinical variables performed better and is a potential non-invasive tool for detecting LVI preoperatively in patients with gastric cancer. Further research is required before application in clinical practice.