Introduction

Lung cancer remains the leading cause of cancer-related death globally and imposes a heavy burden on human health and the economy [1]. Anatomic lobectomy with systematic lymph node (LN) dissection is the standard surgical procedure for resectable non-small cell lung cancer (NSCLC). Sublobectomy can preserve more lung tissue, reduce surgical trauma, and improve postoperative quality of life. Multiple studies have demonstrated that sublobectomy can provide a therapeutic effect equivalent to that of lobectomy for patients with early-stage NSCLC [2,3,4]. Accurate presurgical prediction of LN status helps in selecting segmentectomy or wedge resection for node-negative, early-stage patients. In particular, because LN status cannot practicably be assessed during wedge resection, the absence of LN metastasis should be confirmed before surgery [5, 6]. Additionally, for patients scheduled for radiotherapy, accurate mediastinal staging can assist oncologists in delineating irradiation fields, reducing the risk of treatment failure due to occult LN metastasis. Thus, accurate prediction of LN metastasis is crucial for informing therapeutic decision-making in patients with NSCLC.

Endobronchial ultrasound-guided transbronchial needle aspiration and mediastinoscopy are generally used to pathologically confirm LN metastasis. However, these invasive methods cannot be routinely applied across the whole population because of complications such as airway bleeding, pneumothorax, or nerve injury [7]. Pretreatment LN staging is evaluated noninvasively through imaging modalities including CT and PET/CT, but CT interpretation of LN short-axis diameter has proved unreliable in diagnosing LN metastasis [8]. For PET/CT, false-positive LNs caused by inflammation and granuloma, as well as high cost, are obstacles to wide clinical application [9, 10]. Recently, several groups have developed CT-based radiomics predictors of LN metastasis [11,12,13,14]. However, the radiomics approach requires manual segmentation of tumors, which is labor-intensive and time-consuming. Furthermore, radiomics features are highly susceptible to interobserver variability in segmentation, which arises from subjective judgement and differences in professional skill [15].

Encouragingly, deep learning (DL) has achieved promising results in differentiating histological subtypes [16, 17], evaluating therapeutic response [18,19,20], and predicting outcomes in lung cancer [21, 22]. The DL approach can automatically extract representative information without manual segmentation. A few groups have previously applied DL to predict LN metastasis in NSCLC. However, their studies had relatively small sample sizes and did not include some important clinical variables such as smoking history and carcinoembryonic antigen (CEA) status. In addition, the performance of the resulting DL models left room for improvement [23,24,25]. Lung adenocarcinoma is the most common histological subtype of lung cancer, accounting for nearly 60% of NSCLC [26]. This study adopted a recent DL architecture, the Swin Transformer, to develop and validate a DL signature predictive of mediastinal LN invasion in patients with lung adenocarcinoma. We also compared the predictive performance of the DL signature with that of a traditional radiomics signature and a clinical-semantic (CS) model, based on clinical characteristics and CT semantic features, in estimating the risk of LN metastasis.

Methods and materials

Patients

This retrospective study was approved by the institutional ethics committee, and the requirement for informed consent was waived. Patients who underwent radical surgical excision and systematic lymphadenectomy from May 2014 to September 2019 were retrospectively reviewed. The inclusion criteria were as follows: (1) pathologically confirmed primary lung adenocarcinoma; (2) no presurgical radiotherapy or chemotherapy; (3) an interval between presurgical CT examination and operation within 2 weeks. The exclusion criteria were as follows: (a) adenocarcinoma in situ, minimally invasive adenocarcinoma, or rare histological variants of lung adenocarcinoma; (b) synchronous or metachronous tumors; (c) no thin-section CT images or unsatisfactory CT image quality; (d) incomplete clinicopathologic data; (e) history of other cancers. Finally, 612 patients were enrolled and randomly divided into a training cohort (n = 489) and an internal validation cohort (n = 123) at a ratio of 4:1. Following the same eligibility criteria, 108 patients who underwent surgical excision in our institution from October 2019 to January 2021 were collected to constitute an independent test cohort. Patient recruitment is shown in Fig. 1.
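The 4:1 random division described above can be illustrated with a minimal sketch. The function name, the fixed seed, and the use of integer patient indices are assumptions made for illustration; the source does not state how the randomization was implemented.

```python
import random

def split_cohort(patient_ids, train_parts=4, val_parts=1, seed=0):
    """Randomly split patient IDs into training and validation lists
    at a ratio of train_parts:val_parts."""
    ids = list(patient_ids)
    rng = random.Random(seed)  # fixed seed is an assumption, used only for reproducibility
    rng.shuffle(ids)
    n_train = len(ids) * train_parts // (train_parts + val_parts)
    return ids[:n_train], ids[n_train:]

# 612 enrolled patients, represented here by hypothetical indices 0..611
train_ids, val_ids = split_cohort(range(612))
print(len(train_ids), len(val_ids))  # 489 123
```

Rounding the training share down reproduces the cohort sizes reported in the text (489 and 123).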

Fig. 1 The workflow diagram of patient recruitment

The clinicopathological characteristics, including age, gender, smoking history, pack-years, serum CEA status, histological subtype, Ki-67 labeling index (LI), and LN status, were acquired from electronic medical records. Histological grade was determined by the prognostic classification of the predominant histological subtype of lung adenocarcinoma [27, 28].

CT acquisition and semantic features interpretation

The patients underwent contrast-enhanced CT examination using one of two multi-slice spiral CT scanners (GE Discovery CT 750 HD, TOSHIBA Aquilion One TSX-301A). The CT acquisition parameters are provided in the Supplementary data.

Blinded to clinicopathologic information, two radiologists with 8 and 3 years of experience independently assessed CT semantic features in a fixed lung window (width, 1600 HU; level, −600 HU) and mediastinal window (width, 400 HU; level, 40 HU). CT semantic features included location, affiliated lobe, tumor total diameter, tumor consolidation diameter, consolidation-to-tumor ratio (CTR), spiculation, lobulation, air bronchogram, pleural attachment, and CT-reported LN status. CTR was calculated with the following formula: CTR (%) = tumor consolidation diameter / tumor total diameter × 100 [5]. The definitions of CT semantic features are detailed in the Supplementary data. Cohen’s Kappa coefficient and the intraclass correlation coefficient (ICC) were used to evaluate interobserver agreement for categorical and continuous variables, respectively. Generally, a Kappa coefficient or ICC of 0–0.20 indicates poor agreement; 0.21–0.40, fair agreement; 0.41–0.60, moderate agreement; 0.61–0.80, good agreement; and > 0.80, excellent agreement. For continuous variables, the average of the two radiologists' measurements was taken as the final value. For categorical variables, consensus was reached through discussion when a disagreement occurred.
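As a small illustration of the CTR formula and the agreement bands given above, the following sketch may help. The function names and the example diameters (12 mm and 20 mm) are hypothetical, and treating each band's upper bound as inclusive is an assumption, since the text leaves values such as 0.205 unassigned.

```python
def consolidation_to_tumor_ratio(consolidation_mm, total_mm):
    """CTR (%) = tumor consolidation diameter / tumor total diameter x 100."""
    if total_mm <= 0:
        raise ValueError("tumor total diameter must be positive")
    return 100.0 * consolidation_mm / total_mm

def agreement_level(coefficient):
    """Map a Kappa coefficient or ICC to the agreement band used in the text.
    Upper bounds are treated as inclusive (an assumption)."""
    if coefficient <= 0.20:
        return "poor"
    if coefficient <= 0.40:
        return "fair"
    if coefficient <= 0.60:
        return "moderate"
    if coefficient <= 0.80:
        return "good"
    return "excellent"

# Hypothetical measurements: 12 mm consolidation within a 20 mm tumor
print(consolidation_to_tumor_ratio(12, 20))  # 60.0
print(agreement_level(0.985))                # excellent
```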

Radiomics signature development

Image segmentation was performed by two trained radiologists using the open-source software ITK-SNAP (version 3.8.0), as detailed in the Supplementary data. Radiomics features were extracted from the delineated three-dimensional volume of interest of each tumor using PyRadiomics (https://pyradiomics.readthedocs.io/en/latest/index.html). Feature selection and radiomics signature development are detailed in the Supplementary data.

DL signature development

We adopted the Swin Transformer architecture to develop a DL signature predictive of LN metastasis. The architecture of the Swin Transformer is depicted in Fig. 2 and detailed in the Supplementary data. In data preprocessing, we placed a cubic bounding box on the largest slice of the tumor, ensuring that the entire tumor was contained within the bounding box. The bounding-box images containing the tumor on each CT slice were resampled to 224 × 224 pixels by bilinear interpolation. The bounding-box images from three adjacent CT slices were combined into a three-channel image, which served as the input of the DL model to generate the risk probability of LN metastasis. Specifically, to achieve a robust prediction, all three-channel images of each tumor were fed into the DL model, and the average risk probability of LN metastasis was taken as the DL signature. Transfer learning was used to develop the Swin Transformer model efficiently [29]: pretraining was performed on 1.28 million natural images from the ImageNet dataset; afterwards, the network was fine-tuned on 17610 CT images of lung adenocarcinoma in the training cohort. The original code of Swin Transformer is available at https://github.com/microsoft/Swin-Transformer. We implemented the neural network using PyTorch 1.4.1 library in Python 3.7.0 (https://pytorch.org).
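The slice-grouping and averaging step described above can be sketched as follows. This is not the authors' implementation: the function names are hypothetical, the stride of one slice between consecutive three-channel images is an assumption (the text does not state it), and the model is replaced by a toy stand-in. Cropping and bilinear resampling to 224 × 224 are omitted.

```python
from statistics import mean

def adjacent_triplets(slices):
    """Group consecutive slices into overlapping three-slice stacks
    (a stride of 1 is an assumption)."""
    return [tuple(slices[i:i + 3]) for i in range(len(slices) - 2)]

def dl_signature(slices, predict_probability):
    """Average the per-triplet risk probabilities into one tumor-level value."""
    triplets = adjacent_triplets(slices)
    if not triplets:
        raise ValueError("at least three slices are required")
    return mean(predict_probability(t) for t in triplets)

# Toy stand-ins: each "slice" is a single number and the "model" averages its
# three channels. A real model would take a 3 x 224 x 224 tensor instead.
toy_slices = [0.2, 0.4, 0.6, 0.8]
toy_model = lambda triplet: sum(triplet) / 3
signature = dl_signature(toy_slices, toy_model)
print(round(signature, 3))
```

Averaging over all triplets means that no single slice determines the tumor-level prediction, which is the robustness argument made in the text.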

Fig. 2 The detailed architecture of Swin Transformer for prediction of lymph node (LN) metastasis in lung adenocarcinoma

Clinical-semantic model and combined model construction

In the training cohort, CS variables that were significant in univariate analysis were selected. To avoid multicollinearity, variables with Spearman correlation coefficients greater than 0.7 were excluded. The remaining CS variables were entered into multivariable logistic regression with forward stepwise selection to determine the independent risk predictors and construct the CS model. Note that the pathological metrics were recorded but excluded from regression analysis because the study was designed for preoperative prediction of LN status. To explore the optimum prediction model, we constructed three combined models by integrating the radiomics signature, the DL signature, or both with the CS model; these are referred to as the CS-radiomics model, CS-DL model, and CS-radiomics-DL model, respectively.
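The correlation filter described above can be sketched in plain Python. The function names and toy values are hypothetical. Two choices are assumptions not stated in the text: the absolute value of the coefficient is compared with 0.7, and of two correlated variables the one listed first is kept.

```python
def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman coefficient = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def drop_correlated(features, threshold=0.7):
    """Keep variables in listed order; drop any whose |rho| with an
    already-kept variable exceeds the threshold."""
    kept = []
    for name in features:
        if all(abs(spearman(features[name], features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Toy values for six hypothetical patients
toy = {
    "total_diameter": [10, 15, 20, 25, 30, 35],
    "consolidation_diameter": [4, 9, 12, 20, 22, 30],
    "cea": [3, 9, 2, 7, 1, 6],
}
print(drop_correlated(toy))  # ['total_diameter', 'cea']
```

Constant-valued variables would make the denominator zero and are not handled in this sketch.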

Statistical analysis

Statistical analysis was conducted using MATLAB (MathWorks Inc.) and SPSS (IBM, ver. 26.0). Continuous variables were tested for normality and homogeneity of variance using the Shapiro-Wilk test and Levene test, respectively. Continuous variables were compared using the Student’s t-test and ANOVA, or the Mann-Whitney U test and Kruskal-Wallis test, as appropriate. Categorical variables were compared using the chi-square test or Fisher exact test, as appropriate. The correlation between variables was assessed using the Spearman correlation coefficient.

The receiver operating characteristic (ROC) curve was plotted, and the area under the curve (AUC) along with sensitivity and specificity were calculated to quantify model performance. AUCs were compared using the DeLong test. The calibration curve and Hosmer-Lemeshow test were used to evaluate the agreement of predicted probabilities with actual observations. Decision curve analysis was performed to assess clinical utility. A two-tailed p value < 0.05 indicated a significant difference.
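For readers less familiar with these metrics, the sketch below computes AUC, sensitivity, and specificity on toy data. It uses the rank-based (Mann-Whitney) definition of AUC. The function names, the toy labels and scores, and the 0.5 cutoff are illustrative assumptions; the text does not state how the operating cutoff was chosen, and the DeLong comparison is not reproduced here.

```python
def auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def sensitivity_specificity(labels, scores, cutoff):
    """Classify score >= cutoff as positive and return (sensitivity, specificity)."""
    tp = sum(1 for l, s in zip(labels, scores) if l == 1 and s >= cutoff)
    fn = sum(1 for l, s in zip(labels, scores) if l == 1 and s < cutoff)
    tn = sum(1 for l, s in zip(labels, scores) if l == 0 and s < cutoff)
    fp = sum(1 for l, s in zip(labels, scores) if l == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)

# Toy data: 1 = LN metastasis, 0 = no LN metastasis
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
toy_auc = auc(toy_labels, toy_scores)  # 11 of 12 positive-negative pairs are ordered correctly
toy_sens, toy_spec = sensitivity_specificity(toy_labels, toy_scores, cutoff=0.5)
print(round(toy_auc, 3), round(toy_sens, 3), round(toy_spec, 3))
```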

Results

The clinicopathological characteristics and semantic features were similarly distributed among the training cohort (n = 489), internal validation cohort (n = 123), and independent test cohort (n = 108), as shown in Table 1. Among the total of 720 patients, 359 (49.9%) were male (median age (interquartile range): 60.0 (53.0, 65.0)) and 361 (50.1%) were female (median age (interquartile range): 59.0 (52.0, 65.0)). In total, pathologically confirmed LN metastasis occurred in 199 (27.6%) of 720 patients. Of these, 49 (24.6%), 143 (71.9%), and 7 (3.5%) patients were diagnosed with N1, N2, and N3 disease, respectively. The negative LNs were pathologically confirmed as inflammatory proliferation, tuberculous granuloma, sarcoidosis, or normal nodal structure.

Table 1 The distribution of clinicopathological characteristics and semantic features across the training cohort, internal validation cohort, and independent test cohort

Interobserver agreement assessment of semantic features

The ICCs for tumor total diameter, tumor consolidation diameter, and CTR were 0.985 (95% confidence interval [CI]: 0.975, 0.990), 0.989 (95% CI: 0.979, 0.993), and 0.990 (95% CI: 0.988, 0.991), respectively, indicating excellent agreement. The numbers and percentages of disagreements between the two radiologists for categorical variables are shown in Table 2. Cohen’s Kappa coefficients for categorical variables also showed good agreement, as detailed in Table 2.

Table 2 The interobserver agreement of CT semantic features for lung adenocarcinoma

The development of radiomics signature and DL signature

In total, 1210 radiomics features were extracted and then selected as detailed in the Supplementary data. Finally, an eighteen-feature radiomics signature was generated as a linear combination of the selected features weighted by their respective regression coefficients, as detailed in the Supplementary data. The output of the last layer of the Swin Transformer was taken as the DL signature.

Clinical-semantic model and combined model construction

In Table 3, patients with CEA > 5 ug/L were more likely to have LN metastasis (p < 0.001), but no association of age, gender, smoking history, or pack-years with LN metastasis was observed. LN metastasis was more common in patients with intermediate- and high-grade lung adenocarcinoma (p < 0.001), and the Ki-67 LI was higher in patients with LN metastasis than in those without (p < 0.001). LN metastasis was more frequently found in tumors with a larger total diameter and consolidation diameter, higher CTR, spiculation, pleural attachment, and CT-reported LN metastasis, and was less common in tumors with air bronchogram (all p < 0.001).

Table 3 Univariate analysis of clinicopathological characteristics, semantic features, radiomics signature, and deep learning signature in training cohort

CEA, tumor total diameter, tumor consolidation diameter, CTR, spiculation, air bronchogram, pleural attachment, and CT-reported LN metastasis were the candidates for constructing the CS model. Tumor consolidation diameter was excluded owing to its strong correlation with tumor total diameter, as shown in Fig. 3 (r = 0.81, p < 0.001). Finally, CEA > 5 ug/L (odds ratio [OR]: 2.758; 95% CI: 1.670, 4.555; p < 0.001), CTR (OR: 1.062; 95% CI: 1.038, 1.086; p < 0.001), air bronchogram (OR: 0.582; 95% CI: 0.355, 0.951; p = 0.031), pleural attachment (OR: 1.748; 95% CI: 1.074, 2.844; p = 0.025), and CT-reported LN status (OR: 4.511; 95% CI: 2.495, 8.155; p < 0.001) were identified as independent risk predictors and incorporated into the CS model. Accordingly, the combined CS-radiomics model, CS-DL model, and CS-radiomics-DL model were constructed, as shown in Table 4.

Fig. 3 The pairwise correlation evaluation of clinical-semantic (CS) candidate variables, radiomics signature and deep learning (DL) signature using Spearman correlation coefficient

Table 4 Multivariable logistic regression analyses of the CS model and combined models for predicting lymph node metastasis

Model performance evaluation

Of the 124 patients with CT-reported LN metastasis, 68 had pathologically confirmed LN metastasis. In the remaining patients, the nodes were pathologically diagnosed as inflammatory proliferative disease, tuberculous granuloma, sarcoidosis, or normal nodal structure. In Table 5, CT-reported LN status alone performed far worse than the CS model in all three cohorts (AUC: 0.619 vs. 0.823 for training cohort, p < 0.001; 0.604 vs. 0.781 for internal validation cohort, p = 0.026; 0.627 vs. 0.853 for independent test cohort, p < 0.001). The sensitivity of CT-reported LN status ranged from 0.303 to 0.394, while the specificity ranged from 0.844 to 0.907 in diagnosing LN metastasis across the three cohorts.

Table 5 The model performances in the training cohort, internal validation cohort and independent test cohort

The AUC for CS model was 0.823 (95% CI: 0.785, 0.861) in training cohort, 0.781 (95% CI: 0.693, 0.869) in internal validation cohort, and 0.853 (95% CI: 0.780, 0.926) in independent test cohort. The AUC for radiomics signature was 0.884 (95% CI: 0.853, 0.915) in training cohort, 0.863 (95% CI: 0.787, 0.939) in internal validation cohort, and 0.886 (95% CI: 0.826, 0.946) in independent test cohort.

Encouragingly, DL signature achieved significantly higher AUC than CS model and radiomics signature in training cohort (0.961 vs. 0.823, p < 0.001; 0.961 vs. 0.884, p < 0.001), internal validation cohort (0.948 vs. 0.781, p < 0.001; 0.948 vs. 0.863, p = 0.019), and independent test cohort (0.960 vs. 0.853, p = 0.002; 0.960 vs. 0.886, p = 0.029), respectively (Fig. 4A–C). The sensitivity and specificity of DL signature in predicting LN metastasis ranged from 0.758 to 0.910 and 0.907 to 0.987 across all three cohorts, respectively.

Fig. 4 The performance evaluation of DL signature in predicting LN metastasis. (A–C) The receiver operating characteristic curves of CS model, radiomics signature and DL signature in training cohort (A), internal validation cohort (B), and independent test cohort (C). Numbers in parentheses are the areas under the receiver operating characteristic curves. (D–F) The calibration curves depicting the agreement between DL signature predicted probabilities and actual observed probabilities of LN metastasis in training cohort (D), internal validation cohort (E), and independent test cohort (F)

The Hosmer-Lemeshow tests (p = 0.267 for the training cohort, p = 0.790 for the internal validation cohort, and p = 0.754 for the independent test cohort) and the calibration curves showed that the probabilities predicted by the DL signature agreed well with the observed probabilities in all three cohorts (Fig. 4D–F). In decision curve analyses, the DL signature conferred a higher net benefit in predicting LN metastasis than the CS model and radiomics signature across the threshold probability range of 0.2–1.0 (Fig. 5).
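The net benefit plotted in a decision curve is conventionally defined as TP/n − (FP/n) × pt/(1 − pt), where pt is the threshold probability. The sketch below applies this standard definition to toy data. The function names and values are hypothetical, and the source does not state which software or formula variant produced Fig. 5.

```python
def net_benefit(labels, probabilities, threshold):
    """Net benefit of treating patients whose predicted probability >= threshold.
    Requires 0 <= threshold < 1."""
    n = len(labels)
    tp = sum(1 for l, p in zip(labels, probabilities) if p >= threshold and l == 1)
    fp = sum(1 for l, p in zip(labels, probabilities) if p >= threshold and l == 0)
    return tp / n - fp / n * threshold / (1 - threshold)

def treat_all_net_benefit(labels, threshold):
    """Reference strategy: treat every patient regardless of the model."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# Toy data: 1 = LN metastasis, 0 = no LN metastasis
dca_labels = [1, 1, 1, 0, 0, 0, 0]
dca_probs = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
nb_model = net_benefit(dca_labels, dca_probs, threshold=0.5)    # 2/7 - 1/7
nb_all = treat_all_net_benefit(dca_labels, threshold=0.5)       # 3/7 - 4/7
print(round(nb_model, 3), round(nb_all, 3))
```

A model is considered clinically useful at a given threshold when its net benefit exceeds both the treat-all value and zero (the treat-none strategy).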

Fig. 5 Decision curve analyses of CS model, radiomics signature and DL signature in training cohort (A), internal validation cohort (B), and independent test cohort (C). DL signature conferred a higher net benefit in predicting LN metastasis than CS model and radiomics signature across threshold probability range of 0.2–1.0

In the training cohort, the combined CS-radiomics model, CS-DL model, and CS-radiomics-DL model achieved AUCs of 0.914, 0.965, and 0.974, respectively. In the internal validation cohort, the corresponding AUCs were 0.882, 0.958, and 0.958. In the independent test cohort, they were 0.936, 0.958, and 0.969. As expected, adding the radiomics signature, the DL signature, or both to the CS model yielded incremental value over the CS model alone in all three cohorts (all p < 0.05). In the training cohort, the CS-radiomics-DL model performed slightly better than the DL signature (0.974 vs. 0.961, p = 0.005), whereas the AUCs for the CS-DL model and the DL signature were comparable (0.965 vs. 0.961, p = 0.107). Furthermore, in the internal validation cohort and independent test cohort, no significant difference was observed between the DL signature and any of the three combined models (all p > 0.05); in other words, incorporating the CS risk predictors and radiomics signature did not substantially improve discriminative performance over the DL signature alone.

Discussion

This study developed a DL signature predictive of mediastinal LN invasion based on the Swin Transformer architecture, yielding an AUC of 0.961 (95% CI: 0.942, 0.979), 0.948 (95% CI: 0.910, 0.987), and 0.960 (95% CI: 0.922, 0.997) in the training cohort, internal validation cohort, and independent test cohort, respectively. The proposed DL signature showed better predictive performance than the traditional CS model and radiomics signature. Furthermore, the DL signature conferred a higher net benefit than both the CS model and the radiomics signature.

Currently, there is limited literature on the use of DL for presurgical prediction of LN stage in lung cancer. Ran et al used VGG-6 to generate a DL signature predictive of LN metastasis, which yielded an AUC of 0.812 in the external validation set [24]; Zhao et al presented a DL framework based on the DenseNet algorithm for risk estimation of occult LN metastasis, resulting in an AUC of 0.880 [23]. Our DL signature achieved AUCs ranging from 0.948 to 0.961 across all cohorts, exceeding these previously reported DL results. This capability may be attributable to the state-of-the-art Swin Transformer architecture serving as the backbone of our DL model, which was developed by Liu and colleagues at Microsoft Research Asia in 2021 [30]. The Swin Transformer has two key strengths: hierarchical feature representation and multi-head self-attention based on shifted windows. The architecture limits self-attention computation to non-overlapping local windows while shifting the window partition between layers to allow cross-window connections, and its hierarchical design can be flexibly applied to modeling at various scales. The Swin Transformer has proved highly efficient in image classification, object detection, and semantic segmentation [30].

For comparison with the DL signature, a radiomics signature was generated as a linear combination of eighteen selected features. The AUC for the radiomics signature ranged from 0.863 to 0.886 in this study, significantly lower than that for the DL signature in the corresponding cohort. The radiomics technique relies strongly on careful delineation, generally performed by trained radiologists, which adds to the clinical workload. Radiomics processing involves several sequential steps, including tumor segmentation, feature extraction, feature selection, and model establishment, and the overall modeling performance depends on the quality of each step. This dependence limits the generalization capability of radiomics and its ability to leverage high-throughput features. DL, by contrast, is an end-to-end approach in which the model is adjusted until convergence by back-propagating the errors between predicted results and real observations through each layer. In addition, the DL technique acquires representative data automatically, without clinical index collection, semantic feature interpretation, or manual annotation, and can therefore be readily integrated into the clinical workflow [31,32,33].

To construct a CS model predictive of LN metastasis, we considered the potentially relevant clinical variables and CT semantic features. As a cell adhesion-associated glycoprotein, CEA is widely considered an indicator of tumor invasiveness and plays an important role in prognosis evaluation and treatment monitoring for lung cancer [34, 35]. Multivariable logistic regression analyses revealed that elevated serum CEA (> 5 ug/L) was independently associated with LN metastasis in both the CS model and the combined models. Consistent with our findings, Wang et al [12] and Gu et al [36] also demonstrated that CEA was an independent risk predictor of LN metastasis when CS features and a radiomics signature were incorporated. CTR is a quantitative measure of the proportion of consolidation within a tumor [37]. Prior studies confirmed that CTR was of great value in predicting LN metastasis in lung adenocarcinoma [6, 38]. In the CS model, CTR was an independent risk predictor of LN metastasis, with 1.82-fold higher odds of LN metastasis for every 10% increase in CTR. However, CTR became nonsignificant when the DL signature was integrated with the CS model. The strong correlation between the DL signature and CTR (r = 0.611, p < 0.001) might account for this. In addition, air bronchogram and pleural attachment, which were previously reported to be radiological markers of tumor aggressiveness [39, 40], were closely associated with LN metastasis. CT-reported LN status is a conventional assessment metric for clinical mediastinal staging that depends on macroscopic measurement of LN short-axis diameter [41]. As expected, CT-reported LN status was weighted heavily in the logistic regression equations, but this subjective method showed poor sensitivity and an unsatisfactory AUC in this study. A previous study demonstrated that more than 20% of lymph nodes with a short-axis diameter < 1 cm proved to be tumor-involved [42]. CT alone is insufficient for mediastinal staging in clinical practice [12, 43]. Furthermore, the AUCs for the CS model were 0.781–0.853, much lower than those for the DL signature. Aokage et al constructed a prediction model including tumor diameter, CTR, and solid component density, and created a formula for calculating the probability of LN metastasis in lung adenocarcinoma [6]; He et al found that CEA, lung adenocarcinoma, absence of vascular convergence, and pleural attachment were independently predictive of LN metastasis in NSCLC. However, these models relied on subjective evaluation by experienced radiologists and achieved AUCs (0.796–0.797) far lower than that of our DL signature.
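The 1.82-fold figure quoted above for a 10% increase in CTR follows from the per-1% odds ratio of 1.062 reported for the CS model, because odds ratios from logistic regression multiply across unit increases. A minimal check, with a hypothetical helper name:

```python
import math

def scale_odds_ratio(or_per_unit, units):
    """Odds ratio for a change of `units`, given the odds ratio per one unit.
    Equivalent to or_per_unit ** units, written via the log-odds coefficient."""
    return math.exp(math.log(or_per_unit) * units)

# OR of 1.062 per 1% CTR, scaled to a 10% increase
print(round(scale_odds_ratio(1.062, 10), 2))  # 1.82
```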

We attempted to identify the optimum prediction model by combining the CS risk predictors and radiomics signature with the DL signature. We originally assumed that including CT-reported LN status might partially compensate for the fact that the DL signature draws information solely from the tumor region, and thus improve performance over the DL signature alone. Nevertheless, in the internal validation and independent test cohorts, neither CT-reported LN status, the other CS risk predictors, nor the radiomics signature conferred significant incremental value over the DL signature. This finding further supports the strength of our proposed DL signature in predicting LN metastasis in lung adenocarcinoma.

This study had several limitations. First, an important weakness of DL is that small perturbations in data quality and provenance may lead to erroneous outputs. The DL signature might be affected by differences in acquisition parameters across multi-vendor and multi-institution CT scanners. The present data came from a single center, and larger multi-institution datasets are warranted to confirm the reproducibility and generalizability of the developed DL and radiomics signatures. Second, this study only included patients with lung adenocarcinoma. The value of the DL signature in other pathological subtypes of lung cancer should be further elucidated. Third, only a few of our retrospectively enrolled cases underwent PET-CT scanning, so we were unable to compare the performance of the DL signature with that of PET-CT because of the large number of missing PET-CT records. The predictive performance of PET-CT, as well as the incremental predictive value of the DL signature over PET-CT, should be addressed in future research. Last, the limited interpretability of DL results is a major obstacle to the practical application of DL models in clinical practice. The potential biological mechanism underlying the black box of the DL signature requires further in-depth investigation. A common method for improving the explainability of DL predictions is to generate a visual feature heatmap using Grad-CAM and explore the significance of the attention regions for clinical diagnosis and decision-making. In addition, relating the DL signature to the expression of specific genes or proteins predictive of clinical endpoints could further provide biological interpretability.

In conclusion, we used the Swin Transformer architecture to develop a DL signature for the prediction of LN metastasis in lung adenocarcinoma, and the predictive performance of the DL signature surpassed that of the traditional CS model and radiomics signature. The DL signature might serve as an effective tool for non-invasive mediastinal LN staging and for facilitating the formulation of individualized therapeutic strategies.