Introduction

Pulmonary sarcomatoid carcinoma (PSC) is a small subgroup of non-small cell lung carcinoma (NSCLC), representing around 0.1–0.4% of all lung malignancies [1]. It is a poorly differentiated tumor containing both carcinomatous and sarcoma or sarcoma-like components [2], with significantly worse outcomes than other forms of NSCLC. Although a variety of clinical and pathological parameters, such as disease stage, the size of necrosis and the status of lymphatic permeation, can be used for the classification and prognostic stratification of PSC [3], there are few treatment options because of its resistance to conventional platinum-based chemotherapy [4] and because its genomic background is too poorly understood to guide targeted treatment. Owing to the rarity of this disease, only a few studies have evaluated key driver genes in PSC patients, mainly TP53, MET, EGFR and KRAS [5,6,7], and the reported mutation spectra differ between populations. For example, 18% of PSCs in Japan were found to carry EGFR mutations [8], whereas in the United States KRAS (30%) was the most frequently mutated gene in PSC and no EGFR mutations were identified. Given these geographic differences in the genetic background of the disease, comprehensive genetic profiling is critical to accelerate drug development and clinical trials of existing molecularly targeted drugs for PSC treatment. Herein, we performed targeted DNA sequencing on dissected tumor tissues from 32 Chinese PSC patients, interrogating 416 cancer-related genes and 16 common fusion genes.

Materials and methods

Patient enrollment and sample collection

Samples from 32 PSC patients were retrospectively collected at the First People’s Hospital of Changzhou between 2005 and 2016. The study was approved by the Review Board of the Third Affiliated Hospital of Soochow University. For each patient, fresh-frozen tumor tissue from bronchial biopsies, surgically removed lung lesions or metastatic lymph nodes was obtained. All specimens were examined by two experienced pathologists following the 2004 WHO classification of PSC. PSC was diagnosed when the tumor contained at least 10% sarcomatoid component, represented by spindle cells, pleomorphic giant cells or both. A tumor content above 50% was required in each specimen for DNA extraction and further genomic profiling.

DNA extraction and library preparation

Genomic DNA was extracted from tumor tissues using the Qiagen DNeasy blood and tissue kit (Qiagen) following the manufacturer’s instructions. DNA was quantified on a Qubit Fluorometer with the Qubit dsDNA HS Assay kit (Thermo Fisher), and its quality was evaluated on a Nanodrop 2000. Library construction for sequencing was performed as previously described [9]. In brief, 500 ng–1 μg of DNA was sheared into fragments of ~350 bp with a Covaris M220 instrument. Indexed paired-end adaptors for the Illumina platform were synthesized by Integrated DNA Technologies (IDT). End repair, A-tailing and adaptor ligation of the sheared DNA were performed with reagents from the KAPA Hyper DNA Library Prep kit (Roche Diagnostics). Unligated adaptors were removed by size selection with Agencourt AMPure XP beads (Beckman Coulter), and the ligation products were PCR-amplified with Illumina P5 and P7 amplification primers.

Four to five DNA libraries with different indexes were pooled and subjected to targeted enrichment using a customized panel designed to capture 416 cancer-related genes and 16 fusion genes. Pooled DNA libraries were combined with blocking oligos and biotinylated DNA probes in the hybridization solution of IDT Lockdown reagents (IDT) for overnight incubation. M-270 Streptavidin Dynabeads (Life Technologies) were used to capture the hybridized products, and on-bead amplification was performed with KAPA HiFi HotStart ReadyMix (KAPA Biosystems). After post-amplification cleanup, the library was quantified with the KAPA Library Quantification kit (KAPA Biosystems), and its fragment size distribution was analyzed on an Agilent Technologies 2100 Bioanalyzer.

Library sequencing and bioinformatics analysis

Prepared DNA libraries were sequenced on the Illumina HiSeq 4000 platform (Illumina, San Diego, CA) in the core facility of Geneseeq Technology Inc., China. Sequencing data were processed and genetic variants were called according to a previous report with minor modifications. In brief, Trimmomatic was used for FASTQ file quality control (QC) to remove leading and trailing low-quality bases (quality score below 15) or N bases before mapping to the reference genome. Only reads that passed QC were mapped to the human reference genome hg19 using the Burrows–Wheeler Aligner (BWA-mem, v0.7.12) with default parameters. SNVs and indels were detected by VarScan. SNPs with a mutant allele frequency (MAF) > 30% were checked against the 1000 Genomes Project or the 65,000 exomes project (ExAC) databases and were removed from the final reports if present at > 1% population frequency in these databases. ADTEx (https://adtex.sourceforge.net) was used to identify CNVs, with the normal human HapMap DNA sample NA18535 as the control. The depth ratios were smoothed by discrete wavelet transformation before a hidden Markov model (HMM) was applied to estimate polyploidy, normal contamination ratio and absolute CNVs.
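The read-trimming rule described above (drop leading and trailing bases that are N or have a quality score below 15) can be sketched in a few lines. This is an illustration of the rule only, not Trimmomatic's implementation; the function name `trim_read`, the Phred+33 quality encoding and the example read are assumptions made for this sketch.

```python
# Minimal sketch of leading/trailing quality trimming.
# Assumptions (not from the paper): Phred+33 encoded quality strings,
# hypothetical function name, hypothetical example read.

def trim_read(seq, qual, min_q=15, offset=33):
    """Trim leading and trailing bases that are 'N' or have quality < min_q.

    seq  : base string of the read
    qual : quality string of the same length (Phred+offset encoded)
    Returns the trimmed (seq, qual) pair; both are empty if nothing survives.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")

    scores = [ord(c) - offset for c in qual]

    def is_bad(i):
        return scores[i] < min_q or seq[i] in "Nn"

    start, end = 0, len(seq)
    while start < end and is_bad(start):      # walk in from the 5' end
        start += 1
    while end > start and is_bad(end - 1):    # walk in from the 3' end
        end -= 1
    return seq[start:end], qual[start:end]


# Hypothetical read: first two and last base fail the threshold.
trimmed_seq, trimmed_qual = trim_read("NACGTN", "#+IIG#")
```

Only the ends are trimmed here; a low-quality base in the middle of a read is left in place, which matches the "leading/trailing" wording of the text.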

Statistical analysis

The mutation frequencies of different genes in TP53-mutated and TP53 wild-type patients were compared using Fisher’s exact test. TMB in different gene groups was compared using the Mann–Whitney U test. Overall survival (OS) was calculated from the date of first referral to the date of death (uncensored) or last contact (censored). A multivariate Cox proportional hazards model was used for survival analysis (OS) of patients’ clinical characteristics, including age, gender, disease stage, smoking status and TMB.
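As an illustration of the Mann–Whitney U comparison named above, the following sketch computes the U statistic with midranks for ties and a two-sided p value from the normal approximation. It omits the tie and continuity corrections that statistical packages usually apply, so its p values are approximate; the function names and the TMB values in the example are hypothetical and are not data from this study.

```python
# Minimal Mann-Whitney U sketch (normal approximation, no tie or
# continuity correction). Names and example values are hypothetical.
import math


def midranks(values):
    """Return 1-based ranks, averaging the ranks of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U, approximate two-sided p) for two independent samples."""
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    return u, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical TMB values (mutations per Mb) for two patient groups.
tmb_mutated = [24.4, 31.1, 52.2, 20.0]
tmb_wild_type = [3.3, 8.9, 11.1, 12.2, 15.6]
u_stat, p_approx = mann_whitney_u(tmb_mutated, tmb_wild_type)
```

With groups as small as those in a 32-patient cohort, an exact test is generally preferable to the normal approximation used here.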

Results

Clinical characteristics of the study cohort

Demographic and clinical characteristics of the 32 patients with PSC are summarized in Table 1. The median age was 60 years (range 39–72), and the majority of patients were male (n = 23, 72%). Patients were diagnosed at different disease stages and underwent surgery and/or chemotherapy to control disease progression. All tumor tissues were collected prior to treatment for DNA sequencing analysis.

Table 1 Patient demographic and clinical characteristics

Primary genetic alterations in PSC

Among the 32 cases, a total of 516 genomic alterations in 216 distinct cancer-relevant genes were detected by targeted sequencing, including 424 missense mutations (82%), 34 copy number variations (7%), 20 nonsense mutations (4%), 17 frameshifts (3%), 13 indels (2%), five splice-site mutations (1%) and three fusions (1%) (Supplementary Fig. 1).

EGFR, KRAS and MET were the most frequently mutated oncogenes in the 32 patients (Fig. 1a). EGFR was mutated in 28% of patients (n = 9), but the majority were rare mutations, including D1014G, V845M, G485S, K757R, G724S and L861Q, while L858R was found in only two cases and exon 19 deletion (19del) in one case (Table 2). We also identified T790M co-occurring with L858R in a TKI-naïve tumor. KRAS missense mutations were found in 22% of patients (n = 7), including G12C/V in six cases and Q61K in one case, both of which are activating mutations (Table 2). We also identified MET mutations in 16% of patients (n = 5), including a missense mutation D1028N in exon 14, an intron 13 deletion c.2942-28_2942-13del and H904N, which has not been reported before (Table 2).

Fig. 1

Mutational analysis of PSC patients. a, b Co-mutation plot of the top mutated genes. Each column represents one patient, and the mutation frequency of each gene is displayed as a bar graph on the right. c TMB in each patient. TMB was calculated by dividing the number of mutations detected in each patient by 0.9 Mb (the size of the targeted sequencing panel). Each bar represents one patient and corresponds to the same column in the mutation plots (a, b) above

Table 2 Common mutations and copy number variations in patients

Mutations in EGFR, KRAS and MET are generally considered to be mutually exclusive in tumors because their signaling pathways overlap. However, concomitant mutations are sometimes observed. We found four cases in which EGFR missense mutations co-occurred with KRAS-activating mutations (Table 1).

TP53 and RB1 were the two most frequently mutated tumor suppressor genes, with frequencies of 69% and 25%, respectively (Fig. 1b). Other frequently mutated genes included the tumor suppressor genes NF1 (n = 7, 22%), TSC2 (n = 6, 19%) and FAT1 (n = 6, 19%), as well as ARID1A (n = 7, 22%), KMT2B (n = 7, 22%) and SMARCA4 (n = 6, 19%), which are all related to chromatin remodeling and modification (Fig. 1b). We also identified two cases with RET fusions, KIF5B-RET and TUBD1-RET. A gene of particular note is NKX2-1, which showed copy number gain in one patient and loss of function in two other patients through a nonsense and an indel variation, respectively; all three cases were accompanied by a TP53 frameshift variation (Fig. 1c).

The tumor mutation burden (TMB) across all tumors ranged from 3.3 to 52.2 mutations per megabase (Mb), with a median of 11.7 per Mb, and 13 patients had more than 20 mutations per Mb (Fig. 1c). Patients with mutations in BRCA2, KMT2B, SMARCA4 and TSC2 had significantly higher TMB, indicating the potential of immunotherapy for these patients (Fig. 2). We also analyzed the impact of TMB on OS by stratifying the stage III–IV patients (n = 23) into TMB < 20 and ≥ 20 groups, but there was no significant difference between the two groups (Supplementary Fig. 3A).
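The TMB values reported here follow from a simple ratio: the number of mutations detected in a patient divided by the 0.9 Mb panel size given in the Fig. 1 caption. A minimal sketch follows; the function names are hypothetical, and the mutation counts 3 and 47 are back-calculated from the reported TMB range as an assumption, since the paper does not list per-patient counts in this section.

```python
# Minimal TMB sketch. The 0.9 Mb panel size and the cutoff of 20 come
# from the text; function names and example counts are assumptions.

PANEL_SIZE_MB = 0.9      # size of the targeted sequencing panel
HIGH_TMB_CUTOFF = 20.0   # stratification threshold used for OS analysis


def tumor_mutation_burden(n_mutations, panel_size_mb=PANEL_SIZE_MB):
    """Return mutations per megabase for one patient."""
    if panel_size_mb <= 0:
        raise ValueError("panel size must be positive")
    return n_mutations / panel_size_mb


def is_high_tmb(tmb, cutoff=HIGH_TMB_CUTOFF):
    """True when TMB is at or above the cutoff (the >= 20 group)."""
    return tmb >= cutoff


# Counts inferred from the reported range of 3.3-52.2 per Mb.
lowest = tumor_mutation_burden(3)     # about 3.3 per Mb
highest = tumor_mutation_burden(47)   # about 52.2 per Mb
```

Because the panel covers less than 1 Mb, each additional mutation shifts the TMB by about 1.1 per Mb, so panel-based values are coarse and sensitive to small changes in variant calling.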

Fig. 2

Correlation of TMB with the mutation status of different genes. TMB values were grouped by the mutation status of the top mutated genes. Each dot represents the TMB of one patient. The Mann–Whitney U test was used for statistical analysis. **p < 0.01

Interestingly, in the comparison of genetic alterations between TP53-mutated and TP53 wild-type patients, ARID1A mutations were present only in TP53-mutated patients (Supplementary Fig. 2). However, combined loss of function of ARID1A and TP53 did not have a more adverse impact on OS than TP53 mutation alone (Supplementary Fig. 3B).
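A comparison of this kind reduces to a 2 × 2 contingency table, which Fisher's exact test evaluates with the hypergeometric distribution. The sketch below is a plain-Python two-sided version; the function name is hypothetical, and the example counts (22 TP53-mutated patients, 7 of them with ARID1A mutations, and 10 TP53 wild-type patients with none) are reconstructed from the reported percentages, so they should be treated as an illustration rather than as the authors' analysis.

```python
# Minimal two-sided Fisher's exact test for a 2x2 table.
# Function name and the reconstructed counts are assumptions.
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided p value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        return comb(row1, k) * comb(row2, col1 - k) / denom

    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    p_obs = prob(a)
    tol = 1 + 1e-9   # guards against floating-point ties
    total = sum(p for p in map(prob, range(k_min, k_max + 1))
                if p <= p_obs * tol)
    return min(1.0, total)


# Rows: TP53-mutated, TP53 wild-type; columns: ARID1A mutated, wild-type.
p_value = fisher_exact_two_sided(7, 15, 0, 10)
```

With only seven ARID1A-mutated cases, an observation of exclusive co-occurrence carries limited statistical weight, which is worth keeping in mind when interpreting the pattern.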

Survival and prognosis analysis

At the time of diagnosis, 14 patients underwent immediate lobectomy or pneumonectomy to remove the cancerous parts of the lung, and 12 of them also underwent lymphadenectomy to remove infiltrated lymph nodes. Seven patients received chemotherapy alone, while six received both surgery and chemotherapy. Four patients with stage IV disease received only palliative care because of their poor health status. By the end of the study, 11 patients were still alive (35%), and one of them had maintained disease-free survival for 10.6 years after being diagnosed with T2aN0M0 PSC and undergoing lobectomy. The median follow-up time for all patients was 7.9 months (range 1.5–129 months), and the 1-year overall survival probability was around 50%.
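A survival probability such as the 1-year OS quoted above is typically read from a Kaplan–Meier curve, which handles censored patients (alive at last contact) by removing them from the risk set without counting a death. The sketch below is a generic product-limit estimator, not a reproduction of the authors' analysis; the function names and the follow-up data are hypothetical.

```python
# Minimal Kaplan-Meier (product-limit) sketch.
# Function names and the example follow-up data are hypothetical.

def kaplan_meier(times, events):
    """Return [(time, survival)] at each time with at least one death.

    times  : follow-up time per patient (e.g. months)
    events : 1 if the patient died at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = leaving = 0
        while i < len(data) and data[i][0] == t:   # group tied times
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving        # deaths and censored both leave
    return curve


def survival_at(curve, t):
    """Estimated survival probability at time t (step function)."""
    s = 1.0
    for time, surv in curve:
        if time > t:
            break
        s = surv
    return s


# Hypothetical cohort: months of follow-up and death indicator.
months = [2, 4, 4, 6, 8, 14, 20]
died = [1, 1, 0, 1, 0, 1, 0]
one_year_os = survival_at(kaplan_meier(months, died), 12)
```

Comparing two such curves, for example by mutation status, would additionally require a log-rank test, which is not shown here.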

To explore gene alterations associated with clinical prognosis, we examined the OS of patients grouped by the mutation status of EGFR, KRAS, RB1 and TP53 individually. We first analyzed the correlation of patients’ clinical characteristics with OS using a multivariate Cox regression model and found that disease stage was the only characteristic significantly associated with OS (Table 3). To reduce its effect, only patients with stage III and IV disease (n = 23) were selected for survival analysis. The mutation status of EGFR and RB1 had no significant influence on OS, and TP53 was of borderline significance (p = 0.06) (Fig. 3b, c). The presence of a KRAS mutation appeared to be associated with significantly worse OS (p < 0.004) (Fig. 3a), but this effect was no longer significant in stage-stratified Cox analysis (p = 0.226), suggesting that the apparent prognostic effect of KRAS may arise because KRAS mutations tend to occur in stage IV disease. MET was not included in this analysis because only three of the stage III and IV patients had MET mutations.

Table 3 Multivariate OS analysis of PSC patients by the Cox proportional hazards model
Fig. 3

Overall survival analysis of patients with different gene mutations. Only stage III and IV patients (n = 23) were selected for the analysis. Survival analysis was performed in GraphPad Prism 6, and the p value is marked in each graph

Discussion

In this study, we performed targeted next-generation sequencing on tumor samples from 32 Chinese patients with PSC to characterize their cancer genomic background. The most frequently mutated genes in our study were TP53 (69%), EGFR (28%), RB1 (25%) and KRAS (22%), and it is notable that the mutation frequency of EGFR was much higher than in previous reports. Schrock et al. [10] reported that TP53, CDKN2A and KRAS were mutated in 73.6%, 37.6% and 34.4% of 125 PSC patients, respectively, while EGFR was mutated in only 8.8% of patients. The higher occurrence of EGFR mutations in our cohort is consistent with the report on a Japanese population, in which approximately 20% of patients carried EGFR mutations [8]. This might be related to the fact that EGFR mutations are more prevalent in NSCLC in Asia (40–60%) than in Western countries (10%) [9]. In addition, the majority of EGFR mutations in our study were rare mutations, including the previously reported K757R, G724S, L861Q and V845M, and the unreported D1014G and G485S. This observation differs from previous reports on NSCLC, in which L858R and 19del are the most common mutations and each normally accounts for 35–45% of EGFR mutations [11, 12]. Of these rare mutations, K757R, G724S and L861Q have been reported to confer increased kinase activity or increased sensitivity to TKIs [13,14,15]. V845M has been described previously, but its function is undefined. D1014G and G485S are novel mutations that have not been reported before. To enable the use of EGFR TKIs in these patients, it will be necessary to include these rare mutations in future clinical studies of EGFR TKIs and to take KRAS status into consideration when selecting treatment.

The study also suggests that MET is another highly actionable target in PSC, because MET alterations were detected in 16% of patients and exon 14 skipping was the primary alteration (4 of 6 MET alterations). Exon 14 skipping has been reported in 3% of NSCLC [16] but in 22% of PSC patients, and its response to crizotinib has been confirmed both in vitro and in case studies [7], suggesting the possibility of using MET inhibitors to counter MET over-activation.

It is worth noting that all ARID1A mutations were accompanied by TP53 mutations, which contrasts with previous reports that ARID1A mutations are mutually exclusive with TP53 mutations in ovarian cancer and endometrial carcinoma because of the overlap of their downstream molecules [17,18,19]. However, co-mutation of ARID1A and TP53 was not associated with reduced OS in PSC patients compared with TP53 mutation alone, indicating that it does not have the prognostic value in PSC that it has in endometrial carcinoma. One hypothesis for this phenomenon is that the ARID1A and TP53 mutations occur in different histological components of PSC, considering that the tumor is normally a mixture of different cell types.

The tumor mutation burden (TMB) is defined as the total number of somatic coding errors, base substitutions, insertions or deletions detected per million bases. It is generally believed that a higher TMB means that the tumor presents more antigens, thereby activating the body’s immune function. Several studies have shown that TMB is closely related to the efficacy of CTLA-4 and PD-1 inhibitors [20, 21], and higher TMB is associated with better clinical efficacy of immune checkpoint inhibitors [22]. The median TMB per sample in our cohort was 11.7 per Mb (range 3.3–52.2). The quantity and range of mutations were similar to those in published series of PSCs [23, 24]. In addition, we observed a wide range of TMB in the PSC patients, and patients with mutations in certain genes, including BRCA2, KMT2B, SMARCA4 and TSC2, tended to have significantly higher TMB. BRCA2 is involved in DNA repair, and its correlation with higher TMB has been reported in breast cancer [25]. KMT2B is involved in histone modification and SMARCA4 in chromatin remodeling, while TSC2 regulates mTOR activation. These genes are involved in different cellular signaling pathways, and their association with higher TMB in PSC is reported here for the first time. The relationship between EGFR mutations and the use of immune checkpoint inhibitors is currently being investigated intensively in multiple clinical trials. Studies have shown that EGFR mutations in NSCLC may reduce PD-L1 expression [26]. TMB is usually considered to be relatively high in driver-negative patients, which suggests that immunotherapy may be effective in this group. In this study, the TMB of EGFR-mutated NSCLC patients was significantly lower than that of EGFR wild-type patients. We believe that this finding may have important implications for EGFR wild-type patients with high TMB.

Survival analysis found that the OS of patients was associated with disease stage, but not with age, gender, smoking status or TMB. The attempt to link genetic alterations to disease prognosis was also unsuccessful, except that patients with TP53 mutations showed borderline significance for better outcomes compared with TP53 wild-type patients. This might be because of the complex histological composition and tumor heterogeneity of PSC, which increase the difficulty of identifying a general prognostic marker for all patients. More detailed and stratified analysis based on the histological and genetic background of this disease is required to obtain more information.

As is known, sarcomatoid carcinoma is a rare pulmonary neoplasm that has received little attention. Our results showed that both EGFR and MET are frequently mutated in PSC and that sensitizing somatic mutations of the EGFR gene are seen in only 28% of Chinese patients with PSC. We conclude that it is important to include therapies targeting rare EGFR mutations and MET exon 14 skipping in clinical trials for PSC patients, which may provide a basis for exploring new targeted therapies for PSC. Furthermore, high TMB was seen in about 40.6% of Chinese patients with PSC, who could benefit from the use of immune checkpoint inhibitors. However, larger clinical studies are still needed to confirm these findings.