Introduction

Papillary thyroid cancer (PTC) is the most common malignant tumor of the thyroid [1]. Because most patients with PTC have a favorable prognosis, overdiagnosis and overtreatment are a concern [1]. Two Japanese prospective studies of asymptomatic papillary microcarcinoma (tumor size ≤ 1 cm) supported nonsurgical observation [2, 3], and active surveillance of low-risk papillary microcarcinoma has been described as a practical strategy to avoid unnecessary surgery [4, 5]. In the United States and Europe, recent guidelines do not recommend aggressive fine-needle aspiration cytology for thyroid tumors ≤ 1 cm [1, 6]. However, PTCs with invasion of the trachea, esophagus, or recurrent laryngeal nerve, massive lymph node metastases, or progressive distant metastases are defined as “high-risk” in several clinical guidelines, and high-risk PTC can be fatal [1, 4]. Although systemic therapies such as radioiodine and multi-targeted tyrosine kinase inhibitors are available for patients with progressive distant metastases, PTC in these patients remains difficult to cure [4]. Thus, there is an urgent need for informative biomarkers that can identify high-risk PTC.

Recent studies have identified associations between oncogene mutations and the prognosis of differentiated thyroid cancer. For example, the BRAF V600E mutation has been associated with a poor prognosis [7, 8], and TERT promoter mutations indicate clinically aggressive tumors and a poor prognosis [9, 10]. Identifying differences in gene signatures between low- and high-risk PTC could help predict prognosis and inform decisions on treatment strategy. Moreover, the genetic profile that characterizes high-risk PTC may offer therapeutic targets. Although these oncogenic mutations have been well studied, few investigations have compared the expression of tumor-progressive genes between low- and high-risk PTC. Therefore, this study aimed to characterize the expression profile of tumor-progressive genes in low- and high-risk PTC and to identify putative biomarkers of high-risk PTC.

Methods

Sample collection

This study used formalin-fixed paraffin-embedded (FFPE) samples resected from six patients with low-risk PTC and six patients with high-risk PTC who underwent surgery at Nagoya University Hospital between 2002 and 2008 and for whom clinicopathological data were available. The risk classification of each patient was based on the Japanese Clinical Practice Guidelines for Thyroid Tumors [4]. Clinical and pathological TNM classifications followed the Union for International Cancer Control (UICC) TNM classification (8th edition). All procedures, from RNA extraction to expression data analysis, were conducted by Riken Genesis Co. (Tokyo, Japan). Total RNA was isolated from 10–15 unstained 5-μm sections per tumor. The tumor area was manually macro-dissected, and total RNA was extracted using the Maxwell RSC RNA FFPE Kit (Promega Co., Madison, WI, USA). The RNA quantity and quality of every sample were assessed with the Qubit 3.0 Fluorometer (Thermo Fisher Scientific, Waltham, MA, USA) and the Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA). A total of 770 genes (740 tumor-progressive genes and 30 reference genes) were evaluated with the nCounter PanCancer Progression panel (NanoString Technologies Inc., Seattle, WA, USA). Raw counts were analyzed using nSolver Analysis Software (NanoString Technologies Inc.) and normalized to the expression levels of the reference genes according to the standard protocol.
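The reference-gene normalization step can be illustrated with a short Python sketch. It assumes the geometric-mean scaling commonly used for nCounter data: each sample's counts are multiplied by the ratio of the average reference-gene geometric mean across samples to that sample's own reference-gene geometric mean. The exact settings applied in nSolver for this study are not reported, positive-control (spike-in) normalization and background subtraction are omitted, and the gene names and counts (REF1, REF2, GENE_X) are hypothetical.

```python
import math

def geometric_mean(values):
    """Geometric mean of strictly positive counts (log of zero is undefined)."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def normalize_to_reference_genes(counts, reference_genes):
    """Scale each sample by a reference-gene normalization factor.

    counts: dict of sample ID -> dict of gene name -> count.
    reference_genes: names of the reference (housekeeping) genes.
    Returns a new dict of the same shape with normalized counts.
    """
    # Geometric mean of the reference genes within each sample.
    geo = {sample: geometric_mean([genes[g] for g in reference_genes])
           for sample, genes in counts.items()}
    # Arithmetic mean of the per-sample geometric means is the common target.
    target = sum(geo.values()) / len(geo)
    normalized = {}
    for sample, genes in counts.items():
        factor = target / geo[sample]  # > 1 when the sample's reference signal is low
        normalized[sample] = {g: c * factor for g, c in genes.items()}
    return normalized

# Hypothetical counts: S2 is simply S1 with twice the overall signal.
raw = {
    "S1": {"REF1": 100, "REF2": 400, "GENE_X": 50},
    "S2": {"REF1": 200, "REF2": 800, "GENE_X": 100},
}
normalized = normalize_to_reference_genes(raw, ["REF1", "REF2"])
```

In the toy example both samples end up with the same normalized GENE_X value (75), because their raw counts differ only by the same twofold scaling that the reference genes also show.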

Public datasets of PTC

We obtained the gene expression data and pathological and prognostic characteristics of the patients from The Cancer Genome Atlas (TCGA) database via cBioPortal for Cancer Genomics (URL: https://www.cbioportal.org/).

Statistical analysis

Differences in continuous variables between two groups were evaluated with the Mann–Whitney test. Correlations between the expression levels of pairs of genes were analyzed using Spearman’s rank correlation test. For prognostic analysis, the expression level of each gene in TCGA was divided into quartiles, and the highest quartile was compared with the remaining three quartiles. The Kaplan–Meier method was used to estimate the disease-free survival (DFS) rate, and survival curves were compared using the log-rank test. The associations between SRPX2 expression levels and clinicopathological factors were analyzed using the χ2 test. Multivariate analysis used the Cox proportional hazards model to identify prognostic factors; variables with p < 0.05 in univariate analysis were entered into the final model. JMP 12 (SAS Institute, Inc., Cary, NC, USA) and R version 3.6.3 (R Foundation for Statistical Computing, Vienna, Austria; URL: http://www.R-project.org/) were used for the statistical analyses, and p < 0.05 was considered statistically significant.
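To make the survival comparison concrete, the sketch below implements the Kaplan–Meier estimator and the two-group log-rank test in plain Python. It is an illustration only: the analyses in this study were run in JMP 12 and R 3.6.3, the toy follow-up times and events are invented, and the function names are hypothetical. Subjects censored at time t are kept in the risk set for events at t, which is the usual convention.

```python
import math
from itertools import groupby

def kaplan_meier(times, events):
    """Kaplan-Meier estimate. events: 1 = recurrence observed, 0 = censored.

    Returns the step function as a list of (event_time, survival) pairs.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    steps = []
    for t, group in groupby(data, key=lambda pair: pair[0]):
        group = list(group)
        deaths = sum(e for _, e in group)
        if deaths:
            survival *= 1 - deaths / at_risk
            steps.append((t, survival))
        at_risk -= len(group)  # everyone observed at t leaves the risk set afterwards
    return steps

def survival_at(steps, t):
    """Read the Kaplan-Meier step function at time t (e.g. 5 years)."""
    value = 1.0
    for time, s in steps:
        if time > t:
            break
        value = s
    return value

def log_rank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value) with 1 df."""
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)]
    pooled += [(t, e, 1) for t, e in zip(times_b, events_b)]
    o_minus_e = 0.0
    variance = 0.0
    for t in sorted({t for t, e, _ in pooled if e}):
        n = sum(1 for tt, _, _ in pooled if tt >= t)
        n_a = sum(1 for tt, _, g in pooled if tt >= t and g == 0)
        d = sum(1 for tt, e, _ in pooled if tt == t and e)
        d_a = sum(1 for tt, e, g in pooled if tt == t and e and g == 0)
        o_minus_e += d_a - d * n_a / n  # observed minus expected events in group A
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / variance
    # Upper tail of a chi-square with 1 df equals P(|Z| > sqrt(chi2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))

# Toy data (invented, arbitrary time unit).
km = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
chi2, p = log_rank_test([1, 2, 3, 4, 5], [1] * 5, [6, 7, 8, 9, 10], [0] * 5)
```

The `survival_at` helper reads the step function at a fixed time, which is how a figure such as a 5-year DFS rate is obtained from a Kaplan–Meier curve.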

Ethical approval and consent for participation

This study complies with the Declaration of Helsinki and was approved by the institutional review board of Nagoya University Graduate School of Medicine (reference number: 2019–0019). Written informed consent was obtained from participants for the use of samples and data.

Results

Expression profile of the 740 genes involved in cancer progression

Table 1 summarizes the characteristics of the 12 PTC patients whose specimens were analyzed with the panel. The low-risk group comprised six patients (L-1–L-6) with cT1 or cT2 and cN0 disease: three with pT1 and pN0 stage disease and three with pT1 or pT2 and pN1a stage disease. The high-risk group comprised six patients (H-1–H-6) with cN1 disease: three with pT4a stage disease invading the recurrent laryngeal nerve and three with M1 disease involving the lung. The high-risk patients had larger tumors (p = 0.025) and more lymph node metastases (p = 0.005) than the low-risk patients. None of the patients had metastatic lymph nodes larger than 3 cm or extranodal invasion, which are considered high-risk factors [4].

Table 1 Clinicopathological characteristics of the 12 patients whose cancer specimens were analyzed in the nCounter analysis

The expression levels of the 740 tumor-progressive genes were evaluated using the nCounter PanCancer Progression panel. The expression level of every gene in each specimen is listed in Supplementary Table 1. Hierarchical clustering of the 12 specimens based on the 740 genes did not separate them according to the clinical risk classification (Fig. 1a). When the expression level of each gene in the high-risk group was compared with that in the low-risk group, 14 genes showed significantly higher (> two-fold) expression and one gene showed less than half the expression (Fig. 1b). These 15 genes characterized the gene expression profile of high-risk PTC (Table 2).
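The selection rule behind Fig. 1b (a fold change above 2 or below 0.5 together with p < 0.05) can be sketched as follows. Because each group contains only six samples, the sketch computes an exact two-sided Mann–Whitney p-value by enumerating all 924 ways of splitting the 12 pooled ranks. It assumes that the fold change is the ratio of group means of normalized counts and that no multiple-testing correction is applied; neither detail is stated in the text, and nSolver's own differential-expression calculation may differ. The counts and the function names are invented for illustration.

```python
from itertools import combinations

def midranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def exact_mann_whitney(x, y):
    """Two-sided exact Mann-Whitney p-value by enumerating every split of the pooled ranks."""
    ranks = midranks(list(x) + list(y))
    n_x, n_total = len(x), len(x) + len(y)
    center = n_x * len(y) / 2  # expected U under the null hypothesis

    def u_stat(indices):
        return sum(ranks[i] for i in indices) - n_x * (n_x + 1) / 2

    observed = abs(u_stat(range(n_x)) - center)
    splits = list(combinations(range(n_total), n_x))
    extreme = sum(1 for s in splits if abs(u_stat(s) - center) >= observed - 1e-9)
    return extreme / len(splits)

def classify_gene(high_risk, low_risk, fold=2.0, alpha=0.05):
    """Label a gene 'up' or 'down' in the high-risk group, or None if not flagged."""
    ratio = (sum(high_risk) / len(high_risk)) / (sum(low_risk) / len(low_risk))
    p = exact_mann_whitney(high_risk, low_risk)
    if p < alpha and ratio > fold:
        return "up", ratio, p
    if p < alpha and ratio < 1 / fold:
        return "down", ratio, p
    return None, ratio, p

# Toy normalized counts (invented) for one gene in six high- and six low-risk samples.
label, ratio, p = classify_gene([40, 42, 45, 50, 55, 60], [10, 12, 15, 18, 20, 22])
```

With six samples per group and no ties, the smallest attainable two-sided exact p-value is 2/924 (about 0.0022), reached only when the two groups do not overlap at all.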

Fig. 1

Expression analysis of 740 genes involved in cancer progression. a Hierarchical clustering of the 12 tumor specimens using the 740 genes. Each colored square indicates the relative mean transcript abundance for each sample. The risk classification and pathological stage are shown below the dendrogram. b Volcano plot showing each gene’s −log10(p value) and log2 fold change. When gene expression in the high-risk group was compared with that in the low-risk group, 14 genes showed more than two-fold higher expression (red dots) and one gene showed less than half the expression (blue dot)

Table 2 Genes expressed at higher or lower levels in the high-risk group

Association between gene expression and pathological factors in the TCGA database

To identify which of the 15 genes have clinical significance, we evaluated their expression levels in 382 PTC patients in the TCGA database. The median age was 46 years (range 17–89 years); 93 male and 271 female patients were included (data missing for 18 patients). Table 3 summarizes the TNM stages according to the UICC classification (8th edition). Eleven patients (2.9%) had pT4 stage disease and 153 patients (40.1%) had lymph node metastasis (pN1 stage).

Table 3 Clinicopathological characteristics of the 382 patients in the Cancer Genome Atlas (TCGA) database

When the expression levels of the 15 genes were evaluated in these 382 patients, CCL11, COL6A3, INHBA, and SRPX2 showed significantly higher expression in patients with pT4 (n = 11) or pN1 (n = 153) stage disease than in those with pT1/T2/T3 (n = 351) or pN0 (n = 172) stage disease (Fig. 2a and Supplementary Fig. 1). No significant difference was found in any gene expression level between M1 (n = 6) and M0 (n = 193) stages. Moreover, there was a significant correlation in each pair of these four genes (Fig. 2b). In particular, the expression level of SRPX2 was highly correlated with that of COL6A3 (correlation coefficient: 0.753) and INHBA (correlation coefficient: 0.686).
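Spearman’s coefficient, used for the pairwise correlations in Fig. 2b, is the Pearson correlation computed on ranks, with tied values given their average rank. A minimal Python sketch on invented data follows; the function names are hypothetical, and it returns only the coefficient, not the p-value that a statistics package would also report.

```python
import math

def midranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the midranks."""
    rx, ry = midranks(x), midranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)

# Invented expression values for two genes across five samples.
rho_perfect = spearman_rho([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
rho_tied = spearman_rho([1, 2, 3, 4, 5], [5, 6, 7, 8, 7])
```

Because only ranks enter the calculation, the coefficient is unchanged by monotone transformations such as taking log2 of expression values.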

Fig. 2

a Association between gene expression levels and pathological stages in The Cancer Genome Atlas (TCGA) database. The expression levels of CCL11, COL6A3, INHBA, and SRPX2 were significantly higher in patients with pT4 or pN1 stage disease than in those with pT1/T2/T3 or pN0 stage disease. *p < 0.05, **p < 0.01, ***p < 0.001, ****p < 0.0001. b Pairwise correlations between the expression levels of the four genes. Every pair of genes was significantly correlated. r, Spearman’s correlation coefficient

Association between gene expression and prognosis in the TCGA database

We investigated the association between the expression of each of the four genes that were highly expressed in advanced-stage PTC and DFS in 323 patients with M0 stage disease and available prognostic data in the TCGA database. Patients in the highest quartile of expression for each gene were designated the “high group” (n = 82), and the remaining patients were designated “the others” (n = 241). Although DFS did not differ significantly between the high group and the others for CCL11, COL6A3, or INHBA, the high SRPX2 group had a lower 5-year DFS rate than the others (70.9% vs. 85.0%; p = 0.013; Fig. 3). The high SRPX2 group included a higher proportion of older patients (≥ 55 years, p = 0.049) and of patients with more advanced pT stage (p = 0.035) and pN1 stage (p = 0.001; Table 4). These results suggest that SRPX2 is a prognostic biomarker of PTC.
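The quartile grouping and the association test behind Table 4 can be sketched in a few lines of Python. The sketch assumes that the “high group” consists of values at or above the third quartile as computed by the standard-library `statistics.quantiles` (default “exclusive” method) and that the χ2 test is Pearson’s test without continuity correction; the text specifies neither, and software packages differ in both choices. The data and function names are invented.

```python
import math
from statistics import quantiles

def split_top_quartile(expression):
    """True for values at or above the third quartile (the 'high group')."""
    q3 = quantiles(expression, n=4)[2]
    return [v >= q3 for v in expression]

def chi_square_2x2(group, factor):
    """Pearson chi-square (no continuity correction) for two binary variables."""
    table = [[0, 0], [0, 0]]
    for g, f in zip(group, factor):
        table[int(g)][int(f)] += 1
    n = len(group)
    rows = [sum(r) for r in table]
    cols = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n  # requires non-empty margins
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # upper tail, 1 df

# Invented data: eight expression values, then 20 patients split 10/10.
high = split_top_quartile([1, 2, 3, 4, 5, 6, 7, 8])
in_high_group = [True] * 10 + [False] * 10
older = [True] * 8 + [False] * 2 + [True] * 2 + [False] * 8
chi2, p = chi_square_2x2(in_high_group, older)
```

With real data, ties at the cut-off can put slightly more than a quarter of patients in the high group, which is one reason the group sizes need not be exactly one quarter of the cohort.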

Fig. 3

Prognostic analysis comparing the highest quartile with the other quartiles for each of the four genes (CCL11, COL6A3, INHBA, and SRPX2). The disease-free survival rate was significantly lower only in the high SRPX2 group

Table 4 Associations between SRPX2 expression and the clinicopathological characteristics of 323 patients in the Cancer Genome Atlas (TCGA) database

Univariate analysis of DFS identified “pN1 stage” and “high SRPX2” as significant prognostic factors. Multivariate analysis identified both “pN1 stage” (hazard ratio: 4.20; 95% confidence interval (CI) 1.35–18.4; p = 0.012) and “high SRPX2” (hazard ratio: 3.12; 95% CI 1.21–8.32; p = 0.019) as independent prognostic factors (Table 5).
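The multivariate Cox model can be illustrated with a compact Newton–Raphson fit of the partial likelihood, using Breslow’s approximation for tied event times. This is a sketch for exposition, not the JMP or R routine used in the study: the toy data and the two binary covariates (standing in for pN1 stage and high SRPX2) are invented, the function names are hypothetical, and confidence intervals are omitted. The hazard ratio for each covariate is exp(β).

```python
import math

def solve(matrix, vector):
    """Solve matrix @ x = vector by Gaussian elimination with partial pivoting."""
    n = len(vector)
    aug = [list(row) + [vector[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x

def score_and_information(times, events, covariates, beta):
    """Score vector and observed information of the Cox partial likelihood (Breslow ties)."""
    p = len(beta)
    score = [0.0] * p
    info = [[0.0] * p for _ in range(p)]
    for i, (t_i, event) in enumerate(zip(times, events)):
        if not event:
            continue  # censored subjects contribute only through risk sets
        risk = [j for j, t_j in enumerate(times) if t_j >= t_i]
        weights = [math.exp(sum(b * x for b, x in zip(beta, covariates[j]))) for j in risk]
        total = sum(weights)
        mean = [sum(w * covariates[j][k] for w, j in zip(weights, risk)) / total
                for k in range(p)]
        for k in range(p):
            score[k] += covariates[i][k] - mean[k]
            for m in range(p):
                second = sum(w * covariates[j][k] * covariates[j][m]
                             for w, j in zip(weights, risk)) / total
                info[k][m] += second - mean[k] * mean[m]
    return score, info

def fit_cox(times, events, covariates, max_iter=50, tol=1e-10):
    """Newton-Raphson maximization of the partial likelihood; returns coefficients."""
    beta = [0.0] * len(covariates[0])
    for _ in range(max_iter):
        score, info = score_and_information(times, events, covariates, beta)
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta

# Toy data (invented): covariates are [pN1 indicator, high-SRPX2 indicator].
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 1, 1, 1, 1, 1, 1]
covariates = [[1, 0], [1, 1], [0, 0], [1, 1], [0, 1], [1, 0], [0, 1], [0, 0]]
beta = fit_cox(times, events, covariates)
hazard_ratios = [math.exp(b) for b in beta]
```

A hazard ratio of 3.12 for high SRPX2, for example, corresponds to a coefficient of ln 3.12 ≈ 1.14 on the log-hazard scale, holding pN1 stage fixed.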

Table 5 Prognostic factors for disease-free survival in patients with papillary thyroid cancer in the Cancer Genome Atlas (TCGA) database (n = 323)

Discussion

In this study, we investigated 740 tumor-progressive genes in six low-risk and six high-risk PTC patients and identified 14 genes with higher expression and one gene with lower expression in the high-risk patients. Subsequently, analysis of the TCGA database showed that the expression levels of CCL11, COL6A3, INHBA, and SRPX2 were higher in patients with advanced-stage PTC, and SRPX2 was identified as an independent prognostic factor.

Molecular biomarkers contribute to the risk stratification of patients and help guide the choice of appropriate therapy. In breast cancer, several commercial multigene expression assays are clinically available to predict patient prognosis and evaluate the need for adjuvant chemotherapy [11]. Studies on differentiated thyroid cancer have demonstrated the importance of several oncogene mutations, including BRAF V600E and TERT promoter mutations, as indicators of poor prognosis [7,8,9,10]; however, little is known about the gene expression profile related to tumor progression in PTC. As a first step toward identifying candidate molecules in PTC, we applied the nCounter PanCancer Progression panel to six patients in each group. This technique performs robustly with FFPE samples and, unlike next-generation sequencing, does not require RNA amplification by polymerase chain reaction. Although cluster analysis of all 740 genes did not separate the low-risk and high-risk patients, 14 genes showed higher expression and one gene showed lower expression in the high-risk group. These 15 genes were considered putative biomarkers that can distinguish high-risk PTC.

Because the high-risk group in the nCounter analysis included only six patients, it was unlikely to represent the full range of malignant phenotypes. To compensate for this, we evaluated whether these genes were highly expressed in patients with advanced T or N stages in the TCGA database. Among the 15 genes, CCL11, COL6A3, INHBA, and SRPX2 were highly expressed in patients with pT4 or pN1 disease. Previous studies have identified tumor-progressive roles for these four genes. CCL11, an eosinophil-selective chemoattractant cytokine, promotes cancer cell proliferation, migration, and invasion and is upregulated in glioblastoma, renal cell carcinoma, and ovarian cancer [12,13,14]. COL6A3 encodes the α3 chain of type VI collagen, a major structural component of the extracellular matrix, and its expression is associated with poor prognosis in colon, pancreatic, prostate, and lung cancers [15,16,17,18]. INHBA, a member of the TGF-β superfamily, is a predictor of poor prognosis in colon cancer and is involved in the tumorigenesis of ovarian cancer [19, 20]. SRPX2, a chondroitin sulfate proteoglycan, is overexpressed in various cancers, such as gastric cancer and esophageal squamous cell carcinoma, where it promotes cell proliferation and metastasis [21, 22]. In this study, the expression levels of the four genes were significantly correlated with one another, suggesting that they may be regulated by common or closely related pathways in PTC. To date, one study has reported that both COL6A3 and INHBA showed higher expression in gastric cancer tissue than in normal tissue [23]. Here, SRPX2 expression was relatively strongly correlated with that of COL6A3 and INHBA. Further mechanistic analysis is warranted to clarify the relationships among these four genes.

Among the four genes, only high SRPX2 expression was associated with poor DFS, and it was identified as an independent prognostic factor. To our knowledge, no previous reports have described the importance of SRPX2 expression in thyroid cancer. In prostate cancer, knockdown of SRPX2 inhibited cell proliferation, migration, invasion, and epithelial-mesenchymal transition through suppression of the PI3K/Akt/mTOR signaling pathway [24]. The PI3K/Akt/mTOR signaling pathway was also suppressed by silencing of COL6A3 [25], whose mRNA expression was highly correlated with that of SRPX2 in this study. Interestingly, a study on osteosarcoma found that SRPX2 promotes tumorigenesis, tumor growth, and invasion by activating YAP1, which promotes malignant phenotypes, cancer stem cell expansion, and drug resistance and has been proposed as a potential therapeutic target [26]. These findings suggest that SRPX2 has broad tumor-progressive roles. Based on these and our findings, SRPX2 has the potential to be a novel therapeutic target as well as a prognostic marker.

These results have several possible clinical applications. The 14 genes highly expressed in high-risk PTC may contribute to the development of a multigene expression panel to predict prognosis in PTC. Patients predicted to have a poor prognosis may require adjuvant systemic therapy, such as radioiodine or molecular targeted drugs, to improve their outcomes. Furthermore, the molecules identified in this study, especially SRPX2, have the potential to be novel therapeutic targets. Although lenvatinib and sorafenib are used clinically to treat PTC patients with progressive distant metastases, these drugs target multiple tyrosine kinases and can cause adverse events such as hypertension, fatigue, proteinuria, and severe hand-foot syndrome [27, 28]. Drugs that target fewer molecules may provide more effective and safer treatments for patients with advanced PTC.

This study has some limitations. First, because the risk classification of each patient in the nCounter analysis was based on the Japanese guidelines [4], the high-risk group included patients with stage I or II disease according to the UICC classification. The definition of “high-risk” varies among guidelines, and this should be considered when interpreting the results. Second, these results were obtained from gene expression analysis, which does not evaluate protein expression or the mechanism of each molecule. Further functional analyses are needed to identify the tumor-progressive roles of these molecules in PTC. Third, clinical data are missing in the TCGA database. For example, the M status of 183 patients (47.9%) was unknown, and age data were missing for 4.7% of patients. This may partly explain why “age ≥ 55 years”, which is recognized as a poor prognostic indicator in several clinical guidelines [1, 4], was not a significant prognostic factor in this study. These results need to be validated in other cohorts.

In conclusion, our study characterized the gene expression profile of high-risk PTC. Among 740 tumor-progressive genes, CCL11, COL6A3, INHBA, and SRPX2 were highly expressed in patients with advanced PTC, and SRPX2 was identified as a prognostic biomarker. We expect these findings to aid in the identification of novel prognostic markers and therapeutic targets in PTC.