Introduction

Liver cancer has had the fastest growing mortality rate over the decades, with a 5-year survival rate of 20%, the second lowest among all cancers [1]. Hepatocellular carcinoma (HCC) is the most common pathological type of liver cancer, accounting for 75–85% of all cases [2]. As technology has improved, so has our understanding of the genetic alterations related to liver carcinogenesis. Genomic analyses have delineated several characteristic genomic events associated with the etiology of HCC, including FGF19 gene amplifications and mutations in the CTNNB1 and TP53 genes [3,4,5]. A comprehensive understanding of HCC pathogenesis and progression at the molecular level would help us identify core oncogenic events for developing new therapeutic strategies but currently remains an unmet need.

Although optimal clinical management has improved the survival of HCC patients, the high frequency of postoperative relapse and the presence of metastases limit long-term outcomes [6,7,8]. The risk of relapse is unpredictable in most HCC patients in current clinical practice. Microvascular invasion (MVI), considered the initial stage of hematogenous metastasis of HCC, is an important risk factor that influences the relapse and prognosis of HCC after hepatectomy [9, 10]. However, the genetic mechanisms underlying the vascular invasion of HCC, especially the clonal evolution between the primary tumor and portal vein tumor thrombus (PVTT), remain poorly understood. Therefore, comprehensive genomic profiling based on emerging high-throughput technologies and a gene-based predictive model highly associated with disease relapse are required to enhance the molecular understanding of HCC progression and promote individualized treatment of HCC.

Another major challenge in achieving precision medicine for HCC is intratumor heterogeneity, together with the limited availability of tissue from advanced HCC [11]. In this context, circulating tumor DNA (ctDNA), a key component of liquid biopsy, has received much attention as it integrates genomic information from different lesions and can be obtained in a minimally invasive manner [12]. The application prospects of ctDNA in the diagnosis and treatment of HCC have been widely discussed [13,14,15]. However, although ctDNA is released throughout malignant progression, from initial growth to metastatic spread, the evolutionary pattern of ctDNA in different stages of HCC remains largely unknown.

Here, we aim to systematically elucidate the genomic landscape during the clonal evolution of HCC from the perspective of tumor and ctDNA and to identify mutations associated with MVI and relapse. We performed whole-exome sequencing (WES) of tumor and peritumor tissues, PVTT, ctDNA, and peripheral blood mononuclear cells (PBMCs) in 5 HCC patients with MVI and 5 HCC patients without MVI. Using these samples, we compared genomic profiles, including mutational landscapes and clonal evolution patterns, between HCC patients with MVI and those without MVI to investigate the genomic evolution during HCC vascular invasion. Finally, focusing on the significantly mutated genes (SMGs) associated with MVI, we performed an integrative genomic analysis with the transcriptome and developed a prognostic signature for relapse prediction in HCC.

Materials and methods

Patient cohort

Biospecimens for WES were collected from 10 patients diagnosed with pathologically confirmed HCC during curative resection at Zhongshan Hospital, Fudan University between June 2016 and July 2016, including 5 with MVI and 5 without MVI. Peritumor tissues, tumor tissues, PBMCs, and ctDNA from these 10 patients, as well as 5 PVTTs from MVI-positive patients, were subjected to WES (a total of 45 samples). PBMCs were used as the control. Patients were included based on the following criteria: radical resection; no prior intervention, such as transhepatic artery embolization, chemotherapy, or radiotherapy; and no history of other malignancies or inflammatory disease. The presence of MVI in the tumor was independently judged by two experienced pathologists. In addition, we retrospectively collected 172 tumor samples from HCC patients who underwent hepatectomy between January 2012 and December 2012 at Zhongshan Hospital, Fudan University, using the same inclusion criteria, as an external validation cohort for our signature (PCR cohort).

Demographic and clinicopathological data, including age, sex, laboratory tests, and pathology reports, were collected from the electronic medical record system. Liver function was assessed based on the Child‒Pugh scoring system. Tumors were staged according to the 8th edition of the tumor-node-metastasis (TNM) classification system [16]. Postoperative follow-up was conducted as previously described [17]. Relapse-free survival (RFS) was defined as the interval between surgery and relapse or the study endpoint. The clinicopathological characteristics of the patients are summarized in Supplementary Table 1. All patients were HBV positive, and patients with MVI were diagnosed as MVI (+) after surgery. The study was conducted in accordance with the Declarations of Helsinki and Istanbul and was approved by the Ethics Review Board of Zhongshan Hospital, and all participants provided written consent.

Genomic DNA preparation and WES

Genomic DNA from fresh-frozen specimens and PBMCs was extracted using the QIAamp DNA FFPE Tissue Kit and QIAamp DNA Blood Mini Kit (Qiagen, Germany), respectively, according to the manufacturer’s instructions. DNA was then quantified using a Qubit 2.0 Fluorometer (Thermo Fisher Scientific, USA) and assessed for integrity by agarose gel electrophoresis.

Sequencing libraries were prepared using the Agilent SureSelectXT Low Input Reagent Kit (Agilent Technologies). Briefly, qualified genomic DNA was sheared to fragments in the range of 180–280 bp using a Bioruptor ultrasonicator (Diagenode) and then purified using AMPure SPRI beads. The DNA fragments were then ligated with Illumina paired-end indexed adapters and amplified by 8 cycles of PCR to construct uniquely indexed libraries. After the size distribution was examined on the 4200 TapeStation High Sensitivity D1000 ScreenTape, the libraries were subjected to exon target capture using the SureSelect Human All Exon V5 Kit (Agilent Technologies). The captured exon-enriched libraries were assessed on an Agilent 2100 Bioanalyzer and quantified using a Qubit 2.0 Fluorometer (Thermo Fisher Scientific, USA). Finally, the libraries were sequenced on the Illumina HiSeq 2000 platform to generate 150 bp paired-end reads.

Sequence alignment and variant calling

Filtered raw sequencing data in FASTQ format were aligned to the reference human genome (UCSC GRCh37/hg19) using Burrows‒Wheeler Aligner (BWA) software to generate mapping results in BAM format [18]. GATK2 and Picard were then used for local realignment and base recalibration [19]. Somatic single-nucleotide variations (SNVs) were identified using MuTect, and somatic indels were detected by GATK Somatic Indel Detector [20]. High-confidence variants were identified using the following criteria: the coverage in all samples was more than 10 ×, the variant allele fraction (VAF) was at least 10% in the tumor samples and less than 2% in the peritumoral samples, and the number of reads supporting the variant allele was > 3. ANNOVAR was used to annotate the resulting Variant Call Format (VCF) files. Somatic SNVs and indels were then compared against the single nucleotide polymorphisms (SNPs) in the 1000 Genomes Project to remove common SNPs and SNVs with minor allele frequencies greater than 1% or located in segmental duplications.
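The high-confidence criteria above can be expressed as a small filter. The following Python sketch is illustrative only: the `VariantCall` record and the function names are hypothetical and are not part of the MuTect or GATK output format, and the thresholds are simply those stated in the text (coverage > 10 ×, tumor VAF ≥ 10%, peritumor VAF < 2%, more than 3 supporting reads).

```python
from dataclasses import dataclass


@dataclass
class VariantCall:
    """Hypothetical per-variant read counts for one tumor/peritumor pair."""
    tumor_depth: int
    tumor_alt_reads: int
    peritumor_depth: int
    peritumor_alt_reads: int


def allele_fraction(alt_reads: int, depth: int) -> float:
    """Fraction of reads supporting the variant allele (0 if no coverage)."""
    return alt_reads / depth if depth > 0 else 0.0


def is_high_confidence(call: VariantCall,
                       min_depth: int = 10,
                       min_tumor_vaf: float = 0.10,
                       max_peritumor_vaf: float = 0.02,
                       min_alt_reads: int = 3) -> bool:
    """Apply the four criteria stated in the text to a single call."""
    # Coverage must be strictly greater than 10x in both samples.
    if call.tumor_depth <= min_depth or call.peritumor_depth <= min_depth:
        return False
    # More than 3 reads must support the variant allele in the tumor.
    if call.tumor_alt_reads <= min_alt_reads:
        return False
    # Tumor VAF of at least 10%.
    if allele_fraction(call.tumor_alt_reads, call.tumor_depth) < min_tumor_vaf:
        return False
    # Peritumor VAF below 2%.
    return allele_fraction(call.peritumor_alt_reads,
                           call.peritumor_depth) < max_peritumor_vaf
```

For example, a call with 15 of 100 tumor reads and 0 of 80 peritumor reads supporting the variant passes, whereas the same call with 2 of 80 peritumor reads (2.5%) is rejected.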

Spectra of somatic mutations and mutation signature analysis

Mutation spectra of six mutation types (C > T, T > C, C > A, T > A, C > G, and T > G) were analyzed in all samples. For mutation signature analysis, the relative contribution of each mutation signature in each sample was quantified based on somatic SNVs using the “deconstructSigs” package in R [21]. The extracted signatures were analyzed for similarity with 30 previously described COSMIC mutation signatures.
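As an illustration of how the six-type spectrum is tallied, the sketch below collapses each SNV onto its pyrimidine reference base (so that, for example, G > A is counted as C > T) and reports the proportion of each class. It is a minimal Python sketch with hypothetical function names and is not the deconstructSigs implementation used in the study.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
CLASSES = ("C>A", "C>G", "C>T", "T>A", "T>C", "T>G")


def substitution_class(ref: str, alt: str) -> str:
    """Map an SNV onto one of the six pyrimidine-based substitution classes."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError(f"not a single-nucleotide substitution: {ref}>{alt}")
    # Purine reference bases are reported on the complementary strand.
    if ref in ("A", "G"):
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def mutation_spectrum(snvs):
    """Return the proportion of each of the six classes for (ref, alt) pairs."""
    counts = Counter(substitution_class(ref, alt) for ref, alt in snvs)
    total = sum(counts.values())
    return {cls: (counts[cls] / total if total else 0.0) for cls in CLASSES}
```

The 96-pattern analysis used for signature extraction additionally records the bases immediately 5′ and 3′ of each substitution, which this sketch omits.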

Copy number variation (CNV) analysis

GATK software was applied to detect somatic CNVs and identify loss of heterozygosity (LOH) regions. The recurrently amplified and deleted genomic regions were then inferred using the GISTIC algorithm [22]. G-scores were calculated to evaluate the frequency and magnitude of amplified and deleted genomic fragments. Peak regions with a p value less than 0.05 were defined as significant CNV regions.

Identification of SMGs

The MutSigCV method was used to quantify the importance of nonsilent mutations (including SNVs and indels) and estimate the background mutation rate based on silent mutations. SMGs were defined as genes with a significantly higher mutation frequency than the background mutation rate (false discovery rate (FDR) < 0.05). In HCC patients with MVI, shared SMGs among tumors, PVTTs, and ctDNA were identified as SMGs associated with MVI.
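To make the FDR threshold concrete, the following Python sketch converts per-gene p values into q values with the Benjamini-Hochberg procedure and keeps genes with q < 0.05. This assumes a Benjamini-Hochberg-type correction; the function names, gene labels, and p values in the usage note are hypothetical, and the sketch does not reproduce the MutSigCV background model itself.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg q values in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so that q values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


def significant_genes(gene_p_values, fdr=0.05):
    """Select genes whose q value is below the FDR threshold."""
    genes = list(gene_p_values)
    q_values = benjamini_hochberg([gene_p_values[g] for g in genes])
    return [g for g, q in zip(genes, q_values) if q < fdr]
```

For instance, with hypothetical p values of 0.01, 0.04, 0.03, and 0.5 for four genes, only the first gene has q < 0.05 (q = 0.04).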

Phylogenetic tree construction and driver mutation labeling

A phylogenetic tree was constructed based on nonsynonymous somatic mutations, including SNVs and indels, using PHYLIP to analyze the evolutionary trajectory of HCC [23]. A phylogenetic tree visualizes the affinities between species or genes as a branching diagram, and the evolutionary relationships are inferred by applying mathematical and statistical algorithms to gene mutations. The lengths of the branches and trunks were determined according to the number of mutated genes. For driver labeling, we compiled a list of potential driver genes for HCC from a literature review of sequencing studies and assessed the mutation types in these genes. Genes were identified as driver genes and labeled on the phylogenetic tree when they had one of the following three variant types: nonsynonymous SNVs or indels; copy number amplification of oncogenes or fragment deletion of tumor suppressor genes; and breakpoints in gene structure.

Cancer cell fraction (CCF) and mutation cluster analysis

Dynamic changes in the mutational landscape were analyzed using the CCF value as described previously [24]. The CCF estimates the proportion of cells carrying a particular gene mutation in different samples, which can reflect the dynamic changes in the mutational characteristics of tumors during metastasis. For each somatic mutation, CCF was calculated using the following formula: \(\text{VAF} = \frac{\alpha \times \text{CCF}}{\alpha \times \text{CNT} + (1 - \alpha) \times \text{CNP}}\), where \(\alpha\) indicates the tumor purity and CNT and CNP represent the copy number of tumor and peritumor samples, respectively. The tumor purity and copy number of each sample were detected using ABSOLUTE [25]. VAF was calculated using the following formula: \(\text{VAF} = \frac{N^{\text{var}}}{N^{\text{var}} + N^{\text{ref}}}\), where \(N^{\text{var}}\) and \(N^{\text{ref}}\) indicate the number of reads supporting the variant allele and reference allele, respectively. For mutation cluster analysis, mutation clusters were inferred based on somatic SNVs and indels using the PyClone method. Clusters containing only silent mutations were removed before further analysis.
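The relationship between VAF and CCF can be illustrated by solving the formula for CCF. The Python sketch below is written under stated assumptions: the denominator is taken as the purity-weighted mixture α × CNT + (1 − α) × CNP, each mutation is assumed to be present on a single copy, and estimates are capped at 1. The function names are hypothetical, and purity and copy number would in practice come from ABSOLUTE rather than being supplied by hand.

```python
def variant_allele_fraction(n_var: int, n_ref: int) -> float:
    """VAF = N_var / (N_var + N_ref)."""
    total = n_var + n_ref
    if total == 0:
        raise ValueError("no reads cover this position")
    return n_var / total


def cancer_cell_fraction(vaf: float, purity: float,
                         cn_tumor: float, cn_peritumor: float = 2.0) -> float:
    """Solve VAF = purity * CCF / (purity * CNT + (1 - purity) * CNP) for CCF."""
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    mean_copy_number = purity * cn_tumor + (1.0 - purity) * cn_peritumor
    ccf = vaf * mean_copy_number / purity
    # Sampling noise can push the estimate above 1; cap it (an assumption).
    return min(ccf, 1.0)
```

As a worked example, a mutation supported by 30 of 100 reads (VAF = 0.3) in a diploid region of a sample with 60% purity gives CCF = 0.3 × 2 / 0.6 = 1, i.e., a clonal mutation; a VAF of 0.15 under the same conditions gives CCF = 0.5.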

Analysis of public databases

Liver cancer cohorts from two public databases were employed in this study: the LIHC cohort from The Cancer Genome Atlas (TCGA) database (n = 356) and the GSE14520 cohort in the Gene Expression Omnibus (GEO) database (n = 221) from the Liver Cancer Institute (LCI) at Zhongshan Hospital. Patients with pathological types of HCC were included. The LIHC cohort was used as the training dataset, and the GSE14520 cohort was employed for external validation (LCI validation dataset 1). RNA sequencing data and clinical information were downloaded from the corresponding websites. For data processing, transcriptome data from the LIHC cohort were logarithmically transformed and normalized; gene expression profiles from the GSE14520 cohort were processed on the Affymetrix platform. The probe names were converted into the corresponding gene symbols according to the annotation of each cohort. Batch effects were eliminated using the ComBat method. Gene set enrichment analysis was performed with C2 gene sets obtained from the Molecular Signatures Database. The Tumor Immune Dysfunction and Exclusion (TIDE) algorithm, a computational approach to evaluate tumor immune evasion potential based on gene expression profiles, was used to predict the immunotherapy response of HCC samples [26]. The regimen of immunotherapy refers to anti-PD-1 antibodies (pembrolizumab, nivolumab) and anti-CTLA-4 antibody (ipilimumab) based on the TIDE website.

Development of the relapse-related gene signature (RGSHCC) for HCC

Overlapping genes in MVI-associated SMGs and the TCGA-LIHC cohort and GSE14520 cohort were identified as relapse-related genes. Univariate Cox regression analysis of these genes was then performed in the TCGA training dataset to screen for prognosis-related genes. The least absolute shrinkage and selection operator (LASSO) Cox regression model was used to determine the final genes to be included in the signature and to calculate the corresponding gene coefficients. Analysis was performed using the “glmnet” package in the R project with tenfold cross-validation. A prognostic signature based on the weighted gene expression was finally constructed, and the model score was calculated using the following equation: \(\text{RGS}_{\text{HCC}}\ \text{score} = \sum_{i=1}^{n} c_{i} x_{i}\), where \(x_{i}\) and \(c_{i}\) indicate the expression level of the i-th gene and the corresponding coefficient, respectively.
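The score is a weighted sum of gene expression values, and patients are later stratified by the median score. A minimal Python sketch follows; the gene labels and coefficients in the usage note are placeholders rather than the fitted LASSO coefficients, and assigning scores equal to the median to the low group is an assumption.

```python
from statistics import median


def rgs_score(expression: dict, coefficients: dict) -> float:
    """Weighted sum of expression values, sum over genes of c_i * x_i."""
    return sum(coef * expression[gene] for gene, coef in coefficients.items())


def split_by_median(scores: dict) -> dict:
    """Label each patient 'high' or 'low' relative to the median score."""
    cutoff = median(scores.values())
    # Scores equal to the cutoff are placed in the low group (an assumption).
    return {pid: ("high" if score > cutoff else "low")
            for pid, score in scores.items()}
```

With placeholder coefficients {"GENE_A": 0.5, "GENE_B": -0.25} and expression values {"GENE_A": 2.0, "GENE_B": 4.0}, the score is 0.5 × 2.0 − 0.25 × 4.0 = 0.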

Quantitative RT‒PCR (qRT‒PCR)

Tumor samples were treated with TRIzol reagent (Invitrogen) to extract total RNA using a general protocol. After quantification by a Qubit 2.0 Fluorometer (Thermo Fisher Scientific, USA), RNA was reverse-transcribed into cDNA using the PrimeScript RT reagent Kit with gDNA Eraser (TaKaRa, Japan). The PCR system was prepared using RT2 SYBR Green qPCR Master Mix (QIAGEN), and qRT‒PCR was performed on the Applied Biosystems™ QuantStudio™ 3 platform (Thermo Fisher Scientific, USA). GAPDH was used as the control. The relative mRNA levels of target genes were determined using the ΔΔCt method. The primers for qRT‒PCR are summarized in Supplementary Table 2.
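For reference, the ΔΔCt calculation can be sketched as follows. This Python example assumes the common 2^(−ΔΔCt) form with roughly 100% amplification efficiency and a calibrator sample for normalization; the choice of calibrator is not specified in the text, and the function name and Ct values are hypothetical.

```python
def relative_expression(ct_target_sample: float, ct_reference_sample: float,
                        ct_target_calibrator: float,
                        ct_reference_calibrator: float) -> float:
    """Fold change of a target gene by the 2^(-ddCt) method.

    The reference gene (GAPDH in this study) normalizes each sample, and the
    calibrator sample defines a relative expression of 1.
    """
    delta_ct_sample = ct_target_sample - ct_reference_sample
    delta_ct_calibrator = ct_target_calibrator - ct_reference_calibrator
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)
```

For example, if the target gene has Ct = 24 in a sample and Ct = 26 in the calibrator, with GAPDH at Ct = 18 in both, then ΔΔCt = (24 − 18) − (26 − 18) = −2 and the relative expression is 2^2 = 4.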

Statistical analyses

The data are presented as the mean ± standard deviation (SD) or frequencies (proportions) according to the variable type. Quantitative data were compared using Student’s t test or the Mann‒Whitney U test, and the composition ratio between groups was compared using Pearson’s chi-square or Fisher’s exact test. The Kaplan‒Meier method and log-rank test were used for survival analysis. All statistical analyses were performed using SPSS (version 23.0; IBM) or R (version 4.0.0), and a two-sided p value < 0.05 was considered statistically significant.

Results

Profiles of the genomic alterations in HCC patients with and without MVI

To investigate the genomic alterations in HCC during the process of MVI, we performed WES of 45 samples, including paired tumor and peritumor tissues, PVTTs, PBMCs, and ctDNA, from 10 HCC patients (5 of whom had no MVI and 5 of whom had MVI) with an average sequencing depth of 232.74 × (ranging from 198 × to 280 ×) for tissues and 339.6 × (ranging from 302 × to 406 ×) for ctDNA (Fig. 1A). Approximately 99.6% of the targeted bases had a coverage depth greater than 10 ×. We performed strict data quality control on each sample by applying filters to exclude sequence artifact features enriched in FFPE samples. Finally, by mapping sequence reads to the human reference genome, we identified a total of 10,945,468 SNVs and 1,567,536 indels, with a median of 775 (ranging from 587 to 1,067) somatic mutations per tumor.

Fig. 1

Overview of the genomic alterations in HCC patients with and without MVI. A Flow chart of whole-exome sequencing in the present study. Preoperative PBMCs and ctDNA and postoperative peritumor tissue (PT), tumor tissue (TT), and PVTT from MVI (+) cases of 10 HCC patients were collected for sequencing. B Top: The distribution of 6 substitution mutations in all samples. Bottom: The relative coefficients of mutation signatures in all samples. C Patterns of 3 identified mutation signatures (Signature A, B, and C) and their most similar COSMIC mutation signatures. D Top: The whole-genome distribution of CNVs (upper, red represents amplification; lower, blue represents deletion) inferred using the GISTIC algorithm. The G-score was used to assess the frequency and magnitude of CNVs. Bottom: The distribution of CNVs in samples from the MVI (–) group (L01-05) and the MVI (+) group (H06-10)

Our analysis of the mutation spectrum revealed a predominance of C > T and T > C transitions and C > A transversions in all samples (Fig. 1B, top panel), a feature shared by other HCC cohorts [4, 5, 27, 28]. Furthermore, the mutation spectra were fairly consistent between tumors and PVTTs from the same patient with MVI. No significant difference in mutational spectrum was observed between MVI (–) HCCs and MVI (+) HCCs.

We next performed mutation signature analysis of 96 substitution patterns to explore the temporal dynamics of the genomic landscapes in MVI (–) HCCs and MVI (+) HCCs and identified 3 signatures across almost all samples (except the tumor tissue of L05) (Fig. 1B, bottom panel). Signatures A and B, the predominant signatures in most samples, were mainly characterized by C > T transitions and exhibited high similarity to the previously described COSMIC Signature 5 (A: cosmic similarity = 0.82; B: cosmic similarity = 0.86), which has been reported to be associated with aging (Fig. 1C) [29]. Signature C, characterized by T > A mutations, was dominant in only 3 MVI (–) tumor samples and correlated closely (cosmic similarity = 0.98) with COSMIC Signature 22, which is known to result from aristolochic acid exposure [30]. A notable feature revealed by these analyses was the similar mutation signature profile of tumor and PVTT samples in HCC patients with MVI, suggesting that metastatic lesions formed after vascular invasion follow a mutational process similar to that of the primary tumors. In addition, we found that in MVI (–) HCC patients, the relative contributions of signatures showed substantial variation between tumors and ctDNA; in contrast, in MVI (+) HCC patients, the contributions of signatures were comparable, suggesting that ctDNA can effectively reflect the mutation features of tumors and metastases only when HCC progresses to the MVI stage.

In addition, CNV analysis revealed 56 amplified fragments harboring several genes, such as PPIAL4D, TARP, and DAD1, and 17 deleted fragments containing genes including FCGR1A, MUC5B, and KDM6B, across all samples (Fig. 1D, top panel). The number of CNVs in the tumor and PVTT was significantly greater than that in the peritumor tissue, and more CNVs were detected in tumor and PVTT samples from the MVI (+) group than in those from the MVI (–) group, suggesting the involvement of CNVs in hepatocarcinogenesis and vascular invasion (Fig. 1D, bottom panel). In addition, numbers of CNVs similar to those in tumor tissue could be detected in ctDNA in 3 HCC patients with MVI, indicating that the characteristics of genomic CNVs can be reflected by ctDNA to some extent in MVI (+) HCCs. Taken together, these findings demonstrate that initial hematogenous metastases share similar genomic alterations with the primary tumor, and these features can be captured from ctDNA at the MVI stage of HCC.

Overview of somatic mutations and SMGs in HCC patients with and without MVI

To determine the potential molecular relationships among peritumor tissue, tumor tissue, PVTT, and ctDNA in each patient, we analyzed the nonsynonymous somatic mutations, including SNVs and indels, across all samples. As shown in Fig. 2A, shared mutations were detected to different extents in the tumor tissues and PVTTs of MVI (+) HCCs (accounting for 56.6–83.2% of all mutations in each patient). In addition, the tumor, PVTT, and ctDNA samples of a given MVI (+) case harbored considerably overlapping repertoires of somatic mutations, although to varying extents among individuals (ranging from 28.8% to 52.3%). However, in the MVI (–) patients (except L03), there were few shared mutations between tumor tissue and ctDNA (0.0–0.5%). These results suggest a potential clonal relatedness and identical origin between the primary tumor and PVTT, as well as ctDNA, in MVI (+) HCC.
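The proportion of shared mutations can be illustrated with simple set operations. In the Python sketch below, each mutation is represented as a (chromosome, position, reference, alternate) tuple, and the shared fraction is taken as the number of mutations present in every sample divided by the number present in any sample. That denominator is an assumption, since the text does not state how "all mutations in each patient" was counted, and the coordinates in the usage note are made up.

```python
def shared_fraction(*samples) -> float:
    """Fraction of a patient's mutations that are found in every sample.

    Each sample is an iterable of hashable mutation keys, e.g.
    (chromosome, position, ref, alt). The denominator is the union of all
    mutations across the given samples (an assumption).
    """
    if len(samples) < 2:
        raise ValueError("at least two samples are required")
    sets = [set(sample) for sample in samples]
    union = set().union(*sets)
    if not union:
        return 0.0
    shared = set.intersection(*sets)
    return len(shared) / len(union)
```

For example, if a tumor and its PVTT carry three mutations each, two of which are identical, the shared fraction is 2 / 4 = 50%.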

Fig. 2

Profiles of somatic mutations and SMGs in HCC patients with and without MVI. A Venn diagrams of shared nonsynonymous somatic mutations among peritumor (P), tumor (T), PVTT, and ctDNA samples from the MVI (–) group (L01-05) and the MVI (+) group (H06-10). B The distribution of SMGs in each sample. The top 30 statistically significant genes are shown. Mutation types are annotated in the lower panel

The discovery of key mutation events is an essential step toward fully uncovering the molecular mechanisms of tumorigenesis and progression [31]. Hence, we next identified genes that were significantly mutated in the samples (Fig. 2B; the top 30 genes are shown). TP53 was the most significant SMG and was mutated in all tumor and PVTT samples (100.0%) from the patients with MVI but in only 2 of 5 tumor samples from the patients without MVI (40.0%). Moreover, we observed that mutations in 20 of these 30 genes (66.7%) were detected only in the MVI (+) and not in the MVI (–) HCC patients, suggesting that mutations affecting these genes mainly contribute to vascular invasion in HCC. In addition, somatic mutations in some genes, such as LPHN2, GRIA1, SBSPON, OSBPL3, LGR6, and ZNF541, were detected in the tumor, PVTT, and ctDNA of at least two HCC patients with MVI, indicating that mutations in these genes are more likely to be involved in MVI of HCC than to be incidental. Moreover, the profile of SMGs in ctDNA was highly consistent with that of the tumor and PVTT samples in the MVI (+) patients. Collectively, we identified some metastasis-promoting gene mutation events that occur at the primary tumor stage and are inherited by metastatic tumors and ctDNA.

The evolutionary trajectory of vascular invasion in HCC

To further determine the evolutionary pattern in vascular invasion of HCC, we constructed a phylogenetic tree for each patient based on the somatically mutated genes (Fig. 3A). The first noteworthy finding is that the phylogenetic tree varied considerably between different MVI (+) HCC cases, i.e., the timing of branch divergence resulting from vascular invasion differed between individuals. Nevertheless, we observed trunks of different lengths in the highly metastatic HCC group (H06, H07, H08, H09, and H10), suggesting a monoclonal origin of metastatic tumors, primary tumors, and ctDNA. No trunk was observed in the low metastatic HCC group (L01, L02, L03, L04, and L05). In addition, as indicated by the divergent evolution of branching clones in different samples, there was significant spatial heterogeneity between geographically separated primary tumors, PVTT, and ctDNA, suggesting that different patterns of genetic evolution occur in primary and metastatic HCC, as well as ctDNA, upon isolation.

Fig. 3

The evolutionary trajectory of vascular invasion in HCC. A Phylogenetic trees showing the clonal relationship among peritumor (P), tumor (T), PVTT, and ctDNA samples from one HCC patient based on somatic mutations. The lengths of the branch and trunk reflect the number of mutated genes. Driver genes are labeled on the tree. B–C The copy number (upper) and LOH (lower, shown as BAF distribution) profiles of peritumor (P), tumor (T), PVTT, and ctDNA samples from L03 and H06

We next analyzed the distributions of copy number and B-allele frequency (BAF) of samples from the same patient. As illustrated by the representative profiles of the L03 and H06 cases in Fig. 3B and C, tumors, PVTTs, and ctDNA in MVI (+) HCCs shared mutation clusters, again indicating the monoclonal origin of primary/metastatic tumors and ctDNA from another perspective. This phenomenon was not observed in the HCC patients without MVI. These results confirm the clonal relevance of ctDNA to primary and metastatic tumors in MVI (+) HCC.

Dynamic changes in the mutational landscape during vascular invasion of HCC

To better understand the dynamic alterations in the mutational landscape during HCC progression and to identify specific mutations related to vascular invasion, we next performed clonal and subclonal structure analyses of the mutations in MVI (+) HCC based on the CCF value calculated using the PyClone method [32]. Comparison of the CNVs harbored by three samples from the H08 case confirmed the clonal relatedness across samples (Fig. 4A). Interestingly, as illustrated by the representative clonal architectures of the H08 case in Fig. 4B, we observed that TP53 R117S maintained high cellular prevalence in both the primary tumor and the PVTT, suggesting that it may be a trunk mutation acquired at the early stage of carcinogenesis. Furthermore, the cellular prevalence of SNX6 G6C, SLC14A2 E526K, and KIF18A K57N, which were present in the primary tumor at relatively low levels, increased in the PVTT samples, while BRD7 Y455X mutations were eliminated after vascular invasion. This finding indicates that HCC subclones harboring these specific mutations acquired invasive ability and finally generated vascular metastasis. In addition, novel mutations, such as HS3ST2 S47T, SCN9A W349X, and NEB V196I, were acquired at the vascular invasion stage in the evolutionary cascade, suggesting that the cancer cells in the tumor embolus continue to mutate to adapt to the metastatic state. We also found that trunk mutations, metastasis-promoting mutations, and metastasis-specific mutations were detected in ctDNA to different extents. These results demonstrate that despite its monoclonal origin, HCC has dynamic mutational changes during vascular invasion and mutational heterogeneity between primary and metastatic foci, which can be comprehensively reflected by ctDNA.

Fig. 4

Monoclonal origin and dynamic mutation alterations during vascular invasion of HCC. A Circos plots showing the CNVs in each sample of H08. The first circle represents the coverage of sequencing, the second circle represents the density of SNP indels, and the third circle represents the distribution of CNVs (red indicates amplification, blue indicates deletion, green indicates normal copy number). B The cellular prevalence of specific somatic mutations in tumor (left), PVTT (middle), and ctDNA (right) samples from H08. The peak and distribution of somatic mutations were determined based on the allelic fraction and copy number using PyClone analysis

Development of an integrated molecular signature to predict relapse for HCC

Given that genes found to be significantly mutated in MVI (+) HCC are likely to play an important role in the invasion and relapse of HCC and that transcriptomic data can provide additional information to determine whether the expression of mutated genes is altered, we next integrated the exome and transcriptome data to construct a gene expression-based prognostic signature. A concise flowchart of signature development and validation is shown in Fig. 5A. We obtained a total of 220 shared mutated genes from MVI-associated SMGs and two public HCC datasets. By applying univariate and LASSO Cox regression analyses, we finally developed a relapse-related prognostic signature containing 11 genes and named it RGSHCC. These genes, with characteristic amino acid mutation patterns, were detected in tumors, PVTT, and ctDNA of MVI (+) HCC (Fig. 5B). Based on the coefficients for each gene shown in Fig. 5C, the RGSHCC score for each patient was calculated, and the median score was used as the cutoff value to determine the high-RGSHCC and low-RGSHCC groups. Association analyses of the risk score and clinicopathological characteristics of HCC patients indicated that patients with high RGSHCC scores had worse tumor differentiation and tumor staging, larger tumor size, higher α-fetoprotein (AFP) levels, and vascular invasion than those with low RGSHCC scores (Supplementary Table 3). In addition, as indicated in the TCGA training dataset, the RGSHCC score successfully segregated cases with a high risk of relapse from those with a low risk (hazard ratio [HR], 2.34 [95% confidence interval (CI), 1.71–3.21], p < 0.0001; Fig. 5D). Furthermore, the signature was externally validated in the GSE14520 cohort (LCI validation dataset 1) and PCR cohort (LCI validation dataset 2), for which HCC patients in the high-RGSHCC group experienced more relapse events than those in the low-RGSHCC group (validation dataset 1, HR, 1.7 [95% CI, 1.19–2.44], p = 0.0034; validation dataset 2, HR, 1.78 [95% CI, 1.14–2.79], p = 0.0097; Fig. 5E and F).

Fig. 5

Development of an 11-mRNA signature named RGSHCC to predict relapse for HCC. A The flow chart of the signature development and validation in the present study. A total of 220 shared mutated genes in the MVI-associated SMGs and the TCGA-LIHC cohort and GSE14520 cohort were included for univariate and LASSO Cox regression analyses in the TCGA training dataset (n = 356). In total, 11 genes were finally selected to construct the signature named RGSHCC. The prognostic value of the signature was externally validated in the GSE14520 cohort (n = 221) and PCR cohort (n = 172). B Graphical summary of the sample distribution (right) and the amino acid mutation pattern (middle) of 11 genes included in the signature. C The LASSO coefficient of 11 genes in the signature. D–F Kaplan‒Meier analyses of RFS between the low-RGSHCC and high-RGSHCC groups in the TCGA training dataset (D), LCI validation dataset 1 (E), and LCI validation dataset 2 (F). Log-rank test was used for survival comparisons in Fig. 5D–F

To examine whether the RGSHCC score is independent of other prognostic factors, we next performed univariate and multivariate Cox regression analyses. The results indicated that the RGSHCC score remained significant for RFS after considering other prognostic variables, including MVI and TNM stage, in the TCGA training dataset (Supplementary Figs. S1A and S1B). Consistently, the prognostic signature exhibited independent predictive value in two validation datasets (Supplementary Figs. S1C-S1F). These results suggest that the RGSHCC score is a robust risk factor for HCC relapse.

Biological features underlying RGSHCC and its predictive value for therapeutic response

We next examined the biological features underlying the signature using gene set enrichment analysis. As illustrated in Fig. 6A, several biological processes, such as tumorigenesis, epigenetic alterations, and tumor-driving transcription, were enriched in the high-RGSHCC group, while the low-RGSHCC tumors had stronger inflammatory responses and higher chemotherapeutic sensitivity. This result drove us to investigate whether the high and low RGSHCC groups had different chemotherapeutic responses. As expected, based on the IC50 obtained from the GDSC database, we found that low-RGSHCC tumors tended to be more sensitive to four chemotherapy drugs approved for solid tumor treatment: docetaxel, erlotinib, temsirolimus, and dasatinib (Fig. 6B–E). Considering that immunotherapy is currently a promising treatment for HCC, we also used the TIDE algorithm to predict the immunotherapy response for each sample in LCI validation dataset 1. As indicated in the fourfold contingency table in Fig. 6F, HCC patients in the low-RGSHCC group had a higher immunotherapy response rate than those in the high-RGSHCC group (43.24% vs. 27.27%, p = 0.017). Furthermore, we matched the expression profiles of HCC patients in two groups with a published cohort of 47 melanoma patients receiving anti-CTLA4 and anti-PD-1 treatments and found that patients with low RGSHCC were likely to have a better response to anti-PD-1 therapy (Fig. 6G). These findings suggest that the dismal outcomes of high-RGSHCC HCC may result from more aggressive biological processes, and HCC patients with low RGSHCC scores may benefit more from chemotherapy and immunotherapy.

Fig. 6

The biological processes underlying RGSHCC and its predictive value for therapeutic response. A Gene set enrichment analysis of differentially enriched gene sets in the low- and high-RGSHCC groups in the TCGA dataset. B–E Box plots of the estimated IC50 values for docetaxel, erlotinib, temsirolimus, and dasatinib between the low- and high-RGSHCC groups. F Association between RGSHCC and the predicted immunotherapy response of HCC patients using the TIDE algorithm in LCI validation dataset 1. G Submap analysis of the putative immunotherapeutic response in the low- and high-RGSHCC groups

Discussion

Vascular invasion of HCC is a spatially and temporally dynamic process that is poorly understood, resulting in very limited therapeutic interventions to effectively inhibit hematogenous metastasis and a lack of risk evaluation systems to accurately predict relapse of HCC. Here, by performing WES and comprehensive bioinformatics analyses, we obtained a clear view of the clonality of metastatic HCC and the evolution pattern of ctDNA. First, genomic alterations that promote vascular invasion in HCC occur at the primary tumor stage and are inherited by metastatic foci. Geographically separated primary tumors and metastases continue to evolve independently and present genetic heterogeneity. Furthermore, ctDNA can reflect the landscape of genomic alterations in metastatic HCC but has no clonal correlation with primary tumors in the absence of MVI. Moreover, we developed a practical and reliable molecular signature containing 11 genes for HCC relapse prediction based on the integrative analysis of SMGs associated with MVI. This work contributes to a better understanding of the evolutionary process of HCC and to the prognostic stratification of HCC patients.

The high frequency of relapse and poor prognosis of HCC after surgery are largely the result of intrahepatic metastases through invasion of the portal vein [33]. One of our earlier studies found that the gene expression profile of primary tumors was very similar to that of metastases in metastatic HCC, thereby introducing, from a transcriptome perspective, the concept that gene expression events favoring metastasis are initiated at the primary tumor stage [34]. However, no relevant studies have been conducted to comprehensively characterize the evolutionary trajectory of HCC metastasis at the genomic level. In this work, by comparing genomic alterations, including mutation spectrum and signatures, CNVs, SMGs, and clonal evolutionary trajectories, between five paired tumor and PVTT samples, we found that primary tumors and metastases have very similar genomic alteration profiles and high clonal correlation, indicating a monoclonal origin, which is consistent with the clonal evolution theory of cancer [35]. Our results further support and complement our previous concept that genomic alterations and expression changes promoting metastasis occur in primary tumors. This concept may have theoretical implications for the exploration of mechanisms underlying hepatocarcinogenesis and the development of therapeutic strategies, but it also presents a formidable challenge for early intervention in HCC progression. We also observed that the landscape of SMGs and potential MVI-driven mutations differed among individuals, suggesting that the metastatic drive of HCC involves multiple pathways and is heterogeneous, and highlighting the necessity of detecting tumor genetic features before targeted therapy in each HCC patient.

In HCC, although the most prevalent mutations, such as TP53, are not actionable, approximately a quarter of tumors have potentially pharmacologically targetable drivers [36]. However, the biggest obstacle to achieving precision treatment for HCC is the high intratumor heterogeneity and the inaccessibility of tumor tissues [11]. In this setting, ctDNA has been proposed to serve as an alternative source of tumor DNA and can provide comprehensive and dynamic insight into the tumor genome in a minimally invasive manner, which may be a way to address the dilemma of current HCC biopsy [13]. Indeed, evidence has demonstrated that HCC patients benefit from corresponding targeted therapies based on the mutations detected in ctDNA [37]. Here, we provide new insights into the evolutionary pattern of ctDNA in HCC: ctDNA can accurately characterize tumor genomic alterations when MVI occurs but has no clear genetic correlation with primary tumors without vascular invasion. From a pathophysiological perspective, this can be explained by more cancer cells entering the circulation after the tumor breaks through the blood vessels, but the detailed molecular mechanisms need further investigation. This new concept suggests that ctDNA may have greater application prospects in advanced HCC and can effectively address the issue of tissue unavailability at the advanced stage of HCC, helping guide treatment decisions and the dynamic monitoring of therapeutic efficacy. However, since the mutation frequency of single genes in ctDNA is relatively low, a ctDNA panel that integrates mutational features of multiple genes to reflect the metastatic potential of HCC more comprehensively and accurately is warranted in the future.

By identifying SMGs and comparing dynamic changes in mutation landscapes between tumors and PVTT, we identified a series of genes and their specific mutations that may be involved in the invasion and relapse of HCC. Dissection of the subclonal architecture has demonstrated that HCC vascular invasion is associated with a lack of the BRD7 Y455X mutation, enrichment of the SNX6 G6C, SLC14A2 E526K, and KIF18A K57N mutations, and acquisition of the HS3ST2 S47T, SCN9A W349X, and NEB V196I mutations. The role of some genes, such as SNX6, KIF18A, and HS3ST2, in promoting tumor invasion and metastasis has been reported [38,39,40]. Therefore, therapies targeting these genes or specific mutations may effectively prevent the postoperative relapse of HCC. We also found that some mutations can be detected in ctDNA. As mentioned above, whether ctDNA can be used as a new tool to obtain important molecular information in metastatic HCC deserves further investigation. On the other hand, we have to point out that the patients in this study mainly had HBV-associated HCC, so whether the aforementioned evolutionary trajectories regarding HCC metastasis and ctDNA, as well as the gene mutations related to HCC metastasis, are also present in non-HBV-associated HCC needs to be further studied. Another issue is that our sample size is relatively small, so the findings here have yet to be validated in a large cohort.

Finally, by integrating MVI-related SMGs with transcriptomic data from several independent HCC cohorts, we developed and validated an RNA-based signature that robustly distinguishes HCC patients at high and low risk of relapse with good predictive performance. Our multivariate Cox regression analyses revealed that the RGSHCC score was an independent prognostic factor in HCC, which may be attributed to its use as a gene-based predictive system that can additionally reflect information on molecular features associated with metastasis and recurrence within the tumor. In addition, we revealed the biological features underlying the signature, which may explain the poor prognosis in the high-RGSHCC group. The association of the RGSHCC score with chemotherapeutic and immunotherapeutic sensitivity was also highlighted. Nevertheless, although we have validated the feasibility of applying RGSHCC for relapse prediction in HCC patients from our institution, prospective multicenter and large-scale cohort studies are still required to thoroughly assess its application potential.

In summary, we characterized the genomic alterations during HCC vascular invasion and demonstrated a previously undescribed evolution pattern of ctDNA in HCC, which provides an important theoretical basis for a deeper understanding of HCC progression. We also developed a novel integrated multiomics-based molecular platform to identify populations with a high risk of relapse, which has significant implications for personalized treatment to prolong HCC patient survival.