Introduction

Despite considerable advances in the clinical management of hepatocellular carcinoma (HCC) over the past decades, HCC remains the fourth leading cause of cancer mortality worldwide, mainly because of the low rate of early detection and the limited treatment options for patients with advanced HCC [1, 2]. From a clinical perspective, enhanced early screening for HCC, particularly among high-risk patients with chronic hepatitis or cirrhosis, is vital for improving the overall prognosis of HCC [2]. Meanwhile, timely and effective postoperative surveillance for tumor recurrence is also of great importance for patients with HCC [3].

Currently, the combination of ultrasonography and serum α-fetoprotein (AFP) is a widely available, affordable and noninvasive strategy for HCC surveillance [2]. Nevertheless, its sensitivity and specificity for detecting early-stage HCC are insufficient [4]. In light of this, advanced diagnostic imaging techniques with higher accuracy and reliability, such as contrast-enhanced computed tomography (CT) and magnetic resonance imaging (MRI), have been proposed as necessary supplements for HCC surveillance among high-risk patients; however, their cost, limited availability and the risks associated with radiation and contrast agents should not be ignored [2]. In addition, the inherent risks of tissue biopsy limit its application in early-stage HCC screening [2]. Thus, the optimal and most reliable approach for the early detection of HCC is still evolving, and its benefits must be carefully weighed against health costs, potential physical harms and overdiagnosis.

MicroRNAs (miRNAs), a class of small noncoding RNAs, are involved in various biological processes, mainly through posttranscriptional regulation of gene expression [5]. Associations between miRNA dysregulation and cancer initiation or progression have been widely reported [5]. Compared with tissue biopsy, liquid biopsy offers several advantages, including abundant sample sources (particularly blood), noninvasive sampling, and the ability to monitor disease dynamically through repeated sampling [3]. Notably, alterations in tumor-specific serum miRNA profiles that reflect early-phase cancer formation (including HCC) have been widely reported in recent years [6,7,8,9]. Because circulating miRNAs are stable in body fluids, they have become one of the most promising and fastest-growing classes of noninvasive biomarkers for cancer screening in the field of liquid biopsy [6, 7, 10, 11]. However, previous studies of HCC-related circulating miRNAs have generally been limited by small sample sizes, a lack of adequate external validation or flawed study designs [12,13,14,15]. Thus, more reliable circulating miRNA signatures associated with HCC tumorigenesis remain to be identified, and accurate circulating miRNA-based diagnostic models are urgently needed.

In this study, using multiple large-scale datasets from public databases and a rigorous study design, we aimed to identify more reliable HCC-related miRNAs that are dysregulated in both serum and tumor tissue. Furthermore, we established and comprehensively validated a novel circulating miRNA-based diagnostic score for early-stage HCC screening.

Materials and Methods

Data Preparation

By systematically searching the Gene Expression Omnibus (GEO) database, we obtained four datasets (GSE113740, GSE112264, GSE113486 and GSE106817) containing circulating miRNA profiles of patients with HCC. From the GSE113740 dataset, 1453 serum samples were included in this study: 345 from HCC patients, 139 from patients with chronic liver disease (CLD) (liver cirrhosis (LC), n = 93; chronic hepatitis (CH), n = 46) and 969 from healthy individuals (non-cancer controls, NC). The GSE113486 dataset provided 972 serum samples: 40 from HCC patients, 832 from patients with other tumors (OT) and 100 from NC. The GSE112264 dataset consisted of 1350 serum samples: 50 from HCC patients, 1259 from OT patients and 41 from NC. Serum samples from 81 HCC patients and 2759 NC in the GSE106817 dataset were also included. According to the original articles describing the four datasets, three internal control miRNAs (miR-149-3p, miR-2861 and miR-4463) were used to normalize the raw miRNA signals [15,16,17,18]. Quantile normalization of the datasets was then performed in the present study. In addition, normalized miRNA-seq data and relatively complete clinical information for 250 HCC tissues and 49 non-cancer tissues (NT) were downloaded from The Cancer Genome Atlas (TCGA) database via the UCSC Xena platform (https://xenabrowser.net/). All patients with HCC enrolled in this study were pathologically diagnosed.
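Quantile normalization forces every sample to share the same distribution of signal intensities. The minimal pure-Python sketch below illustrates the idea, assuming a miRNA × sample matrix stored as a list of rows and applied after the internal-control normalization described above. It is not the authors' code: the software they used and its tie handling are not stated, and this sketch simply ranks tied values by order of appearance rather than averaging them.

```python
def quantile_normalize(matrix):
    """Quantile-normalize a features x samples matrix given as a list of rows.

    Every sample (column) is mapped onto one reference distribution: the
    mean, across samples, of the k-th smallest value. Tied values are
    ranked by order of appearance (a simplification; some implementations
    average tied ranks instead).
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    columns = [[matrix[i][j] for i in range(n_rows)] for j in range(n_cols)]

    # Reference distribution: mean of the k-th smallest value in each sample.
    sorted_cols = [sorted(col) for col in columns]
    reference = [sum(col[k] for col in sorted_cols) / n_cols for k in range(n_rows)]

    normalized = [[0.0] * n_cols for _ in range(n_rows)]
    for j, col in enumerate(columns):
        # Replace each value by the reference value at its within-sample rank.
        order = sorted(range(n_rows), key=lambda i: col[i])
        for rank, i in enumerate(order):
            normalized[i][j] = reference[rank]
    return normalized


# Toy example: 4 hypothetical miRNAs (rows) measured in 3 samples (columns).
toy = [[5, 4, 3],
       [2, 1, 4],
       [3, 4, 6],
       [4, 2, 8]]
for row in quantile_normalize(toy):
    print([round(v, 3) for v in row])
```

After normalization every column contains exactly the same set of values, so between-sample differences in the overall signal distribution are removed while the within-sample ranking of miRNAs is preserved.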

Screening of HCC-Related Circulating miRNAs

To screen for HCC-related circulating miRNAs, we first identified miRNAs differentially expressed in serum samples from HCC patients compared with those from both CLD patients and NC. We then identified miRNAs differentially expressed in HCC tissues compared with NT. Finally, the two sets of differentially expressed miRNAs were intersected to identify candidate miRNAs dysregulated in both the serum and tumors of HCC patients. Differential expression analysis was performed using the R package “limma”, with a fold change (FC) > 2 in either direction (|log2 FC| > 1) and an adjusted P-value < 0.05 as the threshold.
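The screening logic (a fold-change and adjusted-P filter applied to each comparison, followed by an intersection across the serum and tissue comparisons) can be sketched in plain Python. This is an illustration under stated assumptions: limma's moderated t-statistics are not reimplemented, so the sketch takes log2 fold changes and raw P-values as given; the adjustment method is assumed to be Benjamini–Hochberg (the default of limma's `topTable`); and keeping only miRNAs dysregulated in the same direction in every comparison is our assumption, since the text does not state how direction was handled. All miRNA names and values are hypothetical.

```python
import math


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in input order."""
    m = len(pvalues)
    # Walk from the largest P-value down, enforcing monotonicity with a running minimum.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank in ascending order of P
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_degs(results, fc_cutoff=2.0, alpha=0.05):
    """Return {miRNA: 'up' | 'down'} for FC > fc_cutoff in either direction and adj. P < alpha.

    `results` maps miRNA name -> (log2 fold change, raw P-value).
    """
    names = list(results)
    adjusted = bh_adjust([results[name][1] for name in names])
    log2_cutoff = math.log2(fc_cutoff)
    calls = {}
    for name, q in zip(names, adjusted):
        lfc = results[name][0]
        if q < alpha and abs(lfc) > log2_cutoff:
            calls[name] = "up" if lfc > 0 else "down"
    return calls


def consistent_overlap(*call_sets):
    """miRNAs called in every comparison, keeping only those with one shared direction."""
    common = set(call_sets[0]).intersection(*call_sets[1:])
    return {name: call_sets[0][name] for name in sorted(common)
            if all(calls[name] == call_sets[0][name] for calls in call_sets)}


# Hypothetical inputs: (log2 FC, raw P) per miRNA for each comparison.
serum_hcc_vs_nc = {"miR-A": (1.5, 1e-6), "miR-B": (-2.0, 1e-5),
                   "miR-C": (0.5, 1e-8), "miR-D": (3.0, 0.2)}
serum_hcc_vs_cld = {"miR-A": (1.2, 1e-4), "miR-B": (-1.1, 1e-3), "miR-C": (2.0, 1e-4)}
tissue_hcc_vs_nt = {"miR-A": (2.5, 1e-10), "miR-B": (1.4, 1e-6), "miR-C": (1.8, 1e-3)}

key_mirnas = consistent_overlap(call_degs(serum_hcc_vs_nc),
                                call_degs(serum_hcc_vs_cld),
                                call_degs(tissue_hcc_vs_nt))
print(key_mirnas)
```

In this toy example, "miR-B" passes every filter but is downregulated in serum and upregulated in tissue, so the direction check drops it; without that assumption it would be kept.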

Circulating miRNAs Selection and Model Construction

To construct a circulating miRNA-based diagnostic score in the GSE113740 dataset, LASSO-penalized logistic regression analysis was first performed with the R package “glmnet” to select the optimal circulating miRNA set, using tenfold cross-validation and lambda.min. The Z-score of each selected miRNA was used as its relative expression level during model construction. The selected circulating miRNAs were then entered into a multivariate logistic regression model to estimate their coefficients (β values). The diagnostic score was calculated as follows: diagnostic score = (β1 × Z-score of miRNA1) + (β2 × Z-score of miRNA2) + … + (βn × Z-score of miRNAn).
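For intuition about how LASSO selects miRNAs, the sketch below fits an L1-penalized logistic regression by proximal gradient descent and picks the penalty with the lowest cross-validated deviance (the analogue of glmnet's `lambda.min`). It is a deliberately simple, swapped-in solver, not glmnet's coordinate-descent algorithm; the learning rate, iteration count, fold assignment and toy data are illustrative assumptions, and inputs are assumed to be already standardized (Z-scores).

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(x, t):
    """Proximal operator of t*|x|; this is what sets coefficients exactly to zero."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def lasso_logistic(X, y, lam, lr=0.1, n_iter=2000):
    """Minimize mean log-loss + lam * sum(|beta_j|) by proximal gradient descent.

    X: list of standardized feature rows; y: list of 0/1 labels.
    The intercept is left unpenalized.
    """
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        g0, grad = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            resid = sigmoid(b0 + sum(b * x for b, x in zip(beta, xi))) - yi
            g0 += resid / n
            for j in range(p):
                grad[j] += resid * xi[j] / n
        b0 -= lr * g0
        beta = [soft_threshold(beta[j] - lr * grad[j], lr * lam) for j in range(p)]
    return b0, beta


def deviance(b0, beta, X, y, eps=1e-12):
    """Binomial deviance (-2 x log-likelihood) of a fitted model on (X, y)."""
    total = 0.0
    for xi, yi in zip(X, y):
        prob = min(max(sigmoid(b0 + sum(b * x for b, x in zip(beta, xi))), eps), 1 - eps)
        total += yi * math.log(prob) + (1 - yi) * math.log(1 - prob)
    return -2.0 * total


def cv_lambda_min(X, y, lambdas, k=10, seed=0, **fit_kwargs):
    """Return the lambda with the lowest k-fold cross-validated deviance."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    cv_dev = {}
    for lam in lambdas:
        total = 0.0
        for fold in folds:
            held_out = set(fold)
            train = [i for i in idx if i not in held_out]
            b0, beta = lasso_logistic([X[i] for i in train], [y[i] for i in train],
                                      lam, **fit_kwargs)
            total += deviance(b0, beta, [X[i] for i in fold], [y[i] for i in fold])
        cv_dev[lam] = total
    return min(cv_dev, key=cv_dev.get)


# Toy data: feature 0 carries signal, feature 1 is noise (values are made up).
X = [[-2.0, 0.3], [-1.0, -0.5], [-0.5, 0.8], [0.5, -0.2], [1.0, 0.4], [2.0, -0.7]]
y = [0, 0, 1, 0, 1, 1]
best_lam = cv_lambda_min(X, y, lambdas=[0.3, 0.1, 0.01], k=3, n_iter=300)
b0, beta = lasso_logistic(X, y, best_lam)
print(best_lam, beta)
```

The miRNAs with non-zero coefficients at the selected penalty would then be refitted without a penalty (for example, `lasso_logistic(X_selected, y, lam=0.0)`, where `X_selected` is a hypothetical matrix of the retained features) to obtain β values, mirroring the two-step procedure described above.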

Validation and Visualization of the Model

Receiver operating characteristic (ROC) curves were used to evaluate the diagnostic value of the circulating miRNA-based model for HCC. For visualization of the diagnostic model, a nomogram was built using the R package “rms” and assessed with a calibration plot and the concordance index (C-index). In addition, three independent datasets (GSE112264, GSE113486 and GSE106817) were used to externally validate the discriminative ability of the model for HCC.
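The area under the ROC curve (AUC) has a simple interpretation: it is the probability that a randomly chosen HCC patient receives a higher score than a randomly chosen control, with ties counted as one half. For a binary outcome this is also what the C-index measures. The minimal sketch below, using made-up scores, computes the AUC both from the empirical ROC curve (trapezoidal rule) and from this pairwise definition; it is an illustration, not the software used in the study.

```python
def roc_curve(scores, labels):
    """Empirical ROC points (FPR, TPR), lowering the threshold one distinct score at a time."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Tied scores cross the threshold together, giving a diagonal segment.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_auc(points):
    """Area under a piecewise-linear curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


def pairwise_auc(scores, labels):
    """P(case score > control score) + 0.5 * P(tie); equals the C-index for a binary outcome."""
    cases = [s for s, l in zip(scores, labels) if l == 1]
    controls = [s for s, l in zip(scores, labels) if l == 0]
    concordant = sum(1.0 if c > d else 0.5 if c == d else 0.0
                     for c in cases for d in controls)
    return concordant / (len(cases) * len(controls))


# Made-up diagnostic scores: label 1 = HCC, 0 = control.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
print(trapezoid_auc(roc_curve(scores, labels)), pairwise_auc(scores, labels))
```

Both functions agree (here 8 of 9 case-control pairs are correctly ordered), which is why the C-index of a nomogram built on a single binary-outcome logistic model coincides with the AUC of its score.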

Statistical Analysis

Continuous variables were compared between groups using the Mann–Whitney test or one-way analysis of variance. Logistic regression analysis was used to identify factors associated with HCC. Optimal cutoff values of miRNA levels for survival analysis were determined with X-tile software (v3.6.1), and their prognostic value for overall survival and recurrence-free survival was assessed by Kaplan–Meier analysis and the log-rank test. GraphPad Prism 8.0.2, SPSS 22.0 and R 3.4.2 were used for statistical analysis and graphing. A P-value < 0.05 was considered statistically significant.
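The survival comparisons described above (Kaplan–Meier estimation and the log-rank test for two groups split at a cutoff) can be sketched in plain Python as follows. This is a textbook implementation for illustration only; it does not reproduce X-tile's cutoff search, and the follow-up times below are made up.

```python
import math


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as [(t, S(t))] at each distinct event time (events: 1 = event, 0 = censored)."""
    data = sorted(zip(times, events))
    at_risk, surv, curve = len(data), 1.0, []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # events and censorings at t both leave the risk set
    return curve


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, P-value with 1 df)."""
    pooled = ([(t, e, 0) for t, e in zip(times_a, events_a)]
              + [(t, e, 1) for t, e in zip(times_b, events_b)])
    observed_minus_expected, variance = 0.0, 0.0
    for t in sorted({t for t, e, _ in pooled if e}):
        n_a = sum(1 for tt, _, g in pooled if g == 0 and tt >= t)
        n = sum(1 for tt, _, _ in pooled if tt >= t)
        d_a = sum(1 for tt, e, g in pooled if g == 0 and tt == t and e)
        d = sum(1 for tt, e, _ in pooled if tt == t and e)
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = observed_minus_expected ** 2 / variance
    # Upper tail of a chi-square with 1 df: P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Made-up follow-up (months) for patients above vs below a marker cutoff.
high_t, high_e = [1, 2, 3, 4, 5], [1, 1, 1, 1, 1]
low_t, low_e = [6, 7, 8, 9, 10], [1, 1, 1, 1, 1]
print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
print(logrank_test(high_t, high_e, low_t, low_e))
```

Note that choosing the cutoff by searching over many candidate values, as X-tile does, tends to make the subsequent log-rank P-value optimistic; the sketch takes the grouping as fixed.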

Results

Screening of Key HCC-Related miRNAs

The overall design of this study is summarized in a flowchart (Fig. 1). Based on the stringent screening criteria described above, 155 circulating miRNAs (53 upregulated and 102 downregulated) that could distinguish HCC patients from both CLD patients and NC were identified. Next, 136 miRNAs (33 upregulated and 103 downregulated) dysregulated in HCC tissues compared with NT were identified. Overlap analysis ultimately yielded six key miRNAs (miR-184, miR-221-3p, miR-532-5p, let-7b-3p, miR-26b-3p and miR-5589-5p) dysregulated in both the serum and tumors of HCC patients (Fig. 2). The clinical implications of the dysregulated miRNAs in HCC tissues were preliminarily explored and are shown in Supplementary Table 1 and Supplementary Fig. 1.

Fig. 1

Flowchart describing the overall design of this study. CH chronic hepatitis, CLD chronic liver diseases, FC fold change, HCC hepatocellular carcinoma, NC non-cancer control (healthy individual), NT non-cancer tissue, LC liver cirrhosis, OT other tumors, ROC receiver operating characteristic

Fig. 2

Screening of HCC-related circulating miRNAs. (a) Identification of key miRNAs dysregulated in serum and tissues of HCC patients by differential expression analysis and overlapping analyses. (b-c) Expression levels of key miRNAs in serum (GSE113740) and liver tissues (TCGA) of different populations. CH chronic hepatitis, CLD chronic liver diseases, Down downregulation, HCC hepatocellular carcinoma, NC non-cancer control (healthy individual), NT non-cancer tissue, LC liver cirrhosis, Up upregulation

Construction of Circulating miRNA-Based Diagnostic Score

LASSO-penalized logistic regression analysis in the GSE113740 dataset indicated that the optimal model retained all six circulating miRNAs (Fig. 3). Univariate logistic regression analysis confirmed that the level of each of the six circulating miRNAs was significantly associated with HCC (all P < 0.05). After their coefficients were estimated by multivariate logistic regression analysis, the circulating miRNA-based diagnostic score was constructed as follows: Diagnostic score = (0.483 × Z-score of miR-184) + (0.338 × Z-score of miR-532-5p) + (4.437 × Z-score of miR-221-3p) − (0.276 × Z-score of miR-5589-5p) − (0.161 × Z-score of let-7b-3p) − (0.421 × Z-score of miR-26b-3p) (Table 1).
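Applying the reported coefficients amounts to standardizing each miRNA and taking a weighted sum. The sketch below hard-codes the β values listed above; the expression values are hypothetical, and standardizing within the supplied samples using the sample standard deviation is an assumption, since the text does not specify which reference mean and SD (or which SD convention) were used to compute the Z-scores.

```python
from statistics import mean, stdev

# Coefficients (beta values) of the six-miRNA diagnostic score, as reported in the text.
COEFFICIENTS = {
    "miR-184": 0.483,
    "miR-532-5p": 0.338,
    "miR-221-3p": 4.437,
    "miR-5589-5p": -0.276,
    "let-7b-3p": -0.161,
    "miR-26b-3p": -0.421,
}


def z_scores(values):
    """Standardize one miRNA across samples (sample SD; an assumed convention)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def diagnostic_scores(expression):
    """expression: {miRNA: [normalized serum level per sample]} -> one score per sample."""
    z = {name: z_scores(expression[name]) for name in COEFFICIENTS}
    n_samples = len(expression["miR-184"])
    return [sum(beta * z[name][i] for name, beta in COEFFICIENTS.items())
            for i in range(n_samples)]


# Hypothetical levels for three samples; every miRNA gets the same values here,
# so each score is the sum of the coefficients times that sample's Z-score.
toy_expression = {name: [1.0, 2.0, 3.0] for name in COEFFICIENTS}
print(diagnostic_scores(toy_expression))
```

A score computed this way omits the logistic model's intercept, which is not reported, so it ranks samples but is not itself a probability of HCC.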

Fig. 3

Selection of the optimal circulating miRNAs for modeling by LASSO-penalized logistic regression analysis. (a) The optimal circulating miRNA set was chosen by tenfold cross-validation and lambda.min. (b) LASSO coefficient profiles of the circulating miRNAs

Table 1 Logistic regression analyses of the associations between the levels of the six circulating miRNAs and HCC status

Internal Validation and Visualization of the Model

As shown in Fig. 4A, the diagnostic scores of HCC patients at each clinical stage (BCLC staging system, I-IV) were significantly higher than those of non-HCC individuals (CLD patients and NC). ROC analyses indicated excellent discriminative ability of the model for HCC (HCC vs non-HCC: AUC = 0.9535; HCC vs NC: AUC = 0.9559; HCC vs CH: AUC = 0.8963; HCC vs LC: AUC = 0.9569; all P < 0.0001), exceeding that of each of the six individual circulating miRNAs (Fig. 4B-C, Supplementary Fig. 2). We further evaluated the diagnostic value of the model for HCC at different clinical stages. Notably, the model showed high discriminative ability for both early-stage and advanced HCC (HCC-stage I vs non-HCC: AUC = 0.9490; HCC-stage II vs non-HCC: AUC = 0.9508; HCC-stage III vs non-HCC: AUC = 0.9547; HCC-stage IV vs non-HCC: AUC = 0.9315; all P < 0.0001) (Fig. 4D). In contrast, the diagnostic value of serum AFP for HCC, particularly early-stage HCC, was limited (HCC vs non-HCC: AUC = 0.6840, P < 0.0001; HCC-stage I vs non-HCC: AUC = 0.5702, P = 0.0583; HCC-stage II vs non-HCC: AUC = 0.6810, P < 0.0001; HCC-stage III vs non-HCC: AUC = 0.7968, P < 0.0001; HCC-stage IV vs non-HCC: AUC = 0.8895, P < 0.0001) (Fig. 5).

Fig. 4

Internal validation of the diagnostic ability of the six-circulating miRNA-based diagnostic score for HCC in the GSE113740 dataset. (a) Comparison of the diagnostic score among different populations. (b-c) ROC analyses evaluating the accuracy of the model in distinguishing HCC patients from non-HCC individuals or specific populations (healthy individuals (NC) and patients with chronic hepatitis (CH) or liver cirrhosis (LC)). (d) ROC analyses evaluating the diagnostic value of the model for HCC at different clinical stages (BCLC staging system)

Fig. 5

Evaluating the diagnostic value of serum AFP levels for HCC in the GSE113740 dataset. (a) Comparison of serum AFP levels among different populations. (b) ROC analysis evaluating the accuracy of serum AFP in distinguishing HCC patients from non-HCC individuals. (c) ROC analyses evaluating the diagnostic value of serum AFP for HCC at different clinical stages (BCLC staging system). AFP α-fetoprotein, CH chronic hepatitis, HCC hepatocellular carcinoma, LC liver cirrhosis

To enhance the clinical applicability of the model, a nomogram was developed to conveniently estimate the probability of HCC from the diagnostic score (Fig. 6A). The C-index of the nomogram was equal to the AUC reported above (0.9535), as expected for a model with a binary outcome. The calibration plot showed good agreement between the nomogram-predicted and observed probabilities of HCC, indicating that the nomogram was well calibrated for HCC diagnosis (Fig. 6B).
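A nomogram for a single predictor is a graphical rendering of a logistic model that converts the diagnostic score into a probability, and a calibration plot compares those predicted probabilities with observed HCC frequencies. The sketch below shows both steps in their simplest form. The intercept and slope are hypothetical placeholders (the fitted values are not reported), and the grouped calibration table is a simple alternative to, not a reimplementation of, the resampling-based `calibrate` function in the R package rms.

```python
import math


def predicted_probability(score, intercept, slope=1.0):
    """Logistic link: P(HCC) = 1 / (1 + exp(-(intercept + slope * score))).

    `intercept` and `slope` are hypothetical; they would come from a logistic
    model fitted with the diagnostic score as the only predictor.
    """
    z = intercept + slope * score
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def calibration_table(probs, labels, n_groups=5):
    """Sort by predicted risk, split into equal-size groups, and return
    (mean predicted probability, observed HCC fraction, group size) per group."""
    pairs = sorted(zip(probs, labels))
    n = len(pairs)
    table = []
    for g in range(n_groups):
        chunk = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if chunk:
            mean_pred = sum(p for p, _ in chunk) / len(chunk)
            observed = sum(lab for _, lab in chunk) / len(chunk)
            table.append((mean_pred, observed, len(chunk)))
    return table


# Made-up predicted probabilities and outcomes (1 = HCC).
probs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 0, 1, 1]
for row in calibration_table(probs, labels, n_groups=2):
    print(row)
```

Groups whose mean predicted probability lies close to the observed fraction (points near the diagonal of a calibration plot) indicate good calibration.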

Fig. 6

Establishment and validation of a visual nomogram. (a) The nomogram based on the circulating miRNA-based diagnostic score for HCC screening. (b) The calibration curve of the nomogram

External Validation of the Model

To further assess the diagnostic power of the model, three independent datasets were used for external validation. As shown in Fig. 7, consistent with the results described above, ROC analyses showed that the diagnostic score not only effectively distinguished HCC patients from non-cancer controls but also discriminated well between HCC patients and OT patients (GSE113486: HCC vs NC, AUC = 0.9780; HCC vs OT, AUC = 0.8602; GSE112264: HCC vs NC, AUC = 0.9961; HCC vs OT, AUC = 0.8324; GSE106817: HCC vs NC, AUC = 0.9681; all P < 0.0001).

Fig. 7

External validation of the six-circulating miRNA-based diagnostic score. The diagnostic scores of different populations and ROC analyses evaluating the accuracy of the model in distinguishing HCC patients from non-cancer controls (NC) or patients with other tumors (OT) in three independent cohorts: GSE113486 (a), GSE112264 (b) and GSE106817 (c)

Discussion

Circulating miRNAs in serum can remain fairly stable through exosomal encapsulation or binding to proteins, thereby escaping RNase-mediated degradation and tolerating harsh conditions (such as variations in pH or temperature) [19]. Tumor-specific circulating miRNAs are expected to become some of the most promising and competitive noninvasive biomarkers in clinical practice, with utility ranging from early cancer screening to postoperative recurrence monitoring [10]. To date, many HCC-related individual circulating miRNAs and circulating miRNA panels have been reported, such as miR-122, miR-21, miR-92-3p, miR-3126-5p, miR-107, miR-320b and miR-6724-5p [12, 15, 20, 21]. However, highly reliable individual circulating miRNAs or miRNA panels for early-stage HCC screening remain to be identified, for the following reasons. First, the sample sizes in most previous studies were relatively small, so the results are not sufficiently reliable [14, 22, 23]. Second, the overall design used to explore HCC-related circulating miRNAs was potentially biased in some studies. For example, some studies merely sought to confirm, in the serum of HCC patients, the dysregulation of miRNAs previously reported to be dysregulated in HCC tissues or in the serum of patients with other tumors [24,25,26,27]. Other studies focused only on circulating miRNA profiles without considering miRNA profiles in HCC tissues, and may therefore have failed to identify circulating miRNA changes that truly originate from pathological changes in tumors [15, 28,29,30]. Third, the generally limited accuracy of single circulating miRNAs indicates the need for diagnostic models based on multiple circulating miRNAs to improve screening performance [24, 28]. Finally, previous studies were usually conducted in single-center cohorts, so the existing circulating miRNAs and diagnostic models generally lacked robust external validation in sufficient independent cohorts [12, 15, 29].

Given these considerations, we included data from five large-scale, reliable public datasets with complete miRNA profiles of serum or tumor tissues from HCC patients. Using rigorous and comprehensive screening strategies, we systematically identified six key miRNAs (miR-184, miR-5589-5p, miR-532-5p, let-7b-3p, miR-221-3p and miR-26b-3p) dysregulated in both the serum and tumor tissues of HCC patients, all of which significantly discriminated between HCC patients and controls (healthy individuals and high-risk patients). Moreover, to improve the accuracy of these circulating miRNAs for HCC screening, we developed a novel six-circulating miRNA-based diagnostic score using appropriate statistical methods. Compared with the AFP test, our model showed markedly better diagnostic performance for HCC, particularly for early-stage HCC. In addition, a nomogram based on the diagnostic score was established to strengthen its clinical applicability. Most importantly, external validation was carried out in three independent datasets with large sample sizes. These consistent results indicate that the six-circulating miRNA-based diagnostic score may be a reliable model for noninvasive HCC screening.

Consistent with our findings, dysregulation of the six miRNAs in HCC tissues and their biological roles or clinicopathological implications have been reported to varying degrees in previous studies (Supplementary Fig. 3) [31,32,33,34,35,36]. The close involvement of these miRNAs in HCC initiation and progression suggests that the corresponding changes in circulating miRNA profiles may promptly reflect the pathophysiological changes of HCC. Among the six circulating miRNAs, serum miR-221-3p showed the highest diagnostic accuracy for HCC. Given the consistent findings of previous studies, upregulation of circulating miR-221-3p may be a reliable indicator of HCC [37, 38]. By contrast, dysregulation of the other five miRNAs in the serum of HCC patients has rarely been reported. In addition, relative to their circulating levels in HCC, the consistent or opposite serum levels of several of these miRNAs (including miR-221-3p, miR-532-5p, miR-26-3p, let-7b and miR-184) reported in patients with other cancers are worth noting [39,40,41,42,43,44,45,46]. This suggests that our circulating miRNA panel may capture both shared cancer hallmarks and tumor heterogeneity, which may explain its ability to distinguish HCC patients from both non-cancer populations and OT patients. Of note, changes in certain circulating miRNA levels (such as miR-221 and miR-532) after radical cancer therapy have been observed, suggesting that our model may also be useful for postoperative monitoring [39, 47,48,49,50]. More importantly, the mechanisms underlying the altered expression of these circulating miRNAs in HCC, in terms of their specific sources (HCC cells or non-HCC cells in the tumor microenvironment) and biological roles (cell-to-cell communication), should be further elucidated [10, 51].

This study has several limitations. First, detailed clinicopathological characteristics (such as tumor size, tumor number and histological grade) and survival data of HCC patients were not available in the four circulating miRNA datasets. Thus, the specific clinical implications and prognostic value of the six circulating miRNAs could not be fully analyzed. Second, although the accuracy of the diagnostic score for HCC was validated in this study, its ability to detect early-stage HCC should be further evaluated in large prospective studies against stronger references, such as CT, MRI and other serological biomarkers (including protein induced by vitamin K absence or antagonist-II, PIVKA-II). Finally, the potential benefit of combining the circulating miRNA-based diagnostic model with other surveillance tests (such as ultrasonography) for early-stage HCC screening requires further study.

In conclusion, we identified a distinctive panel of HCC-related circulating miRNAs based on reliable large-scale datasets and a rigorous study design. In addition, we developed and validated a six-circulating miRNA-based diagnostic score and a corresponding nomogram for HCC detection. With the development of noninvasive liquid biopsy in oncology, this model may serve as a useful option in clinical practice for the early screening of HCC and the dynamic monitoring of postoperative recurrence, thereby helping to optimize the timing of radical interventions.