Introduction

Hepatocellular carcinoma (HCC) is the most common primary liver malignancy and the fourth most common cause of cancer mortality worldwide [1]. Most cases occur in patients with cirrhosis from any etiology, including hepatitis C virus (HCV), nonalcoholic steatohepatitis (NASH), and alcohol-related liver disease (ALD), or in patients with chronic hepatitis B virus (HBV) infection without cirrhosis. Notably, the incidence is increasing in patients with NASH without cirrhosis [2,3,4,5,6].

Despite improvements in management strategies for HBV and HCV, both the incidence and mortality of HCC have remained persistently elevated in recent years, and HCC is a leading cause of death in patients with compensated cirrhosis. This is in large part due to a lack of curative therapies for intermediate/advanced malignancy and comorbidities associated with liver disease [7]. Patients with advanced disease have a median survival of less than 2 years as compared to over 70% 5-year survival in those with early-stage HCC [8]. Several factors including underlying liver function, access to curative treatments, and functional status affect outcomes, but the stage at the time of diagnosis has been consistently shown to be highly associated with overall survival. Accordingly, appropriate surveillance for HCC is associated with early detection, curative treatment receipt, and improved overall survival [9,10,11]. Current guidelines recommend screening for HCC with liver ultrasound (US) and serum alpha-fetoprotein (AFP) in patients with cirrhosis due to any etiology, or certain populations of patients chronically infected with HBV [3, 9, 12••].

Limitations of Ultrasound

Up to 20% of US examinations have inadequate visualization for HCC due to patient factors, resulting in significant variance in the sensitivity of US for early-stage HCC detection. A recent meta-analysis showed that US sensitivity for early detection of HCC varied widely between studies, from 21 to 89% [13, 14]. The pooled sensitivity of US-based surveillance is 45%, and the inclusion of AFP increases the sensitivity to 63%. Patient-level factors, ultrasound technician experience, and radiologist expertise further contribute to the heterogeneity of ultrasound performance.

Patient-level factors that diminish ultrasound-based surveillance performance include central adiposity, ascites, and procedural discomfort. Fatty deposition, whether in the subcutaneous tissue or liver parenchyma, can attenuate the ultrasound beam and impair the visualization of target organs [15]. One retrospective study of 116 patients noted a sensitivity of only 21% in detecting lesions in patients with a body mass index (BMI) ≥ 30 kg/m², compared to 77% for those with a BMI < 30 kg/m² [16]. Metabolic liver disease is also independently associated with a higher likelihood of poor visualization on US. Ascites or increased parenchymal macronodularity from clinical deterioration or advanced Child-Pugh B or C cirrhosis, respectively, can also distort viewing angles, obscure smaller tumors, and compromise exam quality [13, 17,18,19]. Variability in the technique and experience of the performing technician can also limit the radiologist’s ability to evaluate an ultrasound study [20].

Indeterminate lesions are common in US-based surveillance and can lead to additional imaging, exposure to radiation, contrast-related injury, and invasive procedures including biopsy. Analysis of two retrospective studies found that 15–28% of patients enrolled in US-based surveillance received unnecessary cross-sectional imaging or liver biopsy; in another study, 27.5% of 680 patients experienced physical harms (defined as follow-up tests performed for false-positive or indeterminate results) over a 3-year period, with a higher proportion of US-related harm than AFP-related harm [21, 22]. Ultimately, false-positive results can cause psychological distress, decreased self-perception, and anxiety and lead to increased costs. These effects can linger and may leave patients ambivalent about the accuracy of further medical testing, deterring them from returning for future screening [23, 24].

In addition to existing clinical limitations, screening liver ultrasound suffers from poor adherence in clinical practice. One pooled meta-analysis of twenty-nine studies comprising nearly 120,000 patients found that only 24% of at-risk patients utilized US surveillance for HCC screening [25]. This percentage was lower in those affected by alcohol- or NASH-related cirrhosis, both of which are increasingly common etiologies of HCC. Interventions such as mailed outreach or reminders within the electronic health record show promise, but applicability and adherence remain issues [26, 27]. Patient-reported barriers include difficulties with scheduling, cost of testing, and uncertainty about where to get testing, whereas provider-related barriers include misconceptions about the limitations of US-based screening, limited time in clinic, and lack of up-to-date knowledge about surveillance guidelines [28•, 29,30,31,32].

Alternative Imaging Modalities and Their Limitations

Given the limitations of US-based screening, abdominal computed tomography (CT) and magnetic resonance imaging (MRI) have been studied as alternative screening modalities. CT scans have comparable if not marginally better sensitivity: one systematic analysis of 10 CT studies found a pooled sensitivity of 68% compared to 60% for US [33]. A randomized controlled trial comparing biannual US to annual CT in 163 patients from the Veterans Affairs population found only marginal improvement in detection characteristics for US (sensitivity and specificity were 71% and 98%, respectively, for US vs. 67% and 94%, respectively, for CT) [34]. Low-dose CT (LDCT) has also been proposed for patients considered at high risk for developing HCC. In a single-arm study from South Korea, 139 patients at high risk for developing HCC (as defined by an annual incidence of ≥5% on a risk index) underwent paired biannual US and LDCT 1–3 times [35]. LDCT had significantly higher sensitivity for both overall and very early-stage HCC detection than US (83% and 82% versus 29% and 18%, respectively) [36].

Meanwhile, a Cochrane Review of MRI use for diagnosing HCC of any size and stage found a sensitivity and specificity of 84% and 94%, respectively [37]. Recently, abbreviated MRI (AMRI), a shortened exam that uses limited abdominal sequences, has shown promising performance. A systematic review and meta-analysis of 15 studies consisting of over 2800 patients found that AMRI had an overall pooled sensitivity and specificity of 86% and 94%, respectively, for HCC detection, with a subgroup analysis based on lesion size noting a sensitivity of 69% and 86% for detecting tumors <2 cm and ≥2 cm in diameter, respectively [38]. The comparison of AFP plus AMRI versus US and AFP is currently under investigation in two separate ongoing randomized controlled trials (RCT): the PREMIUM Trial (NCT 05486572), involving 4700 patients from the United States Veterans Affairs Medical Centers, and the FASTRAK Trial (NCT 05095714), involving 944 patients from France.

Despite these advantages, prior analyses from the USA have found that neither CT nor MRI matches the cost-effectiveness ratio of combined AFP and US-based surveillance [39, 40]. The repetitive nature of surveillance also leads to significant cumulative radiation exposure via CT scans. Administration of contrast is also a relative contraindication in those with severely impaired renal function. In addition to costs, MRI is limited by availability, with many patients in rural or underserved areas lacking access to MRI.

Considering these limitations, and the limited sensitivity of AFP as a standalone biomarker, alternative biomarkers have been proposed for HCC screening. Several have been investigated, but none has yet been sufficiently validated to supplant ultrasound as the standard of care for screening. In this review, we will discuss the process of biomarker validation and highlight the steps necessary to move beyond ultrasound-based screening for HCC.

Biomarkers

A biomarker is defined as a characteristic that is objectively measured and evaluated as an indicator of normal biologic processes, pathogenic processes, or pharmacologic responses to a therapeutic intervention [41]. While any biomolecule such as DNA, RNA, or protein can possess clinical utility, cancer biomarkers are traditionally important molecules involved in specific molecular pathway disruptions or pathogenesis/proliferation of malignant cells [42].

Biomarkers can be categorized into three groups depending on their utility: predictive, prognostic, or diagnostic. Predictive biomarkers predict response to therapeutic interventions. Prognostic biomarkers estimate the likelihood of clinical outcomes such as recurrence. Finally, diagnostic biomarkers such as fecal immunochemical tests (FIT) or AFP can be used alone or in conjunction with other studies for surveillance testing [3].

Biomarker Validation

Like pharmaceuticals, biomarkers undergo a multistep process from discovery to validation and implementation (Table 1). Each step is defined by a unique set of objectives, requirements, and evaluation criteria that eventually form the blueprint for clinical utilization (Table 2).

Table 1 Phases of biomarker validation
Table 2 Designs for biomarker validation studies

Phase I

This phase focuses primarily on discovery. Researchers perform high-throughput preclinical exploratory studies that compare malignant and nonmalignant tissues to identify potential biomarkers. Techniques such as proteomic, metabolomic, or transcriptomic analysis; gene-expression profiling; and mass spectrometry are often utilized to identify promising proteins, genes, or marker antibodies. Markers that appear to be differentially expressed in diseased samples relative to healthy controls are advanced to the next stage. Importantly, organ tissue is often used for the discovery of a signal that can then be tested further via blood samples [43].

Phase II

Once potential candidates have been identified, researchers validate these markers by performing retrospective case-control studies. Receiver operating characteristic (ROC) curves and the area under the ROC curve (AUROC) are often utilized as part of this objective assessment. This phase also secondarily analyzes the temporal relationship of the candidate marker with the natural course of its associated malignancy; markers that find earlier-stage disease are inherently more promising than those that detect only late-stage tumor [43]. Case-control studies tend to overestimate a biomarker’s performance because of both spectrum bias and an inflated cancer incidence compared to population-based cohorts [7].
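As an illustration of the AUROC used in phase II assessment, the following minimal Python sketch estimates it as the probability that a randomly chosen case has a higher marker value than a randomly chosen control, with ties counted as one half. The function name and the marker values are hypothetical and are not drawn from any cited study.

```python
from typing import Sequence


def auroc(case_values: Sequence[float], control_values: Sequence[float]) -> float:
    """Estimate the AUROC as the fraction of case/control pairs in which the
    case has the higher marker value, counting ties as one half."""
    if not case_values or not control_values:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case in case_values:
        for control in control_values:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_values) * len(control_values))


# Hypothetical marker values in arbitrary units (illustration only).
hcc_cases = [35.0, 8.0, 120.0, 15.0, 60.0]
cirrhosis_controls = [4.0, 9.0, 6.0, 15.0, 3.0]

print(round(auroc(hcc_cases, cirrhosis_controls), 2))  # prints 0.9
```

An AUROC of 0.5 corresponds to no discrimination and 1.0 to perfect separation of cases from controls. The pairwise loop is quadratic in sample size, which is acceptable for a sketch but not for large cohorts.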

Phase III

During this phase, biomarkers are validated in larger population-based cohorts using a retrospective longitudinal repository design (also known as PRoBE [prospective-specimen collection, retrospective-blinded evaluation]) to determine the efficacy of the biomarker in detecting preclinical disease [7, 44]. Candidate marker performance can be compared against a gold standard, and if promising, can move on to the next phase of validation. Notably, this phase is also used to define thresholds for positive criteria and ensure that the biomarker’s clinical utility extends across all populations regardless of demographics, medical history, and other confounding variables [43].
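One way to define a positivity threshold, consistent with the fixed false positive rate (FPR) reporting used in several of the phase III studies discussed later, is to choose the lowest cutoff that keeps the FPR among controls at or below a target and then read off the sensitivity among cases. The Python sketch below assumes that higher marker values indicate disease; the function name and all values are hypothetical.

```python
from typing import Sequence, Tuple


def sensitivity_at_fpr(
    case_values: Sequence[float],
    control_values: Sequence[float],
    max_fpr: float = 0.10,
) -> Tuple[float, float]:
    """Return (cutoff, sensitivity) for the lowest observed cutoff whose
    false positive rate among controls does not exceed max_fpr.
    A value counts as positive when it is >= cutoff."""
    candidates = sorted(set(case_values) | set(control_values))
    for cutoff in candidates:
        false_pos = sum(v >= cutoff for v in control_values)
        if false_pos / len(control_values) <= max_fpr:
            true_pos = sum(v >= cutoff for v in case_values)
            return cutoff, true_pos / len(case_values)
    # No observed value satisfies the FPR limit: call nothing positive.
    return float("inf"), 0.0


# Hypothetical marker values (illustration only, not study data).
control_marker = [2.0, 3.0, 3.0, 4.0, 5.0, 5.0, 6.0, 7.0, 9.0, 22.0]
case_marker = [4.0, 8.0, 12.0, 25.0, 40.0, 6.0, 18.0, 150.0, 9.0, 11.0]

cutoff, sens = sensitivity_at_fpr(case_marker, control_marker, max_fpr=0.10)
print(cutoff, sens)  # prints 11.0 0.6
```

Because the FPR can only fall as the cutoff rises, the first cutoff that meets the limit is also the one with the highest sensitivity. In practice, a threshold chosen this way would need confirmation in an independent cohort.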

Phase IV

This phase utilizes prospective screening studies to assess the candidate marker’s ability to discriminate known disease versus controls. Real-world studies allow researchers to calculate both rates of detection and referral for further workup. The biomarker is also compared to an established gold standard via randomization or parallel design as part of the continuing effort to scrutinize its practicality and clinical utility [7, 45••].
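The detection and referral rates measured in prospective screening studies depend strongly on how common cancer is in the screened population. The Python sketch below works through that arithmetic under stated assumptions: the 63% sensitivity is the pooled US plus AFP figure cited earlier, whereas the 90% specificity and the 2% and 50% cancer proportions are illustrative placeholders rather than values from any cited study.

```python
def screening_outcomes(
    n_screened: int, incidence: float, sensitivity: float, specificity: float
) -> dict:
    """Expected counts when a test is applied to n_screened people, of whom
    a fraction `incidence` have cancer."""
    cancers = n_screened * incidence
    true_pos = cancers * sensitivity
    false_pos = (n_screened - cancers) * (1.0 - specificity)
    return {
        "true_positives": true_pos,
        "false_positives": false_pos,
        "referred_for_workup": true_pos + false_pos,
        "ppv": true_pos / (true_pos + false_pos),
    }


# Assumed surveillance cohort: 2% of those screened have HCC (illustrative).
cohort = screening_outcomes(1000, incidence=0.02, sensitivity=0.63, specificity=0.90)
# Assumed case-control sample: half of the participants are cases (illustrative).
case_control = screening_outcomes(1000, incidence=0.50, sensitivity=0.63, specificity=0.90)

print(f"cohort PPV: {cohort['ppv']:.2f}")
print(f"case-control PPV: {case_control['ppv']:.2f}")
```

With identical sensitivity and specificity, the positive predictive value is roughly 11% in the assumed cohort and roughly 86% in the assumed case-control sample, which illustrates why the enriched case mix of earlier-phase studies can flatter a test.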

Phase V

The final phase explores whether the use of the biomarker reduces the burden of cancer in routine clinical practice [7, 43]. This phase not only depends on biomarker performance but also depends on the ease of obtaining the test, result reporting, and treatment effectiveness of early-stage cancers. Data is often gathered from randomized controlled trials to evaluate how the test reduces overall mortality from cancer. Monitoring may also include an analysis of the cost-effectiveness and safety of the test [45••].

Biomarkers offer several advantages over traditional imaging-based screening, including ease of access with potential improvements in adherence, improvement in sensitivity and specificity, and potential improvement in cost-effectiveness. However, because of the novelty of many proposed markers, they lack higher levels of validating evidence and clear guidance on how to deal with incongruent findings such as a positive biomarker but negative imaging. Currently, AFP remains the only biomarker that has been validated beyond phase III in the USA; however, several promising biomarkers remain in the validation pipeline (Table 3).

Table 3 Commonly used and emerging biomarkers

Alpha-Fetoprotein (AFP)

AFP is the most widely used biomarker for HCC diagnosis and monitoring and is the only biomarker that has been validated for clinical practice. However, using AFP as a standalone marker for HCC screening has several notable limitations. AFP can be below the upper limit of normal in 40–60% of patients with early-stage HCC and can be elevated in the absence of HCC in conditions such as viral hepatitis and other neoplasms of the GI tract [63, 64]. A recent meta-analysis showed that adding AFP to abdominal US increases overall sensitivity from 45 to 63% compared with abdominal US alone [14]. At its traditional cutoff of 20 ng/mL, AFP sensitivity for detecting early-stage HCC ranges from 41 to 65%, whereas specificity ranges from 80 to 94% [46,47,48,49]. Several strategies have been employed to mitigate the limitations of AFP, including interpreting longitudinal trends in AFP rather than a single value to increase sensitivity [3, 50]. Changing demographics in the etiologies of cirrhosis can also affect optimal cutoffs. Prior phase II studies from the Early Detection Research Network (EDRN) have reported that AFP levels tend to be lower in patients with nonviral etiologies of disease [51]. A recent analysis of more than 133,000 patients from the National Cancer Database demonstrated a downtrend in median AFP levels at the time of HCC diagnosis, with the most notable decline among those with early-stage tumors, reflecting the shifting epidemiology toward nonviral etiologies of liver disease [4, 65]. While AFP has been incorporated into panels such as the GALAD score (gender, age, AFP-L3, AFP, DCP) and the HES (hepatocellular carcinoma early detection screening) algorithm, AFP alone is not an effective biomarker for HCC screening.

AFP-L3

AFP-L3 is a fucosylated isoform of AFP with a high affinity for the lectin Lens culinaris agglutinin (LCA). HCC produces AFP-L3 even in its early stages, and cells with increased expression of the glycoprotein are prone to undergo early vascular invasion and intrahepatic metastasis [52]. A meta-analysis of six studies involving nearly 2500 patients found that AFP-L3 has high specificity (92%) but low sensitivity (34%) for diagnosis of early HCC [53]. In a phase III validation cohort including 397 patients, AFP-L3 at a cutoff of 11.9% had sensitivities of 46% and 45% for early and any-stage HCC diagnosis, respectively; another phase III cohort of 534 patients assessed AFP-L3 at a cutoff of 8.3% and found a sensitivity of 40% at a false positive rate (FPR) of 10% [47, 54•, 66]. AFP-L3 performance as a standalone marker is not sufficient for HCC early detection; however, it has been integrated into multi-biomarker panels that show promising performance for HCC early detection.

Des-Gamma Carboxyprothrombin (DCP)

DCP, also known as protein induced by vitamin K absence II (PIVKA-II), is an abnormal prothrombin that is generated via defective carboxylation in malignant cells [49]. It serves as both an autologous growth factor and a promoter of vascular invasion [67]. It has been used extensively in Japan, where it has been integrated into standard surveillance and diagnosis guidelines [68,69,70]. In a phase II study of 131 patients with early HCC in the USA, DCP had an AUROC of 0.72, as compared to 0.8 and 0.66 for AFP and AFP-L3 respectively [47]. Prior single-center studies comparing the three markers suggested that DCP had the best or tied for the best performance characteristics for diagnosing HCC [55, 56]. However, in a phase III study, its sensitivity for HCC 1 year prior to diagnosis was poor (12.1%) compared to AFP ≥ 20 ng/mL (35%) or AFP-L3 (34.3%) [57]. Combining DCP with AFP and AFP-L3 has not been shown to substantially increase the AUROC and subsequent sensitivity for early HCC detection [58].

Liquid Biopsy

Liquid biopsy is a technique that utilizes a biofluid sample such as blood, cerebrospinal fluid, or plasma to detect and analyze markers to evaluate disease and prognosticate treatment outcomes. These markers include DNA methylation markers, cell-free or circulating tumor DNA, circulating tumor cells, and extracellular vesicles. Although liquid biopsy is a relatively novel technology, recent data on its role in the early detection of HCC have been promising [71].

DNA Methylation Markers and Cell-Free DNA

Epigenetic DNA methylation silences tumor suppressor genes, thus promoting tumorigenesis and cancer progression. Alterations in the methylation epigenome are sometimes the first neoplastic changes in early-stage HCC [72]. Three methylation markers were integrated into the multitarget HCC blood test (mtHBT) algorithm along with AFP and sex; this proprietary test, known as Oncoguard Liver, had an overall sensitivity, specificity, and AUROC of 82%, 87%, and 0.94, respectively, for early-stage HCC in a phase II validation study [59]. A head-to-head trial comparing mtHBT to US with and without AFP is ongoing (ALTUS trial; NCT 05064553).

Cell-free DNA (cfDNA) is composed of small fragments of nucleic acid that are not associated with cells or cell fragments; the tumor-derived fraction is known as circulating tumor DNA (ctDNA) [73, 74]. A study of one proprietary test for HCC detection, HelioLiver, reported a sensitivity of 76% and an AUROC of 0.94 in a comparison against AFP and the GALAD algorithm; this platform is also undergoing validation in an ongoing trial (NCT 05199259) [60]. Prospective use of ctDNA can also extend to monitoring, as mutation profiles of ctDNA in advanced HCC have been suggested to predict poor response to systemic therapies [75].

Extracellular Vesicles

Extracellular vesicles (EVs) are enclosed nanoparticles that are secreted by cells. They can contain biochemical cargo including genetic material, proteins, and microRNAs extruded by diseased and/or malignant cells, and thus are an attractive target for liquid biopsy. Presently, they have limited application because of factors such as poor reproducibility, lack of standardized EV isolation techniques, and suboptimal performance of candidate markers to date [76]. A recent preliminary study used immunoaffinity-based chips to purify and capture EVs in plasma samples from 158 patients belonging to two cohorts: newly diagnosed, treatment-naïve HCC and at-risk cirrhotic patients. The EV-based assay had an AUROC of 0.93 and a sensitivity of 94% for detection of Barcelona Clinic Liver Cancer (BCLC) stage 0/A HCC in at-risk cirrhotic patients, compared to an AUROC of 0.69 for AFP [61]. Further validation trials are ongoing.

Algorithms

Single biomarkers may be limited because of the inherently diverse molecular pathogenesis, variety of genotoxic insults, and heterogeneous growth patterns that contribute to HCC [77, 78]. Therefore, panels or algorithms incorporating multiple biomarkers along with clinical data have been shown to improve the sensitivity and specificity of HCC detection [56] (Table 4).
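To make the idea of a biomarker-integrated algorithm concrete, the Python sketch below combines clinical variables and a log-transformed marker into a single probability, assuming a logistic-regression form, which is a common way to build such scores. The coefficients are arbitrary placeholders chosen only for illustration; they are not the published GALAD, Doylestown, or HES coefficients and must not be used clinically.

```python
import math

# Placeholder coefficients for illustration ONLY (hypothetical values; not
# taken from GALAD, the Doylestown algorithm, HES, or any published model).
HYPOTHETICAL_COEFFS = {
    "intercept": -8.0,
    "age_years": 0.08,
    "male": 1.5,
    "log10_afp": 2.0,
}


def panel_probability(age_years: float, male: bool, afp_ng_ml: float,
                      coeffs: dict = HYPOTHETICAL_COEFFS) -> float:
    """Combine clinical data and a marker into one probability using an
    assumed logistic model: p = 1 / (1 + exp(-z))."""
    if afp_ng_ml <= 0:
        raise ValueError("AFP must be positive to take its logarithm")
    z = (
        coeffs["intercept"]
        + coeffs["age_years"] * age_years
        + coeffs["male"] * (1.0 if male else 0.0)
        + coeffs["log10_afp"] * math.log10(afp_ng_ml)
    )
    return 1.0 / (1.0 + math.exp(-z))


# Same hypothetical patient with two different AFP values.
low = panel_probability(age_years=60, male=True, afp_ng_ml=5.0)
high = panel_probability(age_years=60, male=True, afp_ng_ml=50.0)
print(f"{low:.2f} {high:.2f}")
```

The design point is that a panel score lets modest signals from several inputs add up on the linear scale, so a patient can cross the positivity threshold even when no single marker exceeds its own cutoff.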

Table 4 Biomarker-integrated algorithms

GALAD Score

The GALAD score (gender, age, AFP-L3, AFP, DCP) was initially investigated in a cohort of patients from a single institution in the UK [79]. In a phase II study of nearly 7000 patients from Germany, Japan, and Hong Kong, it had an overall AUROC of 0.93 and 0.94 for the Japanese and German cohorts, respectively, and performed equally regardless of cirrhosis etiology, status of sustained viral response (SVR), or status of HBV treatment [80]. Phase II multicenter data from the Early Detection Research Network (EDRN) cohort were equally promising, with an AUROC of 0.88, a sensitivity between 76 and 79%, and a specificity between 79 and 86% for early-stage HCC depending on the cutoff values used [81]. Initial phase III analysis in small cohorts found a sensitivity of 53.8% for detecting early HCC within 6 months of diagnosis, and no differences in AUROC among single measurements of GALAD, HES, AFP-L3, or DCP [54•]. Recent data from a larger phase III analysis of over 1700 patients in the Hepatocellular Early Detection Study (HEDS) demonstrated improvements in sensitivity to 65% and 62% at 6 and 12 months prior to diagnosis, respectively [82•]. Similar to AFP, the sensitivity of GALAD can improve with longitudinal measurements [66]. Based on these encouraging findings, the EDRN has funded the multicenter National Liver Cancer Screening Trial, which will randomize 5000 patients with cirrhosis and chronic HBV into two arms, current standard of care (AFP with US) versus GALAD alone, to assess the reduction in the proportion of late-stage HCC. This trial is set to launch enrollment in early 2024.

Doylestown Algorithm

Similar to the GALAD score, the Doylestown algorithm (DA) includes clinical factors (age and gender) and serum-based tests (log AFP, alkaline phosphatase [ALP], and alanine aminotransferase [ALT]) [83]. A phase II validation study of 162 patients (93 early-stage HCC cases and 93 non-HCC controls) showed an AUROC of 0.93 for early-stage detection, superior to AFP alone (AUROC of 0.80). The inclusion of fucosylated kininogen into the DA creates a modified algorithm known as DA Plus that significantly improved AUROC to 0.97 [84]. A nested case-control study of 29 patients with HCC and 58 cirrhosis controls found the DA Plus to have an overall sensitivity of 63.2% compared to 57.9% and 47.4% for GALAD and AFP, respectively [85]. A modified DA Plus model is currently being investigated in a phase II cohort of 766 patients (NCT 03878550).

HES Algorithm

The hepatocellular carcinoma early detection screening (HES) algorithm includes age, AFP, rate of AFP change within the past year, ALT, and platelet count and was initially studied in patients with HCV cirrhosis [86]. It has since been validated in a cohort of nearly 5000 Veterans Affairs patients with cirrhosis of any etiology, in whom the HES algorithm identified HCC 6 months prior to diagnosis with a sensitivity of 53%, as compared to 48% for AFP [87]. Another prospective phase III study had previously found a sensitivity of 36.4% at a fixed 10% FPR, which was not statistically significantly different from AFP, GALAD, or AFP-L3 [54•]. Based on currently available data, the HES algorithm is insufficient to serve as a standalone screening tool.

Multi-Cancer Early Detection Test

Multi-cancer early detection (MCED) tests use genomic sequencing or other approaches, sometimes in combination with machine learning, to detect signals from multiple cancers via analysis of cfDNA and other circulating products [62•]. They have been touted as the next advance in cancer screening, with one analysis of patients in the USA and the UK estimating that an MCED test with 25–100% uptake could detect hundreds of thousands of additional breast, cervical, colorectal, and lung cancers in a cost-effective manner [88]. A validation study using the proprietary Galleri platform by GRAIL in over 4000 patients (2823 with cancer and 1254 without cancer) found an overall sensitivity and specificity of 51.5% and 99.5%, respectively. Across all cancers, sensitivity increased consistently with advancing stage, from 16.8% to 40.4% to 77% to 90.1% for stages I, II, III, and IV, respectively [62•]. In 89% of patients, the test accurately identified the primary site of origin. Subgroup analyses of liver cancers showed >70% sensitivity across tumor stages, although this should be interpreted with caution due to the limited inclusion of early-stage tumors (6 stage I and 10 stage II), which can impact the generalizability of the data. Several other commercial companies including Freenome and Exact Sciences have initiated trials of their proprietary MCED panels. While early results are promising, sensitivity for early-stage cancer remains suboptimal. Further evaluation is essential before MCED tests can be recommended for general screening.
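The caution about limited inclusion of early-stage tumors can be made concrete with simple arithmetic: overall sensitivity is a weighted average of stage-specific sensitivities, so it depends on the stage mix of the study cohort. The Python sketch below uses the all-cancer stage-specific sensitivities quoted above; the two stage distributions are hypothetical and chosen only to show the effect.

```python
# Stage-specific sensitivities across all cancers, as quoted in the text.
STAGE_SENSITIVITY = {"I": 0.168, "II": 0.404, "III": 0.77, "IV": 0.901}


def overall_sensitivity(stage_counts: dict) -> float:
    """Overall sensitivity as the case-weighted average of the
    stage-specific sensitivities."""
    total_cases = sum(stage_counts.values())
    if total_cases == 0:
        raise ValueError("cohort has no cases")
    expected_detected = sum(
        STAGE_SENSITIVITY[stage] * n for stage, n in stage_counts.items()
    )
    return expected_detected / total_cases


# Hypothetical cohorts of 1000 cancers each (not data from any study).
late_heavy = {"I": 100, "II": 100, "III": 400, "IV": 400}
early_heavy = {"I": 400, "II": 400, "III": 100, "IV": 100}

print(f"late-stage-heavy cohort: {overall_sensitivity(late_heavy):.3f}")
print(f"early-stage-heavy cohort: {overall_sensitivity(early_heavy):.3f}")
```

Under these assumed mixes, the same test has an overall sensitivity of about 73% in the late-stage-heavy cohort but about 40% in the early-stage-heavy cohort, which is closer to the population a screening program is meant to reach.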

Conclusion

Biomarkers are emerging diagnostic tools that may complement or perhaps supersede imaging in the diagnosis of HCC. While imaging modalities have technological limitations and may be patient- and operator-dependent, biomarkers allow for ease of access, improved sensitivity and specificity, rapid turnaround, and potential improvement in cost-effectiveness. Proper validation of biomarkers is necessary prior to clinical implementation. Biomarkers have the potential to enhance early detection and ultimately improve survival in patients with HCC.