Introduction

In the natural history of liver cirrhosis, gastroesophageal varices (GOV) are one of the most common complications, leading to variceal bleeding at a rate of 5–15% per year with a 6-week mortality of up to 20% [1]. Moreover, for decompensated patients with gastrointestinal bleeding, the annual mortality rate is up to 57% [2]. Early detection of the presence and stage of GOV is therefore of great importance in the management of cirrhotic patients, especially for those with “high-risk” GOV, defined as medium/large varices, or small varices with red signs or in Child–Pugh class C [3]. For primary prophylaxis of variceal hemorrhage, non-selective beta blockers (NSBB) are recommended for patients with small varices with red wale marks or Child–Pugh class C, whereas either NSBB or endoscopic band ligation is recommended for patients with medium/large varices [3]. Esophagogastroduodenoscopy (EGD) is the recommended screening method for GOV in cirrhotic patients [1]. However, its invasive nature and the potential for variceal hemorrhage induced by intense discomfort limit the wide clinical use of repeated endoscopic evaluation. Non-invasive assessment of GOV has therefore been explored for years.

Liver stiffness measurement (LSM) by transient elastography (TE) has been validated for the assessment of liver fibrosis and cirrhosis, but its performance in GOV diagnosis is barely satisfactory [4, 5]. Combining platelet count with LSM has been shown to improve predictive accuracy. As stated by the Baveno VI consensus, patients with an LSM < 20 kPa and a platelet count > 150 × 10⁹/L have a very low risk of having varices requiring treatment and can avoid screening endoscopy [6]. However, the proportion of patients exempted from EGD varied widely among validation studies (8.1–46.2%, with missed diagnoses in 0–13.3%) [7]. Subsequently, the Ding criteria (LSM < 25 kPa and platelet count > 100 × 10⁹/L) and the expanded-Baveno VI criteria (LSM < 25 kPa and platelet count > 110 × 10⁹/L) were proposed to exclude high-risk gastroesophageal varices (HRGOV) [8, 9]. Additionally, spleen thickness/diameter has been explored as an aid to GOV staging [10, 11]. Further validation and optimization of these non-invasive methods for GOV screening are still needed. We therefore performed this combined retrospective and prospective study to evaluate the efficiency of combining markers such as ultrasonic spleen thickness (UST), platelet count and liver stiffness measurement by FibroScan® for HRGOV prediction in patients with liver cirrhosis.
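
The three rule-out criteria quoted above share one form: an upper limit on LSM and a lower limit on platelet count. The following Python sketch is only an illustration of that form; the class and function names are hypothetical and not taken from the cited guidelines, and only the thresholds come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleOutCriterion:
    """A paired LSM/platelet rule for skipping screening endoscopy."""
    name: str
    lsm_below_kpa: float    # LSM must be strictly below this value (kPa)
    platelet_above: float   # platelet count must be strictly above this (x10^9/L)

    def can_skip_endoscopy(self, lsm_kpa: float, platelet: float) -> bool:
        # Both conditions must hold for the patient to be exempted.
        return lsm_kpa < self.lsm_below_kpa and platelet > self.platelet_above


# Thresholds as quoted in the text above.
BAVENO_VI = RuleOutCriterion("Baveno VI", 20.0, 150.0)
EXPANDED_BAVENO_VI = RuleOutCriterion("expanded-Baveno VI", 25.0, 110.0)
DING = RuleOutCriterion("Ding", 25.0, 100.0)

if __name__ == "__main__":
    # Hypothetical patient: LSM 22 kPa, platelet count 120 x10^9/L.
    for rule in (BAVENO_VI, EXPANDED_BAVENO_VI, DING):
        print(rule.name, rule.can_skip_endoscopy(22.0, 120.0))
```

For the hypothetical patient in the usage lines, the stricter Baveno VI rule does not exempt the patient from endoscopy, whereas the two relaxed rules do.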

Methods

Study population

Cirrhotic patients with different etiologies were retrospectively enrolled to derive the HRGOV detection algorithms. For the validation cohort, we prospectively included consecutive cirrhotic patients with hepatitis B sustained viral response (HBSVR) (ClinicalTrials.gov: NCT04123509) from Nanfang Hospital, Southern Medical University (Table 1). All procedures followed were in accordance with the ethical standards of the responsible committee on human experimentation (institutional and national) and with the Helsinki Declaration of 1975, as revised in 2008. Informed consent was obtained from all patients for being included in the study. All patients underwent routine tests within 2 weeks of hospitalization, including EGD, LSM by TE (FibroScan®), upper abdominal ultrasonography, complete blood count, biochemistry and prothrombin indexes. Patients with non-cirrhotic portal hypertension, portal vein thrombosis, hepatocellular carcinoma, or a history of transjugular intrahepatic portosystemic shunt procedure, variceal bleeding, treatment with NSBB, splenectomy or endoscopic band ligation were excluded. Patients with thalassemia or other hemolytic diseases were also excluded. HRGOV was defined as medium or severe GOV, or mild GOV with red signs or in Child–Pugh class C, which is recognized as clinically significant and requires treatment in standard clinical practice [1, 6].

Table 1 The demography and clinical characteristics of the enrolled cirrhotic patients

Transient elastography

Transient elastography was performed after overnight fasting or at least 2 h after a meal using the FibroScan® (Echosens, France) equipped with a standard M probe, within 2 weeks of EGD and routine laboratory tests. Details of the technique have been described previously [12]. A successful LSM was defined as at least ten successful shots, a success rate of more than 60% and an interquartile range/median ratio lower than 30%.
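
The three reliability conditions above can be checked mechanically. The sketch below is a minimal illustration, not the device's own logic: the function name is hypothetical, and the quartile method (Python's "inclusive" interpolation) is an assumption, since the text does not state how the interquartile range is computed.

```python
from statistics import median, quantiles


def is_successful_lsm(valid_shots_kpa, total_shots):
    """Check the three criteria stated in the text for a successful LSM.

    valid_shots_kpa: stiffness values (kPa) of the successful shots.
    total_shots: number of shots attempted, including failed ones.
    """
    n_valid = len(valid_shots_kpa)
    if n_valid < 10 or total_shots < n_valid:
        return False  # fewer than ten successful shots, or inconsistent input

    success_rate = n_valid / total_shots
    # Assumption: quartiles by inclusive interpolation; other conventions exist.
    q1, _, q3 = quantiles(valid_shots_kpa, n=4, method="inclusive")
    iqr_to_median = (q3 - q1) / median(valid_shots_kpa)

    return success_rate > 0.60 and iqr_to_median < 0.30


if __name__ == "__main__":
    shots = [11.8, 12.1, 12.4, 12.0, 11.5, 12.9, 12.2, 11.9, 12.6, 12.3]
    print(is_successful_lsm(shots, total_shots=12))
```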

Abdomen ultrasonography

Upper abdominal ultrasonography was performed by two experienced ultrasound experts after the patients had fasted overnight, using a color Doppler ultrasonic diagnostic apparatus (Esaote Mylab™, Italy) equipped with a CA431 convex array probe (1–8 MHz). Spleen thickness was measured as the maximum tangent distance from the splenic hilum to the contralateral margin of the spleen.

Routine algorithms for high-risk gastroesophageal varices detection

Other routine algorithms used for HRGOV detection in the current study were the LSM-spleen diameter to platelet ratio score [LSPS, LSM (kPa) × spleen thickness (mm) / platelet (10⁹/L)] [13] and the platelet count to spleen diameter ratio [PSR, platelet (10⁹/L) × 100 / spleen thickness (mm)] [14].
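
As an illustration, the two ratios defined above can be written as small Python functions. The function and parameter names are hypothetical, and the patient values in the usage lines are invented for demonstration only.

```python
def lsps(lsm_kpa: float, spleen_thickness_mm: float, platelet: float) -> float:
    """LSPS = LSM (kPa) x spleen thickness (mm) / platelet count (x10^9/L)."""
    if platelet <= 0:
        raise ValueError("platelet count must be positive")
    return lsm_kpa * spleen_thickness_mm / platelet


def psr(platelet: float, spleen_thickness_mm: float) -> float:
    """PSR = platelet count (x10^9/L) x 100 / spleen thickness (mm)."""
    if spleen_thickness_mm <= 0:
        raise ValueError("spleen thickness must be positive")
    return platelet * 100 / spleen_thickness_mm


if __name__ == "__main__":
    # Hypothetical patient: LSM 20 kPa, spleen thickness 50 mm, platelets 100 x10^9/L.
    print(lsps(20.0, 50.0, 100.0))  # 10.0
    print(psr(100.0, 50.0))         # 200.0
```

Note that the two scores move in opposite directions: a larger spleen and a lower platelet count raise LSPS but lower PSR.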

Statistical analyses

Statistical analysis was performed using the Statistical Package for Social Science (SPSS version 22.0; SPSS Inc., Chicago, IL, USA). Continuous variables were expressed as mean ± standard deviation or median (25th–75th percentiles) as appropriate. Categorical variables were compared by chi-squared test or Fisher’s exact test as appropriate. Continuous variables were compared by Student’s t test or Mann–Whitney U test for two independent samples, as appropriate. The overall performance of the diagnostic algorithms was evaluated by the area under the receiver operating characteristic curve (AUROC) and its 95% confidence interval (CI). AUROCs were compared by the DeLong test using MedCalc (V 18.2.1, MedCalc Software bvba, Ostend, Belgium). Optimal cut-offs were chosen to achieve a positive likelihood ratio (PLR) of approximately 10.0 for ruling in the diagnosis, and a sensitivity of at least 95% with a negative likelihood ratio (NLR) of approximately 0.1 for ruling out the diagnosis, so as to provide statistically conclusive and strong evidence. All statistical tests were two-sided. Statistical significance was taken as p < 0.05.
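
For readers less familiar with likelihood ratios, the sketch below computes sensitivity, specificity, PLR and NLR from a 2 × 2 table using the standard definitions PLR = sensitivity / (1 − specificity) and NLR = (1 − sensitivity) / specificity. The function name and the counts in the usage lines are hypothetical and are not data from this study.

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Sensitivity, specificity and likelihood ratios from a 2x2 table.

    tp/fn: patients with the condition who test positive/negative.
    fp/tn: patients without the condition who test positive/negative.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both diseased and non-diseased patients are required")

    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # Likelihood ratios are undefined at the extremes; report infinity there.
    plr = sensitivity / (1 - specificity) if specificity < 1 else float("inf")
    nlr = (1 - sensitivity) / specificity if specificity > 0 else float("inf")

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "plr": plr,
        "nlr": nlr,
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    m = diagnostic_metrics(tp=95, fp=10, fn=5, tn=90)
    print({k: round(v, 3) for k, v in m.items()})
```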

Derivation of algorithms for HRGOV detection

Multivariate logistic regression analysis was performed to construct an HRGOV prediction model in the deriving cohort. Parameters significantly associated with HRGOV in univariate analysis were entered into the multivariate logistic regression analysis. Spleen thickness, age, LSM and albumin were found to be independent factors associated with HRGOV (Table 2). According to the partial regression coefficients, an index named SALA was derived as follows: SALA = 0.118 × spleen thickness (mm) + 0.037 × age (years) + 0.022 × LSM (kPa) − 0.060 × albumin (g/L). For resource-limited areas without FibroScan®, a simpler index named SPA was also derived as follows: SPA = 0.086 × spleen thickness (mm) − 0.008 × platelet count (10⁹/L) − 0.101 × albumin (g/L).
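
A minimal Python sketch of the two indexes, using the coefficients exactly as stated above, may help when applying them. The function names are hypothetical, and the patient values in the usage lines are invented for illustration. No intercept is included because none is given in the equations.

```python
def sala(spleen_thickness_mm: float, age_years: float,
         lsm_kpa: float, albumin_g_per_l: float) -> float:
    """SALA index: spleen thickness, age, LSM and albumin."""
    return (0.118 * spleen_thickness_mm
            + 0.037 * age_years
            + 0.022 * lsm_kpa
            - 0.060 * albumin_g_per_l)


def spa(spleen_thickness_mm: float, platelet: float,
        albumin_g_per_l: float) -> float:
    """SPA index: spleen thickness, platelet count (x10^9/L) and albumin."""
    return (0.086 * spleen_thickness_mm
            - 0.008 * platelet
            - 0.101 * albumin_g_per_l)


if __name__ == "__main__":
    # Hypothetical patient: spleen 45 mm, age 50 years, LSM 20 kPa,
    # albumin 40 g/L, platelets 100 x10^9/L.
    print(round(sala(45, 50, 20, 40), 2))  # 5.2
    print(round(spa(45, 100, 40), 2))      # -0.97
```

Both indexes rise with spleen thickness and fall with albumin, so higher values point toward HRGOV; the cut-offs used to interpret them are those reported in Table 4.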

Table 2 Logistic regression analysis of the factors for HRGOV prediction in deriving cirrhotic cohort

Results

Characteristics of the study population and predictors for HRGOV

In total, 305 patients were retrospectively enrolled in the deriving cohort, including 143 (46.9%) patients with hepatitis B viral infection, 71 (23.3%) with hepatitis C, 54 (17.7%) with alcoholic liver disease and 37 (12.1%) with autoimmune liver disease. The majority were male (84.3%), with a mean age of 45.3 ± 11.2 years. One hundred and twenty-eight (42.0%) patients were classified as having HRGOV. Patients with HRGOV were characterized by older age, higher LSM, lower alanine aminotransferase and serum albumin, greater spleen thickness, higher Child–Pugh stage, and more severe thrombocytopenia and prothrombin time abnormality. In the HBSVR validation cohort, 341 patients were prospectively enrolled (the enrollment process has been published previously [15]), of whom 13 without UST data were excluded from the analysis. Among the 328 patients included in the final analysis, 67 (20.4%) were classified as having HRGOV (Table 1).

Diagnostic performance of different predictors for HRGOV

The AUROCs for HRGOV detection are presented in Table 3 and Fig. 1. Briefly, in the deriving cohort, the AUROCs of SALA and SPA were the highest, those of UST, LSPS, and PSR were intermediate, followed by that of platelet count, and those of albumin and LSM were the lowest. In the HBSVR validation cohort, all AUROCs were higher than those in the deriving cohort. The AUROCs of LSPS, SALA, SPA, and PSR were higher than those of UST and platelet count, and those of LSM and albumin were the lowest.

Table 3 The area under receiver operating characteristics curve for HRGOV detection in cirrhotic patients
Fig. 1 Area under receiver operating characteristics curve of spleen thickness-age-liver stiffness-albumin index (SALA), spleen thickness-platelet-albumin index (SPA), liver stiffness-spleen diameter to platelet ratio score (LSPS), platelet count to spleen diameter ratio (PSR), platelet (PLT) and ultrasonic spleen thickness (UST) for high-risk gastroesophageal varices detection in deriving cohort (a) and validating cohort (b)

The suggested optimal cut-offs and their diagnostic performance for HRGOV detection are shown in Table 4. In general, the proportion of EGDs spared by UST screening in the HBSVR cohort (27.1%) was slightly lower than that in the deriving cohort (29.2%), whereas the other predictors spared a higher proportion of EGDs in the HBSVR cohort. The algorithms comprising multiple variables, namely SALA, SPA, LSPS, and PSR, showed superior performance, sparing 46.6%, 38.0%, 34.4%, and 20.7% of EGDs in the deriving cohort and 68.1%, 66.8%, 66.8%, and 66.5% in the HBSVR validation cohort, respectively.
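
The proportion of EGDs spared counts patients in whom HRGOV is either ruled out or ruled in; for SPA in the validation cohort, for example, the text reports 55.8% ruled out plus 11% ruled in, giving 66.8%. The sketch below illustrates this dual cut-off triage under two assumptions: higher scores indicate higher risk (as for SALA, SPA and LSPS; the comparison would be reversed for PSR and platelet count), and the cut-off values and scores shown are hypothetical placeholders, not the values in Table 4.

```python
from collections import Counter


def triage(score: float, rule_out_below: float, rule_in_above: float) -> str:
    """Classify one patient with a rule-out and a rule-in cut-off."""
    if score < rule_out_below:
        return "rule_out"   # HRGOV excluded, EGD spared
    if score > rule_in_above:
        return "rule_in"    # HRGOV diagnosed, EGD spared
    return "needs_egd"      # indeterminate zone, EGD still required


def egd_spared_proportion(scores, rule_out_below: float,
                          rule_in_above: float) -> float:
    """Fraction of patients classified without endoscopy."""
    if not scores:
        raise ValueError("at least one score is required")
    counts = Counter(triage(s, rule_out_below, rule_in_above) for s in scores)
    return (counts["rule_out"] + counts["rule_in"]) / len(scores)


if __name__ == "__main__":
    # Hypothetical scores and cut-offs, for illustration only.
    scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    print(egd_spared_proportion(scores, rule_out_below=3, rule_in_above=8))  # 0.4
```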

Table 4 Optimal diagnostic cut-offs and diagnostic performance of predictors detecting HRGOV in deriving and HBSVR cohorts

Although the expanded-Baveno VI criteria also spared 61.0% of patients from EGD in the HBSVR validation cohort, their sensitivity decreased to 88.1%, corresponding to a missed diagnosis rate of about 12%. In contrast, the Baveno VI criteria spared one third of patients from EGD with 100% sensitivity. Figure 2 graphically presents the superiority of SPA over the Baveno VI criteria in the proportion of EGDs spared.

Fig. 2 Comparison of spleen thickness-platelet-albumin index (SPA) with Baveno VI criteria for high-risk gastroesophageal varices (HRGOV) detection in the deriving cohort and validation cohort. Spe: specificity, PLR: positive likelihood ratio, Sen: sensitivity, NLR: negative likelihood ratio, EGD: esophagogastroduodenoscopy

Stepwise combination of different predictors for HRGOV

To further increase the proportion of EGDs spared, stepwise application of the predictors was also evaluated. In the deriving cohort, stepwise application of UST → SALA (or SPA) → expanded-Baveno VI criteria spared nearly 55% of EGDs, similar to stepwise application of UST → SALA. In the HBSVR validation cohort, stepwise application of UST → SALA or UST → SPA spared proportions of EGDs similar to those of SALA, SPA, LSPS and PSR (66.5–68.1%). However, stepwise application of Baveno VI criteria → SALA spared up to 73.5% of EGDs (Table 4).
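
One way to read "stepwise application" is as a chain in which each predictor is applied only to patients left undetermined by the previous one. The sketch below follows that reading, which is an assumption about the procedure rather than a description of the authors' exact implementation. All names are hypothetical, and the SALA cut-offs in the usage lines are placeholders, not the values in Table 4.

```python
from typing import Callable, Optional, Sequence

Patient = dict
# A step returns True (HRGOV ruled in), False (ruled out) or None (undetermined).
Step = Callable[[Patient], Optional[bool]]


def dual_cutoff_step(key: str, rule_out_below: float,
                     rule_in_above: float) -> Step:
    """Build a step from a score with a rule-out and a rule-in cut-off."""
    def step(p: Patient) -> Optional[bool]:
        if p[key] < rule_out_below:
            return False
        if p[key] > rule_in_above:
            return True
        return None
    return step


def baveno_vi_step(p: Patient) -> Optional[bool]:
    """Baveno VI can only rule out: LSM < 20 kPa and platelets > 150 x10^9/L."""
    if p["lsm_kpa"] < 20 and p["platelet"] > 150:
        return False
    return None


def classify_stepwise(p: Patient, steps: Sequence[Step]) -> Optional[bool]:
    """Apply steps in order and stop at the first definite verdict."""
    for step in steps:
        verdict = step(p)
        if verdict is not None:
            return verdict
    return None  # still undetermined after all steps: EGD required


def egd_spared(patients: Sequence[Patient], steps: Sequence[Step]) -> float:
    """Fraction of patients given a definite verdict by the chain."""
    decided = sum(classify_stepwise(p, steps) is not None for p in patients)
    return decided / len(patients)


if __name__ == "__main__":
    # Hypothetical cut-offs and patients, for illustration only.
    chain = [baveno_vi_step, dual_cutoff_step("sala", 3.0, 8.0)]
    cohort = [
        {"lsm_kpa": 15, "platelet": 200, "sala": 5.0},  # ruled out by Baveno VI
        {"lsm_kpa": 30, "platelet": 90, "sala": 9.0},   # ruled in by SALA
        {"lsm_kpa": 30, "platelet": 90, "sala": 5.0},   # undetermined
    ]
    print(egd_spared(cohort, chain))
```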

Liver stiffness measurement free algorithms for HRGOV prediction

LSM-free algorithms included SPA, PSR, and the stepwise combination UST → SPA. In the deriving cohort, SPA, PSR, and UST → SPA spared 38.0%, 20.7%, and 48.9% of patients from EGD, respectively. In the HBSVR validation cohort, these proportions increased to 66.8%, 66.5%, and 68.0% (Table 4).

Discussion

In the present study, UST, age, LSM and albumin were independent predictors of HRGOV. Among these variables, UST was the most important, with the highest odds ratio. In the absence of hemolytic disease, splenomegaly, reflected by increased UST, is a consequence of portal hypertension. LSM correlates not only with liver fibrosis stage but also with inflammation grade as indicated by ALT level [16]. Sustained progression of liver inflammation and fibrosis results in portal hypertension, which manifests as GOV and ascites. The predictive value of spleen diameter and LSM for HRGOV has also been shown in previous studies. A recent systematic review and meta-analysis indicated that, compared with LSM and spleen stiffness, LSPS, which combines LSM, spleen diameter and platelet count, detected HRGOV with the best AUROC [17]. Additionally, hepatic fibrosis accumulates with age, and hypoalbuminemia in liver disease implies more severe cirrhosis. In the present study, UST alone spared 29.2% of EGDs in both the deriving cohort and the HBSVR validation cohort. By combining UST, age, LSM and albumin, SALA excluded HRGOV with at least 95% sensitivity and an NLR of 0.08, and ruled in HRGOV with at least 95% specificity and a PLR of 10.5, which constitutes statistically conclusive and strong evidence. In terms of diagnostic efficiency, SALA exempted 46.6% and 68.1% of patients from EGD in the deriving and validation cohorts, respectively.

Considering that FibroScan® may not be available in resource-limited areas, another index without LSM, named SPA, was derived from UST, platelet count and albumin. Although the proportion of EGDs spared by SPA was slightly lower than that of SALA in the deriving cohort (38.0% vs 46.6%), the proportions were similar in the HBSVR validation cohort (66.8% vs 68.1%).

Furthermore, the current study attempted to increase the proportion of EGDs spared by stepwise application of several indexes, but performance did not seem to improve. By combining UST and SALA, the proportion of EGDs spared reached 56.7% in the deriving cohort and 68.6% in the validation cohort. Among the stepwise algorithms (Table 4), the best in the deriving cohort was UST → SALA, which spared 56.7% of EGDs. In the validation cohort, the highest proportion, 73.5%, was obtained with stepwise application of Baveno VI → SALA.

Interestingly, for the multi-variable predictors SALA, SPA, LSPS, and PSR, the cut-offs were all lower in the HBSVR validation cohort, while the proportions of EGDs spared were higher (Table 4). This can be explained by the variables that these algorithms comprise, including UST, LSM, albumin and platelet count. In patients with HBSVR, improvement in splenomegaly, elevated LSM, thrombocytopenia and hypoalbuminemia lowered the calculated values of SALA, SPA, LSPS, and PSR, and thus decreased the diagnostic cut-offs. The improvement in diagnostic performance may be attributed to the attenuation of hepatic inflammation by antiviral treatment and the improved correlation between HRGOV and related markers.

In the present study, the AUROC of platelet count for predicting HRGOV was 0.756, significantly lower than that of UST. A previous study also showed that platelet count alone did not accurately distinguish the stage of GOV [18]. Nevertheless, a normal platelet count did indicate the absence of HRGOV to some extent. A platelet count above 157 × 10⁹/L in the present study excluded HRGOV with a sensitivity of 96.9% and an NLR of 0.09 in the deriving cohort, and a sensitivity of 97.0% and an NLR of 0.06 in the HBSVR validation cohort, which constitutes strong diagnostic evidence. Although the Baveno VI criteria excluded HRGOV with slightly better sensitivity and NLR, the endoscopy-free proportion fell to 12%, much lower than the 21% achieved with platelet count alone. This disparity may result from the additional criterion of LSM < 20 kPa. Previous studies also indicated that the Baveno VI criteria spared less than 20% of endoscopies [19,20,21,22]. Alternatively, Augustin et al. [9] adjusted the cut-offs of LSM and platelet count to 25 kPa and 110 × 10⁹/L, respectively (the expanded-Baveno VI criteria). The adjusted cut-offs led to slightly decreased diagnostic accuracy (94.6% vs 93.1%), while more EGDs could be spared. However, this was not the case for HBSVR cohorts. In the prospective HBSVR cohorts of the study by Wang et al. [15] and the current study, the expanded-Baveno VI criteria excluded HRGOV with sensitivities of 88.6% and 88.1% and NLRs of 0.15 and 0.16, respectively, implying missed HRGOV diagnoses in 11.4% and 11.9%, which suggests that the expanded-Baveno VI criteria are not suitable for excluding HRGOV in HBSVR cohorts. Why did HBSVR influence the diagnostic accuracy of the expanded-Baveno VI criteria for excluding HRGOV? Thrombocytopenia in cirrhotic patients is related not only to hypersplenism but also to myelosuppression and reduced thrombopoietin caused by hepatitis itself; thrombocytopenia therefore does not necessarily predict HRGOV. As hepatic inflammation was alleviated by HBSVR, LSM declined and platelet counts increased owing to attenuated marrow suppression. Consequently, the LSM cut-off decreased and the platelet cut-off increased, leading to lower cut-offs in the HBSVR cohort for SALA, SPA, LSPS, and PSR.

In our previous prospective HBSVR study, the Baveno VI criteria combined stepwise with spleen stiffness measurement ruled out HRGOV and spared 61.6% of EGDs with a sensitivity of 95.7% and an NLR of 0.08 [15]. In the same prospective HBSVR cohort, used as the validation cohort in the present study, SPA, which consists of routine variables, not only ruled out HRGOV in 55.8% of patients with a sensitivity of 95.5% and an NLR of 0.07, but also diagnosed HRGOV in 11% of patients with a specificity of 97.7% and a PLR of 19.5. Accordingly, 66.8% of EGDs were spared by SPA without the use of FibroScan®, slightly more than in Wang et al. [15], with similar accuracy. Compared with SALA, SPA spared slightly fewer patients from EGD in the deriving cohort (38.0% vs. 46.6%) with higher accuracy, while the proportions of EGDs spared by the two indexes were similar in the HBSVR validation cohort (66.8% vs. 68.1%). Considering that FibroScan® may not be available in resource-limited areas, SPA may be of greater value for clinical application. In addition, the present study demonstrated the superiority of SPA over the Baveno VI criteria in EGD exemption (Fig. 2).

The current study has several strengths. First, the optimal cut-offs were chosen to achieve at least 95% specificity and a PLR of approximately 10.0 for ruling in the diagnosis, and at least 95% sensitivity and an NLR of approximately 0.1 for ruling out the diagnosis, so as to provide statistically conclusive and strong evidence [23, 24]. Likelihood ratios are not affected by the event rate of the study cohort [25], which justifies the comparison of diagnostic performance across cohorts. Considering that a missed diagnosis of HRGOV means no primary prophylaxis and may lead to variceal hemorrhage with a high risk of death in patients with decompensated cirrhosis, the Baveno VI consensus suggested that non-invasive methods for GOV prediction should have a missed HRGOV rate < 5% [6]. Second, the excellent validation performance of SPA for HRGOV prediction in a large HBSVR cohort supports the reliability of extending its use to treatment-experienced cohorts. Third, like PSR, the new index SPA, which consists of routine variables rather than LSM or spleen stiffness measurement, may be a preferred non-invasive method for HRGOV screening in resource-limited areas without FibroScan®. However, because SALA and SPA were derived from a retrospective, multi-etiology, single-center study, their diagnostic performance for HRGOV detection needs further validation in external cohorts, including treatment-naïve and treatment-experienced cohorts with other etiologies.

In conclusion, the present study explored an index named SPA, derived from a logistic regression equation and comprising the easily acquired parameters UST, platelet count and albumin, for HRGOV prediction. The effectiveness of SPA and of the previously derived simple index PSR was satisfactorily validated in a prospectively enrolled HBSVR cohort, with more than 66% of cirrhotic patients spared from EGD, which was significantly superior to the Baveno VI criteria. For outpatient screening, the parameters involved in these indexes are readily available and efficacious, but further validation is still needed.