Introduction

Heart failure (HF) is a clinical syndrome that occurs when intracardiac pressures are elevated and/or cardiac output is inadequate due to a structural and/or functional abnormality of the heart [1, 2]. HF affects an estimated 26 million people worldwide [3] and has high incidence and mortality rates, especially in intensive care units (ICUs) [4, 5]; it therefore imposes a substantial burden on individuals, healthcare systems, and society around the globe [2]. Pulmonary congestion is common in patients with HF, can be semi-quantified by lung ultrasound (LUS), and correlates with high rates of readmission and death [6, 7]. Decongestive strategies, including diuretics and vasodilators, are the cornerstone of acute HF treatment [1, 8]. In a prospective, single-center, observational study of patients hospitalized for decompensated HF, Kaplan–Meier analysis showed better event-free survival in patients who were B-line negative at discharge than in those who were B-line positive [9], while a higher number of B-lines at discharge identified patients at increased risk of hospitalization and death [10,11,12,13,14,15].

Existing LUS scoring systems differ in the number and designation of thoracic zones scanned and in how B-lines are quantified [16]. Recent studies in HF cohorts have scanned 4, 6, 8, 10, 12, or 28 chest zones [12, 17,18,19,20,21], and an international guideline published in 2012 recommended the 8-point and 28-point methods for assessing cardiogenic pulmonary edema [22]. A prospective study conducted in four emergency departments compared four methods (4-, 6-, 8-, and 28-point) and found that the 8-point method produced the largest increase in the C-index when added to the clinical diagnosis score [23]. The 8-point and 28-point methods have also been shown to have similar prognostic value in patients with HF [11]. In addition, a single-center, prospective observational study of 20 ICU patients showed that the examination time of the 8-point method was significantly shorter than that of the 28-point method, with no significant reduction in B-line detection [24]. Furthermore, a prospective comparative study found that the 8-point method could save time while offering similar reproducibility compared with the 28-point method [16]. The simplified 8-zone protocol can therefore be recommended to clinicians for both clinical practice and research [25].

To our knowledge, six approaches to B-line quantification have been described for LUS with 8 scanned zones [26,27,28,29]. Although the 8-point method has been accepted because it saves time compared with the 28-point method, no published evidence to date demonstrates that any one B-line quantification approach is clinically superior to the others, and the prognostic value, feasibility, and reproducibility of the different scoring methods for the 8-point LUS protocol have not been compared.

Accordingly, we aimed to evaluate the association between each B-line quantification method of the 8-point LUS protocol and the composite outcome (readmission to the ICU or death within 180 days) in patients with HF in the emergency intensive care unit (EICU), and to compare the prognostic value, feasibility, and reproducibility of the methods in order to select the optimal ones.

Methods

Study patients

In this single-center, prospective study, we consecutively enrolled patients admitted to the EICU for acute HF from 1 January 2018 to 31 December 2020. The inclusion criteria were: (1) age ≥ 18 years; (2) a diagnosis of acute HF satisfying the European Society of Cardiology guidelines [8]; (3) chest radiography or CT showing a central distribution of edema, increased heart size, or increased pulmonary blood volume [30], or lung ultrasonography demonstrating a bilateral and symmetric distribution of B-lines at EICU admission [31]; and (4) no requirement for mechanical ventilation, or ventilator weaning already undergone, at EICU discharge. The exclusion criteria were: (1) severe diseases hampering image acquisition (pulmonary fibrosis, severe emphysema, pulmonary cancer or metastases, breast prosthesis, pleurisy, previous pneumonectomy or lobectomy); (2) pregnancy; and (3) features that would influence follow-up [12, 32].

Biochemical analysis

Blood samples were obtained within 24 h before EICU discharge. D-dimer was measured using an automated coagulation analyzer (Coapresta 2000; Sekisui, Osaka, Japan). Liver and kidney function, electrolytes, brain natriuretic peptide, and troponin I were measured using an automation solution (APTIO; Siemens, Nuremberg, Germany).

Echocardiography

Echocardiography was performed using the Mindray M7 Diagnostic Ultrasound System (Shenzhen Mindray Bio-Medical Electronics Co., Ltd, Guangdong, China) within 24 h before EICU discharge. Left ventricular ejection fraction (LVEF) was obtained using the biplane Simpson’s method [33]. Inferior vena cava (IVC) diameter and the IVC collapsibility index (IVC-CI) were measured using M-mode imaging as recommended [34]. Pulmonary arterial systolic pressure (PASP) was estimated by continuous-wave (CW) Doppler echocardiography from the peak tricuspid regurgitation velocity (TRV), taking right atrial pressure (RAP) into account, according to recent guidelines [35, 36].
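
As a rough illustration of the two derived echocardiographic indices, the sketch below assumes the commonly used simplified Bernoulli relation (PASP ≈ 4 × TRV² + RAP) and the usual definition of the collapsibility index ((IVCmax − IVCmin) / IVCmax × 100). The study cites guidelines without spelling out the formulas, so both formulas, the function names, and the units are assumptions rather than the authors' documented procedure.

```python
def estimate_pasp(trv_m_per_s: float, rap_mmhg: float) -> float:
    """Estimate PASP in mmHg from peak TR velocity (m/s) and RAP (mmHg).

    Assumes the simplified Bernoulli relation: PASP = 4 * TRV^2 + RAP.
    """
    if trv_m_per_s < 0 or rap_mmhg < 0:
        raise ValueError("velocity and pressure must be non-negative")
    return 4.0 * trv_m_per_s ** 2 + rap_mmhg


def ivc_collapsibility_index(ivc_max_mm: float, ivc_min_mm: float) -> float:
    """Return the IVC collapsibility index as a percentage.

    Assumes IVC-CI = (max diameter - min diameter) / max diameter * 100.
    """
    if ivc_max_mm <= 0 or not 0 <= ivc_min_mm <= ivc_max_mm:
        raise ValueError("expected 0 <= ivc_min_mm <= ivc_max_mm and ivc_max_mm > 0")
    return (ivc_max_mm - ivc_min_mm) / ivc_max_mm * 100.0


# Hypothetical measurements, for illustration only.
print(estimate_pasp(3.0, 5.0))               # 4 * 9 + 5
print(ivc_collapsibility_index(20.0, 10.0))  # half of the maximum diameter
```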

Lung ultrasound

The study physician performed 8-zone LUS examinations using a phased array transducer on commercially available ultrasound machines (Shenzhen Mindray Bio-Medical Electronics Co., Ltd, Guangdong, China) immediately after echocardiography and before EICU discharge. Patients were assessed while lying in a semi-supine position (45°), and each window was recorded for 6–7 s [37]. We evaluated the anterolateral chest, including two parasternal scans and two scans of the anterior and lateral basal chest on each of the right and left hemithoraces [33]. To our knowledge, six B-line quantification methods have been described in recent studies:

Quantitative method 1: the number of positive zones (positive zone: at least three B-lines on a frozen image) [14].

Quantitative method 2: the number of positive zones (positive zone: ≥ 3 B-lines present simultaneously, or pleural effusion, on a frozen image) [16].

Quantitative method 3: the total number of B-lines (fused B-lines which cannot be distinguished as separate are counted as a single B-line) [29].

Quantitative method 4: single and confluent B-lines were counted as described in Additional file 1, and a score ranging from 0 to 80 was calculated to summarize the B-lines of the 8 zones [28].

Quantitative method 5: the scores of discrete B-lines were determined by B-line counts, and the scores of confluent B-lines were assessed by multiplying the percentage of rib interspace occupied by B-lines by 10; the total B-line count was determined by summing the scores of 8 zones [26].

Quantitative method 6: the scores of discrete B-lines were determined by B-line counts, while the scores of wide, fused, or coalescing B-lines were determined by multiplying the percentage of the intercostal space filled with confluent B-lines by 20; the total score was calculated by summing the scores of 8 zones [27].

According to their characteristics, we divided the six B-line quantification methods into two categories [29]: (1) B-pattern scoring systems, which count the number of positive zones (Quantitative methods 1 and 2); and (2) B-line counts, which comprise the total number of single and confluent B-lines with each confluent B-line regarded as a single B-line (Quantitative method 3), and the total number of single and fused B-lines, in which fused B-lines are either preliminarily estimated (Quantitative method 4) or more accurately calculated (Quantitative methods 5 and 6) based on the percentage of the rib space.
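
To make the arithmetic behind the methods concrete, the following minimal sketch scores one hypothetical 8-zone examination with Quantitative methods 1, 2, 3, 5, and 6. Method 4 is omitted because its counting rule is given only in Additional file 1. The `Zone` structure, the function names, and the example values are illustrative assumptions. In particular, the sketch assumes that a zone containing confluent B-lines is scored from the occupied fraction of the rib interspace alone (not added to a discrete count), and that positivity depends only on the supplied B-line count.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Zone:
    """Reader's findings for one chest zone (hypothetical structure)."""
    b_lines: int = 0                 # B-lines counted; a fused group counts once
    confluent_fraction: float = 0.0  # share of rib interspace filled by confluent B-lines (0 to 1)
    effusion: bool = False           # pleural effusion seen in this zone


def positive_zones(zones: List[Zone], count_effusion: bool = False) -> int:
    """Methods 1 and 2: zones with >= 3 B-lines; method 2 also counts effusion."""
    return sum(1 for z in zones if z.b_lines >= 3 or (count_effusion and z.effusion))


def total_b_lines(zones: List[Zone]) -> int:
    """Method 3: total B-lines, with fused B-lines counted as a single B-line."""
    return sum(z.b_lines for z in zones)


def percentage_based_score(zones: List[Zone], multiplier: float) -> float:
    """Methods 5 (multiplier 10) and 6 (multiplier 20).

    Assumption: a zone with confluent B-lines contributes fraction * multiplier;
    any other zone contributes its discrete B-line count.
    """
    total = 0.0
    for z in zones:
        if z.confluent_fraction > 0:
            total += z.confluent_fraction * multiplier
        else:
            total += z.b_lines
    return total


# Hypothetical examination of 8 zones, for illustration only.
example = [
    Zone(), Zone(b_lines=2), Zone(b_lines=3), Zone(b_lines=1, effusion=True),
    Zone(b_lines=1, confluent_fraction=0.5), Zone(b_lines=4), Zone(), Zone(),
]

print(positive_zones(example))                       # method 1 -> 2
print(positive_zones(example, count_effusion=True))  # method 2 -> 3
print(total_b_lines(example))                        # method 3 -> 11
print(percentage_based_score(example, 10))           # method 5 -> 15.0
print(percentage_based_score(example, 20))           # method 6 -> 20.0
```

Keeping the per-zone findings separate from the scoring rules means the same reading can be scored by every method, which is how a like-for-like comparison of methods would need to be set up.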

A single operator performed the 8-point LUS protocol and collected ultrasound images in the intercostal spaces. Two experienced emergency physicians interpreted the images using the six B-line quantification methods, and their measurements were averaged. The time spent on interpretation was recorded. All of them were blinded to the patients' clinical course and took no part in clinical management.

Sample size calculation

The sample size for the logistic regression analysis was considered sufficient on the basis of the rule of at least ten events per variable [38, 39]. The minimum sample size for the other statistical methods was calculated using Power Analysis and Sample Size (PASS) software (version 11.0.7; PASS, NCSS, LLC) [40, 41], with the power set to at least 80% and the significance threshold set at p < 0.05.
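
The events-per-variable rule reduces to a one-line calculation, sketched below. The function name is hypothetical, and the sketch assumes the common convention that the limiting count is the smaller of the numbers of patients with and without the outcome. The counts in the example are invented for illustration and are not study data.

```python
def max_candidate_predictors(n_events: int, n_non_events: int,
                             events_per_variable: int = 10) -> int:
    """Largest number of predictors supported by an events-per-variable rule.

    Assumption: the limiting sample is the less frequent outcome class.
    """
    if n_events < 0 or n_non_events < 0 or events_per_variable <= 0:
        raise ValueError("counts must be non-negative and the rule positive")
    limiting = min(n_events, n_non_events)
    return limiting // events_per_variable


# Invented counts: 30 events and 45 non-events support at most 3 predictors.
print(max_candidate_predictors(30, 45))
```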

Outcomes

The primary outcome was the statistical performance of the B-line quantification methods, including discrimination, calibration, and clinical usefulness (the applicability of the quantification methods to contemporary clinical practice) [42, 43]. Secondary outcomes were the feasibility (the time spent on image interpretation) and the reproducibility of the different B-line quantification methods.

Statistical analysis

Categorical data are presented as numbers with percentages, and continuous variables as means with standard deviations or medians with interquartile ranges (25th–75th percentiles), depending on whether the data were normally distributed. Baseline characteristics were compared between groups using Fisher’s exact tests or chi-squared tests for categorical variables, Student’s t tests for normally distributed continuous variables, and Wilcoxon rank-sum tests for continuous variables that were not normally distributed.

Logistic regression models were used to estimate odds ratios (ORs) with 95% confidence intervals (95% CIs) for the composite outcome (readmission to the ICU or death within 180 days). Two models were fitted: (1) Model 1, a crude model without adjustment; and (2) Model 2, adjusted for variables that differed with P < 0.1 between the groups with and without the composite outcome.
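
For readers less familiar with how an OR and its interval follow from a fitted logistic model, the sketch below converts a regression coefficient and its standard error into an OR with a Wald-type 95% CI. This is an assumption for illustration: the text does not state which interval method was used (R can also report profile-likelihood intervals), and the coefficient and standard error in the example are hypothetical.

```python
import math
from typing import Tuple


def odds_ratio_with_ci(beta: float, se: float, z: float = 1.959964) -> Tuple[float, float, float]:
    """Return (OR, lower, upper) for a logistic coefficient.

    Assumes a Wald interval on the log-odds scale: exp(beta +/- z * se).
    """
    if se < 0:
        raise ValueError("standard error must be non-negative")
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Hypothetical coefficient: each one-point score increase doubles the odds.
or_, lower, upper = odds_ratio_with_ci(math.log(2.0), 0.2)
print(round(or_, 3), round(lower, 3), round(upper, 3))
```

Because the interval is symmetric on the log scale, the OR is the geometric mean of the two limits, which offers a quick consistency check on reported values.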

Discrimination describes how well the B-line quantification methods differentiate between patients who experienced the composite outcome (readmission to the ICU or death within 180 days) and those who did not; it was measured by the area under the receiver operating characteristic (ROC) curve (AUC). Calibration was assessed graphically using calibration plots, and clinical usefulness was assessed using decision curves. According to an arbitrary guideline, the discriminative ability of a test is considered non-informative (AUC = 0.5), of low accuracy (0.5 < AUC ≤ 0.7), moderately accurate (0.7 < AUC ≤ 0.9), highly accurate (0.9 < AUC < 1), or perfect (AUC = 1) [44]. Differences in feasibility among the B-line quantification methods are presented as boxplots. We evaluated the inter-rater reliability of the B-line quantification methods between the two emergency physicians using intraclass correlation coefficients (ICCs), interpreted according to guideline criteria (less than 0.40, poor; 0.40 to 0.59, fair; 0.60 to 0.74, good; 0.75 to 1.00, excellent) [45].
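
The interpretation bands quoted above can be written down directly, as in the sketch below. It also computes the AUC as the proportion of event/non-event pairs in which the event patient has the higher score (ties count one half), which is equivalent to the area under the empirical ROC curve. The function names and short labels are illustrative, the scores in the example are invented, and treating ICC values between 0.74 and 0.75 as "good" is an assumption about how the band edges are applied.

```python
from typing import Sequence


def auc_by_pairs(event_scores: Sequence[float], non_event_scores: Sequence[float]) -> float:
    """AUC as the share of event/non-event pairs ranked correctly (ties = 0.5)."""
    if not event_scores or not non_event_scores:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for e in event_scores:
        for n in non_event_scores:
            if e > n:
                wins += 1.0
            elif e == n:
                wins += 0.5
    return wins / (len(event_scores) * len(non_event_scores))


def auc_category(auc: float) -> str:
    """Map an AUC to the bands quoted in the text (values below 0.5 are not covered)."""
    if not 0.0 <= auc <= 1.0:
        raise ValueError("AUC must lie between 0 and 1")
    if auc == 1.0:
        return "perfect"
    if auc > 0.9:
        return "high"
    if auc > 0.7:
        return "moderate"
    if auc > 0.5:
        return "low"
    if auc == 0.5:
        return "non-informative"
    return "not classified"


def icc_category(icc: float) -> str:
    """Map an ICC to the quoted bands; values just under 0.75 fall in 'good'."""
    if icc < 0.40:
        return "poor"
    if icc < 0.60:
        return "fair"
    if icc < 0.75:
        return "good"
    return "excellent"


# Invented scores for three patients with and three without the outcome.
auc = auc_by_pairs([3, 4, 5], [1, 2, 3])
print(auc, auc_category(auc), icc_category(0.749))
```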

A P value < 0.05 was considered significant. Statistical analyses were performed using R software (version 4.2.2; R Foundation for Statistical Computing, Vienna, Austria).

Results

Patient characteristics

A total of 71 patients were enrolled between January 2018 and December 2020. Baseline characteristics and medical treatments are shown in Table 1. In the whole cohort, the median age was 79 years, and 50.7% of patients were male. The etiologies of the current HF hospitalization were coronary heart disease (70.4%), hypertensive heart disease (15.5%), and valvular heart disease (4.2%).

Table 1 Demographic and clinical characteristics and medical treatments of patients with acute heart failure

There were no statistically significant differences in demographic or clinical characteristics between the two groups, except that patients with composite outcome events were older (80.50 (72.00, 86.00) vs. 76.00 (71.00, 82.00), p = 0.034). Higher scores on all six B-line quantification methods were related to the composite outcome. With respect to drugs prescribed in the EICU, patients with composite outcome events had lower usage rates of angiotensin-converting enzyme inhibitors (ACE-I)/angiotensin receptor blockers (ARBs) and higher usage rates of furosemide (5.3 vs. 24.2%, p = 0.037; 100.0 vs. 87.9%, p = 0.042, respectively).

Relationship between the LUS scoring methods and the composite outcome

Table 2 presents the association between the six B-line quantification methods and the composite outcome (readmission to the ICU or death within 180 days).

Table 2 Associations between the B-line quantification methods and the composite outcome (readmission to the ICU or death within 180 days) (n = 71)

In the crude model (Model 1), higher scores on Quantitative method 1 (OR: 1.446, 95% CI 1.117–1.872, p = 0.005), Quantitative method 2 (OR: 1.444, 95% CI 1.114–1.872, p = 0.006), Quantitative method 3 (OR: 1.127, 95% CI 1.036–1.255, p = 0.005), Quantitative method 4 (OR: 1.112, 95% CI 1.027–1.203, p = 0.009), Quantitative method 5 (OR: 1.128, 95% CI 1.039–1.225, p = 0.004), and Quantitative method 6 (OR: 1.094, 95% CI 1.029–1.164, p = 0.004) were each associated with an increased risk of the composite outcome.

After adjustment for potential risk factors (Model 2), higher scores on Quantitative method 1 (OR: 1.657, 95% CI 1.183–2.321, p = 0.003), Quantitative method 2 (OR: 1.659, 95% CI 1.183–2.325, p = 0.003), Quantitative method 3 (OR: 1.167, 95% CI 1.048–1.299, p = 0.005), Quantitative method 4 (OR: 1.156, 95% CI 1.038–1.287, p = 0.008), Quantitative method 5 (OR: 1.174, 95% CI 1.052–1.310, p = 0.004), and Quantitative method 6 (OR: 1.139, 95% CI 1.046–1.241, p = 0.003) remained associated with an increased risk of the composite outcome.

Discrimination, calibration, and clinical usefulness of B-line quantification methods

Quantitative method 1 (AUC = 0.707, 95% CI 0.584–0.829, p < 0.001; cutoff value 2.5; specificity 78.9%, sensitivity 60.6%), Quantitative method 2 (AUC = 0.705, 95% CI 0.583–0.828, p < 0.001; cutoff value 2.5; specificity 78.9%, sensitivity 60.6%), Quantitative method 3 (AUC = 0.711, 95% CI 0.589–0.832, p < 0.001; cutoff value 10.5; specificity 76.3%, sensitivity 60.6%), Quantitative method 4 (AUC = 0.717, 95% CI 0.596–0.838, p < 0.001; cutoff value 13.5; specificity 68.4%, sensitivity 69.7%), Quantitative method 5 (AUC = 0.714, 95% CI 0.593–0.835, p < 0.001; cutoff value 18.5; specificity 78.9%, sensitivity 57.6%), and Quantitative method 6 (AUC = 0.713, 95% CI 0.594–0.832, p < 0.001; cutoff value 24.5; specificity 68.4%, sensitivity 66.7%) all showed moderately accurate discrimination between patients with and without the composite outcome (Fig. 1).

Fig. 1 ROC curve analysis of the six quantification methods. ROC curves with the optimal cutoff value (specificity, sensitivity) were drawn for each quantification method. ROC receiver operating characteristic, AUC area under the curve, CI confidence interval

The calibration curves of the six B-line quantification methods for the probability of the composite outcome showed good agreement between prediction and observation (Fig. 2). Figure 3 illustrates the decision curves of the six B-line quantification methods for predicting the composite outcome in patients with HF. The decision curves showed that the six methods provided similar net benefits across the entire range of threshold probabilities.
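
For context, a decision curve plots net benefit against the threshold probability. The sketch below uses the standard definition, net benefit = TP/n − (FP/n) × pt/(1 − pt), together with the "treat all" reference strategy. It is an illustration under assumptions, not the authors' code: the function names are hypothetical and the outcomes and predicted risks are invented.

```python
from typing import Sequence


def net_benefit(outcomes: Sequence[int], risks: Sequence[float], threshold: float) -> float:
    """Net benefit of acting on patients whose predicted risk is >= threshold."""
    if len(outcomes) != len(risks) or not outcomes:
        raise ValueError("outcomes and risks must be non-empty and equal in length")
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(outcomes)
    true_pos = sum(1 for y, r in zip(outcomes, risks) if r >= threshold and y == 1)
    false_pos = sum(1 for y, r in zip(outcomes, risks) if r >= threshold and y == 0)
    return true_pos / n - (false_pos / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(outcomes: Sequence[int], threshold: float) -> float:
    """Net benefit of the reference strategy that treats every patient."""
    return net_benefit(outcomes, [1.0] * len(outcomes), threshold)


# Invented outcomes (1 = event) and predicted risks for four patients.
y = [1, 1, 0, 0]
p = [0.9, 0.6, 0.4, 0.1]
for pt in (0.2, 0.5):
    print(pt, net_benefit(y, p, pt), net_benefit_treat_all(y, pt))
```

Two methods with similar net benefit at every threshold, as reported here, would lead to similar clinical decisions regardless of how a clinician weighs false positives against missed events.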

Fig. 2 Calibration curves of the six quantification methods for predicting the composite outcome (readmission to the intensive care unit or death within 180 days) in patients with heart failure. A–F Calibration curves of Quantification methods 1 to 6, respectively

Fig. 3 Decision curves of the six quantification methods for predicting the composite outcome (readmission to the intensive care unit or death within 180 days) in patients with heart failure

Feasibility and reproducibility of the six quantification methods

The boxplot (Fig. 4) indicates a significant difference in image interpretation time among the six B-line quantification methods. The interpretation time of the B-pattern scoring systems (Quantitative methods 1 and 2) was significantly shorter than that of the other methods, and the interpretation time of the B-line counts based on the percentage of the rib space (Quantitative methods 5 and 6) was significantly longer than that of the other methods.

Fig. 4 Comparisons of the feasibility (image interpretation time) among the six quantification methods. The boxes show the median and interquartile range (IQR)

The ICCs for Quantitative methods 1 and 2 between the two experts were 0.927 (0.885–0.954) and 0.886 (0.824–0.928), respectively, indicating excellent inter-rater reliability; the ICCs for Quantitative methods 3, 4, 5, and 6 were 0.740 (0.614–0.830), 0.737 (0.609–0.827), 0.749 (0.626–0.836), and 0.747 (0.623–0.835), respectively, indicating good reliability (Table 3).

Table 3 ICC for quantification methods between two experts (inter-rater reliability)

Discussion

In this prospective observational study, we found that higher scores on 8-point lung ultrasonography were related to the composite outcome and that the six B-line quantification methods had similar discrimination, calibration, and clinical usefulness. Nevertheless, the feasibility and reproducibility of the B-pattern scoring systems (Quantitative methods 1 and 2) were superior to those of the other methods.

Studies have demonstrated that pulmonary congestion detected by the 8-point LUS protocol can predict rehospitalization for HF events and/or all-cause death. Two studies suggested a significant positive correlation between the number of positive zones (Quantitative method 1) at discharge and the risk of 30-day readmission or 4-year all-cause mortality [14, 46]. Several studies showed that B-lines (Quantitative method 3) at admission or before discharge were associated with HF readmission or death at 1, 3, 6, or 12 months [33, 47,48,49,50]. A multivariate Cox regression showed that B-lines ≥ 30 (Quantitative method 4) at admission was a risk factor for 1-year death or rehospitalization [28]. Compared with those with < 19 B-lines (Quantitative method 5), patients with ≥ 19 B-lines at admission had a fourfold higher hazard of in-hospital mortality [26]. Our findings are consistent with these results; in addition, we found that the scores of Quantitative methods 2 and 6 at discharge were independent predictors of readmission to the ICU or death within 180 days.

Previous studies have demonstrated that scores from three B-line quantification methods had low or moderate discriminatory power for predicting outcomes. B-lines (Quantitative method 3) at admission were predictive of HF readmission or death at 6 months (AUC 0.68, p < 0.001) [33]; B-lines ≥ 30 (Quantitative method 4) at admission showed moderately accurate performance (AUC 0.75, p < 0.010) in predicting 1-year death or rehospitalization [28]; and B-lines ≥ 19 (Quantitative method 5) at admission showed moderate discriminatory performance in predicting in-hospital mortality (AUC 0.79, p < 0.010) [26]. Similarly, we found that the AUCs of the six B-line quantification methods ranged from 0.705 to 0.717, all with p < 0.001. The six methods also showed similar agreement between prediction and observation and provided similar net benefits across the entire range of threshold probabilities. Therefore, the different B-line quantification methods had comparable discrimination, calibration, and clinical usefulness.

The image interpretation time of the B-pattern scoring systems (Quantitative methods 1 and 2) was significantly shorter than that of the other methods, suggesting that Quantitative methods 1 and 2 save more time. Moreover, several studies have shown that inter-observer agreement of B-line quantification methods between competent observers is good or excellent. Anderson et al. found that the intraclass correlation coefficients for Quantitative methods 3 and 5 between emergency physicians with experience in pleural sonography were 0.84 (0.81–0.87) and 0.87 (0.85–0.90), respectively [29], and the ICC of Quantitative method 3 between two experienced observers was previously reported as 0.663 (0.540–0.753) [45]. In our study, the ICCs for Quantitative methods 1 and 2 were 0.927 and 0.886, respectively, and the ICCs for Quantitative methods 3, 4, 5, and 6 were 0.740, 0.737, 0.749, and 0.747, respectively. The results of our study and of other studies are not entirely comparable; differences in LUS probes, operator expertise, number of LUS images, observer experience, patient comorbidities, and the timing of image interpretation (immediate or delayed review) may account for the discrepancies [45, 51, 52].

A number of image acquisition protocols covering 4 to 28 chest regions have been described, and 8-point lung ultrasonography is increasingly used in the clinical setting [25]. Furthermore, a consensus paper already recommends Quantitative method 1 for the 8-zone LUS examination in patients with acute decompensated HF [22]. The results of our study are in line with the current consensus. In addition, we found that Quantitative method 2 has discrimination, calibration, clinical usefulness, feasibility, and reproducibility similar to those of Quantitative method 1. This may be because Quantitative methods 1 and 2 are both B-pattern scoring systems that quantify the number of positive zones in a very similar way. We therefore recommend B-pattern scoring systems (Quantitative methods 1 and 2) for routine use.

Our study has some potential limitations. First, as in similar previous studies, there was no gold standard for B-line identification. Second, we could not adjust for more confounding factors in the logistic models because of the limited sample size. Third, heterogeneity among the study subjects might be high because EICU patients often have complicated comorbidities. Fourth, only patients admitted to the EICU for acute HF were enrolled; the study population did not include patients in other settings, such as internal medicine wards or outpatient clinics. Finally, LUS was performed in all patients before EICU discharge, and we did not obtain the baseline B-line burden at EICU admission, so we could not assess dynamic changes in the quantitative methods.

Conclusions

The six quantification methods of the 8-point LUS protocol have similar discrimination, calibration, and clinical usefulness in patients with HF, but the B-pattern scoring systems are more feasible and reproducible than the other methods. Prospective multicenter studies are needed to confirm these results in internal medicine wards and outpatient clinics.