Introduction

Heart failure (HF) is a condition where the heart cannot, at a normal filling pressure, pump blood at a rate commensurate with the requirements of the metabolizing tissues [1–5]. HF affects approximately 5.7 million Americans, and about 670,000 new cases are diagnosed annually in the United States [6]. The estimated total cost of HF in the United States in 2010 was $39.2 billion, or 1–2 % of all health care expenditures [6].

Clinically, HF is a syndrome with typical symptoms (e.g., breathlessness and fatigue) and signs (e.g., elevated jugular venous pressure and pulmonary crackles). Patients with HF may have either reduced or preserved left ventricular ejection fraction (LVEF). The diagnosis of HF can be difficult since the clinical features of the condition are not always sensitive or specific. No gold standard investigation exists to diagnose HF.

The challenge of diagnosing HF emphasizes the importance of evaluating whether other investigations may help diagnose the condition. Furthermore, the characteristics of these other investigations should be examined for their prognostic utility and their usefulness in guiding HF therapy.

The natriuretic peptides, i.e., B-type natriuretic peptide (BNP) and N-terminal proBNP (NT-proBNP), may be useful to help with diagnosis, prognosis, and management of HF. BNP and NT-proBNP are secreted into the bloodstream by cardiac myocytes in response to increased ventricular wall stress, hypertrophy, and volume overload [7]. BNP and NT-proBNP levels are increased in persons with HF, and low levels rule out HF. Thus, these peptides have emerged as promising markers for HF [5, 8].

Many studies have evaluated the diagnostic characteristics of BNP and NT-proBNP. Study populations have included patients with acute decompensated HF who present to the emergency room or patients with symptoms and signs of HF who are evaluated by primary care physicians. These studies have examined the performance of BNP and NT-proBNP in patients with various comorbidities and at different cutpoints. However, questions persist about the diagnostic capability of BNP and NT-proBNP, including which cutpoints are optimal. Consequently, a systematic review is needed to better understand the diagnostic capability of these peptides.

Assessment of prognosis is important to promote better counselling of HF patients with regard to future therapies, including cardiac transplantation. Research suggests that BNP and NT-proBNP may provide incremental prognostic information beyond what is available from clinical data such as New York Heart Association (NYHA) class, LVEF, and comorbidities [9–12]. A systematic review is required to better understand whether BNP and NT-proBNP provide prognostic information for patients with acute decompensated HF and chronic stable HF.

The management of HF is essentially directed by an algorithm for medical therapy. Often, patients are not fully optimized on therapy because clinicians believe, based on the clinical findings, that further optimization is unnecessary. This could result in undertreatment of HF patients. Since BNP and NT-proBNP concentrations have been found to decrease with the escalation of therapy, sequential measurement of these markers may be a useful means of guiding HF treatment. To date, individual studies have not definitively demonstrated whether BNP or NT-proBNP test values can guide HF therapy. A systematic review of this issue would provide information to assess strategies to better optimize the management of HF patients.

The use of BNP or NT-proBNP in the diagnosis, prognosis, or treatment for HF requires knowledge of the variation in peptide levels over serial measurements. Currently, the evidence is uncertain concerning how much of a difference in BNP or NT-proBNP concentration is clinically important.

Given the many outstanding issues involved in using BNP and NT-proBNP for diagnosing, prognosticating, and treating HF, we conducted a systematic review of the literature to address six key questions:

  • Key Question 1: In patients presenting to the emergency department or urgent care facilities with signs or symptoms suggestive of heart failure (HF):

    • What is the test performance of BNP and NT-proBNP for HF?

    • What are the optimal decision cutpoints for BNP and NT-proBNP to diagnose and exclude HF?

    • What determinants affect the test performance of BNP and NT-proBNP (e.g., age, gender, and comorbidity)?

  • Key Question 2: In patients presenting to a primary care physician with risk factors, signs, or symptoms suggestive of HF:

    • What is the test performance of BNP and NT-proBNP for HF?

    • What are the optimal decision cutpoints for BNP and NT-proBNP to diagnose and exclude HF?

    • What determinants affect the test performance of BNP and NT-proBNP (e.g., age, gender, and comorbidity)?

  • Key Question 3: In HF populations, is BNP or NT-proBNP measured at admission, discharge, or change between admission and discharge, an independent predictor of morbidity and mortality outcomes?

  • Key Question 4: In HF populations, does BNP measured at admission, discharge, or change between admission and discharge, add incremental predictive information to established risk factors for morbidity and mortality outcomes?

  • Key Question 5: Is BNP or NT-proBNP measured in the community setting an independent predictor of morbidity and mortality outcomes in general populations?

  • Key Question 6: In patients with HF, does BNP-assisted therapy or intensified therapy compared to usual care, improve outcomes?

The Agency for Healthcare Research and Quality commissioned the systematic review, which is published as a report [13] and available at http://www.effectivehealthcare.ahrq.gov/ehc/products/328/1754/heart-failure-natriuretic-peptide-report-131119.pdf. The present series of articles distils the important findings of the report into separate manuscripts based on the key questions. This introductory article describes the methods common to the entire series of articles.

Methods

We searched six electronic databases: Medline, Embase, AMED, Cochrane Central Register of Controlled Trials, Cochrane Database of Systematic Reviews, and CINAHL. Additionally, we examined three sources for gray literature: regulatory agency Web sites, clinical trial databases, and conference sources. See Online Resource 1 for specific search strategies. The search was restricted to human studies published in English between January 1989 and June 2012. We also searched the reference lists of systematic reviews, meta-analyses, and studies screened at the full-text level for other potentially relevant citations.

Study selection criteria were based on the Participants, Interventions, Comparisons, Outcomes, Time, and Setting (PICOTS) framework (see Online Resource 2). For key questions (KQs) 1 and 2, the only excluded study design was case reports. For KQ3–5, case reports, cross-sectional studies, and case–control studies were excluded; retrospective studies, randomized controlled trials (RCTs), and other prospective studies were included, provided these studies were based on medical or database records that permitted the construction of historical cohort, before/after, or time series data. For KQ6, only RCTs were included.

Two raters used standardized forms to independently screen the titles and abstracts of all studies retrieved in the literature search. Disagreements were resolved through consensus or third-party adjudication. Potentially relevant studies, or studies whose relevance could not be ascertained at title and abstract screening, were promoted to full-text screening. At full-text screening, a single rater screened each article for inclusion.

Trained data extractors, using standardized forms and a reference guide, extracted relevant data from the included studies. Extracted data encompassed general study characteristics, details of the patient population, comorbidities, blood sample type for peptide measurement (plasma or serum), assay source (name), type of peptide (BNP, NT-proBNP, or both), and storage temperature of peptide (if applicable). Outcomes extracted were the type of instrument or scale, cutpoints, measure of effect (e.g., end point, change score, and measure of variance), and definition of treatment response. Outcome data were not extracted from studies reporting results in chart or graphical form only.

All included studies were summarized in narrative form and in summary tables. Primary study papers were considered for statistical analyses in the case of multiple publications of the same study cohort.

We assessed the risk of bias using QUADAS-2 [14] for diagnosis studies (KQ1–2), the Hayden criteria [15] for prognosis studies (KQ3–5), and the Jadad scale [16] for treatment RCTs (KQ6). One trained rater used a standardized form to assess the risk of bias for each study; a second rater verified the initial assessment. Inconsistencies were resolved through consensus or third-party adjudication.

We tailored the QUADAS-2 [14] and Hayden criteria [15] to meet the specifics of this review (see Online Resource 3). We also supplemented the Jadad scale [16] with four additional questions: adequacy of allocation concealment, use of intention-to-treat analysis, justification of sample size, and reporting of outliers.

Meta-analysis was conducted for KQ1–2 only. We used information from the included studies to calculate sensitivities, specificities, positive and negative likelihood ratios, diagnostic odds ratios, and summary receiver operating characteristic curves. These measures were calculated across different cutpoints (i.e., manufacturer cutpoints, optimum cutpoints, and maximized sensitivity) and study settings (i.e., emergency department and primary care) for BNP and NT-proBNP separately. Analyses were stratified by assay type for BNP because four different assays were used in the included articles. Only a single assay was used for measuring NT-proBNP, so stratification was unnecessary in this case. Extracted data were pooled using an exact binomial rendition [17] of a modified (for pooling diagnostic test data [18]) version of van Houwelingen’s bivariate mixed-effects regression model [19, 20]. Cochran’s Q and I² were used to assess statistical heterogeneity [21]. When heterogeneity was present, we re-pooled the data and reported results using a random-effects model. We used Deeks’ method [22] to investigate, graphically and statistically, whether publication bias or other small-study effects may have adversely affected the results. All statistical analyses were carried out using Stata/SE 12.0 for Windows (Stata Corporation) and the accompanying Meta Package [23].
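
As an illustration only, the per-study accuracy measures and the heterogeneity statistics named above can be sketched in a few lines of Python. This is not the bivariate mixed-effects model fitted in Stata for the review; it shows only the standard textbook definitions. The function names, the 2×2 counts, and the effect estimates are hypothetical and are not drawn from any included study.

```python
def diagnostic_measures(tp, fp, fn, tn):
    """Per-study accuracy measures from a 2x2 table.

    tp, fp, fn, tn are counts of true positives, false positives,
    false negatives, and true negatives. All four cells are assumed
    to be nonzero (no continuity correction is applied here).
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    dor = (tp * tn) / (fp * fn)  # algebraically equal to LR+ / LR-
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_positive": lr_positive,
        "lr_negative": lr_negative,
        "dor": dor,
    }


def cochran_q_and_i2(estimates, variances):
    """Cochran's Q and I-squared (%) using inverse-variance weights.

    estimates: study-level effects on an additive scale (e.g., log DOR).
    variances: the corresponding within-study variances.
    """
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))
    df = len(estimates) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2


if __name__ == "__main__":
    # Hypothetical study: 100 patients with HF, 100 without.
    print(diagnostic_measures(tp=90, fp=30, fn=10, tn=70))
    # Hypothetical log diagnostic odds ratios with equal variances.
    print(cochran_q_and_i2([0.0, 2.0], [1.0, 1.0]))
```

In this hypothetical table, a sensitivity of 0.90 paired with a specificity of 0.70 is the kind of profile that supports ruling out rather than ruling in a condition.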

We assessed the strength of evidence (SOE) for the outcomes of sensitivity and specificity (KQ1–2) and all-cause mortality (KQ6). For each outcome, we used established guidelines [24, 25] to rate SOE in four domains (i.e., risk of bias, consistency, directness, and precision). The overall SOE across all four domains was “high” if further research would be very unlikely to change our confidence in the estimate of effect, “moderate” if further research might change our confidence in the estimate of effect, and “low” if further research would be likely to change our confidence in the estimate of effect. Overall SOE was “insufficient” when the evidence was unavailable or too scarce to permit conclusions to be drawn (e.g., only one included study evaluated an outcome).

We did not assess SOE for KQ3–5 because criteria to evaluate and score SOE for prognostic studies have not been fully developed [26].

We followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [27] to report all components of the review.

Results

The literature search retrieved 25,864 citations from the six electronic databases and 35 additional citations from the gray literature (see Fig. 1). After duplicates were removed, 16,893 citations went through title and abstract screening; 3,616 citations (21 %) were promoted to full-text screening. Three hundred ten articles (9 %) passed full-text screening and were included in the review.
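
The screening percentages above can be checked with simple arithmetic. The short Python sketch below reproduces them from the counts reported in this paragraph. The number of duplicates is not stated in the text; it is derived here on the assumption that duplicate removal was the only step between retrieval and title and abstract screening. The variable and function names are illustrative.

```python
# Counts reported in the text.
retrieved_databases = 25864   # citations from the six electronic databases
retrieved_gray = 35           # additional citations from the gray literature
screened = 16893              # citations screened at title and abstract level
full_text = 3616              # citations promoted to full-text screening
included = 310                # articles included in the review


def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to an integer."""
    return round(100 * part / whole)


# Derived, not reported: assumes only duplicates were removed before screening.
duplicates_removed = retrieved_databases + retrieved_gray - screened

print(duplicates_removed)
print(pct(full_text, screened))   # share promoted to full-text screening
print(pct(included, full_text))   # share passing full-text screening
```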

Fig. 1

Article flow through title and abstract screening and full-text screening. *Six articles dealt with two KQ groups: three dealt with both diagnosis and prognosis [28–30], and three dealt with both prognosis and treatment [31–33]. Twenty-two publications in KQ4 were selected from KQ3 publications and are not counted in the total number of prognosis articles.

One hundred four articles were applied to the diagnostic accuracy section of the review: 76 of these articles addressed KQ1 and 28 addressed KQ2. For the prognosis section, 190 articles were relevant, with 183 eligible for KQ3, 22 for KQ4, and seven for KQ5. Nine articles were included in KQ6 to address treatment guided by BNP or NT-proBNP. Seven articles examined the biological variation of BNP and NT-proBNP in persons with and without HF. Readers may refer to the published report [13] for information on biological variation.

Discussion

Each paper in this series describes the characteristics of the articles included to address a specific key question. The papers also report, summarize, and discuss the evidence to answer the key questions. An overall summary of findings is presented below.

For persons presenting to emergency departments or urgent care settings with signs and symptoms of HF, BNP and NT-proBNP have good diagnostic performance to rule out, but lesser performance to rule in, HF compared to the reference standard of global assessment using patients’ medical records. Determinants including age, renal function, and body mass index (BMI; for BNP only) have important effects on the performance of the tests. The studies do not agree on appropriate cutpoints.

Both BNP and NT-proBNP have good diagnostic performance in primary care settings to identify persons who are at risk of developing HF, or who have few symptoms and less severe signs of HF. Using manufacturers’ suggested cutpoints, BNP can effectively rule out the presence of HF in primary care settings. In the case of NT-proBNP, limited evidence is available to determine whether manufacturers’ suggested cutpoints are effective.

The published literature shows that BNP and NT-proBNP are associated with all-cause mortality and composite outcomes in both decompensated and chronic stable HF populations. Other mortality outcomes (e.g., cardiac and sudden cardiac) demonstrated less convincing associations in chronic stable populations, and were less often evaluated in decompensated populations. In six studies of patients undergoing resynchronization therapy, BNP and NT-proBNP were shown to be independent predictors of all-cause and cardiovascular mortality and morbidity.

In persons with decompensated HF, the literature search yielded limited yet consistent evidence that BNP and NT-proBNP added incremental value to other prognostic factors when predicting all-cause and cardiovascular mortality in the short (3 and 6 months) and longer terms (22 months to 6.8 years); the included studies did not evaluate morbidity or composite outcomes. No included studies assessed the incremental value of BNP in populations with chronic stable HF. NT-proBNP added incremental value to predicting all-cause mortality, cardiovascular mortality, and composite outcomes at 1- to 3-year intervals in chronic stable HF populations.

Studies involving the general population reported associations between NT-proBNP and morbidity (i.e., onset of HF or atrial fibrillation) and mortality (i.e., all cause, cardiovascular, and sudden cardiac). No included studies examined BNP in the general population.

Nine studies assessed the benefits of BNP- or NT-proBNP-guided therapy over usual care. Outcomes included all-cause mortality, hospitalizations, clinic visits, days alive, and quality of life. Results were equivocal, with some studies showing benefits and others showing no benefits. All-cause mortality, evaluated in seven studies, was lower in the groups receiving guided therapy; however, the results in only two of the seven studies were statistically significant.

Across all of the different topics in this series of research papers, we did not find evidence to suggest that BNP should be favored over NT-proBNP, or vice versa. We do note that no studies looked at the incremental value of BNP in populations with chronic stable HF and no studies examined the ability of BNP to serve as an independent predictor of morbidity and mortality in general populations.

Age tended to show positive associations with the concentrations of both peptides, while BMI and renal function showed negative associations. No statistically significant associations were apparent for sex and ethnicity. However, only a limited number of studies examined the potential confounding effects of these and other covariates. Future studies should be expressly designed and adequately powered to investigate the effects of age, sex, ethnicity, BMI, renal function, and comorbidities on BNP and NT-proBNP cutpoints. Researchers should agree on a standard set of covariates to be evaluated in future work, especially in nonrandomized studies, which form the bulk of published reports in this area.

Cutpoints tended to be variable. Authors often selected arbitrary cutpoints based on information from their own datasets (e.g., they established cutpoints using the median or mean peptide concentration values in their samples). Although values above the cutpoints indicated a greater likelihood of HF diagnosis, or poorer prognosis, the totality of evidence did not suggest an optimal cutpoint for BNP or NT-proBNP. Future research should aim to establish a common set of cutpoints.

Risk of bias was generally low in the included studies. The most problematic areas of bias concerned the studies’ failure to consider all of the confounders that we pre-specified as important (i.e., age, BMI, and renal function), as well as the studies’ reliance on the use of composite outcomes.

Although follow-up intervals were not part of the criteria [14–16] that we used to assess the risk of bias, many of the included studies did not justify their selection of follow-up intervals. We recommend that future studies establish clinically meaningful follow-up intervals. Furthermore, the included studies utilized a wide assortment of outcomes that diminished our ability to make generalizable inferences across articles. Researchers should standardize outcome assessment by specifying a set of mandatory outcomes to evaluate in future studies of BNP and NT-proBNP. Standardization should include uniform definitions and measures of these outcomes.

The assessment of strength of evidence suggests that future studies will be unlikely to change our findings with respect to the sensitivity of using BNP or NT-proBNP tests to diagnose HF in emergency room or primary care settings. However, further research may change the review’s findings with regard to the specificity of this testing. For BNP- or NT-proBNP-assisted therapy, the strength of evidence is low and future research may well change the findings of this review.

Although we did not assess the strength of evidence for the prognosis key questions, the findings consistently show that both peptides have prognostic ability. The literature lacks practical guidance on how to employ BNP or NT-proBNP for prognostic purposes; the development of clinical protocols is required in this area.

In conclusion, BNP and NT-proBNP are useful diagnostic clinical tools to exclude HF. They also have a strong association with prognosis in persons with HF, but the clinical utility of any potential prognostic ability has not yet been established. Further work is required to set cutpoints and develop protocols for the use of these peptides in standard clinical practice settings. Additional research is required to establish the utility of BNP- or NT-proBNP-guided treatment in HF.