Introduction

Biomarkers are used in various fields of medicine for screening and diagnosis, as well as for prognostication or risk stratification [1,2,3,4,5]. Their usefulness and accuracy depend on the intrinsic characteristics of the test (sensitivity, specificity) but also on the context in which the test is used, as disease incidence influences extrinsic performance (positive and negative predictive values) [6]. The Standards for Reporting of Diagnostic Accuracy Studies (STARD) guidelines, which aim to homogenize the reporting of diagnostic performance assessment, underline the need to adequately describe the studied population and to report the extrinsic performance of tests [7]. The impact of changes in pre-test probability on the extrinsic performance of a diagnostic test is a well-described mathematical relationship [8,9,10]. Nevertheless, pre-test probability may be difficult to assess [11], the resulting changes in post-test probability are difficult to estimate, basic concepts of diagnostic test performance are poorly understood by medical students [10, 12], and the usual indices of intrinsic performance may be misleadingly reassuring [13]. As a consequence, physicians have been found to take pre-test probability into account infrequently when interpreting diagnostic test results [14, 15].

Alternative presentations of diagnostic test performance, including frequency-based visual aids, have been advocated by some authors [10, 13]. Pneumocystis pneumonia in non-HIV patients might be a suitable setting in which to reassess the influence of pre-test probability on diagnostic test performance, for several reasons. First, the disease is severe and associated with high morbidity and mortality [16,17,18]. Second, the incidence of the disease is limited, although context, clinical presentation, and radiological patterns may significantly change pre-test probability [17, 19]. Last, the available diagnostic tests, namely quantitative polymerase chain reaction (PCR) and β-D-glucan (BDG), have been reported to have good to very good intrinsic performance [20,21,22,23]. These tests are, however, quantitative, suggesting that intrinsic test performance may vary according to the degree of positivity, which adds further complexity to test interpretation [20,21,22,23]. These conditions not only underline the need for a visual description of extrinsic test performance and of the potential clinical implications of findings, but also suggest that expert statements may be needed to help physicians on a daily basis.

The objective of this study was to assess the incidence of Pneumocystis pneumonia in the general population of critically ill immunocompromised patients with acute respiratory failure (ARF), to detect subgroups at specific risk, and to depict the post-test probability of Pneumocystis pneumonia according to PCR and BDG test results in this setting.

Methods

Study population

To assess the incidence of Pneumocystis pneumonia, two distinct prospectively collected datasets were used.

The first set was the TRIALOH study dataset [24]. Patients were prospectively included from 2010 to 2012. The study was carried out in 17 university or university-affiliated centers in France and Belgium that belonged to a research network instituted in 2005. In all 17 centers, a senior intensivist and a senior hematologist were available around the clock and made ICU admission decisions together. The appropriate ethics committees approved this study [24]. In this set, the attending physician assessed the occurrence of acute respiratory failure prospectively, and three independent experts reviewed all etiological diagnoses of acute respiratory failure. Since this study focused on hematological patients with various reasons for ICU admission, and to avoid artificially decreasing the incidence of Pneumocystis pneumonia, only patients with acute respiratory failure as the main reason for ICU admission were included in the current study.

The second set came from the EFRAIM study, a multinational, observational, prospective cohort study performed from November 2015 to July 2016 [25]. Investigators were critical care physicians from 16 countries with extensive experience in the management of various cohorts of critically ill immunocompromised patients. Participating providers obtained institutional review board (IRB) approval from their institutions in accordance with local ethics regulations. Only adult patients with acute respiratory failure, based upon predefined criteria, were included in this study. All etiological diagnoses were reviewed by two study investigators for coherence and for alignment with established definitions [25].

Definitions

Pneumocystis pneumonia

In both studies, etiologies of pulmonary involvement were diagnosed based on predefined criteria [26]. These criteria included the type of immune defect, the time between onset of the disease and ICU admission, the radiologic presentation, microbiological tests including direct search for Pneumocystis (direct examination with Gomori-Grocott stain, immunostaining, or Pneumocystis PCR), and the clinical course of patients. β-D-glucan (BDG) was uncommonly used in participating centers during both study periods. For both included studies (EFRAIM, TRIALOH), study investigators reviewed every diagnosis a posteriori for coherence and alignment with established definitions.

Ground glass opacities were defined as any degree of ground glass opacities on CT-scan.

Anti-Pneumocystis prophylaxis was defined according to the patient's prescriptions before ICU admission, regardless of adherence.

Lymphoid hematological malignancy was defined as any acute or chronic underlying hematological malignancy including acute lymphoid leukemia, non-Hodgkin’s lymphoma, and chronic lymphoid leukemia.

Hematopoietic Stem Cell Transplantation (HSCT) was defined as any allogeneic or autologous stem cell transplantation, independently of the conditioning protocol and of the origin or compatibility of donor cells, and regardless of the time elapsed since transplantation.

Estimation of the intrinsic performance of PCR and BDG by systematic literature review

A systematic review was performed in the MEDLINE database using the search terms “Pneumocystis pneumonia (MeSH)” AND “sensitivity and specificity (MeSH)” AND/OR “(1–3)-β-D-Glucan” AND/OR “PCR” NOT “HIV (MeSH)”. The estimates of diagnostic test performance were validated by three authors (LC, VL, MD).

Experts’ priors

Physicians’ priors were assessed before and after presentation of the study results. To do so, a standardized questionnaire assessing perception of incidence and post-test probability was administered, using a visual analog scale ranging from 0 to 100. Some data regarding experts’ characteristics were obtained concomitantly. Priors were collected during a meeting of our research group (Groupe de Recherche en Réanimation Respiratoire et Onco-Hematologique). These meetings are held three times a year and include both didactic presentations and presentations of study results focused on critically ill cancer patients, with an attendance ranging from 50 to 100 intensivists. Responders were considered experts if they had presented at symposia or published in this field.

Statistical analysis

Incidence and its 95% confidence interval were computed by normal approximation in the general immunocompromised population with ARF.
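As an illustration of the normal approximation, the following minimal sketch computes a Wald-type interval for a proportion. It is written in Python for convenience (the analyses reported here were run in R), and the function name `wald_interval` is hypothetical. The exact formula used in the study is not described, so the interval produced by this sketch need not coincide with the reported one. The counts in the usage line are those given in the Results (92 cases among 2243 patients).

```python
import math

def wald_interval(events, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion.

    Assumption: standard error sqrt(p * (1 - p) / n) and z = 1.96 for a
    two-sided 95% interval; bounds are clipped to [0, 1].
    """
    p = events / n
    se = math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - z * se), min(1.0, p + z * se)

# Counts taken from the Results section of this article.
p, lo, hi = wald_interval(92, 2243)
print(f"incidence = {p:.1%}, 95% CI {lo:.1%} to {hi:.1%}")
```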

To assess incidence and relevant clusters, supervised tree partitioning was performed. Incidence and 95% confidence intervals were computed in the subgroups of interest identified by tree partitioning. Variables included in the supervised tree partitioning were age, gender, underlying immune defect, SOFA score, presence of ground glass opacities, and preexisting prophylaxis.
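The principle of one partitioning step can be sketched as follows. This is not the rpart implementation used in the study: it is a simplified Python illustration of choosing, among yes/no variables, the split that minimizes the weighted Gini impurity, which is the default criterion of classification trees in rpart. The data, variable names (`ggo`, `prophylaxis`, `pcp`), and function names are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 outcome labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_binary_split(rows, features, outcome):
    """Return (feature, weighted impurity) for the best single yes/no split."""
    best = None
    for f in features:
        yes = [r[outcome] for r in rows if r[f]]
        no = [r[outcome] for r in rows if not r[f]]
        if not yes or not no:
            continue  # a split must leave patients on both sides
        weighted = (len(yes) * gini(yes) + len(no) * gini(no)) / len(rows)
        if best is None or weighted < best[1]:
            best = (f, weighted)
    return best

# Hypothetical toy cohort of 40 patients (not study data).
toy = (
    [{"ggo": 1, "prophylaxis": 0, "pcp": 1}] * 4
    + [{"ggo": 1, "prophylaxis": 0, "pcp": 0}] * 6
    + [{"ggo": 1, "prophylaxis": 1, "pcp": 0}] * 10
    + [{"ggo": 0, "prophylaxis": 0, "pcp": 0}] * 15
    + [{"ggo": 0, "prophylaxis": 1, "pcp": 0}] * 5
)
feature, impurity = best_binary_split(toy, ["ggo", "prophylaxis"], "pcp")
print(feature, round(impurity, 3))
```

A full tree repeats this step recursively within each resulting subgroup and adds stopping and pruning rules, which this sketch omits.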

Incidence and pre-test probability were then simulated in 5000 samples for the total cohort and for the subgroups identified by tree partitioning. These populations were simulated using a normal distribution centered on the observed incidence, with a spread derived from its confidence interval.
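A minimal sketch of such a simulation is given below, in Python rather than the R used for the study. It assumes that the standard deviation of the normal distribution is recovered from the width of the 95% confidence interval and that draws are clipped to the [0, 1] range; both choices, as well as the function name `simulate_incidence` and the seed, are assumptions made for illustration. The usage line takes the overall incidence and interval reported in the Results.

```python
import random
import statistics

def simulate_incidence(p_hat, ci_low, ci_high, n_draws=5000, seed=1):
    """Draw plausible incidences from a normal distribution implied by a 95% CI."""
    # Assumption: the interval is symmetric, so its half-width equals 1.96 SD.
    sd = (ci_high - ci_low) / (2 * 1.96)
    rng = random.Random(seed)
    draws = [rng.gauss(p_hat, sd) for _ in range(n_draws)]
    # Clip to the valid probability range.
    return [min(1.0, max(0.0, d)) for d in draws]

# Overall incidence and 95% CI as reported in the Results (4.1%; 3.8-4.4).
draws = simulate_incidence(0.041, 0.038, 0.044)
q1, median, q3 = statistics.quantiles(draws, n=4)
print(f"median {median:.3f} (IQR {q1:.3f} to {q3:.3f})")
```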

Diagnostic performance was modeled according to the observed intrinsic performance, assuming binary results (positive vs. negative) and disregarding changes in performance that may arise from quantitative analysis of either test or from the type of sample on which PCR is performed. Sensitivity and specificity were computed assuming a 0.5% uncertainty, following a continuous distribution, under three hypotheses:

Intermediate diagnostic performance: both sensitivity and specificity were centered on the median observed performance for both PCR and BDG;

High sensitivity (weakly positive test): sensitivity was centered on the upper bound and specificity on the lower bound of the observed 95% confidence interval;

High specificity (highly positive test): sensitivity was centered on the lower bound and specificity on the upper bound of the observed 95% confidence interval.
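The three hypotheses can be sketched as a simple mapping from a point estimate and its confidence bounds to scenario centers. The Python sketch below is illustrative only: the names `Estimate` and `scenario_centers` are hypothetical, the high-specificity scenario is assumed to pair the lower sensitivity bound with the upper specificity bound, and the 0.5% uncertainty around each center is not modeled. The usage line takes the PCR estimates reported in the Results; the cut-offs actually used are those of table S1 and may differ.

```python
from typing import NamedTuple

class Estimate(NamedTuple):
    point: float  # point estimate
    low: float    # lower bound of the 95% CI
    high: float   # upper bound of the 95% CI

def scenario_centers(sens: Estimate, spec: Estimate):
    """Return (sensitivity, specificity) centers for the three hypotheses."""
    return {
        "intermediate": (sens.point, spec.point),
        # Weakly positive test: favors sensitivity at the cost of specificity.
        "high_sensitivity": (sens.high, spec.low),
        # Highly positive test: favors specificity at the cost of sensitivity.
        "high_specificity": (sens.low, spec.high),
    }

# PCR sensitivity and specificity with 95% CI, as reported in the Results.
pcr = scenario_centers(Estimate(0.983, 0.913, 0.997), Estimate(0.948, 0.908, 0.971))
for name, (se, sp) in pcr.items():
    print(f"{name}: sensitivity {se:.3f}, specificity {sp:.3f}")
```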

Post-test probability was computed using Bayes' theorem, accounting for incidence and for the computed diagnostic test performance, in the total cohort and in the identified subgroups. Results are reported as median (IQR) and as binary plots comparing pre-test probability and post-test probability. In these plots, at a given pre-test probability, the variability in post-test probability reflects uncertainty regarding diagnostic test performance, and the range of post-test probability reflects uncertainty regarding true disease incidence. Post-test probability of successive concordant or discordant PCR and BDG results was assessed assuming complete conditional independence of the tests.
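For a single binary test, the Bayes computation can be sketched as below (Python, for illustration; the function name `post_test_probability` is hypothetical). The example deliberately uses a hypothetical test with 90% sensitivity and 90% specificity rather than the study estimates, to show how strongly the result depends on pre-test probability.

```python
def post_test_probability(pre_test, sensitivity, specificity, positive=True):
    """Probability of disease after a binary test result, by Bayes' theorem."""
    if positive:
        true_pos = pre_test * sensitivity
        false_pos = (1 - pre_test) * (1 - specificity)
        return true_pos / (true_pos + false_pos)
    false_neg = pre_test * (1 - sensitivity)
    true_neg = (1 - pre_test) * specificity
    return false_neg / (false_neg + true_neg)

# Hypothetical test (90% sensitivity, 90% specificity) at two pre-test probabilities.
ppv_low = post_test_probability(0.05, 0.90, 0.90, positive=True)
ppv_high = post_test_probability(0.20, 0.90, 0.90, positive=True)
after_negative = post_test_probability(0.05, 0.90, 0.90, positive=False)
print(f"{ppv_low:.3f} {ppv_high:.3f} {after_negative:.4f}")
```

With the same hypothetical test, a positive result gives a post-test probability of about 32% at a 5% pre-test probability and about 69% at 20%.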

Last, to assess the influence of the findings on physicians’ priors, the densities of priors and the differences between priors and findings were plotted before and after presentation of the results.

Analyses were performed using R software version 4.3.4 (R Project for Statistical Computing, Vienna, Austria) with the rpart and infer packages.

Results

Study population and Pneumocystis pneumonia incidence

Of the 2622 critically ill immunocompromised patients included in the considered datasets, 2243 had acute respiratory failure and were ultimately included in the current analysis. Median age was 62 years (IQR 51–70) and 1336 patients were male (59.6%). The most common underlying malignancies were solid tumor in 774 patients (34.5%), acute myeloid leukemia in 382, non-Hodgkin’s lymphoma in 358 patients (17.0%), Hodgkin’s lymphoma in 168 (7.5%), myeloma in 121 (5.4%), and acute lymphoid leukemia in 89 (4.0%). In addition, 152 patients were allogeneic stem cell transplant recipients (6.8%) and 206 had undergone autologous stem cell transplantation (9.2%). At ICU admission, 378 patients were receiving anti-Pneumocystis prophylaxis (16.9%). Overall, 92 patients were considered to have high-probability Pneumocystis pneumonia (incidence 4.1%; 95% CI 3.8–4.4).

Supervised tree partitioning identified four distinct subgroups of Pneumocystis pneumonia risk according to the presence of (a) ground glass opacities on CT scan, (b) anti-Pneumocystis prophylaxis at ICU admission, and (c) underlying lymphoid malignancy or previous hematopoietic stem cell transplantation (Fig. 1).

Fig. 1 Tree reporting the main clusters with regard to Pneumocystis pneumonia in the analyzed datasets (n = 2243)

The observed incidence of Pneumocystis pneumonia varied from 2.0% (95% CI 1.8–2.2) in patients without ground glass opacities to 20.3% (95% CI 18.3–22.2) in patients with ground glass opacities, without prophylaxis, and with underlying lymphoid leukemia or previous stem cell transplantation.

Diagnostic test accuracy

After careful analysis of the literature, diagnostic test performance was set in accordance with systematic reviews reporting PCR and BDG test accuracy [21, 27]. For Pneumocystis PCR, we considered a sensitivity of 98.3% (95% CI 91.3–99.7) and a specificity of 94.8% (95% CI 90.8–97.1). For BDG, we considered a sensitivity of 91.0% (95% CI 82.7–95.5) and a specificity of 86.3% (95% CI 81.7–89.9). The performance cut-offs set for subsequent analyses are reported in table S1.

Simulated incidence and post-test probability

The simulated incidence of Pneumocystis pneumonia is reported in Tables 1 and 2.

For intermediate diagnostic test performance, post-test probability after a positive test was 33% (95% CI 31.1–34.8) for PCR and 22.9% (95% CI 21.5–24.3) for BDG (Figs. 1 and 2; Tables 1 and 2). Post-test probability after a negative test was 0.1% (95% CI 0.09–0.12) for PCR and 0.23% (95% CI 0.21–0.25) for BDG (Figs. 1 and 2; Tables 1 and 2).

Fig. 2 Relationship between incidence and post-test probability in the general population of immunocompromised patients with ARF, according to PCR result (positive = red, negative = blue). Diagnostic test performance ranges from highly sensitive (light color) to sensitive test (dark color)

Table 1 Post-test probability after positive and negative PCR in non-HIV critically ill patients with respiratory failure, for the overall population and for the different incidence subgroups
Table 2 Post-test probability after positive and negative BDG in non-HIV critically ill patients with respiratory failure, for the overall population and for the different incidence subgroups

Results according to risk clusters and test performance are reported in Tables 1 and 2. In the highest risk group, post-test probability after a positive test ranged from 60.0 to 82.2% for PCR (Table 1) and from 58.0 to 68.6% for BDG (Table 2).

Figures S1 and S2 report post-test probability according to PCR results and subgroups. Figures S3 and S4 report post-test probability according to BDG results and subgroups.

Figure S5 reports post-test probability after successive PCR and BDG testing assuming complete conditional independence of both tests.
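Under complete conditional independence, successive results can be chained by multiplying the pre-test odds by the likelihood ratio of each result. The Python sketch below illustrates this principle only; the function names `update` and `sequential` are hypothetical, and the example uses two hypothetical tests with 90% sensitivity and 90% specificity rather than the PCR and BDG estimates.

```python
def update(pre_test, sensitivity, specificity, positive):
    """One Bayesian update expressed through odds and a likelihood ratio."""
    if positive:
        lr = sensitivity / (1 - specificity)      # positive likelihood ratio
    else:
        lr = (1 - sensitivity) / specificity      # negative likelihood ratio
    odds = pre_test / (1 - pre_test) * lr
    return odds / (1 + odds)

def sequential(pre_test, results):
    """Chain updates over (sensitivity, specificity, positive) tuples.

    Assumption: test results are conditionally independent given disease status.
    """
    p = pre_test
    for sens, spec, positive in results:
        p = update(p, sens, spec, positive)
    return p

# Two hypothetical tests (90% sensitivity, 90% specificity), 5% pre-test probability.
concordant = sequential(0.05, [(0.9, 0.9, True), (0.9, 0.9, True)])
discordant = sequential(0.05, [(0.9, 0.9, True), (0.9, 0.9, False)])
print(f"{concordant:.2f} {discordant:.2f}")
```

In this example two concordant positive results raise the probability to about 81%, while discordant results of two identical tests cancel out and return the pre-test probability. If the two tests are correlated, the concordant figure is overstated, because the second result carries less information than independence implies.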

Intensivists’ priors

Overall, 25 physicians were interviewed before and after presentation of our results. Of these, 5 had published a manuscript on Pneumocystis pneumonia or had been invited as speakers to our research group meeting and were considered experts. Median age of responders was 42 years (IQR 34–49). One expert reported a conflict of interest related to Pneumocystis pneumonia.

Perception of the incidence of Pneumocystis pneumonia and of the post-test probability of having the disease according to PCR and BDG was assessed in the general population of non-HIV immunocompromised patients with ARF (Fig. S6) and in the highest risk subgroup (Fig. S7). Before results presentation, perception systematically overestimated incidence in the general immunocompromised population (+16% [IQR 6–16%]) and in high-risk subgroups, as well as post-test probability with both tests (+22% for PCR [IQR 7–37%] and +27% for BDG [IQR 7–37%]; Figs. S5 and S6). This overestimation was found in both experts and non-experts (Figs. S8 and S9). Presentation of the study results decreased the overestimation of incidence (+2% [IQR 1–6%]) and of post-test probability with both tests (−3% for PCR [IQR −10 to −2%] and −3% for BDG [IQR −5 to 14%]). This decrease in overestimation was observed in both experts and non-experts (Figs. S5 to S8).

Discussion

To the best of our knowledge, our study is the first to assess the incidence of Pneumocystis pneumonia in order to delineate the post-test probability of disease in identified subgroups. According to our results, despite the assumption of high sensitivity and specificity, the post-test probability of Pneumocystis pneumonia after a positive PCR or BDG test is limited, as a consequence of the limited incidence of the disease in immunocompromised patients. We propose a visual representation (Figs. 2 and 3) that may help physicians appreciate the interaction between observed incidence and intrinsic diagnostic test performance at the bedside, while taking into account uncertainties in both incidence and diagnostic test performance.

Fig. 3 Relationship between incidence and post-test probability in the high-risk population of immunocompromised patients with ARF, according to PCR result (positive = red, negative = blue). Diagnostic test performance ranges from highly sensitive (light color) to sensitive test (dark color)

In line with previous studies, the incidence of Pneumocystis pneumonia was low in the population of critically ill immunocompromised patients admitted for acute respiratory failure [17, 25, 28]. Also in line with previous studies, the identified risk factors for Pneumocystis pneumonia were preexisting lymphoid disease or stem cell transplantation, ground glass opacities on CT scan, and lack of Pneumocystis prophylaxis [17, 29]. This large cohort of patients suggests an incidence ranging from 2% (1.4–2.8%) in the lowest risk subgroup to 20.2% (17.8–27.8%) in the high-risk subgroup.

Not surprisingly, even assuming excellent intrinsic performance of Pneumocystis PCR and BDG, our results underline that the poor positive post-test probability resulting from the low incidence translates into a high risk of false positive diagnosis and, more importantly, of failure to identify the actual cause of the ARF, with potentially negative consequences [19, 25, 30].

Interestingly, although based on simulated data, our attempt to graphically depict the relationship between incidence in risk subgroups and post-test probability translated into a dramatic change in the perception of disease incidence, of the diagnostic performance of the tests in this setting, and ultimately of the potential significance of the diagnostic tests [12, 31]. Previous studies have underlined physicians’ frequent misinterpretation of disease incidence [32, 33], frequent misperception and overestimation of diagnostic test performance [10, 34, 35] and, most importantly, a limited comprehension of the pre-test/post-test relationship and its implications at the bedside [10, 34, 35]. Thus, when physicians faced various hypotheses of disease incidence, previous studies suggested that they failed to perceive the resulting changes in post-test probability [14]. In addition, previous studies suggested that misperception persists even when diagnostic test performance is described as likelihood ratios rather than as sensitivity and specificity, the latter being independent of disease incidence [13]. Previous studies also suggested that the use of natural frequencies of disease allows better perception of diagnostic test performance [10], that a short course in Bayesian reasoning partly improves this perception [12] and, best of all, that graphical representation improves understanding of diagnostic test performance [32]. Although these results deserve to be confirmed, our depiction of the pre-test/post-test relationship in various subgroups may help to accurately depict a known mathematical relationship and may help physicians grasp the contribution of a positive or negative test in various subgroups or clinical scenarios, while accounting for uncertainty in both diagnostic test performance and disease incidence.

This study has several limitations. First, we aimed to assess incidence and to depict the incidence/post-test probability relationship in various predefined subgroups. Although this approach may make sense at a population level, it disregards the fact that, for a given patient, pre-test probability ranges from 0 to 1 and cannot be reduced to the incidence of the disease. Moreover, several subgroups of interest were not tested, including stratification according to duration of symptoms before ICU admission, type of anti-Pneumocystis prophylaxis, or specific symptoms of ARF. Along the same lines, solid organ recipients were under-represented, which may have influenced our results, and results in the high-risk group may have been modified by emerging targeted therapies. Our results may, however, help to identify subgroups of patients in whom tests may be useless or at least should be interpreted only if negative. This limitation suggests that additional studies in specific subgroups may be required to refine our results and improve the overall view of pre-test probability. Last, although we tried to depict the impact of successive tests, these results are probably misleading. Indeed, both tests are likely to be highly correlated, and the covariance of the two tests has never been assessed to the best of our knowledge. As a consequence, the performance of concordant PCR and BDG results in our study probably overestimates post-test probability, and dedicated studies are needed.

Moreover, diagnoses of Pneumocystis pneumonia in the initial datasets were confirmed in most cases using PCR and/or BDG testing, in a specific setting and after expert validation. Although this could have impaired the assessment of diagnostic test performance, we used these data only to set the range of incidence and not to validate the intrinsic performance of the tests; this limitation is therefore unlikely to have influenced our findings. Furthermore, we lack a validated gold standard for confirming Pneumocystis pneumonia. Therefore, incidence in the study population may have been overestimated. This bias, however, strengthens our findings, as a lower incidence would translate into a lower positive post-test probability. Last, although we validated our results against experts’ priors, no validation against existing standards was performed. Thus, whether our visual representation performs better than the classical Fagan’s nomogram with the usual pre-test probability range underlined (example given in figure S9) deserves to be assessed in future studies.

Conclusion

In this study, we hypothesized and confirmed that, despite excellent intrinsic performance, both Pneumocystis PCR and BDG display a limited positive predictive value in critically ill immunocompromised patients with acute respiratory failure. This analysis underlines the need for adequate estimation of the pre-test probability of Pneumocystis pneumonia to allow interpretation of laboratory test results. We provide a visual representation that may help physicians understand the influence of observed incidence on the post-test probability of a disease and that may be a first step toward clinical vignettes highlighting the case scenarios in which these tests might be relevant.