Introduction

D-dimer testing has become commonplace in the workup of pulmonary embolism (PE) because of its very high sensitivity and negative predictive value, particularly in patients at low risk for PE [1]. Various diagnostic algorithms use clinical decision instruments to stratify patients with suspected PE into risk levels so that PE can be safely excluded on the basis of d-dimer results. The most extensively evaluated clinical decision instruments are the Wells’ Score, Revised Geneva Score, and Simplified Revised Geneva Score, which stratify patients into two (unlikely or likely) or three (low, moderate/intermediate, or high) risk levels [2–5].

In light of mixed results from previous studies, there remains considerable debate over which clinical decision instrument performs best and how to apply it. Some studies report better diagnostic accuracy for the Wells’ Score than for the Revised Geneva Score [6–8] or the Simplified Revised Geneva Score [9]. Others found no difference in performance between clinical decision instruments [10, 11], or found that physician gestalt equaled [11] or even outperformed [9] them. Meta-analyses typically report no difference between clinical decision instruments [12–14], a slight advantage for the Wells’ Score [15], or no difference from physician gestalt (though gestalt performance varied notably between individual clinicians) [12, 16]. Although there is some concern that the Wells’ Score may have poor inter-rater reliability because it includes a subjective criterion (PE is the most likely diagnosis), recent research has not borne out this concern [17, 18].

In its 2011 guideline, the American College of Emergency Physicians recommended using a negative d-dimer in conjunction with a “PE unlikely” result on the two-level Wells’ Score to safely exclude PE [19]. Other research goes further, suggesting that clinicians can safely exclude PE in patients who are at “non-high risk” (i.e., low or moderate risk) and have a negative d-dimer [10, 12, 20]. If this higher threshold proves safe, it would allow clinicians to exclude PE in many moderate-risk patients who would otherwise undergo cross-sectional imaging, incurring additional expense and, in the case of pulmonary computed tomography angiography (CTA) or nuclear medicine ventilation/perfusion exams, radiation exposure. Studies estimate that one fatal cancer is induced for every 2000 CT scans at a dose of 10 mSv [21], and a recent report found that the average radiation dose from pulmonary CTA is 10.7 mSv [22]; reducing dependence on CTA, and thus radiation exposure, is therefore of critical importance for patient safety. Moreover, the number of CT scans performed annually in the USA rose from three million to 85 million between 1980 and 2012 [23, 24], an increase with substantial public health implications.
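The dose figures above can be combined into a rough per-scan estimate. The Python sketch below is illustrative only: it assumes, as a simplification not made in this study, that the cited risk of one fatal cancer per 2000 scans at 10 mSv scales linearly with dose, so a 10.7 mSv pulmonary CTA carries roughly 1.07 times that risk. The constant and function names are hypothetical.

```python
# Rough per-scan risk, ASSUMING the cited estimate (1 fatal cancer per
# 2000 CT scans at 10 mSv) scales linearly with dose.
REFERENCE_DOSE_MSV = 10.0
SCANS_PER_FATAL_CANCER_AT_REFERENCE = 2000
CTA_DOSE_MSV = 10.7  # average pulmonary CTA dose from the cited report


def fatal_cancer_risk_per_scan(dose_msv: float) -> float:
    """Per-scan fatal cancer risk, linearly scaled from the reference estimate."""
    return (dose_msv / REFERENCE_DOSE_MSV) / SCANS_PER_FATAL_CANCER_AT_REFERENCE


risk = fatal_cancer_risk_per_scan(CTA_DOSE_MSV)
print(f"risk per CTA: {risk:.6f} (about 1 in {1 / risk:.0f})")
```

Under this linear assumption the estimate is about 1 in 1869 scans; a different dose-response model would give a different figure.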

Therefore, in this study, we aimed to quantify the negative predictive value of d-dimer testing for PE when used in conjunction with previously published cutoff thresholds of three clinical decision instruments (Wells’ Score, Revised Geneva Score, and Simplified Revised Geneva Score), in a retrospective cohort of patients who underwent pulmonary angiography (either MRA or CTA) to evaluate for PE. The analysis focused specifically on the ability of a negative d-dimer to exclude PE in patients with “non-high risk” for PE, as determined by these three clinical decision instruments. We hypothesized that the Wells’ Score would perform best among the clinical decision instruments, given its heavily weighted subjective criterion, which previous literature has suggested enhances sensitivity for diagnosing PE.

Methods

Study design and setting

This was a retrospective study of patients who were evaluated for possible PE at an academic medical center in the Midwestern United States. This was a sub-study of a parent study comparing the 6-month outcomes of patients following pulmonary CTA versus MRA for the diagnosis of PE. All components of this study were HIPAA-compliant and IRB-approved.

Selection of participants

All patients at our center undergoing pulmonary MRA for possible PE between April 1, 2008 and March 31, 2013 were identified through a database search of our electronic medical record. A randomly selected sex- and age-matched cohort of patients who underwent pulmonary CTA for possible PE during the same period was then identified. The combination of these two groups constituted our research population. Patients were enrolled only once, so if a patient had multiple imaging tests to rule out PE during the study period, only the initial scan was included. Patients were excluded if they were pregnant, were in atrial fibrillation at the time of the index scan, had a pre-existing inferior vena cava filter, or were on anticoagulation for at least 30 days preceding the index scan. In this analysis, patients were also excluded if the data necessary to calculate the clinical decision instruments were incomplete, if the index scan was ordered outside of the emergency department, or if no d-dimer result was obtained during their visit.

Data abstraction

Throughout the study period, our hospital’s laboratory used the HemosIL D-dimer HS 500 assay (Instrumentation Laboratory, Bedford, MA), which has a clinical cutoff value of 500 ng/mL [20]. We developed a protocol to abstract data from the electronic medical records of the identified patients and used a REDCap (Research Electronic Data Capture) data abstraction instrument [25]. We reviewed all radiology, clinic, emergency department, inpatient, and telephone notes in the electronic medical record for exclusionary criteria, the presence of venous thromboembolism, and all clinical information needed to calculate the Wells’ Score, Revised Geneva Score, and Simplified Revised Geneva Score. The protocol also used an electronic medical record search function that returned notes containing not only the search term but also its synonyms, as defined by the ontological reference of the electronic medical record vendor (Epic Systems, Verona, WI). The clinical decision instruments and their risk stratifications are detailed in Table 1. The primary data abstractor trained on 80 cases, after which the protocol was refined; the training cases were then re-abstracted, and two additional abstractors were trained. Data abstractors flagged cases with uncertain findings, which were reviewed by an expert panel of three investigators (two radiologists and one emergency physician) and adjudicated by consensus.

Table 1 Clinical decision instrument definitions—risk factor point values and risk stratification schemas for the Wells’ Score, Revised Geneva Score, and Simplified Revised Geneva Score

Our data abstraction instrument was tailored to the Wells’ Score. For the Revised Geneva Score criterion of “surgery (under general anesthesia) or lower limb fracture in the past month,” we assessed whether the patient had surgery or immobility for at least 3 days in the last month. Immobility was defined as being bedridden or having a cast preventing lower extremity movement. For the Wells’ Score criterion “clinical signs and symptoms of DVT,” we assessed whether leg pain or swelling was reported. If so, both the Revised Geneva Score and Simplified Revised Geneva Score criteria of “unilateral lower limb pain” and “pain on lower limb deep venous palpation and unilateral edema” were assumed to also be true. Many cases were flagged for adjudication of the field “clinical signs and symptoms of DVT”; during adjudication, bilateral and equal lower extremity edema was not considered to meet this criterion. Finally, the Wells’ Score criterion “PE is most likely diagnosis” was assumed to be true for every patient, since all patients were deemed at high enough risk to warrant cross-sectional imaging (the primary inclusion criterion for patient selection).
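The abstraction rules above determine how a single set of chart fields feeds all three instruments. The Python sketch below shows that mapping. It is hypothetical: the field, function, and dictionary names are invented, and the point values and cutoffs are the commonly published ones for each instrument rather than a transcription of Table 1, which remains authoritative.

```python
from dataclasses import dataclass


@dataclass
class AbstractedChart:
    """Hypothetical chart-review fields shared by all three instruments."""
    age: int
    heart_rate: int
    prior_dvt_or_pe: bool
    active_malignancy: bool
    hemoptysis: bool
    surgery_or_immobility_past_month: bool  # surgery, or >= 3 days immobility
    leg_pain_or_swelling: bool  # bilateral, equal edema does NOT count


def wells(c: AbstractedChart) -> float:
    score = 3.0  # "PE is most likely diagnosis": assumed true for every patient
    score += 3.0 if c.leg_pain_or_swelling else 0.0  # signs/symptoms of DVT
    score += 1.5 if c.heart_rate > 100 else 0.0
    score += 1.5 if c.surgery_or_immobility_past_month else 0.0
    score += 1.5 if c.prior_dvt_or_pe else 0.0
    score += 1.0 if c.hemoptysis else 0.0
    score += 1.0 if c.active_malignancy else 0.0
    return score


def _geneva_heart_rate_points(hr: int, mid: int, high: int) -> int:
    """Points for heart rate 75-94 (mid) or >= 95 (high) beats per minute."""
    if hr >= 95:
        return high
    return mid if hr >= 75 else 0


def revised_geneva(c: AbstractedChart) -> int:
    score = 1 if c.age > 65 else 0
    score += 3 if c.prior_dvt_or_pe else 0
    score += 2 if c.surgery_or_immobility_past_month else 0  # proxy for surgery/fracture
    score += 2 if c.active_malignancy else 0
    score += 2 if c.hemoptysis else 0
    if c.leg_pain_or_swelling:
        score += 3 + 4  # both lower limb criteria assumed true
    return score + _geneva_heart_rate_points(c.heart_rate, mid=3, high=5)


def simplified_revised_geneva(c: AbstractedChart) -> int:
    score = sum([c.age > 65, c.prior_dvt_or_pe, c.surgery_or_immobility_past_month,
                 c.active_malignancy, c.hemoptysis])
    if c.leg_pain_or_swelling:
        score += 2  # both lower limb criteria assumed true, 1 point each
    return score + _geneva_heart_rate_points(c.heart_rate, mid=1, high=2)


# Commonly published cutoffs (assumed here; check against Table 1).
CATEGORIZERS = {
    "wells_3": lambda s: "low" if s < 2 else ("high" if s > 6 else "moderate"),
    "wells_2": lambda s: "likely" if s > 4 else "unlikely",
    "revised_geneva_3": lambda s: "low" if s <= 3 else ("high" if s >= 11 else "moderate"),
    "revised_geneva_2": lambda s: "likely" if s >= 6 else "unlikely",
    "simplified_geneva_3": lambda s: "low" if s <= 1 else ("high" if s >= 5 else "moderate"),
    "simplified_geneva_2": lambda s: "likely" if s >= 3 else "unlikely",
}

# Example: a patient with no risk factors and a heart rate of 80.
patient = AbstractedChart(age=50, heart_rate=80, prior_dvt_or_pe=False,
                          active_malignancy=False, hemoptysis=False,
                          surgery_or_immobility_past_month=False,
                          leg_pain_or_swelling=False)
w = wells(patient)
print(w, CATEGORIZERS["wells_3"](w))  # 3.0 moderate
```

Because the assumed gestalt criterion alone contributes three points, no patient can score below 2 on the Wells’ Score under these rules, so the 3-level “low risk” stratum is necessarily empty.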

Outcomes

Our primary outcome was the presence or absence of venous thromboembolism (VTE) at 6 months from the index ED visit. If imaging test results were reported as equivocal, they were considered negative for PE for the purposes of this study. However, in order to account for the possibility of a missed PE on the index scan, we assessed all subsequent clinical notes in the electronic medical record during the study period to ascertain if there was a diagnosis of PE or deep venous thrombosis (DVT). If PE or DVT was diagnosed within 6 months of the index scan, the patient was considered positive for PE.
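The outcome rule above can be written as a small classification function. The Python sketch below is a hypothetical illustration: the function name and argument format are invented, “within 6 months” is approximated as 183 days, and diagnoses dated on the day of the index scan are counted; these are assumptions, not details stated in the protocol.

```python
from datetime import date, timedelta
from typing import Iterable

FOLLOW_UP_WINDOW = timedelta(days=183)  # "6 months" as 183 days: an assumption


def vte_outcome_positive(index_result: str, index_date: date,
                         later_vte_dates: Iterable[date]) -> bool:
    """Classify the study outcome for one patient (hypothetical helper).

    index_result: 'positive', 'negative', or 'equivocal'; equivocal scans
    are treated as negative, as in the study protocol.
    later_vte_dates: dates of PE or DVT diagnoses found in subsequent notes.
    """
    if index_result not in ("positive", "negative", "equivocal"):
        raise ValueError(f"unexpected index result: {index_result!r}")
    if index_result == "positive":
        return True
    window_end = index_date + FOLLOW_UP_WINDOW
    # Same-day diagnoses are counted (an assumption).
    return any(index_date <= d <= window_end for d in later_vte_dates)


# Example: equivocal index CTA, DVT diagnosed 40 days later -> outcome positive.
print(vte_outcome_positive("equivocal", date(2010, 3, 1), [date(2010, 4, 10)]))
```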

Analysis

Test characteristics for each clinical decision instrument, using previously reported threshold values for the two- or three-level scoring systems, are presented as point estimates with exact 95% Clopper-Pearson confidence intervals. Receiver operating characteristic (ROC) curves were drawn and areas under the curves (AUCs) were calculated using the pROC package in R statistical software [26, 27]. Tests of significance were conducted using Fisher’s exact test or the Student’s t test, as appropriate. A p value of less than 0.05 was considered to be statistically significant.
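The exact Clopper-Pearson interval inverts the binomial test: its lower limit is the proportion p at which P(X ≥ x) = α/2, and its upper limit is the p at which P(X ≤ x) = α/2. The Python sketch below solves both equations by bisection using only the standard library. It illustrates that definition and is not the software used in this study; a library routine such as SciPy’s `binomtest(...).proportion_ci(method="exact")` serves the same purpose. Function names are hypothetical.

```python
import math


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), 0 < p < 1, summed in log space."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for i in range(k + 1):
        total += math.exp(log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                          + i * log_p + (n - i) * log_q)
    return min(total, 1.0)


def _bisect_increasing(h, target: float, iters: int = 100) -> float:
    """Solve h(p) = target on (0, 1) for a function h increasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if h(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact 100*(1 - alpha)% confidence interval for the proportion x/n."""
    if x == 0:
        lower = 0.0
    else:
        # Lower limit: P(X >= x | p) = alpha/2; this probability rises with p.
        lower = _bisect_increasing(lambda p: 1.0 - binom_cdf(x - 1, n, p), alpha / 2)
    if x == n:
        upper = 1.0
    else:
        # Upper limit: P(X <= x | p) = alpha/2, i.e. 1 - P(X <= x | p) = 1 - alpha/2.
        upper = _bisect_increasing(lambda p: 1.0 - binom_cdf(x, n, p), 1 - alpha / 2)
    return lower, upper


npv_ci = clopper_pearson(76, 76)   # 76 of 76 d-dimer-negative patients were PE-free
sens_ci = clopper_pearson(41, 41)  # 41 of 41 patients with PE were d-dimer-positive
print(f"NPV 95% CI: {npv_ci[0]:.3f}-{npv_ci[1]:.3f}")
print(f"sensitivity 95% CI: {sens_ci[0]:.3f}-{sens_ci[1]:.3f}")
```

When all n observations are “successes,” the lower limit has the closed form (α/2)^(1/n). For the 76 of 76 d-dimer-negative patients without PE this is 0.025^(1/76) ≈ 0.953, and for the 41 of 41 patients with PE it is 0.025^(1/41) ≈ 0.914. A perfect point estimate in a cohort of this size is therefore compatible with a true value a few percentage points lower.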

Results

Characteristics of study participants

Overall, 1294 patients were identified, of whom 121 were excluded: 87 were on anticoagulation, 27 were in atrial fibrillation, 18 had an inferior vena cava filter, and 8 were pregnant (some patients met multiple exclusionary criteria). Patient flow through the study and patient characteristics are summarized in Fig. 1. An additional 105 patients had incomplete documentation of the clinical data required for clinical decision instrument calculation: 18 patients did not have a note to accompany the index scan because they were referred to our center for imaging from nearby, unaffiliated clinics, and the remainder did not have a heart rate documented. This left 1068 patients: 724 were seen in the emergency department, of whom 459 had a d-dimer test performed. These 459 patients constitute the study population.
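The patient counts in this paragraph can be reconciled with simple arithmetic. The Python sketch below (variable names are illustrative) checks that the reported exclusions leave 1068 patients, that 87 patients lacked a documented heart rate, and that the four exclusion criteria overlap.

```python
# Counts reported in the text.
identified = 1294
excluded_unique = 121  # unique patients excluded by criteria
criterion_counts = {
    "anticoagulation": 87,
    "atrial fibrillation": 27,
    "inferior vena cava filter": 18,
    "pregnancy": 8,
}
incomplete_cdi_data = 105  # missing data needed to calculate the instruments
no_index_note = 18         # referred from unaffiliated clinics
seen_in_ed = 724
with_d_dimer = 459

# Derived quantities.
missing_heart_rate = incomplete_cdi_data - no_index_note  # "the remainder"
remaining = identified - excluded_unique - incomplete_cdi_data
# Criterion assignments beyond one per excluded patient (overlapping criteria).
surplus_assignments = sum(criterion_counts.values()) - excluded_unique
ed_without_d_dimer = seen_in_ed - with_d_dimer

print(remaining, missing_heart_rate, surplus_assignments, ed_without_d_dimer)
```

This prints `1068 87 19 265`: the four criteria sum to 140, which exceeds the 121 excluded patients by 19 because some patients met more than one criterion.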

Fig. 1

Patient flow and characteristics. Flow of patients through the study, including available characteristics of groups excluded from the final analysis, and summary information for those included in the final analysis. SD standard deviation, DVT deep venous thrombosis, ED emergency department, PE pulmonary embolism, VTE venous thromboembolism

Main results

In our research cohort, 41 of 459 (8.9%) patients had PE. D-dimer results were negative in 76 of 459 (16.6%) patients, none of whom had PE during the 6-month follow-up interval. Consequently, sensitivity and negative predictive value did not differ between clinical decision instruments: each instrument combined with the d-dimer result had 100% sensitivity and negative predictive value. By extension, risk category was irrelevant when the d-dimer was negative, because a negative result excluded PE regardless of risk category. Specificity increased as the threshold for a positive test increased. Table 2 displays the test characteristics of each clinical decision instrument used in conjunction with d-dimer results at various previously reported thresholds.
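The 100% figures follow directly from the counts above. The Python sketch below builds the 2 × 2 table for d-dimer testing alone from the reported numbers (459 patients, 41 with PE, 76 with a negative d-dimer, none of whom had PE), treating a positive d-dimer as a positive test. The function name is hypothetical, and the derived specificity and positive predictive value are arithmetic on those counts rather than values taken from Table 2.

```python
from fractions import Fraction


def diagnostic_characteristics(tp: int, fp: int, fn: int, tn: int) -> dict[str, Fraction]:
    """Sensitivity, specificity, NPV and PPV from a 2x2 table."""
    return {
        "sensitivity": Fraction(tp, tp + fn),
        "specificity": Fraction(tn, tn + fp),
        "npv": Fraction(tn, tn + fn),
        "ppv": Fraction(tp, tp + fp),
    }


TOTAL, WITH_PE, D_DIMER_NEGATIVE = 459, 41, 76
tp, fn = WITH_PE, 0    # every patient with PE had a positive d-dimer
tn = D_DIMER_NEGATIVE  # no patient with a negative d-dimer had PE
fp = TOTAL - WITH_PE - D_DIMER_NEGATIVE  # d-dimer-positive patients without PE

metrics = diagnostic_characteristics(tp, fp, fn, tn)
for name, value in metrics.items():
    print(f"{name}: {float(value):.3f}")
```

A combined strategy (exclude PE only if the instrument is below its threshold and the d-dimer is negative) can exclude at most these 76 patients. Its specificity therefore cannot exceed 76/418 (about 18.2%) in this cohort, and raising the instrument threshold admits more of the 76, moving specificity toward that ceiling.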

Table 2 Selected test characteristics—sensitivity, specificity, and negative predictive value of three clinical decision instruments at previously published thresholds, when used in conjunction with d-dimer testing

Utility of clinical decision instruments alone

When evaluated independently of d-dimer results, each clinical decision instrument performed poorly. ROC analysis revealed AUCs (95% CI) of 0.55 (0.46–0.63) for the Wells’ Score, 0.53 (0.43–0.63) for the Revised Geneva Score, and 0.54 (0.45–0.63) for the Simplified Revised Geneva Score. ROC results are presented graphically with AUCs in Fig. 2.
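An AUC of 0.5 corresponds to chance discrimination, so values of 0.53 to 0.55 indicate near-chance performance. For an empirical ROC curve, the AUC equals the probability that a randomly chosen patient with PE has a higher score than a randomly chosen patient without PE, counting ties as one half (the Mann-Whitney interpretation). The Python sketch below computes that quantity directly; it is an illustration rather than the pROC code used in the study, and the scores in the example are hypothetical.

```python
def auc_mann_whitney(pos_scores: list[float], neg_scores: list[float]) -> float:
    """Empirical AUC: P(score of a PE patient > score of a non-PE patient),
    with ties counted as one half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical Wells' scores for three patients with PE and four without.
print(auc_mann_whitney([4.5, 6.0, 3.0], [3.0, 4.5, 6.0, 1.5]))  # 0.625
```

Coarse, integer-valued scores such as these instruments produce many ties, and each tie contributes only one half, which pulls the AUC toward 0.5 when score distributions overlap.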

Fig. 2

Receiver operating characteristic curves. Graph of sensitivity and 1-specificity at every threshold for the Wells’ Score, Revised Geneva Score, and Simplified Revised Geneva Score, independent of d-dimer testing results. AUC area under the curve

PE prevalence by risk stratum

PE prevalence in each risk stratum is presented in Table 3. The 3-level Wells’ Score “high risk” group had a significantly higher prevalence of PE than the “moderate risk” group (23.8% vs. 8.2%, p = 0.03). The 2-level Revised Geneva Score “PE likely” group also had a higher PE prevalence than the “PE unlikely” group (16.1% vs. 7.8%), a difference of borderline significance (p = 0.05). No other clinical decision instrument threshold stratified patients into groups with significantly different PE prevalence.
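The prevalence comparisons above rely on Fisher’s exact test. The Python sketch below computes the two-sided p value from first principles for a 2 × 2 table [[a, b], [c, d]] (rows: two risk strata; columns: PE present or absent) by summing the hypergeometric probabilities of every table with the observed margins that is no more likely than the observed table. It is an illustration, not the software used in the study; in practice a library routine such as R’s `fisher.test` would be used. The per-stratum counts are in Table 3, which is not reproduced here, so the example uses a hypothetical table.

```python
import math


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = math.comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x,
        # holding all row and column totals fixed.
        return math.comb(row1, x) * math.comb(row2, col1 - x) / denom

    p_observed = prob(a)
    tolerance = 1 + 1e-7  # guards against floating-point ties
    p_value = sum(
        prob(x)
        for x in range(max(0, col1 - row2), min(row1, col1) + 1)
        if prob(x) <= p_observed * tolerance
    )
    return min(p_value, 1.0)


# Hypothetical table: rows = two risk strata, columns = PE present / absent.
print(fisher_exact_two_sided(3, 0, 0, 3))  # 0.1
```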

Table 3 Pulmonary embolism prevalence by risk stratum—breakdown of pulmonary embolism prevalence in each risk stratum as assessed by the Wells’ Score, Revised Geneva Score, and Simplified Revised Geneva Score

Discussion

In this study, we evaluated the negative predictive values of three clinical decision instruments used in conjunction with d-dimer testing in a retrospective cohort of patients who underwent cross-sectional imaging (MRA or CTA) for the evaluation of PE. Although we are not the first to suggest that PE may be safely excluded with a negative d-dimer in patients at both low and moderate risk, our results lend further support to this notion [10, 12, 20]. In particular, we found no difference in PE prevalence between low and moderate risk patients as assessed by the Wells’ and Revised Geneva Scores. Notably, d-dimer testing alone had 100% sensitivity and negative predictive value for PE in our cohort, regardless of risk stratification (i.e., even in high risk patients).

At least one meta-analysis has identified publication bias for the 3-level Wells’ Score: studies were more likely to be published if they showed an important contrast between the low and other risk levels [13]. Most importantly, none of the 76 patients in our cohort with a negative d-dimer had PE. This finding is consistent with Carrier et al., who demonstrated a very low risk (0.41%) of venous thromboembolism at 3 months following a negative d-dimer result in “non-high risk” patients, and with Legnani et al., who demonstrated 100% sensitivity and negative predictive value using the same d-dimer assay as our laboratory [20].

Although we hypothesized that the Wells’ Score would outperform the Revised Geneva and Simplified Revised Geneva Scores because it includes a subjective, gestalt criterion (PE is the most likely diagnosis), we found no difference in the negative predictive values of the three clinical decision instruments. However, the Wells’ Score stratified patients into risk categories better than the other two scores, as assessed by the difference in PE prevalence between categories. The Revised Geneva Score’s stratification was of borderline significance, with a p value at the 0.05 threshold. Because only 7 of 459 patients were stratified into the high risk group, we were likely underpowered to detect a difference. In contrast, the Simplified Revised Geneva Score did not help stratify patients’ risk of PE when using the 2-level system. To our knowledge, this has not been previously reported, although it may be an artifact of the small number of patients stratified into the unlikely category compared with the Wells’ and Revised Geneva Scores, as well as of the assumptions required by chart review methodology. Of note, we did not assess the Pulmonary Embolism Rule-out Criteria (PERC) because every patient in our cohort underwent imaging as an inclusion criterion, suggesting that the emergency department team had substantial concern for PE; PERC does not apply to such a population.

When analyzed independently of d-dimer testing, the three clinical decision instruments had similar areas under the ROC curve, all indicating poor performance in predicting PE. However, AUC reflects both sensitivity and specificity at every possible threshold. This is problematic for the clinical decision instruments we studied, which are designed to rule out PE rather than to serve as definitive diagnostic tests. Perhaps more clinically relevant is how well each instrument stratified patients by PE prevalence, as discussed above.

Our study differs from prior work in several respects. First, every patient underwent cross-sectional imaging for the detection of PE during the index ED encounter, whereas other studies on this subject often forgo imaging in low risk patients. Although forgoing imaging in low risk patients is arguably clinically appropriate, our inclusion criterion may have resulted in overrepresentation of PE in our high and moderate risk groups relative to populations that include patients without imaging or other additional workup. Second, our use of 6-month venous thromboembolism events to capture false negative index imaging results is novel among studies of clinical decision instruments. Because the test characteristics of CTA are known to be imperfect, we consider this an important aspect of our study, and it lends further validation to the use of d-dimer results. Finally, most studies on the topic have not assessed how well a negative d-dimer result performs in “non-high risk” patients. Our results suggest that a negative d-dimer result effectively rules out PE in this group.

Our study has several limitations. Perhaps most importantly, because of our retrospective design, we assumed that PE was the most likely diagnosis in every case; as a result, no patients were stratified into the 3-level Wells’ Score low risk group (“PE is most likely diagnosis” is worth three points, and low risk is defined as less than two points). We considered this acceptable given the inclusion criterion of a pulmonary CTA or MRA: the physician’s assessment of PE likelihood had already surpassed the threshold to order advanced imaging. Second, for the Revised Geneva Score criterion of “surgery (under general anesthesia) or lower limb fracture in the past month,” we assessed for surgery or immobility for at least 3 days in the last month instead of strictly requiring a documented fracture on chart review. Third, for the Wells’ criterion “clinical signs and symptoms of DVT,” we assessed whether clinical symptoms (leg pain or swelling) were present; if so, both the Revised and Simplified Revised Geneva criteria of “unilateral lower limb pain” and “pain on lower limb deep venous palpation and unilateral edema” were assumed to be true. This may have limited the ability of the Geneva Scores to rule out PE in patients who met one, but not both, of these criteria. However, many cases were flagged for adjudication of the field “clinical signs and symptoms of DVT,” which may have mitigated this limitation.

In conclusion, our findings suggest that clinicians may safely exclude PE in “non-high risk” patients with a negative d-dimer result, particularly when risk is assessed with the Wells’ Score or the Revised Geneva Score. Applying d-dimer testing in this way could increase the number of patients in whom PE is excluded without cross-sectional imaging. This matters not only because of the high cost of cross-sectional imaging but also because of the ionizing radiation to which patients are exposed during CT for PE and, by extension, the risk of radiation-induced malignancy.