Introduction

The clinical presentation of pulmonary embolism (PE) ranges from no symptoms to sudden death [1]. Acute dyspnea, chest pain, syncope and palpitations are the most frequent presenting symptoms of PE, representing more than 96% of PE presentations [2]. Unfortunately, most cardiopulmonary diseases share at least one of these symptoms, making the differential diagnosis a daily clinical challenge, largely dependent on a physician’s ability and experience.

Advances in diagnostic technology are useless when the presence of PE is not considered, as several autopsy studies have shown [3, 4]. Moreover, the increasing use of computed tomographic pulmonary angiography (CTPA) in emergency departments (EDs) has increased both PE detection and the number of false-positive results [5–7]. Traditional diagnostic algorithms for PE are limited: starting from the clinical probability of PE, they simply confirm or exclude it, without helping physicians with the differential diagnosis [4]. Thus, although several large and well-designed prospective studies have shown the high accuracy of different decision algorithms [8, 9], few doctors use them, probably because more than one diagnostic hypothesis is investigated at the same time [10, 11].

An Italian multidisciplinary collaboration was established to develop an expert system to assist physicians in the differential diagnosis of PE. To develop and validate the system, we organised a data collection focused on patients referred to the ED complaining of at least one of the following symptoms: acute dyspnea, chest pain, syncope or palpitations. The present study aims to describe the clinical presentations of these patients and the diagnostic tests performed, and to investigate whether these tests were appropriately requested.

Methods

Study design, setting and population

Six Italian hospitals, three academic and three non-academic, provided a total of 17,497 electronic admission records of patients referred to the ED and then hospitalised between January and June 2007. A block random sampling design was applied to obtain a sample in which the full range of clinical records was adequately represented. The randomisation blocks were defined by the combination of age category (less than 30, 30 to 60, and over 60 years), gender, and the four investigated symptoms, each independently identified by keywords referring to “acute dyspnea”, “chest pain”, “fainting” and “palpitations”. To decrease the chance of missing rare but life-threatening conditions, keywords referring to “pulmonary embolism”, “aortic dissection” and “pneumothorax” were used to define the last block.

A total of 192 combinations were identified. Six were excluded because patients had neither any of the four selected symptoms of PE nor any of the three rare diseases. A sample size of 800 medical records was estimated to be sufficient to obtain at least 4 cases in each of the remaining 186 combinations. All patients hospitalised for trauma were also excluded (see Fig. 1).
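The reported counts are consistent with one reading of the design: 3 age groups × 2 genders × 16 presence/absence patterns of the four symptom keywords × 2 states of the rare-disease keyword, with the six excluded blocks being the one "no symptom, no rare-disease keyword" block in each age-gender stratum. This layout is our assumption, not stated explicitly in the text; the Python sketch below only checks that it reproduces the numbers 192, 6 and 186, and that 800 records allow 4 per retained block.

```python
from itertools import product

# Stratification factors described in the text.
AGE_GROUPS = ("<30", "30-60", ">60")
GENDERS = ("F", "M")
SYMPTOMS = ("acute dyspnea", "chest pain", "fainting", "palpitations")


def enumerate_blocks():
    """Return (all_blocks, retained_blocks) under an assumed factorial layout."""
    all_blocks = []
    for age, sex in product(AGE_GROUPS, GENDERS):
        # Each symptom keyword is either present or absent: 2**4 = 16 patterns.
        for symptom_flags in product((False, True), repeat=len(SYMPTOMS)):
            # ASSUMPTION: the rare-disease keyword flag doubles every pattern.
            for rare_keyword in (False, True):
                all_blocks.append((age, sex, symptom_flags, rare_keyword))
    # Drop blocks with none of the four symptoms and no rare-disease keyword.
    retained = [b for b in all_blocks if any(b[2]) or b[3]]
    return all_blocks, retained


if __name__ == "__main__":
    all_blocks, retained = enumerate_blocks()
    sample_size = 800
    print(len(all_blocks), len(all_blocks) - len(retained), len(retained))  # 192 6 186
    # With an even allocation, each retained block receives 800 // 186 records.
    print(sample_size // len(retained))  # 4
```

Since 186 × 4 = 744 ≤ 800 < 186 × 5, an even allocation gives exactly 4 records per block, matching the stated minimum.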

Fig. 1 Randomisation flow chart

To make our estimates representative of the original population, direct standardisation was applied to all of them [12]. The weighting distribution was derived from the block-specific rates in the population of all 17,497 patients admitted to the participating centres. Events too rare to be estimated precisely (those for which the 95% confidence interval [CI] around the expected prevalence was wider than 20%) were grouped into the “other” category.
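Direct standardisation amounts to a weighted average of block-specific prevalences, with each block weighted by its share of the 17,497 admitted patients. The Python sketch below shows this under stated assumptions: the normal-approximation variance formula and the helper `goes_to_other` are our illustration, and reading "CI greater than 20%" as the full interval width exceeding 0.20 (in proportion units) is an assumption, since the text does not specify whether an absolute or relative width was meant.

```python
import math


def standardised_prevalence(blocks, z=1.96):
    """Directly standardised prevalence with a normal-approximation CI.

    blocks: iterable of (population_size, sample_size, cases) per block, where
    population_size is the block's count among all admitted patients.
    """
    blocks = list(blocks)
    total_pop = sum(pop for pop, _, _ in blocks)
    estimate = 0.0
    variance = 0.0
    for pop, n, cases in blocks:
        w = pop / total_pop  # weight = block share of the source population
        p = cases / n        # block-specific prevalence in the sample
        estimate += w * p
        variance += w * w * p * (1 - p) / n
    half_width = z * math.sqrt(variance)
    return estimate, max(0.0, estimate - half_width), min(1.0, estimate + half_width)


def goes_to_other(low, high, max_width=0.20):
    """ASSUMPTION: 'CI greater than 20%' read as full CI width above 0.20."""
    return (high - low) > max_width


if __name__ == "__main__":
    # Hypothetical blocks (population size, sampled records, cases).
    est, low, high = standardised_prevalence([(1000, 100, 10), (3000, 100, 30)])
    print(f"{est:.3f} ({low:.3f}-{high:.3f}), other: {goes_to_other(low, high)}")
```

In this hypothetical example the second block carries three quarters of the weight, so the standardised estimate (0.25) sits much closer to its prevalence (0.30) than the unweighted sample mean would suggest.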

Data collection

In each centre, a trained physician collected data on 161 pre-specified variables, adding a qualitative comment on the final diagnosis. A single investigator (DL) reviewed all cases and verified whether the final diagnosis was supported by pre-specified objective criteria. Data discrepancies or uncertain final diagnoses were resolved by discussion with the data collectors. All original diagnostic conclusions were coded to allow estimation of the prevalence of the acute disorders leading to hospitalisation. The Institutional Review Board of the hospital of Piacenza approved the study protocol in July 2007, and the protocol was subsequently approved by the Institutional Review Board of each participating hospital.

Outcomes and statistical analysis

The most common combinations of the four presenting symptoms of PE were identified and, for each of them, the prevalence of acute disorders was computed. Within each presentation pattern, diagnostic procedures were classified and ranked according to two distinct features: the actual preference of treating physicians, that is, how frequently each procedure was used, and its overall accuracy.

Diagnostic accuracy is usually derived from the sensitivity and specificity of a single test for a single disease. However, estimating the accuracy of a multipurpose procedure raises two distinct problems: (1) how to compute accuracy while taking into account the information already obtained from previous tests; and (2) how to compute an overall measure of accuracy for procedures encompassing several tests, each discriminating among multiple hypotheses rather than between two simple alternatives (such as the presence or absence of a disease).

To tackle both issues, a procedure was considered as a collection of tests, each designed for one disease but possibly providing additional information about other diseases.

To solve the first problem, procedures and their tests were classified into two main categories, namely ‘routine’ and ‘advanced’. Chest X-ray, electrocardiogram (ECG) and peripheral blood count were defined as routine procedures, as they were requested in almost all patients presenting to the ED with cardiopulmonary symptoms. The sensitivity and specificity of an advanced test can then be corrected by subtracting the probability that its result is replicated by a routine test. Such probabilities are easy to calculate if the tests are assumed to be independent, as they are given by the product of the two tests’ sensitivities (or specificities). Sensitivities and specificities of interest were systematically retrieved from the published literature [13, 14]. For the present analysis, diagnostic figures were updated to January 2010.
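Under the independence assumption, the probability that both tests are positive in a diseased patient is the product of their sensitivities, so the corrected sensitivity Se_A − Se_A·Se_R equals Se_A·(1 − Se_R): the chance that the advanced test detects the disease when the routine test has missed it. Specificity is corrected in the same way. The Python sketch below illustrates this arithmetic; the function name and the numbers are hypothetical and are not figures from the study or its sources.

```python
def corrected_accuracy(se_adv, sp_adv, se_routine, sp_routine):
    """Discount an advanced test's sensitivity and specificity by the share
    already replicated by a routine test.

    Assumes the two tests are independent given disease status, so the
    probability that both are positive in a diseased patient is
    se_adv * se_routine (and both negative in a non-diseased patient is
    sp_adv * sp_routine).
    """
    for value in (se_adv, sp_adv, se_routine, sp_routine):
        if not 0.0 <= value <= 1.0:
            raise ValueError("sensitivities and specificities must lie in [0, 1]")
    se_corrected = se_adv - se_adv * se_routine  # = se_adv * (1 - se_routine)
    sp_corrected = sp_adv - sp_adv * sp_routine  # = sp_adv * (1 - sp_routine)
    return se_corrected, sp_corrected


if __name__ == "__main__":
    # Hypothetical figures, for illustration only.
    se, sp = corrected_accuracy(se_adv=0.90, sp_adv=0.80, se_routine=0.50, sp_routine=0.60)
    print(f"corrected sensitivity {se:.2f}, specificity {sp:.2f}")
    # prints "corrected sensitivity 0.45, specificity 0.32"
```

The more informative the routine test already is, the less residual credit the advanced test receives, which is the intended effect of the correction.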

The second problem, computing a procedure’s accuracy, was addressed here in two steps. First, for the overall accuracy of a diagnostic test towards multiple disease hypotheses, Obuchowski proposes the mean of the test’s single-disease accuracies, weighted by disease prevalence [15]. Next, the accuracy of the procedure can be obtained by averaging the accuracies of all its tests. This averaging step is consistent with independence among the different test results, and provides a measure that we called the Average Procedure Accuracy (APA). For example, arterial O2 and CO2 partial pressures can be regarded as two tests of one procedure, i.e. blood gas analysis, both addressing several diseases, including PE and Chronic Obstructive Pulmonary Disease (COPD) exacerbation. In the APA calculation for this procedure, the combined findings of hypoxia with hypercapnia and of hypoxia with hypocapnia are both taken into account when the possibilities of COPD exacerbation and PE are considered.
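The two-step APA calculation can be sketched as follows, using the blood gas example with purely hypothetical accuracies and prevalences. We assume each single-disease accuracy is already a number in [0, 1] (for instance an area under the ROC curve or a corrected accuracy), and that prevalence weights are renormalised over the diseases each test addresses; neither detail is specified in the text, and the function names are our own.

```python
def overall_test_accuracy(accuracy_by_disease, prevalence):
    """Prevalence-weighted mean of one test's single-disease accuracies,
    following the weighting idea attributed to Obuchowski [15].

    accuracy_by_disease: {disease: accuracy of this test for that disease}
    prevalence: {disease: prevalence within the clinical presentation}
    ASSUMPTION: weights are renormalised over the diseases the test addresses.
    """
    diseases = [d for d in accuracy_by_disease if prevalence.get(d, 0.0) > 0.0]
    total = sum(prevalence[d] for d in diseases)
    if total == 0.0:
        raise ValueError("the test addresses no disease with positive prevalence")
    return sum(prevalence[d] * accuracy_by_disease[d] for d in diseases) / total


def average_procedure_accuracy(tests, prevalence):
    """APA: unweighted mean of the overall accuracies of a procedure's tests."""
    if not tests:
        raise ValueError("a procedure needs at least one test")
    scores = [overall_test_accuracy(acc, prevalence) for acc in tests.values()]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical numbers for blood gas analysis, for illustration only.
    prevalence = {"PE": 0.04, "COPD exacerbation": 0.12}
    blood_gas = {
        "paO2": {"PE": 0.70, "COPD exacerbation": 0.60},
        "paCO2": {"PE": 0.60, "COPD exacerbation": 0.80},
    }
    print(round(average_procedure_accuracy(blood_gas, prevalence), 4))  # 0.6875
```

Here paO2 scores 0.625 and paCO2 0.75, because the more prevalent COPD exacerbation dominates the weighting; their unweighted mean, 0.6875, is the APA of the procedure.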

The APA was taken as a measure of the diagnostic utility of each procedure, and its rate of application as a measure of current physician preference. Since the rates of application were estimated from different sample sizes, one procedure was deemed preferred to another only when their 95% CIs did not overlap. Finally, by comparing the ranking of procedures obtained with the APA with that obtained with the rate of application, each procedure was classified on a three-level semiquantitative scale as ‘under’, ‘as’ or ‘over’ requested relative to what was theoretically expected within each of the most common clinical presentations.
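The preference and classification steps can be sketched as follows. The text does not state how the CIs of the rates of application were computed or exactly how the two rankings were compared, so the Wilson score interval and the rank-comparison rule in `request_level` (a hypothetical name) are assumptions made for illustration; for simplicity, the rule ignores the ties that overlapping CIs would create in the usage ranking.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (the CI method is an assumption)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def used_more_often(a, b):
    """True if procedure a = (times_used, patients) is preferred to b,
    i.e. a's 95% CI lies entirely above b's (non-overlapping intervals)."""
    low_a, _ = wilson_interval(*a)
    _, high_b = wilson_interval(*b)
    return low_a > high_b


def request_level(apa_rank, use_rank):
    """HYPOTHETICAL three-level rule: compare a procedure's rank by APA with
    its rank by rate of application (1 = top of the list)."""
    if use_rank < apa_rank:
        return "over"   # used more than its accuracy would justify
    if use_rank > apa_rank:
        return "under"  # used less than its accuracy would justify
    return "as"
```

For example, `used_more_often((90, 100), (10, 100))` is true, whereas 50 versus 48 uses out of 100 patients yield overlapping intervals, so neither procedure would be deemed preferred.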

Results

The sample of 800 clinical records was reduced to 750 after excluding patients admitted because of trauma. The median age of the overall population was 74 years (range, 15–100), and the female-to-male ratio was 0.9. Eighty percent of the patients had at least one concomitant chronic disease, in particular heart disease (66%) and COPD (33%). Other baseline characteristics are described in Table 1, where they are also broken down by hospital type (academic vs. non-academic). Isolated dyspnea was the most frequent clinical presentation (39.7% of patients), followed by isolated syncope (14.4%), chest pain with dyspnea (11.9%) or without dyspnea (11.4%), and palpitations with dyspnea (5.8%) or alone (4.4%). The remaining 12.4% consisted of miscellaneous combinations of cardiopulmonary symptoms. All clinical presentations were well represented in both academic and non-academic hospitals (see Table 2).

Table 1 Baseline clinical characteristics
Table 2 Clinical case mix according to hospital’s type

Table 3 shows the proportion of diseases underlying the most common clinical presentations. Heart failure (HF), pneumonia and acute exacerbation of COPD were the most likely diagnoses in patients with dyspnea and in patients with isolated syncope. Pulmonary edema was common in patients with isolated dyspnea (12%), while acute myocardial infarction (AMI) occurred in about 10% of patients with chest pain. Acute anemia, mainly due to gastrointestinal bleeding, was particularly common in patients with isolated syncope (13%). Atrial fibrillation was the cause of palpitations in almost half of the patients. Other diagnoses (e.g. pericarditis, hyperthyroidism, rhabdomyolysis and sepsis) were grouped into the “other acute disorders” category because they were too rare to be reliably studied.

Table 3 Most common acute disorders in most common clinical presentations

Overall, PE accounted for about 4% of cases. PE was most frequent in patients with isolated syncope (6%), followed by those with isolated dyspnea (3.5%), but at least one case of PE occurred in every clinical presentation.

Use of diagnostic procedures

Table 4 summarises how frequently each procedure was applied within the most common clinical presentations. Besides chest X-ray, ECG and complete blood count, which were routinely performed in almost all patients, cardiac enzymes and blood gas analysis were also frequently requested, particularly in patients with dyspnea combined with chest pain.

Table 4 Application of procedures within clinical presentations in patients with cardiopulmonary symptoms

Echocardiography was commonly used in patients with chest pain (41%), and thyroid hormones were commonly measured when palpitations (50%) or isolated syncope (31%) were the presenting symptoms. B-type natriuretic peptide (BNP) was rarely measured, except in patients with dyspnea and palpitations (19%).

Among diagnostic procedures for PE, D-dimer was mainly requested in patients with isolated syncope (36%), isolated palpitations (24%) and chest pain, with or without dyspnea (46 and 30%, respectively). Echocardiography was always requested more often than CTPA.

APA and comparison with rate of application

The APA of procedures varied according to the clinical presentations (see Table 5). Table 6 highlights the discrepancy between the APA and the rate of application of each diagnostic procedure within the clinical presentations.

Table 5 APA within clinical presentations
Table 6 Appropriateness of diagnostic procedures matched with the rate of their performance in the main clinical presentations

CTPA appeared to be underused in all presentations. Echocardiography was generally requested as expected from its APA, but was overused in isolated syncope and isolated chest pain. Perfusion lung scan was underused in almost all presentations, the exception being patients with isolated dyspnea. Conversely, cardiac enzymes were overused in all presentations except isolated chest pain. Blood gas analysis was also performed more often than expected, except in isolated palpitations and isolated syncope. D-dimer testing was performed in line with its expected accuracy, but was underused in isolated syncope. BNP was largely underused in all presentations, particularly when dyspnea was present.

Discussion

Our data confirm the clinical difficulties of the diagnosis and differential diagnosis of PE. In patients hospitalised after presenting to an ED with acute dyspnea, chest pain, syncope or palpitations, PE was present in only 4% of cases. HF, pneumonia and COPD exacerbation were by far the most common acute disorders explaining cardiopulmonary symptoms. More than 20 diseases were at least as frequent as PE, representing almost one fourth of our sample (see Table 3). Many patients had concomitant chronic diseases (78% of patients) that are themselves PE risk factors. They were elderly (median age 74 years), and a considerable proportion of them were discharged without any mention of an acute disorder, particularly when isolated palpitations were the presenting symptom (see Table 3). In this clinical setting, the priority that diagnostic procedures deserve according to their theoretical accuracy (see Table 5) does not always match how often they are actually performed (see Table 4), a comparison explicitly summarised in Table 6.

The major strength of this study is the classification of patients according to their clinical presentation. Epidemiological and diagnostic studies usually describe a population of patients selected by subjective criteria, namely the clinical suspicion of PE. Conversely, presenting symptoms allow a more objective selection of the study population and an analysis of the diagnostic process in daily clinical practice. In this context, the APA provides an overall objective measure of the accuracy of complex diagnostic procedures.

Several observations on the differential diagnosis of PE arise from our data. In presentations where HF was the most likely hypothesis, echocardiography and BNP had the highest APA. However, physicians requested cardiac enzymes more frequently, probably to exclude more immediately threatening conditions such as AMI. Yet in our analysis, cardiac enzyme measurement was truly informative only in patients presenting with isolated chest pain. Other factors may nonetheless appropriately influence test choice, such as cost, invasiveness, accessibility and the presence of a potentially treatable disease.

These factors may also explain why thyroid hormones were requested so often, even though hyperthyroidism is a relatively rare disorder. Another reason is that emergency physicians tend to focus on one diagnosis at a time, underestimating the value of multipurpose diagnostic procedures. Such an attitude might simply reflect the cognitive difficulty of handling an excessively large set of hypotheses. It is worth noting that available decision-support algorithms addressing a single diagnostic hypothesis do not solve this problem.

As far as CTPA is concerned, it has been noted that its high accuracy and ready availability have led to increased use, accompanied by a decline in positive results, with PE now confirmed in less than 10% of patients with suspected PE [16]. According to our analysis, however, a procedure yielding a negative result for one disease may be positive for other diseases. This is why CTPA appears to be an undervalued procedure, particularly in some clinical presentations, such as isolated syncope, where preliminary routine tests are poorly informative. Nevertheless, the risks of exposure to radiation and contrast media should discourage physicians from requesting CTPA indiscriminately, particularly when the most likely hypotheses can be confirmed at a lower clinical cost [4, 5, 7].

D-dimer testing seems to be appropriately requested in several clinical presentations, but it appears underused in patients with isolated syncope and chest pain [17]. Emergency physicians probably underestimate the rate of PE in these situations, focusing mostly on cardiac aetiologies.

Any consideration of the under- or overuse of diagnostic procedures should account for their actual accessibility, which is known to differ between academic and non-academic hospitals [18]. Although differences in both general characteristics and cardiopulmonary symptoms may indicate different distributions of diseases (see Tables 1 and 2), our sample could not be split further to reliably study differences in the strategies adopted in the two settings.

Our study has other limitations that deserve comment. First, the retrospective design and the lack of a standardised diagnostic process reduce the accuracy of each individual final diagnosis. However, thanks to the large number of collected variables, the reliability of each diagnosis was assessed against objective and accessible criteria. For instance, the diagnosis of PE required a positive CTPA result. Second, the APA is not an optimal measure of the accuracy of complex procedures. In the present analysis, we followed the statistical independence assumptions typically applied to sequential diagnosis [19]. Unfortunately, few statistical tools have been developed to discriminate between multiple diagnostic hypotheses, which is the most common diagnostic problem in the ED [20–22]. Third, for some clinical presentations such as isolated palpitations, routine tests (i.e. ECG, chest X-ray and peripheral blood count) are often sufficient to reach the final diagnosis; consequently, the APA estimate is less reliable because of the limited number of cases (see the low APA values in Table 5). Finally, only hospitalised patients were included in the study. In Italy, however, practically all patients with suspected PE are referred to the ED and then hospitalised when the diagnosis is confirmed, whereas other acute disorders are more often treated at home. Even if some PE cases were missed, this should not have occurred more often than for other acute disorders.

In conclusion, the present study shows that PE is rarer than other acute life-threatening diseases that also present to the ED with dyspnea, chest pain, syncope or palpitations. The differential diagnosis of PE encompasses a large set of diagnostic hypotheses, and this task represents a daily challenge for both physicians and researchers. While physicians do not prioritise diagnostic procedures according to their expected accuracy, the proposed scoring systems and decision algorithms simply sidestep the problem by addressing the confirmation of a clinical suspicion, rather than the differential diagnosis with PE as one of many hypotheses. Further studies should be designed to assess the value of multipurpose procedures and to develop flexible decision models able to optimise their use in clinical practice.