Introduction

In the last 20 years, a steep increase in the prescription of diagnostic imaging examinations has taken place. However, if the usefulness of a radiological examination is to change patient management, its “overuse” without clinical-diagnostic gains may have negative economic and clinical consequences in terms of radioprotection. Indeed, assessment of prescriptive appropriateness and consequently the right allocation of resources are becoming a central issue in the management of health-care systems. The clinical audit (CA) practice has been introduced in hospital management to this end. CA involves comparing current clinical practice to evidence-based best practice in the form of standards [1]. The purpose of CA is to monitor to what degree standards for any given health-care activity are met and to identify reasons why they are not met. The Italian Ministry of Health has drafted a Clinical Audit Manual which provides a methodology regarding specific medical/health-care issues and some aspects of the current practices [2]. Recently, in Italy some health-care facilities have included CA as part of clinical practice [3, 4].

The purpose of our study was to audit the level of appropriateness of whole body CT (WB-CT), PET–CT and chest X-rays (CXRs) prescribed at the Tor Vergata University Hospital in the inpatient setting and to identify the main categories of inappropriate prescribers and the most frequent inappropriate indications.

Materials and methods

The retrospective observational study involved a multidisciplinary work group consisting of one medical area facilitator, one surgical area facilitator, two diagnostic imaging facilitators, one hospital directorate manager and the chiefs of the respective clinical operative units (OUs). WB-CT, PET–CT and CXR examinations performed at the Tor Vergata University Hospital in the inpatient setting between January and December 2014 were analysed. CXR examinations were divided into bedside CXRs and traditional CXRs. The data were collected from the RIS-PACS system of the Diagnostic Imaging Department. The examinations were stratified among the different OUs, normalized to the number of discharges of each OU and expressed as a percentage (number of examinations/total discharges × 100). The WB-CT and PET–CT examinations prescribed by the OUs that met or exceeded the threshold of 20% of the number of examinations/total number of discharges were included in the analysis of diagnostic appropriateness. With regard to traditional CXRs and bedside CXRs, the analysis included the examinations prescribed by the OUs that met or exceeded the threshold of 50% of the number of examinations/number of discharges. The analysis of diagnostic appropriateness was performed by critical review of the clinical records. The appropriateness of the examinations was defined following the guidelines of the American College of Radiology Appropriateness Criteria [5]: A = appropriate, score 7–9, D = doubtful appropriateness, score 4–6, or I = inappropriate, score 1–3. Examinations of doubtful appropriateness (D) were grouped with inappropriate examinations (I).
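The normalization and inclusion rule described above can be illustrated with a minimal Python sketch. It is not the work group's actual tooling: the function names are hypothetical, the three WB-CT counts are taken from the Results, and the "hypothetical unit" entry is invented only to show an excluded case.

```python
def examination_rate(examinations: int, discharges: int) -> float:
    """Return examinations per 100 discharges for one operative unit (OU)."""
    if discharges <= 0:
        raise ValueError("discharges must be a positive integer")
    return examinations * 100 / discharges


def meets_threshold(examinations: int, discharges: int, threshold_pct: float) -> bool:
    """True when the OU rate equals or exceeds the inclusion threshold."""
    return examination_rate(examinations, discharges) >= threshold_pct


# Thresholds stated in the Methods: 20% for WB-CT and PET-CT, 50% for CXRs.
WB_CT_THRESHOLD_PCT = 20.0

# (examinations, discharges) per OU; the last entry is invented for illustration.
wb_ct_counts = {
    "nephrology": (94, 338),
    "cardiology": (13, 32),
    "haematology": (364, 228),
    "hypothetical unit": (10, 100),
}

# Keep only the OUs whose rate reaches the threshold.
included = {
    ou: examination_rate(exams, discharges)
    for ou, (exams, discharges) in wb_ct_counts.items()
    if meets_threshold(exams, discharges, WB_CT_THRESHOLD_PCT)
}

for ou, rate in included.items():
    print(f"{ou}: {rate:.1f} examinations per 100 discharges")
```

Because one patient can undergo several examinations during a single admission, the rate can exceed 100%, as in the haematology example.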
The second step involved categorizing the causes of inappropriateness (in classes D and I), using groupings according to the European Union Medical Imaging guidelines [6], into one of six possible broad categories: (1) repeating tests that have already been done (e.g., at another hospital). In our setting, category 1 was assigned to examinations performed too often (category 3) or to investigations whose results were unlikely to affect patient management (category 2) that we found to have already been performed at another hospital; (2) investigating when results are unlikely to affect patient management (e.g., because the anticipated positive finding is usually irrelevant or because a positive finding is so unlikely); (3) investigating too often (e.g., before the disease could have progressed or resolved, or before the results could influence treatment); (4) doing the wrong test. In our study, preoperative CXRs were also included in this category; (5) failing to provide appropriate clinical information and questions that the imaging investigation should answer; (6) excessive investigation. Some clinicians tend to rely on tests more than others, and some patients have inappropriate expectations of the optimal type of examination.
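The two classification steps can be sketched in Python as follows. This is an illustrative sketch only: the function names and the sample records are hypothetical, and the score bands and category numbering simply follow the definitions given in this section.

```python
from collections import Counter

# Short labels for the six cause categories, numbered as in the list above.
EU_CATEGORIES = {
    1: "repeating tests that have already been done",
    2: "results unlikely to affect patient management",
    3: "investigating too often",
    4: "doing the wrong test",
    5: "failing to provide appropriate clinical information",
    6: "excessive investigation",
}


def acr_class(score: int) -> str:
    """Map an ACR appropriateness score (1-9) to class A, D or I."""
    if not 1 <= score <= 9:
        raise ValueError("ACR score must be between 1 and 9")
    if score >= 7:
        return "A"  # appropriate, score 7-9
    if score >= 4:
        return "D"  # doubtful appropriateness, score 4-6
    return "I"      # inappropriate, score 1-3


def grouped_class(score: int) -> str:
    """Group doubtful (D) with inappropriate (I), as done in this audit."""
    return "A" if acr_class(score) == "A" else "I"


# Hypothetical reviewed records: (ACR score, cause category or None if appropriate).
records = [(8, None), (5, 3), (2, 4), (6, 2), (9, None), (3, 4)]

# Count the cause categories among examinations grouped as inappropriate.
cause_tally = Counter(
    category for score, category in records if grouped_class(score) == "I"
)
inappropriate_share = sum(cause_tally.values()) / len(records) * 100

for category, count in sorted(cause_tally.items()):
    print(f"category {category} ({EU_CATEGORIES[category]}): {count}")
print(f"inappropriate: {inappropriate_share:.0f}% of reviewed records")
```

Grouping D with I makes the reported percentages binary (A versus I), at the cost of no longer distinguishing doubtful from clearly inappropriate requests.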

Results

A total of 2232 WB-CT, 703 PET–CT, 6219 bedside CXRs and 5490 traditional CXRs were performed at Tor Vergata University Hospital during 2014. The OUs that exceeded the 20% threshold of number of examinations/number of discharges with regard to WB-CT were nephrology (94/338), hepatology (58/171), internal medicine (74/192), cardiology (13/32), thoracic surgery (43/103), medical oncology (379/648), emergency medicine (406/470) and haematology (364/228) (Fig. 1). With regard to PET–CT, they were nephrology (68/338), internal medicine (52/192), thoracic surgery (43/103) and haematology (210/228) (Fig. 2). With regard to traditional CXRs, the OUs that exceeded the 50% threshold were maxillofacial surgery (89/174), general surgery (344/637), neurosurgery (215/384), infectious diseases (181/271), haematology (159/228), vascular surgery (242/346), ENT (183/253), hepatology (129/171), gynaecology (217/279), endocrinology (176/208), cardiac surgery (863/599) and thoracic surgery (153/103) (Fig. 3); with regard to bedside CXRs, the OUs that exceeded the 50% threshold were anaesthesia and resuscitation (174/181), the stroke unit (187/344), intensive care (297/146), emergency medicine (1031/460) and cardiac surgery (2774/599) (Fig. 4). For the evaluation of prescriptive appropriateness, the clinical records of 1190 patients were assessed with regard to WB-CT, 353 clinical records for PET–CT, 873 clinical records for bedside CXRs and 2800 clinical records for traditional CXRs. The appropriateness was suboptimal for all analysed techniques: traditional CXRs (A = 38%, I = 62%), bedside CXRs (A = 45%, I = 53%), WB-CT (A = 45%, I = 55%) and PET–CT (A = 48%, I = 52%).

Fig. 1 WB-CTs prescribed in the inpatient setting at Tor Vergata University Hospital during 2014

Fig. 2 PET–CTs prescribed in the inpatient setting at Tor Vergata University Hospital during 2014

Fig. 3 CXRs prescribed in the inpatient setting at Tor Vergata University Hospital during 2014

Fig. 4 Bedside CXRs prescribed in the inpatient setting at Tor Vergata University Hospital during 2014

With regard to WB-CT, the most inappropriate requests came from the OUs of haematology (44%) and emergency medicine (33%); for PET–CT, the most inappropriate requests came from the OUs of thoracic surgery (53%) and haematology (48%). With regard to traditional CXRs, the percentage of inappropriateness was consistently distributed among all surgical OUs, including ear, nose and throat, with an average rate of inappropriateness of 63%. For bedside CXRs, the most inappropriate prescribers were the OUs of emergency medicine (48%), cardiac surgery (58%), intensive care (67%) and anaesthesia and resuscitation (78%). The most represented categories of inappropriateness were 2, 3, 4 and 6 (Table 1).

Table 1 Stratification of the causes of inappropriateness in ACR classes D and I, using groupings according to the European Union Medical Imaging guidelines into one of six possible broad categories

Discussion

In recent years, prescriptive inappropriateness has become one of the main issues in health care in terms of waiting times, economic costs and radiation doses.

To date, the evaluation of diagnostic imaging prescription appropriateness has been mostly limited to the outpatient setting; however, the risks of prescriptive inappropriateness may also be present in a more “controlled” clinical setting such as the inpatient setting [4].

In our study, we reviewed the prescriptions of imaging examinations with a high economic and numeric impact, such as WB-CT, PET–CT, CXRs and bedside CXRs, performed in the inpatient setting during 2014 at the Tor Vergata University Hospital. The analysis documented that for WB-CT, the largest prescribers of inappropriate examinations were the haematology and emergency medicine OUs. The main category of inappropriateness for haematology was category 1: “Repeating tests that have already been done (e.g., at another hospital)”. In the USA, the Institute of Medicine has suggested the potential for substantial savings, estimating that $8 billion is spent annually on repeat testing [7]. Although there is wide variation in reporting how much waste exists in our current health-care delivery system and how it should be defined, there is consensus among researchers and policy makers that such waste exists and that action can be taken to reduce it. However, the term “repeat testing” as currently used is neither precisely nor universally defined. Indeed, from both research and policy perspectives, the term “repeat testing” is ambiguous and is often used to describe many different facets of both appropriate and potentially inappropriate care [8]. In our setting, category 1 was assigned to examinations performed too often (category 3) or to investigations whose results were unlikely to affect patient management (category 2) that we found to have already been performed at another hospital. This kind of examination could be viewed as duplicative imaging as classified by Kassing et al. [8] and is possibly due to defensive medicine rather than a compelling clinical need. In these cases, both patient education and improved integration of electronic health records could be solutions for such a waste of resources [8].

With regard to emergency medicine, the main category of inappropriate prescription was category 2, i.e. “Investigation when results are unlikely to affect patient management (e.g., because the anticipated positive finding is usually irrelevant or because a positive finding is so unlikely)”. In fact, a number of studies have shown that performing CT in post-traumatic, haemodynamically stable patients with normal clinical parameters does not affect the patient’s clinical management [9, 10]. In particular, a study by Millo et al. analysed the medical records of patients presenting with a triage history of motorized blunt force trauma who underwent CT of the chest, abdomen and pelvis at the time of presentation. Haemodynamically stable adult patients without abnormal physical examination findings to suggest injury of the trunk were included in the study. The authors found that the clinical yield of performing CT of the chest, abdomen and pelvis in motorized blunt trauma patients with normal clinical examinations was minimal [11]. Although we recognize the value of a normal CT scan in quickly and accurately triaging patients, extended clinical observation with serial physical examination may be considered as an alternative to CT where appropriate [12].

With regard to inappropriate PET–CT requests, the most inappropriate prescriptions came from the haematology OU and fell within category 4, “Do the wrong test”. In particular, most of the inappropriate prescriptions concerned routine PET–CT assessment of patients with chronic lymphocytic leukaemia (CLL). Although some studies have suggested that PET–CT may usefully complement the biologically based prognostic stratification of CLL, no usefulness has been documented for PET–CT in the routine surveillance of CLL [13, 14]. In light of current scientific evidence, performing PET–CT imaging in CLL is justified only when there is clinical suspicion of disease progression or complications [14]. However, more prospective clinical trials including large cohorts of patients are certainly warranted to conclusively assess the role and prognostic impact of PET–CT in the routine management of CLL patients.

The other main inappropriate prescriptions of PET–CT fell within category 6, “Excessive investigation”. In particular, PET–CT prescriptions by the thoracic surgery OU concerned the evaluation of solitary pulmonary nodules (SPNs) of less than 1 cm, already documented by previous CT examinations. It is now well established that for part-solid nodules <10 mm PET is of limited value and potentially misleading, and CT follow-up is advised [15]. Indeed, false-negative results for SPN characterization on PET–CT can occur in three main settings: small lesion size, low tumour metabolic activity and hyperglycaemia [16]. The partial volume effect leads to considerable underestimation of true intensity or activity within the lesion. In general, negative PET–CT results for nodules smaller than 1 cm, particularly <7 mm, do not confidently exclude malignancy [17]. However, it is impossible to ignore medicolegal considerations when discussing management of pulmonary nodules. The current practice of recommending PET–CT for all indeterminate opacities is partly related to perceived liability if a cancer should develop. Since the medical community has preached the importance of early detection of cancer for so long, it may prove difficult to convince physicians and patients that PET–CT of every nodule is unnecessary.

With regard to CXRs, since most CXR tests were preoperative, the inappropriateness was consistently distributed among all surgical OUs and was categorized as category 4 (do the wrong test). The Royal College of Radiologists has published a multi-centre study that analysed 10,619 pre-surgical CXR tests in patients who were candidates for elective surgery, reaching the conclusion that the “pre-surgical CXR test does not affect the surgical and/or anaesthesiology choice” [18]. Also, Rucker et al. evaluated the usefulness of preoperative chest radiographs in 905 patients based on risk factors including history of malignancy, recent history of smoking, exposure to toxic chemicals or signs and symptoms of recent infection. They concluded that extensive preoperative CXR testing has no added clinical value [19]. Indeed, some preoperative investigations may be appropriate if they are based on the finding of a specific clinical abnormality and if the results of the test might affect the care of the patient. This approach requires a more careful examination by the physician, but it is rewarded by the resulting cost savings.

For bedside CXRs, the OUs involved were emergency medicine, cardiac surgery, intensive care and anaesthesia and resuscitation, and the inappropriateness fell into category 3 (investigating too often) and category 2 (investigation when results are unlikely to affect patient management). Although the prescription of bedside CXRs by these OUs could be justified by the severity of the patient’s illness, a study by Graat et al. in a cohort of medical–surgical intensive care patients has shown that daily CXRs led to changes in treatment in only 2.2% of the patients [20]. Furthermore, a meta-analysis published by Oba and Zaza in 2010, carried out on a sample of 7078 patients hospitalized in intensive care units, half of whom underwent daily CXRs while the other half underwent CXRs only on specific clinical indications, showed no differences between the two groups with regard to mortality, length of hospitalization or use of mechanical ventilation [21]. Other authors have underlined how following specific clinical indications for intensive care patients may result in an advantage in terms of diagnostic significance [22]. In line with this evidence, a white paper by Thomson Reuters has documented that in the USA, more than 95 million diagnostic imaging examinations are performed each year, of which 20–50% are inappropriate, with a consequent loss of 250–325 billion dollars per year [23]. In Italy, the almost uncontrollable growth in the number of diagnostic imaging test prescriptions, together with the high number of negative examinations, suggests poor appropriateness that does not improve the patient’s health in the outpatient setting [24]. Indeed, our data confirm that poor appropriateness can also be present in a more controlled environment such as the inpatient setting and highlight the role of clinical audit as an important tool that can be used to critically review current practice and consequently to reduce the unnecessary use of health-care resources.

As a limitation of our study, we accepted published guidelines as the only possible gold standard. Indeed, the approach of defining appropriateness from guidelines is limited, since this process does not allow the evaluation of nuances related to each patient’s clinical situation. Notably, most of the guidelines are based on level of evidence C, that is, consensus in the absence of a firm scientific evidence base [25]. Moreover, the setting of the study was a university hospital with a high burden of diagnostic imaging examinations and patient discharges, of which we selected only a sample by choosing a subjective threshold of 20% of the number of examinations/total number of discharges for PET–CT and WB-CT and 50% for the CXRs; nonetheless, more than 5000 clinical records were examined by our committee.

In conclusion, based on our data, the elimination of inappropriate prescriptions would allow the Diagnostic Imaging Department of Tor Vergata to avoid approximately 4000 examinations/year, with savings of 390,000 Euro/year. In light of this evidence, our work group is developing an implementation plan to increase the appropriateness of prescribing through the adaptation of the available evidence to the local context and experience.