Introduction

Despite significant advances in the diagnosis and treatment of lung cancer, it remains the leading cause of cancer mortality [1]. Although only a proportion of patients with lung cancer present with a solitary pulmonary nodule (SPN) on diagnostic imaging tests, this is an important group as an SPN can represent early-stage lung cancer, with higher survival rates following surgical resection than larger lesions [2]. However, not all SPNs turn out to be lung cancer, and the accurate characterisation of SPNs is an ongoing diagnostic challenge with significant associated health costs [3]. With the adoption of low-dose computed tomography (CT)–based lung cancer screening programmes in many countries, the number of patients with an SPN requiring further investigation is likely to increase substantially [4].

An SPN is defined as a single pulmonary lesion less than 30 mm in size [5]. Positron emission tomography with computed tomography (PET/CT) is currently the recommended test for the investigation of an indeterminate SPN ≥ 8 mm, particularly when a biopsy is not possible [6, 7]. However, PET/CT is only available in specialist centres, with more limited availability than CT, which can make access more difficult for an older population with a high burden of co-morbidities [8, 9]. In addition, PET/CT is both time-consuming and expensive relative to other non-invasive imaging modalities such as CT. Whereas PET/CT measures the metabolism within the tissue of interest, dynamic contrast–enhanced CT (DCE-CT) allows measurement of the vascularity of the tissue [10]. The degree of enhancement on DCE-CT has been shown to correlate well with the grade of lung cancer and the vessel density in the tumour [11, 12]. DCE-CT can be performed on most modern CT machines in current use and is therefore potentially readily accessible to patients. Furthermore, DCE-CT could potentially be performed at the same CT examination at which the pulmonary nodule is found. Early work suggested a high diagnostic accuracy for DCE-CT; however, this previous analysis incorporated a relatively small number of studies [13].

The aim of this systematic review of the literature and meta-analysis was to determine the diagnostic performance of dynamic contrast–enhanced computed tomography (DCE-CT) for the differentiation of malignant from benign pulmonary nodules.

Materials and methods

The study was prospectively registered in PROSPERO (CRD42018112215). The study has been reported in accordance with the Preferred Reporting Items for a Systematic Review and Meta-analysis of Diagnostic Test Accuracy Studies (PRISMA-DTA) statement [14].

The population of interest was patients with a solitary pulmonary nodule undergoing dynamic contrast–enhanced CT as part of a workup to determine the malignant or benign status of the nodule. We included studies examining solitary pulmonary nodules being worked up for malignancy, and excluded those that included participants < 18 years old or pure ground glass nodules. The intervention of interest was dynamic contrast–enhanced computed tomography. Computed tomographic scans were included as long as there was a minimum of both a pre-contrast and a post-contrast-enhanced CT dataset for the quantification of the degree of enhancement. The gold standard against which the test was examined was required to be histological diagnosis of malignancy obtained from either needle biopsy or surgical resection, with benign status confirmed either histologically or with follow-up imaging showing no growth at 2 years or resolution. We considered both prospective and retrospective diagnostic accuracy studies which contained sufficient data to construct contingency tables in order to assess true positive, false positive, true negative, and false negative results.

To identify articles of interest for review, Ovid MEDLINE and EMBASE were searched for published studies from their inception until October 2018 on the diagnostic accuracy of DCE-CT in the characterisation of pulmonary nodules. The full search strategy is documented in Supplementary Table S1. Titles and abstracts of studies retrieved using the search strategy and those from additional sources were all independently screened by two reviewers (J.W.-M. and S.J., both with 1 year of experience) to identify studies that potentially met the inclusion criteria outlined above. The full texts of these potentially eligible studies were retrieved and independently reviewed by the two reviewers to assess for eligibility. Where there was a disagreement between the reviewers, a consensus was reached through discussion. The references of the retrieved full-text articles were screened for further articles of interest, and any such articles were retrieved if they had not been previously identified with the original search strategy.

A single reviewer (J.W.-M.) used a standardised, pre-piloted form to extract data from the included studies for assessment of study quality and evidence synthesis. Extracted information included study population and participant demographics and baseline characteristics; details of the CT scanning hardware, scanning technique, and diagnostic threshold used; study methodology; nodule size range and eventual diagnosis; diagnostic accuracy metrics; and radiation dose.

Two review authors (J.W.-M. and S.J.) independently assessed the risk of bias in the included studies through the use of the second version of the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) questionnaire [15]. Discordances in the scoring of bias between the two reviewers were resolved by a third review author (L.-M.D.).

Three deviations occurred from the original pre-registered protocol. First, a size threshold was not pre-specified in the original protocol, yet upon the literature review it became apparent that the upper size limit included varied markedly between studies. Although the Fleischner and BTS guidelines state that the upper limit of an SPN is 30 mm, we allowed up to 40 mm for the purpose of this analysis due to the high quality of many of the studies using this threshold, and the granularity it would provide the review. However, an analysis was performed to compare studies with and without nodules above 30 mm as described in the statistical section. Second, whilst our original protocol called for the analysis of solitary pulmonary nodules, we found that although several studies recruited cases based on the detection of a solitary pulmonary nodule, if an additional nodule was detected at the time of the index test, they included, analysed, and followed up both lesions. Despite not being strictly ‘solitary’ pulmonary nodule studies, these were included in the analysis as they reflect routine clinical practice where a second smaller nodule is identified when CT is performed following detection of a nodule on chest radiograph. Third, some studies reported average follow-up of the nodules detected on CT, rather than a minimum follow-up period. Cancellation of follow-up after resolution of the nodule in the case of infectious/inflammatory nodules would reduce the mean length of follow-up below the pre-stated 2-year minimum, yet nodules are considered benign if they resolve. Therefore, these studies were included in the meta-analysis. They were, however, classed as being at high risk of bias with regard to their application of the reference standard on the QUADAS-2 questionnaire due to the uncertainty about the minimum length of follow-up in stable nodules. The impact of this on the results was analysed as described below.

Statistical analysis

Numbers of true positives, false positives, true negatives, and false negatives were extracted from the studies and used to form 2 × 2 contingency tables which were used to derive sensitivity, specificity, positive likelihood ratio (PLR), negative likelihood ratio (NLR), and diagnostic odds ratio (DOR). Results were pooled using the lme4 package within R (RStudio Version 1.1.463, RStudio, Inc.) to perform a bivariate binomial random effects meta-analysis [16]. This uses a binary (logit) generalised linear mixed model fit by maximum likelihood (using a Laplace approximation). Bivariate summary receiver operating characteristic (SROC) curves were constructed using the bivariate random effects model outputs to populate the SROC plot within Review Manager Version 5.3 (The Cochrane Collaboration). To identify potential sources of heterogeneity, we stratified a secondary analysis into subgroups according to characteristics such as sample size, lesion size, risk of bias (low versus high/indeterminate), diagnostic thresholds, whether the diagnostic threshold was prospectively set, and year of publication. These were included as covariates, in turn, in a meta-regression analysis, with analysis of statistical significance between models performed using a likelihood ratio test of nested models. For sample size, the threshold at which to split the data was arbitrarily set at 100 to represent larger samples that were less likely to be prone to bias due to outliers. For mean nodule size, the sample was split at 20 mm to provide a reasonable split of the data. For maximum nodule size, the data were split based on whether the study included nodules > 30 mm, as the 30 mm diameter is considered by most guidelines as the upper threshold for a lesion to be called a nodule, after which it is considered to be a mass.
Effect of publication date was examined by splitting on the median (2008), with studies published in the last decade considered to be more representative of modern CT technology. In studies reporting the diagnostic accuracy of multiple thresholds, the optimal threshold was used in the primary analysis. In the secondary analyses examining different thresholds, studies were included in each subgroup analysis where they had reported the threshold of interest. Thresholds with ≤ 2 studies reporting the same threshold were not considered for this secondary analysis. To test for study publication bias and heterogeneity, a Galbraith plot was created to examine the interaction between the efficient score and variance, with the Harbord test used to test for funnel plot asymmetry [17]. All statistical analysis was performed using RStudio. Forest plots and SROC curves were generated using RevMan.
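As an illustration of the per-study step described above, the following is a minimal sketch of how sensitivity, specificity, PLR, NLR, and DOR follow from a single 2 × 2 contingency table. The function name `diagnostic_metrics` and the example counts are hypothetical and are not taken from any included study. The sketch covers only the per-study derivation, not the bivariate random effects pooling performed with lme4.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Derive accuracy metrics from one 2 x 2 contingency table.

    tp, fp, fn, tn are counts of true positives, false positives,
    false negatives, and true negatives. Ratios are undefined when a
    denominator is zero, so such tables are rejected here; this sketch
    applies no continuity correction.
    """
    if min(tp + fn, tn + fp) == 0 or fp == 0 or fn == 0 or tn == 0:
        raise ValueError("zero cell makes at least one ratio undefined")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "plr": sensitivity / (1 - specificity),   # positive likelihood ratio
        "nlr": (1 - sensitivity) / specificity,   # negative likelihood ratio
        "dor": (tp * tn) / (fp * fn),             # diagnostic odds ratio
    }


# Hypothetical study: 100 malignant and 100 benign nodules.
example = diagnostic_metrics(tp=90, fp=20, fn=10, tn=80)
```

In this hypothetical table the DOR equals PLR divided by NLR, which is a general identity for a single 2 × 2 table.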

Results

Of 3028 potential papers identified by the literature search, 22 met the inclusion and exclusion criteria and were included. An additional study was located from the references of the included papers, resulting in 23 studies in the final analysis. Figure 1 details the flow of the studies identified and screened for eligibility, and the reasons for study exclusion.

Fig. 1 Flow diagram of the articles identified by the literature search, screened for eligibility, and included in the final study. CT computed tomography, MRI magnetic resonance imaging, PET/CT positron emission tomography with computed tomography, SPECT single photon emission computed tomography

Twenty-three studies were included, incorporating the results from 2397 patients with 2514 nodules. Of these, 1389/2514 (55.3%) were malignant. The studies were predominantly retrospective single-centre studies, performed in a wide range of countries and settings (Table 1). The dynamic contrast–enhanced CT protocol varied widely from study to study, from the injection rate to the scan timing to the tube settings (Table 2). Eighteen studies were performed using mono-energetic (routine) CT with regular interval imaging, and 5 were performed using CT perfusion techniques. The injection techniques included a standardised volume bolus and injection rate; adjusting the contrast volume to the weight of the patient; or adjusting the injection rate to the weight of the patient. Image acquisition ranged from 3 volume acquisitions at different phases of the contrast injection to 32 separate acquisitions. Most studies utilised an enhancement subtraction technique, taking the phase with the maximum nodule attenuation and subtracting the baseline attenuation to calculate the degree of enhancement. However, several studies utilised the slope of the enhancement curve or the area under the enhancement curve.
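The enhancement subtraction technique described above can be sketched as follows. This is a minimal illustration, not the protocol of any included study. The function names, the example attenuation values, and the choice to treat enhancement at or above the threshold as test-positive are all assumptions made for the example.

```python
def net_enhancement(baseline_hu, post_contrast_hu):
    """Peak post-contrast attenuation minus pre-contrast baseline, in HU."""
    if not post_contrast_hu:
        raise ValueError("at least one post-contrast measurement is required")
    return max(post_contrast_hu) - baseline_hu


def is_test_positive(enhancement_hu, threshold_hu):
    """Assumed rule: enhancement at or above the threshold is test-positive."""
    return enhancement_hu >= threshold_hu


# Hypothetical nodule: baseline attenuation and four post-contrast phases (HU).
baseline = 25.0
phases = [30.0, 48.0, 61.0, 55.0]

enhancement = net_enhancement(baseline, phases)           # 61.0 - 25.0
positive = is_test_positive(enhancement, threshold_hu=20.0)
```

Studies using the slope of, or area under, the enhancement curve would need a different summary function in place of `net_enhancement`.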

Table 1 Summary of the study design and baseline characteristics of those included in the meta-analysis
Table 2 Summary of the CT acquisition protocols and measurements of the studies included in the meta-analysis

The results of the QUADAS-2 bias and applicability assessment are summarised in Fig. 2, whilst Table 3 documents the individual bias scores for the seven domains for all included studies. Bias in patient selection was unclear in a large number (14/23, 61%) of studies due to a lack of reporting of the sampling of patients for the diagnostic test accuracy evaluation, with many retrospective studies not clearly documenting whether consecutive cases were included or not. Risk of bias in the index test was high in a large number of studies (12/23, 52%) due to a lack of pre-specification of the intended threshold to be used, and in several studies multiple techniques of enhancement quantification were used simultaneously (including but not limited to absolute contrast enhancement, relative contrast enhancement, wash in, wash out, wash in and wash out, and area under the enhancement curve). Bias regarding the reference standard was unclear in the majority of studies (18/23, 78%), with the blinding of the reference standard to the index test infrequently reported. Bias in flow and timing was unclear at a similarly high rate (15/23, 65%), with the delay between the index test and reference standard infrequently reported. Concerns regarding the applicability of the included studies to the review question were low for the majority of the studies (Fig. 2).

Fig. 2 QUADAS-2 scoring summary of the included studies

Table 3 Table of the QUADAS-2 components for each of the individual studies

The sensitivities and specificities of the individual studies are collated in a forest plot in Fig. 3, with all studies reporting a per-nodule diagnostic accuracy. The pooled analysis of the 23 studies is reported in Table 4. The pooled sensitivity and specificity were 94.8% (95% CI 91.5; 96.9) and 75.5% (95% CI 69.4; 80.6) respectively (see SROC plot in Fig. 4), with a positive and negative likelihood ratio of 3.86 (2.99; 4.74) and 0.07 (0.03; 0.10), and a diagnostic odds ratio of 56.6 (24.2; 88.9). Only two distinct enhancement thresholds were reported by > 2 studies, with the pooled analysis for each of these reported in Table 4. Of these, a threshold of < 20 Hounsfield units (HU) enhancement for the differentiation of a malignant from a benign nodule had the highest diagnostic odds ratio of 142.5 (95% CI − 36.4; 321.3), maintaining a high sensitivity of 98.3% (95% CI 95.1; 99.4) and moderate specificity of 71.0% (95% CI 63.1; 77.8) (Table 4).

Fig. 3 Forest plot of the included studies. Studies listed by first author and year of publication. CI confidence intervals, FN false negative, FP false positive, TN true negative, TP true positive

Table 4 Diagnostic performance of dynamic contrast–enhanced CT for the evaluation of pulmonary nodules
Fig. 4 Bivariate SROC curve of the included studies. The white circles indicate each individual study whilst the black circle indicates the summary point. The dotted line is the 95% confidence region for the summary operating point, whilst the dashed line is the 95% prediction region (which is the confidence region for a forecast of the true sensitivity and specificity in any future study)

The Galbraith plot (Fig. 5) demonstrated multiple studies falling outside the 95% confidence intervals, consistent with significant inter-study heterogeneity in findings, but there was no significant asymmetry in the plot (p = 0.90) to suggest publication bias. A formal analysis of the degree of heterogeneity was not performed, as per the Cochrane Collaboration's recommendations on diagnostic test accuracy meta-analysis; however, factors that may have contributed to the heterogeneity were examined (Table 5). Studies with a low risk for reference standard bias demonstrated a higher sensitivity with equivalent specificity compared with studies with intermediate/high risk (p = 0.044). However, only two studies, both conducted by the same group, were considered to be at low risk. Studies conducted pre-2008 had slightly higher sensitivity and specificity compared with those from 2008 onwards, although this did not reach statistical significance (p = 0.07). The CT technique (mono-energetic versus perfusion) did not affect diagnostic accuracy (p = 0.42). No difference was present between subgroups when studies were split based on sample size, mean or maximum nodule size, threshold prospectively or retrospectively set, or the presence of patient selection bias, index test bias, or flow and timing bias (p > 0.1 for all). In particular, there was no significant difference in the pooled sensitivity or specificity between studies that only included nodules ≤ 30 mm (and therefore meet current definitions of SPNs) compared with those that included larger nodules up to 40 mm in size (p = 0.07 for between-group differences in sensitivity and specificity).
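The subgroup p-values above come from likelihood ratio tests of nested models. A minimal sketch of that comparison is given here, assuming the log-likelihoods of the model without and with the covariate are already available. The function name, the example log-likelihoods, and the degrees of freedom are hypothetical. The number of extra parameters in the actual meta-regression is not stated in the text, so 2 (one covariate effect each for sensitivity and specificity) is used only as an assumption.

```python
from math import erfc, exp, sqrt


def likelihood_ratio_test(loglik_reduced, loglik_full, df):
    """Compare nested models via the likelihood ratio statistic.

    Returns (statistic, p_value), where the statistic is referred to a
    chi-square distribution with `df` degrees of freedom. Closed-form
    upper-tail probabilities are used, so only df of 1 or 2 is supported.
    """
    statistic = 2.0 * (loglik_full - loglik_reduced)
    if statistic < 0:
        raise ValueError("full model cannot fit worse than the reduced model")
    if df == 1:
        p_value = erfc(sqrt(statistic / 2.0))   # chi-square(1) upper tail
    elif df == 2:
        p_value = exp(-statistic / 2.0)         # chi-square(2) upper tail
    else:
        raise ValueError("this sketch supports df = 1 or df = 2 only")
    return statistic, p_value


# Hypothetical log-likelihoods for models without and with a covariate.
statistic, p_value = likelihood_ratio_test(-120.0, -117.0, df=2)
```

A small p-value would indicate that adding the covariate improves the fit beyond what chance alone would explain.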

Fig. 5 Galbraith plot examining inter-study heterogeneity for publication bias by incorporating the effect size of each study compared with the pooled analysis. The y-axis represents the test statistics (effect/standard error of the estimate) of each study, which are expected to fall within 2 units of the pooled effects for 95% of the studies. The x-axis plots 1/standard error of the pooled study estimate

Table 5 Subgroup analyses of the diagnostic performance of DCE-CT for evaluation of indeterminate pulmonary lesions

Discussion

This meta-analysis demonstrates a high sensitivity and moderate specificity for dynamic contrast–enhanced computed tomography for the diagnosis of solitary pulmonary nodules with a pooled sensitivity and specificity of 94.8% and 75.5% respectively. However, the study quality was indeterminate in a significant proportion of the studies with only one multi-centre study and a large number of small single-centre studies. Whilst the analysis shows promising results for the technique, the low quality of the included studies must be taken into account and further carefully designed high-quality multi-centre studies are required.

The current Fleischner guidelines for further investigation and management of indeterminate solitary pulmonary nodules call for either PET/CT or biopsy if the nodule is > 8 mm [6], with dynamic contrast–enhanced computed tomography not mentioned in the diagnostic pathway despite inclusion of the technique in the 2005 version of the guidelines [5]. The British Thoracic Society guidelines state that dynamic contrast–enhanced computed tomography should not be used where positron emission tomography is available, although it is acknowledged that there is little evidence to support this beyond the historical prerogative of PET/CT [7]. A recent meta-analysis of PET/CT including 20 studies with 1557 participants reported a sensitivity and specificity of 89% and 70%, and a diagnostic odds ratio (DOR) of 22 [39]. These results are similar to the DCE-CT results obtained in this meta-analysis, in which the 23 studies including 2397 participants demonstrated a pooled sensitivity, specificity, and DOR of 95%, 76%, and 57 respectively. This suggests that DCE-CT could replace PET/CT as an equivalent diagnostic technique. Currently, there are a limited number of studies directly comparing DCE-CT with PET/CT, precluding a meta-analytic comparison. Ohno et al compared DCE-CT with both PET/CT and dynamic contrast–enhanced MRI in a single-centre study of 198 patients, and found that DCE-CT outperformed both MRI and PET/CT in specificity and accuracy [10]. This contradicted the results of Yi et al, who found, in a single-centre study of 119 participants, that PET/CT was more sensitive than DCE-CT, with equal specificity [40]. Thus, further work is required to directly compare these two modalities. Another technique that has a growing body of evidence is diffusion-weighted MRI (DW-MRI). Whilst PET/CT examines metabolism and DCE-CT measures perfusion, DW-MRI quantifies the movement of water within the lesion.
A recent meta-analysis of diffusion-weighted MRI for the diagnosis of indeterminate solitary pulmonary nodules has suggested superiority of this technique compared to PET/CT, with a pooled sensitivity, specificity, and DOR of 83%, 91%, and 50 respectively for diffusion-weighted MRI compared with 78%, 81%, and 15 for PET/CT [41]. Furthermore, dynamic contrast enhancement can also be quantified on MRI in the same examination as the assessment of diffusion [42]. Given the differing nature of the 3 parameters in question, further research is needed to determine whether the information from perfusion, diffusion, and metabolism is complementary or duplicative in improving diagnostic accuracy.

The equivalent sensitivity, specificity, and accuracy in this meta-analysis of DCE-CT compared with a previous meta-analysis of PET/CT provide supportive evidence for considering the incorporation of DCE-CT into the diagnostic pathway of pulmonary nodules. CT machines are more commonly found and more readily accessible in hospital settings than PET/CT. A dynamic contrast examination is very similar to a standard contrast CT procedure, which is commonly undertaken at all hospitals and requires no additional equipment. A PET/CT examination requires the injection of a radioactive substrate, which needs to be delivered reliably to centres undertaking PET examinations. The requirement of such a supply chain can have a significant impact on service flexibility and can result in scan cancellations when there is disruption or delay in delivery of the radioactive agent [43]. Future studies are also required to examine whether certain subgroups of pulmonary nodules (such as those of small size), or those found in patients with different risk profiles and likelihood of malignancy, may have more to gain from a DCE-CT examination than from PET/CT. Similarly, a tiered approach using DCE-CT as the first diagnostic test and gatekeeper to PET/CT may allow for a more nuanced workup utilising the strengths of both techniques. Such an approach has been shown to be cost-effective for the diagnosis of SPNs [44]. Robust direct comparative accuracy studies of DCE-CT and PET/CT in the same population, and cost-effectiveness studies, are warranted to test the various diagnostic pathways.

There are several limitations with the current meta-analysis. The quality of the included studies was frequently indeterminate due to lack of reporting of key metrics. The studies were almost exclusively single-centre and frequently retrospective, both of which are likely to amplify the apparent diagnostic accuracy of the technique. In addition, the dynamic contrast acquisition technique and the metrics for the quantification of the enhancement were heterogeneous throughout the studies. Whilst these factors did not appear to have an impact on accuracy in the meta-regression, a standardised acquisition and analysis technique should be agreed upon to improve reproducibility and facilitate comparison between trials, thereby allowing more widespread adoption. The observed rate of malignancy in the included studies is relatively high (55%). Whilst this is consistent with previous meta-analyses of MRI and PET in SPNs [39, 41], it is substantially higher than in screening-detected SPNs such as in the National Lung Screening Trial (15.0% malignancy in 10–30 mm nodules) and NELSON trial (15.2% malignancy in nodules > 10 mm) [45, 46]. Previous work has shown the sensitivity of a technique to be relatively robust to disease prevalence and the specificity to increase with falling prevalence [47]. It can be postulated that the diagnostic accuracy of DCE-CT would be similar, or even further improved, in a screening population.
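To make the prevalence point concrete, the sketch below applies Bayes' theorem to show how positive and negative predictive values would shift between the prevalence seen in the included studies (about 55%) and a screening-like prevalence (about 15%). It assumes, purely for illustration, that the pooled sensitivity and specificity carry over unchanged to both settings, which the text above notes may not hold for specificity. The function name is hypothetical, and predictive values were not reported in this meta-analysis.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Pooled point estimates from this meta-analysis, held fixed by assumption.
SENSITIVITY = 0.948
SPECIFICITY = 0.755

ppv_study, npv_study = predictive_values(SENSITIVITY, SPECIFICITY, 0.55)
ppv_screen, npv_screen = predictive_values(SENSITIVITY, SPECIFICITY, 0.15)
```

Under these assumptions the negative predictive value rises and the positive predictive value falls as prevalence drops, which is the usual behaviour of any test with fixed sensitivity and specificity.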

In conclusion, we have found a high diagnostic accuracy of DCE-CT for the diagnosis of pulmonary nodules, although study quality was poor or indeterminate in a large number of cases. The diagnostic accuracy is comparable to that reported in a recent meta-analysis of PET/CT, suggesting that DCE-CT may complement or augment the current diagnostic pathway used for the investigation of solitary pulmonary nodules.