Introduction

Adrenal nodules are commonly encountered incidental imaging findings, identified in approximately 5% of patients undergoing computed tomography (CT) [1, 2]. While the majority of incidental adrenal nodules are benign adenomas, the risk of malignancy in an incidentally detected adrenal nodule increases substantially in patients with a history of cancer [3, 4]; thus, the imaging characterization of incidentally discovered adrenal nodules is important. Adrenal cortical cells normally contain intracytoplasmic lipid, which is a precursor for steroid production, and approximately 70% of adenomas also contain intracytoplasmic lipid [5]. The detection of intracytoplasmic lipid in an adrenal nodule using CT or magnetic resonance imaging (MRI) forms the basis of diagnosis for so-called “lipid-rich” adenomas. Lipid-poor adenomas can be diagnosed by their high relative washout rate on a CT study with a delayed contrast phase [3]. On non-contrast CT (NCT), detection of intracytoplasmic lipid using a threshold of 10 Hounsfield units (HU) confirms the diagnosis of adrenal adenoma with a sensitivity of 71% and a specificity of 98% [5]. Accordingly, current practice guidelines recommend that no further imaging is required for homogeneous adrenal lesions < 4 cm with density < 10 HU [6].

A frequent dilemma is encountered in clinical practice when an incidental adrenal nodule is detected during a single-phase contrast-enhanced CT (CECT) examination, because enhanced adrenal nodule attenuation levels overlap significantly between adenomas and non-adenoma lesions [7–10]. In this instance, a repeat examination using either dedicated NCT (+/- multiphase adrenal washout protocol) or chemical shift MRI is required for further characterization, generally in conjunction with biochemical testing [11]. Dual-energy CT, which has rapidly become incorporated into many abdominal imaging practices, offers the ability to extract a virtual non-contrast CT (vNCT) data set from a routinely acquired variable-energy single-phase CECT examination [12]. Analysis of the vNCT data using absolute attenuation thresholds has been suggested as a viable method to diagnose adenomas [13]. Diagnosis of adenomas using vNCT is desirable as a ‘replacement test’, as this could obviate the need for additional follow-up imaging. To date, reports of the diagnostic accuracy of vNCT using absolute attenuation thresholds for diagnosis of adrenal adenomas are variable and have been restricted to single-institution retrospective studies [14]. The purpose of this systematic review is to compare the diagnostic accuracy of vNCT to NCT for the diagnosis of adrenal adenomas.

Methods

This systematic review was written using the guidelines outlined by the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) [15] and the Cochrane Handbook for Diagnostic Test Accuracy Reviews [16]. The protocol of this study was registered on PROSPERO (International Prospective Register of Systematic Reviews; protocol number, CRD42015025937) on September 4, 2015 [17]. No significant protocol deviations occurred.

We proposed the following PICO(S) question: Patients with an indeterminate adrenal lesion; Index test of dual-energy CT used to generate a vNCT series, with attenuation levels measured on vNCT used for diagnosis; if direct Comparison to standard non-contrast CT was performed in the same patients, these data were extracted and used as a comparator; Outcome was assessed with a reference standard of surgery, tissue biopsy or clinical follow-up of > 6 months for non-adenomas, and of clinical parameters or previously described and validated imaging findings compatible with adenoma, including NCT density < 10 HU, visual/quantitative signal intensity decrease using chemical shift MRI [4] or adrenal washout protocol CT; any type of Study that was not a case report or case series was eligible for inclusion [18].

Search strategy

A literature search using Medline, Embase and the Cochrane Central Register of Controlled Trials (or CCTR) was performed with the assistance of a hospital librarian (A.D.). The final search was performed on October 23, 2015. The search strategy was developed by an abdominal radiologist (M.D.F.M., 9 years of post-residency experience and 5 years of experience conducting systematic reviews). Peer review of the search strategy was performed by an additional abdominal radiologist (N.S., 4 years of post-residency experience) and a hospital librarian (A.D.). No language restrictions were placed on search results. The search strategy is provided in Appendix 1. Retrieved titles and abstracts were independently reviewed by two authors (M.J.C., M.D.F.M.) for relevance (M.J.C. is a resident with 3 years of radiology experience). Full text versions of relevant studies were retrieved for further evaluation. Inclusion criteria were applied to the full-text articles independently by two reviewers (M.J.C., M.D.F.M.). Studies were excluded if any of the inclusion criteria were not met. Reference lists of included studies were checked manually to identify other relevant papers. A search of the grey literature was performed by one of the authors (M.J.C.), up to date as of March 5, 2016, including conference proceedings in 2013–2015 Radiological Society of North America (RSNA) Scientific Meetings, 2015 American Roentgen Ray Society (ARRS) meeting, 2013–2015 European Congress of Radiology (ECR) meetings as well as National Institute of Health (NIH) Clinical Trials database and World Health Organization (WHO) International Clinical Trials Registry Platform. Grey literature search strategy is also provided in Appendix 1.

Inclusion criteria

  1. Human patients with an adrenal lesion.
  2. Dual-energy CT was performed and a vNCT series was generated as part of the protocol.
  3. The data is retrievable to calculate a 2 x 2 contingency table (or sufficient data to calculate either sensitivity or specificity).
  4. An acceptable reference standard (described in the ‘Outcome’ section above) was used for the majority of patients.
  5. The study is not a case report, case series, review article or commentary.
  6. The study patients are not a subset of patients from another included paper. If studies contain overlapping samples, the duplicate patients will be removed when possible (either by information from the paper or contacting the authors).
  7. vNCT applying absolute HU attenuation values is the index test used.

Data extraction and quality assessment

Two authors (M.J.C., M.E-K.) independently extracted data by using data extraction sheets. Data extracted included: Study first author, journal and year of publication, number of patients, patient age (mean, range), number of patients excluded (because of study overlap, insufficient test, no reference standard), technical parameters of dual-energy CT imaging (vendor, dual-energy CT technique [e.g. two consecutive scans at different energies, dual-source/dual-energy, rapid kilovoltage switching or energy-resolving detector] and CT protocol), absolute attenuation threshold used to differentiate adenomas from other adrenal lesions, number of true positive (TP), true negative (TN), false positive (FP) and false negative (FN), number of adenomas, metastases, pheochromocytomas, other lesions and reference standard applied. If standard non-contrast CT was used as a direct comparison in the study, the data relevant to this comparator index test was extracted: technical parameters of CT imaging, absolute attenuation threshold used to differentiate adenomas from other adrenal lesions, number of TP, TN, FP and FN. Disagreements were resolved by consensus. The following definitions were used: TP is a positive index test (vNCT) and diagnosis of adrenal adenoma is confirmed by the reference standard. TN is a negative index test and diagnosis other than adenoma is confirmed by the reference standard. FP is a positive index test for adenoma not confirmed by the reference standard. FN is a negative index test and diagnosis of adenoma is confirmed by the reference standard. Adrenal lesion was defined using the American College of Radiology definition of an adrenal mass ≥ 1 cm [11]. An adrenal lesion containing macroscopic fat such as a myelolipoma measuring < 10 HU was considered to be true negative (not false positive) based on the appearance and density measurement of macroscopic fat. 
Otherwise, no assumptions were made about the HU density for a given pathological diagnosis unless the density was explicitly stated by the authors. For studies in which multiple readers reported different numbers of TP, FN, FP or TN for adrenal lesions, the counts were averaged across readers to obtain ‘study-level’ data and rounded to the nearest whole number.
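The definitions above can be illustrated with a minimal sketch that derives sensitivity and specificity from a 2 x 2 table and averages counts across readers. The class name, function names and reader counts are hypothetical and are not taken from any included study; rounding halves upward is an assumption, since the text specifies only rounding to the nearest whole number.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """2 x 2 contingency table for one index test (adenoma = positive)."""
    tp: int  # index test positive, adenoma confirmed by reference standard
    fp: int  # index test positive, adenoma not confirmed
    fn: int  # index test negative, adenoma confirmed
    tn: int  # index test negative, non-adenoma confirmed

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)


def average_readers(tables):
    """Average each cell across readers, rounded to the nearest whole number."""
    n = len(tables)

    def mean_round(values):
        # Assumption: halves are rounded upward (counts are non-negative).
        return int(sum(values) / n + 0.5)

    return TwoByTwo(
        tp=mean_round([t.tp for t in tables]),
        fp=mean_round([t.fp for t in tables]),
        fn=mean_round([t.fn for t in tables]),
        tn=mean_round([t.tn for t in tables]),
    )


# Hypothetical counts for two readers of the same study.
reader_a = TwoByTwo(tp=12, fp=0, fn=8, tn=10)
reader_b = TwoByTwo(tp=14, fp=0, fn=6, tn=10)
study = average_readers([reader_a, reader_b])
print(study, study.sensitivity(), study.specificity())
```

With these illustrative counts the study-level table has 13 TP and 7 FN, giving a sensitivity of 13/20 and, with no FP, a specificity of 1.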

Non-diagnostic tests were excluded from the analysis on the assumption that they occur randomly [19]. Samples to which no reference standard was applied were excluded, when possible. One of the authors (M.J.C.) extracted the following additional data: mean, standard deviation and range for adrenal lesion density (HU) on vNCT and NCT, where available. If statistical tests were performed for differences in attenuation values between vNCT and NCT, statistical test type and p values were reported.

A customized QUADAS-2 tool (a tool for quality assessment of diagnostic accuracy studies in systematic reviews) based on the four domains of study selection, initial reading, reference standard, and ‘flow and timing’ was used [20]. The customized tool was applied to all studies by two of the authors (M.J.C. and M.D.F.M.). Disagreements were resolved by consensus discussion. Appendix 3 provides the QUADAS-2 tool used in this review. The risk of bias across studies (publication bias) was not assessed, since there is no generally accepted method for diagnostic accuracy studies and the number of included studies was low [21, 22]. We attempted to minimize publication bias primarily via our search strategy, which included multiple databases, conference abstracts and trial registries.

Data analysis

Data extraction was performed using Microsoft Excel (Redmond, WA, USA). 2 x 2 data was summarized in forest plots of sensitivity and specificity for each study, where available, using Meta-DiSc software (Madrid, Spain) [23]. Pooling of data to generate summary estimates was performed for sensitivity only, using the random effects model. The justification for this is that the reported specificity for the included studies was 1; therefore, use of a univariate random effects model (rather than a bivariate one) was felt to be appropriate [24]. If comparator test data was available (standard non-contrast CT), this was displayed and analyzed in a similar manner.
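As an illustration of univariate random-effects pooling of sensitivity, the sketch below applies the DerSimonian-Laird estimator on the logit scale. This is a generic textbook approach and is not claimed to reproduce the computation performed by Meta-DiSc; the function name, the 0.5 continuity correction and the example counts are assumptions made for illustration only.

```python
import math


def pool_sensitivity_dl(counts):
    """Pool sensitivity with a DerSimonian-Laird random-effects model.

    counts: list of (TP, FN) pairs, one per study.
    Returns (pooled sensitivity, lower 95% limit, upper 95% limit).
    """
    y, v = [], []
    for tp, fn in counts:
        # Assumption: add 0.5 to each cell so the logit is always defined.
        tp_c, fn_c = tp + 0.5, fn + 0.5
        y.append(math.log(tp_c / fn_c))   # logit of sensitivity
        v.append(1 / tp_c + 1 / fn_c)     # approximate within-study variance

    # Fixed-effect weights and Cochran's Q.
    w = [1 / vi for vi in v]
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))

    # Between-study variance (tau^2), truncated at zero.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / c) if c > 0 else 0.0

    # Random-effects weights, pooled logit and its standard error.
    w_re = [1 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))

    def inv_logit(x):
        return 1 / (1 + math.exp(-x))

    return inv_logit(mu), inv_logit(mu - 1.96 * se), inv_logit(mu + 1.96 * se)


# Hypothetical (TP, FN) counts for three studies.
print(pool_sensitivity_dl([(12, 8), (20, 25), (9, 3)]))
```

Because specificity was 1 in every included study, only the sensitivity arm is modeled here, which mirrors the univariate rationale given above.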

Heterogeneity, in the form of between-studies variance, was quantified using the I² index; use of I² (a univariate measure) was felt to be appropriate for the same reasons indicated above [21, 22, 24]. Levels of heterogeneity (low < 25%, moderate 25–75% and high > 75%) were defined as per Higgins et al. [25]. In addition, sources of heterogeneity were explored by evaluating the effect of the following (where possible, based on provided data) on diagnostic test accuracy: 1) adenoma prevalence; 2) technical parameters; 3) threshold applied; 4) risk of bias.
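For reference, I² can be derived from Cochran's Q as (Q − df)/Q, expressed as a percentage and truncated at zero, where df is the number of studies minus one. The sketch below computes it and applies the categories stated above; the function names and the example Q value are hypothetical.

```python
def i_squared(q: float, k: int) -> float:
    """I² (in percent) from Cochran's Q and the number of studies k."""
    df = k - 1
    if q <= 0 or df <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100


def heterogeneity_level(i2: float) -> str:
    """Categories used in this review: low < 25, moderate 25-75, high > 75."""
    if i2 < 25:
        return "low"
    if i2 <= 75:
        return "moderate"
    return "high"


# Hypothetical example: Q = 20 across 5 studies.
i2 = i_squared(20.0, 5)
print(i2, heterogeneity_level(i2))
```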

Results

Search

Appendix 1 provides details on the search strategy. Figure 1 is a study flow diagram detailing search results and study inclusion. Appendix 2 provides a list of studies excluded at the level of full-text search, with the reason for exclusion. The initial search identified 60 articles, 28 trials and 2 abstracts; screening by title and abstract reduced these to 12 articles and 2 abstracts for full-text review. After application of the inclusion criteria, five articles (170 patients; 192 adrenal lesions) were included [13, 14, 26–28]. All of these studies used dual-source dual-energy CT scanners. All included articles were in English. Retrievable data from Kim et al. [14] only permitted calculation of the sensitivity for the diagnosis of adrenal adenomas.

Fig. 1

Flow diagram illustrating search results, study review and study inclusion. RSNA = Radiological Society of North America, ARRS = American Roentgen Ray Society, ECR = European Congress of Radiology, WHO = World Health Organization

Data extraction

Table 1 summarizes the characteristics of the included studies. Table 2 summarizes the extracted data from the studies included in the diagnostic accuracy portion of the study. Table 3 summarizes the extracted data and relevant study characteristics from the studies comparing the density measurements of the adrenal lesions on vNCT and NCT, together with the comparative statistical tests performed and their results. Statistically significant differences in measured HU densities between vNCT and NCT were reported for the 60-second contrast delays in the studies by Botsikas et al. and Kim et al. No significant differences were seen for the 15-minute delays in Botsikas et al. and Kim et al., or in Gnannt et al. and Ho et al.

Table 1 Characteristics of included studies
Table 2 Results for studies included in the diagnostic accuracy analysis
Table 3 Summary of reported statistical comparisons of vNCT with respect to contrast timing

Data analysis

Figure 2 is a forest plot of sensitivity data for the diagnosis of adrenal adenoma using vNCT and NCT for the five included studies. Kim et al. and Botsikas et al. performed vNCT using both 60-second and 15-minute intravenous contrast delays, which are plotted separately. The pooled sensitivity for vNCT is 54% (95% CI: 47 to 62%) with I² = 86.4%, indicating high heterogeneity. NCT data for the adrenal lesions were available for three of the five studies [13, 14, 26]. The pooled sensitivity for NCT is 57% (95% CI: 45 to 69%) with I² = 10.1%, consistent with low heterogeneity. Figure 3 is a forest plot of sensitivities for the subgroup of vNCT where there is corresponding NCT data for each adrenal lesion. The pooled sensitivity for these three studies is 42% (95% CI: 33 to 51%) with I² = 74.9%, at the upper end of the moderate range. Based on the overlapping 95% CIs, there are no significant differences between the vNCT, subgroup vNCT or NCT sensitivities. Pooling of specificity was not performed since the reported specificity for all studies was 100% (no false positives, see Table 2).
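The overlap comparison described above can be checked with a short sketch using the pooled 95% CIs reported in this section (in percent). The dictionary labels and helper name are illustrative choices, not terms from the included studies.

```python
from itertools import combinations

# Pooled 95% confidence intervals for sensitivity, as reported above (%).
intervals = {
    "vNCT (all studies)": (47, 62),
    "NCT": (45, 69),
    "vNCT (subgroup with NCT data)": (33, 51),
}


def overlaps(a, b):
    """True if closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Compare every pair of intervals.
all_overlap = all(
    overlaps(intervals[x], intervals[y]) for x, y in combinations(intervals, 2)
)
print(all_overlap)
```

Each of the three pairs shares a range of values, for example 47 to 51% for the two vNCT estimates.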

Fig. 2

Forest plot of sensitivity for vNCT (top) and NCT (bottom) for diagnosis of adrenal adenoma. Included studies are listed on the right, together with summary accuracy measurements, 95% CIs and pooled sensitivity

Fig. 3

Subgroup forest plot of sensitivities for vNCT for diagnosis of adrenal adenoma on studies that have corresponding NCT data

Table 4 depicts the results of the risk of bias assessment of individual studies in the diagnostic accuracy stream when using the modified QUADAS-2 tool. Two areas of concern regarding risk of bias were identified: the index test and reference standard. The risk of bias for the index test was high for Ho et al. since one of the index test interpreters had also reviewed the cases and interpreted the reference standard [13]. For Botsikas et al., Helck et al. and Kim et al., it was unclear if the index test was interpreted without knowledge of the results of the reference standard. The second area of concern was the reference standard. The risk of bias was unclear for four studies due to heterogeneity of the reference standards applied within studies and/or suboptimal reference standards such as clinical follow-up or unenhanced CT only rather than histologic reference standard. The study by Gnannt et al. was felt to be at high risk of bias because NCT (<10 HU defined adenoma) was used as the reference standard for all cases, representing a form of incorporation bias [19, 27].

Table 4 Results of the QUADAS-2 risk of bias assessment

The I² index demonstrated high levels of heterogeneity between studies for pooled sensitivity of vNCT, whereas heterogeneity between studies for pooled sensitivity of NCT was low. There are several possible reasons for the high heterogeneity in the vNCT group, two of which could be disease prevalence and technical parameters. The percentage of adrenal masses in the study population that were adrenal adenomas varied considerably, from 56% to 83%. The percentage of adrenal adenomas that measured < 10 HU on NCT also varied considerably, from 47% to 70%. From a technical standpoint, contrast timing differed between studies: Botsikas et al. and Kim et al. used both 60-second and 15-minute delays after contrast administration, Gnannt et al. and Helck et al. used 70 seconds, and Ho et al. used 100 seconds. Technical parameters were otherwise comparable between studies. All studies used a cut-off of < 10 HU on vNCT for the diagnosis of adrenal adenoma.

Due to the small number of included studies, subgroup analysis or meta-regression for such things as contrast timing or adenoma prevalence could not be performed.

Discussion

In this systematic review and meta-analysis, we found that vNCT images generated from dual-energy CT demonstrated comparable sensitivity to NCT for the diagnosis of adenomas. These findings are of potential importance because diagnosis of adenomas using vNCT alone could prevent additional follow-up imaging studies and reduce cost and cumulative radiation dose (if NCT is pursued as the next imaging test) to the patient. However, some important limitations should raise caution when interpreting the data, including a high risk of some forms of bias and the low number of included studies; as such, our study may be underpowered to detect differences in sensitivity (there was a potential trend for lower sensitivity for vNCT). In addition, planned pooling of specificity data could not be performed since no FP cases were reported.

High heterogeneity for sensitivity of vNCT was identified (and not for NCT). This may be due to potential overestimation of the density in HU on vNCT as compared to NCT. The degree of overestimation may depend on the intravenous contrast timing of the vNCT. There were statistically significant differences between measured density in HU of adrenal lesions for Botsikas et al. and Kim et al. when comparing the 60-second intravenous contrast delay vNCT and NCT but not between the 15-minute delay vNCT and NCT. There were no statistically significant differences between Gnannt et al. and Ho et al. for vNCT and NCT which had contrast delays of 70 seconds and 100 seconds, respectively. Adrenal adenomas characteristically show early enhancement followed by washout of contrast compared to surrounding tissues which forms the basis of adrenal CT washout studies [10, 29]. A previous study by Szolar and Kammerhuber demonstrated adrenal adenoma enhancement peaking around 60 seconds following intravenous contrast administration [29]. Peak adenoma enhancement at 60 seconds followed by rapid washout could explain why there was overestimation of attenuation values for vNCT derived from 60-second acquisition data that becomes less pronounced with derivation of vNCT from more delayed contrast timing.

Previous studies have consisted of single-center, retrospective data and compared vNCT to NCT using a variety of methods [13, 14, 26–28]. We studied what we believe to be the most important measures of vNCT for adrenal adenomas: diagnostic accuracy and head-to-head comparison of HU density measurements. By combining the data from these studies to create a larger patient population and focusing on these important metrics, we were able to provide better insight into whether vNCT has the potential to reduce the number of follow-up chemical shift MRI or NCT studies performed to further characterize adrenal nodules detected incidentally at CECT. We have demonstrated that differences in measured HU between vNCT and NCT exist and may be attributed to the timing of the contrast-enhanced series from which the vNCT was derived.

Our study had several limitations. Studies utilizing rapid-switching or detector-based dual-energy CT scanners could not be included because none evaluated adrenal lesions using HU values measured on vNCT. In the case of rapid-switching dual-energy CT, measurement of HU on vNCT has only recently been described using multimaterial decomposition and has only been studied in normal anatomic structures [13, 14, 30]. Only five studies, all of which were retrospective and utilized dual-source dual-energy CT scanners, met the inclusion criteria. Heterogeneity between the studies in intravenous contrast timing and adenoma prevalence (specifically the percentage of adenomas measuring < 10 HU on NCT) could affect the measured sensitivity. For the three studies that provided NCT values, only 57% of the adrenal adenomas measured < 10 HU on NCT [13, 14, 26]. This is less than the widely quoted value of 70% of adrenal adenomas being ‘lipid-rich’ or measuring < 10 HU on NCT [5], and leads to a lower sensitivity for vNCT in those same studies because of the lower prevalence of lipid-rich adenomas. This could be due to a selection bias in these studies (i.e. indeterminate lesions were more likely to be investigated). The QUADAS-2 assessment identified potential or unclear risks of bias in the index test and reference standard, which could affect the calculated specificity, sensitivity and measured HU density. Lastly, with the exception of Botsikas et al., we were reliant on published values and unable to access the raw data for the included studies; access to raw data could have provided more data for pooling of specificity of vNCT and NCT and for head-to-head comparison of measured HU of adrenal masses on vNCT and NCT.

Our study demonstrates that virtual NCT has comparable sensitivity to true NCT when using attenuation thresholds for the diagnosis of adrenal adenoma. We were not able to pool specificity results across studies as there were no reported false positive diagnoses of adenoma using vNCT (or NCT) in the evaluated studies. Because we identified only a small number of studies, some with a high risk of certain forms of bias, additional prospective, multicenter trials with an acceptable reference standard are required to validate the diagnostic accuracy of vNCT for the diagnosis of adrenal adenoma. The heterogeneity in the sensitivity of vNCT observed in our study may relate to the timing of the post-contrast acquisition from which the vNCT data set was derived. Studies performed at 60 seconds yielded higher vNCT attenuation values than the corresponding true NCT attenuation values. This important observation will require further analysis but may suggest a limitation of vNCT for diagnosis of adrenal adenoma when the data are derived from early post-contrast acquisitions.