Introduction

Citation bias, resulting from the selective citation of articles based on their results, is a common issue in bio-medical research [1, 2]. The preferential citation of articles with favorable or statistically significant results can lead to over-representation of certain studies in the literature [1, 3]. Combined with use of citation rate as a surrogate for study quality, this process may inflate the perceived value of studies that are highly cited, based solely on the direction or significance of their results [4]. Given its ability to distort the body of information available on a given subject, preferential citation may adversely influence which evidence clinicians, reviews, and clinical practice guidelines choose to inform their imaging-related recommendations and decisions [5].

Citation bias among trials of therapeutic interventions has been well-documented [6,7,8,9]. Diagnostic accuracy research differs from evaluations of therapeutic interventions, since it does not always produce a result of “statistical significance”—rather, it typically provides estimates of test accuracy with confidence intervals.

It is not known whether studies producing higher diagnostic accuracy estimates (analogous to positive or statistically significant results in therapeutic trials) are similarly subject to preferential attention. Although previous work has evaluated associations of accuracy estimates with other forms of bias (e.g., time-lag bias and reporting bias) [10,11,12], to our knowledge, there are no published studies evaluating citation bias in the area of diagnostic accuracy research.

The more frequent citation of results showing higher accuracy in clinical practice guidelines, commentaries or non-systematic reviews could lead clinicians to overestimate the accuracy of imaging tests. Identification of preferential citation in this area of research would therefore be a key initial step in improving the quality of the literature that guides patient care. The purpose of this study was to assess the risk of citation bias in imaging diagnostic accuracy research by evaluating whether studies with higher accuracy estimates are cited more frequently than those with lower accuracy estimates.

Methods

Research ethics board approval is waived for this type of study at our institution.

The study protocol was agreed upon a priori and is available on the Open Science Framework [13].

Search strategy

We employed a convenience sampling strategy, by which primary studies were identified for inclusion from a previously collected series of diagnostic accuracy imaging systematic reviews [14]. A flow diagram outlining study inclusion for the prior study (i.e., our initial convenience sample) is available as Supplementary Fig. 1. The search strategy was as follows: Medline was searched, applying a systematic review filter, in addition to a previously published search filter for meta-analyses or systematic reviews of studies of diagnostic test accuracy (Appendix 1) [15,16,17]. Search results were restricted to radiology, nuclear medicine, and medical imaging journals, as defined by Clarivate Analytics Journal Citation Reports (Appendix 2) [18], and limited to articles published in English between January 1st, 2005, and April 30th, 2016.

All included systematic reviews were on a topic of imaging diagnostic test accuracy, published in imaging journals, and obtained summary estimates of sensitivity and specificity using a hierarchical pooling method.

Study inclusion

All primary studies that were represented in the selected meta-analyses were screened for inclusion in the present study. One investigator (R.F., 4th year medical student) screened all records for eligibility and identified studies for inclusion.

To be included, an article was expected to meet the following criteria: it was a primary study assessing the diagnostic accuracy of at least one imaging test, and it reported at least one original set of sensitivity and specificity estimates or complete 2 × 2 data.

Studies were excluded for the following reasons: the full text could not be located or accessed by two authors through institutional subscriptions; the study was not indexed in Web of Science (citation data could not be obtained); or the study was a duplicate occurrence of an included study from another meta-analysis.

Data collection

Data extraction was independently performed in duplicate by eight authors. Imaging modality, organ-based subspecialty, study design, and sensitivity and specificity were extracted by R.F., W.D. (second year radiology resident), J.S. (clinical epidemiology Master’s student), A.D. (second year medical student), N.K. (MD), T.M. (fourth year medical student), M.W. (third year radiology resident), N.L. (third year medical student), and I.G. (third year medical student). Discrepancies were resolved by consultation with a third author (M.M., radiologist with 10 years of clinical experience and 6 years of experience performing systematic reviews). The 2016 journal impact factor was extracted (not in duplicate) by J.S., A.D., N.K., T.M., W.D., M.W., and N.L. Times cited was extracted by R.F. on February 19th–20th, 2018, for all studies. Article titles, publication date, authorship, journal, and sample size were previously collected for use in a prior study [14].

Complete details regarding data collection and classification are outlined in Appendix 3.

Statistical methods

A negative binomial regression analysis was performed to evaluate the strength of association between Youden’s index (calculated as sensitivity + specificity − 1) and citation rate for the included primary studies, controlling for the following potential confounders: 2016 journal impact factor, imaging modality, organ-based subspecialty, study design, sample size, and source meta-analysis. We hypothesized that there would be a positive association.
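The two quantities entering the regression can be illustrated with a minimal Python sketch. Youden’s index follows the formula given above. The citation rate is assumed here to be times cited divided by the months elapsed between publication and citation extraction; that definition, the average month length of 30.4375 days, the function names, and the example values are all illustrative assumptions rather than the procedure documented in Appendix 3.

```python
from datetime import date


def youden_index(sensitivity: float, specificity: float) -> float:
    """Youden's index J = sensitivity + specificity - 1 (inputs as proportions in [0, 1])."""
    for name, value in (("sensitivity", sensitivity), ("specificity", specificity)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return sensitivity + specificity - 1.0


def months_between(published: date, extracted: date) -> float:
    """Approximate elapsed months, assuming an average month length of 30.4375 days."""
    return (extracted - published).days / 30.4375


def citation_rate(times_cited: int, published: date, extracted: date) -> float:
    """Citations per month since publication (assumed definition, for illustration)."""
    months = months_between(published, extracted)
    if months <= 0:
        raise ValueError("extraction date must be after publication date")
    return times_cited / months


# Hypothetical study: sensitivity 0.90, specificity 0.85, cited 120 times
# between publication and the date on which citation counts were extracted.
j = youden_index(0.90, 0.85)
rate = citation_rate(120, date(2010, 2, 19), date(2018, 2, 19))
print(f"Youden's index = {j:.2f}, citation rate = {rate:.2f} per month")
```

Expressing citations per month, rather than as a raw count, keeps older studies from appearing more cited simply because they have had longer to accumulate citations.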

A sensitivity analysis, excluding any studies for which the month of publication was unavailable, was also performed. Additional subanalyses were performed using the same statistical approach to assess the association of citation rate with sensitivity alone and specificity alone (rather than Youden’s index).

Positive and negative regression coefficients were considered to represent positive and negative associations, respectively, between a given variable and citation rate. The magnitude of the regression coefficient represents the relative strength of association between the variable and citation rate (i.e., a coefficient of 1.0 represents an association twice as strong as a coefficient of 0.5). Specifically, for a one unit change in the diagnostic accuracy estimate, the log of the citation rate is expected to change by the value of the regression coefficient (when other potential confounding variables in the model are held constant). The direction of association (positive versus negative) for categorical variables is interpreted relative to the reference category. A significance level (α) of 0.05 was used for all hypothesis tests. All analyses were performed in R version 3.4.3 [19].
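Because the model acts on the log of the citation rate, a coefficient can also be read as a multiplicative rate ratio by exponentiating it. The sketch below illustrates that conversion under the log-scale interpretation described above; the function name and the coefficient values are hypothetical and are not results of this study.

```python
import math


def rate_ratio(coefficient: float, delta: float = 1.0) -> float:
    """Multiplicative change in expected citation rate for a `delta` change in the predictor."""
    return math.exp(coefficient * delta)


# Illustrative coefficients only (not estimates from any fitted model).
for beta in (0.5, 1.0):
    per_unit = rate_ratio(beta)
    per_tenth = rate_ratio(beta, 0.1)
    print(f"beta = {beta}: rate ratio per 1.0 unit = {per_unit:.3f}, "
          f"per 0.1 unit = {per_tenth:.3f}")
```

Two points follow from this conversion. First, "twice as strong" holds on the log scale: the rate ratio for a coefficient of 1.0 is the square of the rate ratio for 0.5, not double it. Second, since sensitivity and specificity are proportions, a one unit change spans the entire possible range of either measure, so a smaller increment such as 0.1 is often easier to interpret.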

Results

Search and inclusion

We screened 1458 primary studies from 98 meta-analyses for inclusion. After applying inclusion and exclusion criteria, the final analysis included 1016 primary studies from 97 meta-analyses. The detailed study selection process is outlined in Fig. 1. Appendix 4 contains a complete reference list of all included systematic reviews.

Fig. 1

Flow diagram demonstrating study selection. DTA, diagnostic test accuracy; Sens, sensitivity; Spec, specificity; ID, identification

Study characteristics

Publication dates for the included studies ranged from May 1985 to May 2015. Complete publication dates were available for 493 studies (49%), while 523 studies (51%) had to be assigned either the default day (n = 509, 50%) or default month and day (n = 14, 1%). Of 241 publishing journals, 619 included studies (61%) were from imaging journals, the most common being Radiology with 98 (10%) of the included studies. The most strongly represented subspecialty and modality were cardiac imaging (n = 266, 26%) and magnetic resonance imaging (n = 288, 28%), respectively, while the most common study topic was cardiac CT (n = 122, 12%). The most commonly reported study design was prospective (n = 473, 47%), while 220 studies (22%) were retrospective and 323 (32%) did not specify a design. Characteristics of the included studies are summarized in Tables 1 and 2.

Table 1 Summary characteristics of 1016 included primary studies
Table 2 Summary of included study topics by organ-based subspecialty and imaging modality

The mean citation rate among included studies was 0.51 citations per month (95% CI [confidence interval], 0.47–0.55). The highest citation rate was 7.57 citations per month, for a randomized study published in the New England Journal of Medicine, assessing CT virtual colonoscopy versus optical colonoscopy for colorectal cancer screening and reporting a Youden’s index of 0.898.

Preferential citation

A positive association between Youden’s index and citation rates was present; regression coefficient = 0.35 (p = 0.011). After excluding 14 studies with no reported month of publication, this association was maintained with a regression coefficient = 0.34 (p = 0.017).

Assessment of the association between highest reported sensitivity and citation rate yielded a regression coefficient of 0.43 (p = 0.027). The association between highest reported specificity and citation rate was not statistically significant, yielding a regression coefficient of 0.33 (p = 0.14). Regression coefficients for the variables of interest and selected potential confounders are presented in Table 3. A complete set of regression coefficients from each analysis is available in Appendix 5.

Table 3 Association of multiple variables with citation rate, as determined by negative binomial regression analysis

Discussion

A positive association between diagnostic accuracy estimates (Youden’s index, sensitivity) and primary study citation rates was identified. This suggests that studies reporting higher diagnostic accuracy are preferentially cited in the imaging diagnostic accuracy literature.

Despite the popular belief that citation rate is indicative of study quality, it is not surprising that our study identified preferential citation of favorable results. Several prior studies have identified drivers of citation other than study quality, such as study availability, utility of results, and industry funding [4, 20, 21]. Furthermore, a recent meta-analysis found that citation bias is prevalent in the bio-medical literature, where studies with statistically significant results, positive results, hypothesis-supporting results, or favorable conclusions are cited more frequently [1]. This issue has been identified for therapeutic trials on topics of cardiology, gastroenterology, orthopedics, immunology, addiction medicine, and psychiatry, among others [6, 9, 22,23,24,25], but not previously for diagnostic accuracy research (imaging or otherwise).

Our results suggest that the preferential citation in the included studies is largely driven by sensitivity estimates, as the regression coefficient for specificity was smaller and not statistically significant. This finding is consistent with the results of a recent study assessing time-lag bias in imaging diagnostic test accuracy studies, which found that higher Youden’s index and sensitivity, but not specificity, were associated with more rapid study publication [12]. The occurrence of this pattern of bias in two independent studies suggests that sensitivity might be considered more important than specificity by authors in the imaging research community. This notion is further supported by the fact that 29 studies were excluded from our analysis for not reporting specificity, compared to only two failing to report sensitivity. However, it is interesting that the regression coefficient was similar for Youden’s index and specificity; as with any negative result, it is possible that our study was underpowered to detect an existing association between specificity and citation rate.

The significant positive association demonstrated between citations and impact factor was expected. The two metrics are related—journal impact factor is, by definition, a function of study citations within that journal. In addition, there is a popular perception that journal impact factor is a reliable surrogate for study quality, which could promote citations [26].

The findings of our study might warrant concern for the field of diagnostic imaging and general clinical practice. In order to make informed decisions regarding the utility of a diagnostic imaging test, radiologists and clinicians should have balanced exposure to all of the relevant evidence. The impact of preferential citation is likely greatest for topics that have not been summarized by systematic reviews. Well-conducted systematic reviews theoretically consider all articles relevant to a topic, regardless of accuracy estimates. However, non-systematic reviews, commentaries, and clinical practice guidelines often rely on ad hoc methods for study inclusion and citation. As such, these non-systematic evaluations may (likely unintentionally) select and cite studies with higher accuracy and contribute to citation bias.

Imaging results are often instrumental in guiding patient care decisions, with substantial influence on patient outcomes. If the citation pattern detected in this study is partially attributable to selective inclusion of evidence in diagnostic accuracy review articles, this could potentially translate to flawed clinical practice guidelines. Through disproportionate representation of studies with high accuracy estimates in reviews and guidelines, the resulting citation bias could drive clinicians to overestimate the accuracy of imaging tests. Overestimating sensitivity, in particular, can lead to false reassurance and delayed diagnosis, potentially contributing to adverse health outcomes.

It is important that physicians and researchers are aware of citation bias in the literature in order to mitigate its impact on patient care. At the patient-care level, clinicians should seek high quality systematic reviews or perform an in-depth search of the primary literature in order to adequately answer their clinical questions. From a publication perspective, journal editors and authors (particularly those of clinical guidelines and non-systematic reviews) should endeavor to minimize misconceptions by including explicit statements regarding how well the cited studies reflect the trends in the literature [1].

Our study is subject to limitations. We employed a convenience sampling strategy, drawing from a previously collected set of meta-analyses. However, it is unlikely that this conferred substantial bias, as the included primary studies effectively represent a random sample, and we controlled for the source meta-analysis in the statistical analysis. While we accounted for several key potential confounding variables in the statistical analysis, additional study characteristics potentially influencing both accuracy estimates and citation rate were not feasible to assess, such as study methodology and quality. While it might have been ideal to account for the influence of study quality on citation rate, this might represent an unnecessary (and arguably futile) endeavor in that there is no universal quality metric for diagnostic accuracy studies that might influence citations. While completeness of reporting as measured by STARD can be considered a reproducible quality metric, multiple studies have found no association between STARD adherence and citations [27, 28]. Furthermore, another useful quality measure, QUADAS-2, utilizes a variable scoring system specifically tailored to each individual study and therefore could not be reliably compared between multiple studies for a regression analysis [29]. Given our very large sample size (1016 primary studies cited a total of 69,330 times), it was not feasible to assess the nature of each individual citation. Thus, we are unable to comment on the sources of citations (e.g., self-citation, recent study building upon prior work, bibliometric study, review article, and clinical guideline) and the context of the citations (e.g., referencing accuracy estimates versus other study data, supporting or refuting cited information, and appropriate or inappropriate interpretation of cited results). 
Our initial evaluation of citation trends in diagnostic accuracy research has identified preferential citation of primary studies with higher diagnostic accuracy estimates. While this could represent over-inclusion in reviews and guidelines, other drivers of this phenomenon (e.g., more recent studies tending to reproduce or expand upon prior studies that show promising results) must also be considered. Future detailed analyses of the nature of these citations would be useful in clarifying the relative influence of citation bias in diagnostic imaging research. While it would theoretically be possible for a future study to perform a similar analysis accounting for the nature of citations, the feasibility of this is questionable. Firstly, simply identifying the individual sources for hundreds of citations would prove to be a daunting task, and assessing the nature of these citations would undoubtedly be both subjective and labor-intensive. Ultimately, we expect that this would necessitate a substantial reduction in sample size, thereby sacrificing statistical power (an important strength of the present study). Another barrier to accounting for the nature of citations is that a major driver of citation might be the novelty of a topic, which can change abruptly over time. For instance, current articles on the topics of artificial intelligence and machine-learning may be preferentially cited regardless of accuracy results, but this could rapidly change as these technologies become more commonplace. Given the inevitable delay between data collection and publication of a manuscript, accounting for this variable in a study might actually render the study less relevant by the time of publication. We also did not account for the date of publication of the included studies in our analysis.
Our study included primary studies published over a 30-year period, but we have no reason to suspect a systematic increasing or decreasing trend over time in reported accuracy estimates, though citation practices may have evolved substantially in this time frame. The negative binomial regression model used may be criticized for not accounting for “0” counts of citations appropriately; however, the impact of this is likely minimal since only two of more than 1000 studies were not cited. In addition, there may be some bias introduced by citation of the primary studies in the “source meta-analysis.” However, the impact of this is also likely to be minimal, given that the median and mean citations per included study were 40 and 60, respectively. Furthermore, the citation of the primary studies in the source meta-analysis was not a consistent practice (some were listed in appendices rather than in the reference lists). Due to the nature of meta-research, our included studies were limited to published articles, so we were also unable to account for publication bias (resulting from a failure to publish disappointing results) as a potential confounding variable. We only assessed the association of objective results with citation rates; however, in the meta-analysis by Duyx et al. [1], the variable demonstrating the strongest association with citations was favorable author conclusions. A similar phenomenon might be present among imaging diagnostic test accuracy studies, as over-interpretation of study results has been established as a common practice in this area of research [30, 31]. Therefore, we could have underestimated the true magnitude of citation bias in the imaging diagnostic test accuracy literature. This is an important and feasible topic of interest to be explored in a future study.

The preferential citation pattern identified in this study, with higher accuracy estimates cited more frequently than lower accuracy estimates, suggests that citation bias might exist in the imaging diagnostic accuracy literature, which could lead to overestimation of test accuracy. Downstream consequences of such bias could include misdiagnosis and adverse health outcomes. With identification of this risk as a key initial step, future studies exploring other potential drivers as well as the context of these citations are warranted in order to further define the influence of citation bias in imaging research. While this topic remains under investigation, clinicians, review authors, and journal editors should exercise vigilance in their efforts to optimize citation practices and enhance the quality of imaging literature.