Introduction

A number of studies have identified and described the continuing use of retracted publications in the biomedical literature. Budd et al. (1998) found 235 articles published from 1966 through 1997 that were tagged in MEDLINE as retracted, noting that these articles were cited 2,034 times after retraction; most of these citations were “implicitly positive,” and 275 of the 2,034 were explicitly positive. Similarly, Nath et al. (2006) used MEDLINE to identify 395 articles published between 1982 and 2002 and subsequently retracted, and categorized the reasons for retraction as misconduct or unintentional error. They estimated that retractions are more than twice as likely to result from unintentional mistakes as from scientific misconduct.

Recently, Cokol et al. (2008) reported that retraction rates are increasing, and author unawareness is likely a factor in the continuing use of retracted literature (Budd et al. 1998, 1999; Snodgrass and Pfeifer 1992). In the past, retractions and corrections were not always accessible or obvious to database users (Snodgrass and Pfeifer 1992; Pfeifer and Snodgrass 1992; Neale et al. 2007). Journals and libraries have taken a variety of approaches to handling retractions and requests for retractions (Pfeifer and Snodgrass 1992; Atlas 2004; Friedman 1990; Parrish 1999). This landscape seems to be improving with greater awareness of scientific misconduct and of the need for policies for retracting affected articles (Sox and Rennie 2006), plus improvements in databases (Garfield et al. 2006; National Library of Medicine 2008; Norton and Saltman 2007).

Our study of retracted and corrected publications took a different approach: we started with articles affected by scientific misconduct, and then identified whether they were retracted or corrected, the extent to which they were subsequently cited in other papers, and the way that citing authors used the information in these affected papers. The use of literature affected by misconduct is concerning because it has the potential to misdirect subsequent research and clinical care (Budd et al. 1999; Benos et al. 2005; Couzin and Unger 2006; Gardner et al. 2005; Katz 2006; Roberts et al. 2007).

Study Background

In a previous paper from the same study, we identified 102 published research articles that were named in official findings of scientific misconduct, and reported on whether they had been retracted or corrected per the administrative actions in the scientific misconduct report (Neale et al. 2007). To identify the 102 articles affected by scientific misconduct, the study methods involved a content analysis of all the “Findings of Scientific Misconduct” published in two public sources (the NIH Guide for Grants and Contracts, and the Annual Reports of the U.S. Office of Research Integrity) during the period of 1993–2001. We also determined the number of citations to these 102 affected articles by the retraction or correction status of each.

The purpose of this report is to characterize the papers that subsequently cited the 102 articles affected by scientific misconduct. Three study objectives are addressed: (1) Characterization of a sample of published papers that cited the articles affected by scientific misconduct, using a quantitative content analysis methodology. Specifically, we evaluated how the citing authors used the affected article, and determined whether the citing paper contained an indication of awareness that it was citing an article affected by scientific misconduct. (2) Evaluation of the hypothesis that affected articles (i.e., those named in findings of scientific misconduct) have fewer citations than those in a comparable group of articles unaffected by scientific misconduct. (3) Consideration of whether clinical practice might have been influenced by any of the 102 articles affected by scientific misconduct. To address this third objective, we used a case study approach to examine those papers that used an affected clinical article to directly support their own study purpose, and we supplemented our observations with literature searching and expert opinion.

Methods

Articles Affected by Scientific Misconduct

We used the same sample of 102 articles affected by scientific misconduct described by Neale et al. (2007). As previously detailed, we used the Cited Reference Search in the Institute for Scientific Information (ISI) Web of Science (Thomson Reuters 2009) to determine the number of citations by subsequent authors to the 102 articles affected by misconduct. The citation analysis was conducted during the week of May 17, 2005; as of that time, the Web of Science database listed 5,393 citations to the 102 articles (range: 0–592; median: 26 citations per article) (Neale et al. 2007).

Stratified Random Sample of Citing Papers

To address the first study objective (to characterize citing papers), we conducted a content analysis of papers that cited the articles affected by scientific misconduct. As it was not feasible to perform a content analysis on the population of 5,393 citing papers, we developed a stratified random sampling strategy, stratified on the number of citations, that included at least one citation to each of the 86 affected articles that had been cited (16 of the 102 affected articles had no citations at the time of data collection). If an affected article had 1–3 citations (n = 22), we randomly selected 1 of its citing papers; if an affected article had ≥4 citations (n = 64), we randomly selected 33% of its citing papers. This methodology also allowed citations to accrue for a minimum of three years after publication.
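As an illustration of this sampling rule, the following is a minimal Python sketch. The `citing_ids` mapping, the random seed, and the upward rounding of the 33% fraction are our own assumptions for the example, not details reported by the study:

```python
import math
import random

random.seed(17)  # illustrative seed for reproducibility (assumption)

def sample_citing_papers(citing_ids, rate=0.33):
    """Stratified sample of citing papers, stratified on citation count.

    citing_ids: dict mapping each affected article ID to a list of the
    IDs of its citing papers (hypothetical structure, not study code).
    """
    sample = {}
    for article, citers in citing_ids.items():
        if not citers:
            continue  # 16 of the 102 affected articles had no citations
        if len(citers) <= 3:
            # Stratum with 1-3 citations: select one citing paper
            sample[article] = random.sample(citers, 1)
        else:
            # Stratum with >=4 citations: select 33% of citing papers
            # (rounded up here; the paper does not state how fractional
            # sample sizes were handled)
            k = math.ceil(rate * len(citers))
            sample[article] = random.sample(citers, k)
    return sample
```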

Using this stratified random sampling strategy, 603 articles were drawn from the population of 5,393 citing articles (excluding the approximately 65 articles used for the content analysis training). The following eligibility criteria were used to select the stratified random sample of citing articles for the content analysis: (1) only English-language articles; (2) only articles denoted in PubMed as a research study or a review article; (3) only articles for which we could obtain a copy (by open access, institutional subscription, Loansome Doc, or interlibrary loan); and (4) only citing papers published at least one year after the first official notification of the finding of scientific misconduct. When an ineligible citing paper was sampled (e.g., not in English, a letter to the editor, or a copy that could not be obtained), another was selected at random from the same stratum; this occurred in <10% of sampled papers.

Content Analysis of Citing Papers

During the content analysis, each of the 603 sampled papers was read and the following information abstracted and recorded: (1) the type of study in the citing paper (human clinical, human basic science, or animal); (2) whether the affected article was referenced explicitly (e.g., direct naming of the author or the study findings) or implicitly (e.g., embedded in a string of citations); (3) evidence of awareness that the article was affected by scientific misconduct (e.g., citation of a retraction/correction or discussion of the scientific misconduct investigation); and (4) the nature of how the affected article was used in the citing paper (direct support/contrast, indirect support/contrast, benign use, or other; see Table 1 for operational definitions).

Table 1 Operational definitions for content analysis of how the affected article was used by the citing paper.

Content Analysis Training

Content analysis training was conducted in two phases. In the first phase, the principal investigator (AVN) and the study coordinator separately abstracted 20 citing papers using an open-ended data abstraction form. After a discussion of findings, a closed-ended data abstraction form was developed and used in a second pilot study of 20 different citing papers. After another review and discussion, the content analysis data abstraction form was revised into its final form.

A majority of the 102 articles affected by misconduct were basic science studies (52.9% animal; 20.6% human), and 26.5% were human clinical studies. An advanced graduate student completing a Master of Science degree in our medical school’s biomedical sciences program was retained to read and abstract the 603 citing papers, most of which were published in basic science journals. The graduate student was trained by the study coordinator in the methods of the content analysis using 10 citing papers. We then conducted two analyses of the proportion of abstractions in which the graduate student and the study coordinator disagreed. In the first analysis, they independently abstracted 15 citing papers and disagreed on only one variable in each of three papers. Consensus on the coding of the three discrepancies was reached by discussion. In the second analysis, they each abstracted 21 new citing papers and disagreed on one variable in each of five papers. Following further discussion of coding nuances, the graduate student read and independently abstracted the remaining citing papers sampled for the content analysis. Occasional questions were resolved with the study team.
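For reference, the paper-level agreement rates implied by these counts can be computed directly; the sketch below uses only figures from the text (the round labels are ours):

```python
# (papers abstracted, papers containing a disagreement), from the text
rounds = {"first analysis": (15, 3), "second analysis": (21, 5)}

for name, (n_papers, n_disagree) in rounds.items():
    agreement = (n_papers - n_disagree) / n_papers
    print(f"{name}: {agreement:.0%} of papers coded without disagreement")
# first analysis: 80%; second analysis: 76%
```

Note that this is simple paper-level percent agreement, not a chance-corrected statistic such as Cohen’s kappa, which the study did not report.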

Comparison Group

To test the hypothesis in the second study objective, we developed a matched comparison group to determine whether the affected articles were cited at a rate comparable to similar publications that were not named in “Findings of Scientific Misconduct”. The methodology for identifying comparison articles was to select one article from the same year and journal issue as each affected article. Other eligibility criteria for selection as a comparison article were: (1) different authors than the affected article; (2) no evidence that the article was affected by misconduct, as indicated by corrections or retractions indexed in PubMed; (3) same type of paper: original research report or review article; (4) same type of study: basic science (animal or human in vitro) or human clinical (determined from the MEDLINE database and the content analysis); (5) indexed in the Institute for Scientific Information (ISI) Web of Science (Thomson Reuters 2009); and (6) printed adjacent to (preferably immediately after) the affected article. If the affected article was the last article in the issue, then the article just before it in that issue was selected. If the selected article did not meet all inclusion criteria, then the next adjacent article was sampled, until an eligible article was found. The number of citations to this comparison group was determined from the Cited Reference Search in the ISI Web of Science database (Thomson Reuters 2009) during the period of 8/8/2005–8/23/2005. Because the distribution of the citation data was skewed, we used the Wilcoxon signed rank test to compare the number of citations for the comparison group to that for the affected articles during the period of 05/17/2005 through 05/20/2005 (Neale et al. 2007).
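A minimal sketch of this paired, non-parametric comparison, using SciPy and purely illustrative citation counts (the study’s actual results appear in Table 3):

```python
from scipy.stats import wilcoxon

# Hypothetical matched citation counts: each affected article is paired
# with a comparison article from the same journal issue. Values are
# illustrative only, not the study's data.
affected   = [26, 0, 592, 14, 31, 8, 112]
comparison = [27, 3, 480, 20, 25, 9, 130]

# The Wilcoxon signed rank test ranks the paired differences, so it
# does not assume normally distributed (i.e., unskewed) citation counts.
stat, p_value = wilcoxon(affected, comparison)
print(f"W = {stat}, p = {p_value:.3f}")
```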

Consideration of Effect on Clinical Practice

To consider how articles affected by misconduct might have affected clinical care (the third study objective), we identified the four papers (Caen and Han 1993; Maier and Watkins 1999; Whitsett 1995; Zauli and Catani 1995) that used an affected article for direct support of a study purpose or finding, and evaluated any evidence that they might have influenced clinical practice. We also used other qualitative approaches (literature searching and expert opinion) to gain insight into the clinical use of findings from such affected articles.

Results

Characterization of Citing Papers

The citation to an article affected by misconduct was embedded in a string of references in 61% of the 603 sampled citing papers; there was an explicit reference to an affected article in 39% of citing papers. Table 2 shows how the affected article was used by subsequent citing authors: 8.6% of citing papers explicitly used the affected article as either direct support or direct contrast of their study, and 54.1% implicitly used the affected article as indirect support or indirect contrast. (See Table 1 for definitions of how the citing papers used the affected articles.) One-third (32.8%) of citing papers did not address invalid information in the affected article (i.e., they used a portion of the affected article that was not affected by the misconduct, such as the literature review or a description of a methodological technique). Only 2.8% of the 603 citing articles referenced the corrigendum (i.e., a retraction, erratum, or comment tag) to the article affected by scientific misconduct.

Table 2 Nature of use of 86/102 articles affected by scientific misconduct in 603 subsequent citing papers.

Citations to Affected and Comparison Articles

Table 3 shows that the 102 articles affected by misconduct had a median of 26 citations, and the 102 comparison articles had a median of 27 citations (p = 0.08). Thus, the hypothesis that affected articles named in findings of scientific misconduct will have fewer citations than those in a comparison group (study objective 2) was not supported.

Table 3 Comparison of citations to articles affected by scientific misconduct and a matched comparison group.

Influence of Affected Articles on Clinical Practice

With regard to study objective 3, Table 2 shows that 50 of the 603 sampled citing papers explicitly used an affected article for direct support of a study purpose; four of these 50 papers cited a clinical study. Neither of the two papers that explicitly used an affected paper for direct contrast was a clinical study. We reviewed the four citing papers for any evidence that they might have affected clinical care. Each of these four papers (Caen and Han 1993; Maier and Watkins 1999; Whitsett 1995; Zauli and Catani 1995) cited the same clinical article affected by scientific misconduct, titled “Preliminary report: Effects of interleukin-1 on platelet counts” and authored by Anand Tewari, William Buhles and H. Fletcher Starnes (Tewari et al. 1990). In part, the abstract of this article states the following:

“Recombinant human interleukin-1β was given in 5 daily intravenous infusions to ten patients with metastatic malignant disorders as part of an antineoplastic trial…. A 50% rise in platelets occurred in response to interleukin-1β… Interleukin-1β may therefore be beneficial in the treatment of conditions of thrombocytopenia associated with haematological disorders and chemotherapy for malignant disorders.” (Tewari et al. 1990).

As indicated in PubMed, Buhles and Starnes (1992) retracted their paper co-authored with Tewari on 08/22/1992. The retraction states that some arithmetical errors were made in summarizing the data and that, although the overall conclusions as originally stated are valid, they must retract the paper.

The four papers that cited the Tewari et al. (1990) article were all review articles published after the 1992 retraction. The citing paper by Caen and Han (1993) was published in 1993, and it is possible that the retraction (Buhles and Starnes 1992) was posted around the time of, or even after, the submission of their manuscript for publication. The other three citing papers were published after longer time lags: the Whitsett (1995) and Zauli and Catani (1995) papers were published in 1995, and the Maier and Watkins (1999) review article was published in 1999. Three of these four review articles (Caen and Han 1993; Whitsett 1995; Zauli and Catani 1995) used the Tewari et al. (1990) article to support similar statements that interleukin-1 increases platelet production: Whitsett (1995) added that it “could increase platelet counts in cancer patients”, and Zauli and Catani (1995) stated that it “promotes platelet production in clinical trials.” Maier and Watkins (1999) cite Tewari et al. (1990) with: “the administration of cytokines to humans produces reports of depressed mood”; this is curious because the Tewari et al. (1990) article did not measure or discuss depression or any mood state.

To gain insight into the possible current use of interleukin-1 to increase platelet production, we searched the Clinical Practice Guidelines in Oncology™ published by the National Comprehensive Cancer Network, Inc. (2009), and did not identify any treatment guidelines suggesting interleukin-1 to increase platelets. We also consulted with four physicians and an advanced practice nurse affiliated with our National Cancer Institute-designated comprehensive cancer center. Each said they were not aware that interleukin-1 had ever been a standard of care to increase platelets. Based on this informal survey, we found no suggestion that the Tewari et al. (1990) paper affected clinical oncology care.

Discussion

The judgment of scientific misconduct and the publication of subsequent corrigenda (notices of findings of misconduct, retractions or corrections) in PubMed did not seem to influence the number of citations to the affected articles, which were cited at a level similar to that of a scientifically drawn comparison group. Indeed, fewer than 5% of citing authors evidenced any awareness that they were citing an article named in a judgment of scientific misconduct. Depending on the method used to locate and access such affected papers, the authors of a citing paper may never have been exposed to the retraction (Neale et al. 2007). Recently, Eugene Garfield and colleagues (Garfield et al. 2006) noted that the Science Citation Index in the ISI Web of Science offers a reliable approach to finding retractions.

The majority of citing authors used an affected article in an implicit or indirect way, or referred to a portion of the affected article that was not discredited (e.g., the study methods). Four of the 603 sampled citing papers (all review articles) explicitly referred to one affected clinical study (Tewari et al. 1990) for direct support of their purpose, and it is unlikely that this single clinical study had a significant effect on clinical equipoise or clinical care.

Limitations

Several limitations to the study methodology are acknowledged. First, we were unable to determine the dates when retractions or errata were posted, and thus to identify which citations accrued before such postings. Second, the selection of the most appropriate comparison group was challenging. We used a matched comparison group strategy by selecting an adjacent article in the same issue. This reduced factors that could affect citations, such as the journal impact factor, the journal’s open access status, and the length of time since publication for citations to accrue. Third, it would have been ideal to identify the citations to the population of articles affected by misconduct and to the comparison sample during the same brief time window; however, the labor-intensive nature of the data collection resulted in slightly different time windows. The progression of work was such that we first identified the citations to the affected articles, then developed the comparison sample, and then identified the citations to the comparison sample.

Conclusion

Although most articles named in misconduct investigations have identifiable corrigenda (Neale et al. 2007), few citing articles reference such retractions or corrections in their bibliography. Citing articles evidenced little awareness that they were using an article affected by scientific misconduct, which in part may be due to past barriers to identifying retractions and corrections (Snodgrass and Pfeifer 1992; Neale et al. 2007). In spite of the ongoing use of discredited literature, science tends to be self-correcting through the processes of replication and post-publication peer review (Cokol et al. 2007; Couzin 2006; Poulton 2007), with occasional notable exceptions (Katz 2006). Although others have noted that citations diminish after retractions and public exposure of misconduct (Garfield and Welljams-Dorof 1990; Pfeifer and Snodgrass 1990), we found a similar level of citations to the papers affected by scientific misconduct and to those in a matched comparison group. Authors may not use all of the available information about the validity of some of the biomedical literature, yet there is little indication that the articles affected by scientific misconduct used in this research had an adverse effect on clinical practice. Nevertheless, the use of tainted literature likely still has the untoward consequences of wasting time and effort as well as undermining public trust in the integrity of the scientific enterprise (Sox and Rennie 2006; Benos et al. 2005; Horton 1999; Tobin 2000).