Introduction

One of the important scientific questions pertaining to modern agriculture is whether genetically modified (GM) foods contain chemicals that may place human health at risk. The answer to this question has policy implications, since many have argued that GM foods should be banned, regulated, or clearly labelled because they may be unsafe to eat (Environmental Working Group 2012; Buiatti et al. 2013; Public Citizen 2014). Proponents of GM crops have argued that the scientific evidence shows that GM foods are just as safe to eat as normal foods and can play an important role in combatting world hunger (Lemaux 2008; Batista and Oliveira 2009; Buiatti et al. 2013). Opposition to GM crops has been especially strong in Europe. Environmental groups, farmers, and concerned citizens convinced the European Union to ban the cultivation of GM crops (labelled by critics as “Frankenfoods”), which it did from 1997 to 2010. The EU now allows member nations to decide whether to permit the cultivation of GM crops and requires labeling of GM foods. Several countries in the EU and other parts of the world have banned the importation or commercialization of GM foods (Resnik 2012).

On September 19, 2012, Gilles-Eric Séralini and seven coauthors published an article in Food and Chemical Toxicology (FCT) that added fuel to the simmering controversy over GM foods and crops. The article claimed that rats fed Roundup®-resistant GM maize alone, GM maize with Roundup®, or Roundup® for 2 years had a higher percentage of tumors and kidney and liver damage than normal controls. The authors attributed these results to the endocrine-disrupting effects of Roundup® and the metabolic impact of consumption of the transgene in GM maize (Séralini et al. 2012). Prior to the publication of the article, the authors shared their manuscript with a select group of journalists and took the unusual step of requiring them to sign an agreement not to share it with third parties to avoid leaks of sensitive information (Nicole 2012). Shortly after this study was published, numerous scientists and several scientific organizations, including the European Food Safety Authority and Société Française de Pathologie Toxicologique, argued that the research was severely flawed on methodological and ethical grounds (Nicole 2012; Butler 2012).

In its March 2013 issue, FCT published over two dozen critical responses to the study, some of which called for retraction of the article. The journal retracted the article without the authors’ consent in its January 2014 issue (Food and Chemical Toxicology 2014a, b). FCT required the authors to submit their raw data for review before making its decision. The editors found no evidence of fraud or misrepresentation, but they determined that the sample size was too small to draw useful conclusions and that statistical variability could not be excluded as a reason for the observed results. Although the editors did not claim that the results were incorrect, they determined that they were inconclusive and that the article did not reach the threshold of scientific rigor needed for publication in FCT (Food and Chemical Toxicology 2014a, b). This was not the end of the study, however. On June 24, 2014, the authors republished a slightly modified version of their article in Environmental Sciences Europe (ESE), a new, open access journal (Séralini et al. 2014a, b). Several of the authors also published an accompanying commentary in the journal in which they disagreed with the retraction decision and claimed that their critics had undisclosed conflicts of interest (Séralini et al. 2014a, b).

Should the Séralini study have been retracted? Should it have been republished? How should the editors have handled the ethical and scientific issues related to this research? This article will consider these and other questions related to editorial decision-making in this episode. It will first review the Séralini study, its findings, the charges of the critics, and the authors’ responses to these charges.

The Séralini Study

Numerous studies of laboratory animals fed GM foods have found no evidence of risks to human health (EFSA GMO Panel Working Group on Animal Feeding Trials 2008; Lemaux 2008; Batista and Oliveira 2009; Snell et al. 2012). Most of these studies are 90-day trials that compare rodents fed GM food to a control group (see Hammond et al. 2004; Zhu et al. 2013), but they also include experiments lasting 2 years and multi-generational investigations (Snell et al. 2012). Outcome measures typically include mortality, tumor formation, and organ and tissue damage (EFSA GMO Panel Working Group on Animal Feeding Trials 2008). A potential shortcoming of 90-day trials is that they involve feeding rodents a high dose of a product for a short time period but do not assess the long-term effects of consumption of GM foods at lower doses. Since the human diet often involves prolonged consumption of GM foods at modest levels, it may be difficult to draw conclusions for human health from short-term feeding studies (Séralini et al. 2014a, b). However, long-term studies can be difficult to interpret because many strains of rodents, including those used by Séralini and collaborators, spontaneously develop tumors and other health problems after 18 months (Brix et al. 2005). To conduct a rigorous long-term feeding study, it is therefore necessary to design it so that it can distinguish between adverse effects due to the diet and adverse effects resulting from normal deterioration of rodent health (Grunewald and Bury 2013).

The goal of Séralini’s study was to measure the effects of feeding rats Roundup®-resistant NK603 GM maize and Roundup® over a 2-year period. Roundup® is an herbicide containing the active ingredient glyphosate. The study randomly assigned 100 male and 100 female Sprague-Dawley rats to ten different groups, each with ten males and ten females: three groups fed Roundup®-resistant GM maize at three different doses (11, 22, and 33 % of the diet), three groups fed Roundup®-resistant GM maize at the three different doses pre-treated with Roundup®, three groups given water containing Roundup® at three increasing doses, and one control group fed a diet of 33 % non-GM maize. The animals were monitored twice a week and 11 urine and blood samples were collected before, during, and at the conclusion of the study. Food and water consumption, weight, behavior, physical appearance, and palpable tumors were also measured or assessed. The animals that had not died were euthanized at the end of the study. Pathological and biochemical analyses were conducted. Outcome measures included mortality, tumor formation, and liver and kidney damage. The authors used quantitative Polymerase Chain Reaction (qPCR) to detect the presence of the transgene in the GM corn and mass spectrometry to measure glyphosate levels in the food and water. Multivariate statistics were used to evaluate biochemical data; however, no statistical tests were used to determine whether there were significant differences between groups in terms of the main outcome measures.

The study found that males fed GM maize at 11 %, GM maize + Roundup® at 11 and 22 %, and Roundup® at the middle dose had a higher mortality than controls. Females in all treatment groups had higher mortality rates compared to controls. However, the mortality results were not dose-dependent: males fed GM 33 % had a lower mortality than controls, and males fed GM 33 % and Roundup® had the same mortality as the controls. Males fed Roundup® at the highest dose also had a lower mortality than controls. The highest female mortality rates were found in groups fed GM at 22 % or GM at 22 % and Roundup®. However, females fed GM at 33 % and GM at 33 % and Roundup® had lower mortality than those fed GM at 22 % or GM at 22 % and Roundup®. Males in all experimental groups had a higher percentage of liver, kidney, and hepatodigestive damage than controls, but this effect was not dose-dependent. Females in all experimental groups had a higher percentage of mammary and mammary gland tumors than controls, but this effect also was not dose-dependent. While seven out of nine female experimental groups had a higher percentage of pituitary tumors than the control, two groups (GM at 33 % and GM at 22 % and Roundup®) had a lower percentage of tumors than controls. The article included graphic images of rats with enormous tumors. The authors hypothesized that the differences they observed between experimental groups and the control group were due to the metabolic impacts of the transgene and the endocrine-disrupting effects of glyphosate. They concluded that NK603 GM maize and Roundup® can cause health problems in rodents and that long-term studies of the risks of GM foods and pesticides are warranted (Séralini et al. 2012).

Criticism of the Study

Arjo et al. (2013) outline ten different problems with the Séralini study. The most important problems cited by critics included:

Small Sample Size

Numerous critics argued that the sample size (ten per male and female group) was too small to draw meaningful conclusions, especially for long-term studies in which the laboratory animals are likely to develop significant health problems after 18 months (Arjo et al. 2013; de Souza and Macedo 2013; Grunewald and Bury 2013; Hammond et al. 2013; Robert et al. 2013; Tien and Huy 2013). According to several critics, the control group and nine experimental groups should have included at least 50 animals per sex (Grunewald and Bury 2013; Hammond et al. 2013; Langridge 2013; Robert et al. 2013; Tien and Huy 2013).
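The sample-size objection can be made concrete with a rough power calculation. The sketch below uses the textbook normal-approximation formula for comparing two proportions; it is not the method used by any of the cited critics, and the event rates passed to it are hypothetical values chosen only for illustration, not figures from the Séralini data. The function name n_per_group is likewise an assumption of this sketch.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(p_control, p_treated, alpha=0.05, power=0.80):
    """Approximate animals needed per group to detect a difference between
    two event proportions with a two-sided test (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    p_bar = (p_control + p_treated) / 2            # pooled proportion under H0
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p_control * (1 - p_control) + p_treated * (1 - p_treated))
    ) ** 2
    return ceil(numerator / (p_control - p_treated) ** 2)


# Hypothetical rates for illustration only (NOT taken from the study):
# a 50 % spontaneous tumor rate in controls versus 80 % in a treated group.
print(n_per_group(0.5, 0.8))
# A smaller hypothetical effect (50 % versus 70 %) needs more animals still.
print(n_per_group(0.5, 0.7))
```

Under these assumed rates the required group size comes out well above ten animals, and it grows quickly as the assumed effect shrinks or the background rate of spontaneous tumors rises, which is the core of the critics' argument about long-term studies.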

Poor Choice of Laboratory Animals

According to several critics, Sprague-Dawley rats should not have been used in this experiment, given their tendency to develop health problems by 18 months of age (Arjo et al. 2013; de Souza and Macedo 2013; Grunewald and Bury 2013; Tien and Huy 2013). To minimize confounding factors related to the natural deterioration of health, the study should have used a rodent with a better long-term track record (Arjo et al. 2013).

Flawed Statistical Analysis

Several critics pointed out flaws with the statistical analysis used in the study, such as the lack of a power analysis to justify the sample size (Arjo et al. 2013), the absence of any formulation of statistical hypotheses to be tested (Arjo et al. 2013), and the absence of a statistical analysis of the main endpoints of the study (Arjo et al. 2013; Ollivier 2013; Sanders et al. 2013). Several critics analyzed the data for the main endpoints (mortality, tumors, organ and tissue damage) and found no statistically significant differences between groups (Ollivier 2013; Panchin 2013; Sanders et al. 2013).
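To illustrate the kind of endpoint test the critics had in mind, the following sketch implements a two-sided Fisher's exact test for a 2 x 2 table, a common choice for comparing event counts between two small groups. It is not a reconstruction of any critic's reanalysis, the counts are hypothetical and not taken from the study, and the function name fisher_exact_two_sided is an assumption of this sketch.

```python
from math import comb


def fisher_exact_two_sided(events_a, n_a, events_b, n_b):
    """Two-sided Fisher's exact p-value for events_a of n_a versus
    events_b of n_b, summing all tables no more likely than the observed one."""
    total_events = events_a + events_b
    denom = comb(n_a + n_b, total_events)
    lo = max(0, total_events - n_b)   # fewest events group A could have
    hi = min(n_a, total_events)       # most events group A could have
    # Hypergeometric probability of each possible table with fixed margins.
    probs = {
        k: comb(n_a, k) * comb(n_b, total_events - k) / denom
        for k in range(lo, hi + 1)
    }
    p_observed = probs[events_a]
    # Small relative tolerance guards against floating-point ties.
    p_value = sum(p for p in probs.values() if p <= p_observed * (1 + 1e-9))
    return min(1.0, p_value)


# Hypothetical counts for illustration only (NOT from the study):
# 3 deaths among 10 control animals versus 6 deaths among 10 treated animals.
print(round(fisher_exact_two_sided(3, 10, 6, 10), 3))
```

With groups of ten, even a doubling of the hypothetical death count does not approach the conventional 0.05 threshold, which helps explain how visibly different percentages between groups can still fail to reach statistical significance.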

Lack of Experimental Details

Several critics pointed out that the published paper omitted key experimental details, including information concerning how much food and water the rats consumed, how the food was prepared, and how the pathological analysis was done (Arjo et al. 2013; Barale-Thomas 2013; Robert et al. 2013; Sanders et al. 2013).

Lack of Access to Supporting Data

Several critics pointed out that the article mentioned supporting data for the study but that access was not provided (Arjo et al. 2013; Robert et al. 2013; Sanders et al. 2013).

Improper Pathological Analysis

A few critics argued that the pathological analysis was done improperly and that standard procedures were not followed (Arjo et al. 2013; Barale-Thomas 2013; Schorsch 2013).

Inadequate Discussion of Prior Research on the Safety of GM Foods

Several critics argued that the published paper failed to meaningfully engage previous research on the safety of GM foods, including short-term and long-term studies (Arjo et al. 2013; Berry 2013; Tester 2013).

Unethical Treatment of Animals

Several critics argued that the authors of the study caused unjustified pain and suffering to the rats by allowing the tumors to grow beyond the point when the standard of animal care would require euthanasia (Arjo et al. 2013; Barale-Thomas 2013; Robert et al. 2013; Sanders et al. 2013).

Séralini and Coauthors Reply to Their Critics

Séralini and coauthors replied to their critics in the March 2013 issue of FCT. They responded to the charge that their sample was too small by noting that 90-day feeding trials use only 10 animals per male and female group (Séralini et al. 2013). They defended their choice of Sprague-Dawley rats by claiming that it is important to use a type of animal that is prone to tumor formation, since animals that strongly resist tumor formation may not show enough tumors after 2 years to yield data suitable for analysis. Critics could argue, however, that even if the choice of animals is defensible, this would still require a much larger sample size, since Sprague-Dawley rats are known to develop health problems, including tumors, by about 18 months. A larger sample size (e.g. 50 or more per male and female group) would be needed to distinguish between normal deterioration of health and adverse health effects due to the diet (Arjo et al. 2013).

Concerning statistical issues, Séralini et al. (2013) replied that statistics alone does not tell the truth and that it is necessary to hypothesize a biologically plausible mechanism (i.e. adverse reactions to the transgene and glyphosate) to explain observed results. Critics could argue, however, that a statistical analysis of endpoints is essential for drawing inferences from the data, regardless of whether one has proposed a plausible biological mechanism to explain the results (Arjo et al. 2013).

In response to the charge that they did not follow standard procedures in their pathological analysis, Séralini et al. (2013) argued that they did follow standard procedures.

In response to the charge that they were withholding supporting data, Séralini et al. (2013) said they would make the data available to scientists when biotechnology companies that are conducting research on GM crops also make their data available.

In response to the charge that the study did not provide adequate protections for animals because the tumors were allowed to grow excessively large prior to euthanasia, Séralini et al. (2013) argued that their protocol was approved by an animal care committee and that they followed animal welfare guidelines.

In a commentary accompanying their republished paper, Séralini et al. (2014a, b) claimed that many of their critics had financial relationships to biotechnology companies. Their most serious charge is against Richard Goodman, who became an assistant editor at FCT in January 2013.

Séralini et al. (2014a, b) claimed that Goodman had requested raw data from their study and had criticized it severely. Goodman had been employed by GM seed manufacturer Monsanto for 7 years but no longer has a position with the company. He also has an ongoing relationship with International Life Sciences Institute, which is funded by biotechnology companies. In response to the charge of editorial conflict of interest, Goodman claims he was asked to join FCT in January 2013 to review biotechnology research and had no involvement in the decision to retract the paper (Casassus 2014).

Some critics have noted that Séralini also had conflicts of interest which were not properly disclosed (Entine 2014; Retraction Watch 2014). Séralini is president of the scientific board of CRIIGEN (Committee of Independent Research and Information on Genetic Engineering), an anti-GM non-governmental organization that sponsored his research. He is also marketing a book and a documentary on the risks of GM crops (Entine 2014; Retraction Watch 2014).

In the commentary that accompanied the republished paper, Séralini et al. (2014a, b) also accused their critics of engaging in ad hominem, libelous attacks on their research. They rebuked FCT for retracting their paper, arguing that inconclusiveness is not a legitimate reason for a retraction. They described the journal’s editorial decision-making as a form of censorship motivated by political and economic interests (Séralini et al. 2014a, b).

Lessons for Journal Editors

This article will not offer an opinion on the ongoing debate concerning the safety of GM foods or attempt to adjudicate the dispute between Séralini and colleagues and their critics. The purpose of this article is to draw some lessons from this episode for journal editors.

Retracting Papers

What are some of the lessons that journal editors can draw from this episode? The first one concerns the retraction of the original (Séralini et al. 2012) paper. Séralini and his collaborators argued that their paper should not have been retracted because inconclusiveness is not a sufficient reason for retracting a paper (Séralini et al. 2013, 2014a, b). Several commentators agreed with them (Portier et al. 2014; Institute of Science and Society 2013; Fugh-Berman and Sherman 2014). FCT and its publisher, Elsevier, are both members of the Committee on Publication Ethics (COPE). COPE’s retraction guidelines state that:

Journal editors should consider retracting a publication if:

  • they have clear evidence that the findings are unreliable, either as a result of misconduct (e.g. data fabrication) or honest error (e.g. miscalculation or experimental error)

  • the findings have previously been published elsewhere without proper crossreferencing, permission or justification (i.e. cases of redundant publication)

  • it constitutes plagiarism

  • it reports unethical research…

Retraction is a mechanism for correcting the literature and alerting readers to publications that contain such seriously flawed or erroneous data that their findings and conclusions cannot be relied upon. Unreliable data may result from honest error or from research misconduct (Committee on Publication Ethics 2009).

It is important to note that the COPE guidelines state sufficient but not necessary conditions for retraction, since they say “editors should consider retracting a publication if”. It is conceivable, therefore, that some editors who follow the guidelines decide to retract articles for reasons that go beyond the guidelines. For example, the COPE guidelines list “honest error” or “misconduct” as reasons why a paper might be retracted for being unreliable. An editor might decide to retract a paper because he or she judges that its findings are unreliable due to serious flaws in the design of the research, which was arguably the case in the Séralini study. As mentioned earlier, critics identified a number of different serious flaws with the study, such as small sample size and lack of statistical analysis of the main endpoints. However, the editors of FCT did not indicate that they were retracting the paper because the research design was seriously flawed. Instead, they chose to retract it due to “inconclusiveness”. Perhaps they could have chosen a better way to express their concerns with the paper, since a great deal of published research is inconclusive (Portier et al. 2014).

The COPE guidelines do recommend that the editors consider a retraction if the research is unethical. Although the editors did not cite ethical transgressions as reasons for retracting the article, there is substantial evidence that Séralini’s study violated animal welfare guidelines, given the size of the tumors presented in photographic images in the paper (Arjo et al. 2013; Barale-Thomas 2013; Robert et al. 2013).

COPE’s guidelines also recommend that editors consider publishing an expression of concern as an alternative to a retraction. However, COPE’s guidelines concerning expressions of concern do not appear to fit the Séralini case. The guidelines state that:

Journal editors should consider issuing an expression of concern if:

  • they receive inconclusive evidence of research or publication misconduct by the authors

  • there is evidence that the findings are unreliable but the authors’ institution will not investigate the case

  • they believe that an investigation into alleged misconduct related to the publication either has not been, or would not be, fair and impartial or conclusive

  • an investigation is underway but a judgement will not be available for a considerable time (Committee on Publication Ethics 2009).

As noted earlier, the editors of FCT did not consider possible misconduct to be an issue in the Séralini paper. Although some critics accused the authors of fraud, the FCT editors found no evidence of data fabrication or falsification when they reexamined the paper and the original data. Given these circumstances, publication of an editorial expression of concern would not have been appropriate.

In sum, the editors should either not have retracted the article (Portier et al. 2014), or, if they did, then they should have cited a reason for retraction consistent with COPE’s guidelines, such as violations of animal welfare guidelines or serious flaws in the research design that undermined the reliability of the study.

Republishing Retracted Papers

The second lesson concerns the republication of a retracted paper. Should editors republish a paper that has been retracted at other journals? If so, should they re-review the paper and require the authors to address the problems identified by the reviewers, including those that led to the retraction? Should they link the republished paper to the retraction so that readers will know that the paper has been previously retracted and why? The editors of ESE said they decided to republish Séralini’s retracted study because they wanted to make the data available to the public (Casassus 2014). The ESE editors did not require the paper to receive additional scientific review because it had already been reviewed by FCT, which had concluded there was no evidence of fraud or error (Casassus 2014). The republished paper was almost identical to the retracted one, except the authors provided their raw data and analyzed the data differently (Séralini et al. 2014a, b). The authors did not address important issues raised by critics, such as the small sample size, the use of Sprague-Dawley rats, or deficiencies with their statistical analysis (Séralini et al. 2014a, b). The journal editors mentioned the original study in a preface to the republished version, but they did not say whether the authors had addressed the issues that led to the retraction (Séralini et al. 2014a, b).

While the goal of making data available to scientists is worthwhile, one might question ESE’s editorial decision-making regarding the republication of the Séralini study. A strong case can be made that the decision not to subject the paper to additional scientific review was irresponsible, given that numerous critics and the editors at FCT had identified serious flaws with the study. ESE’s reviewers probably would have also noticed these problems with the paper if they had been asked to review it. At the very least, the editors of ESE could have required the authors to revise the paper in light of the problems the FCT editors mentioned in the retraction notice. Of course, requiring the authors to do this would probably have further delayed publication of the research, since the authors probably would have needed to repeat their experiments with larger sample sizes in the different treatment groups, but one might argue that this is the price one must pay for rigorous science.

Peer Review

The third lesson concerns the peer review process at FCT. How was a paper that the editors said did not meet the journal’s scientific standards approved for publication in the first place? In the retraction notice, the editors of FCT stated that “The editorial board will continue to use this case as a reminder to be as diligent as possible in the peer review process” (Food and Chemical Toxicology 2014a). Since FCT, like most other journals, protects the confidentiality of the peer review process, it is difficult to learn more about how the Séralini study was reviewed, barring public disclosures from the editors, reviewers, or other authors. FCT has an editor-in-chief, A. Wallace Hayes, four managing editors, and 21 associate editors (Food and Chemical Toxicology 2014b). The journal has not stated publicly which reviewers or editors were responsible for handling the review process for the Séralini paper, but Hayes would have had final authority over any decision to publish the disputed research.

As many commentators have noted, peer review has a number of weaknesses (Smith 2006). Studies of peer review have found that reviewers and editors often fail to catch obvious errors, disagree about the quality of the manuscript, and are susceptible to various biases (Schroter et al. 2008; Bornmann et al. 2010; Lee et al. 2012; Dwan et al. 2013). It is possible that the reviewers at FCT failed to notice or comment on some of the design flaws subsequently pointed out by the critics. It is also possible that the reviewers detected these design flaws but the editors did not require the authors to correct them. However, we can only speculate about what may have transpired during peer review at the journal.

Although it is not reasonable to expect peer review to be perfect, when a paper is likely to have significant implications for science and society it is especially important for editors and reviewers to make sure that the peer review process is rigorous and fair. The editors of FCT probably knew, or should have known, that the Séralini paper would have a major impact on the public debate about the safety of GM foods, since it defended a hypothesis that contradicted most of the published studies in the field. The journal’s admitted failure to properly review the paper may have resulted in publication of research that has served to confuse rather than clarify the debate about the safety of GM foods. Giving additional scrutiny to papers with significant implications for science and society can help assure readers that the journal has properly vetted them through peer review. It is far better to delay the publication of a paper so that the authors can address concerns raised by reviewers and editors than to rush to publish a paper and deal with its flaws once it is in print.

Conclusion

The publication, retraction and subsequent republication of the Séralini GM maize study raise important scientific and ethical issues for journal editors. Decisions to retract an article, especially without the authors’ consent, should be made on the basis of well-established policies, such as the COPE guidelines. Articles should be retracted only for serious errors that undermine the reliability of the data or results, or for serious ethical lapses, such as research misconduct or mistreatment of animal or human subjects. Inconclusiveness, by itself, is not a sufficient reason for retracting an article, though a seriously flawed research design might be. Retracted articles that are submitted for republication should undergo scientific review to ensure that they meet appropriate standards. The authors should address concerns that led to the retraction as well as other issues identified by the reviewers. Republished articles should be linked to the original, retracted publication, so that readers can decide whether the authors have addressed the issues that led to the retraction. Journals that are reviewing studies with significant scientific and social implications should take special care to ensure that peer review is rigorous and fair.