The final phase in any experiment is to interpret and report the results. Finding the answer to a challenging question is the goal of any research endeavor. Proper communication of the results to clinicians also provides the basis for advances in medicine [1]. To communicate appropriately, investigators have to review their results critically and avoid the temptation to overinterpret benefit or underreport harm. They are in the privileged position of knowing the quality and limitations of the data better than anyone else. Therefore, they have the responsibility for presenting the results clearly and concisely, together with any issues that might bear on their interpretation. Investigators should devote adequate care, time and attention to this critical part of the conduct of clinical trials. We believe that a policy of “conservative” interpretation and reporting best serves science, public health, clinical medicine, and the interests of readers.

A study may be reported in a scientific journal, but publication is in no way an endorsement of its results or conclusions. Even if the journal uses referees to assess each prospective publication, there is no assurance that they have sufficient experience and knowledge of the issues of design, conduct and analysis to judge fully the reported study [2]. Only the investigators are likely to recognize subtle, or even not so subtle, weaknesses and problems. As pointed out over 35 years ago by a former Editor of The New England Journal of Medicine [3],

In choosing manuscripts for publication we make every effort to winnow out those that are clearly unsound, but we cannot promise that those we do publish are absolutely true … Good journals try to facilitate this process [of medical progress] by identifying noteworthy contributions from among the great mass of material that now overloads our scientific communication system. Everyone should understand, however, that this evaluative function is not quite the same thing as endorsement.

This point has been illustrated by Ellenberg et al. [4]. The favorable results of a multicenter trial, accompanied by a very positive editorial, were published in The New England Journal of Medicine only 2 weeks before an Advisory Committee of the FDA voted unanimously against recommending that the intervention, a respiratory syncytial virus immune globulin, be licensed. Similarly, a trial showed superiority of cangrelor over clopidogrel in people undergoing percutaneous coronary intervention for the primary outcome, a composite of death, myocardial infarction, revascularization, or stent thrombosis [5]. Despite the apparent benefit, the FDA had concerns about cangrelor and did not approve the drug [6]. In the end, it is up to the authors to be as objective as possible and up to the readers of a scientific article to assess it critically and to decide how to make best use of the reported findings.

In this chapter, we discuss guidelines for reporting, interpretation of findings, and publication bias, as well as the answers to three specific questions that should be considered in preparing a report: (1) Did the trial work as planned? (2) How do the findings compare with those from other studies? (3) What are the clinical implications of the findings? A checklist of what should be included in a report of a clinical trial is provided by the Consolidated Standards of Reporting Trials (CONSORT) group [7–13]. Similar guidelines have been prepared for publications of meta-analyses [14]. Included in the CONSORT website [10] is a checklist of essential items. Briefly, it lists study background and objectives, methods (trial design, participants, interventions, primary and secondary outcomes, sample size, randomization procedures, blinding, statistics), results (baseline data, outcomes, harms), and interpretation (limitations, generalizations). Also required are trial registration and funding source.

Fundamental Point

Investigators have an obligation to review their study and its findings critically and to present sufficient information so that readers can properly evaluate the trial and its findings.

Guidelines for Reporting

Any report of a clinical trial should include sufficient information about the study rationale, design, population, and conduct so that readers can assess the adequacy of the methods employed. The quality of a trial is typically judged by the thoroughness and completeness of the Materials and Methods sections of the report. Unfortunately, thorough reporting does not always occur. A survey of 253 randomized trials published in five general medicine journals after the revised CONSORT recommendations were issued found that several aspects (e.g., allocation concealment and various components of blinding) were inadequately discussed [15]. Others [16] have noted that eligibility criteria are sometimes poorly described. Wang et al. [17] conducted a survey of subgroup analyses reported in The New England Journal of Medicine over a 1-year period. Subgroup analyses were common, but highly variable in the completeness of the information presented. As a result, The Journal implemented guidelines for reporting subgroup analyses [17].

Terms often used in clinical trial reports are misused. Many authors claim that they performed an “intention-to-treat” or “ITT” analysis when, in fact, data from randomized participants have been excluded from the analysis. There may be good reasons why not all data are available, but unless the missing data are so small a fraction of the total that no possible values could change the overall trial outcome, the analysis should not be called intention-to-treat. Readers must look carefully at what was actually done, despite claims of an ITT analysis. Sometimes “modified ITT analysis” is used, which is a contradiction in terms. If not all participants and not all follow-up events are accounted for, the report of the analysis should not say “intention-to-treat.” Some participants might be lost to follow-up; the number of those (ideally small) should be clearly indicated. Another misleading term is “per protocol analysis.” Authors apply that phrase to analyses that omit data from those who fail to adhere fully to the intervention or who otherwise leave the study. We consider this an unfortunate use of the term, as it implies that such an analysis is the preferred one specified in the protocol. As we have argued in this book (Chap. 18), it is almost never the preferred analysis and should not be so specified in the protocol. When such an analysis is performed, we prefer the term “on treatment analysis,” as it more accurately reflects what is done.

Traditional journals impose page limitations, forcing authors to exclude some important information. On-line journals that do not have such page limitations are becoming more common. In addition, many print journals allow supplemental material (e.g., details of methods, extra data) to be included in their electronic versions. Therefore, space limitations are no longer justification for withholding pertinent information.

As noted above, guidelines on how to report a clinical trial exist [7–14]. The International Committee of Medical Journal Editors has issued a set of uniform requirements that are endorsed by a large number of journals [18]. One of the guidelines is assurance that the trial has been listed in a formal registry [19, 20]. In addition, journals have their own Instructions for Authors that address issues of format as well as content.

With the enormous number of scientific articles published annually, it is impossible for clinicians to keep up with the flow of information. Journals to which one subscribes may have online services to help identify articles of particular interest. Other online listings of publications in selected areas to which readers can subscribe may also help, but the clinician still has the obligation to review clinical trial publications carefully. More informative abstracts help clinicians who browse through journals on a regular basis. Valid and informative abstracts are important, since clinical decisions are often influenced by abstracts alone [21]. For reporting clinical investigations, many journals have adopted the recommendation [22] for structured abstracts, which include information on objective, design, setting, participants, intervention(s), measurements and main results, and conclusion(s). The early experience with structured abstracts was reviewed by Haynes et al., who found the comments to be “supportive and appreciative” and recommended some modifications of the guidelines [23]. We strongly endorse the now common use of the structured abstract.

Authorship

Decisions about authorship are both sensitive and important [24, 25]. It is critical that these decisions be made at an early stage. Cases of scientific fraud have reminded us that being an author carries certain responsibilities and should not be used as a means of showing gratitude. Guidelines regarding qualifications for authorship are included in general instructions for manuscripts [18]. In the past, a number of journals attempted to prohibit group authorship, on the grounds that those taking responsibility for the actual conduct of the trial and the writing of the manuscript ought to be clearly identified. Meinert [26] came to the defense of group authorship and expressed concern over the possible effect of this policy on multicenter work. We believe that group authorship is an important part of clinical trials research. Fairness and equity require proper crediting of those who have made major contributions to the design, conduct, and analysis, not just the few who served on the writing group. A compromise accepted by many journals and recommended by the International Committee of Medical Journal Editors is to allow group authorship but list those who served on the writing committee. A distinction may also be made between “authors” and “collaborators.” Some journals ask about the contributions of each person listed as an author or member of a writing group. If authorship is by a research group name, journal policies may ask that a corresponding author be listed, as well as those who accept responsibility for the paper. The International Committee of Medical Journal Editors [18] states the policy clearly:

Some large multi-author groups designate authorship by a group name, with or without the names of individuals. When submitting a manuscript authored by a group, the corresponding author should specify the group name if one exists, and clearly identify the group members who can take credit and responsibility for the work as authors. The byline of the article identifies who is directly responsible for the manuscript, and MEDLINE lists as authors whichever names appear on the byline. If the byline includes a group name, MEDLINE will list the names of individual group members who are authors or who are collaborators, sometimes called non-author contributors, if there is a note associated with the byline clearly stating that the individual names are elsewhere in the paper and whether those names are authors or collaborators.

“Ghost authorship,” the failure to credit as authors those who wrote or coauthored a manuscript or who otherwise played a major enough role in the trial to deserve recognition, has received considerable attention. Gøtzsche and colleagues [27] conducted a survey of 44 industry-initiated trials and found evidence of ghost authorship in three quarters of the publications. Ross et al. [28] describe publications concerning rofecoxib that were written by the industry sponsor’s employees, who were not acknowledged as authors.

The flip side of ghost authorship is “guest authorship,” where usually highly respected investigators who had little or no role in the study or in writing of the manuscript are given visible authorship. We deplore both of these practices.

Duplicate Publication

Journals typically prohibit duplicate publication. They routinely ask manuscript submitters whether the paper has been published or even submitted elsewhere. Nevertheless, a survey in 2003 looked at publications from 1983 to 1999 of trials that were relevant to submissions for approval of serotonin reuptake inhibitors by the Swedish drug regulatory agency [29]. Only one of the five drugs submitted did not involve multiple publications of the same or overlapping data. Depending on the journal and on the nature, extent, and importance of the new information, updates of previously published trials may be accepted. The practice of many journals of requiring a trial registration number serves to minimize, if not completely avoid, the concern that updates could lead to double counting of trials and participants in meta-analyses. One proposal that also might help, at least for electronic publications, is to better enable linkage of publications by means of trial registration numbers [30].

Disclosure of Conflict of Interest

Many journals have policies requiring clear statements of possible conflicts of interest [31]. The Uniform Requirements for Manuscripts Submitted to Biomedical Journals [18] contains guidelines regarding disclosure of potential conflicts related to individual authors and to the identification and role of the sponsor of the trial. Authors must be forthcoming in disclosing any potential conflicts, as they can affect how readers interpret study findings. Unfortunately, there have been instances where important conflicts were not disclosed and were subsequently discovered [32, 33]. Such cases serve both to embarrass the investigators and, perhaps unfairly, to tarnish good research, a situation that could have been avoided had the authors been open from the beginning. We recommend that all authors freely disclose all real, potential, or apparent conflicts of interest. Others may perceive conflicts that the authors did not consider to be such, so it is helpful to be as forthcoming as possible. Independent assessment is always preferable to lack of disclosure.

Presentation of Data

Presentation of the data analysis is important [34–42]. There is a common misunderstanding of the meaning of p-values. Only about one-fifth of the physician respondents to a multiple choice question understood the proper meaning of a p-value [43]. The p-value tells us how likely a difference at least as large as the one observed would be to arise by chance alone, if there were truly no treatment effect. It conveys information about the level of doubt, not the magnitude or clinical importance of the difference. A p-value of 0.05 in a very large trial may be weak evidence of an effect, while in a small sample it can be quite strong evidence [35]. The point estimate (the observed result) with its 95% confidence interval (CI) provides the best estimate of the size of a difference. The width of the CI is another measure of uncertainty. The p-value and the CI are inherently related; thus, if the 95% CI of the difference excludes 0, the difference is statistically significant with p < 0.05. The CI permits readers to use their own judgment about the smallest clinically important difference in making treatment decisions [34]. Some journals have taken the lead and now require more extensive use of CIs. We advocate reporting of p-values, point estimates, and CIs for the major results. They all convey important information and help in evaluating a trial’s results.
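
As a simple illustration of how these quantities relate, the following sketch (in Python, using entirely hypothetical event counts and a Wald approximation; none of the numbers come from any trial discussed here) computes the point estimate, 95% CI, and two-sided p-value for a difference in proportions.

    from scipy.stats import norm
    import math

    # Hypothetical counts, for illustration only
    events_treated, n_treated = 80, 1000    # 8.0% event rate in the intervention group
    events_control, n_control = 100, 1000   # 10.0% event rate in the control group

    p1 = events_treated / n_treated
    p2 = events_control / n_control
    diff = p1 - p2                                           # point estimate of the absolute difference
    se = math.sqrt(p1 * (1 - p1) / n_treated +
                   p2 * (1 - p2) / n_control)                # Wald standard error
    ci_low, ci_high = diff - 1.96 * se, diff + 1.96 * se     # 95% confidence interval
    p_value = 2 * norm.sf(abs(diff / se))                    # two-sided p-value

    print(f"difference = {diff:.3f}, 95% CI ({ci_low:.3f}, {ci_high:.3f}), p = {p_value:.2f}")
    # With these made-up numbers, the CI includes 0 and p is about 0.12, so the
    # difference is not statistically significant at the 0.05 level.

Note that the CI conveys the range of plausible effect sizes, whereas the p-value alone does not.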

Interpretation

Articles have been written to help clinicians in their appraisal of a clinical study [44–49]. Readers should be aware that many publications have deficiencies and can even be misleading. Pocock [50] has given three reasons why readers need to be cautious: (1) some authors produce inadequate trial reports, (2) journal editors and referees allow them to be published, and (3) journals favor positive findings. For example, a review of trials of antibiotic prophylaxis found that 20% of the abstracts omitted important information or implied unjustified conclusions [51]. In another review, Pocock and colleagues [52] examined 45 trials and concluded that the reporting “appears to be biased toward an exaggeration of treatment differences” and that there was an overuse of significance levels. In a 1982 report, statistical errors were uncovered in a large proportion of 86 controlled trials in obstetrics and pediatrics journals, and only 10% of the conclusions were considered justified [53]. Gøtzsche found that reports of 76% of 196 trials of nonsteroidal anti-inflammatory drugs in rheumatoid arthritis contained “doubtful or invalid statements” [54]. As mentioned in Chap. 9, inadequate reporting of the methods of randomization and baseline comparability was found in 30–40% of 80 randomized clinical trials in leading medical journals [55]. In the oncology field, the criteria for tumor response in articles published in three major journals were incompletely reported and variable, and contributed to the wide variation in reported response rates [56].

Baar and Tannock [57] constructed a hypothetical trial and reported its results in two separate articles; one with errors of reporting and omissions similar to those “extracted from” leading cancer journals and the other with appropriate methods. This exercise illustrates how the same results can be interpreted and reported differently.

The way in which results are presented can affect treatment decisions [58–60]. Almost half of a group of surveyed physicians were more impressed and indicated a higher likelihood of treating their patients when the results of a trial were presented as a relative change in outcome rate compared to an absolute change (difference in the incidence of the outcome event) [59]. A relative treatment effect is difficult to interpret without knowledge of the event rate in the comparison group. The use of a “summary measure,” such as the number of persons who need to be treated to prevent one event, had the weakest impact on clinicians’ views of therapeutic effectiveness [60]. We recommend that authors report both absolute and relative changes in outcome rates.
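
A brief worked example may make the distinction concrete. The sketch below uses hypothetical event rates, not data from any cited trial, to show how the same result can be expressed as a relative reduction, an absolute reduction, and a number needed to treat.

    # Hypothetical event rates, for illustration only
    rate_control, rate_intervention = 0.10, 0.08

    relative_reduction = (rate_control - rate_intervention) / rate_control  # 0.20, a 20% relative reduction
    absolute_reduction = rate_control - rate_intervention                   # 0.02, i.e., 2 percentage points
    number_needed_to_treat = 1 / absolute_reduction                         # 50 people treated to prevent one event

    print(f"relative reduction: {relative_reduction:.0%}")
    print(f"absolute reduction: {absolute_reduction:.1%}")
    print(f"number needed to treat: {number_needed_to_treat:.0f}")

A “20% relative reduction” sounds more impressive than “a reduction of 2 percentage points” or “treat 50 people to prevent one event,” yet all three describe the same hypothetical result; reporting them together avoids the distortion noted above.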

Publication Bias

Timely preparation and submission of the trial results—whether positive, neutral, or negative—ought to be every investigator’s obligation. The written report is the public forum that all the work of a clinical trial finally faces. Regrettably, negative trials are more likely to remain unpublished than positive trials. Early evidence of this publication bias came from a survey of the psychological literature. Sterling [61] noted in 1959 that 97% of 294 articles involving hypothesis testing reported a statistically significant result. The situation was similar for medical journals decades later; about 85% of articles—clinical trials and observational studies—reported statistically significant results [62]. Simes [63] compared the results of published trials with those from trials in an international cancer registry. A pooled analysis of published therapeutic trials in advanced ovarian cancer demonstrated a significant advantage for a combination therapy. However, the survival ratio was lower and statistically nonsignificant when the pooled analysis was based on the findings of all registered trials. Several surveys have identified selective reporting and/or multiple publications of the same trial [29, 64–67]. A review of reporting bias found that it was widespread in many medical conditions [68]. Heres et al. [69] found that in head-to-head comparisons between antipsychotic agents, 90% of the 33 trials sponsored by a drug company showed benefit from the sponsor’s drug. That is, the better drug was whichever one was produced by the sponsor of the trial. These apparently contradictory results suggest bias in study design, analysis, and/or reporting.

Even multicenter trials conducted at a major academic center remained unpublished over 40% of the time. Trials sponsored by government were published only modestly more often than those sponsored by industry [20, 66]. These findings were confirmed by Gordon and colleagues [70], who reviewed the publication records of 244 clinical trials funded by the National Heart, Lung, and Blood Institute of the NIH from 2000 through the end of 2011. Fifteen months later, only 156 of the trials had published main results, with a median time to publication of 25 months. Trials with clinical outcomes were published more rapidly than those with other outcomes (e.g., biomarkers). The authors also found that, after adjustment for other factors, trials with “positive” results (defined as “a significant between-group difference in the primary end point favoring the investigators’ stated hypothesis”) were published more quickly than those with “negative” results.

Turner et al. [67] looked at 74 studies of antidepressant agents that had been registered with the U.S. Food and Drug Administration. Twenty-three of the trials had not been published. In addition, those that were published claimed results more positive toward the intervention than did a subsequent FDA analysis of the data. Perlis et al. found that financial conflict of interest was common in clinical trials in psychiatry and was associated with results that were highly favorable to the intervention [71]. According to Chan and colleagues [65], there were frequent discrepancies (62%) between the primary response variable stated in the trial protocol and that reported in the publication of results. Analyses used in publications have also differed from those used in internal company documents [72]. It has been shown that many abstracts are never followed by full publications [73]. In a survey of 156 investigators who acknowledged participating in trials whose results were not published, Dickersin et al. found that among 178 unpublished trials with a specified trend, 14% favored the new therapy, compared to 55% among 767 published reports (p < 0.001) [74]. Factors associated with this bias include, in addition to neutral or negative findings, small sample size and possibly pharmaceutical funding [74]. Rejection of a manuscript by a journal is an infrequent reason [75, 76]. However, authors are no doubt aware that it is difficult to publish neutral results. A survey of the reference lists of trials of nonsteroidal anti-inflammatory drugs revealed a bias toward references with positive outcomes [77].

Selective reporting is viewed as a serious issue. In a survey of clinical trialists, selective reporting was considered one of the two most important forms of scientific misconduct [78]. Investigators have the primary responsibility for ensuring that they do not engage in this practice. Journals, too, have a responsibility to encourage full and honest reporting. They ought to select trials for publication according to the quality of their conduct rather than according to whether the p-value is significant. We expect that the common use of clinical trial registries will encourage more complete reporting of trial results, as trials begun but not reported are more easily identified, though so far the record is mixed [20, 79].

A specific source of potentially biased reporting involves early phase studies and pilot trials. In particular, unless these studies show positive trends or lead to full-scale late phase trials, they are likely to go unreported. Prayle and colleagues [80] reviewed trials listed in ClinicalTrials.gov that were subject to mandatory reporting of results by the FDA. They found that only 22% (163 of 738) of studies reported results within 1 year of the end of the trial. Later phase trials were more likely to report results within a year (38%) than phase II studies (10%). Lack of reporting or publication is often justified on the grounds that the studies are small, of short duration, and may not use optimal doses of the intervention. Nevertheless, such studies may contain important data that should be made available to other researchers and clinicians. For example, if there are design flaws, disclosing them could save other researchers from repeating them. If there are problems with a drug, device, or procedure, it would be important for others to learn about them. We recognize that many journals will reject publication of these kinds of studies, but hope that in the era of on-line publishing, enough journals will accept them. We strongly encourage publication of early phase and pilot studies [81], in addition to publication of all late phase trials.

Did the Trial Work as Planned?

Baseline Comparability

The foundation of any clinical trial is the effort to make sure that the study groups are initially comparable so that differences between the groups over time can be reasonably attributed to the effect of the intervention. Randomization is the preferred method used to obtain baseline comparability. The use of randomization does not necessarily guarantee balance at baseline in the distribution of known or unknown prognostic factors. Baseline imbalance is fairly common in small trials but may also exist in large trials (see Chap. 9). Therefore, both a detailed description of the randomization process, including efforts made to prevent prior knowledge on the part of the investigator of the intervention assignment, and a presentation of baseline comparability are essential. Should the study be nonrandomized, the credibility of the findings hinges even more upon an adequate documentation of this comparability. For each group, baseline data should include means and standard deviations of known and possible prognostic factors. Small trends for individual factors can have an impact if they are in the same direction. A multivariate analysis to evaluate balance may be advantageous. Of course, the fact that major prognostic factors may be unknown will produce some uncertainty with regard to baseline balance. Adjustment of the findings on the basis of observed baseline imbalance should be performed and any difference between unadjusted and adjusted analyses should be carefully explained (see Chap. 18).
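
As one illustration of the baseline presentation described above, the following sketch (in Python, with simulated data and arbitrary variable names chosen for this example) tabulates means and standard deviations of two hypothetical prognostic factors by randomized group. It is not a prescribed format, merely one way such a table might be generated.

    import numpy as np
    import pandas as pd

    # Simulated baseline data, for illustration only
    rng = np.random.default_rng(0)
    n = 200
    baseline = pd.DataFrame({
        "group": rng.choice(["intervention", "control"], size=n),
        "age": rng.normal(60, 10, size=n),            # hypothetical prognostic factor
        "systolic_bp": rng.normal(140, 15, size=n),   # hypothetical prognostic factor
    })

    # Means and standard deviations of each prognostic factor, by randomized group
    table = baseline.groupby("group")[["age", "systolic_bp"]].agg(["mean", "std"]).round(1)
    print(table)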

Blinding

Double-blinding is a desirable feature of a clinical trial design because, as already discussed, it diminishes bias in the assessment of response variables that require some element of judgment. However, many studies are not truly double-blinded to all parties from start to finish. While an individual side-effect may be insufficient to unblind the investigator, a constellation of effects often reveals the group assignment. A specific drug effect such as a marked fall in blood pressure in an antihypertensive drug trial—or the absence of such an effect—might also indicate which is the active intervention group and which is not. Although the success of blinding may be difficult for the investigator to assess, and some disagree with assessment of blinding [82–84], we think that an evaluation could have value. The reasons for not assessing and reporting the success of blinding are that such efforts might stimulate investigators and participants to make extra efforts to unblind and that responses from participants are often unreliable. We believe, however, that readers of a publication ought to be informed about the degree of unblinding. An evaluation such as the one provided by Karlowski and colleagues for a trial of vitamin C [85] is commendable.

It is important to emphasize that assessment of blinding should not be done while the trial is ongoing, but only at the end. If assessment is conducted at the end of the trial, effects on trial conduct are minimal or nonexistent and there is less incentive for participants to attempt to mislead the investigator.

Adherence and Concomitant Treatment

In estimating sample size, investigators often make assumptions regarding the rate of nonadherence. Throughout follow-up, efforts are made to maintain optimal adherence to the intervention under study and to monitor adherence. When interpreting the findings, one can then gauge whether the initial assumptions were borne out by what actually happened. When adherence assumptions have been too optimistic, the ability of the trial to test adequately the primary question may be less than planned. The study results must be reported and discussed with the power of the trial in mind. In trials showing a beneficial effect of a specific intervention, nonadherence is usually a minor concern. Two interpretations of the effect of nonadherence are possible. It may be argued that the intervention would have been even more beneficial had adherence been higher. On the other hand, if all participants (including those who for various reasons did not adhere entirely to the dosage schedule or duration of intervention of a trial) had been on full dose, there could have been further adverse events or harmful effects in the intervention group.

Also of interest is the comparability of groups during the follow-up period with respect to concomitant interventions. Use of drugs other than the study intervention, changes in lifestyle and general medical care—if they affect the response variable—need to be measured and reported. Of course, as mentioned in Chap. 18, adjustment on postrandomization variables is inappropriate. As a consequence, when imbalances exist, the study results must be interpreted cautiously.

What Are the Limitations?

When the results of a “superiority” trial (i.e., one in which an intervention is evaluated to see if it differs from a control) indicate no statistically significant difference between the study groups there are several possible explanations. In addition to the conclusion that the studied intervention may be of little or no value, the dose of the intervention may have been too low or too high; the technical skills of those providing the intervention (e.g., surgical procedure) may have been inadequate; the sample size may have been too small, giving the trial insufficient power to test the hypothesis (Chap. 8); there may have been major adherence problems; concomitant interventions may have reduced the effect that would otherwise have been seen; or the outcome measurements may not have been sensitive enough or the analyses may have been inadequate. Finally, chance is another obvious explanation. The authors should provide the readers with enough information in the Methods and Results sections for them to judge for themselves why an intervention may not have worked. In the Discussion section, the authors should also offer their best understanding of why no difference was found.

For equivalence or noninferiority trials, inadequate design or conduct, or poor adherence on the part of participants, can lead to what the investigators and sponsors consider as the “desired” outcome, that is, no discernable difference between intervention groups. As discussed in Chap. 5, attention to these factors is extraordinarily important in noninferiority trials. Perhaps even more than in superiority trials, the authors must recognize and clearly acknowledge any study limitations and problems that could have contributed to the lack of difference. In some cases, an “on treatment” analysis might be warranted, in addition to the intention-to-treat analysis.

What are the limitations of the trial findings? One needs to know the degree of completeness of data in order to evaluate a trial. A typical shortcoming, particularly in long-term trials, is that the investigator may lose contact with some participants or for other reasons have missing data. These participants are usually different from those who remain in the trial, and their event rate or outcome measurements may not be the same. Vigorous attempts should be made to keep the number of persons lost to follow-up to a minimum. The credibility of the findings may be questioned in trials in which the number of participants lost to follow-up is large in relation to the number of events. A conservative approach in this context is to assume the “worst case.” This approach assumes the occurrence of an event in each participant lost to follow-up in the group with lower incidence of the response variable, and it assumes no events in the comparison group. After application of the “worst case” approach, if the overall conclusions of the trial remain unchanged, they are strengthened. However, if the worst-case analysis changes the conclusions, the trial may have less credibility. Other approaches to handling missing outcome data are discussed in Chap. 18. The degree of confidence in the conclusion will depend upon the extent to which the outcome could be altered by the missing information.
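
To make the “worst case” approach concrete, the following sketch uses entirely hypothetical counts: every participant lost to follow-up in the group with the lower observed event rate is assumed to have had an event, and none of those lost in the comparison group.

    # Hypothetical counts, for illustration only
    events   = {"intervention": 80, "control": 100}    # observed events among those followed
    followed = {"intervention": 950, "control": 960}   # participants with complete follow-up
    lost     = {"intervention": 50, "control": 40}     # participants lost to follow-up

    # Observed event rates among those followed
    observed_rate = {g: events[g] / followed[g] for g in events}

    # Worst case: the intervention group (lower observed rate) is assigned an event for
    # every participant lost to follow-up; the control group is assigned none
    worst_events = {"intervention": events["intervention"] + lost["intervention"],
                    "control": events["control"]}
    randomized = {g: followed[g] + lost[g] for g in followed}
    worst_rate = {g: worst_events[g] / randomized[g] for g in worst_events}

    print("observed rates:  ", {g: round(r, 3) for g, r in observed_rate.items()})   # ~0.084 vs ~0.104
    print("worst-case rates:", {g: round(r, 3) for g, r in worst_rate.items()})      # 0.130 vs 0.100

In this made-up example the apparent benefit of the intervention disappears under the worst-case assumption, so the conclusion would carry less credibility; had the direction of the difference been preserved, the conclusion would have been strengthened.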

What Kinds of Analyses?

As addressed in Chap. 18, results may be questionable if participants randomized into a trial are withdrawn from the analysis. Withdrawal after randomization undermines the goal of conducting a valid, unbiased trial. It should be avoided. Investigators who support the concept of allowing withdrawals from the analysis should be required to report analyses both with and without withdrawals. If both analyses give approximately the same result, the findings are confirmed. However, if the results of the two analyses differ, the intention-to-treat analysis should be believed, and the reasons for the differences explored.

In evaluating possible benefit of an intervention, more than one response variable is often assessed, which raises the issue of multiple comparisons (Chap. 18). In essence, the chance of finding a nominally statistically significant result increases with the number of comparisons. This is true whether there are multiple response variables, repeated comparisons for the same response variable, subgroup analyses, or tests of various combinations of response variables. In the survey of 45 trials in three leading medical journals, the median number of significance tests per trial was eight; more than 20 tests were reported in six trials [52]. The potential impact of this multiple testing on the findings and conclusions of a trial ought to be considered. A conservative approach in the interpretation of statistical tests is again recommended. When several comparisons have been made, a more extreme statistic might be required before a statistically significant difference could be claimed. One approach is to require a p-value <0.01 for a limited number of secondary outcomes, or a Bonferroni correction, in order to declare a treatment difference statistically significant. An alternative approach is to consider all of the subsidiary analyses exploratory and hypothesis generating [52]. Authors of a report should indicate the total number of comparisons made during the trial and in the analysis phase (not just those selected for reporting). Readers should focus attention on p-values for protocol-specified comparisons.
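
As a minimal sketch of the Bonferroni approach mentioned above, each of k comparisons is tested against alpha/k rather than alpha; the p-values and the overall significance level of 0.05 below are made up for illustration.

    # Hypothetical p-values from five secondary comparisons, for illustration only
    p_values = [0.04, 0.012, 0.30, 0.008, 0.048]
    alpha = 0.05
    k = len(p_values)

    bonferroni_threshold = alpha / k   # 0.05 / 5 = 0.01
    significant = [p for p in p_values if p < bonferroni_threshold]

    print(f"threshold after correction: {bonferroni_threshold}")   # 0.01
    print(f"comparisons still significant: {significant}")         # only p = 0.008 remains

Without adjustment, four of these five hypothetical comparisons would be declared significant at the 0.05 level; after the correction, only one is.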

The main objective of any trial is to answer the primary question. Findings related to one of the secondary questions may be interesting, but they should be put in the proper perspective. Are the findings for the related primary and secondary response variables consistent? If not, attempts ought to be made to explain discrepancies. Explaining inconsistencies was particularly important in the Cooperative Trial in the Primary Prevention of Ischaemic Heart Disease [86]. In that trial, the intervention group showed a statistically significant reduction in the incidence of major ischemic heart disease (primary response variable), but a significant increase in all-cause mortality (secondary response variable).

An area of some controversy concerns the analysis and reporting of composite outcomes (Chap. 18). Cordoba and colleagues [87] reviewed trials published in 2008 that employed composite outcomes. Of 40 such trials, 28 used components of the composite outcome that were of different importance. Thirteen trials used inconsistent definitions of the components in different parts of the publication (in five of them, the components were not the same). Nine of the trials did not present clear data for the individual components. Particularly when components are of different clinical importance, clear presentation of individual component data, as well as the composite data, is essential. Obviously, there will be limited power to detect group differences among the separate components, but authors should provide complete and consistent reporting. Are the trends in the individual components in the same direction, even though statistical significance is not observed?

In all studies, evidence for possible serious adverse events from the intervention needs to be presented. Comparison of adverse events among those participants who adhered to the intervention may provide a more conservative assessment, in the sense that it leans toward safety. Authors might consider analyzing adverse event data using both intention-to-treat and on-treatment approaches. In the final conclusion, the overall benefit should be weighed against the risk of harm. This assessment of the balance, however, is done too infrequently (Chap. 12).

How Do the Findings Compare with Those from Other Studies?

The findings from a clinical trial should be placed in the context of current knowledge. Are they consistent with knowledge of basic science, including presumed mechanism of action of the intervention? Although the precise mechanism may be unclear, when the outcome can be explained in terms of known biological actions, the conclusions are strengthened. Do the findings confirm the results of studies with similar interventions or different interventions in similar populations?

It is important here to keep in mind that a substantial proportion of initiated and even completed trials are never published. Additionally, a review of the completeness of articles cited in reference lists of clinical trial publications suggests that studies with neutral or negative results tend not to be cited [73]. Among published trials the response to a given drug or drug combination can vary markedly [88, 89]. Much of the variation may be explained by differences in participant selection, including genetic variation, treatment regimen and concomitant intervention, but major differences may also reflect the way the data were analyzed and reported. In a review of 51 randomized clinical trials in congestive heart failure, the authors attributed conflicting results to lack of uniform diagnostic criteria [88]. In a thoughtful editorial, Packer [89] pointed out that several other factors could explain discordant results. He suggested that the characteristics of the enrolled participants may be more important than the definition of congestive heart failure. Differences in design—sample size, dose and duration of intervention—may affect the trial findings. Other factors might be differences in criteria of efficacy and publication policy. Results of positive trials tend to be published several times, for example, both in a regular journal report and in a journal supplement funded by the pharmaceutical industry.

Generally, credibility of a particular finding increases with the proportion of good independent studies that come to the same conclusion. Inconsistent results are not uncommon in clinical research and medicine. In such cases, the problem for both the investigators and the readers is to try to determine the true effect of an intervention. How and why results differ need to be explored. The use of confidence limits has the advantage of allowing the readers to compare findings and assess whether the results of different trials could, in fact, be consistent.

What Are the Clinical Implications of the Findings?

It is appropriate, of course, to generalize the results to the study population, that is, those people who would have been eligible for and could have participated in the trial. The next step, suggesting that the trial results be applied to a more general population (the majority of which would not even meet the eligibility criteria of the trial) is more tenuous. Readers must judge for themselves whether or not such an extrapolation is appropriate. As seen in Fig. 4.1 in Chap. 4, there is often a considerable winnowing from the initial study population to the final sample. A similar argument applies to the intervention itself. How general are the findings? If the intervention involved a special procedure, such as surgery or counseling, is its application outside the trial setting likely to produce the same response? In a drug trial the question of dose-effect relationship is often raised. Would a higher or lower dose of the drug have given different results, perhaps by altering the balance between benefit and harm? Can the same claims be made for different drugs of the same class or that have a similar structure or pharmacological action? Can the results of an intervention be generalized even more broadly? For example, there have been many trials comparing different statins in the prevention of coronary disease sequelae. If the goal LDL-cholesterol is the same in the groups being compared, should one expect similar outcomes? Based on the experience with cerivastatin [90], statins are unlikely to be the same, at least with respect to adverse events. One problem in trials of devices is that the devices are constantly being modified or improved, with respect to the technology or the software algorithm. Does the trial using the old model have any implications for the latest model or the model to come in the future? For a further discussion of generalization, see Chap. 4.

In 1987, a review found that the majority of therapeutic interventions had not been properly tested in randomized clinical trials [91]; approval may have been granted on the basis of surrogate endpoints, or drugs may have multiple indications, only some of which are proven. As discussed in this book, there continue to be examples of drugs that were approved but, when assessed in an adequately designed clinical trial, turned out not to be as wonderful as hoped. Skillful marketing has a major impact on practice patterns. The marked regional differences in drug sales cannot be explained on the basis of science, since all regions, in principle, have access to the same scientific information. It is difficult to tease out the impact of clinical trials on medical practice from other factors such as marketing and treatment guidelines. There are several examples of trials that have changed practice patterns [92, 93]. Similarly, there are examples where practice was predominantly influenced by the other factors [94].

As noted by Rothwell [95], clinicians must decide if clinical trial results are relevant to their patients. Rothwell points out that issues such as trial setting, kinds of patients, details regarding the intervention and control, nature of the primary and secondary outcomes, and adverse events are important in arriving at clinical decisions. Therefore, authors should include the necessary information in their publications. Obviously, no trial is large enough or has a broad enough population to enable readers to evaluate every kind of patient that might be treated. But the information can be helpful.

As with all research, a clinical trial will often raise as many questions as it answers. Suggestions for further research should be discussed. Finally, the investigator might allude to the social, economic and medical impact of the study findings. How many lives can be saved? How many working days will be gained? Can symptoms be alleviated? Economic implications or cost-effectiveness are important. Any benefit has to be weighed against the cost and feasibility of use in routine medical practice rather than in the special setting of a clinical trial.

Data Sharing

An issue that has received considerable attention is data sharing. Even an exemplary scientific report can contain only limited data that might conceivably be important to other researchers and clinicians. Therefore, data sharing among investigators and public access to data and publications have been proposed, and even required by some clinical trial sponsors [96–102]. A study jointly sponsored by the National Institute of Allergy and Infectious Diseases of the NIH, the Juvenile Diabetes Research Foundation, Genentech, and Biogen Idec indicated, at the time of publication, that its data sets were accessible on a public website [103]. Also encouraging were a report from GlaxoSmithKline that it would “provide access to deidentified patient-level data” [104] and a supportive letter from a representative of Hoffmann-La Roche [105]. It should be noted that the National Heart, Lung, and Blood Institute of the NIH has for decades provided data sets from many of its major clinical trials and observational studies to investigators [106]. In 2014, the National Institutes of Health stated its intent to increase sharing of clinical trial results for studies that it funds. It proposed to require submission of “summary results information to ClinicalTrials.gov for any applicable clinical trial that is required to be registered, regardless of whether the drugs, biological products, or devices under study have been approved, licensed, or cleared for marketing…” [107]. The proposed requirements deal with summary data only. It was acknowledged that sharing of individual participant data would be important, and that future efforts to accomplish that were under consideration [108]. The European Medicines Agency has clarified its position on the need for publication of clinical data and clinical study reports on which regulatory decisions are made [109].

The Institute of Medicine has released a report on data sharing that addresses many of the issues [110]. While the rationale for data sharing may be compelling, the process is very challenging because there are so many stakeholders and levels of data in a typical clinical trial. Stakeholders involved with data sharing include sponsors, whether federal or private, clinical investigators in the trial, other interested clinical investigators not part of the trial, patient advocacy groups, trial participants, regulatory agencies, journals, and graduate students in training. There are also different levels of data, ranging from raw data, which may include medical images, electrocardiogram tracings, and quality of life assessments, to the more standard baseline data and clinical and laboratory outcome data. Not all of these data are used, some perhaps only rarely, in scientific presentations or publications, and many are unlikely to be used even in regulatory review. The Institute of Medicine report calls for data and their metadata (documentation about the data file and the study) to be made available within 6 months of the publication date or when the study has been presented for regulatory review. For a publication, the shared data would be the analyzable data set used in that publication; the 6-month moratorium is to give trial investigators time to prepare and submit additional secondary analysis papers for publication. For regulatory review, the shared data would be what was in the complete study report. In general, the IOM report calls for data to be made available no later than 18 months after trial completion, defined as the last participant’s last visit or the predefined follow-up cutoff date. The logistics of how this process should be carried out must evolve, including what group or groups serve as curators for the trial data, what review process, if any, is required before data are released, and who funds this process, among many other challenging issues.

As discussed in the IOM report, the benefits and limitations of the data sharing policies are contentious, but all investigators whose trial was funded by an agency or company requiring data sharing must keep abreast of the requirements and policies of funding agencies such as NIH, drug regulatory agencies, and pharmaceutical companies.