Introduction

Our current, best-known causal criteria were presented more than 50 years ago [1,2,3] by, among others, Susser, the Surgeon General and Bradford Hill. They have served us well in epidemiology as a toolbox for a structured debate on causation. They were not labelled as criteria by Hill [1], but they have earned their status as criteria over the years. In spite of the rapid development in theoretical epidemiology, they have remained at least a reference point for causal thinking in review committees and for decision makers.

Many advocate a more frequent use of causal terminology [4], even when reporting from single studies, but in spite of better research tools we will usually not be in a position where single studies justify a causal label. Being able to identify all causal links in the process from exposure to disease does not mean we are able to study these links in an unbiased fashion under real-life conditions. Knowing what can go wrong in causal inference is one thing; avoiding these pitfalls in practice is another.

Often we study problems in public health or clinical epidemiology where we have to take a stand on recommending action or doing nothing. The consequence of choosing one or the other option should not determine our belief in causality, but it must be taken into consideration when we make public health decisions. We therefore suggest adding a consequence criterion.

We may be studying “Laws of Nature” and not just associations in specific populations [5], but we do so with imperfect tools [6], although new tools provide better and more valid designs. The time has come to implement these new methodologies and concepts in formal causal inference.

Epidemiology is a scientific discipline that aims at identifying preventable causes of disease in order to reduce the burden of disease. If causes of disease are eliminated or reduced, we expect their effects, the disease occurrence, to diminish. If E is causing D, eliminating E will reduce the incidence in the studied population by at least one case, often more [7]. If the exposure is not a cause, eliminating the exposure need not reduce the disease occurrence, except when other causal factors in the pathways linking the exposure to the endpoint of interest are also changed.

Studying causation requires a concept of causation, which is not only a technical or a philosophical concept but is part of everyday language. We learn about it in standard situations of causal interventions beginning in childhood (as when we turn on the light). If we had no concept of causation, we would be left with a very primitive language [6]. According to Hill [1], preventive medicine (including occupational medicine) is an intervention practice governed by a “decisive… question whether the frequency of the undesirable event B will be influenced by a change in the environmental feature A” (p. 29).

Hill’s list of ‘conditions’

Hill avoided the term ‘causal criteria’ and talked instead about ‘viewpoints’ and ‘guidelines’. His guidelines are now widely referred to as criteria, but there have been almost no attempts at clarifying in what sense Hill’s guidelines are, in fact, criteria.

Using Feinstein’s account of the role of criteria in clinical research and practice [8], we argue that Hill’s guidelines are procedural criteria, i.e. criteria used to outline the performance of intervention procedures. Criteria for good preventive practice have undergone changes as epidemiological methods, disease patterns, working conditions, technology artefacts, culture and economic conditions have changed. Causal criteria cannot be used in the way criteria are often used in clinical practice, where fulfilling a certain number of criteria will lead to action, since doing nothing may not be an option.

Many have discussed causal criteria, also before Hill published his landmark paper in 1965. Causal criteria were also presented in the Surgeon General’s report on Smoking and Health [2], and later the International Agency for Research on Cancer (IARC) added a probabilistic component to its classification of potential carcinogens [9].

Hill’s 9 criteria were: (1) strength, (2) consistency, (3) specificity, (4) temporality, (5) biological gradient, (6) plausibility, (7) coherence, (8) experiment, and (9) analogy.

Before that, Hume had stipulated some of the criteria of our everyday use of the word cause (e.g., constant conjunction). So we could say that the criteria ‘go back’ to Hume, in the sense that Hill’s exemplars from occupational medicine are in accordance with our everyday use of the word ‘causation’. Hill’s criteria, however, comprise more ‘viewpoints’, based on examples from a particular expert field.

Causation is often a delayed effect with a probabilistic outcome

Discussions on causation often refer to Hume’s “strong criteria” [10, 11].

For E to be a cause of D it must be true that:

1. E will always be followed by D (comment: E is a sufficient cause of D).

2. If E does not occur, D will not follow (comment: for this to hold, E has to be a necessary cause of D and the only necessary cause of D).

Hume believed these two statements to be alike, but they are not. The counterfactual condition in 2 will only be true if E is both a necessary and sufficient cause of D and is the only cause. The idea of a cause as a necessary and sufficient condition makes sense but is hardly ever seen in epidemiology, not even for infectious diseases, although such causes were used in Koch’s postulates [12]. We see necessary causes, but they are often the result of how we define the disease. If we include E in the definition of D, E will become a necessary cause by circular reasoning, as when we defined AIDS as a disease following HIV exposure. In the practice of epidemiology, we need other criteria to identify associations that may be likely causal candidates and thus targets for prevention [13]. We cannot limit our research to ‘strong’ sufficient and necessary causes but have to target component causes that act in concert (in causal fields) to bring about an effect. Mackie and Rothman linked Hume’s causal criteria to a concept that often works in practice and explains the probabilistic nature and delayed effects of causation: the component causal field model [14, 15].

At the top of the list of causal criteria we have often placed the strength of the association. The stronger the association is, the more likely it represents a causal link, although there is no estimate of strength in a standard directed acyclic graph (DAG). The strength of an association, we now think, is related to how common the other component causes in the causal field are in the population under study. According to Mackie’s causal field theory, causes follow the INUS condition: causes are ‘insufficient but non-redundant parts of a condition which is itself unnecessary but sufficient’ for their effects. However, strength is an important criterion because it makes other non-causal explanations less likely.
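To make this point concrete, the following is a minimal simulation sketch (our illustration only; the variable names and parameter values are assumptions, not taken from the cited work). When E produces D only together with a complementary component cause C, the observed risk ratio for E rises with the prevalence of C in the study population.

```python
# Sufficient-component-cause (INUS) sketch: D occurs when E and its
# complementary component cause C are both present, or through an
# independent background sufficient cause. All values are illustrative.
import random

random.seed(1)

def simulated_risk_ratio(p_complement, n=200_000, p_exposure=0.3, p_background=0.01):
    """Risk ratio for exposure E as a function of how common C is."""
    cases_exp = n_exp = cases_unexp = n_unexp = 0
    for _ in range(n):
        e = random.random() < p_exposure        # exposure of interest
        c = random.random() < p_complement      # complementary component cause
        d = (e and c) or (random.random() < p_background)
        if e:
            n_exp += 1
            cases_exp += d
        else:
            n_unexp += 1
            cases_unexp += d
    return (cases_exp / n_exp) / (cases_unexp / n_unexp)

for p_c in (0.01, 0.10, 0.50):
    print(f"P(complementary cause) = {p_c:.2f} -> risk ratio ~ {simulated_risk_ratio(p_c):.1f}")
```

With a rare complementary cause the risk ratio stays close to the null; with a common one, the same causal mechanism produces a strong association.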

Consistency, or reliability, is also considered an important criterion, but effect sizes are expected to vary between populations because they depend upon the frequency of other component causes.

Evidence from randomized controlled trials, as can be illustrated with DAGs, will under perfect conditions reduce the interpretation of a positive result to causation or chance. Randomized controlled trials are important tools, especially in clinical epidemiology, since confounding by indication may not be avoidable without randomization, especially when the treating doctors are good. Evidence from trials supports causality, but trials are also subject to error, especially if they need to be large and run for a long time.
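As a small illustrative sketch of confounding by indication (the scenario and numbers are assumptions, not drawn from the references): when doctors preferentially treat the sickest patients, a truly beneficial treatment can look harmful in observational data, whereas randomization recovers the benefit.

```python
# Confounding by indication: severity drives both treatment and death.
import random

random.seed(4)

def risk_difference(randomized, n=100_000):
    """Risk difference (treated minus untreated) for death."""
    cases_t = n_t = cases_u = n_u = 0
    for _ in range(n):
        severity = random.random()                         # indication / prognosis
        if randomized:
            treated = random.random() < 0.5                # allocation by chance
        else:
            treated = random.random() < severity           # sicker patients treated more often
        p_death = 0.05 + 0.20 * severity - 0.03 * treated  # treatment truly lowers risk
        death = random.random() < p_death
        if treated:
            n_t += 1
            cases_t += death
        else:
            n_u += 1
            cases_u += death
    return cases_t / n_t - cases_u / n_u

print(f"observational risk difference: {risk_difference(False):+.3f}")
print(f"randomized risk difference:    {risk_difference(True):+.3f} (true effect -0.03)")
```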

Another important criterion is a dose–response association (the higher the exposure, the more frequent the outcome), because it is hard to ‘explain away’ by confounding unless the confounder mimics the same dose–response pattern.
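A short sketch of that caveat (illustrative names and parameters only): a confounder that raises both the dose and the outcome risk can by itself produce an apparent dose–response pattern.

```python
# A confounder U pushes the exposure dose upward and raises the outcome risk,
# creating a monotone dose-response even when the dose has no causal effect.
import random

random.seed(2)

def risk_by_dose(dose_effect, n=150_000):
    """Outcome risk at dose levels 0, 1, 2 when dose is partly driven by U."""
    cases = [0, 0, 0]
    counts = [0, 0, 0]
    for _ in range(n):
        u = random.random()                                 # confounder, e.g. age
        dose = int(3 * (0.5 * u + 0.5 * random.random()))   # U pushes the dose upward
        outcome = random.random() < 0.02 + 0.10 * u + dose_effect * dose
        cases[dose] += outcome
        counts[dose] += 1
    return [round(cases[i] / counts[i], 3) for i in range(3)]

print("risk by dose, no causal dose effect:   ", risk_by_dose(0.0))
print("risk by dose, true dose effect of 0.03:", risk_by_dose(0.03))
```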

None of the mentioned criteria are sine qua non criteria, except that a cause has to precede its effect; the association must not be a consequence of reverse causation.

It should also be a causal criterion, and perhaps the most important one, to have an association that remains after comprehensive attempts to remove it. “When you have eliminated the impossible, whatever remains, however improbable, must be the truth” (Sherlock Holmes). The task of the investigator should be to see if he/she can make an association go away, not just to add new data and repeat what has been done already. Repeating the same design does not bypass the verification problem, and identifying observations that would not be compatible with the hypothesis may be more informative [16]. New methodological developments provide much better tools for testing how robust an association is to falsification [17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32].

New method development should lead to new criteria

Many things have changed since Hill’s paper came out in 1965. Thinking of causes as components of causal fields, with component causes that enter or exit these fields over time, provides a concept that at least does not contradict empirical findings. Graphical presentations of causation based upon mathematical rules make us better prepared to decide which data to collect and how they should be analyzed. Focus has shifted from making inference based on P values [33] to bias analyses, for example by using instrumental variables, negative controls, triangulation, sibling comparisons, cases as their own controls, marginal structural models, or inverse probability weighting to adjust for selection or confounding, etc. [18,19,20,21, 23,24,25,26,27,28,29,30,31,32, 34]. The rapid improvement of computer technology has opened a whole set of new ways to analyze data and to learn more from simulation studies [17].
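As one concrete example of the methods listed above, the sketch below applies inverse probability weighting to remove confounding by a single measured covariate (the data-generating assumptions and variable names are ours, not taken from the cited papers).

```python
# Inverse probability weighting: weight each person by 1 / P(their observed
# exposure | C), which balances the confounder C across exposure groups.
import random

random.seed(3)

# Simulated data: C affects both exposure E and outcome D; E has a true
# risk difference of 0.05 on D.
data = []
for _ in range(100_000):
    c = random.random() < 0.4                              # confounder
    e = random.random() < (0.6 if c else 0.2)              # C affects exposure
    d = random.random() < (0.05 + 0.10 * c + 0.05 * e)     # C and E affect outcome
    data.append((c, e, d))

# Probability of exposure within each stratum of C ("propensity")
p_e = {level: sum(e for c, e, d in data if c == level) /
               sum(1 for c, e, d in data if c == level)
       for level in (False, True)}

def ipw_risk(exposed):
    """Weighted outcome risk among people with the given exposure status."""
    num = den = 0.0
    for c, e, d in data:
        if e == exposed:
            w = 1 / (p_e[c] if e else 1 - p_e[c])
            num += w * d
            den += w
    return num / den

risk_e1 = sum(d for c, e, d in data if e) / sum(1 for c, e, d in data if e)
risk_e0 = sum(d for c, e, d in data if not e) / sum(1 for c, e, d in data if not e)
print(f"crude risk difference: {risk_e1 - risk_e0:.3f}")
print(f"IPW risk difference:   {ipw_risk(True) - ipw_risk(False):.3f} (true effect 0.05)")
```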

DAGs have provided a powerful tool for illustrating causation [20, 21, 34]. Presenting a plausible DAG with empirical support would argue for a causal association and should be one of the “causal criteria”. Putting the association through bias analyses will also make an important contribution by trying to quantify the potential role of selection and information bias, including confounding. If ‘best bias analyses’ under reasonable assumptions do not make the association go away, this speaks in favor of causality. The use of counterfactual reasoning has also made an important contribution to causal understanding.
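As a minimal example of such a quantitative bias analysis, the following applies the standard external-adjustment formula for an unmeasured binary confounder (a technique consistent with, but not named in, the text; the parameter values are illustrative assumptions).

```python
def confounder_adjusted_rr(rr_observed, rr_ud, p_u_exposed, p_u_unexposed):
    """Risk ratio adjusted for an unmeasured binary confounder U, given an
    assumed U-outcome risk ratio and the prevalence of U in each exposure group."""
    bias = (1 + p_u_exposed * (rr_ud - 1)) / (1 + p_u_unexposed * (rr_ud - 1))
    return rr_observed / bias

# Example: observed RR of 2.0; U doubles the outcome risk and is twice as
# common among the exposed (40% vs 20%).
print(round(confounder_adjusted_rr(2.0, rr_ud=2.0, p_u_exposed=0.4, p_u_unexposed=0.2), 2))
# -> 1.71: under these assumptions the association shrinks but does not go away.
```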

A consequence criterion

We have to evaluate the evidence we have in the light of the methods that were used to generate the findings. However, we often need more than procedural criteria for good epidemiological practice if the results are to be of use in real life.

Procedural criteria state prescriptions for doing something in a particular practice [35]. Hill’s procedural criteria give prescriptions for epidemiological research in the context of preventive medicine.

Acting in accordance with adequate procedural criteria does not always ensure that the resulting consequences will be accepted (‘in real life’) as appropriate (right, just, fair, etc.). We have to consider what flows from decisions made in accordance with the procedural criteria. If society and its institutions act upon the evidence, it will have consequences in real life. However, if society and its institutions do not act, this will often have consequences in real life as well.

In ‘real life’, procedural criteria for causal intervention are not sufficient. Hill gives examples to show how the strength of evidence demanded in a particular context of intervention should be determined in the light of human values such as fairness, justice and autonomy. Before “we made people burn a fuel in their homes that they do not like or stop smoking the cigarettes and eating the fats and the sugar they do like”, we should need ‘very strong evidence’.

Here Hill makes ‘human autonomy’ a criterion among others in preventive medicine. He is in accordance with Feinstein, who also points to the need of combining procedural criteria with (what he labels) desirability criteria in medicine, criteria that prescribe actions that are considered worthy or right in social contexts [8].

The idea of coming to an agreement on causation is often related to action, to doing nothing or something [36, 37]. This reflects back to our counterfactual consideration (what would have happened had the exposed not been exposed), but the question is now what would happen in the future if people stopped being exposed. A potential counterfactual future without the exposure may offer more benefits and fewer side effects than maintaining the status quo. The decision process also needs to take into consideration whether the ‘exposure’ is imposed from outside or is the result of a personal choice.

Some may argue even for a moral obligation to action, i.e., to contribute to causal intervention in an environmental context. In the International Covenant on Economic, Social and Cultural Rights (1966) it is stated in Article 12 that the States recognize the right of everyone to the enjoyment of the highest attainable standard of physical and mental health [38]. Steps to be taken to achieve the full realization of this right include those necessary for the improvement of all aspects of environmental and industrial hygiene, and the prevention, treatment and control of epidemic, endemic, occupational and other diseases.

This emphasis on action was mentioned in Hill’s paper from 1965 [1]: “In occupational medicine, our object is usually to take action. If this be operative cause and that be deleterious effect, then we shall wish to intervene to abolish or reduce death or disease”. Paul Stolley further addressed our social responsibility in his talk to SER members [39]: “This is not to say that all findings should not be scrutinized and challenged, but this should be done with a sense of responsibility”. We have no need for partly justified ‘opinions’.

It is not always the case that we have the luxury of substantial evidence to evaluate risks and effects in the light of causal criteria. Many drug trials are stopped at an early stage because the producer runs a high financial and ethical risk if they bring a harmful product to the market. A decision to implement a new vaccine, in spite of limited evidence, should be taken if the risk of doing nothing is considered to exceed the risk of using the vaccine. Other situations may call for decision making where the risk of doing nothing is high, but the decision process is often heavily biased towards doing nothing. ‘Active’ mistakes are often criticized more than ‘passive’ mistakes.

Those who decide on this set of criteria should be driven by a wish to reach the truth; they should be like Kafka’s truth-seeking dogs. They should have no conflicts of interest, in the sense that they have no personal gain from the decisions they make. They should be familiar with epidemiologic research and with the infrastructure and conditions for doing research.

By including a consequence criterion in a set of criteria for causal intervention we are confronted with dilemmas between different ethical and social values (e.g., between respecting individual autonomy and freedom and respecting social responsibility and solidarity). Here epidemiologists face problems and challenges they cannot solve alone.

Hill cleverly avoided simple checklists to classify research as good or bad. Such checklists may be of value in very standardized research protocols like RCTs, but to think it is possible to navigate the more complicated rivers of causation using only predefined guidelines is naïve.

Conclusions

We still need causal criteria to summarize evidence, and we need to act on these criteria to preserve health and to prevent disease. These criteria should reflect the best knowledge from research, and much has changed in this field since Hill wrote his paper in 1965.

The task of revising the procedural criteria, as well as formulating a consequence criterion, should be left to authorities who can speak on behalf of the scientific communities and who are recognized and trusted by stakeholders in states and civil society. National institutions should collaborate (e.g., in the context of WHO), aiming at formulating international standards and criteria to promote public health. The exercise will be useless unless the criteria are widely accepted and used (lead to action). If this is not done, many of our research findings will not be used in practice.