Introduction

Proposals to use policy measures such as taxing persons with obesity as ways to raise revenue and discourage poor health behaviors, including high levels of consumption, existed at least as early as 1904.1

However, it was largely in the mid-1990s that the academic and professional dialog around obesity shifted from one dominated by basic science and clinical research to one that also involved a third branch, namely, public health approaches. Inspired in part by the successful efforts to curtail cigarette smoking, potential obesity-related policy approaches began receiving more attention. Such policies include, but are not limited to, providing information (for example, labeling restaurant menus with nutritional facts), marketing ideas to inspire behavior change (for example, placing public health posters in subway systems to discourage or encourage certain food or activity behaviors), mandating that the body mass index of schoolchildren be measured and reported to parents, enacting worksite economic contingencies, changing food offerings for schoolchildren, zoning of allowable restaurants, banning the sale of certain portion sizes, taxing or subsidizing certain foods, and providing economic incentives and disincentives through insurance charges. Some of these have been implemented and some have only been proposed. Few have been rigorously evaluated, and fewer still have unequivocal evidence demonstrating efficacy in stabilizing or reducing body weight.

Because implementing such policies typically involves at least some of the following (money, limitations on the freedom of businesses to engage in certain types of commerce, limitations on personal freedom, and opportunity costs in terms of time and attention), it is not surprising that obesity-related policy proposals often provoke heated debate. Moreover, the debate frequently focuses on moral issues, sometimes involving the balance between autonomy and beneficence or between individual fairness and societal benefits. Because these issues revolve around morals and values, they are difficult to reconcile. As such, they are repeatedly deferred while the dialog jumps to questions about the quality of evidence. Yet even here, disagreements abound as to the strength of evidence and whether it supports a particular position on a proposed policy. Equally important, sometimes debated but often simply glossed over, are questions such as: (i) What type of evidence is needed and appropriate for a particular situation? (ii) How can such evidence be generated? and (iii) Is evidence even needed at all to justify the implementation or rejection of a particular proposed policy?

In this article, we address three macro-level questions. First, concerning evidence, we raise questions about the relevance of some types of evidence that are often brought to bear in policy dialogs. Second, we discuss the major methods used to generate such evidence, with particular focus on the fact that there is a range of study designs (from ordinary association tests (OATs) to pure randomized controlled trials (RCTs)) that yield evidence of varying quality and varying ability to support causal inference. Third, we consider what the standards of evidence should be in various contexts, as well as who ought to set those standards, and emphasize the inherent subjectivities involved in making policy decisions. We conclude by noting that it would be beneficial if both academics and policymakers were transparent in recognizing and conveying those subjectivities while taking care to both understand and distinguish the roles of empirical evidence and subjective values.

What do we want evidence about?

Evidence regarding plausibility

When considering a potential policy, the first evidence-oriented question we might ask is, ‘Is there evidence that the policy will plausibly be effective?’ That is, is there reason to think that the policy will work? Of course, beyond simply saying that we cannot prove the contrary, the plausibility of a proposition is subjective, but one’s reasons for declaring something plausible or implausible can be specified. At the most superficial level, many obesity policies can be deemed plausible on the basis of the simple concept of energy imbalance as a cause of obesity. Any policy directed at either increasing energy expenditure or decreasing energy intake might thus be assessed as plausible by some. In some cases, this general plausibility is all that is needed to initiate a policy. For example, when considering calorie labeling of restaurant menus, US District Judge Richard J Holwell ruled that:

‘The Court agrees with Dr Allison that one cannot conclude with scientific certainty from the available evidence that a regulation of this type will ultimately be successful in combating obesity. But even if there are no data demonstrating conclusively that Regulation 81.50 will be effective, conclusive proof is not required to establish a reasonable relationship between Regulation 81.50 and the City’s interest in reducing obesity. Based on the evidence presented by the City, as well as common sense, it seems reasonable to expect that some consumers will use the information disclosed pursuant to Regulation 81.50 to select lower calorie meals when eating at covered restaurants and that these choices will lead to a lower incidence of obesity.’2

In contrast, empiricists (or Bayesians) might state that the existing evidence indicates that no proposed public health approach to obesity has been convincingly shown to work or, at best, that no approach has more than very modest effects when it has been applied or tested.3, 4 Therefore, the a priori expectation is that the next proposed policy will have little to no effect. By analogy, this rationale is similar to the statistically minded high school guidance counselor who advises the basketball star to study academics because, while the counselor cannot rule out that this player will be the one to get drafted to the NBA or WNBA, it is unlikely.
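The Bayesian reasoning in the preceding paragraph can be made concrete with a conjugate normal-normal update. The sketch below is illustrative only: the function name is hypothetical, and the skeptical prior (centered on no effect) and the study result are invented numbers. It shows how a prior formed from a track record of null or modest results pulls a single encouraging estimate toward zero.

```python
from math import sqrt


def posterior_normal(prior_mean, prior_sd, estimate, se):
    """Combine a normal prior with a normally distributed study estimate.

    Returns the posterior mean and SD of a conjugate normal-normal update.
    """
    prior_prec = 1.0 / prior_sd**2  # precision = 1 / variance
    data_prec = 1.0 / se**2
    post_var = 1.0 / (prior_prec + data_prec)
    # Posterior mean is the precision-weighted average of prior and data
    post_mean = post_var * (prior_prec * prior_mean + data_prec * estimate)
    return post_mean, sqrt(post_var)


# Hypothetical numbers: a skeptical prior centered on no effect (0 kg, SD 0.5 kg)
# and a new study reporting a 2.0 kg benefit with a standard error of 1.0 kg.
mean, sd = posterior_normal(prior_mean=0.0, prior_sd=0.5, estimate=2.0, se=1.0)
print(f"posterior mean = {mean:.2f} kg, posterior SD = {sd:.2f} kg")
```

With these inputs the posterior mean is 0.4 kg, one-fifth of the reported 2.0 kg, because the prior carries four times the precision of the single study. A less skeptical (wider) prior would shrink the estimate less.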

Plausibility may also be low in some people’s minds for policies that aim to affect one component of energy balance in one context while leaving other components untouched. Such policies, even if they alter that one component in the desired direction, will be effective only if the alteration is not compensated for (or is only incompletely compensated for) by changes in other components of energy balance. Empirical, experimental evidence indicates that such compensation does indeed occur, although it is usually incomplete.5 This suggests that estimates of the plausible effects of policies that work by altering one component of energy balance should not be based on models that assume no compensation (cf. the Caloric Calculator, which estimates the average caloric impact of, and thereby predicts effect sizes for, childhood obesity interventions6), because such models will likely markedly overestimate plausible effects.
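A minimal arithmetic sketch shows why ignoring compensation inflates projected effects. It assumes, purely for illustration, that compensation offsets a fixed fraction c of the intended change. Under that assumption, a no-compensation model overstates the net effect by a factor of 1/(1 − c). The function name and all numbers are hypothetical.

```python
def projected_deficit(intended_kcal_per_day, compensation_fraction):
    """Net daily energy deficit after compensation (hypothetical helper).

    compensation_fraction = 0 reproduces a no-compensation model;
    1 means the change is fully offset elsewhere in energy balance.
    """
    if not 0.0 <= compensation_fraction <= 1.0:
        raise ValueError("compensation_fraction must lie in [0, 1]")
    return intended_kcal_per_day * (1.0 - compensation_fraction)


intended = 100.0  # hypothetical kcal/day reduction produced by a policy
for c in (0.0, 0.25, 0.5, 0.75):
    net = projected_deficit(intended, c)
    # Factor by which a no-compensation model overstates the net effect
    overstatement = intended / net
    print(f"compensation {c:.0%}: net {net:.0f} kcal/day, overstated {overstatement:.1f}x")
```

Even a moderate offset of 50% makes a no-compensation model twice as optimistic as the net effect it is meant to predict.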

The plausible benefit of many proposed policy approaches also rests on the assumption of additivity: that a small effect coupled with several other slight effects will collectively produce a larger response in the outcome. This is particularly applicable to the category of ‘nudge,’ a term introduced by Thaler and Sunstein7 to describe multiple, minor, likely unnoticeable changes intended to alter one’s behavior. Rozin et al.8 showed that multiple modest changes, or nudges, affecting food accessibility (the location of ingredients at a salad bar and the size and type of serving utensils) in a cafeteria setting reduced the calories purchased during single meals without removing choices. They predicted that the reduced purchasing would translate into a cumulative weight loss over 1 year. Again, this type of study relies on several assumptions: that fewer calories purchased translates to fewer calories consumed; that ‘all else is equal,’ that is, no compensation occurs; that short-term effects persist in the long term; that multiple interventions have additive effects; and that interventions work equally well when subjects are fully aware of them (as in ordinary commerce) as when they are not disclosed (as in many studies). Such a study also raises the questions of whether patrons would purchase fewer calories in other ordinary settings, such as a store, and whether doing so would result in weight loss. For example, Wansink and colleagues found that increasing the price of soda reduced soda purchases but was associated with increased sales of beer.9 Nudges also may elicit a different response when persons are made aware of the interventions or with repeated long-term exposure (for example, daily or weekly grocery shopping).10 The nudge approach has also been criticized on several other grounds,11, 12 and such criticism highlights that what seems plausible to one person may not seem so to another.
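The listed assumptions can be made explicit in a simple sensitivity sketch. Each assumption is expressed as a multiplicative discount on a naive annual projection, where 1.0 means the assumption holds exactly. All inputs are hypothetical and none comes from Rozin et al. or any other study. Treating the assumptions as independent multiplicative factors is itself a simplifying assumption.

```python
# Hypothetical inputs; none of these values comes from any cited study.
NAIVE_KCAL_PER_MEAL = 50.0  # assumed reduction in kcal purchased per meal
MEALS_PER_YEAR = 250        # assumed number of cafeteria meals per year

# Each assumption named in the text as a multiplicative factor
# (1.0 = the assumption holds exactly; values are illustrative guesses).
assumption_factors = {
    "purchased -> consumed": 0.8,       # not every purchased calorie is eaten
    "no compensation": 0.5,             # half the deficit offset elsewhere
    "short-term effect persists": 0.6,  # effect decays with repeated exposure
    "nudges are additive": 0.9,         # combined effect below the sum of parts
    "works when disclosed": 0.8,        # weaker once patrons are aware
}


def annual_kcal(factors):
    """Projected annual kcal reduction after applying each factor in turn."""
    total = NAIVE_KCAL_PER_MEAL * MEALS_PER_YEAR
    for f in factors.values():
        total *= f
    return total


naive = annual_kcal({})
adjusted = annual_kcal(assumption_factors)
print(f"naive projection:    {naive:,.0f} kcal/year")
print(f"adjusted projection: {adjusted:,.0f} kcal/year ({adjusted / naive:.0%} of naive)")
```

No single discount here is dramatic, yet together they cut the projection to about 17% of the naive figure. This is why each unverified assumption matters.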

Evidence regarding postulated intermediaries

Evidence of the effectiveness of obesity policy may also rest on evidence regarding presumed mediating variables. An example is a proposed policy intended to increase fruit and vegetable consumption, with the main assumption being that increased intake of fruits and vegetables (the mediating factor) will decrease adiposity or lessen weight gain. Empirical support for the policy may include a demonstration that the proposed action does indeed increase fruit and vegetable consumption. However, such support can only be suggestive, because it does not necessarily follow that increased fruit and vegetable intake will actually decrease or prevent adiposity or lessen weight gain; the same argument applies to other postulated intermediaries.13, 14

Evidence from analog studies

Analog studies attempt to represent key aspects of ordinary life while controlling or limiting external factors, which increases internal validity and can yield key insights15 but potentially decreases external validity.16 An example is the analog study conducted by Epstein et al.17 to compare the effects of taxation versus subsidization on food purchases. They found that taxing foods with low nutrient density and high caloric content reduced caloric intake, whereas subsidizing low-calorie foods increased caloric intake. This type of evidence supports the plausibility, but not necessarily the effectiveness, of a policy for decreasing obesity. One area of opportunity is the use of pragmatic RCTs, which emphasize rigorous methods in real-world contexts.18

Direct evidence regarding effectiveness

Of course, the key evidence desired is evidence of a policy’s effectiveness on the ultimate outcome: decreased levels of obesity. Although optimal, such evidence is often difficult to obtain. Ideally, a study would provide direct evidence of effectiveness under actual conditions of use over extended periods of time, and would be of a nature to allow strong inference about cause and effect. Such studies would be randomized studies of the actual policy or of something in extremely close proximity to it. There is no question that these studies would be difficult, expensive, time-consuming and in some cases potentially unethical. We do not advocate inaction in the absence of this type of evidence; however, there should be a clear understanding that without such evidence, statements about the effects of a policy remain speculative.

Evidence regarding unintended consequences

It is important to keep in mind that implementing a policy often brings with it unintended and undesirable consequences. Many of these consequences have been highlighted previously.19, 20 Such consequences can include, but are not limited to, inequitable distribution of the costs of implementing the policy, encroachment on individual freedoms, overconsumption or increased purchasing of certain foods, stigmatization, depression and avoidance of doctor appointments.19, 20 One author contends that the emphasis on body weight has led to weight-based bullying, increased disordered eating, body dissatisfaction, extreme dieting and complications from obesity surgery, among others.21 Although some evidence exists on potential unintended and undesirable consequences, it is fairly limited because this area has not been thoroughly investigated. Again, fear of unintended negative consequences should not paralyze us into inaction, but should lead us to practice humility about the potential value of our proposals, to think things through carefully, and to monitor implemented policies vigilantly for unintended consequences.

Evidence regarding public opinion

Reports of the results of public opinion surveys on the desirability of particular obesity-related policies have proliferated in recent years.22 By implication, this suggests that if a large portion of the population supports a proposed policy, then implementing the proposal is merited. Is such a conclusion reasonable? Should evidence of public opinion about the desirability of policies be considered?

Suh et al.23 suggest that public opinion should be solicited to ‘better understand the public mindset about relevant policy strategies, and to identify attitudes among different subsets of the population towards specific legal measures that can increase protections for individuals affected by obesity’. Pollard et al.24 also contend that it is important to survey public opinion or community perception, especially when the policy in question involves what may be thought of as government ‘interference’ in issues concerning food (labeling, advertising and supply of environmentally friendly food). But are such opinions always important? When assessing public opinion is warranted, which methodologic issues are involved? And are there actually circumstances when assessing public opinion would be quite inappropriate? Because this article is primarily about evidence for effectiveness, we consider these questions only briefly here.

Are scientific assessments of public opinions about policies always important?

Throughout the history of the United States, political leaders have wrestled with the tension between pursuing what seems morally right on the basis of fundamental principles and doing what is popular. One example is the Lincoln–Douglas debates about slavery. In one of the debates, Lincoln famously said, ‘In this and like communities, public sentiment is everything. With public sentiment, nothing can fail; without it nothing can succeed. Consequently he who moulds public sentiment, goes deeper than he who enacts statutes or pronounces decisions. He makes statutes and decisions possible or impossible to be executed’.25 It is noteworthy that Lincoln, like some modern-day authors interested in obesity policy,26 is talking about ‘moulding’ public opinion to enable what one has already determined is right and just, not assessing public opinion to determine what is right and just.

If Lincoln had conducted a public opinion poll and found that most pre-Civil War Americans favored retaining slavery in the United States, would he have judged that pertinent evidence as to whether the practice should be abolished? Would we? The answer is evidently no. When something is judged to be morally wrong, it is wrong and should be ‘off the table’ for discussion regardless of its popularity. Consider the recent posting from Ted Kyle on a ‘UK Proposal for Explicit Weight Discrimination in Healthcare’.27 Kyle argues that a proposed policy was a grossly unjust form of discrimination against persons with obesity in terms of health care access. Or, consider proposed policies that entail institutionalized ‘fat shaming’28 or a failed/withdrawn Mississippi bill to limit access of persons with obesity to restaurants.29 Many, including the current authors, would consider such proposals morally indefensible, and if one adopts such a position, then no public opinion polls are needed. If moral opinion has superior authority relative to public opinion, this invites important questions of who or how many determine the moral authority and on what basis.

When public opinion assessment is warranted, which methodologic issues are involved?

The above notwithstanding, situations certainly exist where public opinion is important, such as to determine whether a policy which is neither morally indefensible nor a moral imperative is desired by the citizenry. In such situations, it will be important to rely on good principles of designing and interpreting opinion surveys and to keep in mind that who is surveyed30 and how questions are worded31 can both be used to manipulate the answers one receives. Extensive discussions on these and other methodologic points are covered in standard textbooks on survey and sampling methods.

Are there actually circumstances when assessing public opinion would be quite inappropriate?

Finally, we suggest that in some circumstances, assessing public opinion is not only unnecessary, but inappropriate. Specifically, in situations where a proposed approach is morally indefensible, admitting the value of public opinion surveys in determining whether a policy should be enacted invites a ‘tyranny of the majority’.32, 33 An interesting corollary is that empirical evidence on the harm or lack of benefit of some morally indefensible practice might also be seen as not only unnecessary, but counterproductive, because the very act of considering the empirical evidence implies that the practice might be worthy of adoption if the evidence came out a particular way. For example, consider this headline from an internet posting: ‘Science Says Fat Shaming Backfires—So Can We Finally Stop It?’34 The article seems to reference an observational study35 that is interpreted as showing that perceived weight discrimination leads to greater future obesity in the person experiencing the discrimination. In our opinion, the answer to the headline’s rhetorical question, ‘So Can We Finally Stop It?’, is that we unequivocally should stop fat shaming, not because of this (or any other) study but because it is wrong. Even if one accepts our view that fat shaming is wrong a priori, one might ask what harm there is in buttressing that position with some empirical support. The harm is that such support, like all empirical evidence, is subject to differential interpretation, criticism and being overturned. In the observational study in this example, it would be easy to point out many limitations, most notably that the study cannot show cause and effect. This may lead others to conclude, ‘Well, if the wrongness of fat shaming depended in part on the empirical evidence and the empirical evidence has holes in it, I guess fat shaming may not be wrong after all’.

If this example is not stark enough, we can ask ourselves whether we would take seriously the need for studies showing deleterious effects of policies that institutionalized racial or religious discrimination as justification for eliminating such heinous policies.

How might evidence for obesity policies be generated?

We now turn from the question of what evidence we want to the question of how such evidence can be generated. In doing so, we emphasize that we are focusing in this section on questions regarding the effects of potential policies on outcomes and do not consider questions about assessing other things such as public opinion about policies. In considering the generation of evidence regarding the effects of potential policies, we are considering questions of cause and effect and readers may find the videos available from an annual short course on this topic of interest (see: http://www.norc.uab.edu/courses/shortcourse).

Here, we divide the types of research to be considered into three categories: (i) research that can be determinative of the causal effects of policies; (ii) research that can contribute to an overall assessment of the causal effects of policies, but cannot on its own determine causation; and (iii) research that formally synthesizes multiple sources of information to estimate the causal effects of policies.

Research that can be determinative of the causal effects of policies

Role of randomized controlled trials

Empirical evidence derived from RCTs aimed at identifying factors that increase or decrease the risk or magnitude of obesity can provide the strongest evidence to guide the development of obesity policies.36 RCTs are regarded as the gold standard in the hierarchy of research designs because they are the most reliable method for determining causality.37 Evidence generated from RCTs has been used to guide the development of several types of obesity policies such as dietary recommendations, sugar-sweetened beverage taxes and food pricing.38, 39, 40 Despite the acknowledgment that RCTs offer the strongest inferences about cause and effect, several arguments are commonly offered against reliance on RCTs for causal inference in policy research. We very briefly review these arguments here.

RCTs are imperfect

Some authors note that RCTs are imperfect. They can be designed and executed with flaws. Like all empirical studies, they are subject to stochastic variation. Finally, they often entail subject selection criteria and/or study conditions that limit the generalizability of their results to the broader population and to more ‘real-life’ circumstances. These are all legitimate criticisms, but two things are noteworthy. First, these weaknesses are all surmountable: RCTs can be designed and executed well, and can be conducted in large enough samples and tested at small enough nominal type 1 error levels to minimize stochastic errors. Second, pragmatic controlled trials offer investigators the ability to examine the effectiveness and efficacy of an intervention in the real world by allowing the inclusion of a diverse sample of the population and by enabling the intervention to be adapted to local settings.41 For example, the Moving to Opportunity study, which randomly offered housing vouchers, found that the ‘opportunity to move from a neighborhood with a high level of poverty to one with a lower level of poverty was associated with (caused) modest but potentially important reductions in the prevalence of extreme obesity and diabetes’.42
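The point that stochastic error can be controlled through sample size and a stringent type 1 error level can be illustrated with the standard normal-approximation sample-size formula for comparing two means with equal allocation: n per arm = 2(z₁₋α/₂ + z₁₋β)²σ²/δ². The sketch below uses only the Python standard library. The effect size (δ = 1 kg) and SD (σ = 5 kg) are hypothetical, and the function name is illustrative.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(delta, sd, alpha=0.05, power=0.8):
    """Approximate per-arm sample size for a two-arm trial comparing means
    (two-sided test, equal allocation, normal approximation)."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = z(power)           # quantile corresponding to desired power
    return ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)


# Hypothetical trial: detect a 1 kg difference in weight change, SD 5 kg.
for alpha in (0.05, 0.01, 0.001):
    print(f"alpha = {alpha}: {n_per_arm(delta=1.0, sd=5.0, alpha=alpha)} per arm")
```

Tightening α from 0.05 to 0.001 roughly doubles the required sample, about 393 versus about 854 per arm here. The cost is real but finite, which is the sense in which this weakness is surmountable.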

RCTs are sometimes impractical or impossible

We agree that RCTs are sometimes impractical or impossible, but this has no bearing on the extent to which RCTs and other designs can or cannot provide strong inferences about causation. The argument that (i) RCTs are sometimes impractical or impossible, (ii) if we relied only on them for strong causal inferences, we would be unable to draw such inferences in some situations in which we wished to, and (iii) therefore we should not draw strong causal inferences solely from RCTs, is simply a special case of Argumentum ad Consequentiam.43

There are no RCTs showing that parachutes work

It is sometimes noted that we accept many propositions as true on the basis of some evidence and intuitive obviousness such as that smoking causes lung cancer or that parachutes save lives among skydivers.44 This is an example of argument by analogy.45 Arguments by analogy can be useful foils to provoke thought, but in and of themselves prove or disprove nothing.

We cannot wait for perfect data

It is sometimes argued that we cannot (or more aptly should not) wait for perfect data to take certain actions, such as enact certain policies. We agree with this proposition. However, the statement ‘we cannot (or more aptly should not) wait for perfect data to take certain actions’ is not equivalent to ‘we cannot (or more aptly should not) wait for perfect data to draw strong conclusions about causation’. Taking actions and drawing causal conclusions are distinct processes and the need and justification to take prudent action in the face of uncertainty is not a justification for denying that the uncertainty exists.19, 46, 47

Inadvertently promoting a false dichotomy

Majumdar and Soumerai48 have cogently noted that ‘some contend that only randomized controlled trials produce trustworthy evidence’. Unfortunately, such a position discounts valid nonrandomized or quasi-experimental study designs, even though health policy randomized controlled trials are rarely feasible. Such a constrained view inappropriately lumps together valid evidence from strong nonrandomized designs (for example, before–after studies with concurrent controls or the interrupted time series study in which a policy causes a sudden, visible change in trend) with evidence from weak designs that permit little causal inference (for example, the commonly conducted cross-sectional analysis that looks at outcomes only after a policy has been implemented). We agree that there is a continuum of non-RCT designs that vary in the strength of causal inferences they justify. We also agree that the stronger designs are underutilized, as we discuss later in this article. However, these recognitions do not affect the validity of propositions that randomization is key to valid causal inference.49 If we accepted otherwise, we would again be engaging in Argumentum ad Consequentiam.43
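The interrupted time series design mentioned above is commonly analyzed with segmented regression. This approach estimates the pre-policy trend and asks whether the level or slope of the outcome changes when the policy takes effect. The sketch below fits such a model by ordinary least squares with NumPy on simulated, hypothetical monthly data. The function name is illustrative. A real analysis would also need to address autocorrelation, seasonality and concurrent events, which this sketch ignores.

```python
import numpy as np


def segmented_regression(y, policy_start):
    """Fit y_t = b0 + b1*t + b2*post_t + b3*(t - policy_start)*post_t by OLS.

    Returns (b0, b1, b2, b3): baseline level, pre-policy trend,
    immediate level change and change in trend after the policy.
    """
    t = np.arange(len(y), dtype=float)
    post = (t >= policy_start).astype(float)
    X = np.column_stack([np.ones_like(t), t, post, (t - policy_start) * post])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Hypothetical series: 24 months before and 24 after a policy, with a
# pre-existing downward trend, a level drop of 2 units, a flatter
# post-policy slope, and random noise.
rng = np.random.default_rng(0)
t = np.arange(48)
true_mean = 30 - 0.1 * t - 2.0 * (t >= 24) + 0.05 * np.clip(t - 24, 0, None)
y = true_mean + rng.normal(0, 0.3, size=t.size)

b0, b1, b2, b3 = segmented_regression(y, policy_start=24)
print(f"pre-trend {b1:.3f}, level change {b2:.2f}, trend change {b3:.3f}")
```

The design's strength is that the pre-policy trend serves as an internal counterfactual. A simple post-only cross-sectional analysis has no such counterfactual to draw on.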

Research that can contribute to an overall assessment of the causal effects of policies

Having emphasized the critical role, in drawing strong causal inferences, of RCTs conducted in humans on the policy and outcomes in question, we also note that because such information is often unavailable and sometimes unattainable, it is frequently necessary to make decisions about actions without drawing firm conclusions about causation. In doing so, we must commonly integrate multiple sources of information, none of which alone is unequivocally dispositive about causation, to make informed decisions about what might reasonably be expected to work. Several sources of evidence can contribute to such decisions.

Role of model organism evidence

Model organisms are used to generate information regarding causal relationships that cannot be derived through human studies. For example, exposure to environmental obesogens, such as endocrine-disrupting chemicals, has been identified as a possible factor that increases the risk of obesity.50, 51 Such studies are vital in policy decisions, for example, whether to approve or disapprove the use of a food additive, but cannot offer unequivocal conclusions about causation in humans because effects may differ across species.52

Role of observational evidence: OATs and extended association tests

Observational evidence generally has a vital role in assessing the likely value of proposed policies. Observational studies are useful in generating hypotheses that can inform the conduct of more rigorous studies (such as randomized trials) to begin to establish causality. With regard to policies developed to address the obesity epidemic, observational studies have been used to investigate associations between the initiation of policies and relevant outcomes. That said, not all observational evidence is of equal value. Here we distinguish between two broad classes of observational evidence, which we call OATs and extended association tests (EATs).

Ordinary association tests

We define OATs to be observational studies on samples of individuals in which the sole or primary means of controlling for potential confounding factors is inclusion of measures of some potential confounding factors as covariates in statistical models (or stratifying by measures of such factors). OATs are heavily relied upon in thinking about plausible effects of policies, but have also been heavily criticized in general53, 54 and in the obesity and nutrition domains in particular55, 56, 57 for multiple reasons. We refer the reader to those references for details.
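A small simulation illustrates one well-known criticism of OATs. Adjusting for a measured covariate removes only part of the confounding when the true confounder is unmeasured or is measured only through a noisy proxy. All variables below are hypothetical. The exposure has no causal effect on the outcome, yet the unadjusted slope converges to 1/2 and the covariate-adjusted slope to 1/3 under this data-generating setup. Neither is the true value of 0.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

u = rng.normal(size=n)      # unmeasured confounder
m = u + rng.normal(size=n)  # measured covariate: a noisy proxy for u
x = u + rng.normal(size=n)  # "exposure"; has NO causal effect on y
y = u + rng.normal(size=n)  # outcome depends only on the confounder


def ols_slope_on_x(y, *covariates):
    """OLS coefficient on the first covariate, with an intercept."""
    X = np.column_stack([np.ones(len(y)), *covariates])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]


unadjusted = ols_slope_on_x(y, x)
adjusted = ols_slope_on_x(y, x, m)
print(f"true effect 0.00 | unadjusted {unadjusted:.2f} | adjusted for proxy {adjusted:.2f}")
```

A larger sample does not help. It only makes the biased estimate more precise, which is why covariate adjustment alone cannot carry strong causal claims.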

Extended association tests

Most dialog and research in obesity does not consider the evidence continuum between OATs, which do not offer strong assessments of causal effects, and RCTs, which do offer strong inferences but cannot be done in all circumstances. In contrast to this polarized view, there are techniques, which we refer to as EATs, that lie intermediate between OATs and RCTs, including but not limited to quasi-experimental studies and natural experiments. Such designs are increasingly used, especially in economics and genetics, but are rarely used in obesity research. Yet the ability to draw causal inferences in obesity research could be strengthened by increased judicious use of such approaches. In-depth understanding and appropriate use of the full continuum of these methods requires input from disciplines including statistics, economics, psychology, epidemiology, mathematics, philosophy and, in some cases, behavioral or statistical genetics. The application of these techniques, however, does not involve routine, well-known ‘cookbook’ approaches but requires an understanding of underlying principles so that the investigator can tailor approaches to specific and varying situations.

Some of the key methods in use for situations where standard RCTs may not be available include natural experiments, quasi-experiments and experiments in which true randomization is used but subjects are not randomized directly to levels of the independent variable, as described with examples in Table 1.

Table 1 Examples of EATs in obesity research

Quasi-experiments are a useful type of observational study that can be used to investigate the impact on obesity of environmental changes that the investigator did not manipulate. In the simplest case, investigators merely measure outcomes before and after the implementation of a new policy, regulation or other change. Within the context of efforts at obesity modification, quasi-experiments have been used to assess the effectiveness of new policies, for example, the inclusion of calorie information on menus, the implementation of environmental elements thought to promote physical activity (such as parks, bike lanes and walking trails), and the use of school-based obesity screening and body mass index report cards.58, 59, 60, 61

A prime example concerns menu labeling. The US Food and Drug Administration has implemented regulations requiring franchise restaurant chains with 20 or more locations to provide calorie information on their menus and menu boards. In a quasi-experiment conducted in New York City, where menu labeling had earlier been introduced under the City’s own regulation, receipts were collected from patrons of fast food restaurants before and after menu labeling was implemented. The investigators found that adding calorie information to menus did not appear to influence the food choices of parents or adolescents.58 Quasi-experiments such as this are a cost-effective way to evaluate the effects of obesity policies, and they can also provide information that might inform modifications to existing policies.
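A before–after comparison in a single site attributes every change over the period to the policy. One common way to strengthen it is a difference-in-differences contrast against a comparison site that did not adopt the policy. This corresponds to the before–after studies with concurrent controls mentioned earlier. The figures below are invented for illustration and are not data from any cited study. The function name is hypothetical.

```python
def difference_in_differences(treated_before, treated_after, control_before, control_after):
    """Classic 2x2 difference-in-differences estimate.

    Subtracting the comparison group's change removes secular trends shared
    by both groups; the estimate is unbiased only if, absent the policy,
    both groups would have followed parallel trends.
    """
    return (treated_after - treated_before) - (control_after - control_before)


# Hypothetical mean kcal purchased per transaction (not data from any study).
naive_before_after = 830 - 860  # treated city only
did = difference_in_differences(860, 830, 850, 825)
print(f"before-after change: {naive_before_after} kcal; DiD estimate: {did} kcal")
```

Here most of the apparent 30 kcal drop also occurs at the comparison site, so the estimate attributable to the policy shrinks to 5 kcal. The parallel-trends assumption cannot be tested directly, but it is usually probed with multiple pre-policy time points.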

Extended association tests seem to be less well known to many investigators in public health, medicine, psychology and related fields. We believe that many questions about behavioral, psychological and economic influences on obesity-related variables, and many applied questions about the effects of extant or proposed interventions, could be addressed more informatively and more rigorously if more investigators availed themselves of these evolving methods for causal inference, grounded in a sound understanding of fundamental principles.

Research that formally synthesizes multiple sources of information to estimate the causal effects of policies

Apart from the need to embrace and use the range of potential design strategies available, it is also essential to ‘step back’ and synthesize the multiple and varied sources of information to evaluate what they can tell us about the causal effects of policies.

Role of systematic reviews and meta-analysis

As a result of rising rates of obesity around the world, the volume of evidence from obesity research has burgeoned. However, because studies vary in quality, design, implementation and the outcomes measured, drawing conclusions across them can be challenging. Debates on obesity policies are often fueled by the contradictory findings of empirical studies, such as those regarding the influence of sugar-sweetened beverage consumption on childhood obesity.40 As such, high-quality systematic reviews and meta-analyses can be useful for evaluating the state of the evidence on a particular intervention or policy, because they use objective approaches to identifying and integrating evidence.62 That said, as Ingram Olkin once wrote, ‘Doing a meta-analysis is easy. Doing one well is hard’,63 and we have found that errors in obesity-related meta-analyses abound.64 Hence, although meta-analyses are vital, our field needs to improve their execution, and meta-analyses should be reviewed as critically as any other study.
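To make concrete what ‘integrating evidence’ involves at its simplest, the sketch below pools study estimates with a fixed-effect, inverse-variance meta-analysis and reports Cochran's Q as a basic check on heterogeneity. The choice of a fixed-effect model and all numbers are assumptions for illustration; the estimates do not come from any study cited here, and a real synthesis would also need to weigh heterogeneity, risk of bias and study quality before choosing a model.

```python
import math


def fixed_effect_pool(estimates, std_errors):
    """Fixed-effect inverse-variance meta-analysis.

    Returns the pooled estimate, its standard error, a 95% confidence
    interval and Cochran's Q (a basic heterogeneity statistic).
    """
    if not estimates or len(estimates) != len(std_errors):
        raise ValueError("need one standard error per estimate")
    weights = [1.0 / se ** 2 for se in std_errors]  # w_i = 1 / SE_i^2
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    ci = (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)
    # Cochran's Q: weighted squared deviations of each study from the pooled value
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    return pooled, pooled_se, ci, q


# Hypothetical effects (e.g., change in BMI units) and standard errors from
# three made-up studies; these are NOT results from the literature.
estimates = [-0.20, 0.05, -0.10]
std_errors = [0.10, 0.20, 0.05]

pooled, se, (lo, hi), q = fixed_effect_pool(estimates, std_errors)
print(f"pooled = {pooled:.3f}, SE = {se:.3f}, 95% CI = ({lo:.3f}, {hi:.3f}), Q = {q:.2f}")
```

Note how the most precise study (smallest standard error) dominates the pooled value; this weighting is exactly why errors in extracting standard errors can distort a meta-analysis.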

Role of modeling

One drawback of RCTs, noted above, is that they often are not large enough to capture the entire spectrum of effects (both desired and undesired) that a policy may have.65 Mathematical and computational models of health policies are tools that can be used to predict the outcomes of an obesity policy and to identify implementation barriers before the policy is adopted.66 Moreover, modeling obesity policy enables policymakers to estimate the costs of implementing policies and to determine the resources required to implement a given policy.67, 68, 69 For example, a dynamic weight-loss model was used to estimate the effects of a tax on sugar-sweetened beverages on the prevalence of obesity in New York City.70 The model projected decreases in obesity prevalence over a 10-year period and estimated the magnitude of those reductions, allowing readers to better judge the potential public health impact of such a policy.70

Models are also valuable for monitoring the effects of policies over time. Evidence has shown that the effects of health policies can increase or diminish with the passage of time.69 Therefore, new data on a policy's effects should be generated continually and used to update estimates of those effects, so that policymakers can revise or even discontinue the policy if it proves ineffective.65

Despite the benefits of using models in the development and refinement of health policies, some challenges and limitations must be recognized. For example, health policy modelers are not often integrated into the policymaking process. As a result, models tend to be treated as ‘one-offs’ rather than as tools to be used throughout the lifecycle of a policy to ensure that it retains its value. Perhaps most importantly, models offer projections of effects, not demonstrations of effects. Such projections can depend heavily on the model's input parameters (that is, its assumptions), and some published modeling studies71 rest so heavily on assumptions about the efficacy of the policies considered that the exercise can be seen as an instance of petitio principii (begging the question).72
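The dependence of projections on assumed efficacy can be illustrated with a deliberately crude toy model. It is hypothetical and far simpler than the dynamic model cited above: it assumes the policy lowers obesity prevalence by a fixed relative fraction each year, supplied as the input `assumed_annual_effect`. Because that fraction is an input rather than an estimate, the projected benefit follows directly from it; setting it to zero projects no benefit at all.

```python
def project_prevalence(baseline, years, assumed_annual_effect):
    """Toy projection: prevalence shrinks by a fixed relative fraction each year.

    `assumed_annual_effect` is an ASSUMPTION about policy efficacy supplied by
    the modeler; the model does not estimate it from data.
    """
    if not 0.0 <= baseline <= 1.0:
        raise ValueError("baseline prevalence must be a proportion")
    prevalence = baseline
    for _ in range(years):
        prevalence *= 1.0 - assumed_annual_effect
    return prevalence


baseline = 0.30  # hypothetical baseline prevalence, not a real figure
for effect in (0.0, 0.005, 0.01, 0.02):
    final = project_prevalence(baseline, years=10, assumed_annual_effect=effect)
    print(f"assumed effect {effect:.1%}/yr -> projected prevalence after 10 years: {final:.3f}")
```

Every projected "benefit" in the output is simply the assumed effect compounded over ten years; the model adds arithmetic, not evidence.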

Standards for evidence and related factors influencing policy decisions

According to Donaldson et al.,73 most obesity-prevention bills enacted between 2010 and 2013 were based on initiating strategies (for example, ‘initiated farmer’s markets, increased access to walking trails, local menu labeling’) that had little to no evidence of benefit. But is this wrong? A vital consideration, often not made explicit a priori, concerns the standards for evidence that will be used both to generate a policy decision and to evaluate its effect once implemented. In general, the standards of evidence for a scientific conclusion are thought to be far more rigorous than those for a policy decision, because they are based on long-established methods that are considered to be objective, repeatable and relatively immune to the biases of the individuals conducting the study. In contrast, the evidence (if any) needed to reach a policy decision (which is distinct from reaching a scientific conclusion) depends on many factors and is not constant across circumstances. Opinions also vary. For example, the Society for Prevention Research states, ‘To be ready for broad dissemination, a program must not only be of proven effectiveness, but it must also meet other criteria…’ (emphasis added).74 This stands in marked contrast to the statement of District Judge Richard J Holwell quoted above that ‘even if there are no data demonstrating conclusively that Regulation 81.50 will be effective, conclusive proof is not required…’; in the context of those legal proceedings, it was his interpretation of the law that determined the evidentiary standard. Other contexts have yet other standards, so no universal rule about how much evidence is or is not needed for policymaking can be given. This is at odds with occasional statements by academics who assert, without any formal basis of authority, that a particular amount of evidence is or is not needed to enact a policy.

The four quotations listed (see Box 1) are from discussions and presentations involving policies aimed at curbing the public's sugar intake. They reflect the differing standards of evidence held by researchers. The first two75, 76 set the rigor of evidence aside and instead treat developing policy as the priority, apparently on the basis of a decision already made from some combination of suggestive evidence and intuition. In contrast, the third and fourth statements progress from needing ‘a strong sense that it will be effective’77 to confidently requiring ‘strong evidence’ before any public policy decision.78 Thus, disagreement on the amount and rigor of evidence needed to enact a policy exists even among researchers discussing a single target (sugar) of public policy. These quotations illustrate the subjectivity of the standards of evidence for decision making.

In summation

In closing, our field will benefit from a greater emphasis on probative research. Probative research would meaningfully advance our ability to state that a given treatment or prevention strategy does or does not have a particular effect.79 This is in contrast to studies that merely continue to draw attention to the plausibility of some treatment having some effect but do not increase our knowledge that such an effect actually exists.79 Finally, the quest for rigorous evidence and scrupulous truthfulness in reporting is fully compatible with the quest for beneficence and the passionate pursuit of action for the betterment of others. Recognizing these compatibilities (see Box 2) may pave the way for public health dialog in obesity that is both more honest and more collegial.