1 Introduction to Health Program Evaluation

Clinicians and administrators use program evaluation studies to assess program performance, to identify strategies for improving program outcomes and efficiency, and to understand the program’s fiscal impacts. These analyses are an essential component of efforts to strengthen patient health and eliminate inefficiencies currently embedded in the US healthcare system. Evaluation study results equip clinicians and program administrators to quantify impacts of their programs on patient health and to document evidence of these impacts. This chapter will discuss questions that must be addressed by teams designing these studies.

Program evaluation is an umbrella term that describes several types of analyses of program operations, including estimation of program impacts on patient health, health-related behaviors, healthcare utilization, and (in some cases) the cost of providing care. Studies that estimate program impacts on patient health, health behaviors, and/or healthcare utilization are known as comparative effectiveness research (CER) studies. [Footnote 1] Studies that estimate both health impacts and financial impacts are known as cost-effectiveness analysis (CEA) studies. Healthcare providers and payers utilize these studies to fine-tune program design; to support decisions to continue, expand, or discontinue programs; to inform provider contracting decisions; and to inform payer coverage and formulary decisions (Centers for Disease Control and Prevention [CDC], 2012).

Comparative effectiveness research (CER) studies address questions such as the following:

  • Does the treatment generate physical health and/or mental health benefits for individuals who elect to participate?

  • Does the treatment attract participation by the people who are most likely to benefit from the treatment?

  • Can the treatment design be modified to increase participation and/or increase the health benefits enjoyed by participants?

Cost-effectiveness analysis (CEA) utilizes the same statistical methods to estimate both health impacts and financial impacts of operating specific treatments. [Footnote 2] These studies report a ratio that measures the financial cost of achieving a specific health benefit (CDC, 2012, 2017).
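The ratio described above can be illustrated with a minimal sketch. It assumes the commonly used incremental form, in which the extra cost of a treatment relative to an alternative is divided by the extra health benefit; the function name and all dollar and benefit figures are hypothetical and are not drawn from the cited sources.

```python
def cost_effectiveness_ratio(cost_new, cost_old, benefit_new, benefit_old):
    """Incremental cost per additional unit of health benefit (assumed form)."""
    extra_benefit = benefit_new - benefit_old
    if extra_benefit == 0:
        raise ValueError("No difference in health benefit; the ratio is undefined.")
    return (cost_new - cost_old) / extra_benefit


# Hypothetical figures: total annual cost of each strategy, and the number of
# participants who achieve the target health benefit under each strategy.
ratio = cost_effectiveness_ratio(
    cost_new=120_000, cost_old=80_000, benefit_new=30, benefit_old=20
)
print(f"Cost per additional participant achieving the benefit: ${ratio:,.0f}")
```

A ratio of this kind is only interpretable relative to the comparison strategy chosen, so study teams would need to agree on that comparator before reporting it.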

Cost-effectiveness analysis (CEA) studies are utilized to support decisions to allocate resources across existing treatments, and they support decisions regarding health insurance coverage of new drugs and treatments. As provider and payer organizations make these decisions, they consider questions such as the following:

  • How does the cost of achieving a specific health benefit via a particular treatment compare with the cost of utilizing alternate strategies to achieve the same benefit?

  • How do the benefits generated by a particular treatment compare to the financial impacts of providing that treatment?

  • Does the ratio of benefits to costs vary across subgroups of treatment participants (CDC, 2012)?

Answers to these questions may support payer decisions regarding coverage of new treatments. They also support provider organization decisions to allocate resources to specific treatments and assessments of an array of management issues such as (i) risks and benefits of participating in shared savings contracts and (ii) likely impacts of treatments on publicly posted quality measures and the probability of earning quality-based incentive payments. [Footnote 3]

High-quality CER and CEA studies estimate causal impacts of treatments on patient health and on costs and revenues. Therefore, both types of studies require careful design to ensure that the study results will reflect causal relationships. Results that simply report correlations between treatments and outcomes do not necessarily inform decision-makers about treatment impacts, because the correlations could, instead, reflect parallel trends in the treatment and outcome variables or the impact of a lurking variable not included in the analysis. The well-known economist Franklin Fisher illustrated this point with the following anecdote:

There was once a cholera epidemic in Russia. The government, in an effort to stem the disease, sent doctors to the worst-affected areas. The peasants of [one] province … discussed the situation and observed a very high correlation between the number of doctors in a given area and the incidence of cholera in that area (i.e. more doctors were observed in cholera areas than elsewhere). Relying on this hard fact, they rose and murdered their doctors. (Fisher, 1966)

Incentives to conduct CER and CEA studies stem from pressures to increase healthcare system efficiency. This focus on efficiency is rooted in the US Triple Aim health policy. The Triple Aim focuses on boosting healthcare quality and increasing access to healthcare while reducing the growth rate of healthcare expenditures. To achieve simultaneous progress on all three goals, healthcare payers and providers must develop and implement efficient and effective strategies for generating health benefits for patients (Tanenbaum, 2017). CER and CEA studies will be used to help clinicians assess the value of the care they deliver, identify opportunities for improvement, and demonstrate that value to payers (Beaudin-Seller et al., 2021; Figueroa et al., 2019). These studies will equip clinicians to address two questions: How do you know whether your efforts to strengthen patient health are successful and efficient? How will you convince others?

In this environment, clinicians face growing incentives to conduct program evaluation studies. Payers are utilizing two payment system strategies to incentivize providers to identify and implement efficient strategies for delivering care. Managed care organizations receive monthly capitated payments to provide specified care for all individuals enrolled in the managed care plan. These organizations have strong incentives to produce care efficiently while ensuring that the care meets quality standards. In addition, payers utilizing fee-for-service payments are increasingly including value-based payment incentives in the provider payment. For example, physician reimbursement for treating patients covered by Medicare may be subject to value-based rewards and penalties specified in the 2015 Medicare Access and CHIP Reauthorization Act (MACRA). [Footnote 4] To the extent that integrated care strengthens performance on quality metrics while reducing overall healthcare expenditures, this delivery strategy can help physician practices maximize reimbursement for treating Medicare patients.

Some clinicians and entities contract with managed care companies and healthcare providers to offer cost-effective programs, such as integrated care, under shared savings contracts. Under these contracts, the program vendor’s revenue hinges on generation of reductions in the managed care company’s healthcare expenditures. These “savings” are “shared” between the program vendor and the managed care company (Slater & Culkin, n.d.). CEA studies provide the information needed to assess whether it is wise to sign such a contract and, after a contract is signed, to determine year-end payments.

In addition to these monetary incentives to increase healthcare efficiency, healthcare providers increasingly face reputation incentives. Provider-level quality performance and payment data are now available to patients and payers. The CMS website Care Compare provides star ratings summarizing performance on quality of care and patient satisfaction for hospitals (including VA hospitals) and clinicians (including psychologists). In addition, states aiming to increase healthcare transparency are creating all-payer claims databases (APCDs) that store provider-level information that can support analyses to compare practice patterns across provider organizations (All-Payer Claims Database Council, 2021). Provider groups working in states that maintain APCDs can use these data to compare the quality and efficiency of their own practice patterns with the practice patterns of their peers.

Some clinicians deploying evidence-based programs may question the value of conducting CER and CEA studies to measure the impacts of these programs in their clinics or practices. Despite the availability of published estimates [Footnote 5] of the impacts of programs implemented in other locations, clinicians and practice administrators conduct site-specific CER and CEA studies because program impacts can differ across sites for several reasons. Site-specific variations in program impacts stem from differences in participant characteristics, differences in the program environment (payment and utilization incentives, insurance coverage, availability of complementary services such as transportation or day care), and subtle site-specific differences in program implementation details. Characteristics of eligible individuals may differ across sites if, for example, one site primarily treats patients covered by Medicaid while another site primarily treats individuals covered by private sector employer-sponsored insurance. Characteristics of the subset of patients who elect to participate in the program may differ across sites if, for example, one site successfully induces individuals with high BMI to participate in a wellness program, while most participants in the wellness program offered at another site are individuals who were already interested in exercise and healthy diets before the program was implemented (US Government Accountability Office, 2012).

Program evaluation study teams include individuals with diverse types of expertise. Clinician input is essential in the study design phase and in the assessment of the implications of the study results. The study team will also include individuals with detailed knowledge of data available in electronic medical records systems, along with individuals with knowledge of the program’s implementation and management processes, and detailed understanding of substitutes and complements for the services offered by the program. The team will also include individuals with expertise in multivariate statistical techniques designed to distinguish between results that reflect correlations among variables and results that reflect causal relationships among variables. These individuals may be econometricians or biostatisticians. If they are not available within the healthcare provider or payer organization, clinicians and healthcare administrators can partner with university researchers who can provide the necessary econometric skills and treatment evaluation experience. In addition to providing essential skills, these researchers offer a neutral perspective, because they are not associated with treatment implementation and operation. Professors with this expertise may work in several types of university departments, including economics, statistics, and public health.

For CEA studies, the team will also include individuals with detailed knowledge of the payer and/or provider organization’s fixed and variable costs and provider organization reimbursement rates and incentive structures. In addition, input may be needed from individuals with detailed understanding of the organization’s budgeting processes. If the program is expected to generate costs or savings over a multiyear period, the study team will also include an individual with experience addressing issues posed by the time value of money.
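The time value of money mentioned above is usually handled by discounting future costs and savings to present-day dollars. The following is a minimal sketch of that calculation; the discount rate, the cash flows, and the function name are hypothetical choices made for illustration, not values recommended by the sources cited in this chapter.

```python
def present_value(cash_flows, annual_rate):
    """Discount year-end cash flows (year 1 first) to today's dollars."""
    return sum(
        amount / (1 + annual_rate) ** year
        for year, amount in enumerate(cash_flows, start=1)
    )


# Hypothetical program: net costs in the first two years, net savings afterward.
net_cash_flows = [-50_000, -10_000, 20_000, 30_000, 30_000]
discount_rate = 0.03  # assumed annual rate, for illustration only

undiscounted_total = sum(net_cash_flows)
discounted_total = present_value(net_cash_flows, discount_rate)
print(f"Undiscounted net total: ${undiscounted_total:,.0f}")
print(f"Present value at {discount_rate:.0%}: ${discounted_total:,.0f}")
```

Because the costs arrive early and the savings arrive late, the discounted total is smaller than the simple sum, which is why a multiyear program can look less favorable once discounting is applied.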

Clinicians may be the team leaders for some CER and CEA studies. In other situations, clinicians may prefer to delegate this role to neutral entities or individuals who do not have vested interests in demonstrating the success of a specific treatment. Whether clinicians are team leaders or team members, they play significant roles in defining the study goals, specifying useful measures of treatment performance, ensuring that the individuals designing the analytical methods have clear and accurate understandings of treatment processes, and clarifying implications of the study results.

This chapter is not a manual specifying technical methods to complete a study. Instead, this chapter aims to equip clinical professionals and treatment managers to engage in meaningful and productive collaboration with econometricians and/or statisticians. Section 2 focuses on issues addressed by teams designing CER or CEA studies. Section 3 discusses additional issues considered by teams planning CEA studies. Section 4 addresses controversies about appropriate uses of CEA, to help study teams anticipate issues that could potentially arise when they report study results. Section 5 concludes the chapter. To streamline the discussion in this chapter, we will use the terms “program” and “treatment” interchangeably, to refer to the broad array of activities, initiatives, and programs that aim to strengthen patient or client health.

2 Tasks Required to Conduct a CER or CEA Study

While CER and CEA studies address several types of specific questions, the planning processes for all of these studies share a common structure. For both types of studies, the study designers must specify the key questions to be addressed in the study, metrics that will be used to evaluate the impacts of the treatment on participant health, the study sample, and the experimental design and statistical methods that will be used to generate reliable estimates of the treatment impacts (Adams & Neville, 2020). In addition, the study designers will assess the types of reports that will be used to convey the results to the study stakeholders. Clinicians play important roles as subject matter experts at each stage of this process.

2.1 Task # 1: Specify the Study Questions and the Date for Reporting Results to the Study Stakeholders

Specifying the questions to be addressed in the study is an essential first step that shapes subsequent decisions about the study’s key evaluation metrics and the analytical strategy. In this subsection, we discuss three key questions that must be addressed to complete this task.

Who Are the Study Stakeholders? What Questions Are Important to These Entities?

Completing a CER or CEA study requires staff time and financial resources. Careful study design is needed to ensure that the results of this effort will be useful to decision-makers. The first step is identification of the study stakeholders and specification of the questions that are salient for these individuals and entities. The CDC Framework for Program Evaluation (2017) highlights the importance of careful consideration of the questions that will be addressed by the study:

…good evaluation does not merely gather accurate evidence and draw valid conclusions, but produces results that are used to make a difference. To maximize the chances evaluation results will be used, you need to [focus the evaluation] on questions that are most salient, relevant, and important. (CDC, 2017)

The task of specifying the study goal includes specification of relevant types of health impacts, subgroups for which impacts should be estimated, additional questions identified by the study stakeholders, and the duration of the study.

What Types of Health Impacts Are Important to the Study Stakeholders?

Specification of relevant types of health impacts requires a careful trade-off. The definition must be broad enough to capture key treatment benefits, and it must be focused enough to avoid introducing statistical noise by including health events that may be also impacted by myriad other factors.

For example, an employee wellness program might include gym memberships and ergonomic consultations as strategies to reduce surgeries for musculoskeletal injuries. To estimate whether these strategies significantly reduce expenditures for musculoskeletal surgeries, it would be useful to exclude surgeries for patients with diagnoses indicating trauma, patients with recent surgeries, and patients referred to fitness facilities to complete rehabilitation programs. If data on such diagnoses are not available, analysts could alternately exclude patients with unusually long hospital stays. Similarly, for a wellness program focused on encouraging employees to comply with cancer screening recommendations, the relevant outcome metric might be expenditures for cancer treatments. In this case, analysts might control for prior employee tobacco use among program participants and nonparticipants.

What Other Issues Are Important to the Study Stakeholders?

If the key study stakeholder is the program manager, that individual might be interested in both documenting program impacts and identifying opportunities for program improvement. This individual might be interested in program impacts on outcome variables, and she may also be interested in impacts on intermediate process measures. For a tobacco cessation program, for example, the manager might be interested in the program’s impacts on the outcome of abstaining from tobacco use for 1 year. She might also be interested in intermediate process measures such as the proportion of eligible individuals who begin participating in the program, the proportion of initial participants who complete the program, and the proportion who relapsed by year-end after abstaining for 6 months.

Should Impacts Be Estimated for Subgroups of Program Participants?

In addition to estimates of program impacts on the full set of participants, stakeholders may request additional results such as estimates of program impacts by participant income category. Understanding the distributions of program participation and health benefits across subgroups of patients or clients may be important to support efforts to address equity issues or efforts to broaden the treatment’s reach. Identifying subgroups for which impacts should be estimated may require consideration of shifting societal priorities and emphases and the degree to which these societal concerns are relevant to the current study. Subgroups may be delineated by a range of factors such as:

  • Geographic proximity between the treatment’s potential participants and treatment sites

  • Socioeconomic characteristics of potential treatment participants such as education

  • Demographic characteristics of potential treatment participants such as race and ethnicity

  • Health risk levels of potential treatment participants defined by factors such as smoking status or the presence of specific diagnoses

  • Proximity to areas affected by environmental risks such as wildfire smoke, mosquitoes carrying specific viruses, or exposures to pollution or toxins

Analyses of participation decisions by demographic or socioeconomic subgroups may help program managers adjust program design and communication strategies to optimize participation patterns. For example, evaluation of an employee wellness program indicated that employees with relatively high wages were more likely to participate in treatment events than employees with lower wages. The program vendor responded by indicating that this result highlighted an opportunity to improve the treatment design. Because employees with relatively high wages were disproportionately employed in office locations, while lower wage workers were more likely to work in non-office locations, the study result highlighted the importance of careful consideration of the locations and scheduling of program events (Mukhopadhyay & Wendel, 2013).

When Will the Study Team Report to the Study Stakeholders?

From a program management perspective, it is reasonable for the treatment funder to be interested in the treatment’s first-year results, to assess whether the treatment should be continued and/or modified. [Footnote 6] However, specifying the report due date constrains the duration of the time available for program operation and data collection. This is an important issue for two reasons. First, if data for an ongoing treatment are to be gathered prospectively, the study duration will determine the potential number of treatment participants to be included in the study dataset. Second, the study duration will also determine the time available for observing health impacts.

The definition of a reasonable study duration depends on the types of impacts targeted by the treatment. For example, funders of a program that helps pregnant women quit smoking may be interested in first-year results, to make decisions about ongoing funding. This time frame is long enough to assess whether the program successfully helped women quit smoking while pregnant, but it would not be long enough to assess whether the program successfully helped these women maintain abstinence for a year following completion of the pregnancy. Employee wellness programs that aim to reduce employee healthcare expenditures by increasing compliance with cancer screening recommendations face a similar challenge. This strategy requires an initial investment to fund the increased cancer screenings. Thus the immediate effect of this strategy will be increased healthcare expenditures for the screenings, expenditures to determine whether patients with positive screening results actually have cancer, and expenditures to treat newly identified early cancers. However, the savings generated by treating these cancers “early” will not be realized for 6 or 7 years (Pyenson & Zenner, 2005). [Footnote 7]

2.2 Task # 2: Specify the Study Sample

Study designers must specify whether the study will focus on analyzing the impact of the treatment on individuals who actually participated and engaged in the treatment or analyzing the impact of the treatment on all individuals who were eligible to participate in the treatment. Studies that report the impact of a treatment on individuals who participated in the treatment are said to estimate the impacts of the “treatment on the treated individuals” (TT). Studies that focus on the average impact of a treatment on the full set of individuals who were eligible to participate in the treatment are said to report the impact of “the intent to treat” (ITT). Both types of estimates are useful; however, they address different questions. TT estimates focus on the ability of the treatment to generate benefits for participants. ITT estimates of the treatment’s impact capture the combined result of the program’s recruitment efforts and the program’s impact on participants. This type of result is relevant when the program funder pays a monthly fee for each eligible individual. [Footnote 8] In this situation, the ITT permits comparison of the per capita expenditure and the per capita program impact.
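The relationship between the two estimates can be sketched with simple arithmetic. The example below assumes, as a simplification, that the treatment has no effect on eligible individuals who do not participate, so that the per capita (ITT) impact equals the per-participant (TT) impact multiplied by the participation rate. All figures and names are hypothetical.

```python
def itt_per_eligible(tt_per_participant, participation_rate):
    """Average impact per eligible individual.

    Assumes nonparticipants are unaffected by the treatment.
    """
    if not 0.0 <= participation_rate <= 1.0:
        raise ValueError("participation_rate must be between 0 and 1")
    return tt_per_participant * participation_rate


# Hypothetical inputs, for illustration only.
tt_savings = 600.0         # annual savings per participant (TT estimate)
participation_rate = 0.25  # share of eligible individuals who participate
monthly_fee = 10.0         # fee paid per eligible individual per month

itt_savings = itt_per_eligible(tt_savings, participation_rate)
annual_fee = 12 * monthly_fee
print(f"ITT savings per eligible individual: ${itt_savings:.2f}")
print(f"Annual fee per eligible individual:  ${annual_fee:.2f}")
print(f"Net per capita impact:               ${itt_savings - annual_fee:.2f}")
```

Under these assumed figures, a treatment with a sizable per-participant impact yields only a modest per capita gain, which illustrates why recruitment matters when the funder pays for every eligible individual.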

For studies that report both TT and ITT impacts, the difference between TT and ITT estimates of treatment impacts yields insights about individual participation decisions. These decisions are known as “selection” decisions, and analysis of these decisions yields information about the program’s ability to “target” individuals with high potential to benefit from the program. Two types of selection are possible:

Selection Type 1

People with low potential to benefit from the treatment might be more likely to participate than people who could potentially gain larger benefits by participating in the treatment. For example, “health-conscious” women may be more likely to be nonsmokers and are also more likely to initiate prenatal care during the first trimester than other women. If so, women who do not smoke would be more likely to obtain recommended prenatal care than women who are smokers. To the extent that assistance with smoking cessation is an important component of prenatal care, this would imply that women who are most likely to benefit from prenatal care (i.e., smokers) are also less likely to initiate prompt prenatal care. In this situation, the TT estimate of the impact of prenatal care would underestimate the impact that would occur if a random sample of eligible individuals obtained prenatal care. Individual selection decisions make it difficult for the treatment managers to achieve the goal of maximizing treatment benefits achieved with the resources allocated to the treatment. An increase in the proportion of eligible individuals who participate in the program would lead to an increase in the average estimated impact per participant.

Selection Type 2

Alternately, women who smoke might proactively seek assistance with quitting during pregnancy. In this case, smokers would be more likely to obtain prenatal care than nonsmokers. The TT estimates of treatment impacts would overestimate the potential impact of expanded participation in a prenatal care program. In this situation individual selection decisions are congruent with treatment managers’ efforts to maximize the benefits achieved with the resources allocated to the treatment. An increase in the proportion of eligible individuals who participate in the program would lead to a decrease in the average estimated impact per participant.

Consider an employee wellness program, for example. Selection impacts are likely to be important because per-person healthcare expenditures vary dramatically across individuals. Nationwide, the most expensive 5% of people spend 50% of the nation’s healthcare dollars, while the least costly half of the population spends less than 3% of the nation’s healthcare dollars [Footnote 9] (Mitchell, 2019). Suppose a large employee group has a similar distribution of healthcare expenditures across the insured individuals. In this situation, an employee wellness treatment can only generate net savings if it successfully engages individuals who incur high healthcare expenditures along with individuals who are likely to enter that group in the future. In contrast, an employee wellness treatment that attracts participation by health-conscious individuals (who incur minimal healthcare expenditures) is not likely to generate net savings.

Selection patterns have implications for both study design and treatment design (Lewis et al., 2014). From a study design perspective, it is useful to understand the pattern of selection decisions. From a treatment design perspective, it is important to induce high levels of participation among individuals who are likely to benefit from the treatment. These efforts are known as “targeting” the program (US Government Accountability Office, 2012).

2.3 Task # 3: Specify Data Sources and Variables

CER and CEA studies employ four types of variables: measures of program outcomes, measures of the degree and duration of program participation, variables that measure individual characteristics that may shape participation decisions, and variables that control for factors that affect program impacts such as individual health status prior to program participation. Analysts may also utilize information about confounding variables, such as major disruptions that could affect program delivery (such as government-imposed COVID lockdown restrictions, or evacuations triggered by major environmental disruptions including hurricanes or wildfires). As study designers consider the variables that will be utilized in the study, they typically confront two types of data questions: Will the study measure treatment impacts by focusing on outcome measures or process measures? Will the study utilize any proxy measures?

Will the Study Rely on Outcome Measures or Process Measures of Participant Health?

Some treatment evaluation studies focus on outcome measures, such as reduced numbers of cardiac events, while other studies utilize process measures, such as participation in screenings for high blood pressure or attendance at weight management sessions. The distinction between process and outcome measures is generally clear; however, the classification of a metric as either “process” or “outcome” can be affected by the study goal. If the treatment goal is BMI reduction, then this variable would be an outcome measure. However, if the treatment focused on BMI reduction as a strategy to achieve the larger goal of reducing expenditures for cardiac events, then expenditures for cardiac events would be an outcome measure, and the BMI reduction variable would be an intermediate process variable.

Some argue that studies should focus on outcome measures, because they provide more useful and relevant information than process measures (Gross, 2012). However, some studies cannot focus on outcome measures due to time frame constraints or sample size constraints.

Time Frame Constraints

Consider a study designed to estimate the financial impact of an employee wellness program. To inform the employer’s decision to either continue or terminate the program, it was necessary to complete the study using 3 years of data. However, an actuarial study completed by Milliman concluded that cancer prevention treatments do not begin to reduce healthcare expenditures for 6 or 7 years after the screenings are performed (Pyenson & Zenner, 2005). Thus, outcome variables could not be used to inform the employer’s contracting decision. However, process variables could provide preliminary information about program participation. This information could include analyses of changes in proportions of employees obtaining screenings (stratified by employee characteristics) or employee success in achieving BMI reductions or smoking cessation.

Sample Size Constraints

Process variables may also offer an advantage over outcome variables for evaluating programs that aim to prevent high-cost outcomes that occur in only small proportions of individuals. Consider, for example, the sample size needed to be able to conclude that a 33% reduction in the incidence of an adverse outcome represents a statistically significant change. If 30% of patients initially experience the adverse outcome, the initial probability of the adverse outcome is P = 0.30. Suppose that only 20% of patients experienced the adverse outcome following implementation of an improvement program. This represents a 33% reduction in the incidence of the adverse outcome. If the number of patients studied was at least 143, the reduction would be statistically significant (at the 5% level, with t >= 1.96).

In contrast, suppose that only 3% of patients experienced the adverse outcome in the initial situation and only 2% experienced this outcome following implementation of the improvement treatment. This represents a 33% reduction in the incidence of the adverse outcome, but this change would not be statistically significant if the sample size were 143. In this case, the sample size needed to conclude that the change was statistically significant is 1875 (see Table 6.1).

Table 6.1 Impact of the initial incidence of an adverse outcome on the sample size needed to conclude that a reduction is significant
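The sample size comparison discussed above can be approximated with a short calculation. This sketch assumes a two-sample z-test for a difference in proportions with unpooled variances and equal numbers of patients observed before and after the program; that is our assumption about a reasonable method, not a statement of how Table 6.1 was produced. Under this assumption the 30% to 20% case requires 143 patients per group, matching the text, while the 3% to 2% case requires roughly 1,871, close to the 1875 reported above; the small gap may reflect different rounding or a slightly different test.

```python
import math


def min_sample_size(p_before, p_after, z_crit=1.96):
    """Smallest n per group at which the difference in proportions reaches z_crit.

    Assumes a two-sample z-test with unpooled variances and equal group sizes.
    """
    variance_sum = p_before * (1 - p_before) + p_after * (1 - p_after)
    difference = p_before - p_after
    return math.ceil(z_crit ** 2 * variance_sum / difference ** 2)


# Both scenarios represent a one-third reduction in incidence.
for p_before, p_after in [(0.30, 0.20), (0.03, 0.02)]:
    n = min_sample_size(p_before, p_after)
    print(f"{p_before:.0%} -> {p_after:.0%}: at least {n} patients per group")
```

The same proportional reduction requires a sample more than ten times larger when the adverse outcome is rare, which is the practical constraint that pushes some studies toward process measures.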

For studies designed to prevent low-probability events, the key outcome is the incidence of the low-probability event. However, study designers may not be able to construct samples that are large enough to draw meaningful conclusions about this type of outcome measure. In this situation, it may be necessary to focus on process measures. However, process measures only provide insight about the treatment’s impact on outcomes when estimates of the treatment’s impact on a process measure are combined with published estimates of impacts of the process measure on outcomes achieved in other settings. For example, evaluation of a treatment designed to increase compliance with mammogram screening recommendations might provide evidence that the treatment increased the number of 50–59-year-old women with repeated screenings by 10,000. Study results cited by the US Preventive Services Task Force suggest that recommended screening for 10,000 women in this age category would reduce the number of breast cancer deaths by eight statistical deaths (Nelson et al., 2016). Hence, analysts would conclude that the treatment would likely avert approximately eight breast cancer deaths.

This type of indirect estimate must be interpreted cautiously, however, because this estimation strategy does not account for possible differences in the risk factors associated with women who were screened in the program under evaluation and those who were screened in the studies that generated the published evidence. If the risk factors associated with the two sets of women differ, the number of deaths averted by the treatment could be significantly higher or significantly lower than 8.
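The indirect estimate and its sensitivity to the published rate can be sketched as follows. The rate of eight deaths averted per 10,000 women screened comes from the discussion above; the alternative rates of 4 and 12 are hypothetical values chosen only to show how the estimate would move if the screened population differed from the population in the published studies.

```python
def deaths_averted(additional_women_screened, averted_per_10000):
    """Scale a published per-10,000 rate to the program's added screenings."""
    return additional_women_screened * averted_per_10000 / 10_000


additional_women_screened = 10_000  # from the example in the text

# 8 is the published rate cited above; 4 and 12 are hypothetical alternatives.
for rate in (4, 8, 12):
    estimate = deaths_averted(additional_women_screened, rate)
    print(f"Rate of {rate} per 10,000 -> about {estimate:.0f} deaths averted")
```

Reporting a range of this kind alongside the point estimate is one way to signal that the result depends on evidence borrowed from other settings.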

Will the Set of Independent Control Variables Include any Proxy Measures?

In addition to variables measuring treatment participation and outcomes, the study dataset will include variables measuring key factors that could potentially mediate the relationship between treatment participation and outcomes. These variables are denoted as independent variables or control variables. They may identify subgroups for which the study sponsor requests separate estimation of treatment impacts. In addition, independent variables may capture individual characteristics that are hypothesized to affect either individual participation decisions or the probability that treatment participation will generate a significant impact on the individual.

However, direct measures of some important mediating variables are not always available. In these situations, it may be possible to construct proxy variables to capture key issues. For example, an evaluation of a diabetes management treatment utilized a proxy variable to measure the complexity of each participant’s diabetes management task. The annual cost of each individual’s diabetes-related prescriptions provided the proxy measure for the complexity variable (Wendel & Dumitras, 2005).

Detailed discussions between clinical personnel and the analysts are needed to define useful proxy measures. In addition, the analysts may estimate treatment impacts with and without inclusion of the proxy measure to assess the sensitivity of the study results to inclusion of the proxy measures in the analysis.

In other situations, it might be useful to include an independent variable to control for patient income because income might be a measure of challenges faced by the patient attempting to implement medical recommendations. Income is not generally included in medical record databases. In this situation, insurance type may be used as a proxy measure of income: individuals with low income may be eligible for Medicaid, while individuals covered by employer-sponsored insurance are likely to have higher income. Similarly, some analysts use coverage by Medicare as a proxy measure for age 65 or older. These proxy measures do not represent the key variables with complete accuracy: some low-income individuals are covered by employer-sponsored insurance, and some individuals younger than age 65 are covered by Medicare. However, these proxy measures are used as independent control variables because they provide reasonable representations of the indicated income and age categories.

2.4 Task # 4: Specify the Analytical Strategy

Designing a CER or CEA evaluation study to estimate a causal impact of the treatment on outcome variables requires either randomized experimental design or a multivariate econometric estimation strategy capable of yielding estimates that are likely to reflect causal relationships. Multivariate estimation techniques are used because they allow the analyst to control for an array of individual-level characteristics that potentially affect treatment participation and/or health outcomes.

Implementing a randomized experimental design for a prevention program or a clinical treatment that will be offered to a population of eligible individuals can be accomplished in several ways.

  • The treatment may be implemented sequentially at different locations. Locations that have not yet implemented the treatment can serve as the control group if the eligible populations have similar characteristics at each location, the treatment implementation details are uniform at all locations, and the sequence of locations is randomly selected.

  • If the treatment can initially accommodate only one-half of eligible individuals, every other caller could be enrolled in the treatment, while the remaining callers constitute the control group until treatment capacity can be expanded.

  • If the study is conducted to test alternate strategies for implementing a treatment, eligible individuals may be randomly assigned to groups that will participate in alternate versions of the treatment.
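A minimal sketch of random assignment, assuming a list of eligible individual identifiers is available before enrollment begins. The function name and the use of a fixed seed (so that the assignment can be audited and reproduced) are illustrative choices, not part of any procedure described in the chapter.

```python
import random

def randomize(eligible_ids, seed=None):
    """Randomly split eligible individuals into treatment and control groups."""
    rng = random.Random(seed)      # a seeded generator makes the split reproducible
    shuffled = list(eligible_ids)  # copy so the caller's list is left unchanged
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}

groups = randomize(range(1, 101), seed=2024)
print(len(groups["treatment"]), len(groups["control"]))
```

The same function could be applied to a list of locations to draw a random implementation sequence, or called repeatedly to assign individuals to alternate versions of a treatment.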

Studies that use these types of randomization strategies are known as “field experiments,” because the experimental design must accommodate requirements imposed by the treatment’s purpose and methods and by ethical considerations raised by decisions to offer treatments or programs to selected groups of individuals.

The evaluation task is more challenging for programs that are mandatory for eligible individuals or entities. The analytical strategy used to estimate the impacts of one value-based purchasing program provides an example of issues faced by study designers in this situation (Shapoval, 2020). This study estimated whether a pay-for-performance incentive system successfully induced hospitals to reduce hospital-acquired conditions (HACs). This study utilized national data to estimate the impact of a value-based payment program administered by the federal Centers for Medicare & Medicaid Services. The Hospital-Acquired Condition Reduction Program (HACRP) was designed to incentivize hospitals to develop and implement strategies to reduce the incidence of hospital-acquired conditions for Medicare patients. The study used hospital-level performance scores reported by all hospitals included in the incentive program. The program was announced in August 2013, and the first penalties were applied in 2015. Because there is a lag between the year for which hospitals report data and the year in which penalties were used to reduce Medicare reimbursements for the penalized hospitals, the 2015 penalties reflected performance that predated the announcement of the program.

To test the hypothesis that the incentive program generated reductions in HACs, the researcher constructed a dataset that included HACRP penalty information for the years 2015, 2017, and 2019. The worst-performing 25% of hospitals were penalized in 2015, with a one percent reduction in payments earned by treating Medicare patients. For the first round of analysis, these hospitals were viewed as the “treatment” group, because they received the “treatment” of a monetary penalty and public posting of hospital-level data documenting this penalty based on activities that occurred before the incentive program was announced. The remaining hospitals were viewed as the “control” group, because they were not immediately impacted by the penalty. This presented the researcher with “before” and “after” observations for members of treatment and control groups. The fact that the initial penalties were based on actions that predated the program announcement mitigated concerns about potential selection effects. Multivariate regression indicated that the disparity between the worst performers (i.e., hospitals penalized in 2015 due to poor performance prior to announcement of the HACRP program) and the top performers (i.e., hospitals that were not likely to ever be affected by HACRP due to strong performance prior to 2015) decreased significantly by 2017, and it decreased further by 2019.

Results using this framework (of before and after observations of hospitals in the treatment and control groups) are generally viewed as likely to reflect a causal relationship. However, this type of analysis would normally be accompanied by analysis of HAC trends occurring in the two groups of hospitals for several years prior to the program announcement. Because the available data were not sufficient to support analysis of these trends, the researcher focused, instead, on additional analyses to assess whether the reduction in the gap between the worst performers and the top performers reflected a meaningful relationship or simply regression to the mean.Footnote 10
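The before-and-after comparison of treatment and control groups can be sketched in its simplest form as a difference-in-differences calculation on group means. This is not the multivariate regression the study used, and the scores below are invented for illustration only.

```python
def difference_in_differences(treat_before, treat_after,
                              control_before, control_after):
    """Change in the treatment group minus change in the control group."""
    return (treat_after - treat_before) - (control_after - control_before)

# Hypothetical mean HAC scores (higher = worse); not taken from the study.
penalized_2015, penalized_2017 = 8.0, 6.5
not_penalized_2015, not_penalized_2017 = 4.0, 3.8

effect = difference_in_differences(penalized_2015, penalized_2017,
                                   not_penalized_2015, not_penalized_2017)
print(round(effect, 2))  # a negative value means the gap between groups narrowed
```

A narrowing gap of this kind is exactly what regression to the mean would also produce, which is why the estimate cannot be read as causal without evidence on pre-program trends.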

One characteristic of causal relationships is a dose-response relationship, in which hospitals facing a greater “dose” would exhibit a larger response. The researcher tested whether a dose-response relationship was present by constructing a “threat” variable. The 25% of hospitals that were the worst performers received immediate penalties in 2015. At the same time, all hospitals faced the threat that they could fall into this category in 2017, if low-performing hospitals improved their HAC scores. The threat was greatest for hospitals with scores close to the 25th percentile threshold, and it was attenuated for hospitals with scores further from this value. Multivariate analysis of the “threat” variable indicated that the data exhibit a dose-response relationship.

One hospital administrator commented on this finding, noting that his hospital faces incentives and pressures to improve on a broad array of metrics. If the hospital discovers that performance on one specific metric is above average, it reallocates staff time and effort to metrics that pose more immediate concerns.

While the researcher reported additional exploratory analyses, she cannot rule out the possibility that the results could simply reflect the phenomenon of “regression to the mean.” Because the results are consistent with both hypotheses, the study results must be viewed as preliminary. If the study had been conducted to inform resource allocation decisions, the study conclusion would have provided a careful explanation of the limitations imposed by available data, and the administrator or program manager would review the results with caution until additional years of data become available.

2.5 Task # 5: Specify the Types of Reports to Inform the Stakeholders About the Study Results

CER and CEA studies require thoughtful consideration of the econometric and statistical results, to draw conclusions, identify questions for further examination in the study data or in future work, and explore implications of the study results for program management strategies (Adams & Neville, 2020). Interactions among the clinicians, analysts, and program managers are useful at this point, to ensure insightful examination of these issues.

Based on the number and types of stakeholders and stakeholder requests, it may be useful to generate two types of reports, to explain (i) estimates of program impacts on all participants and on subgroups of participants and (ii) results of analyses of intermediate process measures and other issues that suggest opportunities for quality improvement.

3 Additional Tasks Required for CEA Studies

Individuals designing a CEA study will address all of the questions required for CER, for the health impact component of the CEA study. In addition, the study designers will address four questions specific to the financial impact analysis. Two of these questions add additional components to Task 1, while the other two add additional components to Task 5.

3.1 Additional Steps Required to Specify the Study Goal in Task 1

Whose Financial Impacts Will Be Estimated?

CER and CEA studies estimate health benefits enjoyed by treatment participants. However, the financial impacts generated by these treatments may affect a larger set of individuals and entities. Therefore, an important issue that must be addressed during the study design phase for a CEA study is what individuals and what types of entities will be included in the financial analysis. Will the study consider financial impacts on entities that provide healthcare for the treatment participants, entities that pay for this healthcare, the participants and their families, and/or other entities such as the participants’ employers? While a narrowly focused CEA that estimates financial impacts on the relevant payer and provider offers the obvious advantage of streamlining the analytical task, a more comprehensive CEA that considered financial impacts to other individuals and entities, such as patients and their families and employers, could address an important question: Were costs shifted from one entity to another, or from one entity to patients and families?Footnote 11

The answer to this question is typically shaped by the study purpose. If the study purpose is to support decisions about allocating resources within a provider organization, then the study will focus on estimating financial impacts on that entity. If an employer funds a CEA study to provide information about the financial impacts of a wellness program, then the financial analysis will focus on financial impacts on that employer. If a payer funds a CEA study to inform decisions about a shared savings contract with a provider, the study will focus on financial impacts on the payer.

The question is more complex when a CEA study is designed to inform a public payer’s decision about covering a new treatment. CEA studies designed to inform public policy decisions may include analyses of impacts on patients and social service providers, along with estimates of financial impacts on providers and payers.

This “societal perspective” is receiving increasing attention. The Analytical Perspectives report for the 2003 federal budget discusses the importance of this perspective for federal agency CEA studies (Executive Office of the President of the United States, 2003). In a section titled “Which Costs Should Be Counted?”, the report states:

If one were only concerned about impacts on the Federal budget, then the only regulatory costs that would be counted would be those incurred (or saved) by a federal agency. To reflect the full effect of a regulation, all costs to society—whether Federal, State, or private costs—should be counted when cost-effectiveness ratios are computed.

Shafrin et al. (2018) expand this discussion, describing three perspectives for conducting a CEA study. In this typology, a CEA study of a medical treatment framed by the Traditional Payer Perspective would include estimates of medical costs to treat the patient, adverse medical events, and patient-reported outcomes that impact the patient’s quality of life. A CEA study framed by the Traditional Societal Perspective would include those impacts plus productivity losses and non-medical costs such as transportation and cost of the patient’s time. A study framed by the Broad Societal Perspective would add impacts on caregivers, the option value of treatment, the insurance value to non-patients, and the value of hope.

The Second Panel on Cost-Effectiveness in Health and Medicine recommended, in 2016, that cost-effectiveness analyses utilize both the societal perspective and the more focused strategy of analyzing financial impacts relevant to the entity sponsoring the study. This panel also recommended summarizing the range of potential program impacts by including a structured table that lists health and non-health impacts of the program. This table is denoted as an “impact inventory” (Neumann & Sanders, 2017). In subsequent discussion, some analysts question the feasibility and usefulness of recommending that all CEA studies should collect the information needed to support the analyses required by a societal perspective. These analysts focus on the recommendation that CEA studies should be designed to address questions posed by the study stakeholder(s).

What Types of Financial Impacts Will Be Estimated?

Studies utilizing the traditional payer perspective may (a) focus on the treatment’s impacts on explicit costs incurred by healthcare providers or payers, (b) focus on the treatment’s impacts on both costs and revenues, or (c) expand the analysis to address impacts on implicit costs along with the impacts on explicit costs and revenues. (The term “implicit costs” refers to costs, such as patient wait time, that do not involve monetary transactions.) The decision to include (or exclude) implicit costs may be particularly important for studies that include analyses of impacts on informal care provided by families or studies of programs that affect the duration of time patients will be unable to work (Paterson, 2014).

Cost impacts are the central financial issue for CEA studies that are funded to support shared savings contracts, and they are also the central financial issue when a treatment is expected to impact cost, with no impact on revenue. This could occur, for example, if a fixed budget clinic evaluates a treatment, to provide input into resource allocation decisions.

Impacts on both costs and revenues may be relevant for a healthcare provider facing quality-based incentive payments such as the MACRAFootnote 12 incentive system that affects payments for services provided to patients covered by traditional FFS Medicare. For example, a private sector primary care practice (PCP) might hire a team of psychologists to provide integrated care. If the integrated care treatment strengthens medication adherence and wellness behaviors among patients with type 2 diabetes, the treatment would generate two types of health impacts for these patients: increased numbers of screening visits and reduced numbers of visits to treat preventable conditions. Each of these health impacts would affect clinic costs and clinic revenues.Footnote 13 In addition, the health impacts could potentially boost patient revenue by increasing the clinic’s quality-based incentive payments.

The revenue impacts of such treatments depend on payment structures and quality-based incentive structures, which vary across payers and providers. Conducting a CEA study, which analyzes both cost and revenue impacts, requires detailed information about the incentive structure offered by each payer that contracts with this PCP and proportions of patients and visits covered by each of these payers. In this situation, the study team will include professionals with expertise in billing and accounting, along with those needed to conduct a CER study.

Many CER and CEA studies focus on costs and revenues from the perspective of a single payer or a single provider. Expanding the study to encompass both a payer and a contracting provider raises an additional complication: payments made by the payer to reimburse the provider appear as costs to the payer and revenue to the provider. The cost of care that triggered the reimbursement appears as a cost incurred by the provider. Whether the provider’s revenue completely offsets the cost of providing that care depends on the terms of the contract between the payer and the provider.

4 Additional Steps Required to Specify the Reporting Format in Task 5

CEA studies are widely used, to evaluate healthcare programs and treatments and to evaluate regulations and programs in other areas such as environmental regulations. However, the details of CEA study design and reporting can be controversial. Anticipating and addressing concerns about the use of CEA in the study report can facilitate communications with study stakeholders. In addition to reporting study results to stakeholders within the organization that requested the study, study results may be submitted for publication. Some organizations will value the credibility established by published study results, while others will view the results as proprietary.

Metrics for Measuring Impacts on Health and Mortality

Several metrics are used to measure the benefits generated by healthcare providers, including statistical lives saved, statistical life years saved, and quality-adjusted life years saved. The metric “statistical lives saved” is equal to the number of deaths averted. The metric “statistical life years saved” is useful for assessing the impacts of treatments that are expected to extend life for a period of time such as 1 year or 5 years.

The metric quality-adjusted life years saved (QALY) was developed to measure improvements in health status as well as extensions in years of life expectancy. QALYs are frequently used to account for the fact that healthcare services may reduce pain or increase ability to conduct activities of daily living (ADLs) or instrumental activities of daily living (IADLs). If a treatment strengthens the ability of individuals to conduct ADLs, without affecting life expectancy, this improvement generates an increase in QALY scores.

Statistical lives saved can be converted to statistical life years saved, to create a common metric. To make this conversion, analysts compute the average age of individuals whose lives may be saved. To illustrate, suppose the relevant population is all employed US residents. US data indicate that the median age of these workers is 43 years (https://www.bls.gov/cps/cpsaat18b.htm). Average life expectancy in the USA is 78 years (Arias et al., 2021); hence, the average remaining life expectancy of US workers is approximately 35 years. One statistical life saved would therefore represent 35 statistical life years saved in this population.

One year of life with perfect health is measured as one QALY. Thus, the number of QALYs saved is the same as statistical life years saved if the entire population is in perfect health until death occurs. For individuals with chronic conditions or disabilities that create difficulties conducting ADLs or IADLs, the number of QALYs is less than the number of statistical life years. (Technically, QALYs are also statistical constructs, but this is not stated explicitly because the term QALY is assumed to imply it.) The weights used to compute QALYs for years of life with less-than-perfect health are derived from surveys designed to elicit public preferences regarding the myriad ways that chronic conditions or disabilities can create difficulties with ADLs or IADLs, or create pain or mental disturbances (NICE glossary, n.d.).
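The conversions among these three metrics can be sketched as follows. The age and life expectancy figures come from the example above; the quality weight of 0.8 is a hypothetical value chosen only to show how less-than-perfect health reduces the QALY count, and the function names are illustrative.

```python
def statistical_life_years(lives_saved, average_age, life_expectancy):
    """Convert statistical lives saved into statistical life years saved."""
    return lives_saved * (life_expectancy - average_age)

def qalys(life_years, quality_weight):
    """Quality-adjust life years; a weight of 1.0 represents perfect health."""
    return life_years * quality_weight

years = statistical_life_years(lives_saved=1, average_age=43, life_expectancy=78)
print(years)              # life years represented by one statistical life saved
print(qalys(years, 1.0))  # perfect health: QALYs equal life years
print(qalys(years, 0.8))  # hypothetical weight for less-than-perfect health
```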

Reporting Format Issue 1: Will the Health and Financial Impacts Be Reported as a Ratio?

CEA studies generate estimates of the impacts of programs on participant health benefits and on costs and revenues of payers, providers, patients, and others. These results are typically reported as a ratio of costs to benefits, known as the cost-effectiveness ratio.Footnote 14 For example, the study results might indicate that $12 million would be spent and two statistical deaths would be averted. This could be reported as $6 million spent per statistical life saved.
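As a small sketch of the ratio calculation, using the figures from the example above (the function name is illustrative):

```python
def cost_effectiveness_ratio(cost, health_benefit):
    """Cost per unit of health benefit (e.g., per statistical life saved)."""
    if health_benefit <= 0:
        raise ValueError("the ratio is undefined without a positive health benefit")
    return cost / health_benefit

ratio = cost_effectiveness_ratio(cost=12_000_000, health_benefit=2)
print(f"${ratio:,.0f} per statistical life saved")
```

Because cost is in the numerator, a lower ratio indicates that a program generates health benefits at lower cost.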

Reporting Format Issue 2: Will the Ratio Be Compared to a Threshold?

If a payer were considering several programs, it could maximize the number of lives saved with its budget, by funding programs with the lowest ratio of dollars to statistical lives saved. The payer’s decision problem is more complex when it expects to receive an ongoing stream of proposals and expects to make decisions to fund (or reject) proposals as they are received. This type of organization may want to ensure that all decision-makers use consistent standards and that system resources are allocated to the set of treatments that maximize benefits to patients. In this situation, organizations may set a threshold and fund projects that generate health benefits at relatively low cost while rejecting projects that generate health benefits at higher levels of cost.

Why Use a Threshold?

To examine the logic underlying the threshold-based decision strategy, consider a hypothetical organization that will allocate $60 million to implement new programs. To simplify the discussion, we assume that the cost of implementing each program is $20 million and all of the people whose lives may be saved are the same age. In this hypothetical scenario illustrated in Table 6.2, the number of statistical lives saved by each program ranges from one to ten, as detailed below:

Table 6.2 Hypothetical example: CER

The organization would begin by funding treatment A, which would save ten statistical lives. It would also fund treatments B and C. Together, treatments A, B, and C would cost $60 million, and they would save 20 statistical lives. Thus, the organization would spend $3 million per statistical life saved. This strategy of funding the three treatments with the lowest cost-effectiveness ratios would maximize the number of lives saved given that the organization is willing to spend $60 million for this purpose. If the organization substituted treatment D for treatment C, for example, the cost would remain at $60 million, but the total number of statistical lives saved would decrease from 20 to 18.
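The allocation logic can be sketched as ranking programs by cost per statistical life saved and funding them in order until the budget is exhausted. The per-program values below are illustrative assumptions chosen to be consistent with the totals stated in the text (ten lives for A, 20 lives for A through C, 18 lives if D replaces C); they are not copied from Table 6.2, and the fifth program E is hypothetical.

```python
def fund_by_lowest_ratio(programs, budget):
    """Fund programs in order of increasing cost per life saved, within budget."""
    ranked = sorted(programs, key=lambda p: p["cost"] / p["lives_saved"])
    funded, remaining = [], budget
    for program in ranked:
        if program["cost"] <= remaining:
            funded.append(program)
            remaining -= program["cost"]
    return funded

# Costs in $ millions; lives_saved values are assumptions (see lead-in).
programs = [
    {"name": "A", "cost": 20, "lives_saved": 10},
    {"name": "B", "cost": 20, "lives_saved": 6},
    {"name": "C", "cost": 20, "lives_saved": 4},
    {"name": "D", "cost": 20, "lives_saved": 2},
    {"name": "E", "cost": 20, "lives_saved": 1},
]

funded = fund_by_lowest_ratio(programs, budget=60)
total_cost = sum(p["cost"] for p in funded)
total_lives = sum(p["lives_saved"] for p in funded)
print([p["name"] for p in funded], total_cost, total_lives, total_cost / total_lives)
```

This greedy ranking yields the maximum number of lives saved here because every program has the same cost; with unequal costs, the selection problem is harder and ranking alone may not be optimal.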

Implementation of the strategy of allocating resources to treatments with low cost-effectiveness ratios is straightforward if all of the treatments are known at one moment in time. Decision-makers face a more difficult decision if information about treatments becomes available as they are proposed throughout the year. These decision-makers realize that new treatments will be proposed during each budget year, and it will be necessary to either implement or reject each of these treatments as they are proposed. Suppose this decision-maker wants to fund treatments with relatively low cost-effectiveness ratios while accommodating the need to make funding decisions sporadically as treatments are proposed during the upcoming year. He could use data from recent years to set a decision threshold. If the information summarized in the table above represented treatments proposed in recent years, that threshold would be equal to $5 million. Any treatment proposed with a cost-effectiveness ratio less than or equal to $5 million would be funded. Treatments with higher cost-effectiveness ratios would not be funded.Footnote 15
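A threshold rule of this kind can be sketched in two steps: derive the threshold from the highest ratio among previously funded treatments, then apply it to each new proposal as it arrives. The historical values are illustrative assumptions consistent with the $5 million threshold named in the text, and the function names are hypothetical.

```python
def derive_threshold(previously_funded):
    """Highest cost per life saved among treatments that were funded."""
    return max(cost / lives for cost, lives in previously_funded)

def should_fund(cost, lives_saved, threshold):
    """Fund a proposal if its cost-effectiveness ratio does not exceed the threshold."""
    return cost / lives_saved <= threshold

# (cost in $ millions, statistical lives saved) for treatments funded in recent years.
history = [(20, 10), (20, 6), (20, 4)]
threshold = derive_threshold(history)
print(threshold)

# Proposals arriving during the year are judged one at a time.
for cost, lives in [(20, 8), (20, 2), (10, 2)]:
    print(cost, lives, should_fund(cost, lives, threshold))
```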

4.1 What Types of Agencies Compare the Cost-Effectiveness Ratio to a Threshold?

CEA is widely used to guide treatment funding decisions in healthcare and in an array of government agencies, both in the USA and in other countries. We will briefly summarize information about regulatory agencies that compare cost-effectiveness ratios to a threshold. We then discuss relevant policies of US healthcare entities.

Agencies That Compare Cost-Effectiveness Ratios to a Threshold

Health benefits are generated by an array of federal agencies outside the healthcare system, such as the Department of Transportation (which is responsible for vehicle safety standards and projects to improve highway safety), the Environmental Protection Agency (which sets chemical exposure standards), and the Occupational Safety and Health Administration (which sets workplace safety and exposure standards). These agencies make controversial decisions that save lives and extend lives while also imposing costs on individuals who own small businesses, invest retirement funds in stocks, buy vehicles, and/or pay taxes. Analysts argue that it is important to implement regulations and treatments that save lives at relatively low cost while eschewing regulations and treatments that save lives at relatively high cost. For example, the Organization for Economic Cooperation and Development (OECD, 2008) focuses on CEA as a tool to guide regulatory decisions, and federal budget analysts argue that the CEA threshold decision strategy can help the federal government increase transparency and consistency of federal agency decisions that affect health and safety risks. The EPA computes cost-effectiveness ratios for proposed regulations and compares these ratios to a threshold, which is currently approximately $10 million per statistical life saved. This threshold was set in the 2000 Guidelines for Preparing Economic Analyses, and it is updated to reflect increases in the Consumer Price Index (US Environmental Protection Agency [EPA], 2021).

The British National Institute for Health and Care Excellence (NICE) uses a similar strategy to assess drugs and treatments and to recommend to the British National Health Service (NHS) whether each intervention is a cost-effective use of NHS resources (Sculpher et al., 2001). The NHS operates within an annual budget. Therefore, offering a new drug or procedure that does not generate net savings has a clear opportunity cost: the funds expended on the new drug or procedure will not be available to provide established treatments to patients. In the NICE system, treatments that cost more than $45,000 per QALY are less likely to be recommended than treatments that generate QALYs at lower cost (Dillon, 2015). The British strategy is not unique. Similar systems are used in other EU countries, Australia, and Canada. For example, the Canadian Agency for Drugs and Technologies in Health assesses both clinical effectiveness and cost-effectiveness of new drugs and provides nonbinding recommendations to drug plans operated by Canadian provinces other than Quebec (Tikkanen et al., 2020).

CEA and Threshold Policies in HealthCare Entities in the USA

In the USA, policies regarding the use of CEA in healthcare are mixed. The Centers for Medicare & Medicaid Services (CMS) does not use CEA to assess new drugs and procedures. This policy applies directly to CMS approval decisions for individuals enrolled in the traditional FFS component of Medicare. Medicare Advantage PlansFootnote 16 may elect to provide additional coverage that is not specified by CMS. The CMS decision to approve the drug Provenge illustrates the difference between decisions made on behalf of individuals enrolled in traditional FFS Medicare and decisions made on behalf of individuals enrolled in Medicare Advantage Plans (Medicare Coverage Database, 2011). Provenge was approved by the FDA in 2010, and it was approved by CMS for patients enrolled in traditional FFS Medicare with asymptomatic or minimally symptomatic metastatic prostate cancer that was resistant to other treatmentFootnote 17 (National Cancer Institute, n.d.). However, the CMS decision letter specifically stated that local contractors offering Medicare Advantage Plans could make their own decisions regarding coverage for patients with less advanced cancers, when these patients are enrolled in clinical trials (Mendelson & Carino, 2011). Thus, the CMS decision against covering Provenge for patients with less advanced cancers did not block coverage through managed care plans for patients enrolled in clinical trials.

In contrast with the CMS decision, the British NICE concluded, in January 2015, that the cost-effectiveness ratio for Provenge was too high to recommend this treatment. The NICE decision noted that Provenge is an immunotherapy that stimulates the patient’s own immune cells to identify and attack prostate cancer cells (National Institute for Health and Care Excellence, 2015). It was the first drug of this type to be approved by the FDA. To the extent that coverage decisions affect pharmaceutical research initiatives, the difference between the CMS and NICE decisions represents a difference in policies regarding both pharmaceutical innovation and health insurance coverage.

The Affordable Care Act (ACA) expressly prohibited the use of CEA in research funded by the new Patient-Centered Outcomes Research Institute (PCORI). The PCORI was created in the ACA to generate the evidence needed to reduce waste without harming patient outcomes (Patashnik, 2020). The ACA specified that PCORI-funded analyses would focus on clinical effectiveness, without analyzing cost-effectiveness:

The Patient Centered Outcomes Research Institute … shall not develop or employ a dollars per quality adjusted life year… as a threshold to establish what type of health care is cost effective or recommended. The Secretary shall not utilize such an adjusted life year … as a threshold to determine coverage, reimbursement, or incentive programs under title XVIII. (The PPACA) (Neumann & Weinstein, 2010)

However, when Congress reauthorized PCORI at the end of 2019, for another 10 years, it softened the prohibition against the use of PCORI funding for cost-effectiveness analysis. The 2019 bill authorized PCORI to fund research that would address a broad range of outcomes:

Research shall be designed, as appropriate, to take into account and capture the full range of clinical and patient-centered outcomes relevant to … patients, clinicians, purchasers and policy-makers in making informed health decisions. In addition to the relative health outcomes and clinical effectiveness, clinical and patient-centered outcomes shall include the potential burdens and economic impacts of the utilization of medical treatments … on different stakeholders and decision-makers…. These potential burdens and economic impacts include medical out-of-pocket costs, including health plan benefit and formulary design, non-medical costs to the patient and family, including caregiving, effects on future costs of care, workplace productivity and absenteeism, and healthcare utilization. (CITE: P 1423. Extension of appropriations to the patient centered outcomes research trust fund; Subsection (d)(2) of such section 1181 is amended by adding: (F) Consideration of full range of outcomes data)

Despite this expansion in the types of outcomes that may be considered in PCORI-funded research, Congress continues to bar PCORI from funding studies designed to compare a cost-effectiveness ratio with a threshold. The PCORI website states:

… even with this expanded provision, our authorizing law still does not allow developing or employing a dollars-per-quality adjusted life year as a threshold to establish what type of health care is cost-effective or recommended… (New Provision Bolsters Relevance, Usefulness of PCORI’s Work Sept 22, 2020. http://www.pcori.org//blog-topic/research)

In contrast, other public and private entities are utilizing CEA to support health and healthcare decisions (Neumann & Sanders, 2017), including the CDC Advisory Committee for Immunization Practices (CDC, 2021), the American College of Cardiology and the American Heart Association (Anderson et al., 2014), and the Institute for Clinical and Economic Review (ICER, n.d.; Neumann & Cohen, 2018). In 2017, the VA began utilizing ICER cost-effectiveness evaluations for prescription drugs. Glassman et al. (2020) summarize the results of this collaboration:

Overall, the VA-ICER collaboration has been highly beneficial to VA on several fronts, most notably to help VA better understand the relative value of new drugs and in providing more concrete pricing benchmarks for medications that are often extremely costly upon release. (p. 5)

In comparing CMS and VHA policies on the use of CEA, it is useful to note that the VHA has a fixed annual budget set by Congress, while CMS deficits are financed through general fund dollars.

The fact that agencies outside the healthcare industry have established thresholds provides a useful context for considering cost-effectiveness ratios computed for healthcare interventions. The EPA strategy for setting the threshold focuses on actual decisions made by individuals when they face trade-offs between risk and money. Individuals face this decision when they weigh the costs and benefits of buying a smaller, cheaper, and less safe vehicle against the costs and benefits of a larger, heavier, and safer vehicle. Individuals also face this decision when they consider employment in jobs that provide a base rate of pay and additional payment in the form of “hazard pay” for individuals facing elevated on-the-job risk (Catolico, 2020). The EPA website posts a review of published studies on decisions in which people reduce risk by paying higher prices (or accepting lower wages). The EPA’s explanation of its logic focuses on trade-offs individuals make between dollars and risk in our daily lives:

The EPA does not place a dollar value on individual lives. Rather, when conducting a benefit-cost analysis of new environmental policies, the Agency uses estimates of how much people are willing to pay for small reductions in their risks of dying from adverse health conditions that may be caused by environmental pollution.

This is best explained by way of an example. Suppose each person in a sample of 100,000 people were asked how much he or she would be willing to pay for a reduction in their individual risk of dying of 1 in 100,000, or 0.001%, over the next year. Since this reduction in risk would mean that we would expect one fewer death among the sample of 100,000 people over the next year on average, this is sometimes described as “one statistical life saved.” Now suppose that the average response to this hypothetical question was $100. Then the total dollar amount that the group would be willing to pay to save one statistical life in a year would be $100 per person × 100,000 people, or $10 million. This is what is meant by the “value of a statistical life.” Importantly, this is not an estimate of how much money any single individual or group would be willing to pay to prevent the certain death of any particular person.

What does it mean to place value on life? (EPA, 2020)

Based on these studies, the EPA concludes that the American public is willing to pay approximately $10 million per statistical life saved. This conclusion does not state that any individual would be willing to exchange his or her life for this amount of money. Instead, this conclusion indicates that a large population has demonstrated willingness to spend a total of $10 million to avoid one statistical death, or $286,000 to avoid losing one statistical life year. This number exceeds the NICE threshold by a substantial margin.
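The arithmetic in the EPA example can be traced in a short Python sketch. The function and variable names are hypothetical, chosen only for illustration. The conversion to a per-life-year figure assumes roughly 35 remaining life years per statistical life, with no discounting. This assumption is not stated in the text above; it is used here only because it yields a value close to the $286,000 figure cited, and the published figure may rest on different assumptions.

```python
def value_of_statistical_life(avg_wtp_dollars, people_per_death_avoided):
    """Total willingness to pay, summed across the group, per expected death avoided.

    avg_wtp_dollars: average amount each person would pay for the risk reduction.
    people_per_death_avoided: group size at which the risk reduction is expected
        to prevent one death (100,000 people for a 1-in-100,000 reduction).
    """
    return avg_wtp_dollars * people_per_death_avoided


# Values taken from the EPA example quoted above.
vsl = value_of_statistical_life(avg_wtp_dollars=100, people_per_death_avoided=100_000)

# ASSUMPTION (not from the source): about 35 undiscounted life years per statistical life.
assumed_life_years = 35
value_per_life_year = vsl / assumed_life_years

print(f"Value of a statistical life: ${vsl:,}")
print(f"Value per statistical life year: ${value_per_life_year:,.0f}")
```

Under these inputs the sketch reproduces the $10 million total from the EPA example, and the per-year value falls within a few hundred dollars of the figure reported in the text. Changing the assumed number of life years, or applying a discount rate, would shift the per-year value.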

Anticipate Concerns About CEA Methods and Address These Concerns in the Study Report

The concept of allocating government funds to treatments that generate benefits at relatively low cost is widely accepted as an essential component of efforts to increase healthcare efficiency. However, some aspects of this concept are controversial. It may be useful for study designers to anticipate concerns articulated by individuals and groups opposing the use of CEA and CEA thresholds.

  • There may be substantive gaps in the information available to support the CEA study.

Information is not always available to support evidence-based estimates of costs and benefits of new treatments. This issue is particularly salient when long-term costs and benefits are important and when individual behavioral responses are important. To address this concern, CER and CEA reports should provide detailed information about limitations imposed by the available data.

  • Reliance on the QALY metric could disadvantage disabled individuals.

The use of the QALY metric raises concerns about comparisons of treatments that would extend life for individuals in perfect health versus treatments that would extend life for individuals with chronic conditions or disabilities. One extra year of life would generate more QALYs for individuals in otherwise perfect health than for individuals with chronic conditions or disabilities. Some analysts and policy-makers view this problem as sufficient reason to ban the use of CEA for insurance coverage decisions. Analysts can address this concern by reporting two CEA ratios, with one ratio using the QALY metric and the second ratio using statistical life years saved.
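A minimal Python sketch of the dual-reporting strategy described above follows. All numbers are hypothetical and are not drawn from any study; the function name and the utility weights are illustrative assumptions. The sketch shows why the two ratios diverge when the treated population has a chronic condition (a health-state utility weight below 1.0) and coincide when the population is in perfect health.

```python
def cea_ratios(incremental_cost, life_years_gained, utility_weight):
    """Return (cost per life year, cost per QALY) for one treatment comparison.

    utility_weight: assumed health-state utility during the added years,
        where 1.0 represents perfect health.
    """
    qalys_gained = life_years_gained * utility_weight
    cost_per_life_year = incremental_cost / life_years_gained
    cost_per_qaly = incremental_cost / qalys_gained
    return cost_per_life_year, cost_per_qaly


# HYPOTHETICAL inputs: a treatment costing $60,000 more than usual care
# that extends life by 2 years.
chronic_ly, chronic_qaly = cea_ratios(60_000, 2.0, utility_weight=0.75)
healthy_ly, healthy_qaly = cea_ratios(60_000, 2.0, utility_weight=1.0)

print(f"Chronic condition: ${chronic_ly:,.0f} per life year, ${chronic_qaly:,.0f} per QALY")
print(f"Perfect health:    ${healthy_ly:,.0f} per life year, ${healthy_qaly:,.0f} per QALY")
```

In this illustration the cost per life year is identical for both groups, while the cost per QALY is higher for the group with a chronic condition. Reporting both ratios side by side makes this difference visible to readers of the study report.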

  • The manner of death may be important.

Some analysts argue that people may generally dread death due to cancer more than sudden death due to an accident. NICE addresses this concern by setting a separate threshold for treatments that extend life for cancer patients.

  • Maximizing the total number of statistical lives saved is a simplistic goal. It ignores the question of whether there is an equitable distribution of statistical lives saved across groups delineated by geographic, socioeconomic, race/ethnicity, age, and health status characteristics.

Some opponents of using CEA argue that the threshold decision strategy is likely to focus attention on healthcare system efficiency while diverting attention from equity issues. These analysts argue that equity concerns require careful consideration of the impacts of CEA decision strategies across subgroups of people differentiated by geography, socioeconomic status, race/ethnicity, age, differences in health status, and other characteristics (Neumann & Sanders, 2017).

  • Using a CEA threshold to make health insurance coverage decisions is a smokescreen to disguise rationing or limiting care. Restricting access to costly care could harm current patients and impede medical innovations that might benefit future patients.

Some critics of using CEA thresholds as a tool to guide health insurance coverage decisions fear that comparing CEA ratios to a threshold is likely to limit healthcare expenditures. Advocates of the use of CEA counter this argument by stating that resource allocation decisions cannot be avoided and CEA increases the transparency of the trade-offs facing decision-makers (Neumann & Sanders, 2017).

Analysts also note that some expensive drugs are cost effective. In this situation, CEA offers a logical framework for drug assessment. For example, some cancer drugs are expensive. Compared with money spent to avert statistical deaths in non-healthcare situations, however, some expensive healthcare treatments are bargains (Lakdawalla et al., 2010; Philipson et al., 2012; Lichtenberg, 2013). For example, Lakdawalla et al. (2010) conclude that life expectancy for patients diagnosed with cancer during the years 1988–2000 increased by 4 years. Despite the much-discussed high prices for cancer drugs, they also conclude that expenditures per statistical life year saved were substantially lower than the EPA threshold. They conclude that healthcare producers received less than 20% of the net gain, while patients enjoyed the majority of this gain. These results indicate that CEA offers a more nuanced assessment strategy than the “cost containment” approach for improving healthcare system efficiency (Murphy & Topel, 2006).

  • The term “statistical life” is confusing.

The term “statistical life” does not facilitate communication about the fact that people routinely make decisions that involve trade-offs between money and risk. The EPA has announced its intention to clarify its focus on risk reduction by changing from a threshold based on the Value of Statistical Life (VSL) to a new type of threshold measuring the Value of Mortality Risk (VMR). The EPA explains the logic underlying this change as follows:

For decades economists have been studying how people make tradeoffs between their own income and risks to their health and safety. These tradeoffs can reveal how people value, in dollar terms, small changes in risk. For example, purchasing automobile safety options reveals information on what people are willing to pay to reduce their risk of dying in a car accident. Purchasing smoke detectors reveals information on what people are willing to pay to reduce their risk of dying in a fire. (EPA-e: How will the EPA Estimate the Value of Mortality Risk (VMR))

In conclusion, clinicians participating in planning CEA studies and reporting CEA study results should be aware of controversies regarding CEA. Despite these controversies, however, this analytical strategy is widely used. It may be useful to note that much of the controversy focuses on the question of whether cost-effectiveness ratios should be compared to thresholds to support public agency resource allocation decisions. When CEA results are not compared to thresholds, they may be used to help decision-makers develop a broad understanding of the benefits generated by alternative strategies. In addition, they may be used to support pharmaceutical price negotiations. Finally, CEA study results can place trade-offs between alternative healthcare treatments within the larger context of trade-offs between alternative regulatory policies that affect health, such as environmental, highway safety, or product safety regulations.

5 Conclusion

Clinicians recommend treatments for individual patients, yet they are increasingly expected to consider population health issues when they make these recommendations. For example, the CMS clinician payment incentives include healthcare expenditures as a metric for measuring the value of the clinician’s work. In this environment, clinicians are incentivized to develop and implement strategies to work more efficiently, and they are expected to consider impacts of individual treatment decisions on total healthcare expenditures. Therefore, clinicians are likely to be increasingly involved in CER and CEA studies as members of study teams and as “consumers” of other study results. As members of study teams, clinicians will contribute to study design, interpretation of the statistical results, and communication of the study conclusions.