Introduction

Cost-effectiveness analysis (CEA) can be a powerful policy-making tool. By promoting value-based decision-making, CEAs have helped guide policies ranging from human papillomavirus vaccinations [1] to mammography [2] to drug approval decisions [3].

In global surgery, CEAs are proliferating. Beginning in the 1990s with analyses in cataract [4] and trachoma surgery [5], continuing with McCord and Chowdhury’s seminal evaluation of a Bangladeshi hospital [6], through Gosselin et al.’s [7] evaluation of surgical services in a small hospital in Sierra Leone and Kahn et al.’s [8] study in male circumcision, and culminating in two systematic reviews of the cost-effectiveness of surgery in low- and middle-income countries (LMICs) [9, 10], this body of work has firmly established that surgery can be a cost-effective intervention. By expanding CEA beyond just medical interventions to system-wide analysis, CEAs also served as a cornerstone for the economic argument for surgery in the Lancet Commission on Global Surgery [11] and the third edition of the World Bank’s Disease Control Priorities in Developing Countries [12].

With the crescendo in CEAs in global surgery, however, a review of the standards for analysis is timely. A recent systematic review of CEAs in global surgery found that, out of 26 published articles, only 4 fulfilled all the criteria presented in one commonly used guideline [9]. Discrepancies in methodology lead to discrepancies in results, sometimes varying as much as 150-fold, limiting generalizability and preventing policymakers from making value-based comparisons of surgical interventions. This review will give an overview of CEA methodology, highlight common pitfalls in performing these analyses, attempt to reconcile sometimes divergent guideline recommendations [13–16], and establish standards for rigorous analyses in global surgery.

Theoretical basis of CEA

It is often helpful to begin at the end. When multiple intervention options face a limited budget, CEA provides a framework with which to assess which option (or options) will maximize benefit per dollar spent. The most basic analysis compares the costs and outcomes of at least two hypothetical scenarios: continuing the status quo and implementing one or more interventions.

The fundamental output of any CEA, through which this comparison is made, is the incremental cost-effectiveness ratio (ICER):

$${\text{ICER}} = \frac{{c_{\text{a}} - c_{\text{b}} }}{{e_{\text{a}} - e_{\text{b}} }}$$
(1)

where \(c_{\text{a}}\) and \(c_{\text{b}}\) represent the costs of scenarios A and B and \(e_{\text{a}}\) and \(e_{\text{b}}\) their outcomes. In all analyses, the cost and outcome of the status quo (scenario B above) must be assessed. The importance of assessing the incremental (or marginal) benefit of an intervention cannot be stressed enough; failing to do so leads to incorrect conclusions (see “Appendix”).

The ICER represents how much it would cost society to buy one unit of benefit from scenario A, over the benefit already obtained in scenario B. It can be compared to how much society is willing to pay for an additional unit of benefit to determine whether intervention A is “cost-effective.” ICERs may be used to compare multiple interventions against each other. In this case, the interventions are ordered from least expensive to most expensive, and the ICER is calculated for each sequential intervention. More detail on multiple interventions is given in the “Appendix”.
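As a minimal sketch of Eq. (1) and of the sequential comparison just described, the following Python snippet orders a set of options by cost and computes the ICER of each against the next cheapest. The option names, costs, and DALY values are hypothetical, and the removal of dominated options (covered in the “Appendix”) is deliberately left out.

```python
def icer(cost_a, effect_a, cost_b, effect_b):
    """Incremental cost-effectiveness ratio of scenario A versus scenario B (Eq. 1)."""
    delta_effect = effect_a - effect_b
    if delta_effect == 0:
        raise ValueError("Scenarios have identical outcomes; the ICER is undefined.")
    return (cost_a - cost_b) / delta_effect


def sequential_icers(options):
    """Order options from least to most expensive and compare each with the previous one.

    `options` is a list of (name, cost, effect) tuples. Dominated options are
    not removed here; this sketch only illustrates the ordering step.
    """
    ordered = sorted(options, key=lambda option: option[1])
    results = []
    for previous, current in zip(ordered, ordered[1:]):
        ratio = icer(current[1], current[2], previous[1], previous[2])
        results.append((current[0], previous[0], ratio))
    return results


# Hypothetical costs (US$) and DALYs averted, for illustration only.
options = [
    ("intervention B", 100_000, 160.0),
    ("status quo", 10_000, 80.0),
    ("intervention A", 40_000, 140.0),
]

for name, comparator, ratio in sequential_icers(options):
    print(f"{name} vs {comparator}: ${ratio:,.0f} per DALY averted")
```

Note that the status quo enters the list like any other option, so its cost and outcome are always part of the comparison.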

The lower the ICER, the more cost-effective an intervention will appear, putting a heavy onus on the analyst to avoid anything that biases the ICER downward [17]. To protect against this bias, as well as to maintain transparency and improve reproducibility, all assumptions that underlie the analysis should be made explicit.

This review will highlight methods to minimize common pitfalls in CEA leading to downward bias. These pitfalls occur in six broad categories, each of which is necessary for undertaking a complete cost-effectiveness analysis: the analytic perspective, cost measurement, effectiveness measurement, probability estimation, valuation of the counterfactual, and heterogeneity and uncertainty.

One note on what this review will not cover. Other methods of cost analysis have been proposed, including cost-minimization analyses and benefit–cost analyses (BCAs). The former are less useful: They simply focus on the numerator of Eq. (1), thereby making the implicit assumption that the denominator is 1. BCAs, on the other hand, are common in global surgery, and the recommendations discussed below apply. However, BCAs take the additional step of valuing the denominator monetarily, that is, assigning a dollar value to a health benefit. Doing so requires additional assumptions around, for example, willingness to pay for each life saved [18]. Discussing these assumptions is beyond the scope of this review.

Analytic perspective and intervention definition

Analyses may be undertaken from multiple perspectives: that of the patient, the hospital, the payer, or society as a whole. Every published cost-effectiveness guideline recommends the last of these [13–16] (Footnote 1), because adopting a societal perspective maximizes inter-analysis comparability and generalizability of findings.

Although other perspectives may be informative for specific questions, they should only be done secondarily and, if used, should be explicitly identified.

This has an important corollary. In surgery, especially, interventions are not delivered in isolation. A CEA on a surgical mission to fix hernias in Madagascar does not establish the cost-effectiveness of hernia surgery. It establishes the cost-effectiveness of a surgical mission to fix hernias in Madagascar, when compared to current hernia treatment strategies in Madagascar. Generalizing from the specific intervention platform to surgery as a whole is impossible.

Measuring cost

Although accurate costing is imperative for generalizability, it is in costing that the societal perspective is most commonly lost. Measures of effectiveness are designed to be distributions across populations and time. When costs are calculated for smaller analytic frames—for the hospital [7] or for the mission providing the surgery [19, 20]—the numerator in Eq. (1) is artificially decreased, introducing a downward bias into the ICER.

From a societal perspective, costs accrue to the healthcare system, the provider, and the patient. These must all be included (Footnote 2).

Health system costs

Any health system costs specific to an intervention must be included. A new task-sharing program requires training. Including only the costs of already-trained task sharers, without taking into account the costs of training, would be an underestimate.

Provider costs

Two types of provider costs exist. Fixed costs are those that must exist for an intervention to occur, and do not vary whether one surgery is done or many. Variable costs increase as the number of surgeries increases.

Fixed costs

Capital costs—for example, the cost of a hospital building, an operating room, an anesthesia machine—must be allocated to the evaluated program. For example, if an anesthesia machine was being used exclusively for Cesarean sections, a new laparoscopic surgery program would have to include its cost, to the extent that it is used for laparoscopic surgery. Any capital should be annualized across the lifetime of the program, taking into account its resale value (if resale is possible) and discounting over time [14].
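A minimal sketch of annualizing a capital item follows. It uses the equivalent annual cost, one common formulation that nets out the discounted resale value and divides by an annuity factor; whether the cited guideline uses exactly this form is an assumption, as are the function names and the `share_of_use` parameter, which represents the fraction of the item's use attributable to the evaluated program.

```python
def annuity_factor(rate, years):
    """Present value of 1 paid at the end of each year for `years` years."""
    if rate == 0:
        return float(years)
    return (1 - (1 + rate) ** -years) / rate


def equivalent_annual_cost(purchase_price, resale_value, rate, years):
    """Spread a capital outlay evenly over its useful life, net of discounted resale value."""
    net_outlay = purchase_price - resale_value / (1 + rate) ** years
    return net_outlay / annuity_factor(rate, years)


def allocated_annual_cost(purchase_price, resale_value, rate, years, share_of_use):
    """Charge the evaluated program only for its share of the item's use."""
    if not 0 <= share_of_use <= 1:
        raise ValueError("share_of_use must lie between 0 and 1.")
    return share_of_use * equivalent_annual_cost(purchase_price, resale_value, rate, years)


# Hypothetical anesthesia machine: US$50,000 purchase price, US$5,000 resale
# value after 10 years, 3% discount rate, 40% of use by the new program.
annual_cost = allocated_annual_cost(50_000, 5_000, 0.03, 10, 0.40)
print(f"Annual capital cost allocated to the program: ${annual_cost:,.2f}")
```

With a positive discount rate, the annualized cost exceeds simple straight-line depreciation, because it also reflects the opportunity cost of money tied up in the equipment.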

Labor costs are often large, so ignoring them biases the results. In general, medical professionals’ salaries and benefits are an appropriate approximation, despite the fact that, in many LMICs, local medical professionals are likely underpaid. In global surgery, an additional wrinkle exists in the fact that many interventions are provided by volunteer organizations. As such, the opportunity cost of the volunteer surgeons and staff—effectively, their foregone salary for being on the mission—must be included.

Variable costs

Variable costs include anything used on a per-case basis: medications, supplies, and operating room time, for example, even if donated. If per-hour operating room time cost is available, it should be used. If not, the total aggregate cost of the operating room can be allocated proportionally to the time spent performing the evaluated interventions.

Patient costs

Patients face three sets of costs: direct medical, direct non-medical, and indirect. Direct costs are any for which a patient must reach into her pocket, while indirect costs are losses in opportunity. All direct costs must be included; the WHO recommends excluding indirect costs in the main analysis, but including them, if available, in secondary analyses [15]. Care should be taken not to assume that “free care” is always free to the patient [21].

Direct medical costs

In addition to the costs of intraoperative medications and supplies, direct medical costs include pre- and post-operative medications, laboratory and radiographic testing, blood transfusions, and any other cost a patient faces because she had surgery. In high-income countries, per-day hospital admission costs are often available from the literature [22]; for LMICs, the WHO’s Choosing Interventions that are Cost-Effective (WHO-CHOICE) provides country-specific costs for inpatient and outpatient bed days [23].

Direct non-medical costs

Patients also have to pay to get to care. These direct non-medical costs can sometimes be larger than the medical costs themselves [24]. They include, at minimum, transportation, food, and lodging (if not provided by the hospital), and the “informal payments” often required for care to be delivered [25].

Indirect costs

Time is the most commonly cited indirect cost. An hour spent lying in a hospital bed is an hour spent not doing something else. The value of that “something else” is the opportunity cost of the hour spent in the hospital bed.

In high-income countries, time is often valued in terms of prorated wages [16]. In countries without a fully developed labor market, income is not an appropriate measure of economic worth. Economists have used household expenditures as its proxy [26].

Caregiver costs

One final note: Direct and indirect costs accrue to both patients and their caregivers. If it is common for caregivers to accompany patients, as is often the case for surgery, costs for the caregiver’s transportation, food, and lodging must also be included. Their indirect costs should also be included, if this secondary analysis is being performed.

Standardizing costs

To maximize generalizability, costs may be reported in local currency units but should also be reported in a standardized currency comparable across analyses. The reference cost for most CEAs is the inflation-adjusted, standardized US dollar. Getting from the local currency unit to this standardized cost measure requires conversion across both time and space. Cost conversions can be complicated, and any uncertainty would be best resolved by consultation with a global health economist.

Adjusting across time (inflation)

In the USA, adjustment for inflation can be performed with the medical portion of the consumer price index [28]. For global health CEAs, the WHO recommends GDP deflators instead [29]. To adjust 2010 costs to 2014 costs, the deflators are used as follows:

$$c_{2014} = c_{2010} \cdot \frac{{G_{2014} }}{{G_{2010} }}$$
(2)

where \(c_{y}\) represents cost in year y and \(G_{y}\) represents that year’s GDP deflator.

Adjusting across space (purchasing power)

When converting out of local currency, it is tempting to use prevailing market exchange rates. This is not completely correct.

For costs of items that can be traded across borders (e.g., medications, instruments, machines), conversion from the local currency unit to US dollars is correctly done using the prevailing exchange rate at the time the purchase was made. For non-tradable goods, however (e.g., salary, a day in the hospital), costs must be converted using the purchasing power parity (PPP) exchange rate, which the World Bank defines as the rate at which a dollar “would buy in the cited country a comparable amount of goods and services a US dollar would buy in the USA” [27]. For example, $1 in Madagascar buys far more than $1 in the USA, and the analyst must take this “purchasing power” into account. The World Bank provides PPP conversion factors [29], of which the PPP factor for private consumption (as opposed to for GDP) should be used.

Adjusting across space and time

When adjusting across both space and time, use local GDP deflators to adjust for inflation first, then convert to international dollars using the target year PPP conversion factor.
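The two-step conversion for a non-tradable cost can be sketched as below. The deflator and PPP values are hypothetical placeholders, not real World Bank data, and the PPP factor is assumed to be expressed as local currency units per international dollar.

```python
def inflate(cost_local, deflator_source_year, deflator_target_year):
    """Eq. (2): restate a local-currency cost in target-year prices."""
    return cost_local * deflator_target_year / deflator_source_year


def to_international_dollars(cost_local, ppp_factor):
    """Convert local currency to international dollars.

    `ppp_factor` is assumed to be local currency units per international dollar.
    """
    return cost_local / ppp_factor


def standardize_non_tradable(cost_local, deflator_source_year,
                             deflator_target_year, ppp_factor_target_year):
    """Adjust for local inflation first, then convert with the target-year PPP factor."""
    in_target_prices = inflate(cost_local, deflator_source_year, deflator_target_year)
    return to_international_dollars(in_target_prices, ppp_factor_target_year)


# Hypothetical inputs: a salary cost of 200,000 local currency units in 2010,
# local GDP deflators of 100 (2010) and 130 (2014), and a 2014 PPP factor of
# 800 local currency units per international dollar.
standardized = standardize_non_tradable(200_000, 100.0, 130.0, 800.0)
print(f"Cost in 2014 international dollars: {standardized:,.2f}")
```

Keeping the two steps as separate functions makes the order of operations explicit, which is the point of the recommendation above.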

Discounting

All future costs and benefits must be discounted, to reflect the relative importance for individuals of present costs and benefits over future costs and benefits (i.e., it is not a correction for inflation). Discounting a future stream of costs is calculated as follows:

$${\text{Total Cost}} = \sum\limits_{t} {\frac{{c_{t} }}{{\left( {1 + r} \right)^{t} }}}$$
(3)

where \(c_{t}\) represents the cost in year t (t = 0 for the current year) and r represents discount rate. Although there is significant controversy about whether costs and health benefits should be discounted at the same rate, common practice is to set r = 0.03 [14] and to vary r in sensitivity analyses to determine the sensitivity of the results to discounting.
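Eq. (3) translates directly into a few lines of Python. The sketch below discounts a hypothetical five-year stream of costs at the base-case rate and then repeats the calculation at alternative rates; the cost stream and the alternative rates are illustrative assumptions.

```python
def present_value(costs_by_year, rate=0.03):
    """Eq. (3): discounted sum of a cost stream, with year 0 undiscounted."""
    return sum(cost / (1 + rate) ** year for year, cost in enumerate(costs_by_year))


# Hypothetical stream: US$1,000 in the current year and in each of the next four years.
costs = [1_000, 1_000, 1_000, 1_000, 1_000]

# Vary the discount rate to see how sensitive the total is to discounting.
for rate in (0.0, 0.03, 0.05):
    print(f"r = {rate:.2f}: total discounted cost = ${present_value(costs, rate):,.2f}")
```

The same function can be applied to a stream of health benefits when they are discounted at the same rate as costs.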

How long to discount

Many analyses discount over the lifetime of the patient. Note that because average life expectancy is the life expectancy at birth, it is an underestimate for a patient who has survived past infancy. Country-specific life tables [30] give life expectancies by age group and should be used instead.

Measuring effectiveness

The most commonly utilized measures of effectiveness in global surgery are lives saved, years of life lost, quality-adjusted life years (QALYs) gained, or disability-adjusted life years (DALYs) averted. Although there are theoretical reasons to question these common measures [31], and although DALY calculations have changed in the last two decades, discussions of this controversy are beyond the scope of this paper.

The QALY was developed by Pliskin, Shepard, and Weinstein in 1980 [32] as a summary measure for length and quality of life:

$${\text{QALY}} = qT$$
(4)

where q represents a health state valuation, scored on a scale of 0 (death) to 1 (perfect health) and T represents time spent in that health state. Discounting a stream of future QALYs can be done similarly to Eq. (3).

In contrast, the DALY, developed in 1994 [33], is a “gap” measure, quantifying a loss from perfect health and perfect longevity. It is a sum of full years of life lost (YLL) due to premature mortality and full years of life lived in disability (YLD):

$${\text{DALY}} = {\text{YLL}} + {\text{YLD}}$$
(5)

YLL is straightforward. For the purposes of CEA, YLDs should be calculated from an incidence perspective [34], such that:

$${\text{YLD}} = {\text{d}}T$$
(6)

Note that this is different from the prevalence-based calculations used in burden of disease studies. For DALYs, d ranges from 0 (perfect health) to 1 (death)—the exact opposite of q.

In practice, QALYs have been more frequently used in high-income country CEAs, while DALYs have been more frequently employed in global health analyses. Although there is no fundamental theoretical reason for this, we recommend continuing to use DALYs for CEAs in global surgery to maximize comparability.

Of note, nothing constrains what is placed in the denominator of Eq. (1). CEAs exist, for example, evaluating the cost per pregnancy averted [35]. The interpretability of non-traditional denominators is limited, however—how much would society be willing to pay, in this example, for an averted pregnancy?

Choosing appropriate disability weights

QALY weights [q in Eq. (4)] for many surgical conditions can be found in a searchable database maintained by Tufts University [36]. The most recent DALY disability weights (DWs) [d in Eq. (6)] can be found in Salomon et al.’s [37] Lancet article.

Unfortunately, surgery-specific DWs are sparse, making surgery CEAs difficult. This presents the analyst with an added challenge: Estimation of DWs is required, but overestimations of DW will, once again, introduce downward bias into the ICER.

A common response to this [38] is estimation based on a method proposed by McCord and Chowdhury [6] (Table 1), itself a slight simplification of a table in Murray’s original 1994 paper [33]. This method is widespread, and although it has been suggested that it is validated [39], we could find no evidence of validation. Further, it is unambiguously subjective and routinely overestimates the disability weight of surgical conditions.

Table 1 Disability weight estimates [6]

As an example of the overestimation, take cleft lip and palate. A reasonable argument may be made from Table 1 that these conditions should be assigned a disability weight of somewhere between 0.2 and 0.4—limited ability to perform activities in one or two of recreation, education, procreation, or occupation.

This would be an overestimate: formal evaluations have placed the DW for cleft lip around 0.1 [40], and the accepted DW for the Global Burden of Disease project is 0.122 (discussed next).

What if the disability weight does not exist?

The lack of DWs in surgery presents the field with vast research opportunity: direct DW elicitation is required. In the absence of directly elicited DWs, however, the Global Burden of Disease initiative recommends an estimation using DWs for “generalized illness” (Table 2). A patient with cleft lip and palate will have difficulty with speech (DW = 0.054) and will experience level 2 disfigurement (“a visible physical deformity that causes others to stare and comment. As a result, the person is worried and has trouble sleeping and concentrating”; DW = 0.072). Comorbid DWs combine multiplicatively [41]:

$${\text{DW}}_{\text{Total}} = 1 - \left[ {\left( {1 - {\text{DW}}_{1} } \right) \cdot \left( {1 - {\text{DW}}_{2} } \right)} \right]$$
(7)

giving a DW, in this case, of 0.122.
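The multiplicative combination in Eq. (7) can be checked with a short Python sketch. Eq. (7) is written for two conditions; extending the same product to more than two weights, as the function below allows, is an assumption on our part.

```python
import math


def combine_disability_weights(weights):
    """Combine comorbid disability weights multiplicatively, as in Eq. (7)."""
    for weight in weights:
        if not 0 <= weight <= 1:
            raise ValueError("Disability weights must lie between 0 and 1.")
    return 1 - math.prod(1 - weight for weight in weights)


# Speech difficulty (0.054) and level 2 disfigurement (0.072), as in the text.
cleft_dw = combine_disability_weights([0.054, 0.072])
print(f"Combined DW: {cleft_dw:.3f}")
```

Because each weight scales the health that remains after the others, the combined weight never exceeds 1, which a simple sum of weights would not guarantee.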

Table 2 GBD 2010 disability weights for general conditions (95% uncertainty ranges in parentheses) [37]

Finally, all DW calculation should be referenced against other DWs for (at least) face validity. Cleft palate has, in some papers, been estimated to carry a DW of 0.38 [42]. This implies that a year lived with a cleft palate is worse than a year lived without both arms (DW = 0.359), almost twice as bad as a year lived in blindness (DW = 0.195), and approximately equal to a year lived with both HIV and tuberculosis (DW = 0.399) [37].

Even with the most careful analysis, subjectivity is bound to remain. For this reason, we emphasize again that all assumptions must be clearly stated to allow the reader to draw the most accurate conclusions. In addition, a large opportunity for research exists in the elicitation and standardization of DWs for surgical disease.

Discounting and age-weighting

Although age-weighting and discounting are not without controversy, broad consensus exists that future benefits should be discounted at the same rate as future costs in at least the base-case scenario [15, 16]. Age-weighting was initially utilized by the WHO [15], but the updated Global Burden of Disease studies have done away with it [43]. The recommendation, then, is to present non-age-weighted, discounted DALYs in the base-case analysis, and to reserve age-weighting for scenario analyses, discussed below.

Probability estimation

The effectiveness of an intervention is based on the number of DALYs it averts. DALYs averted have previously been calculated as: [31]

$${\text{DALY}}_{a} = {\text{YLL}} \times {\text{RD}} \times {\text{PST}}$$
(8)

for lethal conditions, and

$${\text{DALY}}_{a} = {\text{YLD}} \times {\text{DW}} \times {\text{RPD}} \times {\text{PST}}$$
(9)

for non-lethal conditions, where RD represents the risk of death, PST represents the probability of successful treatment, and RPD represents the risk of permanent disability. Unfortunately, these are problematic, as will be discussed below.

Determining probabilities

In global surgery CEAs, estimates for RD, RPD, and PST have often been based on ill-defined ranges [38, 39]. For example, Gosselin et al. [39] assign the point estimates shown in Table 3 for RD.

Table 3 Probability estimation [39]

This forces the analyst to guess probabilities. Even if the analyst guesses correctly, however, the table introduces systematic error unless the true RD is exactly 50%. If death risk is low (as is often the case), DALY estimates are inflated. If it is high, the procedure underestimates them (Fig. 1).

Fig. 1 Error introduced by utilizing probability estimates found in Table 3. The distributions come from a microsimulation of 1000 cost-effectiveness analyses using the true probability compared with the estimates in Table 3. On average, very little error is introduced if the true probability is exactly 50%, with increasing error as the true probability rises or falls

Routinely overestimating a 25% probability as 31%, when multiplied over entire populations and multiple years, clearly results in an unacceptable error, especially when event probabilities are easily found in the literature or can be estimated from hospital records [44].

Avoiding oversimplification

A second problem with Eqs. (8) and (9) is their oversimplification. They assume that the only potential outcomes from treatment are cure, death, or an ill-defined “residual permanent disability.” Equation (8) implies the decision tree in Fig. 2.

Fig. 2 Decision tree implied by the usual estimates of DALYs averted in Eq. (8)

This ignores three very real issues:

  1. the risk of complications from treatment,
  2. the fact that most conditions are not only lethal or only non-lethal, and
  3. the change in the mortality risk after “unsuccessful” treatment.

A more complete tree is shown in Fig. 3 (Footnote 3).

Fig. 3 A more complete representation of a patient’s potential outcomes after a surgical intervention

The true estimate of DALYs averted is:

$${\text{DALY}}_{a} = {\text{YLL}}\left( {{\text{RD}} - {\text{RD}}_{\text{postTx}} } \right) + {\text{PST}}\left( {{\text{RD}}_{\text{postTx}} \cdot {\text{YLL}} + {\text{YLD}}_{{{\text{d}}z}} - {\text{pCompl}} \cdot {\text{YLD}}_{\text{compl}} } \right)$$
(10)

Note that the value obtained from Eq. (10) is almost always smaller than the value obtained from Eqs. (8) and (9). By overstating the DALYs averted, the simplified equations once again introduce downward bias into the ICER.
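To make the comparison concrete, the sketch below implements Eqs. (8) and (10) side by side. The parameter names are our reading of the symbols (for example, `p_compl` as the probability of a complication and `yld_compl` as the years lived with disability it causes), and every input value is hypothetical.

```python
def dalys_averted_simple(yll, rd, pst):
    """Eq. (8): simplified estimate for a lethal condition."""
    return yll * rd * pst


def dalys_averted_full(yll, rd, rd_post_tx, pst, yld_disease, p_compl, yld_compl):
    """Eq. (10): allows for post-treatment mortality risk, disability, and complications."""
    return yll * (rd - rd_post_tx) + pst * (
        rd_post_tx * yll + yld_disease - p_compl * yld_compl
    )


# Hypothetical inputs: 20 years of life lost per death, a 30% risk of death that
# is unchanged by unsuccessful treatment, a 90% probability of successful
# treatment, no disability from the disease itself, and a 10% complication
# risk costing 2 years lived with disability.
simple = dalys_averted_simple(yll=20, rd=0.3, pst=0.9)
full = dalys_averted_full(
    yll=20, rd=0.3, rd_post_tx=0.3, pst=0.9,
    yld_disease=0.0, p_compl=0.1, yld_compl=2.0,
)
print(f"Eq. (8): {simple:.2f} DALYs averted; Eq. (10): {full:.2f} DALYs averted")
```

In this particular scenario the two equations differ only by the complication term, so setting `p_compl` to zero makes them coincide.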

Importantly, even Eq. (10) is itself a simplification, since not all complications are created equal. As a result, we recommend against using simplified formulas and make the strong recommendation for the construction of decision trees, as in Fig. 3. Probability-based equations and/or decision trees must be applied to both the numerator (cost) and denominator (effectiveness) of Eq. (1).
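A decision tree of this kind can be evaluated by averaging over its branches, carrying cost and DALYs together. The tree below is a hypothetical, much-reduced stand-in for the one in Fig. 3: its structure, probabilities, costs, and DALY values are all illustrative assumptions, and each leaf records the DALYs incurred on that path.

```python
def expected_values(node):
    """Return (expected cost, expected DALYs incurred) for a node of the tree."""
    if "branches" not in node:
        return node["cost"], node["dalys"]
    total_probability = sum(probability for probability, _ in node["branches"])
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("Branch probabilities must sum to 1.")
    expected_cost = 0.0
    expected_dalys = 0.0
    for probability, child in node["branches"]:
        child_cost, child_dalys = expected_values(child)
        expected_cost += probability * child_cost
        expected_dalys += probability * child_dalys
    return expected_cost, expected_dalys


# Hypothetical natural history without treatment.
untreated = {"branches": [
    (0.3, {"cost": 0, "dalys": 20.0}),    # death
    (0.7, {"cost": 0, "dalys": 5.0}),     # survival with disability
]}

# Hypothetical surgical arm: success (with or without a complication) or failure.
surgery = {"branches": [
    (0.9, {"branches": [
        (0.1, {"cost": 1500, "dalys": 2.0}),   # cured, with a complication
        (0.9, {"cost": 1000, "dalys": 0.0}),   # cured, no complication
    ]}),
    (0.1, {"branches": [
        (0.3, {"cost": 1000, "dalys": 20.0}),  # treatment fails, death
        (0.7, {"cost": 1000, "dalys": 5.0}),   # treatment fails, disability persists
    ]}),
]}

cost_surgery, dalys_surgery = expected_values(surgery)
cost_untreated, dalys_untreated = expected_values(untreated)
dalys_averted = dalys_untreated - dalys_surgery
tree_icer = (cost_surgery - cost_untreated) / dalys_averted
print(f"ICER: ${tree_icer:,.2f} per DALY averted")
```

Because the same probabilities weight both cost and DALYs at every branch, the numerator and denominator of the ICER stay consistent with each other.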

Valuing the counterfactual

CEAs are used to answer two types of policy questions, and confusion between these two types has led to the propagation of errors. Published recommendations for CEA do not agree on which type of analysis is most appropriate [15, 16]. We make a recommendation in this paper, in line with the majority of the recommendations, but acknowledge that others disagree.

Which problem should be analyzed?

CEAs can answer either a “shopping spree” or a “competing choice” problem [45] (see “Appendix” for details).

The shopping spree problem assumes a health system does not exist, and asks which of a menu of non-exclusive options (e.g., surgery vs HIV treatment vs antimalarial care) should be in a newly constructed health system. On the other hand, the competing choice problem assumes a health system does exist and is already treating a condition, and asks, “Which of these (mutually exclusive) methods of treating this condition is most cost-effective?”

In the shopping spree problem, Eq. (1) may be simplified. Because the healthcare system is being designed de novo, there is no counterfactual to compare against, and the ICER simplifies to an average cost-effectiveness ratio:

$${\text{ICER}} = \frac{{c_{\text{a}} - 0}}{{e_{\text{a}} - 0}} \to \frac{{c_{\text{a}} }}{{e_{\text{a}} }}$$
(11)

This simplified cost-effectiveness ratio is almost always smaller than an ICER.

Early global surgery CEAs had to advance an argument that surgery could be cost-effective and should be included in health systems. That is, the shopping spree problem needed to be solved. This argument has been made, and made strongly. Two systematic reviews [9, 10] as well as the Lancet Commission on Global Surgery [11] have all established that surgery is likely cost-effective.

As a result, however, the shopping spree problem is no longer relevant. This is doubly true if, in reality, a certain type of surgery is being performed. That procedure has already been included in the health system, making a simplified cost-effectiveness ratio misleadingly low.

In keeping with Gold [16] and the CHEERS consortium [13] (but in distinction with WHO [15]), we recommend that a true ICER be calculated, with all interventions measured against the status quo, not against the theoretical counterfactual of “nothing”. It is much more relevant to evaluate platforms for surgical delivery than to spend research time answering a question that has already been answered.

The difference between an average CE ratio and an ICER is not small. Take, as an example, a hypothetical “surgical mission trip” to fix obstetric fistulas. Over 2 weeks, it repairs 20 fistulas, at a cost of $100,000 (similar to published valuations [46]). Making the (heroic) assumptions that every repair is successful, that no complications ensue, that no recurrence happens, that all patients are 18 years old, and that their life expectancy is an additional 40 years each, each repaired fistula nets

$${\text{DALY}}_{a} = \sum\limits_{t = 0}^{39} {\frac{0.338}{{1.03^{t} }}} = 8.047$$
(12)

DALYs averted (discounted at 3% per year, using the GBD 2010 DWs [37]).

The average CE ratio of the mission trip (Eq. 11) is

$${\text{CER}} = \frac{\$ 100,000}{20 \times 8.047} = \$ 621.36$$
(13)

However, a hypothetical district hospital repairs 5 fistulas a week, at a weekly cost of $5000 (at a much better average CE ratio of $124.27). Over the same 2 weeks, it would therefore repair 10 fistulas at a cost of $10,000. The $100,000 spent on the mission trip could be used to scale up the repair of fistulas in this hospital. The question, then, is not whether the mission trip should be included in the health system, but whether the additional benefit gained from the mission trip is worth the additional cost:

$${\text{ICER}} = \frac{\$ 100,000 - \$ 10,000}{{\left( {20 \times 8.047} \right) - \left( {10 \times 8.047} \right)}} = \$ 1118.40$$
(14)

This implies that the additional 10 patients treated by the mission trip actually cost $1118.40 per DALY averted, nearly twice the original estimate.
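The arithmetic in Eqs. (12) to (14) can be retraced with the short Python sketch below, which uses only the figures given in the text. The function and variable names are ours, and because the script carries the unrounded DALY value, its results agree with the reported figures only to within rounding.

```python
def discounted_dalys(disability_weight, years, rate=0.03):
    """Eq. (12): discounted DALYs averted over `years`, with year 0 undiscounted."""
    return sum(disability_weight / (1 + rate) ** t for t in range(years))


dalys_per_repair = discounted_dalys(0.338, 40)

# Mission trip: 20 repairs for US$100,000 over 2 weeks.
mission_cost, mission_repairs = 100_000, 20
# District hospital over the same 2 weeks: 5 repairs and US$5,000 per week.
hospital_cost, hospital_repairs = 2 * 5_000, 2 * 5

# Average CE ratios (Eq. 13) compare each option with "nothing".
mission_average_ce = mission_cost / (mission_repairs * dalys_per_repair)
hospital_average_ce = hospital_cost / (hospital_repairs * dalys_per_repair)

# The ICER (Eq. 14) compares the mission trip with the existing hospital.
mission_icer = (mission_cost - hospital_cost) / (
    (mission_repairs - hospital_repairs) * dalys_per_repair
)

print(f"DALYs averted per repair: {dalys_per_repair:.3f}")
print(f"Mission average CE ratio: ${mission_average_ce:,.2f}")
print(f"Hospital average CE ratio: ${hospital_average_ce:,.2f}")
print(f"Mission ICER vs hospital: ${mission_icer:,.2f}")
```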

Addressing heterogeneity and uncertainty

Heterogeneity and uncertainty are not often addressed in global surgery CEAs.

Heterogeneity

Not all patients are identical, and this heterogeneity can substantively alter ICERs. For example, the following sets of life expectancies average to 40:

$$\begin{array}{*{20}c} {\varvec{A}:} & {\left\{ {40, 40, 40, 40, 40} \right\}} \\ {\varvec{B}:} & {\left\{ {10, 10, 50, 50, 80} \right\}} \\ \end{array}$$

Using the same DWs and discount rate as in Eq. (12), an intervention in set A averts 40.23 DALYs, while an intervention in set B averts only 34.37 DALYs. This difference alone would raise the ICER in Eq. (14) to $1309.

If individual patient data are available, they should be used instead of averages. In the absence of this level of granularity, heterogeneity can be dealt with in microsimulation models.
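The effect of heterogeneity in sets A and B can be reproduced with individual-level values rather than the average, as sketched below. The disability weight, discount rate, and costs are those of Eqs. (12) and (14); the function name and the step of rescaling the per-patient mean to 10 additional patients are our own framing of the calculation.

```python
def discounted_dalys(disability_weight, years, rate=0.03):
    """Discounted DALYs averted over `years`, with year 0 undiscounted."""
    return sum(disability_weight / (1 + rate) ** t for t in range(years))


set_a = [40, 40, 40, 40, 40]
set_b = [10, 10, 50, 50, 80]

total_a = sum(discounted_dalys(0.338, years) for years in set_a)
total_b = sum(discounted_dalys(0.338, years) for years in set_b)

# Recompute the ICER of Eq. (14) using the mean DALYs averted per patient in set B.
mean_b = total_b / len(set_b)
incremental_cost = 100_000 - 10_000
additional_patients = 20 - 10
icer_b = incremental_cost / (additional_patients * mean_b)

print(f"Set A: {total_a:.2f} DALYs averted; set B: {total_b:.2f} DALYs averted")
print(f"ICER with set B life expectancies: ${icer_b:,.0f}")
```

Both sets have the same mean life expectancy, so the gap arises purely from discounting: years gained far in the future count for less than years gained soon.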

Uncertainty

Two layers of uncertainty exist: parameter uncertainty and scenario uncertainty. Mortality rates, risks of recurrence, and probabilities of complications are all estimates and are therefore uncertain. Parameter uncertainty can be addressed in one-way, two-way, or probabilistic sensitivity analyses, which can be performed in both commercially available software and free statistical packages [14].
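A probabilistic sensitivity analysis can be sketched with nothing more than Python's standard library. Every distribution and parameter below is a hypothetical choice made for illustration (a beta distribution for the probability of successful treatment, a gamma distribution for incremental cost, a uniform range for DALYs averted per success); a real analysis would fit these to data.

```python
import random
import statistics


def simulate_icers(n_draws=1000, seed=0):
    """Draw uncertain parameters repeatedly and return the ICER from each draw."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    icers = []
    for _ in range(n_draws):
        p_success = rng.betavariate(90, 10)             # mean 0.90
        incremental_cost = rng.gammavariate(100, 900)   # shape, scale; mean 90,000
        dalys_per_success = rng.uniform(7.0, 9.0)
        incremental_dalys = 10 * p_success * dalys_per_success  # 10 additional patients
        icers.append(incremental_cost / incremental_dalys)
    return icers


icers = simulate_icers()
cut_points = statistics.quantiles(icers, n=40)  # 39 cut points in steps of 2.5%
lower, upper = cut_points[0], cut_points[-1]    # 2.5th and 97.5th percentiles

print(f"Median ICER: ${statistics.median(icers):,.0f}")
print(f"95% interval: ${lower:,.0f} to ${upper:,.0f}")
```

Reporting an interval alongside the point estimate shows the reader how much of the conclusion rests on uncertain inputs.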

Scenario uncertainty allows for “what if?” questions to be asked. What if there were no complications? What if surgery was performed by task-shifted providers? What if DALYs were age-weighted but not discounted? These are less important but can be informative.

Conclusion

Cost-effectiveness analyses can be an important input to decision-making in global surgery, but methodologic discrepancies limit their utility. Early cost-effectiveness analyses in global surgery were essential in demonstrating that surgery was a cost-effective intervention in the context of low- and middle-income countries. To that end, they were extremely successful. Future studies should, however, increase the methodologic rigor in cost-effectiveness analyses, and this review provides a discussion of the pitfalls in the field, the theoretical basis for rigorous cost-effectiveness analyses, and recommendations to standardize future analyses. These recommendations are summarized in the checklist in Table 4.

Table 4 Checklist for CEAs in global surgery

The goals of this checklist are fourfold: to encourage CEAs in global surgery and strengthen their quality, to maintain honesty and transparency, to avoid misleading results, and to maximize reproducibility and comparability—all in the service of making decision-making easier for policymakers in the field. Other checklists for CEAs have been developed; this checklist attempts to reconcile the differences among these divergent guidelines in a way that is relevant to global health broadly and to global surgery more specifically. Guidelines-based analyses can accomplish the goals above, and we encourage the use of the checklist in Table 4 in the design of future global surgery CEAs.