Key Points for Decision Makers

There is wide variation in economic models of age-related macular degeneration.

Key differences that affect findings include duration, consideration of starting visual acuity, and how utilities are assigned.

Trends in measurement of utilities are the most likely challenges to continued relevance of existing scholarship.

1 Introduction

Age-related macular degeneration (AMD) is a condition characterized by deterioration of the retinal pigment epithelium (RPE), which can impair central vision. In early AMD, patients present with a “combination of small drusen, few intermediate drusen…, or mild RPE abnormalities” [1]. Intermediate AMD may be characterized by multiple intermediate drusen, a single large drusen, or by geographic atrophy, in which entire patches of the RPE lose function. Geographic atrophy is characterized by larger lesions but as there is not yet a diagnosis code for the condition, it is challenging to identify outside of clinical situations. In advanced AMD, the patient may have geographic atrophy involving the foveal center or wet (exudative) AMD in which there is abnormal blood vessel growth (i.e. neovascular disease); these blood vessels can burst, releasing fluid and scarring the macula. Eyes can convert from dry to wet AMD suddenly. While 90 % of AMD patients have the dry form of the disease, 90 % of the vision loss associated with AMD occurs in patients with exudative disease.

A series of studies has identified risk factors, including increased age, non-White ethnicity, increased exposure to ultraviolet light, and genetic factors [2, 3], while other studies have identified interventions that can slow progression or conversion, including supplementation with antioxidant vitamins and minerals among patients with early or intermediate AMD [4, 5]. The current standard of care for wet AMD varies based on the location of the lesion, but typically consists of intravitreal injections of anti-vascular endothelial growth factor (VEGF) agents, such as ranibizumab, bevacizumab or aflibercept, although in select instances photodynamic therapy and laser photocoagulation are utilized [1]. The goal of any therapy is to delay progression and vision loss. Maintaining vision for as long as possible has been associated with a higher quality of life, lower medical costs, and less need for caregiving and other indirect resource use [6–10].

Dozens of papers have been published exploring comparative cost effectiveness of interventions for AMD, as well as a number of reviews that explored pharmacoeconomic assessments of treatments for AMD [11–14]. Although there is typically some discussion about the choice of methods in each review, readers are often tempted to compare results at face value rather than to compare underlying assumptions. In recent years, there has been heightened interest in considering methods, instead of simply input parameters, in evaluating health economic models and the results of these models. Sensitivity analyses can explore all manner of scenarios to find extreme situations and identify thresholds at which comparators are equal in terms of cost effectiveness, but a model whose structure does not reflect a clinical scenario and disease progression also will not yield useful results no matter how many sensitivity analyses are conducted [15, 16]. Yet the use of models is undeniably important in comparing treatments when there is considerable uncertainty. Careful consideration of modeling techniques, approaches, and inputs can help ensure that comparisons are relevant and appropriate.

The objective of our review and critique is to summarize and compare the various assumptions made in model development and parameter population to offer guidance for future studies as well as to provide guidance for interpreting existing studies and comparing results across them.

2 Search Strategy

We initially conducted a search of MEDLINE-listed papers through PubMed in October 2014, with search terms designed to identify any model reporting on economic outcomes of treatments for AMD (specifically, the Medical Subject Headings ‘macular degeneration’ and ‘costs and cost analysis’ were used). No year limits were imposed, although both ‘human’ and ‘English’ language limits were used. Reports of cost effectiveness of screening were excluded; health technology assessments that were MEDLINE listed were included. A total of 39 publications were identified during the initial search. Of these, 11 were deemed ineligible for the review, either because they were reviews and thus did not include primary results, or did not compare AMD treatments. After review of reference lists, another 13 papers were identified, with eight reporting on models, and five later deemed ineligible as they presented reviews or summarized the state of knowledge, characterized issues in utility assessment alone, or did not conduct cost-effectiveness modeling. Thus, 36 publications on cost effectiveness of AMD treatments were considered for this review. Tables 1 and 2 summarize the basic characteristics and model methodology of each publication, respectively.

Table 1 Characterization of studies
Table 2 Model characteristics

3 Review Results

3.1 Characterization of Studies

3.1.1 Findings

3.1.1.1 Geography

Of the 36 studies identified, 15 were US-based studies, and 9 were UK-based. The other countries represented were Spain [17–19] (three studies), Germany [20, 21] and Canada [22, 23] (two studies each), and Australia [24], the Czech Republic [25], Greece [26], The Netherlands [27], and Switzerland [28] (one study each).

3.1.1.2 Treatments

Most of the studies (n = 24) evaluated VEGF inhibitors, either compared with others in the class, photodynamic therapy (PDT)/sham treatment, or best supportive care (BSC). Other modalities of treatment were represented: four studies compared PDT with placebo or BSC, three evaluated the impact of vitamins, and two assessed laser photocoagulation. Finally, one study each considered blue-light filtering intraocular lenses, choroidal translocation, and an implantable miniature telescope.

3.1.1.3 Date of Publication

The papers were not evenly distributed over the time period we evaluated. Nine papers were published from 2000 to 2005, and nine papers were published from 2011 to 2015, with the remaining 18 in the interim.

3.1.1.4 Endpoints Included

With the exception of a few studies [19, 25, 29], each study offered cost per quality-adjusted life-year (QALY) as one endpoint. A small number of studies also provided additional endpoints along with cost per QALY, such as cost per vision-year saved [30], cost per case of blindness averted or cost per blind-year averted [31], or cost per line of vision loss prevented [25].

3.1.1.5 Publication Type

Twenty-three papers were published in ophthalmology-focused journals, with four published in journals with an economic or policy focus, and seven published in journals with a broad clinical focus. Two of the analyses were health technology assessments and not published in journals.

3.1.2 Implications

The studies identified in the literature and selected for inclusion in this review seem to be representative of the clinical focus and interest in AMD during the past 15 years, with VEGF inhibitors the most relevant treatment to consider, and cost per QALY almost universally acknowledged as a meaningful metric for cost-effectiveness analyses, particularly given the global reach of these studies. The type of journal in which an economic model is published may be relevant in terms of the amount of space editors may be willing to devote to explaining the model or the balance of methods versus implications in the paper. The fact that so many papers were focused in ophthalmology journals but still relatively well-reported suggests that the ophthalmic community is interested in these types of studies, although one might still argue that the minutiae of modeling might be considered more worthy of publication in an economics journal. This may account for some of the unreported model specifications discussed later in this review.

3.2 Clinical Assumptions

3.2.1 Findings

3.2.1.1 Starting Patient Age

Typically, models adhered closely to the clinical status of patients in the trials that supplied the model data. For example, the studies that used data from the Treatment of Age-Related Macular Degeneration with Photodynamic Therapy (TAP) Study or the Minimally Classic/Occult Trial of the Anti-VEGF Antibody Ranibizumab in the Treatment of Neovascular Age-Related Macular Degeneration (MARINA) Study generally assumed an age of 75 years at the start of the model, while those using data from the VEGF Inhibition Study in Ocular Neovascularization (VISION) tended to have younger starting ages. Studies exploring the use of supplements also selected younger starting ages for the analyses.

3.2.1.2 Type of Lesion

There was wide variation in the type of lesion, mostly also based on the definition in key clinical trials. Selected models presented results separately for minimally classic, occult, and predominantly classic lesions (e.g. Athanasakis et al. [26], Brown et al. [32], Gower et al. [30], and Neubauer et al. [20]).

3.2.1.3 One Eye versus Both Eyes

Most studies looked at the treated eye only, with a few considering that treatment patterns might be affected by the development of disease in the fellow eye. Only a small number of studies considered the possibility of treatment in both eyes [32–35].

3.2.1.4 Starting Visual Acuity

Most of the studies identified assumed a distribution of starting visual acuity (VA), based on the trials from which data were primarily derived or an otherwise representative population, and provided a single estimate of cost effectiveness. A small subset of studies [24, 36–39] included patients with two or more levels of starting VA, and presented results for each group. Findings highlighted the value of PDT for patients with better VA, which generally met acceptable levels of cost effectiveness, compared with PDT for patients with poor VA at baseline.

3.2.2 Implications

Across these studies, the differences in starting age were small, with only a few studies including patients less than 65 years of age; reasonably, those considering vitamin supplementation used younger populations. Lesion type is of some interest as there were slight differences in findings across lesion types within studies, but in each study the general findings of cost effectiveness and the magnitude of the finding would have been the same. For example, the study by Neubauer et al. comparing ranibizumab with PDT consistently demonstrated ranibizumab to be cost effective, with the cost per QALY estimates within 15 % of each other for predominantly classic, minimally classic, and occult lesions [20]. Similarly, across lesion types, Gower and colleagues found that PDT had consistently lower costs and higher utilities than pegaptanib, regardless of starting VA [30]; providing these lesion-specific analyses does not, in the end, provide different guidance based on lesion type.

The importance of projecting outcomes based on different starting levels of VA may be greater than considering lesion size. While decision makers may want to minimize the number of different clinical scenarios for which policies should be developed, the studies that vary baseline VA and their findings suggest that there are groups of patients for whom AMD treatments are more cost effective than for other patients. While few studies demonstrate this evidence, we suggest that studies clearly articulate starting VA and strongly consider conducting separate analyses by VA if there are data available for necessary inputs.

The conceptual argument for considering both eyes is valid but challenging to implement. First, it affects utility values themselves. A recent study found that changing visual impairment in the worse eye, even with the better eye remaining unchanged, affected patient-reported utilities [40]. Second, consideration of bilateral versus unilateral disease should be useful in identifying real-world cost patterns. Unfortunately, laterality is frequently omitted from administrative claims databases, as is visual impairment coding; therefore, any large-scale assignment of costs based on non-trial data is hampered.

3.3 Modeling Assumptions

3.3.1 Findings

3.3.1.1 Model Type

Most of the models (n = 24) used a Markov approach, either for the entire modeling exercise or in part. For example, Neubauer et al. [21] used a decision tree for the first year but a Markov approach for the remainder of follow-up. In a few cases, the approach was not clearly stated.

3.3.1.2 Cycle Length

The majority of models for which the cycle length was specified used a 3-month cycle length (n = 10) or cycles of variable length depending on the treatment, but including a 3-month cycle length for at least the initial part of the model, with annual cycles for the remainder (n = 2). Of the remaining models, nine reported cycles of 1 year in duration, two used 6-week cycles, and 13 did not clearly report cycle length or it was not relevant given the model structure.
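To illustrate why cycle length matters when comparing models, the following minimal sketch rescales a transition probability from one cycle length to another. It assumes a constant underlying event rate within the period, which is a common simplification and not a method attributed to any of the reviewed papers; the function name and the example probability are hypothetical.

```python
def rescale_probability(p, from_years, to_years):
    """Convert a probability over `from_years` to one over `to_years`,
    assuming a constant event rate (an illustrative assumption)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    return 1.0 - (1.0 - p) ** (to_years / from_years)

# Hypothetical annual probability of vision loss, converted to a 3-month cycle.
annual = 0.20
quarterly = rescale_probability(annual, from_years=1.0, to_years=0.25)
print(f"annual={annual:.3f}  3-month={quarterly:.4f}")
```

Dividing the annual probability by four is not equivalent to this rescaling, and the conversion applies cleanly only to a single transition rather than to a full multi-state transition matrix.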

3.3.1.3 Discount Rate

For 30 of the 36 studies, a discount rate was clearly reported; in 26 studies the discount rate was the same for costs and outcomes. Among those for which it was not reported [25, 28, 41–44], in some cases this was appropriate as models extended only 1 year [25]. Most of the studies used the same discount rate for costs and outcomes (3 %, n = 17 [17, 20, 22, 23, 29–32, 34, 35, 37, 38, 45–49]; 3.5 %, n = 6 [18, 26, 33, 50–52]; 2.5 %, n = 1 [19]; 5.0 %, n = 1 [21]; 6.0 %, n = 1 [24]), although four studies [27, 36, 39, 53] either used different discount rates for costs and outcomes or reported one but did not indicate whether the other was discounted and/or at what rate.
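The effect of the discount rates reported above can be sketched as follows. The function name and the yearly QALY stream are hypothetical, and the sketch assumes annual discounting with the first year left undiscounted, which is only one of several conventions in use.

```python
def present_value(values, rate):
    """Discount a stream of yearly values; the first entry (year 0) is undiscounted."""
    return sum(v / (1.0 + rate) ** t for t, v in enumerate(values))

# Hypothetical stream: 0.7 QALYs per year for 10 years (not from any reviewed study).
qalys_per_year = [0.7] * 10
for rate in (0.0, 0.025, 0.03, 0.035, 0.05, 0.06):
    total = present_value(qalys_per_year, rate)
    print(f"rate={rate:.3f}  discounted QALYs={total:.3f}")
```

Because the same stream yields a different total at each rate, two models with identical clinical inputs can still report different costs per QALY when their discount rates differ.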

3.3.1.4 Perspective

The careful review of the health economic studies included in this work revealed that in some cases what the authors have specified as the perspective of their study is not consistent with best practice. Here we report the perspective as described by authors. One study presented both a societal and third-party payer perspective [31]. Among the 35 remaining analyses, 19 presented a third-party payer perspective [17, 21, 24, 26, 29, 30, 32–38, 41, 44–46, 48, 54], seven provided a government (non-US) perspective [20, 22, 23, 42, 43, 51, 52], five stated they presented a societal perspective [18, 27, 28, 39, 49], and four did not clearly report what perspective was being presented [19, 25, 47, 50].

3.3.1.5 Time Horizon

Models ranged in duration from 1 year [44] to up to 50 years [47]. Timelines were, in many cases, fixed, but in others population mortality was built into the model and a ‘lifetime’ horizon was modeled. Most models included 1–2 years of treatment with subsequent follow-up. Several models presented results at multiple time periods, most often 2 and 5 years. Excluding the studies with a lifetime horizon but without a clear mean survival, both the mean and modal duration for the models reviewed were 10 years.

3.3.2 Implications

It was somewhat remarkable that some papers did not seem to provide specific information on the models they developed. We suggest that details such as cycle length and model duration can never reasonably be missing. Similarly, the lack of mention of a discount rate for any model exceeding 1 year in duration is problematic. While there was variation in discount rates, we understand that the choice may be driven by the regulatory or reimbursement agency that may be part of a study’s audience. Still, more than half of the papers used 3 or 3.5 %, suggesting convergence, in accordance with guidelines for health economic analysis [55, 56].

The choice of perspective to use is often driven by the likely audience for the analysis, yet it was at times difficult to ascertain exactly which types of costs were included in each paper. Use of the ‘third party payer’ perspective does not imply comparability. Studies that reported presenting a third-party payer perspective still provided a wide range of costs, from the very minimal direct costs associated with AMD to a much broader set of all costs covered by an insurer related to ophthalmic diagnostic services and care, as well as direct costs associated with progression of disease and visual impairment. It is likely that the short-term costs can be measured accurately and that differences are largely identified in the models, but the assumption that longer-term costs are similar across treatment groups could be problematic. Most models have included sensitivity analysis that at least indirectly addresses the possibility of important cost differences.

As a simple comparison of the types of costs included in these models, Athanasakis and colleagues interviewed a panel of experts to identify resources associated with ophthalmic care and blindness, including a wide array of tests, inpatient care, and low-vision care [26], while costs included in Stein and colleagues’ models include a limited set of ophthalmic tests, visits and interventions, as well as short- and long-term costs associated with endophthalmitis, blindness, venous thrombotic events, or cerebrovascular accidents [48, 49]. Neither approach purports to capture all costs associated with AMD and its treatment, yet both acknowledge the array of costs beyond the treatment and routine follow-up in a different fashion. A subset of studies [39, 52] provide extensive details on the blindness-related costs included, and specify hip fractures and depression among the health-related resources. Costs associated with low-vision aids may be driven by an expectation about what services or aids are believed to be important by the study authors rather than being driven by patient experience. For example, even surveys that included patient and provider input to design a list of low-vision services were subject to write-ins, with some costly items identified [8, 9]. It seems that with the exception of papers with the same authorship team, no studies identified in this review included the same set of costs.

The balance of presenting a model of appropriate length and the complexity that comes with populating later years with costs and outcome data is clearly answered differently across the studies reviewed. Brown and colleagues make an important observation [34]; in describing results of their 12-year model, they found that less than 10 % of the incremental gain in QALYs was gained in the first 2 years of the follow-up period, while 75 % was gained in the last 8 years. Bansback and colleagues compared cost effectiveness at multiple follow-up points and found that verteporfin did not meet typical cost-effectiveness thresholds until approximately 3 years into the observation period [50]. Thus, direct comparisons of findings across studies with vastly different modeling duration may lead to inappropriate inferences. Further complicating the question of identifying an appropriate follow-up period is the relative lack of long-term results, particularly for neovascular disease. Extending models may capture deterioration that is seen over the long term [57] at the expense of introducing other types of uncertainties. Although there is an opportunity for misuse, we suggest that providing results at an interim time point can be useful, both for the sake of comparison and to help stakeholders with different levels of interest identify at what point in treatment benefits will likely appear.
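The dependence of cost effectiveness on model duration can be illustrated with a minimal sketch. The incremental cost and QALY streams below are invented for illustration (costs concentrated in the treatment years, benefits accruing later) and are undiscounted; they are not taken from Brown et al., Bansback et al., or any other reviewed study, and the function name is hypothetical.

```python
def icer_by_horizon(delta_costs, delta_qalys, horizons):
    """Cumulative incremental cost per QALY gained at each horizon (in years)."""
    results = {}
    for h in horizons:
        cost = sum(delta_costs[:h])
        qaly = sum(delta_qalys[:h])
        # No QALY gain yet means the ratio is undefined; report infinity.
        results[h] = cost / qaly if qaly > 0 else float("inf")
    return results

# Hypothetical yearly increments versus the comparator over a 12-year model.
delta_costs = [10000.0, 8000.0] + [500.0] * 10
delta_qalys = [0.02, 0.04] + [0.08] * 10

for horizon, icer in icer_by_horizon(delta_costs, delta_qalys, [2, 5, 12]).items():
    print(f"{horizon:>2}-year horizon: cost per QALY = {icer:,.0f}")
```

With streams shaped like these, the same intervention looks far less favorable at a short horizon than at a long one, which is why results reported at different durations should not be compared at face value.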

3.4 Utilities

3.4.1 Findings

3.4.1.1 Basis

In the vast majority of the studies, utilities were based on best-corrected VA (BCVA). In two studies [33, 50], utilities based on contrast sensitivity (CS) were used. The study by Butt and colleagues [33] considered CS only in the model for binocular disease, while BCVA was used for the unilateral disease model. Comparing outcomes from the CS and VA models, the incremental savings was greater with the CS model, and the incremental QALYs were higher; therefore, while both approaches demonstrated the cost effectiveness of bevacizumab, the findings were stronger for the CS model [33]. Interestingly, both papers refer to work by Espallargues et al. for utility values [58], although Bansback et al. used values derived from the Health Utilities Index (HUI) [50], and Butt et al. used values from the Short Form-6 dimension (SF-6D) [33].

3.4.1.2 Source

Utility values used for most of the studies identified were derived from a time trade-off (TTO) analysis conducted by Brown [59] and used in a series of papers by that research team. However, there were some exceptions. For example, two studies [27, 50] applied utilities derived from the HUI3, and Butt and colleagues used utilities derived from the SF-6D [33].

3.4.1.3 Health States

The typical number of health states in these analyses was six, five of which were various states of BCVA and the sixth was death, although some studies had as few as three or as many as 12 BCVA states. Among studies using the same number of BCVA-driven health states, there was no consensus on the thresholds. A substantial minority of studies looked at lines or letters of vision lost from baseline [22, 24, 25, 31, 36, 38, 43].
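A health-state structure of this kind can be sketched as a simple Markov cohort calculation. The example uses three BCVA states plus death (the smallest structure mentioned above) to stay short; the state thresholds, transition probabilities, and utilities are hypothetical placeholders, not values from any reviewed study.

```python
# All numbers are hypothetical placeholders chosen only to make the sketch run.
STATES = ["VA 20/40 or better", "VA 20/50 to 20/100", "VA 20/200 or worse", "dead"]
UTILITY = [0.80, 0.65, 0.50, 0.0]  # assumed utility per year spent in each state

# Row i holds the per-cycle probabilities of moving from state i to each state.
P = [
    [0.90, 0.07, 0.02, 0.01],
    [0.00, 0.88, 0.11, 0.01],
    [0.00, 0.00, 0.99, 0.01],
    [0.00, 0.00, 0.00, 1.00],
]

def run_cohort(start, matrix, utility, n_cycles, cycle_years=0.25):
    """Return undiscounted QALYs per person and the final state distribution."""
    dist = list(start)
    qalys = 0.0
    for _ in range(n_cycles):
        # Credit the time spent in each state at the start of this cycle.
        qalys += sum(d * u for d, u in zip(dist, utility)) * cycle_years
        # Advance one cycle: new[j] = sum over i of dist[i] * matrix[i][j].
        dist = [sum(dist[i] * matrix[i][j] for i in range(len(dist)))
                for j in range(len(dist))]
    return qalys, dist

# Everyone starts in the best state; 40 three-month cycles cover 10 years.
qalys, final = run_cohort([1.0, 0.0, 0.0, 0.0], P, UTILITY, n_cycles=40)
print(f"QALYs per person over 10 years: {qalys:.2f}")
for name, share in zip(STATES, final):
    print(f"{name}: {share:.3f}")
```

The sketch omits discounting and half-cycle correction. Changing the number of states or the VA thresholds changes both the transition matrix and the utility vector, which is one reason models with different health-state definitions are hard to compare directly.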

3.4.1.4 Influence of Adverse Events on Utilities

Adverse events associated with treatments were not consistently included in the models, nor were utilities consistently influenced by blindness. For example, four studies [32, 34, 36, 48] included QALY losses associated with adverse events and blindness. Others may have done so but did not detail how adverse events were incorporated into the models.

3.4.2 Implications

The convergence on a single series of studies for utility values is unusual. While that introduces a tempting level of consistency to much of the body of literature in the space, there remain questions. For example, the same team compared utilities obtained using a TTO approach versus a standard reference gamble (SRG) approach [60]. The TTO approach yielded lower utility ratings across the VA categories, raising the question of whether using the SRG-derived higher values would constrain variance and minimize the differences in QALYs estimated. Another study found similar results, with TTO and SRG approaches yielding lower utilities than the EQ-5D [61].

In recent years, concern has been raised about whether VA in the better eye is the appropriate proxy for utility, as well as how well VA relates to patient-reported outcomes in visual-specific and generic instruments for dry and wet AMD [40, 58, 62, 63]. It is unclear how relevant the issue of the better- or worse-seeing eye is, given that AMD eventually presents in both eyes for most patients. Furthermore, the blanket use of BCVA may discount the type of vision loss. For example, the quality-of-life effects of vision loss that is peripheral versus central may be different [64], and thus the application of utilities derived from patients with VA of a specified level may not be applicable to patients with the same measured VA but with a different type of vision loss. Utilities based on BCVA associated with diabetic retinopathy and AMD have been shown to be similar [65], suggesting that perhaps the relationship between VA and utilities may be consistent across conditions.

4 Discussion

Expecting that economic models examining therapies in the same therapeutic area are almost identical is unreasonable due to differences in the target audience, scholarship in the field, and honest theoretical disagreements. A single-party payer system is likely to be more interested in the full costs of blindness and in following patients for a longer period of time, for example, while in a multiple-payer system in which patients may switch insurers, a shorter time frame may be justifiable. Thus, the use of a reference case [66] is always welcomed. However, there are a number of assumptions and decisions that should be consistent, or at least similar, across models because the models must still reflect the clinical disease patterns. While cost per QALY is almost universally recognized as an appropriate metric for effectiveness, there are known to be differences based on the instrument used. Within-study comparisons are not problematic but comparisons across studies may be less accurate if different utility collection tools are used.

An important consideration in evaluating utilities in AMD is that the decade-long assumption that VA provides the appropriate linkage to utilities may no longer be valid. Most of the papers described here use utility values based on studies conducted by a few collaborating research teams using a TTO approach with Snellen eye charts [67, 68]; however, the initial research was not specific to macular degeneration and the unique visual impairment that it presents. Whether the loss of central vision but maintenance of peripheral vision would be recognized as different from visual loss in general is unclear. Furthermore, VA may not be consistent when measured using various methods. For example, there is some evidence that Snellen eye charts may underestimate VA compared with charts designed for the Early Treatment Diabetic Retinopathy Study (ETDRS) for patients with AMD [69]. This leaves us with the concern that most of the existing studies with similar observation periods may be comparable since utilities were generally derived from the same studies; however, it does not ensure that this method for utility elicitation is infallible nor that the patient population provided representative utility values.

A second concern regarding the development of health states in AMD is recent research suggesting that CS, not VA, is a more meaningful metric in AMD [36, 70]. Newer scholarship that considers CS as more appropriate than BCVA (e.g. Patel et al. [53]) for assigning utilities should continue to be monitored. As testing of CS has now become a simple proposition, the use of it as a driver of health utilities will likely increase, and perhaps a crosswalk, even if it is imprecise, can be used to help bridge across studies. Even so, clinical practice has not yet embraced routine evaluation of CS; therefore, its recent entry into cost-effectiveness modeling may be many years ahead of its widespread acceptance.

A third issue that was identified, although not well articulated, is the importance of the duration of follow-up. While few studies presented outcomes at multiple timepoints [34, 50], their findings suggest that a more complete understanding of when costs occur and benefits accrue would be useful to facilitate comparisons when follow-up does not match up across studies. We suggest that while the concerns about how utilities were elicited and the potential role of CS are essential to moving forward, being cognizant of the observation period is the one concern that is most easily considered with existing scholarship. Duration should be considered along with the perspective. For example, any model that purports to present a societal perspective over a short-term observation period (perhaps less than 5 years) would underestimate the costs associated with preventing blindness over the lifetime for a typical societal perspective payer (i.e. a government-sponsored national health care system). Conversely, a model that presents only AMD-related medical costs over a lifetime presents another potential disconnect.

The important question here is whether there are outliers within the existing body of scholarship that cannot be integrated with the majority of studies because of important deviations from acceptable standards or from other literature. At this point, we suggest that existing studies can be compared as long as key elements (source of utilities, duration, and type of costs included) are conceptually aligned, if not identical. What will remain important is to stay informed about advances in the science of AMD and measurement of utilities among patients with the condition. While adjustments to reference case standards or model specifications will likely not create incompatibility with existing scholarship, breakthroughs in the understanding of the impact of AMD on patient-reported utilities could render legacy studies difficult to interpret.