Key Points for Decision Makers

Despite a growing literature on the economic evaluation of digital mental health interventions (DMHIs), the existing studies may not satisfy the requirements for decision-making.

Further research is required to understand the stickiness of the treatment effect of digital interventions, and to synthesise multiple sources of evidence on key parameters, such as the treatment effect, in a way that takes into account the complexity of these interventions.

Understanding the appropriateness of the methodologies used to evaluate whether DMHIs offer value for money helps to build consensus and harmonise methods for these complex interventions.

1 Introduction

In the UK, NHS investment in digital technologies is growing rapidly [1]. According to the National Information Board's strategy for Personalised Health and Care 2020 [2], a digital health service will use technology to improve patient choice, access to services and clinical outcomes, and to support better self-care through online services, rather than requiring a visit to a health professional. In the treatment of patients with mental health conditions, this offers the potential to relieve a system that is currently overstretched, with long waiting times for face-to-face contacts and poor provision of services in some areas. Digital interventions (DIs) for mental health conditions are wide ranging, from computerised cognitive behavioural therapy (cCBT), with or without added support from clinicians, to exposure therapy for the treatment of phobias and text-based motivational support for smoking cessation.

However, the digitalisation of mental healthcare can only be regarded as a positive change if the delivery of interventions through remote/digital mechanisms offers value for money. Commissioners are particularly concerned with delivering a quality service at lower cost. Cost-effectiveness analyses (CEAs) can provide evidence to support or refute the assumption that digital mental health interventions (DMHIs) are good value for money, by comparing the costs and outcomes of DMHIs relative to those of relevant alternatives. DMHIs are, however, complex interventions, which may require complex methodology to evaluate cost-effectiveness appropriately [3]. Despite a growing literature on the economic evaluation of DMHIs, the appropriateness of the methodology used to determine cost-effectiveness has not been considered. In addition, reviews conducted to date have tended to focus on a specific range of conditions and interventions, and to comment only on cost-effectiveness results rather than on the consistency and appropriateness of the methodology used. Indeed, the majority of reviews fail to draw conclusions on cost-effectiveness because of heterogeneity in the interventions, the conditions they target, and the methods used to evaluate them. Furthermore, different reviews even interpret the same studies differently.

Without a consensus on the appropriate methods for evaluation of DMHIs, evidence will continue to be generated that is difficult to compare with the existing evidence base. For a decision-maker concerned with which DMHIs offer value for money and achieve good patient outcomes, it is difficult to interpret the body of evidence that currently exists. Existing reviews have not considered how CEAs for DMHIs meet the requirements for decision-making. Specifically, does the economic analysis estimate costs and effects, based on all the available evidence, for the full range of possible alternative interventions and clinical strategies and over an appropriate time horizon, for the mental health conditions of interest [4]?

In pursuit of a consensus framework or agreed methodology for evaluating DMHIs, this paper reviews the literature on the economic evaluation of DMHIs and identifies common themes, focussing on the methods used and their appropriateness for decision-making. Finally, the paper considers where methods have yet to be developed or utilised, and where further research is required in the context of DMHIs.

2 Methods

2.1 Scope

This systematic review aimed to identify economic evaluations of DMHIs. DIs were defined as software-based systems and technology platforms designed for patient-facing delivery of a mental health intervention (i.e. an intervention to improve mental health outcomes). Studies could include participants with symptoms of, or at risk of, mental health problems. Mental health problems were defined by the ICD-11 criteria for mental, behavioural or neurodevelopmental disorders, with the exception of the conditions listed under the categories of neurodevelopmental, neurocognitive and disruptive behaviour or dissocial disorders. All study designs with the potential to inform decision-making were included in the searches, including modelling studies, clinical trials and observational studies.

The review was conducted in line with PRISMA guidelines, excluding the Risk of Bias (RoB) assessment. This is because our aim was to review the appropriateness of methods used in the economic evaluation of DMHIs, rather than to synthesise their results, and so broad RoB measures are unlikely to provide a useful contribution. Instead, broader principles were used to critique the studies (see Sect. 2.6 for details).

2.2 Searches

In December 2018, the following databases were searched to identify published and unpublished studies in any language: MEDLINE, PsycINFO, the Cochrane Central Register of Controlled Trials (CENTRAL), the Cochrane Database of Systematic Reviews (CDSR), the Cumulative Index to Nursing and Allied Health (CINAHL Plus), the Database of Abstracts of Reviews of Effects (DARE), EMBASE, the Web of Science Core Collection, the NHS Economic Evaluation Database (NHS EED), the Health Technology Assessment database, the NIHR Journals Library and the Database of Promoting Health Effectiveness Reviews (DoPHER). The full MEDLINE search strategy is presented in Online Supplementary File 1.

We also searched two clinical trial registries for ongoing studies (ClinicalTrials.gov and the WHO International Clinical Trials Registry Platform portal), searched the NIHR portfolio, and conducted web searches using Google and Google Scholar with simplified search terms. After the searches were completed, we checked the lists of included studies of relevant systematic reviews identified by the searches and the reference lists of all included studies, and conducted forward citation chasing on all identified protocols, conference abstracts and included studies using Google Scholar. We also contacted researchers in the field.

The searches were conducted from 1997 onwards, as no relevant studies of DMHIs could have been published before this date. Searches were restricted to publications written in English, as we anticipated that most economic studies written in other languages would also have a version published in English (e.g. the South Asia Cochrane Group [5]).

2.3 Selection Criteria

Eligible studies included participants with symptoms of, or at risk of, mental health problems. Studies were excluded where the primary diagnosis of the participants was a physical condition or another condition other than those listed (e.g. cancer or insomnia). All digital interventions that expressly targeted mental health and were patient- rather than service-oriented were included, with the exception of those that were simply a communication medium (e.g. phones or videoconferencing). A broad range of studies was considered in the review, including economic evaluations conducted alongside trials, modelling studies and analyses of administrative databases. Only full economic evaluations that compared two or more options and considered both costs and consequences (i.e. cost-minimisation, cost-effectiveness, cost-utility, cost-consequence and cost–benefit analyses) were included in the review. No studies were excluded on the basis of their comparator group. Study protocols, abstracts and reviews were flagged for follow-up as described in the searches section (Sect. 2.2).

2.4 Study Selection

Studies were screened by a team of four researchers (DM, DJ, PS, HM). Each title and abstract was independently assessed for eligibility by two reviewers, one of whom had expertise in health economics. If either reviewer indicated a study could be relevant, the full text for that study was sought and again assessed independently by two reviewers, with disagreements resolved through discussion or via a third reviewer (LG or LB).

2.5 Data Extraction

The purpose of data extraction was to summarise methodology and identify challenges common in the evaluation of DMHIs. To do so, the following information was extracted:

  • the population;

  • the intervention – underpinning principles, delivery mode, level of support, treatment duration;

  • comparators;

  • outcomes – clinical and economic outcomes, and the economic endpoint;

  • study design, analytical approach (within-trial analysis, decision model, statistical model, epidemiological study), and analysis time horizon;

  • setting (country and analytical perspective);

  • analytical framework – cost-minimisation (CM), cost-effectiveness analysis (CEA), cost-utility analysis (CUA), cost-consequence analysis (CCA) or cost–benefit analysis (CBA); and

  • methods employed to characterise uncertainty.

Three reviewers (DJ, PS and LB) independently extracted details from full-text studies. Another reviewer (DM) checked extracted data, and disagreements were resolved by discussion.

2.6 Critique

The identified studies were critically reviewed to assess whether the existing evidence meets the requirements for decision-making in healthcare [4] and to identify challenges in generating cost-effectiveness evidence in this context. In particular, the methods used in applied studies were considered in terms of:

  • Does the economic analysis estimate both costs and effects (i.e. it is not restricted to a cost-minimisation analysis)?

  • Does the analysis appropriately synthesise all the available evidence?

  • Is the full range of possible alternative interventions and clinical strategies included?

  • Are costs and outcomes considered over an appropriate time horizon?

3 Results

3.1 Summary of Available Economic Evaluations of Digital Mental Health Interventions (DMHIs)

The review identified 63 studies reported in 67 papers, as shown in Fig. 1. Of the 10,764 records identified, 6931 remained after duplicates were removed and were screened; of these, 6646 were excluded on title and abstract, and 285 were assessed for eligibility at full text. A total of 217 papers were excluded because the primary diagnosis was not a mental health problem (n = 27), the intervention was not a mental health intervention (n = 13), the study did not include health economic outcomes (n = 35), it was not an economic evaluation (n = 7), it was a review rather than a primary study (n = 2), it was a conference abstract rather than a peer-reviewed paper (n = 26), it was a duplicate reference (n = 4), or it was a study protocol rather than a peer-reviewed paper (n = 103).

Fig. 1 Number of studies identified and excluded at each stage of screening

Two papers reported the results of identical analyses [6, 7]; all other papers reported unique results (such as a different analytical perspective or length of follow-up), even when multiple papers were based on the same study. The paper by Duarte et al. [6] was therefore excluded from the analysis to avoid duplicate reporting, leaving a final total of 66 papers.

The summary tables and details of individual studies are presented in Online Supplementary Files 2 and 3, respectively, while detailed analysis of evidence on the cost-effectiveness of DMHIs is due to be reported in future publications.

The majority of the studies (44/66; 67%) evaluated interventions targeting anxiety and/or depression. Other conditions included suicidal ideation (n = 1), child disruptive behaviour (n = 1), eating disorders (n = 3), schizophrenia (n = 3), addiction to drugs and alcohol (n = 3 and n = 2, respectively) and smoking cessation (n = 8); see Online Supplementary File 2, Table S1.

All identified interventions were condition-specific, except six that targeted both anxiety and depression. The interventions varied both between and within individual conditions in terms of their underlying principles (e.g. CBT, guided relaxation, self-help, exposure therapy, motivational support for smoking cessation), content (e.g. the number of modules), mode of delivery (e.g. mobile, computer or text based, or completed at home or at a clinic), type of support (online chat, phone, face-to-face), frequency of support (e.g. weekly, ad hoc), person delivering support (e.g. clinician, assistant, lay person) and extent of support (administration support only, additional counselling); see Online Supplementary File 2, Table S3.

Comparators can be broadly categorised as no treatment or waitlist, treatment as usual (TAU), active controls and face-to-face (F2F) psychological therapy. Waitlist comparators may not restrict access to other interventions, and so the difference between waitlist and TAU is not always clear. TAU includes psychological interventions (such as F2F therapy) and pharmacotherapy, although authors often reported a lack of clarity over what patients receive in practice. Active comparators were typically attention controls, i.e. interventions designed as 'psychological placebos': they encourage patients to spend the same amount of time on treatment as active interventions, but without the 'active' component, such as CBT or problem-solving therapy. Attention control comparators included websites, printed reading and online relaxation, among others.

Health-outcome measures included health-related quality of life (HRQoL), life-years (LY), disease-free days, disease-specific clinical outcomes, and a range of abstinence measures in addiction and smoking cessation (1-week abstinence, smoke-free years, substance-free urine samples, etc.).

Economic endpoints included the cost per additional quality-adjusted life-year (QALY) (n = 43), disability-adjusted life-year (DALY) averted (n = 3), life-year (n = 1), disease-free day (n = 2), point change in clinical score (n = 12), responder or clinical improvement (n = 16), cognitive improvement (n = 2), inpatient day avoided (n = 1) or additional day of abstinence in interventions targeting addiction (n = 10). Results were presented in terms of incremental cost-effectiveness ratios (ICERs) (n = 40), the overall change in costs and effects (i.e. whether the intervention was dominant or dominated) (n = 2), net benefit (n = 3) and the probability that an intervention is cost-effective at a specific cost-effectiveness threshold (n = 4) or at a range of thresholds (n = 33). In studies evaluating interventions targeting schizophrenia, child disruptive disorder and suicidal ideation, the outcome measures tended to be disease-specific, and the costs and outcomes were compared separately.
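To make these endpoints concrete, the following minimal sketch shows how an ICER and net monetary benefit (NMB) are computed from incremental costs and effects, and how the NMB is interpreted against a threshold; all figures are hypothetical and are not drawn from any included study.

```python
# Illustrative calculation of an ICER and net monetary benefit (NMB).
# All figures are hypothetical, not taken from any included study.

delta_cost = 300.0   # incremental cost of the DMHI vs. its comparator (GBP)
delta_qaly = 0.02    # incremental QALYs gained

icer = delta_cost / delta_qaly  # incremental cost per QALY gained
print(f"ICER: {icer:,.0f} GBP per QALY")  # 15,000 GBP per QALY

# NMB re-expresses the same comparison at a given threshold (lambda):
# NMB = lambda * delta_QALY - delta_cost; a positive NMB favours the DMHI.
for threshold in (20_000, 30_000):  # NICE's conventional threshold range
    nmb = threshold * delta_qaly - delta_cost
    print(f"NMB at {threshold:,} GBP/QALY: {nmb:,.0f} GBP")
```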

Methods for costing interventions varied. The majority of the evaluations (50/66) included the cost of staff time required to deliver the intervention, while a further nine did not report clearly which costs were included. Some studies included variable equipment costs and website maintenance and hosting (23 and 14 studies, respectively), while very few considered the cost of development, capital costs, or patient recruitment (or technology dissemination).

The vast majority of evaluations (52/66) were within-trial analyses, of which one included a temporal extrapolation. A further three studies were evaluations within pilot [8, 9] and feasibility trials [10], while one study [11] used observational data. Nine studies used decision models to evaluate interventions; a summary of these models is provided in Table 1. Eight of the nine studies that used modelling to evaluate DMHIs did so because of the absence of head-to-head trial data, and used individual trials or non-comparative data sources to inform the treatment effect of digital interventions. The model for eating disorders used synthesised evidence of the treatment effect to derive the cost of different treatment options.

Table 1 Summary of decision models used in economic evaluation of digital mental health interventions

The vast majority of papers (55/66) reported some form of sensitivity analysis. In total, 42 studies used both probabilistic and deterministic sensitivity analysis (PSA and DSA). DSA involved exploring alternative scenarios and assumptions; the parameter most commonly varied was the cost of the intervention. Other common scenarios included alternative methods for dealing with missing data and alternative methods for estimating costs and effects.
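To illustrate how PSA output is summarised as the probability that an intervention is cost-effective (reported at one or more thresholds by many of the included studies; see Sect. 3.1), the sketch below computes that probability from simulated incremental costs and effects. The sampling distributions are assumed purely for illustration and do not correspond to any included study.

```python
# Sketch of summarising PSA output as the probability an intervention is
# cost-effective at a given threshold (the basis of a cost-effectiveness
# acceptability curve). Sampling distributions are purely illustrative; in
# practice they arise from resampling the model's input parameters.
import numpy as np

rng = np.random.default_rng(seed=1)
n_iterations = 10_000

delta_cost = rng.normal(300.0, 150.0, n_iterations)  # incremental costs (GBP)
delta_qaly = rng.normal(0.02, 0.015, n_iterations)   # incremental QALYs

for threshold in (20_000, 30_000):
    nmb = threshold * delta_qaly - delta_cost  # per-iteration net monetary benefit
    prob_ce = (nmb > 0).mean()                 # share of iterations favouring the DMHI
    print(f"P(cost-effective) at {threshold:,} GBP/QALY: {prob_ce:.2f}")
```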

3.2 Key Challenges and Limitations in Economic Evaluation of DMHIs

The critical review of the studies identified a range of challenges arising from the complexity of DMHIs and the heterogeneity of the evidence; we describe each in turn below.

3.2.1 Estimation of Costs and Outcomes

Section 3.1 highlighted that applied studies use a variety of methods to measure the costs and outcomes of DMHIs. In the studies reviewed, benefits were measured in terms of QALYs, DALYs, life-years, disease-free days, disease-specific outcome measures, response or clinical improvement, inpatient days avoided, or days of abstinence in interventions targeting addiction. Costs attributed to DMHIs included the cost of staff time required to deliver the intervention, a range of equipment costs, website maintenance and hosting, the cost of development, capital costs and patient recruitment (or technology dissemination).

The optimal method for measuring outcomes ultimately depends on the analysis perspective. An employer may be interested in measuring the effect of the intervention on productivity, a mental healthcare provider may include a narrow range of benefits specific to the mental health condition targeted by the intervention and costs that fall on that provider, while a health system may aim to improve overall health, and so requires a broader health measure such as HRQoL to allow comparison across different fields of medicine.

While the majority of studies identified in this review reported a range of different costs and outcomes, seven studies evaluated interventions from a healthcare system or payer perspective but measured outcomes in terms of changes in clinical scores [15, 22–24], clinical improvement/response/remission [25, 26] or disease-free days [27]. It is not clear how health gains expressed in such disease-specific outcome measures can be used by decision makers to allocate resources across different disease areas.

Similarly, the appropriate methods for measuring costs depend on the analysis perspective (e.g. whether to include the cost to employer, service provider, broader health system, or society), as well as the role of the intervention. When interventions target undiagnosed patients who would not have sought care otherwise, dissemination (e.g. advertising or public health campaigns) is an integral part of the intervention that is likely to affect its outcomes, and so the cost of dissemination should be included in the cost of the intervention. Conversely, when an intervention targets diagnosed patients, and is prescribed by clinicians, the dissemination costs are likely to be negligible. In this review, 23 studies recruited self-referred patients through advertising, yet only two studies included recruitment costs in their analysis. Furthermore, the costs of DMHIs are highly uncertain. For example, four studies included capital costs (such as computers, staff training or one-off software purchases), and a further 14 included the cost of website maintenance and hosting. (For details of included costs, see Online Supplementary File 3.) The subsequent cost per patient depends on the scale of rollout, where wider delivery (e.g. providing an intervention nationally) is likely to dilute such fixed costs.
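A minimal sketch of this dilution effect, using entirely hypothetical cost figures, shows how the per-patient cost of a DMHI falls as one-off and fixed costs are spread over a larger rollout:

```python
# Sketch of how fixed costs (development, capital, set-up) and recurring
# overheads dilute with the scale of rollout. All figures are hypothetical.

fixed_costs = 250_000.0           # one-off development, capital and set-up (GBP)
annual_hosting = 20_000.0         # website maintenance and hosting per year (GBP)
variable_cost_per_patient = 40.0  # e.g. staff support time per patient (GBP)

for n_patients in (500, 5_000, 50_000):  # patients treated in one year
    cost_per_patient = (fixed_costs + annual_hosting) / n_patients + variable_cost_per_patient
    print(f"{n_patients:>6} patients: {cost_per_patient:,.2f} GBP per patient")
```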

3.2.2 Use of All Available Evidence

While the volume of economic evaluations of DMHIs is growing, only one of the 66 evaluations [19] synthesised evidence from multiple sources to inform the treatment effect.

Evidence synthesis is more complex for DMHIs than for interventions such as medication: DMHIs are multi-layered and subject to external factors, making it difficult to ensure uniform delivery or to disentangle the impact of each layer on outcomes. Significant heterogeneity is likely between interventions, the populations they target and the settings in which they are delivered; even when interventions target the same condition, they tend to vary in their underlying principles, content, and the type and extent of support (see Sect. 3.1 for details). It is not clear whether each of these characteristics affects the treatment effect, or whether evidence from similar interventions can reasonably be pooled to make an overall recommendation about their cost-effectiveness.

Furthermore, interventions for the same mental health disorder can target different patient populations (e.g. according to disease severity). The population in which an intervention is evaluated can affect its comparability to other trials—i.e. it may not be appropriate to generalise costs and treatment effect observed in one target population to another, or to attempt to synthesise effectiveness of interventions observed in different patient populations.

Finally, digital interventions, as well as comparators that involve behavioural therapy, are likely to vary between settings (clinics and countries) in the referral system, capacity, waiting times and frequency of contact, and so synthesising evidence on resource use may not be appropriate.

3.2.3 Specification of Comparators

Comparators included guided and unguided digital interventions, medication, different types of F2F therapy and no treatment. An economic evaluation should include all relevant comparators, yet the majority of studies evaluating DMHIs included only two arms, which may limit their applicability to decision-making.

Waitlist control and TAU were the most common comparators (n = 34); however, their description was often limited and the distinction between them was not always clear. Treatment of mental health conditions tends to vary between health providers, and between different patient populations (e.g. diagnosed vs. undiagnosed); a lack of understanding of a comparator in a trial can limit generalisability of the findings, as well as comparability of results across trials.

3.2.4 Time Horizon for Analysis

The majority of the evaluations were conducted alongside a trial or using retrospective data from a single study. Of the 66 papers, 54 (51 RCTs and three pilot and feasibility studies) did not explore the results beyond the trial endpoint, potentially failing to capture the long-term costs and effects of DMHIs.

This is considered inadequate for decision-making because of the truncated time horizon. Mental illness is a lifetime condition for many patients, with periods of respite and relapse, during which costs and outcomes can be influenced by any treatment received. Limiting the time horizon of an analysis can generate inaccurate estimates of cost-effectiveness. The lack of longer-term modelling is likely to be due, in part, to the lack of reliable data on the long-term performance of DMHIs. For many treatments there are no empirical data on how long treatment effects are likely to persist and how they relate to a changing baseline, i.e. how the untreated population progresses in their illness. Furthermore, this is likely to be confounded by comorbidities and future events, making long-term extrapolation challenging.

4 Discussion

Despite a growing literature on the economic evaluation of DMHIs, including several systematic reviews, there is no conclusive evidence regarding their cost-effectiveness. The lack of consensus is often attributed to heterogeneity in the evaluated interventions, the conditions they target, and the methods used to evaluate them. There may, however, be more fundamental differences between the applied examples, which determine how useful these studies are in informing decision-making, including commissioning choices. This paper aimed to assess the appropriateness of the methodology used to determine the cost-effectiveness of DMHIs and to highlight the challenges associated with estimating cost-effectiveness.

4.1 Findings

The review identified 66 papers, and our findings support conclusions from previous reviews that DMHIs are heterogeneous and the methods used to evaluate them vary [28–31]. The review identified key gaps in the DMHI evidence required for decision-making, and highlighted characteristics of digital interventions that should be considered in future evaluations in order to address these gaps.

4.1.1 Evidence Gaps

Evidence to inform an assessment of cost-effectiveness: Cost-effectiveness analysis typically requires multiple forms and sources of evidence, including clinical effectiveness, adverse events, natural disease progression, costs/resource use and health-related quality of life. The evidence to inform these often comes from several different data sources, particularly for the treatment effect, providing multiple estimates of the parameter(s).

This review identified ten studies that used some form of modelling to combine multiple sources of information; however, none of these studies synthesised evidence from multiple studies to inform key parameters, such as the treatment effect. Furthermore, 19 studies evaluated digital interventions for depression, and 17 for various types of anxiety, yet none of these studies estimated the treatment effect from more than one trial.

Evidence on DMHIs is likely to grow further. In order to draw conclusions regarding the cost-effectiveness of digital interventions, evidence review and quantitative synthesis techniques, together with decision-analytic models, represent an ideal vehicle to structure the decision problem, combine all available data and characterise the various sources of uncertainty. For example, network meta-analytic methods may be used to combine multiple sources of evidence to obtain pooled estimates of the treatment effect for all relevant interventions, allowing the full body of evidence to be reflected and capturing heterogeneity between studies. Choosing a single study to estimate all parameters assumes that all other sources of evidence are irrelevant, or less relevant, to the decision problem.
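As a simplified illustration of such synthesis, the sketch below pools a treatment effect across four hypothetical trials using a DerSimonian–Laird random-effects model; network meta-analysis generalises this pairwise pooling to multiple comparators. The trial estimates are invented for illustration.

```python
# Pairwise random-effects meta-analysis (DerSimonian-Laird) of a treatment
# effect reported by several trials. Trial estimates are hypothetical.
import numpy as np

y = np.array([-0.45, -0.30, -0.60, -0.20])  # per-trial effects (e.g. SMDs)
se = np.array([0.15, 0.10, 0.20, 0.12])     # per-trial standard errors

w = 1.0 / se**2                             # inverse-variance (fixed-effect) weights
y_fixed = np.sum(w * y) / np.sum(w)
q = np.sum(w * (y - y_fixed) ** 2)          # Cochran's Q heterogeneity statistic
c = np.sum(w) - np.sum(w**2) / np.sum(w)
tau2 = max(0.0, (q - (len(y) - 1)) / c)     # between-trial variance estimate

w_star = 1.0 / (se**2 + tau2)               # random-effects weights
pooled = np.sum(w_star * y) / np.sum(w_star)
pooled_se = np.sqrt(1.0 / np.sum(w_star))
print(f"Pooled effect: {pooled:.3f} (SE {pooled_se:.3f}), tau^2 = {tau2:.4f}")
```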

Long-term trajectory of the treatment effect: The majority of the evaluations were conducted within trials, with limited follow-up. However, there are few data to support assumptions regarding the 'stickiness' of DIs in mental health beyond the initial treatment period. The chronic nature of mental health problems requires consideration of the impact of short-term interventions over a longer time horizon, to understand how differences in treatments translate into differences in costs and outcomes over patients' lifetimes. This dictates the need for assumptions about the longer-term treatment effect, which must be validated or tested using sensitivity analysis.
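The sketch below illustrates how such assumptions might be tested. A deliberately simple two-state (ill/remission) cohort model extrapolates a within-trial effect on remission over 10 years under alternative assumptions about how long the effect persists; all transition probabilities and waning scenarios are hypothetical.

```python
# Two-state Markov cohort model testing assumptions about how long a
# within-trial treatment effect persists ('stickiness'). All transition
# probabilities and waning scenarios are hypothetical.

P_REMIT_CONTROL = 0.05  # monthly probability of remission without treatment
RELATIVE_RISK = 1.8     # within-trial treatment effect on remission
P_RELAPSE = 0.03        # monthly probability of relapse from remission
HORIZON = 120           # months (10 years)

def remission_months(effect_duration: int) -> float:
    """Expected months in remission if the effect wanes after effect_duration."""
    ill, remit, total = 1.0, 0.0, 0.0
    for month in range(HORIZON):
        rr = RELATIVE_RISK if month < effect_duration else 1.0
        p_remit = min(1.0, P_REMIT_CONTROL * rr)
        # Simultaneous cohort update: remitters can relapse, the ill can remit.
        ill, remit = (ill * (1 - p_remit) + remit * P_RELAPSE,
                      remit * (1 - P_RELAPSE) + ill * p_remit)
        total += remit
    return total

for duration in (6, 24, HORIZON):  # effect lasts 6 months, 2 years, or the full horizon
    print(f"Effect lasts {duration:>3} months -> {remission_months(duration):.1f} remission-months")
```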

Treatment effect in treatment sequencing: For many mental health conditions, it is unlikely that only a single treatment would be offered at a single time point. To appropriately consider the cost-effectiveness of DIs, it is therefore necessary to reflect the possibility that multiple treatments may be given over a patient's lifetime. Modelling sequences of treatments requires evidence on the effectiveness of treatments at various points in the pathway. None of the studies identified in this review explored the cost-effectiveness of DMHIs in patients at different stages of the treatment pathway, and many included only treatment-naïve patients. It may not be appropriate to assume that these DIs will have the same effect when given to a patient who has failed to respond to multiple treatments. Modelling treatment sequences can be complex and requires evidence that is unlikely to be available from randomised trials.
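A minimal example of why sequencing evidence matters is sketched below: the expected cost and overall response rate of a hypothetical 'DMHI first, step up to F2F therapy on non-response' sequence hinge on the response rate to F2F therapy among DMHI non-responders, a parameter rarely reported by trials. All figures are hypothetical.

```python
# Expected cost and response rate of a hypothetical two-step sequence:
# DMHI first, with non-responders stepping up to face-to-face (F2F) therapy.
# The F2F response rate among DMHI non-responders is assumed; trials rarely
# report it, which is precisely the evidence gap noted above.

p_respond_dmhi = 0.35
p_respond_f2f_after_dmhi = 0.50     # assumption, not trial evidence
cost_dmhi, cost_f2f = 150.0, 900.0  # per-patient treatment costs (GBP)

expected_cost = cost_dmhi + (1 - p_respond_dmhi) * cost_f2f
p_respond_sequence = p_respond_dmhi + (1 - p_respond_dmhi) * p_respond_f2f_after_dmhi

print(f"Sequence expected cost: {expected_cost:,.2f} GBP")  # 735.00
print(f"Sequence response rate: {p_respond_sequence:.3f}")  # 0.675
```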

4.1.2 Key Considerations When Evaluating DMHIs

Heterogeneity of interventions: DMHIs are complex and multi-layered; as a result, there is significant heterogeneity between interventions. Given this complexity, the simultaneous evaluation of all digital interventions would require a taxonomy of DMHIs to be developed, to inform which interventions can reasonably be pooled and compared. Previous reviews have focused on specific types of digital interventions (e.g. internet-delivered CBT [28–31] or guided internet interventions [32]) and on digital interventions for specific conditions (depression [28, 32, 33], anxiety disorders [34]), but this review identified additional factors specific to DMHIs that need to be taken into consideration, such as the role of interventions in the treatment pathway.

Heterogeneity in the population they target: The appropriate methods for cost-effectiveness analysis are, at least in part, driven by the intended role of the analysis. The role of some DMHIs reviewed in this study was unclear; studies recruited patients through self-referral, through referral by a clinician where patients were identified 'on the job', and through screening of medical records with patients proactively invited to participate. Different target populations suggest different aims: interventions targeting self-referred patients have a role in the diagnosis and treatment of patients who may not otherwise have sought treatment, whereas targeting diagnosed patients implies that digital interventions are administered in addition to, or alongside, existing treatment. The role of the therapy can affect whether evidence from different studies can be pooled, how costs and effects are measured, and what the appropriate comparators are.

Heterogeneity in the delivery setting: DMHIs can be self-referred (and self-administered) or used on a clinician's referral, either while waiting for, or instead of, other treatment options. They can also be provided by different providers (e.g. in primary care or by specialists). The delivery setting can affect the generalisability of the findings (to other delivery settings), which costs should be included in the analysis, and which comparators are relevant (e.g. TAU in one setting may not be comparable to TAU in another).

Furthermore, the delivery setting can affect the appropriate perspective for the evaluation of DMHIs. Evaluations can be commissioned at a local level (clinics, regional decision-makers), at a national level (e.g. the NHS), or by employers and individuals themselves. Thus, the costs and outcomes included in the analysis, and the 'decision rule' used to interpret whether an intervention is cost-effective, can also vary with the perspective taken (see Sect. 3.2.1). Moreover, while NICE has an explicit decision rule (£20,000–£30,000 per QALY), under other perspectives it is not clear how to interpret health gains that come at an additional cost, particularly when health benefits are measured using disease-specific outcomes; e.g. how much should a provider spend on a one-point improvement in GAD-7 score?

4.2 Considerations for Future Evaluations of DMHIs

4.2.1 Generating New Evidence

Future economic evaluations of DIs need to reflect the body of existing evidence. For example, in designing a particular intervention it is important to consider what has and has not worked previously, and how the current evaluation will contribute to the literature. Data collection should include the data required to conduct an economic evaluation, including detailed resource use and quality of life. Detailed reporting (e.g. recruitment method, patients' baseline characteristics, comparator details, breakdown of costs) can enable evidence synthesis in which the heterogeneity of DMHIs is accounted for appropriately (see Sect. 4.2.2).

4.2.2 Evidence Review and its Synthesis

In order to evaluate whether DMHIs represent good value for money, additional research is needed to review and (when appropriate) combine all available evidence, on all relevant comparators, required to inform an economic evaluation. Given the complexity of the interventions, any evidence synthesis needs to take into account the following:

  • Taxonomy of DMHIs, informing whether evidence on different interventions can reasonably be pooled;

  • The target patient population, where adjustments may need to be made for potential differences in effect size in different populations;

  • The decision-making context, reflecting comparators, outcomes and costs specific to that context.

4.2.3 Long-Term Outcomes

Trial-based evaluations often have insufficient follow-up to capture all the relevant differences in costs and outcomes between competing interventions, for example DIs versus non-DIs. The use of modelling, specifically extrapolation modelling, is therefore required in many instances. The use of extrapolation modelling should not, however, negate the need for sufficient follow-up in primary studies. It is important to determine how effective DIs are over the longer term, and valid extrapolation is only possible with longer follow-up.

4.3 Strengths and Limitations

This paper is the first attempt to understand the appropriateness of the methods used to evaluate DMHIs. The review employed a thorough search strategy to identify existing economic evaluations. However, the searches were restricted to publications in English and to those available by December 2018. Economic evaluations of DMHIs continue to be conducted, and our review has not captured those published since our literature searches were completed. Given that the primary aim of our review was to critique the methods for conducting economic evaluations of DMHIs, based on a large sample of studies identified by our literature searches, our conclusions remain relevant and can be applied to newly published economic evaluations. As a case in point, a recent within-trial economic evaluation [35] compared a DMHI against a waiting-list control and showed that, for short-term observed outcomes (at 8 weeks), the evaluated DI was unlikely to be cost-effective, whereas for longer-term estimated (extrapolated) outcomes (at 12 months), the DMHI could be cost-effective. This emphasises the importance of extrapolation in cost-effectiveness analyses of DMHIs, although extrapolation should not replace long-term follow-up, as it did in this trial.

5 Conclusion

This paper represents the first attempt to understand the appropriateness of the methods used to evaluate DMHIs. Understanding the limitations of existing research, and the methodological challenges specific to DMHIs, motivates discussion and helps to work towards a consensus on methodology for future evaluations.