Key Points for Decision Makers

We conducted a systematic review of discrete-choice experiments, conjoint analysis, and other attribute-based stated-preference studies in multiple sclerosis.

Areas for improvement in future studies are sample size considerations, documentation of justification for design choices, greater use of qualitative methods for attribute/level development, and better reporting of experimental procedures.

The effect of treatment on reproduction and the influence of risk perception were identified as understudied topics.

1 Introduction

Multiple sclerosis (MS) is a chronic neurological disease and the commonest non-traumatic cause of acquired disability in young adults in the Western world [1]. Mean age of onset is 30 years, and over two-thirds of patients are female [2]. The etiology of the disease is not fully understood, but it is known to be an inflammatory demyelinating disorder of the central nervous system [3]. Most people with MS (PwMS) experience two clinical phases: initially relapsing-remitting MS (RRMS) followed by secondary progressive MS (SPMS), with gradual accumulation of disability [4]. Natural history data suggest the clinical phenotype switch from RRMS to SPMS usually occurs about 10–15 years after onset [5, 6]. Whilst the clinical hallmark of RRMS is relapses followed by a variable degree of remission, SPMS is characterized by disability that may affect numerous functions, including gait, balance, vision, cognition, and continence [7]. In about 10% of PwMS, the disease is progressive from onset, known as primary progressive MS (PPMS) [8].

Treatments for PwMS fall broadly into two categories: (1) symptomatic treatments intended to alleviate specific symptoms PwMS experience and (2) disease-modifying treatments (DMTs) intended to alter the natural course of MS, i.e., reducing the frequency and severity of relapses and slowing of functional deterioration [9]. For DMTs to be effective, PwMS need to commit to long-term interventions, often requiring regular administration of tablets, injections, or infusions.

Currently, 14 DMTs are available for the treatment of RRMS, whereas only one has been approved for progressive MS. The drugs vary in efficacy, adverse event profile, mode of delivery, and monitoring burden [10].

The increasing number of DMT options creates uncertainty in treatment selection. Information about how PwMS choose DMTs once an MS diagnosis has been established is lacking. Several of the most effective DMTs are associated with an increased risk of adverse effects, including life-threatening infections and secondary autoimmunity. Patients must trade off these potential negative consequences against the perceived benefits (reduced relapse rate and disability accrual, maintained or improved quality of life). Such decisions can be challenging at any time but may be particularly difficult soon after diagnosis, when PwMS are coming to terms with the presence of a chronic condition and have less knowledge about MS and how it will progress and affect their quality of life.

The choice of DMT depends greatly on individual preference and requires the patient to weigh up and trade off different attributes. For example, a decision must be made as to whether a reduction in the probability of relapses outweighs the risk of a serious side effect. Attribute-based stated-preference (AbSP) techniques, such as discrete-choice experiments (DCEs), best–worst scaling (BWS), and conjoint analysis,Footnote 1 may elicit such trade-offs between the individual attributes that make up a choice object and are hence ideal for investigating the DMT preferences of PwMS [11]. Given that the number of DMTs continues to expand, another advantage of using AbSP is that it provides insight into patient attitudes toward potential treatments that are not yet available and an indication to those developing and trialing new drugs about what combination of attributes would be acceptable to PwMS.

The preceding describes why MS is fertile ground for AbSP research. However, it is also uniquely challenging. MS is categorized into distinct clinical phenotypes and is a progressive disease with a wide range of symptoms, resulting in a highly individual experience for PwMS. The benefits of treatment are probabilistic: no drug is effective in every case, so every decision to start a DMT represents, to some extent, a gamble. In addition, the clinical endpoints of trials measuring the efficacy of DMTs can be difficult to translate into meaningful terms for PwMS. The trade-offs PwMS must consider when choosing between different treatments involve all aspects of their current and future lives. For example, they need to consider how much negative impact from side effects on quality of life and daily routine is acceptable to potentially slow down accrual of disability several years later. Moreover, many PwMS experience effects on cognition from their disease [12,13,14,15,16], which may affect their ability to give considered responses in surveys.

While general reviews of DCEs and BWS in health exist [17,18,19,20], none has specifically examined MS. Given the significant opportunities and challenges discussed, as well as a recent rise in the number of relevant studies, a review focusing on this disease area is timely. This paper systematically reviews AbSP studies focusing on experiment design and conduct and suggests recommendations for improvement.

2 Methods and Materials

2.1 Search Strategy

An information specialist developed comprehensive literature searches using MEDLINE, Embase, PsycINFO, CINAHL, Cochrane Libraries, and the Web of Science Core Collections from database inception to 11 July 2017. Search concepts included MS and synonymsFootnote 2 and AbSP-related terms such as DCE, BWS, max diff, and conjoint analysis. The information specialist and project team members identified subject headings and free-text words for use in the search concepts. Further terms were identified from known relevant papers and tested. Before the searches were run, all search strategies were peer reviewed by a second information specialist using the Peer Review of Electronic Search Strategies (PRESS) checklist [21, 22].Footnote 3

The results of the database searches were stored and de-duplicated in an EndNote library. Further relevant studies were sought by searching citations (forwards and backwards) in the included studies.

The search process is illustrated in Fig. 1. The searches identified 328 records. Once duplicates were removed, 214 records remained. Citation searches identified no records. Two authors (EW and DM) reviewed abstracts and selected 38 for full-text review. The same two authors then selected for final inclusion articles that (1) were published in a peer-reviewed journal, (2) dealt exclusively or primarily with MS, and (3) used an AbSP methodology in any part of the article. An AbSP methodology was defined as any method that used quantitative data to examine preferences for attributes of a whole. Disagreements were resolved by consensus discussion between EW and DM. This process resulted in 16 articles that reported 17 studies.

Fig. 1 PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) diagram showing the study selection process

2.2 Data Extraction and Analysis

Table 1 lists the studies included for final analysis. One author (EW) extracted the data using the form available in Appendix B in the Electronic Supplementary Material (ESM). This form was developed by four authors (EW, DM, IE, and AM) with the aim of focusing on study design features (study type, country of origin, participant inclusion criteria, sample size, attribute and levels identification and development) to assess whether studies had used best practice. Given the small number of studies, it was not thought effective to pilot the form, but minor revisions were made during the data extraction process. Analysis was performed using a narrative synthesis approach [23]. Detailed consideration was given to attribute development and presentation of information about probabilities, as these are often mentioned as key neglected areas in AbSP practice [24].

Table 1 Studies included for review

One author (EW) scored the quality of each study using the International Society for Pharmacoeconomics and Outcomes Research (ISPOR) conjoint analysis checklist [25]. This contains 30 items that were given a count of 1 if a study reported considering at least some aspect of this item and 0 if it did not. The final count for a study was then the sum of its counts for each item. A secondary aim of this study was to assess the suitability of using the checklist to assess the quality of AbSP studies for future reviews. This involved considering whether it showed variation in overall counts and counts for individual items and whether it revealed important issues previously unconsidered.
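The counting rule described above can be sketched in a few lines of Python. This is a minimal illustration only, not the tooling used for the review; the function name `checklist_count` and the example flags are hypothetical.

```python
from typing import Sequence

N_ITEMS = 30  # number of checklist items, as stated in the text


def checklist_count(item_reported: Sequence[bool]) -> int:
    """Return the total count for one study.

    Each item scores 1 if the study reported considering at least some
    aspect of it and 0 otherwise; the total is the sum over all items.
    """
    if len(item_reported) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} item flags, got {len(item_reported)}")
    return sum(1 for reported in item_reported if reported)


# Hypothetical study that reported on 23 of the 30 items (illustrative only).
example_flags = [True] * 23 + [False] * 7
example_total = checklist_count(example_flags)
```

Because each item is binary, a count of this kind records only whether an item was addressed, not how well.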

Narrative synthesis of the identified studies was performed by several authors (EW, AM, IE, DM). Statistical analysis of numerical data was conducted by computing summary statistics using R version 3.3.1.
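The summary statistics reported in the Results (medians, standard deviations, and percentages of the 17 studies) are simple to reproduce. The sketch below uses Python rather than R, and the input values are hypothetical placeholders rather than the extracted data; the helper names are assumptions made for illustration.

```python
import statistics


def summarize(values):
    """Return (median, sample standard deviation) for a list of numbers."""
    return statistics.median(values), statistics.stdev(values)


def percent_of_studies(count, total=17):
    """Express a count of studies as a percentage of all reviewed studies."""
    return round(100 * count / total, 1)


# Hypothetical per-study values, for illustration only (not extracted data).
hypothetical_tasks = [8, 10, 12, 12, 16]
median_tasks, sd_tasks = summarize(hypothetical_tasks)
```

With 17 studies in total, `percent_of_studies(9)` gives 52.9 and `percent_of_studies(15)` gives 88.2, matching the percentages quoted in the text.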

3 Results

All but one study examined patient preferences for DMTs, with the remaining study [26] investigating the quality of life of PwMS. This focus is not surprising; decisions about DMTs are vitally important to PwMS and feature a mixture of benefits and risks, making AbSP an ideal quantitative tool with which to study the decision making of PwMS.

In total, 15 studies (88.2%) were funded by pharmaceutical companies. One study [26] received funding from a public health authority and a charitable foundation, and one [27] stated the authors received no specific funding.

Figure 2 shows the publication of MS AbSP studies over time. An upward time trend is clear, with the first appearing in 2009 and nine studies (52.9%) being published since 2016.

Fig. 2 Number of publications of attribute-based stated-preference studies in multiple sclerosis per year

3.1 Study Type

Table 1 lists the AbSP method used by each study. The majority (n = 9 [52.9%]) were DCEs. Two studies (11.8%) were BWS; Kremer et al. [27] used case 1 and Lynd et al. [28] used case 2.

3.2 Survey Population

Table 1 lists the country in which each study was conducted. The USA was the most common country, with seven studies (41.2%); a further nine (52.9%) were spread across Europe (Germany, Italy, the Netherlands, Spain, UK), and one study was conducted in Canada.

3.3 Diagnoses

Seven studies (41.2%) included anyone with a diagnosis of MS, and seven (41.2%) included only patients with a diagnosis of RRMS. Of nine (52.9%) studies that clearly indicated diagnoses to exclude, all excluded PPMS. Seven studies (41.2%) required the diagnosis to be confirmed by a physician, and seven (41.2%) relied on self-reported diagnoses; in three studies (17.6%), how the diagnosis was established remained unclear.

3.4 Development of Attributes and Levels

Most studies drew on existing literature in medicine and the social sciences (14 [82.4%]) and/or healthcare professionals (12 [70.6%]) to develop attributes and levels. Few used qualitative methods to elicit views of PwMS, with only two studies (11.8%) [27, 28] employing focus groups. Seven studies (41.2%) used interviews at some point in the design stage, typically to refine an existing survey rather than as a basis for attribute development. Two [29, 30] of these seven (28.6%) did not state how many interviews were conducted, and the average number for the remaining five was 10.3. Two studies (11.8%) (Wicks et al. [31], studies 1 and 2) did not state how attributes and levels were developed.

3.5 Survey Design

Figure 3a shows the number of attributes used by each study. The median number of attributes included in studies was six, in line with the typical number included in AbSP studies in health [18, 19, 32]. The minimum number of attributes was three, and the maximum was 27. The median number of levels for each attribute was three, with a maximum of seven and a minimum of two.

Fig. 3 a Number of attributes included in each study; b number of questions answered per subject in each study; c sample sizes for final analysis for each study; d number of studies examining a given aspect of preference heterogeneity. * indicates DCE, ** indicates BWS

In total, 14 studies (82.4%) used a fractional factorial design and two (11.8%) (Wicks et al. [31], studies 1 and 2) did not state the type of design used. Five studies (29.4%), all DCEs, selected their designs based on efficiency, and two (11.8%) explicitly reported using the criterion of D-efficiency.Footnote 4 One study [34] used a custom design with a contrast between DMT administration via pill or injection in every choice and all combinations of other attributes presented. Seven studies (41.2%) did not state which criteria were used to construct their design. The most popular tool used to construct study designs was the statistics program SAS (SAS Institute), with four studies (23.5%); Sawtooth (Sawtooth Software) was the second most popular, with three (17.6%). Five studies (29.4%) did not report how their designs were constructed.
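To give a sense of what a D-efficiency criterion measures, the following sketch computes a simplified, linear-model D-efficiency, 100 × |X'X|^(1/p) / N, for a design matrix X with N rows and p columns. This is an illustrative assumption rather than the procedure of any reviewed study: designs for choice experiments are normally evaluated using the information matrix of the choice model and assumed prior parameter values, which software such as SAS or Sawtooth handles internally. All function names are hypothetical.

```python
from itertools import product


def gram(X):
    """Return X'X for a design matrix given as a list of rows."""
    p = len(X[0])
    return [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]


def determinant(M):
    """Determinant by Gaussian elimination with partial pivoting."""
    A = [list(map(float, row)) for row in M]
    n = len(A)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            return 0.0  # singular matrix
        if pivot != col:
            A[col], A[pivot] = A[pivot], A[col]
            det = -det  # a row swap flips the sign
        det *= A[col][col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
    return det


def d_efficiency(X):
    """Linear-model D-efficiency (%) of a design matrix with N rows, p columns."""
    n_rows, p = len(X), len(X[0])
    det = determinant(gram(X))
    if det <= 0:
        return 0.0
    return 100.0 * det ** (1.0 / p) / n_rows


# Full factorial for three two-level attributes, coded -1/+1, plus intercept.
full = [[1, a, b, c] for a, b, c in product([-1, 1], repeat=3)]
# Half fraction with the third attribute set to a * b (a fractional factorial).
half = [[1, a, b, a * b] for a, b in product([-1, 1], repeat=2)]
```

Both `full` and `half` are balanced and orthogonal, so both score 100% under this measure, which is why a fractional factorial can halve the number of profiles without losing main-effect efficiency. Dropping a single row from `full` breaks the balance and lowers the score.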

Only 2 of 16 studies (12.5%) on DMT choice included an opt-out option [26] or justified why an opt-out was not included [18].

A concern in designing AbSP surveys is how many choice tasks can be included without the survey becoming a burden to participants [25]. Survey length varied considerably, as can be seen in Fig. 3b, which illustrates the number of choice tasks per subject in each study. The median number was 12 ± a standard deviation (SD) of 14.1, which is broadly in line with the wider AbSP health literature [18, 19, 32]. However, the number of choices is only one aspect of burden. Another aspect is the complexity of the task. For example, although Utz et al. [34] presented 64 choice tasks, with only two options and three attributes each, the choices were relatively simple. Several studies increased the total number of choice tasks without increasing the burden on participants by using several different versions of the survey, with the median number being four.

Two studies (11.8%) (Wicks et al. [31], studies 1 and 2) did not report how many decisions participants made, making it difficult to assess whether the burden was appropriate; nor did they report how many survey versions were used. Nine (52.9%) assessed response quality and/or its impact on results, e.g., by presenting the same choice twice, giving a dominated option, or eliciting whether participants picked the same alternative for every question (“straight-lining”). Such checks can be used as an indication both of understanding and that burden was not excessive.
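The three response-quality checks mentioned above (a repeated choice, a dominated option, and straight-lining) are straightforward to operationalize. The sketch below is a minimal illustration with hypothetical function names and a made-up respondent; it assumes each respondent's answers are stored as a list of chosen alternatives, one per task.

```python
def is_straight_lining(choices):
    """True if the respondent picked the same alternative in every task."""
    return len(choices) > 1 and len(set(choices)) == 1


def repeated_task_consistent(choices, first_idx, repeat_idx):
    """True if the same alternative was chosen both times a task was shown."""
    return choices[first_idx] == choices[repeat_idx]


def failed_dominance_check(choices, task_idx, dominated_alternative):
    """True if the respondent chose the dominated option in the check task."""
    return choices[task_idx] == dominated_alternative


# Hypothetical respondent: chosen alternative ("A" or "B") in each of six tasks.
# Assumptions for this example: the task at index 5 repeats the task at index 1,
# and alternative "B" is the dominated option in the task at index 3.
respondent = ["A", "B", "A", "A", "B", "B"]
flags = {
    "straight_lining": is_straight_lining(respondent),
    "repeat_consistent": repeated_task_consistent(respondent, 1, 5),
    "failed_dominance": failed_dominance_check(respondent, 3, "B"),
}
```

How flagged respondents are then handled (exclusion, sensitivity analysis, or simple reporting) is a separate analytic decision that a study should document.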

Figure 3c illustrates the sample sizes obtained for final analysis, showing considerable variation. The median sample size was 189 ± 162. Only a single study (Wicks et al. [31], study 1) reported explicit power calculations, and only six (35.3%) reported other sample size considerations such as “rules of thumb”.

In total, 11 studies (64.7%) were administered online and five (29.4%) were administered using pen and paper. Only two studies [35, 36] reported—in line with item 7.2 of the ISPOR conjoint analysis checklist [25]—a justification of the chosen mode of administration.

3.6 Attributes

The attributes used by each study were collated and placed in 13 categories by one author (EW). All attributes were assigned to at least one category, and some were assigned to two categories (e.g., “route and frequency of administration” was classed both as route of administration and as frequency of administration).

Among the most common attributes were effect on relapse (13 [76.5%]), effect on progression (12 [70.6%]), severe side effects (12 [70.6%]), and mild side effects (13 [76.5%]). Also common were route (10 [58.8%]) and frequency (13 [76.5%]) of administration. Only four (23.5%) looked at monitoring of treatment, and another four (23.5%) included further miscellaneous aspects of administration. Six studies (35.3%) explored attributes related to the alleviation of MS symptoms. Three (17.6%) included attributes explicitly related to quality of life, one of which [26] looked specifically at how PwMS valued health-related quality of life. Four (23.5%) included attributes related to magnetic resonance imaging (MRI) scans. Two (11.8%) included an attribute relating to reproduction (male and female), and two (11.8%) had miscellaneous attributes that fitted into no other category.

Eight studies (47.1%) looked at the mode of DMT administration PwMS preferred.Footnote 5 All included oral and injection options, though only three of eight (37.5%) distinguished between subcutaneous and intramuscular injection, and five of eight (62.5%) included intravenous infusion.Footnote 6 All but one of these studies [34] combined mode and frequency of administration into a single attribute, with the disadvantage that this made it impossible to fully disentangle their effects. On the other hand, in practice, there is a certain amount of correlation between mode and frequency, and combining them a priori rules out unrealistic combinations such as daily intravenous infusions or monthly pills. It also has the advantage of “freeing up” an attribute to describe some other aspect of treatment.

3.7 Probability

Both the benefits and the risks of DMTs are probabilistic in nature [10]. Most studies investigating preferences for DMTs (11 of 17 [64.7%]) did not explicitly quantify the probability of receiving a given benefit or experiencing a given adverse event. Only a single study [29] clearly documented using visual means to convey probabilistic information,Footnote 7 using both a risk grid (a square grid with shaded squares indicating how many patients experience the relevant outcome, e.g., five shaded squares out of 1000 to indicate a 0.5% risk) and a risk ladder (a scale giving the context of a given probability in terms of more familiar risks). No study examined how the presentation of probabilities influenced preferences.
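A risk grid of the kind described above can be mocked up in plain text. The sketch below is illustrative only: the function names, the 50-cell row width, and the characters used for shaded and blank cells are assumptions, and a real survey would use a designed graphic rather than text.

```python
def risk_grid(affected, total=1000, per_row=50, shaded="#", blank="."):
    """Return a text risk grid with `affected` shaded cells out of `total`."""
    if not 0 <= affected <= total:
        raise ValueError("affected must be between 0 and total")
    cells = shaded * affected + blank * (total - affected)
    rows = [cells[i:i + per_row] for i in range(0, total, per_row)]
    return "\n".join(rows)


def risk_percent(affected, total=1000):
    """Convert a count of affected patients into a percentage risk."""
    return 100 * affected / total


# Five shaded cells out of 1000, i.e., the 0.5% risk used as an example above.
grid = risk_grid(5)
```

Printing `grid` shows 20 rows of 50 cells with only the first five shaded, which conveys how rare a 0.5% event is more directly than the number alone.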

3.8 Analysis Methods

Table 1 lists the main method of analysis for each study. The most popular method was mixed logit, with 10 of 17 (58.8%) studies, far ahead of the next most popular method, latent class, with three of 17 (17.6%). To analyze their data, four of 17 (23.5%) studies used Sawtooth. NLOGIT (Econometric Software) and SPSS (IBM) were each used by three studies (17.6%). Four studies (23.5%) did not report what software they used for analysis.

3.9 Preference Heterogeneity According to Respondent Characteristics

Addressing the needs of individual patients is a crucial part of shared decision making [37], so it is important to go beyond mean preferences to examine how preferences vary according to the characteristics of PwMS. Only eight studies (47.1%) linked respondent heterogeneity to observed characteristics, for example by including them as covariates in a regression [34, 35] or using latent class analysis [26, 36]. However, some others accounted for heterogeneity by using models such as mixed logit without linking it to respondent characteristics. One author (EW) categorized the aspects of heterogeneity considered by each study. Figure 3d illustrates how many studies examined a given category. Seven of eight studies (87.5%) tested for the influence of past or current treatment. Several studies explored heterogeneity according to demographic factors (age, sex, education), disease-related factors (disease status/history, diagnosis), or quality of life-related factors (for example, the influence on lifestyle of pain and fatigue).

3.10 ISPOR Conjoint Analysis Checklist Quality Assessment

In general, all studies scored well against the ISPOR conjoint analysis checklist, with a median count of 23/30 (range 18–27). Variation was low in both total counts and in most counts for individual items. The checklist was not useful in highlighting otherwise unconsidered issues. Given this, and its limited ability to discriminate between studies, we did not consider its use a success.

4 Discussion

We performed a systematic review of 17 AbSP studies in the field of MS. All but one study investigated the preferences of PwMS for aspects of DMTs, highlighting the importance of trade-offs when considering long-term treatment of this chronic condition, which makes DMTs an obvious topic for AbSP techniques.

The most common survey method employed was DCE, which is consistent with a greater number of DCEs than other types of survey in healthcare in general. Cheung et al. [20] found only 62 BWS studies in total published up until April 2016, whereas Clark et al. [19] found 179 DCEs between 2009 and 2012 alone. It also reflects that the structure of DCEs, i.e., choosing between two or more alternatives, was closer to the target decision-making situation of most studies—choosing between different DMTs—than other study types.

A consequence of the focus on DMTs is the higher proportion of studies being directed only towards patients with RRMS (41.2%), for which considerably more licensed DMTs are available compared with progressive MS, for which only one drug, ocrelizumab (Ocrevus®), has thus far been approved [38, 39]. However, this is still an improvement on the situation a few years ago. Patients with RRMS and PPMS are distinct groups that differ in terms of past experience and projected disease course, meaning it is difficult to capture preference information from both groups using a single instrument. However, it also means that the preferences of people with different diagnoses may differ widely. Thus, while the literature’s focus on people with RRMS was previously appropriate, the anticipated arrival of DMTs for progressive forms of MS means there is now also a need for research into the preferences of people with PPMS.

The use of qualitative methods to develop attributes was limited, reflecting an area for improvement; how attributes are developed and selected should also be better documented. It is not always appropriate to undertake extensive qualitative work in attribute development. (For example, Jonker et al. [40] used the well-known EuroQoL 5-Dimensions (EQ-5D) descriptions of health states as attributes, so qualitative work developing attributes would be nonsensical.) However, PwMS are a heterogeneous population with a large variety of health-related experiences. To avoid omitting crucial aspects of decision making in AbSP studies, it is therefore vital to involve PwMS in attribute development.

PwMS are not usually medical professionals, and many experience cognitive impairment as their disease progresses. Hence, even if attributes are largely dictated by the research question, qualitative interviews are useful in identifying the best way of meaningfully expressing attributes to participants. For example, a standard measure of the impact of a DMT on individuals’ future functioning in clinical trials is the number of people experiencing an increase of 0.5–1 on the expanded disability status scale (EDSS) over a 3- to 6-month period [41]. Such a measure is difficult to translate into a concept meaningful to PwMS. Qualitative work is thus particularly needed when developing and selecting attributes for AbSP studies in MS.

Most studies analysed their data using advanced modelling techniques such as mixed logit. However, several studies used a mixed logit model but referred to it as a hierarchical Bayes model or a hierarchical Bayes analysis. This nomenclature is incorrect, as hierarchical Bayes is not a model itself but rather an estimation method used to obtain the parameters of a model [42]. Many studies did not employ analytical techniques that examine response heterogeneity according to observed characteristics.
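The distinction between model and estimation method can be made concrete with the standard textbook form of the mixed logit choice probability. The notation is illustrative and not taken from the reviewed studies: respondent n chooses alternative i from choice set C_n, x_{ni} is the vector of attribute levels, and the preference weights \beta vary across respondents with density f(\beta \mid \theta).

```latex
P_{ni}(\theta) \;=\; \int \frac{\exp\!\left(\beta^{\top} x_{ni}\right)}
                               {\sum_{j \in C_n} \exp\!\left(\beta^{\top} x_{nj}\right)}
                     \, f(\beta \mid \theta) \, \mathrm{d}\beta
```

This expression defines the model. Its parameters \theta can be estimated either by maximum simulated likelihood or by hierarchical Bayes (sampling from the posterior distribution of \theta and the individual-level \beta); the choice of estimator does not change the model being fitted.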

Given the diverse manifestations and chronic deteriorating nature of MS, it is particularly important to consider response heterogeneity if studies are to accurately reflect the range of patient experiences and opinions. For example, it would be interesting to examine the influence on decision making of risk preference due to the risks associated with DMTs, or cognition, given the cognitive impairments many PwMS experience. Comparison of the preferences and priorities of patients at different disease stages would also offer important insights into how experiences impact decisions. However, it should be noted that including respondent characteristics when analyzing AbSP data can be difficult because of the additional model parameters introduced.

Only a single study offered an opt-out option. A significant number of PwMS choose not to take any DMTs [43]. Hence, offering only a forced choice between DMTs means not capturing this aspect of their preferences. On the other hand, an opt-out option also means losing some information about respondents’ preferences between different options and taking the risk that people choose the opt-out only to avoid making difficult choices [44]. Thus, it is by no means appropriate for every study. However, studies that give only a forced choice should justify this decision and discuss its impact on their results. Future studies should also consider an alternative to an opt-out, such as a dual-response design (respondents first make a forced choice, then indicate whether they would prefer to opt out).
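The dual-response format mentioned above can be represented as two linked answers per choice task. The sketch below is a hypothetical data structure for illustration (the class name, field names, and the "none" label are assumptions); it shows how one set of responses yields both forced-choice and opt-out data.

```python
from dataclasses import dataclass


@dataclass
class DualResponse:
    """One answer to a dual-response choice task."""
    forced_choice: str    # alternative picked when a choice is required
    would_opt_out: bool   # follow-up: would the respondent rather take nothing?

    def unforced_choice(self, opt_out_label="none"):
        """Choice implied when opting out is allowed."""
        return opt_out_label if self.would_opt_out else self.forced_choice


# Hypothetical answers to two tasks (illustrative only).
answers = [DualResponse("A", False), DualResponse("B", True)]
forced = [a.forced_choice for a in answers]
unforced = [a.unforced_choice() for a in answers]
```

Here the forced-choice series retains the preference between treatments even for the respondent who would opt out, which is the information a simple opt-out option would lose.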

The design of a stated-preference survey is crucial for the interpretation of its results [25, 45]. Many studies failed to report the criteria by which they constructed their design, making it impossible for the reader to judge whether it was done appropriately. In addition, different software packages, and different versions of software packages, each have their own algorithms for design construction. Thus, the software and version used should be reported so that a study can be reproduced; several studies did not do so.

The reviewed studies employed a wide range of sample sizes, and it was often difficult to assess whether they recruited appropriate numbers of participants. Several “rules of thumb” for AbSP sample size exist [46, 47] as well as guides for calculation [48]. Thus, sample size considerations, whether explicitly calculated using priors or by less formal methods, are possible and usually necessary and should be both undertaken and reported in future studies. If it is unknown whether a study achieved an appropriate number of responses, the quality and validity of its results are difficult to assess.
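As an illustration of how little effort a rule-of-thumb check requires, the sketch below implements one widely quoted rule for main-effects choice designs, usually attributed to Johnson and Orme: N ≥ 500c / (t × a), where t is the number of choice tasks, a the number of alternatives per task, and c the largest number of levels of any attribute. Whether this particular rule is among those cited in [46, 47] is an assumption, and the function name is hypothetical.

```python
import math


def rule_of_thumb_sample_size(n_tasks, n_alternatives, max_levels):
    """Minimum respondents under the rule of thumb N >= 500 * c / (t * a)."""
    if min(n_tasks, n_alternatives, max_levels) <= 0:
        raise ValueError("all inputs must be positive")
    return math.ceil(500 * max_levels / (n_tasks * n_alternatives))


# Example using the medians reported in this review (12 choice tasks and
# three levels per attribute) and an assumed two alternatives per task.
minimum_n = rule_of_thumb_sample_size(n_tasks=12, n_alternatives=2, max_levels=3)
```

For these inputs the rule gives a minimum of 63 respondents. Such rules ignore the size of the effects to be detected and any subgroup analyses, so they complement rather than replace a formal calculation.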

Only two studies [35, 36] justified the mode of survey administration used, although it should be noted that—in many cases—the authors may have felt the justification to be self-evident to the reader (e.g., a population drawn from an online community). Nevertheless, given the physical and cognitive impairments experienced by many PwMS, which may impact the accessibility of surveys, it would be an improvement for future surveys to document that such factors were considered. Studies using a convenience sample from a clinic should also show they have considered the impact of this choice on the representativeness of their responses.

The most common attributes were related to prevention of relapses, progression, and side effects, which are probabilistic in nature. Yet, most studies presented the outcome of treatment decisions as certainties, e.g., respondents were certain to experience two relapses over the next 4 years. People’s preferences for probabilistic outcomes are extremely heterogeneous and can have a significant influence over their decision making. Thus, if preferences are elicited only for benefit/cost states as certainties (e.g., “three relapses in the next 4 years” [49]), it calls into question the external validity of the results for preferences over real DMTs. It is generally difficult to appropriately communicate probabilities (see Spiegelhalter [50] and Apter et al. [51] for overviews of current best practice). It can be even more difficult for participants to understand multiple probabilistic attributes. Thus, for pragmatic reasons, it is sometimes necessary to represent probabilistic aspects of treatments as certainties. However, given that most studies had no probabilistic representation at all, the appropriate and regular inclusion of probabilistic aspects of DMTs is a feature of the literature in need of improvement. In addition, if probabilistic outcomes are represented as deterministic to aid respondent comprehension, the possible impact of this should be discussed.

There is evidence that different ways of presenting probabilities influence individuals’ understanding of them [51, 52], and that understanding can be improved by using graphic representations of probabilities [53, 54]. However, studies that did use probabilistic attributes did not report on whether they considered their mode of presentation appropriate. Only one study displayed probabilistic information visually using graphs or pictographs. None of the studies explored how choices were influenced by different modes of presentation. Likewise, no study examined the impact of Knightian uncertainty (outcomes whose probabilities of occurring are not explicitly quantified, or “unknown unknowns” [55]) on the DMT preferences of PwMS, although the long-term effects of DMTs are unknown in many cases and, even in the short term, the risks of rare side effects may not be well quantified [56, 57].

Given so many aspects of MS and DMTs are characterized by poorly quantified risk and uncertainty, better tools to communicate risk are required,Footnote 8 particularly against the backdrop of the cognitive impairment associated with MS.

Only two studies included an attribute related to reproduction, and it did not play a significant role in the analysis of either. We believe this to be an understudied area, because of previous research highlighting its importance [59, 60], the higher incidence of MS in women of child-bearing age [61], and the variety of advice regarding conception, pregnancy, and breastfeeding [62]. In addition, clinical research into the influence of DMTs on reproduction is lacking [63, 64], and some DMTs are contraindicated for men with MS trying to conceive [65].

The vast majority of studies (88.2%) were funded by pharmaceutical companies. These companies have many reasons to fund studies; for example, information about patient preferences can be useful in both marketing existing drugs and informing future drug development. It can also be used to aid regulatory decisions, for example with the US FDA.Footnote 9 Because pharmaceutical companies fund nearly all the AbSP studies in MS, nearly all the literature addresses their aims. A broader range of funders would be welcome, as this would bring a wider range of research objectives and greater diversity of studies.

A strength of our work is its focus on the technical aspects of AbSP studies in MS. The number of such studies is increasing over time, and our work can serve both as a guide to the details of running them and as a practical aid to future research.

Another strength is that we have highlighted several areas of current practice that can be improved, particularly greater use of qualitative methods and better reporting of survey design choices. We have also highlighted gaps in the current literature. Future studies may wish to consider examining patient preferences surrounding DMTs and reproduction, how different methods of risk communication affect the decision making of PwMS, or the effect of Knightian uncertainty.

Our study has several limitations. We have not quantitatively combined the results of studies. We took this decision partly because of our focus on study design and because of the difficulties in combining numerical results from studies using different methodologies and different ways of presenting results, as well as different attributes and level sets. Nevertheless, in the future, a synthesis of results from AbSP studies in MS would be informative.

We do not recommend using the ISPOR conjoint analysis checklist to assess the study quality for future reviews of AbSP studies. It did not distinguish between minimum acceptable practice—for example, basing attributes on a non-systematic review of clinical literature—and good practice, such as developing attributes through extensive qualitative research. That it was not a good measure of quality is perhaps unsurprising, as it was created not for that purpose but rather as a rough guide to best practice when developing surveys [25].

Our focus on details of study design meant that we excluded unpublished studies such as conference proceedings, as we felt they included insufficient methodological information.

5 Conclusion

Shared decision making that incorporates patients’ views on treatment is increasingly used in medicine, particularly for chronic conditions such as MS, where the DMT landscape is evolving [67, 68]. Thus, it is important to investigate patient preferences, especially when the experiences of PwMS are very heterogeneous and many treatment options are available.

AbSP studies such as DCEs are increasingly used to measure the preferences of PwMS, providing insights into this field. We have highlighted several areas in need of improvement, particularly a greater use of qualitative methods in attribute and level development. Further work should be undertaken to better characterize the role of reproduction in decision making, and better communication of risk is warranted.

Data Availability Statement

Data extracted from studies are included as Electronic Supplementary Material.