Patient-reported outcomes (PROs) refer to direct patient reports of their experience with a disease and its treatment. PROs are a subset of the broader concept of “patient-generated outcomes” that refer to the full range of health information provided by patients about themselves, which can include demographic information, prior history, and so on. PROs are obtained via “report of the status of a patient’s health condition that comes directly from the patient, without interpretation of the patient’s response by a clinician or anyone else” [1]. PROs include symptoms, functional level, health status, health utility, and health-related quality of life (HRQoL). By their very nature, PROs are patient-centered and their addition to traditionally collected anatomical, biological, and clinical data has resulted in a fundamental shift in how research and clinical practice are conducted. This article presents an overview of PRO assessment, how its inclusion in multiple sclerosis (MS) clinical research has evolved over time, and directions for the future.

MS is a chronic progressive disease that usually manifests in young adulthood. It has limited impact on life expectancy. There is no typical course of disease progression, and MS produces a myriad of symptoms that can occur at any time and in different combinations. MS is not a curable disease; treatment tends to focus on symptom management, limiting disease progression with the use of disease-modifying drugs (DMDs), and maximizing quality of life. MS research, then, must include evaluation of how well interventions achieve these goals. As patients appreciate the totality of their disease experience, these goals can best be defined with their input. In addition, patient perceptions frequently differ from those of clinicians [2,3,4,5]. Assessment of patient functioning in the treatment setting, however well measured, may not accurately reflect their functioning at home [6]. Thus, PROs are generally accepted as important to assess in clinical research for most chronic conditions [7, 8], including MS (see the proceedings from the annual meetings of the International Society for Quality of Life Research) [9,10,11]. Further, PROs are increasingly seen as supporting every aspect of the healthcare continuum from research to clinical practice (including clinical decision-making and quality reporting) through to public health [12,13,14,15,16].

Approaches to PRO Assessment

Most PRO measures are categorized as either generic or targeted. Generic measures include questions that are general enough for use with both healthy and clinical populations. Generic measures used in MS include the Medical Outcomes Study Short Form-36 (SF-36) [17], the Sickness Impact Profile [18], and versions of the Health Utilities Index [19]. Targeted measures comprise questions aimed at specific diseases (e.g., MS), domains (e.g., cognition, fatigue), or interventions (e.g., use of biological response modifiers). Examples of symptom-focused, domain-specific measures include the Brief Pain Inventory [20], and the pain, fatigue, depression, sleep, and other symptom measures included in the Patient-Reported Outcomes Measurement Information System (PROMIS) and the Neurology Quality of Life (Neuro-QoL) measurement system. Disease-specific measures for MS are numerous, and include the Multiple Sclerosis International Quality of Life (MusiQoL) [21], the Multiple Sclerosis Quality of Life-54 (MSQOL-54) [22], the Functional Assessment of Multiple Sclerosis (FAMS) [23], the Multiple Sclerosis Impact Scale (MSIS-29) [24], the Patient Reported Impact of Multiple Sclerosis (PRIMUS) [25], the Hamburg Quality of Life Questionnaire in Multiple Sclerosis [26], the MS Quality of Life Inventory [27], the Multiple Sclerosis Impact Profile [28], the Leeds Multiple Sclerosis Quality of Life scale [29], the Disability and Impact Profile [30], and the RAYS scale [31]. Some PRO measures, like the Disability and Impact Profile, also incorporate patients’ perceptions of how important each effect of MS is on their lives. Generic PROs, usually normed against general/healthy populations, are appropriate for cross-disease comparisons, and are useful in resource allocation and cost-effectiveness analyses.
In contrast, targeted measures can provide more in-depth and comprehensive coverage of a specific domain or area, are thought to be more sensitive to changes in health status or function, and questions may seem more relevant and therefore more acceptable to patients [32]. This has led to recommendations for using a combined approach (generic and targeted) when feasible [33, 34]. The MSQOL-54, for example, includes the generic SF-36 along with 18 MS-specific questions. PROs also differ in whether they measure a single dimension of health or multiple dimensions. Multidimensional, or profile measures, typically provide separate scores for each dimension rather than a summary score. Some, such as the FAMS, provide both individual dimension scores and a total summary score.

Health Utility Measures

Health utility measures are a special type of PRO derived from economic and decision theory. Utility measures cover multiple health domains but provide a single summary score. This score reflects patient preferences for different health states, with higher scores signifying greater value or desirability. Scores typically range from 0 (death) to 1 (perfect health), although negative numbers (states worse than death) are possible. Utility scores can be used to compute quality-adjusted survival, measured in quality-adjusted life years (QALYs), by supplying the “quality” component of the equation. QALYs combine both mortality (survival) and morbidity (HRQoL) into a single index [35, 36], applicable across diseases and interventions. QALYs are frequently used in cost-effectiveness and similar analyses to help guide resource allocation. Commonly used generic utility measures include the EQ-5D [37], the Quality of Well-Being scale [38], and the Health Utilities Index [39, 40]. However, generic health utility measures have been criticized for being insensitive to treatment effects and/or failing to adequately cover all relevant dimensions for a given condition [41]. This has led to efforts to develop disease- or condition-specific utility measures such as the 15D [42] and the Multiple Sclerosis Impact Scale – Eight Dimensions [43, 44].
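The way a utility score supplies the “quality” component of a QALY can be illustrated with a minimal sketch. It assumes the simplest possible formulation (time in each health state multiplied by that state's utility, with no discounting), and the durations and utilities in the example are hypothetical values chosen only for illustration.

```python
def qalys(periods):
    """Sum utility-weighted time over a sequence of health states.

    periods: iterable of (years, utility) pairs. Utility is at most 1
    (perfect health); 0 represents death, and negative values represent
    states considered worse than death.
    No discounting is applied (a simplifying assumption).
    """
    total = 0.0
    for years, utility in periods:
        if years < 0:
            raise ValueError("duration must be non-negative")
        if utility > 1:
            raise ValueError("utility cannot exceed 1 (perfect health)")
        total += years * utility
    return total


# Hypothetical patient: 2 years at utility 0.8, then 3 years at utility 0.6.
example = [(2.0, 0.8), (3.0, 0.6)]
print(round(qalys(example), 2))
```

In this hypothetical case, 5 years of survival correspond to fewer than 5 QALYs because none of the time was spent in perfect health.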

Item Response Theory and PRO Assessment

In recent years, PROs have begun to incorporate modern measurement science [e.g., item response theory (IRT)] into their development. Utilization of IRT provides the instruments with certain advantages, including the ability to be brief while remaining precise and valid [45]. Using IRT methodology, sets of questions (items) are calibrated along a continuum that covers the full range of the construct to be measured. Once calibrated, any or all items in this “bank” can be used to generate a score. Users can select specific items to create “short forms” (SFs), typically consisting of 6 to 8 questions, that meet their measurement needs. For example, a user wishing to assess upper extremity function in a group of patients who tend to have poor or very poor fine motor abilities could create a custom SF composed primarily of items that target lower levels of upper extremity function and cluster near the lower end of that bank. Item banks are also the basis for computerized adaptive testing (CAT). This is a specialized type of computer-based testing in which, after the initial item is presented, the test administration algorithm selects each item to be presented based on the response to the previous item. Items are adaptively selected until either the desired level of score precision is achieved (typically after 4–6 items) or a predetermined maximum number of items (e.g., 12) has been administered. CAT allows for frequent and precise evaluation of patients at the individual level while placing minimal burden on patients [45,46,47,48]. Users can administer short, unique tests to every individual, with reliability and scores equivalent to longer, fixed-length assessments.
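The adaptive loop described above can be sketched in a few lines. This is a deliberately simplified illustration, not the algorithm used by any particular measurement system: it assumes dichotomous items under a two-parameter logistic (2PL) model, whereas PRO item banks generally use polytomous items; it estimates the score as a posterior mean on a grid with a standard normal prior; and the item parameters, the stopping threshold `se_target`, and the simulated respondent are all hypothetical.

```python
import math

GRID = [x / 10.0 for x in range(-40, 41)]  # trait levels from -4 to 4


def p_endorse(theta, a, b):
    """2PL probability of endorsing an item with discrimination a, location b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item at trait level theta."""
    p = p_endorse(theta, a, b)
    return a * a * p * (1.0 - p)


def eap(responses, bank):
    """Posterior mean and SD of theta given responses {item_id: 0 or 1}."""
    weights = []
    for theta in GRID:
        w = math.exp(-0.5 * theta * theta)  # standard normal prior (unnormalized)
        for item_id, x in responses.items():
            p = p_endorse(theta, *bank[item_id])
            w *= p if x == 1 else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)


def run_cat(bank, answer, se_target=0.3, max_items=12):
    """Administer items until the SE target or the item cap is reached."""
    responses = {}
    theta, se = 0.0, 1.0  # start at the prior mean
    while len(responses) < min(max_items, len(bank)):
        remaining = [i for i in bank if i not in responses]
        # Pick the unused item that is most informative at the current estimate.
        next_item = max(remaining, key=lambda i: item_information(theta, *bank[i]))
        responses[next_item] = answer(next_item)
        theta, se = eap(responses, bank)
        if se <= se_target:
            break
    return theta, se, list(responses)


# Hypothetical 21-item bank: equal discrimination, locations from -2.5 to 2.5.
bank = {f"item{k}": (2.0, -2.5 + 0.25 * k) for k in range(21)}
# Hypothetical respondent who endorses every item located below 1.0.
theta, se, used = run_cat(bank, lambda i: 1 if bank[i][1] < 1.0 else 0)
print(len(used), "items administered")
```

Because each item is chosen to be maximally informative at the current estimate, the items administered concentrate around the respondent's own level rather than spanning the whole bank, which is what allows a short test to remain precise.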

The Need for Common PRO Measures: Neuro-QoL and PROMIS

The variety of generic and MS-specific measures available for use in MS research has some benefits, but the lack of a set of standard measures also has significant disadvantages. For example, failure to use common measures results in an inability to combine and compare data across different studies, which slows the pace of discovery. Further, some available measures are of uncertain validity and were created without using modern test development methodology. This state of affairs led the National Institutes of Health (NIH), in the mid-2000s, to support the creation of 2 standard sets of PRO measures, 1 appropriate for use across neurological conditions (Neuro-QoL) and 1 for use across a broad range of chronic health conditions (PROMIS). These systems are described in more detail below.

Neuro-QoL (Quality of Life in Neurological Disorders)

Neuro-QoL is a clinically relevant, validated PRO measurement system to assess the HRQoL of adults and children with neurological conditions [49,50,51]. Neuro-QoL development was funded by the National Institute of Neurological Disorders and Stroke (NINDS) with the goal of creating a standard set of measures to assess outcomes relevant across common neurological conditions (generic), as well as outcomes relevant to specific patient populations (targeted) that would permit comparisons across neurological populations.

The development methodology used for Neuro-QoL, which included incorporation of patient input and utilization of IRT, is consistent with Food and Drug Administration guidance regarding PROs and the European Medicines Agency Reflection Paper on the use of HRQoL measures [52]. The adult Neuro-QoL consists of 12 item banks and 1 scale assessing important aspects of mental well-being (anxiety; depression; positive affect and well-being; cognitive function; emotional and behavioral dyscontrol; communication), physical well-being (upper extremity function – fine motor, activities of daily living; lower extremity function – mobility; sleep disturbance; fatigue; stigma), and social well-being (ability to participate in social roles and activities; satisfaction with social roles and activities). These areas were chosen using input from patients, caregivers, and clinical providers [53]. The item banks are the basis for standalone fixed-length SFs and CAT. The standard Neuro-QoL SFs, typically consisting of 8 to 9 items selected with clinical expert input, were validated in 5 major adult neurological conditions, including MS. With respect to MS, the SFs demonstrated good internal consistency, test–retest reliability, and concurrent and known-groups validity [54]. Some initial evidence of responsiveness to self-reported change was shown in the initial Neuro-QoL observational trial. Ongoing efforts (e.g., in the MS PATHS initiative, described later in this paper) to collect longitudinal Neuro-QoL data along with other disease characteristics will enable closer evaluation of responsiveness. Neuro-QoL data are reported as T-scores (mean of 50 and SD of 10) referenced to either general or clinical population samples. T-scores enable users to determine how far a given score is from “average”. For some measures, additional interpretative information regarding minimal values reflecting important change and cut points for different severity levels is available [55].
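Reading a T-score as a distance from “average” is simple arithmetic, sketched below. The function name is illustrative, and the example scores are hypothetical; whether a higher score is better or worse depends on the construct being measured, so the direction must be read from the documentation for the specific measure.

```python
def t_score_to_sd_units(t_score, mean=50.0, sd=10.0):
    """Distance of a T-score from the reference-sample mean, in SD units."""
    return (t_score - mean) / sd


# Hypothetical scores: 60 lies 1 SD above the reference mean,
# 35 lies 1.5 SD below it.
for score in (60, 35):
    print(score, t_score_to_sd_units(score))
```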

PROMIS

PROMIS began as an NIH Roadmap initiative to create a publicly available system for measuring generic PROs across diseases and conditions; it currently offers an extensive library of measures available in a range of flexible assessment options [56]. PROMIS item banks, with associated CATs and short forms, cover common symptoms, functions, behaviors, and feelings within the overarching domains of physical, mental, and social well-being. While some Neuro-QoL and PROMIS measures cover the same domains (in which case the Neuro-QoL measure is preferred since it was validated in neurological conditions), PROMIS dramatically expands the range of PROs that can be assessed. As with Neuro-QoL, PROMIS measures are IRT-based, were developed in a manner consistent with regulatory agencies’ PRO guidance, and have been shown to be valid, reliable, and responsive [57, 58]. Recently, a process to derive a generic preference-based summary score (PROMIS-Preference or PROPr) from PROMIS measures has been developed (http://janelhanmer.pitt.edu/ProPr.html) and is available for use.

Both Neuro-QoL and PROMIS measures offer several advantages. They are brief (~1.5 min per short form; ≤ 1 min per CAT [59]), available in a variety of administration formats (interviewer or self; paper-and-pencil, computer), and patient centered (e.g., they assess outcomes that patients identified as important). When using CATs, investigators can achieve precise scoring at the individual patient level while maintaining brevity. PDFs of all Neuro-QoL and PROMIS measures can be freely downloaded from HealthMeasures (www.healthmeasures.net), a resource for disseminating and supporting 4 NIH-supported measurement systems. The measures are also contained within data-collection tools that provide automated scoring and immediate data access [60]. All measures are available in English and Spanish, with additional translations available for individual measures (e.g., > 15 for fatigue). Finally, many Neuro-QoL and PROMIS measures are “linked” to each other through efforts of the PROsetta Stone (www.prosettastone.org) project. Linking provides equivalent scores for different measures of the same health outcome through an IRT-based mechanism of equating the instruments along a common measurement continuum [61]. Using tables provided by PROsetta Stone, users can convert scores on Neuro-QoL to scores on PROMIS and vice versa.

Neuro-QoL and PROMIS also have some limitations. They do not cover all domains relevant to MS, only some of the measures have been validated in MS, and the availability of translations varies across instruments. Further, additional work is needed to demonstrate sensitivity to change and the amount of change that is clinically significant.

Selecting a PRO

In selecting a PRO measure, the first step is to identify candidate PROs that have a conceptually valid link to the outcome(s) of interest. As noted above, however, this may result in a large number of PROs from which to choose. Fortunately, guidance on selecting PROs is available [62,63,64]. In general, it is important to consider the setting (e.g., clinic), the purpose for which the PRO will be used, and characteristics of the instrument itself. One should evaluate both the psychometric evidence for using a measure, as well as consider practical aspects such as ease of use (e.g., respondent burden, data collection options) and other logistics related to implementation [65]. The degree to which a measure conforms to regulatory guidelines for measure development, such as patient involvement in domain selection and question refinement, should be assessed. Francis and colleagues (2016) recently synthesized available guidelines and recommendations into a checklist that clinicians and researchers at varying levels of expertise can use to evaluate how well a PRO meets widely accepted criteria for selection: conceptual model, content validity, reliability, construct validity, scoring and interpretation, and respondent burden [66]. Cultural relevance, linguistic adaptation, and respondent literacy level may be particularly important in multinational or multiregional trials [10].

A NINDS initiative, the Common Data Elements (CDE) project (http://www.commondataelements.ninds.nih.gov), can also be helpful in measure selection. The overarching, ongoing goal of the CDE project is to standardize data collection in neurology clinical research, including collection of PROs such as ability to perform activities of daily living and HRQoL. NINDS CDEs for MS were released in 2011 and are periodically updated.

Use of PROs in MS Clinical Research

Assessment of PROs is of critical importance in research on the consequences of MS, as well as treatments for the disease. More effective DMDs also tend to carry a greater risk of treatment-related adverse events [67]. As the evaluation of PROs has increased in clinical studies, our understanding of how MS and MS treatments affect patients has also increased. In their seminal 2003 review article on the impact of MS on quality of life, Benito-Leon et al. [68] described the first published paper, which appeared in 1992 [69], that assessed how living with MS affects HRQoL. It demonstrated that individuals living with MS had more impaired HRQoL than those living with inflammatory bowel disease or rheumatoid arthritis. They noted that through the 1990s, research on the impact of MS was primarily limited to assessment of impairment and disability, especially as measured by the Expanded Disability Status Scale [70, 71]. In the decades that followed, a variety of PROs were incorporated into MS research and measured in varying ways. PROs have been used to demonstrate the negative impact of MS versus the general population (using the generic measure WHO-QOL) [72,73,74], and to show how PROs evolve over the course of MS through registries [75,76,77,78] (using the SF-12 [79] and the PROMIS-10) and in observational studies (using the SF-36, the Leeds Multiple Sclerosis Quality of Life scale, and the MusiQoL) [80,81,82]. PROs also provide unique insight into the negative effect MS symptoms have on aspects of well-being, including pain (assessed using the SF-36, the MSQOL-54, and the FAMS) [83,84,85]; cognition (using the SF-36, the FAMS, and the MusiQoL) [86,87,88]; sexual dysfunction (using the SF-36 and the MSQOL-54) [89,90,91]; and depression (using the SF-36 and the MSQOL-54) [92, 93].

PROs as Secondary Endpoints in Randomized Clinical Trials of DMDs

As reported by the Multiple Sclerosis Coalition, as of March 2017, 14 disease-modifying agents had been approved by the appropriate regulatory agencies. Of these 14, the PRO results from randomized controlled phase III trials for 3 of the medications reported before 2010 have been previously summarized [94, 95]. The PROs used in those studies included both generic and disease-specific measures, such as the generic Sickness Impact Profile [interferon (IFN)-β1b [96]] and SF-36 (natalizumab [97]), as well as the disease-specific FAMS (IFN-β1b [98]) and the MS Quality of Life Inventory (IFN-β1b [99] and IFN-β1a [100]).

To update this summary of PRO research in phase III clinical trials, the authors of this review conducted a literature search in March 2017 of PROs used in randomized phase III trials of MS DMDs. The search covered MEDLINE and PubMed from January 2011 to March 2017, using the search terms “multiple sclerosis”, “randomized trial”, “disease modifying therapy”, “quality of life”, “patient reported outcomes”, and “PROs”. The recent review of DMDs and PROs by Jongen [101] also aided in this search. These searches identified 5 phase III, randomized controlled trials of MS DMDs with PRO endpoints.

Two injectable medications have recently been approved using data from these trials: pegylated (peg) IFN-β1a [102, 103] and daclizumab [104, 105]. PegIFN-β1a 125 μg is administered subcutaneously every 2 weeks and has been shown to be effective relative to placebo and to pegIFN-β1a 125 μg administered every 4 weeks. The effect of pegIFN-β1a on PROs was assessed in the ADVANCE study by comparing MSIS-29 scores at week 48 with baseline MSIS-29 scores in the 3 treatment groups: placebo, pegIFN-β1a 125 μg every 2 weeks, and pegIFN-β1a 125 μg every 4 weeks [106]. In the placebo group, based on the MSIS-29 physical subscale, there was a statistically significant worsening from baseline to week 48 [mean 1.24, 95% confidence interval (CI) 0.05–2.44]. In the pegIFN-β1a every 2 and every 4 weeks groups, no statistically significant worsening was found on the MSIS-29 physical subscale, which had mean changes of 0.08 (95% CI −1.10 to 1.27) and 1.12 (95% CI −0.05 to 2.28), respectively. All 3 treatment groups were found to have a statistically significant improvement at week 48 from baseline on the MSIS-29 psychological subscale, with a mean change of −2.17 (95% CI −3.63 to −0.70), −2.06 (95% CI −3.58 to −0.53), and −1.70 (95% CI −3.24 to −0.15) in the placebo and pegIFN-β1a every 2 and every 4 weeks groups, respectively. The between-group differences in mean change from baseline were not statistically significant (p > 0.05) for MSIS-29 physical or psychological subscales, and were smaller than differences considered to be clinically meaningful.
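The within-group statements above follow from whether each 95% CI for the mean change from baseline excludes zero, which corresponds to significance at the two-sided 5% level. A minimal sketch of that reading, using the intervals quoted in the text (the group labels and function name are illustrative):

```python
def ci_excludes_zero(lower, upper):
    """True when a confidence interval lies entirely above or below zero."""
    return lower > 0 or upper < 0


# (mean change, CI lower, CI upper) on the MSIS-29 physical subscale,
# as quoted in the text; positive change indicates worsening.
physical = {
    "placebo": (1.24, 0.05, 2.44),
    "pegIFN every 2 weeks": (0.08, -1.10, 1.27),
    "pegIFN every 4 weeks": (1.12, -0.05, 2.28),
}

for group, (mean, lower, upper) in physical.items():
    verdict = "significant" if ci_excludes_zero(lower, upper) else "not significant"
    print(f"{group}: mean change {mean} -> {verdict}")
```

The same check applied to the psychological subscale intervals classifies all 3 groups as significantly changed, because each of those intervals lies entirely below zero. Note that a significant within-group change says nothing by itself about differences between groups, which is why the between-group comparison is reported separately.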

Daclizumab, the other recently approved self-injected DMD, is administered at a dose of 150 mg once monthly. The SELECT trial was designed to assess whether daclizumab high-yield process (HYP) at 2 doses (150 mg and 300 mg) was effective when given as monotherapy [104]. The MSIS-29, the SF-12, and the EQ-5D were included as tertiary PRO endpoints. Both doses of daclizumab HYP were superior to placebo on the primary endpoint and the majority of the additional endpoints. For the PROs, there was significant improvement in the mean MSIS-29 physical subscale score in the group receiving 150 mg (mean ± SD change score −1.0 ± 11.8) compared with placebo (mean ± SD 3.0 ± 13.5; p = 0.00082), whereas there was no statistically significant difference between 300 mg and placebo (p = 0.13). Similar patterns of significance were found for the MSIS-29 psychological subscale and the SF-12 physical and mental components. Only on the EQ-5D visual analog scale was there a significant difference between the 300 mg group and placebo (p = 0.015), although the p value was larger than that for the difference between 150 mg and placebo (p < 0.001). A post-hoc multivariate analysis of these data demonstrated that relapse occurrence and confirmed disability progression at week 12 were statistically associated with decline in PRO scores [107]. The second trial of daclizumab HYP, DECIDE, compared it with IFN-β1a in relapsing MS. Participants in DECIDE had greater disease activity than those in the SELECT trial [108]. Patients were randomized at a 1:1 ratio to receive 150 mg subcutaneous daclizumab and intramuscular placebo monthly or 30 μg intramuscular IFN-β1a and subcutaneous placebo monthly. The primary endpoint was annualized relapse rate. The MSIS-29 was included as a tertiary endpoint. The study was positive for daclizumab on the primary endpoint and MRI results, but there was no between-group difference on the MSIS-29.

Three oral MS DMDs, teriflunomide [109, 110], fingolimod [111], and dimethyl fumarate [105, 112], have recently been approved, and the pivotal trials include PRO data. During the initial teriflunomide phase III trial (TEMSO), eligible patients were randomly assigned to receive a once-daily oral dose of placebo, 7 mg teriflunomide, or 14 mg teriflunomide for 108 weeks [110]. The primary endpoint was reduction in annualized relapse rate. The Fatigue Impact Scale (FIS), a PRO, was included as a secondary endpoint [113]. The primary endpoint was met for both doses of teriflunomide. However, patients in all groups reported very limited change in fatigue from baseline and there were no significant between-group differences in those changes [teriflunomide 7 mg vs placebo (p = 0.39); teriflunomide 14 mg vs placebo (p = 0.83)]. The second teriflunomide trial, TOWER, was designed to provide additional safety and efficacy data [109]. Study duration was a fixed time point 48 weeks after the last patient was randomized. The trial had inclusion criteria, treatment assignments, and primary and secondary endpoints similar to those of the TEMSO study. In addition to the FIS, the SF-36 was included as a PRO in this trial. These PRO data were collected at week 48 and at the last study visit. In the primary analysis, teriflunomide 7 mg and teriflunomide 14 mg significantly reduced the annualized relapse rate compared with placebo. For both the Physical Component Summary (PCS) and the Mental Component Summary (MCS) of the SF-36, neither treatment differed from placebo during the 48-week treatment. There was no between-group difference from baseline to last visit on the SF-36 PCS score, whereas there was a significant between-group difference favoring teriflunomide 14 mg over placebo during the same time frame (p = 0.02) for the SF-36 MCS score.
There were no between-group differences on FIS score at week 48, whereas there was a significant between-group FIS difference favoring teriflunomide 14 mg over placebo from baseline to last study visit (p = 0.04).

The FREEDOMS II trial [111] was the third pivotal trial for fingolimod, following the TRANSFORMS [114] and FREEDOMS I [114] trials. It was designed to further assess the drug’s safety and efficacy at once-daily doses of 0.5 mg and 1.25 mg versus placebo. The primary endpoint was annualized relapse rate after 24 months of treatment. The PRO secondary endpoints included the EQ-5D, the PRIMUS, and the mFIS. Although the annualized relapse rate was lower in the treatment groups than with placebo (n = 778), there were no significant between-group differences for any of the PRO scores.

The effects of dimethyl fumarate on PROs in CONFIRM, a randomized placebo-controlled study, were evaluated in a secondary analysis of that trial [112]. The investigators used the SF-36 to assess between-group differences among dimethyl fumarate 240 mg twice daily, dimethyl fumarate 240 mg 3 times daily, and glatiramer acetate 20 mg once daily. Change in SF-36 scores over 2 years was the target PRO endpoint. At study’s end, a higher proportion of all subjects in the active treatment groups had a clinically significant change (≥ 5 points) in both SF-36 PCS and MCS. This change was also statistically significant for the dimethyl fumarate 3 times daily group.
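A threshold-based analysis of this kind can be sketched as a responder proportion: the share of patients whose score changes by at least the clinically significant amount. The sketch below assumes that the change of interest is an improvement (an increase, since higher SF-36 scores indicate better health) and uses hypothetical patient scores; it is not the analysis code from the trial.

```python
def responder_proportion(baseline, follow_up, threshold=5.0):
    """Proportion of patients whose score improved by at least `threshold`.

    baseline, follow_up: equal-length sequences of scores for the same
    patients, in the same order. Improvement is taken to be an increase.
    """
    if len(baseline) != len(follow_up):
        raise ValueError("baseline and follow_up must have the same length")
    if not baseline:
        raise ValueError("at least one patient is required")
    responders = sum(1 for b, f in zip(baseline, follow_up) if f - b >= threshold)
    return responders / len(baseline)


# Hypothetical component scores for 4 patients (changes: +6, 0, +6, +4).
baseline_scores = [40, 45, 50, 38]
year2_scores = [46, 45, 56, 42]
print(responder_proportion(baseline_scores, year2_scores))
```

Reporting the proportion of responders alongside the mean change helps distinguish a small shift shared by everyone from a meaningful improvement in a subset of patients.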

The CARE-MS I and CARE-MS II studies compared alemtuzumab to IFN-β1a in unblinded phase III trials treating relapsing-remitting MS. These trials included FAMS as the primary PRO and also included the SF-36 and EQ-5D. Results indicated significant and sustainable advantages to alemtuzumab on all 3 measures [115]. In both studies, patients on alemtuzumab significantly improved from baseline to month 6 on 5 of 6 FAMS components (mobility, symptoms, thinking and fatigue, general contentment, and emotional well-being), and improvements were maintained until month 24 (p < 0.05). Between-group comparisons of change in FAMS component scores favored alemtuzumab over subcutaneous IFN-β1a at multiple time points [mobility, symptoms, and thinking and fatigue (both studies), and general contentment (CARE-MS II only)]. Our search found no reports of PROs included as outcome measures in studies of the following injectable agents: IFN-β1a subcutaneous (Rebif), generic IFN-β1b subcutaneous (Extavia), glatiramer acetate or its generic equivalent (Glatopa), or the infusion drug mitoxantrone.

Given the variability of PRO data from these trials and the variety of measures implemented, it is not possible to reach any conclusion about the overall benefit of DMDs, or the relative benefit of one over another, on the well-being of individuals living with MS. As described earlier in this paper, the availability of standardized measures, including PROMIS and Neuro-QoL, that assess general well-being as well as discrete aspects (e.g., symptoms, function) of physical, psychological, and social well-being offers a systematic approach to assessing patient well-being in clinical trials and other research methodologies. Use of a standard measurement approach will allow for PRO comparisons across DMD trials and the determination of which DMDs are most appropriate for differing cohorts of patients with MS, i.e., for the purposes of comparative effectiveness research (CER), which is discussed in more detail below.

As the above review suggests, the primary endpoints in the majority of clinical trials are of 2 types. The first type comprises clinical endpoints such as annualized relapse rate or time to confirmed disease progression. The second type comprises biomarker endpoints that measure the level of ongoing disease activity. However, as the review also demonstrates, there has been growing recognition of the importance of patient-reported endpoints.

Expanding Research Approaches

Many countries use health technology assessments (HTA) to systematically evaluate the properties, impacts, and benefits/added value of health technologies, including medications [116, 117]. HTA results are used to inform healthcare policy and decision-making, such as reimbursement decisions and the establishment of clinical guidelines. HTA agencies often emphasize the involvement of patient and other stakeholder perspectives in the HTA process, including measurement of and consideration of HRQoL as part of a technology evaluation [118]. A related research approach that is gaining increasing acceptance in the USA, especially for chronic illnesses, is CER. As defined by the Patient-Centered Outcomes Research Institute (PCORI), which was created by the US legislation known as the Patient Protection and Affordable Care Act (ACA), CER includes research approaches that help patients and other healthcare stakeholders, including caregivers, clinicians, insurers, and others, make better-informed decisions about their health and healthcare [119]. CER methodologies include randomized clinical trials, large pragmatic studies, and large-scale observational studies. In 2015, PCORI convened an MS Stakeholder Workgroup that established the 5 top CER priorities [120]. These included: 1) What are the comparative benefits and harms of nonpharmacological and pharmacological approaches in relation to key symptoms (e.g., emotional health, fatigue, cognition, pain) in people with MS? 2) In people with progressive MS, what is the comparative effectiveness of different care delivery approaches (i.e., MS specialty center vs community neurology; direct care vs telemedicine; “specialized medical home” vs community neurology delivery of care) in improving outcomes such as functional status, quality of life, symptoms, emergency room use, and hospitalization?
3) Does an integrative model of care along with DMDs in newly diagnosed individuals affect disability progression and symptoms (physical, emotional, and cognitive) compared with treatment with DMD alone? 4) Among patients with MS receiving a DMD who experience disease activity, what are the benefits and harms of continuing the same therapy versus changing to a new medication? 5) What are the comparative benefits and harms of different disease-modifying therapies in newly diagnosed relapsing-remitting MS on disease activity, disease progression, symptoms, and quality of life?

Members of the PCORI MS Stakeholders Workgroup and others recommend the consideration of additional types of study endpoints, such as composites, in order to expand the concepts of interest beyond the commonly used clinical endpoints named above. One such composite, comprising 3 dimensions of MS disability, is the Multiple Sclerosis Functional Composite, which assesses ambulation, manual dexterity, and cognition, all of which are clinical outcome measures [121,122,123,124]. Another such composite, “no evidence of disease activity” (NEDA), is being put forth as the goal of MS treatment [11]. It currently consists of disability progression on the Expanded Disability Status Scale, relapse rate, and formation of MRI lesions, and may include any measure of disease activity. Many argue that this definition of NEDA is far too narrow and needs to include other parameters, including PROs [11, 125,126,127].
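The logic of a composite such as NEDA, and of the proposed broadening to include PROs, can be sketched as a simple all-criteria-met check. The field names, the `include_pro` switch, and the notion of a single “PRO worsening” flag are hypothetical simplifications introduced only for illustration; they do not reflect an agreed operational definition.

```python
from dataclasses import dataclass


@dataclass
class FollowUp:
    """Hypothetical summary of one patient's follow-up period."""
    relapses: int
    disability_progression: bool  # confirmed progression on the disability scale
    new_mri_lesions: int
    pro_worsening: bool = False   # clinically meaningful PRO decline (illustrative)


def has_neda(f, include_pro=False):
    """True when none of the composite's components shows disease activity."""
    clinical_and_mri_quiet = (
        f.relapses == 0
        and not f.disability_progression
        and f.new_mri_lesions == 0
    )
    if include_pro:
        # Broader variant: the patient-reported component must also be stable.
        return clinical_and_mri_quiet and not f.pro_worsening
    return clinical_and_mri_quiet


patient = FollowUp(relapses=0, disability_progression=False,
                   new_mri_lesions=0, pro_worsening=True)
print(has_neda(patient), has_neda(patient, include_pro=True))
```

In this hypothetical case the patient meets the narrower definition but not the broader one, which is the kind of discrepancy that motivates calls to add PROs to the composite.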

If the objective of MS care is NEDA, and it is to be demonstrated in a meaningful time frame, that outcome can most effectively be demonstrated by the large-scale observational studies proposed as one of the CER methodologies by PCORI. An effective way to develop the infrastructure for conducting methodologically rigorous observational studies is through the creation and maintenance of a “learning healthcare system” as is described in the Institute of Medicine’s “Best Care at Lower Cost. The path to continuously learning health care in America” [128]. The Institute of Medicine provides the following definition:

A learning health care system is one in which science, informatics, incentives, and culture are aligned for continuous improvement and innovation, with best practices seamlessly embedded in the care process, patients and families active participants in all elements, and new knowledge captured as an integral by-product of the care experience (Roundtable on Value & Science-Driven Health Care, 2012).

In order to support the CER approach, the Cleveland Clinic Mellen Center for Multiple Sclerosis Treatment and Research has partnered with 8 other MS comprehensive care centers to form MS PATHS, the first learning health system for MS [129]. The aim of MS PATHS is to create a network of healthcare institutions, leveraging technology and patient engagement to collect standardized, quantitative data on each patient attending a member institution at every follow-up clinical encounter. Willing patients who agree to share anonymized data are enrolled in the data collection process. This project is anticipated to use quantitative, multidimensional data to advance disease understanding, support effectiveness research, accelerate translational research, and provide outcomes data for value-based models of reimbursement. The computerized adaptive testing method of the Neuro-QoL assessment platform is the instrument used to obtain PRO data across the MS PATHS sites.

Conclusion

Implementing PROs has become an important consideration for a number of agencies involved in generating basic and clinical research, financing healthcare, and vetting endpoints that lead to drug and device approval. In the USA, the NIH and the NINDS have devoted considerable resources to the development and validation of PROs and intend their use as study endpoints in human subject trials they support [130]. PROs are an important component of the Affordable Care Act. Under that act, the Centers for Medicare and Medicaid Services have included PROs in their Quality Development Plan: Supporting the Transition to the Merit-based Incentive Payment System and the Alternative Payment Models [131] and its mandate to demonstrate quality in order to enhance reimbursement (pay for performance) [132].

Clearly, PROs are core outcomes of MS therapy today. Improvements in HRQoL and other PROs are possible, but not always obtained, with today’s MS treatments. Many measurement options, including generic, symptom-targeted, and MS-specific measures, are available to researchers and clinicians for assessing these outcomes. Available tools now include modern measures that utilize the assets of item response theory, including custom short forms and computerized adaptive testing, to tailor assessment of life domains that are important to people living with MS. Going forward, clinical trials and comparative effectiveness research can help determine which of the many therapies approved for treatment of MS is likely to be most effective for people on the outcomes they care most about.