Key Points

In a competitive market where interventions often demonstrate similar effects on primary endpoints, patient-centric endpoint data can be valuable drivers of differentiation. Many of these patient-centric endpoints are best measured with patient-reported outcomes (PROs).

The provision of care that is respectful of, and responsive to, individual patient experiences (preferences, needs, and values) constitutes one of the core principles of evidence-based medicine.

The focus on patient experiences differs between regulatory, payer, healthcare professional and patient stakeholders, with the regulator primarily interested in ‘first-order impacts’ (signs/symptoms) and their direct effects, the payer increasingly interested in health-related quality of life, and healthcare professionals and patients interested in a broader set of endpoints that may be used to aid clinical decision making. They also differ in their utilisation of, and preference for, generic and disease-specific PROs.

A robust, comprehensive and systematic endpoint strategy can be developed to meet the needs of all stakeholders in a single development programme for an intervention, if considered early in clinical development.

1 Introduction

The hurdles for the development and commercialisation of pharmaceutical interventions have become a mounting challenge [1]. In order to develop a commercially successful intervention, regulators (to provide marketing authorisation), payers (for reimbursement and formulary placement), healthcare professionals (HCPs; to prescribe) and patients (to adhere and persist) all need to consider the benefit:risk profile to be acceptable and, in doing so, generate a relative perception of value. The definitions of value often differ between these stakeholders, although all consider the patient’s perspective of an illness and its treatment, including domains such as signs and symptoms, physical functioning, occupational functioning and treatment satisfaction, to varying degrees when evaluating the efficacy, safety and comparative effectiveness of medical products, and when choosing between interventions.

The patient’s perspective of an illness and its treatment can be collected in a quantifiable and standardised manner using patient-reported outcome (PRO) instruments. A PRO is a measure of any aspect of a patient’s health status that comes directly from the patient, and is based on the patient’s perception of a disease and its treatment(s) [2]. PRO instruments can measure a range of concepts, including signs and symptoms of a condition, side effects of medication, physical and psychosocial impact of treatment on daily life, treatment satisfaction, adherence to treatment, health-related quality of life (HRQL), or a combination of these factors. An example conceptual model (Fig. 1) illustrates the relationship between the disease, treatment and such PRO concepts. In clinical trials evaluating the efficacy of treatments, PRO instruments are used as a primary outcome in some therapeutic areas (e.g. to evaluate analgesic products), and as a secondary or exploratory outcome in many other areas (e.g. myelofibrosis, diabetes). However, there is a lack of clarity about the most appropriate way to incorporate PRO instruments into the clinical development of pharmaceutical interventions in a way that will resonate with the four key stakeholder groups: regulators, payers, HCPs and patients.

Fig. 1 Example of a conceptual model. HRQL health-related quality of life, QoL quality of life

This article provides an overview of the similarities and differences in the manner with which the various stakeholders seek, use and interpret data about the patient’s experience. Furthermore, the article seeks to provide guidance on how to efficiently incorporate PRO instruments into a clinical development programme in order to maximise the validity and acceptability of the data generated, thereby ensuring adequate evidence relative to the perspective of the patient is provided to all stakeholder groups.

2 The Regulatory Stakeholders

Both the European Medicines Agency (EMA) Committee for Medicinal Products for Human Use (CHMP) [3] and the US FDA [2] have released guidance documents on the measurement of PROs in clinical research. Furthermore, many recent regulatory product development guidance documents for clinical/medical research include recommendations or statements regarding the use of PROs and/or PRO instruments [e.g. the 2014 FDA Guidance for Industry on Chronic Fatigue Syndrome/Myalgic Encephalomyelitis (Developing Drug Products for Treatment), and the 2014 EMA reflection paper on the use of PRO measures in oncology studies]. Clearly, the regulators recognise PROs as important outcomes for evaluating drugs, biologics, and medical devices, and PRO data have been used as primary and secondary endpoints to support primary biomarkers in both jurisdictions [4–6].

The FDA guidance on the measurement of PROs in clinical research focuses primarily on unidimensional PRO endpoints that are proximal to the physiological intent of the intervention (‘first-order impacts’; see Fig. 1). For this reason, most PRO label claims approved by the FDA in recent years have focused on the measurement of symptom severity or functioning, and are often the primary outcome of the therapy under consideration [7, 8]. Between 2006 and 2010, a total of 116 products were approved by the FDA, of which 28 received PRO label claims, with symptoms and functioning claims prominent (24 and 7 products, respectively) [8]. In the same period, 26 products were denied a PRO label claim [9].

The EMA regulatory guidance specifically focuses on HRQL, a multidimensional concept encompassing physical, psychological and social components [2]. The concept of HRQL is more distal to the physiological intent of the intervention (‘second-order impact’; Fig. 1). In part, HRQL may reflect a direct effect of improving signs and symptoms of disease and first-order impacts; however, it may also include aspects of the patient’s experience that are due to other factors (e.g. adverse reactions, inconvenience associated with administering/receiving treatment). Although this differential focus is marked, it is important to note that the EMA has approved signs/symptoms PRO endpoints for labelling (e.g. Jakavi for myelofibrosis), and the FDA has approved HRQL endpoints for labelling (e.g. Arcapta for chronic obstructive pulmonary disease). Similarly, both concepts appear in disease-specific product development guidance documents across the agencies: the measurement of signs and symptoms is referred to in 59% of EMA guidances and 86% of FDA guidances (e.g. allergic rhinitis/asthma, insomnia/sleep, incontinence), and HRQL in 46% and 21%, respectively (e.g. rheumatoid arthritis, oncology, psoriasis) [10].

Both the EMA [3] and the FDA [2] outline the scientific rigor that should be incorporated into the development and selection of PRO endpoints and instruments for use in label claims. However, beyond these label claims, PRO data are being used as supportive evidence and to aid interpretation of the meaningfulness of changes in non-PRO endpoints. For example, HRQL was measured as an exploratory endpoint in pivotal trials of mirabegron for overactive bladder. The primary and key secondary endpoints of the trial addressed various aspects of incontinence and micturitions, but the HRQL data were used by the sponsor to help interpret the meaning of the changes in the primary and key secondary endpoints for patients. Similarly, the total symptom score of the Myelofibrosis Symptom Assessment Form (MFSAF) was used as a secondary endpoint to support the primary endpoint of reduction in spleen size in the pivotal trial for ruxolitinib. Although an important secondary endpoint in its own right, the supportive nature of the MFSAF to the primary endpoint was, according to the Director of the FDA’s Office of Hematology Oncology Products, “… why we gave the application full approval. One could quibble about the importance of reduction in spleen size, but with reduction in all the symptoms, full approval was warranted” [5]. The agencies share some important common principles; namely, that PRO instruments must be reliable, valid, and interpretable. Further details on what scientific standards would qualify a PRO as ‘fit for purpose’ in regulatory approval are provided by the FDA [2]. In theory, both generic and disease-specific instruments may be used to support label claims in the US and Europe, although disease-specific instruments are generally preferred as they include items that are more responsive to clinical changes [3].

3 The Payer Stakeholders

Local, regional and national payers are tasked with making appropriate resource allocation, reimbursement and/or pricing decisions about new interventions so as to provide high-quality and efficient healthcare to patients. Payer decision-making methodology differs by country, region and plan, but all are interested in relative or comparative safety and efficacy and/or effectiveness; as such, trials against routine clinical care (often active drug therapy) are preferred where plausible. Health technology assessments (HTA) are used in a large number of countries to determine the value of new interventions, with clinical, humanistic and economic outcomes all evaluated to determine relative value.

Payer appraisal guidance documents often request the evaluation of humanistic outcomes, or ‘patient-relevant outcomes’ [11], through the measurement of HRQL. This allows HTA agencies to calculate the economic consequences of changes in both quantity and quality of life (QoL) for the purpose of calculating relative efficacy and cost utility (e.g. in England, Scotland, Australia, Spain, Brazil, Republic of Korea, Turkey) [12]. Cost-utility analysis determines value using quality-adjusted life-years (QALYs), a measure of health outcome that assigns to each period of time a weight corresponding to the HRQL during that period. As such, HRQL can be particularly useful in demonstrating product differentiation to payers where no difference on survival or other clinical outcome is observed [6]. For example, to some payers both in oncology [13] and non-fatal conditions (e.g. psoriasis, asthma) where survival is not affected by an intervention, an improvement in HRQL may be “as important as the improvements in efficacy endpoints” [14].
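The QALY arithmetic described above can be illustrated with a minimal Python sketch. The utility weights, durations and costs are hypothetical values chosen purely for illustration (they are not taken from any cited study), and the function names are our own. Real cost-utility analyses typically also apply discounting and characterise uncertainty, both of which are omitted here.

```python
from typing import Sequence, Tuple

# Each period is (duration_in_years, utility_weight), where the weight is the
# HRQL-based utility for that period (1.0 = full health, 0.0 = dead).
Period = Tuple[float, float]


def total_qalys(periods: Sequence[Period]) -> float:
    """Sum duration x utility weight over all periods."""
    total = 0.0
    for years, utility in periods:
        if years < 0:
            raise ValueError("period duration must be non-negative")
        total += years * utility
    return total


def incremental_cost_per_qaly(cost_new: float, qalys_new: float,
                              cost_old: float, qalys_old: float) -> float:
    """Extra cost per extra QALY gained with the new intervention."""
    qaly_gain = qalys_new - qalys_old
    if qaly_gain == 0:
        raise ValueError("no QALY difference; the ratio is undefined")
    return (cost_new - cost_old) / qaly_gain


# Hypothetical illustration: identical 3-year survival, different HRQL.
standard_care = [(2.0, 0.60), (1.0, 0.50)]
new_treatment = [(2.0, 0.75), (1.0, 0.70)]

qalys_standard = total_qalys(standard_care)
qalys_new = total_qalys(new_treatment)
# Hypothetical total costs per patient, in arbitrary currency units.
ratio = incremental_cost_per_qaly(10_000.0, qalys_new, 4_000.0, qalys_standard)

print(f"QALYs: standard {qalys_standard:.2f}, new {qalys_new:.2f}")
print(f"Incremental cost per QALY gained: {ratio:.0f}")
```

In this hypothetical case survival is the same in both arms, so the entire QALY gain comes from the HRQL weights, which is the situation in which HRQL data differentiate a product for payers.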

Although no formal incorporation of HRQL exists as yet in payer decision making in the US, interest in comparative effectiveness research [15] and patient-centred outcomes research [16] is increasing among payers. In a recent international survey of HTA agencies, HRQL data were valued more by government payers and independent HTA bodies in the US than among European payers [17]. A recent consensus statement from the American College of Cardiology and the American Heart Association encourages the consideration of cost-value methodologies (incorporating HRQL) when appraising performance measures [18].

To evaluate HRQL, multidimensional instruments with previously derived utility values (representing patient or societal perspectives) are preferred, such as the EQ-5D [19, 20], Health Utilities Index (HUI) [21] or the Short-Form–6 dimension (SF-6D) [22]. Unlike the regulator preference for disease-specific PROs, these measures are all generic. Using generic instruments allows for consistent measurement (including the calculation of QALYs) across indications [14] and transparency to the public in decision making, an important consideration in healthcare systems where payers are required to consider resource allocation across multiple indications. However, there are two key issues with the use of these instruments to evaluate HRQL:

  (i) HRQL is a composite latent construct incorporating physical, psychological and social components (see Fig. 1). Each of these components contains multiple constructs. For example, in type 1 diabetes, ‘physical’ may include the ability to engage in activities of daily living, ‘psychological’ may include the fear of hypoglycemia and depression, and ‘social’ may include the ability to engage in relationships and diabetes self-management in public (see Fig. 2). However, the EQ-5D, HUI and SF-6D focus narrowly on functional health status and do not incorporate the broad physical, psychological and social components that define HRQL.

    Fig. 2 Example of a conceptual model applied to type 1 diabetes

  (ii) Generic measures often demonstrate a further lack of specificity (missing domains specific to signs and symptoms of a given disease or condition). Consequently, these instruments may not demonstrate adequate content validity for the comprehensive evaluation of HRQL for a given disease or condition. In such situations, disease-specific instruments may be used, with mapping algorithms used for the generation of utilities [23]. In healthcare systems where resource allocation is made within indications, payers prefer the use of disease-specific instruments over generic instruments. Consequently, the European Network for HTA have recently advised researchers to include both a utility-based generic PRO instrument and a disease-specific PRO instrument to measure HRQL/health utility in their development programme [14].

Beyond HRQL, other PRO concepts are sometimes utilised in payer decision making; for example, treatment satisfaction may increase payer confidence in medication adherence and persistence [24], willingness to pay may increase payer-perceived benefits of therapy [25], and increased work/school productivity may improve the economic attractiveness of the intervention [26].

Regardless of which concepts are measured using PRO instruments, as with regulatory submissions, payers require that the PROs are scientifically credible, confirmed through evaluation of reliability, validity, responsiveness and acceptability [14, 27].

4 The Healthcare Provider Stakeholders

The provision of care that is respectful of, and responsive to, individual patient preferences, needs and values constitutes one of the core principles of evidence-based medicine [28, 29]. Accordingly, the patient’s perspective on their illness and its management is widely recognised as a key consideration in healthcare decisions in clinical practice. In order to examine the patient’s perspective, it is necessary to ask them directly about their experiences and expectations [2]. However, consultations are often necessarily brief and thus PRO data from trials may be helpful to further differentiate treatment options beyond efficacy and safety considerations alone. For example, recent meta-analyses have shown that there is little difference among available therapies for type 2 diabetes in terms of glycemic control [30, 31], although they do differ in side effect profiles (including hypoglycaemia, weight, nausea), safety concerns, mode, method and frequency of administration. The impacts of these differences are relevant outcomes from the perspective of people with diabetes [32] and should be used in treatment allocation as an adjunct to clinical decision making [28].

Data generated by PRO instruments considered meaningful by regulators and/or payers are likely to also be considered relevant in clinical practice [33]. However, HCPs may also be interested in PRO endpoints that have not been evaluated for label claims or payer decisions, such as specific items and subdomains of PROs which they believe to be relevant to clinical practice, or different concepts altogether. For example, the Effectiveness Guidance Document produced for oncology trials by the Green Park Collaborative [a forum of the Center for Medical Technology Policy (CMTP) in the US] calls for the essential measurement of 14 symptoms in trials of drugs developed for the treatment of advanced cancers, to “provide patients and clinicians with highly relevant information about these products that would not be generated from trials designed solely based on regulatory guidance” [34]. Additionally, non-symptom endpoints such as patient preference and satisfaction, ease of use, patient-led HCP contact time, school/work absenteeism or presenteeism, and time to therapeutic change could potentially factor into clinical decision making if relevant and meaningful data were available. Furthermore, prospective collection of adverse events via a PRO instrument assists benefit–risk decisions for pharmaceutical products in clinical practice [35]. PRO assessment has been shown to complement HCP reporting. For example, using the patient-reported version of the National Cancer Institute’s Common Terminology Criteria for Adverse Events (PRO-CTCAE), patient-reported assessments were a better reflection of daily health status when compared with HCP assessments (which better predicted unfavourable clinical events) [36].

Unlike many payers, most HCPs are rarely forced to make clinical decisions across indications (with the exception of patients with significant co-morbidities). As such, disease-specific PRO instruments are more meaningful to HCPs than generic PRO instruments [14]. Sometimes it may be necessary to use PRO instruments that may not be considered ‘fit for purpose’ by regulators or payers in order to enhance understanding for HCPs and use in clinical practice. For example, the Hospital Anxiety and Depression Scale (HADS) [37] is widely used among primary care practitioners and some specialists, with the score thresholds defining heightened anxiety and depression utilised for referral and treatment.

5 The Patient Stakeholders

Patients have responsibility for many healthcare decisions, including when to seek and whether to take medical advice, and whether to adhere to a prescribed intervention [7]. Evidence-based medicine supports the care of individual patients as a top priority [29] and patient involvement in clinical decision making increases empowerment, shown in turn to improve outcomes and valuation of these outcomes, especially when a variety of treatments exist [38–42]. A patient’s perspective often differs from that of the HCP [43, 44]. In order to encourage patient understanding and engagement with therapy, the Patient-Centered Outcomes Research Institute (PCORI) encourages researchers to “measure outcomes that people representing the population of interest notice and care about” [45].

Although regulators (particularly the FDA) focus on PRO evaluation of signs, symptoms and first-order impacts, it is often the more holistic picture of the impact of disease and the effectiveness of an intervention which is important to patients, particularly where cure or life extension is implausible [7]. There are many different PRO endpoints that could be collated to provide a holistic picture; however, with burden of administration being a consideration, QoL is a single endpoint that is of significant interest to patients. QoL is distinguishable from HRQL in that it is broader than just physical health: “an individual’s perception of their position in life in the context of the culture and value systems in which they live and in relation to their goals, expectations, standards and concerns” [46]. As such, an intervention can impact QoL (positively or negatively) without directly impacting health per se [47]. To understand this better, researchers may choose to evaluate both HRQL and QoL using complementary instruments. For example, in a trial of dyspepsia medication, both the 43-item QoL for functional dyspepsia disorders (FDDQL) [48] and the single-item Global Quality of Life Scale (GQLS) [49] could be used to understand the relationship between dyspepsia-specific HRQL and general QoL. However, it should be acknowledged that there is significant confounding in the measurement of QoL which cannot be easily accounted for in clinical research, and that any change in QoL over the course of a study is likely to be only partly related to health, with other demographic and psychosocial factors at play. Therefore, whilst perhaps ‘the ultimate outcome’ [50], it may not be an endpoint with which absolute attribution can be provided for the intervention of interest.

Both HRQL and QoL evaluation are now prominent on many patient advocacy websites and in social media [51]. Because such data rarely appear in product labels, they are often derived from scientific publications. They can help patients to indicate a desire to utilise an intervention, and can be used by sponsors to communicate to HCPs how a disease and treatment affect outcomes that are important to patients [52]. Although it may not be necessary to demonstrate that PRO instruments are ‘fit for purpose’ by regulatory/payer standards for use with these groups, the data must still adhere to good scientific principles such that they can be published with credibility [53].

6 Summary of Stakeholder PRO Needs

To develop commercially successful interventions in the era of evidence-based medicine, pharmaceutical companies need to produce data that are relevant and meaningful to the various stakeholder groups who play a role in determining the availability, pricing and use of interventions; namely, regulators, payers, HCPs and patients. Each of these groups would agree that the patient’s perspective is key in understanding the value of an intervention; endpoints that demonstrate how a patient survives, feels, or functions are important to all, but beyond these measures of treatment efficacy/effectiveness, their PRO focus differs. Based on an examination of the types of concepts included in product labelling, the US regulator (FDA) appears to be focused primarily on signs, symptoms and first-order impacts, evaluated using disease-specific instruments with shorter (e.g. <7 days) recall periods. Although the FDA rarely look beyond efficacy/effectiveness endpoints, the European regulator (EMA) considers outcomes more distal to the physiological intent of the intervention, i.e. those that may be considered as a direct effect of treatment efficacy, such as HRQL (and its constituent physical, psychological and social components). Importantly, HRQL is only considered as a relevant endpoint if efficacy and safety have been demonstrated on the primary endpoint (i.e. hierarchical testing).
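The hierarchical testing mentioned above can be sketched in Python as a fixed-sequence procedure, one common way of implementing such a hierarchy. This is an assumption on our part: the text does not specify which procedure is used, and the endpoint names, p-values and function name below are hypothetical.

```python
from typing import List, Tuple


def fixed_sequence_test(ordered_endpoints: List[Tuple[str, float]],
                        alpha: float = 0.05) -> List[str]:
    """Test endpoints in their prespecified order, each at the full alpha.

    Testing stops at the first endpoint that is not significant; endpoints
    lower in the hierarchy are then not formally tested, whatever their
    nominal p-value.
    """
    supported = []
    for name, p_value in ordered_endpoints:
        if p_value > alpha:
            break
        supported.append(name)
    return supported


# Hypothetical trial results, listed in the prespecified order.
results = [
    ("primary efficacy endpoint", 0.010),
    ("symptom score (PRO)", 0.200),
    ("HRQL (PRO)", 0.030),
]

print(fixed_sequence_test(results))
```

In this hypothetical example only the primary endpoint is supported: the HRQL result has a nominal p-value below 0.05 but sits below a non-significant endpoint in the hierarchy, so it cannot be formally tested.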

The payer too is increasingly interested in HRQL endpoints, which are evaluated alongside treatment efficacy/effectiveness to substantiate an incremental benefit versus standard of care (i.e. increased functionality). Generic utility-generating instruments and disease-specific PROs are used by different payers, depending on the payer system.

HCPs and patients are interested in a broader set of endpoints that may be used to aid clinical decision making, measured using disease-specific instruments. For patients, the primary interest is in QoL. For all audiences, proof of content validity of the instrument for the disease of interest is required as a prerequisite of its use, as well as a definition of clinically meaningful changes in PRO instrument scores.

One further stakeholder group worthy of note is the investor. Although the focus of many small companies funded by venture capital firms or private investors is simply to generate robust safety and efficacy data from phase I and II, PRO instruments can capture direct measures of treatment benefit, which can give investors and potential purchasers increased confidence that a product will be approved if phase III objectives are met.

7 Meeting the PRO Needs of Multiple Stakeholders in One Development Programme

For a new intervention to achieve commercial success, regulators (to provide marketing authorisation), payers (for reimbursement and formulary placement), HCPs (to prescribe) and patients (to adhere and persist) must all consider the intervention to be valuable. Based on the heterogeneous needs of the various stakeholders, the incorporation of PRO endpoints into clinical development programmes may seem overwhelming; however, it is possible with the adoption of a robust and systematic PRO strategy, considered early in clinical development. Considering PRO endpoints early in clinical development is key so that (i) there is sufficient time to develop, adjust, or confirm the psychometric measurement properties of a PRO instrument(s) in the disease of interest; (ii) the company can appraise the sensitivity of a chosen instrument in non-pivotal trials; and (iii) a more holistic understanding of the biopsychosocial impact of the drug can be provided prior to engaging in large-scale clinical development. The latter is particularly important in the era of patient-defined benefit–risk trade-offs in drug development, in which ensuring that clinical improvements are not achieved with detriment in PRO endpoints over the course of a clinical study (no worsening of scores within the group) will become increasingly important [54]. Although the inclusion of disease-specific PROs in early-phase research may not provide optimal return on investment if the target patient population is altered throughout the clinical development programme, inclusion of generic PRO instruments can still provide useful data and can be extremely valuable in decision making around the molecule.

By starting early, a comprehensive, well-conceived endpoint strategy to meet the various stakeholder needs can be developed. For example, developing an endpoint strategy for a type 1 diabetes programme would utilise the conceptual model displayed in Fig. 2 to include outcome measures for these concepts, as well as generating healthcare system outcomes. This is summarised in Table 1, in which up to 11 PRO concepts are considered for measurement. It may be perceived that collection of all 11 PRO concepts is too burdensome to the patient or clinical site, and too costly for the company; however, it is unlikely that a company would choose to measure all concepts in a single study. The selection of which PRO endpoints/instruments to evaluate in which trial will depend on programme-specific criteria, including the number of unstarted planned studies (phases II–IV), and study-specific criteria, including the specific study population, stage of clinical development, comparator therapy, and the intricacies of study design (e.g. blinding, duration of study, trial or non-trial study). For example, signs, symptoms, side effects and HRQL are arguably most reliably evaluated in a blinded trial, whereas preference and satisfaction may be better conceived as endpoints in an open-label trial. Some endpoints may not be relevant in phase III pivotal trials. For example, if a new medication is to be administered via intravenous injection, one may wish to demonstrate to regulators, payers and HCPs in phase II trials that positive biomedical data were not achieved at the expense of significantly elevated anxiety levels, or that associated anxiety of initiating injectable therapy dissipated within a few administrations. It may not be necessary to replicate this finding in phase III. HCP contact time and absenteeism, relevant for payer, HCP and patient stakeholders, are best evaluated outside of the confines of a protocol-mandated trial environment, in more naturalistic (pragmatic/observational) phase IIIb/IV studies.

Table 1 An example PRO endpoint strategy for a hypothetical type 1 diabetes therapy to meet the needs of the main stakeholder groups (X indicates where data may be useful)

Therefore, in developing a clinical programme, we urge consideration of the following key questions:

  (1) Which endpoints are relevant to the four key stakeholder groups for the population under investigation?

  (2) Are any of these endpoints most appropriately evaluated directly by the patient?

If the answer to the second question is ‘yes’, then we propose developing a PRO plan, such as that in Table 1, mapping the proposed PRO endpoints onto the relevant stakeholders. Once this is complete, existing ‘off the shelf’ PRO instruments can be evaluated in terms of whether they will be considered ‘fit for purpose’ by the appropriate stakeholder group, per above. Once appropriate PRO instruments have been selected/adapted/developed for use, the clinical programme should be carefully considered so as to apply the PRO instruments in studies/trials in which they are most suited. To reduce burden and cost of collecting multiple PRO endpoints within a trial, strategies such as electronic data capture, subgroup PRO completion, or administration of different PRO instruments to random subgroups of patients (provided there is sufficient power to do so) can be implemented.
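One of the burden-reduction strategies above, administering different PRO instruments to random subgroups of patients, can be sketched in Python as follows. This is an illustrative sketch only: the instrument names and patient identifiers are hypothetical placeholders, the function is our own, and a real trial would use a validated randomisation system and a power calculation for each subgroup.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence


def assign_instruments(patient_ids: Sequence[str],
                       core: Sequence[str],
                       rotating: Sequence[str],
                       seed: int = 0) -> Dict[str, List[str]]:
    """Give every patient the core instruments plus one rotating instrument.

    Patients are shuffled with a seeded generator and the rotating
    instruments are dealt out in turn, so subgroup sizes differ by at most 1.
    """
    if not rotating:
        raise ValueError("at least one rotating instrument is required")
    rng = random.Random(seed)
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)
    plan = {}
    for index, patient_id in enumerate(shuffled):
        extra = rotating[index % len(rotating)]
        plan[patient_id] = list(core) + [extra]
    return plan


# Hypothetical identifiers and instrument names, for illustration only.
patients = [f"P{n:03d}" for n in range(1, 11)]
core_instruments = ["symptom_diary"]
rotating_instruments = ["treatment_satisfaction", "work_productivity", "generic_utility"]

plan = assign_instruments(patients, core_instruments, rotating_instruments, seed=42)
subgroup_sizes = Counter(instruments[-1] for instruments in plan.values())
print(dict(subgroup_sizes))
```

Each patient completes two instruments rather than four, at the cost of a smaller sample for each rotating instrument, which is why the proviso about sufficient power matters.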

PRO data can be difficult to interpret. Although few stakeholders question the added value of PRO instruments, researchers need to provide additional clarity on the relevance of selected PRO endpoints and their place in the endpoint hierarchy (statistically controlled to reduce multiplicity), the appropriateness of the PRO instruments chosen to evaluate the endpoints, and the meaningfulness of the observed data [6]. The use of a responder definition, using various methods (both anchor-based and statistical) to report the proportion of patients who reported a clinically meaningful change, is useful to supplement mean change data [55]. For completeness, the cumulative distribution function of responses can be presented to allow a variety of minimally important changes (MICs) to be examined simultaneously and collectively [2, 56]. Interpretation can be enhanced through use of the PRO CONSORT extension when reporting PROs [53].
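The responder analysis and cumulative distribution approach described above can be sketched in Python. The change scores, thresholds and function names are hypothetical, and we assume that a positive change from baseline represents improvement. In practice, the threshold for a meaningful change would come from anchor-based or statistical methods, not from the arbitrary values used here.

```python
from typing import Dict, Sequence


def responder_proportion(changes: Sequence[float], threshold: float) -> float:
    """Proportion of patients whose change from baseline meets the threshold."""
    if not changes:
        raise ValueError("at least one change score is required")
    responders = sum(1 for change in changes if change >= threshold)
    return responders / len(changes)


def cumulative_response_curve(changes: Sequence[float],
                              thresholds: Sequence[float]) -> Dict[float, float]:
    """Responder proportion at each candidate threshold.

    Reporting the whole curve lets readers apply whichever minimally
    important change they consider appropriate.
    """
    return {t: responder_proportion(changes, t) for t in thresholds}


# Hypothetical change-from-baseline scores (positive = improvement).
active_arm = [12, 8, 5, 3, 10, -1, 7, 9, 4, 6]
control_arm = [2, 5, -3, 1, 6, 0, 4, -2, 3, 8]
candidate_thresholds = [0, 2, 5, 8, 10]

active_curve = cumulative_response_curve(active_arm, candidate_thresholds)
control_curve = cumulative_response_curve(control_arm, candidate_thresholds)

for t in candidate_thresholds:
    print(f"change >= {t:>2}: active {active_curve[t]:.0%}, control {control_curve[t]:.0%}")
```

Presenting both arms across all thresholds shows whether the treatment difference depends on the particular responder definition chosen.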

8 Conclusions

This article provides an overview of the similarities and differences in the manner with which various stakeholders seek, use and interpret data about the patient’s experience and from the patient’s perspective related to healthcare interventions. Furthermore, we have provided guidance on how to efficiently incorporate PRO instruments into a clinical development programme in order to maximise the validity and acceptability of the data generated, thereby ensuring adequate evidence relative to the perspective of the patient is provided to all stakeholder groups. The value in generating these data in a robust manner is that we can better ensure that the provision of care is respectful of, and responsive to, individual patient experiences, which constitutes one of the core principles of evidence-based medicine. Furthermore, inclusion of the patient’s perspective helps determine the appropriateness and ultimate effectiveness (and value) of an intervention for the patient, improving both the clinical and commercial success of an intervention.