Introduction

Few factors are as important to the design, interpretation and impact of a trial as the choice of outcomes. Most clinicians now recognise the benefits of incorporating the views of patients alongside clinician-selected measures of biomedical efficacy. Patient-selected and patient-reported outcomes can complement traditional clinician measures of health outcomes, potentially improve patient engagement by reflecting real-world concerns, and assist with shared decision-making.

In this review, I discuss the rationale, practicalities and pitfalls of including patient-related outcome measures in clinical trials of effectiveness. I outline key definitions of the terminology used in this area in Table 1. An example of “patient-reported outcomes (PROs)” is fatigue, which is measured with “patient-reported outcome measures (PROMs)” such as “The Brief Fatigue Inventory”, which is a measure of the severity and impact of cancer-related fatigue and may be incorporated into a clinical trial [1].

Table 1 Terminology and definitions related to outcomes

Why Are Patient-Reported Outcomes Relevant?

Including PROs in research studies brings a wide range of benefits. Not least, some aspects of patient care such as symptoms and quality of life are best assessed directly by patients. Direct patient reporting avoids the observer bias that may be introduced when study personnel make judgments about patient symptoms [2]. Patients value and benefit from being involved with the research process and can be involved from the start of the study design [3,4,5]. Including the patient perspective also allows a more rounded interpretation of the treatment under investigation. It also increases the public accountability of healthcare researchers and professionals [6]. Response rates are usually better than for clinician-assessed outcomes (a patient only needs to complete a few questionnaires in a clinical trial, but a clinician would need to do so for every patient) [2]. Finally, PROs are critical for informing health services and planning for adequate resources during treatment [2, 6, 7].

Recognition of these benefits has led many agencies to mandate the use of PROMs in clinical trials and/or practice in multiple sectors, settings, and contexts. Both the US Food and Drug Administration [8••] and the European Medicines Agency [9] have released guidelines mandating the use of PROMs to support medication labelling claims. The National Health Service in the UK has mandated the use of PROMs for certain elective surgical patients for over a decade [10]. The Australian Commission on Safety and Quality in Health Care promotes the use of PROMs as part of their overall goal to improve value and sustainability within the healthcare system [11]. The Consumer-Purchaser Alliance [12], the Patient-Centered Outcomes Research Institute (PCORI) [13], and a range of other patient advocacy groups and initiatives now also promote the use of PROMs.

Finally, the role of PROMs has evolved and expanded. PROMs are now included in quality improvement projects, audits and financial reimbursement schemes, and are even being incorporated into daily care routines at the patient bedside [2, 6, 14]. The National Institutes of Health has developed the Patient-Reported Outcomes Measurement Information System (PROMIS), which monitors patient self-reported health status and experiences regularly using short computerised adaptive tests to facilitate the integration of PROMs in a wide variety of settings and contexts [15].

There are strong justifications for including PROMs in a variety of settings; however, the use of PROMs in research studies should still be carefully considered to avoid unnecessary cost, complexity and tokenism. Researchers, patients and other key decision makers should justify the use and choice of PROMs with a pre-specified hypothesis. This practice is promoted by the SPIRIT-PRO (trial protocol) and CONSORT-PRO (trial report) guidelines [16, 17]; however, to date, many studies have failed to follow it [18].

Choosing the Right Patient-Reported Outcome Measures (PROMs)

Once a decision has been made to incorporate PROs in a trial, the first step is to define the outcomes of interest. This facilitates the choice of PROMs. PROMs may be ‘generic’ or ‘specific’ in nature. Researchers can make use of generic PROMs across a range of clinical conditions, for example, the Health Utilities Index (HUI) [19]. Table 2 lists some commonly used generic PROMs. Specific PROMs are designed for use with a defined disease, population, symptom or function, which increases the PROMs’ credibility but reduces the opportunity to compare results across conditions and populations. Often both generic and specific PROMs are used together to combine the advantages.

Table 2 Commonly used generic PROMs

We can find overviews of available PROMs in systematic reviews of outcome measurement instruments and in published core outcome sets. The benefit of using systematic reviews is that they will have assessed all available instruments and each instrument’s quality in specific populations. The COSMIN (COnsensus-based Standards for the selection of health Measurement INstruments) Database for Systematic Reviews is a freely accessible resource for locating systematic reviews of PROMs [26, 27•]. Similarly, core outcome sets often provide recommendations on which PROMs should be used, and we can locate many through the COMET (Core Outcome Measures in Effectiveness Trials) initiative database [28]. If systematic reviews or core outcome sets are not available, then a search of the primary literature can be performed, assisted by the use of specific search filters [29]. Alternatively, some subscription-based services contain databases of PROMs [30].

When selecting PROMs, it is important that the PROMs are valid, reliable and clinically useful. While others have developed tools to assist with this assessment, such as the EMPRO tool [31], these tools are generally more helpful for health outcome specialists and methodologists who are involved in the development of health outcome measures for clinical trials. The assessment of validity, reliability and utility requires evaluation of both the questions included in a PROM and the supporting evidence/documentation provided by its developers. The following section briefly outlines key attributes that should be considered when selecting a PROM.

Reliability

Reliability (which includes internal consistency) reflects the ability of the instrument to produce the same scores on repeated administration in stable respondents (measurement error or test-retest reliability) and to differentiate between patients [32]. We should also check the PROM for inter-observer reliability, ideally producing a reliability coefficient over 0.75 [33]. A lack of reliability adds random error that can obscure true intervention effects, contributing to type II error.
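As a rough illustration of one facet of reliability, internal consistency is often summarised with Cronbach's alpha. The sketch below is an assumption rather than a method taken from the cited sources: the function name and item scores are invented, and the 0.75 cut-off is borrowed from the figure quoted above for inter-observer reliability purely for illustration.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("need at least two items")
    # Sample variance of each item across respondents
    item_variances = [variance([r[j] for r in responses]) for j in range(k)]
    # Sample variance of each respondent's total score
    total_variance = variance([sum(r) for r in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: 5 respondents answering a 4-item PROM
scores = [
    [3, 4, 3, 4],
    [1, 2, 1, 1],
    [4, 4, 5, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
]
alpha = cronbach_alpha(scores)
ILLUSTRATIVE_CUTOFF = 0.75  # assumption: reused from the text, not a rule for alpha
print(f"alpha = {alpha:.2f}; above cut-off: {alpha > ILLUSTRATIVE_CUTOFF}")
```

Test-retest and inter-observer reliability would normally be assessed with other statistics, so this covers only the internal consistency component.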

Validity

Validity refers to whether the instrument is measuring what we intend it to measure. We can further break this down into the following:

Content Validity

Content validity asks whether the PROM covers all the relevant and important aspects of the condition/symptom for which it is designed [33].

Construct Validity

We might design a PROM to measure a single construct (unidimensional) or multiple (multi-dimensional). We expect these constructs to have a relationship with other constructs, for example, the pain construct is related to the analgesic use construct. We may expect patients experiencing more severe pain to take more analgesics. Construct validity is assessed by comparing the scores produced by a PROM with sets of related variables/constructs [33]. To facilitate the interpretation of results, we should specify an expected level of correlation at the outset of studies. A correlation coefficient of ≥ 0.4 is usually considered acceptable [33].

Criterion Validity

Criterion validity is determined when a PROM is correlated with an external criterion, usually another instrument or measure that is regarded as a ‘gold standard’. When the correlation is explored at the same time, then it is described as ‘concurrent validation’. When the new measure is compared with a criterion that is measured later, this type of validation is called ‘predictive validation’. Depending on the area of patient-reported health measurement, a criterion or ‘gold standard’ measure may not exist. A correlation coefficient ≥ 0.8 is usually considered acceptable for criterion validity [33].
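The correlation thresholds quoted above for construct validity (≥ 0.4) and criterion validity (≥ 0.8) can be checked with a simple Pearson correlation. The following is a minimal sketch under stated assumptions: the variable names and all data values are hypothetical, and Pearson's r is used for simplicity even though a rank-based coefficient may suit ordinal PROM scores better.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


# Thresholds taken from the text; they should be pre-specified in the protocol
CONSTRUCT_MIN_R = 0.4
CRITERION_MIN_R = 0.8

# Hypothetical data for six patients
prom_pain = [2, 5, 7, 3, 8, 6]        # new PROM pain score
analgesic_doses = [1, 3, 4, 1, 6, 3]  # related construct: daily analgesic doses
gold_standard = [3, 5, 8, 3, 9, 6]    # established pain instrument

r_construct = pearson_r(prom_pain, analgesic_doses)
r_criterion = pearson_r(prom_pain, gold_standard)
print(f"construct r = {r_construct:.2f}, acceptable: {r_construct >= CONSTRUCT_MIN_R}")
print(f"criterion r = {r_criterion:.2f}, acceptable: {r_criterion >= CRITERION_MIN_R}")
```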

Responsiveness

The responsiveness of a PROM or ‘ability to detect change’ reflects its ability to distinguish among patients who remain the same, improve or deteriorate over time [33].
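The text does not name a specific responsiveness statistic. One commonly used option is the standardised response mean (SRM), the mean change score divided by the standard deviation of the change scores. The sketch below assumes that choice; the function name and the scores are invented for illustration.

```python
from statistics import mean, stdev


def standardised_response_mean(baseline, follow_up):
    """Mean change divided by the sample SD of change (paired scores)."""
    if len(baseline) != len(follow_up):
        raise ValueError("baseline and follow-up must be paired")
    changes = [after - before for before, after in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)


# Hypothetical PROM scores for four patients before and after treatment
baseline_scores = [10, 12, 14, 16]
follow_up_scores = [12, 15, 15, 20]

srm = standardised_response_mean(baseline_scores, follow_up_scores)
print(f"SRM = {srm:.2f}")
```

A larger absolute SRM suggests the instrument registers change more clearly relative to its variability, but it does not by itself show that the change matters to patients.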

Population Suitability

Finally, consider if the PROM is suitable for the study population or whether it needs to undergo a proper cross-cultural validation process [34, 35].

PROMs deficient in the above areas are unlikely to provide useful measures of treatment efficacy.

Challenges and Pitfalls

Some PROMs include a relatively large number of questions and so take a long time to complete. To reduce the time and cost of collection, analysis and presentation, many are now administered in an electronic format. Traditionally, it was considered that mailed surveys had higher response rates compared to electronic surveys, but more recent evidence suggests that this may not be true in some settings [36]. Electronically administered PROMs have the advantage of facilitating skip logic and computer-adaptive testing, as with the PROMIS system [15].
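Skip logic can be sketched in a few lines. The example below is purely illustrative: the three questions, their identifiers and the `show_if` convention are hypothetical and are not drawn from PROMIS or any published instrument.

```python
# Hypothetical questionnaire: a follow-up item is shown only when its
# "show_if" condition (question id, required answer) is satisfied.
QUESTIONS = [
    {"id": "pain_any", "text": "Have you had any pain in the past week? (yes/no)"},
    {"id": "pain_severity", "text": "How severe was your pain (0-10)?",
     "show_if": ("pain_any", "yes")},
    {"id": "fatigue", "text": "How tired have you felt (0-10)?"},
]


def administer(questions, respond):
    """Ask each applicable question via respond(question); return the answers."""
    answers = {}
    for question in questions:
        condition = question.get("show_if")
        if condition is not None:
            depends_on, required_answer = condition
            if answers.get(depends_on) != required_answer:
                continue  # skip: the follow-up does not apply to this respondent
        answers[question["id"]] = respond(question)
    return answers


# Scripted respondent standing in for a real data-entry screen
scripted = {"pain_any": "no", "pain_severity": 7, "fatigue": 3}
collected = administer(QUESTIONS, lambda q: scripted[q["id"]])
print(collected)
```

A respondent who reports no pain is never shown the severity item, which shortens completion time without creating a missing value that later has to be explained.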

It is important to try to achieve high rates of patient participation in vulnerable populations so that the results remain generalisable (maximise external validity). The very young, old or sick, and people from culturally diverse backgrounds may need extra help to ensure response rates remain high. Literacy levels vary depending on the population being studied but are low in most countries (see https://www.oecd.org/skills/piaac/ for further information) [37]. In general, written information should not exceed grade 6 level and inclusion of pictograms may improve PROM reliability [38]. Similarly, age-specific PROMs or an observer-reported outcome can be used for younger children, where it may not be suitable to use a PROM designed for a literate adult. To date, several innovative digital health platforms have helped to capture the child’s perspective of symptoms and shared decision-making [39, 40].

It is challenging to summarise, present and combine the results of PROMs which assess more than one construct (e.g. the Health Utilities Index assesses Emotion, Cognition and Pain among other outcomes) [19]. This becomes more difficult when PROMs disagree with each other, such as when a generic PROM suggests improved quality of life but the disease-specific PROM suggests a reduced quality of life. In addition, the results are often not readily interpretable. For example, on a scale where 1 represents perfect health and 0 represents death, what would a difference of 0.1 represent? This concept of the ‘minimum important difference’ should be defined in the trial publications [41]. A recently published paper outlines some of these guiding principles for analysing a PROM in cancer clinical trials; however, the principles apply more broadly and act as a useful resource for creating a statistical management plan [42•].
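To make the interpretation question concrete, the sketch below compares an observed between-group difference on a 0 to 1 utility scale with a pre-specified minimum important difference (MID). Everything numeric here is an assumption: the MID of 0.05 is a placeholder, since the appropriate value depends on the instrument and population, and the utility scores are invented.

```python
from statistics import mean


def interpret_difference(treatment, control, mid):
    """Return the mean difference and whether its magnitude reaches the MID."""
    difference = mean(treatment) - mean(control)
    return difference, abs(difference) >= mid


ASSUMED_MID = 0.05  # hypothetical; take the real value from the instrument's literature

# Hypothetical utility scores (0 = death, 1 = perfect health)
treatment_utilities = [0.72, 0.80, 0.65, 0.78, 0.70]
control_utilities = [0.62, 0.71, 0.55, 0.66, 0.61]

difference, is_important = interpret_difference(
    treatment_utilities, control_utilities, ASSUMED_MID
)
print(f"difference = {difference:.2f}; reaches assumed MID: {is_important}")
```

Statistical significance and importance to patients are separate questions, so a trial report would usually present both the confidence interval for the difference and its relation to the MID.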

When selecting PROMs, researchers often focus on efficacy outcomes, but we should not forget safety and adverse event-related outcomes. PROMs are well suited to detecting adverse events that other biomedical measures may not detect, as illustrated by the PRO-CTCAE (Patient-Reported Outcome Common Terminology Criteria for Adverse Events) tool [43].

Finally, as many PROMs are proprietary, there is often a cost to the researcher for the licensed use of a PROM. Enquiries need to be made to the owners of the relevant PROM before we include it in a research study.

Vignette

The following vignette shows some key steps related to using patient-reported outcome measures in clinical trials.

  • A researcher is conducting a trial of a novel therapeutic agent to treat osteoporosis in postmenopausal women. The researcher involves a patient representative in the design phase of the trial. Following discussion with all members of the research team, they decide that the primary trial outcome will be incidence of vertebral compression fractures. The patient representative suggests that quality of life is also important to assess, which the other researchers agree should be included given the potential side effect profile of the new medication.

  • A search of the COSMIN Database for Systematic Reviews finds a published review on ‘Patient-reported outcome measures in older people with hip fracture: a systematic review of quality and acceptability’ [44]. This review discusses a range of PROMs for assessing quality of life but one in particular looks relevant: OPAQ-2-Osteoporosis Quality of Life Questionnaire, version 2 [45]. This disease-specific PROM comprises 67 items completed via a self-reported questionnaire. It requires on average 20–30 min to complete. Reviewing the 67 items shows the questions are likely to be reliable, valid and responsive. The PROM has been previously validated for a similar population and so it is listed as a secondary outcome in the trial protocol, which is written following the SPIRIT-PRO guideline [17] and subsequently registered with the local clinical trials registry, e.g. ClinicalTrials.gov [46].

  • Training is subsequently provided to trial staff and management so that the PROM is administered in a standardised way across sites and routinely screened for avoidable missing data to maximise data quality and minimise risk of bias [18, 47].

  • When the results are obtained, they show a small but significant benefit for the new treatment according to the primary outcome but a moderate to large statistically significant improvement for the OPAQ-2 score. The research team present these results clearly to facilitate interpretation following the guidance of the CONSORT-PRO extension [16]. The positive results shown by the PROM subsequently facilitate the research team’s application for regulatory approval of the therapeutic agent.

Conclusions

The increased focus on shared decision-making and on understanding patient experiences and values has been fundamental to the greater use of patient-reported outcomes. The use of PROMs brings a range of benefits to both researchers and patients. While there are several issues to consider prior to their use, a range of resources are available to facilitate their inclusion in clinical research.