Introduction

Interest in patient-reported outcomes measures (PROMs) in orthopaedic surgery traces back more than 100 years to Codman’s “End Results Idea,” but began to gain momentum approximately 25 years ago with the stated purposes of improving patient care and communicating the results of treatment [36, 51]. These early efforts catalyzed a shift in focus from the physician’s assessment of technical success of treatment to a patient-centered self-assessment of health.

Orthopaedic outcomes data have been used by the American Academy of Orthopaedic Surgeons (AAOS), the Agency for Healthcare Research and Quality, and others to develop clinical practice guidelines for common orthopaedic conditions. However, outcomes measures have not been collected widely during general clinical practice, although such data may be used to improve patient care and inform stakeholders, including patients, providers, payers, and policy makers, about the relative benefits of various musculoskeletal treatments. Stakeholders at all levels of the healthcare system are interested in analyzing outcome measures collected in clinical practice. Each stakeholder group represents a different perspective and needs different types of information that may be obtained and reported in different ways and at different levels, which presents challenges when conducting and applying clinical research. Barriers to outcomes data collection in general practice include increased burden for the patient, requirements for provider and staff time, high personnel and material costs, poor technical support, and a lack of any clear benefit to either the patient or the surgeon [82]. The challenges presented by these barriers are real but not insurmountable.

In this review, we discuss (1) PROMs definitions and applications as distinguished from conventional clinical research; (2) the perspectives of outcomes stakeholders; and (3) current challenges of PROMs data collection and methods that may enhance collection and reporting in clinical practice in the future.

Where Are We Now?

Conventional Clinical Research Versus PROMs

In conventional clinical research, the evaluation is performed by the clinician, whereas in PROMs, the result of care is evaluated by the patient. The historical focus of conventional clinical research in orthopaedics has been on the technical success of treatment for reducing impairment and restoring structure or function, as well as objective indicators such as mortality, morbidity, and complications. Forms and questionnaires used in these types of studies include such elements as joint stability, ROM, and the clinician’s assessment of the patient’s ability to walk or climb stairs.

In the PROMs framework, the focus is to reduce disability rather than just impairment, so the effects of treatment are described in terms of relief of symptoms, restoring or improving functional ability, ability to participate in typical social roles, and restoring or improving quality of life [13, 18, 89, 92]. PROMs typically include such domains as physical function, pain/symptoms, emotional function, well-being, ability to participate in activities or social roles, perceptions of health and function, and satisfaction with treatment.

Generic Measures, Utility, and Quality-Adjusted Life Years

PROMs can be described in two categories: generic measures and specific measures. Generic measures provide evaluation of health status or quality of life and enable analysis of data across a wide variety of disease states and injuries. For example, one can use a generic measure to compare the relative impact of different disease states (eg, diabetes, heart disease, depression). Various generic measure instruments have been described in the literature (Table 1). Of particular interest to orthopaedics, these generic measures can also be utilized to compare the relative impact on general health of various musculoskeletal conditions to the impact of nonmusculoskeletal conditions.

Table 1 Examples of generic health-related quality of life instruments

One way to compare diseases and injuries is to estimate utility scores, which are numeric values representing patient preferences for a health state. By definition, perfect health has a utility score of 1, and death has a utility score of 0. Several methods are available to estimate utilities, including the standard gamble [7, 63, 94, 95], VASs [33, 75, 96], multiattribute scales (eg, SF-6D, EQ-5D [formerly known as the EuroQol], Health Utilities Index) [9, 10, 29, 31, 34, 97], and the time trade-off [33, 75, 94, 96].

By way of example, with the time trade-off technique, a patient is presented the theoretical scenario of having 10 years of life remaining and asked to choose between living in current health for 10 years, or trading to live in perfect health for fewer years. If they choose 10 years in current health, indicating they would not give up any lifespan to gain perfect health, the utility is 1. If they are willing to trade some years of life to gain perfect health, utility is estimated by dividing the number of years in perfect health that they accept by 10. If a patient expresses no preference between 10 years in their current health and 6 years of life in perfect health, the estimated utility is 0.6. The impact of various orthopaedic and nonorthopaedic diseases and injuries on utility scores is demonstrated (Fig. 1).
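The time trade-off arithmetic described above can be sketched in a few lines of Python. This is a minimal illustration rather than a validated elicitation instrument; the function name `time_tradeoff_utility` and its parameters are assumptions introduced here, and the 10-year horizon simply mirrors the scenario in the text.

```python
def time_tradeoff_utility(years_in_perfect_health, horizon_years=10.0):
    """Estimate utility as the years in perfect health that the patient
    judges equivalent to the full horizon in current health, divided by
    the horizon. Names and defaults are illustrative assumptions."""
    if horizon_years <= 0:
        raise ValueError("horizon_years must be positive")
    if not 0 <= years_in_perfect_health <= horizon_years:
        raise ValueError("years_in_perfect_health must be between 0 and horizon_years")
    return years_in_perfect_health / horizon_years


# Patient is indifferent between 10 years in current health
# and 6 years in perfect health.
print(time_tradeoff_utility(6))   # 0.6

# Patient is unwilling to trade any years of life.
print(time_tradeoff_utility(10))  # 1.0
```

In practice the indifference point is usually found by iterating through several offers, a step this sketch leaves out.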

Fig. 1

A graph shows the preference-based utility scores for various orthopaedic and nonorthopaedic conditions. Data include tibial nonunion scores from 260 consecutive patients in our practice, knee osteoarthritis (OA) from Manheim et al. [59], and chronic hip OA from Bozic et al. [6]; all other scores are from Sullivan and Ghushchyan [91].

In one example of how to apply such information clinically, Bozic and Chiu [5] described a shared decision model under development which uses a patient’s time trade-off rating when deciding to undergo arthroplasty for osteoarthritis. The trade-off task has the patient express preference for living a reduced lifespan in excellent health versus living a full lifespan in the current (arthritic) state. The patient’s time trade-off measures are converted into a utility score, and that score is entered into a mathematical model that provides the probabilities of the possible outcomes for each treatment. The utility information is then used by the surgeon to discuss with the patient which option would be most likely to lead to the optimal outcome and quality of life for that particular patient [5].

Utilities can be used to estimate quality-adjusted life years (QALYs), which represent the quantity of years of life adjusted for the quality of the patient’s health state during those years. An individual QALY may be estimated by QALY = (utility of current health state) × (expected lifespan in current health state). Ten years of life in a health state with an estimated utility of 0.5 equals 5 QALYs (10 years × utility of 0.5), which has the same relative value as 5 years in perfect health (5 years × utility of 1.0). Estimating QALYs using utility measures and actuarial life tables allows one to quantify, and therefore compare, the relative impact of disease or injury across medical conditions.

A QALY provides a metric for comparing the relative effectiveness of various treatments for all types of medical conditions. If treatment affects utility but not the expected lifespan, change in QALY due to treatment = (utility posttreatment – utility pretreatment) × expected lifespan years. If treatment affects both utility and expected lifespan, change in QALY due to treatment = (utility posttreatment × lifespan years posttreatment) – (utility pretreatment × lifespan years pretreatment). One may also estimate QALYs using different utility values for different spans of time, as may occur with degenerative disease (eg, utility for Years 1–3 = 0.8, Years 4–6 = 0.6, Years 7–10 = 0.5) or adjust QALYs using discounting to account for the lesser value of future benefits, similar to discounting a future currency value to describe its lost buying power due to inflation [83, 106]. The estimated impact of various orthopaedic and medical treatments on QALYs has been reported (Table 2). For example, primary THA has been reported to result in 1.3 QALYs, which means that the average gain in quality of life is comparable to an additional 1.3 years of perfect health. While THA does not restore perfect health, the difference in preferences between the preoperative and postoperative states is great enough to cause a meaningful increase in QALY.
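The QALY calculations described in the two preceding paragraphs can be sketched as follows. This is an illustrative sketch only: the function names, the whole-year time steps, and the discounting convention (the first year is undiscounted, and year t is divided by (1 + rate)^t) are assumptions made here, and published studies may use other conventions. The pre- and posttreatment utilities of 0.5 and 0.7 are hypothetical values, not results from any cited study.

```python
def qalys(segments, discount_rate=0.0):
    """Sum utility-weighted life years over (utility, whole_years) segments.

    Assumed convention: year t (counting from 0) is weighted by
    1 / (1 + discount_rate) ** t, so a rate of 0 means no discounting.
    """
    total = 0.0
    year = 0
    for utility, years in segments:
        for _ in range(years):
            total += utility / (1.0 + discount_rate) ** year
            year += 1
    return total


def qaly_gain(utility_pre, utility_post, years_pre, years_post=None):
    """Change in QALYs due to treatment. If years_post is omitted, the
    treatment is assumed to leave expected lifespan unchanged."""
    if years_post is None:
        years_post = years_pre
    return utility_post * years_post - utility_pre * years_pre


# Example from the text: 10 years at utility 0.5 equals 5 years in perfect health.
print(qalys([(0.5, 10)]))  # 5.0
print(qalys([(1.0, 5)]))   # 5.0

# Degenerative course from the text: Years 1-3 = 0.8, 4-6 = 0.6, 7-10 = 0.5.
course = [(0.8, 3), (0.6, 3), (0.5, 4)]
print(round(qalys(course), 2))  # 6.2

# Same course with a hypothetical 3% annual discount rate (smaller total).
print(qalys(course, discount_rate=0.03))

# Hypothetical treatment raising utility from 0.5 to 0.7 over 10 years.
print(round(qaly_gain(0.5, 0.7, 10), 2))  # 2.0
```

Representing the health trajectory as a list of segments keeps the constant-utility case and the changing-utility case in one function.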

Table 2 Examples of quality-adjusted life years (QALY) estimates for various medical treatments

QALYs may be used in cost-effectiveness and cost-utility studies to compare the relative effects and costs of two or more treatments, or of treatment versus no treatment [8, 87, 101], although comparing QALYs across studies is challenging given the various methods used to estimate utility scores and compute QALYs [7, 73]. New treatment methods often have higher costs compared to the current standard treatment but would ideally also produce better results. One way to evaluate the cost of the improved outcome is to divide the difference in cost by the difference in QALYs between treatments to obtain the estimated cost per additional QALY, typically with both costs and QALYs discounted [83]; this value is called the incremental cost-effectiveness ratio or the cost-utility ratio [8]. Current convention suggests that USD 50,000 to USD 100,000 per QALY indicates good value for the new treatment [8], meaning that the increased cost may be justified by the associated increase in health-related quality of life. For example, Vitale and colleagues [103] reported a cost-effectiveness ratio for rotator cuff repair between USD 3091 and USD 13,093 per QALY, which was comparable to reported values for total hip arthroplasty (USD 8031/QALY) and coronary artery bypass graft (USD 14,300/QALY) and considerably better than medical treatment of hypertension (USD 28,000/QALY). Their results indicate that rotator cuff repair, total hip arthroplasty, and coronary artery bypass graft have similar costs per improvement in quality of life, and those procedures have lower cost than treatment of hypertension relative to the gain in quality of life.
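The cost per additional QALY described above is a simple ratio, sketched here in Python. The function name and all dollar and QALY figures in the example are hypothetical and chosen only to make the arithmetic easy to follow; they are not taken from any cited study.

```python
def incremental_cost_per_qaly(cost_new, cost_standard, qaly_new, qaly_standard):
    """Difference in cost divided by difference in QALYs between a new
    treatment and the standard treatment (illustrative sketch)."""
    delta_qaly = qaly_new - qaly_standard
    if delta_qaly <= 0:
        # With no QALY gain, the ratio has no useful interpretation here.
        raise ValueError("new treatment must yield more QALYs than the standard")
    return (cost_new - cost_standard) / delta_qaly


# Hypothetical: the new treatment costs USD 30,000 and yields 6.0 QALYs;
# the standard treatment costs USD 20,000 and yields 5.5 QALYs.
ratio = incremental_cost_per_qaly(30000, 20000, 6.0, 5.5)
print(ratio)  # 20000.0 (USD per additional QALY)
```

The sketch assumes the costs and QALYs passed in have already been discounted, if discounting is used.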

Specific Measures

Specific measures enable a more detailed assessment of outcomes related to a particular injury, disease, or anatomic region. Measures that are specific to a certain anatomic region (eg, lower extremity) or specific disease state (eg, inflammatory musculoskeletal conditions) focus on symptoms and functions that pertain only to that region or disease. Various specific measurement instruments have been described in the literature (Table 3).

Table 3 Examples of musculoskeletal-specific health-related quality of life instruments

The current recommendation calls for researchers to include at least one generic measure and one specific measure in their clinical outcomes data set. This approach will capture the advantages and mitigate the limitations of each measure (Table 4).

Table 4 Comparison of advantages and limitations of generic and specific outcomes measures

Measurement Properties

Reliability, validity, and responsiveness of outcomes measures are important properties — one cannot make meaningful conclusions with data that are imprecise, inaccurate, or insensitive to change. A valid measure is one in which the scores represent variation in the intended domain, implying it is measuring accurately what it is intended to measure. A measure with good reliability is one in which the scores recorded are both precise and reproducible. A responsive measure is one that is sensitive to small clinical differences. Meaningful data are those in which differences in scores represent important clinical differences.

These properties are specific to the populations and contexts in which they have been estimated. A measure that has been shown to be valid in one population and for one purpose cannot be assumed to be valid in a different population or for a different purpose. For example, using the IKDC Subjective Knee Evaluation form for active patients who are in their early 20s to evaluate outcome after surgical treatment of an AO/OTA B3.2 tibial plateau fracture may be appropriate, but the use of the IKDC form, which inquires about function such as running, jumping, and pivoting, may not be assumed to be reliable, valid, or responsive in a population of sedentary patients in their late 60s despite having the same injury and treatment.

Types of Studies

A variety of study designs are possible using PROMs, and it is important to clearly define the objective of any research effort well before the study commences. Collecting data without a clearly defined research objective or specific questions often leads to a failed project and wasted time and resources. For instance, lack of a clearly defined purpose or plan for using the collected data was a primary reason the AAOS’ Musculoskeletal Outcomes Data Evaluation and Management System (MODEMS™) was ultimately cancelled after failing to achieve its target enrollment despite extremely large investments of time, effort, and money [82]. Well-defined research questions should drive the design for data collection rather than hoping to develop meaningful hypotheses or conclusions after the fact.

PROMs can be used to study the impact of disease or injury on health status. At our institution, we have studied how health status is affected by shoulder conditions [38], knee conditions [69], and nonunions of various long bones [11, 12]. These studies show that many orthopaedic conditions are very disabling, resulting in physical functioning scores in the lowest quartile of the US population and well below many chronic disease states viewed as disabling. For example, our study of tibial nonunions in a consecutive series of 260 patients is currently undergoing peer review, but indicates that the physical health impact is far more disabling than for many other chronic diseases, including heart failure, diabetes mellitus Type II, asthma, and hypertension. The results of such studies allow comparisons of the effects of disease or injury on patients’ self-reported perceptions of their health state, including their ability to participate in their necessary activities and social roles, across many different types of health conditions.

PROMs can also be used to study the effects of treatment. Investigators at our institution have studied how orthopaedic surgery interventions improve health status following rotator cuff repair [37, 39], treatment of glenohumeral instability [40, 41], shoulder arthroplasty [27, 28], and various nonunion treatments [11, 12]. We have found the improvements in health-related quality of life after orthopaedic treatment to be quite dramatic.

For instance, we studied a consecutive series of 23 patients aged 60 years or older who had tibial nonunions. These patients were referred to us an average of 13 months after injury and had been offered amputation as a treatment option by one or more other physicians. After treatment using the Ilizarov method, AAOS Lower Limb Core Scores improved from 39 to 78, SF-12 physical component summary scores improved from 27 to 35, pain decreased from 3.6 of 10 to 0.9 of 10, and the patients gained an average of 5.3 QALYs [11]. In most cases, the AAOS Lower Limb Core and SF-12 physical component summary scores improved to near the age-specific normative values for the healthy population. Such studies can be very useful in demonstrating and comparing the effectiveness and value of orthopaedic surgery relative to other medical treatments.

Where Do We Need to Go?

A key issue for future work with collecting patient-reported outcomes measures is to identify well-defined research questions and what the ultimate uses of the data may be. Lack of a clear purpose and ultimate uses for data collection during the design and planning phases is likely to lead to wasted resources and poor-quality data. One way to ensure that the collected information will be of use to the end users is to engage the various stakeholders who may be interested in such data early in the process.

Patients and their families want to identify the best treatment for their medical condition. Clinical outcomes studies can inform patients about which treatment provides the best opportunity for recovery and may assist in setting realistic expectations, provided the outcomes are expressed in ways that are meaningful to the patient, such as describing restoration of basic abilities (eg, walking, reaching overhead) and return to usual daily activities and roles.

Providers seek to identify best treatments to improve quality of care for their patients. The provider perspective is informed by both clinician-based measures of the technical success of treatment, including impairment-based evaluation such as fracture healing rate or restoration of joint stability, as well as PROMs to capture the patient’s perspective. In addition, providers desire information that is applicable to their practice in terms of specific patient populations and practice characteristics.

Payers, including both private insurance and public agencies and institutions, seek to determine which treatment options provide optimal results for large groups of patients and large groups of providers. Optimal results include not only technical success and PROMs but also aspects such as complication rates, recovery time (eg, return to work time), and costs across and beyond the episode of care. This focus on group-level results rather than individual results requires information from multiple providers and, ideally, multiple practice types (eg, academic or hospital-based versus private practice), facilities, and geographic locations. Comparative effectiveness research, systematic reviews, and large state-based or insurance databases or registries may be used to provide such information.

Policy makers include local, state, and federal governments and agencies, as well as professional associations and organizations that establish and promote standards of care. Similar to payers, policy makers have an interest in identifying effective treatments but may also compare outcome measures and effectiveness across medical conditions, in some sense evaluating which conditions would be most beneficial to treat on a societal scale. For example, policy makers may be interested in not only determining the best treatment options for hip arthrosis but also the value of treating hip arthrosis relative to treating other health conditions. Desired measures of effectiveness typically include complications, recovery time, mortality, and disability and costs, as well as PROMs. Considering the scope of the desired information, these data may be best captured in institutional or national registries and data systems (eg, Medicare claims data [76], Healthcare Effectiveness Data and Information Set [HEDIS] [64]).

How Do We Get There?

An emerging issue in clinical research is how to collect outcomes measures that address as many of the needs of these consumers as possible. The wide array of information desired for decision making at various levels would be aided by identifying the purpose of data collection, including the objectives, research questions, and patient populations, and by instituting widespread collection of outcomes data from practicing clinicians. This encompassing approach to clinical outcomes research will provide evidence that can be used to identify best practices, benefits, costs, and patient or practice characteristics associated with outcomes.

Obtaining outcomes data on a wide scale from community providers requires that the data collection process be feasible in terms of cost, time, and effort for everyone involved, including the surgeon, the patient, the clinic staff, and the information technology services. The ultimate goal would be to make the collection of outcomes data for the target patients part of the routine office visit, part of the delivery of care process, instead of a costly burden or a hindrance to clinical efficiency. Features and implementation of such practice-friendly data collection systems have been described very recently in the literature [32, 42, 85]. Important issues to address when designing a data collection plan include desired outcomes (eg, clinical, administrative/financial, patient-centered), identified data users (eg, patients or advocacy groups, surgeons, payers), important patient and care-related characteristics (eg, severity of condition, comorbidities, surgical approach or implants), timing and amount of data collection, plans for mitigating attrition and missing data, and overall data management [82].

Our group practice created and implemented such a system in the late 1990s that continues to be used on a daily basis. Our objectives for the system were to make productive use of patient waiting time, require minimal staff assistance, minimize physician involvement in data collection, make use of technology to capture, store, and link data sources, and allow flexibility such that expansion or modifications to the system would require minimal resources.

After check-in at the front desk and before being taken to an examination room, a patient who is identified as part of an active study population is escorted to a private room that contains a touchscreen computer to complete the outcomes process, which eliminates the secondary data entry errors that occur with paper forms. The patient escort logs into the patient’s account in the system, which then shows the data entry screen. Most of the instructions for using the system are shown on the screen, thus minimizing the amount of staff time. The touchscreen information is available in English and Spanish. The patient is taken through a preprogrammed set of outcome instruments selected by the surgeon. Data collection takes 15 to 20 minutes, which is equivalent to the average time previously spent waiting in the clinic’s main waiting room or in the examination room in our practice.

Once the data entry is complete, the patient is escorted to the examination room, and the clinical visit proceeds. The data can be accessed immediately by clinic staff and the surgeon, if desired. The instruments are scored and then stored securely in a central database, which can be accessed and abstracted directly by the surgeon and research staff without involving information services staff. The system can be expanded to include any desired outcome measures and is coded to facilitate data abstraction and merging with the practice’s other information systems. After the initial build and hardware investment, this system utilizes primarily existing staff and clinic space and has very low maintenance cost. It has not added to the patient’s time burden or interfered with other functions of the clinic visit or the delivery of care.

An additional consideration in clinical research is whether an outcome measure provides the surgeon and patient with information they both value and has some use in patient care. Measures that have no clinical meaning are unlikely to be collected. Clinical and administrative demands of orthopaedic practice leave little time and few resources for activity that is unrelated to diagnosis, prognosis, evaluation of progress, or delivery of care. Patients are hesitant to provide information unrelated to their care. A valid pain assessment tool, for example, can provide information that may inform diagnosis, treatment, and recovery in meaningful quantitative and qualitative terms, serving patient care while providing data for research purposes.

With PROMs, describing other factors such as an individual patient’s age, sex, unique physiology, comorbidities, socioeconomic status, occupation, environment, and culture becomes more relevant. Different patients report different ratings for the same state of health, and these contextual issues may be related to this observed variability in outcomes and PROMs. Consequently, clinical research may need to include measures of such contextual information in analyses to aid interpretation. Clinicians can use this information to determine how closely a particular study’s sample represents the patients in their own practice, as well as whether the described treatment is appropriate for any particular patient. This relation of patient characteristics, values, and expectations to outcomes is central to the concept of evidence-based medicine and comparative effectiveness research.

Finally, patients who do not return for adequate followup after treatment and patients who fail to respond to followup surveys in cohort or surveillance studies create issues for data analysis and interpretation. Such patients often have worse health and outcomes and are more likely to have received treatment from another provider [52, 56, 67]. Consequently, ignoring the missing cases and analyzing only the data from those patients who completed followup or who responded to all surveys may result in better-appearing outcomes than may actually be the case.

National registries are one reliable way to capture the final disposition of patients lost to followup [81], although registries do not currently exist for tracking all orthopaedic diagnoses or surgical procedures. Also, registries have typically not included PROMs, but some recent work has shown that such measures can be added with little additional cost or patient or provider burden [2, 32, 72]. Other strategies for reducing attrition and missing data have been reviewed by the National Research Council and are briefly outlined in our companion article in this issue [65, 68]. One key is to focus on following only those patients who are necessary to address the research questions, which requires the questions to be well-defined at the onset of the data collection. Attempting to follow all of the patients of a practice, facility, or institution is usually infeasible and ineffective. Consequently, attaining complete followup in longitudinal studies is a major unresolved issue and challenge in outcomes research. Methodological investigations are currently underway to address this issue, such as calls for research to “develop innovative ways to reduce loss to followup as registries encompass longer time periods” [71]. The establishment and maintenance of a comprehensive patient-centered outcomes data collection system designed for following patients over extended periods of time (> 2 years) requires a large initial investment and a reliable source of funding to sustain operational support well into the future. To date, in the United States such resources are often unavailable or inadequate for the scope of the task.

The potential value and impact of PROMs data to patients, providers, payers, and policy makers have stimulated cooperation and planning among these parties. One example of a collaboration to improve outcomes data collection is the Patient-Centered Outcomes Research Institute (PCORI), whose support was authorized by the Patient Protection and Affordable Care Act of 2010 [71]. PCORI is an independent nonprofit organization governed by a board with representatives from patient groups, health care providers, insurers, industry, research, and government. The PCORI has five national priorities, one of which is “accelerating patient-centered outcomes research and methodological research,” the objectives of which are “improving the nation’s capacity to conduct patient-centered outcomes research, by building data infrastructure, improving analytic methods, and training researchers, patients and other stakeholders to participate in this research” [71]. Initiatives such as PCORI may help to resolve some of the challenges with systematic collection of PROMs, create efficient systems, and foster collaborations among advocacy groups, provider associations, industry, and regulatory agencies that may provide support and resources to minimize costs.

Discussion

Patient-reported outcomes research differs from the conventional approach to clinical research in orthopaedics, which has focused on restoring structure or joint function, mortality, and morbidity. Patient-reported outcomes research and PROMs focus on relief of symptoms, the ability to function in expected tasks and roles, quality of life, and satisfaction with treatment [13, 18, 90, 92]. This shift toward making the patient’s perspective central to the evaluation of treatment success presents a number of challenges to the clinician-researcher. Whereas most of the data needed for conventional clinical research can be obtained from standard clinical assessment and medical records, patient-centered outcomes research requires collecting the patient’s perceptions of symptoms, function, and quality of life throughout the course of care.

The scope and plan for data collection in patient-reported outcomes research should be defined by the specific research questions and other uses for the data. Collecting data without a specific, defined purpose often wastes resources and yields poor-quality or uninterpretable data. Research questions and uses are best developed by involving stakeholders at all levels of the healthcare system, including patients, providers, payers, and policy makers, who have vested interest in supporting systems that can provide information needed to support their decisions.

The data collection process must be feasible, efficient, and affordable for everyone involved, and examples of such systems have been described [32, 42, 85]. Keys to success are optimizing the use of patient time in the clinic, minimizing the surgeon’s direct role in collecting PROMs, using PROMs data in clinical decision-making, capitalizing on technology to collect, store, and link data sources, and adapting the system to accommodate different practices and needs. In addition, system design should include methods to minimize loss to followup and missing data, currently a major challenge for PROMs and comparative effectiveness research [65, 71]. Large-scale projects such as those organized by the PCORI aim to improve the capacity for patient-centered outcomes research in the United States, including developing and improving infrastructure, methods, and education to support the effort nationally.

We are now in a transition from conventional clinical trials and methods to use of PROMs and comparative effectiveness research, which aims to identify best evidence-based practice in general patient populations. PROMs aim to measure disease impact and evaluate treatment results from a patient-centered perspective. Many PROMs are available to measure generic health status and limb- or joint-specific function, although care is required to identify and select measures that have been validated for use in the orthopaedic patient population. To date, however, use of PROMs in routine clinical practice and decision-making is rare.

Where we need to go is identifying ways to collect PROMs and related data on a wide scale from community providers and patients who represent current orthopaedic surgery practice and the actual populations typically encountered in those practices. Issues of feasibility, what and how much data to collect, and costs may be best addressed by involving multiple stakeholders such as patients, surgeons, researchers, and payers in the design phase, focusing on only collecting information that is useful to all parties for making decisions or recommendations, and sharing resources to optimize return for all involved. Such structure has been implemented in registries, although most do not currently include collection of PROMs. Widespread use of PROMs by practicing community orthopaedic surgeons would likely provide the outcomes stakeholders, including patients, providers, payers, and policy makers, with valuable information regarding the effectiveness of orthopaedic surgery treatment [2].

How we will get there is through coordinated planning and the leveraging of current and future technologies that allow outcomes measurement to be feasible in nearly any clinical setting. Some examples of comprehensive integrated data collection systems have been recently described [32, 42], as has the use of various technologies to support the process [47, 88]. National initiatives in the United States (eg, PCORI) are working to develop systems and methods to improve data collection that will facilitate collection of PROMs on a large scale [71]. Incorporation of outcomes measurement into routine clinical practice to evaluate treatments as they are currently applied will allow comparative effectiveness research to determine the value of orthopaedic surgery to the overall healthcare system and population.