Introduction

The tools that orthopedic surgeons use to assess patient outcomes have changed substantially over time. In the first half of the twentieth century, surgeons typically reported their findings using anecdotal assessment or arbitrary variables [10, 15]. The next generation of instruments, such as the Harris Hip Score, included more consistent objective data that allowed for comparison between studies, but were characteristically completed by the physician and, therefore, subject to the physician’s bias [1, 9, 15, 16, 23, 31]. The most recent generation of tools, patient-reported outcomes (PRO) instruments, directly assess the patient’s own subjective interpretation of their outcome, a variable that many feel is a better measure of treatment success. PRO instruments today are continually being developed and streamlined using rigorous methodology to ensure validity and responsiveness, and are increasingly administered via electronic media [8, 21, 22, 26, 28].

As part of this ongoing development, the National Institutes of Health funded the development of the Patient Reported Outcome Measurement Information System (PROMIS), a large database of questions relating to various health domains. The questions, or items, in PROMIS are individually validated and developed using Item Response Theory so that they can be used alone or in combination with other questions within the same question bank. This contrasts with traditional fixed-length instruments, which need to be completed in their entirety to be valid [3, 6, 14]. Each item also has a known relationship to other items, so that the patient’s response to the first item can be recorded and, from that response, the most informative second item can be determined and presented to the patient. This process is repeated after each item until a predetermined precision level is reached. In practice today, this process is administered using computers and is known as computer adaptive testing (CAT) [2, 3, 6, 7, 12, 14]. CAT allows for the custom-tailored assessment of each patient using the fewest questions necessary while maintaining measurement precision.
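The adaptive loop described above can be illustrated with a deliberately simplified sketch. It is not the PROMIS algorithm: it assumes dichotomous (yes/no) items under a two-parameter logistic model, whereas PROMIS items have multiple response categories, and the item bank, the precision target, and the item limits are all hypothetical values chosen for illustration.

```python
import math

# Hypothetical item bank: (discrimination a, difficulty b) under a 2PL model.
ITEM_BANK = [(1.8, -2.0), (2.0, -1.0), (2.2, 0.0), (2.0, 1.0),
             (1.8, 2.0), (1.5, -0.5), (1.5, 0.5), (2.5, 1.5)]

# Grid of candidate trait levels (theta) from -4 to 4.
GRID = [i / 10 for i in range(-40, 41)]


def prob_endorse(theta, a, b):
    """Probability of endorsing an item at trait level theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item at trait level theta."""
    p = prob_endorse(theta, a, b)
    return a * a * p * (1.0 - p)


def posterior_mean_and_sd(responses):
    """Grid-based posterior mean and SD with a standard normal prior.

    responses is a list of (item_index, answer) pairs, answer being 0 or 1.
    """
    weights = []
    for theta in GRID:
        w = math.exp(-0.5 * theta * theta)  # unnormalized N(0, 1) prior
        for idx, answer in responses:
            p = prob_endorse(theta, *ITEM_BANK[idx])
            w *= p if answer else (1.0 - p)
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)


def run_cat(answer_item, se_target=0.3, min_items=2, max_items=6):
    """Administer items until the SD of the estimate reaches se_target.

    answer_item is a callable taking an item index and returning 0 or 1.
    The stopping values are illustrative, not the PROMIS defaults.
    """
    responses = []
    theta, se = posterior_mean_and_sd(responses)
    while len(responses) < max_items:
        if len(responses) >= min_items and se <= se_target:
            break
        used = {idx for idx, _ in responses}
        candidates = [i for i in range(len(ITEM_BANK)) if i not in used]
        # Pick the unused item that is most informative at the current estimate.
        best = max(candidates,
                   key=lambda i: item_information(theta, *ITEM_BANK[i]))
        responses.append((best, answer_item(best)))
        theta, se = posterior_mean_and_sd(responses)
    return theta, se, [idx for idx, _ in responses]


# A hypothetical high-functioning respondent who endorses every item.
theta, se, items = run_cat(lambda item_index: 1)
```

Each response updates the estimate, and the next item is chosen where it is most informative for that estimate, so the test length varies between respondents.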

Some of the advantages of PROMIS CATs that have been demonstrated to date include high content validity and good responsiveness to change [13], high reliability [12], and improved efficiency [2, 12, 14]. Initial work has demonstrated that PROMIS CATs have many favorable characteristics, but they have primarily been studied in relatively lower functioning populations, and the potential for ceiling effects has been suggested [6]. Currently, there is little work examining populations of higher function, and the characteristics of PROMIS CAT in such groups remain unknown.

Currently available PRO instruments include generalized instruments such as the Short Form Health Survey (SF-36) [19, 30] or the EuroQol 5-Dimensions Questionnaire (EQ-5D) [24], which are used as overall surveys of health. There are also more focused instruments, such as those specific to an anatomical site, examples of which include the Knee Injury and Osteoarthritis Outcome Score (KOOS) [11, 25] and the Marx Knee Activity Rating Scale (Marx) [17, 18]. Both general and specific instruments have proven useful for measuring patient outcomes, but some are burdensome to administer and complete because of their high numbers of questions (e.g., greater than 30 questions for the KOOS and SF-36).

This study assesses the physical function domain of PROMIS CAT (PROMIS PF CAT) in a higher functioning cohort indicated for ACL surgery for the following purposes: (1) to identify and quantify any correlations between the scores of PROMIS PF CAT and current knee PROs or their subscales that measure physical function; (2) to evaluate PROMIS PF CAT’s test burden; and (3) to determine if PROMIS PF CAT has any floor or ceiling effects in this population.

Materials and methods

All patients who were indicated for operative management of an ACL injury by any one of five participating surgeons at a single university outpatient clinic over a 10-month period beginning January 2015 were approached for the study, and those who consented were enrolled. Any patient undergoing a significant simultaneous operation, including microfracture, meniscus repair, osteotomy, or osteochondral allograft, was excluded. Each patient completed the PROMIS PF CAT, EQ-5D, KOOS, Marx, and the SF-36 subscales of physical function, general health, and pain in random order on a computer kiosk during their preoperative clinic visit. Both generic and anatomical site-specific PRO instruments were included to evaluate both divergent and convergent construct validity. Institutional Review Board approval was obtained for this study (University of Iowa IRB #201609839).

Statistical analysis

Patient demographic data were recorded. The Shapiro–Wilk test was used to evaluate the normality of each PRO instrument’s score distribution. All distributions except that of the KOOS Symptoms subscale were non-normal, and thus Spearman correlation coefficients were chosen to identify associations between PROMIS PF CAT and the other instruments, with statistical significance set at p < 0.05.
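For readers unfamiliar with the rank-based coefficient, the following minimal sketch shows how a Spearman correlation can be computed: scores are converted to ranks (tied scores share their average rank) and a Pearson correlation is taken on the ranks. The function names are hypothetical, the example uses only the Python standard library, and it is not the software used in this study.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        shared = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))
```

Because only ranks enter the calculation, any monotonic relationship between two instruments yields a coefficient of magnitude 1 regardless of the shape of the score distributions, which is why the coefficient suits non-normal data.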

The strength of a correlation was categorized as high (≥ 0.7), high-moderate (0.61–0.69), moderate (0.4–0.6), moderate-weak (0.31–0.39), or weak (≤ 0.3) [27]. Construct validity was evaluated using the correlations between PROMIS PF CAT and the other instruments, which either measured physical function (convergent validity) or measured some other domain (discriminant validity).
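The categories above can be written as a small lookup. This is an illustrative sketch with a hypothetical function name; because the published bands are given to two decimals and leave small gaps (for example, between 0.60 and 0.61), the sketch assumes that a value falling in a gap belongs to the higher band, and that the sign of the coefficient is ignored.

```python
def correlation_strength(rho):
    """Map a correlation coefficient to the strength category used here.

    Assumption: values between published bands (e.g. 0.605) are assigned
    to the higher band; the source does not specify how gaps are handled.
    """
    magnitude = abs(rho)
    if magnitude >= 0.7:
        return "high"
    if magnitude > 0.6:
        return "high-moderate"
    if magnitude >= 0.4:
        return "moderate"
    if magnitude > 0.3:
        return "moderate-weak"
    return "weak"


# Coefficients reported for SF-36 PF, KOOS Sport, and KOOS ADL.
labels = [correlation_strength(r) for r in (0.82, 0.70, 0.74)]
```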

A sample size of 36 was estimated to provide 80% power, with a two-sided test and an alpha level of 0.05, to distinguish a moderate correlation (0.6) from a weak correlation (0.2).
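The source does not state how this estimate was derived. One common approach, assumed here purely for illustration, uses the Fisher z-transformation, under which the required sample size is n = ((z_alpha/2 + z_beta) / (atanh(r1) − atanh(r0)))² + 3. The sketch below, with a hypothetical function name, applies that formula using the Python standard library.

```python
import math
from statistics import NormalDist


def sample_size_two_correlations(r_alt, r_null, alpha=0.05, power=0.80):
    """Sample size to distinguish correlation r_alt from r_null.

    Uses the Fisher z approximation; this is an assumed method, not
    necessarily the one used by the study authors.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    effect = math.atanh(r_alt) - math.atanh(r_null)
    return math.ceil(((z_alpha + z_beta) / effect) ** 2 + 3)


n_required = sample_size_two_correlations(0.6, 0.2)
print(n_required)  # prints 36
```

Under these assumptions the formula gives approximately 35.6, which rounds up to 36, in agreement with the estimate reported above.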

Floor or ceiling effects were calculated as the percentage of patients who obtained the lowest or highest possible score on a given instrument, respectively, and were considered significant if they were at least 15% [29].
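As an illustration of this definition, the sketch below computes the percentage of respondents at the minimum and maximum possible scores and flags either as an effect when it reaches 15%. The function name, the 0–100 scale, and the example scores are hypothetical and are not study data.

```python
def floor_ceiling_effects(scores, min_score, max_score, threshold_pct=15.0):
    """Percentage of respondents at each scale extreme, with effect flags."""
    n = len(scores)
    floor_pct = 100.0 * sum(1 for s in scores if s == min_score) / n
    ceiling_pct = 100.0 * sum(1 for s in scores if s == max_score) / n
    return {
        "floor_pct": floor_pct,
        "ceiling_pct": ceiling_pct,
        "floor_effect": floor_pct >= threshold_pct,
        "ceiling_effect": ceiling_pct >= threshold_pct,
    }


# Hypothetical cohort of 100 on a 0-100 scale: 15 respondents at the maximum.
example_scores = [100] * 15 + [70] * 85
result = floor_ceiling_effects(example_scores, min_score=0, max_score=100)
```

In this made-up example, exactly 15% of respondents sit at the top of the scale, so a ceiling effect is flagged under the "at least 15%" rule while no floor effect is present.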

The PROMIS PF CAT was considered complete once a patient had answered enough items to reach a predefined level of precision under the default administration rules for PROMIS [14]. The number of items required for completion of the PROMIS PF CAT was recorded for each patient.

Results

One hundred patients were enrolled (45% women; 55% men). The mean (± SD) age was 26.4 ± 9.2 years and mean BMI was 27.2 ± 6.2 kg/m².

Table 1 shows the correlations between PROMIS PF CAT scores and scores of the other PRO instruments. Correlations were high for each of the other PRO instruments/subscales aimed at measuring physical function, including SF-36 PF (r = 0.82, p < 0.01), KOOS Sport (r = 0.70, p < 0.01), and KOOS ADL (r = 0.74, p < 0.01). With the exception of EQ-5D, all correlations that reached statistical significance between PROMIS PF CAT and instruments not specifically aimed at physical function were moderate. The correlation between EQ-5D and PROMIS PF CAT was high (r = 0.70, p < 0.01). PROMIS PF CAT scores were not significantly correlated with those of the Marx or SF-36 GH.

Table 1 Correlations between PROMIS PF CAT and other PROs

PROMIS PF CAT demonstrated no floor or ceiling effects, with no patients attaining either the highest or lowest score (Table 2). Both the SF-36 GH and Marx demonstrated significant ceiling effects, with 15.0% and 42.0% of patients, respectively, scoring at the extreme high end of these instruments.

Table 2 PRO instrument ceiling and floor effects

Under the default rules, participants answered a minimum of 4 questions for PROMIS PF CAT. Ninety-four percent of the 100 enrolled patients were asked only the minimum of 4 items; the mean number of items answered was 4.2 ± 0.9 (range 4–11).

Discussion

The most important finding of the present study is that PROMIS PF CAT scores correlate highly with currently used knee PRO instruments or their subscales that are designed to measure physical function, including KOOS Sport, KOOS ADL, and SF-36 PF, and correlate less strongly with those not specifically aimed at measuring physical function. Commenting on related work, other authors have highlighted the need to establish this correlation before PROMIS PF CAT can make the jump into clinical usefulness [14]. The findings herein indicate that PROMIS PF CAT maintains construct validity and likely provides information similar to that provided by some of the traditional physical function PRO instruments compared here.

Previous work has suggested that the PROMIS PF CAT has several positive traits, including high content validity and reliability, good responsiveness to change, and improved efficiency that can potentially reduce test fatigue and facilitate data collection [2, 5, 6, 12, 13, 14, 20]. The finding of high correlations with other PF instruments herein builds on previous work by demonstrating high construct validity in addition to the characteristics described by others.

The current study also indicates that, in the ACL injury population, PROMIS PF CAT is inclusive with no floor or ceiling effects, despite early suspicion to the contrary [6] and in accordance with more recent work [12, 14]. Instruments that cover all patients, including those at the extreme high or low end of functioning, are essential to the applicability of these tools. This study suggests that PROMIS PF CAT performs well in this regard in this specific population. Evaluation of the responsiveness to change and inclusiveness of this instrument as function improves postoperatively in this cohort may be of further interest.

In agreement with other reports, PROMIS PF CAT in the current study also exhibited high efficiency relative to other common PRO instruments [5, 12, 14]. The large majority of patients herein had to complete only four items before the PROMIS PF CAT concluded, compared to a total of 88 items (36 for SF-36, 6 for EQ-5D, 42 for KOOS, 4 for Marx) in the other traditional PROs used in this study. Being able to precisely record patient function using fewer items than most instruments is a key advantage of PROMIS PF CAT and has the potential to minimize test burden, facilitate data collection, and improve response rates [3, 4, 6, 7, 14]. Future work evaluating patient and administrator satisfaction, as well as time and cost, with the use of PROMIS is indicated.

There are some notable limitations of the current study. It is recognized that the specificity of the patient population analyzed may limit generalizability, but this was in line with the goal of the study to evaluate PROMIS in a relatively young and healthy population. Additionally, this work required patients to complete several PRO instruments one after another, which could have caused test fatigue and may have altered scores. An attempt was made to control for this limitation by randomizing the order of administration of the instruments. Instead of timing how long it took each patient to complete each instrument, this study used the number of items asked prior to completion as a marker of test burden. This was a necessity because of the administering clinic’s design, where patients may be interrupted during instrument administration, making time data inaccurate in many cases. It is the authors’ opinion, however, that the relatively small number of items for PROMIS PF CAT was representative of its efficiency, and indeed this idea has been consistently supported throughout the literature to date. Finally, there was an unexpectedly strong correlation between PROMIS PF CAT and the EQ-5D, which is an instrument aimed at measuring general health rather than specifically physical function. Some may argue that this indicates problems with discriminant validity. One possible alternative explanation, however, is that EQ-5D scores may correlate strongly with the physical function of the patients measured.

In summary, the present work suggests that PROMIS PF CAT is a valid and efficient instrument for routine clinical assessment and may have advantages over some traditional fixed-length PROs. The strong correlations found here indicate that PROMIS PF CAT accurately measures physical function. Through the use of CAT, it is able to do so with high precision while using fewer questions, reducing the burden on the patient. Efficient and precise measurements allow for easier use in the clinical setting, may improve response rates, and may allow for more comprehensive assessment of patient function before and after treatment, as patients may be more likely to complete a shorter test, particularly when they are asked to complete these instruments at serial clinic visits over time.

Conclusions

PROMIS PF CAT scores correlate strongly with scores from other currently used PRO instruments measuring physical function in patients indicated for operative management of ACL injuries. The instrument is inclusive, with no floor or ceiling effects in this population, and is a viable and efficient alternative for measuring physical function in the ACL injury population.