Introduction

Integration of patients’ perceptions into the evaluation of treatment effects has led to an increase in the utilization of patient-reported outcome measures (PROMs) in orthopedic research studies. These metrics allow for injury- or disease-specific evaluations of patients’ conditions, factoring in pain, function, and other components. Most commonly, PROMs are compared between groups or over time to determine statistically significant differences or changes; however, statistically significant changes in outcomes may not always equate to clinical significance [1, 2]. Because the meaning of absolute changes in PROMs is not readily obvious, specific thresholds, such as minimal clinically important difference (MCID), substantial clinical benefit (SCB), and patient acceptable symptom state (PASS), have been determined for outcome scores to more clearly convey clinical relevance.

The concept of MCID was first described by Jaeschke et al. and denotes the minimum value above which a patient considers his or her clinical outcome to be beneficial and meaningful [1]. Two common approaches for deriving MCID are distribution-based and anchor-based methods [3]. Distribution-based methods rely solely on the statistical characteristics of the instrument (e.g., standard error of measurement, effect size, standard deviation, or minimum detectable change) and are generally considered less informative because they do not reflect the patient’s perspective [4, 5]. In contrast, anchor-based methods specify the patient’s perception of improvement based on an external criterion, or anchor, of pain or function. The relation between the anchor and the PROM is analyzed to establish the smallest change in score that best differentiates meaningful change to the patient [6].
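
The two families of methods described above can be illustrated with a minimal Python sketch. It assumes the one-half standard deviation rule as the distribution-based estimate and, for the anchor-based estimates, the mean change in a minimally improved group and its difference from an unchanged group. The function names, anchor labels, and scores are hypothetical and are not taken from any included study.

```python
from statistics import mean, stdev


def half_sd_mcid(scores):
    """Distribution-based estimate: one-half the sample SD of the scores."""
    return 0.5 * stdev(scores)


def mean_change_mcid(changes, anchor, minimal="somewhat better"):
    """Anchor-based estimate: mean change score in the minimally improved group."""
    group = [c for c, a in zip(changes, anchor) if a == minimal]
    return mean(group)


def mean_difference_mcid(changes, anchor,
                         minimal="somewhat better", unchanged="unchanged"):
    """Anchor-based estimate: mean change in the minimally improved group
    minus mean change in the unchanged group."""
    improved = [c for c, a in zip(changes, anchor) if a == minimal]
    stable = [c for c, a in zip(changes, anchor) if a == unchanged]
    return mean(improved) - mean(stable)


# Hypothetical baseline scores, change scores, and anchor responses.
baseline = [40.0, 50.0, 60.0, 70.0, 80.0]
changes = [2.0, 4.0, 10.0, 14.0, 30.0]
anchor = ["unchanged", "unchanged", "somewhat better",
          "somewhat better", "much better"]

print(half_sd_mcid(baseline))                 # about 7.9
print(mean_change_mcid(changes, anchor))      # 12.0
print(mean_difference_mcid(changes, anchor))  # 9.0
```

Even on the same hypothetical data, the three calculations give different thresholds, which is one reason the method should be reported alongside any published value.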

Although the MCID is a key threshold that is being utilized with increasing frequency, it represents a floor value rather than a goal in terms of defining clinical success [7]. SCB represents the cutoff value for substantial improvement and is differentiated from MCID by identifying patients who responded “much better” rather than “somewhat better” on an external rating of change scale [6]. Another common metric for determining clinical success when utilizing PROMs is PASS, which is the score above which patients consider themselves well. The PASS is determined from the subset of patients who report that their current state of health is satisfactory after taking into account their activities of daily living, level of pain, and functional impairment [8].

Understanding the MCID, SCB, and PASS of shoulder pathologies is key for interpreting outcomes after treatment for various conditions. Although systematic reviews addressing MCID in shoulder PROMs are available, prior reviews limited their analyses to studies utilizing anchor-based methods or failed to perform a credibility assessment [9, 10, 11]. Furthermore, there are no reviews to date assessing the SCB or PASS of shoulder instruments. As such, the purpose of this systematic review is to provide a comprehensive summary of available literature on the MCID, SCB, and PASS for various shoulder conditions and outcomes.

Methods

Search Strategy

The Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines were used in the design of this study (Fig. 1) [12]. A search was conducted using PubMed and Embase databases through May 26, 2020, to identify studies reporting the MCID, SCB, and PASS of all outcome measures pertaining to shoulder conditions. The full search criteria can be found in the Appendix. A total of 930 articles were identified after removal of duplicates. Inclusion criteria were articles pertaining to MCID, SCB, and PASS of outcome measures related to shoulder pathologies (rotator cuff tear, rotator cuff arthropathy, glenohumeral osteoarthritis, shoulder instability, superior labral anterior to posterior (SLAP) tear, biceps tendinitis, subacromial impingement, shoulder pain, acromioclavicular (AC) joint separation, rheumatic shoulder disease, proximal humerus fractures) or nonoperative or surgical treatment of shoulder conditions (physical therapy, rotator cuff repair, total shoulder arthroplasty (TSA), arthroscopic stabilization, etc.). Exclusion criteria included case reports, reviews, nonhuman studies, biomechanical studies, and scientific meeting abstracts or proceedings. Two authors (F.S. and S.A.) independently screened the titles, abstracts, and full texts. Any discrepancies in inclusion/exclusion were carried to the next round of screening to ensure thoroughness. References of each included study were further screened to capture any publications that may have eluded the original search queries. Fifty-five articles were found to be relevant, and an additional 5 articles were identified from article references.

Fig. 1 The Preferred Reporting Items for Systematic Reviews and Meta-Analyses diagram of the literature search and study selection

Data Organization

Relevant data were extracted, including patient demographics, length of follow-up, type of pathology and intervention, anchor information, and calculation method. MCID, SCB, and PASS values were aggregated by PROM instrument, shoulder pathology, and intervention. Mean estimates and ranges were provided for outcome measures.
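
As an illustration of this aggregation step, the sketch below summarizes a set of threshold values as a mean, standard deviation, and range. It assumes the sample standard deviation (n − 1 denominator), which is not stated in the text; that assumption reproduces the summary reported in the Results for SLAP repair, using as inputs the two range endpoints given there for each instrument. The function name `summarize` is hypothetical.

```python
from statistics import mean, stdev


def summarize(values):
    """Return the mean, sample SD, and range of a list of threshold values."""
    return {
        "mean": round(mean(values), 1),
        # A single value has no spread to report.
        "sd": round(stdev(values), 1) if len(values) > 1 else None,
        "range": (min(values), max(values)),
    }


# Range endpoints reported for SLAP repair (two calculation methods each).
print(summarize([8.0, 10.0]))     # OSIS: mean 9.0, SD 1.4
print(summarize([17.0, 18.0]))    # Rowe: mean 17.5, SD 0.7
print(summarize([451.0, 569.0]))  # WOSI: mean 510.0, SD 83.4
```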

Credibility Assessment

To assess the extent to which the methodology and performance of studies protect against misleading estimates of MCID and SCB, the credibility of these metrics was evaluated using previously published criteria [13]. A single criterion focusing on the correlation between change in the outcome measure and the anchor (e.g., global rating of change) was used. Values were considered to be credible if the correlation was greater than or equal to 0.4, whereas values were considered to be questionable if the correlation was less than 0.4 or if no correlation was reported [9].
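
The single-criterion rule above can be expressed as a short Python sketch. The 0.4 cutoff and the handling of unreported correlations follow the text; the function names and the example correlations are hypothetical, and the sketch applies the comparison literally without addressing the sign of the correlation.

```python
def credibility(correlation, threshold=0.4):
    """Classify a threshold value by its anchor correlation.

    A correlation of None stands for "no correlation reported".
    """
    if correlation is None:
        return "questionable"
    return "credible" if correlation >= threshold else "questionable"


def credible_share(correlations, threshold=0.4):
    """Fraction of values classified as credible."""
    labels = [credibility(r, threshold) for r in correlations]
    return labels.count("credible") / len(labels)


# Hypothetical correlations for four reported values.
example = [0.55, 0.40, 0.31, None]
print([credibility(r) for r in example])
print(credible_share(example))  # 0.5
```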

Results

A total of 60 articles were included in this review. Fifty-four (90%) studies reported MCID, nine (15%) calculated SCB, and 11 (18%) quantified PASS. Thirty-two different instruments were utilized; however, only 18 (56%) were reported more than once (Table 1). The sample size and minimum follow-up were highly variable for MCID, SCB, and PASS studies, ranging from 20 to 1568 patients and from 1 week to 24 months, respectively. Nearly all (96%) studies utilized an anchor-based method to calculate MCID; however, six different calculations were used: receiver operating characteristic (ROC) (50%), mean difference (24%), mean change (22%), mean change limit (9%), logistic regression (6%), and 75th percentile of the improved group (2%) (Table 2). Most studies used a single anchor that measured global/overall improvement (69%). Seven-point anchors were used most commonly (26%; range, 2–18 points). Distribution-based approaches were utilized less frequently (20%) to determine MCID, with the one-half standard deviation method (82%) being the most common.

Table 1 Frequency of instruments reported
Table 2 Characteristics of studies reporting on MCID for shoulder patient-reported outcomes
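
Because ROC analysis was the most common calculation, a minimal Python sketch of how a cutoff can be derived from change scores and a dichotomized anchor may be helpful. It assumes the cutoff is chosen by maximizing Youden's index (sensitivity + specificity − 1); the included studies may have used other criteria, such as the point closest to the top-left corner of the curve. The function name and data are hypothetical.

```python
def roc_cutoff(changes, improved):
    """Return (cutoff, Youden's J) for the change score that best separates
    improved from not-improved patients.

    A patient is classified as improved when change >= cutoff.
    Ties in J are resolved in favor of the smallest cutoff.
    """
    pos = [c for c, flag in zip(changes, improved) if flag]
    neg = [c for c, flag in zip(changes, improved) if not flag]
    best_cut, best_j = None, -1.0
    for cut in sorted(set(changes)):
        sensitivity = sum(c >= cut for c in pos) / len(pos)
        specificity = sum(c < cut for c in neg) / len(neg)
        j = sensitivity + specificity - 1.0
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j


# Hypothetical change scores and anchor responses (True = improved).
changes = [2, 4, 6, 9, 11, 12, 15, 20]
improved = [False, False, False, True, False, True, True, True]
print(roc_cutoff(changes, improved))  # (9, 0.75)
```

In this example, cutoffs of 9 and 12 give the same Youden's index, so the tie-breaking rule alone shifts the threshold by 3 points. Details of this kind are rarely reported and can contribute to differences between studies.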

For SCB, all studies utilized an anchor-based approach, but only ROC (67%) and mean difference (33%) calculations were used (Table 3). Most studies used a single anchor that measured global/overall improvement, with 15-point anchors being the most common (56%; range, 4–15 points). Similarly, all PASS studies utilized an anchor-based method assessing satisfaction, with only ROC (73%) and 75th percentile of the satisfied group (45%) calculations being used (Table 4). A summary of mean MCID, SCB, and PASS values is shown in Tables 5, 6, 7, and 8. Of the 136 values reported, only 24.6% were considered to be credible.

Table 3 Characteristics of studies reporting on SCB for shoulder patient-reported outcomes
Table 4 Characteristics of studies reporting on PASS for shoulder patient-reported outcomes
Table 5 MCID values of shoulder assessment instruments for operative shoulder pathologies
Table 6 MCID values of shoulder assessment instruments for nonoperative shoulder pathologies
Table 7 SCB values for shoulder instruments
Table 8 PASS values for shoulder instruments

Eleven studies calculated an MCID for rotator cuff repair, with Constant (5 studies), ASES (4 studies), and SANE (3 studies) being the most commonly reported instruments. The mean MCIDs for Constant, ASES, and SANE using anchor-based methods were 10.9 ± 11.0 (range, 2.0–36.0), 17.8 ± 8.0 (range, 11.1–27.1), and 28.4 ± 1.5 (range, 27.3–29.4), respectively. Conversely, the mean MCIDs for Constant, ASES, and SANE using distribution-based methods were 6.6 ± 2.8 (range, 4.6–8.6), 19.3 ± 10.7 (range, 11.7–26.9), and 13.5 ± 2.9 (range, 11.8–16.9), respectively. Only one study reported on SCB for rotator cuff repair, with mean values of 17.5, 5.5, and 29.8 for ASES, Constant, and SANE, respectively. One study reported on PASS for rotator cuff repair, with mean values of 86.7, 23.3, and 82.5 for ASES, Constant, and SANE, respectively.

Ten studies calculated an MCID for TSA, with ASES (6 studies), SST (5 studies), and Constant (3 studies) being the most commonly reported instruments. The mean MCIDs for ASES, SST, and Constant using anchor-based methods were 16.0 ± 9.0 (range, 6.3–29.6), 2.9 ± 1.0 (range, 1.5–4.0), and 6.3 ± 1.5 (range, 5.1–8.0), respectively. Conversely, the mean MCIDs for ASES, SST, and Constant using distribution-based methods were 8.9 ± 2.2 (range, 6.5–11.8), 1.8, and 6.9 ± 3.6 (range, 4.3–9.4), respectively. Three studies reported on SCB for TSA, with mean values of 23.9 ± 8.8 (range, 12.0–36.6) and 19.4 ± 0.4 (range, 19.1–19.6) for ASES and Constant, respectively. Three studies reported on PASS for TSA, with mean values of 78.6 ± 3.0 (range, 76.0–81.9), 48.8 ± 34.3 (range, 24.5–73.0), and 61.8 ± 5.3 (range, 58.0–65.5) for ASES, Constant, and SANE, respectively.

Other operative treatments were reported less frequently. Three studies calculated an MCID for arthroscopic shoulder stabilization. The MCID value for the Rowe score was 9.7 and 5.6 for anchor-based and distribution-based calculations, respectively, whereas it was 8.5 for ASES. One study determined the MCID, SCB, and PASS values for biceps tenodesis. The anchor-based MCID values for ASES, Constant, and SANE were 16.3, 6.8, and 3.5, respectively. The SCB thresholds were 16.8, 11.0, and 5.8, respectively, whereas the PASS thresholds were 59.6, 19.5, and 65.5, respectively. One study estimated the MCID for AC joint stabilization, with cutoffs for Constant, NPRS, and Taft being 16.6, 1.4, and 2.9, respectively. One study calculated the MCID for SLAP repair using two different methods. The mean MCIDs for OSIS, Rowe, and WOSI were 9.0 ± 1.4 (range, 8.0–10.0), 17.5 ± 0.7 (range, 17.0–18.0), and 510.0 ± 83.4 (range, 451.0–569.0), respectively.

Nine studies determined the MCID for nonoperative management of subacromial impingement and rotator cuff tears, with WORC (3 studies), Constant (2 studies), and OSS (2 studies) being the most commonly reported measures. The mean MCIDs for WORC, Constant, and OSS using anchor-based methods were 343.6 ± 94.3 (range, 269.0–879.9), 18.5 ± 5.8 (range, 11.0–24.0), and 7.6 ± 3.5 (range, 4.0–12.2), respectively. Two studies reported on SCB for subacromial impingement, with values of 11 and 21 for DASH and PSS, respectively. Similarly, two studies reported on PASS for subacromial impingement and rotator cuff tears managed nonoperatively, with values of 21.3, 2.3, and 3.0 for Neer function score, NPRS, and VAS pain, respectively.

Nine studies calculated the MCID for physical therapy of nonspecific shoulder pain. The MCID for QuickDASH (3 studies) was 15.7 ± 8.4 (range, 8.0–27.8), whereas for SPADI (3 studies), it was 13.4 ± 6.1 (range, 8.0–20.0). Although no studies on SCB were reported for nonspecific shoulder pain, the SANE and SPADI PASS thresholds were 87 and 47.3 ± 1.5 (range, 46.2–48.3), respectively. Two studies calculated the MCID of physical therapy for shoulder instability. The MCID of OSIS was 5.3 ± 1.1 (range, 4.5–6.0), and that of SRQ was 5.0. One study reported on the SCB of physical therapy for shoulder instability, with OSIS and SRQ cutoffs of 6.5 and 5.0, respectively. Only one study reported on the MCID of physical therapy for proximal humerus fractures, with Constant, DASH, and OSS values of 11.6, 13.0, and 11.4, respectively.

Discussion

The MCID, SCB, and PASS allow for interpretation of PROMs and are important to understand when treating various shoulder conditions. Our review demonstrates that the MCID, SCB, and PASS values vary widely with study-specific characteristics, including patient demographics, shoulder pathology, treatment, shoulder instrument, study methodology, and calculation method. Furthermore, in our appraisal of the literature, approximately 75% of the MCID, SCB, and PASS values were found to have questionable credibility or were not credible due to inadequate reporting. These differences have made interpretation of these metrics increasingly difficult and may potentially undermine the results of studies that utilize these thresholds as a basis for measuring a successful outcome.

One factor contributing to the variability in MCID, SCB, and PASS for the same outcome instrument is the shoulder pathology and treatment. For instance, arthroscopic rotator cuff repairs had larger MCID thresholds for ASES, SANE, WORC, SST, and VAS pain compared to physical therapy for rotator cuff tears. This finding is not unexpected given that patient expectations are likely higher when the intervention is more expensive and riskier [48, 72]. Additionally, patients undergoing rotator cuff repair also required greater improvements compared with patients undergoing TSA in order for their improvement to be considered clinically meaningful. These differences may be explained by the generally younger patient population undergoing rotator cuff repair compared to TSA and the differing expectations in pain and function after shoulder surgery between the two groups [15]. Furthermore, smaller differences may be clinically significant when symptoms are more severe, as evidenced by lower preoperative outcome scores in TSA patients compared to those undergoing rotator cuff repair [15, 38, 48]. In the present review, nine (15%) studies had also grouped patients with different shoulder conditions and treatments, including operative and nonoperative, together [20, 29, 33, 34, 46, 50, 54, 63, 65]. The MCID values derived from these studies may not be applicable to studies focused on a specific condition or treatment.

The heterogeneity of calculation methods used to derive MCID, SCB, and PASS also contributes to the wide range of values observed. Six different anchor-based approaches were used in 96% of studies, whereas only one distribution-based approach was used in 20% of studies. The decreased popularity of distribution-based methods may be because they are generally considered less informative than anchor-based estimates, as they rely on the statistical properties of a distribution rather than the patient’s perception of improvement [73]. Beaton et al. also showed in a cohort of patients with shoulder pain undergoing physical therapy that the thresholds for defining an important response to treatment differed depending on the technique used [74]. Furthermore, these differences were not inconsequential and could have profound effects on the interpretation of responder-type analysis. These findings were corroborated by Kukkonen et al., who demonstrated an eightfold difference in MCID of the Constant score between the mean difference method and ROC analysis in 781 patients undergoing arthroscopic rotator cuff repair [53]. Similarly, multiple studies that calculated MCID values utilizing both ROC analysis and the mean change limit method for a variety of shoulder pathologies also showed differences of up to 54% [26, 34, 44, 48, 51, 57]. All calculation methods also run the risk of classifying patients as meaningfully improved when the changes in their scores fall within the measurement error of the data set. To this end, measurement errors, such as the minimal detectable change (MDC), should be reported in conjunction with the MCID, SCB, or PASS values to help differentiate meaningful from random change [75].
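
To make the comparison with measurement error concrete, the sketch below computes an MDC and checks whether a candidate threshold exceeds it. It assumes the commonly used formulas SEM = SD × √(1 − reliability) and MDC95 = 1.96 × √2 × SEM, which are not stated in this review; the standard deviation, reliability coefficient, and thresholds are hypothetical.

```python
from math import sqrt


def sem(sd, reliability):
    """Standard error of measurement from the SD and a reliability
    coefficient (e.g., a test-retest ICC)."""
    return sd * sqrt(1.0 - reliability)


def mdc95(sd, reliability):
    """Minimal detectable change at the 95% confidence level."""
    return 1.96 * sqrt(2.0) * sem(sd, reliability)


def exceeds_measurement_error(threshold, sd, reliability):
    """True when a threshold (e.g., an MCID) is larger than the MDC95."""
    return threshold > mdc95(sd, reliability)


# Hypothetical instrument: SD of 15 points, reliability of 0.91.
print(round(mdc95(15.0, 0.91), 2))                  # 12.47
print(exceeds_measurement_error(10.0, 15.0, 0.91))  # False
print(exceeds_measurement_error(16.0, 15.0, 0.91))  # True
```

Under these assumed values, a 10-point MCID would fall within measurement error, so a change of that size could not be distinguished from random variation in an individual patient.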

In addition to the calculation method, MCID, SCB, or PASS values vary based on the type of anchor used, the definition used for the anchor, and the patient groups being studied [73]. Prior studies have demonstrated that domain-specific questions have higher construct validity as anchors for determining clinically important differences than global transition questions [76]. Despite this, 69% of the studies utilized global rating of change anchors that focused on overall improvement. Ideally, for questionnaires such as the ASES, which consists of pain and function domains, the external anchor should ask about changes in pain and function. Another source of variation among studies is how MCID, SCB, and PASS are defined. Multiple studies defined unchanged and minimally improved groups differently, with 20 (37%) studies incorporating any improvement beyond unchanged (e.g., “completely recovered” or “very great deal better”) into minimally improved groups [14, 22, 23, 25, 26, 34, 36, 37, 39, 44, 46, 48, 49, 50, 51, 53, 60, 62, 64]. This may incorrectly increase MCID values, as those levels of improvement reflect substantial clinical benefit rather than minimal clinical improvement. Conversely, eight (15%) studies incorporated patients who did worse (e.g., “much worse” or “very great deal worse”) into the unchanged group [14, 25, 46, 49, 53, 62, 64, 65]. Although fewer patients did worse after treatment, this classification may falsely lower the mean score of the unchanged group and thereby incorrectly increase MCID and SCB values. Furthermore, six (11%) studies measured MCID as any small change, be it improvement or deterioration, compared to the unchanged group [19, 30, 40, 41, 52, 59]. This does not represent a measure of beneficial change, and future studies should be mindful of this discrepancy when deciding to utilize a previously published MCID value.

Although both SCB and PASS have been described for nearly two decades, their reporting in the literature has severely lagged behind that of MCID [7, 8, 62, 77]. Among studies with the same cohort, SCB values were approximately 1.6-fold greater than MCID values for arthroscopic rotator cuff repair and 1.7- to 2.7-fold greater than MCID values for TSA [6, 18, 31, 42, 66]. Simovitch et al. reported that a mean improvement of 30% of the total metric value would likely achieve SCB in seven outcome metrics among TSA patients [66]. Similarly, PASS values ranged from 67 to 87% of the total metric value for arthroscopic rotator cuff repair and 58 to 82% of the total metric value for TSA [6, 18, 69, 70]. Future studies calculating the MCID should include SCB and PASS estimates, as they provide a spectrum of clinically meaningful outcomes that may be used to counsel patients regarding expectations after shoulder surgery.

While there are multiple studies evaluating the MCID, SCB, and PASS of rotator cuff repair and TSA, few studies have calculated the threshold values for other common shoulder procedures, such as arthroscopic shoulder stabilization, biceps tenodesis, AC joint stabilization, SLAP repair, and open reduction internal fixation of proximal humerus fractures. Currently, most studies on these treatments are limited to small case series performed at a single institution [19, 21, 32]. More studies with larger sample sizes need to be dedicated to these shoulder conditions and treatments, and thereafter, clinically meaningful values can be better established.

Thirty-two PROMs were identified in this review with ASES, Constant, and DASH being the most commonly reported instruments. Interestingly, 44% of the measures only had one study reporting MCID, SCB, or PASS values, suggesting that a large proportion of these outcome measures have not been adopted by the scientific community. The American Shoulder and Elbow Surgeons Value Committee has recently recommended the use of eight shoulder outcome instruments (ASES, OSS, SANE, VR-12, WORC, WOSI, WOOS, and PSS) based on freedom from clinician input, standardization, ease of use, and validation [78]. The Constant score, which was among the most frequently reported instruments in this review, was not recommended due to requiring clinician input to measure strength and motion and the poor standardization and precision of these measurements [79]. As such, continued efforts to utilize the recommended measures in patients with shoulder conditions may potentially limit the number of unnecessary MCID, SCB, and PASS values in future studies.

Despite being a comprehensive review of MCID, SCB, and PASS for shoulder outcome measures, this study is not without limitations. First, the included studies were so heterogeneous that the results varied widely and were difficult to integrate. Mean estimates and ranges for MCID, SCB, and PASS were presented, but future studies need to be cautious prior to utilizing a certain value. Researchers must consider the multitude of factors that affect these metrics, including patient characteristics, study size, pathology, intervention, length of follow-up, and calculation methods [80]. Second, the overwhelming majority of studies failed to report measurement errors, such as MDC, which makes it difficult to determine whether these thresholds are meaningful changes or simply due to random variation. Additionally, the criteria chosen to evaluate the credibility of the MCID, SCB, and PASS thresholds were stringent. Other factors that may contribute to the credibility include whether the anchor addresses the patient’s perspective, the precision of the MCID estimate, and whether the threshold or difference between groups represented a small but important change [10]. Lastly, several studies determined MCID, SCB, and PASS values for different subgroups of their patient populations, such as anatomic versus reverse TSA. Only the overall results are presented here to enable comparison.

In conclusion, the present review provides both anchor-based and distribution-based estimates for MCID, SCB, and PASS of outcome instruments addressing patients with shoulder conditions. ASES, Constant, and DASH were the most frequently utilized instruments, whereas rotator cuff repair and TSA were the most commonly analyzed interventions. Numerous methodological limitations of the primary studies resulted in a wide range of values. Additionally, few studies reported SCB and PASS estimates in patients with shoulder disorders. Information from this review is vital for clinicians to appropriately establish patient expectations for recovery.