Introduction

Patient-reported outcome measures (PROMs) have become an important component of determining patient outcomes after sports-related shoulder injuries. An important component of PROMs is the activity level of a patient, which is in effect a functional measure of musculoskeletal health. Although a large number of valid tools are available to measure the activity level in the general population, these tools can be less than ideal for the assessment of athletes [4]. The results of shoulder surgery in an athlete cannot be judged only according to the criteria used for nonathletes. For athletes affected by recurrent anterior shoulder instability, the main outcomes of treatment are the ability and time to return to play and to return to their previous level of function [19]. Other parameters, such as the ability to perform activities of daily living, are generally less affected by shoulder problems. It is important to be able to quantify an athlete’s sport activity so that it can be evaluated in the context of other patients, both for research and for comparison with population normative data when treating injuries. Moreover, treatment outcomes in terms of sport activity should match the preoperative expectations of athletes to improve overall patient satisfaction [19].

The Tegner activity scale [17] is one of the most commonly used scales designed to quantify a patient’s activity level. This scale ranks sports activities into subgroups that entail similar involvement of a knee affected by an anterior cruciate ligament lesion. Although designed for the knee, the Tegner activity scale has been used to assess patients after shoulder surgery for recurrent anterior instability [14,15,16]. However, its use for the shoulder raises some obvious concerns. Because the original scale is weighted for the knee, soccer, for example, is scored higher than swimming.

The Brophy–Marx activity scale for shoulder [5] evaluates a patient’s overall shoulder activity level based on the frequency with which he or she completes five common activities of the shoulder, such as carrying objects as heavy as, or heavier than, a bag of groceries by hand, handling objects overhead, and participating in contact and overhead sports. However, it does not specify particular sports.

The Degree of Shoulder Involvement in Sport (DOSIS) scale is a modified Tegner activity scale weighted for the shoulder, developed by the Sport Committee of SIGASCOT (Società Italiana del Ginocchio Artroscopia Sport Cartilagine Tecnologie Ortopediche) to help the physician classify patients according to the specific involvement of the shoulder in their sport activity [4]. The psychometric features of the DOSIS scale have been measured and compared with those of the original Tegner activity scale. No other studies have evaluated the validity of measurements obtained with the DOSIS scale. Evidence for the construct validity of patient-reported outcome measures must be accumulated by testing hypothesized patterns of association with other validated instruments that measure relatively similar constructs (for positive correlations) [8]. Other generic and shoulder-specific questionnaires incorporate an activity scale within the score. The aim of this study was to evaluate the psychometric features of the DOSIS scale by testing its convergent validity and responsiveness.

Materials and methods

Subjects and procedures for assessment of validity

This study includes human subjects. However, according to Italian law, no ethical approval was mandatory for this study. The study has been performed in accordance with the ethical standards in the 1964 Declaration of Helsinki and has been carried out in accordance with relevant regulations of the Italian National Healthcare System.

The study was conducted as a questionnaire-based survey in an independent population of patients who were affected by recurrent anterior shoulder instability and who underwent an arthroscopic Bankart repair or an open Bristow-Latarjet procedure. An online platform (https://drive.google.com) was configured to collect the responses anonymously. The digital patient database of the Orthopaedic and Traumatology Department was retrospectively reviewed to identify all of the patients surgically treated for recurrent anterior shoulder instability. Patients younger than 16 years were not included, nor were patients whose first language was not Italian. A total of 63 patients treated between January 2005 and December 2015 were enrolled in this study. The patients were contacted by phone to present the research and to invite them to participate in the study. All patients gave their informed consent upon receiving complete information on the study. The subjects were required to complete the following self-reported outcome measures online: the DOSIS scale, the Brophy–Marx [5] and Tegner [17] activity scales, and the validated Italian versions of the Western Ontario shoulder instability index (WOSI) [6], the Simple Shoulder Test (SST) [12] and the Short-Form 36 (SF-36) [1]. The patients were asked to answer the DOSIS, Brophy–Marx and Tegner scales twice: retrospectively, by recalling the period of time before the onset of shoulder instability (baseline scores), and with reference to the follow-up examination (postoperative scores).

DOSIS analysis plan

Convergent validity was defined as the extent to which the DOSIS correlated with measures consistent with its theoretically derived construct. Spearman’s rank correlation coefficient (r) was used to assess the association between the DOSIS and the Brophy–Marx and Tegner activity scales, the validated Italian versions of the WOSI and the SST, and different SF-36 subscales. It was hypothesized that: (1) the correlations between the DOSIS and the Brophy–Marx and Tegner activity scales would be moderate to high; (2) the correlations between the DOSIS and the WOSI and the SST would be moderate to high; and (3) the correlations between the DOSIS and the physical functioning and role physical subscales of the SF-36 would be moderate to high. Spearman’s coefficient was interpreted as follows: strong correlation for values >0.50; moderate correlation for values between 0.35 and 0.50; and weak correlation for values <0.35 [10].
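As an illustration of the correlation step described above, the following is a minimal sketch of Spearman’s rank correlation and of the interpretation bands quoted from [10]. It is not the authors’ analysis script (the study used PSPP). The function names are hypothetical, and applying the bands to the absolute value of r is an assumption, since the text states the thresholds only for positive values.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_r(x, y):
    """Spearman's r: Pearson correlation computed on the ranks of x and y.

    Raises ZeroDivisionError if either variable is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


def interpret(r):
    """Bands from the text: >0.50 strong, 0.35 to 0.50 moderate, <0.35 weak.

    Assumption: the bands are applied to |r|.
    """
    magnitude = abs(r)
    if magnitude > 0.50:
        return "strong"
    if magnitude >= 0.35:
        return "moderate"
    return "weak"
```

For example, `interpret(spearman_r(dosis, tegner))` would label the association between two equal-length lists of scores, where `dosis` and `tegner` are placeholder names for data that are not reproduced here.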

The DOSIS scale [4] is a patient self-administered scale used to score a sport activity based on 3 parameters: (1) the type of sport (classified as no or minimal demand, moderate demand, or high demand), (2) the frequency at which the sport was played (occasionally or at least twice a week), and (3) the level at which the sport was played (recreational, low level of competition, or high level of competition) (“Appendices 1, 2”). From these parameters, the DOSIS score is calculated by the researchers using an allocation table (“Appendix 3”). Patients thus obtain a score from 0 (no sport) to 10 (high-demand sport played by a national-/international-level or professional athlete).

The Brophy–Marx shoulder activity scale [5] evaluates patients’ overall shoulder activity level based on the frequency with which they participated in 5 specific activities of the shoulder at their most active state over the previous 12 months, which generates a numeric score ranging from 0 (least active) to 20 (most active).

The Tegner activity scale [17] is a one-item instrument that assesses activity levels for sports and occupational activities. It evaluates patients’ level of work and sports activity on an 11-level scale, with higher scores representing higher levels of physical activity.

The WOSI is a disease-specific PROM designed to be used as a primary outcome measure in clinical trials that evaluate treatments for patients with shoulder instability [6]. The 21-item questionnaire consists of four domains, referring to physical symptoms, sport/recreation/work function, lifestyle function, and emotional function. In the original version, responses range from no complaints (0) to severe complaints (10).

The SST consists of 12 questions about physical function with dichotomous (yes or no) response options. The scores range from 0 (worst) to 100 (best) and are reported as the percentage of items that a person answers in the affirmative [12].
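The SST scoring rule described above can be sketched as follows. This is a minimal illustration of the stated arithmetic only; the function name `sst_score` and the choice to reject incomplete questionnaires are assumptions, not part of the instrument’s published scoring instructions.

```python
def sst_score(answers):
    """Return the SST score as the percentage of 'yes' answers.

    answers: sequence of 12 booleans (True = 'yes', False = 'no').
    Assumption: all 12 items must be answered; missing items are not handled.
    """
    if len(answers) != 12:
        raise ValueError("the SST has 12 items")
    return 100.0 * sum(bool(a) for a in answers) / len(answers)
```

A patient answering “yes” to 9 of the 12 questions would score 75.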

The SF-36 consists of 36 questions on the general health status of patients [1] with eight health concept subscales (physical function, role physical, bodily pain, general health, vitality, social function, role emotional, and mental health), which are then aggregated into two main scores. The physical and mental component summary scores represent weighted composite scores derived from the eight health concept scales. Each subscale score can vary from 0 to 100, with higher scores representing more desirable health states.

Responsiveness is defined as the ability of a scale to detect clinically important changes over time [18]. For the DOSIS to be responsive, it needs to demonstrate a lack of floor or ceiling effects, which were considered to be present when more than 15% of the patients received either the lowest or the highest possible score [18]. A relative efficiency calculation was then used to analyse the responsiveness of the DOSIS versus the Brophy–Marx and Tegner activity scales according to Barr et al. [3]. With this method, a value greater than 1 indicates that the DOSIS is more responsive than the comparison scale, and a value less than 1 indicates that it is less responsive. The standardized effect size and the standardized response mean were also evaluated. The effect size is the difference between the mean baseline and posttreatment scores on the measure, divided by the standard deviation of the baseline scores. The standardized response mean is the mean change in score divided by the standard deviation of the change scores. Standardized effect size values >0.2, >0.5, and >0.8 were considered small, moderate, and large, respectively [10].
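The responsiveness indices defined above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors’ computation: the function names are hypothetical, change is taken as postoperative minus baseline, and the sample standard deviation (n − 1 denominator) is used. The text cites Barr et al. [3] for relative efficiency without giving a formula; the version below uses one common definition, the squared ratio of paired t statistics, and should be read as an assumption. The label “negligible” for values of 0.2 or less is likewise not from the source.

```python
from math import sqrt
from statistics import mean, stdev


def floor_ceiling(scores, lowest, highest, threshold=0.15):
    """Flag floor/ceiling effects: more than 15% of patients at an extreme."""
    n = len(scores)
    floor_share = sum(s == lowest for s in scores) / n
    ceiling_share = sum(s == highest for s in scores) / n
    return floor_share > threshold, ceiling_share > threshold


def effect_size(baseline, post):
    """Difference in means divided by the SD of the baseline scores."""
    return (mean(post) - mean(baseline)) / stdev(baseline)


def standardized_response_mean(baseline, post):
    """Mean change divided by the SD of the change scores."""
    changes = [p - b for b, p in zip(baseline, post)]
    return mean(changes) / stdev(changes)


def paired_t(baseline, post):
    """Paired t statistic, which equals SRM * sqrt(n)."""
    return standardized_response_mean(baseline, post) * sqrt(len(baseline))


def relative_efficiency(baseline_a, post_a, baseline_b, post_b):
    """Assumed definition: (t_a / t_b) ** 2; >1 means scale A is more responsive."""
    return (paired_t(baseline_a, post_a) / paired_t(baseline_b, post_b)) ** 2


def classify_effect(value):
    """Bands from the text: >0.2 small, >0.5 moderate, >0.8 large."""
    magnitude = abs(value)
    if magnitude > 0.8:
        return "large"
    if magnitude > 0.5:
        return "moderate"
    if magnitude > 0.2:
        return "small"
    return "negligible"  # label for values <= 0.2 is an assumption
```

Under these bands, `classify_effect(0.53)` and `classify_effect(0.58)` both return "moderate".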

Statistical analysis

There is no agreed optimum method for determining an appropriate sample size to evaluate aspects of validity for patient-reported outcome measures. However, 50 patients have been advocated as the minimum requirement [13, 18]. Therefore, it was deemed that the planned case series of 63 patients could provide sufficient power to investigate important aspects of validity for the DOSIS scale and allow for 20% loss (63 × 0.8 ≈ 50 respondents). The DOSIS scale was considered a continuous variable. Descriptive statistics were used to report patients’ demographics as mean and standard deviation (SD). The Kolmogorov–Smirnov test was used to assess the assumption of normality and showed that the values departed from a normal distribution. Therefore, the results are described using the median and the respective interquartile range (25th percentile to 75th percentile), and a nonparametric analysis of the data (Spearman’s rank correlation coefficient and Wilcoxon signed-rank test) was performed. A p < 0.05 was considered statistically significant. Data were entered into a Microsoft Excel spreadsheet (Microsoft Corporation, Redmond, WA) and analysed using PSPP software (Free Software Foundation, Inc.) for Windows.
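The median and interquartile range used to describe the scores can be computed as in the sketch below. The helper name `median_iqr` is hypothetical, and the percentile method is an assumption: statistical packages differ in how they interpolate quartiles, so values from PSPP may not match this exactly in small samples.

```python
from statistics import median, quantiles


def median_iqr(values):
    """Return (median, 25th percentile, 75th percentile) of the values.

    Assumption: quartiles use Python's 'inclusive' interpolation method.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), q1, q3
```

For the scores 1, 2, 3, 4, 5 this gives a median of 3 with an interquartile range of 2 to 4.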

Table 1 Demographics of study cohorts
Table 2 Absolute values of all scores

Results

A total of 53 patients (84%) completed the questionnaires. The demographic data of the cohort are listed in Table 1. There were no missing data for any DOSIS item. Table 2 reports absolute values of all postoperative scores.

The DOSIS showed strong correlation with the Brophy–Marx and Tegner activity scales, a moderate correlation with the WOSI and SST scores, and a moderate correlation with the physical functioning, role physical and role emotional subscores of the SF-36 (Table 3).

Table 3 Correlation between the DOSIS and the Brophy–Marx and Tegner activity scales, the WOSI, the SST and different SF-36 subscales

The distribution of the DOSIS scale had no serious ceiling or floor effects. The distribution of the Brophy–Marx and Tegner activity scales was computed: neither of the 2 scales showed a floor or ceiling effect (Figs. 1 and 2).

Figs. 1 and 2 Floor and ceiling effects and score distributions are shown graphically by reporting the number of outcomes for each score

Table 4 shows the relative efficiency of the DOSIS in relation to the Brophy–Marx and Tegner activity scales. The DOSIS demonstrated lower responsiveness than the Brophy–Marx and Tegner activity scales. The standardized effect size was 0.53, and the standardized response mean was 0.58.

Table 4 Relative efficiency of the baseline/postoperative DOSIS

Discussion

The most important finding of the present study was that the DOSIS scale showed acceptable psychometric properties in patients after shoulder surgery for recurrent anterior instability.

The DOSIS was published in 2015 and advocated by the authors as a modified Tegner activity scale weighted for the shoulder [4]. There have been no subsequent validation studies. Therefore, this study represents the first paper to investigate aspects of validity, outside of the developing centre.

Validity evaluation usually consists of aspects of criterion validity, represented by the ability of the proposed score to agree with a gold-standard measure, and content validity, assessed by an analysis of the floor/ceiling effect and the ability of a scale to recognize differences between preoperative and postoperative status (responsiveness).

Aspects of convergent validity and responsiveness of the DOSIS scale were investigated using a sample of 53 patients. Criterion validity was assessed by comparing the DOSIS scale with selected outcome measures. The Brophy–Marx [5] activity scale, the WOSI [6, 11], the SST [9] and the Short-Form 36 (SF-36) [7] have been proven to be valid outcome tools for shoulder disorders. The Tegner activity scale [17] is one of the most commonly used scales designed specifically to assess activity levels for sports and occupational activities. Although designed for the knee, the Tegner activity scale has been used for the shoulder [14,15,16]. Despite some low correlations, the a priori hypotheses were largely confirmed in our sample. This finding is supported by the statistically significant correlations between the DOSIS and the Brophy–Marx and Tegner activity scales and the WOSI and SST scores, as well as by the higher correlations with the SF-36 subscales assessing related constructs (convergent validity) and the lower correlations with the subscales measuring different constructs (divergent validity).

The DOSIS showed only moderate correlation with the WOSI and SST scores. These results do provide some evidence that the DOSIS is measuring similar aspects of outcome when compared to the WOSI and SST. However, this element of validity should be interpreted with caution as the WOSI and SST scores measure more generic physical function, as opposed to the alternative construct of sports activity, measured by the DOSIS. The DOSIS scale had a greater correlation with the Tegner knee activity scale than with other shoulder instruments, suggesting that the other instruments do not accurately assess sports activity.

The moderate correlation of the DOSIS with the physical functioning and role physical subscores of the SF-36 can be explained by the dominance of lower extremity items in these subscales. Pain as measured with the SF-36 pain scale was not correlated with the DOSIS. Blonna et al. [4] found a poor correlation between the DOSIS scale and “pain during sport activity”. One reason for this poor correlation could be that athletes affected by shoulder instability are usually not significantly impaired by pain at follow-up.

A higher-than-expected correlation was found between the DOSIS scale and the role emotional subscore of the SF-36 (limitations in usual role activities because of emotional problems). However, psychological factors have been shown to be associated with returning to sport following athletic injury [2].

In accordance with the original development article, the DOSIS scale had a different distribution of scores compared with the original Tegner activity scale. One possible explanation is that the DOSIS scale classifies patients according to the specific involvement of the shoulder in their sport activity and therefore has distinct features compared with the original Tegner activity scale. The postoperative DOSIS scale showed a higher percentage of responses at the bottom (floor) of the possible score range than the Brophy–Marx and Tegner activity scales, although none of the scales had significant floor or ceiling effects. This may reflect the more specific outcome measure provided by the DOSIS scale, since it is reasonable to expect that the rate of return to sports after surgery for shoulder instability differs between runners and swimmers.

The authors do not have a direct explanation for why the DOSIS scale was less responsive than the Brophy–Marx and Tegner activity scales, though it is possible that this reflects the greater proportion of floor scores seen with the postoperative DOSIS scale.

The standardized effect size and the standardized response mean were only moderate (>0.5), compared with the large (>0.8) effect size reported in the original development article. This result is most likely the consequence of the small sample size.

This study has some limitations that need to be discussed. Because of the small sample size, generalization to other samples with shoulder disorders may be limited. As the DOSIS scale has been tested only in patients affected by shoulder instability and not by other shoulder conditions, the psychometric features measured in this study cannot be extrapolated to patients with degenerative disorders of the shoulder.

Another limitation is that part of the data was collected retrospectively, since we asked patients to recall their sport activity levels (baseline and preoperative DOSIS). The relevance of this limitation was tested in the original validation study by comparing the test–retest reliability of the DOSIS scale measured retrospectively and the DOSIS scale measured at follow-up (postoperative DOSIS). No significant differences were found, suggesting that the DOSIS scale is reliable even when it is measured retrospectively [4].

The clinical relevance of this study is that the DOSIS scale can be used for sport-specific shoulder assessment in patients after surgery for anterior instability.

Conclusion

This study provides further evidence regarding the validity of a newly developed measurement tool. Overall, the DOSIS scale demonstrated evidence of convergent validity with the Brophy–Marx and Tegner activity scales, although these tools do measure slightly different constructs.