Introduction

One of the main complaints in patients with shoulder pain is functional disability [1]. Treatment of shoulder pain is usually aimed at pain reduction and improvement of functional disabilities [2]. Consequently, outcome measurements should include an instrument (e.g., questionnaire) for the evaluation of functional disabilities [3].

There are several self-administered shoulder pain and disability questionnaires. Patients ranked the Shoulder Disability Questionnaire (SDQ) and the Shoulder Pain and Disability Index (SPADI) as the most relevant questionnaires [4]. The SPADI was the least time-consuming, and both the SDQ and the SPADI appear to be convenient and easy to complete [4].

The SPADI was originally developed in English [5]. It has been translated and validated in several languages and has shown excellent reliability and responsiveness [6–9].

The Royal Dutch Society for Physical Therapy has recommended implementation of the Dutch SPADI (SPADI-D) in a clinical guideline for patients with shoulder pain [10]. Nevertheless, the SPADI-D has not been validated or tested for reliability.

Therefore, the aim of this study is to evaluate the reliability and construct validity of the SPADI-D for patients with shoulder pain in primary care.

Methods

Patients with shoulder pain were recruited from primary care physical therapy clinics and signed informed consent. This study was approved by the Medical Ethics Committee of the Erasmus Medical Center (MEC-2011-414) [11].

Baseline measurement

Patients received an online questionnaire that included the SPADI-D, SDQ and EuroQol five-item quality of life questionnaire (EQ-5D-3L).

The SPADI is designed to measure pain and disability associated with shoulder pain. It consists of 13 items and response options range from 0 to 10, where 0 represents “no pain/no difficulty” and 10 “worst pain imaginable/very difficult.” The total score varies between 0 and 100; a higher score indicates a higher level of pain-related disability [5].
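As a rough illustration of the scoring described above, the sketch below rescales 13 item responses (0 to 10 each) to a total between 0 and 100. It assumes the total is the sum of the answered items divided by the maximum attainable on those items. The function name `spadi_total` and the `max_missing` cut-off for skipped items are hypothetical and are not taken from the original SPADI description [5].

```python
from typing import Optional, Sequence


def spadi_total(items: Sequence[Optional[int]], max_missing: int = 2) -> Optional[float]:
    """Rescale 13 SPADI item responses (0-10 each) to a 0-100 total.

    Assumption: total = sum of answered items / maximum attainable on the
    answered items * 100. `max_missing` is a hypothetical cut-off, not a
    rule taken from the SPADI authors.
    """
    if len(items) != 13:
        raise ValueError("the SPADI has 13 items")
    answered = [v for v in items if v is not None]
    if any(not 0 <= v <= 10 for v in answered):
        raise ValueError("responses must lie between 0 and 10")
    if not answered or len(items) - len(answered) > max_missing:
        return None  # too many skipped items to compute a score
    return 100.0 * sum(answered) / (10 * len(answered))
```

A respondent who answers 5 on every item would score 50 under this rule, and a higher total indicates more pain-related disability.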

The SDQ is a pain-related disability questionnaire consisting of 16 items. Response options are “yes,” “no” or “not applicable.” The SDQ-score can range from 0 to 100 with a higher score indicating more severe disability [2]. The SDQ was originally designed and validated in Dutch, and internal consistency and responsiveness are good [2, 12].

The EQ-5D-3L is a quality of life questionnaire covering five dimensions of health: mobility, self-care, usual activities, pain/discomfort and anxiety/depression. We used the official Dutch language version [13]. Response options are “no problems,” “some problems” and “extreme problems.”

Test–retest measurement

A randomly selected group of patients received a second SPADI-D after 1 week. The time interval was chosen to minimize recall bias as well as progression bias and is often considered appropriate [14]. A sample size of approximately 80 is considered acceptable [15].

Analysis

Analyses were performed with SPSS 22. Handling of missing items was performed as described by the original authors of the SPADI and SDQ [5, 12].

All data were checked for normality using a stem-and-leaf plot, Q–Q plot and box-and-whisker plot. Nonparametric tests were used if data were not normally distributed.

Known groups validity. We assumed that patients with high initial pain (>7 on the Numeric Rating Scale in the preceding 24 h) and patients with work absence would each have a higher level of perceived disability. Both groups were chosen a priori [7, 12]. The independent t test was used to test the difference between known groups.

Convergent validity. High correlations (r ≥ 0.60) were expected [15] between the scores on the SPADI-D and the SDQ, as both aim to measure the same construct.

Divergent validity. Low correlations (r < 0.30) were expected [15] between the SPADI-D and the EQ-5D-3L items “mobility” (as patients with shoulder pain and healthy subjects do not differ significantly in the amount of time spent walking [16]) and “anxiety/depression” (as low correlations were found between anxiety/depression and activity limitations for patients with shoulder pain [17]).
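The convergent and divergent hypotheses above can be checked with a rank correlation, which is what the Results report (Spearman). The sketch below is a minimal, dependency-free version: it ranks both score lists (ties receive their average rank) and computes Pearson's r on the ranks. The function names are illustrative, and comparing the absolute value of r against the thresholds is an assumption made here, not something stated in the text.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 to n; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share this rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; fails if either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation = Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def meets_hypothesis(r: float, kind: str) -> bool:
    """Compare |r| with the a priori thresholds (r >= 0.60 or r < 0.30)."""
    if kind == "convergent":
        return abs(r) >= 0.60
    if kind == "divergent":
        return abs(r) < 0.30
    raise ValueError("kind must be 'convergent' or 'divergent'")
```

In practice a statistics package would normally be used for this; the hand-written version only makes the steps explicit.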

Factor structure. We conducted a principal component factor analysis with and without varimax rotation. Data were checked for suitability. We used the scree test and parallel analysis [18–20] to determine the number of factors to extract. Items loading higher than 0.40 on one factor and lower than 0.30 on any other were considered acceptable [21]. Finally, the stability of our model was assessed using two random split halves (subsamples) [22]; we repeated this five times to assess whether our findings were consistent.
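To make the idea of parallel analysis concrete, the sketch below compares the eigenvalues of the observed item correlation matrix with eigenvalues obtained from random data of the same size, and retains factors only while the observed eigenvalue exceeds the random threshold. Several details are assumptions made for illustration and are not taken from the study: the random data are standard normal, the threshold is an approximate 95th percentile over `n_sims` simulations, and all function names are hypothetical. The small Jacobi eigenvalue routine is included only to keep the example free of dependencies; a numerical library would normally be used instead.

```python
import random
from math import hypot, sqrt
from typing import List

Matrix = List[List[float]]


def correlation_matrix(data: Matrix) -> Matrix:
    """Pearson correlations between columns (rows = respondents).

    Fails if a column has zero variance.
    """
    centered = []
    for j in range(len(data[0])):
        column = [row[j] for row in data]
        mean = sum(column) / len(column)
        deviations = [v - mean for v in column]
        norm = sqrt(sum(d * d for d in deviations))
        centered.append([d / norm for d in deviations])
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in centered] for ci in centered]


def eigenvalues(symmetric: Matrix, sweeps: int = 50) -> List[float]:
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations, largest first."""
    a = [row[:] for row in symmetric]
    n = len(a)
    for _ in range(sweeps):
        off_diagonal = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off_diagonal < 1e-20:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = 1.0 / (abs(theta) + hypot(theta, 1.0))
                if theta < 0:
                    t = -t
                c = 1.0 / hypot(t, 1.0)
                s = t * c
                for k in range(n):  # rotate columns p and q
                    a[k][p], a[k][q] = c * a[k][p] - s * a[k][q], s * a[k][p] + c * a[k][q]
                for k in range(n):  # rotate rows p and q
                    a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]
    return sorted((a[i][i] for i in range(n)), reverse=True)


def parallel_analysis(data: Matrix, n_sims: int = 100, percentile: int = 95, seed: int = 0) -> int:
    """Number of leading factors whose eigenvalue exceeds the random-data threshold."""
    rng = random.Random(seed)
    n, p = len(data), len(data[0])
    observed = eigenvalues(correlation_matrix(data))
    simulated = []
    for _ in range(n_sims):
        noise = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
        simulated.append(eigenvalues(correlation_matrix(noise)))
    n_factors = 0
    for k in range(p):
        kth = sorted(sim[k] for sim in simulated)
        # Approximate percentile of the k-th eigenvalue across simulations.
        threshold = kth[min(n_sims - 1, percentile * n_sims // 100)]
        if observed[k] <= threshold:
            break
        n_factors += 1
    return n_factors
```

With 13 items, `data` would hold one row of 13 item scores per respondent; a return value of 1 corresponds to a one-factor solution.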

Internal consistency. Internal consistency was calculated using Cronbach’s alpha, and only for the scale(s) extracted from our factor analysis. A Cronbach’s alpha between 0.70 and 0.95 is considered “good” [23].
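For reference, Cronbach’s alpha for k items is k/(k − 1) multiplied by one minus the ratio of the summed item variances to the variance of the total score. The sketch below applies this standard formula to a respondents-by-items table; the function names are illustrative, and the 0.70 to 0.95 band is the criterion quoted above [23].

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha; rows are respondents, columns are items."""
    k = len(scores[0])
    items = list(zip(*scores))  # one tuple of responses per item
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


def alpha_is_good(alpha: float, low: float = 0.70, high: float = 0.95) -> bool:
    """Check alpha against the 0.70-0.95 band cited in the text."""
    return low <= alpha <= high
```

Values above the upper bound are usually read as a sign of redundant items rather than of better reliability, which is why the band has an upper limit.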

Test–retest. The intraclass correlation coefficient (ICC) using a two-way mixed model was used to calculate the test–retest reliability. The ICC can range from 0.00 (no stability/agreement) to 1.00 (perfect stability/agreement). An ICC of 0.70 is considered to be acceptable [23]. We checked the test–retest data for extreme values and assessed whether this influenced the ICC.
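The text specifies a two-way mixed model but not whether the ICC was defined for consistency or for absolute agreement, nor whether single or average measures were used. The sketch below therefore makes an assumption: it computes the single-measure ICC from the two-way ANOVA mean squares and offers both definitions through a flag. The function name `icc_two_way` is illustrative.

```python
from typing import Sequence


def icc_two_way(ratings: Sequence[Sequence[float]], absolute_agreement: bool = False) -> float:
    """Single-measure ICC from a two-way ANOVA without replication.

    Rows are patients and columns are occasions (test, retest).
    Consistency:        (MSR - MSE) / (MSR + (k - 1) * MSE)
    Absolute agreement: adds k * (MSC - MSE) / n to the denominator.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_error = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)                # between-patient mean square
    msc = ss_cols / (k - 1)                # between-occasion mean square
    mse = ss_error / ((n - 1) * (k - 1))   # residual mean square
    denominator = msr + (k - 1) * mse
    if absolute_agreement:
        denominator += k * (msc - mse) / n
    return (msr - mse) / denominator
```

If every patient scored exactly one point higher at retest, the consistency ICC would still be 1.00, whereas the absolute-agreement ICC would fall below 1.00 because the systematic shift counts as disagreement.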

Results

Patient characteristics

Of 389 patients, 356 were included in this analysis after exclusion of those with missing data, and 74 were included in the test–retest reliability analysis. The mean age was 49.5 (SD 13) years, and 47 % were male. Demographic characteristics are reported in Table 1.

Table 1 Demographic characteristics of the included patients

The SPADI-D data at baseline and at re-test were considered normally distributed, in contrast to the data of the SDQ and EQ-5D-3L.

Validity

Differences between “known groups” were statistically significant and considered clinically relevant (Table 2). This means that the SPADI-D is able to differentiate between different groups.

Table 2 Extreme groups correlation coefficients

The Spearman correlation between the SPADI-D and SDQ was high (r = 0.69), meaning that the convergent validity of the SPADI-D with the SDQ is good. The Spearman correlations between the SPADI-D and the EQ-5D-3L mobility item (r = 0.25) and the EQ-5D-3L anxiety/depression item (r = 0.14) were low. This means that the SPADI-D and these EQ-5D-3L items measure different constructs.

Factor structure

Parallel analysis revealed that the eigenvalue of the first factor should be above 1.44 and that of the second factor above 1.33 to be extracted. Only one factor was extracted (see Fig. 1); the eigenvalue of the second factor was 0.97. A one-factor solution explained 57.9 % of the variance, and the second factor added only 7 %. Findings were consistent across all five analyses based on two random subsamples. We therefore consider the SPADI-D to have one factor.

Fig. 1

Scree plot of eigenvalues. The demarcation point indicates one factor. The results are based on 298 patients

Reliability

The internal consistency and test–retest reliability were good [Cronbach’s alpha = 0.94; ICC = 0.89 (95 % CI 0.83–0.93)]. After exclusion of two patients with extreme values, the ICC was 0.90 (95 % CI 0.85–0.94). Both indicate a high level of agreement.

Discussion

This study shows that the SPADI-D consists of one factor only and can be considered a valid and reliable questionnaire. It discriminates well between known groups and correlates well with the SDQ, and its internal consistency and test–retest reliability are high.

One SPADI validation study used similar “known groups,” showing a higher mean difference for work absence compared to ours [7]. Differences with this study were that their population was smaller and had a higher baseline SPADI score, and they did not present the percentage of people that could not work due to their shoulder pain.

Correlation coefficients found in other studies for convergent validity varied between moderate and high (0.33–0.85) depending on the comparator [4, 24–26]. Few studies evaluated divergent validity of the SPADI and none used the EQ-5D-3L [25, 26].

Only one study reported a factor structure as originally described by Roach [26]; the majority of studies could not confirm this loading pattern or reported a one-factor structure [6, 27–29]. One study concluded that people do not distinguish between pain and disability, and a possible explanation for this finding could be the wording of the SPADI items. The disability items ask respondents to indicate the amount of difficulty they have with specified functions. It is possible that when people report their difficulty in performing an activity, they consider pain to be part of what makes the activity difficult [29].

The Cronbach’s alpha found in other studies ranged between 0.90 and 0.95 [5, 26–28, 30], and ICC values ranged between 0.88 and 0.94 [6, 7, 30–32], both consistent with ours.

Our study has some limitations. First, the translation process of the SPADI-D was not published, and it is unknown whether it was performed as recommended [33]. Nevertheless, the SPADI-D is commonly used in clinical practice and research and is also integrated in multiple patient-management software programs in the Netherlands. Second, we did not use the general perceived effect scale to check whether patients were indeed stable between the test and the re-test. However, it is unlikely that patients would have recovered within 1 week, given the duration of complaints and the mean number of weeks patients usually need to recover [34]. The extreme value analysis showed that differences after exclusion were minimal.

On the other hand, we used an adequate sample size to perform factor analysis [22]. There is increasing consensus among statisticians that parallel analysis is superior to other procedures and typically yields optimal solutions to the number of components problem [18].