Introduction

The follow-up of patients with musculoskeletal disorders in clinical and research settings is not only based on clinical exams or radiography but also on self-administered questionnaires which are inexpensive and give insight into the patient’s perspective.

In occupational rehabilitation, one important activity is the functional capacity evaluation (FCE) of patients in order to determine readiness or ability for safe return to work following musculoskeletal injury [1]. The patient’s self-efficacy (SE) level has been proposed as a relevant psychosocial factor that may influence FCE. Perceived SE refers to the individual’s beliefs about their own competence or ability [2]. SE beliefs may influence the patient’s behavior, e.g. the ability to overcome negative experiences. It has been suggested that SE is more closely related to work disability than actual physical abilities [3]. Assessment of SE by self-report therefore plays an important role in predicting health outcome [4, 5]. It has also been recommended that patients with low back pain should be assessed with both types of instrument (i.e. self-report and performance tests) because these strategies may lead to different results [6–8].

Self-administered questionnaires should be developed with accurate and rigorous methods to ensure that they are specific to the studied concept as well as reliable and responsive (clinimetric qualities) [9, 10]. A great variety of questionnaires have been developed to assess the perceived function of patients with back pain. Some of them, such as the Oswestry disability index, the Roland Morris disability questionnaire and the Quebec back pain disability scale, have been recommended for clinical purposes by an expert panel [11]. The utility of questionnaires in rehabilitation settings is often limited by the literacy level of the patients [12]. One approach to improving the comprehension of the questionnaire by patients with low literacy levels is to inform the patient through pictorial activities and task sorts (PATS) designed for self-assessment of functional ability in occupational rehabilitation, such as those developed in the 1970s [13].

Recently, efforts have focused on the creation of questionnaires more oriented towards functional limitations and occupational perspectives [14]. However, those picture-based questionnaires are often validated in English only and not for use by non-English speaking patients.

The Spinal Function Sort (SFS), published in English in 1989 [15], has proven to be of advantage in work-related rehabilitation settings [16–20]. It is often used in addition to FCEs to assess the self-perceived functional capacity of patients with back complaints [21]. It is a picture-based generic tool that is useful for all kinds of back disorders. The reliability and validity of the SFS have been reported [22–24] but, to the best of our knowledge, no German or French versions have been properly translated and cross-culturally adapted. The aim of this study was to perform a cross-cultural adaptation and validation of the SFS in French and German.

Methods

Spinal Function Sort (SFS)

The French and German translations of the SFS consist, like the original SFS, of a booklet containing drawings (Picture 1 in Electronic supplementary material) with a brief description of 50 tasks. These tasks are performed by men and women and reflect a wide range of daily living or vocational activities that involve the spine. The pictured activities are graded from light to heavy material handling, so that scores can be compared to the physical demand characteristics from the United States Department of Labor’s Dictionary of Occupational Titles [25]. Subjects are asked to answer quickly without spending too much time on any one drawing. They are told that their “first impression is usually the best”. There is no time limit to fill out the questionnaire. Subjects rate their ability to perform the task on a 5-point Likert scale (from “able” to “restricted” to “unable”). An additional category depicted as “?” means “I don’t know”, for example, for an unfamiliar task. Items are scored from 4 (able) to 0 (unable or “?”). The SFS is scored manually by the assessor and yields a total score, which can range from 0 to 200. This total score corresponds to the level of perceived physical work, ranging from sedentary to very heavy, and can be compared to the Dictionary of Occupational Titles.

Following the scoring instructions of the original SFS, questionnaires with 4 or more “I don’t know” responses were excluded from the present study because of potential bias. Moreover, the SFS has 2 pairs of internal validity check drawings with the same questions but different images to test the reliability of subjects (questions #6 and #50; #17 and #49). Subjects who showed inconsistencies greater than 3 points on the 5-point scale were also excluded.
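The scoring and exclusion rules described above can be illustrated with a minimal sketch. It is not the authors' scoring procedure (the SFS is scored manually). The function names are hypothetical, and two points are assumptions not stated in the text: the inconsistency is taken as the absolute score difference within each check pair, and a check pair is skipped when either item was answered “?”.

```python
# Pairs of items with the same question but different drawings (from the text).
CHECK_PAIRS = [(6, 50), (17, 49)]


def item_score(response):
    """Score one item: an integer 0..4, with "?" (I don't know) scored as 0."""
    if response == "?":
        return 0
    if isinstance(response, int) and 0 <= response <= 4:
        return response
    raise ValueError(f"invalid response: {response!r}")


def score_sfs(responses):
    """Return the total score (0..200), or None if the questionnaire is excluded.

    `responses` maps each item number (1..50) to an integer 0..4 or "?".
    """
    if sorted(responses) != list(range(1, 51)):
        raise ValueError("expected responses for items 1 to 50")

    # Exclusion rule 1: four or more "I don't know" answers.
    unknown = sum(1 for r in responses.values() if r == "?")
    if unknown >= 4:
        return None

    # Exclusion rule 2: inconsistency greater than 3 points within a check pair.
    # Assumption: pairs containing a "?" answer are not checked.
    for a, b in CHECK_PAIRS:
        if responses[a] == "?" or responses[b] == "?":
            continue
        if abs(item_score(responses[a]) - item_score(responses[b])) > 3:
            return None

    return sum(item_score(r) for r in responses.values())


# Hypothetical patient who rates every task as "able".
print(score_sfs({item: 4 for item in range(1, 51)}))
```

Returning None for excluded questionnaires keeps them distinguishable from a valid total score of 0.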

Cross-Cultural Adaptation

The cross-cultural adaptation of the SFS was performed according to the guidelines of the American Academy of Orthopaedic Surgeons (AAOS) Outcomes Committee [26] and as recommended by others in the literature [27, 28]. The following five steps were documented in a written report: (1) Forward translation from English into French and into German by two translators whose native language was French or German, respectively, and who were fluent in English (T1 and T2). One of the translators was informed about the aims of the study, and the other received only limited information (so-called naïve translator). Moreover, none of the translators were physicians. (2) Synthesis: T1 and T2 were amalgamated to form a single translated version (T12) by resolving any discrepancies under the supervision of a methodologist who was not involved in the translation process. (3) Back translation of the T12 version from French or German into English by two translators whose native language was English and who were fluent in French or German (BT1 and BT2). These two translators were naïve to the study and not directly linked with the medical domain. (4) Consensus meeting with all those involved (translators, methodologist, specialist physicians in occupational rehabilitation) in order to resolve any discrepancies and doubts encountered during the translation, and to establish the pre-final French and German versions of the SFS. (5) Pre-testing of the French and German versions for accuracy of wording and ease of understanding, conducted with 20 consecutive patients with back complaints. Patients were asked during a phone call to mention any difficulties they had encountered. The final step consisted of submitting the final versions of the French SFS (SFS-F) and German SFS (SFS-G), together with all reports and forms, to a committee keeping track of the translated versions in order to verify that the recommended stages had been followed.

Participants

For each language, two sets of participants were recruited: one for the assessment of construct validity and scale homogeneity, and a second set for test–retest reliability.

For construct validity of the French version, 17 women and 70 men were recruited. These 87 subjects were consecutive inpatients hospitalized because of persistent back pain between 2004 and 2005 at the Clinique romande de réadaptation at Sion, Switzerland. The mean age was 44 years (SD: 10; range: 19–61). Test–retest reliability of the French version was assessed on a sample of 21 patients (9 women, 12 men; mean age 43 years, SD: 14, range 19–65) recruited in 2009. In addition to a history of back pain, subjects in both samples had diagnoses such as fracture (operated or treated conservatively), discal prolapse or hernia, degenerative disorders, discopathies, status after discal hernia operation, contusion(s), tight canal, olisthesis, spina bifida occulta, isthmic lysis, non specific lumbalgia, whiplash, cervical strain, transitional anomaly, Scheuermann’s disease.

Construct validity of the German version was assessed on 257 consecutive inpatients hospitalized between November 2003 and February 2006 (53 women, 204 men; mean age 40 years, SD 11, range 18–64). These subjects were recruited at the Rehaklinik Bellikon in Bellikon, Switzerland, because of persistent back pain. Test–retest reliability of the German version was assessed on a convenience sample of 51 patients (9 women, 41 men; mean age 43.6 years, SD 13, range 21–65) recruited in 2009. Diagnoses for both samples were similar to the French cohort.

Patients with upper and/or lower limb complaints were excluded because of the risk of influencing the SFS scores. Patients with psychopathology in which pain is the central element (such as somatoform disorder) were also excluded. Patients who had other non-disabling psychopathologies were included.

The study was approved by the ethical committees of the canton Valais and the canton Aargau, where the two clinics are located. All patients signed a written informed consent form.

Validation

All patients completed the French or German version of the SFS, the medical outcomes short form (SF-36) [29], the Hospital Anxiety and Depression Scale (HADS) [30], and the Visual Analogue Scale for Pain Intensity (VAS) [31]. Construct validity of the SFS translated versions was assessed by estimating Pearson’s correlation coefficients between the French (resp. German) versions of the SFS and the HADS, VAS, and relevant subscales of the SF-36. The physical functioning subscale (PF), the physical summary scale (PCS) and the VAS were used to assess convergent validity (high correlations expected); the mental health scale (MH), the mental summary scale (MCS) and HADS for divergent validity (low correlations expected). Ninety-five percent confidence intervals for the correlation coefficients were calculated by means of Fisher’s transformation.
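As an illustration of the confidence interval calculation, the following sketch applies Fisher's z transformation using only the Python standard library. The function name is hypothetical, and the study itself used Stata; this is not the authors' code. Applied to a correlation of 0.63 in a sample of 87 subjects, it gives an interval of about 0.48 to 0.74.

```python
import math


def fisher_ci(r, n, z_crit=1.959964):
    """Approximate confidence interval for a Pearson correlation.

    r      : observed correlation coefficient (-1 < r < 1)
    n      : number of subjects (n > 3)
    z_crit : standard normal quantile (1.959964 for a 95% interval)
    """
    if n <= 3:
        raise ValueError("n must be greater than 3")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    z = math.atanh(r)              # Fisher's z transformation
    se = 1.0 / math.sqrt(n - 3)    # standard error on the z scale
    lower = math.tanh(z - z_crit * se)  # back-transform to the r scale
    upper = math.tanh(z + z_crit * se)
    return lower, upper


low, high = fisher_ci(0.63, 87)
print(f"{low:.2f} to {high:.2f}")
```

Because the interval is built on the z scale and then back-transformed, it is not symmetric around r.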

Ceiling and floor effects were defined as present if at least 15% of results reached the maximum or the minimum value [32].
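The 15% criterion can be expressed as a short check. This is a minimal sketch with a hypothetical function name and made-up scores, not the procedure used in the study.

```python
def floor_ceiling_effects(scores, min_score, max_score, threshold=0.15):
    """Flag floor and ceiling effects for a list of scores.

    An effect is flagged when the share of scores at the minimum (floor)
    or at the maximum (ceiling) is at least `threshold`.
    """
    n = len(scores)
    if n == 0:
        raise ValueError("scores must not be empty")
    floor_share = sum(1 for s in scores if s == min_score) / n
    ceiling_share = sum(1 for s in scores if s == max_score) / n
    return {
        "floor": floor_share >= threshold,
        "ceiling": ceiling_share >= threshold,
    }


# Made-up total scores on the 0..200 scale: nobody at either extreme.
print(floor_ceiling_effects([35, 80, 112, 150, 176], 0, 200))
```

The same function can be applied to a single item by passing its scores with a minimum of 0 and a maximum of 4.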

Internal consistency was determined by Cronbach’s α [33, 34], which is a general coefficient of homogeneity between items. Values for α can range from 0 (no internal consistency) to 1 (perfect internal consistency), where a value above 0.8 is considered acceptable [35].
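For readers unfamiliar with the coefficient, the sketch below computes Cronbach’s α from the usual variance formula, α = k/(k − 1) × (1 − Σ item variances / variance of the total score). The function name and the data are hypothetical, and it is not the Stata routine used in the study.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of per-subject lists of item scores.

    item_scores[s][i] is the score of subject s on item i.
    """
    n_items = len(item_scores[0])
    if n_items < 2 or len(item_scores) < 2:
        raise ValueError("need at least 2 items and 2 subjects")
    if any(len(row) != n_items for row in item_scores):
        raise ValueError("all subjects must have the same number of items")

    # Sample variance of each item across subjects.
    item_vars = [
        variance([row[i] for row in item_scores]) for i in range(n_items)
    ]
    # Sample variance of the total score across subjects.
    total_var = variance([sum(row) for row in item_scores])
    if total_var == 0:
        raise ValueError("total score has zero variance")

    return n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)


# Made-up data: 4 subjects, 2 items.
print(cronbach_alpha([[1, 2], [2, 1], [3, 4], [4, 3]]))
```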

The reliability of the translated versions was assessed by test–retest reliability and quantified by the intraclass correlation coefficient (ICC) [36]. Patients who were not expected to have a significant health status change between tests were asked to fill out the SFS-F (resp. SFS-G) on two occasions separated by 2 days. Values for ICC can range from 0 (no agreement) to 1 (perfect agreement). Bland–Altman plots were used to assess the disagreement between test and retest values [37]. Such plots show the individual score differences between tests as a function of the individual mean scores of the two tests. 95% limits of agreement were calculated as the mean difference ± 1.96 SD of the difference. The narrower the limits of agreement, the smaller the disagreement between the repeated tests.
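The limits of agreement can be sketched as follows. The function name and the scores are hypothetical. The text does not state the direction of the difference, so the sketch assumes test minus retest.

```python
from statistics import mean, stdev


def limits_of_agreement(test_scores, retest_scores):
    """Bland-Altman mean difference and 95% limits of agreement.

    Returns (mean_difference, lower_limit, upper_limit), where the limits
    are the mean difference -/+ 1.96 SD of the differences.
    Assumption: differences are computed as test minus retest.
    """
    if len(test_scores) != len(retest_scores):
        raise ValueError("test and retest must have the same length")
    diffs = [t - r for t, r in zip(test_scores, retest_scores)]
    mean_diff = mean(diffs)
    sd_diff = stdev(diffs)  # sample standard deviation of the differences
    return mean_diff, mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff


# Made-up total scores for four patients on two occasions.
mean_diff, lower, upper = limits_of_agreement([10, 20, 30, 40], [12, 18, 33, 37])
print(mean_diff, lower, upper)
```

For the plot itself, each difference is drawn against the mean of the two scores of the same patient, with horizontal lines at the three returned values.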

All calculations were performed using the statistical package Stata 11.0 for Windows [38].

Results

Cross-Cultural Adaptation

The translations and back-translations of the SFS items were carried out in both French and German without any relevant difficulties. The back-translations of the T12 versions into English were very similar to the original version. Only some typically American expressions or words differed, as our back-translators were natives of the United Kingdom and India. Moreover, patients did not mention any difficulties in understanding the items.

Validation

SFS-French Version

Eighty-seven subjects were eligible for the validation of the SFS-F. The excluded patients had inconsistency in the internal validity check or more than 4 “I don’t know” answers.

For convergent validity, we found a correlation coefficient of 0.63 (95% CI: 0.48–0.74) between SFS-F and PF, 0.60 (95% CI: 0.44–0.72) between SFS-F and PCS, and −0.33 (95% CI: −0.51 to −0.13) between SFS-F and VAS. The assessment of divergent validity resulted in an SFS-F–MH correlation of −0.08 (95% CI: −0.29 to 0.14), an SFS-F–MCS correlation of 0.01 (95% CI: −0.21 to 0.23), an SFS-F–HADS depression correlation of −0.26 (95% CI: −0.45 to −0.05), and an SFS-F–HADS anxiety correlation of −0.17 (95% CI: −0.37 to −0.04).

No evidence for floor or ceiling effects was found for the total score since no patient reached the minimum or maximum possible score. A floor effect was found in items 45–48, with more than 99% of the participants rating their ability to perform the task as “restricted” (14%) or “unable” (85%) on the 5-point Likert scale. For internal consistency, Cronbach’s α was 0.98 for the SFS-F. The reliability, assessed by test–retest in 21 patients, resulted in an ICC of 0.98 (95% CI: 0.97–1.00). The mean difference between test and retest was 0.3, with 95% lower and upper limits of agreement at −11.5 and 12.1 (Fig. 1).

Fig. 1

Bland–Altman plot for the two languages. The middle line represents the mean difference between the two tests. The upper and lower lines represent the upper and lower limits of agreement, i.e. mean difference + 1.96 SD of the differences and mean difference − 1.96 SD of the differences, respectively

SFS-German Version

Three hundred and nine subjects were eligible for validation of the SFS-G. The excluded patients had inconsistency in the internal validity check or more than 4 “I don’t know” answers.

For convergent validity, we found a correlation coefficient of 0.67 (95% CI: 0.59–0.73) between SFS-G and PF, 0.52 (95% CI: 0.43–0.61) between SFS-G and PCS, and −0.51 (95% CI: −0.60 to −0.42) between SFS-G and VAS. The assessment of divergent validity resulted in an SFS-G–MH correlation of 0.25 (95% CI: 0.13–0.36), an SFS-G–MCS correlation of 0.28 (95% CI: 0.16–0.39), an SFS-G–HADS depression correlation of −0.42 (95% CI: −0.52 to −0.32), and an SFS-G–HADS anxiety correlation of −0.45 (95% CI: −0.54 to −0.35).

No evidence for floor or ceiling effects was found for the total score since no patient reached the minimum possible score and only one scored the maximum possible value. A floor effect was found in items 45–48 with more than 97% of the participants rating their ability to perform the task as “restricted” (12%) to “unable” (85%). For internal consistency, Cronbach’s α was 0.98 for the SFS-G. Reliability, assessed by test–retest in 44 patients, resulted in ICC values of 0.94 (95% CI: 0.90–0.98). Mean difference between test and retest was 1.3, with 95% lower and upper limits of agreement at −27.7 and 30.2 (Fig. 1). A look at Fig. 1 shows a highly influential patient with a difference of over 60 units between tests. Limits of agreement calculated without that patient were −20.9 and 19.9 for a mean difference of −0.5.

Discussion

The original English version of the SFS was translated and adapted into French and German to create the SFS-F and SFS-G versions, respectively. Evidence for reliability and validity was shown, supporting the use of the SFS-F and SFS-G as self-report instruments for individuals with a wide range of chronic back disorders. Specifically, evidence for convergent validity, divergent validity, internal structure, and score stability was provided for the SFS-F and SFS-G.

Typical activities, especially those regarding gardening with specific tools, had to be adapted for the French and German cultures. For example, a “spade-shovel” is not commonly used by patients in our countries and was changed to “shovel” (“pelle” in French and “Schaufel” in German). Thus, although the SFS is a pictorial questionnaire, the cross-cultural adaptation showed the importance of following the complete AAOS guidelines to obtain a valuable final version.

As hypothesized for convergent validity, the correlation coefficients between the translated SFS versions and the SF-36 physical functioning scale were fairly high, i.e. 0.63 and 0.67 for the SFS-F and SFS-G, respectively. Moreover, they were similar to values found by Gibson et al. [22] with other scales such as the pain disability index (−0.64), the work re-entry questionnaire (0.67), the SE questionnaire (0.55) and the pain self-efficacy questionnaire (0.78). Those questionnaires could not be used in the present study because of the lack of validated French and German versions. The pain scale also showed a significant correlation with the SFS-G (−0.51), but only a low correlation with the SFS-F (−0.33). This last estimate is rather imprecise (95% CI −0.51 to −0.13), probably due to the smaller sample compared to the German version. For divergent validity, we found no correlation (−0.08) between the SF-36 mental health scale and the SFS-F, and a low correlation with the SFS-G (0.25), as hypothesized. The small differences between the French and German versions might be explained either by some cultural differences regarding the implication of back problems in daily living and, consequently, the interaction with the MH scale of the SF-36 (which has questions regarding irritability, sadness, motivation), or by sampling. The correlation between the HADS depression score and the SFS was low (−0.26) for the French version and moderate (−0.42) for the German version. These correlations are possibly due to the chronicity of back problems in our patients, who were recruited in tertiary centers. Patient populations with chronic occupational back pain are known to exhibit a higher prevalence of psychological disorders compared to the general population [39]. Moreover, the difference between the French and German versions may, as for MH, be explained by cultural differences related to either the patients or medical practice, but also by the difference in the timing of hospitalization in the two centres after back problems were diagnosed. Furthermore, it must be kept in mind that our study samples were not randomly drawn from a population but were convenience samples. A previous study performed at the Clinique romande de réadaptation Suva care (Switzerland) has shown that questionnaire responders differed from non-responders in some sociodemographic and biopsychosocial aspects [40]. Thus, some degree of selection bias, which may differ between clinics, may well have occurred in the present study.

A floor effect was found in items 45–48. Those items describe activities in which weights of 50 kg are lifted from floor, waist or overhead height, or lowered again. Most participants felt they could not carry out such strenuous activities. It may be questioned whether these items are of great value for the clinical purpose of the questionnaire. Furthermore, lifting tasks involving weights over 25 kg are nowadays prohibited in most occupations in Switzerland, France and Germany.

According to the literature, a Cronbach’s α over 0.80 (over 0.90 for clinical applications) represents good internal consistency. We found excellent α values far above these thresholds, with 0.98 for both the SFS-F and the SFS-G. This high internal consistency may be partly influenced by the high number of items, since α has the property of becoming larger with increasing item number, given equal between-item correlations [41]. However, our values are similar to that of the English version (0.98), suggesting that the French and German translations have the same level of internal consistency as the original version.
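The dependence of α on the number of items can be illustrated with the standardized form of the coefficient, α = k·r̄ / (1 + (k − 1)·r̄), where k is the number of items and r̄ the mean between-item correlation. The sketch below uses a hypothetical function name and an arbitrary r̄ of 0.3, which is not an estimate from the present data.

```python
def standardized_alpha(n_items, mean_r):
    """Standardized Cronbach's alpha for n_items items with a mean
    between-item correlation of mean_r."""
    if n_items < 2:
        raise ValueError("need at least 2 items")
    return n_items * mean_r / (1 + (n_items - 1) * mean_r)


# Arbitrary mean between-item correlation, chosen for illustration only.
for k in (5, 10, 25, 50):
    print(k, round(standardized_alpha(k, 0.3), 3))
```

With the mean correlation held fixed, the coefficient rises steadily as items are added, which is why a 50-item instrument can reach a very high α even when individual items are only moderately correlated.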

The reliability of both the SFS-F and the SFS-G was excellent, with ICCs of 0.98 and 0.94, respectively. These coefficients are higher than the value reported for the original version (0.89) [23]. Moreover, the confidence intervals (0.97–1.00 for the French version, and 0.90–0.98 for the German version) were narrow for both translations, indicating rather precise estimates.

The limits of agreement were calculated to determine the magnitude of disagreement between the two measurement occasions. With all patients included, the interval between the limits of agreement of the German version was over twice that of the French version (57.9 and 23.6 units, respectively). After exclusion of a highly influential patient, the German version’s interval was reduced to 41.8 units. However, further studies should be done to evaluate the minimal clinically important change [42] to establish whether the difference in score is clinically relevant.

Some limitations of the present study have to be recognised. First, only patients hospitalized in tertiary centers for chronic back problems were included. Thus, results of SFS-F and SFS-G questionnaires have to be interpreted with caution in other clinical settings. The use of convenience samples instead of random samples was discussed above.

Although the SFS has been successfully used for the last 20 years, some recommendations may be given here to improve its clinical utility. First, old-fashioned drawings (e.g. an old type of vacuum cleaner) should be replaced by new pictures of tools used nowadays. Second, a reduction in the number of items would save considerable time and therefore further improve its clinical utility. In this context, we calculated PACT scores using either the 25 even-numbered or the 25 odd-numbered items and ranked the patients on the full score and the two half-scores. Correlations between the full score rank and each of the half-score ranks were 0.99 for both languages, indicating item redundancy. A reduction in the number of items is also supported by the high internal consistency. Third, relevant items involving postures that load the spine, such as sitting, should be included in the SFS. The development of a brief version of the SFS is clearly needed. Further studies exploring these measurement properties in different settings and with other validation tools are therefore needed.
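The half-score comparison described above can be sketched as follows. The function names and data are hypothetical, and the sketch makes one assumption: the correlation between ranks is taken as a Pearson correlation of average ranks (i.e. a Spearman correlation), since the text does not specify how tied scores were ranked.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def half_score_rank_correlations(item_scores):
    """Correlate full-score ranks with odd-item and even-item half-score ranks.

    item_scores[p] is the list of 50 item scores of patient p (item 1 first).
    Returns (correlation with odd-item half, correlation with even-item half).
    """
    full = [sum(p) for p in item_scores]
    odd = [sum(p[0::2]) for p in item_scores]   # items 1, 3, ..., 49
    even = [sum(p[1::2]) for p in item_scores]  # items 2, 4, ..., 50
    full_ranks = average_ranks(full)
    return (
        pearson(full_ranks, average_ranks(odd)),
        pearson(full_ranks, average_ranks(even)),
    )
```

A correlation close to 1 for both halves means that either set of 25 items orders the patients almost exactly as the full questionnaire does.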

In conclusion, the French and German versions of the SFS seem to be valid and reliable. The SFS is easy to administer and can be used to evaluate perceived functional capacity in native French-speaking and German-speaking patients with back disorders, both for clinical purposes and for research.