Introduction

Work disability is a health problem with high prevalence and economic costs in industrialized societies [1, 2]. In Europe, the proportion of workers with a long-term health problem or disability varies between 5.8 % in Romania and 32.2 % in Finland [3]. Increased life expectancy and a later retirement age are raising the overall age of the workforce. With an older workforce, more workers are working with health problems [4–6].

In occupational health, rehabilitation and/or accommodation programs to adapt work conditions to worker skills and health are being increasingly used to support an active work life and better quality of life [6, 7].

The effectiveness of rehabilitation and work accommodation programs needs to be assessed using outcomes such as work status (active, temporary disability, permanent disability), time to return to work, duration of functional disability and costs of inability to work [7–9]. Although useful, these outcomes are limited, as they mainly assess whether workers are present or absent from their jobs [10]. They do not offer information about the worker’s participation in the job or the degree to which he or she is able to respond to the job’s demands [10, 11]. To fully assess the effectiveness of an intervention, outcome measures are required that describe the extent to which people increase their ability to meet the demands of the job.

In the 1990s a series of work-role specific functioning questionnaires were developed, including the Work Limitations Questionnaire (WLQ), the Work Limitations-26 (WL-26) and the Work Role Functioning Questionnaire (WRFQ) [10, 12]. The WRFQ measures perceived disability in terms of limitations in performing the job due to health problems. Work limitation is defined as the level of difficulty the worker encounters in carrying out the demands of his/her job. Numerous studies have demonstrated the usefulness of these tools in English-speaking health care environments [13–15], but no versions have been adapted for Spanish-speaking health care environments. Due to possible cultural differences in the perception of work, health and disease, these instruments should be systematically translated, adapted and validated for use in other cultures. Since its creation and validation, the WRFQ has been adapted to Canadian French [16], Brazilian Portuguese [17] and Dutch [18].

The objectives of this study were to translate and adapt the WRFQ to Spanish spoken in Spain and evaluate its psychometric properties.

Methods

The WRFQ is a self-administered questionnaire containing 27 items grouped into 5 subscales: work scheduling demands, output demands, physical demands, mental demands and social demands. The first two columns of Table 1 show all items and subscales of the original English version. The recall period is 4 weeks and each subscale is measured by the percentage of time in a working day the employee has difficulty performing those demands.

Table 1 Responses for item-level of the Spanish version of the Work Role Functioning Questionnaire (WRFQ)

Response options form a five-point scale: 0 = all of the time (100 %), 1 = most of the time, 2 = half of the time (50 %), 3 = some of the time and 4 = none of the time (0 %). A sixth option, 5 = does not apply to my job, enables employees to answer even though a particular demand is not part of their work.

For each subscale, item scores were summed up, divided by the number of items included in the subscale, and then multiplied by 25 to obtain percentages for each subscale, ranging from 0 % (difficulty all the time) to 100 % (no difficulty at any time). The same process was repeated for the global scale. The answers “does not apply to my job” were transformed to missing values. Scales containing subscales with more than 20 % missing values or “does not apply to my job” were excluded from the analysis [19].
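The scoring rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' scoring program: the function name `subscale_score` is hypothetical, and the sketch assumes that the sum is divided by the number of answered items once "does not apply to my job" responses have been set to missing.

```python
from typing import Optional, Sequence

NOT_APPLICABLE = 5  # response code for "does not apply to my job"


def subscale_score(
    responses: Sequence[Optional[int]], max_missing: float = 0.20
) -> Optional[float]:
    """Return a 0-100 score for one subscale, or None if it must be excluded.

    responses: item codes 0-4, 5 for "does not apply", or None if unanswered.
    Assumption: the mean is taken over the answered items only.
    """
    if not responses:
        return None
    # "Does not apply to my job" is treated as a missing value.
    valid = [r for r in responses if r is not None and r != NOT_APPLICABLE]
    missing_share = (len(responses) - len(valid)) / len(responses)
    if missing_share > max_missing:
        return None  # more than 20 % missing: excluded from the analysis
    # Mean item score (0-4) multiplied by 25 gives a percentage (0-100).
    return sum(valid) / len(valid) * 25
```

For example, a five-item subscale answered 4, 3, 2, 1, 0 yields a score of 50, while a subscale in which two of five items are marked "does not apply to my job" is excluded.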

Translation and Cross-Cultural Adaptation of the WRFQ

Translation was carried out following a systematic and standardized procedure consisting of five steps: (1) direct translation, (2) synthesis of translations, (3) back-translation, (4) consolidation of translations by a committee of experts and (5) pre-test [20–24].

To complete the direct translation, three bilingual translators whose native language was Spanish spoken in Spain were selected. The first one was aware of the objectives and concepts of the WRFQ. The second one did not know them but had previous experience in technical translation of medical texts. The last translator had no previous knowledge of medicine or rehabilitation and did not know the study objectives. They worked independently and were provided with common instructions to ensure a uniform translation of the entire questionnaire. This was followed by a synthesis of translations, comparing versions and identifying discrepancies that were discussed to reach consensus between translators and researchers.

The back-translation into English was done by two bilingual translators whose native language was English spoken in the USA. They had no knowledge of medicine or rehabilitation and were unaware of the study objectives. They worked independently and were blind to the original version of the questionnaire to minimize information bias.

A multidisciplinary expert committee of bilingual professionals, consisting of an occupational health technician, an occupational physician, an occupational nurse, two linguists and a methodology expert, evaluated the process. Discrepancies between the two back-translations were identified, and, following methodological guidelines [20, 21], a consensus was reached on a pre-final version of the WRFQ adapted to Spanish spoken in Spain.

Finally, a pre-test study was carried out to assess the equivalence of the questionnaire, its understandability and applicability in the Spanish context. Possible mistakes were identified and it was verified that the instructions, items and answer choices were understandable.

Evaluation of the Pre-Final Questionnaire Psychometric Properties

Sample

Forty volunteer patients of both sexes, with a physical (musculoskeletal) and/or a mental (anxiety-depression) health problem of at least 1 month's duration, were recruited among outpatients at the orthopedics, rehabilitation and psychiatry clinics of a large public hospital in Barcelona. Patients were between 18 and 65 years old and had different cultural levels. All spoke Spanish as their first language, were able to read and understand what they were reading, and had worked at least 10 h per week during the previous 4 weeks.

Materials

Participants were asked to fill out the Spanish version of the WRFQ on paper and to underline or mark any difficulty on the questionnaire. In addition, they described any questions that were difficult to understand during a recorded 15-min structured interview.

Procedure

During the interview each participant was systematically asked about the understandability of the instructions, of each response option and of the 27 items. All comments related to difficulties with any of these were recorded and later reviewed by the expert committee. Revisions were made to a specific questionnaire item when 15 % or more of participants described difficulties with that item [19].
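The 15 % revision criterion is simple arithmetic, and a short sketch makes the cut-off concrete for a pre-test of 40 participants. The function name `needs_revision` is hypothetical and is used here only for illustration.

```python
def needs_revision(
    n_reporting: int, n_participants: int, threshold_percent: int = 15
) -> bool:
    """Return True when the share of participants reporting difficulty
    with an item reaches the revision threshold (15 % by default)."""
    if n_participants <= 0:
        raise ValueError("n_participants must be positive")
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return n_reporting * 100 >= threshold_percent * n_participants
```

With 40 participants the criterion is met from six participants upward (6/40 = 15 %), whereas five participants (12.5 %) would not trigger a revision.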

The internal consistency of the total scale and each subscale was evaluated using Cronbach’s alpha, with appropriate values ≥0.70 [25, 26]. Correlations between the subscales, subscale-total, item-subscale and item-total were evaluated, with appropriate values ≥0.46 [27].
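For reference, Cronbach's alpha for a set of k items is k/(k − 1) multiplied by one minus the ratio of the summed item variances to the variance of the total score. The sketch below implements that standard formula using only the Python standard library. The function name `cronbach_alpha` is hypothetical, and the sketch assumes complete data (no missing responses), which the text does not specify.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items table.

    rows: one sequence of item scores per respondent (complete cases only).
    """
    n_items = len(rows[0])
    if len(rows) < 2 or n_items < 2:
        raise ValueError("need at least 2 respondents and 2 items")
    # Sample variance of each item across respondents.
    item_variances = [
        variance([row[i] for row in rows]) for i in range(n_items)
    ]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in rows])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)
```

Values of 0.70 or above would meet the criterion stated above.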

The repeatability or stability of the instrument was assessed through test–retest reliability. The WRFQ was administered to the same group of 40 workers at two different time points, test and retest. The retest was conducted after a period ranging from 7 to 15 days. This period was considered long enough to prevent recall of earlier responses and short enough to avoid changes in the observed phenomenon that could affect repeatability. The intraclass correlation coefficient (ICC) was calculated to assess the test–retest reliability. The stability or repeatability of a subscale or total scale was considered good when the ICC was above 0.70 and very good when it was above 0.90 [26–28].
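The text does not state which form of the ICC was calculated. Purely as an illustration, the sketch below assumes the two-way random-effects, absolute-agreement, single-measurement form, often written ICC(2,1), computed from the mean squares of a subjects-by-occasions table. The function name `icc_2_1` is hypothetical, and other ICC forms would give different values.

```python
from typing import Sequence, Tuple


def icc_2_1(pairs: Sequence[Tuple[float, float]]) -> float:
    """ICC(2,1) for test-retest data: one (test, retest) pair per subject.

    Assumption: two-way random effects, absolute agreement, single measure.
    """
    n = len(pairs)  # number of subjects
    k = 2           # number of occasions (test and retest)
    if n < 2:
        raise ValueError("need at least 2 subjects")
    grand_mean = sum(a + b for a, b in pairs) / (n * k)
    row_means = [(a + b) / k for a, b in pairs]
    col_means = [sum(p[j] for p in pairs) / n for j in range(k)]

    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for p in pairs for x in p)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)              # between-subjects mean square
    ms_cols = ss_cols / (k - 1)              # between-occasions mean square
    ms_error = ss_error / ((n - 1) * (k - 1))  # residual mean square

    denominator = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    if denominator == 0:
        raise ValueError("scores have no variance")
    return (ms_rows - ms_error) / denominator
```

Under this form, identical test and retest scores give an ICC of 1, while a systematic shift between occasions lowers the coefficient even when the ranking of subjects is unchanged.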

Face validity is the extent to which a questionnaire, in the opinion of the experts and users, is a logical measure of what it intends to measure. It is usually evaluated empirically through comments from participating experts and users. In our study, this was assessed by the expert committee, analyzing the comments made by participants during the structured interviews. Content validity measures whether the tool is able to measure most of the construct dimensions. It was also evaluated using an empirical approach, based on judgments from the tool’s original authors (BA), as well as arguments made by the expert committee and a qualitative analysis of the comments made by the participants during the pre-test.

We also explored floor and ceiling effects, which occur when a high percentage of responses cluster at the bottom or the top of the scale. Their presence indicates that the question lacks discriminative ability and that the questionnaire cannot differentiate between high and low scores. Content validity is considered good when floor and ceiling effects do not exceed 15 % [28]. Means, ranges and medians of the scores were determined to further describe the distribution of the responses.
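Floor and ceiling effects can be computed as the percentage of respondents scoring at the lowest and highest possible values. The sketch below assumes subscale scores on the 0 to 100 range described earlier, with the floor at 0 and the ceiling at 100. The function name `floor_ceiling_effects` is hypothetical.

```python
from typing import Sequence, Tuple


def floor_ceiling_effects(
    scores: Sequence[float], minimum: float = 0.0, maximum: float = 100.0
) -> Tuple[float, float]:
    """Return (floor %, ceiling %): the share of respondents scoring
    exactly the lowest and the highest possible value."""
    n = len(scores)
    if n == 0:
        raise ValueError("no scores supplied")
    floor_pct = 100 * sum(1 for s in scores if s == minimum) / n
    ceiling_pct = 100 * sum(1 for s in scores if s == maximum) / n
    return floor_pct, ceiling_pct
```

For a sample of 40, nine respondents at the maximum score correspond to a ceiling effect of 22.5 %, which would exceed the 15 % criterion.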

Finally, construct validity was assessed using known-groups validity analysis, comparing the subscale results of patients with physical illness and patients with mental illness. It was hypothesized that patients with only mental illness would score lower (meaning more disability) on the mental and social demands subscales, and that patients with only physical illness would score lower on the work scheduling, output and physical demands subscales. Patients with both types of illness (n = 6) were excluded from this comparative analysis. Since the subscale scores in both groups of patients did not follow a normal distribution, the hypothesis was evaluated by comparing the medians of each subscale in both groups of patients. Statistical significance was assessed using the non-parametric Mann–Whitney U test.
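As an illustration of the known-groups comparison, the sketch below computes the Mann–Whitney U statistics for two independent groups using average ranks for ties. It covers only the test statistic: the p-value would come from statistical tables or from an established routine such as `scipy.stats.mannwhitneyu`. The function name `mann_whitney_u` is hypothetical.

```python
from typing import List, Sequence, Tuple


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U_x, U_y) for two independent samples.

    U_x counts the (x, y) pairs in which the x value is larger,
    with ties counted as one half.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups must be non-empty")
    # Pool the observations, remembering group membership (0 = x, 1 = y).
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y], key=lambda p: p[0])
    ranks: List[float] = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        # Extend j over the run of tied values starting at position i.
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for t in range(i, j + 1):
            ranks[t] = average_rank
        i = j + 1
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    u_y = n1 * n2 - u_x
    return u_x, u_y
```

A U value near zero for one group indicates that almost all of its scores fall below those of the other group, which is the pattern the hypotheses above predict for the relevant subscales.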

The protocol of this study was approved by the Ethics Committee of Parc de Salut Mar and it respects all the principles of the Declaration of Helsinki and the Spanish legal regulations on protection of personal data.

Results

The direct translation was carried out without difficulty. However, several challenges were found related to the idiomatic usage of words used in items 2 (get going easily), 11 (sense of accomplishment), 23 (train of thought) and 26 (control your temper), which were discussed and agreed with the translators.

On the other hand, items 3–6 (start on your job, extra breaks or rests, stick to a routine, workload), 10 (people who judge), 13 (move around different locations) and 17 (bend) had several translation alternatives and required consideration by the committee of experts to reach a consensus to ensure semantic and idiomatic equivalence of both versions. In item 14 the units of measure were converted from pounds to kilograms.

When the back-translations were compared with the original version, some discrepancies were found in the language equivalence of certain words contained in the instructions and various items. Items 2 (get going easily), 5 (stick to a routine), 11 (sense of accomplishment), 16 (repeat some motions), 17 (bend, twist or reach while working), 23 (train of thought), 25 (speak with people in person), 26 (control your temper) and 27 (to get work done) had several translation alternatives and required reconsideration by the committee of experts (Table 1).

Lastly, a pre-final questionnaire was consolidated in Spanish spoken in Spain, which guaranteed the semantic, idiomatic, conceptual and experiential equivalence with the original questionnaire, reaching consensus to partially reformulate the last paragraph of the instructions and wording of items 2, 11, 23, 25, 26 and 27. It was not necessary to modify or reshape the rest of the instructions, response options and other items.

The pre-final questionnaire was administered to 40 patients. Table 2 describes their socio-demographic characteristics. Comments were analyzed by the committee of experts. Most participants found no difficulty understanding the items. Nine participants (22.5 %) reported the last paragraph of the instructions was ambiguous, so it was amended, emphasizing that the questions related to “working time”.

Table 2 Participants’ socio-demographic characteristics

Eight participants (20 %) found the expression “difficult” located at the top of the column where the items were located hard to interpret. After weighing various alternatives, a decision was made to incorporate this expression in each of the possible answers as follows: 0 = was difficult all the time (100 %), 1 = was difficult most of the time, 2 = was difficult half the time (50 %), 3 = was difficult part of the time, 4 = never was difficult (0 %). No participant expressed difficulty with the response option “does not apply to my job”.

Ten participants (25 %) had difficulties with item 13 and eight participants (20 %) with item 18. All answered “does not apply to my job” since the examples did not fit their job. The committee of experts decided to delete the examples from these items.

Table 3 shows the average scores for each subscale; higher values indicate less disability at work. The social demands subscale scored the highest (76.9, SD = 21.1) and the physical demands subscale the lowest (59.0, SD = 32.3). The items that most frequently obtained the answer “does not apply to my job” were item 14 (lift, carry, or move objects at work weighing more than 10 pounds), item 13 (walk or move around different work locations, for example, going to meetings) and item 10 (satisfy the people who judge your work).

Table 3 Pre-test results with the Spanish version of the Work Role Functioning Questionnaire (WRFQ) (n = 40)

After reviewing the comments made by participants during the pre-test and resolving them by consensus, the committee of experts drafted the final version of the WRFQ translated and adapted to Spanish spoken in Spain (Appendix 1).

Assessing the internal consistency, the Cronbach’s alpha was 0.97 for the total scale. All subscales obtained Cronbach’s alpha coefficients above 0.85, except for social demands which was 0.56. Correlations between the subscales, subscale-total, item-subscale and item-total were all ≥0.46 and considered appropriate [27].

Ceiling effects were lowest for the output demands subscale (2.5 %) and highest for the mental demands subscale (22.5 %); only the latter exceeded the 15 % criterion [28] (Table 3).

Table 4 shows the results of the test–retest reliability; ICCs ranged between 0.77 and 0.93. The ICC for the total scale was 0.94.

Table 4 Test–retest reliability

The expert committee estimated that the face validity of the questionnaire was adequate and the participants appreciated the applicability, usability and understandability of the questionnaire. These aspects were collected in the comments made during the interviews, concluding that the questionnaire measures work disability in a logical way.

Content validity was considered adequate according to the criteria and judgment of the authors of the original version of the WRFQ [16–18], the arguments made by the committee of experts during the process of cross-cultural adaptation and the qualitative analysis of participant comments.

Construct validity was likewise reasonable. The median score for the physical demands subscale was 30 points lower in participants with a physical (musculoskeletal) health problem, and the median score for the mental demands subscale was 21 points lower in patients with a mental (anxiety-depression) health problem (Table 5), although these differences were not statistically significant.

Table 5 Subscale description by type of health problem (mental or physical)

Discussion

This rigorous, stepwise procedure for translation and cross-cultural adaptation of the WRFQ led to the development of a version in Spanish as spoken in Spain that is equivalent to the original English version. Minor changes were made to maximize questionnaire understandability. It was necessary to adjust the wording of the instructions, as happened when the questionnaire was adapted into Canadian French [16], Brazilian Portuguese [17] and Dutch [18]. During the adaptation to Portuguese, a decision was made to incorporate the term “difficult” within each item. In the adaptation to Spanish, the term was instead incorporated into each of the response options to facilitate understandability.

Several items needed to be changed after the pre-test. There are similarities with the difficulties in items 2, 6 and 26 encountered by Durand et al. [16], Gallasch et al. [17] and Abma et al. [18]. As in those studies, examples were removed from items 13 and 18 because their interpretation could be misleading.

The absence of ceiling and floor effects above 15 % (with the exception of the 22.5 % ceiling effect for the mental demands subscale) indicates that the questionnaire items have acceptable discriminative ability to distinguish high and low scores, providing evidence of questionnaire content validity [28].

The highest frequency of the response option “does not apply to my job” was obtained for the items in the physical demands subscale, as in other cultural adaptations of the WRFQ [16–18]. A likely cause is that these items describe movements specific to manual work and do not apply to non-manual work, which accounted for 28 % of the sample. The high ceiling effect for mental demands observed in our study is consistent with the results of Durand et al. [16], probably because musculoskeletal health problems have less impact on the ability of workers to handle the mental demands of work.

The internal consistency of the Spanish version of the WRFQ was very good for all subscales except for social demands. This result is consistent with those obtained by Durand et al. [16] and Gallasch et al. [17]. All items, except item 4, had higher correlations with their own subscale than with the total scale, confirming that the translation and cross-cultural adaptation did not alter the internal consistency of the questionnaire. However, we observed some variability in subject responses to the items of the social demands subscale (Cronbach’s alpha of 0.56) and thus, in agreement with Durand et al. [16], we believe that the internal consistency of this subscale should be interpreted with caution.

The results of the test–retest reliability are very similar to those obtained by Gallasch et al. [17]. The stability or repeatability of the questionnaire can be considered good for the output, mental and social demands subscales and very good for the physical and work scheduling demands subscales [26–28].

The results show adequate construct validity of the WRFQ. On the one hand, the median scores obtained by participants, all of whom were patients with active health problems, for all subscales ranged between 62.5 and 83.3 %, indicating important difficulties in carrying out the demands of their jobs, which is not surprising.

On the other hand, as expected, the comparisons of scores between the two groups of patients indicate lower scores on the subscales of scheduling and physical demands for those with only physical health problems and, conversely, lower scores on the subscales of mental and social demands for patients with only a mental health problem. One limitation of this study could be the sample size in the pre-test; however, it is consistent with the previous literature.

In conclusion, our results confirm that the process used for translation and cross-cultural adaptation of the WRFQ to Spanish spoken in Spain was carried out successfully and indicate that its psychometric properties were well preserved.