Introduction

Work is one of the most powerful social determinants of health; it is a central aspect of people’s lives and can influence productivity and integration into society [1]. In contrast, being unemployed has been linked to increased mortality risk among working-age people [2, 3]. In recent years, the prevalence of chronic musculoskeletal disorders (MSDs) has increased in many industrialized countries, leading to a high burden at the individual level and a significant increase in costs [4,5,6].

Policy makers and stakeholders in healthcare agree that a patient-centred and holistic approach is essential to identify the needs of the patients and to successfully address work disability [7, 8]. An increasing number of multidisciplinary (including physical therapy) occupational health programmes and vocational rehabilitation (VR) measures reflect this idea, where physical therapists play a crucial role in promoting workers’ health, preventing work-related injuries, and in developing comprehensive multimodal rehabilitation programmes to improve workers’ ability to return to and stay at work [9, 10].

The role of physical therapists in occupational health and VR remains crucial to help mitigate the effects of disability on work [11]. Further, the use of the International Classification of Functioning, Disability and Health (ICF) as the biopsychosocial approach to plan, direct and evaluate rehabilitation outcomes is a cornerstone of physical therapists’ practice [12].

The ICF Core Set for Vocational Rehabilitation

Health professionals value the use of the biopsychosocial model to understand health and functioning and consider the model to be helpful in designing return-to-work (RTW) programs [13, 14]. To facilitate the use of the ICF in VR, an international, multidisciplinary group developed the ICF core sets for VR, which is a list of ICF categories that describe what a clinician (or researcher) should assess in the context of work functioning and offers guidance in the selection of work functioning questionnaires or tools [15, 16]. The ICF core set for VR was validated from the perspective of physical therapists [10]. Physical therapists expressed the need to examine work-related functioning using a reliable and valid questionnaire so they can properly evaluate the worker, predict work outcomes and implement a sound RTW strategy [17].

Work Rehabilitation Questionnaire (WORQ)

Based on the ICF core sets for VR, Finger et al. [18] developed a generic patient-reported questionnaire—the Work Rehabilitation Questionnaire (WORQ) (www.myworq.org)—which was intended to capture work-related functioning of individuals who are undergoing VR [18,19,20]. WORQ was originally developed in English and has an interviewer-administered and a self-reported version, and has been translated into multiple languages, including French and German using a standard cross-cultural adaptation process [19, 21].

The French and German versions of WORQ are being used in Switzerland. The psychometric properties of the French version of WORQ have been investigated previously; it showed strong test–retest reliability and strong internal consistency in patients with MSDs who were undergoing multidisciplinary VR. The French version, however, was tested in the inpatient setting, as VR is commonly delivered in the inpatient setting in Western Europe [21]. In other countries, including the United States, VR is typically implemented in the outpatient setting. The German version of WORQ is being used in a physical therapy outpatient clinic, but its validity and reliability have not yet been examined in such a setting. Therefore, the aim of this study is to examine the psychometric properties of WORQ-German in terms of its validity, reliability, and feasibility in a physical therapy outpatient clinic.

Methods

Participants and Data Collection

The study was conducted in one outpatient physical therapy clinic in the centre of Switzerland (Lucerne).

We employed convenience sampling, with the following inclusion criteria: being of working age (18 to 65 years); having been referred to the clinic because of an MSD and having reported limitations at work; being able to speak, read and write German; and having the autonomy to make their own decisions.

After the treating physical therapist had confirmed the participant’s eligibility, the physical therapist explained the study to the prospective participant. If the patients agreed to participate in the study and provided written informed consent, they were asked to complete the “case-report form 1” (CRF1) for the first time (T1). To evaluate test–retest reliability, the participants were asked to complete “case-report form 2” (CRF2) 7 days after completing CRF1 (T2).

Case Report Form (CRF)

CRF1 contained sociodemographic information, WORQ, a question on general functioning (“Please indicate the extent of your problems in functioning in everyday life”), the single visual analogue scale of the health status questionnaire EuroQoL-5D (EQ5D) [22] (“We would like to know how good or bad your health is TODAY”), the Hospital Anxiety and Depression Scale (HADS-D) [23], the 12-item version of the WHO Disability Assessment Schedule (WHODAS 2.0) [24], the World Health Organization Quality of Life Questionnaire (WHOQoL) [25] and the Self-administered Comorbidity Questionnaire (SCQ) [26]. CRF2 contained WORQ, two more questions on content validity and two questions on the usability of WORQ.

Instruments

WORQ consists of two parts [20]. Part one collects 17 items on relevant background information about the work situation, social support, work environment and sociodemographic data of the patient/client. Part two includes 40 questions on work-related functioning, including body functions and activities and participation (Table 1). Each item is scored on a numeric rating scale from 0 to 10 (0 = no problem, 10 = complete problem). An overall summary score and four clinical sub-scores on emotion (items 4, 5, 6, 7, 8, 23), cognition (items 3, 9, 10, 17, 18, 19, 20, 24, 25, 26), dexterity (items 14, 15, 21, 22, 27, 28, 29, 34, 35, 36) and mobility (items 12, 30, 31, 32), which were derived from an earlier exploratory factor analysis, can be calculated based on the 40 functioning items from part two [20, 27]. The clinical sub-scores were developed to identify underlying patterns of functioning to support clinical decision-making and intervention allocation. Ten items are not assigned to any subscale (Table 1). The developers identified these items as being relevant to complement the picture of work-related functioning and to consider the different needs of patients with various health conditions. All WORQ scores have been confirmed by Rasch analysis [28].
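The scoring described above can be illustrated with a minimal Python sketch. It assumes that the summary score and the sub-scores are plain sums of the item ratings; this matches the 0 to 400 range reported for the summary score, but it is an assumption for the sub-scores and is not taken from the WORQ manual. The function names are hypothetical.

```python
# Item numbers of the four clinical sub-scores, as listed in the text.
SUBSCALES = {
    "emotion": [4, 5, 6, 7, 8, 23],
    "cognition": [3, 9, 10, 17, 18, 19, 20, 24, 25, 26],
    "dexterity": [14, 15, 21, 22, 27, 28, 29, 34, 35, 36],
    "mobility": [12, 30, 31, 32],
}


def score_worq(responses):
    """Return summary score and sub-scores for one respondent.

    responses: dict mapping item number (1..40) to a rating from 0 to 10.
    Assumption: every score is the plain sum of its item ratings.
    """
    if sorted(responses) != list(range(1, 41)):
        raise ValueError("expected ratings for items 1 to 40")
    if any(not 0 <= rating <= 10 for rating in responses.values()):
        raise ValueError("ratings must lie between 0 and 10")
    scores = {"summary": sum(responses.values())}
    for name, items in SUBSCALES.items():
        scores[name] = sum(responses[item] for item in items)
    return scores


def unassigned_items():
    """Items of part two that do not belong to any clinical sub-score."""
    assigned = {item for items in SUBSCALES.values() for item in items}
    return sorted(set(range(1, 41)) - assigned)
```

Calling `unassigned_items()` returns the ten items that contribute to the summary score only, which is consistent with the count given in the text.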

Table 1 Items of the Work Rehabilitation Questionnaire (WORQ)—Part II related to work functioning

EuroQoL EQ-5D: EQ-5D is a standardized measurement of health status, developed by the EuroQol-Group to offer a simple, generic measure of health for clinical and economic appraisal [29]. Consistent with other studies, we only used part two, the visual analogue scale (0 to 100), as a global indicator of general health [30].

Hospital Anxiety and Depression Scale (HADS): HADS is commonly used to determine the levels of anxiety and depression that a patient experiences [23, 31]. HADS has 14 items on anxiety and depression. HADS has a sensitivity and specificity of about 0.80 in assessing the symptom severity of anxiety and depression disorders in patients with somatic disorders and in the general population [32].

WHO Disability Assessment Schedule (WHODAS 2.0, 12-item version): WHODAS 2.0 provides a generic standardized assessment of functioning and disability in individuals with any kind of disease, including MSDs [33, 34]. WHODAS 2.0 distinguishes well between the normal population, people with MSDs and people with mental disorders [35].

World Health Organization Quality of Life Questionnaires (WHOQoL): WHOQoL was developed to measure quality of life in a variety of cultural settings [36, 37]. For this study, we used the five questions of WHOQoL to capture the subjective appraisal of health and well-being [38]: (1) How would you rate your quality of life? (2) How satisfied are you with your health? (3) How satisfied are you with your ability to perform your daily activities? (4) How satisfied are you with your personal relationships? (5) How satisfied are you with the conditions of your living place?

Self-administered Comorbidity Questionnaire (SCQ): SCQ is an instrument to assess comorbidity for clinical and health services research. The patient is asked about 12 medical problems, each with three scoring options [26]: (1) Do you have a problem? (yes/no) (2) Do you get treatment for this problem? (3) Are you restricted in your activities?

Validity

Patients who completed CRF-1 at T1 were included in the validity analysis.

Content Validity: Patient Perspective

Content validity of WORQ was evaluated with two written questions, Q1 and Q2, at the end of CRF-2: (Q1) “From your perspective, did WORQ ask about all relevant aspects concerning VR? (Yes–No)” and (Q2) “Are the answering options meaningful? (Yes–No). If no, please comment”. These questions were followed by face-to-face interviews with seven patients.

Construct Validity

Construct validity for the WORQ summary score was examined based on five a priori hypotheses. Our criterion to reject construct validity was that two or more hypotheses were rejected.

(a) WORQ (problems in work-related functioning) correlates moderately and negatively (r > − 0.5) with general health, measured with the EQ-5D VAS scale, where higher scores indicate better health. We know that functioning determines a major aspect of health, and we expect the same for work-related functioning [28].

(b) WORQ correlates moderately (r > 0.5) and the WORQ-emotion sub-score correlates highly (r > 0.7) with the Hospital Anxiety and Depression Scale (HADS). We found in earlier studies that emotional functioning was a critical factor in the inpatient setting [21] and expect the same for the outpatient setting.

(c) WORQ correlates highly (r > 0.7) with WHODAS 2.0, because the 12 items of WHODAS 2.0 capture impaired function and disability in a similar but generic way, although not specific to the context of VR [39].

(d) WORQ correlates weakly (r > 0.3) with WHOQoL, because functioning represents only a fraction of the trait of general wellbeing [40].

(e) Problems in work-related functioning as assessed with WORQ correlate moderately (r > 0.5) with the number of comorbidities assessed with SCQ.

Feasibility: Patient and Physical Therapist Perspective

Patients answered two questions on the feasibility of WORQ in CRF-2: Q3 “Did you have problems with the numeric rating scale? (Yes–No). If yes, please comment” and Q4 “How is the length of WORQ for you? (much too long, a bit too long, good, a bit too short, much too short)”. In addition, we interviewed nine clinician physical therapists about the feasibility and use of information obtained through WORQ.

Reliability

For test–retest reliability, we chose an interval of 7 days between T1 and T2 to minimize recall bias while assuming no change in functioning during this period, i.e. response stability. Stability was also assessed subjectively with a Global Rating of Change scale (− 5 maximum worsening to 5 maximum improvement) and statistically with a paired t test (parametric) or a Wilcoxon signed-rank test (non-parametric) [41]. Test–retest reliability was calculated with the intraclass correlation coefficient (ICC; parametric) or the Spearman rank correlation (non-parametric). Correlations measure the strength and direction of the relationship between two variables. Values for the coefficient r can range from 0 (no correlation) to − 1 or 1 (perfect negative or perfect positive correlation); a value above 0.7 is considered a high positive correlation [42, 43].

Internal consistency of WORQ and its clinical sub-scores at T1 was calculated with Cronbach’s alpha. Cronbach’s alpha is a general coefficient of homogeneity between the items within a questionnaire. Values can range from 0 to 1 and can be interpreted as α > 0.9 excellent, > 0.8 good, > 0.7 acceptable, > 0.6 questionable, > 0.5 poor, and α < 0.5 unacceptable [44].
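As an illustration of the coefficient described above, the following Python sketch computes Cronbach’s alpha from a respondents-by-items table with the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of the total score) and maps it to the interpretation bands quoted in the text. This is not the SPSS procedure used in the study; the function names are hypothetical, and a value of exactly 0.5, which the quoted bands leave undefined, is treated here as unacceptable.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item ratings."""
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    if any(len(row) != k for row in rows):
        raise ValueError("every respondent must answer the same k items")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of the total (sum) score across respondents.
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def interpret_alpha(alpha):
    """Map alpha to the verbal bands quoted in the text."""
    bands = [(0.9, "excellent"), (0.8, "good"), (0.7, "acceptable"),
             (0.6, "questionable"), (0.5, "poor")]
    for cutoff, label in bands:
        if alpha > cutoff:
            return label
    # Assumption: exactly 0.5 is not covered by the quoted bands.
    return "unacceptable"
```

For example, `interpret_alpha(0.94)` returns `"excellent"` and `interpret_alpha(0.85)` returns `"good"`.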

Precision

We considered floor and ceiling effects to be present if more than 15% of participants achieved either the lowest or highest possible scores, respectively [44]. The Standard Error of Measurement (SEM) represents the smallest score change that can be interpreted as real change. SEM was calculated using Cronbach’s α as the reliability coefficient \(R_x\): \(\mathrm{SEM} = \mathrm{SD} \times \sqrt{1 - R_x}\) [45, 46]. The Minimal Detectable Change (MDC), meaning the minimum amount of change in a patient’s average score that is not the result of measurement error, was calculated at the 95% confidence level as \(\mathrm{MDC} = 1.96 \times \mathrm{SEM} \times \sqrt{2}\) [47].
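The precision measures defined above can be sketched in Python as follows. The function names and the numbers in the usage note are hypothetical and serve only to illustrate the formulas; they are not study data.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - Rx), with Rx the reliability coefficient."""
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1 - reliability)


def minimal_detectable_change(sem, z=1.96):
    """MDC at the 95% level: 1.96 * SEM * sqrt(2)."""
    return z * sem * math.sqrt(2)


def floor_ceiling_effects(scores, min_score, max_score, threshold=0.15):
    """Return (floor_present, ceiling_present).

    An effect is flagged when more than `threshold` (15% by default)
    of the scores equal the lowest or the highest possible score.
    """
    n = len(scores)
    floor_share = sum(score == min_score for score in scores) / n
    ceiling_share = sum(score == max_score for score in scores) / n
    return floor_share > threshold, ceiling_share > threshold
```

With an illustrative standard deviation of 50 points and a reliability of 0.91, `standard_error_of_measurement(50, 0.91)` gives approximately 15 points and `minimal_detectable_change(15)` approximately 41.6 points.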

Statistical analysis

To describe our sample, we calculated descriptive statistics; to assess the normality of our data, we inspected histograms and performed the Kolmogorov–Smirnov test. Stability was tested with a t test (parametric) or a related-samples Wilcoxon signed-rank test (non-parametric). We used Pearson correlation (parametric) or Spearman correlation (non-parametric) to test our hypotheses, depending on the distribution of the data. All analyses were performed with IBM SPSS Statistics for Windows, Version 25.0 [48].
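For readers who want to reproduce the non-parametric correlation outside SPSS, the following standard-library Python sketch computes the Spearman rank correlation as the Pearson correlation of average ranks, so that tied ratings are handled. It illustrates the statistic only and is not the software used in the study; the function names are hypothetical.

```python
import math
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    return pearson(average_ranks(x), average_ranks(y))
```

The same function applies both to the construct validity hypotheses (WORQ against another questionnaire) and to test–retest reliability (WORQ at T1 against WORQ at T2).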

Results

In total, 51 patients completed CRF-1 and CRF-2 (Table 2). The majority of the participants were female. About 75% of the patients had problems of the upper and lower extremity, and 15% had back-related problems. Ten percent of the patients had a chronic comorbidity, e.g., a neurological problem. Fifty-three percent of the patients had had an accident and 47% had illness-related problems. Seventy-six percent (n = 39) of the patients reported no comorbidities. Over half of the patients contacted the physical therapy clinic < 3 months after the onset of their health problem, and 20% of the participants had had their health problem for more than 20 months (Table 2).

Table 2 Characteristics of study participants

Data analysis

Only 0.2% of responses on WORQ were missing. Nevertheless, missing data for WORQ were imputed using missForest in RStudio, a non-parametric missing value imputation method for mixed-type data [49, 50]. Because histograms and the Kolmogorov–Smirnov test showed that only WHODAS 2.0 and WHOQoL scores were normally distributed, we analysed our data with non-parametric statistical methods.

Validity

Content Validity

Forty-eight of 51 (94%) patients reported that WORQ covered all relevant topics, and 47 (93%) patients found the answer options to be meaningful; two found that the rating scale should be narrower, for example a scale of 0 to 5 instead of 0 to 10. In the interviews after completing CRF-2, seven participants reported that no valuable information on work-related functioning was missing in WORQ. They felt that WORQ facilitated a patient-centric approach to their care, because its comprehensive set of questions allowed them to report their experience as patients. In particular, the work-related issues asked about in part one of WORQ were considered valuable, although some participants found it difficult to rate the current support of their employer, given their current sick leave.

Construct Validity

Because the data were not normally distributed, we used Spearman correlation to test our a priori hypotheses. (a) Consistent with our assumption, WORQ correlated moderately with general health (EQ-5D; r = − 0.49). (b) As hypothesized, we found a strong positive correlation between the WORQ-emotion sub-score and HADS (r = 0.71) and a moderate correlation between WORQ and HADS (r = 0.55). (c) As expected, WORQ correlated most highly with WHODAS 2.0 (r = 0.81) and well with the general rating of functioning scale (r = 0.62), both of which are measures of general functioning. (d) WORQ correlated almost moderately with quality of life measured by WHOQoL (r = − 0.47). (e) To our surprise, work-related functioning and the number of comorbidities showed no significant correlation, which could partly be explained by the low number of patients (24%) who reported a comorbidity. In addition, only five participants (9.9%) reported that this comorbidity influenced their functioning (Table 3).

Table 3 Construct validity between the Work Rehabilitation Questionnaire (WORQ) and other questionnaires

Four out of five hypotheses were confirmed. Therefore, construct validity was not rejected.

Feasibility

Forty-seven of 51 patients found WORQ useful to understand and describe their problems. The majority (78%) considered the length of WORQ appropriate, but 11 participants found WORQ to be too long. Ninety-four percent of the participants said that WORQ was easy to understand and found its items short and simple. All nine participating physical therapists reported that they had gained significant information about how their patients with MSDs perceived their work-related functioning and their work situation. The time needed to instruct the participants on how to complete WORQ was 2–3 min. Nevertheless, the physical therapists also suggested that WORQ would be most valuable when used for patients with complex diagnoses and multiple comorbidities in the context of their restricted work participation.

Reliability

Test–Retest Reliability

Mean time between T1 and T2 was 9.1 days. The WORQ sum score and sub-scores changed highly significantly (p ≤ 0.001) from T1 to T2. These results were confirmed by the patients’ ratings on the Global Rating of Change scale (median + 1; range 8, from − 3 to 5). Test–retest reliability of WORQ was high with a Spearman’s r = 0.79 (Table 4).

Table 4 Reliability (correlation) and questionnaire results

Internal Consistency

WORQ showed excellent internal consistency with a Cronbach’s alpha of α = 0.94. Also, the clinical subscales showed good to excellent results: emotion score α = 0.91, cognition score α = 0.91, dexterity score α = 0.85 and mobility score α = 0.85 (Table 5).

Table 5 Reliability results—internal consistency (Cronbach’s alpha) of the Work Rehabilitation Questionnaire (WORQ)

Precision of WORQ

No ceiling or floor effect was detected. The WORQ summary score ranged from 21 to 282 out of 400 points, with a median score of 79 points. The SEM was calculated as 13.53 points out of the maximal sum score of 400, and the MDC was calculated as 32.35 points, which is equal to an 8.09% change; changes in the summary score that are larger than 32.35 points can therefore be attributed to a real change.

Discussion

Statement of Principal Findings

This study reports the first psychometric evaluation of the work-related functioning questionnaire WORQ in individuals with MSDs in a physical therapy outpatient clinic. Based on our findings, WORQ is a valid questionnaire to assess work-related functioning in our study sample in terms of content and construct validity, although the low median summary score of 79/400 points, despite the lack of a floor effect, suggests that the information gain with WORQ might be higher in a population with more complex functioning problems.

WORQ showed good test–retest reliability and excellent internal consistency, and its MDC was established. Moreover, WORQ provided physical therapists with relevant work-related information for patients with work restriction. Patients valued WORQ as a comprehensive and easy-to-answer questionnaire that encouraged them to express their health-related functional problems and the associated work-related problems in a non-biomedical way. We expect that the established psychometric performance of WORQ will pave the way for subsequent multi-center studies looking at the clinical utility of WORQ across disease and practice settings.

Strengths of the Study

Our study demonstrated the clinical utility of the German version of WORQ as an ICF-based questionnaire that assesses work-related functioning in MSDs and is easy to use in a physical therapy outpatient clinic. It is also the first study to show the benefits of WORQ in an outpatient VR situation, which is also typical in many countries outside Europe. Patients and physical therapists alike have confirmed the value of WORQ to cover the multidimensional nature of work and work-related functioning. WORQ supports physical therapists in integrating the concept of return-to-work as a participation goal into their intervention planning, amidst the focus on biomedical paradigms and impairment-focused treatment plans [51]. Our study has also provided evidence that WORQ can guide the assessment by physical therapists of functioning aspects not necessarily always considered in their clinical practice, such as emotions, cognition, environmental support and relationships, that may have an impact on the plan of care. Another strength of the study is our fairly homogeneous sample with predominantly MSDs, which represents the typical range of health conditions referred to an outpatient PT practice [52], and which allows, to an extent, the potential transferability of our results to other similar outpatient clinic settings.

Limitations of the Study

The weaknesses of our study include the observational nature of the study design, the convenience sampling, and the small sample size. Our convenience sample, which includes participants with injury-related and illness-related MSD problems, satisfies the intent of WORQ, which is to be independent of the health condition. Nevertheless, the mixed sample may reduce the transferability of the results to distinct populations, such as employees with work accidents [53, 54]. The small sample size could have contributed to the non-normal distribution of the data. To account for the distribution and the improvement, we used non-parametric statistics, e.g., Spearman correlation instead of the Intraclass Correlation Coefficient (ICC) to calculate test–retest reliability. The full version of WORQ showed a high internal consistency with a Cronbach’s alpha of > 0.90, but we recognize that the alpha coefficient could be inflated because alpha is influenced by the number of items in the test [55]. Furthermore, because the study participants had significantly improved in their functioning between T1 and T2, we suspect that these changes led to a somewhat lower reliability than in our previous study in the inpatient setting. Aiming for a shorter test–retest period would have been beneficial, as change within 7 days could be expected in the outpatient practice setting. However, a time period of < 7 days can be critical due to increased recall bias [21]. Another limitation was that the data in this study were collected in a single outpatient clinic in the German-speaking region of Switzerland, potentially limiting broad applicability. Nevertheless, WORQ was cross-culturally translated into German, and the language itself should not substantially impact the generalizability of WORQ to other similar settings.

Strengths and Weaknesses in Relation to Other Studies

Most questionnaires used in the work rehabilitation setting or in occupational health, other than WORQ, focus solely on work limitations [56], work participation, job satisfaction or job stability [57]. In contrast to WORQ, these questionnaires have been designed to identify persons at risk for work disability or dropout from the workforce. WORQ’s patient-reported functioning, instead, when used as an initial screening questionnaire, would enable physical therapists to select specific assessment instruments to determine shared rehabilitation goals and to plan a worker-focused intervention [58]. Physical therapists may also reflect upon the multiple aspects of functioning assessed by WORQ to define their role and expertise within a multidisciplinary team in VR. Recently, a new ICF-based instrument, the Work Disability Functional Assessment Battery (WD-FAB) [59], was developed with the aim of informing a disability assessment process. Although WD-FAB uses partly similar questions to WORQ, WD-FAB is based on computer-adaptive testing, and the two instruments serve different aims in different parts of the spectrum of work disability. WORQ, with its total of 57 items, serves as a clinical questionnaire to inform the return-to-work process and its sustainability, whereas WD-FAB quantifies the level of disability, based on 5 to 7 adapted questions, to support the decision process for disability evaluation.

In practice, our study supported the notion that patients felt ready to discuss priorities and preferences with a professional, because WORQ had helped them to elaborate their physical, psychological, emotional or work-related concerns in advance. Hence, WORQ can promote efficiency during the physical therapy encounter, especially if patients have multiple comorbidities in multiple areas of functioning [60].

Furthermore, as early identification of appropriate interventions can potentially prevent acute or recurrent health problems from becoming chronic, WORQ may help to identify barriers to a patient’s recovery at an early stage of rehabilitation and hence may be helpful in work disability prevention.

Unanswered Questions and Needs for Future Research

The German version of WORQ proved to be an instrument with good psychometric properties in the outpatient setting, and the French version and the Dutch version (separate work) reported similar results in terms of psychometric properties and usability. Therefore, we expect that the English version, as well as the other cross-culturally adapted versions, have comparable psychometric properties, although further studies are needed to verify them. With our study’s cross-sectional design, this study was not able to evaluate the ability of WORQ to predict return to work and did not establish its sensitivity to change. Although WORQ was designed to cover work-related functioning independent of the health condition, so far, WORQ has been evaluated in a population with predominantly MSDs. Its value remains to be proven in other health conditions, especially in mental health or combined physical and mental health conditions [10].

In conclusion, we found evidence that WORQ is a valid, reliable and easy-to-administer questionnaire for evaluating self-reported work-related functioning of patients with MSDs in a physical therapy outpatient clinic. Physical therapists may use WORQ for shared decision-making and goal setting in the context of return-to-work goals in their clinical practice. However, further studies are needed to shed light on the utility of WORQ in diverse patient populations and settings.