Introduction

Clinicians involved in the rehabilitation of injured workers can encounter what has been described as ‘fear avoidance behaviours’, particularly in those workers who have a prolonged course of recovery. In the Fear Avoidance Model of Exaggerated Pain Perception described by Slade and Lethem et al. [1], an adaptive response to pain is characterized by the avoidance of noxious stimulus, whereas maladaptive avoidance is characterized by an emotional reaction to pain and the resultant avoidance of any potential cause of pain. Lethem et al. [2] describe how, rather than confronting and managing pain, the individual enters a cycle of fear and anxiety. It has often been said that fear of pain and re-injury can be more disabling than pain itself [3, 4].

This maladaptive anxiety and avoidance of activities that are perceived as a potential cause for increased pain or injury is debilitating, and has been linked in the fear avoidance literature to ongoing chronic pain-related disability, prolonged work absence and depression [5, 6]. However, models of fear avoidance also point to modifiable pathways whereby clinical interventions can assist those living with chronic pain to better understand the nature of their pain and promote a healthier response to potentially painful activities [7, 8]. Such interventions are contingent on accurately identifying those at risk for fear avoidance beliefs and related behaviours so that they might be directed to optimal care and treatment, such as cognitive-behavioural therapy.

One scale, the Fear Avoidance Beliefs Questionnaire (FABQ), has gained popularity through the work of Waddell [3] and subsequent researchers in chronic and acute low back and neck pain, though its validity in painful conditions of the upper limb has not been tested. There has also been some testing in populations covered by workers’ compensation insurance (see Table 1) [9–15].

Table 1 Studies using FABQ in workers

The FABQ has demonstrated good psychometric properties in numerous international studies and has been validated in several different language groups [16–22]. The FABQ has two subscales related to fear avoidance beliefs about work (FABQ-W) and physical activity (FABQ-PA). The FABQ-W has seven items, and the FABQ-PA has four. In addition, five items are fielded in the FABQ, but not used in the scoring of the sub-scales. Responses range from 0 (strongly disagree) to 6 (completely agree) on a seven point scale. Items are summed for the respective sub-scale scores for a maximum score of 42 for the FABQ-W and 24 for FABQ-PA; higher scores represent more fear avoidance beliefs. Waddell has reported Cronbach’s alpha coefficients of 0.88 (FABQ-W) and 0.77 (FABQ-PA) [3], which is an acceptable reliability for analyses in grouped data [23–25]. Cut-offs for patients with low back pain require further validation to identify patients at risk [26], though a cut-off of >29/42 on FABQ-W has been shown to be predictive of poor outcomes in patients receiving workers’ compensation benefits [10] and related to a higher risk for prolonged work restrictions [11]. Likewise, scores >13 on the FABQ-PA are considered high and have been predictive of poor outcome in patients receiving workers’ compensation [10]. Minimal Detectable Change for the FABQ was found to be 13/42 for FABQ-W and 9/24 for the FABQ-PA [19], though the Minimally Clinically Important Difference (MCID) has been defined as low as 4/24 for the FABQ-PA [27]. No Clinically Important Difference (CID) has been proposed for the FABQ-W [28].
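The subscale scoring described above can be sketched in a few lines of Python. This is an illustrative sketch only: the item numbers assigned to each subscale follow the commonly cited Waddell scoring key and are an assumption here (the text above gives only the item counts), and the function and variable names are hypothetical. The cut-offs are the >29/42 and >13/24 values quoted above.

```python
# Item-to-subscale mapping is an ASSUMPTION (commonly cited Waddell key);
# the text above states only that FABQ-W has seven items and FABQ-PA four.
FABQ_PA_ITEMS = (2, 3, 4, 5)
FABQ_W_ITEMS = (6, 7, 9, 10, 11, 12, 15)


def score_fabq(responses):
    """Sum the subscales from a dict of item number (1-16) -> rating (0-6).

    Incomplete questionnaires are not handled; a missing scored item
    raises KeyError.
    """
    for item, value in responses.items():
        if not 0 <= value <= 6:
            raise ValueError(f"item {item}: rating {value} is outside 0-6")
    pa = sum(responses[i] for i in FABQ_PA_ITEMS)
    w = sum(responses[i] for i in FABQ_W_ITEMS)
    return {
        "FABQ-PA": pa,       # maximum 24
        "FABQ-W": w,         # maximum 42
        "high_PA": pa > 13,  # cut-off quoted in the text
        "high_W": w > 29,    # cut-off quoted in the text
    }


# Hypothetical respondent who rates every item "completely agree" (6).
all_max = {item: 6 for item in range(1, 17)}
print(score_fabq(all_max))
```

The five unscored items are simply ignored by the sums, which mirrors the description that they are fielded but not used in scoring.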

Other investigations have suggested alternate scoring of the FABQ subscales [16, 18, 20]. Factor analysis has identified up to four factors, which was not considered to be practical in implementation. It has also been suggested that a single overall FABQ score is less prone to ceiling effect [22]. Results in this paper will use the ‘standard’ two subscale scoring system proposed by Waddell et al. [3] for comparability.

The purpose of our study was to evaluate the measurement properties of the FABQ in a population of upper extremity injured workers attending a Workplace Safety and Insurance Board (WSIB) Shoulder and Elbow Specialty Clinic.

Methods

Study Design

A cross sectional survey of 187 injured workers attending the WSIB Shoulder and Elbow Specialty Clinic for an accepted work-related claim was completed. Workers attending this clinic come from a variety of occupational backgrounds. Most people referred to this clinic have experienced a prolonged or complicated course of recovery and are sent for an interdisciplinary evaluation. Prognosis, recommendation for investigations, and treatment as appropriate are provided. The survey was completed during the course of the clinic visit, and a random subset (n = 48) repeated the survey by mail 2 weeks later in order to obtain test–retest data.

Inclusion/Exclusion Criteria

Injured workers attending the clinic who were able to complete questionnaires/informed consent in English were invited to participate in the study. We excluded only those who refused to provide written consent or who were unable to complete questionnaires/informed consent in English. Approval from the Sunnybrook Health Sciences Research Ethics Board (REB) was obtained before beginning the study (Sunnybrook REB: 313-2005).

Measures

Workers meeting the inclusion criteria were provided a copy of the FABQ that was modified, with the developer’s (Waddell’s) permission, to read “shoulder and/or elbow problem” in questions 3 and 11. All workers also completed the Upper Extremity Workers’ Survey (UEWS) as part of routine care during a worker’s first visit to the clinic. This survey contains patient-reported outcomes that have good measurement properties and are predictive of future course in chronic pain populations (see Table 4). These covered constructs well suited for the construct validation of the FABQ including the Shoulder Pain and Disability Index pain subscale (SPADI) [29, 30] and the Von Korff Chronic Pain Grade (CPG) [31] (Pain Intensity); the QuickDASH (Physical Function) including optional work module (difficulty performing tasks at work) that is common to both the QuickDASH and DASH [32]; the Short Form-36 (SF-36v2) [33] (Mental Health); a Self-Administered Comorbidity Questionnaire [34] and demographic questions including questions about the number of days off work and current work status. Injured workers who had been able to participate in some paid employment in the past month also completed work-related measures. The Work Instability Scale (WIS) is a 23 item yes/no scale that measures the degree of discord that may exist between the worker’s functional abilities and the demands of their job (amount of job instability). The WIS has been validated in a Rheumatoid Arthritis population [35] and was favourably received in a previous unpublished study of this clinic’s injured worker population.

Data Collection and Management

The FABQ (baseline and retest) was collected by pen and paper on scan-able forms (TeleForm). Data from the UEWS was collected by pen and paper on scan-able forms (TeleForm) or by a touch-screen computer interface (Ortech) in the clinic, as per the worker’s preference. Data entry of the paper forms was completed using TeleForm v8.2 software and stored in a Microsoft Access database. Data from Ortech was entered by the workers using the touch-screen interface. Data from the two sources was merged in Microsoft Access and imported into SAS for analysis.

Retest data was collected from a random subset of workers. A table of 50 random numbers was generated in SAS for the first 100 workers recruited in the study and used to identify those who would be asked to participate in the retest portion of the study. Retest subjects were provided a copy of the FABQ and a single-item indicator of change (five response categories, 3 = no change) which asked if the worker’s concerns about how pain was affecting them, their work and their physical activity had changed in the past 2 weeks. Workers were instructed to mail their response back in 2 weeks in a stamped envelope that was provided. If the response had not been received at the end of 3 weeks, the research team contacted participants by phone to remind them to mail back their response and to offer replacement questionnaires. Forty-nine workers mailed back a retest questionnaire, though only 48 had completed the FABQ at baseline. Workers who had both baseline and retest data were included in the retest analysis.
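The random selection of retest candidates was done in SAS; the same idea can be sketched in Python as follows. The seed value and function name are hypothetical, and the sketch assumes workers are identified by their recruitment order (1 to 100).

```python
import random


def pick_retest_ids(worker_ids, n_retest=50, seed=2005):
    """Draw n_retest distinct worker IDs without replacement.

    The seed is a hypothetical value used only to make the draw repeatable.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(list(worker_ids), n_retest))


# First 100 recruited workers, numbered by recruitment order (assumed).
retest_ids = pick_retest_ids(range(1, 101))
print(len(retest_ids), retest_ids[:5])
```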

Statistical Analysis

Sample Description

Univariate analyses and frequency distributions were used to describe the demographic features of the sample as well as the core measures used in the analysis. Significance was set at P < 0.05. SAS 8e was used for all analysis. Workers’ self-reported occupational backgrounds were classified according to the National Occupational Classification Matrix [36].

Fear Avoidance Beliefs Questionnaire Descriptive Statistics

Item (frequency of responses, missing data, item to total correlations) and scale (mean, median, floor, ceiling) level description was done for FABQ-W and FABQ-PA. The Shapiro–Wilk statistic was used to evaluate the normality of the FABQ subscale scores, with P > 0.05 taken to indicate no evidence against the null hypothesis of normality. Floor and ceiling effects were considered to be present if >15% of the sample had the minimum/maximum possible score for the FABQ-W and FABQ-PA [23, 24].
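The 15% floor and ceiling rule can be illustrated with a short Python sketch. The scores below are made up for illustration and are not study data; the function name is hypothetical.

```python
def floor_ceiling(scores, min_score, max_score, threshold=0.15):
    """Share of respondents at the scale minimum and maximum.

    An effect is flagged when the share exceeds the threshold (15% here).
    """
    n = len(scores)
    floor_share = sum(1 for s in scores if s == min_score) / n
    ceiling_share = sum(1 for s in scores if s == max_score) / n
    return {
        "floor_share": floor_share,
        "ceiling_share": ceiling_share,
        "floor_effect": floor_share > threshold,
        "ceiling_effect": ceiling_share > threshold,
    }


# Made-up FABQ-PA scores (range 0-24) for ten hypothetical respondents.
demo_scores = [24, 24, 24, 20, 18, 22, 24, 15, 12, 21]
summary = floor_ceiling(demo_scores, min_score=0, max_score=24)
print(summary)
```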

Reliability of the FABQ

Internal Consistency

The internal consistency was measured using Cronbach’s alpha, seeking ≥0.9 [23–25] for subscale scores. Test–Retest reliability was measured from data collected on a random subset of the sample. Testing was performed on subscale scores using the intraclass correlation coefficient (ICC(2,1)) INTRACC macro in SAS, which is based on the methodology of Shrout and Fleiss [37]; we sought an ICC of >0.90 [23–25]. Test–Retest reliability was assessed on the subset of the sample submitting retest data (n = 48) and then again specifically on those who said they were stable on the single indicator of change (n = 23). The Minimal Detectable Change (MDC95) was calculated for both the FABQ-W and FABQ-PA.
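For readers who want to see how these reliability statistics are computed, the following Python sketch implements Cronbach's alpha, the Shrout and Fleiss ICC(2,1), and an MDC95. It is not the SAS INTRACC macro used in the study. The MDC95 formula shown (1.96 × √2 × SEM, with SEM = SD × √(1 − ICC)) is a commonly used convention and an assumption here, since the text does not state which formula was applied. All data are made up.

```python
import math
from statistics import mean, stdev, variance


def cronbach_alpha(rows):
    """rows: one list of item ratings per respondent (complete cases only)."""
    k = len(rows[0])
    item_variances = [variance([r[j] for r in rows]) for j in range(k)]
    total_variance = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def icc_2_1(rows):
    """ICC(2,1): two-way random effects, single measure, absolute agreement.

    rows: one list per subject, one column per occasion (or rater).
    """
    n, k = len(rows), len(rows[0])
    grand = mean([x for r in rows for x in r])
    ss_rows = k * sum((mean(r) - grand) ** 2 for r in rows)
    col_means = [mean([r[j] for r in rows]) for j in range(k)]
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    ss_error = ss_total - ss_rows - ss_cols
    bms = ss_rows / (n - 1)               # between-subjects mean square
    jms = ss_cols / (k - 1)               # between-occasions mean square
    ems = ss_error / ((n - 1) * (k - 1))  # residual mean square
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)


def mdc95(sd, icc):
    """Assumed convention: MDC95 = 1.96 * sqrt(2) * SD * sqrt(1 - ICC)."""
    sem = sd * math.sqrt(1 - icc)
    return 1.96 * math.sqrt(2) * sem


# Made-up baseline/retest subscale scores for six hypothetical workers.
test_retest = [[40, 38], [35, 30], [42, 42], [28, 33], [36, 31], [41, 39]]
icc = icc_2_1(test_retest)
baseline_sd = stdev([row[0] for row in test_retest])
print(round(icc, 2), round(mdc95(baseline_sd, icc), 1))
```

Because the MDC95 scales with √(1 − ICC), a modest ICC combined with a wide score spread produces a large minimal detectable change.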

Construct Validity

Several theories were proposed to evaluate construct validity. Concurrent validity was assessed using Spearman rank correlations (r_s) as many of the constructs were not normally distributed. We interpreted the correlations as reflecting an excellent relationship, r_s ≥ 0.8; good, r_s 0.6–0.79; moderate, r_s 0.4–0.59; low, r_s ≤ 0.39. We set a priori expectations for the correlations between FABQ-W and FABQ-PA with related constructs including Pain intensity (a priori: r_s > 0.4) using the SPADI pain subscale [29, 30], and Von Korff Chronic Pain Grade [31]; Physical function (a priori: r_s > 0.6) using QuickDASH [32], Mental health (a priori: r_s > −0.4) using the Mental Health (sf-MH) and Role-Emotional (sf-RE) Scales and the Mental Health Summary Measure (MCS) of the SF-36 [33]. A priori expectations for work constructs included the self-reported number of days off work (a priori: r_s > 0.4); current work status (a priori: those who have not returned to work will exhibit more FAB); and amount of job instability (a priori: r_s > 0.4) using the WIS. Finally the difficulty performing tasks at work was assessed using the Work module from the QuickDASH (a priori: r_s > 0.4). The WIS and Work module of QuickDASH were only available in those who were working at the time of assessment.
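A Spearman rank correlation and the interpretation bands above can be sketched in Python as follows. This is a minimal illustration, not the SAS procedure used in the study. Tied values receive average ranks, and applying the bands to the absolute value of the coefficient is an assumption made so that negative correlations (such as those expected for mental health) can be classified. The function names are hypothetical.

```python
import math
from statistics import mean


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for idx in order[i:j + 1]:
            ranks[idx] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def interpret(r_s):
    """Bands from the text, applied to |r_s| (the absolute value is assumed)."""
    size = abs(r_s)
    if size >= 0.8:
        return "excellent"
    if size >= 0.6:
        return "good"
    if size >= 0.4:
        return "moderate"
    return "low"


# Classify two coefficients of the kind reported in a validity table.
print(interpret(0.48), interpret(-0.33))
```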

Results

Sample Description

Two hundred and fifteen workers attending the WSIB Shoulder and Elbow Specialty Clinic were invited into the study and completed the questionnaire. Twelve did not sign the required consent form and were therefore excluded from analysis. Sixteen workers did not complete the Upper Extremity Workers’ Survey: five could not complete the survey in English, four were attending the clinic for a reassessment and thus did not complete the UEWS which is only collected during a worker’s first clinic visit, three were unable to complete the survey due to pain or the nature of their injury, and four did not complete the survey for other reasons, leaving 187 workers available for analysis. Forty-eight workers completed both a baseline and retest FABQ questionnaire.

The mean age of the workers was 45.2 years, 54.2% were male, and 56.0% reported having performed some paid work in the past month. Workers came from a variety of occupational backgrounds, with 40.7% representing jobs from Trades, Transport and Equipment Operators and Related Occupations, 18.0% from Sales and Service Occupations, and 16.7% from Occupations Unique to Primary Industry and to Processing, Manufacturing and Utilities. In terms of their overall general health, 79.9% of workers self-reported themselves to be at least good (SF-36); mean PCS/MCS scores were 38.5/43.1. Physical function, as measured by the QuickDASH, had a mean score of 59.18; the Interquartile Range (IQR) for this sample was 45.45–75.0, representing moderate to high disability. Details of the demographic characteristics of the workers are shown in Table 2.

Table 2 Sample description

Fear Avoidance Beliefs Questionnaire Descriptive Statistics

Item distribution, missing items, and item to total correlations are presented in Table 3. Many responses were found in the ‘completely agree’ column, particularly for FABQ-W subscale and the first two items in the FABQ-PA subscale, which led to low variance and low item to total correlations. The two questions that had the highest number of missing responses were “I do not think that I will be back to my normal work within 3 months” and “I do not think that I will ever be able to go back to that work”. Item to subscale correlations were higher for FABQ-PA (0.49–0.66, median 0.61) than FABQ-W (0.10–0.68, median 0.54).

Table 3 Univariate Description of FABQ

The subscale scores follow a similar pattern. Except for one item, “I do not think that I will be back to my normal work within 3 months”, the mode and median responses scored at the ceiling (completely agree) for all items in FABQ-W, and were only slightly lower for FABQ-PA. The mean FABQ-W was 35.2/42 and mean FABQ-PA was 20.3/24. Both subscales had a Shapiro–Wilk P value of <0.05, representing a non-normal distribution. Both subscales had a high ceiling effect with FABQ-W having 22.9% of respondents scoring 42/42, and the FABQ-PA having 38.3% scoring 24/24. Neither scale had a single respondent scoring at the floor. Subscale score distributions are presented in Fig. 1a (FABQ-W) and Fig. 1b (FABQ-PA).

Fig. 1 a, b FABQ subscale score distributions

Reliability

Internal Consistency

Cronbach’s α for both the FABQ-W and FABQ-PA was lower than our a priori threshold (α = 0.90). Test–retest reliability, as measured by an intraclass correlation coefficient, was 0.52 for FABQ-W and 0.59 for FABQ-PA. Workers who participated in the retest portion of the study also answered a question about change in their concern about their condition. Two workers (4.26%) reported being less concerned about their pain and its effect on their work and ability to perform physical activities on our global indicator of change, 23 (48.94%) reported feeling about the same as when they completed the FABQ at baseline, and 22 (46.80%) reported increased concern about their injury (note, one worker did not answer the change question). Among the 23 (48.94%) who reported no change in their level of concern related to their pain and its effect on their work and ability to perform physical activities, the intraclass correlation coefficient was 0.55 for FABQ-W and 0.69 for FABQ-PA, which was still lower than desired but indicated more stability in the scores than in the whole follow-up sample. The MDC95 values for the FABQ subscales were calculated to be FABQ-W = 13 and FABQ-PA = 8, which represent change scores equivalent to 30–33% of the scale length (see Fig. 2).
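The share of each scale that the MDC95 represents is simple arithmetic, sketched here in Python using the MDC95 values and scale maxima reported above. The helper name is hypothetical.

```python
def mdc_share(mdc, scale_max):
    """MDC expressed as a percentage of the scale length (0 to scale_max)."""
    return 100 * mdc / scale_max


fabq_w_share = mdc_share(13, 42)   # FABQ-W: MDC95 of 13 on a 0-42 scale
fabq_pa_share = mdc_share(8, 24)   # FABQ-PA: MDC95 of 8 on a 0-24 scale
print(round(fabq_w_share, 1), round(fabq_pa_share, 1))
```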

Fig. 2 FABQ minimal detectable change @ 95%

Construct Validity

Validity results are summarized in Table 4. Of our 22 a priori theories of how a “good” measure of fear avoidance should behave, only six were confirmed. As found by Waddell, there was no correlation between FABQ-W/FABQ-PA and age (r = −0.05/r = 0.01) [3]. Lower than anticipated correlations were found between FABQ-W/FABQ-PA and Pain intensity (SPADI r_s = 0.24/0.23; Von Korff r_s = 0.25/0.25), Physical function (QuickDASH r_s = 0.48/0.45) and indices of Mental health (sf-MH r_s = −0.18/−0.23; sf-RE r_s = −0.33/−0.26; MCS r_s = −0.25/−0.30). Lower than anticipated correlations were also found between FABQ-W/FABQ-PA and the number of days off work (r_s = 0.31/0.17). However, some work-related constructs had anticipated correlations to the FABQ-W/FABQ-PA in terms of current work status (Wilcoxon Rank Sum Z = 3.0497), the amount of job instability the worker perceived (WIS r_s = 0.46/0.38), and the amount of difficulty performing tasks at work the worker reported (QuickDASH, optional work module r_s = 0.51/0.42).

Table 4 Construct validity

Discussion

High ceiling effects and lower than expected reliability and validity correlations result in our inability to confirm the reliability and validity of the Fear Avoidance Beliefs Questionnaire in our population. We studied injured workers with upper extremity musculoskeletal disorders within a workers’ compensation system. We were interested in the FABQ because it has been widely used in rehabilitation clinics and studies with the chronic low back pain population, and has been frequently supported as a screen for patients at risk of poor outcomes related to fear avoidance beliefs [6, 10, 38]. Clinicians in our own clinic have observed ‘fear avoidant’ behaviours and beliefs and were eager to have an instrument to capture this. This concept remains, in their minds, a key predictor of outcomes after clinic attendance and an indicator of the need for cognitive behavioural intervention. The results of our survey unfortunately suggest the FABQ is not the correct instrument in this setting.

Although the FABQ has been used in other groups of injured workers [11], including those receiving workers’ compensation benefits [10, 12, 15], the mean FABQ-W and FABQ-PA scores were higher in our study’s population, and the ceiling effects [39] were more pronounced than those reported in these studies. The ceiling effect we observed could be explained by the content of the individual items in a workers’ compensation context. For instance, an injured worker can only respond “strongly agree” to item 6 of the FABQ-W subscale, “my pain was caused by work or by an accident at work” (see Fig. 2). This leads us to question whether the FABQ is measuring fear avoidance beliefs in this group, or the status of their claim. George found higher FAB scores in WCB patients than in patients with other payer sources; however, ceiling effects at an item or scale level were not reported [12]. Cleland found WCB patients reported higher scores for the FABQ-W, but not for FABQ-PA, when compared to patients from a private insurance group [10]. McHorney says a sample with >15% at the ceiling is concerning for both detecting change and distribution of error [24]. As has been previously suggested [40], simply changing the wording of the two questions in the FABQ may not be adequate to capture FAB in patients with shoulder and elbow disorders.

Furthermore, this study found a Cronbach’s alpha lower than desired for reliability analysis in individual data, and a test–retest reliability that is lower than recommended for discriminative purposes [23]. Cronbach’s alpha for FABQ-W was lower than in Waddell’s original findings (α = 0.75 vs. α = 0.88), and for FABQ-PA it was similar to Waddell’s findings (α = 0.78 vs. α = 0.77) [3]. Test–retest analysis was complicated by the large number of people who reported increased concern during the retest questionnaire. This was likely due to the clinic setting in which this study was performed. The nature of the clinical assessment at the WSIB Specialty Clinic is to determine what, if any, further clinical investigations/interventions (including surgery) might benefit an injured worker. Workers were given these clinical interpretations between baseline and completing the retest questionnaire. However, even when considering only those who reported no change in concern, the reliability coefficients are still lower than desired. The MDC for the FABQ-W in this sample was 13, as was previously reported; however, the MDC for FABQ-PA in this sample was lower than previously reported (8 vs. 9) [19]. These MDC values are nonetheless very large, requiring an individual’s score to change by roughly a third of the scale length before change can be detected.

The FABQ subscales are somewhat supported by concurrent construct validity analysis, though this must be considered in light of the low reliability. Correlations between the FABQ subscales and constructs of pain intensity, physical function and indices of mental health were lower than anticipated in this study. Previous studies have found stronger relationships between the FABQ sub-scales and pain intensity [4, 11, 21, 22], disability [3, 4, 21], and indices of mental health [3, 21]. Construct validity of the FABQ-W was better supported by work-related constructs such as work instability (WIS) and work disability (QuickDASH-W).

Strengths

This study had a good sample size and excellent response rate for the retest portion of the study (48/50). This study was also unique in its attempt to repeat the measure in a new population, and is one of a few studies that have fielded the FABQ in an exclusively compensated injured workers’ sample.

Limitations

This study did not field a broad range of tools related to FAB. The FABQ might be better understood if compared to constructs such as catastrophizing and active coping. Instead, fear avoidance was measured by an indication of concern, which seemed unstable in this population, perhaps due to the treatment recommendations made during the clinic visit (surgical interventions, job retraining, and return to work). We do not know if the high levels of FAB are ‘true’ or are an artefact created by this scale, though our clinical team did not believe the prevalence of FAB to be this high, particularly since more than half (56.0%) of the sample were working at the time of the study. Recognizing that even within various occupational fields there is a range of physical demands, future studies should include a more in-depth exploration of the physicality of workers’ jobs to see if these impact FAB.

As our study only included compensated upper extremity injured workers, our findings cannot be generalized beyond such samples.

Conclusions

All research is dependent on the ability to measure our key variables. Our team is committed to the need to measure FAB in long-term compensated upper extremity injured workers. Our study has raised concerns about the ability of the FABQ to meet this need; concerns that we believe need to be addressed in a study comparing this measure with other FAB instruments and other psychometric instruments. The current study indicates that the FABQ does not meet statistical standards for individual use as a screen in a population of upper extremity injured workers.