Introduction

Functional Capacity Evaluations (FCEs) are batteries of tests that measure a person's ability to perform work-related activities. FCEs are used in occupational rehabilitation, return-to-work determinations, disability determinations and medico-legal issues. An FCE should be supported by evidence that it assesses a person's ability to perform activities safely, reliably, validly, and practically [1]. In recent years, research has been published that confirms the reliability and validity of the Isernhagen WorkWell Systems (IWS) FCE in healthy adults and patients with chronic low back pain (CLBP) [2–8]. ‘Safety’ has been researched only to determine whether operational definitions of safety could be applied reliably by evaluators during FCEs [8–12]. These operational definitions of safety include biomechanical, metabolic and psychophysical aspects [10,13] to ensure that patients remain within defined limits of strength and heart rate, and that patients do not perform beyond the point at which they feel unsafe or unwilling to proceed. However, whether applying these operational definitions of safety leads to a safe assessment of functional capacity has not been investigated for this, or any other, FCE [13].

It is a common clinical observation that patients with CLBP report a temporary increase of symptoms after an FCE. Increased symptom intensity is not per se regarded as unusual or unsafe in FCE testing. This clinical observation was confirmed in a small pilot study sample (n=10 patients with chronic pain) published recently in this journal [13]. However, no other studies on this subject have been published. The primary aim of this study was to examine the intensity and duration of symptom increase after an FCE in a sample of patients with CLBP, and to explore the relationship of increased symptoms with age, self-reported disability, pain intensity, lifting performance, heart rate, and work status.

The secondary aim of this study, linked to the increase of symptoms, was to explore the safety of the FCE. The term ‘safety’ is often used in FCE literature [13], but never defined. The American Physical Therapy Association's (APTA) task force on objective functional measurements, however, has defined safety. Their definition was “given the known characteristics of the evaluee, the procedure should not be expected to lead to new injury” [14, p. 682]. This definition, however, cannot be tested because of the wording used (‘...should not be expected to...’) and because of the absence of a definition for ‘injury’ in this definition. Therefore, a new operational definition of safety was used in this study: an FCE was considered safe when no formal complaints of injury were filed by the patients, and when increased symptoms returned to or below their pre-FCE level. A temporary increase in symptom intensity was not considered unsafe. References as to what is considered ‘temporary’ are unavailable. Therefore, a time limit was not included in the operational definition. Special attention in this paper is given to the lifting capacity test, because lifting is one of the most stressful items of the FCE, it is considered a risk factor for low back pain [15, 16], and is deemed potentially unsafe. Additionally, lifting provides an indication of total FCE performance [17].

Materials and methods

Subjects

Included in this study were 92 patients with CLBP (duration of more than 3 months or non-specific recurrent low back pain), who were admitted to an outpatient occupational rehabilitation program. Excluded from the program were patients with acute low back pain, patients with specific diagnoses related to low back pain (for example disk herniations) and patients with co-morbidity (additional diagnoses unrelated to low back pain). For the FCE specifically, patients whose resting blood pressure or heart rate exceeded 100 mmHg (diastolic), 159 mmHg (systolic) or 90 beats per minute, respectively, were excluded [18]. Inclusion and exclusion for the program and the FCE were performed by a physiatrist. The data of this study were collected as part of a larger study program, LOBADIS (low back pain and disability) [19].
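The resting cardiovascular criteria for the FCE can be written as a simple screening check. The sketch below is illustrative only: the function and parameter names are hypothetical, and it assumes that exceeding any single limit is sufficient for exclusion, which the text does not state explicitly.

```python
# Resting limits taken from the exclusion criteria described above [18].
MAX_DIASTOLIC_MMHG = 100
MAX_SYSTOLIC_MMHG = 159
MAX_RESTING_HR_BPM = 90


def excluded_from_fce(diastolic_mmhg: float,
                      systolic_mmhg: float,
                      resting_hr_bpm: float) -> bool:
    """Return True when at least one resting value exceeds its limit.

    Assumption: one value above its limit is enough to exclude a patient.
    Values equal to a limit do not count as exceeding it.
    """
    return (diastolic_mmhg > MAX_DIASTOLIC_MMHG
            or systolic_mmhg > MAX_SYSTOLIC_MMHG
            or resting_hr_bpm > MAX_RESTING_HR_BPM)
```

Under this reading, a patient measured at exactly 100/159 mmHg and 90 beats per minute would still be eligible.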

General procedures

Prior to the FCE, the patients filled out a questionnaire to obtain demographic information (age, gender, duration of CLBP, work status). Self-reported disability due to CLBP was measured with the Roland Morris Disability Questionnaire (RDQ). Scores range from 0 to 24, with 0 indicating no self-reported disability and 24 indicating severe self-reported disability. Reliability and validity of the RDQ are good [20–22]. Pain intensity was measured with a 100 mm Visual Analogue Scale. The score can range from 0 to 100 mm, with 0 indicating no pain, and 100 mm indicating the worst pain imaginable. Patients underwent a standardized near full length FCE, consisting of 28 tests that were all performed in a 2.5 to 3.0 h session on one day. The tests covered material handling capacities (e.g. lifting and carrying), static postural tolerance (e.g. kneeling and forward bending) and dynamic activities (e.g. walking and reaching). Detailed information on the procedures is described elsewhere [2, 3]. Sitting and standing tolerance tests, each lasting 30 min, were not performed. Test-retest reliability of the material handling tests of the FCE is good in patients with CLBP [2, 4, 5]. Test-retest reliability of 17 of 21 additional tests of the FCE is acceptable in patients with CLBP [2]. To allow recovery between tests, a patient started the next test only when their heart rate was below 70% of their predicted maximum ([220 − age] × 0.70). Lifting capacity was assessed by means of a standardized lifting task consisting of lowering and lifting a receptacle with incremental weights from a table (74 cm) to the floor and vice versa. The patient's maximum was reached in 4 to 5 increments. The maximal amount lifted safely 5 times within 90 s was recorded. Lifting was tested as the first item in the FCE.
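As a worked illustration of the recovery criterion, the threshold can be written out for a single patient. The age of 40 years is a hypothetical value chosen only for the arithmetic; it is not taken from the study data.

```latex
\[
  \mathrm{HR}_{\text{recovery}} = 0.70 \times (220 - \text{age})
\]
\[
  \text{age} = 40:\quad 0.70 \times (220 - 40) = 0.70 \times 180 = 126 \ \text{beats per minute}
\]
```

Such a patient would start the next test only once the heart rate had dropped below 126 beats per minute.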

Safety procedures

Medical history and examination, performed by physiatrists prior to the FCE, ensured that inclusion and exclusion criteria were met. All FCEs were led by a licensed physical therapist certified in the procedures of the IWS FCE. During the FCE, three standardized safety procedures for termination of a test were adhered to. The reason for termination of a test was documented. Two or three reasons could occur simultaneously.

  1. Patient endpoint. All patients signed a consent statement stating that they were informed both by the evaluator and in writing that undergoing an FCE could lead to a temporary increase of symptoms and general muscle soreness and that they could terminate testing whenever they felt unsafe or unwilling to proceed. Patients were also informed that test termination based on symptoms was solely left to the patient (thus without interpretation or inquiry of the evaluator).

  2. Cardiac endpoint. A heart rate ceiling of 85% of a patient's age related maximum heart rate ([220 − age] × 0.85) was set on a heart rate monitor. Testing was terminated when the patient's heart rate met or exceeded this criterion [18].

  3. Evaluator endpoint. The evaluator could terminate testing when the patient was not in full control of himself or the load that he handled. Patients were allowed to use body mechanics of their own preference, for example a semi-squat lifting technique or a trunk lift during lifting tests [23]. The inter- and intrarater reliability of this determination is good [6, 8].
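The three termination rules can be sketched as a small function that returns every endpoint reached at a given moment, reflecting that two or three reasons could occur simultaneously. This is a hypothetical illustration rather than software used in the study; the names `Endpoint`, `cardiac_ceiling_bpm` and `termination_reasons` are assumptions, and the patient and evaluator decisions are reduced to boolean inputs.

```python
from enum import Enum
from typing import Set


class Endpoint(Enum):
    PATIENT = "patient"
    CARDIAC = "cardiac"
    EVALUATOR = "evaluator"


def cardiac_ceiling_bpm(age_years: float) -> float:
    """85% of the age-related maximum heart rate, (220 - age) x 0.85."""
    return (220 - age_years) * 85 / 100


def termination_reasons(age_years: float,
                        heart_rate_bpm: float,
                        patient_stops: bool,
                        evaluator_stops: bool) -> Set[Endpoint]:
    """Collect all endpoints reached; an empty set means testing continues."""
    reasons: Set[Endpoint] = set()
    if patient_stops:
        # Left solely to the patient, without interpretation by the evaluator.
        reasons.add(Endpoint.PATIENT)
    if heart_rate_bpm >= cardiac_ceiling_bpm(age_years):
        # "Met or exceeded" the ceiling, hence >= rather than >.
        reasons.add(Endpoint.CARDIAC)
    if evaluator_stops:
        # Patient not in full control of himself or the load.
        reasons.add(Endpoint.EVALUATOR)
    return reasons
```

For a hypothetical 40-year-old patient the ceiling is 153 beats per minute, so a heart rate of exactly 153 combined with a patient request to stop yields both the patient and the cardiac endpoint.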

After completion of the FCE, all patients were handed two questions on paper (questions translated for this manuscript from Dutch into English). Question 1: To what extent have your symptoms increased compared to yesterday (day of the FCE)? Answering possibilities were: my symptoms have decreased; no increase; little increase; moderate increase; strong increase; very strong increase. This question was to be answered 1 day after the FCE. Question 2: How many days did it last until the symptoms returned to the original level? This question was to be answered whenever this occurred. No time limit was attached to question 2. Patients who had not responded one year post FCE were classified as ‘non-responders’. No attempts were made to contact non-responders. The questionnaire could be returned by mail free of charge to a researcher independent from the evaluator.

Operational definitions of safety

An FCE was considered safe when no complaints of injury were filed by the patients and when increased symptoms returned to or below their pre-FCE level. As part of regular institutional admission procedures, all patients were informed of their rights and responsibilities, including the ability to file a formal written complaint if and when they had felt mistreated in any way. An institutional committee, functioning independently of clinicians and researchers, oversaw the exercise of these rights.
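The operational definition is a conjunction of two conditions without a time limit, which can be made explicit in a short sketch. The function name and the encoding of the questionnaire answer are assumptions for illustration: `None` stands for no reported return to the pre-FCE level, and 0 stands for a patient whose symptoms never increased.

```python
from typing import Optional


def fce_considered_safe(complaint_filed: bool,
                        days_until_pre_fce_level: Optional[int]) -> bool:
    """Apply the operational definition of safety to one patient.

    Safe means: no formal complaint of injury was filed AND symptoms
    returned to or below the pre-FCE level. No time limit is applied,
    so any reported number of days satisfies the second condition.
    """
    symptoms_returned = days_until_pre_fce_level is not None
    return (not complaint_filed) and symptoms_returned
```

Under this encoding a return to baseline after 21 days counts the same as a return after 1 day, which mirrors the decision not to include a time limit in the definition.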

Data analyses

Descriptive statistics were used for patient characteristics, the results of the lifting capacity test and the responses on the questionnaire. Student's t-tests for independent samples were used to test differences between responders and non-responders. Depending on measurement level and distribution of the data, Pearson's correlation or Spearman's rank correlation coefficients were calculated to express the strength of the relationship between variables. Correlation coefficients were interpreted as follows: 0.25 or less, little if any relationship; 0.26–0.49, weak relationship; 0.50–0.69, moderate relationship; 0.70–0.89, strong relationship; 0.90–1.00, very strong relationship [24]. Test results for males and females were analyzed separately. A p-value less than 0.05 was considered statistically significant for all tests.
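The interpretation bands for correlation coefficients can be expressed as a lookup. The sketch below is a hypothetical helper, not part of the original analysis. It assumes that the bands apply to the absolute value of the coefficient and that coefficients are rounded to two decimals before classification, since the bands are stated to two decimals.

```python
def interpret_correlation(r: float) -> str:
    """Map a correlation coefficient to the verbal bands cited from [24]."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    # Assumption: the sign is ignored and the value is rounded to two decimals.
    magnitude = round(abs(r), 2)
    if magnitude <= 0.25:
        return "little if any relationship"
    if magnitude <= 0.49:
        return "weak relationship"
    if magnitude <= 0.69:
        return "moderate relationship"
    if magnitude <= 0.89:
        return "strong relationship"
    return "very strong relationship"
```

With these bands, a coefficient of 0.26 falls just inside the weak category and a coefficient of 0.08 indicates little if any relationship.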

Results

Patient characteristics and test performance

Characteristics of the study sample and test performances are presented in Table 1. All patients who were on modified work or off work due to CLBP received wage replacement benefits, as regulated within the Dutch Social Security system. The characteristics of those who returned the questionnaire (n=54, 59%; ‘responders’) and those who did not return the questionnaire (n=38, 41%; ‘non-responders’) are presented separately. Non-responders were significantly more often male. Other differences between responders and non-responders were non-significant. Males performed significantly better than females on the lifting capacity test. Percent of predicted maximum heart rate was 69% in male responders, 70% in male non-responders, 73% in female responders, and 68% in female non-responders (calculation: mean observed maximum heart rate divided by mean predicted maximum heart rate, 220 − mean age).
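The percent of predicted maximum heart rate is read here as the observed maximum expressed relative to the age-predicted maximum (220 − age), which is the reading that yields values around 70%. The sketch below uses hypothetical function names and a hypothetical example; it does not reproduce the group means in Table 1.

```python
def predicted_max_heart_rate(age_years: float) -> float:
    """Age-predicted maximum heart rate in beats per minute."""
    return 220 - age_years


def percent_of_predicted_max(observed_max_bpm: float, age_years: float) -> float:
    """Observed maximum heart rate as a percentage of the predicted maximum."""
    return 100 * observed_max_bpm / predicted_max_heart_rate(age_years)
```

For example, an observed maximum of 126 beats per minute at a mean age of 40 years corresponds to 70% of the predicted maximum of 180 beats per minute.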

Table 1 Characteristics of the patients and FCE performances (n=92a)

The lifting capacity test was terminated in the following frequencies (2 or 3 endpoints may have occurred simultaneously): responders (n=54): patient endpoint occurred in 89% (n=48) of the cases, cardiac endpoint occurred in 15% (n=8) of the cases, and evaluator endpoint occurred in 13% (n=7) of the cases. Non-responders (n=38): patient endpoint occurred in 82% (n=31) of the cases, cardiac endpoint occurred in 11% (n=4) of the cases, and evaluator endpoint occurred in 16% (n=6) of the cases.
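The reported percentages follow directly from the counts and group sizes, as the short check below shows. The dictionary layout and function name are illustrative assumptions; the counts and percentages are those given in the paragraph above.

```python
# Counts of each endpoint on the lifting capacity test, per group.
endpoint_counts = {
    "responders": {"n": 54, "patient": 48, "cardiac": 8, "evaluator": 7},
    "non-responders": {"n": 38, "patient": 31, "cardiac": 4, "evaluator": 6},
}


def endpoint_percentages(group: dict) -> dict:
    """Percentage of cases per endpoint, rounded to whole percents."""
    total = group["n"]
    return {name: round(100 * count / total)
            for name, count in group.items() if name != "n"}


responder_pct = endpoint_percentages(endpoint_counts["responders"])
non_responder_pct = endpoint_percentages(endpoint_counts["non-responders"])
```

In both groups the endpoint counts add up to more than the number of patients (63 for 54 responders, 41 for 38 non-responders), which is consistent with two or three endpoints occurring simultaneously in some tests.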

Symptom increase

The complaint committee received no complaints from any of the 92 patients up to one year after the FCE was completed. Differences in symptom intensity following the FCE are presented in Table 2. Most responders reported an increase in symptoms (76%; n=41); some, however, reported no difference or a decrease in symptoms. Symptom increase, when applicable, was temporary for all responders. The question about the duration of symptom increase was left blank or was marked as not applicable by 20.4% of the patients (n=11). None of these patients had experienced an increase in symptoms. Of the remaining 43 patients, the mean time for symptoms to return to pre-FCE level was 3.4 days (SD 3.4 days), with a minimum of 1 day and a maximum of 21 days. Increased symptom intensity lasted 1 week or less in 93% of these patients (n=40).

Table 2 The intensity of symptom increase 1 day following FCE (n=54)

The relationships of increased symptoms with age, pain intensity, lifting performance, heart rate, and work status were non-significant (Table 3). The exception was self-reported disability (RDQ score): its relationships with the intensity and duration of increased symptoms were significant but weak. When these analyses were repeated with the non-responders included (duration of symptoms set at 365 days), the duration of symptom increase was not significantly associated with any variable except work status (Spearman's rho = 0.26, p=0.03).
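The sensitivity analysis with non-responders can be sketched as two steps: replace each missing duration with 365 days, then compute Spearman's rank correlation. The code below is a minimal pure-Python illustration under stated assumptions. The data are invented, the numeric coding of work status is hypothetical, and the original analysis was presumably run in a statistical package; no p-value is computed here.

```python
from math import sqrt
from typing import List, Optional, Sequence


def impute_duration(days: Sequence[Optional[float]],
                    fill: float = 365.0) -> List[float]:
    """Set the duration for non-responders (None) to a fixed value."""
    return [fill if d is None else d for d in days]


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; undefined when either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented example: None marks a non-responder.
duration_days = [1, 3, None, 7, None, 2]
work_status = [0, 0, 1, 1, 1, 0]  # hypothetical coding, e.g. 0 = working, 1 = off work
rho = spearman(impute_duration(duration_days), work_status)
```

Because every non-responder receives the same large value, all imputed cases tie at the top rank. Any characteristic that is more common among non-responders will therefore tend to correlate with the imputed duration.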

Table 3 Correlation coefficients between intensity and duration of increased symptoms and age, self-reported disability, pain intensity, lifting performance, heart rate and work status (n=53)

Discussion

No complaint was filed. Most patients reported symptom increase following an FCE. In all respondents, the symptoms returned to a pre-FCE level. An FCE was considered safe when the patients filed no formal complaints of injury, and when increased symptoms returned to or below their pre-FCE level. Applying our operational definition of safety to the results of this study, we interpret the FCE as a safe instrument to measure the functional capacity of patients with CLBP.

Although difficult to compare due to differences in methodology, the symptom increase in this study appears to be larger than that in a small sample (n=10) described previously [13]. No criteria are available as to what is considered a ‘temporary’ increase, or how to interpret intensity and duration of symptom increase. Whether intensity and duration of increased symptoms are acceptable remains a clinical decision at this point. In our operational definition, a temporary increase of symptoms is not considered unsafe, and the results of this study apply to this definition. The results may not apply to other (implicit) definitions of injury or safety. However, we are unaware of definitions of safety and injury that are generally agreed upon.

It may be debated whether the operational definition of safety used in this study was correct. With regard to safety during FCEs or lifting assessments, the statements of other researchers are similar. Safety (or lack of safety) is referred to as ‘prevention of further injury’ [25], ‘significant complications or injury’ [26] or ‘report of injuries’ [18]. Our operational definition of safety resembles those. The next step was to assess whether the criteria for safety were met. Even though no complaints were filed, injuries may have occurred but gone unreported. The (perceived) weight of the procedure, the (perceived) threshold to file a complaint, and a (perceived) power imbalance between patient and clinician may have prevented patients from reporting injuries, regardless of the patients' legal right to do so. Alternatively, increased pain intensity or the occurrence of an injury may not have been interpreted as mistreatment. Both may have occurred in this study. We judge this as unlikely to have happened in the group of respondents, because their symptoms had subsided to pre-FCE level. Because all patients were asked to return the questionnaire when symptoms had subsided to the original level, the non-responders may still have had increased symptoms more than a year after completion of the FCE. In practice we regard this as very unlikely, because none of the non-responders filed a complaint, even though their condition would theoretically have been worse for at least one year. Additionally, non-responder bias was examined. Apart from a difference in gender, which cannot be explained, no systematic differences were found between the characteristics and test performances of the responders and the non-responders. However, non-responders may differ from responders on some unknown (and unmeasured) factor that is more closely associated with symptom change following an FCE.

As part of a different study, patients were asked to undergo another FCE after completion of their rehabilitation program. Some patients refused to undergo this post-treatment FCE. Clinical information, which was not gathered systematically, suggests that their reason for refusal was their experience of symptom increase after the first FCE. Other researchers have reported that a few patients refused to undergo a second test session because they did not feel capable of participating in manual handling activities due to pain exacerbation [5]. In a study where patients with CLBP were retested on the day following the FCE, patients lifted and carried on average 6 to 9% better than the day before [4]. However, 8 to 21% of patients performed worse on the second day. Whether these three observations are related to a large ‘normal’ variation in patient performances or to a lack of safety of an FCE is uncertain, and should be the subject of future research. Future studies should also consider larger sample sizes to enable multivariate analyses, for example to predict symptom increase. The sample size of 54 responders and the poor strength of the univariate relationships prevented multivariate analyses in the present study.

It may be debated whether a temporary increase in symptoms is related to a temporary loss of function. In different studies, pain intensity related weakly [4, 6] or moderately [8] to lifting and FCE performance, and increased pain intensity did not prevent patients from performing similarly on a consecutive day [4], or two weeks later [2]. As demonstrated in Table 3, pain intensity prior to the FCE related non-significantly to differences in pain intensity following the FCE. If pain were related to tissue damage, it could be hypothesized that patients with high pain levels prior to the FCE would perform worse than patients with low pain levels, and vice versa. In a post hoc analysis we found no support for this hypothesis (Pearson's correlation between pain intensity and lifting performance of r=0.08). Pain intensity is recognized as unrelated to tissue damage in chronic non-specific pain syndromes, including CLBP [27]. These results suggest that tissue damage is unlikely. However, because tissue damage was not assessed objectively, this cannot be concluded with certainty. The temporary increase of pain intensity may also be interpreted as delayed onset muscle soreness after the performance of high intensity activities [28, 29]. This type of soreness may last from 1 to 10 days and is considered a normal response to unusual exertion [28, 29]. As judged by both the mean amount of weight lifted and the percent of maximum heart rate (approximately 70%), performances may indeed qualify as ‘unusual’ and as ‘high intensity’. However, post-FCE responses were at best only weakly related to the performance parameters. Thus, within the context of a chronic pain syndrome, we interpret the self-reported pain increase as a non-specific response to unusual, high intensity exertion, rather than the onset of ‘specific’ injuries such as strains or sprains to the low back region. As such, the results appear to confirm the clinical saying: ‘it may hurt you, but it won't harm you’.

While this study was performed as part of a larger study program, regular clinical FCE procedures were adhered to. With regard to the FCE, the only addition for the patients for this research was the questionnaire following the FCE. This could be regarded as a strength of this study, because the study results represent regular clinical procedures and may thus be generalized to everyday practice, as confirmed by a comparison with patient characteristics and study results described in other studies [6, 7]. A weakness of this exploratory study was that we did not control for pain medication use. Patients may have started to use or increased pain medication in response to testing, and their FCE may thus have been misclassified as safe. After completion of this study, we regard it as very unlikely that the FCE has led to injuries; however, although we have reasoned that injuries probably did not occur, we cannot be completely sure. Future studies should consider reassessing all patients after completion of the FCE for objective signs of injury.

The nature of this study was exploratory. Because of the absence of an agreed operational definition of safety in the literature, arbitrary choices were made in operationally defining safety. Within this definition, the results suggest that the FCE may be safe in patients with CLBP. This study may serve as groundwork for future research. Future studies should consider defining the term injury, should assess it objectively and should not rely on self-reports only. Future studies should use validated measures for repeated measurements of pain intensity (VAS or NRS scales, instead of the current scales for follow-up measurements), control for pain medication use and consider larger sample sizes to enable multivariate analyses to predict symptom increase. Future studies should differentiate between a normal non-specific response to unusual, high intensity exertion and the onset of ‘specific’ injuries. Additionally, it is of great importance for the progression of this field that future studies operationally define ‘safety’.

Conclusion

A temporary increase in symptom intensity after completion of an FCE is common. Within the operational definitions of safety used in this study, assessment of functional capacity of patients with CLBP appears safe.
