Introduction

Fatigue is a subjective experience, often defined as a persistent tiredness or weakness, being physical, mental, or both. It is common in the general population [1, 2] but is also present in a wide range of diseases. It may influence physical, behavioural, cognitive, and social functioning, interfere with daily activities and restrict recreational activities [3, 4].

Postoperative fatigue (POF) is an often underestimated problem after surgery [5, 6]. It may impede patient recovery, with an impact similar to that of pain [7, 8], and is often reported as one of the most distressing symptoms by patients [9, 10]. It is most prominent during the first postoperative days, but may last several weeks [5, 9, 11]. Although POF is often proportional to the invasiveness of a surgical procedure [6, 11, 12, 13], in some cases, surgical procedures with extensive tissue damage (e.g. hip or knee arthroplasty) are associated with less POF than would be expected [14]. The aetiology of POF is poorly understood but appears to be multifactorial [15, 16].

Being a subjective experience, fatigue is a difficult construct to define and measure [10, 17]. Although some question the existence of more than one dimension [18], fatigue symptomatology is often considered to fall into two dimensions: physical and mental [17]. In addition, how it impacts daily activities may be of interest, especially in recovery after day-surgery. The dimensions may be influenced differently depending on the origin, intervention, or population. It is therefore argued that multidimensional assessment measures should ideally be applied in descriptive research or for identification of underlying mechanisms [10]. A multitude of fatigue scales are available [4, 10, 19]. Furthermore, some health-related quality-of-life (HRQoL) instruments have multi-item subscales that assess fatigue as part of broad assessments of HRQoL; such measures include the EORTC QLQ-C30 [20] (developed for cancer research), where 3 of 30 items assess fatigue, and the generic MOS 36-item Short Form (SF-36) [21], where 4 of 36 items assess energy/fatigue.

One problem in POF research has been the use of non-validated measures [6], and also the use of single-item or one-dimensional measures [5]. Additionally, use of different measures across studies makes systematic reviews and pooling of evidence difficult. Further, the numeric scores have rarely been related to the clinical significance of the reported fatigue.

This paper aims to define operational cut-off points for clinically significant fatigue in a postoperative setting in three different fatigue instruments. These instruments are the Postoperative Fatigue Scale (PO-FS) [22], the Christensen Fatigue Scale (ChrFS) [23], and finally the Chalder Fatigue Questionnaire (CFQ) [17]. Cut-off points will allow for analyses of the clinical impact of POF as a complement to statistical comparisons of mean scores.

Methods

Study design and study population

The data in the present study have been collected as part of two previously published fatigue-related studies from our research group; a validation study in a mixed day-surgery population [22], and a randomised controlled trial (RCT) on fatigue after laparoscopic cholecystectomy [24]. For the present analyses, data from the two studies were combined. See Table 1 for basic patient characteristics and Table 2 for details on the categories of surgery performed.

Table 1 Basic characteristics for enrolled patients
Table 2 Categories of surgery performed

In both studies, fatigue was measured using the PO-FS, ChrFS, and the CFQ. Data were recorded pre-operatively on the day of surgery (day 0), and on postoperative days 1, 3, 6, and 30. Recall interval was the last 2 days, except for day 1 which referred to the last 24 h. Figure 1 summarises enrolment in both studies, lost participants, and the number of forms available at each time point.

Fig. 1

Combined flowchart for the validation study (Study 1) and the RCT (Study 2)

Assessment tools

Postoperative Fatigue Scale (PO-FS)

In 2006, Paddison et al. presented a measure specifically developed for POF research: the 31-item Identity-Consequences Fatigue Scale (ICFS) [25]. Even though this instrument has been validated and displays good psychometric qualities, it may be perceived as too inclusive and cumbersome for clinical research, particularly if POF is not the primary outcome. Recently, a secondary validation of the ICFS revealed significant item redundancy, and hence an abridged version was derived and validated in a large, mixed surgical population [22]. The abridged version, the PO-FS, consists of 10 items capturing 3 dimensions: performance of daily activities, fatigue, and vitality. The scale has no mental/cognitive dimension, as these items in the original four-dimension/31-item ICFS showed lack of sufficient change over time, and were thus eliminated during the generation of the 10-item version. Scores on this scale range from 0 to 100.

Christensen Fatigue Scale (ChrFS)

The ChrFS [23] is a single, numeric rating scale (1–10) with four verbal anchors. See Fig. 7 in Appendix. Due to its simplicity, it is user-friendly and it may thus fit well into any questionnaire. It has been used in several clinical studies on POF. However, as the scale is one-dimensional and primarily related to physical fatigue, it will not be able to distinguish potential different aspects of POF. Scores on this scale range from 1 to 10.

Chalder Fatigue Questionnaire (CFQ)

The CFQ [17] is a widely used measure in chronic fatigue research; it has good psychometric qualities and has been validated in several languages. It was primarily developed for research on chronic fatigue syndrome (CFS) and general fatigue in community settings. It has gained widespread use also outside of CFS research [10]. It consists of 11 items, representing two dimensions: mental fatigue (four items) and physical fatigue (seven items). However, unpublished data from our validation study [22] revealed that similar to the ICFS, in postoperative patients the mental dimension items displayed minimal change over time compared to the physical dimension items. Hence in a surgical recovery setting the CFQ may principally be regarded as a measure of physical fatigue. Scoring may be bimodal or continuous; in our analyses, we have applied continuous scoring. Scores on this scale range from 0 to 33 with a calculated [26] cut-off score at ≥ 18.

For all scales, higher scores indicate greater fatigue severity.

To our knowledge, neither the ChrFS nor the CFQ has been validated in a mixed surgical population.

Ethics

Informed written consent was obtained from all patients before inclusion in the studies. Approval for both studies was obtained concurrently from the Regional Committee for Medical Research Ethics; Ref 2009/2171. The RCT was registered in ClinicalTrials.gov; ID: NCT01125982.

Analysis

The aim of this study was to define at which level the reported fatigue was considered clinically significant by the participants. To accomplish this, one key question was added to each form: “Given your current description of fatigue; would you say it has been of considerable significance to you?” (Yes/No). The dichotomised responses served as an anchor, defining whether clinically significant fatigue was present or not. Against this anchor, we analysed each scale’s ability to identify clinically significant fatigue by performing receiver-operating characteristic (ROC) analyses. ROC analysis is used for diagnostic tests with a dichotomised outcome; a ROC curve plots Sensitivity against (1 – Specificity) across a series of cut-off points. We estimated the area under the ROC curve (AUC) as a measure of the accuracy of the instrument. We calculated the optimal cut-off point between Sensitivity and Specificity using the Youden Index J (J = max [Sensitivity + Specificity] – 1) and the point on the ROC curve closest to (0, 1), i.e. the cut-off point that minimises C* = √[(1 – Sensitivity)² + (1 – Specificity)²] [27, 28].
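The two cut-off criteria described above can be illustrated with a minimal sketch. It is not the MedCalc or SPSS procedure used in the study; the function names and the ten score/anchor pairs are hypothetical and serve only to show how J and C* select a cut-off point when a patient is classified as fatigued at score ≥ cut-off.

```python
import math

def roc_points(scores, anchor):
    """Return (cutoff, sensitivity, specificity) for every observed score.

    A patient is classified as fatigued when score >= cutoff.
    anchor holds the Yes/No answers to the key question (1 = Yes, 0 = No).
    """
    positives = sum(anchor)
    negatives = len(anchor) - positives
    points = []
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, a in zip(scores, anchor) if a and s >= cutoff)
        tn = sum(1 for s, a in zip(scores, anchor) if not a and s < cutoff)
        points.append((cutoff, tp / positives, tn / negatives))
    return points

def youden_cutoff(points):
    # Youden Index: J = Sensitivity + Specificity - 1, maximised over cut-offs
    return max(points, key=lambda p: p[1] + p[2] - 1)[0]

def closest_to_corner_cutoff(points):
    # C* = sqrt((1 - Se)^2 + (1 - Sp)^2), minimised over cut-offs
    return min(points, key=lambda p: math.hypot(1 - p[1], 1 - p[2]))[0]

# Hypothetical data: ten ChrFS-like scores (1-10) with anchor answers
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
anchor = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1]

points = roc_points(scores, anchor)
j_cutoff = youden_cutoff(points)
c_cutoff = closest_to_corner_cutoff(points)
```

In this toy data set both criteria select the same cut-off point; with real data they can disagree, which is why both are commonly reported.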

To generate a pragmatic overall cut-off point, for each scale, we calculated a weighted average of the different cut-off points for each day on which fatigue assessments were made. We calculated Likelihood Ratios and Predictive Values related to this cut-off point. Predictive Values were calculated assuming that the positive responses to the key question represented the prevalence of fatigue at the corresponding day.
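As a minimal sketch of how Likelihood Ratios and Predictive Values follow from Sensitivity, Specificity, and an assumed prevalence, the standard formulas can be written as below. The function name and the input values (0.90, 0.70, 0.40) are illustrative assumptions and are not taken from the study data.

```python
def diagnostic_summary(sensitivity, specificity, prevalence):
    """Likelihood ratios and predictive values from Se, Sp and prevalence."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return {
        "LR+": lr_pos,
        "LR-": lr_neg,
        "PPV": true_pos / (true_pos + false_pos),
        "NPV": true_neg / (true_neg + false_neg),
    }

# Illustrative (hypothetical) inputs: Se = 0.90, Sp = 0.70, prevalence = 0.40
summary = diagnostic_summary(0.90, 0.70, 0.40)
```

Because the Predictive Values depend on prevalence, the same cut-off point can yield different PPV and NPV on days with different proportions of patients answering “Yes” to the key question.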

Scale reliability was measured by Cronbach’s alpha (α) coefficient. Correlations between scales were analysed with Pearson’s r.
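For readers unfamiliar with the coefficient, Cronbach’s alpha can be sketched from the item variances and the variance of the total score. The function name and the small response matrix are hypothetical; sample variances are assumed here, as is common in statistical packages.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha for rows of respondents and columns of items.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    """
    k = len(responses[0])
    items = list(zip(*responses))            # one tuple of answers per item
    item_var = sum(variance(item) for item in items)
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_var / total_var)

# Hypothetical answers from four respondents to a three-item scale;
# the items agree perfectly, so alpha reaches its maximum of 1
responses = [[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]]
alpha = cronbach_alpha(responses)
```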

Scale responsiveness was assessed by calculating the standardised response mean (SRM). Values of 0.20, 0.50, and 0.80 are deemed to represent small, moderate, and large responsiveness, respectively [29, 30].
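The SRM is conventionally the mean change between two assessments divided by the standard deviation of that change. The sketch below assumes this definition and paired scores from the same patients; the function name and the four score pairs are hypothetical.

```python
from statistics import mean, stdev

def standardised_response_mean(before, after):
    """SRM = mean of paired changes / standard deviation of paired changes."""
    changes = [a - b for b, a in zip(before, after)]
    return mean(changes) / stdev(changes)

# Hypothetical paired scores for four patients (e.g. day 0 and day 1)
day0 = [1, 2, 3, 4]
day1 = [2, 4, 4, 6]
srm = standardised_response_mean(day0, day1)
```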

The McNemar test was used to compare paired proportions.

Missing data were handled by replacing the missing value with the mean score of the subscale.

We performed statistical analyses using MedCalc for Windows®, version 17.9, IBM SPSS® Version 23, and MS Excel 2010®.

Results

A total of 583 patients were considered eligible for participation in the two studies, and 422 were enrolled. The validation study had pre-operative baseline data from day 0 (n = 299) and from postoperative day 3 (n = 205). The RCT on fatigue after laparoscopic cholecystectomy had equivalent data from day 0 (n = 123) and days 1, 3, and 6 (n = 110) and day 30 (n = 99). See Fig. 1 for details.

Returned forms

The returned forms included all three questionnaires. There were few missing data, with a total of 7 items missing in the PO-FS and 17 items missing in the CFQ for the entire period. Four out of 422 participants did not fill in the ChrFS on day 0, while 5 out of 315 were missing on day 3. ChrFS was completed in all returned forms on days 1, 6, and 30.

Being young and/or male was associated with a lower return rate. Mean age of patients not returning the forms was 34.5 years, while those returning the forms had a mean age of 46.7 years (p < 0.001). While 36.8% of men did not return the forms, 17.1% of women did not return the forms (p < 0.001). Apart from this, no clinical or demographic differences were found, when compared to the responding group.

Demographic analyses

At baseline, fatigue did not differ by gender, age, BMI, or ASA status. However, at day 3, women reported significantly more fatigue, and there was a negative correlation between age and fatigue, i.e. younger persons tended to report more fatigue. There was no difference related to BMI or ASA status at any time point.

Scale metrics

Responsiveness of the scales, expressed as Standardised Response Mean (SRM) comparing mean fatigue on Day 0 and Day 1, was PO-FS: 1.15, ChrFS: 1.35, and CFQ: 0.88. Likewise for the subscales of PO-FS and CFQ: PO-FS fatigue: 0.92, PO-FS vitality: 0.82, PO-FS daily activities: 1.17, CFQ physical: 1.05, and CFQ mental: 0.17.

Cronbach’s alpha for PO-FS on day 0, 1, 3, 6, and 30 was 0.861, 0.911, 0.912, 0.893, and 0.868, and for CFQ it was 0.851, 0.895, 0.902, 0.861, and 0.834, respectively. Correlation between the (sub)scales at day 3 is shown in Table 3.

Table 3 Correlations between ChrFS and the subscales in PO-FS and CFQ at day 3

ROC curve analyses showed that AUC ranged between 0.810 and 0.950 for all scales, with PO-FS consistently having a larger AUC at every data point compared to the other scales. See Fig. 2 for details. ROC analysis for day 30 was not possible, because at that time no patients reported fatigue to be of considerable significance.

Fig. 2

Area Under Curve (AUC) for “Clinically Significant Fatigue” measured Pre-operatively (Day 0) and Day 1, 3, and 6 with the Postoperative Fatigue Scale (PO-FS), the Christensen Fatigue Scale (ChrFS), and the Chalder Fatigue Questionnaire (CFQ)

The calculated optimal cut-off points for clinically relevant fatigue for PO-FS were 43.1, 58.6, 53.6, and 51.7 on days 0, 1, 3, and 6, respectively. Corresponding values were 5, 8, 6, and 5 for ChrFS, and 13, 20, 16, and 16 for CFQ (Table 4). Calculations with either the Youden Index J or C* yielded the same results.

Table 4 Sensitivity and specificity with 95% confidence interval at the respective optimal Cut-off point on days 0, 1, 3, and 6

The results were weighted according to the number of participants on the respective days. The mean level of clinically significant POF from days 0, 1, 3, and 6 pooled, was 49.3, 5.7, and 15.1 for PO-FS, ChrFS, and CFQ, respectively. Accordingly, the operational cut-off points are ≥ 50 (0–100), ≥ 6 (1–10), and ≥ 16 (0–33), respectively.
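The pooled values can be reproduced with a short worked example. The weights are an assumption: they are our reading of the numbers of forms given in the Results text (422 on day 0, 110 on days 1 and 6, and 205 + 110 = 315 on day 3), and the function name is hypothetical.

```python
def weighted_mean(values, weights):
    """Average of values, each weighted by the matching entry in weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Assumed weights: number of forms on days 0, 1, 3 and 6
forms = [422, 110, 315, 110]

# Optimal cut-off points per day (days 0, 1, 3, 6) as reported above
daily_cutoffs = {
    "PO-FS": [43.1, 58.6, 53.6, 51.7],
    "ChrFS": [5, 8, 6, 5],
    "CFQ": [13, 20, 16, 16],
}

averaged = {
    scale: round(weighted_mean(cutoffs, forms), 1)
    for scale, cutoffs in daily_cutoffs.items()
}
```

Under these assumed weights the calculation returns 49.3, 5.7, and 15.1, in agreement with the pooled values stated above.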

Due to the observed differences in fatigue related to age and gender on day 3, we also analysed whether age or gender influenced the cut-off points. There were no significant differences in the cut-off points between age groups or between genders, and the few differences observed were small and inconsistent in direction between the three scales. Further, there were some inconsistencies between the Youden Index J and C*.

When we applied the averaged cut-off point for each scale, sensitivity ranged from 62.2 to 78.4% pre-operatively, from 87.0 to 98.2% on day 1, from 83.0 to 90.2% on day 3, and from 65.4 to 88.5% on day 6. Specificity ranged from 90.8 to 94.2% pre-operatively, from 48.2 to 56.4% on day 1, from 71.5 to 73.4% on day 3, and from 88.0 to 91.6% on day 6 (Fig. 3).

Fig. 3

Sensitivity and specificity at the averaged cut-off points for PO-FS (≥ 50/100), ChrFS (≥ 6/10), and CFQ (≥ 16/33)

The Positive Likelihood Ratios (LR+) were low in all scales on day 1 (1.90–2.12), slightly higher on day 3 (3.00–3.39), further increased on day 6 (6.78–10.49), and were highest in all scales pre-operatively/day 0 (8.49–11.70). The Negative Likelihood Ratio (LR-) varied between the scales in a more inconsistent pattern (Fig. 4).

Fig. 4

Positive and negative likelihood ratio at the averaged cut-off points for PO-FS (≥ 50/100), ChrFS (≥ 6/10), and CFQ (≥ 16/33)

The Negative Predictive Values (NPV) were consistently high (81.6–97.7%) for all scales, across data collection time points. The Positive Predictive Values (PPV) were markedly lower; PO-FS ranged from 53.2 to 76.7%, ChrFS from 45.3 to 68.0%, and CFQ from 46.0 to 68.7% (Fig. 5).

Fig. 5

Positive and Negative Predictive values, and proportion of patients self-reporting fatigue being of considerable significance, at the averaged cut-off points for PO-FS (≥ 50/100), ChrFS (≥ 6/10), and CFQ (≥ 16/33)

Details, with 95% Confidence Intervals on Sensitivity, Specificity, Likelihood Ratios, and Predictive Values at the averaged cut-off point for each scale are reported in Table 7 in Appendix.

Figure 6 illustrates the relation between mean fatigue and the optimal cut-off points to the averaged cut-off level in PO-FS, ChrFS, and CFQ during the peri-operative period. Details on mean fatigue values in the respective scales are reported in Table 8 in Appendix.

Fig. 6

Relation of mean fatigue and optimal cut-off points to the averaged cut-off point level measured with PO-FS, ChrFS, and CFQ

Table 5 shows the percentage of patients with clinically significant fatigue (i.e. above the cut-off point). See also Fig. 8 in Appendix.

Table 5 Proportion of patients with clinically significant fatigue (i.e. above the cut-off point) when measured with PO-FS, ChrFS, and CFQ

Discussion

It is important that patient-reported outcome measures are meaningful and interpretable. Until now, studies on POF have reported fatigue scores and development of fatigue over time without relating these to the clinical relevance for patients. To date, authors have tended to use their discretion to define what is ‘significant’, or ‘mild’, ‘moderate’, or ‘severe’ fatigue; both single studies and reviews on fatigue may present vague and imprecise descriptions of the problem, as exemplified in a review by de Oliveira et al. [31]. With a defined cut-off point for clinically significant POF, it is possible to analyse the clinical implications of fatigue other than by merely reporting mean fatigue values, and arbitrary definitions of cut-off points and correspondingly ambiguous analyses can be avoided. As an example, in their study on POF, Schroeder and Hill [32] set a cut-off point for the ChrFS to ≥ 4, which, according to our findings probably included a large proportion of patients without clinically significant fatigue in their analyses.

Through the present study we have identified cut-off points for self-reported fatigue which are subjectively experienced to be “of considerable significance” in three scales commonly used in POF research. These cut-off points make it possible to indicate whether statistically significant findings of increased fatigue are also of clinical relevance and thus valuable in outcome evaluation. There is no “gold standard definition” of clinically significant postoperative fatigue. Our definition relies on the patients’ own perception of what they regard as “of considerable significance” on the day. Ideally, an optimal cut-off point should be unaltered pre- and postoperatively. The cut-off points vary slightly and in correspondence with the level of fatigue on the respective day. We decided to define an averaged level for the cut-off point using each scale as a weighted average of all days tested. This was a pragmatic trade-off to present a suitable cut-off point for use throughout the peri-operative period. As Fig. 3 illustrates, this affects sensitivity and specificity levels differently on different peri-operative days. Generally, with increased fatigue prevalence sensitivity increases, and specificity decreases. Similarly, likelihood ratios and predictive values varied on different days (Figs. 4, 5). We calculated the equivalent values also for other possible joint cut-off points; however, this resulted in less optimal results in several parameters on one or more days; leaving the reported averaged cut-off points as the optimal choice for the whole period.

It may be argued that variations in the cut-off points would not allow for the defining of an averaged cut-off point. However, in our opinion, this pragmatic approach, with a cut-off point that incorporates several days, will better reflect the entire peri-operative experience and allow for fatigue assessment on any day, irrespective of the level of fatigue. Further, the concept of postoperative fatigue in a surgical population should not be viewed independently from pre-operative status; we think it is relevant to consider also the pre-operative level of fatigue, as the disease in itself may contribute to the patient’s experience of fatigue. Thus, pre-operative level of fatigue was also included in the calculations. This is also the rationale behind choosing a weighted average, as the patient’s subjective report of fatigue being “of significance” should count equally at any time point. However, as we also have presented the data from each assessment point, future studies with clinically significant fatigue as the primary outcome can choose to apply the cut-off value from the most appropriate time point in order to achieve an optimal balance between sensitivity and specificity.

Although sensitivity, specificity, likelihood ratios, and predictive values were reasonably similar in all scales on the different days, the PO-FS consistently performed better; with larger AUC and higher LR+ and PPV compared to the other scales. Nevertheless, based on the findings here, all scales performed adequately and may favourably be used in fatigue research. The AUC values for all scales were “excellent” to “outstanding”, reflecting strength of discrimination [33] as an expression of the scales’ accuracy and ability to identify patients with or without fatigue.

All scales showed high responsiveness (SRM: 0.88–1.35). This was also reflected in the analyses of the subscales in PO-FS and CFQ, except in the CFQ mental subscale, which had a low value (0.17). ChrFS had the highest responsiveness, and was highly correlated with the physical subscales in both the PO-FS and CFQ. This may indicate that POF is primarily related to the physical aspect of fatigue. Thus, surgery does not seem to significantly influence mental fatigue. This is consistent with previous findings related to the ICFS concentration items [22, 34], which led to the exclusion of these items in the abbreviated PO-FS. There may, however, be reasons to apply a fatigue scale that assesses mental fatigue in POF research if other aspects of fatigue are also of interest, depending on the intervention or population.

When individuals experience a change in health status they may change their expectation and evaluation of quality of life. This dependency between a patient’s expectations and scoring in HRQoL instruments is called response shift, initially described by Calman [35]. The theory and appraisal of response shift has since been further developed [36, 37]. Within quality-of-life research, fatigue seems to be especially susceptible to this phenomenon [38, 39]. The observed temporal variations in the threshold for fatigue of ‘considerable significance’ may at least partially be viewed as a response shift, i.e. patients’ expectations and interpretations shift during the peri-operative course. The patients may not expect to be as fit on day 1 after surgery as pre-operatively or on day 6, which would be reflected in the higher cut-off point for significant fatigue during the first postoperative days.

In a paper comparing patients with chronic fatigue syndrome (CFS) and a non-clinical community group of participants, Cella and Chalder [26] found the cut-off point in the CFQ to be ≥ 18, which is higher than our finding of a cut-off point at ≥ 16. This difference may be due to how the dichotomisations in the ROC analyses were made, and to apparent differences between the populations. First, the cut-off value in the CFS paper was based on the difference between the community group and patients diagnosed with CFS. This is a different approach than asking the participants to define whether the perceived fatigue is of considerable significance to them. Further, the cut-off point is influenced by fatigue intensity and prevalence. The peri-operative cut-off point is an average of recordings with varying fatigue intensity and prevalence, while the CFS/community analysis included CFS patients with very high fatigue levels. The mean (SD) pre-operative fatigue level found in the present study, 11.7 (3.3), is significantly below what was found among the CFS patients (n = 361), 24.4 (5.8), p < 0.0001, but also lower than the mean score reported by the general-population group who participated in Cella and Chalder’s British study (n = 1615), 14.2 (4.6), p < 0.0001. The mean (SD) pre-operative fatigue level in the present study is comparable to what was found in a general sample of the Norwegian population, 12.2 (4.0) [1]. The discrepancy between fatigue scores from general-population samples may be due to cultural differences, illustrating the inherent challenge in comparing clinical data between countries.

In our literature search, we identified no studies that described the level or development of clinically relevant fatigue during the peri-operative period. Several studies have examined POF in settings similar to ours, but without reference to the clinical significance of the fatigue intensity. For example, Hill et al. [11] compared fatigue assessed with ChrFS in patients undergoing laparoscopic versus open cholecystectomy. The laparoscopic group reported fatigue during the first month very similar to our findings, while the open group had more intense and prolonged fatigue (see Table 9 in the Appendix for details). Bisgaard et al. [7] also assessed fatigue after laparoscopic cholecystectomy using the ChrFS. Unfortunately, the level of fatigue during the first seven days is reported only in a figure without scores quoted. The graph, however, is similar to the findings in the present study, with fatigue scores significantly increased until postoperative day 5.

Being young and/or male was associated with a lower return rate. We do not know whether fatigue may be a factor limiting the ability or drive to return the forms. It may play a role regarding age, as increasing age generally has been associated with more missing responses [1, 40, 41]. On the other hand, a lower response rate among men is also common [1], as in our study and may thus explain some of the missing responses.

A formal power calculation was not done for the ROC analyses, because the data had already been collected according to power calculations for the corresponding aims of Studies 1 and 2. However, we have performed a post hoc assessment of the given sample sizes by using MedCalc®: Assuming a significance level of 0.05 and power of 0.80, prevalence rates between 24% and 49%, sample sizes between 100 and 300, and AUC between 0.81 and 0.89, the null hypothesis value would be AUC ≈ 0.70–0.81. Although post hoc analyses may admittedly be viewed as of limited value, this indicates that the sample sizes can be assumed adequate for the analyses we have presented.

Even though there were significant differences on day 3 in fatigue between younger and older patients, and between genders, these parameters did not influence the cut-off points in a consistent manner. Analysing subgroups necessarily reduces sample size analysed, and the inconsistencies in direction between the scales and between the Youden index J and C* indicate that our data were insufficient to examine these questions. This may be an area for future research.

Limitations of our study include that no externally validated measure of the ‘clinical significance’ of fatigue was included a priori. However, given the subjective nature of fatigue and its impacts, an item that enabled participants to state the presence or absence of ‘significant’ fatigue was deemed to be an appropriate choice. Limitations also include that, given that patients’ expectations appear to influence the cut-off levels, the cut-off points identified for a day-surgery population will not necessarily be valid if the instruments are applied in other settings, e.g. operations necessitating hospitalisation. This is an area requiring further research. A further limitation is that no fatigue subscales from comprehensive HRQoL measures, such as the SF-36, were included in our study.

We consider strengths of our research to include the use of multiple data collection time points, and the recruitment of a mixed day-surgery population.

Conclusion

We have analysed three commonly used scales that may be applicable in POF and day-surgical recovery research. The cut-off point for clinically significant fatigue was found to be located near the middle of each scale’s response range. With a defined cut-off point for clinically significant POF, it is possible to analyse the clinical implications of fatigue better than by merely reporting mean fatigue values. This can be particularly valuable for diagnostic purposes and in treatment evaluation. Further, it may be possible to analyse and review data from earlier studies in light of clinical relevance.