Introduction

Fatigue is a frequent symptom in patients suffering from cancer [1,2,3,4,5]. Cancer survivors often experience excessive fatigue even years after the initial development of the disease [6, 7]. Beyond cancer, fatigue is also common in other chronic diseases such as cardiovascular diseases [8], COPD [9], rheumatoid arthritis [10], fibromyalgia [11], Parkinson’s disease [12], multiple sclerosis [13], and irritable bowel disease [14].

Multiple questionnaires have been developed to assess fatigue [15]. While some of these questionnaires were developed on the basis of a one-dimensional concept of fatigue, other questionnaires consider several domains of fatigue (e.g., physical fatigue and mental fatigue) separately [16]. A review paper [17] compares the psychometric properties of 40 fatigue questionnaires. One of the most often used instruments for measuring fatigue is the three-item fatigue scale of the quality of life questionnaire EORTC QLQ-C30 [18]. Recently, a new fatigue module (EORTC QLQ-FA12, abbreviated here as EORTC-FA12) was developed by the EORTC quality of life group, specifically designed for assessing three dimensions of cancer-related fatigue: physical, emotional, and cognitive fatigue [19]. These three dimensions were derived from qualitative studies which found that patients reported physical, affective, and cognitive symptoms of fatigue (e.g., [20]). The EORTC-FA12 test authors did not recommend a sum score for aggregating across several aspects of fatigue. However, even if the focus of a questionnaire is on the specific sub-dimensions, an aggregated sum score can still provide reliable information for assessing patients more generally. Therefore, one aim of the present study was to test the psychometric properties of such a sum score. Until now, normative values for the questionnaire EORTC-FA12 have not been available. Such normative scores are helpful in evaluating the degree of fatigue in groups of patients and in determining age and gender differences.

The aims of this study were (a) to test age and gender differences in fatigue, (b) to test the psychometric properties of the questionnaire by conducting confirmatory factorial analyses, item analyses, and examining associations with depression and anxiety, (c) to evaluate the justification for calculating a summarizing score, and (d) to derive normative values, based on a representative sample of the general population.

Methods

Study sample

Between September and November 2016, a representative sample of the German general population was examined with the assistance of a demographic consulting company (USUMA, Germany). The entire country was divided into 258 sample areas. Once a sample area was selected, streets, houses, households, and household members were chosen randomly. A first attempt to contact study candidates was made at 4902 addresses, of which 4838 were valid. The subjects were visited by a study assistant. Insufficient command of the German language was an exclusion criterion. A total of 2510 people between 14 and 93 years old agreed to participate and completed several self-rating questionnaires (participation rate: 51.9% of valid addresses). The reasons for non-participation (48.1%) were: the general information request was refused (15.3%), the interview was refused by the target person (14.7%), no one was at home on four consecutive visits (17.2%), and other reasons, e.g., illness or vacation (3.2%). All adult participants provided written informed consent to participate in this study and to the publication of the data. For the minors enrolled in the study, written informed consent was also obtained from the next of kin, caretakers, or guardians. Subjects younger than 18 years (n = 86) and subjects with more than one missing item on at least one of the three scales of the EORTC-FA12 (n = 13) were excluded from the analysis (n = 99 in total). Thus, the final sample consisted of 2411 subjects. The study was approved by the Ethics Committee of the University of Leipzig.

Instruments

The EORTC-FA12 [19] consists of 12 items with four response categories (1–4) for each item. In accordance with the scales of the core questionnaire EORTC QLQ-C30, the EORTC-FA12 scores are transformed to the range 0–100. Higher numbers indicate higher degrees of fatigue. A preliminary version of the EORTC-FA12, the EORTC-FA13, has been described in a previous paper [21], and some studies have already used this preliminary form [22,23,24,25]. The EORTC-FA12 comprises three subscales: physical fatigue (five items), emotional fatigue (three items), and cognitive fatigue (two items). The remaining two items serve as indicators of the impact of fatigue, but they do not belong to a subscale. In the paper introducing the EORTC-FA12, the dimensional structure of the EORTC-FA12 was tested using confirmatory factorial analysis (CFA). The CFA confirmed the three-factorial structure (physical, emotional, and cognitive fatigue). Including the two additional fatigue items in the structural equation model, as was done by the test authors [19], implies that there is common variance among the three fatigue subscales. In our study, we also test the psychometric properties of a sum score, defined as the mean of the 12 items, as was done in a previous study with breast and gynecological cancer patients [26].
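The transformation to the 0–100 range can be illustrated with a minimal sketch. It assumes the linear rescaling of the mean item response that is customary for EORTC symptom scales. The assignment of item numbers to subscales (items 1–5 physical, 6–8 emotional, 9–10 cognitive) and the names `SCALES`, `to_0_100`, and `score` are assumptions made for illustration and are not taken from the text above.

```python
from statistics import mean

# Assumed item-to-scale assignment (hypothetical, for illustration only):
# items 1-5 physical, 6-8 emotional, 9-10 cognitive; items 11-12 belong to
# no subscale but enter the 12-item sum score.
SCALES = {
    "physical": [1, 2, 3, 4, 5],
    "emotional": [6, 7, 8],
    "cognitive": [9, 10],
    "sum_score": list(range(1, 13)),
}


def to_0_100(raw_mean, low=1, high=4):
    """Linearly rescale a mean item response from [low, high] to [0, 100]."""
    return (raw_mean - low) / (high - low) * 100


def score(responses):
    """Score one respondent.

    responses: dict mapping item number (1-12) to a response in 1..4.
    Returns a dict with one 0-100 score per scale and for the sum score.
    """
    return {
        name: to_0_100(mean(responses[i] for i in items))
        for name, items in SCALES.items()
    }


# Example: a respondent who answers 1 everywhere except 2 on item 1.
example = {i: 1 for i in range(1, 13)}
example[1] = 2
print(score(example))
```

Under this rescaling, the 12-item sum score can take 37 distinct values in steps of 100/36 (about 2.8), which is consistent with the in-between score of 2.8 mentioned for the normative tables.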

The Patient Health Questionnaire-4 (PHQ-4) was included to test the associations between fatigue, anxiety, and depression. The questionnaire is composed of two ultra-short subscales, the Generalized Anxiety Disorder questionnaire (GAD-2) and the ultra-short depression questionnaire PHQ-2 [27]. The subscales can be analyzed separately or combined, aggregating across the four items.

Statistical analyses

The effects of age (three categories according to Table 1) and gender (two categories) on the fatigue scales were tested with two-factorial ANOVAs. Reliability was calculated with Cronbach’s alpha for the three subscales and the sum score as well. Part–whole corrected correlations between item and scale were calculated for all items. Effect sizes d were calculated to indicate the mean score difference between groups, related to the pooled standard deviations. Confirmatory factorial analyses (CFAs) were performed to test the model that was presented in the introductory paper [19]. We used the criteria Comparative Fit Index (CFI), Tucker-Lewis-Index (TLI), Normed Fit Index (NFI), root mean square error of approximation (RMSEA), and standardized root mean square residual (SRMR). CFI, TLI, and NFI scores higher than 0.95 and RMSEA and SRMR scores lower than 0.05 indicate a good model fit, and RMSEA scores of less than 0.08 indicate an acceptable fit [28]. Associations between fatigue and psychological scales were calculated using Spearman’s correlations.
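The two item-level reliability statistics named above can be sketched as follows. This is a minimal illustration using the textbook formula for Cronbach's alpha and the usual part-whole correction (each item is correlated with the sum of the remaining items of its scale). It is not the authors' software, and the function names and the small data matrix are hypothetical.

```python
from statistics import mean, variance


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def cronbach_alpha(rows):
    """rows: list of respondents, each a list of item responses of one scale."""
    k = len(rows[0])
    columns = list(zip(*rows))
    item_var = sum(variance(col) for col in columns)  # sum of item variances
    total_var = variance([sum(r) for r in rows])      # variance of the scale sum
    return k / (k - 1) * (1 - item_var / total_var)


def corrected_item_total(rows):
    """Correlate each item with the sum of the other items (part-whole corrected)."""
    totals = [sum(r) for r in rows]
    return [
        pearson([r[j] for r in rows], [t - r[j] for r, t in zip(rows, totals)])
        for j in range(len(rows[0]))
    ]


# Hypothetical responses of four people to a two-item scale.
demo = [[1, 2], [2, 1], [3, 4], [4, 3]]
print(cronbach_alpha(demo), corrected_item_total(demo))
```

Removing the item from the total before correlating avoids inflating the coefficient through the item's correlation with itself, which matters most for short scales such as the two-item cognitive fatigue scale.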

Table 1 Sociodemographic characteristics of the sample

Results

Sample characteristics

Sociodemographic characteristics of the final sample (n = 2411) are given in Table 1.

The mean age was 49.5 years, and females were slightly overrepresented (53.5%).

EORTC-FA12 mean scores

Figure 1 presents mean scores divided by gender and age groups. Females reported significantly more fatigue than males in all scales: the effect sizes were d = 0.29 (physical fatigue), d = 0.22 (emotional fatigue), d = 0.11 (cognitive fatigue), and d = 0.25 (sum score). The means and standard deviations of the three dimensions and the sum score for the total sample were as follows: physical fatigue: 24.3 ± 25.6, emotional fatigue: 12.7 ± 21.6, cognitive fatigue: 8.3 ± 19.7, and sum score: 17.1 ± 20.3.
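For readers who wish to reproduce effect sizes of this kind, the following sketch computes d as the mean difference divided by the pooled standard deviation. It assumes the common sample-size-weighted pooling formula, and the group statistics in the example are invented for illustration because the gender-specific means and standard deviations are not given in the text.

```python
from math import sqrt


def cohens_d(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Mean difference divided by the pooled standard deviation.

    Assumes the pooled variance is weighted by the degrees of freedom
    of the two groups.
    """
    pooled_var = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / sqrt(pooled_var)


# Hypothetical group statistics (not taken from the study).
print(cohens_d(mean_a=20.0, sd_a=20.0, n_a=100, mean_b=14.0, sd_b=20.0, n_b=100))
```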

Fig. 1 Mean scores of the EORTC-FA12 scales, broken down by gender and age groups

Figure 1 shows that there was no consistent linear age trend. The ANOVA results are given in Table 2. All scales showed a statistically significant gender effect, while the age effect was weaker. For the sum score, there was no statistically significant age group effect at all. The interaction effects between age and gender were also not statistically significant.

Table 2 ANOVA results

Analyses on item level

Table 3 presents mean scores and standard deviations of the 12 items. In addition, the part–whole item–scale correlations are given for the total scale and the specific scale of the item. Cronbach’s alpha of the total scale was 0.94, the alpha coefficients for the subscales were 0.92 (physical fatigue), 0.86 (emotional fatigue), and 0.79 (cognitive fatigue). The right column of Table 3 presents the differences between the item means of the general population of this study and the means of cancer patients as reported in [21], expressed in terms of effect sizes. Item 12 was not included in that publication. All effect sizes were between 0.34 and 0.81, indicating higher levels of fatigue for the cancer patients. The lowest coefficients were found for the two items from the cognitive fatigue scale (0.34 and 0.36).

Table 3 EORTC-FA12 item characteristics

CFA results and correlations between the scales

The latent structure of the EORTC-FA12 was adopted from the introductory paper [19]. The results of the CFA were as follows: CFI = 0.964, TLI = 0.950, NFI = 0.962, RMSEA = 0.082, and SRMR = 0.031. Further details of the latent structure of the EORTC-FA12 are presented in Fig. 2.
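The reported indices can be checked against the cutoffs stated in the Statistical analyses section with a short sketch. The classification function and its labels are hypothetical. One further assumption is that a value reported as 0.950 is treated as meeting the 0.95 cutoff, since the published figures are rounded.

```python
# Fit indices as reported above.
FIT = {"CFI": 0.964, "TLI": 0.950, "NFI": 0.962, "RMSEA": 0.082, "SRMR": 0.031}


def classify(name, value):
    """Classify one fit index using the cutoffs from the Methods section."""
    if name in ("CFI", "TLI", "NFI"):
        # Assumption: a rounded value of 0.950 counts as reaching the cutoff.
        return "good" if value >= 0.95 else "below cutoff"
    if name == "SRMR":
        return "good" if value < 0.05 else "above cutoff"
    if name == "RMSEA":
        if value < 0.05:
            return "good"
        if value < 0.08:
            return "acceptable"
        return "above cutoff"
    raise ValueError(f"unknown fit index: {name}")


for index_name, index_value in FIT.items():
    print(index_name, index_value, classify(index_name, index_value))
```

Under these assumptions, CFI, TLI, NFI, and SRMR are classified as good, whereas RMSEA (0.082) exceeds the 0.08 threshold for acceptable fit.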

Fig. 2 Graphical representation of the confirmatory factorial analysis. Physical fatigue, emotional fatigue, and cognitive fatigue: latent constructs; FA1–FA12: single items; e1–e12: error variables

Table 4 shows that all correlations among the subscales were between 0.61 and 0.72. Anxiety and depression, assessed with the PHQ-4, were correlated with all scales of the EORTC-FA12 (Table 4, lower part). Among the three scales of the EORTC-FA12, the PHQ-4 correlations with physical and emotional fatigue were highest.

Table 4 Spearman correlations between the EORTC-FA12 subscales, the sum score, and the PHQ-4

Normative scores

Normative values (percent rank scores) are given separately for males and females and for the three age groups (Tables 5, 6). Most participants reported no cognitive fatigue at all (frequencies between 69 and 80%). In the sum score columns (right part of Tables 5, 6), only every second possible score is listed; the scores in between (e.g., 2.8) can be estimated by averaging between the adjacent percentages.
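The estimation of percent ranks for unlisted sum scores can be sketched as follows. The function name and the table values are hypothetical and are not taken from Tables 5 or 6. The sketch only illustrates averaging the percentages of the two adjacent listed scores.

```python
def percent_rank(score, table):
    """Look up a cumulative percentage, averaging neighbours for unlisted scores.

    table: dict mapping listed scores to cumulative frequencies in %.
    Raises ValueError if the score lies outside the range of listed scores.
    """
    if score in table:
        return table[score]
    lower = max(s for s in table if s < score)  # nearest listed score below
    upper = min(s for s in table if s > score)  # nearest listed score above
    return (table[lower] + table[upper]) / 2


# Hypothetical excerpt of a normative table (values invented for illustration).
NORMS_EXAMPLE = {0.0: 30.0, 5.6: 45.0, 11.1: 55.0}

print(percent_rank(5.6, NORMS_EXAMPLE))  # listed score: direct lookup
print(percent_rank(2.8, NORMS_EXAMPLE))  # unlisted score: mean of neighbours
```

Because only every second score is listed, each unlisted score lies midway between its neighbours, so the simple average coincides with linear interpolation.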

Table 5 Normative values (cumulative frequencies in %) for the EORTC-FA12, males
Table 6 Normative values (cumulative frequencies in %) for the EORTC-FA12, females

Discussion

One central purpose of this study was to test the psychometric properties of the EORTC-FA12, applied to the general population. The CFA largely confirmed the factorial structure. Four of the five fit indices were in the range of good model fit, but one index (RMSEA) failed to reach the level of acceptable fit. Though the fit indices given in the introductory paper [19] were slightly better than those obtained in this general population sample, the EORTC-FA12 is also applicable to the general population. High internal consistency coefficients of the scales also indicated good psychometric quality. All items contributed positively and substantially to their scales. Because the test authors of the EORTC-FA12 chose the factorial structure that optimized the fit indices for their specific data set, it is not surprising that these original fit coefficients were not completely reached when using another sample.

The EORTC-FA12 was designed to measure several distinct aspects of fatigue (physical, emotional, cognitive). Such a distinction is not self-evident; other fatigue questionnaires consist of other subscales, e.g., the FACIT-Fatigue questionnaire [29] (subscales “experience of fatigue” and “impact of fatigue”), the Patient-Reported Outcomes Measurement Information System (PROMIS) fatigue item bank [30] (“fatigue experience”, “impact of fatigue on social/recreational activities”, and “impact of fatigue on mental/cognitive activities”), and the Multidimensional Fatigue Inventory MFI [31] (“general fatigue”, “physical fatigue”, “reduced activity”, “reduced motivation”, and “mental fatigue”).

While a detailed description of fatigue subdomains is necessary for many applications, there are other clinical or research questions for which an aggregated assessment of fatigue is more relevant. Therefore, we also tested the psychometric properties of a sum score. There were several possible ways to calculate such a sum score: (1) summarizing across the three scale scores; (2) summarizing across the 10 items which belong to the three scales; and (3) summarizing across all 12 items, including those two items which were not assigned to a scale. We followed the procedure adopted in the Kecke et al. study [26] and aggregated across all 12 items. This implies that the weight of physical fatigue is stronger than that of emotional or cognitive fatigue because physical fatigue has a higher number of items. We believe that the test authors intended for the physical aspect to be more heavily weighted since they generated more items for this aspect. Cronbach’s alpha of the sum score was very good (0.94). The correlations among the three scales (r among the latent scales according to Fig. 2: 0.71–0.84) indicate a high proportion of common variance. In the introductory paper of the EORTC-FA12, the correlations among the latent dimensions were somewhat lower (r between 0.58 and 0.66). Nevertheless, the high correlations show that there is a high degree of common variance between the subscales in cancer patients as well as in the general population. There are also other fatigue questionnaires which were designed as multidimensional instruments, for which, nevertheless, the construction of sum scores is useful, e.g., the Multidimensional Fatigue Inventory MFI-20 [6] and the Fatigue Assessment Scale FAS [32]. It is a matter of debate whether fatigue should be considered unidimensional or multidimensional. Though a review study [16] showed that physical and mental fatigue behave differently, there are also arguments for considering and measuring the common factor behind the fatigue dimensions. Nearly all papers on the topic of fatigue begin in the introduction with statements about high levels of fatigue experienced by the patients. Such statements are meaningful only under the assumption that the comprehensive construct “fatigue” is meaningful. Linking different fatigue measures on a common metric [33] also implies that there is one underlying general fatigue dimension.

Females reported more fatigue than males in all three domains (Fig. 1). This has also been observed in multiple other studies that have used other fatigue assessment instruments (e.g., [34, 35]). When comparing groups of patients, it is mandatory to take gender into account. However, age differences were less clear. While physical fatigue increased with age, cognitive fatigue decreased, and emotional fatigue showed an inverted U-shaped trend (Fig. 1). The fatigue sum score completely failed to show a statistically significant age group effect in the ANOVA. Because of the unsystematic age relationship, we believe that reporting normative scores irrespective of age range is justifiable. These normative scores can be used to assess the degree of fatigue observed in individual patients or groups of patients.

Our results do not provide minimally important differences. The standard deviations of the scales which are given in Table 2 can be used to roughly estimate distribution-based thresholds for such differences [36].
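As a rough illustration of such a distribution-based estimate, the sketch below applies the widely used half-standard-deviation convention to the total-sample standard deviations reported in the Results. The choice of 0.5 SD is an assumption made here for illustration. The text does not state which fraction reference [36] recommends, and the resulting values are not validated minimally important differences.

```python
# Total-sample standard deviations as reported in the Results section.
SD_TOTAL_SAMPLE = {
    "physical": 25.6,
    "emotional": 21.6,
    "cognitive": 19.7,
    "sum_score": 20.3,
}


def distribution_based_threshold(sd, fraction=0.5):
    """Return a fraction of the standard deviation as a rough threshold.

    The default of 0.5 is an assumed convention, not a value from the study.
    """
    return fraction * sd


thresholds = {
    scale: distribution_based_threshold(sd) for scale, sd in SD_TOTAL_SAMPLE.items()
}
print(thresholds)
```

Given the skewed score distributions noted among the limitations, thresholds derived from standard deviations should be read as coarse orientation values only.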

Some limitations of this study should be noted. The response rate of this study (51.9%) was not high, which might have caused a bias. Response rates in this range are, however, common in epidemiological research. The study did not include data on comorbidity. Because of the skewed distributions of the fatigue dimensions, the standard deviations are of limited informative value, and the ANOVA results are not exact. Since there are no cutoffs for the EORTC-FA12, we could not report prevalence rates of fatigue. We only tested the dimensional structure that was given by the test authors. If we had considered other possible models, we could have obtained slightly better fit indices. However, research needs consistency in the use of the instruments. Changing the questionnaire to adapt it for a particular data set might provide a better fit, but the results would no longer be comparable with those of other examinations. Since there are already multiple instruments available for measuring fatigue, especially in oncology, the advantage of adding a further instrument can be questioned. We believe that the association with the EORTC QLQ-C30, the international validation [19], and now the availability of normative values will encourage many oncological researchers to use this new instrument. It is difficult to assess to what degree the results are generalizable to other countries. The mean scores of the EORTC QLQ-C30 fatigue scale, obtained from eight general population studies in six different countries, ranged from 15.0 to 29.0 [37], while the fatigue mean scores of the PROMIS Profile 29 were similar in general population samples of the US, UK, France, and Germany [38]. Further normative examinations of the EORTC-FA12 in other countries are needed to evaluate the generalizability.

Taken together, this study shows that the EORTC-FA12 has acceptable psychometric properties for use with the general population. Gender must be taken into account when evaluating patients’ burden. The normative scores presented here can be used to assess degrees of fatigue, both separately for the three main aspects of fatigue and for achieving a total fatigue score.