Background

The EORTC Quality of Life Group develops site-specific modules to be used with a core questionnaire, the EORTC QLQ-C30. One of the first was the module for patients with head and neck cancer, the EORTC QLQ-H&N37 [1], later revised and shortened to its final version with 35 items, the H&N35 [2]. This module consists of 7 multi-item scales, measuring pain in the mouth, problems with swallowing, senses, speech, social eating, social contact and sexuality, and 11 single-item scales, assessing problems with teeth, mouth opening, dry mouth, sticky saliva, coughing, feeling ill, as well as use of analgesics, nutritional supplements, feeding tube and finally weight gain and weight loss.

The module has been translated into 53 languages (February 2012, http://groups.eortc.be) and is in use worldwide as one of the standard instruments for measuring quality of life in head and neck cancer patients [3, 4].

Some issues have been raised that may hamper the use of the H&N35. One criticism occasionally raised is that patients may feel annoyed by some of the items, for example, those enquiring about problems with sexual functioning [5, 6]. A matter of debate is whether this presents difficulty for the researcher who feels uncomfortable in asking such questions or for the patient who feels embarrassed or irritated in answering. Another criticism concerns items that may not be applicable to some of the patients, for example, questions about swallowing solid food administered to patients who are tube fed or about hoarseness when the larynx has been removed [7]. Little is known about the use of the H&N35 in research, about the way the psychometric issues are reflected in different languages, and about how well the multi-item scales are accepted by patients and investigators.

The goal of the present study was to review all papers relating to studies that have used the H&N35 module to date, investigating potential methodological problems and benefits. Questions to be answered were as follows:

  1. In what languages has the H&N35 been used and validated since it was published (cross-cultural use)?

  2. How reliable and valid are the multi-item scales of the H&N35? Were any psychometric problems reported (psychometrics)?

  3. How accepted are the questions by the patients, that is, how frequently did they skip specific items or scales (acceptance by patients)?

  4. How accepted are the questions and the scales by the investigators, that is, do they omit items or scales (acceptance by investigators)?

  5. How often is the H&N35 used for what types of studies?

Methods

The H&N35 contains 35 items which can be condensed into seven multi-item and eleven single-item symptom scales. All EORTC QoL questionnaires result in scales that score from 0 to 100. A score of 100 indicates perfect QoL in the functioning scales, whereas for the symptom scales, it indicates heavy burden.
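The 0 to 100 transformation can be illustrated with a short sketch. It assumes the standard EORTC linear transformation (raw score = mean of the answered items; symptom score = (raw score − 1) / item range × 100), four-point items coded 1 to 4, and a rule that a scale is scored only if at least half of its items were answered. The function name and defaults are hypothetical, and the EORTC scoring manual remains the authoritative source.

```python
from typing import Optional, Sequence


def symptom_scale_score(
    items: Sequence[Optional[int]],
    item_min: int = 1,
    item_max: int = 4,
) -> Optional[float]:
    """Linearly transform item responses to a 0-100 symptom score.

    None marks a missing response. Returns None when fewer than half
    of the items were answered (assumed missing-data rule).
    """
    answered = [x for x in items if x is not None]
    if not answered or len(answered) * 2 < len(items):
        return None
    raw = sum(answered) / len(answered)          # raw score: mean of answered items
    item_range = item_max - item_min             # 3 for items coded 1-4
    return (raw - item_min) / item_range * 100   # 0 = no symptoms, 100 = heavy burden
```

Under these assumptions, a yes/no item coded 1 = no and 2 = yes would be scored with `item_max=2`, giving either 0 or 100.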

A systematic review was performed, searching for all publications up to August 2011 that reported data using the H&N35. Databases searched were PubMed, EMBASE and Social Science Citation Index. Original papers written in the following languages were eligible for this review: Bosnian, Croatian, Dutch, English, French, German, Japanese, Russian, Serbian, Spanish and Turkish. Papers written in Japanese were translated by a native speaker. All other non-English papers were read by the first author (SS) in the original version.

Search terms entered for title, abstract or key words were “H?N35” and “head and neck module”, respectively. In electronic database searches, the question mark indicates that any single character, or none at all, is accepted at that position. For example, a paper using the abbreviation “HN35” as well as a paper using “H&N35” would be included in the search. The results of that search were presented to a group of health care professionals experienced in the treatment of head and neck cancer within the EORTC Quality of Life Group, who had the opportunity to add papers that had not been detected by the criteria used in the initial search.
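The wildcard behaviour described above can be mimicked with a regular expression. The following is a minimal sketch, assuming that “?” stands for zero or one arbitrary character; the helper name and the example titles are hypothetical, and the exact wildcard syntax differs between databases.

```python
import re


def wildcard_to_regex(term: str) -> re.Pattern:
    """Translate a search term where '?' stands for zero or one character."""
    parts = [".?" if ch == "?" else re.escape(ch) for ch in term]
    return re.compile("".join(parts), re.IGNORECASE)


pattern = wildcard_to_regex("H?N35")
titles = [
    "EORTC QLQ-H&N35 in laryngeal cancer",   # hypothetical title
    "Validation of the HN35",                # hypothetical title
    "QLQ-C30 only",                          # hypothetical title
]
# Keep every title in which the pattern occurs anywhere.
hits = [t for t in titles if pattern.search(t)]
```

Here `hits` keeps the first two titles and drops the third, matching the “HN35” and “H&N35” example in the text.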

All reviews were excluded; only original papers were analysed. Several papers on the same study population were considered eligible for inclusion as long as different data were presented. It was not always possible to determine exactly whether data from the same population were reported or not. Therefore, all papers from the same author or study group were included even if the presented data came presumably from the same patient sample. Duplicate hits, that is, the same article found in different search engines, were removed.
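Removing duplicate hits across databases could be sketched as below. This is only an illustration, assuming each hit is a record with a `title` field and that two hits are the same article when their normalised titles agree; the field name and the matching rule are assumptions, not the procedure documented in this review.

```python
import re
from typing import Dict, List


def normalise_title(title: str) -> str:
    """Lower-case a title and collapse punctuation/whitespace for matching."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def remove_duplicate_hits(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep the first record for each normalised title; drop later repeats."""
    seen = set()
    unique = []
    for rec in records:
        key = normalise_title(rec["title"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique
```

Because matching is by title only, different papers from the same study group are retained, in line with the inclusion rule described above.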

If no access to the full text was available, the paper’s corresponding author was contacted and asked to send a PDF file or a printed copy of the manuscript.

The following details were documented for each paper: the number of patients assessed with the H&N35, cancer site, language in which the H&N35 was administered, information about compliance and missing values, information on or discussion of methodological problems, challenges or advantages, number of H&N35 scales used, estimates of internal consistency (Cronbach’s alpha), construct validity, study design and topic. These details were entered into a database for statistical analysis, using STATA 11 [8]. The analysis included computation of frequencies, percentages and averages (mean, median) as well as testing differences between groups using Kruskal–Wallis tests.
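For readers unfamiliar with the Kruskal–Wallis test, the statistic can be sketched in plain Python. This is an illustration only: the analysis reported here was run in STATA, and the function name is hypothetical. The sketch computes the tie-corrected H statistic; the P value would then be taken from a chi-square distribution with (number of groups − 1) degrees of freedom.

```python
from collections import Counter
from typing import Sequence


def kruskal_wallis_h(groups: Sequence[Sequence[float]]) -> float:
    """Return the tie-corrected Kruskal-Wallis H statistic.

    Assumes at least two distinct values overall; otherwise the tie
    correction is zero and H is undefined.
    """
    pooled = sorted(v for g in groups for v in g)
    n_total = len(pooled)
    # Average rank for each distinct value (ties share the mean of their ranks).
    rank_of = {}
    i = 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    h = 12 / (n_total * (n_total + 1)) * sum(
        sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups
    ) - 3 * (n_total + 1)
    ties = Counter(pooled).values()
    correction = 1 - sum(t**3 - t for t in ties) / (n_total**3 - n_total)
    return h / correction
```

In this review, each group would hold one value per study, for example the number of scales used by the studies of one region.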

Results

A total of 136 original papers were found that had used the H&N35 (see Fig. 1). Access to the full text was available for 125, with access to the abstract for the remaining 11. A detailed description of the studies can be found in the supplementary material. Considering all papers together, the H&N35 had been completed by 13,969 patients (subject to the assumption that each paper reported on a different study population). Most often, the H&N35 was used in observational studies; 53 % of the studies had a cross-sectional design, 31 % were prospective cohort studies, 7 % phase-II-trials, 6 % phase-III-trials, 1 % case–control studies, and one study reported on a case series.

Fig. 1

Flow diagram of literature search

Cross-Cultural Use: The H&N35 was administered in 19 different languages: German (29 papers), Dutch (26), Swedish (15), English (11), French (8), Norwegian (7), Mandarin (7), Cantonese (6), Danish (4), Spanish (5), Polish (3), Portuguese (3), Japanese (3), Czech (1), Greek (1), Italian (1), Korean (1), Sinhala (1) and Turkish (1). Studies were performed in 26 different countries: The Netherlands (25 studies), Germany (22), Sweden (15), Taiwan (9), Norway (7), France (6), Switzerland (6), United Kingdom (6), Denmark (4), Hong Kong (4), Spain (5), Japan (3), Poland (3), Portugal (2), Canada (2), United States (2), Australia (2), and 1 study each in Austria, Belgium, Brazil, Czech Republic, Greece, Italy, Korea, New Zealand, Sri Lanka and Turkey. A breakdown of studies from the different world regions is displayed in Fig. 2.

Fig. 2

Description of papers analysed. Panel a: proportion of studies performed in different world regions. Panel b: proportion of study designs used in the studies. Note: region is defined as the region where the study was primarily performed, not as the region where the paper was published.

Psychometrics: Sixty-one papers explicitly or implicitly discussed methodological issues of the H&N35. Internal consistency was investigated in 18 papers by means of Cronbach’s alpha and in general appeared to be high, that is, alpha ≥ 0.70 (Table 1). Moderate or low internal consistency, that is, alpha < 0.70, was reported for the Speech [6–9] and Senses [6–8, 10–12] scales. Consequently, the items of the Senses scale were treated as single items in two studies [13, 14]. One study [15] reported a moderate Cronbach’s alpha (0.64) for the Pain in the Mouth scale. The average Cronbach’s alpha (computed as the median alpha per scale across all papers that reported coefficients) ranged from 0.61 (Senses) to 0.93 (Sexuality).
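As a reminder of what the coefficient measures, Cronbach’s alpha for a scale with k items is k / (k − 1) × (1 − sum of item variances / variance of the sum score). A minimal sketch follows, assuming complete cases only and sample variances; the function name and the example data are hypothetical and are not taken from any of the reviewed papers.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(responses: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for complete cases; rows are patients, columns are items."""
    k = len(responses[0])
    items = list(zip(*responses))                  # one tuple per item
    totals = [sum(row) for row in responses]       # per-patient sum score
    item_var_sum = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))


# Hypothetical two-item scale answered by four patients.
alpha = cronbach_alpha([[1, 2], [2, 1], [3, 4], [4, 3]])
```

For these invented responses, alpha is 0.75, which would count as acceptable under the 0.70 threshold used here. With only two items, as in the Senses scale, alpha depends entirely on how strongly the two items correlate.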

Table 1 Internal consistency of EORTC QLQ-H&N35 multi-item scales, sorted by language

Construct validity was evaluated less frequently. Jayasekara [16] reported overall good construct validity with 87 % scaling successes, though the Senses scale exhibited scaling failure, that is, its items were more highly correlated with other scales than with their own scale. Jensen [17] criticised high interscale correlations (> 0.7) as an “indication of overlapping constructs” (p. 35) and, therefore, considered the Social Eating and Social Contact scales to be difficult to differentiate psychometrically and conceptually [17]. On the other hand, he argued that the categorisation of items and scales was sensible because the entire range of the items and scales was covered by the patients’ responses. In a study in laryngeal cancer patients after surgery, items of the Speech scale showed scaling failure in 24 %, items of Pain in the Mouth and Swallowing in 1.4 %, and items of all other multi-item scales in 0 % [7]. Arraras et al. reported good evidence for sensitivity to change in all scales [12]. Silveira et al. investigated the module’s ability to differentiate symptomatic vs. asymptomatic patients and found good performance except in the following scales: senses, dry mouth, weight gain and weight loss [9]. In three studies, a total H&N35 scale value was calculated based on all head and neck scales [18–20].
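The notion of scaling failure used above can be made concrete with a small sketch. It assumes one common variant of multitrait scaling analysis: an item fails when its Pearson correlation with another scale’s sum score exceeds its correlation with its own scale, the latter computed without the item itself (correction for overlap). The reviewed papers may have used other variants, for example with significance margins. All names are hypothetical, and every scale must contain at least two non-constant items.

```python
from math import sqrt
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def scaling_failures(scales: Dict[str, List[List[float]]]) -> List[tuple]:
    """List (own scale, item index, competing scale) triples that are scaling failures.

    `scales` maps a scale name to its item columns (one list of responses per item).
    """
    def scale_sum(cols):
        return [sum(vals) for vals in zip(*cols)]

    failures = []
    for name, cols in scales.items():
        for idx, item in enumerate(cols):
            rest = cols[:idx] + cols[idx + 1:]       # own scale without this item
            own_r = pearson(item, scale_sum(rest))
            for other, other_cols in scales.items():
                if other != name and pearson(item, scale_sum(other_cols)) > own_r:
                    failures.append((name, idx, other))
    return failures
```

The percentage of scaling successes then follows as one minus the number of failures divided by the number of item-by-competing-scale comparisons.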

Acceptance by patients (missing values): 23 papers reported on percentages of missing values with varying results. The completeness of the questionnaire varied from 66 % [17] to 99 % [21], both studies including patients from Denmark. Scales with missing values included Sexuality, Speech, Teeth and Weight Gain with average percentages of missing values of 11.5, 7.0, 2.7 and 2.0 %, respectively (Table 2). Some authors reported that, regarding the Teeth and Sexuality scales [16, 17, 22], it may remain unclear whether a non-answer was due to the patient being unwilling to answer or because the item did not apply to their status.

Table 2 Missing values information per scale with reported missing values

The percentage of missing data was unrelated to the region where the study had been performed (P = 0.26 to 0.99).

Acceptance by investigators: The H&N35 consists of 7 multi-item scales and 11 single-item scales. The number of scales reported on in the reviewed studies varied considerably (range: 0 to 18 scales; mean: 12 scales; see Table 3). The use of the scales ranged from 39 % (Weight Gain; that is, 61 % of the studies did not use or did not report on this scale) to 85 % (Swallowing). Usually, no rationale was given for omitting specific scales. The pattern of use shows that the scales used least frequently were those where only yes/no answers were possible. These items were reported in less than half of the papers (39 % Weight Gain to 45 % Pain Killers; see Table 3). The Sexuality scale was omitted relatively often (27 %); however, there were also studies which used only that scale [23, 24]. The number of scales differed significantly between the regions where the study was performed (P = 0.01, see Fig. 3): whereas in Northern America and in multi-national studies usually all 18 scales were used, on average 12 scales were used in studies performed in Western Europe. No differences were observed according to study designs (P = 0.78).

Table 3 Use of scales as reported in the publications (sorted by number of scales used)

Fig. 3

Number of scales used. Panel a: per region where the study was performed. Panel b: per study design applied. Note: displayed are the medians and quartiles. The H&N35 consists of 18 scales.

One study group had developed an alternative head and neck module (EORTC QLQ-H&N17) for surgically treated patients [25].

Discussion

This review describes the use of the EORTC module for the measurement of quality of life in head and neck cancer patients, the H&N35. Major objectives were to find out in what languages it has been used and validated, what psychometric properties in the different language versions have been reported, and how well accepted the module is by patients and investigators.

Based on the 136 papers identified and assessed as part of this evaluation, we can conclude that the H&N35 is used by many investigators throughout the world. As many authors investigated or commented on methodological issues of the H&N35, this information could be collated.

Use of the H&N35 in 26 countries and 19 languages to date indicates broad cross-cultural acceptance. It is, however, interesting to note that it had been translated into 53 languages altogether, leaving 34 translations “unused”. Presumably, these translations were requested for trials performed by pharmaceutical companies without publication in academic journals. Most publications came from Western and Northern European countries and Asia. Although many studies investigating quality of life in head and neck cancer patients are performed in Northern America [26, 27], relatively few have used the H&N35. This can be explained by the fact that, traditionally, Northern American studies make more use of other well-validated instruments such as the Functional Assessment of Cancer Therapy–Head Neck scale [28–31], the University of Washington Quality of Life Questionnaire, or the Performance Status Scales–Head and Neck cancer [32].

Relatively few studies have reported on construct validity. Those that did mainly confirmed the proposed scale structure, though some concerns have been expressed regarding the high interscale correlation, indicating overlapping constructs. Similarly, some authors computed total scores although this was not intended by the developers of the H&N35. Reliability was mainly evaluated using the concept of internal consistency, which was satisfactory overall. The only scale with a median Cronbach’s alpha beneath the threshold of 0.70 was Senses. One reason for this moderate internal consistency may be that smell and taste are different functions, and patients may have problems with one without having difficulties with the other. Moderate internal consistency of this scale was found in different languages and study populations; therefore, the two items should perhaps be handled separately.

All other scales exhibited good to very good consistency coefficients with Sexuality having the highest scores in all but one language. Sensitivity to change was not frequently explicitly investigated, though the H&N35 was used in many prospective studies and changes over time were observed, providing indirect evidence for sensitivity to change. However, explicit investigation of sensitivity to change would be desirable.

Although the H&N35 is relatively long compared to other EORTC quality of life modules, it proved to be well accepted by patients. The reported frequency of missing values was generally low. Only areas where patients might feel that this domain is not applicable to them, for example, problems with teeth when they have dentures, were left out more frequently. Good acceptance of the H&N35 was also found by other authors who compared different QoL measures in head and neck cancer patients [3, 4, 33].

The acceptance by investigators was also high, considering the number of studies using this instrument, although the entire H&N35 was not always used. Items where only a yes/no response format is provided were frequently either not administered to the patients or not reported by the authors. We can only speculate about the reasons for this. One possibility is that investigators feel that the psychometric properties of Likert-scaled response formats are better. Another explanation would be that issues such as weight gain, use of analgesics or feeding tube are considered to be more reliably captured by objective measures than by patient report.

In conclusion, the H&N35 is used by many investigators throughout the world. Some methodological problems (e.g. low internal consistency of some multi-item scales, at times poor compliance of investigators with yes/no scales) have been reported and could be solved, for example, by exchanging problematic items. Although the H&N35 was initially developed for clinical trials, it has been used mainly in observational studies and proved well accepted and feasible in that setting. It has also successfully been implemented in clinical practice [34, 35].

In general, we believe that systematic methodological reviews of frequently used instruments can help to improve existing measures and increase our knowledge on how to develop and improve questionnaires that are psychometrically sound and well accepted by patients and clinicians alike. In addition, it could be useful to collect the raw data of all studies in a central database, so that direct comparisons between different languages and cultures are possible. This has been done with the EORTC QLQ-C30 [36], but not with the EORTC modules. We believe that such a task would be worthwhile.