Introduction

After treatment of head and neck cancer (HNC), patients suffer not only from the danger of tumor recurrence/tumor progression, but also from often severe and persistent problems affecting daily activities and health-related quality of life (hrQOL). hrQOL is a subjective, multi-attribute construct defined by the World Health Organization as: “An individual’s perception of their position in life in the context of the culture and value system in which they live and in relation to their goals, standards and concerns. It is a broad ranging concept affected in a complex way by the person’s physical health, psychosocial state, level of independence, social relationships, and their relationship to salient features of environment” [1].

Over recent years, considerable research has been aimed at developing hrQOL questionnaires, and a number of validated and reliable questionnaires are now available. The majority of them are patient-administered questionnaires that cover a broad range of dimensions applicable to HNC. However, different hrQOL questionnaires may fit different purposes. Often, more than one questionnaire is needed. This leaves the clinician with a wide, sometimes confusing array of options [2]. Also, HNC comprises a range of different cancer sites, tumor stages and treatment modalities. Thus, the selection of appropriate questionnaire(s) for a particular objective is essential in planning any data collection. A thorough examination and evaluation of the available questionnaires along various criteria is necessary [3, 4].

Psychometric properties (e.g., reliability, validity, sensitivity) and application-related features (e.g., administration mode, scales used, time needed for completion), as well as translation into relevant languages, need to be accounted for. However, the first and most important concern is face and content validity.

Therefore, it is of particular interest to examine the content covered by hrQOL questionnaires. To facilitate the selection of appropriate questionnaires applied in HNC, several publications can be relied upon, which focus on the psychometric properties of the measures [5–11]. However, content comparisons have not been performed so far. This might be due to the varying use of concepts, operationalizations and scales in the different HNC-specific questionnaires. A content comparison based on a universally accepted, well-defined and standardized reference system that allows for a detailed exploration and comparison of all contents of the questionnaires would be valuable.

The newly available international classification of functioning, disability and health (ICF) [12] serves as a universal framework and facilitates comparison of items and scales of various hrQOL questionnaires [13].

The ICF belongs to the WHO’s family of international health classifications. While the well-known international classification of disease (ICD) classifies diagnoses, the younger and less well-known ICF classifies functioning. The ICF is based on the bio-psycho-social model of functioning, disability and health and offers a detailed and etiologically neutral classification. Several authors have dealt with the conceptual connections between hrQOL and the ICF [14–16]. The ICF is a useful tool to encourage multicenter-multinational assessment of functioning. It is crucial to understand that the ICF is a reference to facilitate measurement of functioning, but is not a QOL-instrument itself.

The value of a content comparison of hrQOL questionnaires for clinicians, who set up a study with hrQOL as an outcome measure, is that the clinician can look up in tables which of the questionnaires cover the exact topics he/she wants to address.

The aim of the study is to examine and compare the contents of hrQOL questionnaires used in HNC, based on the ICF as the frame of reference. The specific aims are: (1) to identify hrQOL questionnaires applicable to HNC, based on a literature review of articles published in English in Medline between 2000 and 2006, (2) to examine the contents of each questionnaire based on its translation (“linkage”) to the ICF, and (3) to compare the contents of the questionnaires among each other, based on the ICF as a reference.

Methods

A systematic literature review was conducted to identify and select current hrQOL questionnaires applicable to HNC. Out of the items of the selected questionnaires, we extracted so-called “meaningful concepts” and translated (“linked”) them to the ICF using established linking rules. The ICF categories representing the concepts contained in the questionnaires built the basis of the descriptive analysis and content comparison.

Literature review

We searched the electronic database MEDLINE using the keywords “oral cancer”, “oropharyngeal cancer”, “hypopharyngeal cancer”, “salivary gland cancer” or “laryngeal cancer”. Searches were limited to original articles published between 2000 and 2006 in the English language.

Eligibility checks of the search comprised three steps: (1) All abstracts were checked to include descriptive, evaluative (e.g., randomized controlled trials, clinical controlled trials, etc.), as well as psychometric studies. The studies should present first-hand data concerning patients with the selected head and neck cancer locations, irrespective of cancer therapy or tumor stage. We excluded reviews, case reports, economic evaluations and primary prevention studies, as well as studies including other cancer localizations or healthy persons. (2) Due to the great number of abstracts identified in the first step, a random sample of 50% was drawn. (3) Finally, the full text articles were retrieved and checked using the same eligibility criteria as for the abstract check in step 1.

The remaining publications after step 3 were checked for their use of hrQOL questionnaires. Two types of questionnaires were selected: (a) questionnaires that are specific for HNC as such and (b) questionnaires that are specific for certain symptoms after HNC. Questionnaires that were not specific for head and neck cancer were not analyzed. We selected only questionnaires that were cited in at least two different articles.

Finally, we compared our selection of questionnaires with that in other recent publications: Rogers et al. [17] and Fung and Terrell [18].

ICF-based content examination

The ICF consists of two major parts, each containing two separate components. Part 1 covers functioning and disability and includes the components “body functions” (b) and “body structures” (s), and “activities and participation” (d). Part 2 covers contextual factors and includes the components “environmental factors” (e) and “personal factors.” In the ICF classification, the letters b, s, d and e, which refer to the components of the classification, are followed by a numeric code starting with the chapter number (one digit), followed by the second level (two digits) and the third and fourth level (one digit each). The component letter with the suffix of three, four or five digits corresponds to the code of the so-called categories. Categories are the units of the ICF classification. Within each chapter, there are individual two-, three- or four-level categories. The “other specified” categories are characterized by the final code 8.

An example from the component “body functions” is presented in the following:

Code     Title                                          Level
b5       Functions of digestive and metabolic systems   First/chapter level
b510     Ingestion function                             Second level
b5105    Swallowing                                     Third level
b51051   Pharyngeal swallowing                          Fourth level
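The code structure described above (component letter, one chapter digit, two second-level digits, then one digit each for the third and fourth level) can be illustrated with a short sketch. The function and class names below are hypothetical and not part of the ICF or of this study; the sketch only checks the pattern of a code, not whether the category actually exists in the classification.

```python
import re
from typing import NamedTuple, Optional

# Component letters as named in the text; "personal factors" carry no code.
COMPONENTS = {
    "b": "body functions",
    "s": "body structures",
    "d": "activities and participation",
    "e": "environmental factors",
}

# letter + chapter digit, optionally two digits (second level),
# optionally one more digit (third level) and one more (fourth level).
_CODE = re.compile(r"^([bsde])(\d)(?:(\d{2})(?:(\d)(\d)?)?)?$")


class ICFCode(NamedTuple):
    component: str            # e.g. "body functions"
    chapter: str              # first/chapter level, e.g. "b5"
    second: Optional[str]     # e.g. "b510"
    third: Optional[str]      # e.g. "b5105"
    fourth: Optional[str]     # e.g. "b51051"


def parse_icf_code(code: str) -> ICFCode:
    """Split an ICF code into its hierarchy levels (pattern check only)."""
    match = _CODE.match(code.strip().lower())
    if match is None:
        raise ValueError(f"not a valid ICF code pattern: {code!r}")
    letter, chap, lvl2, lvl3, lvl4 = match.groups()
    chapter = letter + chap
    second = chapter + lvl2 if lvl2 else None
    third = second + lvl3 if lvl3 else None
    fourth = third + lvl4 if lvl4 else None
    return ICFCode(COMPONENTS[letter], chapter, second, third, fourth)


if __name__ == "__main__":
    # The example from the table above: pharyngeal swallowing.
    print(parse_icf_code("b51051"))
```

Resolving a detailed code to its second-level ancestor in this way is what makes comparisons "at the second ICF level" possible even when linkers chose more detailed categories.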

Linkage procedure

Concepts contained in the questionnaires were linked to the ICF using established linkage rules described in detail by Cieza et al. [28, 29]. This linking procedure was performed separately by two health professionals experienced with the ICF. To decide which ICF category should be linked to each item of the questionnaire, consensus between the health professionals was required. In case of disagreement, a third independent evaluator was consulted to decide on the most suitable code. The reliability of the linkage process was evaluated by calculating kappa coefficients [29] and nonparametric bootstrapped confidence intervals [30] based on the two independent linkage versions of each instrument. Kappa coefficients were calculated per component at the most detailed ICF level used (here, the third level) to indicate the degree of agreement between the two health professionals conducting the linkage procedure. The kappa analysis was performed with SAS [40].
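The agreement analysis can be sketched as follows. This is not the SAS code used in the study; it is a minimal Python illustration that assumes Cohen's kappa for two raters and a simple percentile bootstrap over concepts, and the function names, the number of resamples and the seed are all illustrative choices.

```python
import random
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length lists of category labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance from the two marginal distributions.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label throughout; kappa is
        # undefined here and is treated as 1.0 by convention in this sketch.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def bootstrap_ci(rater_a, rater_b, n_boot=2000, alpha=0.05, seed=0):
    """Nonparametric percentile bootstrap interval for kappa.

    Concepts (paired ratings) are resampled with replacement.
    """
    rng = random.Random(seed)
    n = len(rater_a)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(cohen_kappa([rater_a[i] for i in idx],
                                 [rater_b[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


if __name__ == "__main__":
    # Hypothetical linkage results of two linkers for ten concepts.
    linker_1 = ["b510", "b510", "d550", "b280", "e110",
                "b310", "b152", "d850", "b130", "b280"]
    linker_2 = ["b510", "b515", "d550", "b280", "e110",
                "b310", "b152", "d850", "b130", "b152"]
    print("kappa:", round(cohen_kappa(linker_1, linker_2), 3))
    print("95% CI:", bootstrap_ci(linker_1, linker_2))
```

A lower interval bound above zero corresponds to the statement that agreement exceeds chance.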

The linkage procedure starts with the identification of meaningful concepts contained in the questionnaires. Then, the concepts can be translated (“linked”) into corresponding ICF categories. The linkage rules [28, 29] are guidelines, which enable concepts contained in the questionnaires to be linked to the ICF in a standardized manner. If an item of a questionnaire contains more than one concept, each concept has to be linked separately.

Not all meaningful concepts can be linked to the ICF, but those that cannot still have to be documented according to the linking rules [28, 29]. The following exceptions apply:

  1. Personal factors belong to the ICF. However, they are not yet specified within the ICF. Therefore, all personal factors are labeled “pf”, with no further specification possible. For example, “smoking”, “alcohol consumption” or individual coping strategies are labeled “pf” (rule 6).

  2. Concepts that deal with the underlying health condition (cancer diagnosis and its treatment) cannot be linked to the ICF and are labeled “health condition” or “hc” (rule 8).

  3. Concepts that are too general and not precise enough to decide on an ICF category are classified ‘not definable’ (“nd”). For example, concepts such as “physical health” or “quality of life” are “nd” (rule 5).

  4. Concepts that deal with functioning but are not represented by the ICF are labeled ‘not covered’ (“nc”). Such concepts may lie outside the scope of the ICF (rule 7).
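The bookkeeping implied by these rules can be sketched as a small tally. The data structure and function name are hypothetical; the example concepts are taken from the text where possible ("smoking", "quality of life", "sticky saliva"), while "radiotherapy" as an "hc" example is only an illustration of a treatment-related concept.

```python
from collections import Counter

# Labels for concepts that are documented but not assigned an ICF category.
SPECIAL_LABELS = {
    "pf": "personal factor (rule 6)",
    "hc": "health condition (rule 8)",
    "nd": "not definable (rule 5)",
    "nc": "not covered (rule 7)",
}


def summarize_linkage(links):
    """Tally linked versus specially labeled concepts.

    links: iterable of (concept, label) pairs, where label is either an
    ICF code such as "b5105" or one of the special labels above.
    """
    counts = Counter()
    for _concept, label in links:
        counts[label if label in SPECIAL_LABELS else "linked"] += 1
    total = sum(counts.values())
    return {
        "total": total,
        "linked": counts["linked"],
        "unlinked": {label: counts[label] for label in SPECIAL_LABELS},
    }


if __name__ == "__main__":
    example = [
        ("swallowing", "b5105"),
        ("pain", "b280"),
        ("smoking", "pf"),
        ("quality of life", "nd"),
        ("sticky saliva", "nc"),
        ("radiotherapy", "hc"),  # illustrative health-condition concept
    ]
    print(summarize_linkage(example))
```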

Results

Literature review for instrument selection

The electronic literature searches in MEDLINE, conducted in May 2006, yielded 1,815 hits. After abstract checking, 600 studies were identified. Out of these, 300 articles were randomly selected and read in full length. After reading these, another 20% had to be excluded because the inclusion/exclusion criteria were not met. Out of the remaining 240 articles, nine questionnaires were selected. Table 1 shows the full names and acronyms of the nine questionnaires selected for further analyses and provides an overview of their major characteristics.

Table 1 Summary of selected questionnaires

Three questionnaires, the EORTC, FACT and QOL-RTI, consist of a more general part applicable to all cancer types (EORTC QLQ-C30, FACT-G, QOL-RTI) and another HNC-specific questionnaire (EORTC QLQ-HN35, FACT-HN and QOL-RTI HN). For evaluating HNC patients, both modules have to be used in combination. Since the two modules work in combination and to ease comparison with other questionnaires that are not built in modules, we have evaluated the two modules as one. In the following, we simply refer to “EORTC”, “FACT” or “QOL-RTI”.

The questionnaires analyzed in this piece of work were also found in other literature reviews [17, 18]. Rogers et al. report on all nine questionnaires that have been selected here and Fung and Terrell refer to EORTC, FACT, UW_QOL, QOL RTI, PSS HN and HN QOL.

Linkage process

The health professionals identified 474 meaningful concepts within the nine selected hrQOL questionnaires. Out of the 474 meaningful concepts, 404 (85%) could be linked to the ICF. Table 2 shows the evaluation of the linkage procedure by kappa statistics and bootstrapped confidence intervals. Estimated kappa values range from 0.76 to 0.91. None of the 95% confidence intervals encloses zero; thus, the linker agreement exceeds chance.

Table 2 Kappa coefficients and non-parametric bootstrapped 95% confidence intervals for the linking procedure of the selected questionnaires, for each questionnaire on the third ICF level

All of the nine questionnaires include the ICF components “activity and participation” (120 concepts) and “environmental factors” (60 concepts). “Body functions” are represented in eight out of nine questionnaires, but not in the PSS-HN. Still, the component “body functions” has most concepts (212). “Body structures” are represented in just five out of nine questionnaires with 11 concepts altogether.

However, 15% of all meaningful concepts (70 concepts) could not be linked to the ICF: About half of these cases (36/70) were labeled “nd”. This means that the underlying concept was too general in its description to be linked to the ICF and does not provide concrete information on the patient’s functioning. Examples of meaningful concepts that were labeled nd are “physical well-being”, “feel ill” (both FACT) or “feel sick” (EORTC HN 35). Only three questionnaires do not exhibit this type of more general concept: PSS-HN, LORQ and XQ.

The second reason why a concept could not be linked to the ICF (17/70) was that the concept deals with personal factors. Personal factors belong to the ICF; however, they are not classified yet. Such items ask about “smoking” and “alcohol consumption” as well as individual coping mechanisms. Personal factors are found in 4/9 questionnaires: FACT, QOL-RTI, UW_QOL and VHI.

The content diversity ratio refers to the number of different second-level ICF categories divided by the number of all categories of a questionnaire. A value of 1 indicates that each meaningful concept of the questionnaire corresponds to a different ICF category. A value towards zero indicates lower content diversity, i.e., several concepts correspond to the same ICF category. The content diversity ratio is above 0.5 for the QOL-RTI (0.73), EORTC (0.68), FACT (0.53) and HN QOL (0.53). It is below 0.5 for the UW QOL (0.42), LORQ (0.38), VHI (0.36) and PSS-HN (0.18; Table 3).
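As a minimal sketch, the ratio can be computed from the list of ICF codes linked to one questionnaire. The function name is hypothetical, and two points are assumptions rather than statements from the text: the denominator is taken to be the number of linked concepts, and more detailed codes are truncated to their second-level category (chapter-level codes such as "b5" are kept as they are).

```python
def content_diversity_ratio(linked_codes):
    """Different second-level ICF categories divided by all linked concepts.

    linked_codes: one ICF code per linked concept, e.g. ["b510", "b5105"].
    """
    if not linked_codes:
        raise ValueError("at least one linked concept is required")
    # Component letter plus three digits identifies the second-level category.
    second_level = [code[:4] for code in linked_codes]
    return len(set(second_level)) / len(second_level)


if __name__ == "__main__":
    # Hypothetical questionnaire: three concepts fall under b510, one under d550.
    codes = ["b510", "b5105", "b5105", "d550"]
    print(content_diversity_ratio(codes))  # 2 distinct categories / 4 concepts = 0.5
```

A questionnaire in which every concept maps to its own category yields 1.0, while repeated items on the same category pull the ratio towards zero.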

Table 3 Comparison of instruments (all concepts)

To map the meaningful concepts of the nine questionnaires, we used a total of 74 different ICF categories, corresponding to 7% of all existing ICF categories.

There were 34 different categories from the component “activity and participation”, 26 different categories from the component “body functions”, 11 different categories relating to “environmental factors” and three different categories out of “body structures”.

The questionnaire with the broadest bandwidth of content coverage is the EORTC, which was linked to 41 different second-level ICF categories. The questionnaire with the narrowest bandwidth of content coverage is the PSS-HN. For the linkage of the concepts of the PSS-HN, five different ICF categories were sufficient (Table 4).

Table 4 Comparison of instruments (different concepts, ICF second level)

However, not all 74 categories are equally represented across the different questionnaires. Only 8 out of 74 categories are used in at least five of the evaluated nine questionnaires: these are e110 products for personal consumption (i.e., food, drugs), b510 ingestion function, b152 emotional function, b280 sensation of pain, b310 voice, d550 eating, b130 energy and drive function and d850 employment (Table 5).
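The count behind this statement can be sketched as follows. The function name and the toy coverage data ("Q1" to "Q3") are hypothetical and do not reproduce the study's tables; the threshold of five questionnaires is the one used in the text.

```python
from collections import Counter


def frequently_used(coverage, min_questionnaires=5):
    """Return the categories that appear in at least min_questionnaires.

    coverage: dict mapping a questionnaire name to the second-level
    ICF categories linked to it.
    """
    counts = Counter(
        code for codes in coverage.values() for code in set(codes)
    )
    return sorted(code for code, n in counts.items()
                  if n >= min_questionnaires)


if __name__ == "__main__":
    # Toy data with three fictitious questionnaires.
    toy = {
        "Q1": {"b510", "d550", "b280"},
        "Q2": {"b510", "d550"},
        "Q3": {"b510", "e110"},
    }
    print(frequently_used(toy, min_questionnaires=2))  # ['b510', 'd550']
```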

Table 5 Second-level ICF categories with strong use in the questionnaires

Tables 6, 7, 8 and 9 show in detail how the selected questionnaires cover ICF categories from the components body functions, body structures, activity and participation and environmental factors. Representation of the detailed categories differs significantly among questionnaires. For example, b250, taste function is represented in the questionnaires EORTC, UW_QOL, QOL-RTI and HN QOL, while b255, smell function is represented just in the EORTC. Information on voice and speech function (b3, b310–b340) can be collected with all evaluated questionnaires except for the PSS-HN and QOL-RTI. A researcher dealing with the question of the effects of family and friends on the quality of life following HNC will find the categories “e310, immediate family”, “e315, extended family” and “e320, friends” in Table 9. In this table, the researcher can see that the FACT, QOL-RTI and UW_QOL address this area of contextual factors. Support and relationship between the patient and health-care professional are captured only by the QOL-RTI and the HN-QOL.

Table 6 ICF categories from the component “body functions” represented in the hrQOL instruments
Table 7 ICF categories from the component “body structures” represented in the hrQOL instruments
Table 8 ICF categories from the component “activities and participation” represented in the hrQOL instruments
Table 9 ICF categories from the component “environmental factors” represented in the hrQOL instruments

Other examples can be looked up easily in Tables 6, 7, 8 and 9.

Discussion

Health-related quality of life represents a comprehensive construct and different questionnaires address a wide variety of different health-related issues. Thus, without knowing which areas a specific instrument covers, investigators cannot ensure the relevance of the instrument’s contents to the purpose of the intended study. The examination of the questionnaires’ content is one, but essential, step among others in order to select an appropriate instrument. Using the ICF as an external, independent reference system to compare the content of widely used HNC-specific hrQOL questionnaires, we found both similarities and major differences between questionnaires. The examination of the questionnaires’ contents relies on the smallest possible units of content, namely on concepts contained in the items of a questionnaire. The results of this content comparison provide valuable information to facilitate the selection of appropriate questionnaires for different purposes of data collection in clinical as well as research settings. Researchers and clinicians, who define the aspects they want to measure in terms of the ICF, can directly use Tables 6, 7, 8 and 9 to find out which of the questionnaires cover the aspects they need. Thus, by using the ICF, the purpose of the investigation for which a questionnaire is needed and the content of the questionnaire itself can be easily matched to each other. Thereby, the selection of questionnaires is simplified.

In summary, the use of ICF categories in the questionnaires is very diverse. Out of 74 ICF categories, there are just 8 that are used in at least five out of nine questionnaires. This indicates that the questionnaires differ substantially in content. Conclusions drawn from the different questionnaires cannot be easily compared. However, comparisons between questionnaires can still be made if we go down to the category level and compare two questionnaires in the ICF categories they both share.

The examination of the content structure of the nine questionnaires revealed insights into content diversity and the possible purpose of a questionnaire. The index of content diversity (different second-level ICF categories/all ICF categories of a questionnaire) indicates the extent to which the questionnaires are differentiated. Questionnaires with a lower index of content diversity (e.g., PSS-HN) might be more differentiated and fine-grained, including several items related to the same ICF category. In contrast, measures with a high content diversity (e.g., QOL-RTI, EORTC) may address their topics in a less differentiated way, more applicable for orientation. Depending on the special purpose of the questionnaires’ intended use, a different type of instrument would be appropriate, e.g., for surveys or individual decision-making. The purpose of the investigation and the purpose of the questionnaire must be carefully matched in order to get valuable results.

List and Bilir [31] published a well-founded literature review of commonly cited late side effects of HNC treatment. If we translate these side effects into the ICF, it is easy to determine which questionnaires cover all categories dealing with the relevant side effects:

Side effects after surgery (corresponding ICF categories and suitable alternatives): (1) disfigurement (s), (2) voice disturbance (b310, b3101), (3) difficulty in eating, chewing and swallowing (d550, b5102, b5105), (4) decreased activity (d) and (5) pain (b280, b28010, b28060). All relevant categories for late side effects after surgery were covered by the EORTC, FACT, UW_QOL, HN-QOL and LORQ.

Side effects after radiotherapy (corresponding ICF categories and suitable alternatives): (1) xerostomia (b5104), (2) difficulty in eating and swallowing (d550, b5105), (3) sticky salivation (b5104), (4) decreased taste (b250), (5) dental problems (s3200), (6) pain (b280, b28010, b28060) and (7) appearance (b1801). All relevant categories for late side effects after radiotherapy were covered by the EORTC and HN-QOL. The other questionnaires did not cover “appearance” or “dental problems”. Interestingly, the QOL-RTI, designed for radiotherapy, does not cover all relevant topics, e.g., dental problems.
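The matching of side effects to questionnaires can be sketched as a set comparison. The side effects and their ICF codes are those listed above for radiotherapy; the reading that a side effect counts as covered when any one of its listed alternatives is present is an assumption, codes are matched exactly (a questionnaire linked only to b510 would not count for b5105), and the coverage data for "Questionnaire A" and "Questionnaire B" are invented for illustration.

```python
# Late side effects after radiotherapy, each with its acceptable ICF codes.
RADIOTHERAPY_SIDE_EFFECTS = {
    "xerostomia": {"b5104"},
    "difficulty in eating and swallowing": {"d550", "b5105"},
    "sticky salivation": {"b5104"},
    "decreased taste": {"b250"},
    "dental problems": {"s3200"},
    "pain": {"b280", "b28010", "b28060"},
    "appearance": {"b1801"},
}


def missing_side_effects(required, covered_codes):
    """Side effects for which none of the acceptable codes is covered."""
    return sorted(name for name, alternatives in required.items()
                  if not alternatives & set(covered_codes))


def covering_questionnaires(required, coverage):
    """Names of questionnaires that cover every required side effect."""
    return sorted(name for name, codes in coverage.items()
                  if not missing_side_effects(required, codes))


if __name__ == "__main__":
    # Invented coverage data, not taken from Tables 6 to 9.
    coverage = {
        "Questionnaire A": {"b5104", "d550", "b250", "s3200", "b280", "b1801"},
        "Questionnaire B": {"b5104", "b5105", "b250", "b280"},
    }
    print(covering_questionnaires(RADIOTHERAPY_SIDE_EFFECTS, coverage))
    print(missing_side_effects(RADIOTHERAPY_SIDE_EFFECTS,
                               coverage["Questionnaire B"]))
```

Listing the missing side effects per questionnaire, rather than only a yes/no answer, shows at a glance which supplementary instrument would be needed.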

In a similar approach, it might be valuable to collect the most frequent and most troublesome sequelae after HNC, depending on the different tumor sites. For instrument selection, it might be helpful to differentiate between oral, pharyngeal and laryngeal cancer. Eventually, following a modular design principle, there might be a set of possible questionnaire items differentiated by appropriateness for cancer site and treatment type.

Another important finding of this study refers to the representation of personal factors, as they are called in the ICF language. Only four out of nine questionnaires involve the influence of personal factors, like smoking/alcohol consumption, individual perception of life and others, as well as individual coping strategies. These are the FACT, QOL-RTI, UW_QOL and VHI. However, personal factors have proven to be highly relevant for hrQOL as well as cancer prognosis [32, 33]. Within the evaluated questionnaires, we did not find any with strong emphasis on personal factors. In the ICF model, personal factors are an essential part of the comprehensive bio-psycho-social view on health and disability. If we agree with the World Health Organization’s comprehensive bio-psycho-social view, we must discuss (a) what the relevant personal factors for head and neck cancer are and (b) how we are going to collect such information. This can be done either in an integrated, standardized, validated questionnaire applicable to HNC in general (and is presently done to some extent by the FACT, QOL-RTI and UW_QOL) or in a separate, possibly more open approach independent of the standardized questionnaires.

An ICF-based content examination of hrQOL questionnaires may serve further purposes other than the selection of questionnaires. An ICF-based content examination may facilitate the development of new or modified measures. Determination of ICF categories that are important to patients with HNC can be used to identify areas of functioning and health, which are scarcely captured by existing questionnaires and can guide further instrument development. Presently, there is an international effort to develop such an independent selection of ICF categories relevant for HNC [34].

The current study is subject to several limitations. The systematic literature review used to identify current hrQOL questionnaires in HNC relied upon a simplified review methodology, using specific, rather than sensitive, search strategies. Moreover, we relied to a large extent on information contained in the abstracts. Still, with reference to the Proquolid database [27] and other reviews on questionnaires in HNC [5, 17, 18, 32, 35, 36], the questionnaires identified in this piece of work cover the most frequently used established hrQOL questionnaires and also include recently developed questionnaires like the LORQ. Instrument selection was based on a MEDLINE search. For practicability reasons, the number of full articles to be analyzed was decreased by a 50% random selection. Therefore, this piece of work does not claim to be complete in terms of instrument selection. However, presenting a complete literature review was not the intention in this work. Rather, we tried to introduce a new tool, which is the ICF, for content comparison among HNC-specific questionnaires and describe its advantages through the given analyses. The method still can be explained without having analyzed all possibly available questionnaires.

We evaluated the linkage process by calculating kappa coefficients, which showed satisfactory results for linker agreement. Kappa is an often used and simple indicator of agreement, accounting for chance. However, unsystematic error due to chance appears to be of secondary relevance for the linkage procedure. In the future, further analyses, e.g., using modeling methods, would be useful to explain the disagreements between the linkers (e.g., due to experience or profession) and to refine the linkage method.

The ICF proved useful for content comparison of HNC-related hrQOL questionnaires. With few exceptions, and excluding personal factors, which are not yet classified in the ICF, the concepts contained in the questionnaires could be linked to the ICF. Only 8 out of 474 concepts (2%) are not covered by the ICF. Other studies that compare disease-specific hrQOL questionnaires according to the ICF report similar percentages of not covered concepts: obesity 4% [37], low back pain 5% [38], chronic widespread pain 4% [38] and rheumatoid arthritis 1% [38].

The ICF was not specifically designed for head and neck cancer. Additionally, it was adopted recently, in 2001, and is a rather “young” classification compared to other WHO classifications such as the ICD, which has existed since 1893 in successive updates [39]. In future versions of the ICF, concepts labeled “nc”, such as “sticky saliva”, should be considered for inclusion.

In conclusion, the ICF provides a useful framework for comparing HNC-related health-status questionnaires that leads to new insights into their differences with respect to (1) the areas covered and (2) the breadth and precision of the covered concepts. This information can be useful in selecting questionnaires for any kind of investigation in which the health status of patients with HNC is a relevant study outcome. None of these questionnaires is ideal for all applications. However, after having decided on “what should be measured”, this piece of work might assist the clinician in deciding on the most suitable questionnaire.