Introduction

Quality of life (QOL) is now considered one of the most important outcome measures in medical studies, especially in cancer clinical research. This trend makes a clear methodology for the development and use of QOL instruments all the more necessary. Many QOL instruments for patients with breast cancer have been developed, such as LASA [9], the Functional Assessment of Cancer Therapy for Breast Cancer (FACT-B) [34], and the European Organization for Research and Treatment of Cancer (EORTC) quality of life questionnaires QLQ-C30 and QLQ-BR23 [1, 11]. In China, more and more clinicians and researchers are concerned with the QOL of patients with breast cancer, but very few Chinese QOL instruments are available, which greatly impedes QOL research and its application in this field. Although the Chinese versions of QLQ-BR53 (QLQ-C30 plus QLQ-BR23) and FACT-B can be used with Chinese patients [19, 21], they were originally used with English-speaking patients and therefore reflect the Chinese cultural background only to a limited extent. For example, family relationships and kinship play very important roles in Chinese daily life; Taoism and traditional medicine emphasize good temper and high spirits; and good appetite, sleep, and energy are highly valued, with food culture being especially important. This kind of cultural dependence is not reflected in most QOL instruments developed in other languages, so Chinese-specific QOL instruments are needed. We have therefore developed a Chinese system of QOL instruments, the quality of life instruments for cancer patients (QLICP), by combining a general module with cancer-specific modules [15–18, 20]. The system includes a general module (QLICP-GM), which can be used for all types of cancer, and specific modules for different cancers, each used only for the relevant disease.
As an example, the breast cancer instrument of this system, the quality of life instruments for cancer patients-breast cancer (QLICP-BR), was formed by combining the QLICP-GM with the breast cancer specific module. At present, the QLICP-GM and 12 specific modules have been developed, forming 12 cancer-specific QOL instruments: lung cancer (QLICP-LU), breast cancer (QLICP-BR), head and neck cancer (QLICP-HN), colorectal cancer (QLICP-CR), liver cancer (QLICP-LI), esophageal cancer (QLICP-ES), stomach cancer (QLICP-ST), bladder cancer (QLICP-BL), prostate cancer (QLICP-PR), cervical cancer (QLICP-CE), ovarian cancer (QLICP-OV), and brain cancer (QLICP-BN). This paper reports the development and validation of the QLICP-BR.

Materials and methods

Establishment of the general module (QLICP-GM)

Two working groups composed of physicians, nurses, medical educators, teachers, and researchers were formed: a nominal group of 16 persons and a focus group of ten persons. The programmed decision method was used for item selection. First, the focus group discussed and confirmed the structure of the instrument, which comprised four domains: physical, psychological, social, and common symptoms/side effects. After reviewing well-known QOL instruments such as SF-36 [12], NHP [5], FACT-G [4], and QLQ-C30 [1], and considering elements of Chinese culture, the nominal group proposed candidate items for each facet within the domains, resulting in a 78-item pool. Focus group discussions, in-depth interviews, and pilot testing were then used to refine and select items, and four statistical procedures (variation analysis, correlation analysis, factor analysis, and cluster analysis) were used to rescreen the items based on pretest data. Finally, 32 items were selected to form the QLICP-GM, which comprises four domains and nine facets (see Table 1 for details). The scale was shown to have good validity using data from 600 patients with lung cancer (85), breast cancer (186), colorectal cancer (110), head and neck cancer (133), and stomach cancer (86).

Table 1 Scoring method of the quality of life instrument QLICP-BR

The entire process of developing the QLICP-GM is described in other papers [15–18, 20]; the main steps are summarized as follows:

  • Item pool (78 items)

    • ↓ focus and nominal group discussions

  • Refined item pool (56 items)

    • ↓ importance test (interviews with 50 cases), analysis, focus group discussions

  • Primary scale (V0.0, 40 items)

    • ↓ pretest (448 cases), analysis, focus group discussions

  • Final scale (V1.0, nine facets under four domains, 32 items)

    • ↓ field test (600 patients)

  • Evaluation (validity, reliability, responsiveness)

Establishment of the specific module

After the QLICP-GM was developed, 14 items reflecting the symptoms, side effects, and specific mental health aspects of breast cancer were selected to form the item pool of the specific module. Similar methods were then used to obtain the final module, which has seven items classified into three facets (see Table 1 for details).

Evaluation of the QLICP-BR

The final QLICP-BR (general module QLICP-GM plus the specific module) was administered to patients with breast cancer in a field study to assess its validity, reliability, and responsiveness. The study population was limited to breast cancer inpatients at any stage and receiving any treatment who were able to read and understand the questionnaires. The participating investigators were doctors, nurses, and medical postgraduate students. They explained the aims of the study and the instrument to the patients and obtained informed consent from those who agreed to participate and met the inclusion criteria. All respondents (n = 186) completed the questionnaires at admission to the hospital. To evaluate test–retest reliability, 166 of them completed a second assessment on the first or second day after admission, and to evaluate responsiveness, 94 randomly sampled patients completed a third assessment after treatment.

After data collection, raw scores were calculated for items, domains, and the overall scale. Each item of the QLICP-BR is rated on a five-point scale: not at all, a little bit, somewhat, quite a bit, and very much. Positively worded items are scored directly from 1 to 5 points, and negatively worded items are reverse-scored. Each domain score is the sum of its item scores, and the overall score is the sum of the five domain scores (see Table 1).
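The scoring rule above can be sketched in a few lines of Python. This is a minimal illustration only: the item IDs (`A1`, `A2`) and the set of negatively worded items are hypothetical, and the actual item-to-domain assignment and item directions are those given in Table 1.

```python
# Minimal sketch of QLICP-BR-style raw scoring.
# Item IDs and the set of negatively worded items are hypothetical;
# the real scoring key is given in Table 1.

# Five-point response scale, scored 1-5 for positively worded items.
RESPONSE_SCORES = {
    "not at all": 1,
    "a little bit": 2,
    "somewhat": 3,
    "quite a bit": 4,
    "very much": 5,
}


def item_score(response: str, negatively_worded: bool) -> int:
    """Score one item; negatively worded items are reversed (1<->5, 2<->4)."""
    raw = RESPONSE_SCORES[response]
    return 6 - raw if negatively_worded else raw


def domain_raw_score(responses: dict, items: list, negative_items: set) -> int:
    """Raw domain score: the sum of its (possibly reversed) item scores."""
    return sum(item_score(responses[item], item in negative_items) for item in items)


def overall_raw_score(domain_scores: list) -> int:
    """Raw overall score: the sum of the five domain scores."""
    return sum(domain_scores)


# Example: one positively worded item (A1) and one negatively worded item (A2).
answers = {"A1": "very much", "A2": "quite a bit"}
print(domain_raw_score(answers, ["A1", "A2"], {"A2"}))  # 5 + (6 - 4) = 7
```

After reversal, a higher score always means better QOL, which is why domain and overall scores can simply be summed.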

For comparison, all domain scores were linearly converted to a 0–100 scale using the formula SS = (RS − Min) × 100 / R, where SS, RS, Min, and R represent the standardized score, raw score, minimum score, and range of scores, respectively.
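As a worked example, assume that Min and R are the theoretical minimum and range of a domain (an assumption; the text does not state whether theoretical or observed values were used). For a domain of k items scored 1 to 5, Min = k and R = 5k − k = 4k, so a hypothetical 5-item domain with a raw score of 15 standardizes to (15 − 5) × 100 / 20 = 50. A small sketch:

```python
def standardized_score(raw: float, n_items: int, item_min: int = 1, item_max: int = 5) -> float:
    """Linear 0-100 conversion SS = (RS - Min) * 100 / R.

    Assumes Min and R are the theoretical minimum and range of the domain,
    i.e. n_items * item_min and n_items * (item_max - item_min).
    """
    minimum = n_items * item_min
    score_range = n_items * (item_max - item_min)
    return (raw - minimum) * 100 / score_range


print(standardized_score(15, n_items=5))  # 50.0
print(standardized_score(25, n_items=5))  # 100.0
```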

Chinese versions of FACT-B and QLQ-BR53 [19, 21] were administered simultaneously for comparison, and the psychometric properties of the instrument were then analyzed. Construct validity was evaluated using Pearson correlation coefficients (r) between items and domains and using factor analysis. Internal consistency reliability was evaluated with Cronbach’s α coefficient for each domain. Test–retest reliability was evaluated with the Pearson correlation coefficient between the first and second assessments and with the intraclass correlation coefficient (ICC) for absolute agreement of single measures under a two-way mixed model [7, 10]. Responsiveness was assessed by comparing the mean scores before and after treatment and by calculating an effect size, the standardized response mean (SRM) [6, 13].
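For readers who want to reproduce the internal-consistency statistic, Cronbach’s α for a domain of k items is k/(k − 1) × (1 − Σ item variances / variance of the domain total). The sketch below uses Python’s standard `statistics.variance` (sample variance); because the same divisor appears in the numerator and denominator, population variances would give the same α. The data are invented for illustration.

```python
from statistics import variance


def cronbach_alpha(items: list) -> float:
    """Cronbach's alpha.

    items[i][j] is the score of respondent j on item i (one list per item,
    all of equal length).
    """
    k = len(items)
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    # Domain total for each respondent.
    totals = [sum(scores) for scores in zip(*items)]
    sum_item_variances = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - sum_item_variances / variance(totals))


# Two perfectly consistent items give alpha = 1; uncorrelated items give 0.
print(round(cronbach_alpha([[1, 2, 3, 4], [1, 2, 3, 4]]), 6))  # 1.0
print(round(cronbach_alpha([[1, 2, 1, 2], [1, 2, 2, 1]]), 6))  # 0.0
```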

Results

The 186 patients with breast cancer ranged in age from 16 to 78 years (median 48.0; mean 48.5 ± 10.1). Of these, 49 (26.3%) had finished primary school, 95 (51.1%) had completed high school, and 40 (21.5%) had a college degree; 163 (87.6%) were of Han ethnicity and 22 (11.8%) were of other ethnicities. Occupations were distributed as follows: worker, 55 (29.6%); farmer, 24 (12.9%); teacher, 25 (13.9%); cadre, 37 (19.9%); and other, 45 (24.2%).

Content validity

By reviewing the literature and consulting panels of practitioners, it was agreed that the item pool represented both WHO’s concept of QOL [14] and the specific concerns of patients with breast cancer. The programmed decision method for item selection facilitated this and produced a scale with good content validity.

Construct validity

Correlation analyses showed strong correlations between items and their own domains (most coefficients were higher than 0.5) but weak correlations between items and other domains (see Table 2 for details).

Table 2 Correlation coefficients among items and domains of QLICP-BR (n = 186)

Factor analysis of the 32 items of the general module (QLICP-GM) extracted nine principal components with initial eigenvalues > 1, accounting for 74.49% of the cumulative variance. After varimax rotation, the nine components corresponded to the nine facets under the four domains of the general module. The first and sixth components mainly represented the psychological domain, with high loadings on GPS1 (0.79), GPS2 (0.80), GPS3 (0.71), GPS4 (0.74), GPS5 (0.63), GPS6 (0.83), GPS8 (0.85), and GPS12 (0.69). The second and fifth components mainly represented the physical domain, with high loadings on GPH2 (0.72), GPH4 (0.72), GPH5 (0.78), GPH6 (0.82), and GPH7 (0.75). The third and seventh components mainly represented the social domain, with high loadings on GSO1 (0.79), GSO2 (0.71), GSO3 (0.74), and GSO4 (0.84). The fourth, eighth, and ninth components mainly represented the common symptom and side-effect domain, with high loadings on GSS1 (0.78), GSS2 (0.64), GSS3 (0.77), and GSS5 (0.63).

Similarly, principal component factor analysis extracted three components from the seven items of the specific module, with a cumulative variance of 74.96%, reflecting the three facets of this module. The first component represented the facet of physical and psychological effects, with high loadings on SBR6 (0.89) and SBR7 (0.89); the second represented the facet of breast symptoms, with high loadings on SBR1 (0.61), SBR2 (0.85), and SBR3 (0.80); and the third represented the facet of upper-body effects, with high loadings on SBR4 (0.79) and SBR5 (0.89).
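The retention rule and cumulative-variance figures quoted above follow from the eigenvalues of the item correlation matrix: components with eigenvalues greater than 1 are retained (the Kaiser criterion), and because the eigenvalues of a correlation matrix sum to the number of items, the cumulative variance is the sum of the retained eigenvalues divided by that total. The sketch below takes the eigenvalues as input (they could come from, e.g., `numpy.linalg.eigvalsh`); the seven eigenvalues in the example are hypothetical, not the study’s.

```python
def kaiser_retention(eigenvalues: list) -> tuple:
    """Apply the eigenvalue > 1 rule.

    Returns (number of retained components, cumulative % of variance they explain).
    For a correlation matrix the eigenvalues sum to the number of items.
    """
    total = sum(eigenvalues)
    retained = [ev for ev in sorted(eigenvalues, reverse=True) if ev > 1]
    return len(retained), 100 * sum(retained) / total


# Hypothetical eigenvalues for a seven-item module (they sum to 7).
n_kept, cum_pct = kaiser_retention([2.6, 1.5, 1.15, 0.6, 0.5, 0.4, 0.25])
print(n_kept, round(cum_pct, 2))  # 3 75.0
```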

These results support the theoretical structure of the instrument, indicating good construct validity.

Criterion-related validity

Because no agreed-upon gold standard exists, we used the Chinese versions of FACT-B and QLQ-BR53 as criteria for assessing criterion-related validity. Correlations between the domains of the QLICP-BR and FACT-B are reported in another article [21], and those between the domains of the QLICP-BR and QLQ-BR53 are presented in Table 3. Both show that, overall, correlations between the same or similar domains are higher than those between different, dissimilar domains. For example, the coefficient between the psychological domain of the QLICP-BR and emotional functioning of the QLQ-BR53 was 0.64, higher than any other coefficient in this column, such as those with physical functioning (0.09) and role functioning (0.29). Likewise, in its column the specific domain of the QLICP-BR correlated most strongly with breast symptoms of the QLQ-BR53 (−0.59); the negative sign is expected because higher QLQ-BR53 symptom scores indicate more severe symptoms, whereas higher QLICP-BR scores indicate better QOL. These results support criterion-related validity to a reasonable degree and, to some extent, convergent and divergent validity.

Table 3 Correlation coefficients of scores among subscales of QLQ-BR53 and domains of QLICP-BR

Reliability

The reliability of the scale was evaluated by two procedures: test–retest reliability and internal consistency (Cronbach’s α).

The test–retest correlation coefficients (r) for the five domains and 12 facets of the QLICP-BR ranged from 0.72 to 0.91, with 0.88 for the overall scale. The ICCs and their 95% confidence intervals, computed for absolute agreement of single measures, were very similar to the Pearson correlation coefficients. Paired t tests showed no statistically significant differences (p > 0.05) in domain or facet scores between the first and second measurements.
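The ICC used here (two-way model, absolute agreement, single measure) can be computed from the mean squares of a subjects × occasions ANOVA. The sketch below implements the standard ICC(A,1) mean-square formula; for absolute agreement the point estimate is computed the same way under the two-way random and two-way mixed models. The toy data are invented: in the second example every patient scores one point higher at retest, so Pearson r would be 1, yet the absolute-agreement ICC falls to 2/3.

```python
def icc_absolute_agreement(data: list) -> float:
    """ICC(A,1): two-way model, absolute agreement, single measure.

    data[i][j] is the score of subject i at occasion j (e.g. test and retest).
    ICC = (MSR - MSE) / (MSR + (k - 1) * MSE + k * (MSC - MSE) / n)
    """
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between occasions
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_rows - ss_cols  # residual
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_error / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


print(icc_absolute_agreement([[1, 1], [2, 2], [3, 3]]))  # identical scores: 1.0
print(round(icc_absolute_agreement([[1, 2], [2, 3], [3, 4]]), 4))  # +1 shift: 0.6667
```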

Cronbach’s α for these domains and facets ranged from 0.58 to 0.90; all domains had α greater than 0.60 except the social domain (0.58) (see Table 4 for details).

Table 4 Reliability of the quality of life instrument QLICP-BR (n = 186 for α, n = 166 for r and ICC)

Responsiveness

Paired t tests, together with the responsiveness indicator SRM, were used to examine changes in the mean scores of each domain of the QLICP-BR between the assessments before and after treatment; the results are presented in Table 5. Changes in three of the five domains and in the overall instrument were statistically significant; the exceptions were SSD (the common symptom/side-effect domain) and SPD (the specific domain).
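Assuming the SRM is defined as the mean change divided by the standard deviation of the change scores (the usual definition), it is tied to the paired t statistic by t = SRM × √n. With n = 94, √94 ≈ 9.7, so an SRM of about 0.21 corresponds to t ≈ 2.04, just above the two-sided 5% critical value (about 1.99 for 93 degrees of freedom). This helps explain how small SRMs can still reach statistical significance in this sample. The sketch uses invented scores.

```python
from math import sqrt
from statistics import mean, stdev


def change_scores(before: list, after: list) -> list:
    """Post-treatment minus pretreatment score for each patient."""
    return [a - b for b, a in zip(before, after)]


def standardized_response_mean(before: list, after: list) -> float:
    """SRM: mean change divided by the standard deviation of the changes."""
    changes = change_scores(before, after)
    return mean(changes) / stdev(changes)


def paired_t_statistic(before: list, after: list) -> float:
    """Paired t statistic on the same change scores: mean / (SD / sqrt(n))."""
    changes = change_scores(before, after)
    return mean(changes) / (stdev(changes) / sqrt(len(changes)))


# Invented domain scores for four patients.
pre = [50, 55, 60, 65]
post = [52, 56, 60, 68]
srm = standardized_response_mean(pre, post)
t = paired_t_statistic(pre, post)
print(round(srm, 3), round(t, 3), round(srm * sqrt(len(pre)), 3))  # 1.162 2.324 2.324
```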

Table 5 Responsiveness of the quality of life instrument QLICP-BR (n = 94)

Discussion

The development of a QOL instrument is a lengthy process. This paper has focused on the main steps in the development and validation of the QLICP-BR.

Because diseases of the same class, such as cancers, share many features, a popular approach in recent years has been to develop a general module for a class of diseases and then add specific modules to capture the differences among individual diseases and patients. Since the add-on modules are much simpler, this approach can substantially reduce the time and effort needed to develop new instruments. The EORTC QLQs and the FACT instruments from the Center on Outcomes, Research and Education (CORE) were developed on this modular principle [1, 3, 4, 11]. Building on existing instruments for cancer patients and on Chinese culture, we used this modular approach to develop, systematically and more efficiently, a system of instruments for cancer patients, QLICP, of which the QLICP-BR is one instrument [15–18, 20].

Psychometrically, a QOL instrument must be validated with respect to at least three properties: validity, reliability, and responsiveness.

Validity is the extent to which an instrument measures what it purports to measure. Following WHO’s definition of QOL [14] and the programmed decision procedures, we developed the QLICP-BR for patients with breast cancer through multiple rounds of focus group discussion, in-depth interviews, and pretesting, reducing the general module from a 78-item pool to 32 items and the specific module from 14 items to seven. This process ensured good content validity and a sound conceptual structure. In addition, correlation and factor analyses supported construct validity, and correlations with FACT-B and QLQ-BR53 supported criterion-related validity.

Reliability refers to the reproducibility or consistency of item scores from one assessment to another. Both test–retest reliability (Pearson r and ICC) and internal consistency reliability (Cronbach’s α) were assessed in the current study. The test–retest Pearson r reflects only the consistency of the trend between two repeated tests and is insensitive to a systematic shift in scores between occasions. Therefore, paired t tests were also used to compare the scores of the first and second measurements. Because no statistically significant differences were found, the correlation coefficients can be interpreted as indicating reproducibility. In such cases Pearson r and ICC give similar results and can be used interchangeably to assess reliability, as observed here. Based on these results, the QLICP-BR can be considered to have good reliability.
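To illustrate why the paired t test is needed alongside Pearson r: r measures only whether scores move together, so it can be high even when every patient’s score shifts between occasions. In the invented example below, r is about 0.96, yet the paired t statistic is large, revealing a consistent upward shift of about 5 points. In practice a library routine such as `scipy.stats.ttest_rel` would also supply the p-value; this standard-library sketch returns only t and its degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev


def pearson_r(x: list, y: list) -> float:
    """Pearson correlation between first and second assessments."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def paired_t(first: list, second: list) -> tuple:
    """Paired t statistic and degrees of freedom for second minus first."""
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1


# Invented test-retest scores: the retest is about 5 points higher for everyone.
first_visit = [10, 12, 14, 16]
second_visit = [15, 17, 18, 22]
print(round(pearson_r(first_visit, second_visit), 3))  # 0.965
print(paired_t(first_visit, second_visit))  # large t on 3 df: a systematic shift
```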

Methods of assessing responsiveness can be divided into two categories: internal and external [6, 13]. Internal responsiveness characterizes the ability of a measure to change over a particular prespecified time frame. One widely used method of assessing internal responsiveness is to evaluate the change in a measure within a randomized clinical trial of a treatment previously shown to be efficacious [2, 8]. External responsiveness reflects the extent to which changes in a measure over a specified time frame relate to corresponding changes in a reference measure of health status. In this paper we used the definition and methods of internal responsiveness. The classical paired t test was employed to compare mean scores before and after treatment, together with the SRM, an important responsiveness indicator for which values of 0.20, 0.50, and 0.80 have been proposed to represent small, moderate, and large responsiveness, respectively [6, 13]. Table 5 shows that score changes after treatment were statistically significant for the physical, psychological, and social domains and for the overall instrument, although the SRMs were small (0.22, 0.21, 0.21, and 0.27, respectively). There are three possible reasons why the changes in the common symptom/side-effect domain and the specific domain were not statistically significant: (1) the sample size (n = 94) may not have been large enough; (2) the interval between the assessments before and after treatment (about 4 weeks) may not have been long enough; and (3) scores in these domains may genuinely not have changed. Therefore, it can be concluded that the instrument has reasonable responsiveness.
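The 0.20/0.50/0.80 conventions cited above can be applied mechanically, as in the small helper below. The label for values under 0.20 ("below small") is our own wording, since the text does not name that band. Applied to the SRMs reported for the physical, psychological, and social domains and the overall instrument, every value falls in the small band.

```python
def describe_srm(value: float) -> str:
    """Label an SRM by magnitude using 0.20 (small), 0.50 (moderate), 0.80 (large)."""
    magnitude = abs(value)
    if magnitude >= 0.80:
        return "large"
    if magnitude >= 0.50:
        return "moderate"
    if magnitude >= 0.20:
        return "small"
    return "below small"  # label for this band is an assumption


# SRMs reported in the text for the domains with significant change.
reported = {"physical": 0.22, "psychological": 0.21, "social": 0.21, "overall": 0.27}
for domain, value in reported.items():
    print(domain, value, describe_srm(value))
```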

In addition, the QLICP-BR has some advantages over the Chinese versions of FACT-B and QLQ-BR53, although they have similar psychometric properties [19, 21]. First, it is strongly grounded in Chinese culture. For example, Chinese culture pays particular attention to family relationships and kinship, eating and food, good temper, and high spirits, and the QLICP-BR includes items on these aspects, such as appetite, sleep, energy, and family support. Second, unlike FACT-B and QLQ-BR53, it has a clear hierarchical structure (items → facets → domains → overall). It can present mean scores not only for the five domains and the overall scale but also for the 12 facets, so that changes can be detected in greater detail. Users can select whichever level, or combination of levels, suits their purpose.

In sum, the QLICP-BR is a useful instrument for measuring and assessing quality of life in patients with breast cancer in China, with good psychometric properties and several distinctive features.