Abstract
Purpose
To evaluate the reliability and validity of the EQ-5D in a general population sample in urban China.
Methods
One thousand eight hundred respondents in 18 communities of Hangzhou, China were recruited by multi-stage stratified random sampling. Respondents self-administered a questionnaire including the EQ-5D, the SF-36, and demographic questions. Test–retest reliability at a 2-week interval was evaluated using the kappa coefficient and the intraclass correlation coefficient (ICC). The standard error of measurement (SEM) was used to indicate the absolute measurement error. Construct validity was established using convergent, discriminant, and known groups analyses.
Results
Complete data for all EQ-5D dimensions were available for 1,747 respondents (97%). Kappa values ranged from 0.35 to 1.0. The ICCs of test–retest reliability were 0.53 for the EQ-5D index score and 0.87 for the EQ VAS score. The SEM values were 0.13 (9.22% of the range) and 4.20 (4.20% of the range) for the EQ-5D index and EQ VAS scores, respectively. The Pearson’s correlation coefficients between the EQ-5D and the SF-36 were stronger between comparable dimensions than between less comparable dimensions, demonstrating convergent and discriminant evidence of construct validity. The Chinese EQ-5D distinguished well between known groups: respondents who reported poor general health and chronic diseases had worse HRQoL than those without. Older people, females, people widowed or divorced, and those with a lower socioeconomic status reported poorer HRQoL. Respondents reporting no problems on any EQ-5D dimension had better SF-36 summary scores than those reporting problems.
Conclusions
The Chinese version of the EQ-5D demonstrated acceptable construct validity and fair to moderate levels of test–retest reliability in an urban general population in China.
Introduction
The EQ-5D is an established health-related quality of life (HRQoL) instrument, used frequently in both clinical trials and health services research [1]. The validity of the Chinese EQ-5D has been assessed in mainland China [2–4] and elsewhere [5–7]; its reliability, however, is not well reported. We evaluated the reliability and validity of the EQ-5D in a sample of the general population in urban China.
Methods
Sample and study design
The survey was conducted in Hangzhou, China in 2008 using a multi-stage stratified random sampling approach. Nine “Jiedao” (sub-district neighborhoods) were randomly selected from Xiacheng district (central), Gongshu district (sub-central), and Yuhang district (suburb), three from each. Two communities from each “Jiedao” and 70 households from each community were randomly selected. The total sample size was 1,800, with 200 in each “Jiedao”. All residents aged 14 years or older who had lived in a sampled household for at least 6 months were eligible to participate, and recruitment continued until the quota for each “Jiedao” was met. Participants self-administered a questionnaire containing the Chinese EQ-5D and SF-36. Trained interviewers administered questions regarding the existence of chronic diseases. For the retest, 60 respondents were randomly sampled from those surveyed on the first survey day who were willing to self-administer the EQ-5D and SF-36 again after 2 weeks. Written consent was obtained from all respondents, and the study was approved by the Zhejiang University School of Medicine Ethics Committee.
The EQ-5D comprises five health dimensions (mobility, self-care, usual activities, pain/discomfort, and anxiety/depression, each with three response categories: no, some or extreme problems) and a 0–100 point visual analogue scale (EQ VAS) [8]. Responses on the five dimensions can be converted into a utility index score by applying preference weights elicited from the UK general population [9]. The SF-36 is a validated [10, 11] 36-item instrument yielding eight scales and two summary scores. Higher SF-36 scores indicate better health status.
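The conversion from a five-dimension response to a utility index can be illustrated with a minimal Python sketch. The function and variable names are hypothetical, and the decrements are the UK time trade-off coefficients as commonly reported for the value set in [9]; they are included here as an assumption and should be checked against the published tariff before any real use.

```python
# Sketch of an EQ-5D-3L index calculation with UK preference weights.
# ASSUMPTION: coefficients below are the commonly cited UK TTO tariff values
# (Dolan 1997); verify against the published value set before use.
DIMENSIONS = ("mobility", "self_care", "usual_activities",
              "pain_discomfort", "anxiety_depression")

UK_CONSTANT = 0.081   # subtracted whenever any dimension is above level 1
UK_N3 = 0.269         # subtracted once if any dimension is at level 3
UK_DECREMENTS = {
    "mobility":           {2: 0.069, 3: 0.314},
    "self_care":          {2: 0.104, 3: 0.214},
    "usual_activities":   {2: 0.036, 3: 0.094},
    "pain_discomfort":    {2: 0.123, 3: 0.386},
    "anxiety_depression": {2: 0.071, 3: 0.236},
}


def eq5d_index_uk(state: str) -> float:
    """Return the index for a state such as '11223' (one digit per dimension)."""
    levels = [int(ch) for ch in state]
    if len(levels) != 5 or any(level not in (1, 2, 3) for level in levels):
        raise ValueError("state must be five digits, each 1, 2 or 3")
    if all(level == 1 for level in levels):
        return 1.0  # full health
    value = 1.0 - UK_CONSTANT
    for dim, level in zip(DIMENSIONS, levels):
        if level > 1:
            value -= UK_DECREMENTS[dim][level]
    if 3 in levels:
        value -= UK_N3
    return round(value, 3)


if __name__ == "__main__":
    for state in ("11111", "11121", "33333"):
        print(state, eq5d_index_uk(state))
```

Under these assumed coefficients the worst state, 33333, scores below zero, which is why the index can take negative values.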
Data analysis
Reliability and validity of the EQ-5D were assessed according to established guidelines [12]. To evaluate reliability, it was assumed that the health status between two measurements was stable. The percentages of agreement and kappa coefficients for the five dimensions were calculated. Kappa values below 0.2 indicate a slight agreement, 0.21–0.4 fair, 0.41–0.6 moderate, 0.61–0.8 substantial, and 0.81–1.0 an almost perfect agreement [13]. Test–retest reliability of the EQ-5D index and EQ VAS scores was determined using the intraclass correlation coefficient (ICC; two-way mixed-effect model/absolute agreement definition ICC2,1) [14]. An ICC greater than 0.70 is considered appropriate for group comparison [15]. The standard error of measurement (SEMagreement) was used to assess variability, i.e., the absolute measurement error [16, 17]. It was also expressed as a percent of the measurement range (SEM%) likely to be encountered in actual research [12].
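As an illustration of the dimension-level agreement analysis described above, the following Python sketch computes the percentage of agreement and Cohen's kappa for one dimension measured twice, then labels the result with the bands quoted from [13]. The function names and the example responses are hypothetical, not study data, and the handling of the undefined case (every response in one category on both occasions) is an assumption.

```python
from collections import Counter


def percent_agreement(test, retest):
    """Share of respondents giving the same level on both occasions."""
    same = sum(1 for a, b in zip(test, retest) if a == b)
    return same / len(test)


def cohen_kappa(test, retest):
    """Cohen's kappa for two sets of paired categorical ratings."""
    n = len(test)
    p_observed = percent_agreement(test, retest)
    test_counts, retest_counts = Counter(test), Counter(retest)
    categories = set(test_counts) | set(retest_counts)
    # Chance agreement from the product of the marginal proportions.
    p_expected = sum((test_counts[c] / n) * (retest_counts[c] / n)
                     for c in categories)
    if p_expected == 1.0:
        return float("nan")  # ASSUMPTION: kappa is treated as undefined here
    return (p_observed - p_expected) / (1.0 - p_expected)


def agreement_band(kappa):
    """Label a kappa value using the bands quoted in the text."""
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


if __name__ == "__main__":
    # Hypothetical levels (1 = no problems, 2 = some problems) for 10 people.
    first = [1, 1, 1, 2, 2, 1, 1, 2, 1, 1]
    second = [1, 1, 1, 2, 1, 1, 1, 2, 1, 1]
    k = cohen_kappa(first, second)
    print(percent_agreement(first, second), round(k, 3), agreement_band(k))
```

When nearly all respondents report no problems, chance agreement is already high, so kappa can be modest even if the percentage of agreement is large.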
To evaluate construct validity, we first assessed convergent and discriminant evidence by examining relationships with the SF-36 using Pearson’s correlation. It was expected that comparable dimensions, e.g., EQ-5D pain/discomfort and SF-36 bodily pain, would correlate more strongly than less comparable dimensions, such as EQ-5D mobility and SF-36 mental health. Pearson’s correlation coefficients of 0.50 or above were regarded as strong, 0.30–0.49 as moderate, and lower than 0.30 as weak [18].
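The correlation thresholds above can be sketched in a few lines of Python. The helper names are hypothetical. The classification uses the absolute value of the coefficient, on the assumption that the sign only reflects scoring direction: higher EQ-5D dimension levels indicate more problems, whereas higher SF-36 scores indicate better health.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation for two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_strength(r):
    """Classify |r| with the cut-offs quoted in the text."""
    size = abs(r)
    if size >= 0.50:
        return "strong"
    if size >= 0.30:
        return "moderate"
    return "weak"


if __name__ == "__main__":
    # Hypothetical data: EQ-5D pain level (1-3) against an SF-36-style score.
    pain_level = [1, 1, 2, 2, 3, 1, 2, 3]
    bodily_pain = [90, 84, 62, 70, 41, 100, 52, 30]
    r = pearson_r(pain_level, bodily_pain)
    print(round(r, 2), correlation_strength(r))
```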
Second, construct validity was assessed by comparing the EQ-5D index (both the UK [9] and Japanese preference weights [19] were used) and EQ VAS scores for subgroups of respondents with differing self-reported overall health and number of chronic diseases using ANOVA. It was also expected that older people, females, people widowed or divorced, and those with a lower socioeconomic status would report poorer health [2, 3, 20–22]. The relationships between the EQ-5D and the demographic variables were examined using ANOVA, t-test, or chi-square test. Finally, mean SF-36 summary scores for respondents reporting no problems on any EQ-5D dimension were compared with those for respondents reporting problems using a t-test; higher SF-36 scores were expected in the former group [23].
Results
Among the 1,800 respondents from 1,260 selected households, complete data for all EQ-5D dimensions were available for 1,747 respondents (97%), and these were analyzed for the present study. The estimated response rate was 71.4% (assuming two eligible individuals in each household on average [11]). The mean age was 47.5 years (SD 17.5, range 14–99), with 51.6% being women. Compared with Hangzhou urban area demographic statistics for year 2008 [24], our sample had a similar sex ratio, older age, and higher educational attainment (Table 1).
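The household count and the estimated response rate follow from the sampling design. A short Python check of the arithmetic, using only figures stated in the text (the variable names are illustrative):

```python
# Sampling design figures taken from the Methods and Results text.
jiedao = 9
communities_per_jiedao = 2
households_per_community = 70
eligible_per_household = 2      # average assumed in the text
respondents = 1800
complete_eq5d = 1747

households = jiedao * communities_per_jiedao * households_per_community
eligible_individuals = households * eligible_per_household

response_rate = respondents / eligible_individuals
complete_share = complete_eq5d / respondents

print(f"households: {households}")
print(f"estimated response rate: {response_rate:.1%}")
print(f"complete EQ-5D data: {complete_share:.1%}")
```

The response rate is an estimate because it rests on the assumed average of two eligible individuals per household rather than an enumerated sampling frame.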
EQ-5D response
The majority of respondents reported no problems (ceiling effects), ranging from 78.0% for the pain/discomfort dimension to 96.7% for the self-care dimension (Table 2). The mean EQ-5D index score was 0.92 (SD 0.17, range −0.59 to 1), and the mean EQ VAS score was 84.44 (SD 13.0, range 8.50–100).
Test–retest reliability
In the retest sample, 48 of 60 respondents returned the retest questionnaire. Data were analyzed for the 31 respondents whose scores on the first question of the SF-36 (self-reported overall health) were the same at both measurements. The median test–retest interval was 13 days (interquartile range: 12–15 days). Kappa values for the EQ-5D mobility, self-care, usual activities, pain/discomfort, and anxiety/depression dimensions were 1.00, 0.65, 0.87, 0.35, and 0.63, respectively. The ICCs of test–retest reliability were 0.53 for the EQ-5D index score and 0.87 for the EQ VAS score. The SEM values (SEM%) were 0.13 (9.22%) and 4.20 (4.20%) for the EQ-5D index and EQ VAS scores, respectively (Table 3).
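For readers who want to reproduce this kind of analysis, the sketch below computes a single-measure, absolute-agreement ICC and an agreement-type SEM from a two-way ANOVA decomposition of test and retest scores. It is an illustration under stated assumptions, not the authors' code: the function names and example scores are hypothetical, and the SEM is taken as the square root of the occasion variance plus the residual variance, which is one common definition of SEM agreement.

```python
import math


def two_way_mean_squares(scores):
    """scores: one [test, retest] row per respondent. Returns n, k, MSR, MSC, MSE."""
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = max(ss_total - ss_rows - ss_cols, 0.0)  # guard rounding noise
    msr = ss_rows / (n - 1)                 # between respondents
    msc = ss_cols / (k - 1)                 # between occasions
    mse = ss_error / ((n - 1) * (k - 1))    # residual
    return n, k, msr, msc, mse


def icc_absolute_agreement(scores):
    """Single-measure ICC under the absolute agreement definition."""
    n, k, msr, msc, mse = two_way_mean_squares(scores)
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def sem_agreement(scores):
    """SEM including systematic differences between occasions."""
    n, _, _, msc, mse = two_way_mean_squares(scores)
    occasion_variance = max((msc - mse) / n, 0.0)
    return math.sqrt(occasion_variance + mse)


def sem_percent(sem, score_range):
    """SEM expressed as a percentage of the measurement range."""
    return 100.0 * sem / score_range


if __name__ == "__main__":
    # Hypothetical VAS-style scores, not study data.
    vas = [[70, 80], [80, 90], [90, 100]]
    print(round(icc_absolute_agreement(vas), 3), round(sem_agreement(vas), 3))
    print(sem_percent(4.20, 100))
```

In the hypothetical example every respondent scores exactly 10 points higher at retest. Ranking is perfectly preserved, yet the absolute-agreement ICC falls below 1 and the SEM is non-zero, because both statistics count systematic shifts between occasions as error.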
Validity
The Pearson’s correlation coefficients between the EQ-5D and the SF-36 were stronger between comparable dimensions (e.g., −0.59 between EQ-5D pain/discomfort and SF-36 bodily pain (BP) and −0.44 between EQ-5D mobility and SF-36 physical functioning (PF)) than between less comparable dimensions (e.g., −0.26 and −0.20 between SF-36 mental health (MH) and EQ-5D mobility and self-care, respectively), with a few exceptions, demonstrating convergent and discriminant evidence of construct validity. The EQ-5D index and EQ VAS scores had moderate or strong correlations with all SF-36 scores (all P < 0.001, Table 4).
Respondents who reported poor general health and chronic diseases had significantly lower EQ-5D index and EQ VAS scores (Table 5). The discrepancy between the UK and the Japanese versions of EQ-5D index scores was much smaller for better health states and larger for worse health states. Older people, females, people widowed or divorced, and those with a lower socioeconomic status reported poorer HRQoL, as expected (Table 6). Respondents reporting no problems on any EQ-5D dimension had better SF-36 summary scores than those reporting problems (all P < 0.001, data not shown).
Discussion
The study assessed the reliability and validity of the Chinese EQ-5D in a large urban population in China. Unlike most EQ-5D studies in general populations [21, 25, 26], our sample included adolescents aged 14 to 18 years. Because a Chinese EQ-5D youth version is not available, applying the EQ-5D to adolescents is suitable primarily to allow for follow-up and comparisons over a wide range of ages.
Construct validity of the EQ-5D was demonstrated using convergent, discriminant, and known groups analyses. The EQ-5D showed fair to moderate levels of test–retest reliability, with a high percentage of respondents reporting the same level of problems in each dimension and a satisfactory ICC for the EQ VAS score. However, the examination of reliability was compromised by high ceiling effects. Reliability coefficients reflect not only the degree of agreement between repeated measures, but also the degree to which a measurement instrument can differentiate among individuals [17, 27]. In a homogeneous population, the within-subject variance can easily overwhelm the between-subject variance, making for low reliability [28]. The SEM is relatively sample-independent and useful in the interpretation of HRQoL change [29, 30]. A higher SEM value for the EQ VAS score after 1 month was reported recently [31]. When applying Japanese preference weights [19], the ICC, SEM value, and SEM% for the EQ-5D index score were 0.64, 0.09, and 8.11%, respectively.
Several studies have used the EQ-5D among the Chinese general population. Wang et al. [2] measured EQ-5D data among 2,994 respondents from one district of Beijing. Recently, Sun et al. [3] analyzed national EQ-5D data and provided norms for the Chinese general population. The reliability of the EQ-5D was not measured in these two studies. Chang et al. [32] reported validation results in a representative sample of the Taiwanese population aged 20–64 years. Similar ICCs were reported (0.51 for the EQ-5D index score and 0.70 for the EQ VAS score), even though people older than 65 years were not recruited.
This study had limitations. First, although there were a small number of non-respondents due to refusal or inaccessibility after three visits, no data were available for them and it is unclear whether the characteristics of the non-respondents differed from those of the respondents. Second, although the estimated response rate was high, there might be selection or response bias [33]. Third, the retest sample was small for the assessment of reliability. Fourth, our sample was older and more highly educated than the general population it was drawn from.
We conclude that the Chinese EQ-5D demonstrated acceptable construct validity and fair to moderate levels of test–retest reliability in an urban general population in China.
Abbreviations
HRQoL: Health-related quality of life
VAS: Visual analogue scale
References
Rabin, R., & de Charro, F. (2001). EQ-5D: A measure of health status from the EuroQol Group. Annals of Medicine, 33(5), 337–343.
Wang, H., Kindig, D. A., & Mullahy, J. (2005). Variation in Chinese population health related quality of life: results from a EuroQol study in Beijing, China. Quality of Life Research, 14(1), 119–132.
Sun, S., Chen, J., Johannesson, M., Kind, P., Xu, L., Zhang, Y., et al. (2011). Population health status in China: EQ-5D results, by age, sex and socio-economic status, from the national health services survey 2008. Quality of Life Research, 20(3), 309–320.
Zhao, F. L., Yue, M., Yang, H., Wang, T., Wu, J. H., & Li, S. C. (2010). Validation and comparison of EuroQol and short form 6D in chronic prostatitis patients. Value in Health, 13(5), 649–656.
Lubetkin, E. I., Jia, H., & Gold, M. R. (2004). Construct validity of the EQ-5D in low-income Chinese American primary care patients. Quality of Life Research, 13(8), 1459–1468.
Leung, B., Luo, N., So, L., & Quan, H. (2007). Comparing three measures of health status (perceived health with Likert-type scale, EQ-5D, and number of chronic conditions) in Chinese and white Canadians. Medical Care, 45(7), 610–617.
Luo, N., Chew, L. H., Fong, K. Y., Koh, D. R., Ng, S. C., Yoon, K. H., et al. (2003). Validity and reliability of the EQ-5D self-report questionnaire in Chinese-speaking patients with rheumatic diseases in Singapore. Annals of the Academy of Medicine, Singapore, 32(5), 685–690.
Brooks, R. (1996). EuroQol: The current state of play. Health Policy, 37(1), 53–72.
Dolan, P. (1997). Modeling valuations for EuroQol health states. Medical Care, 35(11), 1095–1108.
Ware, J. E., Snow, K. K., Kosinski, M., & Gandek, B. (1993). SF-36 Health survey manual and interpretation guide. Boston, MA: The Health Institute, New England Medical Center.
Li, L., Wang, H. M., & Shen, Y. (2003). Chinese SF-36 health Survey: Translation, cultural adaptation, validation, and normalisation. Journal of Epidemiology and Community Health, 57(4), 259–263.
Scientific Advisory Committee of the Medical Outcomes Trust. (2002). Assessing health status and quality-of-life instruments: attributes and review criteria. Quality of Life Research, 11(3), 193–205.
Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33(1), 159–174.
Shrout, P. E., & Fleiss, J. L. (1979). Intraclass correlations: uses in assessing rater reliability. Psychological Bulletin, 86(2), 420–428.
Staquet, M. J., Hays, R. D., & Fayers, P. M. (1998). Quality of life assessment in clinical trials: Methods and practice (pp. 169–182). Oxford: Oxford University Press.
Terwee, C. B., Bot, S. D., de Boer, M. R., van der Windt, D. A., Knol, D. L., Dekker, J., et al. (2007). Quality criteria were proposed for measurement properties of health status questionnaires. Journal of Clinical Epidemiology, 60(1), 34–42.
de Vet, H. C., Terwee, C. B., Knol, D. L., & Bouter, L. M. (2006). When to use agreement versus reliability measures. Journal of Clinical Epidemiology, 59(10), 1033–1039.
Cohen, J. (1992). A power primer. Psychological Bulletin, 112(1), 155–159.
Tsuchiya, A., Ikeda, S., Ikegami, N., Nishimura, S., Sakai, I., Fukuda, T., et al. (2002). Estimating an EQ-5D population value set: the case of Japan. Health Economics, 11(4), 341–353.
Johnson, J. A., & Coons, S. J. (1998). Comparison of the EQ-5D and SF-12 in an adult US sample. Quality of Life Research, 7(2), 155–166.
Kind, P., Dolan, P., Gudex, C., & Williams, A. (1998). Variations in population health status: results from a United Kingdom national questionnaire survey. BMJ, 316(7133), 736–741.
Kontodimopoulos, N., Pappa, E., Niakas, D., Yfantopoulos, J., Dimitrakaki, C., & Tountas, Y. (2008). Validity of the EuroQoL (EQ-5D) instrument in a Greek general population. Value in Health, 11(7), 1162–1169.
Brazier, J., Jones, N., & Kind, P. (1993). Testing the validity of the Euroqol and comparing it with the SF-36 health survey questionnaire. Quality of Life Research, 2(3), 169–180.
Hangzhou Bureau of Statistics. (2008). Hangzhou statistical yearbook. http://www.hzstats.gov.cn/web/. Accessed 15th September 2010.
Fryback, D. G., Dunham, N. C., Palta, M., Hanmer, J., Buechner, J., Cherepanov, D., et al. (2007). US norms for six generic health-related quality-of-life indexes from the National Health Measurement study. Medical Care, 45(12), 1162–1170.
König, H. H., Bernert, S., Angermeyer, M. C., Matschinger, H., Martinez, M., Vilagut, G., et al. (2009). Comparison of population health status in six European countries: Results of a representative survey using the EQ-5D questionnaire. Medical Care, 47(2), 255–261.
Kottner, J. (2009). Interrater reliability and the kappa statistic: a comment on Morris et al. (2008). International Journal of Nursing Studies, 46(1), 140–141.
Bartko, J. J. (1991). Measurement and reliability: statistical thinking considerations. Schizophrenia Bulletin, 17(3), 483–489.
Terwee, C. B., Roorda, L. D., Knol, D. L., De Boer, M. R., & De Vet, H. C. (2009). Linking measurement error to minimal important change of patient-reported outcomes. Journal of Clinical Epidemiology, 62(10), 1062–1067.
de Boer, M. R., de Vet, H. C., Terwee, C. B., Moll, A. C., Völker-Dieben, H. J., & van Rens, G. H. (2005). Changes to the subscales of two vision-related quality of life questionnaires are proposed. Journal of Clinical Epidemiology, 58(12), 1260–1268.
Mannion, A. F., Boneschi, M., Teli, M., Luca, A., Zaina, F., Negrini, S., et al. (2011). Reliability and validity of the cross-culturally adapted Italian version of the core outcome measures index. European Spine Journal. [Epub ahead of print].
Chang, T. J., Tarn, Y. H., Hsieh, C. L., Liou, W. S., Shaw, J. W., & Chiou, X. G. (2007). Taiwanese version of the EQ-5D: validation in a representative sample of the Taiwanese population. Journal of the Formosan Medical Association, 106(12), 1023–1031.
Sjöström, O., & Holst, D. (2002). Validity of a questionnaire survey: response patterns in different subgroups and the effect of social desirability. Acta Odontologica Scandinavica, 60(3), 136–140.
Acknowledgments
This work was supported by a grant of the National Natural Science Foundation of China (No.70603024). We gratefully acknowledge all investigators involved in the collection and management of data.
Conflict of interest
The authors declare no conflict of interest.
Cite this article
Wang, HM., Patrick, D.L., Edwards, T.C. et al. Validation of the EQ-5D in a general population sample in urban China. Qual Life Res 21, 155–160 (2012). https://doi.org/10.1007/s11136-011-9915-6
DOI: https://doi.org/10.1007/s11136-011-9915-6