Validation of the EQ-5D in a general population sample in urban China

Wang, Hong-Mei; Patrick, Donald L.; Edwards, Todd C.; Skalicky, Anne M.; Zeng, Hai-Yan; Gu, Wen-Wen

doi:10.1007/s11136-011-9915-6


Published: 20 April 2011

Volume 21, pages 155–160 (2012)

Quality of Life Research


Hong-Mei Wang¹,
Donald L. Patrick²,
Todd C. Edwards²,
Anne M. Skalicky²,
Hai-Yan Zeng¹ &
…
Wen-Wen Gu¹


Abstract

Purpose

To evaluate the reliability and validity of the EQ-5D in a general population sample in urban China.

Methods

One thousand eight hundred respondents in 18 communities of Hangzhou, China, were recruited by multi-stage stratified random sampling. Respondents self-administered a questionnaire including the EQ-5D, the SF-36, and demographic questions. Test–retest reliability over a 2-week interval was evaluated using the kappa coefficient and the intraclass correlation coefficient (ICC). The standard error of measurement (SEM) was used to indicate the absolute measurement error. Construct validity was established using convergent, discriminant, and known-groups analyses.

Results

Complete data for all EQ-5D dimensions were available for 1,747 respondents (97%). Kappa values ranged from 0.35 to 1.0. The test–retest ICCs were 0.53 for the EQ-5D index score and 0.87 for the EQ VAS score. The SEM values were 0.13 (9.22% of the range) and 4.20 (4.20% of the range) for the EQ-5D index and EQ VAS scores, respectively. Pearson’s correlation coefficients between the EQ-5D and the SF-36 were stronger for comparable dimensions than for less comparable dimensions, demonstrating convergent and discriminant evidence of construct validity. The Chinese EQ-5D distinguished well between known groups: respondents who reported poor general health and chronic diseases had worse HRQoL than those without. Older people, females, people widowed or divorced, and those with a lower socioeconomic status reported poorer HRQoL. Respondents reporting no problems on any EQ-5D dimension had better SF-36 summary scores than those reporting problems.

Conclusions

The Chinese version of the EQ-5D demonstrated acceptable construct validity and fair to moderate levels of test–retest reliability in an urban general population in China.


Introduction

The EQ-5D is an established health-related quality of life (HRQoL) instrument, used frequently in both clinical trials and health services research [1]. The validity of the Chinese EQ-5D has been assessed in mainland China [2–4] and elsewhere [5–7]; its reliability, however, is not well reported. We evaluated the reliability and validity of the EQ-5D in a sample of the general population in urban China.

Methods

Sample and study design

The survey was conducted in Hangzhou, China, in 2008 using a multi-stage stratified random sampling approach. Nine “Jiedao” (sub-district neighborhoods) were randomly selected, three each from Xiacheng district (central), Gongshu district (sub-central), and Yuhang district (suburb). Two communities from each “Jiedao” and 70 households from each community were randomly selected. The total sample size was 1,800, with 200 in each “Jiedao”. All residents aged 14 years and above who had lived in a sampled household for at least 6 months were eligible to participate until the quota for each “Jiedao” was met. Participants self-administered a questionnaire containing the Chinese EQ-5D and SF-36. Trained interviewers administered questions regarding the existence of chronic diseases. Sixty respondents were randomly sampled from those surveyed on the first day who were willing to self-administer the EQ-5D and SF-36 again after a 2-week interval. Written consent was obtained from all respondents, and the study was approved by the Zhejiang University School of Medicine Ethics Committee.

The EQ-5D comprises five health dimensions (mobility, self-care, usual activities, pain/discomfort, and anxiety/depression, each with three response categories: no, some, or extreme problems) and a 0–100 point visual analogue scale (EQ VAS) [8]. Responses on the five dimensions can be converted into a utility index score by applying preference weights elicited from the UK general population [9]. The SF-36 is a validated [10, 11] 36-item instrument yielding eight scales and two summary scores. Higher scores indicate better health status.
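As a rough illustration of how a five-digit EQ-5D health state becomes an index score, the sketch below applies an additive decrement model. The coefficients are quoted from memory of the UK time trade-off value set [9] and are an assumption that should be checked against the published tariff; the function name and the state encoding are hypothetical and not taken from the study.

```python
# Decrements for level 2 and level 3 on each dimension (assumed UK TTO
# coefficients; verify against the published value set before any real use).
DECREMENTS = {
    "mobility": (0.069, 0.314),
    "self_care": (0.104, 0.214),
    "usual_activities": (0.036, 0.094),
    "pain_discomfort": (0.123, 0.386),
    "anxiety_depression": (0.071, 0.236),
}
ANY_PROBLEM = 0.081  # constant applied when any dimension is above level 1
ANY_LEVEL_3 = 0.269  # extra term applied when any dimension is at level 3


def eq5d_index(state: str) -> float:
    """Convert a state such as '11223' (levels 1-3 per dimension) to an index."""
    levels = [int(ch) for ch in state]
    if len(levels) != 5 or any(lv not in (1, 2, 3) for lv in levels):
        raise ValueError("state must be five digits, each 1, 2 or 3")
    score = 1.0
    if any(lv > 1 for lv in levels):
        score -= ANY_PROBLEM
    if any(lv == 3 for lv in levels):
        score -= ANY_LEVEL_3
    # Subtract the dimension-specific decrement for each reported problem.
    for lv, (level2, level3) in zip(levels, DECREMENTS.values()):
        if lv == 2:
            score -= level2
        elif lv == 3:
            score -= level3
    return round(score, 3)


if __name__ == "__main__":
    for state in ("11111", "11121", "33333"):
        print(state, eq5d_index(state))
```

Under these assumed coefficients the worst state, 33333, scores −0.594, which is consistent with the lower bound of −0.59 reported for the index score in this sample.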

Data analysis

Reliability and validity of the EQ-5D were assessed according to established guidelines [12]. To evaluate reliability, it was assumed that health status was stable between the two measurements. The percentages of agreement and kappa coefficients for the five dimensions were calculated. Kappa values of 0.20 or below indicate slight agreement, 0.21–0.40 fair, 0.41–0.60 moderate, 0.61–0.80 substantial, and 0.81–1.00 almost perfect agreement [13]. Test–retest reliability of the EQ-5D index and EQ VAS scores was determined using the intraclass correlation coefficient (ICC; two-way mixed-effect model/absolute agreement definition ICC_2,1) [14]. An ICC greater than 0.70 is considered appropriate for group comparison [15]. The standard error of measurement (SEM_agreement) was used to assess variability, i.e., the absolute measurement error [16, 17]. It was also expressed as a percentage of the measurement range (SEM%) likely to be encountered in actual research [12].
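The ICC and SEM described above can be sketched from a two-way analysis of variance on a table of repeated scores. The code below is a minimal illustration that assumes the commonly used single-measure, absolute-agreement formula and takes SEM_agreement as the square root of the occasion variance plus the residual variance; it is not the software procedure the authors used, and the function name and toy data are hypothetical.

```python
from math import sqrt


def icc_agreement_and_sem(scores):
    """Return (ICC, SEM) for an n-subjects by k-occasions table of scores."""
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares (one observation per cell).
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_subjects = k * sum((m - grand) ** 2 for m in row_means)
    ss_occasions = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = max(ss_total - ss_subjects - ss_occasions, 0.0)

    ms_subjects = ss_subjects / (n - 1)
    ms_occasions = ss_occasions / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Single-measure, absolute-agreement ICC.
    icc = (ms_subjects - ms_error) / (
        ms_subjects + (k - 1) * ms_error + k * (ms_occasions - ms_error) / n
    )
    # SEM_agreement: systematic occasion variance plus residual variance.
    var_occasions = max((ms_occasions - ms_error) / n, 0.0)
    sem = sqrt(var_occasions + ms_error)
    return icc, sem


if __name__ == "__main__":
    # Hypothetical test and retest EQ VAS scores for five people.
    toy = [[80, 82], [90, 88], [70, 75], [95, 95], [60, 64]]
    icc, sem = icc_agreement_and_sem(toy)
    print(f"ICC = {icc:.2f}, SEM = {sem:.2f}")
```

Because the absolute-agreement form keeps the occasion variance in the denominator, a systematic shift between test and retest lowers the ICC and raises the SEM, which a consistency-only definition would ignore.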

To evaluate construct validity, we first assessed convergent and discriminant evidence by examining relationships with the SF-36 using Pearson’s correlation. Comparable dimensions, e.g., EQ-5D pain/discomfort and SF-36 bodily pain, were expected to correlate more strongly than less comparable dimensions, such as EQ-5D mobility and SF-36 mental health. Pearson’s correlation coefficients of 0.50 or above were regarded as strong, 0.30–0.49 as moderate, and lower than 0.30 as weak [18].

Second, construct validity was assessed by comparing the EQ-5D index (both the UK [9] and Japanese preference weights [19] were used) and EQ VAS scores for subgroups of respondents with differing self-reported overall health and number of chronic diseases using ANOVA. It was also expected that older people, females, people widowed or divorced, and those with a lower socioeconomic status would report poorer health [2, 3, 20–22]. The relationships between the EQ-5D and the demographic variables were examined using ANOVA, t-test, or chi-square test. Finally, mean SF-36 summary scores for respondents reporting no problems on any EQ-5D dimension were compared with those for respondents reporting problems using a t-test; higher SF-36 scores were expected in the former group [23].

Results

Among the 1,800 respondents from 1,260 selected households, complete data for all EQ-5D dimensions were available for 1,747 respondents (97%), whose data were analyzed for the present study. The estimated response rate was 71.4% (based on an average of two eligible individuals per household [11]). The mean age was 47.5 years (SD 17.5, range 14–99), and 51.6% were women. Compared with Hangzhou urban area demographic statistics for 2008 [24], our sample had a similar sex ratio, older age, and higher educational attainment (Table 1).
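The household count, response rate, and completeness figures follow from the sampling design by simple arithmetic. The short sketch below reproduces that calculation; the variable names are illustrative, and the average of two eligible individuals per household is the authors' stated assumption.

```python
jiedao = 9
communities_per_jiedao = 2
households_per_community = 70
eligible_per_household = 2  # average assumed by the authors

households = jiedao * communities_per_jiedao * households_per_community
eligible = households * eligible_per_household
respondents = 1800
complete = 1747

# Respondents as a share of the estimated eligible population.
response_rate = 100 * respondents / eligible
# Respondents with complete EQ-5D data as a share of all respondents.
completeness = 100 * complete / respondents

print(households, round(response_rate, 1), round(completeness, 1))
# prints: 1260 71.4 97.1
```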

Table 1 Characteristics of the study sample (n = 1,747)


EQ-5D response

The majority of respondents reported no problems (ceiling effects), ranging from 78.0% for the pain/discomfort dimension to 96.7% for the self-care dimension (Table 2). The mean EQ-5D index score was 0.92 (SD 0.17, range −0.59 to 1), and the mean EQ VAS score was 84.44 (SD 13.0, range 8.50–100).

Table 2 Distribution of responses to EQ-5D dimensions


Test–retest reliability

In the retest sample, 48 of 60 respondents returned the retest questionnaire, and data were analyzed for the 31 respondents whose response to the first question of the SF-36 (self-reported overall health) was the same at both measurements. The median test–retest interval was 13 days (interquartile range: 12–15 days). Kappa values for the EQ-5D items on mobility, self-care, usual activities, pain/discomfort, and anxiety/depression were 1.00, 0.65, 0.87, 0.35, and 0.63, respectively. The test–retest ICCs were 0.53 for the EQ-5D index score and 0.87 for the EQ VAS score. The SEM values (SEM%) were 0.13 (9.22%) and 4.20 (4.20%) for the EQ-5D index and EQ VAS scores, respectively (Table 3).
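To read these figures against the cut-offs given in the Data analysis section, the sketch below labels each reported kappa value and recomputes SEM% for the EQ VAS from its 0–100 range. The function names are hypothetical. The index score is left out of the SEM% step because the measurement range used for it is not stated in the text.

```python
def kappa_band(kappa: float) -> str:
    """Label agreement using the cut-offs quoted in the Data analysis section."""
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


def sem_percent(sem: float, score_range: float) -> float:
    """Express the SEM as a percentage of the measurement range."""
    return 100 * sem / score_range


# Kappa values as reported for the retest sample (n = 31).
reported_kappa = {
    "mobility": 1.00,
    "self-care": 0.65,
    "usual activities": 0.87,
    "pain/discomfort": 0.35,
    "anxiety/depression": 0.63,
}

for dimension, kappa in reported_kappa.items():
    print(f"{dimension}: {kappa:.2f} ({kappa_band(kappa)})")

# EQ VAS runs from 0 to 100, so the range is 100 points.
print(f"EQ VAS SEM% = {sem_percent(4.20, 100):.2f}")
```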

Table 3 Test–retest reliability of the EQ-5D (n = 31)


Validity

Pearson’s correlation coefficients between the EQ-5D and the SF-36 were stronger for comparable dimensions (e.g., −0.59 between EQ-5D pain/discomfort and SF-36 bodily pain (BP), and −0.44 between EQ-5D mobility and SF-36 physical functioning (PF)) than for less comparable dimensions (e.g., −0.26 and −0.20 between SF-36 mental health (MH) and EQ-5D mobility and self-care, respectively), with a few exceptions, demonstrating convergent and discriminant evidence of construct validity. The EQ-5D index and EQ VAS scores had moderate or strong correlations with all SF-36 scores (all P < 0.001, Table 4).
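The strength labels defined in the Data analysis section can be applied to the coefficients quoted here. The sketch below assumes the thresholds refer to the absolute value of the coefficient, since the EQ-5D dimension correlations are negative; the function name is hypothetical.

```python
def correlation_strength(r: float) -> str:
    """Label a Pearson coefficient by its magnitude (0.50 strong, 0.30 moderate)."""
    size = abs(r)
    if size >= 0.50:
        return "strong"
    if size >= 0.30:
        return "moderate"
    return "weak"


# Coefficients quoted in the text for selected dimension pairs.
reported = {
    "EQ-5D pain/discomfort vs SF-36 BP": -0.59,
    "EQ-5D mobility vs SF-36 PF": -0.44,
    "EQ-5D mobility vs SF-36 MH": -0.26,
    "EQ-5D self-care vs SF-36 MH": -0.20,
}

for pair, r in reported.items():
    print(f"{pair}: {r:.2f} ({correlation_strength(r)})")
```

On this reading the two comparable pairs are strong and moderate, while both less comparable pairs are weak, which matches the convergent and discriminant pattern described.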

Table 4 Correlations between the EQ-5D and SF-36


Respondents who reported poor general health and chronic diseases had significantly lower EQ-5D index and EQ VAS scores (Table 5). The discrepancy between the UK and the Japanese versions of the EQ-5D index score was much smaller for better health states and larger for worse health states. Older people, females, people widowed or divorced, and those with a lower socioeconomic status reported poorer HRQoL, as expected (Table 6). Respondents reporting no problems on any EQ-5D dimension had better SF-36 summary scores than those reporting problems (all P < 0.001, data not shown).

Table 5 Comparison of EQ-5D index and EQ VAS scores for subgroups of respondents with differing health status


Table 6 Responses to the EQ-5D by sociodemographic variables (n = 1,747)


Discussion

The study assessed the reliability and validity of the Chinese EQ-5D in a large urban population in China. Unlike most EQ-5D studies in general populations [21, 25, 26], our sample included adolescents aged 14 to 18 years. Because a Chinese youth version of the EQ-5D is not available, applying the EQ-5D to adolescents is appropriate, primarily because it allows follow-up and comparisons over a wide range of ages.

Construct validity of the EQ-5D was demonstrated using convergent, discriminant, and known-groups analyses. The EQ-5D showed fair to moderate levels of test–retest reliability, with a high percentage of respondents reporting the same level of problems in the dimensions and a satisfactory ICC for the EQ VAS score. However, the examination of reliability was compromised by high ceiling effects. Reliability coefficients reflect not only the degree of agreement between repeated measures, but also the degree to which a measurement instrument can differentiate among individuals [17, 27]. In a homogeneous population, the within-subject variance can easily overwhelm the between-subject variance, making for low reliability [28]. The SEM is relatively sample-independent and useful in the interpretation of HRQoL change [29, 30]. A higher SEM value for the EQ VAS score after 1 month was reported recently [31]. When applying Japanese preference weights [19], the ICC, SEM value, and SEM% for the EQ-5D index score were 0.64, 0.09, and 8.11%, respectively.
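The point about homogeneous samples can be illustrated with the simplest variance-ratio form of a reliability coefficient, in which the between-subject variance is divided by the sum of the between-subject and error variances. This is a simplified sketch, not the ICC definition used in the analysis, and the variance values below are hypothetical.

```python
from math import sqrt


def icc_from_variances(var_between: float, var_error: float) -> float:
    """Reliability as between-subject variance over total variance."""
    return var_between / (var_between + var_error)


def sem_from_error_variance(var_error: float) -> float:
    """SEM as the square root of the error variance."""
    return sqrt(var_error)


# Same measurement error in both samples (hypothetical values).
var_error = 4.0
heterogeneous = icc_from_variances(var_between=36.0, var_error=var_error)
homogeneous = icc_from_variances(var_between=4.0, var_error=var_error)

print(f"heterogeneous sample ICC = {heterogeneous:.2f}")
print(f"homogeneous sample ICC   = {homogeneous:.2f}")
print(f"SEM in both samples      = {sem_from_error_variance(var_error):.2f}")
```

With identical measurement error the coefficient drops from 0.90 to 0.50 when respondents are more alike, while the SEM is unchanged. This is why the SEM is described as relatively sample-independent.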

Several studies have used the EQ-5D in the Chinese general population. Wang et al. [2] measured EQ-5D data among 2,994 respondents from one district of Beijing. Recently, Sun et al. [3] analyzed national EQ-5D data and provided norms for the Chinese general population. The reliability of the EQ-5D was not measured in these two studies. Chang et al. [32] reported validation results in a representative sample of the Taiwanese population aged 20–64 years. Similar ICCs were reported (0.51 for the EQ-5D index score and 0.70 for the EQ VAS score), even though people aged 65 years or older were not recruited.

This study had limitations. First, although only a small number of people did not respond because of refusal or inaccessibility after three visits, no data were available for them, and it is unclear whether the characteristics of non-respondents differed from those of respondents. Second, although the estimated response rate was high, there might be selection or response bias [33]. Third, the retest sample was small for assessing reliability. Fourth, our sample was more representative of an older and more educated general population.

We conclude that the Chinese EQ-5D demonstrated acceptable construct validity and fair to moderate levels of test–retest reliability in an urban general population in China.

Abbreviations

HRQoL: Health-related quality of life
VAS: Visual analogue scale

References

1. Rabin, R., & de Charro, F. (2001). EQ-5D: A measure of health status from the EuroQol Group. Annals of Medicine, 33(5), 337–343.
2. Wang, H., Kindig, D. A., & Mullahy, J. (2005). Variation in Chinese population health related quality of life: results from a EuroQol study in Beijing, China. Quality of Life Research, 14(1), 119–132.
3. Sun, S., Chen, J., Johannesson, M., Kind, P., Xu, L., Zhang, Y., et al. (2011). Population health status in China: EQ-5D results, by age, sex and socio-economic status, from the national health services survey 2008. Quality of Life Research, 20(3), 309–320.
4. Zhao, F. L., Yue, M., Yang, H., Wang, T., Wu, J. H., & Li, S. C. (2010). Validation and comparison of EuroQol and short form 6D in chronic prostatitis patients. Value Health, 13(5), 649–656.
5. Lubetkin, E. I., Jia, H., & Gold, M. R. (2004). Construct validity of the EQ-5D in low-income Chinese American primary care patients. Quality of Life Research, 13(8), 1459–1468.
6. Leung, B., Luo, N., So, L., & Quan, H. (2007). Comparing three measures of health status (perceived health with Likert-type scale, EQ-5D, and number of chronic conditions) in Chinese and white Canadians. Medical Care, 45(7), 610–617.
7. Luo, N., Chew, L. H., Fong, K. Y., Koh, D. R., Ng, S. C., Yoon, K. H., et al. (2003). Validity and reliability of the EQ-5D self-report questionnaire in Chinese-speaking patients with rheumatic diseases in Singapore. Ann Acad Med Singap, 32(5), 685–690.
8. Brooks, R. (1996). EuroQol: The current state of play. Health Policy, 37(1), 53–72.
9. Dolan, P. (1997). Modeling valuations for EuroQol health states. Medical Care, 35(11), 1095–1108.
10. Ware, J. E., Snow, K. K., Kosinski, M., & Gandek, B. (1993). SF-36 Health survey manual and interpretation guide. Boston, MA: The Health Institute, New England Medical Center.
11. Li, L., Wang, H. M., & Shen, Y. (2003). Chinese SF-36 health Survey: Translation, cultural adaptation, validation, and normalisation. Journal of Epidemiology and Community Health, 57(4), 259–263.
12. Scientific Advisory Committee of the Medical Outcomes Trust. (2002). Assessing health status and quality-of-life instruments: attributes and review criteria. Quality of Life Research, 11(3), 193–205.
13. Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33(1), 159–174.
14. Shrout, P. E., & Fleiss, J. L. (1979). Intraclass correlations: uses in assessing rater reliability. Psychological Bulletin, 86(2), 420–428.
15. Staquet, M. J., Hays, R. D., & Fayers, P. M. (1998). Quality of life assessment in clinical trials: Methods and practice (pp. 169–182). Oxford: Oxford University Press.
16. Terwee, C. B., Bot, S. D., de Boer, M. R., van der Windt, D. A., Knol, D. L., Dekker, J., et al. (2007). Quality criteria were proposed for measurement properties of health status questionnaires. Journal of Clinical Epidemiology, 60(1), 34–42.
17. de Vet, H. C., Terwee, C. B., Knol, D. L., & Bouter, L. M. (2006). When to use agreement versus reliability measures. Journal of Clinical Epidemiology, 59(10), 1033–1039.
18. Cohen, J. (1992). A power primer. Psychological Bulletin, 112(1), 155–159.
19. Tsuchiya, A., Ikeda, S., Ikegami, N., Nishimura, S., Sakai, I., Fukuda, T., et al. (2002). Estimating an EQ-5D population value set: the case of Japan. Health Economics, 11(4), 341–353.
20. Johnson, J. A., & Coons, S. J. (1998). Comparison of the EQ-5D and SF-12 in an adult US sample. Quality of Life Research, 7(2), 155–166.
21. Kind, P., Dolan, P., Gudex, C., & Williams, A. (1998). Variations in population health status: results from a United Kingdom national questionnaire survey. BMJ, 316(7133), 736–741.
22. Kontodimopoulos, N., Pappa, E., Niakas, D., Yfantopoulos, J., Dimitrakaki, C., & Tountas, Y. (2008). Validity of the EuroQoL (EQ-5D) instrument in a Greek general population. Value Health, 11(7), 1162–1169.
23. Brazier, J., Jones, N., & Kind, P. (1993). Testing the validity of the Euroqol and comparing it with the SF-36 health survey questionnaire. Quality of Life Research, 2(3), 169–180.
24. Hangzhou Bureau of Statistics. (2008). Hangzhou statistical yearbook. http://www.hzstats.gov.cn/web/. Accessed 15th September 2010.
25. Fryback, D. G., Dunham, N. C., Palta, M., Hanmer, J., Buechner, J., Cherepanov, D., et al. (2007). US norms for six generic health-related quality-of-life indexes from the National Health Measurement study. Medical Care, 45(12), 1162–1170.
26. König, H. H., Bernert, S., Angermeyer, M. C., Matschinger, H., Martinez, M., Vilagut, G., et al. (2009). Comparison of population health status in six European countries: Results of a representative survey using the EQ-5D questionnaire. Medical Care, 47(2), 255–261.
27. Kottner, J. (2009). Interrater reliability and the kappa statistic: a comment on Morris et al. (2008). International Journal of Nursing Studies, 46(1), 140–141.
28. Bartko, J. J. (1991). Measurement and reliability: statistical thinking considerations. Schizophrenia Bulletin, 17(3), 483–489.
29. Terwee, C. B., Roorda, L. D., Knol, D. L., De Boer, M. R., & De Vet, H. C. (2009). Linking measurement error to minimal important change of patient-reported outcomes. Journal of Clinical Epidemiology, 62(10), 1062–1067.
30. de Boer, M. R., de Vet, H. C., Terwee, C. B., Moll, A. C., Völker-Dieben, H. J., & van Rens, G. H. (2005). Changes to the subscales of two vision-related quality of life questionnaires are proposed. Journal of Clinical Epidemiology, 58(12), 1260–1268.
31. Mannion, A. F., Boneschi, M., Teli, M., Luca, A., Zaina, F., Negrini, S., et al. (2011). Reliability and validity of the cross-culturally adapted Italian version of the core outcome measures index. Eur Spine J. [Epub ahead of print].
32. Chang, T. J., Tarn, Y. H., Hsieh, C. L., Liou, W. S., Shaw, J. W., & Chiou, X. G. (2007). Taiwanese version of the EQ-5D: validation in a representative sample of the Taiwanese population. Journal of the Formosan Medical Association, 106(12), 1023–1031.
33. Sjöström, O., & Holst, D. (2002). Validity of a questionnaire survey: response patterns in different subgroups and the effect of social desirability. Acta Odontologica Scandinavica, 60(3), 136–140.

Acknowledgments

This work was supported by a grant from the National Natural Science Foundation of China (No. 70603024). We gratefully acknowledge all investigators involved in the collection and management of data.

Conflict of interest

The authors declare no conflict of interest.

Author information

Authors and Affiliations

Institute of Social Medicine and Family Medicine, School of Medicine, Zhejiang University, 866 Yuhang Tang Rd., 310058, Hangzhou, People’s Republic of China
Hong-Mei Wang, Hai-Yan Zeng & Wen-Wen Gu
Department of Health Services, University of Washington, 1208 43rd Street, Box 359455, Seattle, WA, 98195, USA
Donald L. Patrick, Todd C. Edwards & Anne M. Skalicky


Corresponding author

Correspondence to Hong-Mei Wang.


About this article

Cite this article

Wang, HM., Patrick, D.L., Edwards, T.C. et al. Validation of the EQ-5D in a general population sample in urban China. Qual Life Res 21, 155–160 (2012). https://doi.org/10.1007/s11136-011-9915-6


Accepted: 05 April 2011
Published: 20 April 2011
Issue Date: February 2012
DOI: https://doi.org/10.1007/s11136-011-9915-6
