Introduction

Adolescence represents a critical developmental stage associated with more biological, psychological and social role changes than any other stage of life except infancy [1, 2]. During adolescence, individuals become increasingly responsible for their own health and health care; at the same time, several health risk behaviours (e.g. alcohol use, cigarette smoking) start to become prevalent. Adolescence is therefore an important life phase in which targeted educational and preventative efforts have the potential to improve both short- and long-term health status and health-related quality of life (HRQoL) outcomes [3]. With the recent revision of the family planning policy in China, the number of children and adolescents is expected to increase significantly, along with the demand for health services. There is an urgent need to understand children’s and adolescents’ subjective valuation of their HRQoL. Such information is an essential prerequisite for planning and prioritising health-related interventions that improve adolescent health and reflect adolescent preferences.

Cost-utility analysis (CUA) is a type of economic evaluation that is commonly used to inform decision-makers about the relative benefits of alternative uses of resources in both clinical practice and public health settings [4, 5]. To allow outcomes to be compared across interventions for disparate health conditions, the effectiveness of an intervention is usually quantified in terms of quality-adjusted life-years (QALYs). In recent years, generic preference-based instruments have become the most popular tools for estimating QALYs within CUA. A recent review by Chen and Ratcliffe [6] identified nine generic preference-based instruments available internationally that have been used in paediatric populations, among which the Child Health Utility 9D (CHU9D) is the only instrument that was developed from its inception exclusively for application with young people [7, 8].
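To make the QALY concept concrete, the following minimal sketch weights time spent in each health state by its utility and sums the result. The utilities and durations are hypothetical values chosen for illustration only, not data from this study, and discounting is ignored for simplicity.

```python
def qalys(periods):
    """Sum utility-weighted time over (utility, years) pairs."""
    return sum(utility * years for utility, years in periods)


# Hypothetical health profiles (made-up numbers, for illustration only).
usual_care = [(0.70, 2.0)]                  # 2 years at utility 0.70
intervention = [(0.70, 0.5), (0.85, 1.5)]   # utility improves after 6 months

qaly_gain = qalys(intervention) - qalys(usual_care)
print(f"QALY gain: {qaly_gain:.3f}")
```

In a CUA, a gain of this kind would be set against the additional cost of the intervention; the utilities themselves are what an instrument such as the CHU9D is designed to supply.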

The original CHU9D questionnaire was developed in English in the United Kingdom. Several recent studies have demonstrated the practicality and the face and construct validity of the CHU9D in children and adolescents aged 7–17 years from English-speaking countries [9,10,11]. There is increasing international interest in applying the CHU9D with adolescents. The instrument is currently being applied in a number of research programs, such as economic evaluations of treatment programs for type 1 diabetes [12], mental health [13] and obesity prevention [14]. Recently, a Chinese version of the CHU9D (CHU9D-CHN) was developed, and a pilot study conducted in Nanjing City, eastern China, suggested that the CHU9D-CHN can discriminate between levels of self-reported health status among students in China [15]. The pilot study demonstrated the potential for using the CHU9D-CHN in China. After the pilot study, an official version of the CHU9D-CHN questionnaire was produced by Oxford Outcomes Ltd, a research agency in the United Kingdom specialising in the translation and linguistic validation of patient-reported outcome measures. The translation and linguistic validation process was designed in line with the recommendations of the ISPOR Task Force for translation and cultural adaptation of patient-reported outcome measures [16], which include dual forward translation, reconciliation, dual back translation including back translation review, and cognitive debriefing with participants from the relevant patient group. The translation also underwent proofreading and a format check prior to finalisation.

This study aims to evaluate the psychometric properties of the official CHU9D-CHN instrument in a large and representative school-based sample in western China. The availability of the CHU9D-CHN will facilitate future CUA to prioritise health-related interventions for children and adolescents in China.

Methods

Participants

The study employed a cross-sectional school-based survey design in China, in which the targeted participants were students in grades 3–9. According to Kendall’s rough working principle, the sample size should be 5–10 times the number of variables. In this study, the target was set at 20 times the number of items in the CHU9D-CHN (nine items, i.e. 180 students per grade); allowing for a 10% rate of loss, this gave 198 students per grade and a total required theoretical sample size of 1386 participants across the seven grades. The participants were recruited using a multi-stage sampling method. First, four counties were randomly chosen from Shaanxi Province in China. Second, within each chosen county, one primary school and one secondary school were randomly selected. Finally, two classes were randomly chosen in each grade.
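The sample-size figures above can be reproduced with simple arithmetic. The sketch below is one reading of the calculation, assuming that the 10% loss allowance is added to 20 respondents per CHU9D-CHN item and that grades 3–9 count as seven grades; the variable names are illustrative.

```python
N_ITEMS = 9         # items (dimensions) in the CHU9D-CHN
MULTIPLIER = 20     # respondents per item
LOSS_PERCENT = 10   # anticipated rate of loss, in percent
N_GRADES = 7        # grades 3 through 9

base_per_grade = N_ITEMS * MULTIPLIER
# Integer arithmetic avoids floating-point rounding when inflating by 10%.
target_per_grade = base_per_grade * (100 + LOSS_PERCENT) // 100
total_target = target_per_grade * N_GRADES

print(base_per_grade, target_per_grade, total_target)
```

Under these assumptions the per-grade target and the overall total match the figures reported in the text.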

Procedure

The survey was conducted by the Baoji Center for Disease Control and Prevention (CDC). Researchers underwent thorough training prior to the survey. All students present at school on the day of the survey completed the questionnaire independently in the classroom. Researchers were available to assist students who had any semantic or conceptual difficulty in understanding the questionnaire. With schools as the sampling unit, one primary school was randomly selected for the retest survey. A total of 232 students (over 10% of the full sample) completed the questionnaire again 2 weeks after the initial survey.

Questionnaire

The key components of the questionnaire were two generic HRQoL instruments (the CHU9D-CHN and the Pediatric Quality of Life Inventory™ 4.0 Generic Core Scales [hereafter “PedsQL”]), self-reported general health status, health service utilisation and socio-demographic characteristics (e.g. gender, age and left-behind status).

The CHU9D consists of nine dimensions: worried, sad, pain, tired, annoyed, schoolwork, sleep, daily routine and ability to join in activities. Within each dimension, there are five levels indicating increasing severity. The CHU9D-CHN was administered as a component of a paper questionnaire, and participants were instructed to self-complete the CHU9D-CHN from the perspective of their own health on that day. The CHU9D-CHN questionnaire was scored using the latest versions of the scoring algorithms derived from adolescents in Australia [17] and China [18,19,20], both of which combine the best–worst scaling (BWS) and time trade-off (TTO) methods. To derive the Chinese-specific tariff, two surveys were conducted in Nanjing, China. The BWS survey recruited 902 students from primary and high schools using a multi-stage random sampling method and was administered in a classroom setting, whilst the TTO survey used an interviewer-administered conventional TTO task with a convenience sample of 38 undergraduate students. A latent class modelling framework, which accounts for potential preference heterogeneity among respondents, was used to analyse the BWS data. Because the BWS approach generates raw health state values on an ordinal scale only, these values were then re-scaled onto the 0–1 (death to full health) QALY scale by combining them with the responses from the TTO sub-study using a mapping approach.

The Chinese version of the PedsQL is a validated and widely used non-preference-based paediatric HRQoL instrument [21,22,23,24]. The PedsQL consists of 23 items grouped into 4 dimensions: physical health (8 items), emotional functioning (5 items), social functioning (5 items) and school functioning (5 items). The latter three dimensions can be combined to generate a psychosocial health summary score. Each item uses a 5-point response scale (0 = never a problem; 1 = almost never a problem; 2 = sometimes a problem; 3 = often a problem; 4 = almost always a problem). Items are reverse-scored and linearly transformed to a 0–100 scale (0 = 100, 1 = 75, 2 = 50, 3 = 25, 4 = 0), with higher scores indicating better HRQoL [25]. PedsQL scores are computed as the sum of the transformed item scores divided by the number of items answered, which accounts for missing data. If more than 50% of the items in a scale are missing, the score is not computed.
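The scoring rule described above can be sketched as follows. This is a minimal illustration of the reverse-scoring, averaging and 50% missing-data rule as stated in the text, not the official PedsQL scoring software; the function name and the use of `None` for a missing item are assumptions.

```python
from typing import Optional, Sequence


def pedsql_scale_score(responses: Sequence[Optional[int]]) -> Optional[float]:
    """Score one PedsQL scale from raw 0-4 responses (None = missing item)."""
    answered = [r for r in responses if r is not None]
    if any(r not in (0, 1, 2, 3, 4) for r in answered):
        raise ValueError("responses must be integers from 0 to 4, or None")
    n_missing = len(responses) - len(answered)
    # The score is not computed if more than 50% of the items are missing.
    if not answered or 2 * n_missing > len(responses):
        return None
    # Reverse-score and rescale: 0 -> 100, 1 -> 75, 2 -> 50, 3 -> 25, 4 -> 0.
    transformed = [100 - 25 * r for r in answered]
    # Dividing by the number of items answered accounts for missing data.
    return sum(transformed) / len(transformed)


# Hypothetical responses to a five-item scale with one missing item.
print(pedsql_scale_score([0, 1, None, 2, 4]))
```

Dividing by the number of items answered, rather than the number of items in the scale, keeps scores comparable between respondents who skipped different numbers of items.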

Analysis

Descriptive statistics, including means, medians, standard deviations (SD) and interquartile ranges (IQR), were estimated. The distributions of the HRQoL scores obtained from each instrument were tested for normality using the Shapiro–Wilk test. Because the null hypothesis that the scores were normally distributed was rejected for both instruments, the non-parametric Kruskal–Wallis and Mann–Whitney U tests were used to compare scores between sub-groups. A two-tailed P < .05 was considered statistically significant. Statistical analysis was performed in SPSS version 22.0 (Armonk, NY: IBM Corp). The following psychometric properties of the CHU9D-CHN were investigated.
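The study's analyses were run in SPSS; purely as an illustration of how a rank-based comparison of two sub-groups works, the sketch below computes the Mann–Whitney U statistics in plain Python. The utility values are hypothetical, and the P value calculation is deliberately omitted.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Return (U_a, U_b) from the rank sum of group_a in the pooled sample."""
    ranks = average_ranks(list(group_a) + list(group_b))
    n_a, n_b = len(group_a), len(group_b)
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    return u_a, n_a * n_b - u_a


# Hypothetical utility scores for two sub-groups (illustration only).
u_a, u_b = mann_whitney_u([0.90, 0.85, 0.95], [0.60, 0.70, 0.85])
print(u_a, u_b)
```

A U value close to zero or close to the product of the two group sizes indicates that one group tends to score higher than the other; the Kruskal–Wallis test extends the same ranking idea to more than two groups.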

Investigation 1: reliability

Internal consistency reliability was assessed by calculating Cronbach’s alpha for the CHU9D-CHN [26]. Given that the CHU9D items tap into the same overall construct (quality of life) but represent different domains, we set an alpha of 0.7 as the minimally acceptable level of internal consistency [27]. It was anticipated that the CHU9D would have acceptable to excellent internal consistency, indicating that the items tap into the same general construct. Test–retest reliability (the agreement between participants’ scores at two time points) was also investigated, indexed by the intra-class correlation coefficient (ICC). An ICC > 0.60 was considered satisfactory [28]. A 2-week retest interval was chosen following previous literature on the retest reliability of the EQ-5D, the most widely used preference-based HRQoL instrument among adults, which uses the same recall period as the CHU9D [29,30,31]. Reliability was also assessed by examining the percentage agreement and kappa coefficients (weighted and unweighted) [32]. A kappa coefficient ≤ 0.20 indicates poor agreement, 0.21–0.40 fair agreement, 0.41–0.60 moderate agreement, 0.61–0.80 substantial agreement and ≥ 0.81 almost perfect agreement [33].
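As an illustration of the reliability indices named above, the sketch below computes Cronbach's alpha from complete item responses, and percentage agreement and Cohen's kappa from paired test and retest levels. The data are hypothetical, and the linear weighting scheme is an assumption, since the text does not state which weights were used.

```python
from statistics import variance


def cronbach_alpha(rows):
    """rows: one list of item scores per respondent (complete cases only)."""
    k = len(rows[0])
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def percentage_agreement(test, retest):
    """Percentage of respondents choosing the same level on both occasions."""
    return 100 * sum(a == b for a, b in zip(test, retest)) / len(test)


def cohen_kappa(test, retest, n_levels, weighted=False):
    """Cohen's kappa for levels coded 1..n_levels (linear weights if weighted)."""
    def disagreement(i, j):
        if weighted:
            return abs(i - j) / (n_levels - 1)
        return 0.0 if i == j else 1.0

    n = len(test)
    levels = range(1, n_levels + 1)
    p_test = {lv: test.count(lv) / n for lv in levels}
    p_retest = {lv: retest.count(lv) / n for lv in levels}
    observed = sum(disagreement(a, b) for a, b in zip(test, retest)) / n
    expected = sum(p_test[i] * p_retest[j] * disagreement(i, j)
                   for i in levels for j in levels)
    return 1 - observed / expected


# Hypothetical levels on one dimension at test and retest (illustration only).
test_levels = [1, 1, 2, 2, 3, 3, 1, 2]
retest_levels = [1, 2, 2, 2, 3, 2, 1, 1]
print(percentage_agreement(test_levels, retest_levels))
print(round(cohen_kappa(test_levels, retest_levels, n_levels=3), 3))
```

Kappa corrects the raw percentage agreement for the agreement expected by chance, which is why a dimension where most respondents choose the same level can show high percentage agreement alongside a modest kappa.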

Investigation 2: convergent validity

We anticipated that the items and utility score of the CHU9D would demonstrate moderate to high correlations with the corresponding items and scores of the PedsQL, indicating that they tap into the same broad construct. For example, among the CHU9D-CHN dimensions, worried, sad and annoyed were expected to correlate more highly with emotional function in the PedsQL; the activities dimension was expected to correlate more highly with physical function; and schoolwork/homework was expected to correlate more highly with school function. The association between the CHU9D and the PedsQL was investigated using Spearman’s correlation coefficient. Correlations of less than 0.3 were considered weak, 0.3–0.6 moderate and > 0.6 strong [34]. It should be noted that the two instruments have different recall periods: “today” for the CHU9D-CHN and “during the past one month” for the PedsQL. The distribution of PedsQL scores across the dimension levels of the CHU9D-CHN was also studied. We generated a dimension-level correlation matrix of the CHU9D-CHN and the PedsQL to examine correlations between dimensions, particularly those that were conceptually related.
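The correlation analysis can be illustrated with a short sketch that computes Spearman's coefficient as the Pearson correlation of average ranks and then applies the thresholds given above. The input values are hypothetical, and classifying on the absolute value of the coefficient is an assumption made here so that the direction of item coding does not affect the label.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def correlation_strength(rho):
    """Label a coefficient using the thresholds stated in the text."""
    size = abs(rho)
    if size < 0.3:
        return "weak"
    if size <= 0.6:
        return "moderate"
    return "strong"


# Hypothetical paired scores (illustration only).
rho = spearman_rho([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(round(rho, 2), correlation_strength(rho))
```

Because the coefficient is computed on ranks, it is suited to ordinal dimension levels and to scores that are not normally distributed.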

Investigation 3: known-groups validity

It was hypothesised that both the CHU9D and the PedsQL would discriminate well between participants who differed in self-reported general health (excellent, very good, good, fair or poor), frequency of self-reported doctor visits and left-behind status. It was anticipated that participants with better health status would have significantly higher HRQoL scores (i.e. CHU9D utility scores and the PedsQL total score) than those with worse health status. Similarly, we hypothesised that more frequent doctor visits would be associated with poorer HRQoL. Left-behind children or adolescents are those under 18 years old who are left behind at their original residence for at least 6 months by one or both parents who migrate to another place to work [35]. In China, the number of left-behind children has increased dramatically during the past 20 years, especially in rural areas [36]. Evidence reveals that left-behind children suffer from severe psychological problems, such as depression or anxiety [35, 37], as well as behavioural issues [38, 39]. It was anticipated that participants who were left behind would have lower HRQoL scores than their counterparts.

Results

A total of 1959 students who attended school on the day of the survey were eligible to participate, and all completed the survey. There were 40 (2.04%) questionnaires with missing items in the CHU9D-CHN. In addition, another seven questionnaires had missing items in self-reported general health status, health service utilisation or socio-demographic characteristics. The following psychometric analyses focused on the subsample (N = 1912) with no missing values on these key items. There were no significant differences between the analysis sample and the full sample in the participants’ characteristics presented in Table 1.

Table 1 Descriptive characteristics of participants

For the analysis sample, the mean age ± SD of the respondents was 12 ± 2 years; 44% were female and 64% were left-behind children. In this school-based sample, the majority of participants (82%) self-reported their health to be good (21%), very good (33%) or excellent (28%), and 66% had not visited a doctor in the past month. The mean ± SD (median) CHU9D-CHN utility scores were 0.78 ± 0.19 (0.81) and 0.85 ± 0.14 (0.88) when scored using the Australian and Chinese adolescent tariffs, respectively (Fig. 1). The mean ± SD (median) PedsQL score was 81.91 ± 12.55 (82.95). Neither the CHU9D utilities nor the PedsQL scores were normally distributed (all P < .001).

Fig. 1 Distribution of the CHU9D-CHN utility scores based on both Australian and Chinese tariffs

The responses to the CHU9D-CHN are presented in Table 2. Participants also generally reported themselves in good health according to the CHU9D-CHN descriptive system, with N = 328 (17%) reporting themselves at full health (reflecting the highest or best level for all nine CHU9D-CHN dimensions). No participants reported themselves in the lowest or worst level of impairment for all nine CHU9D dimensions. The dimension of the CHU9D-CHN with the highest proportion of no impairment was daily routine (89%), followed by sleep (81%), sad (76%), annoyed (74%), pain (71%), able to join in activities (59%), worried (55%), schoolwork/homework (55%) and tired (53%). It can also be seen that, along with the increasing levels of severity in the CHU9D-CHN dimensions, in most cases the mean PedsQL scores decreased. The three exceptions observed were the ‘sad’, the ‘tired’ and the ‘sleep’ dimensions, in which there were increased PedsQL scores from ‘a bit’ to ‘quite’, ‘quite’ to ‘very’ and from ‘many problems’ to ‘couldn’t sleep’, respectively. However, only a very small proportion (< 5%) of responses represent these inconsistent groups.

Table 2 Distribution of PedsQL total scores across dimension levels of the CHU9D (N = 1912)

Reliability

Cronbach’s alpha was 0.771 (the standardised Cronbach’s alpha coefficient was 0.780), indicating an acceptable level of internal consistency for the CHU9D-CHN. A total of 232 students participated in the retest survey 2 weeks after the initial survey. The ICC was 0.653 (P < .001) for the overall CHU9D utility score, indicating satisfactory test–retest reliability. At the dimension level, percentage agreement ranged from 49.57% (‘worried’ dimension) to 81.90% (‘daily routine’ dimension), whilst weighted kappa coefficients ranged from poor to moderate (see Table 3 for details).

Table 3 Test–retest reliability
Table 4 Correlations between the CHU9D dimensions and the PedsQL dimensions

Convergent validity

Table 4 presents the correlations between the CHU9D-CHN and the PedsQL at both the overall score and dimension levels. The CHU9D utility scores were moderately correlated with the PedsQL total scores (r = .57, P < .001). At the dimension level, the strongest correlation was found between the ‘Emotional function’ dimension of the PedsQL and the ‘Worried’ dimension of the CHU9D-CHN (r = .37; P < .001), which is classified as moderate. Moderate correlations were also found between other similar dimensions of the two instruments: ‘Schoolwork’ in the CHU9D-CHN and ‘School function’ in the PedsQL (r = .36; P < .001), and ‘Activities’ in the CHU9D-CHN and ‘Physical health’ in the PedsQL (r = .32; P < .001). In addition, lower correlations were found between relatively dissimilar items of the CHU9D-CHN and dimensions of the PedsQL, providing evidence of discriminant validity. The lowest correlation was found between the ‘Daily routine’ dimension of the CHU9D-CHN and the ‘School function’ dimension of the PedsQL (r = .17; P < .01). Correlations between all the items in the CHU9D-CHN and the ‘Social function’ dimension of the PedsQL were consistently low (r < .30).

Known-groups validity

Table 5 summarises the differences in the CHU9D utilities and the PedsQL overall score between sub-groups defined by individual characteristics. As expected, poorer self-reported health status was significantly associated with lower CHU9D utilities (P < .05). The expected trend was also found for health services utilisation: those who used more health services had lower CHU9D utilities (P < .05). Finally, the CHU9D utility scores of participants who were left-behind children were lower than those of their counterparts (P < .05).

Table 5 Comparison of the CHU9D utility score and the PedsQL total scores according to participants’ characteristics

Results for other demographic characteristics showed that, firstly, girls reported statistically significantly higher CHU9D utilities than boys (P < .05) and, secondly, CHU9D utilities decreased with increasing age (P < .05). Conclusions based on PedsQL scores were largely comparable with those based on the CHU9D utilities, except that the gender difference was not statistically significant for the PedsQL (P > .05).

Discussion

The CHU9D has been specifically designed for application with children and adolescents to facilitate the estimation of QALYs for the economic evaluation of health care treatment and preventive programs targeted at young people. The results of this study indicate that the CHU9D-CHN demonstrates satisfactory psychometric properties for students in China.

The CHU9D-CHN demonstrated an acceptable level of internal consistency and satisfactory test–retest reliability for evaluating the HRQoL of Chinese students aged 8–17 from Grades 3–9. Although there is only one item for each dimension of the CHU9D, the nine items tap into the same overall construct whilst representing different domains [40]. The Cronbach’s alpha for the CHU9D-CHN in this Chinese student sample (0.771) is similar to that reported in an Australian study (0.778) [40]. The retest reliability is satisfactory as measured by the ICC (0.653), which exceeds the threshold of 0.6. At the dimension level, percentage agreement ranged from 50% (worried) to 82% (daily routine), and the kappa coefficients ranged from poor to moderate. The percentage agreements and kappa coefficients in this study were generally lower than those reported for a pragmatic morning/afternoon test–retest of the CHU9D among children aged 6–7 years in the UK [41]. This may be due to the much longer retest interval (2 weeks) adopted in this study and the markedly lower age range in the UK study. Subjective evaluations of HRQoL could have changed significantly over 2 weeks. Unfortunately, the retest could not be carried out on the same or the next day because of operational constraints at the investigation sites.

The results also support the convergent and known-groups validity of the CHU9D-CHN. The findings for the CHU9D-CHN in this Chinese student sample are similar to those reported in an Australian study [42]. For example, the CHU9D utility scores were moderately correlated with the PedsQL total scores in both studies, although the overall correlation coefficient in this study (r = .57) was slightly lower than that in the Australian study (r = .63). A moderate degree of correlation was found between similar dimensions in both studies, e.g. ‘Schoolwork’ in the CHU9D and ‘School function’ in the PedsQL. At the dimension level, both studies found the strongest correlation between the ‘Emotional function’ dimension of the PedsQL and the ‘Worried’ dimension of the CHU9D, and consistently low correlations between all the items in the CHU9D and the ‘Social function’ dimension of the PedsQL.

Unlike other studies conducted in developed countries, we included left-behind status in the known-groups validity test, as this is a well-documented issue in China. As expected, the CHU9D utility scores of participants who were left behind were significantly lower than those of participants who were not, further supporting the known-groups validity (in addition to the self-reported health status and health services utilisation).

The present study has some limitations. Firstly, the survey was performed in schools, and owing to the nature of this sample, the majority of participants were healthy. As expected, based on the CHU9D-CHN descriptive system, participants generally reported a high HRQoL, with 17% reporting themselves in full health and none reporting the lowest or worst level of impairment for all nine dimensions of the CHU9D-CHN. This result is similar to that of a previous study based on a community sample [15]. Future studies should investigate the psychometric properties of the CHU9D-CHN in clinical patient samples. Secondly, future studies based on patient samples should consider a longitudinal survey design to investigate the responsiveness of the CHU9D-CHN instrument. Thirdly, the study population was limited to school children and adolescents in Shaanxi Province, and participants were all of the Han ethnic group. It is uncertain whether the results hold for ethnic minorities in China. Lastly, owing to the classroom setting of questionnaire completion and the availability of researchers on site to resolve conceptual understanding issues, it was difficult to draw a conclusion on the practicality of the instrument. It was also not possible to measure accurately the time each student took to complete the CHU9D-CHN within the hard-copy questionnaire. In general, students reported that the CHU9D-CHN was simple; students in Grades 5–9 typically took less than 2 min to complete it, whilst students in Grades 3–4 typically took 3–4 min. From an operational perspective, the CHU9D-CHN was easy to administer, and students had little trouble answering the questions, suggesting that it could be implemented within a large-scale survey in the future.

Conclusion

The CHU9D-CHN is a reliable and valid instrument for measuring HRQoL of children and adolescents aged 8–17 years in China. Further studies are required to understand the psychometric properties of the CHU9D-CHN instrument in different patient samples in China.