Introduction

Quality of life (QOL) is a complex concept. The WHO Quality of Life Group has defined it as an individual’s perception of well-being (physical, psychological, and social) in the context of their values, beliefs, expectations, goals, and cultural environment [1]. The term “health-related quality of life (HRQOL)” is now more common in the medical and health fields. As a subjective, multidimensional, and dynamic health screening indicator [2], HRQOL plays an increasingly important role in clinical trials, epidemiology, and health policy concerning children and adolescents [3,4,5,6]. In clinical trials, HRQOL measures can determine the burden of diseases or disabilities [7]. Furthermore, combining clinical data with HRQOL measures can provide a complete evaluation of how health care interventions and treatments affect the overall well-being of children and adolescents [8]. From an epidemiological point of view, HRQOL research conducted within a general population can help identify at-risk children and adolescents in schools and communities. These risks include bullying, developmental disorders, poor peer relations, domestic violence, and disrupted families [9,10,11]. In addition, HRQOL measures are used to evaluate health service needs and thereby influence public policy decisions, such as early prevention and, if necessary, the reconfiguration of health resources [12]. HRQOL in children and adolescents has therefore attracted increasing attention and has become an emerging field of research [13].

Developing an appropriate instrument is vital for monitoring HRQOL. It is well known that measuring HRQOL in children and adolescents differs from measuring it in adults. In addition to common dimensions such as physical and psychological well-being, it is important to consider aspects specific to their families, friends, schools, and communities. Furthermore, it is now widely accepted that children and adolescents can answer HRQOL instruments reliably if their emotional development, cognitive capacity, and reading skills are taken into account [8]. Thus, self-report is an ideal way to obtain their subjective perspective. Finally, because HRQOL is highly sensitive to cultural background, an instrument should either be developed simultaneously in different cultures, or an existing accepted instrument should be translated, analyzed, and adapted into other languages with cultural diversity in mind [14].

The KIDSCREEN instrument is a typical example of cross-cultural application. The KIDSCREEN project was promoted by the European Commission and aimed to produce a generic instrument to assess HRQOL in children and adolescents aged 8 to 18 years [15]. Compared with other instruments, KIDSCREEN has several advantages [10]. Firstly, it has excellent cross-cultural comparability because it was developed and tested simultaneously in several European countries. Secondly, based on a broad perspective on the HRQOL of children and adolescents, it covers not only physical, psychological, and social aspects but also additional aspects such as self-perception, autonomy, bullying, and financial resources. Thirdly, classical test theory (CTT) and item response theory (IRT) were combined in its development. Finally, there are three KIDSCREEN versions (KIDSCREEN-52, KIDSCREEN-27, and KIDSCREEN-10), each available in a child/adolescent self-report form and a parent/proxy form.

To date, the KIDSCREEN has been translated into more than 30 languages across Europe, North and Latin America, Africa, Asia, and Oceania [16]. Its good psychometric properties were also confirmed in a sample of Hong Kong children and adolescents [17], which produced the Cantonese Chinese version. However, to the best of our knowledge, there is no Mandarin Chinese version of the KIDSCREEN. Mandarin is the common language of mainland China and was adopted as the official language of the People’s Republic of China (PRC) in 1955 [18]. It thus serves as the standard language throughout China [19]. In contrast, Cantonese is spoken primarily in the southern provinces of Guangdong and Guangxi and in the Hong Kong and Macao Special Administrative Regions of China [20]. Although Mandarin and Cantonese belong to the same language family (Sino-Tibetan), they differ considerably in vocabulary, grammatical structure, and written characters (simplified vs. traditional Chinese) [18, 21, 22]. Hence, it is necessary to validate a Mandarin Chinese version of the KIDSCREEN. The aim of this study was to evaluate the psychometric properties of the Mandarin Chinese version of KIDSCREEN-52.

Methods

Recruitment and data collection

This cross-sectional study comprised two surveys (baseline and test–retest) and was conducted from October 2016 to November 2016 in Weifang, a city in central Shandong Province, China, with about 9.27 million inhabitants (9.42% of the total Shandong population). Weifang is divided into nine urban areas based on geographic location, three of which (Kuiwen, Weicheng, and High-tech Development) are located in the center. These three areas also differ in socioeconomic background, population size, and other characteristics. Basic school education in these three areas is coeducational and is divided into the following cycles: preschool (3-year cycle, ages about 3–5); primary school (6-year cycle, ages about 6–11); middle school (3-year cycle, ages about 12–14); and high school (3-year cycle, ages about 15–18). Considering the age range of the KIDSCREEN-52, only adolescents from the sixth grade of primary school to the second grade of high school (six grades in total) were eligible for the study.

In the baseline survey, we used a multistage, stratified cluster sampling scheme with the school class as the sampling unit, stratified by area, school type, and grade, with random selection at the school and class levels. Firstly, using official records (Weifang Statistical Yearbook, 2016) and the website of the Weifang Education Bureau, we listed the primary, middle, and high schools in each of the three urban areas to form an initial sampling frame. Secondly, we screened all schools through several channels (such as official records and contact with principals) to identify those that met the study criteria (for example, number of students, school type, and school size). This yielded the final sampling frame. Thirdly, we randomly selected four primary schools, two middle schools, and one high school from each urban area. In one area, two of the sampled schools were nine-year schools, which combine primary and middle school in a single institution, so that area contributed five schools and each of the other two areas contributed seven. Thus, a total of 19 schools were included in this study. Schools that did not agree to participate were replaced by another randomly selected school from the same group. Finally, according to the research criteria (age, sex, etc.), we randomly selected classes from each grade, and all 4693 potential participants from 100 classes were included in the study.

Data were collected in the school setting with a pen-and-paper self-report questionnaire. Firstly, before each class was surveyed, trained research assistants placed sufficient questionnaires and a brief questionnaire guide in a large sealed envelope. Secondly, during the adolescents’ afternoon self-study period, each class adviser took the sealed envelope to their own class, opened it on the spot, and distributed the questionnaires. Thirdly, the class advisers read the questionnaire guide aloud and gave a brief introduction and notes about the questionnaire. Fourthly, the adolescents completed the questionnaire under quiet classroom conditions within a 35-min period. Those who finished early were told to continue with their school work to minimize noise until all the others had finished. They were allowed to ask research assistants for help if they did not understand any of the questions. Finally, research assistants collected all the questionnaires, put them back in the large envelope, re-sealed it, and took it away. The inclusion criteria for the adolescents were as follows: (1) age between 11 and 17 years; (2) able to read and complete the questionnaires independently; (3) consent to participate in the study; (4) at school on the day of data collection. In the end, 4385 of the 4693 potential participants were eligible, for a response rate of 93.4%.

Two weeks after the baseline survey, the test–retest sample, comprising 19 classes from 11 schools in the three areas, was selected by convenience cluster sampling. Student identity numbers were used to match the baseline and retest questionnaires. Before the test–retest survey started, the class advisers of all selected classes were asked to provide details about potential participants over the preceding two weeks. These details included physical health, mental health, and major life events, which might lead to significant changes in HRQOL. We did our best to ensure that the environment, time, and procedure of the test–retest survey closely matched those of the baseline survey. The inclusion criteria were as follows: (1) adolescents who took part in the baseline survey; (2) adolescents who completed the questionnaire in the baseline survey; (3) adolescents who agreed to participate in the test–retest survey; (4) adolescents who did not experience significant physical or mental changes within two weeks after the baseline survey; (5) adolescents who did not experience major life events within two weeks after the baseline survey. In the end, 841 of the 867 potential participants qualified and took part in the test–retest survey, for a response rate of 97.0%.

Ethical considerations and informed consent

This study was conducted after approval by the Ethics Committee of Weifang Medical University. Initially, the principals and school doctors of all participating schools were contacted and gave their permission to participate. Next, each class adviser distributed and collected the parents’ informed consent forms through the parent conference or the WeChat group. Finally, the informed consent statement was also placed on the first page of the questionnaire and in the teacher’s guide manual. Participation was voluntary, and each adolescent decided whether to fill in the questionnaire after reading the informed consent statement. Adolescents who wanted to withdraw after the study began could raise their hands and stop answering without having to explain why.

Measures

KIDSCREEN-52 (Mandarin Chinese version)

The original English KIDSCREEN-52 is available in two versions, a self-report and a parent/proxy report. In this study, only the adolescents’ self-reported responses were used. The KIDSCREEN-52 self-report version consists of 52 items and provides detailed information for assessing HRQOL across ten dimensions: physical well-being (5 items), psychological well-being (6 items), moods and emotions (7 items), self-perception (5 items), autonomy (5 items), parent relation and home life (6 items), financial resources (3 items), social support and peers (6 items), school environment (6 items), and social acceptance and bullying (3 items). The time frame refers to the previous week. Responses are given on a five-point Likert scale that rates either the frequency (1 = never, 2 = seldom, 3 = sometimes, 4 = often, 5 = always) or the intensity (1 = not at all, 2 = slightly, 3 = moderately, 4 = very, 5 = extremely) of certain behaviors, feelings, and attitudes. Negatively worded items are reverse-scored, and item scores within each dimension are summed. The scores for each of the ten dimensions are transformed into T-values with a mean of 50 and a standard deviation (SD) of 10, based on a representative sample of children and adolescents from the European general population [10]. Higher scores indicate better HRQOL.
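The scoring steps described above can be sketched roughly as follows. This is a simplified illustration, not the official KIDSCREEN algorithm: the reference mean and SD passed to `t_value` are hypothetical placeholders, and the published procedure works from Rasch person parameters and European norm data rather than from raw sum scores.

```python
def reverse_score(response, low=1, high=5):
    """Reverse a Likert response so that a higher value always means better HRQOL."""
    if not low <= response <= high:
        raise ValueError(f"response {response} outside {low}-{high}")
    return low + high - response


def dimension_sum(responses, negative_items=()):
    """Sum the item responses of one dimension.

    negative_items holds the 0-based positions of negatively worded items,
    which are reverse-scored before summation (positions are illustrative).
    """
    negative = set(negative_items)
    return sum(
        reverse_score(r) if i in negative else r
        for i, r in enumerate(responses)
    )


def t_value(score, ref_mean, ref_sd):
    """Linear T transformation: reference mean maps to 50, one SD to 10 points."""
    return 50 + 10 * (score - ref_mean) / ref_sd


# Example with made-up responses for a three-item dimension.
raw = dimension_sum([1, 2, 5], negative_items=[0])  # first item reversed: 5 + 2 + 5
print(raw, t_value(raw, ref_mean=10, ref_sd=4))
```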

Our research group obtained permission from the European KIDSCREEN Group to use the KIDSCREEN-52 self-report version. A Mandarin Chinese translation was produced from the original English version of the KIDSCREEN-52 using the forward-backward-forward methodology [10, 23, 24]. Firstly, two translators, both native Mandarin Chinese speakers fluent in English, independently translated the KIDSCREEN-52 into Mandarin Chinese. Secondly, all items of the forward translations were compared and assessed by a research group (the two forward translators and experts) to generate a single reconciled Mandarin Chinese version. Thirdly, two other professional bilingual translators separately back-translated the reconciled Mandarin Chinese version into English. The two backward translators were not familiar with the original questionnaire and did not participate in the forward translation process. Fourthly, all items of the backward translations were compared and assessed by another research group (the two backward translators and experts), and a revised English backward translation was produced. Lastly, the small differences between the original version and the backward translation, which were cultural and linguistic in nature, were discussed in the integrated research group (all translators and experts) to improve the quality of the Mandarin Chinese translation and to reach consensus. This process was repeated several times to establish the semantic equivalence of all items. To ensure the accuracy and readability of all items, we recruited a number of experts from relevant fields in China and conducted two rounds of expert consultation by email. We then revised the items according to the experts’ comments and obtained the pre-final Mandarin Chinese version of the KIDSCREEN-52.

Furthermore, to ensure that the items of the pre-final Mandarin Chinese version were age-appropriate, we conducted a pilot study with a convenience sample of 48 boys and girls between 11 and 14 years old. They were asked to fill out the questionnaire and give feedback on the readability and applicability of all items. The phrasing of some items was then slightly adjusted, with no items substantially modified, added, or removed. The researchers also recorded the time required to fill in the questionnaire.

PedsQL™ 4.0 (Chinese version)

The Pediatric Quality of Life Inventory (PedsQL™ 4.0) is a generic instrument for assessing HRQOL in healthy as well as chronically ill children and adolescents between the ages of 8 and 18 years [25]. It consists of 23 items in four dimensions: physical function (8 items), emotional function (5 items), social function (5 items), and school function (5 items). Responses are given on a five-point Likert scale that rates frequency (0 = never; 1 = seldom; 2 = sometimes; 3 = often; 4 = always), with a one-month recall period. Dimension scores are then converted to a 0–100 scale, with 100 indicating the best HRQOL and 0 the worst [26]. The Chinese version of the self-report PedsQL™ 4.0 has been validated in previous studies [27, 28]. In this study, it was used to evaluate the convergent and discriminant validity of the KIDSCREEN-52, and its Cronbach’s alpha coefficient was satisfactory at 0.929.
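As an illustration of the 0–100 conversion, the sketch below assumes the commonly described PedsQL™ scheme in which each 0–4 response is reverse-scored and linearly rescaled (0 becomes 100, 4 becomes 0) and a dimension score is the mean of the answered items. The function names and the handling of unanswered items (`None`) are assumptions made for this example, not details taken from the study.

```python
def pedsql_item_to_100(response):
    """Reverse-score and rescale a 0-4 response: 0 -> 100, 1 -> 75, ..., 4 -> 0."""
    if response not in (0, 1, 2, 3, 4):
        raise ValueError(f"invalid response: {response}")
    return (4 - response) * 25


def pedsql_dimension_score(responses):
    """Mean of the rescaled items; unanswered items (None) are skipped (assumption)."""
    answered = [pedsql_item_to_100(r) for r in responses if r is not None]
    if not answered:
        raise ValueError("no answered items")
    return sum(answered) / len(answered)


# Example with made-up responses for a five-item dimension.
print(pedsql_dimension_score([0, 1, 2, 3, 4]))
```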

FAS (Chinese version)

The socioeconomic status of the adolescents’ families was assessed with the Family Affluence Scale (FAS), which asks about family car ownership, having one’s own unshared bedroom, the number of computers at home, and the number of family holidays in the past 12 months [29]. FAS scores fall into eight categories ranging from 0 to 7, which were recoded into three broad groups for the analysis (low 0–3, intermediate 4–5, and high 6–7) [30, 31]. The cross-cultural validity of the FAS has been shown in multinational surveys across more than 30 countries [30], and the Chinese version has been validated in a previous study [31]. In this study, the Cronbach’s alpha coefficient was moderate at 0.591. Higher family affluence was expected to be associated with higher scores in all KIDSCREEN-52 dimensions, especially the financial resources dimension.
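The recoding of FAS scores into three groups can be expressed as a small helper. This is a minimal sketch using the cutoffs stated above; the function name is hypothetical.

```python
def fas_group(score):
    """Recode a FAS total (0-7) into the three groups used in the analysis."""
    if not 0 <= score <= 7:
        raise ValueError(f"FAS score {score} outside 0-7")
    if score <= 3:
        return "low"
    if score <= 5:
        return "intermediate"
    return "high"


# Example: group labels for every possible FAS total.
print([fas_group(s) for s in range(8)])
```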

SDQ (Chinese version)

The Strengths and Difficulties Questionnaire (SDQ) is a brief screening questionnaire designed to evaluate mental health and behavioral difficulties in children and adolescents aged 4 to 16 years [32]. It exists in three informant versions (parent report, teacher report, and self-report), each consisting of 25 items that measure five dimensions: emotional symptoms, conduct problems, hyperactivity/inattention, peer relationship problems, and prosocial behavior. Each dimension contains five items. Responses are given on a three-point scale (0 = not true, 1 = somewhat true, and 2 = certainly true), with a six-month recall period. A small number of items are reverse-scored. A total difficulties score is obtained by summing the scores of all dimensions except prosocial behavior. The possible score range is 0–10 for each dimension and 0–40 for the total difficulties score. The total difficulties score can be classified as normal (0–14), borderline (15–17), or abnormal (18–40) using cutoff values derived from Shanghai norm data [33]. Higher scores indicate more problems. Considerable research in numerous Western community contexts has established the psychometric properties of the SDQ [34,35,36], and the Chinese version has demonstrated robust reliability and validity not only in mainland China [33] but also in Hong Kong [37] and Taiwan [38]. In this study, the self-report version of the SDQ was adopted, and the Cronbach’s alpha coefficient was satisfactory at 0.871. Higher difficulties scores were hypothesized to be associated with lower scores in all KIDSCREEN-52 dimensions, especially the moods and emotions and psychological well-being dimensions.
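The total difficulties score and its classification can be sketched as below. The dictionary keys used for the dimensions are hypothetical labels chosen for this example; the cutoffs are the ones quoted above.

```python
# Hypothetical keys for the four dimensions that enter the total difficulties score.
DIFFICULTY_DIMENSIONS = ("emotional", "conduct", "hyperactivity", "peer")


def total_difficulties(dimension_scores):
    """Sum the four difficulty dimensions (each 0-10); prosocial behavior is excluded."""
    return sum(dimension_scores[name] for name in DIFFICULTY_DIMENSIONS)


def sdq_category(total):
    """Classify a total difficulties score (0-40) with the cutoffs quoted in the text."""
    if not 0 <= total <= 40:
        raise ValueError(f"total {total} outside 0-40")
    if total <= 14:
        return "normal"
    if total <= 17:
        return "borderline"
    return "abnormal"


# Example with made-up dimension scores.
scores = {"emotional": 5, "conduct": 3, "hyperactivity": 6, "peer": 2, "prosocial": 8}
print(total_difficulties(scores), sdq_category(total_difficulties(scores)))
```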

Socio-demographic characteristics

A socio-demographic information sheet covered basic participant information, including age, gender, ethnicity, siblings, parental education level, and socioeconomic status.

Statistical analysis

Feasibility

The feasibility of the KIDSCREEN-52 was determined by the response rate and the percentage of the missing values in each dimension.

Item and dimension properties

Corrected item-domain correlation coefficients and “Cronbach’s alpha if item deleted” were calculated as measures of item properties [39, 40]. The floor and ceiling effects of all dimensions were also calculated. A floor or ceiling effect was considered present if more than 15% of respondents achieved the lowest or highest possible score, respectively, on a given dimension [41]. In addition, following the KIDSCREEN Group Europe’s scoring algorithm [10], Rasch scores were computed for each dimension, weighted in accordance with the European norm population, and reported as T-values, with a mean of 50 and a standard deviation of 10.
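The 15% rule for floor and ceiling effects can be illustrated with a short sketch. The function name and the example data are invented for this illustration.

```python
def floor_ceiling(scores, min_score, max_score, threshold=15.0):
    """Percentage of respondents at the lowest and highest possible score.

    An effect is flagged when the percentage exceeds the threshold (15% here).
    """
    n = len(scores)
    floor_pct = 100 * sum(s == min_score for s in scores) / n
    ceiling_pct = 100 * sum(s == max_score for s in scores) / n
    return {
        "floor_pct": floor_pct,
        "ceiling_pct": ceiling_pct,
        "floor_effect": floor_pct > threshold,
        "ceiling_effect": ceiling_pct > threshold,
    }


# Made-up scores for a three-item dimension (possible range 3-15).
example = [3] * 1 + [15] * 4 + [10] * 15
print(floor_ceiling(example, min_score=3, max_score=15))
```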

Reliability

Reliability was assessed as internal consistency, using Cronbach’s alpha coefficient, and as test–retest reliability, using intraclass correlation coefficients (ICCs). Alpha coefficients > 0.70 indicate acceptable reliability, whereas values > 0.90 are preferred [42]. ICC values were interpreted as poor agreement (< 0.20), fair (0.21–0.40), moderate (0.41–0.60), good (0.61–0.80), or very good (> 0.81) [43].
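For readers unfamiliar with these statistics, the sketch below computes Cronbach’s alpha from a respondents-by-items matrix using the standard formula and maps an ICC to the agreement labels quoted above. It is an illustration only and not the SPSS routine used in the study; because the quoted bands leave small gaps (e.g., between 0.20 and 0.21), closing each band at its upper bound is an assumption.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def icc_agreement(icc):
    """Map an ICC to the agreement labels in the text (upper bounds inclusive, an assumption)."""
    if icc <= 0.20:
        return "poor"
    if icc <= 0.40:
        return "fair"
    if icc <= 0.60:
        return "moderate"
    if icc <= 0.80:
        return "good"
    return "very good"


# Made-up data: four respondents, three items.
data = [[1, 2, 1], [2, 2, 3], [3, 4, 3], [4, 5, 5]]
print(round(cronbach_alpha(data), 3), icc_agreement(0.724))
```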

Construct validity

The construct validity of the KIDSCREEN-52 was tested using confirmatory factor analysis (CFA) with maximum likelihood estimation (MLE). The aim of the CFA was to test the hypothesized relationship between the observed variables (items) and their underlying latent constructs (dimensions). Overall model fit was assessed using the Chi-square test and the ratio of Chi-square to degrees of freedom (χ²/df). Reference values indicate a good model fit with χ²/df < 2 and an acceptable fit with χ²/df < 3 [44]. However, as this test is highly sensitive to sample size, we also used alternative fit indices with the following cutoff values. For the comparative fit index (CFI), normed fit index (NFI), Tucker-Lewis index (TLI), and adjusted goodness-of-fit index (AGFI), values > 0.90 [45] or > 0.95 [46] indicate acceptable or good model fit, respectively. Standardized root mean square residual (SRMR) values < 0.05 suggest good model fit [44]. In addition, the root mean square error of approximation (RMSEA) was reported as an absolute fit index, with values < 0.08 or < 0.05 considered acceptable or good fit, respectively [46].
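Two of these indices can be recomputed by hand from a model’s Chi-square, degrees of freedom, and sample size. The sketch below uses a common point-estimate formula for the RMSEA with N − 1 in the denominator; SEM software may use N instead, so this is an approximation rather than a reproduction of the AMOS output. The input values are the Chi-square (8500.4), degrees of freedom (1229), and complete-case sample size (3900) reported in this study.

```python
from math import sqrt


def chi2_df_ratio(chi2, df):
    """Ratio of the model Chi-square to its degrees of freedom."""
    return chi2 / df


def rmsea(chi2, df, n):
    """RMSEA point estimate, sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


print(round(chi2_df_ratio(8500.4, 1229), 3))  # 6.917
print(round(rmsea(8500.4, 1229, 3900), 3))    # 0.039
```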

Measurement invariance

Multigroup confirmatory factor analysis was performed to test measurement invariance across gender and age groups in the whole sample. Configural equivalence was tested first, and the resulting model served as the baseline for subsequent analyses. After metric invariance was established, scalar invariance and strict invariance were tested. For testing loading invariance, a decrease in CFI of 0.010 or more (∆CFI ≤ −0.010), supplemented by an increase in RMSEA of 0.015 or more (∆RMSEA ≥ 0.015) or in SRMR of 0.030 or more (∆SRMR ≥ 0.030), would indicate non-invariance [47].
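The change-in-fit criteria can be checked mechanically once the fit indices of two nested models are known. The sketch below assumes that each change is computed as the more constrained model minus the less constrained one, so that a worsening fit appears as a negative ∆CFI and a positive ∆RMSEA. It reports one flag per index instead of a single verdict, because how the indices are combined is a judgment call; the function name is hypothetical.

```python
def invariance_flags(delta_cfi, delta_rmsea, cfi_cut=0.010, rmsea_cut=0.015):
    """Flag each change-in-fit index that reaches its non-invariance cutoff.

    delta_cfi:   CFI(constrained) - CFI(less constrained); a drop is negative.
    delta_rmsea: RMSEA(constrained) - RMSEA(less constrained); a rise is positive.
    """
    return {
        "cfi_flag": delta_cfi <= -cfi_cut,
        "rmsea_flag": delta_rmsea >= rmsea_cut,
    }


# Illustrative inputs: a small change and a larger CFI drop.
print(invariance_flags(-0.001, 0.000))
print(invariance_flags(-0.012, 0.002))
```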

Convergent and discriminant validity

Convergent and discriminant validity were further assessed by determining the degree of correlation between two instruments that assess similar or less comparable dimensions of HRQOL. In this study, convergent validity means that two dimensions believed to reflect the same underlying concept (such as KIDSCREEN-52’s physical well-being and PedsQL™ 4.0’s physical function) are highly correlated with each other, whereas discriminant validity means low or moderate correlations between dimensions that are believed to assess different characteristics (e.g., KIDSCREEN-52’s physical well-being and PedsQL™ 4.0’s school function). The Pearson correlation coefficient (r) was used to measure the strength of association, with r < 0.10 considered trivial, 0.10–0.30 low, 0.31–0.50 moderate, and ≥ 0.50 high [41].
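The following sketch computes a Pearson correlation and labels its strength with the bands quoted above. Because the quoted bands overlap at 0.50 and leave a gap between 0.30 and 0.31, the exact boundary handling here (0.50 counted as high, values up to 0.30 counted as low) is an assumption.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x)
    sy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(sx * sy)


def correlation_strength(r):
    """Label |r| as trivial, low, moderate, or high (boundary handling is an assumption)."""
    a = abs(r)
    if a < 0.10:
        return "trivial"
    if a <= 0.30:
        return "low"
    if a < 0.50:
        return "moderate"
    return "high"


# Made-up paired dimension scores.
print(correlation_strength(pearson_r([1, 2, 3, 4], [2, 1, 4, 3])))
```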

Known-group validity

The known-group validity of the KIDSCREEN-52 was examined by comparing groups that were expected a priori to differ in HRQOL. We conducted a series of one-way analyses of variance (ANOVA) to test for significant differences in all dimensions across socioeconomic status groups (low/medium/high) and mental health status groups (abnormal/borderline/normal). In addition, two effect sizes (\(\eta_{p}^{2}\) and Cohen’s d) were calculated. Partial eta squared (\(\eta_{p}^{2}\)), which is suitable for ANOVA, indicates the strength of the group effect in each dimension. In the original KIDSCREEN study, however, Cohen’s d was used as the measure of effect size, obtained by dividing the difference in means between two groups (e.g., low FAS vs. high FAS) on each dimension by the pooled standard deviation [10]. Cohen’s d was therefore also calculated here to allow comparison with the original research. Following established conventions [48], we interpreted \(\eta_{p}^{2}\) values of 0.01, 0.06, and 0.14 as small, moderate, and large, and Cohen’s d values of 0.20, 0.50, and 0.80 as small, moderate, and large, respectively.
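Both effect sizes can be computed directly from group scores. The sketch below is illustrative only (the study used SPSS and G*Power): it uses the pooled-standard-deviation form of Cohen’s d and relies on the fact that, in a one-way design, partial eta squared equals the ratio of the between-group sum of squares to the total sum of squares.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(a, b):
    """Difference in group means divided by the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


def eta_squared(groups):
    """SS_between / SS_total; equals partial eta squared in a one-way ANOVA."""
    values = [v for g in groups for v in g]
    grand = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_total = sum((v - grand) ** 2 for v in values)
    return ss_between / ss_total


# Made-up T-values for two and three groups.
print(cohens_d([52, 54, 56], [51, 53, 55]))
print(eta_squared([[48, 50, 52], [50, 52, 54], [53, 55, 57]]))
```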

Missing values

Prior to running our analyses, we decided how to handle missing values. The number of items with missing values was low: 3900 participants (88.94%) completed all 52 items of the KIDSCREEN without any missing values. Because the structural equation model (SEM) used here is not suited to datasets that contain missing values, we used the 3900 complete data sets for the confirmatory factor analysis. Missing values in all other calculations were handled by pairwise exclusion.
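The difference between the two strategies can be shown on a toy data set. In this sketch, missing answers are represented as `None` (an assumption about the data layout): complete-case selection keeps only respondents with no missing item, whereas pairwise exclusion keeps, for each pair of variables, every respondent who answered both.

```python
def complete_cases(rows):
    """Listwise deletion: keep respondents with no missing item."""
    return [row for row in rows if all(v is not None for v in row)]


def pairwise_complete(rows, i, j):
    """Pairwise exclusion: keep value pairs where both variables i and j are present."""
    return [
        (row[i], row[j])
        for row in rows
        if row[i] is not None and row[j] is not None
    ]


# Made-up responses of four respondents to three items.
data = [[4, 5, 3], [2, None, 4], [5, 5, 5], [None, 3, 2]]
print(len(complete_cases(data)))             # respondents usable for the CFA
print(len(pairwise_complete(data, 0, 2)))    # pairs usable for items 0 and 2
```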

EpiData 3.1 (EpiData Association, Odense, Denmark) was used to establish the database. Cohen’s d was calculated with the G*Power 3.1 computer program [49]. Confirmatory factor analysis was performed with AMOS 24.0, and all remaining data analyses were performed with SPSS 24.0 for Windows (SPSS, Inc., Chicago, IL). A p value of less than 0.05 was accepted as statistically significant.

Results

Socio-demographic characteristics

A sample of 4385 adolescents participated in the baseline survey and 841 in the test–retest survey. The average age of the adolescents was 13.71 and 13.81 years, respectively, with a range from 11 to 17 years. The proportions of males and females were roughly equal (46.9%, 53.1% vs. 48.5%, 51.5%). The overwhelming majority of adolescents (99.1%, 99.5%) were of Han ethnicity, and slightly less than half (48.7%, 47.8%) came from one-child families. Secondary school was the most common parental education level, at (65.3%, 68.4%) and (68.2%, 69.8%), respectively. In addition, 41.5% and 34.8% of adolescents were from families with high socioeconomic status (Table 1).

Table 1 Socio-demographic characteristics of the baseline and test–retest sample

Feasibility

The response rates of the baseline and test–retest surveys were 93.4% and 97.0%, respectively. In addition, the percentage of missing values was very low across all dimensions, ranging from 0.02% for social acceptance and bullying to 1.92% for physical well-being (Table 2).

Table 2 Description, item and dimension properties, internal consistency, and test–retest reliability of the KIDSCREEN-52

Item and dimension analysis

Across the ten dimensions, the corrected item-domain correlation coefficients of the 52 items of the KIDSCREEN-52 ranged from 0.645–0.690, 0.761–0.841, 0.819–0.889, 0.656–0.733, 0.857–0.904, 0.622–0.807, 0.806–0.856, 0.714–0.796, 0.729–0.794, and 0.649–0.705, respectively (ps < 0.01). In addition, dropping any single item had only a small negative effect on the Cronbach’s alpha coefficient of the dimension to which it belonged. Deleting any single item had an even smaller effect on the total Cronbach’s alpha coefficient, with the alpha of the remaining 51 items staying at 0.963 or 0.964. After T-value standardization, the mean scores ranged from 45.12 (SD = 9.24) for financial resources to 51.41 (SD = 8.02) for physical well-being. Floor effects ranged from 0.02% (physical well-being, psychological well-being, moods and emotions, and parent relation and home life) to 1.37% (social acceptance and bullying), while ceiling effects ranged from 6.75% for school environment to 31.84% for social acceptance and bullying (Table 2).

Internal consistency and test–retest reliability

The Cronbach’s alpha coefficient for the total score was 0.964, and the alpha coefficient values for all dimensions ranged from 0.819 for social acceptance and bullying to 0.959 for autonomy. The test–retest reliability was assessed in a sub-sample of 841 adolescents, and the interval of test–retest measurement was approximately two weeks. The ICCs between scale scores for the two assessments ranged from 0.724, 95% CI [0.681, 0.766] for social acceptance and bullying to 0.849, 95% CI [0.829, 0.866] for school environment (ps < 0.01). Details are shown in Table 2.

Construct validity

We further evaluated the ten-factor model using confirmatory factor analysis, calculating goodness-of-fit statistics for the ten-dimensional model based on the original KIDSCREEN-52 structure. The Chi-square test was significant, χ²(df = 1229) = 8500.4 (p < 0.001), and χ²/df = 6.917 fell outside the acceptable range. However, the other fit indices supported the ten-factor structure: CFI = 0.955, NFI = 0.947, TLI = 0.951, AGFI = 0.901, SRMR = 0.033, and RMSEA = 0.039, 90% CI [0.038, 0.040].

Measurement invariance

Results regarding the measurement invariance of the ten dimensions of the KIDSCREEN-52 across gender and age groups are presented in Table 3. Across the successive steps from the least to the most constrained model, the results supported invariance across gender: the changes in CFI (∆CFI = 0.000, −0.001, and −0.001) were smaller in magnitude than 0.010, and the changes in RMSEA (∆RMSEA = 0.000, 0.000, and 0.000) were below 0.015. Across age groups, however, the KIDSCREEN-52 cannot be considered fully invariant. The changes in RMSEA (∆RMSEA = 0.001, 0.002, and 0.004) were below 0.015 and suggested invariance, whereas the decreases in CFI for scalar and strict invariance (∆CFI = −0.012 and −0.022) exceeded the 0.010 threshold.

Table 3 Measurement invariance of the KIDSCREEN-52 across gender and age groups

Convergent and discriminant validity

The correlations between the dimensions of the KIDSCREEN-52 and those of the PedsQL™ 4.0 are presented in Table 4. The KIDSCREEN-52 dimension scores were positively and significantly correlated with the PedsQL™ 4.0 dimensions. As expected, dimensions covering similar constructs were highly correlated, supporting convergent validity: PedsQL™ 4.0’s emotional functioning showed the highest correlation with KIDSCREEN-52’s moods and emotions (r = 0.894), followed by school functioning with school environment (r = 0.764), social functioning with social support and peers (r = 0.725), and physical functioning with physical well-being (r = 0.705). Conversely, the correlations between less comparable dimensions were low (e.g., 0.232, 0.289, and 0.230 between PedsQL™ 4.0’s physical functioning and KIDSCREEN-52’s autonomy, parent relation and home life, and financial resources, respectively), supporting discriminant validity.

Table 4 Pearson correlation coefficients between the dimensions of KIDSCREEN-52 and PedsQL™ 4.0

Known-group validity

Table 5 shows a gradient across all KIDSCREEN-52 dimensions when the FAS is used to categorize family socioeconomic status. A series of one-way ANOVAs showed that scores on every KIDSCREEN-52 dimension differed significantly across FAS groups (F = 40.799–1275.525, ps < 0.001). Two standardized effect sizes (\(\eta_{p}^{2}\) and Cohen’s d) are shown in the last two columns. A large \(\eta_{p}^{2}\) of 0.371 was found for financial resources across the three groups (low/medium/high), and \(\eta_{p}^{2}\) was medium (0.063–0.072) for moods and emotions, social support and peers, school environment, and social acceptance and bullying. \(\eta_{p}^{2}\) was small for the remaining dimensions (0.018–0.042). Regarding Cohen’s d between the low and high FAS groups, the effects for parent relation and home life and for autonomy were small (0.393, 0.478), the effect for financial resources was large (d = 1.819), and the effects for the other dimensions were medium (0.501–0.726).

Table 5 Differences in KIDSCREEN-52’s dimension scores by socioeconomic categories

Furthermore, SDQ total difficulties scores were used to classify mental health status as abnormal, borderline, or normal. Significant differences were found in all dimensions of the KIDSCREEN-52 (F = 193.589–1277.164, ps < 0.001). \(\eta_{p}^{2}\) was medium (0.087–0.136) for physical well-being, autonomy, parent relation and home life, and financial resources, and large (0.162–0.385) for the remaining dimensions. Regarding Cohen’s d between the abnormal and normal SDQ groups, the effects were large (0.941–2.085) in all dimensions except physical well-being and financial resources, where they were medium (0.732, 0.738). Details are shown in Table 6.

Table 6 Differences in KIDSCREEN-52’s dimension scores by mental health status

Discussion

As an instrument for measuring HRQOL in children and adolescents, KIDSCREEN-52 can be widely used in epidemiologic studies, clinical intervention studies, and research projects [10]. To the best of our knowledge, this is the first time that the psychometric properties of the KIDSCREEN-52 have been evaluated in adolescents in Chinese mainland. Overall, this study developed the Mandarin Chinese version of KIDSCREEN-52 and provided evidence that it has sufficient psychometric properties to assess HRQOL in adolescents. These findings will contribute to the global use of this instrument, especially for cross-country and cross-cultural comparisons.

This instrument showed strong feasibility and acceptability: it was convenient to complete, response rates were high, and missing values were few. High response rates and few missing values have also been reported in previous international studies [50,51,52,53] that used school-based sampling and administration. Several factors may explain this. Before the study, we obtained informed consent from adolescents and their parents, which helped to gain greater trust among parents [50]. To minimize interference with the survey, class advisers stayed outside the classroom after they had led the research assistants into the class, briefly introduced the research content, and helped to distribute the questionnaires. In addition, adolescents were informed that only the researcher would have access to their questionnaires and read their answers, and research assistants were on site to help if necessary [54]. In short, the instrument appears to be viable in a cross-cultural setting.

Excellent psychometric properties were demonstrated for all items and dimensions of this instrument. The corrected item-domain correlation coefficients of all items were greater than 0.60. In addition, dropping any single item had a negligible negative effect on Cronbach’s alpha, whether for the dimension to which the item belongs or for the total score. We found almost negligible floor effects, a slight ceiling effect in the autonomy and financial resources dimensions, and a moderate ceiling effect in the social acceptance and bullying dimension. Similar findings were described in the original study [10]; moreover, the Mandarin Chinese version showed somewhat lower ceiling effects than the original study. Ceiling effects may be expected in generic HRQOL instruments because they are designed to suit a wide range of populations [55]. Because this study sampled the general adolescent population, higher ceiling effects would be expected. Further testing of this version is required in clinical samples, where the ceiling effect would likely be considerably smaller.
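Floor and ceiling effects are usually expressed as the percentage of respondents who obtain the lowest or highest possible score on a dimension. The short Python sketch below shows that calculation; the function name, the score range, and the toy data are hypothetical, and no cut-off for "slight" or "moderate" is implied because this section does not state one.

```python
def floor_ceiling(scores, min_score, max_score):
    """Return (floor %, ceiling %) for one dimension.

    The floor is the share of respondents at the lowest possible score,
    the ceiling the share at the highest possible score.
    """
    n = len(scores)
    floor_pct = 100 * sum(s == min_score for s in scores) / n
    ceiling_pct = 100 * sum(s == max_score for s in scores) / n
    return floor_pct, ceiling_pct


# Hypothetical raw scores on a five-item dimension scored 5 to 25 (not study data).
toy_scores = [5, 12, 25, 25, 18, 25, 20, 9, 25, 16]
print(floor_ceiling(toy_scores, min_score=5, max_score=25))
```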

The internal consistency of the Mandarin Chinese version was adequate for all ten dimensions, which is similar to the original results of the KIDSCREEN project [16]. Other recent studies have also demonstrated excellent internal consistency of KIDSCREEN-52 in adolescents [14, 17, 51, 56]. The test–retest ICCs were slightly higher than the a priori defined threshold of 0.70 [41] and than the range of 0.560–0.770 reported previously [10]. This might be attributable to the following reasons. First, convenience sampling in the test–retest survey secured the greatest possible degree of informed consent and cooperation. Second, strict sample matching was performed to eliminate confounding factors that might cause marked changes in adolescents’ HRQOL. Third, both surveys used school-based sampling and administration, which maximized the similarity of environment, time, and procedure. Therefore, internal consistency and test–retest reliability were adequate for the Mandarin Chinese version.
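For readers less familiar with the internal consistency statistic discussed here, the following minimal Python sketch computes Cronbach's alpha from item-level scores using the standard formula, alpha = k/(k - 1) * (1 - sum of item variances / variance of the total score). The function name and the toy responses are hypothetical and serve only to illustrate the calculation.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for one dimension.

    `items` is a list of item-score lists, one list per item, with
    respondents in the same order in every list.
    """
    k = len(items)
    # Total score per respondent across the k items.
    totals = [sum(responses) for responses in zip(*items)]
    item_var_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))


# Hypothetical responses of four adolescents to three items (not study data).
toy_items = [
    [1, 2, 3, 4],
    [2, 2, 4, 5],
    [1, 3, 3, 5],
]
print(round(cronbach_alpha(toy_items), 3))
```

Perfectly parallel items give an alpha of 1, whereas uncorrelated items give an alpha near 0, so higher values indicate that the items of a dimension vary together.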

The confirmatory factor analysis supported the ten-dimensional factor model, consistent with other recent European, Asian, and African studies [51, 56,57,58,59]. As in previous studies on adolescents, the CFI, NFI, TLI, and AGFI values were greater than 0.90 [51, 56, 58,59,60] and the RMSEA value was less than 0.05 [57, 61]. The Chi-square statistic was significant, which might indicate a misfit of the ten-dimensional structure. However, this is more likely because the Chi-square statistic alone is not an appropriate fit criterion in large data sets [62]; the same situation has appeared in previous studies [10, 56,57,58,59, 61]. Additionally, our findings supported full configural invariance and metric invariance of the ten-dimensional factor model across gender and age groups. However, at more stringent levels (scalar and strict invariance), the model cannot be considered fully invariant across age groups. To the best of our knowledge, this study is the first to test measurement invariance at these more stringent levels in the Mandarin Chinese version of KIDSCREEN-52 across gender and age groups, and the cause of this result is not yet clear. Future use of this instrument with adolescents should test for measurement invariance [63] and, if necessary, identify non-invariant items by constraining the intercept of each item individually [64] and then remove these items or free their constraints [65].
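The sensitivity of the Chi-square statistic to sample size can be made concrete with a small sketch. Under maximum likelihood estimation the model Chi-square is commonly computed as (N - 1) times the minimized fit function, so a fixed amount of misfit produces an ever larger Chi-square as N grows, while the RMSEA, which rescales by N - 1 and the degrees of freedom, stays bounded. The (N - 1) multiplier (some software uses N), the function names, and the numbers below are assumptions chosen for illustration; they are not values from this study.

```python
from math import sqrt


def chi_square_from_fmin(f_min, n):
    """Model Chi-square as (N - 1) * F_min (assumed ML convention)."""
    return (n - 1) * f_min


def rmsea(chi_square, df, n):
    """RMSEA = sqrt(max(chi2 - df, 0) / (df * (N - 1)))."""
    return sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


# Hypothetical model: same misfit (F_min) and df, two different sample sizes.
f_min, df = 0.15, 100
for n in (201, 5001):
    chi2 = chi_square_from_fmin(f_min, n)
    print(n, round(chi2, 1), round(rmsea(chi2, df, n), 4))
```

In this toy setting the larger sample yields a Chi-square several times its degrees of freedom even though the RMSEA remains below 0.05, which mirrors the pattern described above.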

The convergent and discriminant validity analysis indicated that the Mandarin Chinese version showed a reasonable pattern of associations. We assessed validity by correlating the KIDSCREEN-52 with a similar HRQOL instrument (PedsQL™ 4.0). Correlations were generally highest for those pairs of dimensions in which higher correlations were expected a priori. For example, the highest correlation was found between moods and emotions of KIDSCREEN-52 and emotional functioning of PedsQL™ 4.0, which reflects reasonable convergence. Low to moderate correlations between theoretically different dimensions support construct validity in the form of discriminant validity; for instance, the lowest correlation was found between financial resources of KIDSCREEN-52 and physical functioning of PedsQL™ 4.0. Previous studies have also supported the validity of the KIDSCREEN-52 through its correlations with PedsQL™ 4.0 [52, 66].
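The logic of this analysis can be sketched in a few lines of Python: compute the Pearson correlation for every pair of dimensions across the two instruments, then inspect which pairs correlate most and least strongly. The function names, the dictionary layout, and the toy scores are hypothetical and do not reproduce the study data or the values in Table 4.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def cross_correlations(instrument_a, instrument_b):
    """Correlate every dimension of one instrument with every dimension of the other."""
    return {
        (dim_a, dim_b): pearson_r(scores_a, scores_b)
        for dim_a, scores_a in instrument_a.items()
        for dim_b, scores_b in instrument_b.items()
    }


# Hypothetical dimension scores for five adolescents (not study data).
kidscreen = {
    "moods_emotions": [1, 2, 3, 4, 5],
    "financial_resources": [2, 5, 1, 4, 3],
}
pedsql = {
    "emotional_functioning": [2, 4, 6, 8, 10],
    "physical_functioning": [4, 3, 3, 4, 1],
}
r = cross_correlations(kidscreen, pedsql)
strongest = max(r, key=lambda pair: abs(r[pair]))  # candidate for convergence
weakest = min(r, key=lambda pair: abs(r[pair]))    # candidate for discrimination
print(strongest, weakest)
```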

All KIDSCREEN-52 dimensions differed significantly by socioeconomic status, especially between the high and low groups. Although previous studies have generally shown that the KIDSCREEN instruments can discriminate between adolescents of different socioeconomic status [10], this study found larger effect sizes than the original research. This may be because the sample of this study comprised only adolescents (no children), and a previous study has shown that socioeconomic status might be more important for HRQOL in adolescents than in children [67]. A comparison of KIDSCREEN-52 between children and adolescents would be desirable, but it was beyond the scope of this study.

Furthermore, we found that the Mandarin Chinese version of KIDSCREEN-52 was able to discriminate between adolescents of different mental health status, which is consistent with previous research on European samples [10]. However, our study found larger effect sizes in many dimensions, especially between adolescents with poor and good mental health. Considerable variation in the magnitude of effect sizes has been reported in previous studies [67] and might arise because many of the KIDSCREEN dimensions focus on mental and social health. In addition, we surveyed only 11- to 17-year-old adolescents and no children, which may have inflated the effect sizes, because physical and mental status in adolescence is more changeable and unequal [68]. These results should be confirmed in further research that uses this instrument in clinical settings.

Some limitations of this study should be addressed. First, this study restricted the age of participants to between 11 and 17 years (from the sixth grade of primary school to the second grade of high school), whereas it is more usual to assess a wider age range of 8 to 18 years. The absence of 18-year-old adolescents can be explained: the vast majority of them are in the last year of high school and face an important turning point in life, the Chinese College Entrance Examination. Learning is therefore their top priority, and schools have policies that discourage these adolescents from participating in social activities unrelated to learning. For these reasons, this study did not include adolescents of this age group. Furthermore, we believe that if good psychometric properties of an instrument are demonstrated in a large sample of younger adolescents, this is likely to be the case in older adolescents as well, as they typically have better cognitive skills [69]. However, the lack of a sample of children aged 8 to 10 is indeed a significant limitation: the measurement performance of the instrument at this age remains unknown, which in turn limits its use. Given the importance of this age group in the development and application of this instrument, we encourage future researchers to include it in their studies. Second, data were sampled only from Weifang, China, because of the aims of the survey and limited resources. Given the diversity across ethnic groups, geographical regions, socioeconomic levels, and other factors in Chinese mainland, the findings should be interpreted with caution, as they may not be generalizable to the wider Chinese population. Therefore, future studies should use a larger and representative sample of children and adolescents in Chinese mainland to further examine the psychometric properties of this instrument. Third, only two variables (socioeconomic and mental health status) were included in the known-group validity analysis, which left untested the sensitivity of the instrument to other social determinants, such as social support, family status, school environment, and health care utilization. Future studies would benefit from using multiple indicators of these variables to allow evaluation of the measurement and structural models. Fourth, the study relied on self-report data, which are often biased by social desirability. Future research will need to establish that adolescent responses are related to other sources of information, such as teacher and parent reports. Fifth, the instrument was not tested in a clinical setting; it therefore needs to be tested where clinical diagnoses and information about the severity of conditions are available. Finally, because of the cross-sectional design of the study, it was impossible to test the sensitivity of the instrument to change. Future studies should test this, for example within a randomized longitudinal intervention study with a control group.

Conclusion

In conclusion, evidence from this study indicates that the Mandarin Chinese version of KIDSCREEN-52 is a feasible, reliable, and valid instrument for measuring HRQOL of adolescents in Weifang, China. It is worth noting that this is not the first study to validate the instrument in a Chinese context, since a Cantonese version already exists. However, this is the first study to validate the instrument in an adolescent population in Chinese mainland. Despite the important limitations mentioned above, we believe that this study provides a starting point for using KIDSCREEN-52 to measure HRQOL among adolescents in Chinese mainland. Although the initial outcomes are positive, the evaluation of KIDSCREEN-52 should be an ongoing process that extends psychometric testing to characteristics not assessed in this study and assesses its applicability and performance in other populations and regions.