Introduction

Overweight and obesity in children have become a prominent health challenge in the twenty-first century. Currently, an estimated 42 million children are obese or overweight globally [1]. This chronic condition can affect many aspects of the lives of children and adolescents, including the physical, emotional and social domains of life. These domains should be addressed when choosing an outcome measure and can be measured with functioning, disability and health (FDH), health-related quality of life (HRQOL) and quality of life (QOL) patient-reported outcome (PRO) instruments.

In child health services and obesity research, the terms FDH, HRQOL and QOL are often used interchangeably, which poses problems for interpreting the content of instruments designed to assess these concepts [2, 3]. World Health Organization (WHO) definitions can serve as a basis for delineating the conceptual approach that is measured regardless of the PRO instrument’s title. A biopsychosocial view of health as conceptualized through functioning is found in the International Classification of Functioning, Disability and Health, Children and Youth version (ICF-CY), as ratified by the WHO [4]. The FDH approach focuses on impairments, capacity, performance, and barriers to and facilitators of health. In contrast, the WHO-QOL group defines QOL as “a (child’s) perception of their position in life… in relation to their goals, expectations, standards and concerns.” In the WHO-QOL approach, mention of perceptual or subjective elements of life must be explicit in order to measure a child’s QOL [5]. When HRQOL is subsumed under the WHO-QOL’s definition, it includes the child’s perception of their health and health-related states (Fig. 1).

Fig. 1 Conceptual approaches of FDH, HRQOL and QOL applied to PRO instruments

Even if the overall approach within a PRO instrument is consistent with an FDH, HRQOL or QOL concept, the specific health and health-related content within an instrument can vary considerably. The ICF-CY classifies specific health and health-related components of body functions and structures and the activities that constitute a child’s participation in life roles, all of which occur in the context of a child’s social and physical environment as well as the personal factors that make each child unique. These components and categories can delineate the specific health content found in FDH, HRQOL and QOL PRO instruments. For example, one HRQOL instrument can assess a child’s emotional and physical domains, while another can assess cognition, peer relationships and family support. Coding the content of PRO instruments using the ICF can highlight the gaps and overlaps between different instruments.

The specific health and health-related content of PRO instruments should be relevant to children who are obese as well as to the purpose for which the PRO instruments will be applied. The extent to which the content of different generic and obesity-specific instruments reflects health issues that are relevant to children with obesity has to date been underexplored. While generic instruments can provide a valuable means for comparing various groups of children (e.g., those with and without obesity), disease-specific instruments are hypothesized to have more relevant content covering the health concerns of the target group (i.e., children with obesity). Thus, obesity-specific PRO instruments are considered more appropriate for measuring change in obesity-related health, though this hypothesis requires empirical verification.

There is a paucity of information that directly compares the health content of the available obesity-specific instruments with generic PRO instruments. Such information is needed for researchers and clinicians to weigh their options in the selection of instruments and in the interpretation of differences in scores between groups or over the course of an intervention. The ICF-CY classification has been useful for making the health content of such instruments explicit in childhood epilepsy, cancer, cerebral palsy and other conditions [2, 6, 7].

The ethics of having children and adolescents complete FDH, HRQOL and QOL instruments have been underexplored in reviews of PRO instruments. These instruments are typically designed to collect information directly from participants and/or their proxies without cueing, interpretation or debriefing from third parties such as clinicians or other care providers [8]. The items are potential indicators that can alert children to aspects of their health of which they were not aware, which can affect the way children view themselves [9, 10]. Thus, the extent to which negative content and phrasing are present in PROs should also be reviewed so that such instruments can be applied confidently and ethically.

Finally, in addition to reviewing the conceptual approach, the gaps and overlaps of health domains, and the amount of ethical (negative) content of PRO instruments, their selection and application require a review of psychometric properties. Many generic measures have been described and reviewed [11–20], but detailed information about the psychometric properties of the obesity-specific instruments has yet to be reviewed.

The objectives of this review are as follows: (1) to identify the most commonly used PRO instruments and characteristics of their use in the literature; (2) to describe, compare and contrast the conceptual approach to measurement with WHO and related definitions (i.e., FDH, HRQOL or QOL), health and health-related domains (e.g., emotional, physical and social domains using the ICF), as well as the ethical (negative) content; and (3) to summarize the development and psychometric properties of PRO instruments used with pediatric patients who are overweight or obese.

Methods

Literature search to identify commonly used PRO instruments, their characteristics and use

We conducted a systematic review of English language articles in PubMed, CINAHL, EMBASE and PsycINFO from the inception of each database until May 2012. A comprehensive list of keywords was used to identify articles about overweight or obesity and QOL and/or PRO instruments. Abstract and title screening was performed by one reviewer, who removed abstracts, theses, review articles and articles that were not about children, obesity and/or FDH/HRQOL/QOL. The full text of all remaining papers was obtained and examined independently by two reviewers. Articles were retained if the following criteria were met: published in English; participants were aged up to 21 years; participants had a BMI above the 85th percentile; and the participant or a proxy completed a generic and/or obesity-specific PRO instrument. Studies were excluded if the PRO instrument used met any of the following criteria: measured only one health domain (e.g., fatigue, pain); was ad hoc (i.e., one without published evidence of a development or validation process); was not intended for children or youth; or was a modified version of an adult PRO instrument. Citations of the included articles were examined in order to identify any studies that might have been missed in the search.
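The eligibility rules above can be expressed as a simple screening function. The following is only an illustrative sketch: the `Study` record and all of its field names are hypothetical and were not part of the review's actual workflow, which relied on human reviewers.

```python
from dataclasses import dataclass, replace


@dataclass
class Study:
    """Hypothetical record of one full-text article; field names are assumptions."""
    published_in_english: bool
    max_participant_age: float      # oldest participant, in years
    min_bmi_percentile: float       # lowest BMI percentile among participants
    pro_completed: bool             # participant or proxy completed a generic/obesity-specific PRO
    single_domain_only: bool        # instrument measures only one health domain
    ad_hoc: bool                    # no published development or validation evidence
    intended_for_children: bool
    modified_adult_version: bool


def is_eligible(s: Study) -> bool:
    """Apply the inclusion criteria, then reject on any exclusion criterion."""
    included = (
        s.published_in_english
        and s.max_participant_age <= 21
        and s.min_bmi_percentile > 85
        and s.pro_completed
    )
    excluded = (
        s.single_domain_only
        or s.ad_hoc
        or not s.intended_for_children
        or s.modified_adult_version
    )
    return included and not excluded


# Example: a study that satisfies every criterion, and a variant that does not.
example = Study(True, 17, 90, True, False, False, True, False)
too_old = replace(example, max_participant_age=25)
```

Note that an article is excluded as soon as any one exclusion criterion applies, whereas all inclusion criteria must hold together.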

Overview of content analysis procedure

The included PRO instruments were coded by two independent reviewers on an item-by-item basis; if there was disagreement in coding, a third reviewer was consulted to reconcile discrepancies. Only the final consensus list of conceptual approach (FDH, HRQOL, QOL), health and health-related content (ICF codes and categories), and ethical (negative) content is reported.
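The two-coder procedure with third-reviewer reconciliation, and the per-instrument proportions later reported from the consensus codes, can be sketched as follows. The function names, the item identifiers and the `adjudicate` callback (standing in for the third reviewer) are hypothetical; this is an assumption-laden illustration, not the software used in the review.

```python
from collections import Counter
from typing import Callable, Dict


def reconcile(
    coder_a: Dict[str, str],
    coder_b: Dict[str, str],
    adjudicate: Callable[[str, str, str], str],
) -> Dict[str, str]:
    """Return one consensus code per item.

    Agreements are kept as-is; each disagreement is passed to `adjudicate`,
    which plays the role of the third reviewer.
    """
    if coder_a.keys() != coder_b.keys():
        raise ValueError("Both coders must code the same set of items")
    consensus = {}
    for item, code_a in coder_a.items():
        code_b = coder_b[item]
        consensus[item] = code_a if code_a == code_b else adjudicate(item, code_a, code_b)
    return consensus


def proportions(consensus: Dict[str, str]) -> Dict[str, float]:
    """Percentage of items assigned to each code (e.g., FDH, HRQOL, QOL)."""
    counts = Counter(consensus.values())
    total = len(consensus)
    return {code: 100.0 * n / total for code, n in counts.items()}


# Toy example with made-up items and codes.
a = {"q1": "FDH", "q2": "FDH", "q3": "QOL", "q4": "HRQOL"}
b = {"q1": "FDH", "q2": "HRQOL", "q3": "QOL", "q4": "HRQOL"}
final = reconcile(a, b, lambda item, x, y: "FDH")  # third reviewer decides "FDH"
shares = proportions(final)
```

Only `final` and `shares` correspond to what is reported; the individual coders' codes are intermediate.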

Analysis of conceptual approach (FDH, HRQOL, QOL) using WHO definitions

Included instruments were assessed using a method specifically intended to identify the conceptual approach (e.g., FDH vs. HRQOL vs. QOL) as shown in Fig. 1 and validated in content analyses of cancer-specific and generic child PRO instruments. The WHO definition of functioning in the context of environment and personal factors was used to code whether an FDH approach to measurement was used; the WHO-QOL definition of QOL that emphasizes a child’s subjective perceptions about life was used to code whether a QOL approach was used. Items were coded as HRQOL if they explicitly mentioned a child’s personal perception of a health or health-related domain irrespective of the type of health domain (emotional, physical, social) mentioned. Examples of this coding can be found in Fig. 1.

Analysis of specific health and health-related content using the ICF-CY

The health and health-related content of PRO instruments for children was analyzed using a method that codes each item using the ICF-CY. This method has been validated in reviews of childhood cancer, epilepsy and cerebral palsy [2, 6, 7], as well as across conditions, and is reported in detail in the cited publications.

Analysis of ethical content

Finally, both the content and phrasing of each item were coded as negative or neutral/positive using the method described by Fayed et al. [2]. All steps were performed by two independent trained content analysts on an item-by-item basis.

Summary of psychometric properties

Eligible instruments identified by the search were appraised for adherence to published guidelines and criteria for the development and validation of PRO instruments [2, 8]. When information about instrument development and validation was lacking from an article, an attempt was made to contact the corresponding author for further information. Two reviewers extracted findings about the psychometric properties of each qualifying PRO instrument and compared findings to ensure consensus. For each instrument, we examined whether recommended procedures for item generation, item reduction and psychometric evaluation were used in the development process, and whether minimum standards for internal reliability [21] and reproducibility [22] were achieved (i.e., reliability coefficients of at least 0.70 for group level comparisons). We also examined findings for construct validity and responsiveness.
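To make the 0.70 criterion concrete, the sketch below computes Cronbach's α from raw item responses using the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of total scores) and compares it with the group-level threshold. The function names and the toy data are assumptions for illustration; the review itself extracted published coefficients rather than recomputing them.

```python
from statistics import variance
from typing import List, Sequence

GROUP_LEVEL_MINIMUM = 0.70  # minimum reliability standard applied in this review


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items matrix of scores.

    Uses sample variances throughout; every respondent must answer every item.
    """
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("Need at least two items and two respondents")
    if any(len(r) != k for r in rows):
        raise ValueError("All respondents must have the same number of items")
    item_columns: List[tuple] = list(zip(*rows))
    sum_item_variances = sum(variance(col) for col in item_columns)
    total_variance = variance([sum(r) for r in rows])
    if total_variance == 0:
        raise ValueError("Total scores have zero variance; alpha is undefined")
    return k / (k - 1) * (1 - sum_item_variances / total_variance)


def meets_group_standard(alpha: float) -> bool:
    """True when the coefficient reaches the 0.70 group-level minimum."""
    return alpha >= GROUP_LEVEL_MINIMUM


# Made-up responses: three items that rise and fall together across four respondents.
consistent = [[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]]
alpha_consistent = cronbach_alpha(consistent)
```

Because the three toy items are perfectly correlated, the resulting coefficient sits at the upper bound of 1; real scales fall below this.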

Results

Commonly used instruments, their characteristics and use

Figure 2 displays the search strategy, which retrieved a total of 70 publications used in this study. These articles came from 12 different countries; most research was conducted in the USA (42 publications).

Fig. 2 Application of the inclusion and exclusion criteria in the literature search

Six generic and four obesity-specific pediatric measures were identified from the 70 publications (see Figs. 3, 4 for frequency of use). Fifteen publications used an obesity-specific questionnaire (of which five used only an obesity-specific instrument). Sixty-five publications used a generic instrument (of which 55 used only a generic instrument). The most commonly used PRO instrument, the generic Pediatric Quality of Life Inventory (PedsQL 4.0), was used in 53 publications compared with eight publications for the most commonly used obesity-specific measure, Impact of Weight on Quality of Life (IWQOL-Kids). Since the PedsQL 4.0 accounted for a disproportionate share of generic instrument use and the remaining generic measures were used with negligible frequency, only the PedsQL 4.0 was retained for the subsequent analyses. The four obesity-specific PRO instruments included in this review were as follows: Impact of Weight on Quality of Life (IWQOL-Kids); KINDL Quality of Life Questionnaire: Obesity Module; Sizing Me Up (self-report)/Sizing Them Up (parent proxy); and Youth Quality of Life–Weight module (YQOL-W).

Fig. 3 Number of publications for each obesity-specific instrument

Fig. 4 Number of publications for each generic instrument

Content analysis

Conceptual approach (FDH, HRQOL, QOL) using WHO definitions

The dominant approach found in each PRO instrument was FDH and not HRQOL or QOL (see Table 1). The YQOL-W was the only instrument to include a QOL/HRQOL approach for a substantial proportion of its items (33.3 % measured QOL/HRQOL).

Table 1 Proportion of conceptual approaches (%) found in each measure

Health and health-related content using the ICF-CY

The ICF components emphasized among the PRO instruments varied (see Table 2). Activities and participation was the dominant ICF component in the PedsQL 4.0, and one of the principal components in one of the obesity-specific instruments, i.e., the IWQOL-Kids. The KINDL and Sizing Me Up/Sizing Them Up emphasized the body functions component. The focus of the IWQOL-Kids was on the contextual factors of environment, whereas the YQOL-W focused on the personal factors component of the ICF (Table 2).

Table 2 Proportion of ICF-CY (%) components in quality of life measures

No specific ICF-CY (health and health-related) categories were used across all five instruments (see Table 3). The ICF-CY category b152 emotional functions was the most common category included within the body functions health component across instruments. The category b530 weight maintenance functions was the second most common category across instruments, although it was not included in the PedsQL 4.0. While recurring health and related domains across the different PRO instruments were not evident in the activity and participation component, d activity and participation and d550 eating were the more common categories, each appearing in three PRO instruments. In terms of environment, the e4 attitudes category was represented in all the obesity-specific measures and e425 individual attitudes of acquaintances, peers, colleagues, neighbours and community members was represented in the PedsQL 4.0. Categories representing social supports for children with obesity were inconsistently represented.

Table 3 Frequency of specific ICF-CY codes represented in included PROs

Analysis of ethical content

The analysis of negative content (see Table 4) showed that almost all PRO instruments cue respondents to answer questions by using negative phrasing such as difficulties or problems. The proportion of items classified as negative content in the instruments was moderate for all the PRO instruments with the exception of the YQOL-W, which included negative content for the majority of items.

Table 4 Percentage of negative phrasing and content in quality of life measures

Summary of psychometric properties

Table 5 shows the methods used to develop each instrument as well as psychometric validation.

Table 5 Summary of psychometric properties including instrument development and validation

Below we summarize key findings for the PedsQL 4.0 and each obesity-specific instrument.

PedsQL™ 4.0

The PedsQL 4.0 Generic Core Scales are a generic PRO instrument designed to measure HRQOL in children aged 2–18 years. This instrument was developed from initial research with pediatric cancer patients [23–25], and over time was developed into the current generic instrument for use with any pediatric population. Although item generation and reduction for the early PCQL are described, the process that led to the development of the PedsQL 4.0 is not as clearly delineated. The 4.0 version of the PedsQL has separate versions for child and parent report and different age-modules (i.e., 2–4, 5–7, 8–12, 13–18 years). Content measures four domains (physical functioning, emotional functioning, social functioning and school functioning) with 23 items. An exception is that the module for completion by parents of children aged 2–4 has 21 items. Respondents are asked to rate how much of a problem each item has been in the past month on a scale ranging from 0 (never a problem) to 4 (almost always a problem). Scale scores are generated and range from 0 to 100, with higher scores reflective of better HRQOL. In addition, a total score and summary scores for physical and psychosocial HRQOL can be computed.
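The step from 0–4 problem ratings to a 0–100 scale on which higher is better can be illustrated with a short sketch. It assumes a linear reverse transformation (0 maps to 100, 4 maps to 0) and a scale score equal to the mean of the answered items; both are assumptions made here for illustration, and the instrument's official scoring instructions, including its rules for missing data, should be consulted before any real use. The function names are hypothetical.

```python
from typing import Optional, Sequence


def transform_item(raw: int) -> float:
    """Reverse-score a 0-4 problem rating onto a 0-100 scale (assumed linear mapping)."""
    if raw not in (0, 1, 2, 3, 4):
        raise ValueError("Ratings must be integers from 0 to 4")
    return 100.0 - 25.0 * raw


def scale_score(responses: Sequence[Optional[int]]) -> Optional[float]:
    """Mean of the transformed, answered items; None marks a skipped item.

    Returns None when no item was answered. No missing-data threshold is
    applied in this sketch.
    """
    answered = [transform_item(r) for r in responses if r is not None]
    if not answered:
        return None
    return sum(answered) / len(answered)


# Made-up responses for one respondent on a four-item scale, one item skipped.
score = scale_score([0, 4, 2, None])
```

Under these assumptions a respondent who reports "never a problem" on every item scores 100 and one who reports "almost always a problem" on every item scores 0, matching the direction described above.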

The PedsQL 4.0 has been studied extensively, and a listing of all PedsQL 4.0 publications is available elsewhere [26]; a review of these is beyond the scope of this paper. Table 5 summarizes the results from two large-scale early psychometric studies [27, 28]. The first was a study of 963 children and 1,629 parents recruited from a pediatric healthcare setting [27]. Feasibility was assessed by calculating the percentage of missing values, which was 2 % for both self- and proxy-report. With regard to internal consistency, for the total sample as well as each age-group, for both self- and proxy-report, the Cronbach α coefficient values for the total and summary scores exceeded the minimum of 0.70. For the sample as a whole, one scale (school function self-report) was below 0.70. By age-group, values for at least one age-group were below 0.70 for social and school function for self-report, and emotional, social and school function for proxy-report. Using ANOVA, the hypothesis that the PedsQL 4.0 could distinguish between healthy, acutely ill and chronically ill subgroups was supported for each scale as well as the summary scores and total score for both self- and proxy-report. Finally, factor analysis for self- and proxy-report provided evidence that was largely consistent with the a priori hypothesized five-factor structure [27]. The second was a large-scale study involving 10,241 families (response rate 51 %) who completed the PedsQL 4.0 in a statewide mail survey to evaluate enrollees in the State’s Children’s Health Insurance Program [28]. In this study, missing data were minimal and the majority of scales and total scores exceeded the minimum reliability standard of 0.70. The scales were also able to distinguish between healthy children and those with chronic health conditions, and scores were related in hypothesized directions with indicators of healthcare access, days missed from school, days sick in bed or too ill to play and days needing care.

Impact of Weight on Quality of Life-Kids (IWQOL-Kids)

The IWQOL-Kids purports to measure weight-related QOL from the perspective of youth aged 11–19 years. The content of the questionnaire was modeled after the adult IWQOL [29, 30] and IWQOL-Lite [31, 32] and developed from literature on child/adolescent obesity and clinical expertise, but with no patient interviews. Exploratory factor analysis was used for item reduction, resulting in 27 items measuring four constructs (physical comfort, body esteem, social life, family relations) plus a total score. Five response options are provided and range from “always true” to “never true.” Item scores are summed and transformed to a 0–100 scale, with higher scores representing better outcome. Psychometric properties were studied in a sample of 642 youth and included internal consistency reliability (Cronbach’s α ranged from 0.88 to 0.96). Test–retest reliability was examined in a separate psychometric publication [33], with intraclass correlation coefficients that ranged from 0.75 to 0.88. Validity was examined through correlations and mean group differences for subgroups by zBMI and clinical versus community samples, with significant differences found in the hypothesized direction (higher BMI associated with lower purported HRQOL). Convergent validity was demonstrated with higher correlations found between similar scales of the PedsQL 4.0 than between non-similar scales. Responsiveness was measured with 80 weight camp participants who reported significant improvements on all IWQOL-Kids scales after intervention. A separate publication mentions a parent-completed version of the IWQOL-Kids, though psychometric information is not reported [34].

Sizing Me Up/Sizing Them Up

The Sizing Me Up [35] and Sizing Them Up [36] questionnaires (developed in parallel) are self- and proxy-report obesity-specific measures for children aged 5–18 years. The content of both instruments was developed from peer-reviewed literature on issues in child/adolescent obesity as well as expert advice from three pediatric obesity clinicians and researchers. Patient interviews were not conducted. Psychometric information came from studies of 220 obese youth and their parents (Sizing Them Up) and 141 obese children (Sizing Me Up). Exploratory factor analysis was used for item reduction, resulting in two 22-item instruments that measure similar constructs with some overlapping content. Domains include physical function, emotional function, teasing and marginalization, positive social attributes, social avoidance, school function and mealtime challenges. Four response options are provided to measure frequency from “never” to “always.” For each scale, the relevant items are summed and transformed to a 0–100 scale, with higher scores indicating better outcome. A total score can also be computed by adding together the 22 items. A separate 7-item parent-report scale was developed to measure adolescent developmental adaptation and includes items about dating, hobbies and extracurricular activities. Cronbach’s α values were above 0.70 for all domains and the total scores, with the exception of the positive social attributes scale (0.59 for Sizing Them Up; 0.68 for Sizing Me Up). Intraclass correlations were above 0.70 for all scales and the total score, with the following exceptions: teasing/marginalization (0.67 Sizing Them Up; 0.58 Sizing Me Up), positive social attributes (0.60 Sizing Them Up), social avoidance (0.53 Sizing Me Up) and emotional function (0.66 Sizing Me Up). Convergent validity was assessed through correlations of scale scores with the domains and total scores of the generic PedsQL 4.0 (physical, emotional, social, school) and the obesity-specific IWQOL-Kids (physical comfort, body esteem, social life, family relations), with small-/moderate-to-high correlations reported for similar domains. Validity was examined through correlations between scale scores and zBMI, with higher zBMI associated with poorer HRQOL for physical function (Sizing Them Up) and emotional function (Sizing Me Up). Responsiveness was examined in the Sizing Them Up study 6 months after bariatric surgery, with significant improvements in all scale scores except for the mealtime challenges domain. Overall, these instruments have been carefully developed and appear to meet most psychometric standards.

KINDL Quality of Life Questionnaire (KINDL)

The KINDL obesity module [37] is a clinical subscale of the generic KINDL [38] that captures specific experiences associated with pediatric overweight or obesity from either the child or parent perspective. This 12-item instrument captures content from six areas: physical well-being, emotional well-being, self-esteem, family, friends and everyday functioning. Five response options are provided for each item to measure frequency from “never” to “all of the time.” Items are summed and transformed to values between 0 and 100, with higher scores indicating better outcome. There is very limited information available for the development and psychometric evaluation of this instrument. In a multi-centered study of 1,916 overweight and obese patients aged 8–16 years seeking treatment in Germany, a Cronbach α of 0.77 is reported for the scale [39]. No other information about this scale was found.

Youth Quality of Life–Weight module (YQOL-W)

The YQOL-W is a weight-specific measure for completion by youth aged 11–18 years who are obese [40, 41]. The content of this instrument was developed in a manner that included in-depth interviews with 68 adolescents, input from an expert panel and an examination of existing instruments. The authors report that this 21-item instrument aimed to reflect the WHO definition of QOL [42], which is measured with three domains as follows: self, social and environment. Item reduction involved a range of methods (e.g., expert input, item redundancy, missing data and factor analysis). All items have an 11-point scale anchored by “not at all” and “very much”, and scale scores are transformed to a 0–100 scale with higher scores indicating better outcome. In a psychometric study involving 443 subjects [40], the developers reported corrected item-total correlations that ranged from 0.69 to 0.85, Cronbach’s α that ranged from 0.90 to 0.95 (0.97 for the total score) and test–retest intraclass correlation coefficients that ranged from 0.71 to 0.73 (0.77 for the total score). Validity was examined through correlations between scale scores and zBMI, with higher zBMI significantly associated with self, social, environment and the total score, indicating that as weight increased, weight-related QOL decreased. Construct validity was assessed through correlations with the Children’s Depression Inventory and the YQOL (a generic measure of QOL developed by the same team). Worse YQOL-W total scores were associated with higher depression scores and poorer generic YQOL. Overall, this instrument has been comprehensively developed and appears to meet all of the psychometric standards assessed, although the responsiveness of the instrument was not reported.

Discussion

Our findings indicate that the most often used generic instrument and all four obesity-specific instruments included in this study reflected a conceptual approach most consistent with a biopsychosocial view of FDH as opposed to HRQOL or QOL. A conceptual understanding of the approach to measurement of any particular PRO instrument is needed so that researchers and clinicians can draw conclusions about what aspects of life or health distinguish between groups of children with obesity. This conceptual awareness of what is being measured with a PRO instrument is also important for determining whether it is FDH, HRQOL or QOL that changes (or not) following interventions [43]. For example, a child or family might report that a counselling intervention made the child “better”, but a lack of change in scores on an FDH-based instrument might still be observed. This lack of change in scores is more easily interpreted if one did not expect the child’s emotional, physical and social health (FDH) to change but expected their perception of their position in life (QOL) to change. Thus, the conceptual approach to measurement one chooses to employ in using a PRO instrument is not merely an academic exercise but has practical implications [3, 20]. This review highlights that the use of any of the obesity-specific measures, with the exception of the YQOL-W, means that the focus is an FDH approach to measurement, not an HRQOL or QOL approach. Reviews of the conceptual approach found in other generic measures can further highlight which PROs for children can be used to measure HRQOL or QOL if those approaches are needed [11, 44].

This review found important differences in the health and health-related content of the most popular generic PRO instrument used in childhood obesity, the PedsQL 4.0, compared with that of obesity-specific PRO instruments. Certain domains essential to assessing the health of children with obesity, such as weight maintenance functions and eating, are overlooked by the PedsQL 4.0 yet captured by the obesity-specific PROs [35, 38]. Also, the attitudes of people in the child’s social environment were emphasized in the obesity-specific instruments but not in the PedsQL 4.0 [25, 35, 41]. In practical terms, a generic measure might be helpful when the purpose is to compare the health of children with obesity to that of children without. However, one must consider what health domains are being overlooked in adopting only a generic strategy, both in comparing children with obesity to healthy children and in evaluating obesity interventions. Our review showed that the PedsQL 4.0 was the most often used PRO instrument in the context of obesity and this finding was not limited to cross-sectional studies involving non-obesity groups. Choosing only a generic PRO instrument raises the question of whether it is acceptable and/or clinically meaningful to use an instrument with a patient group if that instrument overlooks the issues that matter the most to them. As stated in the 2009 US Food and Drug Administration guidelines, since the PRO measure aims to capture the patient’s experience, it cannot be considered as a credible instrument if there is no evidence of its use from the target population [8].

The analysis of health content demonstrated remarkable differences in the domains that were measured among the PRO instruments. This was not an issue of the health domains included in the generic versus obesity-specific instruments, but rather an issue of consistency within the four included obesity-specific instruments. Although the 2009 US Food and Drug Administration guidelines emphasize the importance of including qualitative input from the target population in the development of a PRO instrument that evaluates their care [8], this standard was met only by the YQOL-W. The variety of methods used to generate items among measures, including but not limited to literature review, expert input and adaptation from other PROs, might account for the variety in health domains expressed among the obesity-specific instruments. Regardless, this analysis highlights how difficult it will be to summarize the effectiveness of obesity interventions that use different PROs as the basis for evaluation. Such an issue can be resolved through the adoption of one rigorously created PRO instrument, the creation of a core set, or even data harmonization strategies, none of which, to our knowledge, has been demonstrated in the literature to date.

From a psychometric standpoint, two important shortcomings were observed in most of the obesity-specific PRO instruments reviewed here: the first is the aforementioned lack of qualitative patient input and the second is the lack of responsiveness studies. Ultimately, the rationale behind selecting an obesity-specific PRO often relates to its potential to be sensitive to changes from child-obesity interventions. If this is the case, studies of the responsiveness of the PRO instrument to detect changes will be important to its adoption in the evaluation of care.

Finally, PRO instruments are among the assessments available to researchers and clinicians for measuring FDH, HRQOL or QOL of children. Yet, exposing children to these instruments has ethical implications that should not be ignored. Qualitative studies of parents of children with health conditions have shown that merely answering the many personal and negative questions found in PRO instruments can have a negative impact [45], and there is reason to expect this to be the case for children who respond for themselves. Thus, in selecting a PRO instrument, researchers and clinicians must be aware of the extent to which negative phrasing and content are present and compensate through debriefing or other strategies as necessary.

Conclusion

This extensive review of the conceptual, health and ethical content of PRO instruments for children with obesity, as well as the psychometric properties of obesity-specific instruments, shows that no single instrument has all the characteristics assessed. Instruments that included qualitative input from children and youth with obesity had greater potential for demonstrating a holistic conceptual approach to PRO measurement, yet the responsiveness of such tools needs to be further validated.