1 Introduction

Persons from a representative reference population with a height below −2 SDS (Standard Deviation Scores) meet the definition of short stature (SS). About 125,000 children are born in Europe each year with a height below −2 SDS of the mean height of their peers according to age, gender and country of origin, and therefore meet this definition (Wit et al. 2008). In contrast to this statistical definition, the clinical diagnosis of Growth Hormone Deficiency (GHD) is based on a low Growth Hormone (GH) response to provocation tests. These children can be treated with daily injections of recombinant human GH to reach a normal height in adulthood (Binder 2011). Treatment options for children with Idiopathic Short Stature (ISS) are limited because neither an underlying pathology nor a lack of GH is confirmed (Wit et al. 2008). Few efforts have been made to assess the psychosocial burden of SS. The few existing studies that focus on health-related quality of life (HrQoL) in SS youth describe consequences such as stigmatization, social isolation, “juvenilisation” and low self-esteem, while other studies acknowledged the burden of the condition but found no impairments in mental health (Bullinger et al. 2009; Sandberg and Voss 2002; Voss and Mulligan 2000).

In general, HrQoL is defined as the subjective physical, emotional and social aspects of well-being and functioning as perceived by patients and/or observers (Bullinger 1991). HrQoL is increasingly considered a key treatment outcome in pediatric endocrinology. It is relevant from epidemiological and clinical perspectives and is also broadly employed in health economic analyses (Bullinger et al. 2009). Although generic measures have been used to assess HrQoL in children with short stature, the Quality of Life in Short Stature Youth (QoLISSY) questionnaire was the first condition-specific instrument to focus on the perspectives of both the young patients and their parents and to be applicable across countries (Brütt et al. 2009).

The QoLISSY questionnaire was developed simultaneously in five European countries (the UK, Sweden, Spain, France and Germany) to assess HrQoL from the perspective of SS children and their parents (The European QoLISSY Group 2013). The aim was to provide a valid outcome measure for use in both clinical trials and clinical practice. However, the need for a more efficient instrument for use in busy pediatric clinics prompted the development of a brief version of the QoLISSY. When the original QoLISSY questionnaire was developed, care was taken to identify a core measure assessing the physical (6 items), emotional (8 items) and social (8 items) domains of HrQoL (Bullinger 1991), supplemented by additional determinants of HrQoL such as perception of treatment (14 items), coping (10 items) and beliefs about height (4 items). In the parent version, two parent-specific concepts were identified on the basis of parent comments in the original content elicitation: the effect of the child’s SS on the family (11 items) and the parent’s perceptions of their child’s future (4 items). Across its different language versions, the 22-item QoLISSY core module showed satisfactory psychometric characteristics, with a Cronbach’s α of 0.95 (range 0.84–0.88) for the child report and 0.95 (range 0.86–0.90) for the parent report. The QoLISSY questionnaire is fit for use in clinical research (Bullinger et al. 2013; Quitmann et al. 2013; The European QoLISSY Group 2013).

Since the length of instruments is perceived as a barrier to measuring HrQoL in clinical practice, the development of short forms is recommended to simplify data collection (Varni et al. 2005). The challenge is to represent the content and domains of the original instrument in an abbreviated version without unduly compromising its psychometric performance, while maximizing the variance shared between the short and long versions (Muehlan 2010; Pollak et al. 2006).

A short form of a well-established questionnaire broadens its potential applications in research and practice by reducing the burden of data collection and the risk of item non-response (Jokovic et al. 2006; Ravens-Sieberer et al. 2010). The current paper describes the development and testing of a brief version of the QoLISSY questionnaire that aims to provide a valid measure of HrQoL which can be completed in less than 5 min, e.g., in an outpatient pediatric clinic. We were specifically interested in identifying which items best represent the 22-item QoLISSY core module. Furthermore, we were interested in the psychometric properties of the QoLISSY brief version in terms of reliability and validity. Finally, we wanted to characterize the operating characteristics of the brief version using Item Response Theory (IRT). We hypothesized that the brief version is a psychometrically sound uni-dimensional scale that can be used in place of the full 22-item version when a short form is indicated.

2 Materials and Methods

The original QoLISSY project focused on developing a condition-specific instrument across several European countries to capture the impact of short stature on the HrQoL of children and adolescents (The European QoLISSY Group 2013). The objective was to provide a tool that could be used, e.g., to measure HrQoL as an outcome in pediatric clinical trials of growth-promoting therapies, and to give a voice to SS patients and their parents. Patients and parents can thus communicate their experience with SS, and this information can be used to improve patient care.

2.1 Development of the QoLISSY Questionnaire

The QoLISSY questionnaire was developed through focus groups, pilot testing including cognitive debriefing, and a field test with re-test (The European QoLISSY Group 2013). The psychometric properties of the child- and parent-reported versions of QoLISSY were examined in the field test in a total of 268 patients (134 children aged 8–12 years and 134 adolescents aged 13 to 18 years). A classical test theory approach was chosen to construct the QoLISSY questionnaire, based on distributional characteristics, reliability, and the tenets of content and construct validity (The European QoLISSY Group 2013).

The original QoLISSY core module includes three dimensions, reflecting physical (6 items), emotional (8 items) and social (8 items) aspects of quality of life. Together the 22 items form the QoLISSY total score. All items are to be answered on a 5-point Likert scale (5 “not at all/ never”, 4 “slightly/ seldom”, 3 “moderately/ quite often”, 2 “very/ very often”, 1 “extremely/ always”) with higher values reflecting a more positive evaluation of HrQoL.

2.2 Sample Recruitment and Description

Children and adolescents diagnosed with GHD or ISS, and one of their parents, were recruited from participating clinical centers in the five European countries (the UK, Sweden, Spain, France and Germany) to assess the child’s quality of life. Families were recruited by the clinical centers according to the following inclusion criteria: age (young patients between 8 and 18 years and their parents, as well as parents of young children aged 4–7 years), diagnosis (GHD or ISS) and treatment (GH treatment yes or no). Only data from participating families who had given informed consent were forwarded; information on non-participants was not available. At the date of diagnosis, all children met the definition of short stature, i.e., a height of ≤ −2 SDS relative to the mean, adjusted for age and gender (Ranke 1996). To be eligible, children had to be diagnosed as short statured according to their patient files. Several children and adolescents had reached a normal height by the time of assessment, through growth with or without earlier GH treatment.

A total of 268 children and adolescents were included in the current analysis (129 children aged 8 to 12 years and 139 adolescents aged 13 to 18 years). Sample characteristics are shown in Table 1. Paired child and parent data were included in the analyses to compare the child and parent reports of the brief version. Since eight parents did not complete the QoLISSY parent version in the field test, 260 parent–child dyads were used to test parent–child agreement. The development sample and the test sample each comprised N = 134 children (see Table 1). Results of a chi-square test indicated a similar distribution across the development and test samples in terms of patient age, gender, diagnosis, treatment status, and SDS height.

Table 1 Sample characteristics

2.3 Item Selection - QoLISSY Brief Version

The field test dataset of the original validation study was used to develop a brief version of the QoLISSY questionnaire and to examine its psychometric properties, supplemented by testing the brief version’s item response theory (IRT) performance. For the current analysis, the total field test dataset was randomly split into two subsets: the “development sample” was used to generate the QoLISSY brief version and the “test sample” to examine its preliminary psychometric properties.
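The random split into development and test samples, and the chi-square check that the two halves are comparable, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure used in the study: the function names, the seed and the category counts in the usage lines are hypothetical, and only the test statistic and its degrees of freedom are computed; the p-value would then be obtained from the chi-square distribution, e.g., with a statistics package.

```python
import random

def split_sample(ids, seed=2013):
    """Randomly split participant IDs into two halves of (near-)equal size."""
    rng = random.Random(seed)  # hypothetical fixed seed, for a reproducible split
    shuffled = list(ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def chi_square_2xk(counts_a, counts_b):
    """Pearson chi-square statistic and df for a 2 x k contingency table.

    counts_a, counts_b: counts per category (e.g. GHD, ISS) in each subsample.
    Assumes every category occurs at least once overall (no zero expected counts).
    """
    n_a, n_b = sum(counts_a), sum(counts_b)
    total = n_a + n_b
    stat = 0.0
    for obs_a, obs_b in zip(counts_a, counts_b):
        col = obs_a + obs_b
        for row_n, observed in ((n_a, obs_a), (n_b, obs_b)):
            expected = row_n * col / total
            stat += (observed - expected) ** 2 / expected
    return stat, len(counts_a) - 1

# Usage with hypothetical counts (not the study data):
development, test = split_sample(range(268))
stat, df = chi_square_2xk([70, 64], [68, 66])
```

One such table would be built per characteristic (age group, gender, diagnosis, treatment status, height group).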

The initial task in developing the brief version was to identify, within each domain, the items that best represent the three core HrQoL subscales. Items were selected in the development sample using reliability indicators: the highest corrected item-scale correlations (rtt), i.e., the correlation between an item and the remaining items within the scale, and the lowest impact on alpha coefficients when the item was omitted (Coste et al. 1997; Wong et al. 2013). To facilitate comparisons between child or adolescent and parent ratings, the child-report items were also used to construct the parent-reported QoLISSY brief version and were changed only in wording to reflect the parent perspective: the term “I” was changed to “My child”, with the remaining item text and response categories unchanged. As with the original full version, the parent-report version thus contains the identical items, i.e., the nine core items of the QoLISSY-brief. The analyses described in this section were conducted using the Statistical Package for the Social Sciences version 18 (SPSS Inc 2009).
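The two item selection statistics, the corrected item-scale correlation and Cronbach’s alpha if the item is deleted, can be sketched as below. The function names and the column-wise data layout (one list of scores per item, one score per respondent) are illustrative assumptions; the study itself computed these statistics in SPSS.

```python
from statistics import mean, pvariance

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def cronbach_alpha(items):
    """Cronbach's alpha; items is a list of item columns (one score per respondent)."""
    k = len(items)
    totals = [sum(row) for row in zip(*items)]
    item_var = sum(pvariance(col) for col in items)
    return k / (k - 1) * (1 - item_var / pvariance(totals))

def item_analysis(items):
    """Per item: (item number, corrected item-scale correlation, alpha if deleted)."""
    results = []
    for i, col in enumerate(items):
        rest = [c for j, c in enumerate(items) if j != i]
        rest_total = [sum(row) for row in zip(*rest)]
        r_it = pearson(col, rest_total)  # item vs. sum of the *other* items
        alpha_deleted = cronbach_alpha(rest) if len(rest) > 1 else float("nan")
        results.append((i + 1, r_it, alpha_deleted))
    return results
```

Applied to each domain of the development sample separately, e.g. item_analysis(physical_items) with a hypothetical list of six item columns, the output lists for each item the two statistics on which the selection criteria are based.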

2.4 Testing for IRT Assumptions

Once the proposed items for the brief version had been identified in the development sample, adequate functioning of the rating scale categories was examined in the test sample (Las Hayas et al. 2010; Linacre 2009). The procedure was based on the following IRT assumptions: more than 10 observations per answer category; a smooth distribution of category frequencies (i.e., the frequency distribution is not jagged); clearly advancing average measures; and sufficient model fit (i.e., congruence between observed and expected values). Uni-dimensionality was assumed because high inter-correlations among the scales indicated that they relate to the same construct. To test these assumptions, the frequency distribution across the five answer categories was inspected; it showed significant skewness, with the majority of responses in categories 4 and 5, reflecting better functioning.

In several categories, observed frequencies remained below 10 and therefore did not meet the distribution assumptions outlined above. The data were markedly skewed, with the majority of respondents consistently indicating few or no problems across the nine items. To achieve a more balanced distribution of responses, answers 1, 2 and 3 were collapsed into one category, leaving three response categories.
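A minimal sketch of the category frequency check and the collapsing of answers 1, 2 and 3, assuming each item’s responses are stored as integers 1–5. The recoding to scores 0, 1, 2 is an assumption made here for convenience; the text states only that the three lowest answers were merged.

```python
from collections import Counter

CATEGORIES = (1, 2, 3, 4, 5)
MIN_PER_CATEGORY = 10  # minimum observations per category named in the text

# Collapsing described in the text: answers 1, 2 and 3 form one category.
# Recoded to 0-based scores (0, 1, 2), a common input convention for PCM software.
COLLAPSE = {1: 0, 2: 0, 3: 0, 4: 1, 5: 2}

def sparse_categories(responses, min_count=MIN_PER_CATEGORY):
    """Categories of one item observed fewer than min_count times."""
    counts = Counter(responses)  # unseen categories count as 0
    return [c for c in CATEGORIES if counts[c] < min_count]

def collapse(responses):
    """Recode one item's 5-point responses to the 3-category scoring."""
    return [COLLAPSE[r] for r in responses]
```

For a hypothetical item answered 50 times with 5, 30 times with 4, 5 times with 3 and twice with 2, sparse_categories flags categories 1, 2 and 3, exactly the ones merged by collapse.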

The assumed uni-dimensional structure of the QoLISSY brief was tested with a confirmatory factor analysis using ordered categorical indicators and a weighted least squares estimator. A normed chi-square (χ²/df) of 2.412, a Comparative Fit Index (CFI) of 0.972, a Tucker-Lewis Index (TLI) of 0.963, and statistically significant (p < 0.001) positive factor loadings for all indicators suggested that the brief instrument meets the IRT criterion of uni-dimensionality (Las Hayas et al. 2010; Linacre 2009).
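For reference, the fit indices reported above are defined from the chi-square statistics of the tested model and of a baseline (independence) model, as sketched below. With the weighted least squares estimator, Mplus computes these indices from adjusted chi-square values, so this sketch illustrates only the textbook definitions; the function names are hypothetical and the study’s underlying chi-square and df values are not reported here.

```python
def normed_chi_square(chi2, df):
    """Chi-square divided by its degrees of freedom."""
    return chi2 / df

def cfi(chi2_m, df_m, chi2_b, df_b):
    """Comparative Fit Index of the tested model (m) relative to the baseline model (b)."""
    d_m = max(chi2_m - df_m, 0.0)
    d_b = max(chi2_b - df_b, d_m)  # equals max(chi2_b - df_b, chi2_m - df_m, 0)
    return 1.0 if d_b == 0 else 1.0 - d_m / d_b

def tli(chi2_m, df_m, chi2_b, df_b):
    """Tucker-Lewis Index (non-normed fit index)."""
    ratio_b = chi2_b / df_b
    return (ratio_b - chi2_m / df_m) / (ratio_b - 1.0)
```

Unlike the CFI, the TLI is not truncated at 1 and can exceed it for models whose chi-square is smaller than its degrees of freedom.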

2.5 Psychometric Properties

Internal consistency was assessed using Cronbach’s alpha; values above α = 0.70 were interpreted as acceptable (Cronbach 1951). Convergent validity was assessed by correlating the brief version’s total score with the KIDSCREEN-52 generic measure of QoL (The KIDSCREEN Group Europe 2006), expecting moderate correlations with thematically similar scales. Known-groups validity was tested with t-tests comparing height groups (> −2 SDS vs. ≤ −2 SDS), for both child- and parent-reported data. Standardized effect sizes (Cohen’s d) accompanied the t-tests as an indicator of the magnitude of potentially clinically significant differences, with values of 0.2, 0.5 and 0.8 considered small, medium (moderate) and large differences, respectively (Cohen 1988).
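The known-groups comparison can be illustrated with the equal-variance independent-samples t statistic and Cohen’s d based on the pooled standard deviation. The text does not state which t-test variant or pooling was used, so both are assumptions of this sketch, and the function names are hypothetical; computing the p-value is left to a statistics package.

```python
from math import sqrt
from statistics import mean, variance

def pooled_sd(a, b):
    """Pooled standard deviation of two independent groups."""
    na, nb = len(a), len(b)
    return sqrt(((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2))

def cohens_d(a, b):
    """Standardized mean difference (group a minus group b) using the pooled SD."""
    return (mean(a) - mean(b)) / pooled_sd(a, b)

def student_t(a, b):
    """Independent-samples t statistic assuming equal variances, with its df."""
    na, nb = len(a), len(b)
    se = pooled_sd(a, b) * sqrt(1 / na + 1 / nb)
    return (mean(a) - mean(b)) / se, na + nb - 2

def magnitude(d):
    """Verbal label following Cohen's conventional cut-offs of 0.2, 0.5 and 0.8."""
    d = abs(d)
    if d >= 0.8:
        return "large"
    if d >= 0.5:
        return "medium"
    if d >= 0.2:
        return "small"
    return "negligible"
```

Here group a would hold the scores of the taller group (> −2 SDS) and group b those of the shorter group (≤ −2 SDS), so a positive d indicates better HrQoL in taller children.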

2.6 Concordance Between the 22-Item Version and the Brief Version and Between Parent-child QoL Ratings

Finally, the Pearson product–moment correlation between the total scores of the 9-item QoLISSY brief version and the original 22-item QoLISSY was inspected to determine how much of the variance of the full version is retained in the brief version. In addition, Pearson’s correlation and intraclass correlation coefficients (ICC) between child and parent reports were inspected to examine the degree of agreement across respondent perspectives and within child-parent dyads.
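The agreement statistics can be sketched as follows. The text does not specify which ICC form was used; this sketch assumes the two-way random-effects, absolute-agreement, single-measure form ICC(2,1), with child and parent as the two informants per dyad, and the function names are hypothetical. Unlike Pearson’s r, this ICC is lowered by a systematic offset between informants, e.g., one informant rating consistently lower than the other.

```python
from statistics import mean

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def icc_2_1(child, parent):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    child, parent: lists of scores, one pair per dyad (k = 2 informants).
    """
    rows = list(zip(child, parent))
    n, k = len(rows), 2
    grand = mean([x for row in rows for x in row])
    row_means = [mean(row) for row in rows]
    col_means = [mean(child), mean(parent)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between dyads
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between informants
    ss_total = sum((x - grand) ** 2 for row in rows for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
```

For instance, if every parent score were exactly 10 points above the child’s, Pearson’s r would be 1 while this ICC would be well below 1.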

2.7 IRT Performance

To test IRT performance, we fitted Masters’ partial credit model (PCM; Masters 1982) and performed general model tests. We calculated item fit statistics (infit and outfit), with values between 0.7 and 1.3 indicating good fit (Bond and Fox 2001). Item difficulties and differential item functioning (DIF) with respect to age, gender, treatment status, diagnosis, and SDS height were examined with Andersen’s likelihood ratio test (Mair et al. 2012).
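To make the partial credit model concrete, the sketch below computes the category probabilities for one item, P(X = x | θ) proportional to exp(Σ_{j≤x}(θ − δ_j)) with an empty sum for x = 0, and derives the outfit (unweighted) and infit (information-weighted) mean squares from them. It treats person locations θ and item thresholds δ_j as already estimated (in the study, with the eRm package); the function names are hypothetical, and software output may differ in details such as the handling of extreme scores.

```python
from math import exp

def pcm_probs(theta, thresholds):
    """Category probabilities P(X = 0..m) for one item under the partial credit model.

    thresholds: step difficulties delta_1..delta_m in logits.
    """
    cum = [0.0]  # empty sum for category 0
    for delta in thresholds:
        cum.append(cum[-1] + (theta - delta))
    top = max(cum)  # subtract the maximum for numerical stability
    weights = [exp(c - top) for c in cum]
    total = sum(weights)
    return [w / total for w in weights]

def expected_and_variance(theta, thresholds):
    """Model-expected score and its variance for one person on one item."""
    probs = pcm_probs(theta, thresholds)
    e = sum(x * p for x, p in enumerate(probs))
    v = sum((x - e) ** 2 * p for x, p in enumerate(probs))
    return e, v

def item_fit(responses, thetas, thresholds):
    """Outfit and infit mean squares for one item across respondents."""
    sq_resid, sq_std_resid, variances = [], [], []
    for x, theta in zip(responses, thetas):
        e, v = expected_and_variance(theta, thresholds)
        sq_resid.append((x - e) ** 2)
        sq_std_resid.append((x - e) ** 2 / v)
        variances.append(v)
    outfit = sum(sq_std_resid) / len(sq_std_resid)  # mean squared standardized residual
    infit = sum(sq_resid) / sum(variances)           # weighted by model variance
    return outfit, infit
```

Values of both statistics near 1 indicate responses about as variable as the model predicts, which is the basis of the 0.7 to 1.3 range cited above.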

Missing values, present in fewer than 3 % of all item responses across the patient and parent samples, were replaced by the respondent’s own mean score (mean substitution per case). Analyses were performed using SPSS version 18 (SPSS Inc 2009), Mplus 7.11 (Muthén and Muthén 2012), and R (R version 0.97.318, The R Foundation for Statistical Computing), including the eRm package (Mair et al. 2012).
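Mean substitution per case replaces each missing item with the mean of that respondent’s answered items. A minimal sketch, assuming missing answers are coded as None; the function names are hypothetical, and since the text does not say whether a maximum number of missing items per respondent was enforced, no such rule is included.

```python
def missing_rate(rows):
    """Proportion of missing (None) item responses across all respondents."""
    cells = [x for row in rows for x in row]
    return sum(x is None for x in cells) / len(cells)

def person_mean_impute(row):
    """Replace a respondent's missing items with the mean of their answered items."""
    answered = [x for x in row if x is not None]
    if not answered:
        raise ValueError("respondent answered no items; cannot impute")
    m = sum(answered) / len(answered)
    return [m if x is None else x for x in row]
```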

3 Results

3.1 Item Selection - QoLISSY Brief Version

Item difficulties (means; range 1–5), standard deviations, corrected item-scale correlations, multiple correlations and Cronbach’s alpha if item deleted were calculated for each of the original QoLISSY core module items. For physical aspects of quality of life, items 1, 3 and 4 met the item selection criteria in terms of the highest item-total correlations, the highest multiple correlations with the remaining items, and the lowest impact on alpha coefficients when omitted. For social aspects of quality of life, the criteria were met by items 2, 4 and 5, and for emotional aspects of quality of life, items 1, 5 and 8 were selected. The items for the QoLISSY brief version were chosen on the basis of the results from the child dataset (see Tables 2, 3 and 4), and the same items were used for the parent brief version.

Table 2 Reliability analysis – development sample: physical aspects of QoL
Table 3 Reliability analysis – development sample: social aspects of QoL
Table 4 Reliability analysis – development sample: emotional aspects of QoL

3.2 Psychometric Properties

In the test sample, the selected items showed acceptably high item-total correlations, ranging from rtt = 0.59 to rtt = 0.70 for the child version and from rtt = 0.55 to rtt = 0.73 for the parent-reported version (see Table 5). Together with the results of testing for uni-dimensionality, these results support the use of a total score for the QoLISSY brief version.

Table 5 Reliability analysis of the brief version (test sample)

Scores were transformed to a 0–100 scale, with higher scores indicating better QoL. In the test sample, the mean total score was 63.93 (SD = 29.58) for children and 58.21 (SD = 29.71) for parents. For the child version, Cronbach’s alpha for the brief version total score was 0.89, with a split-half reliability of 0.87. Item and total score reliability indices were nearly identical for the parent version in the test sample (Cronbach’s α = 0.89; split-half = 0.86).
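The text does not give the formula for the 0–100 transformation or the split used for the split-half coefficient. The sketch below assumes the common linear rescaling of the mean item score, (mean − 1)/(5 − 1) × 100, and an odd–even split stepped up with the Spearman–Brown formula, 2r/(1 + r); both are assumptions rather than the documented QoLISSY scoring rules, which are given in the User’s Manual, and the function names are hypothetical.

```python
from statistics import mean

def to_0_100(item_scores, low=1, high=5):
    """Linearly rescale a respondent's mean item score from [low, high] to [0, 100]."""
    return (mean(item_scores) - low) / (high - low) * 100

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def split_half(items):
    """Odd-even split-half reliability, stepped up with the Spearman-Brown formula.

    items: list of item columns (one score per respondent).
    """
    rows = list(zip(*items))  # one tuple of item scores per respondent
    odd = [sum(r[0::2]) for r in rows]
    even = [sum(r[1::2]) for r in rows]
    r = pearson(odd, even)
    return 2 * r / (1 + r)
```

If scores were instead based on the collapsed three-category coding, the bounds would change accordingly (e.g., low=0, high=2).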

Statistically significant differences were found between the height SDS groups. Taller children (height > −2 SDS) reported higher HrQoL on the brief version than shorter children (height ≤ −2 SDS) (see Table 6). Parent ratings showed the same difference between height groups (see Table 6). The standardized effect sizes (Cohen’s d), shown in the last column of Table 6, indicated a large effect for the difference between the two height groups in the parent report (d = 0.91) and medium or moderate effects for the other results (d = 0.56–0.69).

Table 6 Testing for group differences in SDS height

Correlations of the QoLISSY-brief with the generic KIDSCREEN instrument were highest for Self-Perception and Social Acceptance (r > 0.50, see Table 7). As expected, correlations were lower with KIDSCREEN subscales not represented in the condition-specific QoLISSY (e.g., Autonomy, Parent Relation & Home Life, and Financial Resources).

Table 7 Correlations between the QoLISSY brief version total score and domains of the KIDSCREEN-52

3.3 Concordance Between the 22-Item Version and the Brief Version and Between Parent–child QoL Ratings

The correlations between the brief and the full versions were examined to provide evidence that the short form obtained after item reduction explains a substantial amount of the variance in full-form scores. The correlation between the original QoLISSY total score and the brief version total score was r = 0.95 in both the child and the parent reports, corresponding to about 90 % shared variance (r² ≈ 0.90). Pearson’s correlation between children’s and parents’ brief version scores was r = 0.68, and the intraclass correlation coefficient (ICC) for parent–child dyads was 0.70.

3.4 IRT Performance

Results of the IRT analyses showed that the partial credit model fits the data well (Andersen’s likelihood ratio test p = 0.477, Martin-Löf likelihood ratio test p = 0.504). Outfit (range 0.784 to 1.290) and infit (range 0.798 to 1.201) item statistics were within acceptable limits (Bond and Fox 2001). Item thresholds increased monotonically for all items, but distances between category thresholds were below 1.4 logits for all items except item 2, suggesting limited discrimination (see Table 8).
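The two threshold checks described above, monotonic ordering and a distance of at least 1.4 logits between adjacent thresholds, can be sketched as follows, assuming each item’s thresholds are available as a list of logits in category order. The function name and the example values are hypothetical; the estimated thresholds are those in Table 8.

```python
MIN_STEP = 1.4  # logits; distance guideline cited in the text

def check_thresholds(thresholds, min_step=MIN_STEP):
    """Check one item's PCM thresholds for ordering and minimum spacing.

    Returns (ordered, narrow_gaps): whether the thresholds advance monotonically,
    and the distances between adjacent thresholds that fall below min_step.
    """
    gaps = [b - a for a, b in zip(thresholds, thresholds[1:])]
    ordered = all(g > 0 for g in gaps)
    narrow = [g for g in gaps if g < min_step]
    return ordered, narrow
```

For example, check_thresholds([-0.5, 0.3]) reports ordered thresholds with one gap narrower than 1.4 logits, the pattern found here for all items except item 2.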

Table 8 Item statistics for the QoLISSY brief version according to the partial credit model

Differential item functioning (DIF; alpha level = 0.01) was not found across subgroups (age, gender, diagnosis, SDS height, treatment status), which implies measurement invariance: the QoLISSY brief version measures the same construct for all respondents (Walker 2011).

4 Discussion

Which approaches and strategies should be employed to construct short forms of instruments has been debated in the literature (Jokovic et al. 2006; Newcombe et al. 2013). Different methods are available, and the choice of approach depends on the priorities for development and the intended use (Coste et al. 1997; Ravens-Sieberer et al. 2010; World Health Organization 1996). Although abbreviated versions of instruments can be constructed in diverse ways, the basis of construction should be clearly described, and the relevant test statistics should be computed and reported. The present paper describes the development of an IRT-conforming 9-item brief version of the QoLISSY questionnaire that conceptually reflects the underlying three-dimensional domain structure of the 22-item full version.

The current approach was based on a theoretical model of quality of life comprising three domains (physical, social and emotional aspects of QoL), as suggested by the WHO definition of health (World Health Organization 1946). This model was empirically substantiated in the original QoLISSY study (The European QoLISSY Group 2013; World Health Organization 1946). Following a theoretically based construction principle, we decided to retain this structure rather than derive the short form from the items that correlate most highly with the original 22-item total score.

The results for the QoLISSY brief version are in line with those found for the 22-item version, even though the nine items were selected using a priori criteria for short-form construction (Coste et al. 1997; Wong et al. 2013) with the aim of optimally representing each of the three QoLISSY core domains. One half of the original QoLISSY validation sample was used to determine which items should form the brief version, and the other half to test its psychometric properties and IRT performance. These findings must be considered in light of methodological limitations. The correlation between the total scores of the brief and full versions was high, indicating that the brief version explains a major proportion of the variance (r = 0.95, p < 0.01) of the full version. However, because the nine brief items are part of the 22 full-version items, this correlation is partly inflated by item overlap and should not be mistaken as evidence of concurrent validity, which has to be tested in an independent sample. Testing the brief version within the original QoLISSY sample does not provide additional evidence of validity but rather indicates that the abbreviated instrument was developed successfully. In addition, the composition of the sample was unbalanced because clinical characteristics such as diagnosis and treatment status covaried. The brief version therefore needs to be examined in independent samples to demonstrate its value as a reliable and valid measure of HrQoL in SS children and adolescents and their parents. Furthermore, information comparing the families who gave informed consent with those who declined to participate would be needed to control for selection bias.

Although the distribution of responses in this sample suggests that the response categories could be reduced from 5 to 3 levels, the 5-level response format will be retained for now in the instrument as seen by the respondent. The scoring of the 9-item brief version could be based on the collapsed category set if the data are sparse in the higher-impairment categories, as was done here. However, collapsing the 5-point response scale would forfeit the ability to differentiate in populations whose answers are distributed differently from ours (i.e., who are more affected in their quality of life).

Overall, however, the results suggest that the QoLISSY brief version adequately represents the full version, demonstrates acceptable psychometric characteristics in terms of reliability and validity, and fulfills the IRT criterion of uni-dimensionality. Once replicated in an independent clinical sample, the QoLISSY-brief can be used in situations where a brief version is preferable to the full QoLISSY instrument. The psychometric testing of the full and brief versions applies to the five European countries of the QoLISSY study; further testing is needed in a wider range of linguistic, cultural and national contexts to examine whether the results hold in other countries.

While the original QoLISSY questionnaire with its 22-item core QoL module provides information on physical, social and emotional subscales, the QoLISSY-brief yields a single uni-dimensional total score. This version was developed on the basis of a three-dimensional conceptual model for item selection and an IRT approach to examine uni-dimensionality, followed by psychometric testing of reliability and validity.

Its brevity, ease of administration and minimal respondent burden benefit clinical practitioners who wish to assess impairment in short-statured patients. The QoLISSY-brief may be used to screen for individuals with impaired height-related HrQoL so that appropriate treatment options can be considered. Ongoing management of patients under treatment may be enhanced by the ability to monitor HrQoL impacts over time in a simple fashion. As a research tool, its quality and feasibility make it well suited for clinical studies, epidemiological cohort studies and population surveys. As such, the QoLISSY-brief contributes to broadening the application potential of patient- and parent-reported outcomes in pediatric growth disorders.

4.1 Access to the QoLISSY Instrument

QoLISSY is a joint initiative of Pfizer Limited and the University Medical Center Hamburg-Eppendorf. Copyright Pfizer Limited, all rights reserved. The QoLISSY instrument, together with comprehensive information on its development and validation, is published in the QoLISSY User’s Manual (Pabst Science Publishers, Lengerich, 2013). The Manual, which is available upon request, includes the QoLISSY child and parent forms in all existing language versions as well as scoring information. The QoLISSY 22-item version as well as the brief version will be made available for bona fide research and clinical purposes via the Pfizer Patient Reported Outcome website (www.pfizerpatientreportedoutcomes.com). Those interested in conducting collaborative research projects directly with the University of Hamburg are invited to contact Dr. Quitmann (j.quitmann @ uke.de) or Prof. Dr. Bullinger (bullinger @ uke.de).