Introduction

In 2016, an estimated 10,380 children and adolescents between the ages of 0 and 14 years were diagnosed with cancer in the United States [1]. Each year, approximately 40,000 children undergo cancer treatment [2] and more than 60% participate in a clinical trial [3, 4]. Treatments are associated with high degrees of symptom burden, which have both subtle and visible effects on children. For example, nearly 50% of children experience treatment-related fatigue and 30% experience treatment-related pain, nausea, cough, lack of appetite, and psychological deteriorations [5]. Unpleasant side effects may result in significant decrements in physical, mental, emotional, and social health domains as well as disruptions in primary and secondary education, which are of great concern in this young patient population [6,7,8,9,10,11,12].

In the past 20 years in pediatric oncology, there has been an increasing recognition of self-reporting being the gold standard for subjective health status indicators, including symptom burden and health-related quality of life (HRQOL) [13, 14]. Objective indicators of symptom burden may not accurately reflect unobservable symptoms that could be better captured by subjective reports of symptom experiences. That is, an objective symptom experience such as a skin ulceration may be more amenable to parent or clinician report, but an unobservable symptom experience such as nausea may be better represented by a child’s own self-report. Several pediatric oncology studies have found that compared to child self-report, clinicians often under-report both the prevalence and severity of subjective treatment-related symptoms experienced by children [15,16,17,18]. Studies on the agreement between children and parent proxy symptom reports have found that parents’ scores only agree fairly to moderately with their children’s self-reports with correlations ranging from 0.30 to 0.49 [8, 9, 15, 19,20,21,22,23]. These research findings indicate that clinician or parental proxy reports of cancer treatment burden for the ill child may not accurately reflect views from the child’s perspective [24]. As such, it is imperative for us to better understand the child’s experience in order to provide more patient-centered care in pediatric oncology.

Several self-report symptom instruments have been developed and used in pediatric oncology. However, these instruments differ in important ways including: symptoms assessed, number of questions asked, reference period used, phrasing of questions, and number and type of response options. Differences often reflect the respective study’s target population, with consideration of type of cancer, developmental stage, and/or the chronological age of the children being studied. There is also variation in the type and quality of psychometric evidence supporting existing pediatric self-report instruments. For example, some instruments used in pediatric oncology have not actually been validated in children (only in adults) whereas others have been validated in children, but not those with cancer (Fig. 1).

Fig. 1 PRISMA [25] diagram of studies of pediatric self-report symptom measures in children undergoing active cancer treatment

The objective of the current study was to conduct a systematic literature review to identify and evaluate available English language instruments for measuring self-reported symptoms in children and adolescents undergoing cancer treatment. We evaluated the evidence regarding the psychometric properties of each identified instrument, whether the evidence was derived from cross-sectional or longitudinal evaluations, and whether the instrument was used in a clinical trial. Based on the evidence, we offer recommendations regarding which self-report instruments are ready for use in pediatric oncology research. Findings from our comprehensive systematic review will help inform longitudinal studies, comparative effectiveness research studies, patient-centered outcomes research studies, and clinical trials that are interested in using self-report symptom instruments in pediatric oncology.

Methods

Literature search strategy

A comprehensive literature search was conducted in MEDLINE/PubMed, EMBASE, CINAHL, and PsycINFO, to identify relevant articles from each database’s inception through November 10, 2016. The literature search included Medical Subject Headings (MeSH), Emtree headings, and related text and keyword searches when appropriate, focusing on terms used to describe self-reported symptom measurements in children and adolescents with cancer (Appendix A).

The study’s search strategy was developed with input from members of the research team, and an experienced librarian who conducted the searches. Details of the search strategy are presented in Appendix A.

Inclusion and exclusion criteria

Criteria for inclusion of studies were: (1) empirical studies using child and adolescent self-report instruments; (2) in the English language; (3) designed or evaluated in children or adolescents with cancer less than 21 years of age; (4) focused on measuring symptoms and reporting scores for symptoms either individually or in aggregate form (e.g., overall toxicity or symptom burden score); (5) reporting results for participants who were on treatment separately from those who were off treatment, as we were specifically interested in children currently on treatment; and (6) reporting psychometric evidence for the instrument in at least one publication in a pediatric cancer population. Psychometric evidence included cross-sectional or longitudinal studies that assessed reliability (internal consistency, test–retest) and validity (content, construct, responsiveness) of the self-report instrument. Studies that only correlated child assessments with parent/proxy assessments, but did not evaluate other psychometric properties, did not meet our criteria for psychometric evaluation.

HRQOL instruments were eligible if they also measured symptoms and reported scores for symptoms (individually or aggregated), but not if HRQOL scores were summarized as an overall HRQOL or well-being score. Studies were excluded if instruments only reported parent or clinician proxy measures.

Data abstraction: cross-sectional psychometric properties

Two trained members of the study team evaluated each eligible study’s design (e.g., cross-sectional or longitudinal), inclusion/exclusion criteria of the study sample (e.g., English speaking/reading only), participant characteristics (e.g., sample size, mean age, percent female, cancer types included), reliability of each measure (e.g., internal consistency and test–retest), and how validity was assessed (e.g., content, construct). In particular, an instrument’s validity had to be assessed using an instrument that has been validated in a pediatric oncology population. Two senior members of the study team assessed all data abstraction for quality control and consistency. The psychometric properties of the eligible instruments are presented in Table 2.

Data abstraction: longitudinal psychometric properties

Two members of the study team further evaluated studies that were deemed eligible for the psychometric property assessment to determine whether they also assessed the pediatric self-report instrument’s responsiveness or ability to detect changes over time. Eligible studies included ones that performed a psychometric evaluation of responsiveness or studies that simply collected and reported change scores for the instrument at two time points. Longitudinal study attributes including study design (e.g., observational, randomized controlled trial), participant characteristics (e.g., cancer types, mean age, sample size, percent female), and conclusions regarding the instrument’s responsiveness are presented in Table 3.

Data abstraction: use of instrument in clinical trials

We were interested in which pediatric self-report symptom measurement instruments were currently being used in oncology clinical trials. We conducted a review of the ClinicalTrials.gov database (last checked September 2016) using these search criteria: (1) instrument name (or abbreviated name) identified from the first two searches; (2) cancer; and (3) child (limited to birth to 21 years of age). We collected information on the study status (i.e., completed, recruiting), study type (i.e., interventional or observational), intervention type, phase of trial, estimated sample size, age of children, and whether self-report symptoms were measured as a primary, secondary, or exploratory endpoint. Findings are presented in Supplemental Table 4.

Criteria for the symptom instruments that are ready for pediatric oncology longitudinal research

Informed by published criteria on the use of patient-reported outcomes (PROs) for outcomes research [26,27,28,29], we identified the following three criteria for judging an instrument ready for pediatric oncology research with longitudinal data collection:

  1. There must be evidence of the instrument’s reliability (alpha > 0.70) in children receiving treatment for cancer. Reliability can include either internal consistency or test–retest reliability.

  2. There must be evidence for the validity of the measure in children undergoing treatment for cancer. Validity includes assessments of content and construct validity.

  3. The symptom measure must show evidence of the measure’s ability to capture symptom change over time in children undergoing treatment for cancer.
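To make the first criterion concrete, the following is a minimal sketch of how an internal consistency coefficient (Cronbach's alpha) could be computed from item-level responses and compared against the 0.70 threshold. The function names, the data layout, and the example scores are illustrative assumptions; they are not taken from any study included in this review, and the reviewed studies may have estimated reliability differently (e.g., test–retest coefficients).

```python
from statistics import variance

RELIABILITY_THRESHOLD = 0.70  # threshold named in the first criterion


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    if len(responses) < 2:
        raise ValueError("need at least two respondents")
    k = len(responses[0])
    if k < 2 or any(len(row) != k for row in responses):
        raise ValueError("each respondent needs the same k >= 2 item scores")
    # Sample variance of each item (column) across respondents.
    item_variances = [variance(column) for column in zip(*responses)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


def meets_reliability_criterion(alpha, threshold=RELIABILITY_THRESHOLD):
    """True when the coefficient strictly exceeds the threshold."""
    return alpha > threshold


# Hypothetical data: five children answering a four-item symptom scale (0-4).
example_responses = [
    [0, 1, 0, 1],
    [2, 2, 1, 2],
    [3, 3, 4, 3],
    [1, 1, 2, 1],
    [4, 3, 4, 4],
]
alpha = cronbach_alpha(example_responses)
print(f"alpha = {alpha:.2f}, meets criterion: {meets_reliability_criterion(alpha)}")
```

The strict inequality mirrors the "alpha > 0.70" wording of the criterion, so a coefficient of exactly 0.70 would not qualify in this sketch.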

Results

Literature search results

A total of 9482 articles were identified through database searching, of which 7738 were non-duplicates. A total of 7166 articles were excluded during the initial abstract screening phase leaving us with 567 articles eligible for full-text review. An additional 516 articles were excluded during the full-text review phase, leaving 53 articles that met all eligibility criteria for inclusion in this study. For cross-sectional psychometric studies, we included 40 empirical studies. For longitudinal studies, we included 20 empirical studies.

Study selection

A minimum of two trained members of the research team independently screened all titles and abstracts for inclusion using the eligibility criteria described above. Six trained reviewers screened the initial 7738 abstracts. Inter-rater reliability among all six reviewers was kappa of 0.72 (95% CI 0.61–0.84), estimated from a subset of 100 articles that all six team members reviewed. Studies with titles and abstracts that met inclusion criteria or lacked adequate information to determine inclusion or exclusion underwent full-text review. A senior member of the review team resolved conflicts.
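The text does not state which kappa statistic was used for the six reviewers. As one possibility, Fleiss' kappa is a common choice when more than two raters classify the same items, and the sketch below shows how it could be computed from include/exclude decisions. The function name and the screening counts are hypothetical and do not reproduce the reported value of 0.72.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa.

    counts[i][j] is the number of raters who assigned item i to category j.
    Every item must be rated by the same number of raters.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_items == 0 or n_raters < 2:
        raise ValueError("need at least one item and two raters")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must have the same number of ratings")
    n_categories = len(counts[0])
    # Proportion of all ratings that fall in each category.
    p_category = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement for each item, then averaged over items.
    p_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_item) / n_items
    # Agreement expected by chance.
    p_expected = sum(p * p for p in p_category)
    if p_expected == 1:
        raise ValueError("all ratings fall in one category; kappa is undefined")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical screening decisions: columns are [include, exclude],
# each row is one abstract rated by six reviewers.
example_counts = [
    [6, 0],
    [0, 6],
    [5, 1],
    [1, 5],
    [0, 6],
    [3, 3],
]
print(f"Fleiss' kappa = {fleiss_kappa(example_counts):.2f}")
```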

During the full-text review, a minimum of two trained members of the research team independently reviewed each full-text article for inclusion or exclusion based on the eligibility criteria described above. If both reviewers agreed that a study did not meet eligibility criteria, the study was excluded. If the two reviewers disagreed, a senior member of the review team resolved conflicts. Five reviewers participated in the full-text article screening.

Table 1 presents characteristics of the 38 self-report English symptom instruments used in children and adolescents undergoing cancer treatment that met our eligibility criteria. Table 1 includes a summary of the instruments’ age range, types of symptoms assessed, attributes of the symptoms measured (e.g., severity, frequency, interference), recall period (e.g., past 7 days), number of questions included, and response options.

Table 1 Characteristics of pediatric self-report symptom measures in cancer research

Pediatric self-report instrument characteristics

The youngest recommended age was 4 years; instruments intended for younger ages rely on faces scales for response options (Table 1). Most instrument age ranges were from 8 to 18 years (Table 1). Some instruments were dedicated to measuring one symptom while others included a range of symptoms experienced by children and adolescents undergoing cancer treatment. Reference periods for symptom recall ranged from “right now” (or today) and 7 days (or the past week) up to one year prior. Response options varied considerably and most instruments used a version of the Likert scale. The most commonly assessed symptoms were nausea, pain, fatigue, depression, and anxiety. The 38 self-report symptom instruments included in this review measured approximately 81 different symptoms. The most comprehensive instrument was the Memorial Symptom Assessment Scale (MSAS) 10–18, assessing 30 symptoms [5], whereas the Pediatric Patient-Reported Outcomes version of the Common Terminology Criteria for Adverse Events (PRO-CTCAE) is an item library assessing up to 62 symptomatic toxicities [30].

Psychometric studies

Table 2 summarizes the 40 psychometric studies of the 38 symptom instruments included in Table 1. The majority of the studies assessed cross-sectional psychometric properties with 10 instruments being assessed longitudinally. Sample sizes ranged from 6 to 291 children/adolescents, but nearly 60% of eligible studies included fewer than 100 children. Instruments generally used a diverse sample of children and adolescents with respect to cancer type, age, and gender.

Table 2 Studies that evaluated psychometric properties of pediatric self-report symptom measures in cancer research

Instruments varied in terms of reporting scores for individual symptoms versus aggregating across symptoms to report an overall toxicity score. As such, some studies reported one internal consistency score, whereas others reported Cronbach’s alphas for instrument subscales or specific symptoms. Overall, 20 validated instruments reported reliability measures above 0.70 when assessed at a single time point (internal consistency) or at multiple time points (test–retest). There were 15 instruments that did not report any measures of reliability and three instruments whose reliability did not meet our 0.70 threshold.

The types of evidence for validity also varied across instruments. Many studies evaluated an instrument’s content validity using focus groups, in-depth interviews, or expert opinion (e.g., clinicians). Other studies tested known-groups validity by comparing children on and off treatment, or by comparing children with cancer to children with other chronic diseases or to healthy children their age. Convergent validity was often evaluated by comparing pediatric self-report scores to parent proxy, clinician, and nurse reports. Other studies assessed convergent validity by comparing the new instrument with an established PRO instrument that was known to measure the same or a similar concept of interest. Finally, some studies compared self-report scores to other more objective measures such as sleep duration or number of emetic episodes.

Longitudinal studies

Table 3 presents longitudinal studies (not test–retest studies) that examined an instrument’s responsiveness or sensitivity to changes over time. Ten instruments were used to assess changes over time in children undergoing cancer treatment. Each study included 2–5 assessment points, with spans ranging from two or more sequential 24-h periods to several months and up to one year. We also present data on the timing of assessment points to highlight the variation across studies. Many of the eligible studies were not designed as psychometric studies, but rather were studies that evaluated symptoms over time as part of their larger study objectives (e.g., evaluating the effectiveness of an intervention).

Table 3 Longitudinal studies providing evidence for the responsiveness of the pediatric symptom measures

Instruments used in clinical trials

Supplemental Table 4 includes instruments that have been used in studies registered in ClinicalTrials.gov. Twenty-two instruments were identified in this search. The PedsQL 4.0 Generic Core Scales, PedsQL Multidimensional Fatigue Scale, PedsQL 3.0 Cancer Module, Fatigue Scale (Adolescent and Child versions), and the Faces Pain Scale-Revised were the instruments most often cited in ClinicalTrials.gov.

Discussion

This systematic review identified 38 self-report symptom instruments used in pediatric oncology research with at least one psychometric validation study in children with cancer undergoing active treatment. These English language instruments varied considerably in terms of symptoms measured, phrasing and format of questions asked, and the amount of psychometric evidence supporting each instrument.

Ten of the instruments were included in a study with the intention of measuring change in a symptom or aggregate of symptoms over time. However, a majority of the longitudinal studies were not designed as psychometric studies to evaluate the responsiveness of the instrument. For example, some studies collected self-report symptom and HRQOL data over time, but did not characterize the trends over time. The following instruments met our three assessment criteria for use in longitudinal pediatric oncology studies (described above): Children’s Depression Inventory, State-Trait Anxiety Inventory for Children, Children’s International Mucositis Evaluation Scale, Fatigue Scale (Child and Adolescent versions), Pain Squad App, PedsQL 4.0 Generic Core Scales, and PedsQL 3.0 Cancer Module. These eight instruments have been psychometrically evaluated (both reliability and validity) and were able to detect significant changes over time. All of the instruments, except the Pain Squad App, are currently being used in a pediatric oncology clinical trial.

The NIH’s Patient-Reported Outcomes Measurement Information System (PROMIS) Pediatric instruments include measures of symptoms such as anxiety, depressive symptoms, fatigue, and pain interference. At the time of this article’s publication, cross-sectional data supporting the validity and reliability of the instruments in children with cancer have been published [31, 32]. In addition, these measures have been found to be responsive over time [14]. It is the opinion of the authors that the PROMIS pediatric measures show adequate evidence for their use in longitudinal pediatric oncology studies.

Methods for summarization of patient-reported data as scores also varied considerably, an issue particularly important for symptom research. The use of summary scores (i.e., aggregating across multiple symptoms) to represent overall patient well-being or symptom burden can often mask domain- or symptom-specific decrements. Summarizing data as individual symptom scores allows researchers and clinicians to examine the effectiveness of an intervention on a specific symptom or adverse event.

Some studies did not show statistically significant changes in symptoms over time. We believe that some of the non-significant findings are due to specific features of the study design. For example, small sample sizes and time points that did not capture an instrument’s sensitivity (i.e., assessments not scheduled at times when symptoms change in intensity) were seen across most studies that did not observe differences over time. Factors such as broad study participant inclusion criteria and ambiguous time points likely contributed to small effect sizes and non-significant results. If researchers intend to measure an instrument’s responsiveness, they must be thoughtful and deliberate when selecting time points, being careful to choose time points where it is reasonable to expect a change in the symptom experience. Another possibility is to ask patients directly whether they felt a change in their symptoms between time points, which could be done by administering a global impression of change scale. These considerations are especially important for social or behavioral interventions where the anticipated effect size may already be small.
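As an illustration of how responsiveness between two assessment points might be quantified, the sketch below computes a standardized response mean (mean change divided by the standard deviation of change), one commonly used responsiveness index. This is an assumption for illustration only: the reviewed studies did not necessarily use this index, and the function name and scores are hypothetical.

```python
from statistics import mean, stdev


def standardized_response_mean(baseline, follow_up):
    """Mean of paired change scores divided by the SD of those change scores."""
    if len(baseline) != len(follow_up):
        raise ValueError("baseline and follow-up must be paired")
    if len(baseline) < 2:
        raise ValueError("need at least two participants")
    # Change is follow-up minus baseline for each participant.
    changes = [after - before for before, after in zip(baseline, follow_up)]
    sd_change = stdev(changes)
    if sd_change == 0:
        raise ValueError("change scores do not vary; index is undefined")
    return mean(changes) / sd_change


# Hypothetical fatigue scores for six children, assessed before a treatment
# cycle and again at a point where symptoms are expected to have changed.
baseline_scores = [10, 12, 14, 16, 11, 13]
follow_up_scores = [13, 13, 18, 17, 15, 14]
srm = standardized_response_mean(baseline_scores, follow_up_scores)
print(f"standardized response mean = {srm:.2f}")
```

Because the denominator is the variability of change, the index shrinks when assessment points are chosen where little change is expected or when a heterogeneous sample changes in different directions, which is consistent with the design concerns raised above.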

Additionally, a few studies, including clinical trials, collected longitudinal data using self-report symptom instruments, but investigators did not present findings in their publications. Researchers should consider including results from these types of instruments in their published findings to add further to the evidence base. Further, for studies that include different instrument versions for child, adolescent, and adult reports of symptoms, treatment toxicities, and quality of life, we recommend that the findings be distinctly reported for each instrument version and not as one total group.

Limitations

There were limitations to this systematic review. First, our search was limited to English language instruments because verifying the quality of instrument translations and evaluating the translated content were outside the scope of this review. Including only English language instruments limits the geographical scope of this review. Next, inconsistent use of instrument names challenged our search within ClinicalTrials.gov. As such, if an investigator entered only a portion of the instrument’s name or abbreviation, it is possible the instrument does not appear in Supplemental Table 4. In addition, while we did assess the quality of the instrument (i.e., validity and reliability were assessed), we did not examine the quality of each study. We recommend that future studies examine this important issue. Lastly, we focused only on symptom measures for children currently undergoing active cancer treatment, but further studies should also consider self-report instruments for children and adolescents in palliative care as well as survivors of childhood cancer.

Conclusion

Investigators should consider several issues in the selection of the appropriate self-report instrument for their pediatric oncology study. Before considering the use of an instrument, the researcher should have a comprehensive understanding of the characteristics of the target population in terms of age, cancer type, and disease status. They should know what outcomes or domains they want to measure, which should be driven by the study’s research aims. The selected instrument should be designed to be relevant for the age of the children under study, and take into account a child’s literacy and cognitive abilities. In addition, the instrument should have undergone psychometric evaluation of reliability, validity, and responsiveness, ideally with the study’s target population. If the particular domains are primary study endpoints, choosing an instrument with strong reliability and validity data is optimal.

This systematic review serves as guidance for pediatric oncology researchers to select appropriate symptom measures with strong psychometric evidence. The overarching motivation behind this work is to enhance the child’s voice in pediatric oncology research studies in order to obtain a better understanding of the impact of cancer and its treatment on the lives of children.