Introduction

Chemotherapy-induced peripheral neuropathy (CIPN) is a common problem occurring in a substantial proportion of cancer survivors receiving neurotoxic chemotherapy. CIPN incidence and severity vary depending on the specific neurotoxic chemotherapeutic agent, dosage, and treatment schedule used [1–3]. Common CIPN signs and symptoms include numbness, tingling, and pain in the lower and upper extremities, muscle weakness, impaired balance due to sensory deficits in the plantar surfaces of the feet, and autonomic nervous system dysfunction, most commonly manifested by constipation, orthostatic hypotension, urinary retention, and erectile dysfunction [1, 2, 4–7]. For some individuals, severe and disabling CIPN symptoms become chronic, impairing daily function and diminishing the quality of life [8–10]. In addition, CIPN may result in chemotherapy dosage reductions and/or treatment delays, resulting in sub-therapeutic cancer treatment [1]. Therefore, CIPN can negatively influence both the quality and quantity of life for cancer survivors.

Reliable and valid CIPN measurement is a critically important prerequisite to developing future interventions. Without strong measures, it is not possible to know whether patients are experiencing the problem or whether CIPN interventions have been effective. Two recently published reviews have summarized the available literature on CIPN instrument development and testing [11, 12]. Griffith and colleagues [11] used a rigorous process to identify and evaluate published CIPN measurement research. The authors conclude that subjective and objective measures, when used together, will enhance measurement validity, and that two rigorously tested CIPN measures hold promise for future use: the Functional Assessment of Cancer Therapy/Gynecologic Oncology Group-Neurotoxicity (FACT/GOG-NTX) and an abbreviated version of the Total Neuropathy Score (TNS) [11]. However, experts in the field suggest that these two CIPN measures are not perfect, and that continued work is needed to establish more fully the reliability and validity of other subjective and objective CIPN measurement approaches [11, 12].

Another CIPN subjective/patient-reported outcome measure worthy of attention is the European Organization of Research and Treatment of Cancer (EORTC) Quality of Life Questionnaire-CIPN twenty-item scale (QLQ-CIPN20) [13]. Postma and colleagues have reported the results of preliminary instrument testing of the EORTC QLQ-CIPN20 [13]. Content validity was evaluated based on the relevance, comprehensiveness, and importance of each item, as rated by clinicians and patients. Criterion validity was evaluated through the comparison of EORTC QLQ-CIPN20 scores with “gold standard” objective neurologic findings, and qualitative cognitive interviewing methodology was used to assess whether cancer survivors understood the questions. However, criterion validity and cognitive interview results have not been published. Cronbach’s alpha coefficients were 0.82, 0.73, and 0.76 for the sensory, motor, and autonomic scales, respectively [13]. The findings from this single study provide evidence of QLQ-CIPN20 internal consistency reliability and content validity. However, evidence of an instrument’s measurement properties is typically expanded and strengthened by multiple studies assessing a variety of additional measurement properties, such as construct, convergent, and criterion validity; stability and equivalence reliability; sensitivity to detect subtle differences, based on an absence of ceiling and floor effects; and, lastly, responsiveness to change over time. In addition, it is important to re-establish the evidence of reliability and validity when an instrument is administered to previously untested populations representing varied diagnoses, as well as social and cultural backgrounds. Since Postma et al.’s paper provides limited evidence of strong measurement properties, based only on content validity and internal consistency data, additional testing is warranted.
This secondary data analysis was conducted to expand the available empirical evidence regarding the QLQ-CIPN20’s internal consistency reliability, construct and convergent validity, and responsiveness to change over time.

Methods

Study context

The North Central Cancer Treatment Group (NCCTG) collected QLQ-CIPN20 data in four neuropathy treatment and prevention multi-site cancer cooperative group trials (N06CA, N08C1, N08CA, and N08CB). This secondary data analysis was conducted by pooling QLQ-CIPN20 scores obtained from subjects participating in these four studies, as well as pooled Brief Pain Inventory-Short Form (BPI-SF) data and sensory neuropathy grading scale scores based on the National Cancer Institute’s Common Toxicity Criteria for Adverse Events (NCI-CTCAE version 3.0) [14, 15].

Sample and procedures

NCCTG trial participants were recruited from 125 academic and community NCCTG participating sites located throughout the United States and Canada. All studies were approved by Institutional Review Boards, and all study participants signed an informed consent document.

Figure 1 illustrates how data from four NCCTG studies were pooled based on whether the participants had received neurotoxic chemotherapy. Data obtained from patients recruited to two studies, N06CA (n = 203) and N08C1 (n = 173), who had received neurotoxic chemotherapy were pooled to form the “received neurotoxic chemotherapy” group (N = 376). In both studies, eligible patients were ≥18 years of age and did not have neuropathy due to other causes. N06CA was a randomized, double blind, placebo-controlled trial evaluating the efficacy of topical baclofen, amitriptyline, and ketamine (BAK) for the treatment for CIPN [16]. Participants had moderate-to-severe (≥4/10) CIPN-related numbness, tingling, and/or neuropathic pain for at least 1 month prior to study participation. N06CA baseline QLQ-CIPN20, BPI-SF, and NCI-CTCAE scores were used in the current analysis. N08C1 was a descriptive, longitudinal study designed to assess CIPN incidence and severity over time as patients received neurotoxic chemotherapy [6, 17]. N08C1 QLQ-CIPN20 scores following 12 weeks of chemotherapy treatment were used in the current analysis.

Fig. 1 Data abstraction flow chart

Figure 1 also illustrates how samples from three studies were pooled to comprise the “no neurotoxic chemotherapy” group (N = 575). More specifically, the QLQ-CIPN20 is currently being utilized in two additional ongoing prevention trials: N08CA (n = 134) and N08CB (n = 168). Baseline QLQ-CIPN20 scores obtained from patients participating in these two trials, plus baseline QLQ-CIPN20 scores from N08C1 obtained prior to patients starting chemotherapy (n = 273), were pooled. N08CA is a randomized, double blind, placebo-controlled trial designed to evaluate the efficacy of glutathione for the prevention of paclitaxel/carboplatin-induced CIPN. N08CB is a randomized, double blind, placebo-controlled trial evaluating the efficacy of intravenous calcium and magnesium for the prevention of oxaliplatin-induced neuropathy. Eligible participants for both studies were ≥18 years of age and did not have preexisting neuropathy.

The QLQ-CIPN20

The QLQ-CIPN20 contains 20 items assessing sensory (9 items), motor (8 items), and autonomic symptoms (3 items) (Table 1). Using a 4-point Likert scale (1 = “not at all,” 2 = “a little,” 3 = “quite a bit,” and 4 = “very much”), individuals indicate the degree to which they have experienced sensory, motor, and autonomic symptoms during the past week. Sensory raw scale scores range from 1 to 36, motor raw scale scores range from 1 to 32, and autonomic raw scale scores range from 1 to 12 for men and 1–8 for women (erectile function item is excluded) [13]. All scale scores are linearly converted to a 0–100 scale, with higher scores indicating more symptom burden.
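The linear conversion to a 0–100 scale can be illustrated with a short sketch. It assumes the conversion is based on the mean of the answered items, rescaled so that a mean of 1 maps to 0 and a mean of 4 maps to 100; the function name and the example responses are hypothetical and are not taken from the study or from the official scoring manual.

```python
def to_0_100(item_responses):
    """Rescale 1-4 item responses to a 0-100 scale score.

    Assumption (not stated in the text): the scale score is the mean of
    the answered items, mapped linearly so that 1 -> 0 and 4 -> 100.
    Unanswered items are passed as None and ignored.
    """
    answered = [r for r in item_responses if r is not None]
    if not answered:
        raise ValueError("no answered items")
    if any(r not in (1, 2, 3, 4) for r in answered):
        raise ValueError("responses must be 1, 2, 3, or 4")
    mean_response = sum(answered) / len(answered)
    return (mean_response - 1) / 3 * 100


# Hypothetical respondent: nine sensory items, mostly "a little".
sensory_items = [2, 2, 3, 2, 1, 1, 2, 2, 3]
sensory_score = to_0_100(sensory_items)
```

Under this assumption a respondent answering “not at all” to every item scores 0 and one answering “very much” to every item scores 100, so higher values indicate more symptom burden, as described above.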

Table 1 QLQ-CIPN20 items [11]

Statistical analysis

Analyses were completed using SAS for Linux (version 9.3, 2011; SAS Inc, Cary, North Carolina). Descriptive statistics were used to evaluate demographic variables of all samples combined. An item analysis of QLQ-CIPN20 scores was performed using the “received neurotoxic chemotherapy” cohort. Cronbach’s alpha coefficients were calculated for the QLQ-CIPN20 sensory, motor, and autonomic scales using QLQ-CIPN20 scores from the “received neurotoxic chemotherapy” subgroup. QLQ-CIPN20 item-to-total score correlations, corrected for overlap, also were calculated to provide additional information regarding scale homogeneity. Correlation coefficients less than 0.40 suggest suboptimal item homogeneity [18].
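For readers unfamiliar with these statistics, the following sketch shows how Cronbach’s alpha and an item-to-total correlation corrected for overlap are commonly computed. It is an illustration in plain Python, not the SAS code used in this analysis; the function names and the small data set are hypothetical.

```python
from statistics import mean, variance


def cronbach_alpha(rows):
    """Cronbach's alpha for rows of item scores (one row per respondent)."""
    k = len(rows[0])
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def corrected_item_total(rows, j):
    """Correlate item j with the sum of the remaining items.

    Removing item j from the total is the 'correction for overlap':
    the item is not allowed to correlate with itself.
    """
    item = [row[j] for row in rows]
    rest = [sum(row) - row[j] for row in rows]
    return pearson(item, rest)


# Hypothetical data: five respondents, three items scored 1-4.
rows = [[1, 1, 2], [2, 2, 2], [3, 2, 3], [4, 3, 4], [2, 1, 1]]
alpha = cronbach_alpha(rows)
r_item0 = corrected_item_total(rows, 0)
```

In a sketch like this, an item whose corrected correlation falls below 0.40 would be flagged as a possible poor fit with the rest of its scale, in line with the threshold cited above [18].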

Consistent with the published descriptions regarding the differences between formative versus reflective measurement models, the QLQ-CIPN20 is most consistent with a reflective measurement model because it is composed of indicator, not causal, variables [19, 20]. Specifically, changes in observed variables/items such as tingling or burning/shooting pain indicate that CIPN is present. Although both exploratory and confirmatory factor analysis approaches are appropriate for evaluating the structural validity of reflective model instruments, a confirmatory factor analysis is first used when the instrument’s latent factor structure is known [21]. Therefore, using QLQ-CIPN20 scores from the “received neurotoxic chemotherapy” group, our first step was to perform a confirmatory factor analysis using structural equation modeling to test the hypothesis that relationships between the individual observed variables/items and QLQ-CIPN20 latent variables (sensory, motor, and autonomic neuropathy) existed as theoretically defined a priori by Postma et al. [13]. Polychoric correlation coefficients were used for the confirmatory factor analysis due to the ordinal nature of the QLQ-CIPN20 item data. Since the data were obtained from two separate clinical trials, the confirmatory factor analysis was adjusted for clustering.

It is important to acknowledge that there is disagreement among experts regarding which confirmatory factor analysis fit indices are best [22–24]. Some believe that only the chi-square test is needed, but this test is highly influenced by sample size. With large sample sizes, it is more likely that the model will not fit the observed data [22]. Other experts state that established fit indices thresholds for defining good/bad fit cannot be rigidly applied in every circumstance and thus should be interpreted cautiously [25, 26]. Lastly, it is important to report a variety of fit indices to demonstrate that the researchers have not reported only the one index that supports their hypothesis. We assessed the model’s fit using the chi-square (χ2) goodness-of-fit test.

Given that the QLQ-CIPN20 hypothesized factor structure was not confirmed using confirmatory factor analysis (see results section), and as a result, the instrument’s structure is considered to be unknown, an exploratory factor analysis (EFA) was conducted to evaluate and define the instrument’s structural validity per established factor analysis guidelines [20, 21]. When conducting the EFA, Bartlett’s test of sphericity and Kaiser–Meyer–Olkin (KMO) measures of sampling adequacy were used to assess the strength of the item associations. Common factor analysis using principal axis factoring with oblimin/promax rotation was used because the factors were correlated (r > 0.30). A scree test and parallel analysis were used to determine the appropriate number of factors.
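For reference, the two factorability statistics named above are usually written as follows. These are the standard textbook forms, given here as a reminder rather than as the exact computation performed by the software; the symbols are introduced for illustration, with n the number of respondents, p the number of items, R the item correlation matrix, r_ij the correlation between items i and j, and q_ij the corresponding partial correlation.

```latex
% Bartlett's test of sphericity (H0: R is an identity matrix)
\chi^2 = -\left[(n - 1) - \frac{2p + 5}{6}\right] \ln\lvert R \rvert,
\qquad df = \frac{p(p - 1)}{2}

% Kaiser-Meyer-Olkin measure of sampling adequacy
\mathrm{KMO} = \frac{\sum_{i \neq j} r_{ij}^{2}}
                    {\sum_{i \neq j} r_{ij}^{2} + \sum_{i \neq j} q_{ij}^{2}}
```

A significant Bartlett statistic indicates that the items are correlated enough to factor, and a KMO value approaching 1 indicates that the partial correlations are small relative to the observed correlations.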

In addition to assessing the QLQ-CIPN20’s structural validity using EFA, several other types of validity were evaluated. Convergent validity was evaluated by assessing the correlations between baseline N06CA QLQ-CIPN20 sensory, CTCAE, and BPI-SF scores because this was the only NCCTG study where CTCAE and BPI-SF scores were obtained for comparison. A low correlation (r < 0.40) between the QLQ-CIPN20 (patient-reported) and the CTCAE (clinician-reported) was expected due to the latter instrument’s known poor sensitivity to detect subtle differences. Moreover, it has been demonstrated that clinician- and patient-rated symptom scores may not match [27]. Lastly, correlation coefficients were calculated for N06CA baseline QLQ-CIPN20 pain item scores (Table 1, items 5 and 6) and the corresponding BPI-SF scores from items assessing least, worst, and average pain, as well as pain right now.

Using independent sample t tests, QLQ-CIPN20 scores were compared between contrasting groups: those who did versus did not receive neurotoxic chemotherapy. Responsiveness to change was assessed by calculating the Cohen’s d effect size based on changes in QLQ-CIPN20 scores from individuals participating in N08C1 because neuropathy was expected to worsen over time as patients received higher cumulative doses of neurotoxic agents. An effect size >0.80 is considered to be large [28].
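The effect size calculation can be sketched as follows. The text does not state which standard deviation was used as the standardizer, so this illustration assumes the pooled standard deviation of the two time points; the function name and the scores are hypothetical.

```python
from statistics import mean, stdev


def cohens_d(before, after):
    """Cohen's d for change between two assessments.

    Assumption (not stated in the text): the mean change is divided by
    the pooled standard deviation of the two time points.
    """
    pooled_sd = ((stdev(before) ** 2 + stdev(after) ** 2) / 2) ** 0.5
    return (mean(after) - mean(before)) / pooled_sd


# Hypothetical 0-100 sensory scores at baseline and at 12 weeks.
baseline = [10.0, 20.0, 30.0]
week_12 = [20.0, 30.0, 40.0]
d = cohens_d(baseline, week_12)
```

Other standardizers, such as the baseline standard deviation or the standard deviation of the change scores, are also in common use and can give noticeably different values for the same data.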

Results

Demographics

The demographic characteristics for the two groups (those who did vs. did not receive neurotoxic chemotherapy) used for the analyses are presented in Table 2. There were no significant differences in age, gender, or race, between the two groups.

Table 2 Participant characteristics

Item analysis

QLQ-CIPN20 individual mean item scores, ranges, and standard deviations (SD) were calculated for the “received neurotoxic chemotherapy” group. Mean scores for all items ranged from 1.22 to 2.80 (SD range 0.50–1.08). The highest mean scores were for items assessing numbness and tingling in the toes or feet, 2.74 (SD 1.11) and 2.80 (SD 1.08), respectively. All CIPN items encompassed the full score range (1–4). Item-to-item correlations ranged from 0.06 (erectile dysfunction and foot cramps) to 0.77 (numbness and tingling in toes or feet). Low item–item correlations (r ≤ 0.30) were found between all items comprising the autonomic scale, as well as the hearing loss item of the sensory scale, specifically items 16, 17, 18, and 20 (Table 3).

Table 3 Inter-item and item-total correlations

Internal consistency reliability

QLQ-CIPN20 alpha coefficients for the sensory, motor, and autonomic scales were 0.88, 0.88, and 0.78, respectively. Item-to-total score correlations for most items were moderate, ranging from 0.44 to 0.63 (Table 3). Items 16, 17, 18, and 20 had the lowest item-total score correlations (r range 0.33–0.40).

Item deletion

Even though most correlations were statistically significantly different from 0 at the p ≤ .05 level, item deletion decisions were based on the strength of the association between items. Items with low item–item correlations (r ≤ 0.30) were deleted prior to conducting the factor analysis: items 16, 17, 18, and 20. This approach is justified because low inter-item correlations suggest a poor fit with the remaining items. Also based on the authors’ clinical experience, deletion of these items is clinically justified given that dizziness, blurred vision, and erectile dysfunction are significantly influenced by medications and other comorbid illnesses. In addition, hearing loss as a result of chemotherapy was highly unlikely in this study sample because very few patients received ototoxic chemotherapy (cisplatin). Therefore, it is very probable that the scores from these four items indicated the presence of non-CIPN-related problems.

Construct validity

Confirmatory factor analysis, using polychoric correlation coefficients for ordinal data, revealed a statistically significant chi-square statistic [χ 2 = 2462.09 (p < 0.01)], indicating a poor model fit based on published standards for acceptable fit indices [29, 30]. Even after statistically adjusting for the clustering of patients within trials, the original three-subscale QLQ-CIPN20 model still could not be validated by our data.

Since the confirmatory factor analysis results did not support the hypothesized measurement model, an exploratory factor analysis was performed (N = 316). Bartlett’s test of sphericity indicated that the correlation matrix was factorable (χ 2 = 653.81, p < 0.0001) [21]. The KMO statistical measure of sampling adequacy was adequate at 0.83 [21]. Measures of sampling adequacy for individual items were computed as an additional indicator of item correlation strength. Values ranged from 0.74 to 0.93; 0.70 is considered adequate [21].

The factor structure of the 16-item instrument (following item deletion) was examined. Retained factors had an eigenvalue greater than 1.00, factor loadings ≥0.40, and explained ≥5 % of the variance in scores. In addition, a scree plot was examined. Based on these criteria, a 2-factor solution was the best. The eigenvalues for factors 1 and 2 were 6.28 and 1.26, respectively. Based on the results of this initial solution (obtained prior to rotation), factors 1 and 2 explained 68 and 14 % of the score variance, respectively, or 82 % of the cumulative variance in the reduced (16-item) instrument scores. Factor loadings from the rotated factor-loading pattern matrix are reported in Table 4. There were no cross-loadings. A moderate factor-to-factor correlation (0.61, p ≤ 0.0001) supports that oblimin (promax) rotation was an appropriate rotation approach.

Table 4 Factor loadings from rotated factor-loading pattern matrix (N = 316)

Factor 1 contains nine items consistent with lower extremity sensory and motor signs and symptoms. However, the conceptual fit of item #10 which assesses temperature sensitivity is not necessarily consistent with only lower extremity CIPN. Factor 2 consists of seven items assessing upper extremity sensory and motor signs and symptoms. Therefore, factors 1 and 2 did not fall into conceptual categories consistent with the current QLQ-CIPN20’s sensory and motor delineations. Alpha coefficients for factors 1 and 2 are 0.90 and 0.91, respectively.

A parallel analysis also was used to identify the factor structure. The results supported a slightly different four-factor solution (Factors 1a–4a) with items still clustering by lower and upper extremity symptoms and associated interference (Table 4). The main difference is that pain and cramp items loaded on a unique factor. Factor 1a consists of items related to finger/hand symptoms and associated functional interference. Factor 2a has items related to foot functional interference. Factor 3a includes all pain and cramps items. Factor 4a consists of foot numbness, tingling, and pain items. Of note, the foot pain item cross-loaded on Factors 3a and 4a.

Convergent validity

Convergent validity was evaluated via comparison of QLQ-CIPN20 scores with other neuropathy measures (Table 5). Only N06CA (BAK study) baseline scores were used for this analysis. As expected, correlations between the sensory, motor, and autonomic scales and the CTCAE sensory grading scale scores were low (−0.20, 0.20, and 0.03, respectively). Although the correlations between the QLQ-CIPN20 sensory and motor scales and the CTCAE were statistically significant, likely due to the relatively large sample size, the low r (<0.40) suggests a poor correlation. Correlations between the two QLQ-CIPN20 items assessing burning and shooting pain (items 5 and 6) and the BPI-SF pain severity questions assessing least, worst, and average pain, as well as pain right now, were low-moderate, as were the correlations between QLQ-CIPN20 sensory and motor subscale scores and BPI-SF pain severity items (r range 0.30–0.57, p ≤ .0001).

Table 5 Correlations with other measures

The revised 16-Item QLQ-CIPN20 did not correlate with the CTCAE (r range 0.16–0.21, p ≤ .05). Lower extremity subscore correlations with all BPI-SF pain severity items were moderate (r range 0.46–0.55, p ≤ .0001), while upper extremity score correlations were low (r range 0.25–0.30, p ≤ .001).

Contrasting groups

An independent samples t test was conducted to evaluate the hypothesis that individuals having received neurotoxic chemotherapy would have higher QLQ-CIPN20 scores (reflecting worse neuropathy) than individuals receiving no chemotherapy. Mean scores for all QLQ-CIPN20 scales were significantly higher (worse) (p ≤ 0.0001) in patients who had received neurotoxic chemotherapy when compared to scores from those who had not (Table 6). This provides strong evidence that the QLQ-CIPN20 can distinguish between contrasting groups. In addition, individuals with more severe upper extremity CIPN had worse lower extremity CIPN than those with less severe upper extremity CIPN (p ≤ 0.0001).

Table 6 QLQ-CIPN20 scores in contrasting groups

Responsiveness to change over time

Responsiveness to change was assessed by calculating the effect size based on changes in N08C1 QLQ-CIPN20 sensory and motor scores over time. The effect size (Cohen’s d) based on the change in sensory scale scores was 0.82, reflecting a large effect size and good responsiveness to change over time [28]. The Cohen’s d for the motor scale was moderate at 0.48.

Discussion

Measurement validity is improved when both objective and subjective (patient-reported) CIPN data are obtained. Thus, a reliable and valid patient-reported CIPN measure is needed. The relative value of the QLQ-CIPN20 is that it provides subjective patient-reported information that is not captured by objective physical examination, nerve conduction studies, or quantitative sensory testing. Patient-reported outcome measures, such as the QLQ-CIPN20, are particularly important decision aids which provide information that can assist physicians to determine the need for chemotherapy dosage adjustments prior to each chemotherapy treatment. Such frequent monitoring using more complex objective testing is not feasible due to the associated cost, discomfort, and inconvenience.

This analysis provides additional evidence of QLQ-CIPN20 reliability and validity when used in multi-site studies to assess patient-reported CIPN caused by a variety of neurotoxic chemotherapeutic agents. Sensory, motor, and autonomic scale alpha coefficients were similar to values reported by Postma and colleagues, and were near or above the 0.80 standard for acceptable internal consistency reliability. However, despite the satisfactory alpha coefficients, results from this current study suggest that some of the QLQ-CIPN20 items may be less relevant to CIPN. Autonomic scale items assessing orthostatic hypotension (item 16), blurred vision (item 17), and erectile dysfunction (item 20) correlated poorly with other QLQ-CIPN20 items, suggesting that the autonomic scale is not measuring the same construct as the other questionnaire items. One possible explanation for low inter-item correlations relates to the presence of comorbid conditions in the sample population. Concurrent medication use or impaired fluid balance could lead to orthostatic symptoms that are unrelated to CIPN. Blurred vision can result from steroid use or other ophthalmologic conditions, and erectile dysfunction occurs, relatively commonly, in individuals without evidence of CIPN. In addition, hearing loss item scores (item 18) were not highly correlated with other questionnaire items. CIPN-related hearing loss is most often caused by cisplatin, as opposed to other chemotherapy agents. Since few study participants received cisplatin (n = 34), hearing loss in the sample population was likely due to other causes.

Regarding QLQ-CIPN20 validity, the two-factor structure of the 16-item instrument (following item deletion) explained a high percentage (82 %) of the variance in scores. In contrast to Postma and colleagues’ conceptual grouping of items into sensory, motor, and autonomic scales, the current factor analysis results revealed a different structure whereby most items clearly clustered by distal to proximal extension of signs and symptoms. Items assessing toes and feet symptoms clustered together, as did finger and hand symptom items. Item clustering by upper versus lower extremity symptoms is consistent with the typical clinical pattern of CIPN. Numbness and tingling usually begin in the toes and feet, evidence of a dying-back phenomenon affecting the longest nerves first. As CIPN worsens, signs and symptoms progress proximally and may eventually develop in the upper extremities. Therefore, factor analysis findings suggesting lower and upper extremity scales are consistent with the pathophysiologic progression of CIPN. In addition, patients with upper extremity symptoms should theoretically report more severe lower extremity symptoms than those without upper extremity symptoms, and results from this analysis support this association. The four-factor solution also revealed upper/lower extremity-based item clustering, with pain and cramp items forming a unique factor. This highlights the importance of assessing painful CIPN as a unique problem. Regardless of the number of factors (two vs. four), the factor structure was markedly different from the QLQ-CIPN20’s original sensory, motor, and autonomic neuropathy factor structure. For the future, causal discovery algorithms may be considered as another approach to assess QLQ-CIPN20 structural validity [31, 32].

Correlations were low between all QLQ-CIPN20 scale scores and the sensory CTCAE grade (r = −0.03–0.20). Lower and upper extremity CIPN scale scores also correlated poorly with the CTCAE. Low correlations may have been at least partially related to grading scale floor effects. Most study participants (88 %) had grade 1 or 2 CIPN per the CTCAE version 3.0 (mean 1.84, SD 0.63). Given the CTCAE’s known floor effect, suboptimal sensitivity to detect subtle differences, and lack of emphasis on CIPN pain, poor correlation with higher and more variable QLQ-CIPN20 sensory scale scores (mean 18.10, SD 6.30) is not surprising, especially when considering that over 50 % of the sample was seeking an analgesic intervention for moderate-to-severe CIPN pain. Correlations within a minimally symptomatic population would likely be stronger, because in this case, scores using both measures would be low.

Evidence of QLQ-CIPN20 sensory scale and individual pain item (items 5 and 6) validity is supported by the moderate correlations with all BPI-SF pain severity items (r range 0.36–0.57, p ≤ .0001). Motor scale scores also correlated with BPI items, although less strongly. Future comparisons with more specific measures of motor function are warranted. Additional evidence of instrument validity is present based on the QLQ-CIPN20’s ability to differentiate between individuals receiving and not receiving neurotoxic chemotherapy. Lastly, the QLQ-CIPN20 and the reduced 16-item version were each able to detect changes in CIPN over time in patients receiving increasingly higher cumulative doses of neurotoxic chemotherapeutic agents.

There are several limitations of this research. Lack of control for comorbid conditions and concomitant medication use could have negatively influenced the study’s internal validity. Furthermore, intra-rater (stability) reliability is difficult to evaluate in situations where the condition of interest (CIPN) changes over time, and thus this type of reliability was not evaluated in the current study. Evaluating the associations between QLQ-CIPN20 and CTCAE scores (a scale known for its poor reliability and floor effects) provides a suboptimal assessment of convergent validity. Convergent validity should be re-evaluated in future research via testing of the QLQ-CIPN20’s association with more directly comparable objective and subjective CIPN measures such as the well-validated TNS and the FACT-NTX. Despite these limitations, the data provided, although incomplete, add significantly to the state of knowledge regarding the QLQ-CIPN20’s measurement properties.

Another study limitation is that clustering data obtained from several different studies may not be appropriate because participants from each trial may be too dissimilar. However, we contend that patients from all four trials were similar in key ways. All trials were conducted in a cooperative group setting, with largely the same sites participating. The eligibility criteria for clustered trials of patients having either “received” or “not received” neurotoxic chemotherapy were similar. In addition, since CIPN incidence and severity resulting from neurotoxic chemotherapy were the primary outcomes for all four studies, the clustered sample is very much like what would have been accrued if this had been a prospective study.

In conclusion, the results of this study lend partial support to the reliability and validity of the original EORTC QLQ-CIPN20. A shorter 16-item version was shown to be equally reliable and valid, and has the advantage of being more parsimonious and clinically relevant. More proximal extension (higher upper extremity scores) implies more severe CIPN, whereas higher scores on any of the sensory/motor/autonomic scales cannot be interpreted in the same manner. Additional research is needed to either confirm or refute these findings, so that future consideration may be given to employing a shorter, 16-item QLQ-CIPN20 version.