Introduction

The recent expansion of data mining methods and technology has opened the door to new tools for studying patterns in data. These tools have been applied in a number of fields, including marketing research [1], economic prediction [2], epidemiologic research [3], and health services research [4], but their application to quality-of-life (QOL) research is only recent [5–9]. Such methods present intriguing tools for hypothesis generation in evaluating the role of qualitative processes in quantitative assessment. As QOL measurement matures and methods become increasingly sophisticated, the need for mixed methods that integrate qualitative and quantitative data becomes clear [10, 11]. The present work applies a data mining technique to a large registry of multiple sclerosis (MS) patients to examine evidence of qualitative differences in patient-reported outcomes (PROs) as a function of disease trajectory.

Background on MS

MS is the most prevalent chronic progressive neurological disease among young adults worldwide and in the United States. The National Multiple Sclerosis Society [12] has estimated that there are approximately 400,000 cases of MS in the United States with an incidence of nearly 200 new cases each week. The majority of people with MS are diagnosed between 20 and 50 years of age, and women are affected two to three times as often as men [12].

The clinical course of MS is generally characterized by reversible periods of neurological impairment and disability (relapsing-remitting disease) that may be followed by continuous irreversible impairment and disability (secondary progressive disease), or progression from clinical onset with occasional plateaus or temporary minor improvements (primary progressive disease) [13]. MS is associated with immune dysregulation and neurodegenerative processes [14]. The past decade has witnessed vast development in disease-modifying agents for MS [15–17].

A large number of MS-specific measures have also been developed in this period of time [18–23]. Although these measurement advances are substantial, meta-analyses of the impact of disease-modifying agents suggest that their impact is small [24–26]. It is possible that the magnitude of detected effects is obfuscated by adaptive processes or “response shifts”. Such “response shifts” represent health-related changes in the meaning of measured concepts, due to changes in the individual’s internal standards, values, and conceptualization of the concept(s) being measured [27].

Response shifts would be expected in a patient population that has changeable and unpredictable disease trajectories that affect a broad range of symptoms. For example, an ambulation difficulty such as stumbling in the period after diagnosis might be experienced and reported as “badly off” by an individual with MS, whereas later in the disease experience an ambulation difficulty that requires a walker might be reported as satisfactory functioning. Similarly, the meaning of “severe fatigue” may depend on whether the patient just experienced an exacerbation. This experience or health state can affect the person’s internal standards of what “severe” fatigue means, their values in terms of what activities take higher priority in thinking about their quality of life, and even their conceptualization of key health-related QOL concepts such as role performance (e.g., what is “role performance” if one is in the middle of an exacerbation as compared to in remission? How do expectations of one’s functioning change during an exacerbation as compared to before or much later following remission?).

Response shift as an epiphenomenon

Response shift is an epiphenomenon that is inferred when changes in appraisal explain the discrepancy between expected and observed indicators of QOL [28]. Li and Rapkin [5] recently tested this model using recursive partitioning tree (RPART) analysis in longitudinal data of people with HIV. Their analysis integrated qualitative data from the QOL Appraisal Profile [28] and quantitative data from the MOS-36 [29], revealed distinct patterns in appraisal change that substantially increased the amount of explained variance in mental health outcomes, and revealed complex and non-linear patterns. These distinct appraisal patterns supported the idea of the “contingent true score” [30], where an observed PRO score is comparable across people or across time only if the appraisal processes are similar across comparisons.

It is possible that RPART analysis could be used to investigate appraisal processes even when they are not measured directly. As an extension of this epiphenomenological approach, one could investigate indicators of appraisal changes via emergent patterns of disease-specific PRO subscales in explaining noticeable incongruent generic physical and mental health composite scores. One would expect that disease-specific measures would be less susceptible to response shifts if they query more specific symptoms or functional limitations than generic measures, although this issue has not been addressed empirically to date to our knowledge. Response shifts are more detectable in evaluative measures [30] that are less specific, such as generic measures.

These incongruent patterns of PRO subscales would be reminiscent of the paradoxical discrepancies in QOL valuations of chronic diseases between the general public and patients, e.g., the general public gives a QOL value of 0.39 to dialysis, whereas dialysis patients give a value of 0.56 to their own QOL (e.g., [31, 32]). Some MS patients in our sample may report improved mental health status despite severe impairments in physical functioning. Conceivably, if these patients were asked to evaluate their own QOL on a scale where 0 represents death and 1 represents perfect health, they might assign a number nearer 1 than 0. We may therefore reasonably expect response shift to manifest itself in large registry data as patients reporting high mental health scores despite severe physical limitations or vice versa. Even though the psychometric properties may dictate a low correlation between the PRO subscales we study, we would nevertheless be surprised to find improved mental functioning despite severe physical limitations.

The present work attempts a data mining approach in a large registry sample of MS patients. As this database did not measure appraisal processes or other indicators of response shift directly, response shift is inferred as an epiphenomenon in the interpretation of the apparent discrepancies between mental health and physical functioning. This analytic approach has the potential to be more informative than other response-shift detection methods because it examines all possible relationships, that is, both linear and non-linear predictors, in partitioning patients into homogeneous groups. It thus has the potential to show more complex patterns of relationships that are qualitatively distinct and meaningful. For example, response shift theory predicts that response shift is inferred when changes in appraisal explain the discrepancy between expected and observed QOL. In the present work, which does not have direct measures of appraisal processes, we operationalize response shift as unexpected patterns of contrasting MCS and PCS scores (e.g., PCS deteriorates but MCS is stable or improves).

Methods

Sample

The North American Research Committee on Multiple Sclerosis (NARCOMS) Registry is a self-report Registry developed in 1993 by the Consortium of Multiple Sclerosis Centers and includes over 34,000 individuals who have multiple sclerosis, with over 10,000 updating their data every 6 months. Patients participate by completing either paper or secure web-based survey forms capturing data on demographics, disease characteristics, disability and handicap, treatments, and access to health providers. For the purpose of this project, a NARCOMS sample was drawn of patients who had enrolled in the NARCOMS registry within 1 year of diagnosis and who provided at least annual updates to the Registry (N = 3,839).

Measures

Standardized PRO questionnaires include: (1) the performance scales (PS) measure of disability [18], which includes items for mobility, hand function, fatigue, cognition, bladder/bowel, sensory, spasticity, vision, depression, tremor, and pain. Subscale scores range from 0 to 5, with the exception of mobility, which ranges from 0 to 6; (2) the 9-level categorical patient-determined disease steps (PDDS) measure of disease progression that was adapted from the clinician-reported disease steps [19] and the gold-standard clinical neurological exam, the Expanded Disability Status Scale (EDSS) [33]. PDDS scores range from 0 (normal) to 8 (bedridden) and correlate highly with the EDSS [33]; and (3) the Short-Form 12v2 (SF-12) [34], a generic health measure that yields composite scores for mental health (MCS) and physical functioning (PCS). In addition to the above PROs, patient responses to items on symptom change and relapse experience were used to classify patients’ disease course (i.e., relapsing, stable, or progressive disease; see below).

Statistical analysis

Creating the “relapsing,” “stable,” and “progressive” patient cohorts

We focused on the 2005–2009 data because the more recent data were associated with more complete assessments. Each patient was deemed “relapsing,” “stable,” or “progressive” by the following algorithm on the basis of defining parameters that are consistent with clinical definitions of these subgroups, and as discussed and agreed upon with a senior neurologist involved in the project (TV). We first identified each patient’s latest assessment and went back in time in a retrospective review of symptoms for up to 2 years. Patients were identified as part of the “actively relapsing” group if they reported any relapse within 2 years (n = 1,582, 53% of the sample). Patients were identified as “progressive” if they reported no relapse within 2 years and reported worsening symptoms at least once in the past 2 years (n = 639, 21% of the sample). Patients were identified as “stable” if they reported no worsening of symptoms and no relapse for all consecutive assessments over the duration of up to 2 years (n = 787, 26% of the sample). The stable group was used as a comparison group in the analysis under the assumption that they would not be prone to response shifts in perceived QOL.
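The cohort rules above can be sketched as a small function. This is only an illustration of the stated logic, not the registry's actual code: the `Assessment` record and its two boolean fields are hypothetical names, and the list passed in is assumed to hold only the assessments inside the 2-year look-back window.

```python
from typing import List, NamedTuple


class Assessment(NamedTuple):
    # Hypothetical record layout; field names are assumptions for illustration.
    relapse: bool    # patient reported a relapse at this assessment
    worsening: bool  # patient reported worsening symptoms at this assessment


def classify_cohort(window: List[Assessment]) -> str:
    """Classify a patient from the assessments in the 2-year look-back window."""
    # Any relapse in the window takes precedence over everything else.
    if any(a.relapse for a in window):
        return "relapsing"
    # No relapse, but worsening symptoms reported at least once.
    if any(a.worsening for a in window):
        return "progressive"
    # Neither relapse nor worsening at any assessment in the window.
    return "stable"
```

Checking relapse first mirrors the order of the definitions: a patient with both a relapse and worsening symptoms falls in the relapsing cohort.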

RPART model specifications

We used the recursive partitioning and regression trees method [35, 36] in R software [37] to model the changes in the SF-12 PCS and MCS scores. We fitted RPART trees for the three cohorts separately, using PCS scores as the dependent variable, and a set of 42 clinically relevant covariates including use of 29 treatment or symptom relief medications (e.g., baclofen, ditropan), 11 items from the performance scales reflecting physical and psychosocial function (e.g., mobility, fatigue, cognition), the PDDS disability score, and time since diagnosis. These covariates were selected on the basis of clinical logic and previous knowledge. Because patients provided multiple SF-12 assessments (median = 5), an analysis based on the full dataset would violate the assumption of independent assessments. We did not use the longitudinal RPART approach (“long RPART” software program [38]) that is designed to classify cases by growth curves (e.g., by each person’s growth curve intercept and slope fitted to all available assessments over the 5-year period). Preliminary graphing of the longitudinal PCS scores revealed that the individual growth curves over time were highly variable. We were not able to identify visibly linear or quadratic growth patterns over time, suggesting that the “long RPART” approach was unlikely to yield insightful classifications. Thus, to avoid double counting the patients, we randomly sampled two consecutive assessments per patient and calculated each patient’s change scores on the PCS scores as well as changes in functional indicators.
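The sampling of one pair of consecutive assessments per patient might look like the following sketch. It is an assumption-laden illustration rather than the study's code: each assessment is represented as a dictionary of numeric scores, the key `pcs` and the `d_` prefix are hypothetical names, and the list is assumed to be in chronological order.

```python
import random
from typing import Dict, List, Optional


def sample_change(assessments: List[Dict[str, float]],
                  rng: random.Random) -> Optional[Dict[str, float]]:
    """Pick one random pair of consecutive assessments and return change scores."""
    if len(assessments) < 2:
        return None  # a change score needs at least two assessments
    # Choose the index of the earlier assessment of the pair at random.
    i = rng.randrange(len(assessments) - 1)
    first, second = assessments[i], assessments[i + 1]
    # Change is later minus earlier, for every score present at both times.
    change = {f"d_{key}": second[key] - first[key] for key in first if key in second}
    # Keep the earlier PCS score so that it can serve as the baseline covariate.
    change["pcs_baseline"] = first["pcs"]
    return change
```

Each patient then contributes exactly one row to the tree model, which is what keeps the observations independent.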

The RPART [35] model fitted the changes in PCS component scores as a function of the abovementioned 42 predictors, representing the changes in physical and psychosocial functional indicators between randomly selected consecutive assessments. We included each patient’s initial PCS scores to control for ceiling and floor effects and to simplify the interpretation of the magnitude of changes.
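For readers unfamiliar with regression trees, the core step of choosing a single cut-point for a continuous outcome can be sketched as below. This is a simplified stand-alone illustration of a sum-of-squares splitting criterion, not the RPART implementation itself; the function names are hypothetical, and it handles one predictor with no missing values.

```python
from typing import List, Optional, Tuple


def sse(values: List[float]) -> float:
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x: List[float], y: List[float]) -> Optional[Tuple[float, float]]:
    """Return (cut_point, reduction in SSE) for the best split of the form x < cut."""
    pairs = sorted(zip(x, y))
    total = sse([outcome for _, outcome in pairs])
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut-point exists between tied predictor values
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [outcome for _, outcome in pairs[:i]]
        right = [outcome for _, outcome in pairs[i:]]
        gain = total - sse(left) - sse(right)
        if best is None or gain > best[1]:
            best = (cut, gain)
    return best
```

A full tree repeats this search over every predictor and then recursively within each resulting subgroup, which is how cut-points such as a baseline PCS threshold arise from the data rather than from prior specification.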

The full sample of 3,839 patients in the 2005–2009 registry was first analyzed by a preliminary model with all default parameters of the RPART statistical procedure (found in the RPART software documentation included with the RPART download). This served to identify cases deemed by RPART to contain no usable information, that is, cases in which the outcome variable is missing and/or all predictors in the preliminary RPART model are missing. These cases contain insufficient information for RPART to make a classification. A total of 831 patients were excluded on this basis, leaving an analytic sample of 3,008 patients.
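The exclusion rule described above amounts to a simple filter. The sketch below is illustrative only: records are assumed to be dictionaries in which a missing value is stored as `None`, and the function and argument names are hypothetical.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[float]]


def is_usable(record: Record, outcome: str, predictors: List[str]) -> bool:
    """A case is usable if the outcome is present and at least one predictor is observed."""
    if record.get(outcome) is None:
        return False  # missing outcome: nothing to model
    # At least one non-missing predictor is needed to place the case in a branch.
    return any(record.get(name) is not None for name in predictors)
```

Applied to the registry extract, a filter of this kind would account for the reduction from 3,839 to 3,008 patients (831 excluded).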

Separate RPART trees were fitted for each patient cohort. We followed the general approach in RPART analysis: (1) a stopping rule for a terminal node (20 observations), (2) tenfold cross-validation carried out automatically, (3) pruning by the result of the tenfold cross-validation and the one-standard-deviation rule [39] in the cost-complexity criterion, (4) specification of priors (proportional to data counts), and (5) handling of missing data by surrogate splits. The following example provides an intuitive explanation of surrogate splits. Suppose a person has a missing value in a model that partitions an outcome variable Y by predictor variables A, B, and C. Assume also that this person’s missing value is found in predictor A, so that his or her branching by A cannot be determined. The model has to rely instead on this person’s non-missing values in B and C as surrogate splits. An agreement is calculated between the classifications based on the primary split of A and those based on the surrogate splits B and C (on cases with non-missing A). The surrogate split with the highest agreement is chosen, and the case is classified accordingly.
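The surrogate-split idea in the example above can be sketched as follows. This is a deliberately simplified illustration, not the RPART algorithm: it assumes every split has the form "value < cut", ignores refinements of the real procedure, and uses hypothetical function names with `None` marking a missing value.

```python
from typing import Dict, List, Optional, Tuple

Column = List[Optional[float]]


def agreement(primary: Column, primary_cut: float,
              surrogate: Column, surrogate_cut: float) -> float:
    """Share of cases, observed on both variables, that both splits send the same way."""
    same = total = 0
    for a, s in zip(primary, surrogate):
        if a is None or s is None:
            continue  # agreement is computed only where both values are observed
        total += 1
        if (a < primary_cut) == (s < surrogate_cut):
            same += 1
    return same / total if total else 0.0


def best_surrogate(primary: Column, primary_cut: float,
                   candidates: Dict[str, Tuple[Column, float]]) -> str:
    """Name of the candidate split that agrees most often with the primary split."""
    return max(candidates,
               key=lambda name: agreement(primary, primary_cut, *candidates[name]))
```

A case with a missing value on the primary variable A is then sent left or right according to its value on the winning surrogate.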

RPART performs a tenfold cross-classification by default to help evaluate the reliability of the tree model. The full sample is randomly divided into 10 sub-samples. Internally, the full RPART tree is carried out with 90% of the full sample, and the remaining 10% of the sample is used as a validation dataset to calculate a cross-classification error rate. This procedure is repeated 10 times, each time with 9 subsets as the modeling dataset and the remaining 1 subset as the validation dataset. We used the results of this tenfold cross-validation to prune the full tree back down to a more parsimonious model by the 1-SD rule. Additional technical details can be found in [35]. These considerations were similar to our prior work [5].
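The tenfold procedure and the subsequent pruning choice can be illustrated with the sketch below. It is not the RPART internals: the function names are hypothetical, the fold assignment is a generic shuffle, and the pruning helper assumes the candidate trees are listed from simplest to most complex with their cross-validated errors and standard errors already computed.

```python
import random
from typing import List, Tuple


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Split case indices 0..n-1 into k (modeling, validation) pairs."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal sub-samples
    splits = []
    for i in range(k):
        validation = folds[i]
        modeling = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((modeling, validation))
    return splits


def one_se_choice(cv_error: List[float], cv_se: List[float]) -> int:
    """Index of the simplest tree whose CV error is within one SE of the minimum."""
    best = min(range(len(cv_error)), key=cv_error.__getitem__)
    threshold = cv_error[best] + cv_se[best]
    return next(i for i, err in enumerate(cv_error) if err <= threshold)
```

The second helper captures the intent of the one-standard-error style of rule: among trees whose cross-validated error is statistically indistinguishable from the best, prefer the smallest.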

Interpretation of RPART findings

We report the full RPART trees, rather than the trees pruned by the 1-standard error rule in cross-validation error [35, 36], because pruning is likely to omit small groups of patients with subtle QOL changes. We wanted to better examine the sizes of these potentially small clusters to inform the relative scale of response shift within each patient cohort. The interpretation of changes in physical and mental functioning was generally based on the scales of population norms for these summary scores, with a mean of 50 and a standard deviation of 10 [40]. To the best of our knowledge, there is no consensus yet on the minimally important difference in the SF-12 component scores for MS patients. Thus, in evaluating the RPART findings, we generally considered a 5-point change or greater a medium change (i.e., one-half standard deviation), and a change of 10 points or greater a large change (i.e., one standard deviation). These reference points provide useful guidance for clinically meaningful change to identify clusters of patients who report considerable changes in physical functioning despite their mental health scores remaining unchanged. We primarily focus on these salient patterns of PCS and MCS scores and the corresponding sizes of these patient clusters. Response shift was inferred by qualitative differences in thresholds, content, and order of disability domains that were retained by the RPART analysis. We operationalize response shift quantitatively as unexpected patterns of contrasting MCS and PCS scores (e.g., PCS deteriorates but MCS is stable or improves). We typically offer no remarks on patient clusters with change scores less than this minimally important difference (MID) of 5 points.
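The reference points above translate into a simple labeling rule, sketched here with hypothetical function names. The thresholds of 5 and 10 points come from the text (one-half and one standard deviation on a scale with SD = 10); the gap helper simply expresses how far apart the PCS and MCS change scores of a terminal node are.

```python
def magnitude(change: float) -> str:
    """Label a component-score change against the 5- and 10-point reference points."""
    size = abs(change)
    if size >= 10:
        return "large"      # at least one standard deviation
    if size >= 5:
        return "medium"     # at least one-half standard deviation
    return "below MID"      # not remarked upon in the Results


def pcs_mcs_gap(pcs_change: float, mcs_change: float) -> float:
    """Absolute difference between the PCS and MCS change scores of one terminal node."""
    return abs(pcs_change - mcs_change)
```

These helpers only describe magnitudes; deciding whether a given pattern is "unexpected" remains an interpretive step.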

Results

Sample characteristics

Table 1 shows the demographic characteristics of the study sample. The sample consisted of 3,008 patients with a mean age of 42.4, of whom 83% were female. Time since diagnosis was less than 1 year in 42 percent of the sample, and greater than 1 year in 58 percent of the sample.

Table 1 Sample demographics

Generally, we interpret the RPART trees by first identifying salient terminal nodes that show the greatest changes in PCS and MCS component scores. The size of the identified terminal nodes is noted. We then focus on tracing the binary splits in the RPART tree for changes in disease management and functional status that contribute to such extreme changes in quality of life assessments.

PCS trees

Figure 1 shows the full RPART tree modeling the PCS changes over time among “relapsing” patients. All splits including the initial split are data driven. None of the treatment and symptom-management covariates were deemed predictive of PCS change. The greatest PCS change was observed in the leftmost terminal node of 7 “relapsing” patients, who reported an average reduction of 19 points between the two assessments. These patients’ baseline PCS scores were already extremely low (less than 23.84, after the first two consecutive splits to the left) and their self-reported bladder/bowel functioning deteriorated considerably (an increased disability of greater than 1.5; e.g., from 2 = “mild bladder/bowel disability” to 4 = “severe bladder/bowel disability”). A group size of 7 is small relative to the full sample. However, a PCS score near zero means that these patients reported severe limitations in all aspects of the PCS domain. Also plotted in bold font at the bottom are the average MCS change scores for the patients in the corresponding terminal nodes. The 7 patients in the leftmost node reported a small improvement of 4.6 points in MCS. The neighboring end cluster, the 101 patients who reported a reduction of 7.05 points in PCS from a score of less than 23.8 at time 1, also reported a small 3.4-point improvement in MCS.

Figs. 1–3

Figures 1 through 3 present the pruned RPART trees for changes in SF-12 PCS scores over time for relapsing, progressive and stable patients, respectively. The initial cut-point represents the most important interaction term for distinguishing homogeneous patient groups within the disease-trajectory grouping, and the branches that follow indicate the interaction terms that create increasingly homogeneous patient groupings. Note that if a statement is true (e.g., PCS at time 1 < 30.85 in Fig. 1, top branch), the group for whom that statement is true falls on the left side of the tree; if false, on the right side of the tree. The final groupings reflect patient groups who share cut-points, content and number of domains, and order of domains in predicting their PCS change scores. MCS change scores are also added for reference in bold. These tree branches thus reflect latent appraisal processes, and thus contingent true scores on PCS.

Another large change was an increase in PCS of 17.04 points among 13 patients on the other side of the initial split, whose initial PCS was above 30.85 and who had maintained mobility (no more than a 2.5-point reduction, e.g., from 0 = “Normal” to 2 = “Mild gait disability” but no more than 3 = “Occasional use of cane or unilateral support”). These patients’ MCS scores improved by 1.4 points, smaller than the 4.6-point MCS improvement among the 7 patients described above. Another noteworthy group is the second terminal node from the right, comprising 14 patients whose average PCS scores increased by 13.79 points and whose average MCS scores decreased by 1.02 points. Although these patients did not maintain mobility, their PCS change was noticeably similar to the change of 17.04. However, the increase of 13.79 points in PCS among these 14 patients arose from a different configuration of contributing factors: lesser degrees of pain, a baseline PCS greater than 36.88, and lesser hand disability. Something similar occurs for the subgroups n = 196/PCS change = 4.092 and n = 134/PCS change = 3.316. They also result from different pathways but seem to have similar PCS and MCS change scores. These examples indicate that patients may reach similar physical and mental QOL changes by way of disparate pathways of physical symptoms and limitations.

Overall, PCS change among “relapsing” patients was strongly affected by baseline PCS, bladder/bowel disability, mobility, pain, and hand disability, as well as PDDS scores (PCS plus 5 domains). The terminal node groups in the middle of the tree showed change scores below the MID and thus no change in this sampling of time points. These groups included the largest numbers of “relapsing” patients (∆ = −0.099, n = 710 and ∆ = 3.316, n = 134). Using a PCS change of 5 points as a crude guide, we found unexpected patterns of contrasting MCS and PCS scores in 135 patients, in two patient clusters of decreased PCS scores accompanied by increased MCS scores (n = 7 and 101), and in two patient clusters of increased PCS scores accompanied by largely unchanged MCS scores (n = 13 and 14). Thus, whereas the abovementioned patients showed discrepancies in their PCS and MCS change scores (i.e., response shifts), the remaining 1,250 patients in the “relapsing” patient cohort (90%) show no remarkable patterns of response shift.

Figure 2 shows the full RPART tree modeling the PCS changes over time among “progressive” patients. None of the treatment and symptom-management covariates or PDDS was predictive of PCS change. Further, the greatest PCS change was observed in the leftmost terminal node of 8 “progressive” patients, who reported an average reduction of 16.9 points between the two assessments. These patients’ baseline PCS scores were already extremely low (less than 29.4, after the first splits to the left), their self-reported pain deteriorated a small amount (an increased pain of greater than 0.5; e.g., from 2 = “mild pain” to less than 3 = “moderate pain”), and they reported a slight increase in cognitive limitations (e.g., from 1 = “minimal cognitive disabilities” to 2 = “mild cognitive disabilities”). Although this is also a small group relative to the full sample, their PCS score nearing zero reflects severe limitations in all aspects of the PCS domain. Despite their severe physical limitations, these patients reported a 3.5-point improvement in MCS scores, reflecting a 20.42-point discrepancy in PCS and MCS scores. To the right of this group were 12 patients who reported an 8.8-point PCS reduction and stable cognitive symptoms, yet they reported a 4.7-point improvement in MCS. The greatest positive change was an increase in PCS of 3.5 points among 103 patients whose initial PCS was above 33.9, and who had maintained low pain disability (no more than a 0.5-point reduction, e.g., from 3 = “Moderate pain disability” to 2 = “Mild pain disability”). Their MCS change score was near zero.

Overall, PCS change was strongly affected by baseline PCS, pain, and cognitive symptoms. Several terminal node groups totaling over 500 patients showed unremarkable PCS change scores, with the most prominent being the 291 patients with a mean PCS change score of −0.46 points. Again, using a PCS change of 5 points as a crude guide, we found unexpected patterns of MCS and PCS scores in 45 patients: in two patient clusters of decreased PCS scores accompanied by increased MCS scores (n = 8 and 12), and in one patient cluster of decreased PCS scores accompanied by unchanged MCS scores (MCS ∆ = −0.79, representing a difference of 5.4 from PCS ∆ of −6.2; n = 25). The remaining 539 patients in this cohort (92%) showed no remarkable patterns of response shift.

Figure 3 shows the full RPART tree modeling the PCS changes over time among “stable” patients. Unlike in the other cohorts, one symptom-management covariate was deemed predictive of PCS change: patients who reported not using Neurontin and whose baseline PCS score was higher than 46.4 (i.e., normal relative to the general population) had the highest observed PCS change indicating improved function (∆ = 6.099, n = 9). However, their average MCS score decreased by 4.83 points. The “stable” patients who reported the greatest PCS decrease (n = 21) were in the leftmost terminal node, with an average reduction of 7.98 points between the two assessments. These patients’ baseline PCS scores were substantially below the general population norms (less than 31.5 points), and they reported a small amount of deterioration on pain and bladder/bowel disability (less than 0.5 change in both subscales). Another noteworthy group is the second terminal node from the right, comprising 12 patients whose average PCS scores increased by 5.43 points, but who used Neurontin and who maintained low pain disability (more than a 1.5-point reduction, e.g., from 3 = “Moderate pain disability” to 1 = “Minimal pain disability”). These 12 patients reported a slight reduction of 1.39 in MCS scores.

Overall, PCS change was strongly affected by baseline PCS, pain, mobility, and PDDS score (PCS plus 3 domains). Several terminal node groups totaling 469 patients showed unremarkable change scores, with the most prominent being the 308 patients with a mean PCS change score of 0.27 points. Patterns of response shift among “stable” patients are subtle. Again, using unexpected patterns of contrasting MCS and PCS scores as the primary method to detect response shift, we found only one cluster of 9 patients (endpoint to the furthest right) whose physical functioning had improved by 6.099 points while their mental health scores had decreased by nearly 5 points. The remaining 609 patients (98%) showed no remarkable patterns of response shift.

Possible evidence of response shift

Table 2 summarizes the operationalizations of the three aspects of response shift as well as the findings that may support response shift hypotheses. There appeared to be differences in RPART trees across disease-trajectory groups with regard to patterns of PCS and MCS suggestive of recalibration, reprioritization, and reconceptualization.

Table 2 Summary of qualitative indicators of response shifts in RPART analysis

Recalibration response shift was inferred by different group-specific thresholds for cut-points in the RPART trees. For example, the first branching for “stable” patients occurs at a baseline PCS score of 46.4, a value near the general population norm, while the first branching for “relapsing” patients occurs at a much lower baseline PCS score of 30.85. What constitutes a reliable split by baseline PCS in one patient group’s configuration of QOL changes may be considerably lower or higher than another group’s QOL changes, above and beyond what may be expected due solely to measurement error.

Reconceptualization response shift was inferred by changes in the content and/or number of domains by group in the tree over time. For PCS, the statistically relevant disability domains differed by disease-trajectory group, with “relapsing” patients’ trees showing the greatest number of relevant domains (4 domains, PDDS), followed by “stable” patients (2 domains, PDDS, symptomatic therapy), and then “progressive” patients (2 domains).

Reprioritization response shift was inferred by changes in the order of domains in tree pathways over time. For PCS, the disability domains that were relevant across groups entered the RPART tree branches at different levels, supporting a reprioritization response shift. For example, changes in pain disability strongly affected the PCS change scores for “stable” as well as “relapsing” patients, suggesting that pain was high on the priority of change in physical disability. However, the effects of pain were only relevant to “relapsing” patients whose mobility was not severely impaired over time.

Discussion

This investigation applied the RPART data mining technique to identify plausible patterns of response shifts in physical health change scores in MS patients distinguished by disease trajectory. Based on the extensive quantitative data analysis, response shift was inferred by qualitative differences in thresholds, content, and order of disability domains that were retained by the RPART analysis. We conclude that there are observable patterns of emergent response shift in unanticipated PCS and MCS scores attributable to different appraisal processes among all three patient cohorts. This work suggests that the magnitude of detected effects may be obfuscated by adaptive processes or response shifts. The tree analysis shows intriguing evidence that changes in pain disability contribute importantly to the physical functioning of all three cohorts of patients. Idiosyncratic patterns of physical functioning changes are also observable. For example, both “relapsing” and “stable” patients’ changes in physical functioning are also affected by bladder and bowel symptoms and the PDDS. “Progressive” patients’ PCS change scores seem to be affected strongly by limitations in spasticity and cognitive limitations and less so by bladder/bowel symptoms and the PDDS stage. Further, only stable patients evidence an effect of a symptomatic treatment, whereas the trees for both relapsing and progressive patients do not suggest such an effect. This lack of treatment effect may imply that in MS, treatment benefits are shown by lack of change (i.e., stability) rather than improvement, since the disease is chronic and progressive.

These findings suggest that the change scores evidenced by this sample on the PCS and MCS measures are being obfuscated by response shifts and that the contingent true scores for PCS change are not comparable across patient groups. If we accept the use of unexpected PCS and MCS score patterns as indicators of response shift, and the use of terminal node sizes as practical measures of the magnitude of response shift within a specific patient cohort, then we can determine that overall 20% of patients demonstrated response shift using an MID of at least 5 points on the SF-12v2, with 10% in the “progressive” cohort, 8% in the “relapsing” cohort, and 2% in the “stable” cohort. This pattern seems consistent with our intuitive notion of the quality-of-life differences across these three groups. For example, we would expect that “stable” patients show the lowest level of response shift as compared to the other two groups, mainly because they have to show no worsening of symptoms and no relapse for all consecutive assessments over the duration of up to 2 years. This method may be more sensitive to response shift detection than the other response shift detection methods used on this same patient sample [41, 42]. Future research should triangulate this response-shift detection approach with direct measures of (changes in) appraisal to confirm that this approach of inferring changes in appraisal truly reflects measured changes in appraisal when these data are available.

The application of RPART to response shift research is relatively recent [5] but shows promise for identifying patterns that underlie paradoxically small change scores over time in PROs. At this stage in the application of this method, and in the context of not having direct measures of appraisal, we infer differences in appraisal as a function of differences in thresholds, content, and order of domains included in the trees. Such an analytic exercise is highly exploratory and is prone to classification errors when no direct measures of appraisal are available. It should be noted that although the RPART method has numerous cross-validation steps that would minimize chance findings, the analytic method is exploratory in nature, and findings would be best confirmed in independent samples. Future research should evaluate whether this inference is supported in data sets that include the QOL Appraisal Profile [28] over time, to assess whether RPART analyses that include or exclude appraisal generate similar conclusions about the aspects of response shift reflected in the data. Future research could also codify the types of response shift as distinguished from other types of change, similar to Oort’s seminal work codifying the application of SEM to response shift detection [43]. Such codification is critical for a firm scientific foundation for identifying response shift by testing alternative explanations, as well as for providing unbiased estimates of longitudinal changes in PROs.