Huntington disease (HD) is an autosomal dominant neurodegenerative disease that causes motor, behavioral, and cognitive impairments; symptoms typically begin in midlife and progress to death within 20 years [1, 2]. End-of-life concerns may begin when patients become aware of their at-risk status, and are magnified after predictive testing reveals a gene mutation positive status, or after a clinical diagnosis of HD [3]. Experiences with the progression of disease and death in other family members [4] impact the perspectives of at-risk and affected individuals about their own end of life (EOL) [4] as well as health-related quality of life (HRQOL) [3]. Individuals at risk for HD, as well as individuals across the full HD disease spectrum (from those with no symptoms to those in the later stages of the disease), have identified EOL concerns as an important component of HRQOL [3]. Specifically, qualitative research in individuals with HD has identified two important components of HRQOL: EOL planning (including family planning, financial planning, and planning for palliative care) and concerns about how EOL affects the entire family (both in watching other family members suffer and die from this disease, and in the burden that their own disease may place on other family members) [3]. A quantitative measure of EOL concerns in HD would facilitate our understanding of their relevance to HRQOL, and of their sensitivity to treatments or interventions [5–9]. An ideal HD-specific EOL measure should be appropriate for patients at all stages of the disease process, from the pre-symptomatic or prodromal period to the late stages, when cognitive decline may impact comprehension and judgment about EOL issues [10].
Such a tool could, in turn, assist health-care providers in initiating discussions about EOL decision-making, help them to determine at what point patients would be most receptive to EOL discussions [11], and increase their understanding about how EOL beliefs change over the disease course [12].

Several measures exist which were originally intended to measure HRQOL, but these measures were either developed for use in other diseases such as cancer (e.g., revised Hospice quality of life index [13]; death and dying distress scale [14, 15]; McGill quality of life questionnaire [16, 17]; QUAL-EC [18]) or are overly generic (e.g., CANHELP Lite [19]; EOL-PRO [20]; the Missoula-VITAS quality of life index [21]; palliative patients’ dignity scale [22]; patient needs assessment in palliative care [23]; valuation of life [24]; QUAL-E [25]). These tools do not capture EOL concerns specific to HD (e.g., concerns related to watching other family members suffer and die from the disease; concerns about the burden that having HD places on other family members; concerns about one’s children inheriting the disease; the fact that there is a gene test that can accurately predict who will get symptoms, but not when), take too long to administer (e.g., the CANHELP [26]), and/or have substandard psychometric properties [6–8]. In addition, all of these measures neglect the impact of EOL concerns on HRQOL during the earlier stages of a neurodegenerative disease.

To address these shortcomings, this study focused on developing new measures that could capture the EOL concerns reported by individuals with HD, their caregivers, and clinical providers [3]. Specifically, we used state-of-the-science psychometric methods to create calibrated item banks that are comprised of numerous items that allow for administration either as a computerized adaptive test (CAT) or as a static short form, administration options that provide accurate measurement with low response burden [27]. Below, we highlight the development of two new measures of EOL concerns, which are part of a new measurement system, the HDQLIFE [28].

Methods

Individuals with prodromal or manifest HD were invited to participate in this study. Participants were at least 18 years old, able to read and understand English, had a positive test for the CAG expansion for HD (HD is caused by an expansion of CAG repeats in the HD gene [HTT]) and/or a clinical diagnosis of HD, and had the ability to provide informed consent. In cases where there were concerns about the cognitive capacity of a potential participant, the Orientation Log-HD (O-Log-HD) was administered. The O-Log-HD was adapted from the Orientation Log (O-Log) [29] and provides an assessment of mental status; possible scores range from 0 to 30, and participants with scores <25 were not eligible to participate in the study. Participants were recruited from several specialized HD treatment centers (the University of Michigan, the University of Iowa, the University of California Los Angeles, Indiana University, Johns Hopkins University, Rutgers University, Struthers Parkinson’s Center, and Washington University), through electronic medical records [30], the National Research Roster for Huntington’s Disease, and articles/advertisements in HD-specific newsletters and Web sites. Additionally, the majority of the prodromal HD participants in this study were recruited through the PREDICT-HD study [31–33], a longitudinal prospective study (over 30 sites worldwide) examining the clinical markers of prediagnostic (i.e., prodromal) HD; this cohort includes over 700 well-characterized individuals with prodromal HD.

HDQLIFE end-of-life item pool

Sixty-nine items that examined concerns with EOL were developed through an iterative process [28]. Item content was derived in conjunction with the Neuro-QoL project [34] and drew on literature reviews [34, 35], focus group data in HD, and expert input [3]. Items were refined through expert review, translatability review, and cognitive interviews with individuals with HD following established methodology [36]; Fig. 1 documents this iterative process. The final item pool comprised 45 items.

Fig. 1 Procedures to develop the new End of Life Concerns item pool

Participant characterization

The total functional capacity (TFC) scale [37] from the Unified Huntington’s Disease Rating Scale (UHDRS) [38] was administered to all participants. The TFC is a clinician-administered 5-item scale designed to evaluate day-to-day functioning across the domains of occupation, finances, domestic chores, activities of daily living, and care level. Scores range from 0 to 13, with higher scores indicating better functioning. Participants with an HD diagnosis were classified as either early-stage (TFC sum scores of 7–13; stages 1 and 2) or later-stage HD (TFC sum scores of 0–6; stages 3–5).
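The two-group staging rule above can be written as a small function. This is a minimal sketch for illustration only; the function name and labels are hypothetical and are not part of the UHDRS or of the study's software.

```python
def classify_hd_stage(tfc_sum: int) -> str:
    """Map a TFC sum score (0-13) to the two groups used in this study.

    Early-stage: TFC 7-13 (stages 1 and 2); later-stage: TFC 0-6 (stages 3-5).
    """
    if not 0 <= tfc_sum <= 13:
        raise ValueError("TFC sum score must be between 0 and 13")
    return "early-stage" if tfc_sum >= 7 else "later-stage"


if __name__ == "__main__":
    for score in (13, 7, 6, 0):
        print(score, classify_hd_stage(score))
```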

Analysis approach

Unidimensionality

Factor analyses were used to establish the unidimensionality of the item pool. First, our sample was randomly divided into two data sets. Exploratory factor analysis (EFA) with a PROMAX rotation was used to determine the number of factors within the item pool according to eigenvalues (>1) and the number of factors before the break in the scree plot. Item loadings were used to assign items to their associated factor (criterion >0.4). Confirmatory factor analysis (CFA) using robust weighted least squares estimation for ordinal data was then conducted to confirm the factor structure determined from the EFA results [39, 40]. Good fit was established as a comparative fit index (CFI) >0.90, Tucker–Lewis index (TLI) >0.95, root mean square error of approximation (RMSEA) <0.1 [41–44], and residual correlations <0.15 (i.e., maintain local independence) [45–47]; these fit indices meet established standards for CFA when it is applied to PRO development [47]. In addition, Cronbach’s alpha was examined to determine acceptable reliability of the measure (i.e., >0.80). EFA and CFA analyses were conducted using Mplus 6.11 [48].
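For readers unfamiliar with the reliability criterion above, Cronbach’s alpha can be computed from item variances and the variance of the summed score. The sketch below uses only the Python standard library; the response matrix is hypothetical illustration data, not study data, and the study itself used Mplus rather than this code.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    """
    n_items = len(rows[0])
    if n_items < 2:
        raise ValueError("alpha requires at least two items")
    item_vars = [variance([row[j] for row in rows]) for j in range(n_items)]
    total_var = variance([sum(row) for row in rows])
    return n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)


if __name__ == "__main__":
    # Hypothetical 5-point responses from five respondents on four items.
    responses = [
        [1, 2, 1, 2],
        [2, 2, 3, 2],
        [3, 4, 3, 3],
        [4, 4, 5, 4],
        [5, 5, 4, 5],
    ]
    alpha = cronbach_alpha(responses)
    print(f"alpha = {alpha:.2f}", "(acceptable)" if alpha > 0.80 else "(below 0.80)")
```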

Item response theory (IRT) analyses

The finalized item pools were then calibrated using Samejima’s graded response model (GRM) [49]; these analyses were conducted in IRTPRO 2.1 [50]. This analysis estimated item threshold and item slope parameters, which were then used to calculate information functions for individual items and for the entire item bank, in order to characterize measurement precision along the measurement continuum at both the item and scale levels. Differential item functioning (DIF) analyses were used to evaluate the stability of measurement properties for each individual item between subgroups, using IRT-scaled score-based ordinal logistic regression [51]. DIF analyses were conducted using the lordif package (version 0.3-2) within R [52]. DIF was evaluated on gender, age (≤40 vs. >40 years; ≤50 vs. >50 years), and education (high school graduate or less vs. >high school). Items with DIF (non-negligible DIF criterion: R² > 0.02 and p < 0.01) were discussed by the study team and were candidates for exclusion. Firestar CAT simulation software [53] was used to conduct simulation analyses to: (1) determine the number of items administered by the CAT at different levels of the trait and (2) examine the relationship between the simulated CAT score and scores derived using all items in the bank.
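To make the roles of the slope and threshold parameters concrete, the following sketch computes response-category probabilities under a graded response model. It is a minimal illustration of the standard logistic GRM form, not the IRTPRO implementation; the slope and threshold values are hypothetical and are not taken from the calibrated item banks.

```python
import math


def grm_category_probs(theta, slope, thresholds):
    """Category probabilities for one item under a logistic graded response model.

    thresholds must be in increasing order; an item with m thresholds has
    m + 1 ordered response categories.
    """
    # Cumulative probability of responding in category k or higher.
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-slope * (theta - b))) for b in thresholds]
    cumulative.append(0.0)
    # Probability of each category is the difference of adjacent cumulatives.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]


if __name__ == "__main__":
    # Hypothetical item: slope 2.0, thresholds at -1, 0, and 1 (theta metric).
    for theta in (-2.0, 0.0, 2.0):
        probs = grm_category_probs(theta, slope=2.0, thresholds=[-1.0, 0.0, 1.0])
        print(theta, [round(p, 3) for p in probs])
```

A steeper slope concentrates the category transitions in a narrower band of theta, which is why items with low slopes contribute little information and were candidates for removal.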

Other demographic comparisons

We collected demographic information on age, gender, education, and race. Pearson correlations between the new HDQLIFE measures and demographic variables (i.e., age and education) were examined. In addition, an independent sample t test was conducted to determine whether there were significant gender differences for these HDQLIFE measures.

Sample size considerations

Study sample size was determined based on sample size requirements for IRT, DIF, EFA, and CFA analyses. When using graded response models (GRM), larger sample sizes produce more stable parameter estimation [49, 54]. In general, established standards suggest that a minimum of 5–10 individuals are needed for every item within an item pool in order to establish stable parameter estimates [55–57]; thus, 500 individuals were needed for reliable item response theory (IRT) calibration data. Established standards for differential item functioning (DIF) analyses (an indication of item bias) suggest that at least 200 participants are needed within each condition; considering these parameters, sampling stratification targeted age (<40 vs. ≥40 and <50 vs. ≥50), gender (male vs. female), and education (<high school vs. ≥high school) [58]. Finally, EFA and CFA analyses recommend the inclusion of ~5 people per item analyzed [55, 57]; thus, 250 individuals were needed for each of the EFA and CFA analyses (5 individuals for ~50 items per item pool).
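The arithmetic behind these targets is a simple per-item multiplier. The sketch below reproduces it under the assumption of roughly 50 items per pool stated above; the function name is hypothetical.

```python
def minimum_sample(n_items: int, people_per_item: int) -> int:
    """Rule-of-thumb sample size: a fixed number of respondents per item."""
    return n_items * people_per_item


if __name__ == "__main__":
    approx_items = 50  # approximate pool size used in the text
    print("EFA or CFA (5 per item):", minimum_sample(approx_items, 5))
    print("IRT calibration (10 per item):", minimum_sample(approx_items, 10))
```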

Results

Five hundred and seven (507) individuals with prodromal or manifest HD participated in this study. Participants were sampled to represent the entire continuum of HD symptomatology; 196 individuals had prodromal HD (CAG > 35, but did not yet have an HD clinical diagnosis), 193 had early-stage HD (sum scores of 7–13 on the TFC), 117 had later-stage HD (sum scores of 0–6 on the TFC), and 1 participant was not classifiable. Participants ranged in age from 18 to 81 years (M = 49.01, SD = 13.21), and 40.8 % of participants were male. Significant differences were seen for age (as symptoms are progressive with age), F(2, 503) = 47.360, p < 0.0001, with individuals who were prodromal (M = 42.60, SD = 12.04) being significantly younger than the early-HD group (M = 51.91, SD = 12.41) and the late-HD group (M = 55.07, SD = 11.89). The early-HD group was also significantly younger than the late-HD group. Groups did not differ on gender, χ²(2, N = 506) = 3.193, p = 0.20. The majority of participants were Caucasian (96.4 %); 2.0 % were African American, 1.4 % were classified as “other,” and 0.2 % were unknown. Participants’ education ranged from 4 to 26 years (M = 15.06, SD = 2.88). While there were group differences in education, F(2, 501) = 14.781, p < 0.0001, these differences were small; early (M = 14.74, SD = 2.78) and late HD (M = 14.22, SD = 2.62) had 1 to 1.5 years less education relative to the prodromal HD group (M = 15.88 years, SD = 2.94).

Unidimensionality

Exploratory factor analyses (EFA)

Findings based on a random sample of 254 individuals indicated that the data could largely be explained by 4 factors (Table 1); the first factor included 14 items that generally represented meaning and purpose; the second factor included only two highly similar items concerning family members who had died of HD; the third factor included 12 items that generally represented anxieties and worries concerning death; the fourth factor included 16 items that generally represented thoughts concerning death and dying; and finally, 1 item did not load on any of the four factors. Because of the spurious nature of the second factor, and the fact that there is an existing PROMIS measure concerning anxiety, we elected to focus on developing measures that reflected meaning and purpose (factor 1) and death and dying (factor 4). For the remainder of analyses, we focused solely on these two factors.

Table 1 Exploratory factor analysis results for the HDQLIFE end-of-life concerns item pool

Confirmatory factor analysis (CFA)

Using the second random sample of 253 individuals, CFAs were conducted separately on each of the two subdomains (i.e., meaning and purpose and death and dying) to confirm unidimensionality.

Meaning and Purpose

Content considerations and large residual correlations led us to reduce the number of items for this scale from 14 to 7. Results indicated that all 7 items examining meaning and purpose generally fit the data well; CFI = 0.99, TLI = 0.98, RMSEA = 0.11, all r² > 0.03. Additionally, all residual correlations were ≤0.11 and all item-total correlations were >0.4. Cronbach’s alpha for this measure was 0.84.

Death and Dying

Examination of all 16 items addressing difficulties with death and dying revealed 3 items with large residual correlations. These items were deleted, resulting in 13 final items; all residual correlations were ≤0.15 for these items. These 13 items were then examined using a 1-factor CFA, which yielded CFI = 0.97, TLI = 0.96, RMSEA = 0.15, all r² > 0.03. All item-total correlations were >0.4. Cronbach’s alpha for this scale was 0.94.

IRT analyses

Meaning and Purpose

The seven selected items were analyzed using the graded response model (GRM) [54], in accordance with PROMIS recommendations [50]. IRT parameter estimates indicated slopes ranging from 0.84 to 4.75 and thresholds ranging from −3.26 to 1.78 (see Table 2). S-X² model fit statistics were examined using IRTPRO; although 5 items had misfit statistics (p < 0.05), they were included for further consideration. Information was good (i.e., marginal reliability = 0.83) for scale scores between −3 and 0.5 (see Fig. 2 for the scale information function). No items showed DIF on age, gender, or education. Items with slopes <2.0, as well as misfit statistics, were omitted from the final item set (“I feel comfortable talking about my death”; “I find meaning in my illness”; and “There are important things that I still want to do with my life”). Thus, 4 items were retained for inclusion in this scale, and a static short form (instead of a computer adaptive test) was developed.

Table 2 HDQLIFE Meaning and Purpose item parameters
Fig. 2 HDQLIFE Meaning and Purpose test information. In general, we want total information to be >9.0 and standard error to be <0.33 (this provides a reliability of 0.9). This figure shows excellent total information and standard error for Meaning and Purpose scale scores between −3 and 0.5
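The link between test information, standard error, and reliability in the caption above follows from two standard IRT relationships, assuming the latent trait is scaled to unit variance: the standard error is the reciprocal square root of the information, and reliability is one minus the squared standard error. The sketch below works through the numbers; the function names are hypothetical.

```python
import math


def standard_error(information: float) -> float:
    """Standard error of the trait estimate at a given level of test information."""
    return 1.0 / math.sqrt(information)


def reliability(information: float) -> float:
    """Reliability implied by the information, assuming unit trait variance."""
    return 1.0 - 1.0 / information


if __name__ == "__main__":
    for info in (5.0, 9.0, 10.0):
        print(
            f"information = {info:4.1f}  "
            f"SE = {standard_error(info):.3f}  "
            f"reliability = {reliability(info):.3f}"
        )
```

Under these assumptions, information of 9 corresponds to a standard error of about 0.33 and a reliability of about 0.89, which rounds to 0.9, while information of 5 corresponds to a reliability of 0.80.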

Concern with Death and Dying

One item, “I feel in control of my life,” was deleted due to a poor slope (0.98). The remaining 12 items had slope parameters ranging from 1.48 to 4.57 and threshold parameters ranging from −0.98 to 3.65 (Table 3). Information was good (i.e., reliability ≥0.80) for scale scores between −1.5 and 3.0 (see Fig. 3 for the scale information function). Although S-X² indicated that 5 of the 12 items had misfit (p < 0.05), these items were retained for further consideration. Marginal reliability was 0.91. DIF was not found for age (<50 vs. ≥50 or <40 vs. ≥40), gender (male vs. female), or education (some college and lower vs. college degree and higher). A 6-item calibrated Concern with Death and Dying short form was then created based on slope parameters, item characteristic curves, item information, and average item difficulty, as well as input from HD and measurement development experts on clinical characteristics (e.g., items were selected that represent different important clinical components of concerns with death and dying). Specifically, we balanced the psychometric considerations with clinical content to ensure representativeness of the items that were selected for the short form.

Table 3 HDQLIFE Concern with Death and Dying item parameters
Fig. 3 HDQLIFE Concern with Death and Dying test information. This figure shows the test information and scale score standard error for different scale scores in standard deviation units for the Concern with Death and Dying scale. Information was good (i.e., reliability ≥0.80) for scale scores between −1.5 and 3.0

Simulation results showed that the average number of items administered to 10,000 virtual respondents by the Firestar CAT simulation software was 7.02. The correlation between the CAT scores and the full item bank was 0.99, indicating that the CAT based on the Concern with Death and Dying item bank can produce results that are very similar to those obtained with administration of the entire 12-item set. Figure 4 shows the number of CAT items used for different scale scores in standard deviation units: at −1 SD units, the CAT always used all 12 items in the item bank; at +1 and +2 SD units, the CAT always used the minimum number of 4 items in the item bank; and at +3 SD units, the CAT used all 12 items in the item bank. Thus, the CAT simulation indicates that fewer items were needed to estimate scores for individuals with greater Concern with Death and Dying than for individuals with less Concern with Death and Dying.

Fig. 4 HDQLIFE Concern with Death and Dying number of CAT items by CAT theta. This figure shows the number of CAT items used for different scale scores in standard deviation units: at −1 SD units, the CAT always used all 12 items in the item bank; at +1 and +2 SD units, the CAT always used the minimum number of 4 items in the item bank; and at +3 SD units, the CAT used all 12 items in the item bank

Scoring of short forms

The IRT-scaled scores (theta) were converted into standardized t scores (mean = 50, SD = 10; referenced to the HD population represented by the current sample); see Tables 4 and 5 for summed score to t score conversion tables for the Meaning and Purpose and Concern with Death and Dying short forms, respectively. Higher scores indicate more of the construct (i.e., higher scores for Meaning and Purpose indicate greater meaning and purpose in one’s life, whereas higher scores on Concern with Death and Dying indicate greater concerns or preoccupation with death and dying).
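The rescaling from theta to the t metric is a linear transformation. The sketch below assumes theta is expressed in standard deviation units centered on the reference sample (mean 0, SD 1); the function name is hypothetical. Summed short-form scores should still be converted with the published conversion tables rather than with this formula.

```python
def theta_to_t(theta: float, mean: float = 50.0, sd: float = 10.0) -> float:
    """Convert an IRT theta estimate (mean 0, SD 1) to the t metric (mean 50, SD 10)."""
    return mean + sd * theta


if __name__ == "__main__":
    for theta in (-1.5, 0.0, 1.0, 2.0):
        print(f"theta = {theta:+.1f}  ->  t = {theta_to_t(theta):.1f}")
```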

Table 4 HDQLIFE Meaning and Purpose SF summed score to t score conversion table
Table 5 HDQLIFE Concern with Death and Dying SF t score conversion table

Other demographic comparisons

There was a small, but significant negative relationship between age and HDQLIFE Concern with Death and Dying (r = −0.12, p = 0.009); there was no relationship between age and HDQLIFE Meaning and Purpose (r = 0.05, p = 0.24). Relationships between education and HDQLIFE Concern with Death and Dying (r = 0.01, p = 0.76), and education and HDQLIFE Meaning and Purpose (r = −0.07, p = 0.10) were negligible. An independent samples t test indicated that women (M = 50.92; SD = 9.39) reported more Concern with Death and Dying than men (M = 48.80; SD = 8.24), t(493) = −2.59, p = 0.01; there were no differences between men (M = 49.46; SD = 9.28) and women (M = 50.42; SD = 8.97) for Meaning and Purpose, t(493) = −1.16, p = 0.25.

Discussion

This paper presents the development of two new patient-reported outcomes measures from HDQLIFE [28] that evaluate end of life concerns in HD: Meaning and Purpose, and Concern with Death and Dying. Analyses supported the development of a 4-item calibrated scale to capture Meaning and Purpose, and an item bank that can be administered as either a CAT or a 6-item short form to capture Concern with Death and Dying. These are the first measures of EOL that have been developed specifically for use in HD and include the first CAT for use in evaluating patient-reported outcomes regarding EOL concerns. CAT allows for a much briefer assessment in that only the most relevant items are administered; item selection is based on the participant’s previous responses. Furthermore, these measures are scored using a t metric, with a mean of 50 and standard deviation of 10; higher scores indicate more of the construct (i.e., higher scores for Meaning and Purpose indicate greater meaning and purpose in one’s life, whereas higher scores on Concern with Death and Dying indicate greater concerns or preoccupation with death and dying). This approach allows for an estimation of an individual’s functioning relative to the reference group (in this case, other individuals with HD). For example, assuming approximately normally distributed scores, a score of 60 on Concern with Death and Dying (1 SD above the mean) indicates that the individual is more preoccupied with these thoughts than approximately 84 % of people with HD. A score of 70 (2 SD above the mean) indicates thoughts/preoccupation with death and dying that exceed those of approximately 98 % of individuals with HD.
Given the fact that talking about these issues can be uncomfortable for both the patient and the provider [59], that individuals with HD often do not discuss these concerns with physicians [4], that physicians often neglect to initiate discussions about EOL options with patients [60, 61], and that this has been recognized as a priority area for HD clinical care [4, 10, 60, 62], these measures may serve as a catalyst to help initiate these difficult conversations between patients and providers. Furthermore, scores on these measures may potentially serve as referents for making appropriate clinical referrals for palliative care services and to identify distressed individuals who might benefit from consultation with mental health services and/or pastoral counselors.
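When t scores are used as referents for clinical decisions, it helps to be explicit about how a score maps onto the reference group. The sketch below assumes, as a simplification, that t scores in the reference sample are approximately normal with mean 50 and SD 10, and it separates two quantities that are easy to conflate: the share of the reference group scoring below a given t score, and the share falling within a symmetric band around the mean. The function names are hypothetical.

```python
from statistics import NormalDist

# Assumed reference distribution for the t metric (mean 50, SD 10).
REFERENCE = NormalDist(mu=50.0, sigma=10.0)


def share_below(t_score: float) -> float:
    """Proportion of the reference group expected to score below t_score."""
    return REFERENCE.cdf(t_score)


def share_within(t_score: float) -> float:
    """Proportion of the reference group within the same distance of the mean."""
    distance = abs(t_score - REFERENCE.mean)
    return REFERENCE.cdf(REFERENCE.mean + distance) - REFERENCE.cdf(REFERENCE.mean - distance)


if __name__ == "__main__":
    for t in (60.0, 70.0):
        print(
            f"t = {t:.0f}: below = {share_below(t):.1%}, "
            f"within band around mean = {share_within(t):.1%}"
        )
```

Under this normal approximation, a t score of 60 exceeds roughly 84 % of the reference group, while roughly 68 % of the group falls between 40 and 60.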

There is no cure for HD; thus, all HD care is essentially palliative. There are many evidence-based palliative care interventions available to increase HRQOL of persons with HD [4–7]. However, denial, stigma, and conflicting family perceptions of what constitutes quality of life and a “good death” are barriers to engaging in EOL discussions [63]. HD has some unique characteristics that make disease-specific EOL measures critical. Since HD is an autosomal dominant genetic disorder (i.e., it runs in families), persons with a positive gene test have often witnessed the decline and death of several family members while they contemplate their own genetic fate. In addition, people with the HD gene mutation generally have normal functioning until midlife, when subtle symptoms begin and then slowly progress to increasing levels of impairment over 15–20 years or more. Our measures are designed to evaluate EOL across the entire disease course. This will enable us to better understand how beliefs about EOL change over time in people with HD, and how they are impacted by their inevitable cognitive decline. Furthermore, the EOL measures developed here are suitable for use in later-stage patients and will help care providers to evaluate the needs/wants of these individuals in order to provide a supportive environment during the end-of-life stage of HD. Current healthcare policies do not provide support for long-term palliative care [4]. There is also evidence that patients with neurological conditions are less likely than other types of patients, such as patients with cancer, to make advance directives or receive palliative care at the end of life [64]. Our HD-specific EOL measures can help identify patients who could benefit from palliative care and advance directives decision-making, as well as identify when patients are likely to be most receptive to these interventions.

While this study has a number of strengths, there are also some limitations. Our study sample might not be representative of all people with HD. We recruited participants from specialized HD clinical centers and from the PREDICT-HD study. Most persons with HD do not have access to specialized HD centers. Participants in the PREDICT-HD study are persons who have independently chosen to be tested for the HD gene mutation prior to symptom onset [31–33]. It is estimated that <25 % of persons at risk for HD undergo pre-symptomatic genetic testing [65]. Thus, our sample might be more open to discussing EOL concerns because they have given consideration to their own futures through seeking HD genetic testing. Previous research in HD has indicated that persons with HD may demonstrate impaired awareness of their illness state [66]. This could potentially lead to them reporting fewer concerns with death and dying as the disease progresses, which would be counterintuitive. Thus, including caregiver perspectives in HD studies is important, a factor that is not represented in our study design (which focused solely on patient-centered outcomes). Future studies, especially those examining individuals in the later stages of the disease, should consider including caregivers. Finally, some study participants completed the assessments via computers at home and might have received input and assistance from others, while others completed the assessments in a research setting. Future work should consider examining group differences among these responders.

Taken together, these are the first HD-specific measures that have been developed to capture EOL issues such as meaning of life and concerns about death and dying over the course of HD. In addition, although these measures were developed for use in HD research, they may also have utility in the HD clinic and might be applicable to other conditions that share similar characteristics, such as early-onset Alzheimer disease (which shares an autosomal dominant inheritance pattern and a progressive course), as well as other common neurological diseases that involve behavioral, cognitive, and/or motor symptoms (e.g., Parkinson disease, multiple sclerosis, Alzheimer disease). Future efforts should focus on validating these new measures in other terminal conditions.